Sharad Singhal, The Machine & Matthias Becker, University of Bonn | HPE Discover Madrid 2017
>> Announcer: Live from Madrid, Spain, it's theCUBE, covering HPE Discover Madrid 2017, brought to you by Hewlett Packard Enterprise.
>> Welcome back to Madrid, everybody, this is theCUBE, the leader in live tech coverage, and my name is Dave Vellante, and I'm here with Peter Burris. This is day two of HPE Hewlett Packard Enterprise Discover in Madrid, this is their European version of a show that we also cover in Las Vegas, kind of a six-month cadence of innovation and organizational evolution of HPE that we've been tracking now for several years. Sharad Singhal is here, he covers software architecture for The Machine at Hewlett Packard Enterprise, and Matthias Becker, who's a postdoctoral researcher at the University of Bonn. Gentlemen, thanks so much for coming on theCUBE.
>> Thank you.
>> No problem.
>> You know, we talk a lot on theCUBE about how technology helps people make money or save money, but now we're talking about, you know, something just more important, right? We're talking about lives and the human condition and
>> Peter: Hard problems to solve.
>> Specifically, yeah, hard problems like Alzheimer's. So Sharad, why don't we start with you, maybe talk a little bit about what this initiative is all about, what the partnership is all about, what you guys are doing.
>> So we started on a project called the Machine Project about three, three and a half years ago, and frankly at that time, the response we got from a lot of my colleagues in the IT industry was "You guys are crazy", (Dave laughs) right. We said we are looking at an enormous amount of data coming at us, we are looking at real-time requirements on larger and larger processing coming up in front of us, and there is no way that the current architectures of the computing environments we create today are going to keep up with this huge flood of data, and we have to rethink how we do computing. And the real question for those of us who are in research in Hewlett Packard Labs was, if we were to design a computer today, knowing what we do today, as opposed to what we knew 50 years ago, how would we design the computer? And this computer should not be something which solves problems for the past, this should be a computer which deals with problems in the future. So we are looking for something which would take us through the next 50 years, in terms of computing architectures and what we will do there. In the last three years we have gone from ideas and paper study, paper designs, and things which were made out of plastic, to a real working system. Around Las Vegas time, we basically announced that we had the entire system working with actual applications running on it, 160 terabytes of memory all addressable from any processing core in 40 computing nodes around it. And the reason is, although we call it memory-driven computing, it's really thinking in terms of data-driven computing. The reason is that the data is now at the center of this computing architecture, as opposed to the processor, and any processor can reach any part of the data directly, as if it were addressing local memory. This provides us with a degree of flexibility and freedom in compute that we never had before, and as a software person, I work in software, as a software person, when we started looking at this architecture, our answer was, well, we didn't know we could do this.
Now, given that I can do this, and I assume that I can do this, all of us programmers started thinking differently, writing code differently, and we suddenly had essentially a toy to play with, if you will, as programmers, where we said, you know, this algorithm I had written off decades ago because it didn't work, but now I have enough memory that if I were to think about this algorithm today, I would do it differently. And all of a sudden, a new set of algorithms, a new set of programming possibilities opened up. We worked with a number of applications, ranging from just Spark on this kind of an environment, to how do you do large-scale simulations, Monte Carlo simulations. And people talk about improvements in performance on the order of, oh, I can get you a 30% improvement. We are saying in the example applications we saw anywhere from five, 10, 15 times better, to financial analysis and risk management problems which we can do 10,000 times faster.
>> So many orders of magnitude.
>> Many, many orders.
>> When you don't have to wait for the horrible storage stack. (laughs)
>> That's right, right. And these kinds of results gave us the hope that as we look forward, all of these new computing architectures that we are thinking through right now will take us through this data mountain, data tsunami that we are all facing, in terms of bringing all of the data back and essentially doing real-time work on those.
>> Matthias, maybe you could describe the work that you're doing at the University of Bonn, specifically as it relates to Alzheimer's and how this technology gives you possible hope to solve some problems.
>> So at the University of Bonn, we work very closely with the German Center for Neurodegenerative Diseases, and in their mission they are facing diseases like Alzheimer's, Parkinson's, Multiple Sclerosis, and so on. And in particular Alzheimer's is a really serious disease, and for many diseases like cancer, for example, the mortality rates improve, but for Alzheimer's there's no improvement in sight. So there's a large population that is affected by it. There is really not much we currently can do, so the DZNE is focusing its research efforts together with the German government in this direction, and one thing about Alzheimer's is that if you show the first symptoms, the disease has already been present for at least a decade. So if you really want to identify sources or biomarkers that will point you in this direction, once you see the first symptoms, it's already too late. So at the DZNE they have started a cohort study. In the area around Bonn, they are now collecting the data from 30,000 volunteers. They are planning to follow them for 30 years, and in this process we generate a lot of data, so of course we do the usual surveys to learn a bit about them, we learn about their environments.
But we also do much more detailed analysis, so we take blood samples and we analyze the complete genome, and we also acquire imaging data from the brain, so we do an MRI at an extremely high resolution with some very advanced machines we have, and all this data is accumulated, because we do not only have to do this once, but we try to do it repeatedly for every one of the participants in the study, so that we can later analyze the time series. When in 10 years someone develops Alzheimer's, we can go back through the data and see, maybe there's something interesting in there, maybe there was one biomarker that we are looking for, so that we can predict the disease better in advance. And with this pile of data that we are collecting, basically we need something new to analyze this data and to deal with it, and when we heard about the machine, we thought immediately this is a system that we would need.
>> Let me see if I can put this in a little bit of context. So Dave lives in Massachusetts, I used to live there, in Framingham, Massachusetts,
>> Dave: I was actually born in Framingham.
>> You were born in Framingham. And one of the more famous studies is the Framingham Heart Study, which tracked people over many years and discovered things about heart disease and the relationship between smoking and cancer, and other really interesting problems. But they used a paper-based study with an interview base, so for each of those kinds of people, they might have collected, you know, maybe a megabyte, maybe a megabyte and a half of data. You just described a couple of gigabytes of data per person, 30,000 people, multiple years. So we're talking about being able to find patterns in data about individuals that would number in the petabytes over a period of time. Very rich detail that's possible, but if you don't have something that can help you do it, you've just collected a bunch of data that's just sitting there. So is that basically what you're trying to do with the machine, is the ability to capture all this data, to then do something with it, so you can generate those important inferences?
>> Exactly, so with all these large amounts of data we do not only compare the data sets for a single person, but once we find something interesting, we also have to compare the whole population that we have captured with each other. So there's really a lot of things we have to parse and compare.
>> This brings together the idea that it's not just the volume of data. I also have to do analytics across all of that data together, right, so every time a scientist, one of the people who is doing biology studies or informatics studies, asks a question, and they say, I have a hypothesis that this might be a reason for this particular evolution of the disease or occurrence of the disease, they then want to go through all of that data, and analyze it as they are asking the question. Now if the amount of compute it takes to actually answer their questions takes me three days, I have lost my train of thought. But if I can get that answer in real time, then I get into this flow where I'm asking a question, seeing the answer, making a different hypothesis, seeing a different answer, and this is what my colleagues here were looking for.
>> But if I think about, again, going back to the Framingham Heart Study, you know, I might do a query on a couple of related questions, and use a small amount of data.
The technology to do that's been around, but when we start looking for patterns across brain scans with time series, we're not talking about a small problem, we're talking about an enormous amount of data that can be looked at in a lot of different ways. I've got one other question for you related to this, because I've got to presume that there's a quid pro quo for getting people into the study, you know, 30,000 people, is that you'll be able to help them and provide prescriptive advice about how to improve their health as you discover more about what's going on, have I got that right?
>> So, we're trying to do that, but there are also limits to this, of course.
>> Of course.
>> For us it's basically collecting the data, and people are really willing to donate everything they can from their health data to allow these large studies.
>> To help future generations.
>> So that's not necessarily quid pro quo.
>> Okay, there isn't, okay. But still, the knowledge is enough for them.
>> Yeah, their incentive is they're gonna help people who have this disease down the road.
>> I mean if it is not me, if it helps society in general, people are willing to do a lot.
>> Yeah of course.
>> Oh sure.
>> Now the machine is not a product yet that's shipping, right, so how do you get access to it, or is this sort of futures, or...
>> When we started talking to one another about this, we actually did not have the prototype with us. But remember that when we started down this journey for the machine three years ago, we knew back then that we would have hardware somewhere in the future, but as part of my responsibility, I had to deal with the fact that software has to be ready for this hardware. It does me no good to build hardware when there is no software to run on it. So we have actually been working on the software stack, how to think about applications on that software stack, using emulation and simulation environments, where we have some simulators, essentially an instruction-level simulator for what the machine does, or what that prototype would have done, and we were running code on top of those simulators. We also had performance simulators, where we'd say, if we write the application this way, this is how much we think we would gain in terms of performance, and all of those applications, all of that code we were writing, was actually on our large-memory machines, Superdome X to be precise. So by the time we started talking to them, we had these emulation environments available, we had experience using these emulation environments on our Superdome X platform. So when they came to us and started working with us, we took the software that they brought to us, and started working within those emulation environments to see how fast we could make those problems, even within those emulation environments. So that's how we started down this track, and most of the results we have shown in the study are all measured results that we are quoting inside this forum on the Superdome X platform. So even in that emulated environment, which is emulating the machine, of course, in the emulation on Superdome X, for example, I can only hold 24 terabytes of data in memory. I say only 24 terabytes
>> Only!
>> because I'm looking at much larger systems, but an enormously large number of workloads fit very comfortably inside the 24 terabytes.
And for those particular workloads, the programming techniques we are developing work at that scale, right, they won't scale beyond the 24 terabytes, but they'll certainly work at that scale. So between us we then started looking for problems, and I'll let Matthias comment on the problems that they brought to us, and then we can talk about how we actually solved those problems.
>> So we work a lot with genomics data, and usually what we do is we have a pipeline, so we connect multiple tools, and we thought, okay, this architecture sounds really interesting to us, but if we want to get started with this, we should pose them a challenge. So to see if they could convince us, we went through the literature, we took a tool that was advertised as the new optimal solution. So prior work was taking up to six days for processing, they were able to cut it to 22 minutes, and we thought, okay, this is a perfect challenge for our collaboration, and we went ahead and we took this tool, we put it on the Superdome X that was already running, and it took five minutes instead of just 22, and then we started modifying the code, and in the end we were able to shrink the time down to just 30 seconds, so that's two orders of magnitude faster.
>> We took something which was... They were able to run it in 22 minutes, and that had already been optimized by people in the field to say "I want this answer fast", and then when we moved it to our Superdome X platform, the platform is extremely capable. Hardware-wise it compares really well to other platforms which are out there. That time came down to five minutes, but that was just the beginning. And then as we modified the software based on the emulation results we were seeing underneath, we brought that time down to 13 seconds, which is a hundred times faster. We started this work with them in December of last year. It takes time to set up all of this environment, so the serious coding started in around March. By June we had a 9X improvement, which is already about a factor of 10, and since June up to now, we have gotten another factor of 10 on that application. So I'm now at 100X faster than what the application was able to do before.
>> Dave: Two orders of magnitude in a year?
>> Sharad: In a year.
>> Okay, we're out of time, but where do you see this going? What is the ultimate outcome that you're hoping for?
>> For us, we're really aiming to analyze our data in real time. Oftentimes when we have biological questions that we address, we analyze our data set, and then in a discussion a new question comes up, and we have to say, "Sorry, we have to process the data, come back in a week", and our idea is to be able to generate these answers instantaneously from our data.
>> And those answers will lead to what? Just better care for individuals with Alzheimer's, or potentially, as you said, making Alzheimer's a memory?
>> So the idea is to identify Alzheimer's long before the first symptoms are shown, because then you can start an effective treatment and you can have the biggest impact. Once the first symptoms are present, it's not getting any better.
>> Well thank you for your great work, gentlemen, and best of luck on behalf of society,
>> Thank you very much
>> really appreciate you coming on theCUBE and sharing your story.
>> You're welcome.
>> All right, keep it right there, buddy. Peter and I will be back with our next guest right after this short break. This is theCUBE, you're watching live from Madrid, HPE Discover 2017. We'll be right back.
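Sharad's point about memory-driven computing, where any core can address all of the data in place instead of pulling it through a storage stack, can be sketched at toy scale on a conventional machine. The snippet below is not The Machine's software stack or any HPE code; it is a minimal, hypothetical Python analogy using the standard library's multiprocessing.shared_memory module together with NumPy, in which several worker processes compute over one in-memory array without copying it or re-reading it from storage. The array size, worker count, and names are all illustrative.

```python
# Toy analogy of memory-driven computing (not The Machine's actual stack):
# several processes compute over one shared in-memory dataset instead of
# each re-reading it from storage. Needs only the standard library and NumPy.
from multiprocessing import Process, Queue, shared_memory

import numpy as np

N = 10_000_000   # stand-in for a "large" dataset (~80 MB of float64)
N_WORKERS = 4    # stand-in for the prototype's 40 compute nodes


def partial_sum(shm_name, start, stop, out):
    """Attach to the shared block and reduce one slice of it in place."""
    shm = shared_memory.SharedMemory(name=shm_name)
    data = np.ndarray((N,), dtype=np.float64, buffer=shm.buf)
    out.put(float(data[start:stop].sum()))  # only the scalar result travels back
    shm.close()


if __name__ == "__main__":
    # "Load" the dataset once into memory that every worker can address directly.
    shm = shared_memory.SharedMemory(create=True, size=N * 8)
    data = np.ndarray((N,), dtype=np.float64, buffer=shm.buf)
    data[:] = np.random.default_rng(0).random(N)

    out = Queue()
    step = N // N_WORKERS
    workers = [
        Process(target=partial_sum,
                args=(shm.name, i * step,
                      N if i == N_WORKERS - 1 else (i + 1) * step, out))
        for i in range(N_WORKERS)
    ]
    for w in workers:
        w.start()
    total = sum(out.get() for _ in range(N_WORKERS))  # collect before joining
    for w in workers:
        w.join()

    print("mean =", total / N)
    shm.close()
    shm.unlink()
```

On the actual prototype the same idea plays out across a memory fabric, with 160 terabytes addressable from any of the 40 nodes, rather than inside a single operating system's shared-memory segment; the sketch only illustrates the programming shift Sharad describes, from moving data to the compute to leaving it in place and pointing every worker at it.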
Richard Cramer, Informatica - Informatica World 2017 - #INFA17 - #theCUBE
>> Announcer: Live from San Francisco, it's The Cube, covering Informatica World 2017, brought to you by Informatica.
>> Hello everyone, welcome back to The Cube coverage, exclusive coverage of Informatica World 2017, we are live in San Francisco breaking down all the action of Informatica's big conference, Informatica World 2017. I'm John Furrier with Silicon Angle, The Cube, my cohost Peter Burris, head of research and also general manager of Wikibon.com, check it out, great research there. Next guest is Richard Cramer, Chief Healthcare Strategist for Informatica, welcome to The Cube.
>> Thank you John.
>> Great to see you, we were just talking before we went live about how you love data, you love customers, and healthcare is booming. Certainly healthcare is one of those use cases, it's a vertical that everyone can relate to, one. Two, it's the most dynamic with data right now, and internet of things, connected sensors, you know, what a room looks like, a zillion things connected, now you got wearables, still you got the data problem, it's never going away, certainly it exists there, but now it's changing. So break it down for us, what are the challenges and drivers right now in the healthcare industry relative to getting great software and great solutions to help patients?
>> Well you're 100% right, one of the things that's exciting about healthcare is it matters to all of us. Every one of us is a patient, every one of us has a horror story of interacting with a healthcare system, and so when we look at the opportunity for data, healthcare has historically not used data very well. We had the HITECH Act in 2009 that got electronic healthcare records in place, we're coming out of the backside of that, so arguably we finally have the deep rich clinical data that we've needed to do analytics with for the first time. We now have the technology that's coming around with what we call data 3.0 and big data processing power, and then as you mentioned internet of things and all of the rich sources of new data that we can do discovery on and learn new things about how to treat patients better, and then really the final component is the financial incentives are finally aligned. We used to in healthcare pay for piecework. The more you did, the more you got paid. And shockingly we were inefficient, we did too much. (laughs) And now we're changing to paying for value. And we can pay for value because we can finally measure quality and outcomes, because we have the data. And so that's really the analytics opportunity that's so exciting in healthcare right now.
>> What's interesting is that in this digital transformation, and business transformation, and all the conversations we've had over the years on The Cube, and look at all the top shows in the enterprise and emerging tech, you're seeing one pattern. We had the Chicago Cubs on yesterday talking baseball, but whether it's sports, business, or healthcare or whatever vertical, there's kind of three things, and we'll take baseball, right? Fan experience, how to run the players and the team, and how to run the organization. Healthcare is the same thing, how to run an organization, how to take care of the players, the doctors and the practitioners, and then also the end user, the fan experience, the patient experience.
So now you have, it used to be, hey, are we running our organization, and the practitioners were part of that, maybe subordinate to it, maybe they interacted with it, but now like a baseball team you have how do I run my organization, how do I make the players, the doctors and practitioners, successful, and now the patients, the end users, are part of it as well. This is opening up massive innovation opportunities. What's your reaction to that, and how should people think about the data in that context?
>> So I think the first piece of what you said is very true, which is really for the first time, healthcare organizations are behaving like real businesses. When you start to get paid for results, you now care about a lot of things that you didn't care about before. Patient experience matters 'cause consumers have choice, those types of things, so all of those digital transformation examples from other industries are now relevant and front and center for healthcare organizations. Which is radically different, and so that opportunity to use data and use it for a specific purpose is very valuable. I think the other thing that's important with digital transformation is historically healthcare is very local. It's regional, you go to the hospital that's closest to you. And digital disruption is all about removing geographic barriers. The goal in healthcare today is we're reducing cost, you want to push healthcare out of that high-cost hospital into the most cost-effective, highest-quality organization you can. That may be a retail clinic in a shopping mall. And how do you do that? You do that with digital technology. Telehealth in the home. All of those types of things are traditional digital transformation types of capabilities that healthcare has not traditionally cared about.
>> So optimizing a network effect if you will, we always hear in network, out of network as a term (laughs)
>> Yep.
>> My wife and I go oh it's in network, oh good, so out of network always kind of means spendy, but now you're talking about a reconfiguration of making things much more efficient as piece parts.
>> Well exactly right, and the idea of the network, the network used to be drive everybody to the hospital, 'cause that's where we made our money. Well when you're getting paid for results, the hospital's a cost center, not a revenue center. You actually want to keep people out of the hospital. And as a consumer and as somebody who's paying for healthcare, that's actually a good thing. If I can avoid going to the hospital and get healthcare in a more convenient setting that I want, at home or someplace closer to home, and not be admitted to a hospital, hospitals are dangerous places.
>> Peter, you've been doing, I've seen you and I comment on Facebook all the time, certainly the healthcare thing sparks the conversation, but big data can solve a lot of this stuff, I know you're doing a lot of thinking around this.
>> Well so fascinating conversation, I'd say a couple things really quickly and then get your take on it. First off, a lot of the evidence-based management techniques we heard about yesterday originated in healthcare. Because of the
>> You mean like data management and all that stuff?
>> Peer review, how we handle clinical trials, the amount of data that's out there, so a lot of the principles about how data could be used in a management framework began in healthcare, and they have kind of diffused into the marketplace, but the data hasn't been there. Now there's some very powerfully aligned interests.
Hospitals like their data, manufacturers of products like their data, doctors like their data, consumers don't know what to do with their data. They don't know what the value of the data is. So if we take a look at those interests, it's going to be hard, and there's a lot of standards, there's a lot of conventions, each of those groups has their own, so now the data's available, but the integration is going to be a major challenge. People are using HIPAA as an excuse not to do it, manufacturers and other folks are using other kinds of excuses not to facilitate the data, because everybody wants control of the final money. So we've heard a lot at the conference about how, liberate the data, free it up, make it available to do more work, but the second step is integration. You have got the integration problem of all integration problems in data.
>> Yes.
>> Talk about how some of the healthcare leaders are starting to think about how they're going to break down some of these barriers and begin the process of integrating some of their data so they can in fact enact different types of behaviors.
>> Yeah, and great context for what's happening in healthcare with data. So if you think of five, six, seven years ago at Informatica, my role was to go and look at what other industries had done for traditional enterprise data warehousing and bring that knowledge back into healthcare and say, healthcare, you're ten years behind the rest of industry (laughs), here's how you should think about your data analytics. Well that's completely different now. The data challenges as you've outlined them are, we've always had data complexity, we now have internet of things data like nobody's business, and we also have this obligation to use the data far more effectively than we ever have before. Well one of the key parts of this is that the idea of centralizing and controlling data as a path to value is no longer viable. We can argue whether it was ever successful, but it really is not even an option anymore when you look at the proliferation of data sources, the proliferation of data types, the complexity. We simply can't govern data to perfection before we get to using it (laughs), which is traditionally the healthcare approach. What we're really looking at now is this whole idea of big data analytics applied to all data, and being able to do discovery that says we can make good decisions with data that may not be perfect, and this is the big data, put it into a data lake, do some self-service discovery, some self-service data preparation, reduce the distance between the people who know what the data means and being able to get hands on and work with it, so that you can iterate and you can discover. You cannot do that in an old-fashioned EDW context where we have to extract, transform, load, govern to perfection all the data before anybody ever gets to use it.
>> John: That's why I'm excited about data in motion.
>> Well even, yeah data, we'll get to that in a second because that's important, but even before we get there, John, I mean again, think about how powerful some of these industries are. Drug companies keep drug prices high in the U.S. because they have visibility into the data, the nature of the treatments, et cetera. One of the most interesting things, this is one I want to test with you on.
Is that doctors, where a lot of this evidence-based management has started because of peer review, because of their science orientation, even though they get grooved into their own treatments, generally speaking their interest is in exploring new pathways to health and wellness. So is, do you have a very powerful user group that will adopt this ability to integrate data very quickly, because they can get greater visibility into new tactics, new techniques, new healthcare regimens, as well as new information about patients? Are doctors going to be crucial to this process in your opinion?
>> Doctors are going to be crucial to the discussion, we had a healthcare breakfast with a speaker from Deloitte the other day who talked about using data with clinicians to have a data discussion. Not use data to tell them you're wrong or whatnot, but actually to engage them in the discovery process of here's what the data shows about your practice. And you talk about the idea of data control, that's absolutely one of the biggest barriers. The technology does not solve data control.
>> Right.
>> In the old days, everybody admits we have siloed data, we have HIPAA, it was so hard to break down those barriers and actually share data that nobody really addressed the fact that people didn't want to. Because they couldn't. Well now with the technology that's available it's
>> What's possible, the art of possible.
>> Yeah, now it's possible to actually get data from everywhere and do things with it quickly. We run into the fact that people have to explicitly say I don't want to share.
>> But here's where that data movement issue becomes so important, John, and I think that this is a play for Informatica. Because metadata is going to be crucial to this process. Giving people who do have some understanding of data, clinicians, physicians, because of their background, because of the way that medicine is supposed to be run at that level, giving them visibility into the data that's available, that could inform their practices and their decisions, is really crucial.
>> Absolutely. A good friend of mine who's a clinician has been asking for years, he says, if all you did was give me access to data about my patients so I could explore my own clinical practice, I'm guaranteed I take care of diabetics the way I learned in medical school 25 years ago. There has been a lot of innovation in that, and just having the perspective on my own practice patterns from my own data would change my behavior. And we, typically I haven't been able to do that. We can now.
>> So I've got to ask you, so let's get down and dirty on Informatica, 'cause first of all I think instrumentation of everything now is a reality, I think people now are warming up to it, certainly in levels, from super hot to, like, I realize it's a transformation area. What are you guys saying to customers? Because they're kind of drowning in the data, one. Two, they are maybe held back 'cause of HIPAA and other things, now it's time to act, so the art of the possible, things are now possible, damn, I got to get a plan, so they're hustling around to put a plan together, architecture, plan. What do you guys pitch to customers? What is the value proposition that you go in with, and take us through an example, a use case of a day in the life of your role with customers.
>> So I have the best job in Informatica. I get to go out and meet with senior customer executive teams and talk about data, how they're going to use data, and how we can help them do it.
So it's the best job in the company. But if you look at the typical pitch, we start out, first we get them to agree with the principle: centralizing control is dead, being able to manage data as an enterprise asset in a decentralized fashion with customer self-service is the future reality. And everybody universally says yep, we get it, we agree.
>> John: Next. (laughs) Check.
>> But then we talk about what does that actually mean? And it's amazing how at every step in my presentation, the 20 questions always are the same, it comes down to, well, how do we control that? How do we control that?
>> Peter: How do we manage it?
>> So you start with, you think of this idea that says hey, decentralized data, customer self-service, you got to have a data catalog. Well, Enterprise Information Catalog is a perfect solution. If you don't know where your data assets are and who's using them, you cannot manage data as an asset
>> And they're comfortable with that because that's the old mindset of the warehouse, like that big fenced-in organization, but now they say okay I can free it up
>> Yes.
>> And manage it with a catalog and get the control I need.
>> That's right, and so the first piece is the catalog. Well then, the minute you say to people the catalog is the way to get value from your data, there's somebody in every room that says ooh, that value represents risk. You're letting people see data and make data easy to find, that can't possibly be good, it's risky. Well then we have Secure@Source, which is the opposite product from Enterprise Information Catalog, that says here's the risk profile of all those data sources for HIPAA and protected health information, so we've got a great answer to that question. And then you look and you say, well, how do I fundamentally work with data differently, and that's the idea of a data lake. Rather than making data hard to get in so it's easy to query, which is the traditional enterprise data warehouse, and even people who do enterprise data warehousing well, little secret is, takes too long, costs too much, and it's not agile.
>> Yeah.
>> We're not suggesting for a second that a centralized repository, trustworthy data governed within an inch of its life so that it can be used broadly throughout the organization without people hurting themselves, is not good, but it can't be the only place to work with data. Takes too long, costs too much, and it's not agile. What you want is the data lake that says put all the data that you care about in one place, big data, IOT data, data that you don't know what you're going to use, and apply effort at query time only to the data that you care about.
>> And we're always talking about cleanliness and hygiene yesterday versus heart surgeon, different roles in an organization. The big fear that we hear from customers we talk to on The Cube, and I want to get your thoughts and reaction on this, is that my data lakes turn into a data swamp. Because it's just, I'm not using it, it's just sitting there, it gets stale, I'm not managing it properly, I'm not vectoring it into the right apps in real time, moving it around. Your reaction to that objection.
>> Early days of the data lake, absolutely data swamp, because we didn't have the tools, people weren't using them correctly. So just because you put it in a data lake doesn't mean that it's ungoverned. It doesn't mean you don't want to put the catalog on it so you know what's there and how to use it.
It doesn't mean you don't want to have end-to-end transparency and visibility from the data consumer to the data source, because transparency is actually the first level of governance. That's what provides confidence. It's not agreeing on a single version of the truth and making sure the data's right. It's simply allowing the transparency, and so when you have a data lake with a catalog, with Intelligent Data Lake for self-service data preparation, with the ability to see end to end what's happening with that data, I don't care that it's not been governed if I can inspect it easily and quickly to validate that your assumptions are reasonable, 'cause this is the biggest thing in healthcare. We can't handle the new data, the IOT data, and the scope of things we want to do that we haven't thought about, the old way.
>> Yeah we have limited time.
>> One last question. The Framingham Heart Study has shown us that healthcare data ages differently than most other data. How do we anticipate what data's going to be important today and what data's going to be important in the future, given that we're talking about people and how they age over time?
>> So the key thing with that, and we talked about it earlier, is you can't analyze data that you threw away. And so a big part of this is, if the data might potentially be of interest, stage it, and don't put it in an archive, don't put it someplace in the database backup, it's got to be staged and accessible, which is the data lake.
>> And ready.
>> And ready, you've got to, and you can't have distance between it. Somebody can't have to go and request it. They need to be able to work on it. And that's the revolution that really is represented by data 3.0, we finally can afford to save data, huge amounts of data, that we don't know we care about. Because somebody may care about it in the future.
>> Peter: That's right.
>> Great Richard, great commentary, great insight, and appreciate you coming on The Cube and sharing what's happening in healthcare, obviously super important. Again, they're running it like a business, a lot of optimization, a lot of changes going on, you guys are doing some good work there, congratulations on the data 3.0 strategy. Hopefully that'll permeate down to the healthcare organizations, and hopefully the user experience, me, the patient, when I go in, I want to be in and out
>> Peter: Wellness.
>> Of the hospital, and also preventative, which I'm trying to do a good job on, but too many Cube interviews keeping me busy, I'm going to have a heart attack on The Cube, no I'm only kidding (laughs). Great coverage here at Informatica World in San Francisco, I'm John Furrier, Peter Burris, more live coverage of day two at Informatica World, Cube, we'll be right back, stay with us.
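Richard's pitch, a catalog for discovery over decentralized data, risk flags on sensitive sources instead of locked doors, and a lake where effort is applied only at query time, can also be sketched in miniature. The snippet below is not Informatica's Enterprise Information Catalog, Secure@Source, or Intelligent Data Lake API; it is a hypothetical, minimal Python illustration of the pattern: datasets are registered with metadata, including a protected-health-information flag, discovered by tag, and parsed only when an analyst actually asks for them (schema-on-read). All class names, tags, and locations are made up for the example.

```python
# Hypothetical miniature of the "catalog plus data lake" pattern described above.
# Not Informatica's products or APIs: datasets are registered with metadata
# (including a PHI flag, echoing the governance concern), discovered by tag,
# and parsed only when an analyst actually asks for them (schema-on-read).
import csv
import io
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    name: str
    location: str                # where the raw data lives in the lake
    tags: set = field(default_factory=set)
    contains_phi: bool = False   # flag sensitive sources rather than hiding them


class DataCatalog:
    def __init__(self) -> None:
        self._entries: dict = {}

    def register(self, entry: CatalogEntry) -> None:
        self._entries[entry.name] = entry

    def find(self, *tags: str, allow_phi: bool = False) -> list:
        """Discovery is cheap: it touches metadata only, never the raw data."""
        return [
            e for e in self._entries.values()
            if set(tags) <= e.tags and (allow_phi or not e.contains_phi)
        ]


def load(entry: CatalogEntry, raw: str) -> list:
    """Effort is applied at query time: parse the raw bytes only when asked."""
    return list(csv.DictReader(io.StringIO(raw)))


if __name__ == "__main__":
    # Stand-in for files or objects sitting untouched in the lake.
    lake = {
        "claims_2017": "patient_id,cost\np1,120\np2,430\n",
        "device_telemetry": "device,reading\nd1,0.7\nd2,0.9\n",
    }

    catalog = DataCatalog()
    catalog.register(CatalogEntry("claims_2017", "s3://lake/claims/2017.csv",
                                  {"claims", "finance"}, contains_phi=True))
    catalog.register(CatalogEntry("device_telemetry", "s3://lake/iot/telemetry.csv",
                                  {"iot", "wearables"}))

    for hit in catalog.find("iot"):                    # discover by metadata
        print(hit.name, load(hit, lake[hit.name]))     # parse only this one source

    print([e.name for e in catalog.find("claims")])                  # PHI filtered out by default
    print([e.name for e in catalog.find("claims", allow_phi=True)])  # visible with explicit opt-in
```

The point of the pattern, in Richard's terms, is that transparency substitutes for pre-emptive lockdown: discovery touches only metadata, the risk profile of each source is visible up front, and the expensive parsing and governing happens only for the data someone actually cares about.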