Sharad Singhal, The Machine & Matthias Becker, University of Bonn | HPE Discover Madrid 2017
>> Announcer: Live from Madrid, Spain, it's theCUBE, covering HPE Discover Madrid 2017, brought to you by Hewlett Packard Enterprise.
>> Welcome back to Madrid, everybody, this is theCUBE, the leader in live tech coverage. My name is Dave Vellante, and I'm here with Peter Burris. This is day two of HPE Hewlett Packard Enterprise Discover in Madrid, the European version of a show that we also cover in Las Vegas, kind of a six-month cadence of innovation and organizational evolution of HPE that we've been tracking now for several years. Sharad Singhal is here, he covers software architecture for the Machine at Hewlett Packard Enterprise, and Matthias Becker, who's a postdoctoral researcher at the University of Bonn. Gentlemen, thanks so much for coming on theCUBE.
>> Thank you.
>> No problem.
>> You know, we talk a lot on theCUBE about how technology helps people make money or save money, but now we're talking about, you know, something just more important, right? We're talking about lives and the human condition and...
>> Peter: Hard problems to solve.
>> Specifically, yeah, hard problems like Alzheimer's. So Sharad, why don't we start with you, maybe talk a little bit about what this initiative is all about, what the partnership is all about, what you guys are doing.
>> So we started on a project called the Machine Project about three, three and a half years ago, and frankly at that time, the response we got from a lot of my colleagues in the IT industry was "You guys are crazy," (Dave laughs) right. We said we are looking at an enormous amount of data coming at us, we are looking at real-time requirements on larger and larger processing coming up in front of us, and there is no way that the current architectures of the computing environments we create today are going to keep up with this huge flood of data, and we have to rethink how we do computing. And the real question for those of us who are in research in Hewlett Packard Labs was: if we were to design a computer today, knowing what we do today, as opposed to what we knew 50 years ago, how would we design the computer? And this computer should not be something which solves problems for the past; this should be a computer which deals with problems in the future. So we are looking for something which would take us through the next 50 years, in terms of computing architectures and what we will do there. In the last three years we have gone from ideas and paper studies, paper designs, and things which were made out of plastic, to a real working system. Around Las Vegas time, we basically announced that we had the entire system working with actual applications running on it: 160 terabytes of memory, all addressable from any processing core in 40 computing nodes around it. Although we call it memory-driven computing, it's really thinking in terms of data-driven computing: the data is now at the center of this computing architecture, as opposed to the processor, and any processor can refer to any part of the data directly, as if it were addressing local memory. This provides us with a degree of flexibility and freedom in compute that we never had before. I work in software, and as a software person, when we started looking at this architecture, our answer was, well, we didn't know we could do this.
Now, given that I can do this, and I assume that I can do this, all of us programmers started thinking differently, writing code differently, and we suddenly had essentially a toy to play with, if you will, as programmers, where we said, you know, this algorithm I had written off decades ago because it didn't work, but now I have enough memory that if I were to think about this algorithm today, I would do it differently. And all of a sudden, a new set of algorithms, a new set of programming possibilities opened up. We worked with a number of applications, ranging from just Spark on this kind of an environment, to how you do large-scale simulations, Monte Carlo simulations. And people talk about improvements in performance on the order of, oh, I can get you a 30% improvement. We are saying that in the example applications we saw anywhere from five, 10, 15 times better, up to financial analysis and risk management problems which we can do 10,000 times faster.
>> So many orders of magnitude.
>> Many, many orders.
>> When you don't have to wait for the horrible storage stack. (laughs)
>> That's right, right. And these kinds of results gave us the hope that, as we look forward, these new computing architectures that we are thinking through right now will take us through this data mountain, this data tsunami that we are all facing, in terms of bringing all of the data back and essentially doing real-time work on it.
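As a rough illustration of that shift, here is the difference between streaming a dataset past the processor and simply addressing all of it as memory. This is only a loose analogy in ordinary Python, with mmap standing in for a large byte-addressable memory pool and a made-up example file; it is not the Machine's actual programming model.

    # Loose analogy only: mmap as a stand-in for a large, directly
    # addressable memory pool. Not the Machine's real API.
    import mmap
    import os

    def storage_centric_sum(path, chunk_size=1 << 20):
        """Conventional pattern: stream the data through in chunks,
        paying the storage-stack cost on every pass over it."""
        total = 0
        with open(path, "rb") as f:
            while True:
                chunk = f.read(chunk_size)
                if not chunk:
                    break
                total += sum(chunk)
        return total

    def memory_centric_sum(path):
        """Memory-driven analogy: map the whole dataset once, then touch
        any byte of it directly, as if it were local memory."""
        with open(path, "rb") as f:
            with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as data:
                # 'data' now behaves like one big array; any index is reachable.
                return sum(data[i] for i in range(len(data)))

    if __name__ == "__main__":
        path = "example.bin"
        with open(path, "wb") as f:      # tiny synthetic dataset
            f.write(os.urandom(1024))
        assert storage_centric_sum(path) == memory_centric_sum(path)
        print("same result; only the access pattern differs")

The arithmetic here is trivial on purpose; the point is that once the whole dataset is addressable, repeated and random access no longer goes back through the storage stack, which is what makes the rewritten algorithms described above possible.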
>> Matthias, maybe you could describe the work that you're doing at the University of Bonn, specifically as it relates to Alzheimer's, and how this technology gives you possible hope to solve some problems.
>> So at the University of Bonn, we work very closely with the German Center for Neurodegenerative Diseases (DZNE), and in their mission they are tackling diseases like Alzheimer's, Parkinson's, multiple sclerosis, and so on. In particular, Alzheimer's is a really serious disease, and while for many diseases, cancer for example, the mortality rates are improving, for Alzheimer's there's no improvement in sight. So there's a large population that is affected by it, and there is really not much we currently can do, so the DZNE is focusing its research efforts, together with the German government, in this direction. One thing about Alzheimer's is that by the time you show the first symptoms, the disease has already been present for at least a decade. So if you really want to identify sources or biomarkers that will point you in this direction, once you see the first symptoms, it's already too late. So at the DZNE they have started a cohort study. In the area around Bonn, they are now collecting data from 30,000 volunteers. They are planning to follow them for 30 years, and in this process we generate a lot of data, so of course we do the usual surveys to learn a bit about them, and we learn about their environments. But we also do much more detailed analysis: we take blood samples and analyze the complete genome, and we also acquire imaging data from the brain, so we do an MRI at an extremely high resolution with some very advanced machines we have. And all this data accumulates, because we do not only have to do this once; we try to do it repeatedly for every one of the participants in the study, so that we can later analyze the time series. When, in 10 years, someone develops Alzheimer's, we can go back through the data and see whether there's something interesting in there, maybe one biomarker that we were looking for, so that we can predict the disease better in advance. And with this pile of data that we are collecting, we basically need something new to analyze it and to deal with it, and when we heard about the Machine, we thought immediately, this is a system that we would need.
>> Let me see if I can put this in a little bit of context. So Dave lives in Massachusetts, I used to live there, in Framingham, Massachusetts...
>> Dave: I was actually born in Framingham.
>> You were born in Framingham. And one of the more famous studies is the Framingham Heart Study, which tracked people over many years and discovered things about heart disease, the relationship between smoking and cancer, and other really interesting problems. But they used a paper-based study built on interviews, so for each of those people, they might have collected, you know, maybe a megabyte, maybe a megabyte and a half of data. You just described a couple of gigabytes of data per person, 30,000 people, multiple years. So we're talking about being able to find patterns in data about individuals that would number in the petabytes over a period of time. Very rich detail that's possible, but if you don't have something that can help you do it, you've just collected a bunch of data that's just sitting there. So is that basically what you're trying to do with the Machine: the ability to capture all this data and then do something with it, so you can generate those important inferences?
>> Exactly, so with all these large amounts of data we do not only compare the data sets for a single person, but once we find something interesting, we also have to compare the whole population that we have captured with each other. So there's really a lot of things we have to parse and compare.
>> This brings together the idea that it's not just the volume of data; I also have to do analytics across all of that data together. So every time a scientist, one of the people doing biology studies or informatics studies, asks a question and says, "I have a hypothesis that this might be a reason for this particular evolution of the disease, or occurrence of the disease," they then want to go through all of that data and analyze it as they are asking the question. Now, if the amount of compute it takes to actually answer their question is three days, I have lost my train of thought. But if I can get that answer in real time, then I get into this flow where I'm asking a question, seeing the answer, making a different hypothesis, seeing a different answer, and this is what my colleagues here were looking for.
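To make that "ask a question, see the answer, ask again" loop concrete, here is a small sketch of one such question over an in-memory table of repeated measurements. The participant count echoes the study, but the columns, the biomarker, and the genetic variant are invented purely for illustration; this is not the DZNE's actual data model.

    # Hypothetical cohort table held in memory; all names and values are invented.
    import numpy as np
    import pandas as pd

    rng = np.random.default_rng(0)
    n_participants, n_visits = 30_000, 5

    carrier = rng.random(n_participants) < 0.15   # made-up genetic variant status
    cohort = pd.DataFrame({
        "participant":     np.repeat(np.arange(n_participants), n_visits),
        "visit":           np.tile(np.arange(n_visits), n_participants),
        "hippocampus_ml":  rng.normal(3.5, 0.3, n_participants * n_visits),
        "carries_variant": np.repeat(carrier, n_visits),
    })

    # One "question": do carriers of the variant show a larger change in the
    # (made-up) volume measure between their first and last visit?
    first = cohort[cohort.visit == 0].set_index("participant")["hippocampus_ml"]
    last = cohort[cohort.visit == n_visits - 1].set_index("participant")["hippocampus_ml"]
    change = (last - first).rename("volume_change")

    answer = pd.concat([change, pd.Series(carrier, name="carries_variant")], axis=1)
    print(answer.groupby("carries_variant")["volume_change"].describe())

With the whole table resident in memory, the follow-up hypothesis is just another couple of lines like the last two, rather than a batch job that comes back in days.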
>> But if I think about, again, going back to the Framingham Heart Study, you know, I might do a query on a couple of related questions and use a small amount of data. The technology to do that's been around, but when we start looking for patterns across brain scans with time series, we're not talking about a small problem, we're talking about an enormous amount of data that can be looked at in a lot of different ways. I've got one other question for you related to this, because I've got to presume that the quid pro quo for getting people into the study, you know, 30,000 people, is that you'll be able to help them and provide prescriptive advice about how to improve their health as you discover more about what's going on. Have I got that right?
>> So, we're trying to do that, but there are also limits to this, of course.
>> Of course.
>> For us it's basically collecting the data, and people are really willing to donate everything they can from their health data to allow these large studies.
>> To help future generations.
>> So that's not necessarily quid pro quo.
>> Okay, there isn't, okay. But still, the knowledge is enough for them.
>> Yeah, their incentive is they're gonna help people who have this disease down the road.
>> I mean, if it is not me, if it helps society in general, people are willing to do a lot.
>> Yeah, of course.
>> Oh, sure.
>> Now, the Machine is not a product yet that's shipping, right, so how do you get access to it, or is this sort of futures, or...
>> When we started talking to one another about this, we actually did not have the prototype with us. But remember that when we started down this journey for the Machine three years ago, we knew back then that we would have hardware somewhere in the future, but as part of my responsibility, I had to deal with the fact that the software has to be ready for this hardware. It does me no good to build hardware when there is no software to run on it. So we have actually been working on the software stack, and on how to think about applications on that software stack, using emulation and simulation environments, where we had essentially an instruction-level simulator for what the Machine, or what that prototype, would have done, and we were running code on top of those simulators. We also had performance simulators, where we'd say, if we write the application this way, this is how much we think we would gain in terms of performance, and all of that code we were writing was actually running on our large-memory machines, Superdome X to be precise. So by the time we started talking to them, we had these emulation environments available, and we had experience using these emulation environments on our Superdome X platform. So when they came to us and started working with us, we took the software that they brought to us and started working within those emulation environments to see how fast we could make those applications, even within those emulation environments. So that's how we started down this track, and most of the results we have shown in the study are all measured results that we are quoting inside this forum on the Superdome X platform. Even in that emulated environment, which is emulating the Machine, of course, in the emulation on Superdome X, for example, I can only hold 24 terabytes of data in memory. I say only 24 terabytes...
>> Only!
>> ...because I'm looking at much larger systems, but an enormously large number of workloads fit very comfortably inside the 24 terabytes. And for those particular workloads, the programming techniques we are developing work at that scale; they won't scale beyond the 24 terabytes, but they'll certainly work at that scale. So between us we then started looking for problems, and I'll let Matthias comment on the problems that they brought to us, and then we can talk about how we actually solved those problems.
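One concrete thing that changes on a large-memory system is how a multi-tool pipeline is wired together. As a minimal, made-up contrast, the sketch below compares the conventional pattern, where every stage writes its output to disk for the next tool to parse, with a version that hands results along in memory; the stage functions are placeholders, not any real genomics tool.

    # Made-up pipeline stages; only the wiring pattern is the point here.
    import json
    import os
    import tempfile

    def stage_align(reads):              # placeholder for an "alignment" tool
        return [r.upper() for r in reads]

    def stage_call_variants(aligned):    # placeholder for a "variant calling" tool
        return {"variants": sum(1 for r in aligned if r.startswith("A"))}

    def pipeline_via_files(reads):
        """Classic tool chaining: each stage serializes its output to disk
        and the next stage parses it back in."""
        with tempfile.TemporaryDirectory() as tmp:
            intermediate = os.path.join(tmp, "aligned.json")
            with open(intermediate, "w") as f:
                json.dump(stage_align(reads), f)
            with open(intermediate) as f:
                aligned = json.load(f)
            return stage_call_variants(aligned)

    def pipeline_in_memory(reads):
        """Large-memory version: stages hand each other in-memory objects,
        so nothing is re-serialized or re-parsed between steps."""
        return stage_call_variants(stage_align(reads))

    if __name__ == "__main__":
        reads = ["acgt", "aacc", "ttgg"]
        assert pipeline_via_files(reads) == pipeline_in_memory(reads)
        print("same answer; the in-memory pipeline skips the serialize/parse hops")

This is only the shape of the idea; as the next exchange makes clear, the bigger gains came from rethinking the code itself once everything fit in memory.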
>> So we work a lot with genomics data, and usually what we do is we have a pipeline, so we connect multiple tools, and we thought, okay, this architecture sounds really interesting to us, but if we want to get started with this, we should pose them a challenge. So, to see if they could convince us, we went through the literature and took a tool that was advertised as the new optimal solution. Prior work was taking up to six days for processing; they were able to cut it to 22 minutes, and we thought, okay, this is a perfect challenge for our collaboration. We went ahead and took this tool, we put it on the Superdome X, and it was already running in five minutes instead of just 22, and then we started modifying the code, and in the end we were able to shrink the time down to just 30 seconds, so that's two orders of magnitude faster.
>> We took something which they were able to run in 22 minutes, and that had already been optimized by people in the field to say "I want this answer fast," and when we moved it to our Superdome X platform, the platform is extremely capable; hardware-wise it compares really well to other platforms which are out there. That time came down to five minutes, but that was just the beginning. And then, as we modified the software based on the emulation results we were seeing underneath, we brought that time down to 13 seconds, which is a hundred times faster. We started this work with them in December of last year. It takes time to set up all of this environment, so the serious coding started around March. By June we had a 9X improvement, which is already close to a factor of 10, and since June up to now, we have gotten another factor of 10 on that application. So I'm now at 100X faster than what the application was able to do before.
>> Dave: Two orders of magnitude in a year?
>> Sharad: In a year.
>> Okay, we're out of time, but where do you see this going? What is the ultimate outcome that you're hoping for?
>> For us, we're really aiming to analyze our data in real time. Oftentimes when we have biological questions that we address, we analyze our data set, and then in a discussion a new question comes up, and we have to say, "Sorry, we have to process the data, come back in a week," and our idea is to be able to generate these answers instantaneously from our data.
>> And those answers will lead to what? Just better care for individuals with Alzheimer's, or potentially, as you said, making Alzheimer's a memory?
>> So the idea is to identify Alzheimer's long before the first symptoms are shown, because then you can start an effective treatment and have the biggest impact. Once the first symptoms are present, it's not getting any better.
>> Well, thank you for your great work, gentlemen, and best of luck, on behalf of society...
>> Thank you very much.
>> ...really appreciate you coming on theCUBE and sharing your story.
>> You're welcome.
>> All right, keep it right there, buddy. Peter and I will be back with our next guest right after this short break. This is theCUBE, you're watching live from Madrid, HPE Discover 2017. We'll be right back.