Brett McMillen, AWS | AWS re:Invent 2020


 

>> From around the globe, it's theCUBE, with digital coverage of AWS re:Invent 2020, sponsored by Intel and AWS.

>> Welcome back to theCUBE's coverage of AWS re:Invent 2020. I'm Lisa Martin. Joining me next is one of our CUBE alumni: Brett McMillen is back, the director of U.S. federal for AWS. Brett, it's great to see you. Glad that you're safe and well.

>> Great, it's great to be back. I think last year when we did theCUBE, we were on the convention floor. It feels very different this year; re:Invent has gone virtual, and yet it's still true to what re:Invent has always been: a learning conference, and we're releasing a lot of new products and services for our customers.

>> Yes, a lot of content, as you say. One thing I would say is different about this re:Invent: it's so quiet around us. Normally we're talking loudly over tens of thousands of people on the showroom floor, but it's great that AWS is still able to connect, in an even bigger way, with its customers. So, during Teresa Carlson's keynote, I want to get your opinion on something she talked about: the AWS Open Data Sponsorship Program, and the fact that you're going to be hosting the National Institutes of Health (NIH) Sequence Read Archive data. The biologist in me gets really excited about that. Talk to us about it, because especially during the global health crisis we're in, that sounds really promising.

>> It very much is. I am so happy that we're working with NIH on this and multiple other initiatives. The Sequence Read Archive, or SRA, is essentially a very large data set of sequenced genomic data, and it's a wide variety of genomic data: not just human genetic data, but all life forms, all branches of life, are in SRA, including viruses. That's really important here during the pandemic. It's one of the largest and oldest genomic data sets out there, and yet it's very modern: it was designed for next-generation sequencing, so it's growing, it's modern, and it's well used. It's one of the more important data sets out there. One of the reasons this is so important is that we want to find cures for human ailments, disease, and death, and by studying the genomic code, the scientists can come up with the answers. That's what Amazon is doing: we're putting the tools in the hands of the scientists so that they can help cure heart disease, diabetes, cancer, depression, and yes, even viruses that can cause pandemics.

>> So making this data available to scientists worldwide is incredibly important. Talk to us about that.

>> Yeah, it is. Within NIH, we're working with NCBI; when you're dealing with NIH there are a lot of acronyms, and at NIH that one is the National Center for Biotechnology Information. We're working with them to make this available as an open data set. Why this is important: it's all about increasing the speed of scientific discovery. I personally think that in the fullness of time, the scientists will come up with cures for just about all of the human ailments that are out there, and it's our job at AWS to put into the hands of the scientists the tools they need to make things happen quickly, in our lifetime. I'm really excited to be working with NIH on that. When we start talking about it, there are multiple things the scientists need. One is access to these data sets, and SRA is a good example.

It's a very large data set: 45 petabytes, and it's growing. I personally believe it's going to double every year to year and a half. A data set that large is hard to move around; it's so much easier if you just go into the cloud, compute against it, and do your research there in the cloud. To give you an idea how big 45 petabytes is: if it were all human data, that's equivalent to about seven and a half million people, or, put another way, 90% of everybody living in New York City. So that's how big this is. But then, also, what AWS is doing is bringing compute: in the cloud, you can scale your compute up and scale it down. And the third leg of the stool is giving the scientists easy access to the specialized tool sets they need. We're doing that in a few different ways. The people who design these toolsets design a lot of them on AWS, but we also make them available through something called AWS Marketplace, so scientists can go into the Marketplace catalog, say "I want to launch this workload," and it launches the infrastructure underneath. That speeds the ability of those scientists to come up with the cures they need. SRA is stored in Amazon S3, which is a very popular object store, not just in the scientific community; virtually every industry uses S3. By making this available as a public data set, we're giving the scientists the ability to speed up their research.

>> One of the things that jumps out at me, too: in addition to enabling them to speed up research, it's also facilitating collaboration globally, because now you've got the cloud driving all of this, which allows researchers in completely different parts of the world to work together almost in real time. So I can imagine the incredible power this is going to provide to that community. I have to ask you, though: you talked about this being all life forms, including viruses, and COVID-19. What are some of the things you expect this to facilitate?

>> So, earlier in the year, NIH took the genetic code and put it in an SRA-like format, and that's now available on AWS. Here's what's great about it: anybody in the world can now go to this open data set and start doing their research. One of our goals here is to get back to a democratization of research. It used to be that, for example, the very first vaccine that came out, the smallpox vaccine, was developed by a rural country doctor using essentially test tubes and a microscope. It's gotten hard to do that, because data sets are so large and you need so much compute. By using the power of the cloud, we've really democratized it, and now anybody can do it. For example, with the SRA data set from NIH, organizations like the University of British Columbia, through their Cloud Innovation Centre, are doing research: they've scanned the SRA database, about 11 million entries, for coronavirus sequences. That's really hard to do in a typical on-premises data center, and it's relatively easy to do on AWS.
So by making this available, we can have a larger number of scientists working on the problems we need solved.

>> Well, as we all know, in the U.S. there's Operation Warp Speed; that term alone signifies how quickly we all need this to be progressing. But this is not the first partnership that AWS has had with the NIH. Talk to me about some of the other things you're doing together.

>> We've been working with NIH for a very long time. Back in 2012, we worked with NIH on what was called the 1000 Genomes data set. That's another really important data set: a large number of sequenced human genomes. We moved that into, again, an open data set on AWS, and what's happened over the last eight years is that many scientists have been able to compute against it. The other wonderful power of the cloud is that, over time, we continue to bring out tools to make it easier for people to work. Whether they're computing on our instance types with EC2, Elastic Compute Cloud, or doing high-performance computing with EMR, Elastic MapReduce, they can do that. And then we've brought out new things that really take it to the next level, like Amazon SageMaker, which makes it really easy for the scientists to launch machine learning algorithms on AWS.
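As a rough illustration of what "launching machine learning algorithms on AWS" looks like in practice, here is a sketch using the SageMaker Python SDK. Every identifier below (the account ID, role ARN, container image, and buckets) is a hypothetical placeholder; a real job needs your own IAM role and training container.

```python
import sagemaker
from sagemaker.estimator import Estimator

session = sagemaker.Session()

estimator = Estimator(
    # Hypothetical training container previously pushed to ECR:
    image_uri="123456789012.dkr.ecr.us-east-1.amazonaws.com/genomics-train:1.0",
    # Hypothetical execution role:
    role="arn:aws:iam::123456789012:role/SageMakerExecutionRole",
    instance_count=1,
    instance_type="ml.m5.xlarge",
    output_path="s3://example-results-bucket/models/",  # hypothetical bucket
    sagemaker_session=session,
)

# Train against data already staged in S3, e.g. a curated slice of an
# open data set copied into your own bucket.
estimator.fit({"train": "s3://example-results-bucket/training-data/"})
```

The managed service provisions the instances, runs the container against the S3 channel, writes the model artifacts to the output path, and tears everything down afterward: the scale-up, scale-down pattern described above.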
So we've done the 1000 Genomes data set, and there are a number of other areas within NIH that we've been working on. For example, over at the National Cancer Institute, we've been providing expert guidance on best practices for how you can architect and work on these COVID-related workloads. NIH collaborates with many different universities, over 2,500 academic institutions, and they do that through grants. So we've been working with the Office of the Director, and they run their grant management applications on AWS, which allows them to scale up and work very efficiently. And then we entered with NIH into this program called STRIDES. STRIDES is a program for not only NIH but also all these other institutions that work with NIH, so they can use the power of the commercial cloud for scientific discovery. When we started that back in July of 2018, long before COVID happened, it was so great that we had it up and running, because now we're able to help them out through the STRIDES program.

>> Right. Can you imagine if... let's not even go there. So the SRA data is available through the AWS Open Data Sponsorship Program, and you've talked about STRIDES. What are some of the other ways that AWS is assisting?

>> Yeah. STRIDES is wide-ranging, across multiple different institutes. For example, over at the National Heart, Lung, and Blood Institute, NHLBI (I said there are a lot of acronyms), they've been working on harmonizing genomic data, and, working with the University of Michigan, they've been analyzing it through a program they call TOPMed. We've also been working with NIH on establishing best practices and making sure everything is secure, providing AWS Professional Services to show them how to do this. So one portion of STRIDES is getting the right data sets, the right compute, and the right tools into the hands of the scientists. The other area we've been working on is making sure the scientists know how to use it all. We've been developing cloud learning pathways, and we started this quite a while back; it's been so helpful during COVID. Scientists can now take self-paced online courses and learn how to maximize their use of cloud technologies through the pathways we've developed for them.

>> Education is imperative. You think about all the knowledge they have within their scientific discipline; being able to leverage technology in a way that's easy is absolutely imperative to the timing. So let's talk about other data sets. You've got the SRA available; what other data sets are available through this program?

>> We work on a wide range of open data sets, and in general these data sets are about improving the human condition or improving the world we live in. I've talked about a few; here are a few more. There's The Cancer Genome Atlas, which we've been working on with the National Cancer Institute as well as the National Human Genome Research Institute. That's a very important data set being computed against throughout the world; within the scientific community it's known as TCGA. We also have data sets focused on certain groups. For example, Kids First is a data set looking at a lot of the challenges and diseases that kids face, everything from very rare pediatric cancers to heart defects, et cetera, and we're working with them. But it's not just on the medical side. We have open data sets with, for example, NOAA, the National Oceanic and Atmospheric Administration, to better understand what's happening with climate change and to slow its rate. Within the Department of the Interior, there's the Landsat database, which is pictures of the Earth, so we can better understand the world we live in. Similarly, NASA has a lot of data that we put out there, and over in the Department of Energy there are data sets that scientists are researching against to make sure we have better clean, renewable energy sources. And it's not just government agencies we work with. When we find a data set that's important, we also work with nonprofit organizations; they're not flush with cash, and they're trying to make every dollar work. We've worked with organizations like the Child Mind Institute and the Allen Institute for Brain Science; these are largely neuroimaging data sets, and we made them available via our open data program. So there's a wide range of things we're doing, and what's great is that when we do it, we democratize science and allow many, many more scientists to work on these problems that are so critical for us.
>> The availability is incredible, but also the breadth and depth of what you just spoke about; it's not just government, for example. You've got about 30 seconds left, and I'm going to ask you to summarize some of the announcements from re:Invent 2020 that you think are really critical for federal customers to be paying attention to.

>> Yeah. One of the things these federal government customers have been coming to us about is that they've had to find new ways to communicate with their customer, the public. We have a product we've had for a while called Amazon Connect; it's been used very extensively by government customers, and it's used in industry too. We've had a number of announcements this week: Andy Jassy made multiple announcements on enhancements to Amazon Connect and additional services, everything from helping to verify that it's the right person, with Connect Voice ID, to making sure the customer gets a good experience with Connect Wisdom, to making sure the managers of these call centers can manage them better. So I'm really excited that we're putting a cloud-based solution in the hands of both government and industry to make their connections with the public better.

>> It's all about connections these days. I wish we had more time, because I know we could unpack so much more with you, but thank you for joining me on theCUBE today and sharing some of the insights, the impacts, and the availability that AWS is enabling for the scientific and other federal communities. It's incredibly important, and we appreciate your time.

>> Thank you, Lisa.

>> For Brett McMillen, I'm Lisa Martin. You're watching theCUBE's coverage of AWS re:Invent 2020.

Published Date : Dec 10 2020


The University of Edinburgh and Rolls Royce Drive in Exascale Style | Exascale Day


 

>> Welcome. My name is Ben Bennett. I am the director of HPC strategic programs here at Hewlett Packard Enterprise. It is my great pleasure and honor to be talking to Professor Mark Parsons from the Edinburgh Parallel Computing Centre, and we're going to talk a little about exascale: what it means. We're going to talk less about the technology and more about the science, the requirements, and the need for exascale, rather than a deep dive into the enabling technologies. Mark, welcome.

>> Ben, thanks very much for inviting me to talk to you.

>> Complete pleasure. So I'd like to kick off with, I suppose, quite an interesting look back. You and I are both of a certain age, 25 plus, and we've seen the SI milestones of high-performance computing come and go: a gigaflop back in 1987, a teraflop in '97, a petaflop in 2008. But we seem to be taking longer getting to an exaflop. So I'd like your thoughts: why is an exaflop taking so long?

>> I think that's a very interesting question, because I started my career in parallel computing in 1989, just as EPCC was being set up; we're 30 years old this year, having started in 1990. The fastest computer we had then was 800 megaflops, just under a gigaflop. So within my career, by the time we reached petascale we'd already gone pretty much a million times faster, and the step from a teraflop to a petascale system really didn't feel particularly difficult. Yet the step from a petaflop, petascale system to an exascale one is a really, really big challenge. I think it's actually related to what's happened with computer processors over the last decade: individually, the cores, like the ones on your laptop, haven't got much faster; we've just got more of them. So the perception of more speed is actually just being delivered by more cores, and as you go down that route, the same thing happens in the supercomputing world. In 2010, I think, we had systems that were a few thousand cores; our main national service in the UK for the last eight years has had 118,000 cores. But looking at the exascale, we're looking at four or five million cores, and taming that level of parallelism is the real challenge. That's why it's taking an enormous amount of time to deliver these systems, and it's not just on the hardware front. Vendors like HPE have to deliver world-beating technology, and it's hard, hard. But then there's also the challenge to the users: how do they get their codes to work in the face of that much parallelism?

>> If you look at the complexity of delivering an exaflop: you could have bought an exaflop three or four years ago. You couldn't have housed it, you couldn't have powered it, you couldn't have afforded it, and you couldn't have programmed it. But you still could have bought one.

>> We should have been so lucky as to be able to supply it.

>> The software, I think from our standpoint, is where we're doing more enabling with our customers: you sell them a machine, and the need for collaboration then seems more and more to be around the software. It's going to be relatively easy to get one exaflop using LINPACK, but that's not exascale. So what do you think an exascale machine, versus an exaflop machine, means to people like yourself, to your users, the scientists, and industry? What is an exaflop versus an exascale?
>> So I think, you know, supercomputing moves forward by setting itself challenges, and when you look at all of the exascale programs worldwide that are trying to deliver systems that can do an exaflop or more, it's actually a very arbitrary challenge. We set ourselves a petascale challenge, delivering a petaflop, and somebody managed that. The world moves forward by setting itself challenges, but we use quite an arbitrary definition of what we mean by an exaflop. In my world, first of all, a flop is a computation, a multiply or an add or whatever, and we tend to look at that using very high precision, 64-bit numbers. We then say: well, to do an exaflop you've got to do a billion billion of those calculations every second. It's a somewhat arbitrary target. Today, HPE can build me a system that will do a billion billion calculations per second, and it will either do that as a theoretical peak, which would be almost unattainable, or using benchmarks that stress the system and demonstrate an exaflop. But those benchmarks are themselves tuned to do just those calculations and deliver an exaflop in a sustained way, if you like. So we kind of set ourselves this big challenge, the big fence on the racecourse which we're clambering over, but the challenge in itself should actually be much more interesting: what are we going to use these devices for, having built them? Getting into the exascale era is not so much about doing an exaflop; it's a new generation of capability that allows us to do better scientific and industrial research. And that's the interesting bit in this whole story.
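Since the definition Parsons gives is pure arithmetic, it can be sanity-checked in a few lines of Python. The machine figures below are invented for illustration and are not the specifications of any real system:

```python
EXAFLOP = 1e18  # a billion billion (10^18) floating-point operations per second

# Theoretical peak of a hypothetical CPU-only system:
nodes = 10_000
cores_per_node = 128
flops_per_core_per_cycle = 16  # assumes wide vector fused multiply-add units
clock_hz = 2.0e9

peak = nodes * cores_per_node * flops_per_core_per_cycle * clock_hz
print(f"peak: {peak / EXAFLOP:.3f} exaflops")  # ~0.041 EF, roughly 25x short

# The span of Parsons' career: 800 megaflops in 1990 to an exaflop today.
print(f"1990 to exascale: {EXAFLOP / 800e6:.1e}x")  # ~1.2e9, a billion-fold
```

The gap between a plausible CPU node count and 10^18 operations per second is exactly why the exascale designs discussed later in this interview lean on GPUs or on very wide, high-bandwidth CPU nodes.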
>> I would tend to agree with you. I think the focus around exascale is to look at new technologies, new ways of doing things, new ways of looking at data, and to get new results. So eventually you will get yourself an exascale machine. One hopes sooner rather than later.

>> Well, I'm sure you want to sell me one, Ben.

>> It's got nothing to do with me; I can't sell you anything, Mark. But there are people outside the door over there who would love to sell you one. However, if we look at your exascale machine: how do you believe the workloads are going to be different on an exascale machine versus your current petascale machine?

>> So I think there's always a slight conceit when you buy a new national supercomputer, and that conceit is that you're buying a capability on which many people will run using the whole system. In truth, we do have people that run on the whole of our ARCHER system, which today is 118,000 cores, but I would say the people that run on over, say, half of that can be counted on a single hand in a year, and they're doing very specific things; it's very costly simulation they're running. So if you look at these systems today, two things show. One is that it's very difficult to get time on them: the application procedures are baroque, all of the requirements have to be assessed by your peers, and you're given quite a limited amount of time that you have to eke out to do science. And people tend to run their applications in the sweet spot where their application delivers the best performance. We try to push our users over time to use reasonably sized jobs; I think our average job size is about 20,000 cores, which is not bad. But that does mean that as we move to the exascale, two things have to happen. One is that I think we've got to be more relaxed about giving people access to the system. Let's give more people access, let people play, let people try out ideas they've never tried before. I think that will lead to a lot more innovation in computational science. At the same time, I think we also need to be less precious: we have to accept that these systems will have a variety of sizes of jobs on them. We're still going to have people that want to run on four million cores or two million cores. That's absolutely fine, and I absolutely salute those people for trying really, really difficult things. But then we're going to have a huge spectrum of users all the way down to people that want to run on 500 cores or whatever. So I think we need to broaden the user base on an exascale system, and I know this is what's happening, for example, in Japan with the new Japanese system.

>> So, Mark, if you cast your mind back to almost exactly a year ago, after the HPC User Forum, you were interviewed for Primeur Magazine, and you alluded in that article to the needs of scientific and industrial users requiring an exaflop, or exascale, machine. It's clear from your previous answer regarding the workloads that some would say the majority of people would be happier with, say, ten 100-petaflop machines: democratization, more people, more access. But can you give us examples of the type of science, and the needs of industrial users, that actually do require those resources to be put together as an exascale machine?

>> So I think it's a very interesting area. At the end of the day, these systems are bought because they are capability systems, and I absolutely take the argument: why shouldn't we buy ten 100-petaflop systems? But there are a number of scientific areas, even today, that would benefit from an exascale system, and these are the sorts of scientific areas that will use as much access to a system, as much time and as much scale, as you can give them. An immediate example: people doing quantum chromodynamics calculations in particle physics, theoretical calculations; they would just use whatever you give them. But one of the areas that I think is very interesting is actually the engineering space, where many people worry that the engineering applications over the last decade haven't really kept up with the sort of supercomputers we have. I'm leading a project called ASiMoV, funded by EPSRC in the UK jointly with Rolls-Royce, and also working with the Universities of Cambridge, Oxford, Bristol, and Warwick. We're trying to do the whole gas-turbine engine simulation for the first time: looking at the structure of the gas turbine, the airplane engine, and how it's all put together; looking at the fluid dynamics of the air and the hot gases that flow through it; looking at the combustion and how fuel is sprayed into the combustion chamber; looking at the electrics around it; looking at the way the engine deforms as it heats up and cools down; all of that.
Rolls-Royce has wanted to do that for 20 years. Whenever they certify a new engine, it has to go through a number of physical tests, and every time they do one of those tests it can cost them as much as $25 to $30 million. These are very expensive tests, particularly when they do what's called a blade-off test, which simulates blade failure: they have to prove that the engine contains the fragments of the blade. It's a really important test, and all engines have to pass it. What we want to do is use an exascale computer to properly model a blade-off test for the first time, so that in future some simulations can become virtual rather than having to expend all of the money that Rolls-Royce would normally spend. It's a fascinating project, and a really hard project to do. One of the things that I do is serve this year as deputy chair of the Gordon Bell Prize committee, which I've really enjoyed. That's one of the major prizes in our area; it gets announced at Supercomputing every year. So I have the pleasure of reading all the submissions each year. This is my third year on the committee, and what's really interesting is the way that big systems, like Summit in the US, for example, have pushed the user communities to try to do simulations nobody has done before. We've seen this as well with the papers coming out after the first use of the Fugaku system in Japan. These are very, very broad: earthquake simulation, large-eddy simulations of boats, a number of things around genome-wide association studies, for example. The use of these computers spans a vast area of computational science. I think the really important thing about these systems is that they're challenging people to do calculations they've never done before. That's what's important.

>> Okay, thank you. You talked about challenges. When... I nearly said "when you and I had lots of hair," but that's probably much more true of me. We used to talk about grand challenges; we talked, especially around the teraflop era, about the ASCI Red program driving the grand challenges of science, possibly to hide the fact that it was a bomb-designing computer. So they talked about the grand challenges. We don't seem to talk about that much now; we talk about exascale, we talk about data. Where are the grand challenges that you see an exascale computer helping us with?

>> So I think grand challenges didn't go away; just the phrase went out of fashion. That's like my hair. I do feel that science moves forward by setting itself grand challenges, and it always has done. My original background is in particle physics. I was very lucky to spend four years at CERN, working in the early stage of the LEP accelerator when it first came online, and the scientists there had worked on LEP for 15 years before I came in and did my little PhD on it. I think that way of organizing science hasn't changed; we just talk less about grand challenges. What I've seen over the last few years is a renaissance in computational science, looking at things that people have previously said were impossible. A couple of years ago, for example, one of the key Gordon Bell Prize papers was on genome-wide association studies,
and it may have been the winner, if I remember right. That was really interesting because, first of all, genome-wide association studies had gone out of favor in the bioinformatics community; people thought they weren't possible to compute. But that particular paper showed that yes, you could do these really, really big combinatorial problems in a reasonable amount of time if you had a big enough computer. One thing I've felt all the way through my career is that we've probably discarded more simulations because they were impossible at the time than we've actually decided to do. And I sometimes think we need to challenge ourselves by looking at the things we've discarded in the past and saying: look, we could actually do that now. I think part of the challenge of bringing an exascale service to life is getting people to think about what they would use it for. That's a key thing. Otherwise, as I always say, a computer that is unused should just be turned off. There's no point in having an underutilized supercomputer; everybody loses from that.

>> So let's bring ourselves slightly more up to date. We're in the middle of a global pandemic, and one of the things in our industry that I've been particularly proud about is that I've seen the vendors, all the vendors, offering up machines and making resources available for people to fight the current disease. How do you see supercomputers, now and in the future, speeding up things like vaccine discovery and helping doctors generally?

>> So I think you're quite right that the supercomputer community around the world did a really good job of responding to COVID-19. Speaking for the UK, we put in place a rapid-access program, so anybody who wanted to do COVID research on the various national services we have, the two services, could get really quick access, and that has worked really well in the UK. ARCHER is an old system, as you know; we don't have the world's largest supercomputer, but it has happily been running lots of COVID-19 simulations, largely for the biomedical community, looking at drug modeling and molecular modeling. In the US, they've been doing really large combinatorial parameter-search problems on Summit, for example, looking to see whether or not old drugs could be reused to solve a new problem. So I think, actually, in some respects (this sounds wrong, but) COVID-19 has been good for supercomputing, inasmuch as it's pointed out to governments that supercomputers are an important part of any scientifically active country's research infrastructure.

>> So I'll finish up and tap into your inner geek. There are a lot of technologies being bandied around currently to enable the first exascale machine, wherever that's going to be and from whomever. What are the current or emerging technologies that you are interested in, excited about, and looking forward to getting your hands on?

>> So in the business case I've written for the UK's exascale computer, I actually characterized this as a choice between the American model and the Japanese model; both are proven, and both have their pros and cons. In America, they've very much gone down the CPU-plus-GPU route.
So you might have an Intel Xeon or an AMD processor at the center, or an Arm processor for that matter, and you might have two or four GPUs alongside. I think the most interesting thing I've seen is definitely this move to a single address space, so the data that you have will be accessible by both the GPU and the CPU. That has really been one of the key things holding back the uptake of GPUs to date, and that one single change is going to make things very, very interesting. But I'm not entirely convinced by the CPU-GPU model, because I think it's very difficult to get all of the performance out of the GPU. It will do well in HPL, for example, the high-performance LINPACK benchmark we were discussing at the beginning of this interview, but in real scientific workloads you still find it difficult to find all the performance that has been promised. So the Japanese approach, which is the CPU-only approach, I think is very attractive, inasmuch as they're using very high-bandwidth memory and a very interesting processor, which they developed together over a ten-year period. This is one thing that people don't realize: the Japanese program and the American exascale program have each been working for ten years on these systems. I find the Japanese processor really interesting because, when you look at the performance, it really does work for their scientific workloads, and that does interest me a lot: this combination of a processor designed to do good science, high-bandwidth memory, and a real understanding of how data flows around the supercomputer. Those are the things that are exciting me at the moment. Obviously there are new networking technologies; I think, in the fullness of time, not necessarily for the first systems, over the next decade we're going to see much, much more activity in silicon photonics. All of these things are fascinating. In some respects the last decade has just been quite incremental improvements, but with where supercomputing is going at the moment, we're at a very, very disruptive moment again. That goes back to the start of this discussion: why has exascale been difficult to get to? Actually, because it's a disruptive moment in technology.

>> Professor Parsons, thank you very much for your time and your insights. Thank you.

>> Pleasure.

>> And folks, thank you for watching. I hope you've learned something, or at least enjoyed it. With that, I would ask you to stay safe, and goodbye.

Published Date : Oct 16 2020

ON DEMAND: R&D DATA PLATFORM, GSK


 

>> Hey, everyone. Thanks for taking the time to join this talk. I hope you and your loved ones are safe during these tough times. Let me start by introducing myself. My name is Michelle, and I work for GlaxoSmithKline (GSK) as an engineering manager. In my current role I lead the platform APIs, which are part of the R&D data platform here in GSK R&D Tech. I live in Dallas, Texas. I have a master's degree in computer science and a bachelor's in electronics and communication engineering. I started my career as a software developer, and over the years I have gained a lot of experience in leading and building at-scale data and analytics products and solutions. I also have complete accountability for container platforms here at GSK R&D Tech. I've been working very closely with Docker Enterprise, which is now Mirantis, for more than three years to enable container platforms at GSK, mainly in R&D Tech. So that's me.

Let me give you a quick overview of the agenda for today's talk. I'll start with what we do here at GSK and what the R&D data platform is. Then I'll give you an overview of the business drivers that motivated us to take this container journey, and some insight into our learnings and accomplishments over these years working with Docker Enterprise on the container platforms. Lately you must have seen a lot of articles out there about how GSK is leveraging technologies like artificial intelligence, machine learning, and data and analytics for the drug discovery process. I'm very excited to see the progress we have made in technology, but what makes us truly unique is our commitment to the patient.

We're GSK: we help millions of people do more, feel better, and live longer. We are a global company focused on three verticals: pharmaceuticals, vaccines, and consumer healthcare. Our main intent is to lower the burden and the impact of diseases on patients. Here at GSK, we allow science to drive the technology. This helps us build innovative products that help our scientists make better and faster decisions throughout the drug discovery pipeline.

With that, let me give you some context on what the R&D data platform is and how it came to be. At GSK, it started in mid-2016 as what was then called the R&D Information Platform, whose main focus was to centralize, curate, and rationalize all the data produced within the R&D business systems in order to drive strategic business value. Standardization of clinical trials, genome-wide association study (GWAS) analysis, and storage and processing of real-world evidence data are some examples of how the platform was used to deliver business value. Four years later, a new set of business drivers is changing our landscape. The R&D Information Platform is evolving into a hybrid, multi-cloud solution and is now known as the R&D data platform. Referring to GSK's 2019 annual report, these are the four themes the R&D platform will mainly focus on. We are expanding our data capabilities to support the new GSK biopharma company, and evolving into a hybrid multi-cloud platform is one of the many steps we're taking to be future-ready. Our key focus will still be enabling better and faster decisions, and we're making investments in areas like artificial intelligence and machine learning. That brings us to why this journey is important and why we are taking it. With that, let me take you to the next topic:
the drug discovery process, which is not an easy one. Given the recent events of the last few months and the way all our lives have been impacted, there is a lot of talk and information going around about why the drug discovery process is so tough. Working for a global health care company, I get asked this question very frequently by the people I interact with: why is discovery so tough, and why does it take so much time? Drug discovery is a complex process that involves multiple stages, and at each and every stage there are huge amounts of data that the scientists have to process in order to make decisions. Studies have shown that only 3% of small molecules entering human studies actually become medicines. If you're new to drug discovery, you may ask why that rate is so low. We humans are a very complex species. Without going into the details of the process: we at GSK have made a lot of investments in technology that enable us to make data-driven decisions throughout the drug discovery pipeline.

As we started implementing these tools and technologies to enable the R&D data platform, we got a better appreciation of how they interact and integrate with each other. Our goal was to make this platform agile, a platform that works at scale, so that we can provide a great user experience and contribute back to the drug discovery pipeline, letting the scientists make faster decisions. We want our R&D users to consume the data and the services available on the platform seamlessly, in a self-service fashion. We also have to accomplish this while establishing trust, and we have to enable the academic partnerships, acquisitions, and collaborations that GSK has, which bring a lot of data and value to our scientists. With so many collaborations and systems, what this brings is a wide range of systems and platforms that are fundamentally built on different infrastructure. This is where Docker comes into the picture and where containers gain their significance. We have realized the power of containers and how we can simplify this complex ecosystem by using them, providing faster access to data for our scientists, who then contribute back to the drug discovery pipeline.

With that, let me talk to you about the container journey at GSK. We started our container journey in late 2017, working with Docker Enterprise to enable the container platform on our on-prem infrastructure. For the first year or so, we went through multiple releases and did a lot of testing to make sure our platform was stable before we onboarded either the data or the user applications. I was part of this complete journey, and the Docker team worked with us very closely toward that first milestone of establishing a stable container platform at GSK. Getting into 2019, we started deploying our applications in the production environment. I cannot go into the details of what these apps are, but they do include both data pipelines and web services. In the initial days we worked a lot on Swarm, but 2019 is when we started looking into Kubernetes; in the same year, we enabled Kubernetes orchestration on the Docker Enterprise platform here at GSK and made it the de facto orchestrator.
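As a sketch of the build step a container-first journey like this rests on, producing the immutable image artifact the talk returns to below, here is a minimal example using the Docker SDK for Python. The registry, repository, and commit SHA are hypothetical placeholders.

```python
import docker

client = docker.from_env()

commit_sha = "3f2a9c1"  # in a real pipeline this comes from the CI build's git commit
repo = "registry.example.com/rnd/genomics-api"  # hypothetical registry/repository

# Build once and tag with the commit SHA rather than a mutable tag like
# "latest", so the exact same artifact can be promoted, unchanged, from
# on-prem Docker Enterprise to a public-cloud Kubernetes cluster.
image, build_logs = client.images.build(path=".", tag=f"{repo}:{commit_sha}")
client.images.push(repo, tag=commit_sha)
```

Tagging by commit rather than by environment is one common way to get the "build once, deploy anywhere" property described in the accomplishments below.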
Coming into 2020, all our microservice applications and data pipelines have been migrated to the container platforms, all of them orchestrated by Kubernetes, and these are applications running in production. As of today, we have made the container-first approach an architectural standard across R&D Tech in GSK. We have also started deploying our AI/ML training models onto containers, and all of this work is happening on our Docker Enterprise platform. Also, as part of the R&D data platform's hybrid multi-cloud journey, we started enabling container- and Kubernetes-based platforms on public clouds. Now, going into 2021 and the future: enabling our R&D users to easily access data and applications in a platform-agnostic way is crucial for our success, because previously we had only on-prem, and now public clouds are involved as well. One of the many steps we're taking on this journey is to virtualize the data and ship it in containers, or Kubernetes volumes, on demand to our R&D users and scientists. This allows us to deliver data to our scientists wherever they want it, in a very secure way, and we're leveraging Docker to do it. So that's where we're headed.

With that, let's take a deep dive into a few of our accomplishments over these years. I want to start with one very interesting use case that we developed on Docker: a rapid prototyping capability that enables our scientists to seamlessly do multi-cluster communication. This was one of the biggest challenges we had faced for a long time; with the help of containers we were able to solve it and provide it as a capability to our scientists, and we have actually showcased it at one of the Docker conferences. Next, as I said before, by migrating all our web services into containers, we not only achieved horizontal scalability for those services but also saved more than 50% in support costs for the applications we migrated. By making the Docker image an immutable artifact in our build process, we are now able to deploy our apps or models on any container or Kubernetes-based platform, either on-prem or in a public cloud. We also made significant improvements toward process automation by leveraging Docker containers. Containers have played a significant role in keeping us platform-agnostic and thus enabling the hybrid multi-cloud journey that is so valuable for our R&D data scientists. As I mentioned before, data virtualization is another area we have in view for our next steps, in terms of where we want to take Kubernetes and where we want to leverage it. What you see here are just a few of the many accomplishments we have achieved by using containers over the past three years or so.

Before I close, I want to take time to acknowledge all our internal partners who have contributed so much to this journey, mainly our R&D business and R&D Tech, and the broader tech organizations at GSK. I also want to thank Docker, at present Mirantis, for being such a great partner throughout this journey and for giving us the opportunity to share this success story today. Lastly, thanks to everyone for listening to this talk, and please feel free to reach out if you have any questions or suggestions. Stay safe. Thank you.

Published Date : Sep 14 2020


Shez Partovi MD, AWS | AWS Summit New York 2019


 

>> Live from New York, it's theCUBE, covering AWS Global Summit 2019, brought to you by Amazon Web Services.

>> Welcome back here to New York City. You're watching theCUBE, the worldwide leader in enterprise tech coverage. I'm Stu Miniman, my co-host for today is Corey Quinn, and happy to welcome to the program a first-time guest, Shez Partovi, who is a senior leader of global business development for healthcare, life sciences, and genomics at AWS. Thanks so much for joining us. All right, so you know we love digging into some of the verticals here in New York City. Of course, a lot of time has been spent on the financial services piece; we actually had another one of our teams at the AWS Imagine show going on yesterday in Seattle, with a lot of the education pieces. So: healthcare, life sciences, and genomics. A little bit of tech involved in those groups, and a lot of change going on in that world. Give us a thumbnail, if you would, of what's happening in your world.

>> Well, just to scope it: health care includes payer and provider; life sciences is pharma, biotech, and medical device; and then there's genomics. As for what we're seeing in those spaces, let's start with health care. It's such a broad thing, so we'll just go back and forth on health care itself. What our customers ask us to focus on, and to help them do, falls into three categories. First, a lot of customers ask us to help them personalize the consumer health journey. You and I, all of us, are so accustomed to the frictionless experiences we have elsewhere, and in health care there's a lot more friction, so we're getting a lot of inquiries and requests to help transform that experience and make it frictionless. An example would be Zocdoc, which started here in New York, actually. When you want to book an appointment, normally you go online and have to type in your insurance information, filling in all the fields; it's full of friction. Zocdoc uses one of our AI services for image recognition, and you simply hold your insurance card up to the camera; it's able to pull your insurance information, determine eligibility, and book the right appointment for you. That's an example of removing friction for the health consumer, the patient, as they try to access health care. So that's category one: frictionless experiences, with AWS AI services supporting them. Category two: we're getting a lot of interest in helping health systems predict patient health events. If you think of value-based care, the way you're actually able to change the cost-quality curve is by predicting events, not just dealing with the aftermath. So using AI and ML services on top of data to predict and forecast events is a big part of it. One example would be Cerner, which moved its HealtheIntent platform, a longitudinal patient record platform, onto AWS; about 223,000,000 individuals are on that platform. We did a study with them in which we consumed about 210,000 individual patients' data and created a machine learning model (this is published) that can predict congestive heart failure 15 months in advance of it actually occurring. When you look at that, prediction and forecasting of that kind is one of the powers of this approach.
So category number two is predicting patient health events. And the last one I'd be remiss in leaving out: you've probably heard a lot of discussion of physician and clinician burnout, the frustrations of nurses and doctors, and at the heart of that is not having the right information at the right time to take care of the right patient. Data liquidity and interoperability is a huge challenge, and a lot of our customers are asking us to help solve those problems with them. At HIMSS this year we announced, together with Change Healthcare, that they want to provide free interoperability to the country, with AWS as the platform supporting that. So those are the three categories: personalizing the consumer health journey, predicting patient health events, and promoting interoperability. Those are the signals we're seeing, and the areas where we're actively supporting our customers and, in a sense, elevating the human condition. >> It's very easy to look at the regulation around things like healthcare and say, oh, that gets in the way, it's onerous, we're not going to deal with it, or it should be faster. I don't think anyone actively wants that; we like the fact that our hospitals are safe and that healthcare is regulated, at least in some of the ways that it is. But an artifact of that is that, more than in many other areas of what AWS does, you're subject to a regulatory speed of feature announcement, as opposed to being able to move as fast as the technology allows. A relatively easy example of this: a few years back, in order to get AWS to sign a BAA for HIPAA compliance, you had to run dedicated-tenancy instances, and that changed about a year and a half or two years ago, or even longer; it all starts to run together after a time. But once people learn something, they don't tend to go back and validate whether it's still true. How do you find communicating to your customers about things that were not possible yesterday but now are? >> Yeah, when you look at HIPAA eligibility, AWS has over 100 HIPAA-eligible services, and that word matters, because compliance, remember, is an outcome, not a feature. Compliance is a combination of people, process and platform. We bring the platform that's HIPAA-eligible, and our customers bring the people and process, if you will, to use that platform, which then becomes compliant with regulatory requirements. So you're absolutely right, there's a diffusion of understanding about the eligibility of the platform and the work customers have to do, as a shared responsibility, on top of it. That diffusion is sometimes slow; in fact, there's sometimes misinformation. So we always work with our customers in that shared-responsibility model, so they can meet their requirements as they come to the cloud, and we bring platforms that are HIPAA-eligible so they can actually run the workloads they need to. The way I think of it is this: if I were to build you a deadbolt for your door, I can tell you that it complies with all sorts of standards, but if you put the key under the mat, we might not be complying with the security and regulatory requirements for your house. It's a shared responsibility: we make the platform eligible and compliant, and the customer does their part.
People are seeing that the platform is HIPAA-eligible, and then they also have to do their part, the people-and-process portion, to make the totality of it comply with healthcare regulatory requirements. >> Some of the interesting conversations I've had in the industry in the last few years are about the collaborations going on in healthcare: how do we share data while still maintaining all of the regulations involved? Where does AWS get involved there? >> Sure. In fact, there's a data-sharing part of that data liquidity story we talked about earlier, in terms of interoperability. I'll give an example of where AWS is actively working in that space. You may be familiar with a service we launched last November at re:Invent called Amazon Comprehend Medical. What it does is look at a medical note and extract key information. Think back to high school, when you would read a book and highlight in yellow the key concepts you wanted to remember for an exam. Amazon Comprehend Medical does the same thing: it can lift key elements and go from a text blob to discrete data that has relationships and an ontology, and that allows data sharing where you need it, when you're allowed to disclose. But there's one other piece. Sometimes you and I want to work on something together, but we want to redact the patient information; that enables data sharing as well. So Amazon Comprehend Medical also allows you to redact. Think of when a news channel shows a federally protected document with sections blacked out; Amazon Comprehend Medical can similarly remove patient-identifying information. So if you and I want to collaborate on a research project, and you have a set of data you want to anonymize, to de-identify, and I have data I want to de-identify, I can use Amazon Comprehend Medical to redact all the patient information and make it de-identified, you can do the same, and now the two of us can combine that information to build models, to do research, to share data. So whether you have full authority to share patient information and use the ontological portion of it, or whether you want to do the de-identification, Amazon Comprehend Medical helps you do that.
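As a concrete illustration of that extract-then-redact flow, here is a minimal sketch using boto3's Comprehend Medical client. The clinical note is invented, the region is arbitrary, and credentials and IAM permissions are assumed to be configured already; this sketches the two API calls, not a production de-identification pipeline.

```python
# Minimal sketch of the extract-then-redact flow with Comprehend Medical.
# The sample note is made up; region and permissions are assumptions.
import boto3

client = boto3.client("comprehendmedical", region_name="us-east-1")

note = "Jane Doe, 54F, seen 07/11/2019 for dyspnea; started furosemide 40mg."

# 1) Lift discrete entities (conditions, medications, etc.) from the blob.
entities = client.detect_entities_v2(Text=note)["Entities"]
for e in entities:
    print(e["Category"], e["Type"], e["Text"], round(e["Score"], 2))

# 2) Detect protected health information and redact it in place,
#    working backwards so earlier offsets stay valid.
phi = client.detect_phi(Text=note)["Entities"]
redacted = note
for e in sorted(phi, key=lambda e: e["BeginOffset"], reverse=True):
    redacted = (
        redacted[: e["BeginOffset"]]
        + "[REDACTED]"
        + redacted[e["EndOffset"]:]
    )
print(redacted)
```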
>> What's impressive, and incredible, is that whether we like it or not, there's something a little special about healthcare. I can decide I'm not going to be on the internet or social media and stop tweeting, and most people would thank me for that, or I can opt out of ride sharing and only take taxis, for example. But all of us, sooner or later, are going to be customers of the healthcare industry, and as a result this is something that affects all of us, whether we want to acknowledge it or not. Some of us are still young enough to believe we have this immortality streak going on; so far, so good. But it becomes clear that this is the sort of thing where the ultimate customer is all of us. As you take a look at that, does that inform how AWS is approaching this entire sector? >> Absolutely. In fact, I'd like to think that AWS brought a physician in to lead the sector because they understood that, in addition to our customer obsession, we see through the customer to the individual, and we want to elevate the human condition. We obsess over our customers' success so that we can effect positive action on the lives of individuals everywhere. To me, that is the reason I joined AWS. The practice of healthcare, life sciences and genomics at AWS has been around for about six years; AWS itself, double that. So it's actually a mature practice, and our understanding of our customers definitely includes that core belief that it's about people, and each of us comes with a special story. The people on the AWS healthcare and life sciences team have been at the bedside; they have worked in the pharma industry, in healthcare, in population health. They're all there because of the thing you just said. Certainly I'm there because of that, and the entire practice of health and life sciences is keenly aware of looking through the customers to the individual. >> All right, genomics: definitely an area where compute and storage are critically important, and where we've seen dramatic change in the last five to 10 years. Anything specific you can share on that? >> Genomics is an area where you need incredible compute and storage. In our case, for example, Illumina, which runs about 85% of all gene sequencing on the planet, is an AWS customer and stores all that data on AWS. The real power of genomics is that it enables precision diagnostics. Look at one of our customers, Grail. Grail takes genomic fragments in the blood that may be coming from a cancer, sequences those fragments, and then on AWS uses the power of compute to run machine learning on that genomic content, to determine whether you might have one of the 10 to 12 cancers they're currently screening for. When you talk about precision health, it really can't be done without precision diagnostics, which depends on genomics, and that's an example of something that runs on AWS because we bring essentially infinite compute and storage. Consider that the first whole genome sequence took 14 years and billions of dollars. Children's Hospital of Philadelphia now does 1,000 whole genome sequences in two hours and 20 minutes on AWS: they spike up to 20,000 CPU cores, do the analysis, and then scale back down. Genomics, in my humble opinion, is a field that literally can't be done outside the cloud; the mechanics of the storage and compute power it needs make it one that is born in the cloud, and AWS has the examples I just shared with you.
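The transcript doesn't describe how Children's Hospital of Philadelphia actually orchestrates that spike to 20,000 cores, so the following is only a sketch of the general burst pattern on AWS, using a Batch array job to fan one alignment task out per genome. The queue name, job definition and S3 prefixes are hypothetical.

```python
# Hypothetical sketch of the burst-and-scale-down pattern: one AWS
# Batch array child per genome. The queue, job definition and S3
# paths are placeholders, not CHOP's actual pipeline.
import boto3

batch = boto3.client("batch", region_name="us-east-1")

NUM_GENOMES = 1000  # one array child per whole genome

response = batch.submit_job(
    jobName="wgs-alignment-batch",
    jobQueue="genomics-spot-queue",         # assumed to exist
    jobDefinition="bwa-align:3",            # assumed to exist
    arrayProperties={"size": NUM_GENOMES},  # children read AWS_BATCH_JOB_ARRAY_INDEX
    containerOverrides={
        "environment": [
            {"name": "INPUT_PREFIX", "value": "s3://example-bucket/fastq/"},
            {"name": "OUTPUT_PREFIX", "value": "s3://example-bucket/bam/"},
        ]
    },
)
print("Submitted array job:", response["jobId"])
# Batch scales the compute environment up toward its configured max
# vCPUs while children run, then back down when the queue drains.
```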
>> It's an absolutely fascinating, emerging space, and it's interesting that despite the regulatory burden, which nobody is going to dispute, and the gravity of what it does, I'm not left with the sense that feature enhancement, development and velocity of releases are somehow slower in healthcare than across the entire rest of the stack. Is that an accurate assessment, or is there a bit of a drag effect on that? >> Do you mean for the healthcare customers that are on AWS? >> On the AWS side; your customers are going to be customers, love them. >> At AWS, obviously, innovation is our heartbeat, and we release, gosh, everything. In 2011 we released about 80 services and features; jump to 2015 and it was something like 722; jump to 2018 and it was 1,957 features. That's about a 25-fold increase. Our pace of innovation is not going to slow down; it's going to continue. It's in our blood, in our DNA. We in fact hire people who are just not satisfied with the status quo and want to innovate and change things. So no, I don't expect any slowdown. In fact, when you add the machine learning models and machine learning services we're putting in, I only see an even faster hockey stick of services that we're going to bring out. And I want you to come to re:Invent, where we're going to announce them all; you'll be there and see it. >> All right, well, on that note, thank you so much for giving us the update on healthcare, life sciences and genomics. We absolutely want to see the continued growth and innovation there. >> My pleasure. Thank you for having me on the show. >> All right, for Corey Quinn, I'm Stu Miniman. theCUBE's coverage never stops either; we, of course, will be at AWS re:Invent this fall, as well as many other shows. So, as always, thanks for watching theCUBE.

Published Date : Jul 11 2019


Dr. Swaine Chen, Singapore Genomics Institute | AWS Public Sector Summit 2018


 

>> Live from Washington D.C., it's theCUBE, covering AWS Public Sector Summit 2018. Brought to you by Amazon Web Services and its ecosystem partners. (upbeat music) >> Hey, welcome back everyone, we're here live in Washington D.C. for the Amazon Web Services Public Sector Summit. I'm John Furrier with Stu Miniman. Our next guest is Dr. Swaine Chen, Senior Research Scientist in Infectious Diseases at the Genome Institute of Singapore, and also an assistant professor at the National University of Singapore. Great to have you on. I know you've been super busy, you were on stage yesterday, we tried to get you on today, so thanks for coming in and kind of bringing it home for our two days of coverage here. >> Thank you for having me, I'm very excited to be here. >> So we were talking in between breaks here about some of the work around DNA sequencing, and it's super fascinating. I know you've done some work there, but I want to talk first about your presence here at the Public Sector Summit. You were on stage; tell your story, because you have a very interesting presentation about some of the cool things you're doing in the cloud. Take a minute to explain. >> That's right. So one of the big things happening in genomics is that the rate of data acquisition is outstripping Moore's Law, right? So for a single institute to try to keep up with compute for that, we really can't do it. That really is the big driver for us to move to cloud, and why we're on AWS. And then, of course, once we have that capacity, there are lots of things; my research is mostly on infectious diseases, so one of the cases where all of a sudden you've got a huge amount of data you need to process would be an outbreak. And that just happens, it happens unexpectedly. So we had one of those, which I talked about in the keynote yesterday, on Group B Streptococcus. This was a totally unexpected disease, and all of a sudden we had all this data to process to try to figure out what was going on with that outbreak. And unfortunately we're pretty sure there are going to be other outbreaks in the future as well, so it's about being prepared for that. AWS helps provide some of that capacity, and we're continuously trying to upgrade our analytics for it as well. >> So give me an example of where this hits home for you, where it works. What is it doing specifically? Is it changing the timeframe? Is it changing the analysis? Where is the impact for you? >> Yeah, so it's all of this, right? It's all the standard things that AWS provides to other companies as well. It's cheaper for us to just pay for what we use, especially when we have super spiky workloads, like in the case of an outbreak. If all of a sudden we need to take over the cluster internally, well, there are going to be a lot of people screaming about that, right? So we can kick that out to the cloud, pay for just what we use, and we don't have to requisition all the hardware to do it. And it also gives us the capacity, as data comes in more and more, to think about just increasing our scale. This has been happening incessantly in science, incessantly in genomics. As an example from my own work, in my lab we're studying infectious diseases, mostly bacterial genomics.
So, the genomes of bacteria that cause infections. We've increased our scale 100x in the last four years in terms of the data sets we're processing, and as we see the samples coming in, we're going to do another 10x in the next two years. We just really wouldn't have been able to do that on our current hardware. >> Yeah, Dr. Chen, fascinating space. For years there was discussion of, well, how much it costs; the cost to be able to do everything has gone down. But what has been fascinating is, as you've talked about, data outstripping Moore's Law, and not only what you can do, but doing it in collaboration with others, because there are many others around the globe doing this. So talk about that level of data, and how the cloud enables it. >> Yeah, so that's actually another great point. Genomics is very strong on open source, especially in the academic community. Whenever we publish a paper, all the genomic data that's in that paper, it gets, uh oh (laughs). Whenever we, whenever we publish-- >> Mall's closing in three minutes. >> A three-minute countdown. >> Three minutes, okay. Whenever we publish a paper, that data goes up and gets submitted to these public databases. So when I talk about 100x scale, that's really incorporating, worldwide, globally, all the data that's present for that species. As an example, I talked about Group B Streptococcus; another bacterium we study a lot is E. coli, Escherichia coli. It causes diarrhea, urinary tract infections, bloodstream infections. When we pull down a data set locally in Singapore, with 100, 200, 300 strains, we can now integrate that with a global database of 10,000, 20,000 strains and gain a global perspective on it. We get higher resolution, and AWS really helps us pull in from these public databases, and gives us the scale to burst out the processing of that many more strains. >> So the DNA piece of your work, does that tie into this at all? I mean, obviously you've done a lot of work on the DNA side; is that playing into this as well? >> The? >> The DNA work you've done in the past. >> Yeah, so basically all of the stuff we're doing is DNA. There are other frontiers that have been explored quite a lot, looking at RNA and at proteins and carbohydrates and lipids, but at the Genome Institute in Singapore we're very focused on the genetics, and mostly doing DNA. >> How has the culture changed in academic communities with cloud computing? We're seeing sharing; certainly data sharing is a key part. Can you talk about that dynamic, and what's different now than it was, say, five or even 10 years ago? >> I'd say that the academic community has always been pretty open; it's always been a very strong, open-source-compatible kind of community, right? So data was always supposed to be submitted to public databases. That didn't always happen, but as the data scale goes up, and we see the value of having a global perspective on infectious diseases and looking for the source of an outbreak, the imperative to share data grows. Look at outbreaks like Ebola, where in the past people might have tried to hold data back because they wanted to publish it. From a public health point of view, the imperative to share that data immediately is much stronger now that we see the value of having it out there. So I would say that's one of the biggest changes: the imperative is stronger.
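The specific archives aren't named here beyond these public databases, but many of them mirror their data as AWS open data sets that can be read anonymously over S3. A minimal sketch of that access pattern, assuming the 1000 Genomes open-data bucket and an illustrative key prefix:

```python
# Minimal sketch of pulling from a public genomic archive on S3.
# The 1000 Genomes open-data bucket is a stand-in; the key prefix is
# illustrative and the bucket layout may differ.
import boto3
from botocore import UNSIGNED
from botocore.config import Config

# Anonymous access: public open-data buckets need no credentials.
s3 = boto3.client("s3", config=Config(signature_version=UNSIGNED))

resp = s3.list_objects_v2(Bucket="1000genomes", Prefix="phase3/", MaxKeys=5)
for obj in resp.get("Contents", []):
    print(obj["Key"], obj["Size"])

# Download one object for local analysis (the key is an assumption).
# s3.download_file("1000genomes", "phase3/some/file.vcf.gz", "local.vcf.gz")
```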
I agree. I think the academic people I talk to always want to share; it just might not be uploaded fast enough. So time is key. But I've got to ask you a personal question: in all the work you've done, you've seen a lot of outbreaks. This is kind of scary stuff. Have you had those aha moments, just mind-blowing moments where you go, oh my God, we did that because of the cloud? Can you point to some examples where it's like, that is awesome, that's great stuff? >> Well, we certainly have quite a few examples. Outbreaks are just unexpected. Figuring out any of them and being able to have an impact, to say this is how the transmission is happening, or this is what the source is, this is how we should try to control this outbreak; all of those are great stories. I would say that, to be honest, we're still early in our transition to the cloud, and we're kind of running a hybrid environment right now. When we really need to burst out, then we'll do that with the cloud. But for most of our examples so far, we're still early in this for cloud. >> So the spiky workload is the key value for you, when the demand spikes up. >> So what excites you about the future of the technology? What do you believe we'll be able to do as things accelerate, prices go down, and we get access to more information, access to more? What do you think we're going to see in this field in the next, you know, one to three years? >> Oh, I think one of the biggest changes that's going to happen is that we're going to shift completely how we do, for example, outbreak detection. It's already happening in the U.S. and Europe, and we're trying to implement this in Singapore as well. Basically, the way we detect outbreaks right now is that we see a rise in the number of cases; you see it at the hospitals, you see a cluster of cases of people getting sick. And what defines a cluster? You kind of need enough cases that it statistically goes above your baseline. But when we look at genomic data, we can find clusters of outbreaks that are buried in the baseline, because we just have higher resolution. We can see the same bacteria causing infections in groups of people. It might be a small outbreak, it might be self-limited, but we can see this stuff happening, buried below the baseline. So this is really what's going to happen: instead of waiting until a bunch of people get sick before you know there's an outbreak, we're going to see it in the baseline, or as it's coming up with two, three, five cases. We can save hundreds of infections. That's one of the things that's super exciting about moving toward a future where sequencing is going to be a lot cheaper and faster. Yeah, it's a super exciting time. >> And more research is a flywheel; more research will come over the top. >> Yep, exactly, exactly. >> That's great work. Dr. Swaine Chen, thanks for coming on theCUBE. We really appreciate it. >> No, thank you. >> Congratulations, great talk in the keynote yesterday, we really appreciate it. This is theCUBE bringing you all the action here as we close down our reporting; they're going to shut us down, and theCUBE will go on until they pull the plug, literally. Thanks for watching. I'm John Furrier, with Stu Miniman and Dave Vellante. Amazon Web Services Public Sector Summit, thanks for watching. (upbeat techno music)

Published Date : Jun 21 2018


George Mihaiescu, OICR | OpenStack Summit 2018


 

>> Narrator: Live from Vancouver, Canada, it's theCUBE, covering OpenStack Summit North America 2018, brought to you by Red Hat, the OpenStack Foundation, and its ecosystem partners. (upbeat music) >> The sun has come out, but we're still talking about a lot of cloud here at the OpenStack Summit 2018 in Vancouver. I'm Stu Miniman with my co-host John Troyer. Happy to welcome to the program the 2018 Super User Award winner, George Mihaiescu, who's the senior cloud architect at the Ontario Institute for Cancer Research, or OICR. First of all, congratulations. >> Thank you very much for having me. >> And thank you so much for joining us. So cancer research: obviously, one of the things we talk about is how technology can really help us at a global level, help people. So tell us a little about the organization first, before we get into the tech of it. >> So OICR is the largest cancer research institution in Canada, funded by the government of Ontario. Located in Toronto, we support about 1,700 researchers, trainees and clinician staff. It's focused entirely on cancer research, and it's located in a hub of cancer research in downtown Toronto, with Princess Margaret Hospital, Sick Kids Hospital, Mount Sinai, very, very powerful research centers. OICR basically interconnects all these research centers and tries to bring together and advance cancer research in the province, in Canada and globally. >> That's fantastic, George. So with that, sketch out for us a little bit your role, kind of the purview that you have, the scope of what you cover. >> So I was hired four years ago by OICR to design and build a cloud environment, based on a research grant awarded to a number of principal investigators in Canada to build cloud computing infrastructure that cancer researchers can use to do large-scale analysis. What happens with cancer is that, because of the variety of mutations occurring in cancer patients, researchers found they cannot just analyze a few samples and draw a conclusion, because the conclusion wouldn't actually be valid. They needed to do large-scale research, and the ICGC, the International Cancer Genome Consortium, an organization made up of 17 countries that are donating, collecting and analyzing data from cancer patients, decided to put all this data together, align it uniformly using the same algorithm, and then analyze it using the same workflows, in order to draw conclusions that are valid across multiple data sets. They're focusing on the 50 most common types of cancer that affect most people in this world, and for each type of cancer, at least two countries provide and collect data. So for brain cancer, say, we have data sets from two countries, for melanoma, for skin, and this basically gives you better confidence that the conclusion you draw is valid. The more pieces of the puzzle you throw on the table, the easier it is to see the big picture that is this cancer. >> You know, George, I'm a former academic, and the more data you get, the more infrastructure you're going to have to have. I'm just reading off the announcement: 2,600 cores, 18 terabytes of RAM, 7.3 petabytes of storage. That's a lot of data, and it's accessed by a lot of different researchers. When you came in, was the decision to use OpenStack already made, or did you make that decision, and how was the cloud architected that way? >> The decision was basically made to use open source.
We wanted to spend the money on capacity, on hardware, on research, and not on licensing and support. >> John: Good use of everybody's tax dollars. >> Exactly. You cannot do that if you have to spend money on licensing; then you'd have only half the capacity you could have otherwise. That means less large-scale analysis, taking longer and costing more. So Ceph for storing the data sets and OpenStack for the infrastructure-as-a-service offering was a no-brainer. My specialty was in OpenStack and Ceph; I started with OpenStack seven years ago, so I was hired to design and build it, and I also had a chance to actually do alignment and variant calling on some of the data sets, so I was able to monitor the kind of stress these workflows put on the system. When I designed it, I knew what was important and what to focus on. So it's a cloud environment customized for cancer research. We have a very good ratio of RAM per CPU, and we have very large local disks for the virtual machines, so they can download very large data sets. We built it so that if one compute node fails, you only impact the few workflows running there; there are no small single points of failure. Those are some of the tunings we applied to the system. >> Let's talk about the scalability, right? You said the team is you and a colleague. >> George: Yes. >> And you know, in the olden days you would be taking care of maybe a handful of machines and some disk arrays in the lab. Now you're basically servicing an entire infrastructure for all of Canada, right? For how many universities? >> Well, basically it's global, so we have 40 research projects from four continents: from Australia, from Israel, from China, from Europe, the US, Canada. Approved cancer researchers that can access the data open an account with us and get a quota, they start their virtual machines, they download the data sets from the S3 API of Ceph to their VMs, and they do their analysis, and we charge them for the time used. Because everything is open source and we don't pay any licensing fees, and we don't run for profit, we charge them just what it costs us to replenish the hardware when it fails.
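George describes researchers pulling data sets down through Ceph's S3-compatible object gateway before running analysis on their VMs. A minimal sketch of that access pattern, assuming a hypothetical RADOS Gateway endpoint, bucket and per-researcher credentials (none of these are OICR's actual values):

```python
# Minimal sketch of downloading a data set from a Ceph cluster's
# S3-compatible RADOS Gateway using boto3. Endpoint, bucket, key and
# credentials below are placeholders for illustration.
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="https://rgw.cancer-cloud.example.ca",  # hypothetical RGW endpoint
    aws_access_key_id="RESEARCHER_ACCESS_KEY",
    aws_secret_access_key="RESEARCHER_SECRET_KEY",
)

# Stream a whole-genome BAM (~150 GB) onto the VM's large local disk.
s3.download_file(
    Bucket="icgc-datasets",            # hypothetical bucket
    Key="donor-1234/normal.wgs.bam",   # hypothetical key
    Filename="/mnt/scratch/normal.wgs.bam",
)
```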
>> Nice, nice. And these are actually very large machines, right? Because you have huge, thick data sets that you have to compare all at once. >> Yeah. The average size of a file with the normal DNA of a patient, and they also need the tumor DNA from the biopsy, an average whole genome sequence, is about 150 gigabytes. So they need at least 300 gigabytes, and depending on the analysis, if they find mutations, the output is usually five or 10 gigabytes, so much smaller. For other workflows you have to actually align the data, so you input 150 gigabytes and the output is 150 or a bit more, with metadata. So nevertheless, you need very large storage for the virtual machines, and these are virtual machines that run very hard. You cannot do CPU oversubscription, you cannot do memory oversubscription, when you have a workflow that runs for four days at a hundred percent CPU. It's different from other web-scale environments, where you have a website running at 10%, or you can do 10-to-one oversubscription and go with much cheaper, different solutions. Here you can only provide what you have physically. >> John: That's great. >> George, you've said you've participated in the OpenStack community for about seven years now. >> George: Yes. >> Do you actually contribute code? What pieces of the community are you active in? >> Yeah, so I'm not a developer. My background is in networking, systems administration and security, but I've been involved in OpenStack since the beginning, before there was a foundation. I went to the first OpenStack public conference in Boston seven years ago, at the InterContinental hotel, and over time I've been involved in discussions on the IRC channel, mailing list support, reporting bugs. Even recently we had a very interesting bug fixed. The cloud-utils package that is supposed to resize the disk of the VM as it boots was not using more than two terabytes, because of a bug. So we reported it, and Scott Moffat, who's the maintainer of the cloud-utils package, worked on the bug, and two days later we had a fix and they built a package. It's in the latest Ubuntu cloud image, and everybody else is going to use that same Ubuntu package, so somebody who now has larger-than-two-terabyte VMs will be able to resize and use the entire disk when they boot. That's just an example of how, with open source, we can achieve things that would take much longer with a commercial distribution, where even paying doesn't necessarily guarantee a response. >> Sure. Also, George, any lessons learned? You've been with OpenStack a long time, and likewise Ceph. One thing we noticed today in the keynote is that storage, networking and compute weren't really talked about; those projects were maybe de-emphasized a bit as they talked about all the connectivity to everything else. My point is, the infrastructure side of OpenStack is stable, but any lessons learned along the journey? >> I think the lesson is that you can definitely build very affordable, useful and scalable infrastructure, but you have to get your expectations right. We only use the OpenStack projects that we consider stable enough that we can support them confidently. If a project adds 5% value to your offering but eats 80% of your time debugging and trying to get it working, and doesn't have packages, and is missing documentation and so on, that's maybe not a good fit for your environment if you don't have the manpower, and if it's not absolutely needed. Another very important lesson is that you have to really stay up to date: go to the conferences, read the emails from the mailing list, be active in the community. At the OpenStack meetups in Toronto in 2018, we presented there, we talked to other members. In these seven years I've read tens of thousands of emails, so I learn from other users' experiences, and I try to help where I can.
You have to be involved with the developers; I know the Ceph core developers, Sage and other people. You can't do this just by standing on the side and looking; you have to be involved. >> Good. George, what are you looking for next from this community? You talked about the stability; are there pieces you're hoping reach that maturity threshold for yourselves, or new functionality you're looking for down the road? >> I think about what we want to provide to our researchers, because they don't run web-scale applications, so their needs are a little bit different. We want to add Magnum to our environment, to let them deploy Kubernetes clusters easily. We want to add Octavia to expose their services; even though they don't run many web services, you have to find a way to expose them when they do run them. Maybe Trove, database as a service; we'll see if we can deploy it safely and if it's stable enough. Anything OpenStack comes up with, we basically ask: is it useful, is it stable, can we do it, and then we try it. >> George, last thing. Your group is the Super User of the Year. Can you walk us through that journey: what led to the nomination, and what does it mean to your team to win? >> I think we were a bit surprised, because we are a very small team, and our scale is not as big as T-Mobile or the other members. But I think it shows that, while a big company deploying OpenStack at scale and making it work is maybe not very surprising, because they have the resources and a lot of manpower, for a small institution or organization, or a small company, to be able to do it without involving a vendor, without extra costs, I think that's the thing that was appreciated by the community and by the OpenStack Foundation. And yeah, we are pretty excited to have won it. >> All right, George, let me give you the final word, as somebody who's been involved with the community for a while. What would you say to people who are still maybe looking in from the outside, or have played with it a little bit? What tips would you give? >> I think we are living proof that it can be done, and if you wait until things are perfect, they never will be, okay? Even Google has services in beta; Amazon has services in beta. Just install OpenStack. It's much more performant and stable than when I started, when there were just a few projects, and you will definitely get help from the community, and the documentation is much better now. Just go and do it, you won't regret it. >> George, as we know, software will eventually work, hardware will eventually fail. >> Absolutely. >> So, George Mihaiescu, congratulations to OICR on the Super User of the Year award. For John Troyer, I'm Stu Miniman; we're getting toward the end of day one of three days of wall-to-wall coverage here at OpenStack Summit 2018 in Vancouver. Thanks so much for watching theCUBE.

Published Date : May 22 2018
