Dr. Eng Lim Goh, Joachim Schultze, & Krishna Prasad Shastry | HPE Discover 2020


 

>> Narrator: From around the globe, it's theCUBE, covering the HPE Discover Virtual Experience, brought to you by HPE. >> Hi everybody, welcome back. This is Dave Vellante for theCUBE, and this is our coverage of Discover 2020, the virtual experience of HPE Discover. We've done many, many Discovers; usually we're on the show floor, but theCUBE has been virtualized. We talk a lot at HPE Discover about storage and servers and infrastructure and networking, which is great. But the conversation we're going to have now is really about helping the world solve some big problems. And I'm very excited to welcome back to theCUBE Dr. Eng Lim Goh. He's a senior vice president and CTO for AI at HPE. Hello, Dr. Goh. Great to see you again. >> Hello. Thank you for having us, Dave. >> You're welcome. And our next guest is Professor Joachim Schultze, who is the Professor for Genomics and Immunoregulation at the University of Bonn, amongst other things. Professor, welcome. >> Thank you all. Welcome. >> And then Prasad Shastry is the Chief Technologist for the India Advanced Development Center at HPE. Welcome, Prasad. Great to see you. >> Thank you. Thanks for having me. >> So guys, we have a CUBE first. I don't believe we've ever had three guests in three separate time zones. I'm in a fourth time zone. (guests chuckling) So I'm in Boston, Dr. Goh, you're in Singapore, Professor Schultze, you're in Germany, and Prasad, you're in India. So we've got four different time zones, plus our studio in Palo Alto, which is running this program. So we've actually got five time zones, a CUBE first. >> Amazing. >> Very good. (Prasad chuckles) >> Such is the world we live in. So we're going to talk about some of the big problems. I mean, here's the thing: we're obviously in the middle of this pandemic, we're thinking about the post-isolation economy, et cetera. People compare this, no surprise, to the Spanish flu in the early part of last century.
They talk about the Great Depression, but the big difference this time is technology. Technology has completely changed the way in which we've approached this pandemic, and we're going to talk about that. Dr. Goh, I want to start with you. You've done a lot of work on this topic of swarm learning. If we could, (mumbles) my limited knowledge of this is that we're kind of borrowing from nature. You think about bees looking for a hive as sort of independent agents, but somehow they come together and communicate. But tell us, what do we need to know about swarm learning and how it relates to artificial intelligence? And we'll get into it. >> Oh, Dave, that's a great analogy, using a swarm of bees. That's exactly what we do at HPE. So let's use the hospital use case here. When deploying artificial intelligence, a hospital does machine learning on its own patient data, which could be biased due to demographics and the types of cases it sees more of. Sharing patient data across different hospitals to remove this bias is limited, given privacy or even sovereignty restrictions, right? Like, for example, across countries in the EU. HPE swarm learning fixes this by allowing each hospital to still continue learning locally, but at each cycle we collect the learned weights of the neural networks, average them, and send them back down to all the hospitals. And after a few cycles of doing this, all the hospitals will have learned from each other, removing biases without having to share any private patient data. That's the key. So, the ability to allow you to learn from everybody without having to share your private patient data: that's swarm learning. >> And part of the key to that privacy is blockchain, correct? I mean, you've been involved in blockchain and invented some things in blockchain, and that's part of the privacy angle, is it not? >> Yes, yes, absolutely.
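The cycle Dr. Goh describes, local training followed by averaging the learned weights across hospitals, can be sketched in a few lines. This is an illustrative fragment only: the hospital names, weight shapes, and values are made up, and the real system coordinates the exchange over blockchain rather than in one process.

```python
import numpy as np

# Illustrative sketch of one swarm-learning cycle: each hospital trains
# locally, then only the learned weights (never the patient data) are
# averaged and sent back to every participant. All values are made up.
def average_weights(local_weights):
    """Average each layer's weights across all participants."""
    n_layers = len(local_weights[0])
    return [np.mean([w[i] for w in local_weights], axis=0)
            for i in range(n_layers)]

# Three hospitals, each holding a two-layer model's weights after local training
hospital_a = [np.array([[0.2, 0.4]]), np.array([0.1])]
hospital_b = [np.array([[0.6, 0.0]]), np.array([0.3])]
hospital_c = [np.array([[0.1, 0.2]]), np.array([0.2])]

merged = average_weights([hospital_a, hospital_b, hospital_c])
# merged[0] is [[0.3, 0.2]] and merged[1] is [0.2]: the consensus weights
# each hospital resumes training from in the next cycle
```

Repeating this merge over several cycles is what lets every hospital absorb the others' learnings without any raw records changing hands.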
There are different ways of doing this kind of distributed learning, but many of the other distributed learning methods require you to have some central control. Right? So, Prasad and his team, together with us, came up with a method where, instead of central control, you use blockchain to do this coordination. So, there is no more a central controller or coordinator, which is especially important if you want a truly distributed swarm-type learning system. >> Yeah, no need for a so-called trusted third party or adjudicator. Okay. Professor Schultze, let's go to you. You're essentially the use case of this swarm learning application. Tell us a little bit more about what you do and how you're applying this concept. >> I'm actually by training a physician, although I haven't seen patients for a very long time. I'm interested in bringing new technologies to what we call precision medicine. So, new technologies both from the laboratories but also from the computational sciences, marrying them, and then basically enabling precision medicine, which is a medicine that is built on many new measurements of molecular phenotypes, as we call them. So, basically that happens on different levels, for example the genome, or the genes that are transcribed from the genome. We have thousands of such data points and we have to make sense out of this. This can only be done by computation. And as we discussed already, one of the hopes for the future is that with the new wave of developments in artificial intelligence and machine learning, we can make more sense out of this huge data that we generate right now in medicine. And that's what we're interested in: finding out how we can leverage these new technologies to build new diagnostics, new therapy outcome predictors. So, to know whether a patient benefits from a diagnostic or a therapy or not. And that's what we have been doing for the last 10 years.
The most exciting thing I have been through in the last three, four, five years is really when HPE introduced us to swarm learning. >> Okay, and Prasad, you've been helping Professor Schultze actually implement swarm learning for specific use cases. We're going to talk about COVID, but maybe describe a little bit about your participation in this whole equation. >> Yep, thanks. As Dr. Eng Lim Goh mentioned, we have used blockchain as a backbone to implement the decentralized network. And through that we're enabling a privacy-preserved, decentralized network without having any control points, as the Professor explained in terms of precision medicine. So, one of the use cases we are looking at involves the blood transcriptomes. Think of different hospitals having different sets of transcriptome data which they cannot share due to the privacy regulations. Now each of those hospitals will train the model on their local data, which is available in that hospital, and share the learnings coming out of that training with the other hospitals. We repeat this over several cycles to merge all these learnings and then finally get to a global model. So, through that we are able to get to a model whose performance is equal to collecting all the data into a central repository and training on it. And when we are doing it, there could be multiple kinds of challenges. It's good to do decentralized learning, but what if you have non-IID data? What if there is a dropout in the network connections? What if some of the compute nodes drop out, or probably are not seeing a sufficient amount of data? That's something we tried to build into the swarm learning framework: to handle the scenarios of having non-IID data. In a simple word, we could call it seeing the biases.
As an example, one hospital might see, let's say with tumors, a high number of cases, whereas another hospital might have a very small number of cases. So, we have implemented some techniques for doing the merging, providing different kinds of weights or tunable parameters to overcome this set of challenges in swarm learning. >> And Professor Schultze, you've applied this to really try to better understand and attack the COVID pandemic. Can you describe in more detail your goals there and what you've actually done and accomplished? >> Yeah. So, we have actually really done it for COVID. The reason why we were trying to do this already now is that we had generated these transcriptomes from COVID-19 patients ourselves. And we realized that the signature of the disease is so strong and so unique, compared to other infectious diseases which we looked at in some detail, that we felt the blood transcriptome would be a good starting point to identify patients, but maybe even more importantly to identify those with severe disease. So, if you can identify them early enough, you basically could care for those more and find particular treatments and therapies for them. And the reason why we could do that is because we also had some other test cases done before. So, we used the time wisely, with large data sets that we had collected beforehand. With those use cases we learned how to apply swarm learning, and we are now basically ready to test directly with COVID-19. So, this was really a stepwise process. Although it was extremely fast, it was still a stepwise process, guided by data where we had much more knowledge, which was with leukemia. We had worked on that for years and had collected many data, so we could really simulate swarm learning very nicely.
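One way to picture the tunable merge weights Prasad mentions: rather than averaging every hospital's model uniformly, weight each contribution by how much data that hospital actually saw. This is a generic, federated-averaging-style sketch with invented numbers, not the actual HPE implementation.

```python
import numpy as np

# Sketch of a sample-count-weighted merge for unbalanced (non-IID) data:
# a hospital that saw 300 cases influences the merged model three times
# as much as one that saw 100. All numbers are illustrative.
def weighted_merge(weights, sample_counts):
    counts = np.asarray(sample_counts, dtype=float)
    coeffs = counts / counts.sum()            # per-hospital mixing weights
    return sum(c * w for c, w in zip(coeffs, weights))

w_small = np.array([0.9, 0.1])   # model from a hospital with 100 cases
w_large = np.array([0.1, 0.5])   # model from a hospital with 300 cases
merged = weighted_merge([w_small, w_large], [100, 300])
# merged is [0.3, 0.4], pulled toward the better-sampled hospital
```

The same idea generalizes to other tunable parameters, for example down-weighting a node whose data quality is suspect.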
And based on all the experience we gained together with Prasad and his team, we could quickly then also apply that knowledge to the data that are coming now from COVID-19 patients. >> So, Dr. Goh, it really comes back to how we apply machine intelligence to the data, and this is such an interesting use case. I mean, in the United States we have 50 different states with 50 different policies, different counties. We certainly have differences around the world in terms of how people are approaching this pandemic. And so the data is very rich and varied. Let's talk about that dynamic. >> Yeah. For the listeners or viewers who are new to this, right, the workflow could be: a patient comes in, you take the blood, and you send it through an analysis. DNA is made up of genes, and our genes express, right? They express in two steps: first they transcribe, then they translate. What we are analyzing is the middle step, the transcription stage. And tens of thousands of these transcripts are produced after the analysis of the blood. The thing is, can we find, in those tens of thousands of items, or biomarkers, a signature that tells us this is COVID-19 and how serious it is for this patient? Now, the data is enormous, right? For every patient. And then you have a collection of patients in each hospital with a certain demographic, and then you have a number of hospitals around. The point is, how do you get to share all that data in order to have good training of your machine? The issue is of course, you know, privacy of data, right? And as such, how do you then share that information if privacy restricts you from sharing the data? So in this case, swarm learning only shares the learnings, not the private patient data.
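The signature search Dr. Goh describes, scanning tens of thousands of transcripts for the few that separate patients, can be caricatured on synthetic data. Real transcriptomics uses far more careful statistics; this only shows the shape of the problem, and every number below is invented.

```python
import numpy as np

# Toy biomarker search: rank 10,000 synthetic transcripts by how strongly
# they separate two patient groups, then keep the top few as a candidate
# signature. One transcript (index 42) is planted as truly informative.
rng = np.random.default_rng(0)
n_transcripts = 10_000
covid = rng.normal(0.0, 1.0, size=(20, n_transcripts))    # 20 patients
control = rng.normal(0.0, 1.0, size=(20, n_transcripts))  # 20 controls
covid[:, 42] += 3.0   # the planted disease-associated transcript

score = np.abs(covid.mean(axis=0) - control.mean(axis=0))
signature = np.argsort(score)[::-1][:5]   # top-5 candidate biomarkers
# index 42 lands in the signature; the other transcripts are noise
```

In the swarm setting, it is the classifier trained on such signatures, not the patient-level measurements, whose learned weights get shared.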
So we hope this approach will allow all the different hospitals to come together and unite, sharing the learnings, removing biases, so that we have high accuracy in our prediction while at the same time maintaining privacy. >> It's really well explained. And I would like to add, at least for the European Union, that this is extremely important, because the lawmakers and the governments have clearly stated that even under these crisis conditions they will not minimize the rules of privacy laws. Compliance with privacy laws has to stay as high as outside of the pandemic. And I think there's good reason for that, because if you lower the bar now, why shouldn't you lower the bar in other times as well? And I think that was a wise decision. If you could see in the medical field how difficult it is to discuss how we share the data fast enough, I think swarm learning is really an amazing solution to that. Because this discussion is gone, basically. Now we can discuss how we do learning together, rather than discussing what would be a lengthy procedure to go towards sharing, which is very difficult under the current privacy laws. So, I think that's why I was so excited when I learned about it in the first place. We can do things faster that otherwise are either not possible or would take forever. And for a crisis, that's key. That's absolutely key. >> And as a byproduct, there's also the fact that all the data stay where they are, at the different hospitals, with no movement. >> Yeah. Yeah. >> Learn locally, but only share the learnings. >> Right. Very important in the EU, of course, and even in the United States people are debating: what about contact tracing, and using technology and cell phones and smartphones to do that? Prasad, I don't know what the situation is like in India, but nonetheless, Dr. Goh's point about just sharing the learnings, bubbling them up, trickling just kind of metadata, if you will, back down, protects us.
But at the same time, it allows us to iterate and improve the models. And so that's a key part of this: the starting point and the conclusions that we draw from the models, and we've seen this with the pandemic, change daily, certainly weekly, but even daily. We continuously improve the conclusions and the models, don't we? >> Absolutely, as Dr. Goh explained well. So, we could look at the clinics or the testing centers in remote places, or wherever. We could collect that data and then run the transcriptome sequencing. And as and when we learn from these new samples and new pieces of data, the local data could participate in swarm learning, not just within a state or a country but globally, to share all the learnings which are coming up in a new way, and then also implement some kind of continuous learning to pick up the new signals or the new insights as new sets of data come in, and help to immediately deploy them back into the inferencing, into practice, for identification. To do this, I think one of the key things we have realized is making it very simple. Making it simple to convert the machine learning models into swarm learning, because we know that our subject matter experts are going to develop these models on their choice of platforms, and also making it simple to integrate into the complete machine learning workflow, from the time of collecting the data, preprocessing, then doing the model training, then putting it into inferencing and looking at performance. So, we have kept that in mind from the beginning while developing it. We developed it as pluggable microservices, packaged with containers.
So the whole library could be delivered as a container, with a kind of decentralized management command and control which would help to manage the whole swarm network, and to start, initiate, and handle enrollment of new hospitals or new nodes into the swarm network. At the same time, we also looked into the task of the data scientists and tried to make it very, very easy for them to take their existing models and convert them into the swarm learning framework, so that they can enable their models to participate in decentralized learning. So, we have made it a set of callable REST APIs. And I could say that in the examples we are working on with the Professor, whether in the case of leukemia or of COVID, the neural network model, where we're using a 10-layer neural network, could be converted into the swarm model with less than 10 lines of code changes. That's the kind of simplicity we are looking at, so that it helps to make it quicker and faster and deliver the benefits. >> So, the exciting thing here, Dr. Goh, is that this is not an R&D project. This is something that you're actually implementing in the real world. Even though it's a narrow example, there are so many other examples that I'd love to talk about. But please, you had a comment. >> Yes. The key thing here is that in addition to allowing privacy to be kept at each hospital, you also have the issue of different hospitals having data that is skewed differently. Right? For example, the demographics could be such that this hospital is seeing a lot more younger patients and another hospital is seeing a lot more older patients. Right? And then if you are doing machine learning in isolation, your machine might be better at recognizing the condition in the younger population but not the older, and vice versa. By using this approach of swarm learning, we have the biases removed, so that both hospitals can detect it for the younger and older populations. All right.
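The "less than 10 lines of code" conversion Prasad describes might look, very roughly, like wrapping an existing local training step with a merge callback. This is a hypothetical illustration of the idea only; `swarmify`, `merge`, and `peers` are invented names, not the actual HPE Swarm Learning API.

```python
# Hypothetical sketch (not the real HPE API): wrap an existing training
# step so that after each local update the weights pass through a swarm
# merge, leaving the rest of the model code untouched.
def swarmify(train_step, merge_fn, peers):
    def wrapped(weights, data):
        local = train_step(weights, data)        # unchanged local training
        return merge_fn([local] + peers(local))  # the only added step
    return wrapped

# Toy usage: "training" nudges a scalar weight toward the data mean
train = lambda w, data: w + 0.1 * (sum(data) / len(data) - w)
merge = lambda ws: sum(ws) / len(ws)
peers = lambda w: [w + 0.02, w - 0.02]  # stand-ins for two other nodes
step = swarmify(train, merge, peers)
w = step(0.0, [1.0, 2.0, 3.0])
# w is 0.2: local training gives 0.2, the peers return 0.22 and 0.18,
# and the merge averages back to 0.2
```

The point of the pattern is that the data scientist's model code stays as-is; only the step that exchanges weights is added.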
So, this is an important point, right? The ability to remove biases here. And you can see biases in the different hospitals because of the type of cases they see and the demographics. Now, the other point that's very important to reemphasize is what Prasad and Professor Schultze mentioned, right? It's how we made it very easy to implement this. Right? This started out with, for example, each hospital having their own neural network and training on their own. All we do is come in, as Prasad mentioned, and change a few lines of code in the original machine learning model, and now you're part of the collective swarm. This is how we made it easy to implement, so that we can get, as I like to call it, the hospitals of the world uniting. >> Yeah. >> Without sharing private patient data. >> So, let's double-click on that, Professor. Tell us about sort of your team. Dr. Goh just described sort of the simplicity, but what are the skills that you need to take advantage of this? What does your team look like? >> Yeah. So, we actually have a team that comes from physicians to biologists, from medical experts up to computational scientists. We have early on invested in having these interdisciplinary research teams so that we can actually span the whole spectrum. So, people know about the medicine, they know about the biological basics, but they also know how to implement such new technology. They are probably a little bit spearheading that, but this is the way to go in the future. And I see that with many institutions going this way; many other groups are going in this direction, because finally medicine understands that without the computational sciences, without artificial intelligence and machine learning, we will not answer those questions with this large data that we're using. So, we're fine here.
But I also realize that when we entered this project, we had basically our model, our machine learning model from the leukemias, and it really took almost no effort to get this into the swarm. So, we were really ready to go in a very short time. But I also would like to say, and this goes towards the bias that exists in medicine between different places, Dr. Goh said this very nicely: one aspect is the patients and so on, but there are also the techniques, how we do clinical assays. We're using different robots, different automated systems to do the analysis. And we actually tried to find out what swarm learning is doing if we provide such a bias by the prep itself. So, I did the following thing. We know that there are different ways of measuring these transcriptomes. And we actually simulated that two hospitals had an older technology and a third hospital had a much newer technology, which is good for understanding the biology and the diseases. But the new technology is prone to not being able anymore to generate data that can be used to learn and then predict on the old technology. It basically deteriorates: if you take the new one and make a classifier model and you try it on old data, it doesn't work anymore. So, that's a very hard challenge. We knew it didn't work anymore in the old way. So, we pushed it into swarm learning, and the swarm recognized that and took care of it. It didn't matter anymore, because the results were even better by bringing everything together. I was astonished. I mean, it's absolutely amazing. Although we knew about the limitations of that one hospital's data, the swarm basically could deal with it. I think there's more to learn about these advantages. Yeah. And I'm very excited. It's not only the transcriptome that people do. I hope we can very soon do it with imaging. The DZNE has 10 sites in Germany connected to 10 university hospitals.
There's a lot of imaging data, CT scans and MRIs, right? And this is the next domain in medicine that we would like to apply swarm learning to as well. Absolutely. >> Well, it's very exciting, being able to bring this to the clinical world and make it sort of an ongoing learning. I mean, you think about, again, coming back to the pandemic: initially we thought putting people on ventilators was the right thing to do. We learned, okay, maybe not so much. The efficacy of vaccines and other therapeutics, it's going to be really interesting to see how those play out. My understanding is that the vaccines coming out of China are built for speed, to get to market fast, and it will be interesting to see if the U.S. maybe tries to build vaccines that are more long-term effective. Let's see if that actually occurs, and some of those other biases and tests that we can do. That is a very exciting, continuous use case, isn't it? >> Yeah, I think so. Go ahead. >> Yes. In fact, we have another project ongoing to use transcriptome data and other data, like metabolic and cytokine data, all these biomarkers from the blood of volunteers during a clinical trial. The whole idea is to look at all those biomarkers, we're talking tens of thousands of them, the same thing again, and then see if we can streamline clinical trials by looking at that data and training with that data. So again, here you go, right? It's very good that we have many vaccine candidates out there right now; the next long pole in the tent is the clinical trial. And we are working on that also, by applying the same concept. Yeah. But for clinical trials. >> Right. And then, Prasad, it seems to me that this is a good example of sort of an edge use case. Right? You've got a lot of distributed data, and I know you've spoken in the past about the edge generally: where data lives, moving data back to sort of a centralized model.
But of course you don't want to move data if you don't have to; you want real-time AI inferencing at the edge. So, what are you thinking in terms of other edge use cases where swarm learning can be applied? >> Yeah, that's a great point. We could look at this both in the medical field and also in other fields. As the Professor just mentioned the radiographs, think of a scenario in the future using this with medical image data. We could have an edge node sitting next to these medical imaging systems, very close to them. And as and when these systems produce the medical images, it could be an X-ray or a CT scan or an MRI scan, the system attached to them, with the model already built with swarm learning, can do the inferencing. And also, with the new set of data, if it sees some kind of outlier in the new images, or probably new signals, it could use that new data to initiate another round of swarm learning with all the other involved medical imaging systems across the globe. So, all this can happen without really sharing any of the raw data outside of the systems, just sharing the learnings, and then trying to make all of these systems come together and build a better model. >> So, the last question. Yeah. >> If I may, we've got to wrap, but I think we first heard about swarm learning, maybe read about it, probably 30 years ago, and then just ignored it and forgot about it. And now here we are today. Blockchain, of course, we first heard about with Bitcoin, and you're seeing all kinds of really interesting examples. But Dr. Goh, start with you. This is really an exciting area, and we're just getting started. Where do you see swarm learning by, let's say, the end of the decade? What are the possibilities? >> Yeah. You could see this being applied in many other industries, right?
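The edge scenario Prasad sketches above, an edge node inferencing on each new scan and triggering a fresh swarm round when it sees outliers, could be boiled down to a rule like the following. The threshold and confidence scores are invented for illustration.

```python
# Illustrative edge-node rule: score each new medical image with the
# current model, and flag a new swarm-learning round once too many
# low-confidence (outlier) results accumulate. Numbers are made up.
def needs_retraining(confidence_scores, threshold=0.6, max_outliers=2):
    outliers = sum(1 for s in confidence_scores if s < threshold)
    return outliers > max_outliers

recent_scans = [0.95, 0.91, 0.40, 0.55, 0.50]  # model confidence per image
trigger = needs_retraining(recent_scans)  # True: three low-confidence scans
```

Only the trigger and, later, the retrained weights would leave the edge node; the images themselves stay with the imaging system.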
So, we've spoken about life sciences and the healthcare industry, but you can imagine the scenario of manufacturing, where a decade from now you have intelligent robots that can learn from watching craftsmen building a product and then replicate it, right? By just looking, listening, learning. And imagine now you have multiple of these robots, all sharing their learnings across boundaries, right? Across state boundaries, across country boundaries, provided you allow that, without having to share what they are seeing. Right? They can share what they have learnt. You see, that's the difference: without needing to share what they see and hear, they can share what they have learned across all the different robots around the world, all in the community that you allow. Even in manufacturing, you get intelligent robots learning from each other. >> Professor, I wonder, as a practitioner, if you could sort of lay out your vision for where you see something like this going in the future. >> I'll stay with the medical field for the time being, although I agree it will be in many other areas. Medicine has two traditions, for sure. One is learning from each other. That's an old tradition in medicine, for thousands of years. But what's interesting, and that's even more so in modern times, is that we have no tradition of sharing data. It's just not really inherent to medicine. So, that's the mindset. So yes, learning from each other is fine, but sharing data is not so fine. But swarm learning deals with that: we can still learn from each other. We can help each other by learning, and this time by machine learning. We don't have to actually deal with the data sharing anymore, because that's behind us. So for me, it's a really perfect situation. Medicine could benefit dramatically from that, because it goes along with the traditions, and that's very often very important to get adoption.
And on top of that, what also is not seen very well in medicine is that there's a hierarchy, in the sense that certain institutions rule others, and swarm learning is exactly helping us there, because it democratizes: it onboards everybody. Even if you're sort of a small entity or a small institution or a small hospital, you could become a member in the swarm, and as a member you will become important. And there is no central institution that actually rules everything. This democratization, I really love, I have to say. >> Prasad, we'll give you the final word. I mean, your job is about helping to apply these technologies to solve problems. What's your vision for this? >> Yeah. I think the Professor mentioned one of the very key points, the democratization of AI; I'd like to just expand on that a little bit. It has a very profound application. Dr. Goh mentioned manufacturing. If you look at any field, it could be health science, manufacturing, autonomous vehicles, then through this democratization, and also using blockchain, we are building a framework to incentivize the people who own a certain set of data to bring the insights from the data to the table for swarm learning. So, we could build some kind of alternative monetization framework, or an incentivization framework, on top of the existing swarm learning stack, which we are working on, to enable the participants to bring their data or insights and then get rewarded accordingly. Eventually, we could completely make this a democratized AI, with a complete monetization and incentivization system built into it, enabling all the parties to seamlessly work together. >> So, I think this is just a fabulous example. We hear a lot in the media about the tech backlash, breaking up big tech, and how tech has disrupted our lives.
But this is a great example of tech for good, responsible tech for good. And if you think about this pandemic, if there's one thing it's taught us, it's that disruptions outside of technology, pandemics or natural disasters or climate change, et cetera, are probably going to be the bigger disruptions than technology. Yet technology is going to help us solve those problems and address those disruptions. Gentlemen, I really appreciate you coming on theCUBE and sharing this great example, and I wish you the best of luck in your endeavors. >> Thank you. >> Thank you. >> Thank you for having me. >> And thank you everybody for watching. This is theCUBE's coverage of HPE Discover 2020, the virtual experience. We'll be right back after this short break. (upbeat music)

Published Date : Jun 24 2020



Janet George, Western Digital | Women in Data Science 2017


 

>> Male Voiceover: Live from Stanford University, it's The Cube covering the Women in Data Science Conference 2017. >> Hi, welcome back to The Cube, I'm Lisa Martin and we are live at Stanford University at the second annual Women in Data Science Technical Conference. It's a one-day event here, and we've had an incredibly inspiring morning. We're joined by Janet George, who is the chief data scientist at Western Digital. Janet, welcome to the show. >> Thank you very much. >> You're a speaker at-- >> Very happy to be here. >> We're very happy to have you. You're a speaker at this event, and we want to talk about your topic: industrialized data science. What is that? >> Industrialized data science is mostly about how data science is applied in the industry. It's less about research work and more about the practical application of industry use cases in which we actually apply machine learning and artificial intelligence. >> What are some of the use cases at Western Digital for that application? >> One of the use cases is this: we are in the business of creating new technology nodes, and in creating new technology nodes we generate a lot of data. With that data, we ask: can we understand pattern recognition at very large scale? We're talking millions of wafers. Can we understand memory holes? The shape, the type, the curvature, circularity, radius: can we detect these patterns at scale? And then, how can we detect if a memory hole is warped or deformed, and how can we have machine learning do that for us? We also look at things like correlations during the manufacturing process, strong correlations and weak correlations, and we try to figure out interactions between different correlations. >> Fantastic. So if we look at big data, it's probably applicable across every industry. How has it helped to transform Western Digital, which has been an institution here in Silicon Valley for a while?
>> We at Western Digital move mountains of data. That's just part of our job, right? We are the leaders in storage technology; people store data in Western Digital products, and so data is inherently very familiar to us. We deal with data on a regular basis. And now we've started confronting our data with data science. We've started confronting our data with machine learning, because we are very aware that artificial intelligence and machine learning can bring a different value to that data. We can look at the insights; we can develop intelligence about how we build our storage products and what we do with our storage. Failure analysis is a huge area for us. So we're really tapping into our data to figure out how we can make artificial intelligence and machine learning ingrained in the way we do work. >> So from a cultural perspective, you've really done a lot to evolve the culture of Western Digital to apply the learnings, to improve the value that you deliver to all of your customers. >> Yes, believe it or not, we've become a data-driven company. That's amazing, because we've invested in our own data, and we've said, "Hey, if we are going to store the world's data, we need to lead from a data perspective," and so we've embraced machine learning and artificial intelligence. We've embraced new algorithms and technologies out there that we can tap into to look at our data. >> So from a machine learning and human perspective, in storage manufacturing, is there still a dependence on human insight where storage manufacturing devices are concerned, or are you seeing machine learning really, in this case, take more of a lead? >> No, I think humans play a huge role, right? Because these are domain experts. We're talking about Ph.D.'s in material science and device physics, so what I see is the augmentation between machine learning and humans, the domain experts. Domain experts will not be able to scale.
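(An illustrative aside, not part of the interview: the correlation screening Janet described, sorting strong from weak correlations among manufacturing parameters, might be sketched in plain Python as below. All parameter names, values, and the 0.8 cutoff are hypothetical.)

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical process measurements for a handful of wafers.
process = {
    "etch_time":   [41.0, 42.5, 40.2, 43.1, 44.0, 39.8],
    "temperature": [350.0, 352.0, 349.0, 353.0, 355.0, 348.0],
    "hole_radius": [19.8, 20.4, 19.5, 20.7, 21.1, 19.3],
}

# Screen every pair of parameters and bucket each as strong or weak.
for a in process:
    for b in process:
        if a < b:
            r = pearson(process[a], process[b])
            label = "strong" if abs(r) >= 0.8 else "weak"
            print(f"{a} vs {b}: r={r:+.2f} ({label})")
```

At production scale this pairwise screening would of course run over millions of rows with a data-frame or array library rather than Python lists, but the idea, score every parameter pair and then study interactions among the strongly correlated ones, is the same.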
When the scale of wafer production becomes very large, let's talk about 3 million wafers, how is a human going to physically look at all the failure patterns on those wafers? We're not going to be able to scale just having domain expertise. But taking our core domain expertise and using it as training data to build intelligent models that can inform the domain experts, be smart, and come up with all the ideas, that's where we want to be. >> Excellent. So you talked a little bit about the manufacturing process. Who are some of the other constituents you collaborate with as chief data scientist at Western Digital that are demanding access to data, marketing, etcetera? What are some of those key collaborators for your group? >> Our marketing department, as well as our customer service department; we also have collaborations going on with universities. One of the things we found out was that when a drive fails and goes back from a customer, it's much better for us to have figured out the failure ahead of time. So we've started modeling all the customer returns we've received, looking at that data and asking, "How can we predict the life cycle of our storage?" and getting to those return possibilities, or potential issues, before they land in the hands of customers. >> That's excellent. >> So that's one area we've been focusing on quite a bit: looking at the whole life cycle of failures. >> You also talked about collaborating with universities. Share a little bit about that. Is there a program for internships, for example? How are you helping to shape the next generation of computer scientists? >> We are very strongly embedded in universities. We usually have a very good internship program. Six to eight weeks, up to 12 weeks in the summer, the interns come in. Ours is a little different in that we treat our interns as a real value-add. They come in, and they're given a hypothesis or problem domain that they need to go after.
And within six to eight weeks, they have access to tremendous amounts of data, so they get to play with all this industry data that they would never otherwise get to play with. They can quickly bring their academic background, their academic learning, to that data. We also take really hard research-oriented problems, or further-out problems, and we collaborate with universities on those, especially Stanford University; we've been doing great collaborations with them. I'm super encouraged by Fei-Fei's work on computer vision, and we've been looking into things around deep neural networks. This is an area of great passion for me. I think the cognitive computing space has just started to open up, and we have a lot to learn from neural networks: how they work and where the value can be added. >> I just want to explore the internship topic for a second. We're at the second annual Women in Data Science Conference. There are a lot of young minds here, not just here in person, but in many cities across the globe. What are you seeing with some of the interns that come in? Are they confident enough to say, "I'm getting access to real-world data I wouldn't have access to in school"? Are they confident to play around with that, test out a hypothesis and fail? Or do they fear, "I need to get this right right away, this is my career at stake"? >> It's an interesting dichotomy, because they have a really short time frame. That's an issue, and they have to discover quickly. Failing fast and learning fast is part of data science, and I really think we have to get to the point where we're really comfortable with failure and the learning we get from the failure. Remember, the light bulb was invented with 99% negative knowledge, so we have to get to that negative knowledge and treat it as learning. So we encourage a culture, a style of different learning cycles, so we say, "What did we learn in the first learning cycle?"
"What discoveries, what hypotheses did we figure out in the first learning cycle, which will then prepare our second learning cycle?" We don't see it as a one-stop effort, but rather as a more iterative form of work. Also with the internships, I think sometimes it's really essential to have critical thinking, and so the interns get an environment to learn critical thinking in the industry space. >> Tell us, from a skills perspective, and these are presumably young people studying computer science, maybe engineering topics, what are some of the traditional data science skills that you think are still absolutely there? Maybe it's a hybrid of a hacker and someone who's got a great statistician's background. What about the creative side and the ability to communicate? What's your ideal data scientist today? What are the embodiments of those? >> This is a fantastic question, because I've been thinking about it a lot. I think the ideal data scientist is at the intersection of three circles. The first circle is really somebody who's very comfortable with data, mathematics, statistics, machine learning, that sort of thing. The second circle is the intersection of implementation: engineering, computer science, electrical engineering, those backgrounds where they've had the discipline. They understand that they can take complex math or complex algorithms and actually implement them to get business value out of them. And the third circle is around business acumen, program management, critical thinking: really going deeper, asking the questions, explaining the results and very complex charts, the ability to visualize that data and understand the trends in it. So it's the intersection of these very diverse disciplines, and somebody who has deep critical thinking and never gives up. (laughs) >> That's a great one, that never gives up. But looking at it in that way, have you seen this? We're really here at a revolution, right?
Have you seen that traditionalist data science role evolve into the intersection of these three elements? >> Yeah. Traditionally, if you did a lot of computer science, or you did a lot of math, you'd be considered a great data scientist. But if you don't have that business acumen, how do you look at the critical problems? How do you communicate what you found? How do you communicate that what you found actually matters in the scheme of things? Sometimes people talk about anomalies, and I always ask, "Is the anomaly structured enough that I need to care about it? Is it systematic? Why should I care about this anomaly? Why is it different from an alert?" If you have modeled all the behaviors, and you understand that this is a different anomaly than you've normally seen, then you must care about it. So you need business acumen to ask the right business questions and understand why the answers matter. >> So your background is in computer science, your bachelor's, Ph.D.? >> Bachelor's and master's in computer science, mathematics, and statistics, so I've got a combination of all of those, and then my business experience comes from being in the field. >> Lisa: I was going to ask you that. How did you get that business acumen? It sounds like it was in-field training, basically on-the-job? >> It was in the industry, it was on-the-job. I put myself in positions where I've had great opportunities and tackled great business problems that I had to go out and solve, a very unique set of business problems that I had to dig deep into to figure out what the solutions were, and I gained the experience from that. >> So going back to Western Digital and how you're leveraging data science to really evolve the company: you talked about the cultural evolution there, which we were both mentioning off-camera, is quite a feat, because it's very challenging. Data, from many angles, security, usage, is a board-level, boardroom conversation.
I'd love to understand, and you also talked about collaboration, some of the tangible ways that data science and your team have helped evolve Western Digital: improving products, improving services, improving revenue. >> I think of it this way: when an algorithm or a machine learning model is smart, it cannot be a threat. There's a difference between being smart and being a threat. It's smart when it actually provides value. It's a threat when it takes away or does something you would want to do yourself, and here I see that initially there's a lot of fear in the industry. I think the fear is related to "oh, here's a new technology," and we've seen technologies come in and disrupt in a major way. And machine learning will make a lot of disruptions in the industry, for sure. But I think that will cause a shift, or a change. Look at our phone industry and how much it has gone through. We never complain that the smartphone is smarter than us. (laughs) We love the fact that the smartphone can show us maps and send us in the right direction. Of course, it sends us in the wrong direction sometimes, but most of the time it's pretty good. We've grown to rely on our cell phones. We've grown to rely on the smartness. I look at it as: when technology becomes your partner, when technology becomes your ally, and when it actually becomes useful to you, there is a shift in culture. We start by saying, "How do we earn the trust of the humans?" How can machine learning, how can the algorithms we build, actually show you the difference? How can it come up with things you didn't see? How can it discover new things for you that will create a wow factor? And when it does create a wow factor for you, you will want more of it. So to me, it's mostly an intent-based process, in terms of a culture change. You can't push any new technology on people. People will be reluctant to adopt.
The only way people adopt new technologies is when they see the value of the technology instantly, and then they become believers. It's a very grassroots-level change, if you will. >> So for the foreseeable future, from a fear perspective and maybe job security, at least in the storage and manufacturing industry, people aren't going to be replaced by machines. You think the two are going to live together for a very long, long time? >> I totally agree. I think machine learning is going to augment the humans for a long, long time. I think we will get over our fear. I think humans are incredibly powerful; we give way too little credit to ourselves. I think we have huge creative capacity. Machines have processing capacity, very large-scale processing capacity, and humans and machines can augment each other. There was a time when we relied on our computers for data processing; now we're going to rely on computers for machine learning. We're going to get smarter, so we don't have to do all the automation and the daily grind of stuff. You can predict, and that prediction can help you, and you can feed that prediction model some learning mechanism through reinforcement learning or rating or ranking. Look at the spam industry. We taught the spam algorithms to become so good at catching spam, and we don't worry about the fact that they do the cleansing of that level of data for us. So we'll get to that stage first, and then we'll get better and better and better. I think humans have a natural tendency to step up; they always do. Through many generations, we have always stepped up higher than where we were before, so this is going to make us step up further. We're going to demand more, we're going to invent more, we're going to create more. But I don't see it as a real threat.
The place where I do see it as a threat is when the data has bias, or the data is manipulated, which exists even without machine learning. >> I love, though, that the analogy you're making is that as technology evolves, it's kind of a natural catalyst >> Janet: It is a natural catalyst. >> For us humans to evolve and learn and progress, and that's a great cycle that you're-- >> Yeah, imagine how we did farming ten years ago, twenty years ago. Imagine how we drive our cars today compared to many years ago. Imagine the role of maps in our lives. Imagine the role of autonomous cars. This is a natural progression of the human race; that's how I see it. And you can see with young people now, technology is so natural for them. They can tweet and swipe, and that's the natural progression of the human race. I don't think we can stop it; I think we have to embrace it. It's a gift. >> That's a great message, embracing it. It is a gift. Well, we wish you the best of luck this year at Western Digital, and thank you for inspiring us, and probably many who are here and those watching the livestream. Janet George, thanks so much for being on The Cube. >> Thank you. >> Thank you for watching The Cube. We are again live from the second annual Women in Data Science Conference at Stanford. I'm Lisa Martin, don't go away. We'll be right back. (upbeat electronic music)
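(A closing illustrative aside, not part of the interview: Janet's idea of turning domain experts' labels into training data for wafer-defect pattern models might be sketched, at toy scale, as a nearest-centroid classifier. The features, labels, and values are entirely hypothetical, and a real system would use a proper ML library on millions of wafers.)

```python
import math

# Hypothetical expert-labeled wafers: (circularity, radius_deviation) -> label.
labeled = [
    ((0.98, 0.1), "good"),
    ((0.97, 0.2), "good"),
    ((0.70, 1.5), "warped"),
    ((0.65, 1.8), "warped"),
]

def centroids(samples):
    """Average the feature vectors for each expert-assigned label."""
    sums, counts = {}, {}
    for features, label in samples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, v in enumerate(features):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {lbl: tuple(v / counts[lbl] for v in acc) for lbl, acc in sums.items()}

def classify(features, cents):
    """Assign the label of the nearest centroid (Euclidean distance)."""
    return min(cents, key=lambda lbl: math.dist(features, cents[lbl]))

cents = centroids(labeled)
print(classify((0.96, 0.3), cents))  # a new wafer near the "good" cluster: prints good
```

The point of the sketch is the workflow, not the model: a small amount of expensive expert labeling produces a model that can then screen a volume of wafers no human team could inspect, exactly the augmentation-over-replacement pattern described in the interview.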

Published Date : Feb 3 2017

