Making AI Real – A practitioner’s view | Exascale Day

>> Narrator: From around the globe, it's theCUBE, with digital coverage of Exascale Day, made possible by Hewlett Packard Enterprise.

>> Hey, welcome back. Jeff Frick here with theCUBE, coming to you from our Palo Alto studios for our ongoing coverage and celebration of Exascale Day: 10 to the 18th, on October 18th, a one with 18 zeros. It's all about big, powerful, giant computing resources and computing power. And we're excited to invite back our next guest. She's been on before. She's Dr. Arti Garg, head of advanced AI solutions and technologies for HPE. Arti, great to see you again.

>> Great to see you.

>> Absolutely. So before we get into Exascale Day, I was just looking at your LinkedIn profile. It's such an interesting career. You've done time at Lawrence Livermore, you've done time in the federal government, you've done time at GE in industry. I'd just love it if you could share a little bit of your perspective going from hardcore academia to some government positions, then into industry as a data scientist, and now, with originally Cray and now HPE, looking at it really from more of the vendor side.

>> Yeah. So in some ways I think I'm like a lot of people who've had the title of data scientist somewhere in their history, in that there's no single path to working in this industry. I come from a scientific background. I have a PhD in physics, so that's where I started working with large data sets. I think of myself as a data scientist from before "data scientist" was a term. And I think it's an advantage to have seen this explosion of interest in leveraging data to gain insights, whether that be into the structure of the galaxy, which is what I used to look at, or into new types of materials that could advance our ability to build lightweight cars or safety gear. It allows you to take a perspective where you understand not only what the technical challenges are, but also what the implementation challenges are, and why it can be hard to use data to solve problems.

>> Well, I'd just love to get your perspective, because you're into data, you chose it as your profession, and you probably run with a lot of people who are like-minded about data. As an industry and as a society, we're trying to get people to do a better job of making data-based decisions, getting away from their gut and actually using data. I wonder if you can talk about the challenges of working with people who don't come from such an intense data background: getting them to understand the value of data-driven decision making, or just that it's worth the effort, because it's not easy to get the data, cleanse the data, trust the data, and get the right context. Working with people who don't come from that background and aren't so entrenched in that point of view, what surprises you? How do you help them? What can you share in terms of helping everybody become a more data-centric decision maker?

>> So I would actually rephrase the question a little bit, Jeff, and say that people have always made data-driven decisions. It's just that in the past we had less data available to us, or the quality of it was not as good.
And so as a result, most organizations have organized themselves to make decisions and run their processes based on a much smaller and more refined set of information than is currently available, given our ability to generate lots of data through software and sensors, our ability to store that data, and then our ability to run a lot of computing cycles and a lot of advanced math against that data, to learn things that in the past took hundreds of years of experiments and scientists to understand. Before I jump into how you overcome that barrier, I'll use an example, because you mentioned I used to work in industry, at GE. One of the things I often joked about is the number of times I rediscovered Bernoulli's principle in data coming off of GE jet engines. You could do that overnight, processing these large data sets, but of course historically it took hundreds of years to really understand those physical principles. So when it comes to bridging the gap, I think it's both sides. Those of us coming from the technical background need to really understand the way decisions are currently made, the way processes and operations currently work at an organization, and why those things are the way they are. Maybe there are security or compliance or accountability concerns that a new algorithm can't just replace, so it's on our end to make sure that whatever new approaches we're bringing address those concerns. And for folks who aren't necessarily coming from a large data set and analytical background, and when I say analytical I mean in the data science sense, not in the sense of thinking about things in an abstract way, it's recognizing that these are just tools that can enhance what they're doing, and they don't need to be frightening. People who have been, say, operating electric grids for a long time, or fixing aircraft engines, have a lot of expertise and a lot of understanding, and that's really important to making any kind of AI-driven solution work.
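A quick aside on that Bernoulli anecdote: with dense sensor data, a one-line regression can recover a relationship that originally took centuries of experimentation. Below is a minimal sketch on synthetic data; the density, pressures, and noise level are illustrative assumptions, not real engine telemetry.

```python
import numpy as np

# Bernoulli's principle: along a streamline, p + 0.5 * rho * v**2 is constant.
# Generate noisy pressure/velocity pairs and check whether a simple
# least-squares fit "rediscovers" the relationship.
rng = np.random.default_rng(42)
rho = 1.2                          # assumed air density, kg/m^3
total = 101_325.0                  # assumed total pressure, Pa
v = rng.uniform(50, 250, 1000)     # flow speeds, m/s
p = total - 0.5 * rho * v**2 + rng.normal(0, 200, v.size)  # noisy static pressure

# Regress pressure on v^2: p ~ a * v^2 + b. Bernoulli predicts a = -rho/2.
a, b = np.polyfit(v**2, p, 1)
print(f"fitted slope  a = {a:.4f}  (theory: {-rho / 2:.4f})")
print(f"fitted offset b = {b:.1f}  (theory: {total:.1f})")
```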
>> That's great insight, but I do think one thing has changed. You come from a world where you had big data sets, so you have a big-data point of view, where a lot of decision makers didn't have that data before. We won't go through all the up-and-to-the-right explosions of data, and obviously we're talking about Exascale Day, but for a lot of processes now, the amount of data they can bring to bear so dwarfs what they had in the past that before they even consider how to use it, they still have to contextualize it, manage it, and organize it, and there are data silos. So there's all this kind of nasty process stuff in the way, which some would argue has been a real problem with the promise of BI and decision support tools. So as you look at this new stuff and these new data sets, what are some of the people and process challenges, beyond the obvious things we can think about, which are the technical challenges?

>> So you've really hit on something I talk about sometimes, which is the data deluge we experience these days, and the feeling of drowning in information but lacking any kind of insight. One of the things I like to do is step back from the data questions, the infrastructure questions, all of these technical questions that can seem very challenging to navigate, and first ask ourselves: what problems am I trying to solve? It's really no different from any other decision you might make in an organization. What are my biggest pain points? What keeps me up at night? What would transform the way my business works? Those are the problems worth solving. Then the next question becomes: if I had more data, if I had a better understanding of something about my business or my customers or the world in which we all operate, would that really move the needle for me? If the answer is yes, that starts to give you a picture of what you might be able to do with AI, and it starts to tell you which of those data management challenges, whether it be cleaning the data, organizing the data, or building models on the data, are worth solving. Because you're right, those are going to be time-intensive, labor-intensive, highly iterative efforts. But if you know why you're doing it, you'll have a better understanding of why it's worth the effort, and also which shortcuts you can take and which ones you can't, because often, in order to see the end state, you might want to do a really quick experiment or prototype. So you want to know what matters and what doesn't, at least for that initial "is this going to work at all?" stage.

>> So you're not buying the age-old adage that you just throw a bunch of data in a data lake and the answers spring up, just come right back out of the wall. You bring up such a good point: it's all about asking the right questions, because then you've got to shape the data to the question, and then you've got to start to build the algorithm to answer that question. How should people think when they're actually building and training algorithms? What are some of the typical pitfalls people fall into if they haven't really thought about it before, and how should they frame the process? Because it's not simple, it's not easy, and you really don't know you have the answer until you run multiple iterations and compare against some other type of reference.

>> Well, one of the things I like to think about, so that you're considering all the challenges you'll face up front, and you don't necessarily need to solve all of these problems at the outset, but it's important to identify them, is that AI solutions, once deployed, are part of a kind of workflow, and the workflow has multiple stages. The first stage is generating your data, then preparing and exploring your data, then building models for your data. But where we don't always think carefully is the next two phases. The first of those is deploying whatever model or AI solution you've developed, and what that will really take, especially in the ecosystem where it's going to live.
Is it going to live in a secure and compliant ecosystem? Is it going to live in an outdoor ecosystem? We're seeing more applications on the edge. And then finally, who's going to use it, and how are they going to derive value from it? It could be that your AI solution doesn't work because you don't have the right dashboard, one that highlights and visualizes the data for the decision maker who will benefit from it. So I think it's important to think through all of these stages up front, and think through the biggest challenges you might encounter along the way, so that you're prepared when you meet them, and you can refine and iterate as you go, and even tweak the question you're asking up front.
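Her staged view of an AI workflow reads like a pre-project checklist. A schematic sketch of that checklist follows; the stage names and questions are paraphrased from the interview, and none of the identifiers belong to a real HPE API.

```python
from dataclasses import dataclass

@dataclass
class Stage:
    name: str
    question: str

# The five stages described above, each with the up-front question to answer
# before committing to the time-intensive, iterative work.
WORKFLOW = [
    Stage("generate", "Do we have instruments/software producing the data we need?"),
    Stage("prepare",  "What cleaning and exploration does the raw data require?"),
    Stage("model",    "Which models are worth building, and how do we validate them?"),
    Stage("deploy",   "Where will this live: secure/compliant systems, or the edge?"),
    Stage("consume",  "Who uses the output, and does a dashboard exist for them?"),
]

def review_plan() -> None:
    """Walk every stage up front, per the interview's advice, before building."""
    for stage in WORKFLOW:
        print(f"[{stage.name:>8}] {stage.question}")

if __name__ == "__main__":
    review_plan()
```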
>> That's great. So I want to get your take on what we're celebrating: Exascale Day, which is something very specific, on 10/18. Share your thoughts on Exascale Day specifically, but also more generally on being a data scientist and suddenly having all this massive compute power at your disposal. You've been around for a while, so you've seen the development of the cloud and these huge data sets, and really the ability to put so much compute horsepower against problems as the costs of networking, storage, and compute asymptotically approach zero. As a data scientist, you've got to be pretty excited about new mysteries, new adventures, new places to go, things you just couldn't do five, ten, fifteen years ago.

>> Yeah, only time will tell exactly what we'll be able to unlock with these new massive computing capabilities. But a couple of things I'm very excited about: in addition to these very large investments in exascale supercomputers, we're also seeing investment in other types of scientific instruments, and when I say scientific, it's not just academic research, it's driving pharmaceutical drug discovery. I'm talking about what they call light sources, which shoot x-rays at molecules and allow you to really understand the structure of those molecules. Historically, you would take your molecule to one of these light sources, shoot x-rays at it, and generate just masses and masses of data, terabytes of data with each shot. And then understanding what you were looking at was a long process of getting computing time and analyzing the data. We're on the precipice of being able to do that, if not in real time, much closer to real time. And I don't really know what happens when, instead of coming up with a few molecules, taking them away, studying them, and then saying maybe I need to do something different, I can do it while I'm still running my instrument. I think that's very exciting from the perspective of someone who's got a scientific background and likes using large data sets. There's just a lot of possibility in what exascale computing allows us to do, from the standpoint that I don't have to wait to get results. I can either simulate much bigger galaxies, say, and really compare that to my data on galaxies or the universe, if you're an astrophysicist, or I can simulate much smaller, finer details of a hypothetical molecule and use that to predict what might be possible from a materials or drug perspective, just to name two applications that I think exascale could really drive.

>> That's really great feedback, just shortening that compute loop. We had an interview earlier where someone was talking about when the biggest workload you had to worry about was the end-of-the-month financial run, and I thought, wouldn't it be nice if that were still the biggest job we had to worry about? We saw some of this in animation, in the movie business, with rendering, whether it's a full animated movie or just something with heavy-duty 3D effects. When you can get those dailies back to the artist while they're still working, or closer to when they're working, versus having this huge compute delay, it changes the workflow dramatically, and the pace of change and the pace of output, because you're not context-switching as much and you can really get back into it. That's a super point. I want to shift gears a little bit and talk about explainable AI. This is a concept that a lot of people hopefully are familiar with. With AI, you build the algorithm, it's in a box, it runs, and it kicks out an answer. And one of the things people talk about is that we should be able to go in and pull that algorithm apart to know why it came out with the answer it did. To me this sounds really, really hard, because it's smart people like you writing the algorithms, the inputs and the data that feed the thing are super complex, and the math behind it is very complex. And we know that the AI trains and can change over time: as you train the algorithm, it gets more data and adjusts itself. So is explainable AI even possible? Is it possible to some degree? Because I do think it's important, and my next question is going to be about ethics, to know why something came out the way it did. The other piece that becomes so much more important is that we use that output not only to drive human-based decisions that need more information, but increasingly to drive automation. So now you really want to know: why did it do what it did? Explainable AI, share your thoughts.

>> It's a great question, and it's obviously on a lot of people's minds these days. I'm actually going to revert back to what I said earlier about Bernoulli's principle: sometimes when you throw an algorithm at data, the first thing it finds is some known law of physics. So really thinking about what we mean by explainable AI also requires us to think about what we mean by AI. These days AI is often used synonymously with deep learning, which is a particular type of algorithm that is not very analytical at its core. What I mean by that is that other types of statistical machine learning models have some underlying theory of the population of data you're studying, whereas deep learning doesn't; it just learns whatever pattern is sitting in front of it.
And so there is a sense in which other types of algorithms are inherently explainable, because you're choosing your algorithm based on what you think is the ground truth about the population you're studying. Whether we're going to get to explainable deep learning, I think, is more challenging, because deep learning is designed to be as flexible as possible, to throw more math at the problem, since there may be things your simpler model doesn't account for. However, deep learning could be part of an explainable AI solution if, for example, it helps you identify the important so-called features to look at, the important aspects of your data. So it depends on what you mean by AI. But are you ever going to get to the point where you don't need humans interpreting outputs and making some set of judgments about what a set of computer algorithms processing the data think? I don't want to claim I know what's going to happen 50 years from now, but I think it'll take a little while to get to the point where you don't have to apply some subject matter understanding and some human judgment to what an algorithm is putting out.
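One way to see the distinction she draws: a linear model's coefficients are a direct statement of its assumed ground truth, while a generic importance method can be layered on any fitted estimator to surface which features matter. A minimal sketch on synthetic data, assuming scikit-learn is installed; the data and feature count are arbitrary.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))
# Only features 0 and 2 actually matter in this toy "population".
y = 3.0 * X[:, 0] - 2.0 * X[:, 2] + rng.normal(scale=0.5, size=500)

model = Ridge(alpha=1.0).fit(X, y)
print("coefficients:", np.round(model.coef_, 2))  # explainable by construction

# Permutation importance works for any fitted estimator, interpretable or not.
result = permutation_importance(model, X, y, n_repeats=20, random_state=0)
print("permutation importance:", np.round(result.importances_mean, 2))
```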
>> It's really interesting. We had Dr. Robert Gates on years ago at another show, and he talked about how the only guns in the U.S. military, if I'm getting this right, that are automatic, that will start shooting based on what the computer tells them to do, are on the Korean border. Short of that, there's always a person involved before anybody hits a button. Which begs a question, because we've seen this on the big data curve, I think Gartner has talked about it, as we move up from descriptive analytics to diagnostic analytics, predictive, then prescriptive, and then hopefully autonomous. So you're saying we're still a ways out, and that last little bump is going to be tough to overcome to get to true autonomy?

>> I think so, and it's going to be very application dependent as well. It's an interesting example to use the DMZ, because that is obviously also a very mission-critical example. But in general, you'll see autonomy, and you already do see autonomy, in certain places where the stakes are lower. If I have some kind of recommendation engine that suggests, if you looked at this sweater, maybe you'd like that one, the risk of getting that wrong, and so of fully automating it, is a little bit lower, because the downside is you don't buy the sweater: I lose a little bit of revenue as a retailer. But the risk of making a wrong turn, because I'm in an autonomous vehicle, is much higher. So I think you'll see the progression up that curve being highly dependent on what's at stake with different degrees of automation. That being said, in places where it's either really expensive or humans aren't doing a great job, you may actually start to see some mission-critical automation. And I think that's one of the reasons you actually see a lot more autonomy in the agriculture space than in the passenger vehicle space: there's a lot at stake, and it's very difficult for human beings to drive large combines.

>> Plus they have a controlled environment. I've interviewed Caterpillar, and they're doing a ton of stuff with autonomy, because they control the field, or the mine, where those things are operating. It's actually fascinating how far they've come with autonomy. But let me switch to a different industry that I know is closer to your heart, from looking at some other interviews, and talk about diagnosing disease. Take something specific like reviewing x-rays, which also brings in computer vision algorithms. The computer can probably see things faster, and do a lot more comparisons, than a human doctor can, and hopefully, in this whole signal-to-noise conversation, elevate the signal for the doctor to review and suppress the noise that's really not worth their time. It can also review a lot of literature, and hopefully bring a broader perspective of potential diagnoses to a set of symptoms. You said before that both your folks are physicians, and there's a certain kind of magic, a nuance, almost a more childlike exploration, that you want to get out of the algorithm, if you will, to think outside the box. I wonder if you can share that synergy between using computers and AI and machine learning to do really arduous, nasty things, like going through lots and lots of x-rays, and how that helps a doctor who's got a whole different set of experience, a whole different kind of empathy, a whole different relationship with that patient than just a bunch of pictures of their heart or their lungs.

>> I think one of the things, and this goes back to the question of AI for decision support versus automation, is that what AI can do, and what we're pretty good at these days with computer vision, is picking up on subtle patterns, especially if you have a very large data set. If I can train on lots of pictures of lungs, it's a lot easier for me to identify the pictures that are somehow not like the other ones. And that can be helpful. But then to really interpret what you're seeing and understand it: is it actually a bad-quality image? Is it some kind of medical issue, and what is the medical issue? That's where bringing in a lot of different types of knowledge and a lot of different pieces of information matters, and right now humans are a little bit better at that. Some of that is because I don't think we have great ways to train on sparse data sets yet. And the second part is that human beings might have 40 or 50 years of training their model, as opposed to six months with sparse information. That's another thing human beings have: their lived experience, and the data they bring to bear on any prediction or classification is more than just what they saw in their medical training. It might be the people they've met, the places they've lived, what have you. And that broader set of learning, how things that might not seem related might actually be relevant to what you're looking at, is where I think we've got a ways to go from an artificial intelligence perspective.
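The "not like the other ones" pattern she describes is essentially anomaly detection used for decision support rather than automation. A toy sketch follows; a real system would use learned image embeddings rather than the random stand-in feature vectors here, and the contamination rate is an illustrative assumption.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(1)
normal_scans = rng.normal(0.0, 1.0, size=(200, 64))  # stand-in feature vectors
odd_scans = rng.normal(3.0, 1.0, size=(5, 64))       # a subtly different cohort
scans = np.vstack([normal_scans, odd_scans])

# Flag roughly the most unusual 5% of scans; the physician, not the model,
# decides whether each flag is a bad image or a medical issue.
detector = IsolationForest(contamination=0.05, random_state=1).fit(scans)
flags = detector.predict(scans)                      # -1 marks the outliers
print("flagged for physician review:", np.where(flags == -1)[0])
```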
>> But it is Exascale Day, and we all know about the compounding exponential curves on the computing side. So let's shift gears a little bit. I know you're interested in emerging technologies to support this effort, and there's so much going on in terms of the atomization of compute, storage, and networking, the ability to break them down into smaller and smaller pieces so that you can really scale the horsepower you apply to a problem, whether very big or very small. Obviously the stuff you work on is more big than small, and on the GPU side there's a lot of activity. So I wonder if you could share some of the emerging technologies you're excited about, that bring more tools to the task.

>> One of the areas I personally spend a lot of my time exploring, and I guess this phrase gets used a lot, is the Cambrian explosion of new AI accelerators: new types of chips that are really designed for different types of AI workloads. And as you talked about going down in scale, in a way we're going back and looking at these large systems, but then exploring each little component on them, trying to optimize it or understand how it contributes to the overall performance of the whole. There are probably close to a hundred active vendors in the space of developing new processors and new types of computer chips. One of the things that points to is that we're moving in the direction of infrastructure heterogeneity. It used to be that when you built a system, you probably had one type of processor, and you probably had a pretty uniform fabric across the system; storage, I think, started to get tiering a little bit earlier. But now, and we're already starting to see it with exascale systems that have GPUs and CPUs on the same blades, the workloads running at large scale are becoming more complicated. Maybe I'm doing some simulation, then I'm training some kind of AI model, and then I'm running inference on some other output of the simulation. I need the ability to do a lot of different things, and do them at a very advanced level, which means I need very specialized technology to do it. It's an exciting time, and we're going to test, we're going to break a lot of things. I probably shouldn't say that in this interview, but I'm hopeful that we're going to break some stuff. We're going to push all these systems to the limit and find out where we actually need to push a little harder. One of the areas where I think we'll see that is moving data: off of scientific instruments, into computing, into memory, into a lot of different places. I'm really excited to see how it plays out, what you can do, and where the limits are of what you can do with the new systems.
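A small illustration of the heterogeneity point: one script steering the same dense math and model inference onto whatever processor is available. This sketch assumes PyTorch is installed; the sizes are arbitrary, and it falls back to CPU when no GPU is present.

```python
import torch

# Pick the best available device; on a heterogeneous node this is where a
# scheduler would choose among CPUs, GPUs, or other accelerators.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"running on: {device}")

# Simulation-style number crunching and model inference on the same device.
x = torch.randn(4096, 4096, device=device)
y = x @ x.T                    # dense linear algebra, accelerated if a GPU exists

model = torch.nn.Linear(4096, 10).to(device)
with torch.no_grad():
    scores = model(y[:8])      # "inference on the output of the simulation"
print(scores.shape)
```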
>> Arti, I could talk to you all day. I love the experience and the perspective, because you've been doing this for a long time. So I'm going to give you the final word before we sign off, and really bring it back to a more human thing, which is ethics. One of the conversations we hear all the time is that if you're going to do something, you put together a project, you justify that project, then you go collect the data, run the algorithm, and do the project. That's great, but there's an inherent problem: data collected for one purpose may be used for something else down the road, something you don't even anticipate. So I wonder if you can share a top-level ethical take on how data scientists specifically, and ultimately business practitioners and other people who don't carry that title, need to be thinking about ethics and not just forget about it. I had a great interview with Paul Doherty: everybody's data is not just data, it represents a person, what they do and how they live. So when you think about entering into a project and getting started, what do you think about in terms of the ethical considerations, and how should people be careful that they don't go places they probably shouldn't go?

>> I think that's a great question without a short answer. I honestly don't know that we have great solutions right now, but I think the best we can do is take a very multifaceted, and also vigilant, approach. When you're collecting data, and we should remember that a lot of the data that gets used wasn't necessarily collected for the purpose it's being used for, because we might be looking at old medical records, or old transactional records, whether from a government or a business, try to think through who all the people are who might use it, and the possible ways in which it could be misused. I also encourage people to think backwards: what were the biases in place when the data were collected? You see this a lot in the criminal justice space, where historical records reflect historical biases in our systems. There are limits to how much you can correct for previous biases, but there are some ways to do it, and you can't do it if you're not thinking about it. So that matters at the outset of developing solutions, but equally important is putting in the systems to maintain vigilance. One, don't move to autonomy before you know what potential new errors or new biases you might introduce into the world. And also have systems in place to constantly ask: am I perpetuating things I don't want to perpetuate? How can I correct for them? And be willing to scrap your system and start from scratch if you need to.

>> Well, Arti, thank you. Thank you so much for your time. Like I said, I could talk to you for days and days. I love the perspective, the insight, and the thoughtfulness. So thank you for sharing your thoughts as we celebrate Exascale Day.

>> Thank you for having me.

>> My pleasure, thank you. All right, she's Arti, I'm Jeff, it's Exascale Day. We're covering it on theCUBE. Thanks for watching. We'll see you next time. (bright upbeat music)

Published Date : Oct 16 2020

