Dr Eng Lim Goh, High Performance Computing & AI | HPE Discover 2021
>>Welcome back to HPE Discover 2021, theCUBE's virtual coverage, continuous coverage of HPE's annual customer event. My name is Dave Vellante, and we're going to dive into the intersection of high performance computing, data, and AI with Dr. Eng Lim Goh, who is the Senior Vice President and CTO for AI at Hewlett Packard Enterprise. Dr. Goh, great to see you again. Welcome back to theCUBE.

>>Hello Dave, great to talk to you again.

>>You might remember last year we talked a lot about swarm intelligence and how AI is evolving. Of course, you hosted the day two keynote here at Discover. You talked about thriving in the age of insight and how to craft a data-centric strategy, and you addressed some of the biggest problems I think organizations face with data: data is plentiful, but insights are harder to come by. And you really dug into some great examples in retail banking, medicine and healthcare, and media. But stepping back a little bit and zooming out on Discover '21, what do you make of the event so far, and what are some of your big takeaways?

>>Well, you started with an insightful question. Data is everywhere, but we lack the insight. That's the main reason Antonio focused on it on day one and talked about the fact that we are now in the age of insight, and about how to thrive in this new age. What I then did in the day two keynote, following Antonio, was to talk about the challenges that we need to overcome in order to thrive in this new age.

>>So maybe we could talk a little bit about some of those. I'm specifically interested in the barriers to achieving insights when customers are drowning in data. What do you hear from customers? What do we take away from some of the ones you talked about today?

>>A very pertinent question, Dave. There are two challenges I spoke about that we need to overcome in order to thrive in this new age. The first one is the current challenge, and that current challenge, as stated, is the barriers to insight when we are awash with data. How do we overcome those barriers? What are the barriers to insight when we are awash in data? In the keynote I spoke about three main areas that I hear from customers.

The first barrier, with many of our customers, is that data is siloed. In a big corporation you've got data siloed by sales, finance, engineering, manufacturing, supply chain, and so on. There's a major effort ongoing in many corporations to build a federation layer above all those silos, so that when you build applications above it, they can be more intelligent: they have access to all the different silos of data, so you get better intelligence and more intelligent applications built. So that was the first barrier to insight when we are awash with data.
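As an aside, here is a minimal Python sketch of the federation idea described above: a thin layer that routes one query across departmental silos and unions the results, so an application above it sees a single view. The silo names, the Record fields, and the customer_360 query are illustrative assumptions only, not HPE's product or any customer's actual design.

# Minimal sketch of a federation layer over departmental data silos.
# The silo names and record fields are illustrative assumptions only.
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Record:
    silo: str          # which department the record came from
    customer_id: str   # shared key that lets silos be joined
    payload: dict      # silo-specific fields (orders, tickets, shipments, ...)


class FederationLayer:
    """Routes one logical query to every registered silo and unions the results."""

    def __init__(self) -> None:
        self._silos: Dict[str, Callable[[str], List[Record]]] = {}

    def register(self, name: str, query_fn: Callable[[str], List[Record]]) -> None:
        # query_fn takes a customer_id and returns that silo's matching records
        self._silos[name] = query_fn

    def customer_360(self, customer_id: str) -> List[Record]:
        # Applications above the layer see one combined view, not separate silos.
        results: List[Record] = []
        for name, query_fn in self._silos.items():
            results.extend(query_fn(customer_id))
        return results


# Toy usage: two pretend silos backed by in-memory lists.
sales = [Record("sales", "c42", {"order": "A-1001", "amount": 250.0})]
support = [Record("support", "c42", {"ticket": "T-77", "status": "open"})]

fed = FederationLayer()
fed.register("sales", lambda cid: [r for r in sales if r.customer_id == cid])
fed.register("support", lambda cid: [r for r in support if r.customer_id == cid])

print(fed.customer_360("c42"))  # one combined view drawn from both silos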
The second barrier that we see amongst our customers is that data is raw and dispersed when it is stored, and it's tough to get value out of it. In that case I used the example of the May 6, 2010 event, where the stock market dropped a trillion dollars in tens of minutes. Those of us who are financially attuned know about this incident, but it is not the only one; there are many of them out there. And for that particular May 6 event, it took a long time, months, to get insight. For months we had no insight as to what happened or why it happened. There were many other incidents like this, and the regulators were looking for that one rule that could mitigate many of them.

One of our customers decided to take the hard road and go with the tough data, because data is raw and dispersed. They went into all the different feeds of financial transaction information, took the tough road, and analyzed that data. It took a long time to assemble, and they discovered that there was quote stuffing: people were sending in a lot of trades and then cancelling them almost immediately, to manipulate the market. And why didn't we see it immediately? The reason is that the processed reports everybody sees had a rule in them saying trades of less than 100 shares don't need to be reported. So what people did was send a lot of trades of less than 100 shares to fly under the radar and carry out this manipulation. So here is the second barrier: data can be raw and dispersed, and sometimes you just have to take the hard road to get insight. That is one great example.

And then the last barrier has to do with the fact that sometimes, when you start a project to get answers and insight, you realize that all the data is around you, but you don't seem to find the right data to get what you need. Here we have three quick examples of customers. One was a great example where they were trying to build a machine language translator between two languages. To do that, they needed hundreds of millions of word pairs: words in one language matched with the corresponding words in the other. How are we going to get all these word pairs? Someone creative thought of a willing source, and a huge one: the United Nations. So sometimes you think you don't have the right data, but there might be another source, and a willing one, that could give you that data.

The second example shows that sometimes you may just have to generate that data; an interesting one. We had an autonomous car customer that collects massive amounts of data from their cars; lots of sensors collecting lots of data. But sometimes they don't have the data they need even after collection. For example, they may have collected data with the car driving on the highway in fine weather, in rain, and also in snow, but never had the opportunity to collect data with the car in hail, because that's a rare occurrence. So instead of waiting for a time when the car could drive in hail, they built a simulation from the data the car collected in snow and simulated hail.

So these are some of the examples of customers working to overcome barriers. You have barriers associated with the fact that data is siloed, so they federate it; barriers associated with data that's tough to get at, where they just took the hard road; and sometimes, thirdly, you just have to be creative to get the right data you need.
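To make the quote-stuffing example concrete, here is a hedged sketch of the kind of analysis that can surface it from a raw order feed: flag time windows with bursts of sub-100-share orders that are cancelled almost immediately. The field names, thresholds, and window sizes are assumptions for illustration, not the customer's actual method.

# Sketch: flag possible quote stuffing in a raw order feed.
# Field names and thresholds are illustrative assumptions.
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Order:
    symbol: str
    shares: int
    submitted_ms: int            # submission time in milliseconds
    cancelled_ms: Optional[int]  # None if the order was never cancelled


def flag_quote_stuffing(orders, window_ms=1000, min_burst=500,
                        small_share_limit=100, fast_cancel_ms=50):
    """Return (symbol, window_start) pairs where a burst of small,
    almost-immediately-cancelled orders occurred."""
    buckets = defaultdict(int)
    for o in orders:
        fast_cancel = (o.cancelled_ms is not None and
                       o.cancelled_ms - o.submitted_ms <= fast_cancel_ms)
        if o.shares < small_share_limit and fast_cancel:
            window_start = o.submitted_ms // window_ms * window_ms
            buckets[(o.symbol, window_start)] += 1
    return [key for key, count in buckets.items() if count >= min_burst]


# Toy usage: 600 tiny orders cancelled within 5 ms, all in the same second.
feed = [Order("XYZ", 50, submitted_ms=1_000 + i, cancelled_ms=1_005 + i)
        for i in range(600)]
print(flag_quote_stuffing(feed))  # -> [('XYZ', 1000)]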
>>Wow, I tell you, I have about a hundred questions based on what you just said. That's a great example, the flash crash. In fact, Michael Lewis wrote about this in his book Flash Boys, and essentially it was high frequency traders trying to front-run the market and sending in small block trades trying to get in front of it. And they chalked it up to a glitch; like you said, for months nobody really knew what it was. So technology got us into this problem. I guess my question is, can technology help us get out of the problem? And maybe that's where AI fits in.

>>Yes, yes. In fact, a lot of analytics went in: going back to the raw data that is highly dispersed across different sources and assembling it to see if you can find a material trend. You can see lots of trends. When humans look at things, we tend to see patterns in clouds, so sometimes you need to apply statistical analysis, math, to be sure that what the model is seeing is real, and that required work. That's one area.

The second area is that there are times when you just need to go through that tough approach to find the answer. The issue that comes to mind here is that humans put in the rules that decide what goes into the report everybody sees; in this case, the rules before the change. By the way, after the discovery the authorities changed the rules, and now trades of any size have to be reported. But the old rule said that trades under 100 shares need not be reported. So sometimes you just have to understand that reports were designed by humans, and for understandable reasons; they probably didn't want to put everything in there, so that people could still read the report in a reasonable amount of time. But we need to understand that the rules behind the reports we read were put in by humans, and as such, there are times you just need to go back to the raw data.

>>I want to ask...

>>Albeit that it's going to be tough.
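Dr. Goh's point that humans see patterns in clouds, and that you need statistics to confirm a detected trend is real, can be illustrated with a small permutation test on synthetic numbers. Everything here, the correlation measure, the shuffle count, the data, is a generic assumption for illustration, not the analysis his customer ran.

# Sketch: use a permutation test to check whether an observed correlation
# between two series is likely real or just pattern-in-the-clouds noise.
import random

random.seed(0)


def correlation(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs) ** 0.5
    vy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (vx * vy)


def permutation_p_value(xs, ys, trials=5_000):
    observed = abs(correlation(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(trials):
        random.shuffle(shuffled)          # break any real relationship
        if abs(correlation(xs, shuffled)) >= observed:
            hits += 1
    return hits / trials                  # small p-value: the trend is probably real


# Synthetic example: ys is xs plus noise, so the trend should survive the test.
xs = [float(i) for i in range(60)]
ys = [x * 0.5 + random.gauss(0, 4) for x in xs]
print("p-value:", permutation_p_value(xs, ys))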
>>So I want to ask a question about AI. It's obviously in your title and it's something you know a lot about, and I want to make a statement; you tell me if it's on point or off point. It seems that most of the AI going on in the enterprise is modeling, data science applied to troves of data. But there's also a lot of AI going on in consumer, whether it's fingerprint technology or facial recognition or natural language processing. A two-part question: will the consumer market, as has so often happened, inform the enterprise? That's the first part. And then, will there be a shift from modeling, if you will, to more of what you mentioned with autonomous vehicles, more AI inferencing in real time, especially at the edge? Can you help us understand that better?

>>Yeah, it's a great question. There are three stages, just to simplify; it's probably more sophisticated than that, but let's simplify to three stages of building an AI system that can ultimately make a prediction, or assist you in decision making, to produce an outcome. You start with massive amounts of data, and you have to decide what to feed the machine with. So you feed the machine with this massive chunk of data, and the machine starts to evolve a model based on all the data it is seeing. It evolves to the point that, using a separate test set of data you have kept aside, and for which you know the answers, you test the model after training it with all that data to see whether its prediction accuracy is high enough. Once you are satisfied with it, you deploy the model to make the decisions, and that's the inference.

So a lot of the time, depending on what we are focusing on, we in data science are working hard on assembling the right data to feed the machine with; that's the data preparation and organization work. After which you build your models; you have to pick the right models for the decisions and predictions you want to make. You pick the right models and then you start feeding the data in. Sometimes you pick one model and the prediction isn't that robust; it is good, but it is not consistent. What you do then is try another model, so sometimes you just keep trying different models until you get the kind that gives you good, robust decision making and prediction. After it is tested well and QA'd, you take that model and deploy it at the edge. And the edge is essentially just looking at new data, applying it to the model you trained, and that model gives you a prediction or a decision. So it is these three stages.

But more and more, your question reminds me, as the edge becomes more and more powerful, people are asking: can you also do learning at the edge? That's the reason we spoke about swarm learning the last time: learning at the edge as a swarm, because individually the devices may not have enough power to do so, but as a swarm, they may.

>>Is that learning from the edge or learning at the edge?

>>That's a great question. The quick answer is learning at the edge, and also from the edge, but the main goal is to learn at the edge so that you don't have to move the data the edge sees back to the cloud or the core to do the learning. That is one of the main reasons you want to learn at the edge: so that you don't need to send all that data back and assemble it from all the different edge devices on the cloud side to do the learning. With swarm learning, you can keep the data at the edge and learn at that point.
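A toy sketch of the swarm learning idea follows: each edge node trains on its own local data, and only model parameters are exchanged and averaged, so raw data never leaves the edge. HPE's actual Swarm Learning adds blockchain-based peer coordination, which is omitted here; the tiny linear model and the simple averaging rule are simplifying assumptions.

# Toy sketch of learning at the edge as a swarm: each node fits a small linear
# model on its own local data, then only the model parameters are averaged.
import random

random.seed(1)


def local_fit(data, rounds=5000, lr=0.01):
    """One edge node: stochastic gradient fit of y ~ w*x + b on local data only."""
    w, b = 0.0, 0.0
    for _ in range(rounds):
        x, y = random.choice(data)
        err = (w * x + b) - y
        w -= lr * err * x
        b -= lr * err
    return w, b


def swarm_average(params):
    """Merge step: average parameters; raw data never leaves the nodes."""
    ws, bs = zip(*params)
    return sum(ws) / len(ws), sum(bs) / len(bs)


def make_local_data(lo, hi, n=200):
    # Each node sees a different slice of the same underlying truth y = 3x + 2.
    data = []
    for _ in range(n):
        x = random.uniform(lo, hi)
        data.append((x, 3 * x + 2 + random.gauss(0, 0.5)))
    return data


nodes = [make_local_data(0, 1), make_local_data(1, 2), make_local_data(2, 3)]
merged = swarm_average([local_fit(d) for d in nodes])
print("swarm model (w, b):", merged)   # roughly (3, 2), learned without pooling raw data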
>>And then maybe only selectively send data back. The autonomous vehicle example you gave is great, because maybe they're only persisting data from inclement weather, or when a deer runs across the front, and they send that smaller data set back, and maybe that's where the modeling is done, but the rest can be done at the edge. It's a new world that's coming. Let me ask you a question: is there a limit to what data should be collected and how it should be collected?

>>That's a great question again. Wow, today is full of these insightful questions. That actually touches on the second challenge: in order to thrive in this new age of insight, the second challenge is our future challenge. What do we do for our future? And there, the statement we make is that we have to focus on collecting data strategically for the future of our enterprise. Within that, I talk about what to collect, when to organize it as you collect it, and where your data will be, going forward, that you are collecting from. So: what, when, and where.

For the what, what data to collect, that was the question you asked. It's a question that different industries have to ask themselves, because it will vary. Let me use the autonomous car example again. You have this customer collecting massive amounts of data; we're talking about 10 petabytes a day from their fleet of cars. And these are not production autonomous cars; these are training autonomous cars, collecting data so they can train and eventually deploy commercial cars. So this fleet of data collection cars collects about 10 petabytes a day, and when it came to us building a storage system to store all of that data, they realized they cannot afford to store all of it.

Now here comes the dilemma: after I've spent so much effort building all these cars and sensors and collecting data, I now have to decide what to delete? That's a dilemma. In working with them on this process of trimming down what they collected, I'm constantly reminded of the sixties and seventies, when we called a large part of our DNA junk DNA. Today we realize that a large part of what we called junk has function, valuable function: they are not genes, but they regulate the function of genes. So what was junk yesterday could be valuable today, and what's junk today could be valuable tomorrow. So there's this tension between deciding you cannot afford to store everything you can get your hands on, and, on the other hand, worrying that you ignore the wrong ones.

You can see this tension in our customers, and it depends on the industry. In health care they say, I have no choice, I want it all. One very insightful point brought up by one health care provider that really touched me was this: we don't only care, and of course we care a lot, about the people we are caring for; we also care about the people we are not caring for. How do we find them? Therefore, they don't just need to collect the data they have from their patients; they also need to reach out to outside data so they can figure out who they are not caring for. So they want it all. So I asked them, what do you do about funding if you want it all? They say they have no choice but to figure out a way to fund it, and perhaps monetization of what they have now is the way to fund that. Of course, they also come back to us, rightfully, saying we then have to work out a way to help them build that system.

So that's health care. And if you go to other industries like banking, they say they can't afford to keep it all, but they are regulated; like healthcare, they are regulated as to privacy and such. So there are many examples of different industries having different needs and different approaches to what they collect. But there is this constant tension between deciding not to fund storage for everything you could keep, and, on the other hand, knowing that if you decide not to store some of it, some of it may become highly valuable in the future.
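The tension between not being able to afford to store everything and not wanting to throw away tomorrow's valuable data is, in practice, a retention-policy question. The sketch below shows one hypothetical policy that keeps rare conditions (the hail drives) in full and downsamples common ones; the condition labels and keep rates are invented for illustration and are not what this customer actually did.

# Sketch of a retention policy that keeps rare driving conditions in full and
# downsamples common ones. Condition labels and keep rates are assumptions.
import random
from collections import Counter

random.seed(7)

KEEP_RATE = {        # fraction of recordings to retain per condition
    "clear": 0.05,   # plentiful, keep a small sample
    "rain": 0.25,
    "snow": 0.50,
    "hail": 1.00,    # rare occurrence: never delete
}


def retain(recordings):
    kept = []
    for rec in recordings:
        if random.random() < KEEP_RATE.get(rec["condition"], 1.0):
            kept.append(rec)      # unknown conditions default to "keep"
    return kept


# Toy fleet day: mostly clear-weather driving, one hail drive.
day = ([{"condition": "clear", "id": i} for i in range(1000)] +
       [{"condition": "rain", "id": i} for i in range(200)] +
       [{"condition": "snow", "id": i} for i in range(50)] +
       [{"condition": "hail", "id": 0}])

kept = retain(day)
print(Counter(r["condition"] for r in kept))  # the hail drive always survives the trim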
>>We can make some assumptions about the future, can't we? We know there's going to be a lot more data than we've ever seen before. We know, notwithstanding supply constraints on things like NAND, that the price of storage is going to continue to decline. We also know, and not a lot of people are really talking about this, that processing power, even if you say Moore's Law is dead or waning, is actually increasing when you combine CPUs and NPUs and GPUs and accelerators and so forth. So when you think about these use cases at the edge, you're going to have much more processing power, cheaper storage, and less expensive processing. As an AI practitioner, what can you do with that?

>>Again, another insightful question that we touched on in the keynote, and that goes to the where: where will your data be? We have one estimate that says that by next year there will be 55 billion connected devices out there. Fifty-five billion. What's the population of the world? On the order of 10 billion. But this is 55 billion, and many of them, most of them, can collect data. So what do you do? The amount of data that's going to come in will far exceed the drop in storage costs and the increase in compute power. So what's the answer? The answer must be, knowing that even the drop in price and the increase in bandwidth will be overwhelmed, even 5G will be overwhelmed, given 55 billion devices collecting, that there needs to be a balance. You may not be able to afford to bring all the data from the 55 billion devices back to a central core. Firstly, bandwidth, even with 5G and SD-WAN, will still be too expensive given the number of devices out there; and even with storage costs dropping, it will still be too expensive to try and store it all.

So the answer must be, at least to mitigate the problem, to leave a lot of the data out there and only send back the pertinent parts, as you said before. But then, if you did that, how are we going to do machine learning at the core and the cloud side if you don't have all the data? You want rich data to train with. Sometimes you want a mix of the positive type of data and the negative type of data so you can train the machine in a more balanced way. So the answer must be, eventually, as we move forward with this huge number of devices at the edge, to do machine learning at the edge. Today we don't have enough power; the edge is typically characterized by lower energy capability and therefore lower compute power. But soon, even with lower energy, they will be able to do more, with compute power improving in energy efficiency. So, learning at the edge: today we do inference at the edge. We train a model, deploy it, and do inference at the edge; that's what we do today. But more and more, I believe, given the massive amount of data at the edge, you have to start doing machine learning at the edge. And when you don't have enough power, you aggregate the compute power of multiple devices into a swarm and learn as a swarm.
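To illustrate the balance described above, inference running at the edge with only the pertinent samples sent back to the core, here is a hedged sketch. The toy logistic model, the confidence band, and the rule that the uncertain cases are the pertinent ones are all assumptions for illustration, not a description of any particular deployment.

# Sketch: run inference at the edge and ship back only the "pertinent" samples,
# here defined as the ones the deployed model is least confident about.
import math
import random

random.seed(3)

# A trained model deployed to the edge: toy logistic scorer with fixed weights.
WEIGHTS, BIAS = [1.5, -2.0], 0.3


def edge_infer(features):
    z = sum(w * x for w, x in zip(WEIGHTS, features)) + BIAS
    return 1.0 / (1.0 + math.exp(-z))    # probability of the positive class


def select_for_upload(samples, low=0.4, high=0.6):
    """Keep predictions local; upload only low-confidence samples for retraining."""
    upload = []
    for features in samples:
        p = edge_infer(features)
        if low <= p <= high:              # model is unsure: this one is pertinent
            upload.append((features, p))
    return upload


# Toy stream of sensor readings seen at one edge device.
stream = [[random.uniform(-1, 1), random.uniform(-1, 1)] for _ in range(10_000)]
to_core = select_for_upload(stream)
print(f"{len(to_core)} of {len(stream)} samples sent back to the core")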
>>Interesting. So now, of course, if I were a fly on the wall in an HPE board meeting, I'd say, okay, HPE, as a leading provider of compute, how do you take advantage of that? I know it's in the future, but you must be thinking about it and participating in those markets. I know today you have, you know, Edgeline and other products, but it seems to me that this is not the general purpose computing we've known in the past; it's a new type of specialized computing. How are you thinking about participating in that opportunity for your customers?

>>The world will have to have a balance. Today the default, or the more common mode, is to collect the data from the edge and train at some centralized location, or a number of centralized locations. Going forward, given the proliferation of edge devices, we'll need a balance. We need both: we need capability on the cloud side, and it has to be hybrid, and then we need capability on the edge side. We want to build systems that, on one hand, are edge-adapted: environmentally adapted, because the edge is different and a lot of the time it is outside; packaging adapted; and also power adapted, because typically many of these devices are battery powered. You have to build systems that adapt to all that, but at the same time they must not be custom. That's my belief. They must use standard processors and standard operating systems so that they can run a rich set of applications. So yes, that's also the insight behind what Antonio announced in 2018: $4 billion invested over the following four years to strengthen our edge portfolio, our edge product lines and edge solutions.

>>Dr. Goh, I could go on for hours with you; you're just such a great guest. Let's close: what are you most excited about in the future of IT? Certainly HPE, but the industry in general.

>>I think the excitement is the customers: the diversity of customers, and the diversity in the ways they have approached their different problems with data strategy. So the excitement is around data strategy. The statement made was so profound. Antonio said we are in the age of insight, powered by data. That's the first line. The line that comes after that is, as such, we are becoming more and more data-centric, with data the currency. Now, the next step is even more profound: we are going as far as saying that data should not be treated as cost anymore, but instead as an investment in a new asset class called data, with value on our balance sheet. This is a step change in thinking that is going to change the way we look at data and the way we value it. That's the exciting thing, because for me, for AI, a machine is only as intelligent as the data you feed it with. Data is the source for the machine learning to be intelligent. So that's why it matters when people start to value data, and say that it is an investment when we collect it.
It is very positive for AI, because an AI system gets more intelligent when it has huge amounts of data and a diversity of data. So it would be great if the community values data well.

>>You certainly see it in the valuations of many companies these days, and I think increasingly you see it on the income statement: data products and people monetizing data services. Maybe eventually you'll see it on the balance sheet. Doug Laney, when he was at Gartner Group, wrote a book about this, and a lot of people are thinking about it. That's a big change, isn't it, Doctor?

>>Yes. The question is the process and methods of valuation. But I believe we'll get there; we need to get started, and then we'll get there.

>>Dr. Goh, always a pleasure.

>>A pleasure. And AI will benefit greatly from it.

>>Oh yeah, no doubt people will better understand how to align some of these technology investments. Dr. Goh, great to see you again. Thanks so much for coming back on theCUBE. It's been a real pleasure.

>>Yes. An AI system is only as smart as the data you feed it with.

>>Excellent. We'll leave it there. Thank you for spending some time with us, and keep it right there for more great interviews from HPE Discover '21. This is Dave Vellante for theCUBE, the leader in enterprise tech coverage. We'll be right back.