Announcement: Sri Ambati, H2O.ai | CUBE Conversation, August 2019
(upbeat music) >> Announcer: From our studios in the heart of Silicon Valley, Palo Alto, California, this is a Cube Conversation. >> Everyone, welcome to this special Cube Conversation here in the Palo Alto Cube studios. I'm John Furrier, host of the Cube. We have special breaking news here with Sri Ambati, who is the founder and CEO of H2O.ai, with big funding news. Great to see you, Cube alum, hot startup; you've got some hot funding news, share it with us. >> We are very excited to announce our Series D. Goldman Sachs, one of our leading customers, and Ping An from China are leading our round. It's a round of $72 million, bringing our total funds raised to $147 million. This is an endorsement of their support of our mission to democratize AI, and an endorsement of the amazing teamwork behind the company and its customer centricity. Customers have now come to lead two of our rounds. The last round was a Series C led by Wells Fargo and NVIDIA, and I think it just goes to show how critical a thing we are for their success in AI. >> Well, congratulations. I've been watching you guys build this company from scratch; we've had many conversations going back to 2013, '14 on The Cube. You call it-- >> You covered us long before. >> You guys were always on the wave, and you really created a category. This is a new category that Cloud 2.0 is creating, which is a DevOps mindset, an entrepreneurial mindset, creating a category to enable people to have the kind of infrastructure, tooling, and software that lets them do all the heavy lifting of AI without doing the heavy lifting. As Amazon always says about cloud, you do all of the undifferentiated heavy lifting that's required to stand things up, and then provide tooling for the differentiated heavy lifting to make it easy to use. This has been a key thing. Has that been the-- >> Customers have been core to our company building.
H2O is here to build an amazing piece of innovation and technology, and innovation is not new for Silicon Valley, as you know. But I think innovation with a purpose, and with a focus on customer success, is something we represent, and that's been kind of the key north finder for us. In terms of making things simpler, when we started, it was a grassroots movement in open source, and we wanted the mindshare of millions of users worldwide, and that mindshare got us a lot of feedback. And that feedback is how we then built the second generation of the product lines, which is Driverless AI. We are also announcing our mission to make every company an AI company; this funding will power that transformation of several businesses that can then go on to build the AI superpower. >> And certainly cloud computing, more compute, more elastic resources, is always a great tailwind. What are you guys going to do with the funding in terms of focus? >> You mentioned cloud, which is a great story. We're obviously going to make things easier for folks who are doing the cloud, but the largest players are there as well: Google, Microsoft, Amazon. They're right there, trying to innovate. AI is at the center of every software moment, because AI is eating software, and software is eating the world. And so all the software players are right there, trying to build a large AI opportunity for the world, and we think in ecosystems, not just empires. So our mission is to uplift the entire AI ecosystem to the place where businesses can use it, verticalize it, build new products, globalize. We are building out our sales and marketing efforts now with much bigger, faster systems-- >> So a lot of go-to-market expansion, more customer focus. More field sales and support, that kind of thing. >> Building our center for AI research in Prague, within the CND; now we are building them in Chennai and Ottawa, and so globalizing the operation, going to China, going to build focus in Asia as well.
>> So a nice step up on funding, at 72 million, you said? >> 72.5 million. >> 72.5 million, that's almost double what you've raised to date, a nice kick up. So global expansion, nice philosophy. That's important to you guys, isn't it? >> The world has become a small village. There's no changing that, and data is global. These are wide global trends; it's amazing to see that AI is not just transforming the US, it's also transforming China, it's also transforming India. It's transforming Africa. Pay-through-mobile is a very common theme worldwide, and I think data is being collected globally. I think there is no way to unbox it and box it back into a small place, so our vision is very borderless and global, and we want the AI companies of the valley to also compete in a global arena, and I think that's kind of why we think it's important to be-- >> Love competition; that's certainly going to force everyone to be more open. I've got to ask you about the role of the developer. I love the democratization, putting AI in the hands of everybody, it's a great mission. You guys do a lot of AI for Good efforts. So congratulations on that, but how does this change the nature of the developer? Because you're seeing with cloud and DevOps, developers are becoming closer to the front lines, they're becoming kingmakers. They're becoming really, really important. So the role of the developer is important. How do you change that role, if at all? How do you expand it, what happens? >> There are two important transformations happening right now in the tech world. One is the role of the data scientist and the role of the software engineer. Right, so they're coming closer in many ways; actually, in some of the newer places, software engineers are deploying data science models, and data scientists are doing software engineering. Python has been a good new language, one of the new languages coming up that help that happen more closely.
Software engineering as we know it, which was looking at data and creating the rules and the logic that runs a program, is now being automated to a degree where that logic is being generated from data using data science. So that's where the brains behind how programs run, how computers work, is now AI inside. And so that's where the world is transforming: software engineers now get to do a lot more with a lot less daily tinkering on little modules. They can probably build a whole slew of applications; what would take 18 months to build is now compressing into 18 weeks or 18 days. >> Sri, I love how you talk about software engineering and data science as very specific disciplines. I was having a debate with my young son around the question of what computer science is. Well, computer science is the study of computers, the science of computers. It used to be that if you were a CS or comp sci major, which is not cool to say anymore, but, when you were a computer science major, you were really a software engineer; that was the discipline. Now computer science as a field has spread so far and so broad: you've got software engineering, you've got data science, and newer roles are emerging. But that brings up the question I want to put to you, which is the whole idea of, I'm a full stack developer. Well, if what you're saying you're doing is true, you're essentially cutting the stack in half. So it's a half-stack developer on one end and a data scientist that's got the other half. So the notion of the full stack developer kind of goes away with the idea of horizontally scalable infrastructure and vertically specialized data and AI. Your thoughts, what's your reaction to that? >> I think the most... I would say the most scarce resource in the world is empathy, right? When developers have empathy for their users, they start building design that cares for the users.
So design still becomes the limiting factor; you can't really automate a lot of that design. So the full stack engineer is now moving closer to the front, understanding their users, making applications that are perceptive of how the users are using them, and building that empathy into the product. A lot of the full stack we used to learn, how to build a kernel, deploy it on cloud, scale it on your own servers; all of that is coming together in reasonably easier ways. Cloud is helping there, AI is helping there, data is helping there, and lessons from the data. But I think what has not gone away is imagination, creativity, and how to power that creativity with AI and get it in the hands of someone quickly. Marketing has become easier in the new world. So it's not enough just to make products; you have to make markets for your products and then deliver and get that success for customers-- >> So what you're saying-- >> The developers become-- >> The consistency of the lower end of the stack, of wiring together the plumbing and the kernel and everything else, is done for you. So you can move up. >> Up the stack. >> So the stack's growing, so it's still kind of full. No one calls themselves a half-stack developer. I haven't met anyone who says, "Yeah, I'm a half-stack developer." They're full stack developers, but the roles are changing. >> I think what-- >> There's more to do on the front end of creativity, so the stack's extending. >> Creativity is changing. I think the one thing we have learned: we've gone past Moore's Law in the valley, and people are innovating architectures to run AI faster. So AI is beginning to eat hardware. You've seen the transformation in microprocessors as well. I think once AI starts being part of the overall conversation, you'll see a much richer coexistence between how a human programmer and a computer programmer are going to be working closely.
>> But I think this is just the beginning of a real richness; when you talk about rich interactive applications, you're going to talk about rich interactive appliances, where you start seeing intelligence really spread around the home. >> Sri, if we really wanted to have some fun we could just talk about what a 10x engineer is. No, I'm only kidding, we're not going to go there. It's always a good debate on Twitter, what a 10x engineer is. Sri, congratulations on the funding. $72.5 million in financing for global expansion, on the team side as well as in geographies, congratulations. >> Thank you. >> H2O.ai. >> The full stack engineer of the future, finishing up your full stack engineer conversation, is going to get that courage and become a leader. Going from managers to leaders, developers to founders. I think it's become easier to democratize entrepreneurship now than ever before, and part of our mission as a company is to democratize things: democratize AI, democratize H2O, like in the AI for Good, democratize water. But also democratize the art of making more entrepreneurs and remove the common ways to fail, and that's also a way to create more opportunity, more ownership in the world, and so-- >> And I think society will benefit from this globally, because in the data is truth, in the data is the notion of being transparent; it's all there, and we're going to get to the data faster, and that's where AI helps us. >> That's what it is. >> Sri, congratulations, $72 million of funding for H2O. We're here with the founder and CEO, Sri Ambati. A great success story here in Silicon Valley and around the world. I'm John Furrier with the Cube, thanks for watching. >> Sri: Thank you. (upbeat music)
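The shift described in this conversation, program logic learned from data rather than typed in by an engineer, can be sketched in a few lines of plain Python. This is an illustration only; the function names and the toy data are invented for the example and are not from H2O's products:

```python
# Classic software engineering: a programmer hard-codes the decision rule.
def is_spam_handwritten(num_links):
    return num_links > 3  # threshold chosen by a human

# The machine learning version: the threshold is learned from labeled data.
def learn_threshold(examples):
    # examples: list of (feature_value, label) pairs, label is True/False
    best_t, best_acc = None, -1.0
    for t in sorted({v for v, _ in examples}):
        acc = sum((v > t) == label for v, label in examples) / len(examples)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t

data = [(0, False), (1, False), (2, False), (5, True), (7, True), (9, True)]
t = learn_threshold(data)           # t == 2 for this toy data
is_spam_learned = lambda v: v > t   # the "logic" now comes from the data
```

The same loop, scaled from one threshold to millions of parameters, is what training a model means; the decision logic is produced from the data instead of being written by hand.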
Sri Ambati, H2O.ai | CUBE Conversation, August 2019
>> From our studios in the heart of Silicon Valley, Palo Alto, California, this is a CUBE Conversation. >> Hello and welcome to this special CUBE Conversation here in Palo Alto, California, CUBE Studios. I'm John Furrier, your host of theCUBE. We're here with Sri Ambati, the founder and CEO of H2O.ai, CUBE alum, hot startup right in the action of all the machine learning and artificial intelligence, with the democratization, the role of data in the future; it's all happening with Cloud 2.0, DevOps 2.0. Great to see you. Tell us about the company. What's going on? You guys are smoking hot, congratulations. You've got the right formula here with AI; explain what's going on. >> It started about seven years ago, and .ai was just a new fad that had arrived in Silicon Valley. Today we have thousands of companies in AI, and we're very excited to be partners in making more companies become AI-first. And our vision here is to democratize AI, and we've made it simple with our open source, made it easy for people to start adopting data science and machine learning in different functions inside their large organizations, and to apply that for different use cases across financial services, insurance, and health care. We leapfrogged in 2016 and built our first closed source product. It's called Driverless AI. We made it on GPUs using the latest hardware and software innovations. Open source AI has funded the rise of automatic machine learning, which further reduces the need for extraordinary talent to build machine learning. No one has time today, and so we're trying to really bring that automatic machine learning at a very significant crunch time for AI, so people can consume AI better. >> You know, this is one of the things I love about the current state of the market right now.
The entrepreneurial market, as well as startups and some growing companies going public, is that there's a new breed of entrepreneurship going on around large scale: standing up infrastructure, shortening the time it takes to do something like provisioning. In the old days you'd get a PhD; we're seeing this in data science. I mean, you don't have to be a Python coder. This democratization is not just a tagline. It's actually the reality of a business opportunity: whoever can provide the infrastructure and the systems for people to do it, that is an opportunity. You guys are doing that. This is a real dynamic. This is a new kind of dynamic in the industry. >> The three real characteristics of the ability to adopt AI: one is that data is a team sport, which means that you've got to bring different dimensions within your organization together to be able to take advantage of data and AI. You've got to bring in your domain scientists to work closely with your data scientists, your data scientists to work closely with your data engineers, produce applications that can be deployed, and then get your design on top of it that can convince users or strategists to make the decisions that the data is showing. So that takes a multidimensional workforce working closely together. So the real problem in adoption of AI today is not just technology, it's also culture. And so we're kind of bringing those aspects together in the form of products. One of our products, for example, Explainable AI, is helping the data scientists tell a story that businesses can understand. Why is the model deciding it needs to take this particular direction? Why is this model giving this particular nurse a high credit score even though she doesn't have a high school graduation? Figuring out those kinds of things; democratization goes all the way down there.
Why is a model deciding what it's deciding, and explaining and breaking that down into English; building trust is a huge aspect in AI. >> Well, I want to get to the talent, the time, and the trust equation in the next talk track, but I want to get the hard news out there. You guys have some news; Driverless AI is one of your core things. What's the hard news? Explain the news. What's the big news? >> The big news has been that AI is the Moneyball for business, and Moneyball, as it has played out, has been: the experts were left out of the field, the algorithm is taking over, and there is no participation between the experts, the domain scientists, and the data scientists. What we're bringing with the new product in Driverless AI is an ability for companies to take AI and become AI companies themselves. The real AI race is not between the Googles and the Amazons and Microsofts and other AI companies, software companies. The real AI race is in the verticals. How can a company which is a bank, or an insurance giant, or a health care company take AI platforms, take the data, monetize the data, and become an AI company themselves? >> You know, that's a really profound statement. I would agree with you 100% on that. I think we saw that early on in the big data world around Hadoop, and Hadoop kind of died by the wayside. But Dave Vellante and the Wikibon team have observed, and they actually predicted, that the most value was going to come from practitioners, not the vendors, because they're the ones who have the data. And you mentioned verticals. This is another interesting point I want to get more explanation from you on, which is that apps are driven by data, and data needs domain-specific information. So you can't just say, I have data, therefore magic happens. It's really at the edge of the domain-speak or the domain feature of the application.
This is where the data is; this kind of supports your idea that the AI is with the companies that are using it, not the suppliers of the technology. >> Our vision has always been to be maker-centric and customer-focused, right, to be focused on the customer, and through that we actually made the customer one of the product managers inside the company. And the doors that opened from working closely with some of our leading customers showed that we need to get them to participate and take AI, algorithms and platforms that can tune automatically: the algorithms, the right hyperparameter optimizations, the right features, and the right data sets that they have. There's a whole data lake around their data architecture today; which data sets am I not using in my current problem solving? That's a reasonable problem. Looking at that combination, these various pieces have been automated in Driverless AI, and the new version that we're now bringing to market allows them to create their own recipes, bring their own transformers, and make that automatic fit for their particular race. Think about it this way: we built all the components of a race car. They're going to take it and apply it to that particular race to win. >> So that's where "driverless" comes in; it's driverless in the sense that you don't really need a full operator. It kind of operates on its own. >> In some sense it's driverless, which means taking the data scientists and giving them a power tool. Historically, before automatic machine learning, AutoML as the umbrella term, they would fine-tune by learning the nuances of the data and the problem at hand, what they're optimizing for, and the right tweaks in the algorithm. So they have to understand how deep the trees are going to be, how many layers of deep learning they need, what particular variation in deploying.
In natural language processing, what context they need, with long short-term memory; all these pieces they had to learn themselves. And there were only a few grandmasters, the big data scientists in the world, who could come up with the right answer for different problems. >> So you're spreading the love of AI around. You're simplifying it so you get the big brains to work on it, and with democratization people can then participate. The machines also can learn; both humans and machines. >> Between our open source and the very maker-centric culture, we've been able to attract the world's top data scientists, physicists, and compiler engineers to bring it in a form factor that businesses can use. And today one data scientist in a company like Franklin Templeton can operate at the level of 10 or hundreds of them, and then bring the best in data science in a form factor that they can plug in and play. >> I was having a conversation with Libby, who works on our platform team. We have all this data with the Cube, and we were just talking about hiring data science and AI specialists. You go out and look around; you've got Google and Amazon, all these big players, spending between $3 to $4 million per machine learning engineer, and that might be someone under the age of 30, with not much experience. So the talent war is huge. I mean, the cost to just hire these guys; we can't hire these people. It's a-- >> A global war. >> There's a talent shortage in China. There's a talent shortage in India. There's a talent shortage in Europe, and we have offices in Europe and in India. There's a talent shortage in Toronto and Ottawa, right? It's a global shortage of physicists and mathematicians and data scientists. So that's where our tools can help. And the way we see it, you can think of Driverless AI as a way: you can drive to New York, or you can fly there. >> I started my son the other day taking computer science classes in school.
I'm like, well, you know, machine learning and AI is kind of like dog training. You have dog training; you train that dog to do some tricks, and it does some tricks. Well, if you're a coder, you want to train the machines. This is machine training. This is data science; the AI possibility is that machines have to be taught, something as a baseline input. Machines just aren't self-learning on their own. So as you look at the science of AI, this becomes the question on the talent gap. Can the talent gap be closed by machines? And you've got the time, you want speed, low latency, and trust. All these things are hard to do, all three. Balancing all three is extremely difficult. What are your thoughts on those three variables? >> So that's where we brought AI to help do AI. Driverless AI's concept is bringing AI to simplify; it's an expert system to do AI better, so you can actually put it in the hands of a new data scientist, so one person can perform with the power of a dozen data scientists. You're not disempowering the data scientist; the power is still for the data scientist, because you cannot be stumped by the confusion matrix, false positives, false negatives; that's something a data scientist can understand. What you're talking about with feature engineering, that's something a data scientist understands. And what Driverless AI is really doing is helping them do that rapidly, automated on the latest hardware. That's where the time comes in: GPUs and TPUs, different forms of clouds, faster, cheaper, and easier. That's the democratization aspect, but it's really targeted at data scientists, to prevent experiment clutter in science. Data science is a search for truth, but it's a lot of experiments to get to the truth, and if you can make the cost of experiments really simple and cheap, you prevent overfitting. That's a common problem in our science.
Prevent bias, accidental bias that you introduce because the data is biased, right; trying to prevent the common pitfalls in doing data science, like leakage, where usually your signal leaks. How do you prevent those common pieces? That's kind of where we're revolutionizing, coming at it. But if you put that in the box, what that really unlocks is imagination. The real hard problems in the world are still the same. >> AI for creative people, for instance. They want infrastructure. They don't want to have to be an expert. They want that value. That's the consumerization-- >> AI is really the co-founder for someone who's highly imaginative and has courage, right? And you don't have to look only for founders to find courage and imagination; there are a lot of intrapreneurs in large companies who are trying to bring change to their organization. >> You know, we always say that the intellectual property game is changing from, you know, I've got the protocol, this is locked and patented, to, you could have a workflow innovation: one little tweak of a process with data and powerful AI, that's the new magic IP equation. It's in the workforce, in the applications, new opportunities. Do you agree with that? >> Absolutely. The leapfrog from here is that businesses will come up with new business processes. We looked at business process optimization and globalization, and AI can help there. But AI, as you rightfully said earlier, is training computers, not just programming them. We're schooling machines that can now, with data, think almost at the same level as a Go player, right, the leading Go player. It can think at the same level as an expert in that space. And if that's happening, now AI can transform; my business can run 24 by 7 at the rate at which I can assemble machines and feed it data. Data creation, making new data, becomes the real value that AI can--
Part of their flagship problem product around recipes and democratization. Ay, ay, congratulations. Final point take a minute to explain for the folks just the product, how they buy it. What's it made of? What's the commitment? How did they engage with you >> guys? It's an annual license recruit. License this software license people condone load on our website, get a three week trial, try it on their own retrial. Pretrial recipes are open source, but 100 recipes built by then Masters have been made open source and they could be plugged and tried and taken. Customers, of course, don't have to make their software open source. They can take this, make it theirs. And our region here is to make every company in the eye company. And and that means that they have to embrace it. I learn it. Ticket. Participate some off. The leading conservation companies are giving it back so you can access in the open source. But the real vision here is to build that community off. A practitioners inside large formulations were here or teams air global. And we're here to support that transformation off some of the largest customers. >> So my problem of hiring an aye aye person You could help you solve that right today. Okay, So it was watching. Please get their stuff and come get a job opening here. That's the goal. But that's that's the dream. That is the dream. And we we want to be should one day. I have watched >> you over the last 10 years. You've been an entrepreneur. The fierce passion. We want the eye to be a partner so you can take your message to wider audience and build monetization or on the data you have created. Businesses are the largest after the big data warlords we have on data. Privacy is gonna come eventually. But I think I did. Businesses are the second largest owners of data. They just don't know how to monetize it. Unlock value from it. I will have >> Well, you know, we love day that we want to be data driven. We want to go faster. 
I love the driverless vision travel. Say I h 20 dot ay, ay here in the Cuban John for it. Breaking news here in Silicon Valley from that start of h 20 dot ay, ay, thanks for watching. Thank you.
Sri Satish Ambati, H2O.ai | CUBE Conversation, August 2019
(upbeat music) >> Woman Voiceover: From our studios in the heart of Silicon Valley, Palo Alto, California, this is a CUBE Conversation. >> Hello and welcome to this special CUBE Conversation here in Palo Alto, California, CUBE Studios, I'm John Furrier, host of theCUBE, here with Sri Ambati. He's the founder and CEO of H2O.ai. CUBE Alum, hot startup right in the action of all the machine learning, artificial intelligence, with democratization, the role of data in the future, it's all happening with Cloud 2.0, DevOps 2.0, Sri, great to see you. Thanks for coming by. You're a neighbor, you're right down the street from us at our studio here. >> It's exciting to be at theCUBE Com. >> That's KubeCon, that's Kubernetes Con. CUBEcon, coming soon, not to be confused with KubeCon. Great to see you. So tell us about the company, what's going on, you guys are smoking hot, congratulations. You got the right formula here with AI. Explain what's going on. >> It started about seven years ago, and .ai was just a new fad that arrived in Silicon Valley. And today we have thousands of companies in AI, and we're very excited to be partners in making more companies become AI-first. And our vision here is to democratize AI, and we've made it simple with our open source, made it easy for people to start adopting data science and machine learning in different functions inside their large organizations. And apply that for different use cases across financial services, insurance, health care. We leapfrogged in 2016 and built our first closed source product, Driverless AI; we made it on GPUs using the latest hardware and software innovations. Open source AI has funded the rise of automatic machine learning, which further reduces the need for extraordinary talent to do machine learning. No one has time today, and we're trying to bring that automatic machine learning to a very significant crunch time for AI, so people can consume AI better.
>> You know, this is one of the things that I love about the current state of the market right now, the entrepreneur market as well as startups and growing companies that are going to go public, is that there's a new breed of entrepreneurship going on around large scale, standing up infrastructure, shortening the time it takes to do something. Like provisioning. In the old AI, you had to be a PhD. And we're seeing this in data science, you don't have to be a Python coder. This democratization is not just a tagline, the reality is a business opportunity for whoever can provide the infrastructure and the systems for people to do it. It is an opportunity, you guys are doing that. This is a real dynamic. This is a new way, a new kind of dynamic in an industry. >> There are three real characteristics to the ability to adopt AI. One is that data is a team sport. Which means you've got to bring different dimensions within your organization to be able to take advantage of data and AI. And you've got to bring in your domain scientists, work closely with your data scientists, work closely with your data engineers, produce applications that can be deployed, and then get your design on top of it that can convince users or strategists to make the decisions the data is showing. So that takes a multi-dimensional workforce to work closely together. The real problem in adoption of AI today is not just technology, it's also culture. So we're kind of bringing those aspects together in formal products. One of our products, for example, is Explainable AI. It's helping the data scientists tell a story that businesses can understand. Why is the model deciding I need to take this test in this direction? Why is this model giving this particular nurse a high credit score even though she doesn't have a high school graduation? That kind of figuring out is how democratization goes all the way down. Why is the model deciding what it's deciding, and explaining and breaking that down into English.
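The explainability idea described above, breaking a model's decision down into plain-English reasons, can be sketched in a few lines. This toy is purely illustrative and is not H2O's implementation; the credit-scoring feature names and weights are invented for the example.

```python
# Toy sketch of "explainable AI": score with a simple linear model and
# report each feature's contribution in plain English. Illustrative only;
# real products use techniques like Shapley values on far richer models.

def explain_score(weights, bias, features):
    """Return the model score and plain-English, per-feature reasons."""
    contributions = {name: weights[name] * value
                     for name, value in features.items()}
    score = bias + sum(contributions.values())
    # Rank features by how strongly they pushed the score up or down.
    ranked = sorted(contributions.items(), key=lambda kv: -abs(kv[1]))
    reasons = [f"{name} {'raised' if c > 0 else 'lowered'} the score by {abs(c):.2f}"
               for name, c in ranked]
    return score, reasons

# Hypothetical credit-scoring model (names and weights invented).
weights = {"income": 0.004, "debt_ratio": -2.5, "years_employed": 0.3}
score, reasons = explain_score(
    weights, bias=1.0,
    features={"income": 500, "debt_ratio": 0.4, "years_employed": 5})
print(round(score, 2))  # 3.5
for r in reasons:
    print(r)
```

The point is the narrative output: a business user sees "income raised the score by 2.00" rather than an opaque number.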
And building trust is a huge aspect in AI right now. >> Well I want to get to the talent, and the time, and the trust equation in the next talk, but I want to get the hard news out there. You guys have some news, Driverless AI is one of your core things. Explain the news, what's the big news? >> The big news has been that... AI's a moneyball for business, right? And moneyball, as it has played out, has meant the experts were left off the field and algorithms took over. And there is no participation between experts, the domain scientists, and the data scientists. And what we're bringing with the new product in Driverless AI is an ability for companies to take our AI and become AI companies themselves. The real AI race is not between the Googles and the Amazons and the Microsofts and other AI companies, AI software companies. The real AI race is in the verticals, and how a company which is a bank, or an insurance giant, or a healthcare company can take AI platforms, monetize its data, and become an AI company itself. >> Yeah, that's a really profound statement I would agree 100% with. I think we saw that early on in the big data world around Hadoop; Hadoop kind of fell by the wayside, but Dave Vellante and the WikiBon team have observed, and they actually predicted, that the most value was going to come from practitioners, not the vendors. 'Cause they're the ones who have the data. And you mentioned verticals, this is another interesting point I want to get more explanation from you on, is that apps are driven by data. Data needs domain-specific information. So you can't just say "I have data, therefore magic happens," it's really at the edge of the domain speak or the domain features of the application. This is where the data is, so this kind of supports your idea that AI's about the companies that are using it, not the suppliers of the technology.
>> Our vision has always been how we make our customers satisfied. We focus on the customer, and through that we actually make the customer one of the product managers inside the company. And the doors that opened from working very closely with some of our leading customers showed that we need to get them to participate, and take AI, algorithms, and platforms that can automatically tune the algorithms, and have the right hyperparameter optimizations, the right features. And augment the right data sets that they have. There's a whole data lake around there, around data architecture today. Which data sets am I not using in the current problem I'm solving, that's a real problem I'm looking at. The combination of these various pieces has been automated in Driverless AI. And the new version that we're now bringing to market is able to allow them to create their own recipes, bring their own transformers, and make an automatic fit for their particular race. So if you think about this as we built all the components of a race car, you're going to take it and apply it for that particular race to win. >> John: So that's where the word driverless comes in. It's driverless in the sense of you don't really need a full operator, it kind of operates on its own. >> In some sense it's driverless. It's taking the data scientists and giving them a power tool. Historically, before automatic machine learning, which driverless sits under the umbrella of, machine learning experts would fine-tune, learning the nuances of the data and the problem at hand, what they're optimizing for, and the right tweaks in the algorithm. So they have to understand how deep the trees are going to be, how many layers of deep learning they need, what variation of deep learning they should put in, and in natural language processing, what context they need. Long short-term memory, all these pieces they have to learn themselves.
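The hand-tuning described here (tree depth, network layers, what to optimize for) is exactly what automatic machine learning searches over on its own. A minimal sketch of that idea, using a toy 1-D nearest-neighbour model; this assumes nothing about Driverless AI's real search, which is far more sophisticated:

```python
import random

# Toy sketch of automatic machine learning: rather than a grandmaster
# hand-tuning a hyperparameter, the system tries candidate settings
# itself and keeps whichever scores best on held-out data.

def knn_predict(train, k, x):
    """1-D k-nearest-neighbour regression: average the k closest ys."""
    nearest = sorted(train, key=lambda pt: abs(pt[0] - x))[:k]
    return sum(y for _, y in nearest) / k

def holdout_error(train, valid, k):
    """Mean squared error of k-NN predictions on the validation set."""
    return sum((knn_predict(train, k, x) - y) ** 2 for x, y in valid) / len(valid)

random.seed(0)
data = [(x / 10, (x / 10) ** 2) for x in range(40)]  # toy curve y = x^2
random.shuffle(data)
train, valid = data[:30], data[30:]

# The "automatic" part: search candidate k values, keep the best one.
best_k = min(range(1, 11), key=lambda k: holdout_error(train, valid, k))
print("best k:", best_k)
```

Real AutoML searches jointly over algorithms, feature transformations, and many hyperparameters at once, but the loop is the same shape: propose, evaluate on held-out data, keep the winner.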
And there were only a few grandmasters or big data scientists in the world who could come up with the right answer for different problems. >> So you're spreading the love of AI around. >> Simplifying that. >> You get the big brains to work on it, and democratization means people can participate and the machines also can learn. Both humans and machines. >> Between our open source and the very maker-centric culture, we've been able to attract some of the world's top data scientists, physicists, and compiler engineers, to bring it in a form factor that businesses can use. One data scientist in a company like Franklin Templeton can operate at the level of ten or hundreds of them, and then bring the best in data science in a form factor that they can plug in and play. >> I was having a conversation with Kent Libby, who works with me on our platform team. We have all this data with theCUBE, and we were just talking, we need to hire a data scientist and an AI specialist. And you go out and look around, you've got Google, Amazon, all these big players spending between $3-4 million per machine learning engineer. And that might be someone under the age of 30 with no experience. So the talent war is huge. The cost to just hire, we can't hire these people. >> It's a global war. There's a talent shortage in China, there's a talent shortage in India, there's a talent shortage in Europe, and we have offices in Europe and India. There's a talent shortage in Toronto and Ottawa. So it's a global shortage of physicists and mathematicians and data scientists. So that's where our tools can help. And we see Driverless AI as, you can drive to New York or you can fly to New York. >> I was talking to my son the other day, he's taking computer science classes in night school. And it's like, well you know, machine learning in AI is kind of like dog training. You have dog training, you train the dog to do some tricks, it does some tricks. Well, if you're a coder you want to train the machine.
This is the machine training. This is data science; that's where the AI possibility is. Machines have to be taught something. There's a base input, machines just aren't self-learning on their own. So as you look at the science of AI, this becomes the question on the talent gap. Can the talent gap be closed by machines? And you've got the time, you want speed, low latency, and trust. All these things are hard to do. Balancing all three is extremely difficult. What are your thoughts on those three variables? >> So that's why we brought AI to help with AI. Driverless AI is a concept of bringing AI to simplify. It's an expert system to do AI better. So you can actually put it in the hands of new data scientists, so they can perform at the power of an advanced data scientist. We're not disempowering the data scientist, the product's still for a data scientist. When you start with a confusion matrix, false positives, false negatives, that's something a data scientist can understand. When you talk about feature engineering, that's something a data scientist can understand. And what Driverless AI is really doing is helping them do that rapidly, automated on the latest hardware; that's where the time comes in. GPUs, FPGAs, TPUs, different forms of clouds. Cheaper, right. So faster, cheaper, easier, that's the democratization aspect. But it's really targeted at the data scientist to prevent experimental error. In science, the data science is a search for truth, but it's a lot of experiments to get to truth. If you can make the cost of experiments really simple, cheaper, and prevent overfitting, that's a common problem in data science. Prevent bias, accidental bias that you introduce because the data is biased, right. So trying to prevent the flaws in doing data science. Leakage, usually your signal leaks, and how do you prevent those common pieces. That's where Driverless AI is coming at it. But if you put that in a box, what that really unlocks is imagination.
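The confusion-matrix vocabulary mentioned here, false positives and false negatives, is small enough to show concretely. The labels below are invented for illustration:

```python
# Minimal sketch of confusion-matrix bookkeeping: count true/false
# positives and negatives, then derive precision and recall from them.

def confusion(actual, predicted):
    tp = sum(a == 1 and p == 1 for a, p in zip(actual, predicted))
    fp = sum(a == 0 and p == 1 for a, p in zip(actual, predicted))
    fn = sum(a == 1 and p == 0 for a, p in zip(actual, predicted))
    tn = sum(a == 0 and p == 0 for a, p in zip(actual, predicted))
    return tp, fp, fn, tn

actual    = [1, 1, 1, 0, 0, 0, 1, 0]   # ground truth (invented)
predicted = [1, 0, 1, 0, 1, 0, 1, 0]   # model output (invented)
tp, fp, fn, tn = confusion(actual, predicted)
precision = tp / (tp + fp)   # of what we flagged, how much was right
recall    = tp / (tp + fn)   # of what was really there, how much we caught
print(tp, fp, fn, tn)        # 3 1 1 3
print(precision, recall)     # 0.75 0.75
```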
The real hard problems in the world are still the same. >> AI for creative people, for instance. They want infrastructure, they don't want to have to be an expert. They want that value. That's the consumerization. >> AI is really the co-founder for someone who's highly imaginative and has courage, right. And you don't have to look only to founders for courage and imagination. There are a lot of entrepreneurs in large companies who are trying to bring change to their organizations. >> Yeah, we always say, the intellectual property game is changing from protocols, locked in, patented, to where you could have a workflow innovation. Change one little tweak of a process with data and powerful AI, that's the new magic IP equation. It's in the workflow, it's in the application, it's new opportunities. Do you agree with that? >> Absolutely. The leapfrog from here is businesses will come up with new business processes. So we looked at business process optimization, and globalization's going to help there. But AI, as you rightfully said earlier, is training computers. Not just programming them, you're schooling them. A host of computers that can now, with data, think almost at the same level as a Go player. The world's leading Go player. They can think at the same level as an expert in that space. And if that's happening, now I can transform. My business can run 24 by 7 at the rate at which I can assemble machines and feed them data. Data creation becomes, making new data becomes, the real value that AI can- >> H2O.ai announcing Driverless AI, part of their flagship product around recipes and democratizing AI. Congratulations. Final point, take a minute to explain to the folks just the product, how they buy it, what's it made of, what's the commitment, how do they engage with you guys? >> It's an annual license, a software license people can download on our website. Get a three-week trial, try it on their own. >> Free trial? >> A free trial, our recipes are open-source.
About a hundred recipes, built by grandmasters, have been made open source. And they can be plugged in and tried. Customers of course don't have to make their software open source. They can take this, make it theirs. And our vision here is to make every company an AI company. And that means that they have to embrace AI, learn it, tweak it, participate; some of the leading customers are giving it back in the open source. But the real vision here is to build that community of AI practitioners inside large organizations. We are here, our teams are global, and we're here to support that transformation of some large customers. >> So my problem of hiring an AI person, you could help me solve that. >> Right today. >> Okay, so anyone who's watching, please get their stuff and come get an opening here. That's the goal. But that is the dream, we want AI in our system. >> I have watched you the last ten years, you've been an entrepreneur with a fierce passion, you want AI to be a partner so you can take your message to a wider audience and build monetization around the data you have created. Businesses are the largest, after the big data warlords we have, and data privacy's going to come eventually, but I think businesses are the second largest owners of data; they just don't know how to monetize it, unlock value from it, and AI will help. >> Well you know we love data, we want to be data-driven, we want to go faster. Love the driverless vision, Driverless AI, H2O.ai. Here in theCUBE I'm John Furrier with breaking news here in Silicon Valley from hot startup H2O.ai. Thanks for watching.
Breaking Analysis: AI Goes Mainstream But ROI Remains Elusive
>> From theCUBE Studios in Palo Alto and Boston, bringing you data-driven insights from theCUBE and ETR, this is "Breaking Analysis" with Dave Vellante. >> A decade of big data investments combined with cloud scale, the rise of much more cost-effective processing power, and the introduction of advanced tooling has catapulted machine intelligence to the forefront of technology investments. No matter what job you have, your operation will be AI powered within five years and machines may actually even be doing your job. Artificial intelligence is being infused into applications, infrastructure, equipment, and virtually every aspect of our lives. AI is proving to be extremely helpful at things like controlling vehicles, speeding up medical diagnoses, processing language, advancing science, and generally raising the stakes on what it means to apply technology for business advantage. But business value realization has been a challenge for most organizations due to lack of skills, complexity of programming models, immature technology integration, sizable upfront investments, ethical concerns, and lack of business alignment. Mastering AI technology will not be a requirement for success in our view. However, figuring out how and where to apply AI to your business will be crucial. That means understanding the business case, picking the right technology partner, experimenting in bite-sized chunks, and quickly identifying winners to double down on from an investment standpoint. Hello and welcome to this week's Wikibon CUBE Insights powered by ETR. In this breaking analysis, we update you on the state of AI and what it means for the competition. And to do so, we invite into our studios Andy Thurai of Constellation Research. Andy covers AI deeply. He knows the players, he knows the pitfalls of AI investment, and he's a collaborator. Andy, great to have you on the program. Thanks for coming into our CUBE studios. >> Thanks for having me on. >> You're very welcome.
Okay, let's set the table with a premise and a series of assertions we want to test with Andy. I'm going to lay 'em out. And then Andy, I'd love for you to comment. So, first of all, according to McKinsey, AI adoption has more than doubled since 2017, but only 10% of organizations report seeing significant ROI. That's a BCG and MIT study. And part of that challenge of AI is it requires data, it requires good data, data proficiency, which is not trivial, as you know. Firms that can master both data and AI, we believe, are going to have a competitive advantage this decade. Hyperscalers, as we'll show you, dominate AI and ML. We'll show you some data on that. And having said that, there's plenty of room for specialists. They need to partner with the cloud vendors for go-to-market productivity. And finally, organizations increasingly have to put data and AI at the center of their enterprises. And to do that, most are going to rely on vendor R&D to leverage AI and ML. In other words, Andy, they're going to buy it and apply it as opposed to build it. What are your thoughts on that setup and that premise? >> Yeah, I see that a lot happening in the field, right? So first of all, the only 10% realizing a return on investment, that's so true because, we talked about this earlier, most companies are still in the innovation cycle. So they're trying to innovate and see what they can do to apply. A lot of these times, when you look at the solutions, what they come up with or the models they create, the experimentation they do, most times they don't even have a good business case to solve, right? So they just experiment and then they figure it out, "Oh my God, this model is working. Can we do something to solve it?" So it's like you found a hammer and then you're trying to find the needle kind of thing, right? That never works. >> 'Cause it's cool or whatever it is. >> It is, right?
So that's why I always advise, when they come to me and ask me things like, "Hey, what's the right way to do it? What is the secret sauce?" And, we talked about this. The first thing I tell them is, "Find out what is the business case that's having the most amount of problems, that can be solved using some of the AI use cases," right? Not all of them can be solved. Even after you experiment, do the whole nine yards, spend millions of dollars on that, right? And later on you make it efficient only by saving maybe $50,000 for the company or $100,000 for the company, is it really even worth the experiment, right? So you've got to start by asking, you know, where's the base for this happening? Where's the need? What's a business use case? It doesn't have to be about cost efficiency and saving money in the existing processes. It could be a new thing. You want to bring in a new revenue stream, but figure out what the business use case is, how much money potentially I can make off of that. The same way that start-ups go after it. Right? >> Yeah. Pretty straightforward. All right, let's take a look at where ML and AI fit relative to the other hot sectors of the ETR dataset. This XY graph shows net score, or spending velocity, on the vertical axis and presence in the survey, they call it sector pervasion, for the October survey; the January survey's in the field. Then that squiggly line on ML/AI represents the progression. Since the January 21 survey, you can see the downward trajectory. And we position ML and AI relative to the other big hot sectors; with ML/AI that makes four: containers, cloud, and RPA. These have consistently performed above that magic 40% red dotted line for most of the past two years. Anything above 40%, we think is highly elevated. And we've just included analytics and big data for context and relevant adjacency, if you will. Now note that green arrow moving toward, you know, the 40% mark on ML/AI.
I got a glimpse of the January survey, which is in the field. It's got more than a thousand responses already, and it's trending up for the current survey. So Andy, what do you make of this downward trajectory over the past seven quarters and the presumed uptick in the coming months? >> So one of the things you have to keep in mind is when the pandemic happened, it's about survival mode, right? So when somebody's in survival mode, what happens is the luxuries and the innovations get cut. That's what happens. And this is exactly what happened in this situation. So as you can see in the last seven quarters, which is almost dating back close to the pandemic, everybody was trying to keep their operations alive, especially digital operations. How do I keep the lights on? That's the most important thing for them. So while the number spent on AI/ML is less overall, I still think the AI/ML spend on things like employee experience or IT ops, AIOps, MLOps, as we talked about, some of those areas actually went up. There are companies, we talked about it, Atlassian had a lot of platform issues, still the amount of money people are spending on that is exorbitant, simply because they are offering a solution that was not available any other way. So there are companies out there, you can take AIOps or incident management for that matter, right? A lot of companies have digital infrastructure, they don't know how to properly manage it. How do you find an incident and solve it immediately? That's all using AI/ML, and some of those areas are actually growing unbelievably, the companies in that area. >> So this is a really good point. If you can bring up that chart again, what Andy's saying is a lot of the companies in the ETR taxonomy that are doing things with AI might not necessarily show up in a granular fashion. And I think the other point I would make is, these are still highly elevated numbers. If you put up, like, storage and servers, they would read way, way down the list.
And look, in the pandemic, we had to deal with work from home, we had to re-architect the network, we had to worry about security. So those are really good points that you made there. Let's unpack this a little bit and look at the ML/AI sector and the ETR data, and specifically at the players, and get Andy to comment on this. This chart here shows the same XY dimensions, and it just notes some of the players that specifically have services and products that people spend money on, that CIOs and IT buyers can comment on. So the table insert shows how the companies are plotted, it's net score, and then the Ns in the survey. And Andy, the hyperscalers are dominant, as you can see. You see Databricks there showing strong as a specialist, and then you got a pack of six or seven in there. And then Oracle and IBM, kind of the big whales of yesteryear, are in the mix. And to your point, companies like Salesforce that you mentioned to me offline aren't in that mix, but they do a lot in AI. But what are your takeaways from that data? >> If you could put the slide back on please. I want to make quick comments on a couple of those. So the first one is, it's surprising, the other hyperscalers, right? As you and I talked about this earlier, AWS is more about Lego blocks. We discussed that, right? >> Like what? Like a SageMaker as an example. >> We'll give you all the components, whatever you need. Whether it's an MLOps component, or whether it's CodeWhisperer that we talked about, or a whole platform, or data, whatever you want. They'll give you the blocks and then you'll build things on top of it, right? But Google took a different way. Matter of fact, if we did those numbers a few years ago, Google would've been number one, because they did a lot of work with their acquisition of DeepMind and other things. They're way ahead of the pack when it comes to AI, for the longest time.
Now, I think Microsoft's move of partnering and taking a huge competitor out would open the eyes is unbelievable. You saw that everybody is talking about chat GPI, right? And the open AI tool and ChatGPT rather. Remember as Warren Buffet is saying that, when my laundry lady comes and talk to me about stock market, it's heated up. So that's how it's heated up. Everybody's using ChatGPT. What that means is at the end of the day is they're creating, it's still in beta, keep in mind. It's not fully... >> Can you play with it a little bit? >> I have a little bit. >> I have, but it's good and it's not good. You know what I mean? >> Look, so at the end of the day, you take the massive text of all the available text in the world today, mass them all together. And then you ask a question, it's going to basically search through that and figure it out and answer that back. Yes, it's good. But again, as we discussed, if there's no business use case of what problem you're going to solve. This is building hype. But then eventually they'll figure out, for example, all your chats, online chats, could be aided by your AI chat bots, which is already there, which is not there at that level. This could build help that, right? Or the other thing we talked about is one of the areas where I'm more concerned about is that it is able to produce equal enough original text at the level that humans can produce, for example, ChatGPT or the equal enough, the large language transformer can help you write stories as of Shakespeare wrote it. Pretty close to it. It'll learn from that. So when it comes down to it, talk about creating messages, articles, blogs, especially during political seasons, not necessarily just in US, but anywhere for that matter. If people are able to produce at the emission speed and throw it at the consumers and confuse them, the elections can be won, the governments can be toppled. 
>> Because to your point about chatbots, chatbots have obviously reduced the number of bodies that you need to support chat. But they haven't solved the problem of serving consumers. Most of the chatbots are conditioned responses: which of the following best describes your problem? >> The current chatbots. >> Yeah. Hey, did we solve your problem? No, is the answer. So that has some real potential. But if you could bring up that slide again, Ken. I mean, you've got the hyperscalers that are dominant. You talked about Google, and Microsoft is ubiquitous; they seem to be dominant in every ETR category. But then you have these other specialists. How do those guys compete? And maybe you could even cite some of the guys that you know. How do they compete with the hyperscalers? What's the key there for, like, a C3.ai or some of the others that are on there? >> So I've spoken with at least two of the CEOs of the smaller companies that you have on the list. One of the things they're worried about is that if they continue to operate independently without being part of a hyperscaler, either the hyperscalers will develop something to compete against them full scale, or they'll become irrelevant. Because at the end of the day, look, cloud is dominant. Not many companies are going to do AI modeling and training and deployment, the whole nine yards, independently by themselves. They're going to depend on one of the clouds, right? So if the customers are already going to be in the cloud, taking them out of it to come to you is going to be an extremely difficult problem to solve. So all these companies are going and saying, "You know what? We need to be in the hyperscalers." For example, look at DataRobot recently: they made announcements with Google and AWS, and they are all over the place. So you need to go where the customers are, right? >> All right, before we go on, I want to share some other data from ETR on why people adopt AI and get your feedback.
So the data historically shows that feature breadth and technical capabilities were the main decision points for AI adoption. That says to me there's too much focus on technology. In your view, is that changing? Does it have to change? Will it change? >> Yes. The simple answer is yes. So here's the thing. The data you're speaking from is from previous years. >> Yes. >> I can guarantee you, if you look at the latest data that's coming in now, those two will be secondary and tertiary points. Number one would be ROI, and how do I achieve it? I've spent a ton of money on all of my experiments. This is the same theme I'm seeing when talking to everybody who's spending money on AI. I've spent so much money on it. When can I get it live in production? How quickly can I get it? Because, you know, the board is breathing down their neck. You already spent this much money. Show me something that's valuable. So ROI, take it from me, I'm predicting this for 2023, is going to become number one. >> Yeah, and if people focus on it, they'll figure it out. Okay. Let's take a look at some of the top players, some of the names we just looked at, double-click on that, and break down their spending profile. So the chart here shows the net score and how net score is calculated. So pay attention to the second set of bars, that's Databricks, who was pretty prominent on the previous chart. And we've annotated the colors. The lime green is, we're bringing the platform in new. The forest green is, we're going to spend 6% or more relative to last year. The gray is flat spending. The pinkish is, our spending on AI and ML is going to be down 6% or worse. And the red is churn. So you don't want big red. You subtract the reds from the greens and you get net score, which is shown by those blue dots that you see there. So AWS has the highest net score and very little churn. I mean, low single-digit churn.
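The net score arithmetic Dave walks through can be sketched in a few lines. The percentages below are illustrative, not actual ETR survey figures:

```python
def net_score(adopting_new, spending_more, flat, spending_less, churning):
    """ETR-style net score: subtract the reds (spend down 6%+ and churn)
    from the greens (net-new adoption and spend up 6%+).
    Inputs are percentages of survey respondents for one vendor;
    flat spenders are excluded from the calculation."""
    return (adopting_new + spending_more) - (spending_less + churning)

# Illustrative respondent mix: 20% bringing the platform in new,
# 40% spending 6%+ more, 30% flat, 8% spending less, 2% churning.
score = net_score(20, 40, 30, 8, 2)
print(score)  # 50
```

Note the greens and reds sum with the flat bucket to 100%; a vendor with "big red" sees the subtraction drag its blue dot down fast.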
But notably, you see Databricks and DataRobot are next in line, and Microsoft and Google also have very low churn. Andy, what are your thoughts on this data? >> So a couple of things stand out to me. Most of them are in line with my conversations with customers. One that stood out to me is how badly IBM Watson is doing. >> Yeah, bring that back up if you would. Let's take a look at that. IBM Watson is on the far right, and the red, that bright red, is churn, and again, you want low red here. Why do you think that is? >> Well, look, IBM has been at the forefront of innovating things for many, many years now, right? And over the course of years, as we talked about, they moved from a product-innovation-centric company into more of a services company. At one point, you know, they were making the majority of their money from services. Now things have changed. Arvind has taken over, he came from research, so he's doing a great job of trying to reinvent them as a company. But they have a long way to go to catch up. IBM Watson, if you think about it, played what, Jeopardy and chess, years ago, like 15 years ago? >> It was jaw-dropping when you first saw it. And then they weren't able to commercialize that. >> Yeah. >> And you're making a good point. When Gerstner took over IBM at the time, John Akers wanted to split the company up. He wanted to have a database company, he wanted to have a storage company, because that's where the industry trend was. Gerstner said no. He came from AMEX, right? He came from American Express. He said, "No, we're going to have a single throat to choke for the customer." They bought PWC for relatively short money, I think it was $15 billion, completely transformed and, I would argue, saved IBM. But the trade-off was, it sort of took them out of product leadership. And so from Gerstner to Palmisano to Rometty, it was really a services-led company.
And I think Arvind is really bringing it back to a product company with strong consulting. I mean, that's one of the pillars. And so I think they've got a strong story in data and AI. They've just got to sort of bring it together better. Bring that chart up one more time. The other point I want to make is Oracle. Oracle sort of has the dominant lock-in for mission-critical database, and they're sort of applying AI there. But to your point, they're really not an AI company in the sense of taking unstructured data and doing new things with it. It's really about how to make Oracle better, right? >> Well, you've got to remember, Oracle is about the database for structured data. So in yesterday's world, they were the dominant database. But, you know, if you start storing videos and texts and audio and other things, and then start doing vector search and all that, Oracle is not necessarily the database company of choice. And their strongest thing being apps and building AI into the apps? They are kind of surviving in that area. But again, I wouldn't name them as an AI company, right? But the other thing that surprised me in that list you showed me is, yes, AWS is number one. >> Bring that back up if you would, Ken. >> AWS is number one, as it should be. But what actually caught me by surprise is how DataRobot is holding up, you know? I mean, look at that. On either net-new additions or expansion, DataRobot seems to be doing equally well, even better than Microsoft and Google. That surprises me. >> DataRobot's, and again, this is a function of spending momentum. So remember from the previous chart that Microsoft and Google are much, much larger than DataRobot. DataRobot is more niche, but with spending velocity, and it has always had strong spending velocity, despite some of the recent organizational challenges. And then you see these other specialists: H2O.ai, Anaconda, Dataiku, and a little bit of red showing there for C3.ai.
But these, again, to stress, are the specialists, other than obviously the hyperscalers. These are the specialists in AI. All right, so we hit the bigger names in the sector. Now let's take a look at the emerging technology companies. And one of the gems of the ETR dataset is the Emerging Technology Survey. It's called ETS. They used to run it just twice a year; it's now run four times a year. I just discovered it around mid-2022. And it's exclusively focused on private companies that are potential disruptors. They might be M&A candidates, and if they've raised enough money, they could be acquirers of companies as well. Databricks would be an example; they've made a number of investments in companies. Snyk would be another good example. Companies that are private, but they're buyers, and they hope to go IPO at some point in time. So this chart here shows the emerging companies in the ML/AI sector of the ETR dataset. The dimensions of this are similar: net sentiment on the Y axis and mindshare on the X axis. Basically, the ETS study measures awareness on the X axis, and intent to do something with, evaluate, or implement, or not, on that vertical axis. So it's like net score on the vertical, where negatives are subtracted from the positives. And again, mindshare is vendor awareness; that's the horizontal axis. Now, that inserted table shows net sentiment and the Ns in the survey, which inform the position of the dots. And you'll notice we're plotting TensorFlow as well. We know that's not a company, but it's there for reference, as open source tooling is an option for customers, and ETR sometimes likes to show that as a reference point. Now, we've also drawn a line for Databricks to show how relatively dominant they've become in mindshare over the past 10 ETS surveys, going back to late 2018. And you can see a dozen or so other emerging tech vendors.
So Andy, I want you to share your thoughts on these players. Who are the ones to watch? Name some names. We'll bring that data back up as you comment. >> So Databricks, as you said, remember we talked about how Oracle is not necessarily the database of choice, you know? So Databricks is kind of trying to solve some of those issues for AI/ML workloads, right? And the problem is also that there is no one company that could solve all of the problems. For example, if you look at the names in here, some of them are database names, some of them are platform names, some of them are MLOps companies like DataRobot (indistinct) and others. And some of them are feature-store companies like, you know, Tecton and such. >> So it's a mix of those sub-sectors? >> It's a mix of those companies. >> We'll talk to ETR about that. They'd be interested in your input on how to make this more granular with these sub-sectors. You've got Hugging Face in here. >> Which is NLP, yeah. >> Okay. So your take, are these companies going to get acquired? Are they going to go IPO? Are they going to merge? >> Well, most of them are going to get acquired. My prediction would be most of them will get acquired, because look, at the end of the day, the hyperscalers need these capabilities, right? So they're going to either create their own, and AWS is very good at doing that, they have done a lot of those things. But the other ones, particularly Azure, are going to look at it and say, "You know what, it's going to take time for me to build this. Why don't I just go and buy you?" Right? Or even the smaller players like Oracle or IBM Cloud, where this need will exist. They might even take a look at them, right? So at the end of the day, a lot of these companies are going to get acquired or merged with others. >> Yeah. All right, let's wrap with some final thoughts. I'm going to make some comments, Andy, and then ask you to dig in here.
Look, despite the challenge of leveraging AI, you know, Ken, if you could bring up the next chart, we're not predicting a repeat of the AI winter of the 1990s. Machine intelligence is a superpower that's going to permeate every aspect of the technology industry. AI and data strategies have to be connected. Leveraging first-party data is going to increase AI competitiveness and shorten time to value. Andy, I'd love your thoughts on that. I know you've got some thoughts on governance and AI ethics. You know, we talked about ChatGPT, deepfakes. Help us unpack all these trends. >> So there's so much information packed in there, right? The AI and data strategy, that's very, very, very important. If you don't have proper data, people don't realize, your AI models are predominantly based on the data that you have. AI cannot predict something that's going to happen without knowing what it is. It needs to be trained, it needs to understand what it is you're talking about. So 99% of the time you've got to have good data to train on. So this is where I mentioned to you, the problem is a lot of these companies can't afford to collect the real-world data because it takes too long, it's too expensive. So a lot of these companies are trying to go the synthetic data way. It has its own set of issues, because you can't use all... >> What's synthetic data? Explain that. >> Synthetic data is basically not real-world data, but created or simulated data based on real data. It looks, feels, smells, tastes like real data, but it's not exactly real data, right? This is particularly useful in the financial and healthcare industries. Because at the end of the day, if you have real data about your and my medical history, even if you redact it, you can still reverse it. It's fairly easy, right? >> Yeah, yeah.
>> So by creating synthetic data, there is no correlation between the real data and the synthetic data. >> So that's part of AI ethics and privacy and, okay. >> So the issue with synthetic data is that you can't create models based just on synthetic data, because synthetic data, as I said, is artificial data. So basically you'd be creating artificial models. You've got to blend it in properly, and that blend is the problem. You've got to judge how much real data and how much synthetic data you can use, weighing efficiency, cost, and time. So that's one-- >> And risk. >> And the risk involved with that. And the secondary issue, which we talked about, is that when you take a business use case, you think about investing in it, you build the whole thing out and you're trying to put it out into the market. Most companies that I talk to don't have proper governance in place. They don't have ethics standards in place. They don't worry about the biases in data. They just go on trying to solve a business case. >> It's the wild west. >> Because that's how they start. It's a wild west! And then at the end of the day, when they are close to some legal or litigation action, or something else happens, that's when the "oh shit!" moment happens, right? And then they come in and say, "You know what, how do I fix this?" The governance, security and all of those things, ethics, data bias, de-biasing, none of them can be an afterthought. It's got to start from the get-go. So you've got to start at the beginning saying, "You know what, I'm going to do all of those AI programs, but before we get into this, we've got to set some framework for doing all these things properly," right? And then the-- >> Yeah. So let's go back to the key points. I want to bring up the cloud again, because you've got to get cloud right.
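A minimal sketch of the synthetic data idea Andy describes, assuming a purely per-column Gaussian fit. Real synthetic-data tools model joint distributions and add formal privacy guarantees; this toy version only preserves each column's mean and spread, with no row-level link back to any real record:

```python
import random
import statistics

def synthesize(real_rows, n):
    """Generate n synthetic rows by sampling each column from a
    Gaussian fitted to the real data. Marginal statistics are
    roughly preserved, but no synthetic row corresponds to a
    real individual."""
    cols = list(zip(*real_rows))  # column-wise view of the data
    params = [(statistics.mean(c), statistics.stdev(c)) for c in cols]
    return [[random.gauss(mu, sigma) for mu, sigma in params]
            for _ in range(n)]

random.seed(0)
# Hypothetical numeric records, e.g. (age, weight) -- not real data.
real = [[52.0, 120.5], [61.0, 135.0], [47.0, 110.2], [58.0, 128.9]]
fake = synthesize(real, 3)
```

The blending concern in the dialogue shows up here too: a model trained only on `fake` learns the fitted distribution, not the real-world signal, which is why practitioners mix real and synthetic rows with judgment.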
Getting that right matters in AI, to the points that you were making earlier. You can't just be out on an island, and the hyperscalers are obviously going to continue to do well. More and more data is going into the cloud, and they have the native tools. To your point, in the case of AWS, Microsoft's obviously ubiquitous, Google's got great capabilities here. They've got integrated ecosystems and partners that are going to continue to strengthen through the decade. What are your thoughts here? >> So a couple of things. One is the last-mile ML, or last-mile AI, that nobody's talking about. That needs to be attended to. There are a lot of players coming up in the market. When I talk about the last mile, I'm talking about, after you're done with the experimentation of the model, how fast, quickly, and efficiently can you get it to production? So that's production being-- >> Compressing that time is going to put dollars in your pocket. >> Exactly, right. >> So once, >> If you get it right. >> If you get it right, of course. So there are a couple of issues with that. Once you figure out that the model is working, that's perfect. People don't realize, the moment the decision is made, it's like a new car. After you purchase it, the value decreases by the minute. Same thing with the models. Once the model is created, you need to be in production right away, because it starts losing its value by the second and by the minute. So issue number one: how fast can I get it over there? So your deployment, inferencing efficiently at the edge locations, your optimization, your security, all of this is at issue. But you know what is more important than that in the last mile? You have to keep the model up, you have to continue to work on it. Again, going back to the car analogy, at one point you figure out your car is costing more to operate than it's worth, so you've got to get a new car, right? And it's the same thing with the models as well.
If your model has reached a stage where it is actually a potential risk for your operation. To give you an idea, say Uber has a model. The first time, a car going from point A to B costs you $60. If the model has decayed and the next time it gives me a $40 rate, I would take it, definitely, but it's a loss for the company. The business risk associated with operating on a bad model, you should realize it immediately, pull the model out, retrain it, redeploy it. That is key. >> And that's got to be huge in security. Model recency, and security to the extent that you can get real time, is big. I mean, you see Palo Alto, CrowdStrike, a lot of other security companies injecting AI. Again, they won't show up in the ETR ML/AI taxonomy per se as pure plays. ServiceNow is another company that you have mentioned to me offline. AI is just getting embedded everywhere. >> Yep. >> And then I'm glad you brought up real-time inferencing, 'cause a lot of the AI today is modeling done in the cloud. The last point we wanted to make here, and I'd love to get your thoughts on this, is that real-time AI inferencing, for instance at the edge, is going to become increasingly important. It's going to usher in new economics and new types of silicon, particularly Arm-based. We've covered that a lot on "Breaking Analysis": new tooling, new companies, and that could disrupt the cloud model if new economics emerge, 'cause cloud is obviously very centralized and they're trying to decentralize it. But over the course of this decade we could see some real disruption there. Andy, give us your final thoughts on that. >> Yes and no. I mean, at the end of the day, cloud is kind of centralized now, but a lot of these companies, including AWS, are trying to decentralize that by putting in their own sub-centers and edge locations. >> Local zones, outposts. >> Yeah, exactly.
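Andy's "pull the model out when it decays" point can be sketched as a simple monitoring check. The metric, tolerance, and numbers here are hypothetical; real drift monitoring compares live performance or input distributions against a deployment-time baseline:

```python
def should_retire(baseline_metric, live_metric, tolerance=0.05):
    """Flag a deployed model for retraining when its live metric
    has decayed more than `tolerance` (relative) below the baseline
    measured at deployment time."""
    return live_metric < baseline_metric * (1 - tolerance)

# Hypothetical: model shipped at 0.91 AUC, live monitoring now reads 0.84.
print(should_retire(0.91, 0.84))  # True -> pull it, retrain, redeploy
print(should_retire(0.91, 0.90))  # False -> still within tolerance
```

The point of the car analogy survives in the threshold: once the cost of running a decayed model (the $40-instead-of-$60 rides) exceeds the cost of retraining, you replace it.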
Particularly the Outposts concept. And even if it becomes like a micro-center and such, it won't go down to the level of a single IoT device, but the cloud extends itself to that level. So if there is an opportunity or need for it, the hyperscalers will figure out a way to fit that model. So I wouldn't worry too much about deployment, where to have it, and what to do with that. But, you know, figure out the right business use case, get the right data, get the ethics and governance in place, make sure you get it to production, and make sure you pull the model out when it's not operating well. >> Excellent advice. Andy, I've got to thank you for coming into the studio today and helping us with this "Breaking Analysis" segment. Outstanding collaboration and insights and input in today's episode. Hope we can do more. >> Thank you. Thanks for having me. I appreciate it. >> You're very welcome. All right, I want to thank Alex Myerson, who's on production and manages the podcast. Ken Schiffman as well. Kristen Martin and Cheryl Knight help get the word out on social media and our newsletters. And Rob Hof is our editor-in-chief over at SiliconANGLE. He does some great editing for us. Thank you all. Remember, all these episodes are available as podcasts. Wherever you listen, all you've got to do is search "Breaking Analysis" podcast. I publish each week on wikibon.com and siliconangle.com, or you can email me at david.vellante@siliconangle.com to get in touch, or DM me @dvellante, or comment on our LinkedIn posts. And please check out ETR.AI for the best survey data in the enterprise tech business. Also Constellation Research; Andy publishes there, some awesome information on AI and data. This is Dave Vellante for theCUBE Insights, powered by ETR. Thanks for watching, everybody, and we'll see you next time on "Breaking Analysis". (gentle closing tune plays)
Breaking Analysis: We Have the Data…What Private Tech Companies Don’t Tell you About Their Business
>> From The Cube Studios in Palo Alto and Boston, bringing you data-driven insights from The Cube and ETR. This is "Breaking Analysis" with Dave Vellante. >> The reverse momentum in tech stocks, caused by rising interest rates, less attractive discounted cash flow models, and more tepid forward guidance, can be easily measured by public market valuations. And while there's lots of discussion about the impact on private companies, cash runway, and 409A valuations, measuring the performance of non-public companies isn't as easy. IPOs have dried up, and public statements by private companies, of course, accentuate the good and kind of hide the bad. Real data, unless you're an insider, is hard to find. Hello and welcome to this week's "Wikibon Cube Insights", powered by ETR. In this "Breaking Analysis" we unlock some of the secrets that non-public, emerging tech companies may or may not be sharing. And we do this by introducing you to a capability from ETR that we've not exposed you to over the past couple of years. It's called the Emerging Technology Survey, and it is packed with sentiment data and performance data based on surveys of more than a thousand CIOs and IT buyers covering more than 400 companies. And we've invited back our colleague, Erik Bradley of ETR, to help explain the survey and the data that we're going to cover today. Erik, this survey is something that I've not personally spent much time on, but I'm blown away by the data. It's really unique and detailed. First of all, welcome. Good to see you again. >> Great to see you too, Dave, and I'm really happy to be talking about the ETS, or the Emerging Technology Survey. Even our own clients and constituents probably don't spend as much time in here as they should. >> Yeah, because there's so much in the mainstream. But let's pull up a slide to bring out the survey composition. Tell us about the study. How often do you run it? What's the background and the methodology?
Yeah, you were just spot on in the way you were talking about the private tech companies out there. So what we did is we decided to take all the vendors that we track that are not yet public and move them over to the ETS. And there isn't a lot of information out there. If you're not in Silicon (indistinct), you're not going to get this stuff. So PitchBook and TechCrunch are two out there that give some data on these guys. But what we really wanted to do was go out to our community. We have 6,000 ITDMs in our community. We wanted to ask them, "Are you aware of these companies? And if so, are you allocating any resources to them? Are you planning to evaluate them?" and really just kind of figure out what we can do. So this particular survey, as you can see, has 1,000-plus responses, over 450 vendors that we track. And essentially what we're trying to do here is talk about your evaluation and awareness of these companies and also your utilization. And if you're not utilizing them, then we can also figure out the sales conversion or churn. So this is interesting not only for the ITDMs themselves, to figure out what their peers are evaluating and what they should put in POCs against the big guys when contracts come up, but it's also really interesting for the tech vendors themselves, to see how they're performing.
And then I'm going to put those two groups together and we're going to look at two dimensions, actually three dimensions. First, which companies are being evaluated the most. Second, which companies are getting the most usage and adoption of their offerings. And then third, which companies are seeing the highest churn rates, which of course is a silent killer of companies. And then finally, we're going to look at the sentiment and mindshare for two key areas that we like to cover often here on "Breaking Analysis": security and data. And data comprises database, including data warehousing; big data analytics as the second part of data; and machine learning and AI as the third section within data that we're going to look at. Now, one other thing before we get into it. ETR very often will include open source offerings in the mix, even though they're not companies, like TensorFlow or Kubernetes, for example. And we'll call that out during this discussion. The reason this is done is for context, because everyone is using open source. It is the heart of innovation, and many business models are superglued to an open source offering. Take MariaDB, for example: there's the foundation with the open source code, and then, of course, the company that sells services around the offering. Okay, so let's first look at the highest and lowest sentiment among these private firms, the ones that have the highest mindshare, so they're naturally going to be somewhat larger. And we do this on two dimensions: sentiment on the vertical axis and mindshare on the horizontal axis. And note the open source tools: Kubernetes, Postgres, Kafka, TensorFlow, Jenkins, Grafana, et cetera. So Erik, please explain what we're looking at here, how it's derived, and what the data tells us. >> Certainly. So there is a lot here, so we're going to break it down, first of all, by explaining just what mindshare and net sentiment are. You explained the axes.
We have so many evaluation metrics, but we need to aggregate them into one so that we can rank companies against each other. Net sentiment is really the aggregation of all the positives, subtracting out the negatives. So net sentiment is a very quick way of looking at where these companies stand versus their peers in their sectors and sub-sectors. Mindshare is basically the awareness of them, which is good for very early stage companies. And you'll see some names on here that have obviously been around for a very long time, and they're clearly the bigger ones, out on the edges of the chart. Kubernetes, for instance, as you mentioned, is open source, the de facto standard for all container orchestration, and it should be that far up into the right, because that's what everyone's using. In fact, the open source leaders are so prevalent in the Emerging Technology Survey that we break them out later in our analysis, 'cause it's really not fair to include them and compare them to the actual companies that are providing the support and the security around that open source technology. But no survey, no analysis, no research would be complete without including this open source tech. So in what we're looking at here, if I can just get away from the open source names, we see other things like Databricks and OneTrust. They're repeating as top net sentiment performers here. And then also the design vendors. People don't spend a lot of time on them, but Miro and Figma, this is their third survey in a row where they're just dominating that sentiment overall. And Adobe should probably take note of that, because they're really coming after them. But Databricks, we all know, probably would've been a public company by now if the market hadn't turned, and you can see just how dominant they are in a survey of nothing but private companies. And we'll see that again when we talk about the database sector later.
>> And I'll just add, so you see Automation Anywhere on there, the big UiPath competitor, a company that was not able to get to the public markets. They've been trying. Snyk, Peter McKay's company, they've raised a bunch of money, big security player. They're doing some really interesting things in developer security, helping developers secure the data flow. H2O.ai, Dataiku, AI companies. We saw them at the Snowflake Summit. Redis Labs, Netskope in security. So a lot of names that we know that ultimately we think are probably going to be hitting the public market. Okay, here's the same view for private companies with less mindshare, Erik. Take us through this one. >> On the previous slide too, real quickly, I wanted to pull out that SecurityScorecard, and we'll get back into it. But this is a newcomer, and I couldn't believe how strong their data was, but we'll bring that up in a second. Now, when we go to the ones of lower mindshare, it's interesting to talk about open source, right? Kubernetes was all the way on the top right. Everyone uses containers. Here we see Istio up there. Not everyone is using service mesh as much, and that's why Istio is in the smaller breakout. But still, when you talk about net sentiment, it's about the leader, it's the highest one there is. So really interesting to point out. Then we see other names like Collibra on the data side really performing well. And again, as always, security is very well represented here. We have Aqua, Wiz, Armis, which is a standout in this survey this time around. They do IoT security. I hadn't even heard of them until I started digging into the data here. And I couldn't believe how well they were doing. And then of course you have Anyscale, which is doing second best in this, and the best name in the survey, Hugging Face, which is a machine learning AI tool. Also doing really well on net sentiment, but they're not as far along on that axis of mindshare just yet.
So these are, again, emerging companies that might not be as well represented in the enterprise as they will be in a couple of years. >> Hugging Face sounds like something you do with your two year old. Like you said, you see high performers. Anyscale, who do machine learning, and you mentioned them. They came out of Berkeley. Collibra, governance. InfluxData is on there; InfluxDB's a time series database. And yeah, of course, Alex, if you bring that back up, you get a big group of red dots, right? That's the bad zone, I guess, in which Sisense, which does viz, and Yellowbrick Data, an MPP database, appear. How should we interpret the red dots, Erik? I mean, is it necessarily a bad thing? Could it be misinterpreted? What's your take on that? >> Sure, well, let me just explain the definition of it first from a data science perspective, right? We're a data company first. So the gray dots that you're seeing that aren't named, that's the mean, that's the average. So in order for you to be on this chart, you have to be at least one standard deviation above or below that average. So that gray is where we're saying, "Hey, this is where the lump of average comes in. This is where everyone normally stands." So you either have to be an outperformer or an underperformer to even show up in this analysis. So by definition, yes, the red dots are bad. You're at least one standard deviation below the average of your peers. It's not where you want to be. And if you're on the lower left, not only are you not performing well from a utilization or an actual usage rate, but people don't even know who you are. So that's a problem, obviously. And the VCs and the PEs out there that are backing these companies, they're the ones who mostly are interested in this data. >> Yeah. Oh, that's a great explanation. Thank you for that. No, nice benchmarking there, and yeah, you don't want to be in the red. All right, let's get into the next segment here. Here we're going to look at evaluation rates, adoption and the all important churn.
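Erik's one-standard-deviation cutoff is easy to reproduce. This is a minimal sketch with invented company names and scores, not ETR's actual code; whether they use population or sample standard deviation is not stated, so `pstdev` here is an assumption.

```python
import statistics

# Hypothetical net sentiment scores for six vendors.
scores = {"A": 55, "B": 48, "C": 72, "D": 30, "E": 50, "F": 51}

mean = statistics.mean(scores.values())
stdev = statistics.pstdev(scores.values())

# Only names at least one standard deviation from the mean get plotted;
# everything else stays an unnamed gray dot in the "lump of average".
outperformers = [k for k, v in scores.items() if v >= mean + stdev]
underperformers = [k for k, v in scores.items() if v <= mean - stdev]

print(outperformers)    # ['C']
print(underperformers)  # ['D']
```

With these made-up numbers, only one outperformer and one underperformer clear the bar, which mirrors why so few names get labeled on the chart.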
First, new evaluations. Let's bring up that slide. And Erik, take us through this. >> So essentially, I just want to explain what evaluation means: people will cite that they either plan to evaluate the company or they're currently evaluating. So that means we're aware of 'em and we are choosing to do a POC of them. And then we'll see later how that turns into utilization, which is what a company wants to see: awareness, evaluation, and then actually utilizing them. That's sort of the life cycle for these emerging companies. So what we're seeing here, again, are very high evaluation rates. H2O, we mentioned. SecurityScorecard jumped up again. Chargebee, Snyk, Salt Security, Armis. A lot of security names are up here, Aqua, Netskope, which, God, has been around forever. I still can't believe it's in an Emerging Technology Survey. But so many of these names fall in data and security again, which is why we decided to pick those out, Dave. And on the lower side, Vena, Acton, those unfortunately took the dubious award of the lowest evaluations in our survey, but I prefer to focus on the positive. So SecurityScorecard, again, a real standout in this one. They're in the security assessment space, basically. They'll come in and assess for you how your security hygiene is. And it's an area of real interest right now amongst our ITDM community. >> Yeah, I mean, I think those, and then Arctic Wolf is up there too. They're doing managed services. You had mentioned Netskope. Yeah, okay. All right, let's look at adoption now. These are the companies whose offerings are being used the most and are above that standard deviation in the green. Take us through this, Erik. >> Sure, yet again, what we're looking at is, okay, we went from awareness, we went to evaluation. Now it's about utilization, which means a survey respondent's going to state "Yes, we evaluated and we plan to utilize it" or "It's already in our enterprise and we're actually allocating further resources to it."
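That awareness-to-evaluation-to-utilization life cycle can be expressed as simple funnel conversion rates. The respondent counts below are invented purely for illustration:

```python
# Hypothetical funnel for one emerging vendor in a 1,000-person survey.
funnel = {"aware": 300, "evaluating": 120, "utilizing": 60}

stages = list(funnel)
for prev, cur in zip(stages, stages[1:]):
    rate = funnel[cur] / funnel[prev]
    print(f"{prev} -> {cur}: {rate:.0%}")
# aware -> evaluating: 40%
# evaluating -> utilizing: 50%
```

For an emerging company, the goal is exactly what Erik describes: moving respondents down this funnel, stage by stage.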
Not surprising, again, a lot of open source, the reason why, it's free. So it's really easy to grow your utilization on something that's free. But as you and I both know, as Red Hat proved, there's a lot of money to be made once the open source is adopted, right? You need the governance, you need the security, you need the support wrapped around it. So here we're seeing Kubernetes, Postgres, Apache Kafka, Jenkins, Grafana. These are all open source based names. But if we're looking at names that are non open source, we're going to see Databricks, Automation Anywhere, Rubrik all have the highest mindshare. So these are the names, not surprisingly, all names that probably should have been public by now. Everyone's expecting an IPO imminently. These are the names that have the highest mindshare. If we talk about the highest utilization rates, again, Miro and Figma pop up, and I know they're not household names, but they are just dominant in this survey. These are applications that are meant for design software and, again, they're going after an Autodesk or a CAD or Adobe type of thing. It is just dominant how high the utilization rates are here, which again is something Adobe should be paying attention to. And then you'll see a little bit lower, but also interesting, we see Collibra again, we see Hugging Face again. And these are names that are obviously in the data governance, ML, AI side. So we're seeing a ton of data, a ton of security and Rubrik was interesting in this one, too, high utilization and high mindshare. We know how pervasive they are in the enterprise already. >> Erik, Alex, keep that up for a second, if you would. So yeah, you mentioned Rubrik. Cohesity's not on there. They're sort of the big one. We're going to talk about them in a moment. Puppet is interesting to me because you remember the early days of that sort of space, you had Puppet and Chef and then you had Ansible. Red Hat bought Ansible and then Ansible really took off. 
So it's interesting to see Puppet on there as well. Okay. So now let's look at the churn, because this one is where you don't want to be. It's, of course, all red, 'cause churn is bad. Take us through this, Erik. >> Yeah, definitely don't want to be here, and I don't love to dwell on the negative, so we won't spend as much time. But to your point, there's one thing I want to point out that I think is important. So you see Rubrik in the same spot, but Rubrik has so many citations in our survey that it actually would make sense that they're both being high utilization and churn, just because they're so well represented. They have such a high overall representation in our survey. And the reason I call that out is Cohesity. Cohesity has an extremely high churn rate here, about 17%, and unlike Rubrik, they were not on the utilization side. So Rubrik is seeing both, Cohesity is not. It's not being utilized, but it's seeing a high churn. So that's the way you can look at this data and say, "Hm." Same thing with Puppet. You noticed that it was on the other slide. It's also on this one. So basically what it means is a lot of people are giving Puppet a shot, but it's starting to churn, which means it's not as sticky as we would like. One that was surprising on here for me was Tanium. It's kind of jumbled in there. It's hard to see in the middle, but Tanium, I was very surprised to see with as high of a churn, because what I do hear from our end user community is that people that use it, like it. It really kind of spreads into not only vulnerability management, but also that endpoint detection and response side. So I was surprised by that one, mostly, to see Tanium in here. Mural, again, was another one of those application design softwares that's seeing a very high churn as well. >> So you're saying if you're in both... Alex, bring that back up if you would. So if you're in both, like MariaDB is, for example, I think, yeah, they're in both.
They're both green in the previous one and red here; that's not as bad. You mentioned Rubrik is going to be in both. Cohesity is a bit of a concern. Cohesity just brought on Sanjay Poonen, so this could be a go to market issue, right? I mean, 'cause Cohesity has got a great product and they got really happy customers. So they're just maybe having to figure out, okay, what's the right ideal customer profile, and Sanjay Poonen, I guarantee, is going to have that company cranking. I mean, they had been doing very well on the surveys and had fallen off a bit. The other interesting thing: in the previous survey I saw Cvent, which is an event platform. The only reason I pay attention to that is 'cause we actually have an event platform. We don't sell it separately; we bundle it as part of our offerings. And you see Hopin on here. Hopin raised a billion dollars during the pandemic, and we were like, "Wow, that's going to blow up." And so you see Hopin on the churn and you didn't see 'em in the previous chart, but that's sort of interesting. Like you said, let's not kind of dwell on the negative, but you really can't ignore it. You know, churn is a real big concern. Okay, now we're going to drill down into two sectors, security and data, where data comprises three areas: database and data warehousing, machine learning and AI, and big data analytics. So first let's take a look at the security sector. Now, this is interesting because not only is it a sector drill down, but it also gives an indicator of how much money the firm has raised, which is the size of that bubble, and tells us if a company is punching above its weight and efficiently using its venture capital. Erik, take us through this slide. Explain the dots, the size of the dots. Set this up please. >> Yeah.
So again, the axes are still the same, net sentiment and mindshare, but what we've done this time is we've taken publicly available information on how much capital a company has raised, and that'll be the size of the circle you see around the name. And then whether it's green or red is basically saying, relative to the amount of money they've raised, how are they doing in our data? So when you see a Netskope, which has been around forever and raised a lot of money, that's why you're going to see them leading more towards red, 'cause it's just been around forever and you'd kind of expect it. Versus a name like SecurityScorecard, which has only raised a little bit of money and is actually performing just as well as, if not better than, a name like Netskope. OneTrust is doing absolutely incredible right now. BeyondTrust. We've seen the issues with Okta, right? So those are two names that play in that space that obviously are probably getting some looks about what's going on right now. Wiz, we've all heard about, right? So they raised a ton of money. It's doing well on net sentiment, but the mindshare isn't as high as you'd want, which is why you're going to see a little bit of that red, versus a name like Aqua, which is doing container and application security and hasn't raised as much money, but is really neck and neck with a name like Wiz. So that is why, on a relative basis, you'll see that more green. As we all know, information security is never going away. But as we'll get to later in the program, Dave, I'm not sure, in this current market environment, if people are as willing to do POCs and switch away from their security provider, right. There's a little bit of tepidness out there, a little trepidation. So right now we're seeing overall a slight pause, a slight cooling in overall evaluations on the security side versus historical levels a year ago. >> Now let's stay on here for a second. So a couple things I want to point out. So it's interesting.
Now, Snyk has raised over, I think, $800 million, but you can see them, they're high on the vertical and the horizontal. But now compare that to Lacework. It's hard to see, but they're kind of buried in the middle there. That's the biggest dot in this whole thing, if I'm interpreting this correctly. They've raised over a billion dollars. It's a Mike Speiser company. He was the founding investor in Snowflake. So people watch that very closely, but that's an example of where they're not punching above their weight. They recently had a layoff and they've got to fine tune things, but I'm still confident they're going to do well. 'Cause they're approaching security as a data problem, and people are probably having trouble getting their arms around that. And then again, I see Arctic Wolf. They're not red, they're not green, but they've raised a fair amount of money, and it's showing up to the right and at a decent level there. And a couple of the other ones that you mentioned, Netskope. Yeah, they've raised a lot of money, but they're actually performing where you want. What you don't want is where Lacework is, right? They've got some work to do to really take advantage of the money that they raised last November and prior to that. >> Yeah, if you're seeing that more neutral color, like you're calling out with an Arctic Wolf, that means relative to their peers, this is where they should be. It's when you're seeing that red on a Lacework where we all know, wow, you raised a ton of money and your mindshare isn't where it should be. Your net sentiment is not where it should be, comparatively. And then you see these great standouts, like Salt Security and SecurityScorecard and Abnormal. You know they haven't raised that much money yet, but their net sentiment's higher and their mindshare's doing well. So basically, in a nutshell, if you're a PE or a VC and you see a small green circle, then you're doing well, it means you made a good investment.
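The green/red coloring relative to capital raised could be approximated by comparing a vendor's sentiment rank to its funding rank. The function, thresholds, and example ranks below are invented for illustration and are not ETR's methodology:

```python
def efficiency_color(sentiment_rank: int, funding_rank: int,
                     tolerance: int = 1) -> str:
    """Green when sentiment outruns funding (punching above its weight),
    red when heavy funding hasn't translated into sentiment, gray when
    they roughly match. Rank 1 = best sentiment / most capital raised."""
    gap = funding_rank - sentiment_rank
    if gap > tolerance:
        return "green"
    if gap < -tolerance:
        return "red"
    return "gray"

# Lightly funded but well regarded, versus heavily funded and middling.
print(efficiency_color(sentiment_rank=2, funding_rank=9))  # green
print(efficiency_color(sentiment_rank=6, funding_rank=1))  # red
```

The first call is the SecurityScorecard-style profile Erik describes; the second is the Lacework-style profile Dave calls out.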
>> Some of these guys, I don't know, but you see these small green circles. Those are the ones you want to start digging into, and maybe help them catch a wave. Okay, let's get into the data discussion. And again, three areas: database slash data warehousing, big data analytics and ML/AI. First, we're going to look at the database sector. So Alex, thank you for bringing that up. Alright, take us through this, Erik. Actually, let me just say, PostgreSQL, I got to ask you about this. It shows some funding, but that actually could be a mix of EDB, the company that commercializes Postgres, and Postgres the open source database, which is a transaction system and kind of an open source Oracle. You see MariaDB is a database, an open source database, but the company has raised over $200 million and filed an S-4. So Erik, it looks like this might be a little bit of a mashup of companies and open source products. Help us understand this. >> Yeah, it's tough when you start dealing with the open source side, and I'll be honest with you, there is a little bit of a mashup here. There are certain names here that are a hundred percent for-profit companies. And then there are others that are obviously open source based; Redis is open source, but Redis Labs is the one trying to monetize the support around it. So you're a hundred percent accurate on this slide. I think one of the things here that's important to note, though, is just how important open source is to data. If you're going into any of these areas, it's going to be open source based to begin with. And Neo4j is one I want to call out here. It's not one everyone's familiar with, but it's basically a graph database, which is a name that we're seeing on the net sentiment side actually really, really high. When you think about it, it's the third overall net sentiment for a niche database play.
It's not as big on the mindshare 'cause its use cases aren't as common, but it's the third biggest play on net sentiment. I found that really interesting on this slide. >> And again, so MariaDB, as I said, they filed an S-4, I think $50 million in revenue, and that might even be ARR. So they're not huge, but they're getting there. And by the way, MariaDB, if you don't know, was the company that was formed the day that Oracle bought Sun, in which Oracle got MySQL, and MariaDB has done a really good job of replacing a lot of MySQL instances. Oracle has responded with MySQL HeatWave, which was kind of the Oracle version of MySQL. So there's some interesting battles going on there. If you think about the LAMP stack, the M in the LAMP stack was MySQL, and so now it's MariaDB replacing that MySQL for a large part. And then you see again the red, you know, you got to have some concerns about that. Aerospike's been around for a long time. SingleStore changed their name a couple of years ago. Yellowbrick Data, Firebolt was kind of going after Snowflake for a while, but yeah, you want to get out of that red zone. So they got some work to do. >> And Dave, real quick, for the people that aren't aware, I just want to let them know that we can cut this data with the public company data as well. So we can cross over this with that, because some of these names are competing with the larger public company names as well. So we can go ahead and cross reference, like, a MariaDB with a Mongo, for instance, or something of that nature. So it's not in this slide, but at another point we can certainly explain on a relative basis how these private names are doing compared to the other ones as well. >> All right, let's take a quick look at analytics. Alex, bring that up if you would. Go ahead, Erik. >> Yeah, I mean, essentially here... I can't see it on my screen, my apologies. I just kind of went blank on that. So gimme one second to catch up. >> So I could set it up while you're doing that.
You got Grafana up and to the right. I mean, this is huge, right? >> Got it, thank you. I lost my screen there for a second. Yep. Again, open source name Grafana, absolutely up and to the right. But as we know, Grafana Labs is actually picking up a lot of speed based on Grafana, of course. And I think we might actually hear some noise from them coming this year. The names that are actually a little bit more disappointing that I want to call out are names like ThoughtSpot. It's been around forever. Their mindshare, of course, is second best here, but based on the amount of time they've been around and the amount of money they've raised, it's not actually outperforming the way it should be. We're seeing Moogsoft obviously make some waves. That's very high net sentiment for that company. It's, you know, what, third, fourth position overall in this entire area. Another name like Fivetran, Matillion is doing well. Fivetran, even though it's got a high net sentiment, again, it's raised so much money that we would've expected a little bit more at this point. I know you know this space extremely well, but basically what we're looking at here, and to the bottom left, you're going to see some names with a lot of red, large circles that really just aren't performing that well. InfluxData, however, second highest net sentiment. And it's really pretty early on in this stage, and the feedback we're getting on this name is the use cases are great, the efficacy's great. And I think it's one to watch out for. >> InfluxData, time series database. The other interesting thing I just noticed here: you got Tamr on here, which is that little small green one. Those are the ones we were saying before, look for those guys. They might be some of the interesting companies out there. And then Observe, Jeremy Burton's company. They do observability on top of Snowflake; not green, but kind of in that gray. So that's kind of cool. Monte Carlo is another one, they're sort of slightly green.
They are doing some really interesting things in data and data mesh. So yeah, okay. So I could spend all day on this stuff, Erik, phenomenal data. I got to get back and really dig in. Let's end with machine learning and AI. Now, this chart is similar in its dimensions, of course, except for the money raised; we're not showing that as the size of the bubble. But AI is so hot, we wanted to cover that here. Erik, explain this please. Why TensorFlow is highlighted, and walk us through this chart. >> Yeah, it's funny, yet again, right? Another open source name, TensorFlow, being up there. And I just want to explain: we do break out machine learning, AI as its own sector. A lot of this, of course, really is intertwined with the data side, but it is its own area. And one of the things I think that's most important here to break out is Databricks. We started to cover Databricks in machine learning, AI. That company has grown into much, much more than that. So I do want to state to you, Dave, and also the audience out there, that moving forward, we're going to be moving Databricks out of only the ML/AI into other sectors, so we can kind of value them against their peers a little bit better. But in this instance, you can just see how dominant they are in this area. And one thing that's not here, but I do want to point out, is that we have the ability to break this down by industry vertical and organization size. And when I break this down into the Fortune 500 and Fortune 1000, both Databricks and TensorFlow are even better than you see here. So it's quite interesting to see that the names that are succeeding are also succeeding with the largest organizations in the world. And as we know, large organizations mean large budgets. So this is one area that I just thought was really interesting to point out: as we break down the data by vertical, these two names still are the outstanding players. >> I just also want to call out H2O.ai.
They're getting a lot of buzz in the marketplace and I'm seeing them a lot more. Anaconda, another one. Dataiku consistently popping up. DataRobot is also interesting because of all the kerfuffle that's going on there. The Cube alum Chris Lynch stepped down as executive chairman. All this stuff came out about how the executives were taking money off the table and didn't allow the employees to participate in that money raising deal. So that's pissed a lot of people off. And so they're now going through some kind of uncomfortable things, which is unfortunate because DataRobot, I noticed, we haven't covered them that much in "Breaking Analysis", but I've noticed them oftentimes, Erik, in the surveys doing really well. So you would think that company has a lot of potential. But yeah, it's an important space that we're going to continue to watch. Let me ask you, Erik, can you contextualize this from a time series standpoint? I mean, how has this changed over time? >> Yeah, again, it's not shown here, but it's in the data. I'm sorry, go ahead. >> No, I'm sorry. What I meant, I should have interjected: in other words, you would think in a downturn that these emerging companies would be less interesting to buyers 'cause they're more risky. What have you seen? >> Yeah, and it was interesting, before we went live, you and I were having this conversation about "Is the downturn stopping people from evaluating these private companies or not," right? In a larger sense, that's really what we're doing here. How are these private companies doing when it comes down to the actual practitioners? The people with the budget, the people with the decision making. And so what I did is, we have historical data, as you know. I went back to the Emerging Technology Survey we did in November of '21, right at the crest, right before the market started to really fall and everything kind of started to fall apart there.
And what I noticed is, on the security side, very much so, we're seeing less evaluations than we were in November '21. So I broke it down. On cloud security, net sentiment went from 21% to 16% from November '21. That's a pretty big drop. And again, that sentiment is our one aggregate metric for overall positivity, meaning utilization and actual evaluation of the name. Again, in database, we saw it drop a little bit, from 19% to 13%. However, in analytics we actually saw it stay steady. So it's pretty interesting that, yes, cloud security and security in general are always going to be important, but right now we're seeing less overall net sentiment in that space. But within analytics, we're seeing steady with growing mindshare. And also, to your point earlier, in machine learning, AI, we're seeing steady net sentiment, and mindshare has grown a whopping 25% to 30%. So despite the downturn, we're seeing more awareness of these companies in analytics and machine learning, and a steady, actual utilization of them. I can't say the same in security and database. They're actually shrinking a little bit since the end of last year. >> You know, it's interesting, we were on a round table, Erik does these round tables with CISOs and CIOs, and I remember one time you had asked the question, "How do you think about some of these emerging tech companies?" And one of the executives said, "I always include somebody in the bottom left of the Gartner Magic Quadrant in my RFPs." I think he said, "That's how I found," I don't know, it was Zscaler or something like that, years before anybody ever knew of them, "because they're going to help me get to the next level." So it's interesting to see, Erik, in these sectors, how they're holding up in many cases. >> Yeah. It's a very important part for the actual IT practitioners themselves. There's always contracts coming up and you always have to worry about your next round of negotiations. And that's one of the roles these guys play.
You have to do a POC when contracts come up, but it's also their job to stay on top of the new technology. You can't fall behind. Everyone's a software company now; everyone's a tech company, no matter what you're doing. So these guys have to stay on top of it, and that's what this ETS can do. You can go in here and look and say, "All right, I'm going to evaluate their technology," and it could be twofold. It might be that you're ready to upgrade your technology and they're actually pushing the envelope, or it simply might be, I'm using them as a negotiation ploy, so when I go back to the big guy who I have full intentions of writing that contract to, at least I have some negotiation leverage. >> Erik, we got to leave it there. I could spend all day. I'm going to definitely dig into this on my own time. Thank you for introducing this, really appreciate your time today. >> I always enjoy it, Dave, and I hope everyone out there has a great holiday weekend. Enjoy the rest of the summer. And, you know, I love to talk data. So anytime you want, just point the camera on me and I'll start talking data. >> You got it. I also want to thank the team at ETR, not only Erik, but Darren Bramen, who's a data scientist who really helped prepare this data, and the entire team over at ETR. I cannot tell you how much additional data there is. We are just scratching the surface in this "Breaking Analysis". So great job, guys. I want to thank Alex Myerson, who's on production and manages the podcast. Ken Shifman as well, who's just coming back from VMware Explore. Kristen Martin and Cheryl Knight help get the word out on social media and in our newsletters. And Rob Hof is our editor in chief over at SiliconANGLE. He does some great editing for us. Thank you, all of you guys. Remember, these episodes are all available as podcasts; wherever you listen, all you got to do is search "Breaking Analysis" podcast. I publish each week on wikibon.com and siliconangle.com.
Or you can email me to get in touch: david.vellante@siliconangle.com. You can DM me at dvellante or comment on my LinkedIn posts, and please do check out etr.ai for the best survey data in the enterprise tech business. This is Dave Vellante for Erik Bradley and The Cube Insights powered by ETR. Thanks for watching. Be well, and we'll see you next time on "Breaking Analysis". (upbeat music)
SENTIMENT ANALYSIS :
ENTITIES
Entity | Category | Confidence |
---|---|---|
Erik | PERSON | 0.99+ |
Alex Myerson | PERSON | 0.99+ |
Ken Shifman | PERSON | 0.99+ |
Sanjay Poonen | PERSON | 0.99+ |
Dave Vellante | PERSON | 0.99+ |
Dave | PERSON | 0.99+ |
Erik Bradley | PERSON | 0.99+ |
November 21 | DATE | 0.99+ |
Darren Bramen | PERSON | 0.99+ |
Alex | PERSON | 0.99+ |
Cheryl Knight | PERSON | 0.99+ |
Postgres | ORGANIZATION | 0.99+ |
Databricks | ORGANIZATION | 0.99+ |
Netskope | ORGANIZATION | 0.99+ |
Adobe | ORGANIZATION | 0.99+ |
Rob Hof | PERSON | 0.99+ |
Fivetran | ORGANIZATION | 0.99+ |
$50 million | QUANTITY | 0.99+ |
21% | QUANTITY | 0.99+ |
Chris Lynch | PERSON | 0.99+ |
19% | QUANTITY | 0.99+ |
Jeremy Burton | PERSON | 0.99+ |
$800 million | QUANTITY | 0.99+ |
6,000 | QUANTITY | 0.99+ |
Oracle | ORGANIZATION | 0.99+ |
Redis Labs | ORGANIZATION | 0.99+ |
November '21 | DATE | 0.99+ |
ETR | ORGANIZATION | 0.99+ |
First | QUANTITY | 0.99+ |
25% | QUANTITY | 0.99+ |
last year | DATE | 0.99+ |
OneTrust | ORGANIZATION | 0.99+ |
two dimensions | QUANTITY | 0.99+ |
two groups | QUANTITY | 0.99+ |
November of 21 | DATE | 0.99+ |
both | QUANTITY | 0.99+ |
Boston | LOCATION | 0.99+ |
more than 400 companies | QUANTITY | 0.99+ |
Kristen Martin | PERSON | 0.99+ |
MySQL | TITLE | 0.99+ |
Moogsoft | ORGANIZATION | 0.99+ |
The Cube | ORGANIZATION | 0.99+ |
third | QUANTITY | 0.99+ |
Grafana | ORGANIZATION | 0.99+ |
H2O | ORGANIZATION | 0.99+ |
Mike Speiser | PERSON | 0.99+ |
david.vellante@siliconangle.com | OTHER | 0.99+ |
second | QUANTITY | 0.99+ |
two | QUANTITY | 0.99+ |
first | QUANTITY | 0.99+ |
28% | QUANTITY | 0.99+ |
16% | QUANTITY | 0.99+ |
Second | QUANTITY | 0.99+ |
Jonathan Seckler, Dell & Cal Al-Dhubaib, Pandata | VMware Explore 2022
(gentle music) >> Welcome back to theCUBE's virtual program, covering VMware Explore 2022. The first time since 2019 that the VMware ecosystem has gathered in person. But in the post-isolation economy, hybrid is the new format, cube plus digital, we call it. And so we're really happy to welcome Cal Al-Dhubaib who's the founder and CEO and AI strategist of Pandata. And Jonathan Seckler back in theCUBE, the senior director of product marketing at Dell Technologies. Guys, great to see you, thanks for coming on. >> Yeah, thanks a lot for having us. >> Yeah, thank you. >> Cal, Pandata, cool name, what's it all about? >> Thanks for asking. Really excited to share our story. I'm a data scientist by training and I'm based here in Cleveland, Ohio. And Pandata is a company that helps organizations design and develop machine learning and AI technology. And when I started this here in Cleveland six years ago, I had people react to me with, what? So we help demystify AI and make it practical. And we specifically focus on trustworthy AI. So we work a lot in regulated industries like healthcare. And we help organizations navigate the complexities of building machine learning and AI technology when data's hard to work with, when there's risk in the potential outcomes, or high cost in the consequences. And that's what we do every day. >> Yeah, yeah, timing is great given all the focus on privacy and what you're seeing with big tech and public policy, so we're going to get into that. Jonathan, I understand you guys got some hard news. What's your story around AI and AutoML? Share that with us. >> Yeah, thanks. So having the opportunity to speak with Cal today is really important because one of the hardest things that we find our customers have is making that transition from experimenting with AI to making it really useful in real life. >> What is the tech underneath that? Are we talking VxRail here? Are you talking servers? What do you got? >> Yeah, absolutely.
So the Dell validated design for AI is a reference framework that is based on an optimized set of hardware for a given outcome. That includes, it could be VxRail, VMware vSphere, and Nvidia GPUs and Nvidia software to make all of that happen. And for today, what we're working with is H2O.ai's solution to develop automatic machine learning. So take just that one more step to make it easier for customers to bring AI into production. >> Cool. >> So it's a full stack of software that includes automated machine learning, it includes NVIDIA's AI Enterprise for deployment and development, and it's all built on an engineering-validated set of hardware, including servers and storage and whatever else you need. >> AI out of the box, I don't have to worry about cobbling it all together. >> Exactly. >> Cal, I want to come back to this trusted AI notion. A lot of people don't trust AI just by the very nature of it. I think about, okay, well how does it know it's a cat? And then you can never explain it, it's a black box. And so I'm like, what do they do with my data? And you mentioned healthcare, financial services, the government, they know everything about me. I just had to get a Real ID in Massachusetts, I had to give all my data away. I don't trust it. So what is trusted AI? >> Well, so let me take a step back and talk about sobering statistics. There's a lot of different sources that report on this, but anywhere you look, you'll hear somewhere between 80 and 90% of AI projects fail to yield a return. That's pretty scary, that's disappointing for an industry. And why is that? AI is hard. Versus traditional software, where you're programming rules hard and fast, if I click this button, I expect A, B, C to happen, here we're talking about recognizing and reacting to patterns. It's not, will it be wrong? It's, when it's wrong, how wrong will it be? And what cost are you willing to accept related to that?
So zooming back in on this lens of trustworthy AI, much of the last 10 years, development in AI has looked like this. Let's get the data, let's race to build the warehouses, okay we did that, no problem. Next was the race to build the algorithms. Can we build more sophisticated models? Can we work with things like documents and images? And it used to be the exclusive domain of deep tech companies. You'd have to have teams of teams building the software, building the infrastructure, working on very specific components in this pipeline. And now we have this explosion of technologies, very much like what Jonathan was talking about with validated designs. So it removes the complexities of the infrastructure, it removes the complexities of being able to access the right data. And we have a ton of modeling capabilities and tools out there, so we can build a lot of things. Now, this is when we start to encounter risk in machine learning and AI. If you think about the models that are being used to replicate or learn from language, like GPT-3, to create new content, its training data set is everything that's on the internet. And if you haven't been on the internet recently, it's not all good. So how do you go about building technology to recognize specific patterns, pick up patterns that are desirable, and avoid unintended consequences? And no one's immune to this. So the discipline of trustworthy AI is building models that are easier to interrogate, that are useful for humans, and that minimize the risk of unintended consequences. >> I would add too, one of the good things about the Pandata solution is how it tries to enforce fairness and transparency in the models. We've done some studies recently with IDC, where we've tried to compare leaders in AI technology versus those who are just getting started.
And I have to say, one of the biggest differences between a leader in AI and the rest of us is often that the leaders have a policy in place to deal with the risks and the ethics of using data through some kind of machine-oriented model. And it's a really important part of making AI usable for the masses. >> You certainly hear a lot about AI; ultimately, there's algorithms which are built by humans. Although of course, there's algorithms to build algorithms, we know that today. >> Right, exactly. >> But humans are biased, there's inherent bias, and so this is a big problem. Obviously Dell, you have a giant observation space in terms of customers. But I wonder, Cal, if you can share with us how you're working with your customers at Pandata? What kind of customers are you working with? What are they asking? What problems are they asking you to solve? And how does it manifest itself? >> So when I like to talk about AI and where it's useful, it usually has to do with taking a repetitive task that humans are tasked with, where they're starting to act more like machines than humans. There's not much creativity in the process, it's handling something that's fairly routine, and it ends up being a bottleneck to scaling. And just a year ago even, we'd have to start approaching our clients with conversations around trustworthy AI, and now they're starting to approach us. Real example, this actually just happened earlier today, we're partnering with one of our clients that basically scans medical claims from insurance providers. And what they're trying to do is identify members that qualify for certain government subsidies. And this isn't as straightforward as it seems because there's a lot of complexity in how the rules are implemented, how judges look at these cases. Long story short, we help them build machine learning to identify these patients that qualify.
And a question that comes up, and that we're starting to hear from the insurance companies they serve, is how do you go about making sure that your decisions are fair and you're not selecting certain groups of individuals over others to get this assistance? And so clients are starting to wise up to that and ask questions. Other things that we've done include identifying potential private health information that's contained in medical images so that you can create curated research data sets. We've helped organizations identify anomalies in cybersecurity logs. And go from an exploration space of billions of events to, what are the top 100 that I should look at today? And so it's all about, how do you find these routine processes where humans are bottlenecked, where we're starting to act more like machines, and insert a little bit of pattern recognition intelligence to get them to spend more time on the creative side. >> Can you talk a little bit more about how? A lot of people talk about augmented AI. AI is amazing. My daughter showed me the other day, I'm sure as an AI expert you've seen it, where the machine actually creates standup comedy, which is so hilarious because it is and it isn't. Some of the jokes are actually really funny. Some of them are so funny 'cause they're not funny and they're weird. So it really underscored the gap. And so how do you do it? Is it augmented? Is it that you're focusing on the mundane things where you want to take humans out of the loop? Explain how. >> So there's this great Wall Street Journal article by Jennifer Strong that she published I think four years ago now. And she says, "For AI to become more useful, it needs to become more boring." And I really truly believe in that. So you hear about these cutting edge use cases. And there's certainly some room for these generative AI applications inspiring new designs, inspiring new approaches.
But the reality is, most successful use cases that we encounter in our business have to do with augmenting human decisions. How do you make arriving at a decision easier? How do you prioritize from millions of options, hundreds of thousands of options, down to three or four that a human can then take the last stretch and really consider or think about? So a really cool story, I've been playing around with DALL.E 2. And for those of you who haven't heard, it's this algorithm that can create images from prompts. And there's this painting I really wish I had bought when I was in Paris a few years ago. And I gave it a description, skyline of the Sacre-Coeur Church in Montmartre with pink and white hues. And it came up with a handful of examples that I can now go take to an artist and say, paint me this. So at the end of the day, automation, it's not really, yes, there's certain applications where you really are truly getting to that automated AI in action. But in my experience, most of the use cases have to do with using AI to make humans more effective, more creative, more valuable. >> I'd also add, I think, Cal, that the opportunity to make AI real here is to automate these things and simplify the language, so that we can get what we call citizen data scientists out there. I mean ordinary employees, or people who are at the front line of making these decisions, working with the data directly. We've done this with customers who have done this on farms, where the growers are able to use AI to monitor and to manage the yield of crops. I think some of the other examples that you mentioned just recently, Cal, are great. The other example is where you can make this technology available to anyone. And maybe that's part of the message of making it boring, it's making it so simple that any of us can use it. >> I love that. John Furrier likes to say that traditionally in IT, we solve complexity with more complexity. So anything that simplifies things is goodness.
So how do you use automated machine learning at Pandata? Where does that fit in here? >> So, really excited about the connection here through H2O that Jonathan mentioned earlier. So H2O.ai is one of the leading AutoML platforms. And what's really cool is, if you think about the traditional way you would approach machine learning, you need to have data scientists. These patterns might exist in documents or images or boring old spreadsheets. And the way you'd approach this is, okay, get these expensive data scientists, and 80% of what they do is clean up the data. And I've yet to encounter a situation where the data doesn't need cleaning up. Now, once you get through the cleaning-up-the-data step, you actually have to consider, all right, am I working with language? Am I working with financial forecasts? What are the statistical modeling approaches I want to use? And there's a lot of creativity involved in that. And you have to set up a whole experiment, and that takes a lot of time and effort. And then you might test one, two or three models, because you know to use those, or those are the go-tos for this type of problem. And you see which one performs best and you iterate from there. The AutoML framework basically allows you to cut through all of that. It can reduce the amount of time you're spending on those steps to 1/10 of the time. You're able to very quickly profile data, understand anomalies, understand what data you want to work with, what data you don't want to work with. And then when it comes to the modeling steps, instead of iterating through three or four models, AutoML is throwing the whole kitchen sink at it. Anything that's appropriate to the task, maybe you're trying to predict a category or label something, maybe you're trying to predict a value like a financial forecast, or even generate text. And it tests all of the models that it has at its disposal that are appropriate to the task and says, here are the top 10.
You can use features like, let me make this more explainable, let me make the model more accurate. I don't necessarily care about interrogating the results because the risk here is low, I want a model that predicts things with a higher accuracy. So you can use these dials instead of having to approach it from a development perspective. You can approach it from more of an experimental mindset. So you still need that expertise, you still need to understand what you're looking at, but it makes it really quick. And so you're not spending all that expensive data science time cleaning up data. >> Makes sense. Last question, so Cal, obviously you guys go deep into AI, and Jonathan, Dell works with every customer on the planet, all sizes, all industries. So what are you hearing and doing with customers that are best practices that you can share for people that want to get into it, that are concerned about AI, that want to simplify it? What would you tell them? Go ahead, Cal. >> Okay, you go first, Cal. >> And Jonathan, you're going to bring us home. >> Sure. >> This sounds good. So as far as where people get scared, I see two sides of it. One, our data's not clean enough, not enough quality, I'm going to stay away from this. So one, I combat that with, you've got to experiment, you've got to iterate, and that's the only way your data's going to improve. Two, there's organizations that worry too much about managing the risk. We don't have the data science expertise that can help us uncover potential biases we have. We are now entering a new stage of AI development and machine learning development, and I use those terms interchangeably nowadays. I know some folks will differentiate between them. But machine learning is the discipline driving most of the advances.
The toolkits that we have at our disposal to quickly profile, manage, and mitigate against the risk that data can bring to the table are really giving organizations more comfort, should give organizations more comfort, to start to build mission-critical applications. The thing that I would encourage organizations to look for is organizations that put trustworthy AI, ethical AI, first as a consideration, not as an afterthought, not as a we're-going-to-sweep-this-under-the-carpet. When you're intentional with that, when you bring that up front and you make it a part of your design, it sets you up for success. And we saw this when GDPR changed the IT world a few years ago. For organizations that built for privacy first to begin with, adapting to GDPR was relatively straightforward. For organizations that had that as an afterthought, it was a huge lift, a huge cost to adapt and adjust to those changes. >> Great example. All right, John, I said bring us home, put a bow on this. >> Last bit. So I think beyond the mechanics of how to make AI better and more workable, one of the big challenges with AI is this concern that you're going to isolate and spend too much effort and dollars on the infrastructure itself. And that's one of the benefits that Dell brings to the table here with validated designs, is that our AI validated design is built on a VMware vSphere architecture. So your backup, your migration, all of the management and the operational tools that IT is most comfortable with can be used to maintain and develop and deploy artificial intelligence projects without having to create unique infrastructure, unique stacks of hardware, which potentially isolates the data, potentially makes things unavailable to the rest of the organization. So when you run it all in a VMware environment, that means you can put it in the cloud, you can put it in your data center.
Just really makes it easier for IT to build AI into their everyday process. >> Silo busting. All right, guys, thanks Cal, John. I really appreciate you guys coming on theCUBE. >> Yeah, it's been a great time, thanks. >> All right. And thank you for watching theCUBE's coverage of VMware Explore 2022. Keep it right there for more action from the show floor with myself, Dave Vellante, John Furrier, Lisa Martin and David Nicholson, keep it right there. (gentle music)
SUMMARY :
that the VMware ecosystem I had people react to me with, what? given all the focus on privacy So having the opportunity that is based on the I don't have to worry about And then you can never and that minimize the risk And I have to say, one of algorithms to build algorithms, And how does it manifest itself? so that you can create And so how do you do it? that I can now go take to an the opportunity to make AI real here So how do you use automated And it tests all of the models that are best practices that you can share going to bring us home. And that's the only way your All right, John, I said bring And that's one of the benefits I really appreciate you And thank you for watching
SENTIMENT ANALYSIS :
ENTITIES
Entity | Category | Confidence |
---|---|---|
Jonathan | PERSON | 0.99+ |
John | PERSON | 0.99+ |
Jennifer Strong | PERSON | 0.99+ |
Jonathan Seckler | PERSON | 0.99+ |
Dave Velante | PERSON | 0.99+ |
Lisa Martin | PERSON | 0.99+ |
David Nicholson | PERSON | 0.99+ |
Cleveland | LOCATION | 0.99+ |
Paris | LOCATION | 0.99+ |
John Furrier | PERSON | 0.99+ |
Jonath | PERSON | 0.99+ |
Jonathan Dell | PERSON | 0.99+ |
two | QUANTITY | 0.99+ |
80% | QUANTITY | 0.99+ |
Pandata | ORGANIZATION | 0.99+ |
NVIDIA | ORGANIZATION | 0.99+ |
two sides | QUANTITY | 0.99+ |
Nvidia | ORGANIZATION | 0.99+ |
Dell | ORGANIZATION | 0.99+ |
one | QUANTITY | 0.99+ |
One | QUANTITY | 0.99+ |
billions | QUANTITY | 0.99+ |
Cleveland, Ohio | LOCATION | 0.99+ |
Dell Technologies | ORGANIZATION | 0.99+ |
six years ago | DATE | 0.99+ |
four | QUANTITY | 0.99+ |
Montmartre | LOCATION | 0.99+ |
three | QUANTITY | 0.99+ |
Two | QUANTITY | 0.99+ |
GDPR | TITLE | 0.99+ |
a year ago | DATE | 0.99+ |
2022 | DATE | 0.99+ |
Cal Al-Dhubaib | PERSON | 0.98+ |
today | DATE | 0.98+ |
Cal | PERSON | 0.98+ |
2019 | DATE | 0.98+ |
first time | QUANTITY | 0.98+ |
VxRail | TITLE | 0.98+ |
first | QUANTITY | 0.97+ |
Massachusetts | LOCATION | 0.97+ |
millions of options | QUANTITY | 0.97+ |
AutoML | TITLE | 0.97+ |
three models | QUANTITY | 0.97+ |
four years ago | DATE | 0.97+ |
80 | QUANTITY | 0.96+ |
IDC | ORGANIZATION | 0.96+ |
90% | QUANTITY | 0.96+ |
DALL.E 2 | TITLE | 0.96+ |
1/10 | QUANTITY | 0.95+ |
VMware Explorer | TITLE | 0.93+ |
Sacre-Coeur Church | LOCATION | 0.92+ |
earlier today | DATE | 0.91+ |
theCUBE | ORGANIZATION | 0.9+ |
H2O.ai | TITLE | 0.9+ |
Pandata | PERSON | 0.9+ |
hundreds of thousands of options | QUANTITY | 0.87+ |
10 | QUANTITY | 0.86+ |
VMware vSphere | TITLE | 0.84+ |
few years ago | DATE | 0.83+ |
H2O | TITLE | 0.83+ |
GPT | TITLE | 0.82+ |
VMware | ORGANIZATION | 0.8+ |
Al-Dhubaib | PERSON | 0.8+ |
100 | QUANTITY | 0.79+ |
Breaking Analysis: How JPMC is Implementing a Data Mesh Architecture on the AWS Cloud
>> From theCUBE studios in Palo Alto and Boston, bringing you data-driven insights from theCUBE and ETR. This is Breaking Analysis with Dave Vellante. >> A new era of data is upon us, and we're in a state of transition. You know, even our language reflects that. We rarely use the phrase big data anymore, rather we talk about digital transformation or digital business, or data-driven companies. Many have come to the realization that data is not the new oil, because unlike oil, the same data can be used over and over for different purposes. We still use terms like data as an asset. However, that same narrative, when it's put forth by the vendor and practitioner communities, includes further discussions about democratizing and sharing data. Let me ask you this, when was the last time you wanted to share your financial assets with your coworkers or your partners or your customers? Hello everyone, and welcome to this week's Wikibon Cube Insights powered by ETR. In this Breaking Analysis, we want to share our assessment of the state of the data business. We'll do so by looking at the data mesh concept and how a leading financial institution, JP Morgan Chase, is practically applying these relatively new ideas to transform its data architecture. Let's start by looking at what the data mesh is. As we've previously reported many times, data mesh is a concept and set of principles that was introduced in 2018 by Zhamak Dehghani, who's director of technology at ThoughtWorks, a global consultancy and software development company. And she created this movement because her clients, who were some of the leading firms in the world, had invested heavily in predominantly monolithic data architectures that had failed to deliver desired outcomes and ROI. So her work went deep into trying to understand that problem.
And her main conclusion that came out of this effort was that the world of data is distributed, and shoving all the data into a single monolithic architecture is an approach that fundamentally limits agility and scale. Now a profound concept of data mesh is the idea that data architectures should be organized around business lines with domain context. That the highly technical and hyper-specialized roles of a centralized cross-functional team are a key blocker to achieving our data aspirations. This is the first of four high-level principles of data mesh. So first again, that the business domain should own the data end-to-end, rather than have it go through a centralized big data technical team. Second, a self-service platform is fundamental to a successful architectural approach where data is discoverable and shareable across an organization and an ecosystem. Third, product thinking is central to the idea of data mesh. In other words, data products will power the next era of data success. And fourth, data products must be built with governance and compliance that is automated and federated. Now there's a lot more to this concept and there are tons of resources on the web to learn more, including an entire community that has formed around data mesh. But this should give you a basic idea. Now, the other point is that, in observing Zhamak Dehghani's work, she has deliberately avoided discussions around specific tooling, which I think has frustrated some folks because we all like to have references that tie to products and tools and companies. So this has been a two-edged sword in that, on the one hand it's good, because data mesh is designed to be tool-agnostic and technology-agnostic. On the other hand, it's led some folks to take liberties with the term data mesh and claim mission accomplished when their solution, you know, may be more marketing than reality. So let's look at JP Morgan Chase and their data mesh journey.
That's why I got really excited when I saw, this past week, that a team from JPMC held a meetup to discuss what they called data lake strategy via data mesh architecture. I saw that title and I thought, well, that's a weird title. And I wondered, are they just taking their legacy data lakes and claiming they're now transformed into a data mesh? But in listening to the presentation, which was over an hour long, the answer is a definitive no, not at all in my opinion. A gentleman named Scott Hollerman organized the session that comprised these three speakers here: James Reid, who's a divisional CIO at JPMC, Arup Nanda, who is a technologist and architect, and Serita Bakst, who is an information architect, again, all from JPMC. This was the most detailed and practical discussion that I've seen to date about implementing a data mesh. And this is JP Morgan's approach, and we know they're extremely savvy and technically sound. And they've invested, it has to be billions, in the past decade on data architecture across their massive company. And rather than dwell on the downsides of their big data past, I was really pleased to see how they're evolving their approach and embracing new thinking around data mesh. So today, we're going to share some of the slides that they used and comment on how it dovetails into the concept of data mesh that Zhamak Dehghani has been promoting, at least as we understand it. And dig a bit into some of the tooling that is being used by JP Morgan, particularly around its AWS cloud. So the first point is, it's all about business value. JPMC, they're in the money business, and in that world, business value is everything. So James Reid, the CIO, showed this slide and talked about their overall goals, which centered on a cloud-first strategy to modernize the JPMC platform. I think it's simple and sensible, but there's three factors on which he focused: cut costs, always, you got to do that.
Number two was about unlocking new opportunities, or accelerating time to value. But I was really happy to see number three, data reuse. That's a fundamental value ingredient in the slide that he's presenting here. And his commentary was all about aligning with the domains and maximizing data reuse, i.e. data is not like oil, and making sure there's appropriate governance around that. Now don't get caught up in the term data lake, I think it's just how JP Morgan communicates internally. It's invested in the data lake concept, so they use water analogies. They use things like data puddles, for example, which are single-project data marts, or data ponds, which comprise multiple data puddles. And these can feed into data lakes. And as we'll see, JPMC doesn't strive to have a single version of the truth from a data standpoint that resides in a monolithic data lake, rather it enables the business lines to create and own their own data lakes that comprise fit-for-purpose data products. And they do have a single truth of metadata. Okay, we'll get to that. But generally speaking, each of the domains will own their own data end-to-end and be responsible for those data products, we'll talk about that more. Now the genesis of this was sort of a cloud-first platform. JPMC is leaning into public cloud, which is ironic since, in the early days of cloud, all the financial institutions were like, never. Anyway, JPMC is going hard after it, they're adopting agile methods and microservices architectures, and it sees cloud as a fundamental enabler, but it recognizes that on-prem data must be part of the data mesh equation. Here's a slide that starts to get into some of that generic tooling, and then we'll go deeper. And I want to make a couple of points here that tie back to Zhamak Dehghani's original concept. The first is that unlike many data architectures, this puts data as products right in the fat middle of the chart.
The data products live in the business domains and are at the heart of the architecture. The databases, the Hadoop clusters, the files and APIs on the left-hand side, they serve the data product builders. The specialized roles on the right-hand side, the DBAs, the data engineers, the data scientists, the data analysts, we could have put in quality engineers, et cetera, they serve the data products. Because the data products are owned by the business, they inherently have the context that is the middle of this diagram. And you can see at the bottom of the slide, the key principles include domain thinking and end-to-end ownership of the data products. They build it, they own it, they run it, they manage it. At the same time, the goal is to democratize data with self-service as a platform. One of the biggest points of contention of data mesh is governance. And as Serita Bakst said on the meetup, metadata is your friend, and she kind of made a joke, she said, "This sounds kind of geeky, but it's important to have a metadata catalog to understand where data resides and the data lineage and overall change management." So to me, this really passed the data mesh sniff test pretty well. Let's look at data as products. CIO Reid said the most difficult thing for JPMC was getting their heads around data products, and they spent a lot of time getting this concept to work. Here's the slide they used to describe their data products as it relates to their specific industry. They said a common language and taxonomy is very important, and you can imagine how difficult that was. He said, for example, it took a lot of discussion and debate to define what a transaction was. But you can see at a high level, these three product groups around wholesale, credit risk, party, and trade and position data as products, and each of these can have sub-products, like party will have know your customer, KYC, for example.
So a key for JPMC was to start at a high level and iterate to get more granular over time. So lots of decisions had to be made around who owns the products and the sub-products. The product owners, interestingly, had to defend why that product should even exist, what boundaries should be in place, and what data sets do and don't belong in the various products. And this was a collaborative discussion, I'm sure there was contention around that between the lines of business. And which sub-products should be part of these circles? They didn't say this, but tying it back to data mesh, each of these products, whether in a data lake or a data hub or a data pond or data warehouse, data puddle, each of these is a node in the global data mesh that is discoverable and governed. And supporting this notion, Serita said that, "This should not be infrastructure-bound, logically, any of these data products, whether on-prem or in the cloud, can connect via the data mesh." So again, I felt like this really stayed true to the data mesh concept. Well, let's look at some of the key technical considerations that JPMC discussed in quite some detail. This chart here shows a diagram of how JP Morgan thinks about the problem, and some of the challenges they had to consider were, how do you write to various data stores, can you and how can you move data from one data store to another? How can data be transformed? Where's the data located? Can the data be trusted? How can it be easily accessed? Who has the right to access that data? These are all problems that technology can help solve. And to address these issues, Arup Nanda explained that the heart of this slide is the data ingestor, instead of ETL. All data producers and contributors, they send their data to the ingestor, and the ingestor then registers the data so it's in the data catalog. It does a data quality check and it tracks the lineage.
Then, data is sent to the router, which persists the data in the data store based on the best destination as informed by the registration. This is designed to be a flexible system. In other words, the data store for a data product is not fixed; it's determined at the point of inventory, and that allows changes to be easily made in one place. The router simply reads that optimal location and sends it to the appropriate data store. Now, as you see, the schema inferrer is used when there is no clear schema on write. In this case, the data product is not allowed to be consumed until the schema is inferred, and then the data goes into a raw area, and the inferrer determines the schema and then updates the inventory system so that the data can be routed to the proper location and properly tracked. So that's some of the detail of how the sausage factory works in this particular use case; it was very interesting and informative. Now let's take a look at the specific implementation on AWS and dig into some of the tooling. As described in some detail by Arup Nanda, this diagram shows the reference architecture used by this group within JP Morgan, and it shows all the various AWS services and components that support their data mesh approach. So start with the authorization block right there underneath Kinesis. Lake Formation is the single point of entitlement and has a number of buckets including, you can see there, the raw area that we just talked about, a trusted bucket, a refined bucket, et cetera. Depending on the data characteristics, the data catalog registration block, where you see the Glue catalog, determines in which bucket the router puts the data. 
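The ingest flow just described (register in the catalog, quality-check and track lineage, then route to the best destination, with a schema inferrer handling data that arrives with no clear schema on write) can be sketched in a few lines of Python. Every name here is hypothetical; this is a toy model of the pattern, not JPMC's actual components:

```python
catalog = {}     # data catalog: product name -> registration metadata
inventory = {}   # inventory: product name -> destination data store

def infer_schema(records):
    # Toy inference: take field names and types from the first record.
    return {k: type(v).__name__ for k, v in records[0].items()}

def route(schema):
    # The router reads the optimal location; the destination is not
    # fixed, so this decision can be changed in one place.
    return "warehouse" if len(schema) > 2 else "lake"

def ingest(product, records, schema=None):
    # The ingestor registers the data so it's in the catalog, with lineage tracked.
    catalog[product] = {"lineage": "producer -> ingestor", "rows": len(records)}
    if schema is None:
        # No clear schema on write: land in the raw area first; the
        # product can't be consumed until the schema is inferred.
        inventory[product] = "raw"
        schema = infer_schema(records)
    inventory[product] = route(schema)   # inferrer/router update the inventory
    return inventory[product]

dest = ingest("trades", [{"id": 1, "qty": 5, "px": 101.5}])
print(dest)
```

The key design choice the talk highlighted survives even in this sketch: because the destination is looked up through the inventory rather than hard-coded, changing where a product lands is a one-place change.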
And you can see the many AWS services in use here: identity; EMR, the Elastic MapReduce cluster from the legacy Hadoop work done over the years; Redshift Spectrum and Athena. JPMC uses Athena for single-threaded workloads and Redshift Spectrum for nested types, so they can be queried independent of each other. Now remember, very importantly, in this use case there is not a single Lake Formation; rather, multiple lines of business will be authorized to create their own lakes, and that creates a challenge. So how can that be done in a flexible and automated manner? And that's where the data mesh comes into play. So JPMC came up with this federated Lake Formation accounts idea, and each line of business can create as many data producer or consumer accounts as they desire and roll them up into their master line of business Lake Formation account. And they cross-connect these data products in a federated model. And these all roll up into a master Glue catalog so that any authorized user can find out where a specific data element is located. So this is like a superset catalog that comprises multiple sources and syncs up across the data mesh. So again, to me, this was a very well thought out and practical application of data mesh. Yes, it includes some notion of centralized management, but much of that responsibility has been passed down to the lines of business. It does roll up to a master catalog, but that's a metadata management effort that seems compulsory to ensure federated and automated governance. As well, at JPMC, the office of the chief data officer is responsible for ensuring governance and compliance throughout the federation. All right, so let's take a look at some of the suspects in this world of data mesh and bring in the ETR data. Now, of course, ETR doesn't have a data mesh category; there's no such thing as a data mesh vendor. You build a data mesh, you don't buy it. 
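The federated roll-up just described (each line of business keeps its own catalog, and a master superset catalog lets any authorized user find where a data element lives) can be modeled with a toy sketch in plain Python, deliberately avoiding the actual AWS Lake Formation and Glue APIs; the bucket paths and LOB names are made up:

```python
# Each line of business maintains its own catalog of data products.
lob_catalogs = {
    "wholesale": {"party": "s3://wholesale-trusted/party/"},
    "markets":   {"trades": "s3://markets-refined/trades/"},
}

def master_catalog(catalogs):
    """Roll the per-LOB catalogs up into one superset catalog."""
    merged = {}
    for lob, entries in catalogs.items():
        for product, location in entries.items():
            merged[product] = {"lob": lob, "location": location}
    return merged

def find(product):
    """Any authorized user can look up where a data element lives."""
    return master_catalog(lob_catalogs)[product]["location"]

print(find("trades"))
```

Note that the lines of business keep writing only to their own catalogs; the master catalog is purely derived metadata, which mirrors the point that the centralized piece here is a metadata management effort, not centralized ownership of the data itself.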
So what we did is we used the ETR dataset to select and filter on some of the culprits that we thought might contribute to the data mesh, to see how they're performing. This chart depicts a popular view that we often like to share. It's a two-dimensional graphic with net score, or spending momentum, on the vertical axis and market share, or pervasiveness in the dataset, on the horizontal axis. And we filtered the data on sectors such as analytics, data warehouse, and the adjacencies to things that might fit into data mesh. And we think that these pretty well reflect participation, though data mesh is certainly not all-encompassing. And it's a subset, obviously, of all the vendors who could play in the space. Let's make a few observations. Now as is often the case, Azure and AWS, they're almost literally off the charts with very high spending velocity and large presence in the market. Oracle you can see also stands out because much of the world's data lives inside of Oracle databases. It doesn't have the spending momentum or growth, but the company remains prominent. And you can see Google Cloud doesn't have nearly the presence in the dataset, but its momentum is highly elevated. Remember that red dotted line there, that 40% line; anything over that indicates elevated spending momentum. Let's go to Snowflake. Snowflake has consistently shown to be the gold standard in net score in the ETR dataset. It continues to maintain highly elevated spending velocity in the data. And in many ways, Snowflake, with its data marketplace and its data cloud vision and data sharing approach, fits nicely into the data mesh concept. Now, a caution: Snowflake has used the term data mesh in its marketing, but in our view, it lacks clarity, and we feel like they're still trying to figure out how to communicate what that really is. But we think there is really a lot of potential in that vision. 
Databricks is also interesting because the firm has momentum and we expect further elevated levels on the vertical axis in upcoming surveys, especially as it readies for its IPO. The firm has a strong product and managed service, and is really one to watch. Now we included a number of other database companies for obvious reasons, like Redis and Mongo, MariaDB, Couchbase and Teradata. SAP is in there as well; that's not all database, but SAP is prominent so we included them. As is IBM, more of a traditional database player, also with a big presence. Cloudera includes Hortonworks, and HPE Ezmeral comprises the MapR business that HPE acquired. So these guys got the big data movement started: Cloudera; Hortonworks, which was born out of Yahoo, the early Hadoop innovator; and MapR, which went its own course, and now that's all kind of come together in various forms. And of course, we've got Talend and Informatica in there, two data integration companies that are worth noting. We also included some of the AI and ML specialists and data science players in the mix, like DataRobot, which just did a monster $250 million round, Dataiku, H2O.ai and ThoughtSpot, which is all about democratizing data and injecting AI, and I think fits well into the data mesh concept. And you know, we put VMware Cloud in there for reference because it really is the predominant on-prem infrastructure platform. All right, let's wrap with some final thoughts. First, thanks a lot to the JP Morgan team for sharing this data. I really want to encourage practitioners and technologists to go watch the YouTube of that meetup; we'll include the link with this session. And thank you to Zhamak Dehghani and the entire data mesh community for the outstanding work that you're doing, challenging the established conventions of monolithic data architectures. 
The JPM presentation gives you real credibility; it takes data mesh well beyond concept and demonstrates how it can be and is being done. And you know, this is not a perfect world. You're going to start somewhere and there are going to be some failures. The key is to recognize that shoving everything into a monolithic data architecture won't support the massive scale and agility that you're after. It's maybe fine for smaller use cases in smaller firms, but if you're building a global platform in a data business, it's time to rethink data architecture. Now much of this is enabled by the cloud, but cloud first doesn't mean cloud only, and it doesn't mean you'll leave your on-prem data behind. On the contrary, you have to include non-public cloud data in your data mesh vision, just as JPMC has done. You've got to get some quick wins; that's crucial so you can gain credibility within the organization and grow. And one of the key takeaways from the JP Morgan team is, there is a place for dogma, like organizing around data products and domains and getting that right. On the other hand, you have to remain flexible, because technology is going to come and technology is going to go, so you've got to be flexible in that regard. And look, if you're going to embrace the metaphor of water like puddles and ponds and lakes, we suggest, maybe a little tongue in cheek, but still we believe in this, that you expand your scope to include data ocean, something John Furrier and I have talked about and laughed about extensively in theCUBE. Data oceans, it's huge. It's the new data lake; go transcend data lake, think oceans. And think about this: just as we're evolving our language, we should be evolving our metrics. Much of the last decade of big data was around just getting the stuff to work, getting it up and running, standing up infrastructure and managing massive, how much data you got? Massive amounts of data. 
And there were many KPIs built around, again, standing up that infrastructure and ingesting data, a lot of technical KPIs. This decade is not just about enabling better insights, it's more than that. Data mesh points us to a new era of data value, and that requires new metrics around monetizing data products, like how long does it take to go from data product conception to monetization? And how does that compare to what it is today? And what is the time to quality? If the business owns the data, and the business has the context, the quality that comes out of the chute should be, at a basic level, pretty good, and at a higher mark than out of a big data team with no business context. Automation, AI, and very importantly, organizational restructuring of our data teams will heavily contribute to success in the coming years. So we encourage you, learn, lean in and create your data future. Okay, that's it for now. Remember, these episodes are all available as podcasts wherever you listen; all you got to do is search "breaking analysis podcast" and please subscribe. Check out ETR's website at etr.plus for all the data and all the survey information. We publish a full report every week on wikibon.com and siliconangle.com. And you can get in touch with us: email me at david.vellante@siliconangle.com, you can DM me @dvellante, or you can comment on my LinkedIn posts. This is Dave Vellante for theCUBE Insights powered by ETR. Have a great week everybody, stay safe, be well, and we'll see you next time. (upbeat music)
Deploying AI in the Enterprise
(orchestral music) >> Hi, I'm Peter Burris and welcome to another digital community event. As we do with all digital community events, we're gonna start off by having a series of conversations with real thought leaders about a topic that's pressing to today's enterprises as they try to achieve new classes of business outcomes with technology. At the end of that series of conversations, we're gonna go into a crowd chat and give you an opportunity to voice your opinions and ask your questions. So stay with us throughout. So, what are we going to be talking about today? We're going to be talking about the challenge that businesses face as they try to apply AI, ML, and new classes of analytics to their very challenging, very difficult, but nonetheless very value-producing outcomes associated with data. The challenge that all these businesses have is that often, you spend too much time in the infrastructure and not enough time solving the problem. And so what's required is new classes of technology and new classes of partnerships and business arrangements that allow for us to mask the underlying infrastructure complexity from data science practitioners, so that they can focus more time and attention on building out the outcomes that the business wants and a sustained business capability so that we can continue to do so. Once again, at the end of this series of conversations, stay with us, so that we can have that crowd chat and you can, again, ask your questions, provide your insights, and participate with the community to help all of us move faster in this crucial direction for better AI, better ML and better analytics. So, the first conversation we're going to have is with Anant Chintamaneni. Anant's the Vice President of Products at BlueData. Anant, welcome to theCUBE. >> Hi Peter, it's great to be here. I think the topic that you just outlined is a very fascinating and interesting one. 
Over the last 10 years, data and analytics have been used to create transformative experiences and drive a lot of business growth. You look at companies like Uber, AirBnB, and you know, Spotify; practically every industry's being disrupted. And the reason why they're able to do this is because data is in their DNA; it's their key asset and they've leveraged it in every aspect of their product development to deliver amazing experiences and drive business growth. And the reason why they're able to do this is they've been able to leverage open-source technologies, data science techniques, and big data, fast data, all types of data to extract that business value and inject analytics into every part of their business process. Enterprises of all sizes want to take advantage of those same assets that the new digital companies are taking and drive digital transformation and innovation in their organizations. But there's a number of challenges. First and foremost, if you look at the enterprises where data was not necessarily in their DNA, injecting that into their DNA is a big challenge. The executives, the executive branch, definitely want to understand where they want to apply AI, how to kind of identify which use cases to go after. There is some recognition coming in. They want faster time-to-value and they're willing to invest in that. >> And they want to focus more on the actual outcomes they seek as opposed to the technology selection that's required to achieve those outcomes. >> Absolutely. 
I think it's, you know, a boardroom mandate for them to drive new business outcomes, new business models, but I think there is still some level of misalignment between the executive branch and the data worker community, which they're trying to upgrade with the new-age data scientists, the AI developer, and then you have IT in the middle who has to basically bridge the gap and enable the digital transformation journey and provide the infrastructure, provide the capabilities. >> So we've got a situation where people readily acknowledge the potential of some of these new AI, ML, big data related technologies, but we've got a mismatch between the executives that are trying to do evidence-based management, drive new models, the IT organization who's struggling to deal with data-first technologies, and data scientists who are few and far between, and leave quickly if they don't get the tooling that they need. So, what's the way forward? That's the problem. How do we move forward? >> Yeah, so I think, you know, I think we have to double-click into some of the problems. So the data scientists, they want to build a tool chain that leverages the best-in-class open source technologies to solve the problem at hand; they want to be able to compile these tool chains, they want to be able to apply and create new algorithms and operationalize, and do it in a very iterative cycle. It's a continuous development, continuous improvement process, which is at odds with what IT can deliver, which is they have to deliver data that is dispersed all over the place to these data scientists. They need to be able to provide infrastructure, which today they're not; there's an impedance mismatch. It takes them months, if not years, to be able to make those available, make that infrastructure available. And last but not least, security and control. 
It's just fundamentally not the way they've worked, where they can make data and new tool chains available very quickly to the data scientists. And the executives, it's all about faster time-to-value, so there's a little bit of an expectation mismatch as well there, and so those are some of the fundamental problems. There's also reproducibility; like, once you've created an analytics model, to be able to reproduce that at scale, to be then able to govern that and make sure that it's producing the right results, is fundamentally a challenge. >> Auditability of that process. >> Absolutely, auditability. And, in general, being able to apply this sort of model for many different business problems so you can drive outcomes in different parts of your business. So there's a huge number of problems here. And so what I believe, and what we've seen with some of these larger companies, the new digital companies that are driving business value, is they have invested in a unified platform where they've made the infrastructure invisible by leveraging cloud technologies or containers, and essentially made it such that the data scientists don't have to worry about the infrastructure. They can be a lot more agile, they can quickly create the tool chains that work for the specific business problem at hand, scale it up and down as needed, and be able to access data where it lies, whether it's on-prem, whether it's in the cloud, or whether it's a hybrid model. And so that's something that's required from a unified platform where you can do your rapid prototyping, you can do your development, and ultimately, the business outcome and the value come when you operationalize it and inject it into your business processes. So, I think fundamentally, this kind of a unified platform is critical. Which, I think, a lot of the new age companies have, but is missing with a lot of the enterprises. 
>> So, a big challenge for the enterprise over the next few years is to bring these three groups together, the business, the data science world and the infrastructure world, and others, to help with those problems and apply it successfully to some of the new business challenges that we have. >> Yeah, and I would add one last point: we are on this continuous journey, as I mentioned. This is a world of open source technologies that are coming out from a lot of the large organizations out there, whether it's your Googles and your Facebooks. And so there is an evolution in these technologies, much like we've evolved from big data and data management to capture the data. The next sort of phase is around data exploitation with artificial intelligence and machine learning type techniques. And so, it's extremely important that this platform enables these organizations to future-proof themselves. So as new technologies come in, they can leverage them >> Great point. >> for delivering exponential business value. >> Deliver value now, but show a path to delivering value in the future as all of these technologies and practices evolve. >> Absolutely. >> Excellent, all right, Anant Chintamaneni, thanks very much for giving us some insight into the nature of the problems that enterprises face and some of the way forward. We're gonna be right back, and we're gonna talk about how to actually do this in a second. (light techno music) >> Introducing BlueData EPIC, the leading container-based software platform for distributed AI, machine learning, deep learning and analytics environments, whether on-prem, in the cloud or in a hybrid model. Data scientists need to build models utilizing various stacks of AI, ML and DL applications and libraries. However, installing and validating these environments is time consuming and prone to errors. BlueData provides the ability to spin up these environments on demand. 
The BlueData EPIC app store includes best-of-breed, ready-to-run Docker-based application images, like TensorFlow and H2O Driverless AI. Teams can also add their own images, to provide the latest tools that data scientists prefer and ensure compliance with enterprise standards. They can use the quick-launch button, which provides pre-configured templates with the appropriate application image and resources. For example, they can instantly launch a new sandbox environment using the template for TensorFlow with a Jupyter Notebook. Within just a few minutes, it'll be automatically configured with GPUs and easy access to their data. Users can launch experiments and make GPUs automatically available for analysis. In this case, the H2O environment was set up with one GPU. With BlueData EPIC, users can also deploy endpoints with the appropriate runtime, and the inference runtimes can use CPUs or GPUs. With the container-based BlueData platform, you can deploy fully configured distributed environments within a matter of minutes, whether on-prem, in the public cloud, or in a hybrid architecture. BlueData was recently acquired by Hewlett Packard Enterprise. And now, HPE and BlueData are joining forces to help you on your AI journey. (light techno music) To learn more, visit www.BlueData.com >> And we're back. I'm Peter Burris, and we're continuing to have this conversation about how businesses are turning experience with the problems of advanced analytics, and the solutions that they seek, into actual systems that deliver continuous, ongoing value and achieve the business capabilities required to make possible these advanced outcomes associated with analytics, AI and ML. And to do that, we've got two great guests with us. We've got Kumar Sreekanti, who is the co-founder and CEO of BlueData. Kumar, welcome back to theCUBE. >> Thank you, it is nice to be here, back again. >> And Kumar, you're being joined by a customer. 
Ramesh Thyagarajan is the executive director of the Advisory Board Company, which is part of Optum now. Ramesh, welcome to theCUBE. >> Great to be here. >> Alright, so Kumar, let's start with you. I mentioned up front this notion of turning technology and understanding into actual business capabilities to deliver outcomes. What has been BlueData's journey to make that happen? >> Yeah, it all started six years ago, Peter. It was a bold vision and a big idea, and no pun intended on big data, which was an emerging market then. And as everybody knows, the data was enormous and there was a lot of innovation around the periphery, but nobody was paying attention to how to make the big data consumable in the enterprise. And I saw an enormous opportunity to make this data more consumable in the enterprise and to give a cloud-like experience with the agility and elasticity. So, our vision was to build a software infrastructure platform like VMware, specially focused on data-intensive distributed applications, and this platform will allow enterprises to build cloud-like experiences both on enterprise as well as on hybrid clouds, so that it paves the journey to their cloud experience. So I was very fortunate to put together a team, and I found good partners like Intel. So that actually is the genesis of BlueData. So, if you look back into the last six years, big data itself has gone through a lot of evolution, and so the marketplace and the enterprises have gone from offline analytics to AI, ML based workloads that are actually giving them predictive and descriptive analytics. What BlueData has done, by making the infrastructure invisible, by making the tool set completely available as the tool set itself is evolving, is, in the process, actually create so many game-changing software technologies. For example, we are the first end-to-end containerized enterprise solution that gives you distributed applications. 
And we built a technology called DataTap that provides compute-data separation, so that you don't have to actually copy the data, which is a boon for enterprises. We also actually built multitenancy so those enterprises can run multiple workloads on the same data, and Ramesh will tell you in a second here, in the healthcare enterprise, the multitenancy is such a very important element. And finally, we also actually contributed to many open source technologies; for example, we have a project called KubeDirector, which is actually our own approach to running stateful workloads on Kubernetes, and we are very happy to see customers like Ramesh using BlueData. >> Sounds like quite a journey, and obviously you've intercepted companies like the Advisory Board Company. So Ramesh, a lot of enterprises have mastered or, you know, understood how to create data lakes with Hadoop, but then found that they still weren't able to connect to some of the outcomes that they saw. Is that the experience that you had? >> Right, to be precise, that is one of the kinds of problems we have. It's not just the data lake that we need to be able to do the workflows or other things; being a traditional company, being in the business for a long time, we also have a lot of data assets that are not part of this data lake. We were finding it hard to figure out how we get at that data; gathering it and putting it in a data lake is a duplication of work. We were looking for some kind of solution that would help us to gather the benefits of leaving the data alone but still be able to get into it. 
This is where (mumbles).
This is where we were looking for things, and then I was lucky and fortunate to run into Kumar and his crew at one of the Hadoop conferences, and they demonstrated the way it can be done, so it immediately hit home; it's a big hit with us, and then we went back, did a POC, and very quickly adopted the technology. And that is also one of the benefits of adopting this technology: the level of containerization they are doing is helping me to address many needs. My data analysts, the data engineers and the data scientists, I'm able to serve all of them, which otherwise wouldn't be possible for me with just this plain very (mumbles). >> So it sounds as though the partnership with BlueData has allowed you to focus on activities and problems and challenges above the technology, so that you can actually start bringing data science, business objectives and infrastructure people together. Have I got that right? >> Absolutely. So BlueData is helping me to tie them all together and provide added value to my business. We being in healthcare, the importance is we need to be able to look at the large data sets over a period of time in order to figure out how a patient's health journey is happening. That is very important so that we can figure out the ways and means in which we can lower the cost of health care and also provide insights to the physicians so they can help get people better at health. >> So we're getting great outcomes today, especially around, as you said, that patient journey, where all the constituents can get access to those insights without necessarily having to learn a whole bunch of new infrastructure stuff, but presumably you need more. We're talking about a new world that you mentioned before upfront, a new world of AI, ML, a lot of changes. 
A lot of our enterprise customers are telling us it's especially important that they find companies that not only deliver something today but demonstrate a commitment to sustain that value delivery process, especially as the whole analytics world evolves. Are you experiencing that as well? >> Yes, we are experiencing that, and one of the great advantages of the platform, the BlueData platform, is that it gave me this ability to add new functionality, be it TensorFlow, be it H2O, be it RStudio, anything that I needed. I call them, they give me the images that are plug-and-play; just put them in, and all the plumbing is practically transparent, nobody needs to know how it is achieved. Now, in order to get to the next level of the predictive and prescriptive analytics, it is not just you having the data; you need to be able to have your curated data assets and processes on top of a platform that will help your data scientists make use of it. One of the biggest challenges is that the data scientists are not able to get their hands on data. The BlueData platform gives me the ability to do it and ensure all the security needs, and all the compliances with the various regulatory requirements we need to meet. >> Kumar, congratulations. >> Thank you. >> Sounds like you have a happy customer. >> Thank you. >> One of the challenges that every entrepreneur faces is how do you scale the business. So talk to us about where you are and the decisions that you made recently to achieve that. >> As an entrepreneur, when you start a company, odds are against you, right? You're always worried about it, right. You make so many sacrifices, yourself and your team and all that, but the customer is the king. 
The most important thing for us is to find satisfied customers like Ramesh, so we were very happy, and BlueData was very successful in finding that customer, because I think, as you pointed out, as Ramesh pointed out, we provide that clean solution for the customer. But as you go through this journey as a co-founder and CEO, you always worry about how you scale to the next level. So we had partnerships with many companies including HPE, and when this opportunity came in front of me, with myself and my board, we saw this opportunity of combining the forces of BlueData's satisfied customers, innovative technology and team with HPE's brand name, their world-class service, their investment in R&D, and their very long, large list of enterprise customers. We think putting these two things together provides that next journey in BlueData's innovation and for BlueData's customers. >> Excellent, so once again Kumar Sreekanti, co-founder and CEO of BlueData, and Ramesh Thyagarajan, who is the executive director of the Advisory Board Company and part of Optum, I want to thank both of you for being on theCUBE. >> Thank you >> Thank you, great to be here. >> Now let's hear a little bit more about how this notion of bringing BlueData and HPE together is generating new classes of value that are making things happen today, but are also gonna make things happen for customers in the future. And to do that, we've got Dave Vellante, who's with SiliconANGLE Wikibon, joined by Patrick Osbourne, who's with HPE, in our Marlborough studio. So Dave, over to you. >> Thanks Peter. We're here with Patrick Osbourne, the vice president and general manager of big data and analytics at Hewlett Packard Enterprise. Patrick, thanks for coming on. >> Thanks for having us. >> So we heard from Kumar, let's hear from you. Why did HPE purchase, acquire BlueData? >> So if you think about it from three angles: platform, people and customers, right. 
Great platform, built for scale, addressing a number of these new workloads in big data, analytics and certainly AI. The people they have are amazing, right: a great engineering team, an awesome customer success team, a team of data scientists. All folks with some really great knowledge in this space, so they're gonna be a great addition to HPE. And also on the customer side, great logos, major Fortune 500 customers in the financial services vertical, healthcare, pharma, manufacturing, so a huge opportunity for us to scale that within the HPE context. >> Okay, so talk about how it fits into your strategy, specifically what are you gonna do with it? What are the priorities, can you share some roadmap? >> Yeah, so take a look at HPE's strategy. We talk about hybrid cloud, and specifically edge to core to cloud, and the common theme that runs through that is data: data-driven enterprises. So we see the BlueData EPIC platform as a way to help our customers quickly deploy these new mode two applications that are fueling their digital transformation. So we have some great plans. We're gonna certainly invest in all the functions, right. So we're gonna do a force multiplier not only on product engineering and product delivery but also on go-to-market and customer success. We're gonna come out on day one with some really good reference architectures with some of our partners like Cloudera and H2O. We've got some very scalable building-block architectures to marry up the BlueData platform with our Apollo systems, for those of you who have seen those in the market. We've got our Elastic Platform for Analytics for customers who run these workloads, and now you'll be able to virtualize those in containers, and we're gonna be building out a big services practice in this area. So a lot of customers often say to us, we don't have the people to do this, right. 
So we're gonna bring those people to you as HPE through Pointnext: advisory services, implementation, ongoing help with customers. So it's going to be a really fantastic start. >> Apollo, you mentioned Apollo. I think of Apollo sometimes as HPC, high performance computing, and we've had a lot of discussion about how that's sort of seeping into the mainstream. Is that what you're seeing? >> Yeah, absolutely. I mean, we know that a lot of our customers have traditional workloads, and they're on the path to almost completely virtualizing those, right, but a lot of the innovation right now is going on in this mode two world. So your big data and analytics pipeline is getting longer, you're introducing new experiences on top of your product, and that's fueling, essentially, commercial HPC. And now that folks are using techniques like AI and model inference to make those services more scalable and more automated, we're starting to bring in more of these platforms, these scalable architectures like Apollo. >> So it sounds like your roadmap has a lot of integration plans across the HPE portfolio. We certainly saw that with Nimble, but BlueData was working with a lot of different companies. It's software, so is the plan to remain open, or is this an HPE thing? >> Yeah, we absolutely want to be open. HPE is all about hybrid cloud, right, and that has a couple of different implications. There's your choice of on-prem versus off-prem, and BlueData has a great capability to run some of these workloads. It essentially allows you to do separation of compute and storage, and in the world of AI and analytics we can run it off-prem as well, in the public cloud. But then we also have choice for customers in any customer's private cloud. So if that means they want to run on other infrastructure besides HPE, we're gonna support that; we have existing customers that do that. 
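For readers unfamiliar with the "separation of compute and storage" Patrick mentions, the idea can be illustrated with a generic container-orchestration config: a stateless compute tier that scales by replica count, mounting a storage volume that is provisioned and sized independently. This is a hypothetical Kubernetes sketch for illustration only, not BlueData's actual EPIC configuration; all names, sizes and the image URL are made up.

```yaml
# Hypothetical sketch: compute and storage as independently scaled tiers.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: analytics-data          # storage tier, sized on its own
spec:
  accessModes: ["ReadWriteMany"]
  resources:
    requests:
      storage: 500Gi
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: analytics-compute       # compute tier
spec:
  replicas: 4                   # scale compute without touching storage
  selector:
    matchLabels: {app: analytics}
  template:
    metadata:
      labels: {app: analytics}
    spec:
      containers:
      - name: worker
        image: example.com/analytics-worker:latest   # hypothetical image
        volumeMounts:
        - name: data
          mountPath: /data
      volumes:
      - name: data
        persistentVolumeClaim:
          claimName: analytics-data
```

Because the claim exists independently of the deployment, either side can be resized or replaced without redeploying the other, which is the property being described here.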
We're also gonna provide infrastructure that marries the software and the hardware together with frameworks like InfoSight, which we feel will be a much better experience for the customers, but we'll absolutely be open and absolutely have choice. >> All right, what about the business impact? To take the customer perspective, what can they expect? >> From a customer perspective, we're really just looking to accelerate deployment of AI in the enterprise, and that has a lot of implications for us. We're gonna have very scalable infrastructure for them, and we're gonna be really focused on this very dynamic AI and ML application ecosystem through partnerships and support within the BlueData platform. We want to provide a SaaS experience, right. So whether that's GPUs or accelerators as a service or analytics as a service, we really want to fuel innovation as a service. We want to empower those data scientists; they're really hard to find and really hard to retain within your organization, so we want to unlock all that capability and really focus on innovation for the customers. >> Yeah, and they spend a lot of time wrangling data, so you're really going to simplify that with the cloud. Patrick, thank you, I appreciate it. >> Thank you very much. >> Alright Peter, back to you in Palo Alto. >> And welcome back, I'm Peter Burris, and we've been talking a lot in the industry about how new tooling and new processes can achieve new classes of analytics, AI and ML outcomes within a business. But if you don't get the people side of that right, you're not going to achieve the full range of benefits that you might get out of your investments. Now, to talk a little bit about how important the data science practitioner is in this equation, we've got two great guests with us. Nanda Vijaydev is the chief data scientist of BlueData. Welcome to theCUBE. >> Thank you Peter, happy to be here. 
>> Ingrid Burton is the CMO and business leader at H2O.ai. Ingrid, welcome to theCUBE. >> Thank you so much for having us. >> So Nanda, let's start with you. Again, having a nice platform is very, very important, but how does that turn into making the data science practitioner's life easier so they can deliver more business value? >> Yeah, thank you, it's a great question. I think at the end of the day, for a data scientist, what's most important is: did you understand the question that somebody asked you, and what is expected of you when you deliver something. Then you go about finding what you need for that: data, systems, and working with people, the experts in the process, to make sure that the hypothesis you're testing is structured in a nice way, where it is testable and modular, and that you have a way to go back, show your results, and keep doing this in an iterative manner. That's the biggest thing, because the satisfaction for a data scientist is when somebody actually takes this, makes use of it, and puts it in production, right. To make this whole thing easier, we definitely need some way of bringing it all together. That's really where things differ from traditional data science, where everything was monolithic, there was one system and a very set way of doing things. Now it is not so: with the growing types of data and the growing types of computation algorithms available, there's a lot of opportunity and, at the same time, a lot of uncertainty. So it's really about putting that structure in place, making sure you get the best of everything and still deliver the results. That is what all data scientists strive for. >> And the data scientist wants to operate in the world of uncertainty related to the business question, reducing that uncertainty, and not deal with the uncertainty associated with the underlying infrastructure. 
>> Absolutely, absolutely. As a data scientist, a lot of time used to be spent in the past on where the data is. Then the question became, tell us what data you want and we'll give it to you, but because the data always came in a nice structured, row-column format, it had already lost a lot of the context we had to look for. So it is really not about going back to systems that are pre-built or pre-processed; it's about getting access to that real, raw data. It's getting access to the information as it came, so you can actually make the best judgment of how to go forward with it. >> So you describe a world where business, technology and data science practitioners are working together, but let's face it, there's an enormous amount of change in the industry and, quite frankly, a deficit of expertise, and I think that requires new types of partnerships, new types of collaboration, a real (mumbles) approach. And Ingrid, I want to talk about what H2O.ai is doing as a partner of BlueData and HPE to ensure that you're complementing these skills in pursuit of, or in service to, the customer's objectives. >> Absolutely, thank you for that. So as Nanda described, data scientists want to get to answers, and what we do at H2O.ai is provide the algorithms and the platforms for data scientists to be successful. So when they want to try to solve a problem, they need to work with their business leaders, they need to work with IT, and they actually don't want to do all the heavy lifting, they want to solve that problem. So what we do is automatic machine learning platforms: we optimize algorithms and do a lot of the heavy lifting that novice data scientists need, and we help expert data scientists as well. 
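The "automatic machine learning" Ingrid describes — try many candidate models, do the heavy lifting of evaluation, and surface the best — can be sketched in a few lines. The sketch below is a generic scikit-learn stand-in for illustration, not H2O.ai's implementation (H2O's AutoML and Driverless AI also automate feature engineering, hyperparameter tuning and ensembling, and run on an H2O cluster); the dataset and the three candidate models are illustrative choices.

```python
# Minimal AutoML-style sketch: score several candidate models by
# cross-validation and rank them on a "leaderboard", best first.
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)

candidates = {
    "logistic_regression": LogisticRegression(max_iter=5000),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "gradient_boosting": GradientBoostingClassifier(random_state=0),
}

# Rank each candidate by mean 5-fold cross-validated accuracy.
leaderboard = sorted(
    ((name, cross_val_score(model, X, y, cv=5).mean())
     for name, model in candidates.items()),
    key=lambda item: item[1],
    reverse=True,
)
best_name, best_score = leaderboard[0]
print(best_name, round(best_score, 3))
```

Real AutoML systems search a far larger space than three fixed models, but the core loop — candidates in, leaderboard out — is the same, which is why the practitioner only has to frame the question and supply the data.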
I talk about it as algorithms to answers: actually solving business problems with predictions, and that's what machine learning is really all about. What we're seeing in the industry right now, and BlueData is a great example, is taking some of the hard stuff away from the data scientist and making them successful. So working with BlueData and HPE, together we really solve the problems that businesses are looking at. It's really transformative. We've all been through the digital transformation journey; we are now in what I would term an AI transformation of sorts, and businesses are going to the next step. They've got their data, the infrastructure is seamlessly working together, the clusters and the containerization, which is very important. Now what we're trying to do is get to the answers, and using automatic machine learning platforms is probably the best way forward. >> It's still hard stuff, but you're trying to keep data science practitioners from focusing on hard stuff that doesn't directly deliver value. >> Exactly, it doesn't deliver anything for them. They shouldn't have to worry about the infrastructure; they should worry about getting the answers to the business problems they've been asked to solve. >> So let's talk a little bit about some of the new business problems that are going to be solvable by these kinds of partnerships between BlueData and H2O.ai. Nanda, to start: what gets you excited when we think about the new types of business problems that customers are gonna be able to solve? >> Yeah, I think it is really that the question that comes to you is not filtered through someone else's lens. Someone is trying an optimization problem, someone is trying to do new product discovery, and all of this is based on a combination of being both data-driven and evidence-based, right. 
For us as data scientists, what excites me is that I now have the flexibility to choose best-of-breed technologies. I should not be restricted to what is given to me by an IT organization, but at the same time, for things to work in an organization, there has to be some level of control. So it is really about having these types of environments or platforms where there is a team that can work on the control aspect, but as a data scientist I don't have to worry about it. I have my flexibility and my tools of choice that I can use. At the same time, when you talk about data, security is a big deal in companies, and a lot of times data scientists don't get access to data because of the layers and layers of security they have to go through. So the exciting opportunity for me is when someone else takes care of that problem: just tell me where the approved source of data is that I can go to, don't filter the data for me, don't already structure the data for me, just tell me it's an approved source, and that gives me the flexibility to actually go and take that information and build. Having those controls taken care of well before I get into the picture as a data scientist makes it extremely easy for us to focus, to her point, on the problem, to access the best-of-breed technology, and to give back and have that interaction with the business users on an ongoing basis. >> So a focus on speed to value, so that you're not messing around with a bunch of underlying infrastructure; governance remaining in place, so that you know the appropriate limits of using the data; and security embedded within that entire model, without reducing the fidelity or the quality of the data. >> Absolutely. >> Would you agree with those? 
>> I totally agree with all the points that she brought up, and we have joint customers in the market today solving very complex problems. We have joint customers in financial services. We have customers in healthcare that are really trying to solve today's business problems, and these are everything from: how do I extend new credit to somebody? How do I know what product to offer them next? What customer recommendations can I make next? Why did that customer churn? How do I reach new people? How do I do drug discovery? How do I give a patient a better prescription? How do I pinpoint disease that I couldn't have seen before? Now we have all that data available and it's very rich, and data is a team sport. It takes data scientists, it takes business leaders and it takes IT to make it all work together, and together the two companies are really working to solve the problems our customers are facing, working with our customers because they have the intellectual knowledge of what their problems are. We are providing the tools to help them solve those problems. >> A fantastic conversation about what is necessary to ensure that the data science practitioner remains at the center and is the ultimate test of whether or not these systems and capabilities are working for the business. Nanda Vijaydev, chief data scientist of BlueData, and Ingrid Burton, CMO and business leader at H2O.ai, thank you very much for being on theCUBE. >> Thank you. >> Thank you so much. >> So let's now spend some time talking about how ultimately all of this comes together and what you're going to do as you participate in the crowd chat. To do that, let me throw it back to Dave Vellante in our Marlborough studio. >> We're back with Patrick Osborne. Alright Patrick, let's wrap up here and summarize. We heard how you're gonna help data science teams, right. >> Yup: speed, agility, time to value. 
Alright, and I know a bunch of folks at BlueData; the engineering team is very, very strong, so you picked up a good asset there. >> Yeah, it's amazing technology. The founders have a long lineage of software development and adoption in the market, so we're gonna invest in them and let them loose. >> And then we heard the better-together story from you: you've got a roadmap, you're making some investments here, as I heard. >> Yeah, we're really focused on hybrid cloud and we want to have all these as-a-service experiences, whether it's through GreenLake or providing innovation; AI and GPUs as a service are something we're gonna continue to provide our customers as we move along. >> Okay, and then we heard the data science angle, the data science community and the partner angle, that's exciting. >> Yeah, I think it's two approaches as well. We have data scientists, right, so we're gonna bring that capability to bear, whether it's through the product experience or through a professional services organization. And then, number two, this is a very dynamic ecosystem from an application standpoint. There are commercial applications, there's certainly open source, and we're gonna bring a fully vetted, full-stack experience our customers can feel confident in. It's a very dynamic space. >> Excellent, well thank you very much. >> Thank you. >> Alright, now it's your turn. Go into the crowd chat and start talking. Ask questions; we're gonna have polls and we've got experts in there, so let's crowd chat.