Opening Panel | Generative AI: Hype or Reality | AWS Startup Showcase S3 E1
(light airy music)
>> Hello, everyone, welcome to theCUBE's presentation of the AWS Startup Showcase, AI and machine learning: "Top Startups Building Generative AI on AWS." This is season three, episode one of the ongoing series covering the exciting startups from the AWS ecosystem, talking about AI and machine learning. We have three great guests: Bratin Saha, Vice President of Machine Learning and AI Services at Amazon Web Services; Tom Mason, the CTO of Stability AI; and Aidan Gomez, CEO and co-founder of Cohere. Two practitioners doing startups, and AWS. Gentlemen, thank you for opening up this session, this episode. Thanks for coming on.
>> Thank you.
>> Thank you.
>> Thank you.
>> So the topic is hype versus reality. I think we're all agreed: hype is great, but the reality's here. I want to get into it. Generative AI's got all the momentum, it's going mainstream, it's kind of come out from behind the ropes, it's now mainstream. We saw the success of ChatGPT, which opened everyone's eyes, but there's so much more going on. Let's jump in and get your early perspectives: what should people be talking about right now? What are you guys working on? We'll start with AWS. What's the big focus right now for you guys as you come into this market that's highly active, highly hyped up, but where people see value right out of the gate?
>> You know, we have been working on generative AI for some time. In fact, last year we released CodeWhisperer, which is about using generative AI for software development, and a number of customers are using it and getting real value out of it. So generative AI is now something mainstream that can be used by enterprise users. And we have also been partnering with a number of other companies. So, you know, Stability AI, we've been partnering with them a lot. We want to be partnering with other companies as well.
And we're seeing how we do three things: you know, first is providing the most efficient infrastructure for generative AI, and that is where, you know, things like Trainium, things like Inferentia, things like SageMaker come in. Then next is the set of models, and then the third is the kind of applications like CodeWhisperer and so on. So, you know, it's early days yet, but clearly there's a lot of amazing capability that will come out, and it's something, you know, our customers are starting to pay a lot of attention to.
>> Tom, talk about your company and what your focus is, and why the Amazon Web Services relationship's important for you.
>> So yeah, we're primarily committed to making incredible open source foundation models, and obviously Stable Diffusion's been our kind of first big model there, which we trained all on AWS. We've been working with them over the last year and a half to develop, obviously, a big cluster, and bring all that compute to training these models at scale, which has been a really successful partnership. And we're excited to take it further this year as we develop the commercial strategy of the business and build out, you know, the ability for enterprise customers to come and get all the value from these models that we think they can get. So we're really excited about the future. We've got a hugely exciting pipeline for this year with new modalities and video models and wonderful things, trying to solve images once and for all and get the kind of general value proposition correct for customers. So it's a really exciting time, and we're very honored to be part of it.
>> It's great to see some of your customers doing so well out there. Congratulations to your team. Appreciate that. Aidan, let's get into what you guys do. What does Cohere do? What are you excited about right now?
>> Yeah, so Cohere builds large language models, which are the backbone of applications like ChatGPT and GPT-3.
We're extremely focused on solving the issues with adoption for enterprise. So it's great that you can make a super flashy demo for consumers, but it takes a lot to actually get it into billion-user products and large global enterprises. So about six months ago, we released our command models, which are some of the best large language models that exist. And in December, we released our multilingual text understanding models, and that's in over a hundred different languages, trained on, you know, authentic data directly from native speakers. And so we're super excited to continue pushing this into enterprise and solving those barriers to adoption, making this transformation a reality.
>> Just real quick, while I've got you there on the new products coming out: where are we in the progress? People see some of the new stuff out there right now, but there's so much more headroom. Can you just scope out in your mind what that looks like, from a headroom standpoint? Okay, we see ChatGPT. "Oh yeah, it writes my papers for me, does some homework for me." I mean, okay, yawn, maybe people say that, (Aidan chuckles) people are excited or people are blown away. I mean, it's helped theCUBE out, it helps me, you know, feed up a little bit from my write-ups, but it's not always perfect.
>> Yeah, at the moment it's like a writing assistant, right? And it's still super early in the technology's trajectory. I think it's fascinating and it's interesting, but its impact is still really limited. I think in the next year, like within the next eight months, we're going to see some major changes. You've already seen the very first hints of that with stuff like Bing Chat, where you augment these dialogue models with an external knowledge base. So now the models can be kept up to date to the millisecond, right? Because they can search the web and they can see events that happened a millisecond ago.
But that's still limited in the sense that when you ask the question, what can these models actually do? Well, they can just write text back at you. That's the extent of what they can do. And so the real project, the real effort that I think we're all working towards, is actually taking action. So what happens when you give these models the ability to use tools, to use APIs? What can they do when they can actually effect change out in the real world, beyond just streaming text back at the user? I think that's the really exciting piece.
>> Okay, so I wanted to tee that up early in the segment 'cause I want to get into the customer applications. We're seeing early adopters come in, using the technology because they have a lot of data, they have a lot of large language model opportunities, and then there's a big fast-follower wave coming behind it. I call that the people who are going to jump in the pool early and get into it. They might not be advanced. Can you guys share what customer applications are being used with large language and vision models today, how they're using them to transform on the early adopter side, and how that's a telltale sign of what's to come?
>> You know, one of the things we have been seeing, both with the text models that Aidan talked about as well as the vision models that Stability AI does, Tom, is customers are really using them to change the way you interact with information. You know, one example of a customer that we have is someone who's kind of using that to query customer conversations and ask questions like, you know, "What was the customer issue? How did we solve it?" and trying to get those kinds of insights that were previously much harder to get. And then of course software is a big area. You know, generating software and, you know, just deploying it in production. Those have been really big areas where we have seen customers start.
You know, looking at documentation: instead of, you know, searching for stuff and so on, you just have an interactive way in which you can look at the documentation for a product. You know, all of this goes to where we need to take the technology. One part of which is, you know, the models have to be there, but they have to work reliably in a production setting at scale, with privacy, with security; and, you know, making sure all of this is happening is going to be really key. That is what, you know, we at AWS are looking to do, which is work with partners like Stability and others, and in the open source, and really take all of these and make them available at scale to customers, where they work reliably.
>> Tom, Aidan, what are your thoughts on this? Where are customers landing on this, the first use cases or set of low-hanging-fruit use cases or applications?
>> Yeah, so I think the first group of adopters that really found product-market fit were the copywriting companies. So one great example of that is HyperWrite. Another one is Jasper. And for Cohere, that's the tip of the iceberg; there's a very long tail of usage from a bunch of different applications. HyperWrite is one of our customers; they help beat writer's block by drafting blog posts, emails, and marketing copy. We also have a global audio streaming platform, which is using us to power a search engine that can comb through podcast transcripts in a bunch of different languages. Then a global apparel brand, which is using us to transform how they interact with their customers through a virtual assistant, and two dozen global news outlets who are using us for news summarization. So really, these large language models can be deployed all over the place, into every single industry sector; language is everywhere. It's hard to think of any company on Earth that doesn't use language. So it's very, very-
>> We're doing it right now. We've got the language coming in.
>> Exactly.
>> We'll transcribe this puppy. All right. Tom, on your side, what do you see the-
>> Yeah, we're seeing some amazing applications of it, and, you know, I guess that's partly because of the growth in the open source community; some of these applications have come from there and are then triggering this secondary wave of innovation, which is coming a lot from, you know, controllability and explainability of the model. But we've got companies like, you know, Jasper, which Aidan mentioned, who are using Stable Diffusion for image generation in blog creation, content creation. We've got Lensa, you know, which exploded, and is built on top of Stable Diffusion, with fine tuning so people can bring themselves and their pets and, you know, everything into the models. So we've now got fine-tuned Stable Diffusion at scale, which has democratized, you know, that process, which is really fun to see. Lensa, you know, exploded; I think it was the largest-growing app in the App Store at one point. And lots of other examples like NightCafe and Lexica and Playground. So we're seeing lots of cool applications.
>> So many applications, we'll probably be a customer for all you guys. We'll definitely talk after. But the challenges are there for people adopting; they want to get into what you guys see as the challenges that turn into opportunities. How do you see customers adopting generative AI applications? For example, we have massive amounts of transcripts, timed up to all the videos. I don't even know what to do. Do I just, do I code to my API there? So everyone has this problem; every vertical has these use cases. What are the challenges for people getting into this and adopting these applications? Is it figuring out what to do first? Or is it a technical setup? Do they stand up stuff, or do they just go to Amazon? What do you guys see as the challenges?
>> I think, you know, the first thing is coming up with where you think you're going to reimagine your customer experience by using generative AI. You know, we talked about Ada, and Tom talked about a number of these ones, and, you know, you pick up one or two of these to get that robust. And then once you have them, you know, we have models and we'll have more models on AWS, these large language models that Aidan was talking about. Then you go in and start using these models, testing them out, and seeing whether they fit your use case or not. In many situations, like you said, John, our customers want to say, "You know, I know you've trained these models on a lot of publicly available data, but I want to be able to customize them for my use cases, because, you know, there's some knowledge that I have created and I want to be able to use that." And then in many cases, and I think Aidan mentioned this, you know, you need these models to be up to date; you can't have them stale. And in those cases, where you augment them with a knowledge base, you have to make sure that these models are not hallucinating, and so you need to be able to do the right kind of responsible AI checks. So, you know, you start with a particular use case, and there are a lot of them. Then, you know, you can come to AWS and look at one of the many models we have, and, you know, we are going to have more models for other modalities as well. And then, you know, play around with the models. We have a playground kind of thing where you can test these models on some data, and then you will probably want to bring your own data, customize the model to your own needs, do some of the testing to make sure the model is giving the right output, and then just deploy it. And you know, we have a lot of tools.
>> Yeah.
>> To make this easy for our customers.
>> How should people think about large language models?
Do they think about them as something they tap into with their IP or their data? Or is it a large language model that they apply into their system? Is the interface that way? What does the interaction look like?
>> In many situations, you can use these models out of the box. But in most other situations, you will want to customize them with your own data or with your own expectations. So the typical use case would be, you know, these models are exposed through APIs. You're using these APIs a little bit for testing and getting familiar, and then there will be an API that will allow you to train the model further on your data. So you use that API, you know, and make sure you've augmented the knowledge base. Then you use those APIs to customize the model, and then just deploy it in an application, you know, like Tom was mentioning, the number of companies that are using these models. So once you have it, then, you know, you again use an endpoint API and use it in an application.
>> All right, I love the example. I want to ask Tom and Aidan, because, like, most of my experience with Amazon Web Services, in 2007, I would stand up EC2, put my code on there, play around, and if it didn't work out, I'd shut it down. Is that a similar dynamic we're going to see with machine learning, where developers just kind of log in, stand up infrastructure, play around, and have a cloud-like experience?
>> So I can go first. I mean, we obviously work really closely with the SageMaker team, a fantastic platform there for ML training and inference. And, you know, going back to your point earlier, you know, where the data is, is hugely important for companies. Many companies bringing their models to their data in AWS, on-premise for them, is hugely important. Having the models be, you know, open source makes them explainable and transparent to the adopters of those models.
So, you know, we are really excited to work with the SageMaker team over the coming year to bring companies to that platform and make the most of our models.
>> Aidan, what's your take on developers? Do they just need to have a team in place if they want to interface with you guys? Let's say, can they start learning? What do they have to do to set up?
>> Yeah, so I think for Cohere, our product makes it much, much easier for people to get started and start building; it solves a lot of the productionization problems. But of course with SageMaker, like Tom was saying, I think that lowers the barrier even further, because it solves problems like data privacy. So I want to underline what Bratin was saying earlier: when you're fine tuning or when you're using these models, you don't want your data being incorporated into someone else's model. You don't want it being used for training elsewhere. And so the ability to solve, for enterprises, that data privacy and that security guarantee has been hugely important for Cohere, and that's very easy to do through SageMaker.
>> Yeah.
>> But the barriers to using this technology are coming down super quickly. And so for developers, it's just becoming completely intuitive. I love this; there's this quote from Andrej Karpathy. He was saying, like, "It really wasn't on my 2022 list of things to happen that English would become, you know, the most popular programming language." And so the barrier is coming down-
>> Yeah.
>> Super quickly, and it's exciting to see.
>> It's going to be awesome for all the companies here, and then we'll do more. We're probably going to see an explosion of startups; we're already seeing that, the ecosystem maps, the landscape maps are happening. So this is happening, and I'm convinced it's not yesterday's chatbot, it's not yesterday's AIOps. It's a whole other ballgame. So I have to ask you guys the final question before we kick off the companies showcasing here.
How do you guys gauge success of generative AI applications? Is there a lens to look through and say, okay, how do I see success? It could be just getting a win, or is it a bigger picture? Bratin, we'll start with you. How do you gauge success for generative AI?
>> You know, ultimately it's about bringing business value to our customers, and making sure that those customers are able to reimagine their experiences by using generative AI. Now the way to get there is, of course, to deploy those models in a safe, effective manner, ensuring that all of the robustness and the security guarantees and the privacy guarantees are all there. And we want to make sure that this transitions from great demos to actual at-scale products, which means making them work reliably all of the time, not just some of the time.
>> Tom, what's your gauge for success?
>> Look, I think we're seeing a completely new form of ways to interact with data, to make data intelligent, and directly to bring new revenue streams into business. So if businesses can use our models to leverage that and generate completely new revenue streams, and ultimately bring incredible new value to their customers, then that's fantastic. And we hope we can power that revolution.
>> Aidan, what's your take?
>> Yeah, reiterating Bratin and Tom's point, I think that value in the enterprise and value in the market is, you know, the goal that we're striving towards. I also think about, you know, the value to consumers and actual users, and the transformation of the surface area of technology to create experiences like ChatGPT that are magical. It's the first time in human history we've been able to talk to something compelling that's not a human. I think that in itself is just extraordinary, and so exciting to see.
>> It really brings up a whole other category of markets. B2B, B2C, it's B2D, business to developer.
Because I think this is kind of the big trend: the consumers have to win. The developers coding the apps, it's a whole other sea change. Reminds me, everyone used the "Moneyball" movie as an example during the big data wave, and, you know, the value of data. There's a scene in "Moneyball" at the end, where Billy Beane's getting the offer from the Red Sox, and the owner says, "If every team's not rebuilding their team based upon your model, they'll be dinosaurs." I think that's the same with AI here. Every company will need to think about their business model and how they operate with AI. So it'll be a great run.
>> Completely agree.
>> It'll be a great run.
>> Yeah.
>> Aidan, Tom, thank you so much for sharing your experiences at your companies, and congratulations on your success; it's just the beginning. And Bratin, thanks for coming on representing AWS. And thank you, appreciate what you do. Thank you.
>> Thank you, John. Thank you, Aidan.
>> Thank you, John.
>> Thanks so much.
>> Okay, let's kick off season three, episode one. I'm John Furrier, your host. Thanks for watching.
(light airy music)

Published Date: Mar 9, 2023



AI Meets the Supercloud | Supercloud2


 

(upbeat music)
>> Okay, welcome back, everyone, to the Supercloud 2 event, live here in Palo Alto, theCUBE Studios live stage performance, virtually syndicating it all over the world. I'm John Furrier with Dave Vellante, here with CUBE alumni and special influencer guest Howie Xu, VP of Machine Learning at Zscaler, also part-time a CUBE analyst 'cause he is that good. Comes on all the time. You're basically a CUBE analyst as well. Thanks for coming on.
>> Thanks for inviting me.
>> John: Technically, you're not really a CUBE analyst, but you're kind of like a CUBE analyst.
>> Happy New Year to everyone.
>> Dave: Great to see you.
>> Great to see you, Dave and John.
>> John: We've been talking about ChatGPT online. You wrote a great post about it being more like Amazon, not like Google.
>> Howie: More than just Google Search.
>> More than Google Search. Oh, it's going to compete with Google Search, which it kind of does a little bit, but more with its infrastructure. So a clever point, and a good segue into this conversation, because this is kind of the beginning of these kinds of next-gen things we're going to see. Things where it's like an obvious next gen, it's getting real. Kind of like seeing the browser for the first time, the Mosaic browser. Whoa, this internet thing's real. I think this is that moment, and Supercloud-like enablement is coming. So this has been a big part of the Supercloud kind of theme.
>> Yeah, you talk about Supercloud, you talk about, you know, AI, ChatGPT. I really think ChatGPT is another Netscape moment, the browser moment. Because if you think about internet technology, right? It was brewing for 20 years before the early 90s. Not until you had a, you know, browser did people realize, "Wow, this is how wonderful this technology could be," right? You know, all the wonderful things. Then you had Yahoo and Amazon. I think we have had the AI technology brewing, you know, for quite some time: you know, neural networks, deep learning.
But not until ChatGPT came along did people realize, "Wow, you know, the user interface, the user experience could be that great," right? So I really think, you know, if you look at the last 30 years, there was a browser moment, there was an iPhone moment. I think the ChatGPT moment is as big as those.
>> Dave: What do you see as the intersection of things like ChatGPT and the Supercloud? Of course, the media's going to focus, journalists are going to focus, on all the negatives and the privacy. Okay. You know we're going to get by that, right? We always do. Where do you see the Supercloud and sort of the distributed data fitting in with ChatGPT? Does it use that as a data source? What's the link?
>> Howie: I think there are a number of use cases. One of the use cases: we talked about why we even have Supercloud, because of the complexity, because of the, you know, heterogeneous nature of different clouds. In order for me as a developer to create applications, I have so many things to worry about, right? It's a complexity. But with ChatGPT, with the AI, I don't have to worry about it, right? Those kinds of details will be taken care of by, you know, the underlying layer. So we have been talking on this show, you know, over the last year or so, about the Supercloud, hey, defining that, you know, API layer spanning across, you know, multiple clouds. I think that will be happening. However, for a lot of the things, that will be more hidden, right? A lot of that will be automated by the bots. You know, we were just talking about it right before the show. One of the profound statements I heard from Adrian Cockcroft about 10 years ago was, "Hey, Howie, you know, at Netflix, right? You know, IT is just one API call away." That's a profound statement I heard about a decade ago. I think next decade, right? You know, IT is just one English language away, right? So when it's one English language away, it's no longer as important, API this, API that.
You still need APIs, just like hardware, right? You still need all of those things. That's going to be more hidden. The high-level thing will be more, you know, English language, or the language, right? Any language for that matter.
>> Dave: And so through language, you'll tap services that live across the Supercloud, is what you're saying?
>> Howie: You just tell it what you want, what you desire, right? You know, the bots will help you figure out where the complexity is, right? You know, like you said, there's a lot of criticism: "Hey, ChatGPT doesn't do this, doesn't do that." But if you think about how to break things down, right? For instance, right, you know, ChatGPT doesn't have Microsoft's stock price today, obviously, right? However, you can ask ChatGPT to write a program for you, retrieve the Microsoft stock price, (laughs) and then just run it, right?
>> Dave: Yeah.
>> So the thing to think about-
>> John: It's only going to get better. It's only going to get better.
>> The thing people kind of unfairly criticize ChatGPT for is that it doesn't do this. But can you not break down a human's task into smaller things and get complex things done by ChatGPT? I think we are there already, you know-
>> John: That to me is the real game changer. That's the assembly of atomic elements at the top of the stack, whether the interface is voice or some programmatic gesture-based thing, you know, wave your hand or-
>> Howie: One of the analogies I used in my blog was, you know, each person, each professional now is a quarterback. And we suddenly have, you know, a lot more linebackers or, you know, any backs to work for you, right? For free, even, right? You know, and that's sort of how you should think about it. You are the quarterback of your day-to-day job, right? Your job is not to do everything manually yourself.
>> Dave: You call the play-
>> Yes.
>> Dave: And they execute. Do your job.
>> Yes, exactly.
>> Yeah, all the players are there.
All the elves are in the North Pole making the toys, Dave, as we say. But this is the thing, I want to get your point: this change is going to require a new kind of infrastructure-software relationship, a new kind of operating runtime, a new kind of assembler, a new kind of loader-linker. These are very operating-system kinds of concepts.
>> Data intensive, right? How to process the data, how to, you know, process such gigantic data in parallel, right? That's actually a tough job, right? So if you think about ChatGPT, why is OpenAI ahead of the game, right? You know, Google may not want to acknowledge it, right? It's not necessarily that they, you know, don't have enough data scientists, but the software engineering pieces, you know, behind it, right? To train the model, to actually do all those things in parallel, to do all those things in a cost-effective way. So I think, you know, a lot of those still-
>> Let me ask you a question. Let me ask you a question, because we've had this conversation privately, but I want to do it while we're on stage here. Where are all the alpha geeks and developers and creators and entrepreneurs going to gravitate to? You know, you see it in every wave; in crypto, all the alphas went into crypto. Now I think with ChatGPT, you're going to start to see, like, "Wow, it's that moment." A lot of people are going to, you know, scrum and do startups. CTOs will invent stuff. There's a lot of invention, a lot of computer science and customer requirements to figure out. That's new. Where are the alpha entrepreneurs going to go? What do you think they're going to gravitate to? If you could point to the next layer to enable this super environment, super app environment, Supercloud. 'Cause there's a lot to do to enable what you just said.
>> Howie: Right. You know, if you think about using the internet as the analogy, right? You know, in the early 90s, the internet came along, the browser came along. You had two kinds of companies, right?
One is Amazon, the other one is walmart.com. And then there were companies, like maybe GE or whatnot, right, that really didn't take advantage of the internet that much. I think, you know, for entrepreneurs, the opportunity is suddenly to create the Yahoo or Amazon of the ChatGPT-native era. That's what we should all be excited about. But for most of the Fortune 500 companies, your job is surviving sort of the big revolution. So you at least need to do your walmart.com sooner rather than later, right? (laughs) So don't be like GE, right? You know, hand-waving, hey, I do a lot of the internet, but, you know, when you look back over the last 20, 30 years, what did they do much with leveraging the-
>> So you think they're going to jump in, they're going to build service companies or SaaS tech companies or Supercloud companies?
>> Howie: Okay, so there are two types of opportunities from that perspective. One is, you know, the OpenAI-ish kind of companies. And I think for OpenAI, the game is still open, right? You know, it's really "Close AI" today. (laughs)
>> John: There's room for competition, you mean?
>> There's room for competition, right. You know, you can still spend, you know, $50 to $100 million to build something interesting. You know, there are companies like Cohere and so on and so on. There are a bunch of companies; I think there is that. And then there are companies who are going to leverage those sort of new AI primitives. I think, you know, we have been talking about AI forever, but finally, finally, it's no longer just good, but also super useful. I think, you know, the time is now.
>> John: And if you have the cloud behind you, what do you make Amazon do differently? 'Cause Amazon Web Services is only going to grow with this. It's not going to get smaller. There's more horsepower to handle, there are more needs.
>> Howie: Well, Microsoft already showed what the future is, right? You know, yes, there is kind of the container, you know, the serverless that will continue to grow.
But the future is really not about- >> John: Microsoft's shown the future? >> Well, showing that, you know, by working with OpenAI, right? >> Oh okay. >> They already said that, you know, we are going to have a ChatGPT service. >> $10 billion, I think they're putting in. >> $10 billion they're putting in, and also opening up the OpenAI API services, right? You know, I actually made a prediction that Microsoft's future hinges on OpenAI. I think, you know- >> John: They believe that $10 billion bet. >> Dave: Yeah. $10 billion bet. So I want to ask you a question. It's somewhat academic, but it's relevant. For a number of years, it looked like having first-mover advantage wasn't an advantage. PCs, spreadsheets, the browser, right? Social media, Friendster, right? Mobile: Apple wasn't first to mobile. But that's somewhat changed. The cloud, AWS was first. You could debate whether or not, but AWS, okay, they have first-mover advantage. Crypto, Bitcoin, first-mover advantage. Do you think OpenAI will have first-mover advantage? >> It certainly has its advantage today. I think it's year two. I mean, I think the game is still out there, right? You know, we're still in the first inning, early inning of the game. So I don't think that the game is over for the rest of the players, whether the big players or the OpenAI kind of, sort of, competitors. So one of the VCs actually asked me the other day, right? "Hey, how much money do I need to spend, invest, to get, you know, another shot at the OpenAI sort of level?" You know, I did a- (laughs) >> Line up. >> That's classic VC. "How much does it cost me to replicate?" >> I'm pretty sure he asked the question to a bunch of guys, right? >> Good luck with that. (laughs) >> So we kind of did some napkin- >> What'd you come up with? (laughs) >> $100 million is the order of magnitude that I came up with, right? You know, not a billion, not 10 million, right? So 100 million. >> John: Hundreds of millions. >> Yeah, yeah, yeah.
100 million order of magnitude is what I came up with. You know, we can get into the details, you know, some other time, but- >> Dave: That's actually not that much if you think about it. >> Howie: Exactly. So when he heard me articulating why that is, you know, he's thinking, right? You know, he actually, you know, asked me, "Hey, you know, there's this company. Do you happen to know this company? Can I reach out?" You know, those things. So I truly believe it's not a billion or 10 billion issue, it's more like 100. >> John: And also, your other point about referencing the internet revolution as a good comparable: the other thing there is that the online user population was a big driver of the growth of that. So what's the equivalent here of the online user population for AI? Is it more apps, more users? I mean, we're still early on, it's the first inning. >> Yeah. We're kind of the, you know- >> What's the key metric for success of this sector? Do you have a read on that? >> I think the, you know, the number of users is a good metric, but I think a lot of people are going to use AI services without even knowing they're using them, right? You know, I think a lot of the applications are already being built on top of OpenAI, and then they, kind of, you know, help people to do marketing, legal documents, you know, so those people are already inherently OpenAI users. So I think, yeah. >> Well, Howie, we've got to wrap, but I really appreciate you coming on. I want to give you a last minute to wrap up here. In your experience, and you've seen many waves of innovation, you've even had your hands in a lot of the big waves, past three inflection points, and obviously machine learning you're doing now, you're deep in. Why is this Supercloud movement, this wave of Supercloud and the discussion of this next inflection point, why is it so important? For the folks watching, why should they be paying attention to this particular moment in time?
Could you share your super clip on Supercloud? >> Howie: Right. So this is simple from my point of view. So why do you even have cloud to begin with, right? IT is too complex, too complex to operate, or too expensive. So there's a newer model. There is a better model, right? Let someone else operate it, there is elasticity out of it, right? That's great. Until you have multiple vendors, right? Many vendors, even. You know, we're talking about kind of how to make multiple vendors look like the same, but frankly speaking, even one vendor has, you know, a thousand services. Now it's kind of getting, what Kit was talking about, cloud chaos, right? It's the evolution. You know, the history repeats itself, right? You know, you have, you know, the next great things, and then too many great things, and then people need to sort of abstract this out. So it's almost that you must do this. But I think how to abstract this out is something that, at this time, AI is going to help a lot, right? You know, like I mentioned, right? A lot of the abstraction, you don't have to think about the API anymore. I bet 10 years from now, you know, IT is one language away, not an API away. So think about that world, right? So Supercloud, in my opinion, sure, you kind of abstract things out. You have, you know, consistent layers. But who's going to do that? Is it that we all agree upon the model, agree upon those APIs? Not necessarily. There is certain, you know, truth in that, but there are other truths: let bots take care of it, right? Where, you know, I want some X to happen, whether it's going to be done by Azure, by AWS, by GCP, bots will figure out at a given time, with certain context, with your security requirements, posture requirements. They'll figure that out. >> John: That's awesome. And you know, Dave, you and I have been talking about this. We think scale is the new ratification. If you have first-mover advantage, you'll see the benefit, but scale is a huge thing. OpenAI, AWS. >> Howie: Yeah.
Every day, we are using OpenAI. Today, we are labeling data for them. So you know, that's a little bit of the- (laughs) >> John: Yeah. >> First-mover advantage that other people don't have, right? So it's kind of scary. So I'm very sure that Google is a little bit- (laughs) >> When we do our super AI event, you're definitely going to be keynoting. (laughs) >> Howie: I think, you know, we're talking about Supercloud; you know, before long, we are going to talk about super intelligent cloud. (laughs) >> I'm super excited, Howie, about this. Thanks for coming on. Great to see you, Howie Xu, always a great analyst for us, contributing to the community. VP of Machine Learning at Zscaler, industry legend, and friend of theCUBE. Thanks for coming on and sharing really, really great advice and insight into what this next wave means. This Supercloud is the next wave. "If you're not on it, you're driftwood," says Pat Gelsinger. So you're going to see a lot more discussion. We'll be back with more here live in Palo Alto after this short break. >> Thank you. (upbeat music)
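Howie's picture of the future, where you state the outcome you want and bots decide at run time whether Azure, AWS, or GCP executes it given your context and security posture, is essentially policy-based routing. Here is a toy sketch of that idea in Python; the provider attributes, costs, and policy fields are all invented for illustration:

```python
def choose_provider(request, providers):
    """Pick the provider that satisfies the request's policy constraints
    and scores best on its preference (cheapest, in this toy model)."""
    eligible = [p for p in providers
                if request["region"] in p["regions"]
                and p["compliance"] >= request["min_compliance"]]
    if not eligible:
        raise ValueError("no provider satisfies the policy")
    return min(eligible, key=lambda p: p["cost_per_hour"])["name"]

# Hypothetical provider catalog (invented numbers):
providers = [
    {"name": "aws",   "regions": {"us", "eu"}, "compliance": 3, "cost_per_hour": 1.2},
    {"name": "azure", "regions": {"us"},       "compliance": 2, "cost_per_hour": 1.0},
    {"name": "gcp",   "regions": {"eu"},       "compliance": 3, "cost_per_hour": 0.9},
]

# "I want X to happen in the EU with a strict security posture":
print(choose_provider({"region": "eu", "min_compliance": 3}, providers))  # gcp
```

The caller never names a cloud; the policy plus the run-time catalog decides, which is the "one language away, not an API away" point in miniature.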

Published Date : Feb 17 2023


Oracle Announces MySQL HeatWave on AWS


 

>>Oracle continues to enhance MySQL HeatWave at a very rapid pace. The company is now in its fourth major release since the original announcement in December 2020. One of the main criticisms of MySQL HeatWave is that it only runs on OCI, Oracle Cloud Infrastructure, and is a lock-in to Oracle's cloud. Oracle recently announced that HeatWave is now going to be available in the AWS cloud, and it announced its intent to bring MySQL HeatWave to Azure. So MySQL HeatWave on AWS is a significant TAM expansion move for Oracle, because of the momentum AWS cloud continues to show. And evidently the HeatWave engineering team has taken the development effort from OCI and is bringing that to AWS with a number of enhancements that we're going to dig into today. Nipun Agarwal, senior vice president, MySQL HeatWave at Oracle, is back with me on a CUBE conversation to discuss the latest HeatWave news, and we're eager to hear any benchmarks relative to AWS or any others. Nipun has been leading the HeatWave engineering team for over 10 years and holds over 185 patents in database technology. Welcome back to the show, and good to see you. >>Thank you. Very happy to be back. >>Now, for those who might not have kept up with the news, to kick things off, give us an overview of MySQL HeatWave and its evolution so far. >>So MySQL HeatWave is a fully managed MySQL database service offering from Oracle. Traditionally, MySQL has been designed and optimized for transaction processing. So when customers of MySQL had to run analytics, or when they had to run machine learning, they would extract the data out of MySQL into some other database for doing analytics processing or machine learning processing. MySQL HeatWave provides all these capabilities built in to a single database service, which is MySQL HeatWave. So customers of MySQL don't need to move the data out of the database.
They can run transaction processing, analytics on mixed workloads, and machine learning, all with very good performance and very good price performance. Furthermore, one of the design points of HeatWave is a scale-out architecture, so the system continues to scale and perform very well even when customers have very large data sizes. >>So we've seen some interesting moves by Oracle lately. The collaboration with Azure, we've covered that pretty extensively. What was the impetus here for bringing MySQL HeatWave onto the AWS cloud? What were the drivers that you considered? >>So one of the observations is that a very large percentage of users of MySQL HeatWave are AWS users who are migrating off Aurora. So already we see that a good percentage of MySQL HeatWave customers are migrating from AWS. However, there are some AWS customers who are still not able to migrate to OCI, to MySQL HeatWave. One reason is the exorbitant cost of data egress which AWS charges: in order to migrate a workload from AWS to OCI, the egress fees are very high, which becomes prohibitive for the customer. The second example we have seen is that the latency of accessing a database which is outside of AWS is very high. So there's a class of customers who would like to get the benefits of MySQL HeatWave but were unable to do so, and with this support for MySQL HeatWave inside of AWS, these customers can now get all of the benefits of MySQL HeatWave without having to pay the high egress fees and without having to suffer the poor latency that comes from that architecture. >>Okay, so you're basically meeting the customers where they are. So was this a straightforward lift and shift from Oracle Cloud Infrastructure to AWS? >>No, it is not, because one of the design goals we have with MySQL HeatWave is that we want to provide our customers with the best price performance regardless of the cloud.
So when we decided to offer MySQL HeatWave on AWS, we optimized MySQL HeatWave for it as well. So one of the things to point out is that this is a service whose data plane, control plane, and console are natively running on AWS. And the benefit of doing so is that now we can optimize MySQL HeatWave for the AWS architecture. In addition to that, we have announced a bunch of new capabilities as a part of the service, which will also be available to the MySQL HeatWave customers on OCI, but we just announced them and we're offering them as a part of the MySQL HeatWave offering on AWS. >>So I just want to make sure I understand: it's not like you just wrapped your stack in a container and stuck it into AWS to be hosted. You're saying you're actually taking advantage of the capabilities of the AWS cloud natively. And I think you've made some other enhancements as well that you're alluding to. Can you maybe elucidate on those? >>Sure. So for starters, we have taken the MySQL HeatWave code and optimized it for the AWS infrastructure, its compute and network, and as a result customers get very good performance and price performance with MySQL HeatWave on AWS. That's one: performance. The second thing is, we have designed a new interactive console for the service, which means that customers can now provision their instances from the console, but in addition they can also manage their schemas and run queries directly from the console. Autopilot is integrated into the console, and we have introduced performance monitoring. So, a lot of capabilities introduced as a part of the new console. The third thing is that we have added a bunch of new security features, exposing some of the features which were part of MySQL Enterprise Edition as a part of the service, which gives customers a choice of using these features to build more secure applications. And finally, we have extended MySQL Autopilot for a number of OLTP use cases. In the past, MySQL Autopilot had a lot of capabilities for analytics, and now we have augmented it to offer capabilities for OLTP workloads as well. >>There was something in your press release called auto thread pooling. It says it provides higher and sustained throughput at high concurrency by determining the optimal number of transactions which should be executed. What is that all about, the auto thread pool? It seems pretty interesting. How does it affect performance? Can you help us understand that? >>Yes, and this is one of the capabilities I was alluding to, which we have added in MySQL Autopilot for transaction processing. So here is the basic idea. If you have a system where a large number of OLTP transactions are coming in at a high degree of concurrency, many existing MySQL-based systems can get into a state where a few transactions are executing but a bunch of them are blocked. With autopilot thread pooling, what we basically do is workload-aware admission control, which figures out the right scheduling for all of these transactions so that either a transaction is executing or, as soon as something frees up, it can start executing; there's no transaction which stays blocked. The advantage to the customer of this capability is twofold. They get significantly better throughput compared to a service like Aurora at high levels of concurrency: at high concurrency, because of auto thread pooling, MySQL HeatWave offers up to 10 times higher throughput compared to Aurora. That's the first benefit, better throughput. The second advantage is that the throughput of the system never drops, even at high levels of concurrency, whereas in the case of Aurora the throughput goes up but then, at a high concurrency level of, say, 500 or so (it depends upon the underlying shape being used), the throughput just drops; with MySQL HeatWave, the throughput never drops. The ramification for the customer is that if the throughput is not going to drop, the user can start off with a small shape, get the performance, and be assured that even if the workload increases, they will never see performance worse than what they were getting at lower levels of concurrency. This leads to customers provisioning a shape which is just right for them, and if they need to, they can go to a larger shape, but they don't overpay. So those are the two benefits: better performance, and performance that is sustained regardless of the level of concurrency. >>So how do we quantify that? I know you've got some benchmarks. How can you share comparisons with other cloud databases? We're especially interested in Amazon's own databases, which are obviously very popular. And are you publishing those on GitHub, as you have done in the past? Take us through the benchmarks. >>Sure. Benchmarks are important because they give customers a sense of what performance and what price performance to expect. So we have run a number of benchmarks, and yes, all these benchmarks are available on GitHub for customers to take a look at. We have performance results on all three classes of workloads: OLTP, analytics, and machine learning. So let's start with OLTP. For OLTP, primarily because of the auto thread pooling feature, we show that for the TPC-C benchmark on a 10-gigabyte dataset, at high levels of concurrency, HeatWave offers up to 10 times better throughput, and this performance is sustained, whereas in the case of Aurora the performance really drops.
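The workload-aware admission control behind auto thread pooling, as described above, can be sketched with a bounded worker pool: transactions beyond the pool size wait in a queue instead of piling up as blocked threads, and each one starts the moment a worker frees up. This is an illustrative Python model, not Oracle's implementation; the pool size and the toy transactions are made up.

```python
import queue
import threading

def run_admitted(transactions, pool_size=4):
    """Admission control: at most pool_size transactions execute at once;
    the rest wait in a queue and start as soon as a worker frees up."""
    work = queue.Queue()
    for txn in transactions:
        work.put(txn)
    results = []
    lock = threading.Lock()

    def worker():
        while True:
            try:
                txn = work.get_nowait()
            except queue.Empty:
                return
            r = txn()  # execute the transaction
            with lock:
                results.append(r)

    threads = [threading.Thread(target=worker) for _ in range(pool_size)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results

# 100 tiny "transactions" arriving at once: only 4 run at a time,
# but every one of them completes -- none is left blocked.
out = run_admitted([lambda i=i: i * i for i in range(100)])
print(len(out))  # 100
```

The point of the sketch is the sustained-throughput claim: because admitted work is bounded, adding more pending transactions lengthens the queue but never collapses the rate at which work completes.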
So that's the first thing: on a 10-gigabyte TPC-C benchmark at high concurrency, the throughput is 10 times better than Aurora. For analytics, we have compared MySQL HeatWave on AWS with Redshift, Snowflake, and Google BigQuery. We find that the price performance of MySQL HeatWave compared to Redshift is seven times better. So MySQL HeatWave on AWS provides seven times better price performance than Redshift. That's a very interesting result to us, which means that customers of Redshift are really going to take this service seriously, because they're going to get seven times better price performance, and this is all running in AWS. >>Okay, carry on. >>And then I was going to say: compared to Snowflake, HeatWave on AWS offers 10 times better price performance, and compared to Google BigQuery, it offers 12 times better price performance. This is based on a four-terabyte TPC-H workload, and the results are available on GitHub. Then the third category is machine learning, and for machine learning training, the performance of MySQL HeatWave is 25 times faster compared to Redshift ML. So for all three workloads we have benchmark results, and all of the scripts are available on GitHub. >>Okay, so you're comparing MySQL HeatWave on AWS to Redshift and Snowflake on AWS, and you're comparing MySQL HeatWave on AWS to BigQuery, obviously running on Google. You know, one of the things Oracle has done in the past, where I've always tried to call fouls, is double the price for running the Oracle database (not HeatWave, but Oracle Database) on AWS, and then show how much cheaper it is on Oracle, and we'd be like, okay, come on. But they're not doing that here. You're basically taking MySQL HeatWave on AWS, and I presume you're using the same pricing for whatever you use, compared to whatever else you're using.
Storage, reserved instances, that's apples to apples on AWS, and you have to obviously do some kind of mapping for Google, for BigQuery. Can you just verify that for me? >>We are being more than fair, on two dimensions. The first thing is, when I'm talking about the price performance for analytics with MySQL HeatWave, the cost I'm quoting for MySQL HeatWave is the cost of running transaction processing, analytics, and machine learning. So it's a fully loaded cost for the case of MySQL HeatWave, whereas when I'm talking about Redshift or Snowflake, I'm talking only about the cost of those databases for running analytics. It's not including the source database, which may be Aurora or some other database. So that's the first aspect: for HeatWave, it's the cost for running all three kinds of workloads, whereas for the competition, it's only for running analytics. The second thing is that for those services, whether it's Redshift or Snowflake, we're using the one-year, fully-paid-upfront cost, which is what most customers would pay: many customers will sign a one-year contract and pay all the costs ahead of time because they get a discount. So we're using that price, and in the case of Snowflake, the cost we're using is their Standard Edition price, not the Enterprise Edition price. So yes, we're being more than fair in this comparison. >>Yeah, I think that's an important point. I saw an analysis by Marc Staimer on Wikibon, where he was doing the TCO comparisons, and I mean, if you have to use two separate databases with two separate licenses, and you have to do ETL and all the labor associated with that, that's a big deal, and you're not even including that aspect in your comparison. So that's pretty impressive. To what do you attribute that? You know, given that, unlike OCI, within the AWS cloud you don't have as much control over the underlying hardware. >>So hardware is one aspect. There are three things which give us this advantage. The first thing is that we have designed HeatWave for a scale-out architecture, so we came up with new algorithms: one of the design points for HeatWave is a massively partitioned architecture, which leads to a very high degree of parallelism. That's the first part. The second thing is that, although we don't have control over the hardware, the second design point for HeatWave is that it is optimized for commodity cloud and commodity infrastructure: we know how much compute we get, how much network bandwidth we get, how much object-store bandwidth we get in AWS, and we have tuned HeatWave for that. That's the second point. And the third thing is MySQL Autopilot, which provides machine-learning-based automation. What it does is that, as the user's workload is running, it learns from it and improves various parameters in the system, so the system keeps getting better as you run more and more queries. And that's the third thing, as a result of which we get a significant edge over the competition. >>Interesting. I mean, look, any ISV can go on any cloud and take advantage of it, and I love it; we live in a new world. How about machine learning workloads? What did you see there in terms of performance and benchmarks? >>Right. So for machine learning we offer three capabilities: training, which is fully automated; inference; and explanations. One of the things which many of our customers coming from the enterprise told us is that explanations are very important to them, because customers want to know why the system chose a certain prediction.
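Backing up to the first of the three advantages: a massively partitioned architecture that yields a high degree of parallelism has the general shape below. Split the data, scan the partitions concurrently, then combine the partial results. This is a minimal Python sketch in which threads stand in for scale-out nodes; it shows the pattern, not the product.

```python
from concurrent.futures import ThreadPoolExecutor

def partial_sum(partition):
    # Each worker scans only its own partition of the data.
    return sum(partition)

def partitioned_sum(values, partitions=4):
    """Split the data into partitions, scan them in parallel,
    then combine the partial results: the shape of a scale-out,
    massively partitioned aggregation."""
    chunk = (len(values) + partitions - 1) // partitions
    parts = [values[i:i + chunk] for i in range(0, len(values), chunk)]
    with ThreadPoolExecutor(max_workers=partitions) as pool:
        return sum(pool.map(partial_sum, parts))

print(partitioned_sum(list(range(1000))))  # 499500
```

Because each partition is scanned independently, adding partitions (nodes) grows scan capacity roughly linearly, which is why the system "continues to scale" as data sizes grow.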
So we offer explanations for all models which have been trained by HeatWave. That's the first thing. Now, one of the interesting things about training is that training is usually the most expensive phase of machine learning, so we have spent a lot of time improving the performance of training. We have a bunch of techniques which we have developed inside of Oracle to improve the training process. For instance, we have meta-learned proxy models, which really give us an advantage; we use adaptive sampling; and we have invented techniques for parallelizing the hyperparameter search. As a result of all this work, our training is about 25 times faster than Redshift ML, and all the data stays inside the database: all the processing is being done inside the database, so it's much faster. And I want to point out that there is no additional charge for HeatWave customers, because we're using the same cluster; you're not invoking a separate service. So all of these machine learning capabilities are offered at no additional charge inside the database, and at a performance which is significantly faster than the competition. >>Are you taking advantage of, or is there any advantage you could get from, exploiting things like Graviton (we've talked about that a little bit in the past) or Trainium? You just mentioned training, so the custom silicon that AWS is doing: are you taking advantage of that? Do you need to? Can you give us some insight there? >>So there are two things, right? We're always evaluating the choices we have from a hardware perspective, and all the things you mention, we have considered them. But there are two things to consider. One is that HeatWave is a memory-intensive system, so memory is the dominant cost: the processor is a portion of the cost, but memory is the dominant cost. So what we have evaluated and found is that the current shape which we are using is going to provide our customers with the best price performance. That's the first thing. The second thing is that there are opportunities at times when we could use a specialized processor for accelerating one part of the workload, but then it becomes a matter of the cost to the customer. The advantage of our current architecture is that, on the same hardware, customers are getting very good OLTP performance, very good analytics performance, and very good machine learning performance. If we were to go with a specialized processor, it might accelerate, say, machine learning, but then it's an additional cost which the customers would need to pay. We are very sensitive to the customers' request, which is usually to provide very good performance at a very low cost, and we feel that the current design is providing customers very good performance and very good price performance. >>So part of that is architectural, the memory-intensive nature of HeatWave; the other is AWS pricing. If AWS pricing were to flip, it might make more sense for you to take advantage of something like Trainium. Okay, great, thank you. And now back to the benchmarks. Benchmarks are sometimes artificial, right? A car can go from 0 to 60 in two seconds, but I might not be able to experience that level of performance. Do you have any real-world numbers from customers that have used MySQL HeatWave on AWS, and how they look at performance? >>Yes, absolutely. The MySQL HeatWave service on AWS has been in preview since November, so we have a lot of customers who have tried the service. What we have actually found is that many of these customers are planning to migrate from Aurora to MySQL HeatWave, and what they find is that the performance difference is actually much more pronounced than what I was talking about, because with Aurora the performance is actually much poorer than in the comparisons I've quoted. So in some of these cases, the customers found improvements of 60 times to 240 times: HeatWave was 60 to 240 times faster, and it was much less expensive. And the third thing, which is noteworthy, is that customers don't need to change their applications. So if you ask for the top three reasons why customers are migrating, it's because of this: no change to the application, much faster, and cheaper. In some cases, one customer found that the performance of their application for complex queries was about 60 to 90 times faster, and another customer found that the performance of HeatWave compared to Aurora was 139 times faster. So yes, we do have many such examples from real workloads from customers who have tried it, and across all of them, what we find is that HeatWave offers better performance, lower cost, and a single database that is compatible with all existing MySQL-based applications and workloads. >>Really impressive. The analysts I talk to are all gaga over HeatWave, and I can see why. Okay, last question, maybe two in one: what's next in terms of new capabilities that customers are going to be able to leverage, and any other clouds that you're thinking about? We talked about that upfront, but... >>So in terms of capabilities, as you have seen, we have been non-stop attending to the feedback from customers and reacting to it, and we have also been innovating organically. That's something which is going to continue, so yes, you can fully expect that we will not rest and will continue to innovate. And with respect to other clouds: yes, we are planning to support MySQL HeatWave on Azure, and this is something that will be announced in the near future. >>Great. All right, thank you. Really appreciate the overview.
Congratulations on the work. Really exciting news that you're moving MySQL HeatWave into other clouds; it's something that we've been expecting for some time, so it's great to see you guys making that move. And as always, great to have you on theCUBE. >>Thank you for the opportunity. >>All right, and thank you for watching this special CUBE conversation. I'm Dave Vellante, and we'll see you next time.
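A recurring point in this conversation is the TCO effect of a single database serving both transactions and analytics: no second system to license, and no ETL pipeline to build and operate. The pattern can be illustrated with any SQL engine; sqlite3 stands in here, so this is obviously not HeatWave, just the single-database idea:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL)")

# OLTP side: transactional inserts...
with db:
    db.executemany("INSERT INTO orders (amount) VALUES (?)",
                   [(10.0,), (25.5,), (4.5,)])

# ...and the analytics side queries the very same database:
# no extract, no transform, no load into a separate warehouse.
total, n = db.execute("SELECT SUM(amount), COUNT(*) FROM orders").fetchone()
print(total, n)  # 40.0 3
```

The cost comparison in the interview hinges on exactly this: the competitor's quoted price covers only the analytics system, while the single-database price already includes the transactional source.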

Published Date : Sep 14 2022


Luis Ceze, OctoML | Amazon re:MARS 2022


 

(upbeat music) >> Welcome back, everyone, to theCUBE's coverage here live on the floor at AWS re:MARS 2022. I'm John Furrier, host for theCUBE. Great event, machine learning, automation, robotics, space, that's MARS. It's part of the re-series of events, re:Invent's the big event at the end of the year, re:Inforce, security, re:MARS, really the intersection of the future of space, industrial, automation, which is very heavily DevOps machine learning, of course, machine learning, which is AI. We have Luis Ceze here, who's the CEO and co-founder of OctoML. Welcome to theCUBE. >> Thank you very much for having me in the show, John. >> So we've been following you guys. You guys are a growing startup funded by Madrona Venture Capital, one of your backers. You guys are here at the show. This is a, I would say, small show relative to what it's going to be, but a lot of robotics, a lot of space, a lot of industrial kind of edge, but machine learning is the centerpiece of this trend. You guys are in the middle of it. Tell us your story. >> Absolutely, yeah. So our mission is to make machine learning sustainable and accessible to everyone. So I say sustainable because it means we're going to make it faster and more efficient. You know, use less human effort, and accessible to everyone, accessible to as many developers as possible, and also accessible on any device. So, we started from an open source project that began at University of Washington, where I'm a professor there. And several of the co-founders were PhD students there. We started with this open source project called Apache TVM that had actually contributions and collaborations from Amazon and a bunch of other big tech companies. And that allows you to get a machine learning model and run it on any hardware, like run on CPUs, GPUs, various GPUs, accelerators, and so on. It was the kernel of our company and the project's been around for about six years or so. Company is about three years old.
And we grew from Apache TVM into a whole platform that essentially supports any model on any hardware, cloud and edge. >> So is the thesis that, when it first started, that you want to be agnostic on platform? >> Agnostic on hardware, that's right. >> Hardware, hardware. >> Yeah. >> What was it like back then? What kind of hardware were you talking about back then? Cause a lot's changed, certainly on the silicon side. >> Luis: Absolutely, yeah. >> So take me through the journey, 'cause I could see the progression. I'm connecting the dots here. >> So once upon a time, yeah, no... (both chuckling) >> I walked in the snow with my bare feet. >> You have to be careful because if you wake up the professor in me, then you're going to be here for two hours, you know. >> Fast forward. >> The abridged version here is that, clearly, machine learning has been shown to actually solve real interesting, high value problems. And where machine learning runs in the end, it becomes code that runs on different hardware, right? And when we started Apache TVM, which stands for tensor virtual machine, at that time it was just beginning to start using GPUs for machine learning, we already saw that, with a bunch of machine learning models popping up and CPUs and GPUs starting to be used for machine learning, it was clear there was an opportunity to run everywhere. >> And GPUs were coming fast. >> GPUs were coming, and a huge diversity of CPUs, of GPUs and accelerators now, and the ecosystem and the system software that maps models to hardware is still very fragmented today. So hardware vendors have their own specific stacks. So Nvidia has its own software stack, and so does Intel, AMD. And honestly, I mean, I hope I'm not being, you know, too controversial here to say that it kind of looks like the mainframe era. We had tight coupling between hardware and software. You know, if you bought IBM hardware, you had to buy IBM OS and IBM database, IBM applications, it all tightly coupled.
And if you want to use IBM software, you had to buy IBM hardware. So that's kind of like what machine learning systems look like today. If you buy a certain big name GPU, you've got to use their software. Even if you use their software, which is pretty good, you have to buy their GPUs, right? So, but you know, we wanted to help peel away the model and the software infrastructure from the hardware to give people choice, the ability to run the models where it best suits them. Right? So that includes picking the best instance in the cloud, that's going to give you the right, you know, cost properties, performance properties, or you might want to run it on the edge. You might run it on an accelerator. >> What year was that roughly, when you were doing this? >> We started that project in 2015, 2016. >> Yeah. So that was pre-conventional wisdom. I think TensorFlow wasn't even around yet. >> Luis: No, it wasn't. >> It was, I'm thinking like 2017 or so. >> Luis: Right. So that was the beginning of, okay, this is an opportunity. AWS, I don't think they had released some of the Nitro stuff that Hamilton was working on. So, they were already kind of going that way. It's kind of like converging. >> Luis: Yeah. >> The space was happening, exploding. >> Right. And the way that was dealt with, and to this day, you know, to a large extent as well, is by backing machine learning models with a bunch of hardware specific libraries. And we were some of the first ones to say, like, you know what, let's take a compilation approach, take a model and compile it to very efficient code for that specific hardware. And what underpins all of that is using machine learning for machine learning code optimization. Right? But it was way back when. We can talk about where we are today. >> No, let's fast forward. >> That's the beginning of the open source project. >> But that was a fundamental belief, worldview there.
I mean, you have a real world view that was logical when you compare it to the mainframe, but not obvious to the machine learning community. Okay, good call, check. Now let's fast forward, okay. Evolution, we'll go through the speed of the years. More chips are coming, you got GPUs, and seeing what's going on in AWS. Wow! Now it's booming. Now I got unlimited processors, I got silicon on chips, I got it everywhere. >> Yeah. And what's interesting is that the ecosystem got even more complex, in fact. Because now you have, there's a cross product between machine learning models, frameworks like TensorFlow, PyTorch, Keras, and the like, and so on, and then hardware targets. So how do you navigate that? What we want here, our vision is to say, folks should focus, people should focus on making the machine learning models do what they want to do, that solves a value, like solves a problem of high value to them. Right? So model deployment should be completely automatic. Today, it's very, very manual to a large extent. So once you're serious about deploying a machine learning model, you got a good understanding of where you're going to deploy it, how you're going to deploy it, and then, you know, pick out the right libraries and compilers, and we automated the whole thing in our platform. This is why you see the tagline, the booth is right there, like bringing DevOps agility for machine learning, because our mission is to make that fully transparent. >> Well, I think that, first of all, I use that line here, cause I'm looking at it here live on camera. People can't see, but it's like, I use it on a couple of my interviews because the word agility is very interesting, because that's kind of the test on any kind of approach these days. Agility could be, and I talked to the robotics guys, just having their product be more agile.
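The cross product Luis describes, frameworks times hardware targets, is exactly what an automated deployment flow has to absorb on the developer's behalf. As a toy illustration only (the framework names, target names, and "compile" steps below are hypothetical stand-ins, not the real OctoML or Apache TVM API), the dispatch might look like this:

```python
# Toy sketch of the frameworks-by-targets dispatch an automated platform hides.
# The names and "compile" steps are hypothetical stand-ins, not a vendor API.
COMPILERS = {
    ("pytorch", "cpu"): lambda model: f"{model}-compiled-llvm",
    ("pytorch", "gpu"): lambda model: f"{model}-compiled-cuda",
    ("tensorflow", "cpu"): lambda model: f"{model}-compiled-llvm",
}

def deploy(model, framework, target):
    """Pick the right toolchain automatically, so the developer never has to."""
    try:
        compile_fn = COMPILERS[(framework, target)]
    except KeyError:
        raise ValueError(f"no toolchain for {framework} on {target}")
    return compile_fn(model)

print(deploy("resnet50", "pytorch", "gpu"))  # -> resnet50-compiled-cuda
```

In a real system each table entry would invoke an actual compiler stack; the point is that the lookup, not the developer, absorbs the cross product.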
I talked to Pepsi here just before you came on, they had this large scale data environment because they built an architecture, but that fostered agility. So again, this is an architectural concept, it's a systems view of agility being the output, and removing dependencies, which I think is what you guys were trying to do. >> Only part of what we do. Right? So agility means a bunch of things. First, you know-- >> Yeah, explain.
I want to ask this specific question because you made me think of something. So we were just having a data mesh conversation. And one of the comments that's come out of a few of these data as code conversations is data's the product now. So if you can move data to the edge, which everyone's talking about, you know, why move data if you don't have to, but I can move a machine learning algorithm to the edge. Cause it's costly to move data. I can move compute, everyone knows that. But now I can move machine learning to anywhere else and not worry about integrating on the fly. So the model is the code. >> It is the product. >> Yeah. And since you said, the model is the code, okay, now we're talking even more here. So machine learning models today are not treated as code, by the way. So they do not have any of the typical properties of code, that whenever you write a piece of code, you run the code, you don't know, you don't even think about what CPU it is, we don't think about where it runs, what kind of CPU it runs on, what kind of instance it runs on. But with a machine learning model, you do. So what we have done is create this fully transparent, automated way of allowing you to treat your machine learning models as if they were a regular function that you call, and then the function could run anywhere. >> Yeah. >> Right. >> That's why-- >> That's better. >> Bringing DevOps agility-- >> That's better. >> Yeah. And you can use existing-- >> That's better, because I can run it on the Artemis too, in space. >> You could, yeah. >> If they have the hardware. (both laugh) >> And that allows you to run your existing, continue to use your existing DevOps infrastructure and your existing people. >> So I have to ask you, cause since you're a professor, this is like a masterclass on theCUBE. Thank you for coming on. Professor. (Luis laughing) I'm a hardware guy. I'm building hardware for Boston Dynamics, Spot, the dog, that's the diversity in hardware, it tends to be purpose driven.
I got a spaceship, I'm going to have hardware on there. >> Luis: Right. >> It's generally viewed in the community here, that everyone I talk to and other communities, open source is going to drive all software. That's a check. But the scale and integration is super important. And they're also recognizing that hardware is really about the software. And they even said on stage here: hardware is not about the hardware, it's about the software. So if you believe that to be true, then your model checks all the boxes. Are people getting this? >> I think they're starting to. Here is why, right. A lot of companies that were hardware first, that thought about software too late, aren't making it. Right? There's a large number of hardware companies, AI chip companies, that aren't making it. Probably some of them that won't make it, unfortunately just because they started thinking about software too late. I'm so glad to see a lot of the early, I hope I'm not just tooting our own horn here, but Apache TVM, the infrastructure that we built to map models to different hardware, it's very flexible. So we see a lot of emerging chip companies, like SiMa.ai, who's been doing fantastic work, and they use Apache TVM to map algorithms to their hardware. And there's a bunch of others that are also using Apache TVM. That's because you have, you know, an open infrastructure that keeps it up to date with all the machine learning frameworks and models and allows you to extend to the chips that you want. So these companies paying attention that early gives them a much higher fighting chance, I'd say. >> Well, first of all, not only are you backable by the VCs cause you have pedigree, you're a professor, you're smart, and you get good recruiting-- >> Luis: I don't know about the smart part. >> And you get good recruiting for PhDs out of University of Washington, which is not too shabby a computer science department. But they want to make money. The VCs want to make money. >> Right.
>> So you have to make money. So what's the pitch? What's the business model? >> Yeah. Absolutely. >> Share with us what you're thinking there. >> Yeah. The value of using our solution is shorter time to value for your model, from months to hours. Second, you shrink your OpEx, because you don't need a specialized, expensive team. Talk about expensive, expensive engineers who can understand machine learning hardware and software engineering to deploy models. You don't need those teams if you use this automated solution, right? Then you reduce that. And also, in the process of actually getting a model and getting it specialized to the hardware, making it hardware aware, we're talking about a very significant performance improvement that leads to lower cost of deployment in the cloud. We're talking about very significant reduction in costs in cloud deployment. And also enabling new applications on the edge that weren't possible before. It creates, you know, latent value opportunities. Right? So, that's the high level value pitch. But how do we make money? Well, we charge for access to the platform. Right? >> Usage. Consumption. >> Yeah, and value based. Yeah, so it's consumption and value based. So it depends on the scale of the deployment. If you're going to deploy a machine learning model at a larger scale, chances are that it produces a lot of value. So then we'll capture some of that value in our pricing scale. >> So, you have a direct sales force then to work those deals. >> Exactly. >> Got it. How many customers do you have? Just curious. >> So we started, the SaaS platform just launched now. So we started onboarding customers. We've been building this for a while. We have a bunch of, you know, partners that we can talk about openly, like, you know, revenue generating partners, that's fair to say. We work closely with Qualcomm to enable Snapdragon on TVM and hence our platform. We're close with AMD as well, enabling AMD hardware on the platform.
We've been working closely with two hyperscaler cloud providers that-- >> I wonder who they are. >> I don't know who they are, right. >> Both start with the letter A. >> And they're both here, right. What is that? >> They both start with the letter A. >> Oh, that's right. >> I won't give it away. (laughing) >> Don't give it away. >> One has three, one has four. (both laugh) >> I'm guessing, by the way. >> Then we have customers in the, actually, early customers have been using the platform from the beginning in the consumer electronics space, in Japan, you know, self driving car technology, as well. As well as some AI first companies that actually, whose core value, the core business, comes from AI models. >> So, serious, serious customers. They got deep tech chops. They're integrating, they see this as a strategic part of their architecture. >> That's what I call AI native, exactly. But now there's, we have several enterprise customers in line now, we've been talking to. Of course, because now we launched the platform, now we started onboarding and exploring how we're going to serve it to these customers. But it's pretty clear that our technology can solve a lot of other pain points right now. And we're going to work with them as early customers to go and refine them. >> So, do you sell to the little guys, like us? Will we be customers if we wanted to be? >> You could, absolutely, yeah. >> What do we have to do, have machine learning folks on staff? >> So, here's what you're going to have to do. Since you can see the booth, others can't. No, but they can certainly, you can try our demo. >> OctoML. >> And you should look at the transparent AI app that's compiled and optimized with our flow, and deployed and built with our flow. That allows you to get your image and do style transfer. You know, you can get you and a pineapple and see what you look like with a pineapple texture. >> We got a lot of transcript and video data. >> Right. Yeah. Right, exactly.
So, you can use that. Then there's a very clear-- >> But I could use it. You're not blocking me from using it. Everyone's, it's pretty much democratized. >> You can try the demo, and then you can request access to the platform. >> But you get a lot of more serious, deeper customers. But you can serve anybody, is what you're saying. >> Luis: We can serve anybody, yeah. >> All right, so what's the vision going forward? Let me ask this. When did people start getting the epiphany of removing the machine learning from the hardware? Was it recently, a couple years ago? >> Well, on the research side, we helped start that trend a while ago. I don't need to repeat that. But I think the vision that's important here, that I want the audience here to take away, is that there's a lot of progress being made in creating machine learning models. So, there's fantastic tools to deal with training data, and creating the models, and so on. And now there's a bunch of models that can solve real problems there. The question is, how do you very easily integrate that into your intelligent applications? Madrona Venture Group has been very vocal and investing heavily in intelligent applications, both end user applications as well as enablers. So we'd say we're an enabler of that, because it's so easy to use our flow to get a model integrated into your application. Now, any regular software developer can integrate that. And that's just the beginning, right? Because, you know, now we have CI/CD integration to keep your models up to date, to continue to integrate, and then there's more downstream support for other features that you normally have in regular software development. >> I've been thinking about this for a long, long time. And I think this whole code, no one thinks about code. Like, I write code, I'm deploying it. I think this idea of machine learning as code, independent of other dependencies, is really amazing. It's so obvious now that you say it. What are the choices now?
Let's just say that, I buy it, I love it, I'm using it. Now what do I got to do if I want to deploy it? Do I have to pick processors? Are there verified platforms that you support? Is there a short list? Is there every piece of hardware? >> We actually can help you. I hope we're not saying we can do everything in the world here, but we can help you with that. So, here's how. When you have a model in the platform, you can actually see how this model runs on any instance of any cloud, by the way. So we support all the three major cloud providers. And then you can make decisions. For example, if you care about latency, your model has to run in, at most, 50 milliseconds, because you're going to have interactivity. And then, after that, you don't care if it's faster. All you care about is, is it going to run cheaply enough. So we can help you navigate. And we're also going to make it automatic. >> It's like tire kicking in the dealer showroom. >> Right. >> You can test everything out, you can see the simulation. Are they simulations, or are they real tests? >> Oh, no, we run it all on real hardware. So, we have, as I said, we support any instance of any of the major clouds. We actually run on the cloud. But we also support a select number of edge devices today, like ARMs and Nvidia Jetsons. And we have the OctoML cloud, which is a bunch of racks with a bunch of Raspberry Pis and Nvidia Jetsons, and very soon, a bunch of mobile phones there too, that can actually run the real hardware, and validate it, and test it out, so you can see that your model runs performantly and economically enough in the cloud. And it can run on the edge devices-- >> You're a machine learning as a service. Would that be accurate? >> That's part of it, because we're not doing the machine learning model itself. You come with a model and we make it deployable and make it ready to deploy. So, here's why it's important. Let me try.
There's a large number of really interesting companies that do API models, as in API as a service. You have an NLP model, you have computer vision models, where you call an API endpoint in the cloud. You send an image and you get a description, for example. But it is using a third party. Now, if you want to have your model on your infrastructure but have the same convenience as an API, you can use our service. So, today, chances are that, if you have a model that you know you want to deploy, there might not be an API for it; we actually automatically create the API for you. >> Okay, so that's why I get the DevOps agility for machine learning is a better description. Cause it's not, you're not providing the service. You're providing the service of deploying it, like DevOps infrastructure as code. You're now ML as code. >> It's your model, your API, your infrastructure, but all of the convenience of having it ready to go, fully automatic, hands off. >> Cause I think what's interesting about this is that it brings the craftsmanship back to machine learning. Cause it's a craft. I mean, let's face it. >> Yeah. I want human brains, which are very precious resources, to focus on building those models that are going to solve business problems. I don't want these very smart human brains figuring out how to scrub this into actually getting run the right way. This should be automatic. That's why we use machine learning, for machine learning to solve that. >> Here's an idea for you. We should write a book called The Lean Machine Learning. Cause the lean startup was all about DevOps. >> Luis: We'd call it machine leaning. No, that's not going to work. (laughs) >> Remember when iteration was the big mantra. Oh, yeah, iterate. You know, that was from DevOps. >> Yeah, that's right. >> This code allowed for standing up stuff fast, double down, we all know the history, what it turned out. That was a good value for developers.
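Luis's point that the platform can automatically create the API around your own model can be sketched as a tiny route registry. Everything here, the route naming scheme included, is a hypothetical illustration, not the platform's actual interface:

```python
def make_api(model_fn, name):
    """Auto-generate a callable 'endpoint' around a model function (illustrative only)."""
    def handler(payload):
        # A real service would deserialize an HTTP request here; we take a dict.
        return {"model": name, "output": model_fn(payload["input"])}
    # Hypothetical route naming scheme; a real platform would serve this over HTTP.
    return {f"/v1/{name}/predict": handler}

# Stand-in "model": uppercases text.
api = make_api(lambda text: text.upper(), "shout")
print(api["/v1/shout/predict"]({"input": "hello"}))
```

The shape of the idea is the point: you bring the model function, and the generated wrapper, not a third-party API vendor, owns the endpoint.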
If you don't mind me building on that point. You know, something we see as OctoML, but we also see at Madrona as well. Seeing that there's a trend towards best in breed for each one of the stages of getting a model deployed. From the data aspect of creating the data, and then to the model creation aspect, to the model deployment, and even model monitoring. Right? We develop integrations with all the major pieces of the ecosystem, such that you can integrate, say with model monitoring to go and monitor how a model is doing. Just like you monitor how code is doing in deployment in the cloud. >> It's evolution. I think it's a great step. And again, I love the analogy to the mainstream. I lived during those days. I remember the monolithic propriety, and then, you know, OSI model kind of blew it. But that OSI stack never went full stack, and it only stopped at TCP/IP. So, I think the same thing's going on here. You see some scalability around it to try to uncouple it, free it. >> Absolutely. And sustainability and accessibility to make it run faster and make it run on any deice that you want by any developer. So, that's the tagline. >> Luis Ceze, thanks for coming on. Professor. >> Thank you. >> I didn't know you were a professor. That's great to have you on. It was a masterclass in DevOps agility for machine learning. Thanks for coming on. Appreciate it. >> Thank you very much. Thank you. >> Congratulations, again. All right. OctoML here on theCube. Really important. Uncoupling the machine learning from the hardware specifically. That's only going to make space faster and safer, and more reliable. And that's where the whole theme of re:MARS is. Let's see how they fit in. I'm John for theCube. Thanks for watching. More coverage after this short break. >> Luis: Thank you. (gentle music)

Published Date : Jun 24 2022


Chris Samuels, Slalom & Bethany Petryszak Mudd, Experience Design | Snowflake Summit 2022


 

(upbeat music) >> Good morning. Welcome back to theCUBE's continuing coverage of Snowflake Summit 22, live from Las Vegas. Lisa Martin, here with Dave Vellante. We are at Caesars Forum, having lots of great conversations. As I mentioned, this is just the start of day two, a tremendous amount of content yesterday. I'm coming at you today. Two guests join us from Slalom, now, we've got Chris Samuels, Principal, Machine Learning, and Bethany Mudd, Senior Director, Experience Design. Welcome to theCUBE, guys. >> Hi, thanks for having us. >> Thank you. >> So, Slalom and Snowflake, over 200 joint customers, over 1,800 plus engagements, lots of synergies there, partnership. We're here today to talk about intelligent products. Talk to us about what- how do you define intelligent products, and then kind of break that down? >> Yeah, I can, I can start with the simple version, right? So, when we think about intelligent products, what they're doing is they're doing more than they were explicitly programmed to do. So, instead of having a developer write all of these rules and have, "If this, then that," right, we're using data and real time insights to make products that are more performant and improving over time. >> Chris: Yeah, it's really bringing together an ecosystem of a series of things to have integrated capabilities working together that themselves offer constant improvement, better understanding, better flexibility, and better usability for everyone involved. >> Lisa: And there are four pillars of intelligent products; let's walk through those: technology, intelligence, experiences, and operations.
So for technology, like most modern data architectures, it has sort of a data component and it has a modern cloud platform, but here, the key is sort of things being disconnected, things being self-contained, and decoupled, such that there's better integration time, better iteration time, more cross use, and more extensibility and scalability with the cloud native portion of that. >> And the intelligence piece? >> The intelligence piece is the data that's been processed by machine learning algorithms, or by predictive analytics, that provides sort of the most valuable, or more- most insightful inferences, or conclusions. So, by bringing together again the tech and the intelligence, that's, you know, sort of the two of the pillars that begin to move forward, that enable sort of the other two pillars, which are- >> Experiences and operations. >> Yeah. >> Perfect. >> And if we think about those, all of the technology, all of the intelligence in the world, doesn't mean anything if it doesn't actually work for people. Without use, there is no value. So, as we're designing these products, we want to make sure that they're supporting people. As we're automating, there are still people accountable for those tasks. There are still impacts to people in the real world. So, we want to make sure that we're doing that intentionally. So, we're building for the greater good. >> Yeah. And from the operations perspective, you can think of traditional DevOps becoming MLOps, where there's an overall platform and a framework in place to manage not only the software components of it, but the overall workflow, and the data flow, and the model life cycle, such that we have tools and people from different backgrounds and different teams developing and maintaining this, more than you would previously see with something like product engineering.
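The contrast drawn above between hard-coded rules and products that learn from data can be sketched in a few lines of Python. This is a hypothetical illustration only; the function names and the two-sigma threshold are illustrative assumptions, not anything Slalom described:

```python
import statistics

# Rule-based product: a developer hard-codes the threshold ("if this, then that").
def rule_based_alert(temperature: float) -> bool:
    return temperature > 80.0

# "Intelligent" product: the threshold is derived from observed data,
# so behavior shifts as new readings arrive and improves over time.
def learned_alert(temperature: float, history: list[float]) -> bool:
    mean = statistics.mean(history)
    stdev = statistics.stdev(history)
    # Flag statistical outliers relative to what this deployment has seen.
    return temperature > mean + 2 * stdev

readings = [70.1, 71.3, 69.8, 70.6, 72.0, 70.9]
print(rule_based_alert(95.0))          # True: far above the fixed threshold
print(learned_alert(95.0, readings))   # True: far outside the observed range
print(rule_based_alert(73.5))          # False: the fixed rule stays silent
print(learned_alert(73.5, readings))   # True: mild drift the rule would miss
```

The point of the sketch is the last pair of lines: the hard-coded rule can only do what it was explicitly programmed to do, while the data-driven version catches drift the developer never anticipated.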
I'm envisioning, you know, meeting with a lot of yellow stickies, and prioritization, and I don't know if that's how it works, but take us through like the start and the sequence. >> You have my heart, I am a workshop lover. Anytime you have the scratch off, like, lottery stickers on something, you know it's a good one. But, as we think about our approach, we typically start with either a discovery or mobilize phase. We're really, we're starting by gathering context, and really understanding the business, the client, the users, and that full path to value. Who are all the teams that are going to have to come together and start working together to deliver this intelligent product? And once we've got that context, we can start solutioning and ideating on that. But, really it comes down to making sure that we've earned the right, and we've got the smarts to move into the space intelligently. >> Yeah, and, truly, the intelligent product itself is sort of tied to the use case. The business knows what the most- what is potentially the most valuable here. And so, by communicating and working and co-creating with the business, we can define then, okay, here are the use cases, and here are where machine learning and the overall intelligent product can maybe add more disruptive value than others. By saying, let's pretend that, you know, maybe your ML model or your predictive analytics is like a dial that we could turn up to 11. Which one of those dials, turned up to 11, could add the most value or disruption to your business? And therefore, you know, how can we prioritize and then work toward that pie-in-the-sky goal? >> Okay. So the client comes and says, "This is the outcome we want." Okay, and then you help them. You gather the right people, sort of extract all the little, you know, pieces of knowledge, and then help them prioritize so they can focus. And then what? >> Yeah. So, from there we're going to take the approach that seeing is solving.
We want to make sure that we get the right voices in the room, and we've got the right alignment. So, we're going to map out everything. We're going to diagram what that experience is going to look like, how technology's going to play into it, all of the roles and actors involved. We're going to draw a map of the ecosystem that everyone can understand, whether you're in marketing, or the IT sort of area, once again, so we can get crisp on that outcome and how we're going to deliver it. And, from there, we start building out that roadmap and backlog, and we deliver iteratively. So, by not thinking of things as getting to the final product after a three year push, we really want to shrink those build, measure, and learn loops. So, we're getting all of that feedback and we're listening and evolving and growing the same way that our products are. >> Yeah. Something like an intelligent product is pretty heady. So it's a pretty heavy concept to talk about. And so, the question becomes, "What is the outcome that ultimately needs to be achieved?" And then, who, from where in the business across the different potentially business product lines or business departments needs to be brought together? What data needs to be brought together? Such that the people, the stakeholders, can understand how they themselves can shape how the product itself is shaped. And therefore, what is the ultimate outcome, collectively, for everybody involved? 'Cause while your data might be fueling, you know, finances or someone else's intelligence and that kind of thing, bringing it all together allows for a more seamless product that might benefit more of the overall structure of the organization. >> Can you talk a little bit about how Slalom and Snowflake are enabling, like a customer example? A customer to take that data, flex that muscle, and create intelligent products that delight and surprise their customers? >> Chris: Yeah, so here's a great story.
We worked to co-create with Kawasaki Heavy Industries. So, we created an intelligent product with them to enable safer rail travel, more efficient preventative maintenance, and more efficient, real time track status feedback to the rail operators. So, in this case, we brought, yeah, the intelligent product itself was, "Okay, how do you create a better rail monitoring service?" And while that itself was the primary driver of the data, multiple other parts of the organization are using sort of the intelligent product as part of their now daily routine, whether it's from the preventative maintenance perspective, or it's from route usage, route prediction. Or, indeed, helping KHI move forward into making trains a more software centered set of products in the future. >> So, taking that example, I would imagine when you're running- like, I'm going to call that a project. I hope that's okay. So, when I'm running a project, I would imagine that sometimes you run into, "Oh, wow. Okay." To really be successful at this, project versus whole house, the company doesn't have the right data architecture, the right skills, or the right, you know, data team. Now, is it as simple as, oh yeah, just put it all into Snowflake? I doubt it. So how do you, do you encounter that often? How do you deal with that? >> Bethany: It's a journey. So, I think it's really about making sure we're meeting clients where they are. And I think that's something that we actually do pretty well. So, as we think about delivery, co-creation and co-delivering are a huge part of our model. So, we want to make sure that we have the client teams with us. So, as we start thinking about intelligent products, it can be incorporating a small feature, with subscription based services. It doesn't have to be creating your own model and sort of going deep. It really does come down to like what value do you want to get out of this? Right? >> Yeah.
It is important that it is a journey, right? So, it doesn't have to be, okay, there's a big bang applied to you and your company's tech ecosystem. You can just start by saying, "Okay, how will I bring my data together at a data lake? How do I see across my different pillars of excellence in my own business?" And then, "How do I manage, potentially, this in an overall MLOps platform such that it can be sustainable and gather more insights and improve itself with time, and therefore be more impactful to the ultimate users of the tool?" 'Cause again, as Bethany said, without use, these things are just tools on the shelf somewhere that have little value. >> So, it's a journey, as you both said, completely agree with that. It's a journey that's getting faster and faster. Because, I mean, we've seen so much acceleration in the last couple of years, the consumer demands have massively changed. >> Bethany: Absolutely. >> In every industry, how do Slalom and Snowflake come together to help businesses define the journey, but also accelerate it, so that they can stay ahead or get ahead of the competition? >> Yeah. So, one thing I think is interesting about the technology field right now is I feel like we're at the point where it's not the technology or the tools that's limiting us or, you know, constraining what we can build, it's our imaginations. Right? And, when I think about intelligent products and all of the things that are possible, that you can achieve with AI and ML, that's not widely known. There's so much tech jargon. And, we put all of those statistical words on it, and you don't know the things you don't know. And, instead, really, what we're doing is we're providing different ways to learn and grow. So, I think if we can demystify and humanize some of that language, I really would love to see all of these companies better understand the crayons and the tools in their toolbox. >> Speaking from a creative perspective, I love it.
>> No, and I'll do the tech nerd bit. So, there is- you're right. There is a portion where you need to bring data together, and tech together, and that kind of thing. So, something like Snowflake is a great enabler for how to actually bring the data of multiple parts of an organization together into, you know, a data warehouse, or a data lake, and then be able to manage that sort of in an MLOps platform, particularly with some of the press that Snowflake has put out this week. Things becoming more Python-native, allowing for more ML experimentation, and some more native insights on the platform, rather than going off the Snowflake platform to do some of that kind of thing. Makes Snowflake an incredibly valuable portion of the data management and of the tech and of the engineering of the overall product. >> So, I agree, Bethany, lack of imagination sometimes is the barrier; we get so down into the weeds, but there's also lack of skills, as mentioned, the organizational, you know, structural issues, politics, you know, whatever it is, you know, specific agendas. How do you guys help with that? Can, will you bring in, you know, resources to help and fill gaps? >> Yeah, so we will bring in a cross-disciplinary team of experts. So, you will see an experience designer, as well as your ML architects, as well as other technical architects, and what we call solution owners, because we want to make sure that we've got a lot of perspectives, so we can see that problem from a lot of different angles. The other thing that we're bringing in is a repeatable process, a repeatable engineering methodology, which, when you zoom out, and you look at it, it doesn't seem like that big of a deal. But, what we're doing, is we're training against it.
We're building tools, we're building templates, we're re-imagining what our deliverables look like for intelligent products, just so we're not only speeding up the development and getting to those outcomes faster, but we're also continuing to grow and we can gift those things to our clients, and help support them as well. >> And not only that, what we do at Slalom is we want to think about transition from the beginning. And so, by having all the stakeholders in the room from the earliest point, both the business stakeholders, the technical stakeholders, if they have data scientists, if they have engineers, who's going to be taking this and maintaining this intelligent product long after we're gone, because again, we will transition, and someone else will be taking over the maintenance of this system. One, they will understand, you know, early, from the beginning, the path that it is on, and be more capable of maintaining this, and two, understand sort of the ethical concerns behind, okay, here's how parts of your system affect other parts of the system. And, you know, sometimes ML gets some bad press because it's misapplied, or there are concerns, or models or data are used outside of context. And there's some, you know, there are potentially some ill effects to be had. By bringing those people together much earlier, it allows for the business to truly understand, and the stakeholders to ask the questions that need to be continually asked to evaluate: is this the right thing to do? How does my part affect the whole? And, how do I have an overall impact that is positive and, you know, truly being done most effectively? >> So, that's that knowledge transfer. I hesitate to even say that because it makes it sound so black and white, because you're co-creating here. But, essentially, you're, you know, to use the cliche, you're teaching them how to fish. Not, you know, going to do the fishing for them on an ongoing basis, so.
>> Lisa: That thought diversity is so critical, as is the internal alignment. Last question for you guys, before we wrap here, where can customers go to get started? Do they engage Slalom, Snowflake? Can they do both? >> Chris: You definitely can. We can come through. I mean, we're fortunate that Snowflake has blessed us with the title of partner of the year again for the fifth time. >> Lisa: Congratulations. >> Thank you, thank you. We are incredibly humbled by that. So, we do a lot of work with Snowflake. You could certainly come to Slalom, any one of our local markets, or Build or Emerge. We'll definitely work together. We'll figure out what the right team is. We'll have lots and lots of conversations, because it is most important for you as a set of business stakeholders to define what is right for you and what you need. >> Yeah. Good stuff, you guys, thank you so much for joining Dave and me, talking about intelligent products, what they are, how you co-design them, and the impact that data can make with customers if they really bring the right minds together and get creative. We appreciate your insights and your thoughts. >> Thank you. >> Thanks for having us guys. Yeah. >> All right. For Dave Vellante, I am Lisa Martin. You're watching theCUBE's coverage, day two, Snowflake Summit 22, from Las Vegas. We'll be right back with our next guest. (upbeat music)

Published Date : Jun 15 2022



JG Chirapurath, Microsoft


 

>> Okay, we're now going to explore the vision of the future of cloud computing from the perspective of one of the leaders in the field. JG Chirapurath is the Vice President of Azure Data, AI and Edge at Microsoft. JG, welcome to theCUBE on Cloud, thanks so much for participating. >> Well, thank you, Dave. And it's a real pleasure to be here with you, and I just want to welcome the audience as well. >> Well, JG, judging from your title, we have a lot of ground to cover and our audience is definitely interested in all the topics that are implied there. So let's get right into it. We've said many times in theCUBE that the new innovation cocktail comprises machine intelligence, or AI, applied to troves of data with the scale of the cloud. It's no longer that we're driven by Moore's Law. It's really those three factors, and those ingredients are going to power the next wave of value creation in the economy. So first, do you buy into that premise? >> Yes, absolutely. We do buy into it, and I think one of the reasons why we put data, analytics and AI together is because all of that really begins with the collection of data and managing it and governing it, unlocking analytics in it. And we tend to see things like AI, the value creation that comes from AI, as being on that continuum of having started off with really things like analytics and proceeding to machine learning and the use of data in interesting ways. >> Yes, I'd like to get some more thoughts around data and how you see the future of data and the role of cloud, and maybe how Microsoft's strategy fits in there. I mean, your portfolio, you've got SQL Server, Azure SQL, you got Arc, which is kind of Azure everywhere, for people that aren't familiar with that; you got Synapse, which of course does all the integration, the data warehouse, and it gets things ready for BI and consumption by the business and the whole data pipeline.
And then all the other services, Azure Databricks, you got Cosmos in there, you got Blockchain, you've got Open Source services like PostgreSQL and MySQL. So lots of choices there. And I'm wondering, how do you think about the future of cloud data platforms? It looks like your strategy is right tool for the right job. Is that fair? >> It is fair, but it's also just to step back and look at it. It's fundamentally what we see in this market today, is that customers seek really a comprehensive proposition. And when I say a comprehensive proposition, it is sometimes not just about saying that, "Hey, listen, we know you're a SQL Server company, we absolutely trust that you have the best Azure SQL database in the cloud. But tell us more." We've got data that is sitting in Hadoop systems. We've got data that is sitting in PostgreSQL, in things like MongoDB. So that open source proposition today in data and data management and database management has become front and center. So our real sort of push there is, when it comes to migration, management, modernization of data, to present the broadest possible choice to our customers, so we can meet them where they are. However, when it comes to analytics, one of the things they ask for is, give us a lot more convergence. It really isn't about having 50 different services. It's really about having that one comprehensive service that is converged. That's where things like Synapse fit in, where you can just land any kind of data in the lake and then use any compute engine on top of it to drive insights from it. So fundamentally, it is that flexibility that we really sort of focus on, to meet our customers where they are. And really not pushing our dogma and our beliefs on it, but to meet our customers according to the way they've deployed stuff like this. >> So that's great.
I want to stick on this for a minute because when I have guests on like yourself they never want to talk about the competition, but that's all we ever talk about. And that's all your customers ever talk about. Because the counter to that right tool for the right job, and that I would say is really kind of Amazon's approach, is that you got the single unified data platform, the mega database. So it does it all. And that's kind of Oracle's approach. It sounds like you want to have your cake and eat it too. So you got the right tool for the right job approach, but you've got an integration layer that allows you to have that converged database. I wonder if you could add color to that and confirm or deny what I just said. >> No, that's a very fair observation, but I'd say there's a nuance in what I sort of described. When it comes to data management, when it comes to apps, we then present customers with the broadest choice. Even in that perspective, we also offer convergence. So case in point, when you think about Cosmos DB, under that one sort of service, you get multiple engines but with the same properties. Right, global distribution, the five nines availability. It gives customers the ability, when they have to build that new cloud native app, to adopt Cosmos DB and adopt it in a way that is, and choose an engine that is, most flexible to them. However, when it comes to, say, a SQL Server, for example, when modernizing it, sometimes you just want to lift and shift it into things like IaaS. In other cases, you want to completely rewrite it. So you need to have the flexibility of choice there that is presented by a legacy of what sits on premises. When you move into things like analytics, we absolutely believe in convergence. So we don't believe that, look, you need to have a relational data warehouse that is separate from a Hadoop system that is separate from say a BI system that is just, it's a bolt-on.
For us, we love the proposition of really building things that are so integrated that once you land data, once you prep it inside the lake, you can use it for analytics, you can use it for BI, you can use it for machine learning. So I think our sort of differentiated approach speaks for itself there. >> Well, that's interesting because essentially again you're not saying it's an either or, and you see a lot of that in the marketplace. You got some companies that say, "No, it's the data lake." And others say "No, no, put it in the data warehouse." And that causes confusion and complexity around the data pipeline and a lot of cutting. And I'd love to get your thoughts on this. A lot of customers struggle to get value out of data, and specifically data product builders are frustrated that it takes them too long to go from this idea of, hey, I have an idea for a data service and it can drive monetization, but to get there you got to go through this complex data life cycle and pipeline and beg people to add new data sources. Do you feel like we have to rethink the way that we approach data architecture? >> Look, I think we do in the cloud. And I think what's happening today, and I think the place where I see the most amount of rethink and the most amount of push from our customers to really rethink, is the area of analytics and AI. It's almost as if what worked in the past will not work going forward. So when you think about analytics only in the enterprise today, you have relational systems, you have Hadoop systems, you've got data marts, you've got data warehouses, you've got the enterprise data warehouse. So those large honking databases that you use to close your books with. But when you start to modernize it, what people are saying is that we don't want to simply take all of that complexity that we've built over, say three, four decades and simply migrate it en masse exactly as they are into the cloud.
What they really want is a completely different way of looking at things. And I think this is where services like Synapse completely provide a differentiated proposition to our customers. What we say there is, land the data, in any way, shape or form you see, inside the lake. Once you've landed it inside the lake, you can essentially use Synapse Studio to prep it in the way that you like. Use any compute engine of your choice and operate on this data in any way that you see fit. So case in point, if you want to hydrate a relational data warehouse, you can do so. If you want to do ad hoc analytics using something like Spark, you can do so. If you want to invoke Power BI on that data or BI on that data, you can do so. If you want to bring in a machine learning model on this prepped data, you can do so. So inherently, when customers buy into this proposition, what it solves for them and what it gives to them is complete simplicity. One way to land the data, multiple ways to use it. And it's all integrated. >> So should we think of Synapse as an abstraction layer that abstracts away the complexity of the underlying technology? Is that a fair way to think about it? >> Yeah, you can think of it that way. It abstracts away, Dave, a couple of things. It takes away the sort of complexities related to the type of data. It takes away the complexity related to the size of data. It takes away the complexity related to creating pipelines around all these different types of data. And fundamentally puts it in a place where it can be now consumed by any sort of entity inside the Azure proposition. And by that token, even Databricks. You can in fact use Databricks in sort of an integrated way with Azure Synapse. >> Right, well, so that leads me to this notion of, and I wonder if you buy into it. So my inference is that a data warehouse or a data lake could just be a node inside of a global data mesh. And then Synapse is sort of managing that technology on top.
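The "land once, use many ways" pattern described above can be approximated with a small, stdlib-only Python sketch. The "engines" here are plain functions standing in for BI and ML tools; the names and data are invented for illustration, not Synapse's actual APIs:

```python
import statistics

# One dataset "landed in the lake" in raw form.
lake = [
    {"region": "east", "orders": 120, "revenue": 8400.0},
    {"region": "west", "orders": 95,  "revenue": 7125.0},
    {"region": "east", "orders": 130, "revenue": 9100.0},
    {"region": "west", "orders": 110, "revenue": 8250.0},
]

# "BI engine": aggregate the same landed data for reporting.
def revenue_by_region(rows):
    totals = {}
    for row in rows:
        totals[row["region"]] = totals.get(row["region"], 0.0) + row["revenue"]
    return totals

# "ML engine": fit a trivial model (average revenue per order) on the same data.
def avg_revenue_per_order(rows):
    return statistics.mean(r["revenue"] / r["orders"] for r in rows)

print(revenue_by_region(lake))          # {'east': 17500.0, 'west': 15375.0}
print(avg_revenue_per_order(lake))      # 72.5
```

The design point is that both consumers read the same landed copy of the data; neither requires its own extract or its own pipeline, which is the simplicity being claimed for the lake-centric approach.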
Do you buy into that? That global data mesh concept? >> We do, and we actually do see our customers using Synapse and the value proposition that it brings together in that way. Now it's not where they start. Oftentimes a customer comes and says, "Look, I've got an enterprise data warehouse, I want to migrate it." Or "I have a Hadoop system, I want to migrate it." But from there, the evolution is absolutely interesting to see. I'll give you an example. One of the customers that we're very proud of is FedEx. And what FedEx is doing is it's completely re-imagining its logistics system. Basically the system that delivers, what is it, 3 million packages a day. And in doing so, in these COVID times, with the view of basically delivering on COVID vaccines. One of the ways they're doing it is basically using Synapse. Synapse is essentially that analytic hub where they can get complete view into the logistic processes, the way things are moving, understand things like delays, and really put all of that together in a way that they can essentially get our packages and these vaccines delivered as quickly as possible. Another example, it's one of my favorites. We see once customers buy into it, they essentially can do other things with it. So an example of this, really my favorite story, is the Peace Parks initiative. It is the premier white rhino conservancy in the world. They essentially are using data that has landed in Azure, images in particular, to basically use drones over the vast area that they patrol, and use machine learning on this data to really figure out where is an issue and where there isn't an issue. So that this park, with about 200 radios, can scramble surgically versus having to range across the vast area that they cover. So, what you see here is, the importance is really getting your data in order, landing it consistently whatever the kind of data it is, building the right pipelines, and then the possibilities of transformation are just endless.
>> Yeah, that's very nice how you worked in some of the customer examples, and I appreciate that. I want to ask you, though: some people might say that by putting in that layer, while you clearly add simplification, and it is I think a great thing, there begins over time to be a gap, if you will, between the ability of that layer to integrate all the primitives and all the piece parts, and that you lose some of that fine-grained control and it slows you down. What would you say to that? >> Look, I think that's what we excel at and that's what we completely sort of buy into. And it's our job to basically provide that level of integration and that granularity in the way that, it's an art. I absolutely admit it's an art. There are areas where people crave simplicity and not a lot of sort of knobs and dials and things like that. But there are areas where customers want flexibility. And so I think, just to give you an example of both of them: in landing the data, in consistency in building pipelines, they want simplicity. They don't want complexity. They don't want 50 different places to do this. There's one way to do it. When it comes to computing and reducing this data, analyzing this data, they want flexibility. This is one of the reasons why we say, "Hey, listen, you want to use Databricks? If you're buying into that proposition and you're absolutely happy with them, you can plug it into it." You want to use BI and essentially do a small data model, you can use BI. If you say that, "Look, I've landed into the lake, I really only want to use ML," bring in your ML models and party on. So that's where the flexibility comes in. So that's sort of how we think about it. >> Well, I like the strategy because one of our guests, Zhamak Dehghani, is I think one of the foremost thinkers on this notion of the data mesh. And her premise is that the data builders, data product and service builders, are frustrated because the big data system is generic to context.
There's no context in there. But by having context in the big data architecture and system, you can get products to market much, much, much faster. So, and that seems to be your philosophy, but I'm going to jump ahead to my ecosystem question. You've mentioned Databricks a couple of times. There's another partner that you have, which is Snowflake. They're kind of trying to build out their own Data Cloud, if you will, and global mesh; on the one hand they're a partner, on the other hand they're a competitor. How do you sort of balance and square that circle? >> Look, when I see Snowflake, I actually see a partner. When we see, essentially, when you think about Azure, now this is where I sort of step back and look at Azure as a whole. And in Azure as a whole, companies like Snowflake are vital in our ecosystem. I mean, there are places we compete, but effectively, by helping them build the best Snowflake service on Azure, we essentially are able to differentiate and offer a differentiated value proposition compared to say a Google or an AWS. In fact, that's been our approach with Databricks as well, where they are effectively on multiple clouds and our opportunity with Databricks is to essentially integrate them in a way where we offer the best experience, the best integrations, on Azure. That's always been our focus. >> Yeah, it's hard to argue with the strategy. Our data with our data partner ETR shows Microsoft is both pervasive and impressively having a lot of momentum, spending velocity, within the budget cycles. I want to come back to AI a little bit. It's obviously one of the fastest growing areas in our survey data. As I said, clearly Microsoft is a leader in this space. What's your vision of the future of machine intelligence and how Microsoft will participate in that opportunity? >> Yeah, so fundamentally, we've built on decades of research around essentially vision, speech and language.
Those have been the three core building blocks, and for a really focused period of time we focused on essentially ensuring human parity. So if you ever wonder what the keys to the kingdom are, it's the moat we've built in ensuring human parity, the research posture that we've taken there. What we've then done is essentially a couple of things. We've focused on essentially looking at the spectrum that is AI, both from saying, "Hey, listen, it's got to work for data analysts" who are looking to basically use machine learning techniques, to developers who are essentially coding and building machine learning models from scratch. So that proposition manifests for us as really AI for all skill levels. The other core thing we've done is that we've also said, "Look, it'll only work as long as people trust their data and they can trust their AI models." So there's a tremendous body of work and research we do in things like responsible AI. So if you ask me where we push, it's fundamentally to make sure that we never lose sight of the fact that the spectrum of AI can come together for any skill level, and we keep that responsible AI proposition absolutely strong. Now against that canvas, Dave, I'll also tell you that as Edge devices get way more capable, where they can infer on the Edge, say with a camera or a mic or something like that, you will see us pushing a lot more of that capability onto the Edge as well. But to me, that's a modality; the core really is all skill levels and that responsibility in AI. >> Yeah, so that brings me to this notion of, I want to bring in Edge and hybrid cloud. Help us understand how you're thinking about hybrid cloud, multicloud. Obviously one of your competitors, Amazon, won't even say the word multicloud. You guys have a different approach there, but what's the strategy with regard to hybrid?
Do you see the cloud, you're bringing Azure to the edge? Maybe you could talk about that, and talk about how you're different from the competition. >> Yeah, I think on the Edge, and I'll even be the first one to say that the word Edge itself is a little bit conflated. But I will tell you, just focusing on hybrid, this is one of the places where, I would say, 2020, if I were to look back from a COVID perspective in particular, has been the most informative. Because we absolutely saw customers digitizing, moving to the cloud, and we really saw hybrid in action. 2020 was the year that hybrid really became real from a cloud computing perspective. And an example of this is we understood that it's not all or nothing. So sometimes customers want Azure consistency in their data centers. This is where things like Azure Stack come in. Sometimes they basically come to us and say, "We want the flexibility of adopting flexible, open platforms, let's say containers, orchestrated by Kubernetes, so that we can essentially deploy it wherever we want." And so when we designed things like Arc, it was built with that flexibility in mind. So, here's the beauty of what something like Arc can do for you. If you have a Kubernetes endpoint anywhere, we can deploy an Azure service onto it. That is the promise. Which means, if for some reason the customer says, "Hey, I've got this Kubernetes endpoint in AWS and I love Azure SQL," you will be able to run Azure SQL inside AWS. There's nothing that stops you from doing it. So inherently, remember, our first principle is always to meet our customers where they are. So from that perspective, multicloud is here to stay. We are never going to be the people that say, "I'm sorry," we will never do multicloud. It is a reality for our customers. >> So I wonder if we could close, thank you for that.
By looking back and then ahead. And I want to put forth, maybe it's a criticism, but maybe not, maybe it's an art Microsoft has mastered. But first, Microsoft did an incredible job at transitioning its business. Azure is omnipresent, as we said, our data shows that. So, two-part question. First, Microsoft got there by investing in the cloud, really changing its mindset, I think, and leveraging its huge software estate and customer base to put Azure at the center of its strategy. And many have said, me included, that you got there by creating products that are good enough. You do a 1.0, it's still not that great, then a 2.0, maybe not the best, but acceptable for your customers. And that's allowed you to grow very rapidly, expand your market. How do you respond to that? Is that a fair comment? Are you more than good enough? I wonder if you could share your thoughts. >> Dave, you hurt my feelings with that question. >> Don't hate me, JG. (both laugh) We're getting it out there all right, so. >> First of all, thank you for asking me that. I am absolutely the biggest cheerleader you'll find at Microsoft. I absolutely believe that I represent the work of almost 9,000 engineers. And we wake up every day worrying about our customer and worrying about the customer condition, and to absolutely make sure we deliver the best in the first attempt that we do. So when you take the plethora of products we deliver in Azure, be it Azure SQL, be it Azure Cosmos DB, Synapse, Azure Databricks, which we did in partnership with Databricks, Azure Machine Learning, and recently, when we premiered the world's first comprehensive data governance solution in Azure Purview, I would humbly submit to you that we are leading the way, and we're essentially showing how the future of data, AI and the Edge should work in the cloud. >> Yeah, I'd be disappointed if you capitulated in any way, JG. So, thank you for that.
And the kind of last question is looking forward, how you're thinking about the future of cloud. Last decade, a lot about cloud migration, simplifying infrastructure management and deployment, SaaSifying the enterprise, a lot of simplification and cost savings, and of course redeployment of resources toward digital transformation and other valuable activities. How do you think this coming decade will be defined? Will it be sort of more of the same, or is there something else out there? >> I think that the coming decade will be one where customers start to unlock outsized value out of what happened in the last decade, where people laid the foundation. People essentially looked at the world and said, "Look, we've got to make a move. We're largely hybrid, but we're going to start making steps to basically digitize and modernize our platforms." I will tell you that with the amount of data that people are moving to the cloud, just as an example, you're going to see the use of analytics and AI for business outcomes explode. You're also going to see a huge focus on things like governance. People need to know where the data is, what the data catalog contains, how to govern it, how to trust this data, and given all of the privacy and compliance regulations out there, essentially their compliance posture. So I think, one, the unlocking of outcomes versus simply, "Hey, I've saved money." Second, really putting this comprehensive governance regime in place. And then finally, security and trust. It's going to be more paramount than ever before. >> Yeah, nobody's going to use the data if they don't trust it, so I'm glad you brought up security. It's a topic that's number one on the CIO list. JG, great conversation. Obviously the strategy is working, and thanks so much for participating in Cube on Cloud. >> Thank you, thank you, Dave, I appreciate it, and thank you to everybody who's tuning in today.
>> All right then keep it right there, I'll be back with our next guest right after this short break.

Published Date : Jan 5 2021


Sandy Carter, AWS Public Sector Partners | AWS re:Invent 2020 Public Sector Day


 

>> From around the globe, it's theCube, with digital coverage of AWS re:Invent 2020. Special coverage sponsored by AWS Worldwide Public Sector. >> Okay, welcome back to theCube's coverage of re:Invent 2020 virtual. It's theCube virtual, I'm John Furrier, your host. We're here celebrating the special coverage of public sector with Sandy Carter, vice president of AWS Public Sector Partners. She heads up the partner group within Public Sector, and she's been in the role for about a year now. Right, Sandy, or so? >> Right, you got it, John. >> About a year? Congratulations, welcome back to theCube, >> Thank you. >> for reason- >> Always a pleasure to be here, and what an exciting re:Invent, right? >> It's been exciting, we've got wall-to-wall coverage, multiple sets, a lot of action. Virtual, it's three weeks, we're not in person, we have to do it remote this year. So when real life comes back, we'll bring the Cube back. But I want to take a minute to step back. Take a minute to explain your role for the folks that are new to theCube virtual and what you're doing over there at Public Sector. Take a moment to introduce yourself to the new viewers. >> Well, welcome. theCube is phenomenal, and of course we love our new virtual re:Invent as well. As John said, my name is Sandy Carter and I'm vice president with our public sector partners group. So what does that mean? That means I get to work with thousands of partners globally, covering exciting verticals like space and healthcare, education, state and local government, federal government, and more. And what I get to do is to help our partners learn more about AWS so that they can help our customers really be successful in the marketplace. >> What has been the most exciting thing for you in the job? >> Well, you know, I love, wow, I love everything about it, but I think one of the things I love the most is how we in Public Sector really make technology have a meaningful impact on the world.
So John, I get to work with partners like Orbis, which is a non-profit fighting preventable blindness. They're a partner of ours. They've got something called Cybersight AI, which enables us to use machine learning, over 20 different machine learning algorithms, to detect common eye diseases in seconds. So, you know, that purpose for me is so important. We also work with a partner called Twist Inc, it's hard to say, but it just does a phenomenal job with AWS IoT and helps make water pumps smart pumps. So they are in 7,300 remote locations around the world helping us with clean water. So for me, that's probably the most exciting and meaningful part of the job that I have today. >> And it's so impactful, because you guys really... Amazon's business model has always been about enablement, from startups to now up-and-running Public Sector entities, agencies, education, healthcare, and even in space, there's IoT in space. But you've been on the 100-partner tour over a 100 days. What did you learn, what are you hearing from partners now? What are the messages that you're hearing? >> Well, first of all, it was so exciting. I had a 100 different partner meetings in a 100 days because, John, just like you, I missed going around the world and meeting in person. So I said, well, if I can't meet in person, I will do a virtual tour, and I talked to partners in 68 different countries. So a couple of things I heard. One is a lot of love for our MAP program, and that's our Migration Acceleration Program. We now have funding available for partners as they assess migration, as they mobilize, and as they migrate. And you may or may not know, but we have over twice the number of migration competency partners doing business in Public Sector this year than we did last year. The second thing we heard was that partners really love our marketing programs. We had some really nice success this year showcasing value for our customers with cybersecurity.
And I love that because security is so important. Andy Jassy always talks about how our customers really have that as priority zero. So we were able to work with a couple of different areas that we were very proud of, and I loved that the partners were too. We did some repeatable solutions with our consulting partners. And then I think the third big takeaway that I saw was just that our partners love the AWS technology. I heard a lot about AI and ML. We offered this new program called the Rapid Adoption Assistance Program. It's going global in 2021, and so we help partners brainstorm and envision what they could do with it. And then of course, 5G. 5G is ushering in kind of a new era of new demand, and we're going to do a PartnerCast all about 5G for partners in the first quarter. >> Okay, I'm going to put you on the spot. What are the three most talked about programs that you heard? >> Oh, wow, let's see. The three most talked about programs that I heard about. The first one is something I'm really excited about. It's called Think Big for Small Business. It really focuses in on diverse partner groups and types. What it does is it provides just a little bit of extra boost to our small and medium businesses to help them get some of the benefits of our AWS partner program. So companies like MFT, they're based down in South Africa, it's a husband and wife team that focus on that Black Economic Empowerment rating, and they use the program to get some of the go-to-market capability. So that's number one. Let's see, you said three. Okay, so number two would be our ProServe Ready pilot. This helps to accelerate our partner activation and enablement, and provides partners a way to get badged on ProServe best practices, get trained up, and do opportunity matching. And I think a lot of partners were kind of buzzing about that program and wanting to know more about it.
And then, last but not least, the one that I think probably has the most impact on time to compliance. It's called ATO, or Authority to Operate, and what we do is we help our partners, both technology partners and consulting partners, get support for compliance frameworks. So FedRAMP, of course, we have over 129 solutions right now that are FedRAMPed, but we also added, John, PCI for financial, HIPAA for healthcare, frameworks for public safety, IRS 1075, GDPR for international, and of course for defense, IL4, IL5 and IL6, and CMMC. That program is amazing because it cuts the time to market and really steps partners through all of our best practices. I think those are the top three. >> Yeah, I've been like a broken record. For the folks that don't know, all my interviews I've done with Public Sector over the years, the last one is interesting, and I think that's a secret sauce that you guys have done, the compliance piece. Being an entrepreneur and starting companies, that first three steps in a cloud of dust, the momentum, the flywheel to get going, it's always the hardest, and getting the certification, if you don't have the resources, is time consuming. I think you guys really cracked the code on that. I really want to call that out, 'cause that's I think really super valuable for the folks that pay attention, and of course sales enablement through the program. So great stuff. Now, given that's all cool, (hands clap) the question I have, and I hear all the time is, okay, I'm involved, I got a lot of pressure, the pandemic has forced me to rethink, I don't have a lot of IT, I don't have a big budget, I always complained, but not anymore. The mandate is move fast, get built out, leverage the cloud. Okay, I want to get going. What's the best way for me to grow with Public Sector? How do I do that if I'm a customer? I really want to... I won't say take a shortcut, because there's probably no shortcut. How do I throttle up? Quickly, what's your take on that?
>> Well, John, first I want to give one stat that came to us from Twilio. They had interviewed a ton of companies and they found that there was more digital transformation from March, when the pandemic started, to now than in the last five years. So that just blew me away. And I know all of our partners are looking to see how they can really grow based on that. So if you're a consulting partner, one of the things that we say to help you grow is we've already done some integrations, and if you can take advantage of those, that can speed up your time to market. So I know you know this one, VMware Cloud on AWS. What a powerful integration. It protects your customers' existing skillsets, speeds your time to market, because now VMware vSphere and vSAN are all on AWS. So it's the same user interface, and it really helps to reduce costs. And there's another integration that I think really helps, which is Amazon Connect, one of our fastest growing areas, because it's an ML- and AI-based solution to help with call centers. It's been integrated with Salesforce, both the Service Cloud and the Sales Cloud. So how powerful is that, this integrated customer workflow? So I think both of those are really interesting for our consulting partners. >> That's a great point. In fact, that's a big part of the story here at re:Invent. These three weeks, it has been the integration. Salesforce, as you mentioned, Connect has been huge, and partners. So just great success again, I've seen great momentum. People are seeing their jobs being saved, they're saving lives. People are pretty excited, and it's certainly a lot of work you've done in healthcare and education, two big areas of activity, which is really hard, really, really hard. So congratulations on that, and great work. Great to see you. I'm going to ask you one final question. What's the big message for your customers watching as they prepare for 2021? Real life is coming back, vaccines on the horizon.
We're hearing some good news, a lot of great cloud help there. What's your message to send to 2021? >> For our partners in 2021, one, there is tremendous growth ahead and tremendous value that our partners have added. And that's both on the mission side, which both Theresa and I discussed during our sessions, as well as technology. So I think the first message is, there's lots of growth ahead and a lot of ways that we can add value. Second is, with all of those programs and initiatives, there's so much help out there for partners. So look for how you could really accelerate using some of those areas on your customer journey as you're going along. And then finally, I just want everybody to know, John, that we love our partners, and AWS is there to help you every step of the way. And if you need anything at all, obviously reach out to your PDM or your account manager, or you're always welcome to reach out to me. And my final message is just thank you. Through so many different things that have happened in 2020, our partners have come through amazingly, with passion, with value, and just with persistence, never stopping. So thank you to all of our partners out there who've really added so much value to our customers. >> And Amazon is recognizing the leadership of partners in the work you're doing. Your leadership session was awesome; for the folks who missed it, check it out on demand. Thank you very much, Sandy, for coming on and sharing the update. >> Thank you, John, and great to see all your partners out there. >> Okay, this is theCube virtual covering AWS re:Invent 2020 virtual, three weeks, wall-to-wall coverage. A lot of videos, check out all the videos on demand, the leadership sessions, theCube videos, and of course the Public Sector video-on-demand micro-site with theCube. I'm John Furrier, thanks for watching. (upbeat music)

Published Date : Dec 9 2020


Roger Barga, AWS | AWS re:Invent 2020


 

>> From around the globe, it's the Cube with digital coverage of AWS re:Invent 2020, sponsored by Intel and AWS. >> Welcome back to the Cube's live coverage of AWS re:Invent 2020. We're not in person this year, we're virtual, this is the Cube virtual. I'm John Furrier, your host of the Cube, with Roger Barga, the General Manager of AWS Robotics and Autonomous Services, and a lot of other cool stuff. He was on last year with DeepRacer, you got the machines, now you have real-time robotics hitting the scene. Andy Jassy laid out a huge vision and data points and announcements around industrial IoT; it's kind of coming together. Roger, great to see you, and thanks for coming on. I want to dig in and get your perspective. Thanks for joining the Cube. >> Good to be here with you again today. >> Alright, so give us your take on the announcements yesterday and how that relates to the work that you're doing on the robotics side at AWS. And where does this go, from, you know, fun, to real world, to societal impact? Take us through how you see that vision. >> Yeah, sure. So we continue to see the story of how processing is moving to the edge, and cloud services are augmenting that processing at the edge with unique and new services. Andy talked about five new industrial machine learning services yesterday, which are very relevant to exactly what we're trying to do with AWS RoboMaker. A couple of them: Monitron, which is for equipment monitoring for anomalies, and it's a whole solution, from an edge device to a gateway to a service. But we also heard about Lookout for Equipment, which is, if a customer already has their own sensors, a service that can take that sensor data off the device to actually identify anomalies or potential failures. And we saw Lookout for Vision, which allows customers to actually use their camera and build a service to detect anomalies and potential failures. With AWS RoboMaker, we have ROS cloud service extensions, which allow developers to connect their robot to these services. And so, increasingly, there's that combination of being able to put sensors and processing at the edge, connecting it back with the cloud, where you can do intelligent processing and understand what's going on out in the environment. So those were exciting announcements. And that story is going to continue to unfold with new services, new sensors we can put on our robots to, again, intelligently process the data and control these robots in industrial settings. >> You know, this brings up a great point. And, you know, I wasn't kidding when I was saying fun to real world. I mean, this is what's happening. The use cases are different. You look at, you mentioned Monitron and Lookout, but those depend on the Panorama appliance. You had computer vision, machine learning. I mean, these are all new, cool, relevant use cases, but they're not static. The edge is very diverse and sometimes mostly purpose-built for the edge piece. So it's not like you could build a product that fits everywhere. Talk about that dynamic and why the robotics piece has to be agile. And what are you guys doing to make that workable? Because, you know, if you want purpose-built, purpose-built implies supply chain years in advance, it implies slow. And, you know, how do you get the trust? How do you get the security? Take us through that, please. >> So to your point, no single service is going to solve all problems, which is why AWS has released a number of just primitives. Just think about Kinesis Video Streams: I can stream my raw video from an edge device and build my own machine learning model in the cloud with SageMaker that will process it. Or I could use Rekognition. So we give customers these basic building blocks. But we also think about working from the customer backward.
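The anomaly-detection idea behind services like Monitron and Lookout for Equipment, flagging a sensor reading that drifts far outside its recent baseline, can be sketched in a few lines. This is a toy rolling z-score illustration, not the actual models those services use; the window size and threshold are arbitrary choices:

```python
from collections import deque
from math import sqrt

def make_anomaly_detector(window=50, threshold=3.0):
    """Return a checker that flags readings far outside the recent baseline."""
    history = deque(maxlen=window)

    def check(reading):
        if len(history) >= 10:  # need some baseline before judging
            mean = sum(history) / len(history)
            var = sum((x - mean) ** 2 for x in history) / len(history)
            std = sqrt(var) or 1e-9  # avoid division by zero on flat data
            anomalous = abs(reading - mean) / std > threshold
        else:
            anomalous = False
        history.append(reading)
        return anomalous

    return check

# Steady vibration readings, then a spike such as a failing bearing might produce.
detect = make_anomaly_detector()
readings = [1.0, 1.1, 0.9, 1.0, 1.05, 0.95, 1.0, 1.1, 0.9, 1.0, 9.0]
flags = [detect(r) for r in readings]  # only the final spike is flagged
```

The managed services replace the hand-tuned statistics here with models trained per machine, but the edge-to-cloud shape of the problem is the same: stream readings, compare against a learned baseline, raise an alert on deviation.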
What is the finished solution that we could give a customer that just works out of the box? And the new services we heard about yesterday were exactly in that latter category. They're purpose-built, they're ready to be used or trained, for developers to use with very little customization necessary. But the point is that for these customers working in these environments, the business questions change all the time, and so they need to actually reprogram a robot on the fly, for example, with a new mission to address the new business need that just arose. That is a dynamic we've been very tuned into since we first started with AWS RoboMaker. We have a feature for fleet management, which allows a developer to choose any robot that's out in their fleet, take a new software stack, test it in simulation, and then redeploy it to that robot so it changes its mission. And this is a dialogue we've seen coming up over the last year, where roboticists are starting to educate their company that a robot is a device that can be dynamically programmed. At any point in time, they can test their application in simulation while the robot's out in the field, verify it's going to work correctly in simulation, and then change the mission for that robot dynamically. One of my customers we're working with, the Woods Hole Oceanographic Institution, is sending autonomous underwater robots out into the ocean to monitor wind farms, and they realized the mission may change based on what they find out, whether with the wind farm equipment or with their autonomous robot; the robot itself may encounter an issue. And that ability, because they do have connectivity, to change the mission dynamically, first testing it, of course, in simulation, is completely changing the game for how they think about robots: no longer a static program you write once and have to bring back into the shop to reprogram.
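The fleet-management flow described above, pick robots from a fleet, pair them with a newly simulated-and-verified application version, and push, boils down to a deployment request handed to an API. The sketch below is illustrative only: the field names are loosely modeled on RoboMaker's deployment-job parameters, and every ARN, package name, and launch file here is made up:

```python
def build_deployment_request(fleet_arn, app_arn, app_version, robot_arns):
    """Assemble a request pairing selected robots with one application version.

    Illustrative only: field names loosely follow AWS RoboMaker's
    CreateDeploymentJob shape; all identifiers are hypothetical.
    """
    return {
        "fleet": fleet_arn,
        "deploymentApplicationConfigs": [{
            "application": app_arn,
            "applicationVersion": app_version,
            "launchConfig": {
                "packageName": "underwater_survey",  # hypothetical ROS package
                "launchFile": "mission.launch",      # hypothetical launch file
            },
        }],
        "robots": robot_arns,  # the subset of the fleet being retargeted
    }

request = build_deployment_request(
    fleet_arn="arn:aws:robomaker:us-west-2:123456789012:fleet/survey-fleet",
    app_arn="arn:aws:robomaker:us-west-2:123456789012:robot-application/survey-app",
    app_version="2",  # the version just verified in simulation
    robot_arns=["arn:aws:robomaker:us-west-2:123456789012:robot/auv-07"],
)
```

In practice a payload like this would be passed to the service client rather than built by hand; the point is that "change the robot's mission" reduces to shipping a new versioned stack to a named device, which is why simulate-then-redeploy loops become routine.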
It's now just this dynamic entity that you can test and modify at any time. >> You know, I'm old enough to know how hard that really is to pull off. And this highlights really kind of how exciting this is. I mean, just think about the idea of hardware being dynamically updated with software in real time, or near real time, with new stacks. I mean, that's just unheard of, you know, because purpose-built has always been, you lock it in, you deploy it, you send the tech out there, a kind of break-fix mindset. This changes everything, whether it's space or underwater. You've been seeing everything. It's a software-defined, software-operated model. So I have to ask you, first of all, that's super awesome anyway, what's this like for the new generation? Because Andy talked on stage, and in my one-on-one I had with him, he talked about, referring to some of these new things, there's a new generation of developer. So you've got to look at these young kids coming out of school; to them, they don't understand how hard this is. They just look at it as lingua franca with software-defined stuff. So can you share some of the cutting edge things that are coming out of this new talent, the new developers? I'm sure the creativity is off the charts. Can you share some cool use cases? Share your perspective? >> Absolutely. I think there's a couple of interesting cases to look at. One is, you know, roboticists historically have thought about all the processing being on the robot. And if you say cloud and cloud service, they just couldn't fathom the reality that all the processing could be moved off of the robot. Now you're seeing developers who are looking at the cloud services that we're launching, and our cloud service extensions, which give you a secure connection to the cloud from your robot.
They're starting to realize they can actually move some of that processing off the robot, which can lower the BOM, or bill of materials, the cost of the robot. And they can have this dynamic programming surface in the cloud that they can program and change the behavior of the robot with. So that's a dialogue we've seen coming over the last couple years, that rethinking of where the software should live: what makes sense to run on the robot, and what should we push out to the cloud? Let alone the fact that if you're aggregating information from hundreds of robots, you can actually build machine learning models that identify mistakes a single robot might make across the fleet, and actually use that insight to retrain the models, push new applications down, push new machine learning models down. That is a completely different mindset. It's almost like introducing distributed computing to roboticists, that you actually think of this fabric of robots. And another, more recent trend we're seeing, that we're listening very closely to customers on, is the ability to use simulation and machine learning, specifically reinforcement learning, for a robot to actually try different tasks, because simulations have gotten so realistic, with the physics engines and the rendering quality that is nearly realistic for a camera. The physics are actually real-world physics, so you can put a simulation of your robot into a 3D simulated world and allow it to bumble around and make mistakes while trying to perform a task that you frankly don't know how to write the code for, it's so complex. And through reinforcement learning, you give reward signals if it does something right, or punishment, negative reward signals.
If it does something wrong, the machine learning algorithm will learn to perform navigation and manipulation tasks, which, again, the programmer simply didn't have to write a line of code for, other than creating the right simulation and the right set of trials. >> So it's like reversing the debugging protocol. It's like, hey, do the simulations, the code writes itself. Debug it on the front end, it writes itself, rather than writing code, compiling it, debugging it, working through the use cases. I mean, it's pretty different. >> It is. It's really a new persona. When we started out, not only were we taking that roboticist persona and introducing them to the cloud services and distributed computing, what you're seeing is machine learning scientists with robotics experience rising as a new developer persona that we have to pay attention to. We're talking to them right now about what they need from our service. >> Well, Roger, I'm getting tight on time here. I want one final question before we break. How does someone get involved with Amazon? You know, whether it's robotics or new areas like space, which is emerging, there's a lot of action, a lot of interest. How does someone engage with Amazon to get involved, whether I'm a student or whether I'm a professional? I want to code. What's the best way in? >> Absolutely, so certainly re:Invent. We have several sessions at re:Invent on AWS RoboMaker. Our team is there, presenting and talking about our roadmap and how people can get engaged. There is, of course, the re:MARS conference, which will be happening next year, hopefully, to get engaged with. Our team is active in the ROS open source community and ROS-Industrial, which is happening in Europe later in December but also happens in the Americas, where we're present giving demos and hands-on tutorials. We're also very active in the academic research and education arena.
In fact, we just released an open source curriculum that any developer can access on GitHub for robotics and ROS, as well as for how to use RoboMaker; it's freely available. So there are a number of touch points and, of course, I'd welcome any request from people to learn more or just engage with our team. >> Roger Barga, general manager of AWS Robotics and the Autonomous Systems Group at Amazon Web Services. Great stuff, and this is really awesome insight. Also, you know, it's candy for the developers. It's the new generation of people who are going to sink their teeth into some new science and some new problems to solve with software. Again, distributed computing meets robotics and hardware, and it's an opportunity to change the world, literally. >> It is an exciting space. It's still day one in robotics, and we look forward to seeing what customers do with our service. >> Great stuff, of course. TheCUBE loves this content. We love robots. We love autonomous. We love space, programming, all this stuff. Totally cutting-edge cloud computing, changing the game at many levels with digital transformation. This is theCUBE. Thanks for watching.
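The trial-and-error loop Barga describes, a reward signal when the robot does something right and a negative reward when it does something wrong, is the core of reinforcement learning. Here is a minimal, assumption-laden sketch of tabular Q-learning on a toy 1-D corridor (a stand-in for a real physics simulation; nothing here is AWS RoboMaker's actual training code):

```python
import random

random.seed(0)

N_STATES = 6          # a 1-D corridor; the goal is the right-most cell
ACTIONS = [-1, +1]    # move left / move right
ALPHA, GAMMA, EPS = 0.5, 0.9, 0.1

# Q-table: Q[state] = [value of moving left, value of moving right]
Q = [[0.0, 0.0] for _ in range(N_STATES)]

def step(state, action):
    """Toy environment: +1 reward at the goal, a small penalty per step."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    if nxt == N_STATES - 1:
        return nxt, 1.0, True
    return nxt, -0.01, False

for _ in range(500):                     # episodes of trial and error
    s, done = 0, False
    while not done:
        # epsilon-greedy: mostly exploit the current estimate, sometimes explore
        a = random.randrange(2) if random.random() < EPS else Q[s].index(max(Q[s]))
        s2, r, done = step(s, ACTIONS[a])
        # Q-learning update: nudge toward reward + discounted future value
        Q[s][a] += ALPHA * (r + GAMMA * max(Q[s2]) - Q[s][a])
        s = s2

policy = ["right" if q[1] >= q[0] else "left" for q in Q[:-1]]
print(policy)  # the learned policy should head right, toward the goal
```

Nobody wrote navigation logic here: the policy emerges from the reward signal alone, which is exactly the "the code writes itself" point made in the conversation.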

Published Date : Dec 2 2020



Hemanth Manda, IBM Cloud Pak


 

(soft electronic music) >> Welcome to this CUBE Virtual Conversation. I'm your host, Rebecca Knight. Today, I'm joined by Hemanth Manda. He is the Executive Director, IBM Data and AI, responsible for Cloud Pak for Data. Thanks so much for coming on the show, Hemanth. >> Thank you, Rebecca. >> So we're talking now about the release of Cloud Pak for Data version 3.5. I want to explore it from a lot of different angles, but do you want to just talk a little bit about why it is unique in the marketplace, in particular in accelerating innovation, reducing costs, and reducing complexity? >> Absolutely, Rebecca. I mean, this is something very unique from an IBM perspective. Frankly speaking, this is unique in the marketplace, because what we are doing is bringing together all of our data and AI capabilities into a single offering, a single platform. And, as I said, we made it run on any cloud, so we are giving customers the flexibility. So it's innovation across multiple fronts. It's consolidation, it's infusing automation and collaboration, and it's also helping customers modernize to the cloud-native world and pick their own cloud, which is what we are seeing in the market today. So I would say this is unique across multiple fronts. >> When we talk about any new platform, one of the big concerns is always around internal skills and maintenance tasks. What changes are you introducing with version 3.5 that help clients be more flexible and streamline their tasks? >> Yeah, it's an interesting question. We are doing a lot of things with respect to 3.5, the latest release. Number one, we are simplifying the management of the platform, making it a lot simpler. We are infusing a lot of automation into it. We are embracing the concept of operators that Red Hat OpenShift has introduced into the market. So simple things such as provisioning, installation, upgrades, scaling it up and down, autopilot management.
So all of that is taken care of as part of the latest release. Also, what we are doing is making collaboration and user onboarding very easy, to drive self-service and user productivity. So overall, this helps reduce the cost for our customers. >> One of the things that's so striking is the speed of the innovation. I mean, you've only been in the marketplace for two and a half years. This is already version 3.5. Can you talk a little bit about the innovation that it takes to do this? >> Absolutely. You're right, we've been in the market for slightly over two and a half years; 3.5 is our ninth release. So frankly speaking, for any company, even for startups, doing nine releases in 2.5 years is unheard of, and definitely unheard of at IBM. So we are acting and behaving like a startup while leveraging the go-to-market and the reach of IBM. So I would say that we are doing a lot here. And as I said before, we're trying to address the unique needs of the market, the need to modernize to cloud-native architectures and move to the cloud, while also addressing the needs of our existing customers, because there are two things we are trying to focus on here. First of all, make sure that we have a modern platform across the different capabilities in data and AI, that's number one. Number two is also, how do we modernize our existing install base? We have a six-plus billion dollar business for data and AI across significant real estate. We're providing a platform through Cloud Pak for Data for that existing install base and those existing customers to modernize, too. >> I want to talk about how you are addressing the needs of customers, but I want to delve into something you said earlier, and that is that you are behaving like a startup. How do you make sure that your employees have that kind of mindset, that kind of experimental, innovative, creative, resourceful mindset, particularly at a more mature company like IBM?
What kinds of skills do you try to instill and cultivate in your team? >> That's a very interesting question, Rebecca. I think there's no single answer, I would say. It starts with listening to the customers, trying to pay detailed attention to what's happening in the market, how competitors are reacting, looking at the startups themselves. What we did uniquely, which I didn't touch upon earlier, is that we are also building an open ecosystem here, so we position ourselves as an open platform. Yes, there's a lot of IBM-unique technology here, but we are also leveraging open source. We have an ecosystem of 50-plus third-party ISVs. So by doing that, we are able to drive a lot more innovation, and a lot faster, because when you are trying to do everything by yourself, it's a bit challenging. But when you're part of an open ecosystem, infusing open source and third parties, it becomes a lot easier. In terms of culture, I just want to highlight one thing. I think we are making it a point to emphasize speed over being perfect, progress over perfection. And that, I think, is something net new for IBM, because at IBM, we pride ourselves on quality, scalability, trying to be perfect on day one. We didn't do that in this particular case. Initially, when we launched our offering two and a half years back, we tried to be quick to the market. Our time to market was prioritized over being perfect. But now that is not the case anymore, right? We have made sure we are exponentially better, and those things have been addressed over the past two and a half years. >> Well, perfect is the enemy of the good, as we know. One of the things that your customers demand is flexibility when building machine learning pipelines. What have you done to improve IBM machine learning tools on this platform? >> So there's a lot of things we've done.
Number one, I want to emphasize that building AI is the initial problem most of our customers are concerned about, but in my opinion, that's 10% of the problem. Actually deploying those AI models, managing them, and governing them at scale for the enterprise is a bigger challenge. So what we have is very unique. We have the end-to-end AI lifecycle; we have tools all the way from building to deploying, managing, and governing these models. Second is we are introducing net new capabilities as part of the latest release. We have this new service called WMLA, Watson Machine Learning Accelerator, that addresses the unique challenges of deep learning capabilities, managing GPUs, et cetera. We are also making the AutoAI capabilities a lot more robust. And finally, we are introducing a net new concept called Federated Learning that allows you to build AI across distributed datasets, which is very unique. I'm not aware of any other vendor doing this. So you can actually have your data distributed across multiple clouds, and you can build an aggregated AI model without actually looking at the data that is spread across these clouds. And this concept, in my opinion, is going to get a lot more traction as we move forward. >> One of the things that IBM has always been proud of is the way it partners with ISVs and other vendors. Can you talk about how you work with your partners and foster this ecosystem of third-party capabilities that integrate into the platform? >> Yes, it's always a challenge. I mean, for this to be a platform, as I said before, you need to be open and you need to build an ecosystem. And so we made that a priority since day one, and we have 53 third-party ISVs today. It's a chicken-and-egg problem, Rebecca, because you need to showcase success to make it a priority for your partners to onboard and work with you closely. So we obviously co-invest with our partners and we take them to market. We have different models.
We have a tactical relationship with some of our third-party ISVs, and we also have strategic relationships. So we partner with them depending on their ability to partner with us, and we invest to make sure that we are not only integrating with them technically, but also from a go-to-market perspective. >> I wonder if you can talk a little bit about the current environment that we're in. Of course, we're all living through a global health emergency in the form of the COVID-19 pandemic. So much of the knowledge work is being done from home. It is being done remotely. Teams are working asynchronously over different kinds of digital platforms. How have you seen these changes affect your team at IBM? What kinds of new capabilities and collaborations, what kinds of skills, have you seen your team have to gain, and gain quite quickly, in this environment? >> Absolutely. I think historically, IBM has had quite a portion of our workforce working remotely, so we are used to this, but not at the scale that the current situation has compelled us to. So we made a lot more investments earlier this year in digital technologies, whether it is Zoom and WebEx, or digital tools that help us coordinate and collaborate effectively. So part of it is technical, right? Part of it is also a cultural shift. And that came all the way from our CEO, in terms of making sure that we have the necessary processes in place to ensure that our employees are not getting burnt out, and that they're being productive and effective. And so a combination of, I would say, technical investments plus process and leadership initiatives helped us embrace the changes that we've seen today. >> And I want you to close us out here. Talk a little bit about the future, both for Cloud Pak for Data, but also for the companies and clients that you work with.
What do you see in the next 12 to 24 months changing in terms of how we have re-imagined the future of work? I know you said this was already version nine. You've been in the marketplace for not even three years. That's incredible innovation and speed. Talk a little bit about changes you see coming down the pike.
You can transform your business one step at a time while keeping the end objective as your goal, as your end goal. So, and it just want a little highlight that with full factor, that's exactly what we are enabling because what we do is we enable you to actually run anywhere you like. So if most of your systems, most of your data and your models, and analytics are on-premise, you can actually start your journey there while you plan for the future of a public cloud or a managed service. So my advice is pretty simple. You start the journey, but you can take, you can, you don't need to, you don't need to do it as a big bang. You, it could be a journey, it could be a gradual transformation, but you need to start the journey today. If you don't, you're going to miss out. >> Baby steps. Hey Hermanth Manda, thank you so much for joining us for this Virtual CUBE Conversation >> Thank you very much, Rebecca. >> I'm Rebecca Knight, stay tuned for more of theCUBE Virtual. (soft electronic music)

Published Date : Nov 20 2020



Swami Sivasubramanian, AWS | AWS Summit Online 2020


 

>> Narrator: From theCUBE Studios in Palo Alto and Boston, connecting with thought leaders all around the world, this is a CUBE Conversation. >> Hello everyone, welcome to this special CUBE interview. We are here at theCUBE Virtual covering AWS Summit Virtual Online. These are Amazon's Summits that they normally do all around the world; they're doing them now virtually. We are here with the Palo Alto COVID-19 quarantine crew getting all the interviews, with a special guest, Vice President of Machine Learning, we have Swami, CUBE alumni, who's been involved in not only machine learning, but all of the major activity around AWS and how machine learning has evolved, and all the services around machine learning workflows, from Transcribe to Rekognition, you name it. Swami, you've been at the helm for many years, and we've also chatted about that before. Welcome to the virtual CUBE covering AWS Summit. >> Hey, pleasure to be here, John. >> Great to see you. I know times are tough. Everything okay at Amazon? You guys are certainly cloud-scaled, and not too unfamiliar with working remotely. You do a lot of travel, but what's it like for you guys right now? >> We're actually doing well. We have been working hard to make sure we continue to serve our customers. Even on our side, we had taken measures to prepare, and we are confident that we will be able to meet customer demands for capacity during this time. We're also helping customers react quickly and nimbly to current challenges. There are various examples of amazing startups working in this area reorganizing themselves to serve customers. We can talk about that in a moment.
Large scale, you guys have done a great job, and it's been fun watching and chronicling the journey of AWS as it now goes to a whole 'nother level, with the post-pandemic era expecting even more surge in everything from VPNs to workspaces, you name it, and all these workloads are going to be under a lot of pressure to deliver more and more value. You've been at the heart of one of the key areas, which is the tooling and the scale around machine learning workflows. And this is where customers are really trying to figure out: what are the adequate tools? How do my teams effectively deploy machine learning? Because now, more than ever, the data is going to start flowing in as virtualization, if you will, of life is happening. We're going to be in a hybrid world with life. We're going to be online most of the time. And I think COVID-19 has proven that this new trajectory of virtualization and virtual work means applications are going to have to flex, and adjust, and scale, and be reinvented. This is a key thing. What's going on with machine learning? What's new? Tell us what you guys are doing right now. >> Yeah, so in AWS, we offer the broadest-- (poor audio capture obscures speech) All the way from, for expert practitioners, we offer frameworks and infrastructure-layer support for all popular frameworks, from TensorFlow, Apache MXNet, and PyTorch (poor audio capture obscures speech) to custom chips like Inferentia. And then, for aspiring ML developers who want to build their own custom machine learning models, we offer SageMaker, which is our end-to-end machine learning service that makes it easy for customers to build, train, tune, and debug machine learning models. It is one of our fastest growing machine learning services, and many startups and enterprises are starting to standardize their machine learning building on it.
And then, the final tier is geared towards application developers who do not want to go into model-building and just want an easy API to build capabilities like transcription, voice recognition, and so forth. And I wanted to talk about one of the new capabilities we are about to launch, an enterprise search service called Kendra, and-- >> So actually, just from a news standpoint, that's GA now, that's being announced at the Summit. >> Yeah. >> That was a big hit at re:Invent, Kendra. >> Yeah. >> A lot of buzz! It's available. >> Yep, so I'm excited to say that Kendra is our new machine learning powered, highly accurate enterprise search service that has been made generally available. And if you look at what Kendra is, we have actually reimagined the traditional enterprise search service, which has historically been an underserved market segment, so to speak. If you look at it, public search, the web search front, is a relatively well-served area, whereas enterprise search has been an area where, within an enterprise, there are a huge number of data silos spread across file systems, SharePoint, or Salesforce, or various other areas. And deploying a traditional search index has always been hard; even simple questions, like when the IT desk opens or what the security policy is, have historically been hard for people to find answers to within an enterprise, let alone if I'm actually in a material science company or so forth, like what 3M was trying to do: enable collaboration among researchers spread across the world to search their experiment archives and so forth.
It has been super hard for them to do these things, and this is one of those areas where Kendra has enabled something new. Kendra is a deep learning powered search service for enterprises, which breaks down data silos and collects data across various sources, all the way from S3, or file systems, or SharePoint, and various other data sources, and uses state-of-the-art NLP techniques to index them. And then, you can query using natural language queries, such as "when does my IT desk open," and the answer, it won't just give you a bunch of random links, right? It'll tell you it opens at 8:30 a.m. in the morning. >> Yeah. >> Or, what are the credit card cashback returns for my corporate credit card? It won't give you a long list of links related to it. Instead, it'll give you the answer: 2%. So it's that highly accurate. (poor audio capture obscures speech) >> People who have been in the enterprise search or data business know how hard this is. It's been a super hard problem with the old guard models, because databases were limited to schemas and whatnot. Now, you have a data-driven world, and this becomes interesting. I think the big takeaway I took from Kendra was not only the new kind of discovery and navigation that's possible, in terms of low latency and getting relevant content, but really the under-the-covers impact, and I'd like to get your perspective on this, because this has been an active conversation inside the community, at cloud scale, which is: data silos have been a problem. People have built these data silos, and they really talk about breaking them down, but again, it's hard; there are legacy problems, and applications that are tied to them. How do I break my silos down? Or how do I leverage existing silos? So I think you guys really solve a problem here around data silos and scale. >> Yeah. >> So talk about the data silos.
And then, I'm going to follow up and get your take on the size of the data, megabytes, petabytes; I mean, talk about data silos and the scale behind it. >> Perfect. So if you look at how to set up something like a Kendra search cluster, even simply from your Management Console in AWS, you'll be able to point Kendra to various data sources, such as Amazon S3, or SharePoint, and Salesforce, and various others, and say, this is the kind of data I want to index. And Kendra automatically pulls in this data, indexes it using its deep learning and NLP models, and then automatically builds a corpus. Then I, as a user of the search index, can start querying it using natural language, and I don't have to worry where it comes from; Kendra takes care of things like access control, and it uses finely-tuned machine learning algorithms under the hood to understand the context of a natural language query and return the most relevant results. I'll give a real-world example of some of the customers who are using Kendra in the field. For instance, if you take a look at 3M, 3M is using Kendra to support its material science R&D by enabling natural language search of their expansive repositories of past research documents that may be relevant to a new product. Imagine what this does for a company like 3M. Instead of researchers who are spread around the world repeating the same experiments on material research over and over again, now their engineers and researchers can quickly search through documents. And they can innovate faster, instead of trying to literally reinvent the wheel all the time. So it is better acceleration to the market. Even in the situation we are in, one piece of interesting work that you might be interested in is from the Semantic Scholar team at the Allen Institute for AI, which recently opened up a repository of scientific research called the COVID-19 Open Research Dataset. These are expert research articles.
(poor audio capture obscures speech) And now, the index is built using Kendra, and it helps scientists, academics, and technologists quickly find information in a sea of scientific literature. So you can even ask questions like, "Hey, how different is convalescent plasma treatment compared to a vaccine?" and Kendra automatically understands the context and gets the summary answers to these questions for the customers. And this is one of the things where, when we talk about breaking the data silos, it takes care of gathering the data and putting it in a central index, understanding the context behind each of these documents, and then being able to quickly answer customers' queries in simple natural language as well. >> So what's the scale? Talk about the scale behind this. What are the scale numbers? What are you guys seeing? You guys always do a good job of running a great announcement and then following up with general availability, which means I know you've got some customers using it. What are we talking about in terms of scale? Petabytes? Can you give some insight into the kind of data scale you're talking about here? >> So the nice thing about Kendra is that it is easily, linearly scalable. I, as a developer, can keep adding more and more data, and it linearly scales to whatever scale our customers want. And that is one of the underpinnings of the Kendra search engine. So this is where, for instance, customers like PricewaterhouseCoopers are using Kendra to power their regulatory application, to help customers search through regulatory information quickly and easily. So instead of sifting through hundreds of pages of documents manually to answer certain questions, Kendra now allows them to ask natural language questions. I'll give another example which speaks to the scale: Baker Tilly, a leading advisory, tax, and assurance firm, is using Kendra to index documents.
Compared to a traditional SharePoint-based full-text search, they are now using Kendra to quickly search product manuals and so forth, and they're able to get answers up to 10x faster. Look at the kind of impact Kendra has: being able to index vast amounts of data in a linearly scalable fashion, keep adding in the order of terabytes and keep going, and being able to search 10x faster than traditional keyword-search-based algorithms is actually a big deal for these customers. They're very excited. >> So what is the main problem that you're solving with Kendra? What's the use case? If I'm the customer, what's my problem that you're solving? Is it just responses to data, whether it's a call center, or support, or is it an app? I mean, what's the main focus that you guys came out with? What was the vector of the problem that you're solving here? >> So when we talked to customers before we started building Kendra, one of the things that constantly came back to us was that they wanted the same ease of use and ability to search that they have on the world wide web, but within an enterprise. So it can be in the form of an internal search to search within HR documents or internal wiki pages and so forth, or it can be to search internal technical documentation or the public documentation to help the contact centers, or it can be external search in terms of customer support and so forth, or to enable collaboration by sharing knowledge bases and so forth. So we really dissected each of these: why is this a problem? Why is it not being solved by traditional search techniques? One of the things that became obvious was that unlike the external world, where web pages are linked easily with very well-defined structure, the internal world is very messy within an enterprise. The documents are put in SharePoint, or in a file system, or in a storage service like S3, or in file stores, or Box, or various other things.
And what customers really wanted was a system which knows how to pull the data from these various data silos, understand the access control behind them, and enforce it in the search. And then understand the real data behind it, and not just do simple keyword search, so that we could build a remarkable search service that really answers queries in natural language. And this has been the premise of Kendra, and this is what started to resonate with our customers. I talked about some other examples, even in areas like contact centers. For instance, Magellan Health is using Kendra for its contact centers. They are able to seamlessly tie member, provider, or client specific information with other internal information about health care for their agents, so that they can quickly resolve the call. Or it can be used internally to do things like external search as well. So, very satisfied clients. >> So you guys took the basic concept of discovery and navigation, which is the consumer web, find what you're looking for as fast as possible, but also took advantage of building intelligence around understanding all the nuances and configuration, schemas, access, under the covers, and allowing things to be discovered in a new way. So you basically make data discoverable and then provide an interface. >> Yeah. >> For discovery and navigation. So it's a broad use case, then. >> Right, yeah, that sounds about right, except we did one thing more. We didn't just do discovery and make it easy for people to find the information; they are sifting through terabytes or hundreds of terabytes of internal documentation, and sometimes throwing a bunch of hundreds of links to these documents is not good enough.
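The difference between "hundreds of links" and a pinpointed answer can be made concrete with a toy extractive step: split each document into sentences and return the single best-matching passage. This is purely illustrative, with made-up documents; a real system uses NLP reading-comprehension models, not word overlap:

```python
# Toy illustration of pinpointing the passage that answers a question,
# instead of returning a long list of document links. Illustrative only.

def best_passage(question, documents):
    q_terms = set(question.lower().split())
    best, best_score = None, 0
    for doc in documents:
        for sentence in doc.split("."):
            overlap = len(q_terms & set(sentence.lower().split()))
            if overlap > best_score:
                best, best_score = sentence.strip(), overlap
    return best

docs = [
    "The lab opened in 1998. Research funding was renewed in 2020.",
    "Quarterly budgets are reviewed by the finance team. "
    "Headcount planning happens in October.",
]
answer = best_passage("who reviews the quarterly budgets", docs)
print(answer)  # -> Quarterly budgets are reviewed by the finance team
```

The point of the sketch is the shape of the output: one precise passage, not a ranked pile of documents the user still has to read.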
For instance, if I'm trying to find out what an ALS marker is in a health care setting, for a particular research project, then I don't want to sift through thousands of links. Instead, I want to be able to correctly pinpoint which document contains the answer. So that is the final element: to really understand the context behind each and every document using natural language processing techniques, so that you not only discover the information that is relevant, but you also get highly accurate, precise answers to some of your questions. >> Well, that's great stuff, big fan. I really liked the announcement of Kendra. Congratulations on the GA of that. We'll make some room on our CUBE Virtual site for your team to put more Kendra information up. I think it's fascinating. I think that's going to be the beginning of how the world changes, certainly with voice activation and API-based applications integrating this in. I just see a ton of activity, and this is going to have a lot of headroom. So appreciate that. The other thing I want to get to while I have you here is the news around augmented artificial intelligence, which has been brought out as well. >> Yeah. >> So the GA of that is out. You guys are GA-ing everything, which is right on track with your cadence of AWS launches, I'd say. What is this about? Give us the headline story. What's the main thing to pay attention to in the GA? What have you learned? What's the learning curve, what are the results? >> So the augmented artificial intelligence service, we call it A2I, the Amazon A2I service, we made generally available. And it is a very unique service that makes it easy for developers to augment machine learning predictions with human intelligence. This has historically been a very challenging problem. So let me take a step back and explain the general idea behind it.
You look at any developer building a machine learning application; there are use cases where even 99% accuracy in machine learning is not going to be good enough to directly use that result as the response back to the customer. Instead, you want to be able to augment that with human intelligence, to make sure, hey, if my machine learning model is returning a prediction with a confidence of less than 70%, I would like it to be augmented with human intelligence. A2I makes it super easy for developers to use a human review workflow that comes in between. So then I can send it either to a public workforce using Mechanical Turk, where we have more than 500,000 Turkers, or I can use a private workforce or a vendor workforce. A2I seamlessly integrates with our Textract, Rekognition, or SageMaker custom models. So now, for instance, the NHS has integrated A2I with Textract, and they are building these document processing workflows. In areas where the machine learning model's confidence is not as high, they are able to augment that with their human review workflows, so that they can build a highly accurate document processing workflow as well. This, we think, is a powerful capability. >> So this really gets to what I've been feeling in some of the stuff we worked with you guys on, our machine learning piece. It's hard for companies to hire machine learning people. This has been a real challenge. So I like this idea of human augmentation, because humans and machines have to have that relationship, and if you build good abstraction layers, and you abstract away the complexity, which is what you guys do, and that's the vision of cloud, then you're going to need to have that relationship solidified. So at what point do you think we're going to be ready for theCUBE team, or any customer that doesn't have or can't find a machine learning person?
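Going back to the A2I flow just described, the core routing idea, auto-accept high-confidence predictions and queue low-confidence ones for human review, can be sketched in a few lines of plain Python. This is illustrative only; the real service configures this through human review workflows in AWS, and the threshold and field names here are assumptions for the example:

```python
# Toy sketch of confidence-based routing: predictions below a threshold
# are sent to a human review queue instead of being returned directly.
# Illustrative only; the actual A2I service defines this via review
# workflows configured in AWS, not an in-process list.

CONFIDENCE_THRESHOLD = 0.70

def route(predictions, review_queue):
    accepted = []
    for pred in predictions:
        if pred["confidence"] >= CONFIDENCE_THRESHOLD:
            accepted.append(pred)       # safe to use directly
        else:
            review_queue.append(pred)   # needs a human reviewer
    return accepted

queue = []
preds = [
    {"doc": "invoice-1", "label": "total=$120", "confidence": 0.98},
    {"doc": "invoice-2", "label": "total=$7",   "confidence": 0.41},
]
auto = route(preds, queue)
print(len(auto), len(queue))  # -> 1 1
```

The design point is that the human reviewer sits behind the same interface as the model, so the calling application does not care which path produced the final answer.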
Or may not want to pay the wages that are required? I mean, it's hard to find a machine learning engineer, and when does the data science piece come in, with visualization, the spectrum from pure computer science, math, machine learning guru to full end-user productivity? Machine learning is where you guys are doing a lot of work. Can you just share your opinion on the evolution of where we are on that? Because people want to get to the point where they don't have to hire machine learning folks. >> Yeah. >> And have that kind of support too. >> If you look at the history of technology, I have always believed that many of these highly disruptive technologies start out available only to experts, and then they quickly go through cycles where they become almost commonplace. I'll give an example with something totally outside the IT space. Let's take photography. More than probably 150 years ago, the first professional camera was invented, and it took like three to four years of study to actually take a really good picture. And there were only a very few expert photographers in the world. Then fast forward to where we are now: even my five-year-old daughter takes very good portraits and gives them as a gift to her mom for Mother's Day. Now, if you look at Instagram, everyone is a professional photographer. I think the same thing will happen in machine learning too. Compared to 2012, when there were very few deep learning experts who could really build these amazing applications, now we are starting to see tens of thousands of customers using machine learning in production in AWS, not just proofs of concept, but in production. And this number is rapidly growing. I'll give one example. Internally, to help our entire company transform and make machine learning a natural part of the business, Amazon started a Machine Learning University six years ago.
And since then, we have been training all our engineers with machine learning courses in this ML University, and a year ago, we made this coursework available through our Training and Certification platform in AWS, and within 48 hours, more than 100,000 people registered. Think about it, that's like a big all-time record. That's why I always like to believe that developers are eager to learn; they're very hungry to pick up new technology, and I wouldn't be surprised if, four or five years from now, machine learning becomes a normal feature of an app, the same way databases are, and it becomes less special. If that day happens, then I would see my job as done. >> Well, you've got a lot more work to do, because I know from the conversations I've been having around this COVID-19 pandemic that there's general consensus and validation that the future got pulled forward, and what used to be an inside-industry conversation that we used to have around machine learning, and some of the visions that you're talking about, has been accelerated at the pace of the new cloud scale. Now that people recognize that, having experienced the virtual firsthand globally, there is going to be an acceleration of applications. So we believe there's going to be a Cambrian explosion of new applications that have to reimagine and reinvent some of the plumbing or abstractions in cloud to deliver new experiences, because the expectations have changed. And I think one of the things we're seeing is that machine learning combined with cloud scale will create a whole new trajectory, a Cambrian explosion of applications. So this has kind of been validated. What's your reaction to that? I mean, do you see something similar? What are some of the things that you're seeing as we come into this world, this virtualization of our lives? It's every vertical; it's not one vertical anymore that's maybe moving faster.
I think everyone sees the impact. They see where the gaps are in this new reality here. What are your thoughts? >> Yeah, if you look at the history of machine learning, specifically around deep learning, the technology is really not new; the early deep learning papers were written almost 30 years ago. So why didn't we see deep learning take off sooner? It is because, historically, deep learning technologies have been hungry for compute resources and hungry for huge amounts of data, and the abstractions were not easy enough. As you rightfully pointed out, the cloud has come in and made it super easy to get access to huge amounts of compute and huge amounts of data, and you can literally pay by the hour or by the minute. And with new tools being made available to developers, like SageMaker and all the AI services we are talking about now, there is an explosion of options available that are easy to use for developers, and we are starting to see a huge amount of innovation popping up. And unlike traditional disruptive technologies, which you usually see taking hold in one or two industry segments, then crossing the chasm and going mainstream, with machine learning we are starting to see traction in almost every industry segment: all the way from the financial sector, where fintech companies like Intuit are using it to forecast call center volume and do personalization; in the health care sector, companies like Aidoc are using computer vision to assist radiologists; and then we are seeing it in areas like the public sector, where NASA has partnered with AWS to use machine learning anomaly detection algorithms to detect solar flares in space. And yeah, examples are plenty.
It is because machine learning has now become so commonplace that almost every industry segment and every CIO is already looking at how they can reimagine, reinvent, and make their customer experience better with machine learning, the same way Amazon asked itself eight or 10 years ago. So, very exciting. >> Well, you guys continue to do the work, and I agree it's not just machine learning by itself; it's the integration and the perfect storm of elements that have come together at this time. Although pretty disastrous, I think ultimately we're going to come out of this on a whole 'nother trajectory. Creativity will emerge. You're going to start seeing those builders thinking, "Okay, hey, I've got to get out there. I can deliver, solve the gaps we've exposed, solve the problems, create new expectations, new experiences." I think it's going to be great for software developers. I think it's going to change the computer science field, and it really brings in the lifestyle aspect of things. Applications have to have a recognition of this convergence, this virtualization of life. >> Yeah. >> The applications are going to have to have that. And remember, virtualization helped Amazon form the cloud. Maybe we'll get some new kinds of virtualization, Swami. (laughs) Thanks for coming on, really appreciate it. Always great to see you. Thanks for taking the time. >> Okay, great to see you, John, also. Thank you, thanks again. >> We're here with Swami, the Vice President of Machine Learning at AWS, who's been on before, a theCUBE alumni, really sharing his insights around what we see around this virtualization, this online event at the Amazon Summit we're covering with the Virtual CUBE. But as we go forward, more important than ever, the data is going to be important: searching it, finding it, and, more importantly, having humans use it to build applications.
So theCUBE coverage continues, for AWS Summit Virtual Online, I'm John Furrier, thanks for watching. (enlightening music)

Published Date : May 13 2020

Tapping Vertica's Integration with TensorFlow for Advanced Machine Learning


 

>> Paige: Hello, everybody, and thank you for joining us today for the Virtual Vertica BDC 2020. Today's breakout session is entitled "Tapping Vertica's Integration with TensorFlow for Advanced Machine Learning." I'm Paige Roberts, Opensource Relations Manager at Vertica, and I'll be your host for this session. Joining me is Vertica Software Engineer, George Larionov. >> George: Hi. >> Paige: (chuckles) That's George. So, before we begin, I encourage you to submit questions or comments during the virtual session. You don't have to wait. Just type your question or comment in the question box below the slides and click submit. So, as soon as a question occurs to you, go ahead and type it in, and there will be a Q and A session at the end of the presentation. We'll answer as many questions as we're able to get to during that time. Any questions we don't get to, we'll do our best to answer offline. Alternatively, you can visit the Vertica Forum to post your questions there after the session. Our engineering team is planning to join the forums to keep the conversation going, so you can ask an engineer afterwards, just as if it were a regular in-person conference. Also, a reminder: you can maximize your screen by clicking the double-arrow button in the lower right corner of the slides. And, before you ask, yes, this virtual session is being recorded, and it will be available to view by the end of this week. We'll send you a notification as soon as it's ready. Now, let's get started. Over to you, George. >> George: Thank you, Paige. So, I've been introduced. I'm a Software Engineer at Vertica, and today I'm going to be talking about a new feature, Vertica's integration with TensorFlow. So, first, I'm going to go over what TensorFlow is and what neural networks are. Then, I'm going to talk about why integrating with TensorFlow is a useful feature, and, finally, I'm going to talk about the integration itself and give an example.
So, as we get started here, what is TensorFlow? TensorFlow is an open-source machine learning library, developed by Google, and it's actually one of many such libraries. The point of libraries like TensorFlow is to simplify the whole process of working with neural networks, such as creating, training, and using them, so that it's available to everyone, as opposed to just a small subset of researchers. So, neural networks are computing systems that allow us to solve various tasks. Traditionally, computing algorithms were designed completely from the ground up by engineers like me, and we had to manually sift through the data and decide which parts are important for the task and which are not. Neural networks aim to solve this problem, a little bit, by sifting through the data themselves, automatically finding traits and features which correlate to the right results. So, you can think of it as neural networks learning to solve a specific task by looking through the data, without having human beings sit and sift through the data themselves. So, there are a couple of necessary parts to getting a trained neural model, which is the final goal. By the way, a neural model is the same as a neural network; those are synonymous. So, first, you need this light blue circle, an untrained neural model, which is pretty easy to get in TensorFlow, and, in addition to that, you need your training data. Now, this involves both training inputs and training labels, and I'll talk about exactly what those two things are on the next slide. But, basically, you need to train your model with the training data, and, once it is trained, you can use your trained model to predict on just the purple circle, new inputs, and it will predict the labels for you. You don't have to label the data anymore. So, training a neural network can be thought of as teaching a person how to do something.
For example, if I want to learn to speak a new language, let's say French, I would probably hire some sort of tutor to help me with that task, and I would need a lot of practice constructing and saying sentences in French, and a lot of feedback from my tutor on whether my pronunciation, grammar, et cetera, is correct. And, so, that would take me some time, but, finally, hopefully, I would be able to learn the language and speak it correctly without any sort of feedback. So, in a very similar manner, a neural network needs to practice on example training data first, and, along with that data, it needs labeled data. In this case, the labeled data is analogous to the tutor: it is the correct answers, so that the network can learn what those look like. But, ultimately, the goal is to predict on unlabeled data, which is analogous to me knowing how to speak French. So, I went over most of the bullets. A neural network needs a lot of practice. To do that, it needs a lot of good labeled data, and, finally, since a neural network needs to iterate over the training data many, many times, it needs a powerful machine which can do that in a reasonable amount of time. So, here's a quick checklist of what you need if you have a specific task that you want to solve with a neural network. The first thing you need is a powerful machine for training; we discussed why this is important. Then, you need TensorFlow installed on the machine, of course, and you need a dataset and labels for your dataset. Now, this dataset can be hundreds of examples, thousands, sometimes even millions. I won't go into that, because the dataset size really depends on the task at hand, but if you have these four things, you can train a good neural network that will predict whatever result you want it to predict at the end.
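The train-then-predict workflow described above can be made concrete with a deliberately tiny example: a single "neuron" fit by gradient descent in plain Python. No TensorFlow here; this is only meant to show the shape of the workflow (labeled inputs, many passes over the data, then prediction on unseen input):

```python
# Tiny single-weight "network" trained by gradient descent to learn
# y = 2x from labeled examples. Purely illustrative of the
# train-on-labeled-data / predict-on-new-data workflow.

inputs = [1.0, 2.0, 3.0, 4.0]      # training inputs
labels = [2.0, 4.0, 6.0, 8.0]      # training labels (the "tutor")

w = 0.0                            # untrained model: one weight
lr = 0.01                          # learning rate

for epoch in range(500):           # many passes over the data
    for x, y in zip(inputs, labels):
        pred = w * x
        grad = 2 * (pred - y) * x  # d/dw of the squared error
        w -= lr * grad

print(round(w, 2))                 # -> 2.0 (the learned weight)
prediction = w * 10.0              # predict on an unseen input
print(round(prediction, 1))        # -> 20.0
```

Real networks have millions of weights instead of one, which is exactly why the repeated passes over the data demand the powerful training machine on the checklist.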
So, we've talked about neural networks and TensorFlow, but the question is, if we already have a lot of built-in machine-learning algorithms in Vertica, then why do we need to use TensorFlow? And, to answer that question, let's look at this dataset. So, this is a pretty simple toy dataset with 20,000 points, but it simulates a more complex dataset with two classes which are not related in a simple way. So, the existing machine-learning algorithms that Vertica already has mostly fail on this pretty simple dataset. Linear models can't really draw a good line separating the two types of points. Naïve Bayes also performs pretty badly, and even the Random Forest algorithm, which is a pretty powerful algorithm, with 300 trees gets only 80% accuracy. However, a neural network with only two hidden layers gets 99% accuracy in about ten minutes of training. So, I hope that's a pretty compelling reason to use neural networks, at least sometimes. As an aside, there are plenty of tasks that do fit the existing machine-learning algorithms in Vertica. That's why they're there, and if one of the tasks you want to solve fits one of the existing algorithms, then I would recommend using that algorithm, not TensorFlow, because, while neural networks have their place and are very powerful, it's often easier to use an existing algorithm, if possible. Okay, so, now that we've talked about why neural networks are needed, let's talk about integrating them with Vertica. So, neural networks are best trained using GPUs, which are Graphics Processing Units; that's, basically, just a different processing unit than a CPU. GPUs are good for training neural networks because they excel at doing many, many simple operations at the same time, which is needed for a neural network to be able to iterate through the training data many times. However, Vertica runs on CPUs and cannot run on GPUs at all, because that's not how it was designed.
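The kind of class structure that defeats linear models has a classic minimal case, XOR: no single line separates the two classes, but a tiny network with one hidden layer handles it easily. The weights below are hand-picked rather than trained, purely to illustrate the point:

```python
# XOR: the classic example of two classes a linear model cannot
# separate, but a small two-layer network can. The weights are
# hand-picked (not trained) purely for illustration.

def step(x):
    return 1 if x > 0 else 0

def xor_net(a, b):
    # Hidden layer: one unit fires for "a OR b", one for "a AND b".
    h1 = step(a + b - 0.5)   # OR
    h2 = step(a + b - 1.5)   # AND
    # Output: OR minus AND equals XOR.
    return step(h1 - h2 - 0.5)

truth = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
results = [xor_net(a, b) for (a, b), _ in truth]
print(results)  # -> [0, 1, 1, 0]
```

The toy dataset in the talk is the same idea at scale: once the class boundary stops being a line, the hidden layer is what buys the accuracy.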
So, to train our neural networks, we have to go outside of Vertica, and exporting a small batch of training data is pretty simple. So, that's not really a problem, but, given this information, why do we even need Vertica? If we train outside, then why not do everything outside of Vertica? So, to answer that question, here is a slide that Philips was nice enough to let us use. This is an example of a production system at Philips. So, it consists of two branches. On the left, we have a branch with historical device log data, and this can be thought of as a bunch of training data. And, all that data goes through some data integration and data analysis. Basically, this is where you train your models, whether or not they are neural networks, but, for the purpose of this talk, this is where you would train your neural network. And, on the right, we have a branch which has live device log data coming in from various MRI machines, CAT scan machines, et cetera, and this is a ton of data. So, these machines are constantly running. They're constantly on, and there's a bunch of them. So, data just keeps streaming in, and, so, we don't want this data to have to take any unnecessary detours, because that would greatly slow down the whole system. So, this data in the right branch goes through an already-trained predictive model, which needs to be pretty fast, and, finally, it allows Philips to do some maintenance on these machines before they actually break, which helps Philips, obviously, and definitely the medical industry as well. So, I hope this slide helped explain the complexity of a live production system and why it might not be reasonable to train your neural networks directly in the system with the live device log data. So, a quick summary on just the neural networks section. So, neural networks are powerful, but they need a lot of processing power to train, which can't really be done well in a production pipeline. However, they are cheap and fast to predict with.
Prediction with a neural network does not require GPU anymore. And, they can be very useful in production, so we do want them there. We just don't want to train them there. So, the question is, now, how do we get neural networks into production? So, we have, basically, two options. The first option is to take the data and export it to our machine with TensorFlow, our powerful GPU machine, or we can take our TensorFlow model and put it where the data is. In this case, let's say that that is Vertica. So, I'm going to go through some pros and cons of these two approaches. The first one is bringing the data to the analytics. The pros of this approach are that TensorFlow is already installed, running on this GPU machine, and we don't have to move the model at all. The cons, however, are that we have to transfer all the data to this machine and if that data is big, if it's, I don't know, gigabytes, terabytes, et cetera, then that becomes a huge bottleneck because you can only transfer in small quantities. Because GPU machines tend to not be that big. Furthermore, TensorFlow prediction doesn't actually need a GPU. So, you would end up paying for an expensive GPU for no reason. It's not parallelized because you just have one GPU machine. You can't put your production system on this GPU, as we discussed. And, so, you're left with good results, but not fast and not where you need them. So, now, let's look at the second option. So, the second option is bringing the analytics to the data. So, the pros of this approach are that we can integrate with our production system. It's low impact because prediction is not processor intensive. It's cheap, or, at least, it's pretty much as cheap as your system was before. It's parallelized because Vertica was always parallelized, which we'll talk about in the next slide. There's no extra data movement. 
You get the benefit of model management in Vertica, meaning, if you import multiple TensorFlow models, you can keep track of their various attributes, when they were imported, et cetera. And, the results are right where you need them, inside your production pipeline. So, two cons are that TensorFlow is limited to just prediction inside Vertica, and, if you want to retrain your model, you need to do that outside of Vertica and then reimport. So, just as a recap of parallelization: everything in Vertica is parallelized and distributed, and TensorFlow is no exception. So, when you import your TensorFlow model to your Vertica cluster, it gets copied to all the nodes automatically, and TensorFlow will run in fenced mode, which means that if the TensorFlow process fails for whatever reason, even though it shouldn't, Vertica itself will not crash, which is obviously important. And, finally, prediction happens on each node. There are multiple threads of TensorFlow processes running, processing different little bits of data, which is much faster than processing the data line by line, because it happens all in a parallelized fashion. And, so, the result is fast prediction. So, here's an example which I hope is a little closer to what everyone is used to than the usual machine learning TensorFlow example. This is the Boston housing dataset, or, rather, a small subset of it. Now, on the left, we have the input data, to go back to, I think, the first slide, and, on the right, is the training label. So, the input data consists of, each line is a plot of land in Boston, along with various attributes, such as the level of crime in that area, how much industry is in that area, whether it's on the Charles River, et cetera, and, on the right, we have as the labels the median house value of that plot of land. And, so, the goal is to put all this data into the neural network and, finally, get a model which can train...
I don't know, which can predict on new incoming data and predict a good housing value for that data. Now, I'm going to go through, step by step, how to actually use TensorFlow models in Vertica. So, the first step I won't go into much detail on, because there are countless tutorials and resources online on how to use TensorFlow to train a neural network, so that's the first step. The second step is to save the model in TensorFlow's 'frozen graph' format. Again, this information is available online. The third step is to create a small, simple JSON file describing the inputs and outputs of the model, what data type they are, et cetera. And, this is needed for Vertica to be able to translate from TensorFlow land into Vertica SQL land, so that it can use a SQL table instead of the input format TensorFlow usually takes. So, once you have your model file and your JSON file, you want to put both of those files in a directory on a node, any node, in a Vertica cluster, and name that directory whatever you want your model to ultimately be called inside of Vertica. Once you do that, you can go ahead and import that directory into Vertica. This import models function already exists in Vertica; all we added was a new category to be able to import. So, what you need to do is specify the path to your neural network directory and specify that the model's category is TensorFlow. Once you successfully import, in order to predict, you run this brand new predict TensorFlow function. So, in this case, we're predicting on everything from the input table, which is what the star means. The model name is Boston housing net, which is the name of your directory, and, then, there's a little bit of boilerplate. The two names, ID and value, after the AS are just the names of the columns of your outputs, and, finally, the Boston housing data is whatever SQL table you want to predict on that fits the input type of your network.
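As a rough sketch, the small JSON description file from the third step might be generated like this. The field names below are invented placeholders to show the idea of declaring input and output names, types, and shapes; the actual schema is defined in Vertica's documentation and should be checked there:

```python
# Sketch of building the JSON descriptor that tells the database what
# the model's inputs and outputs look like. The field names here are
# illustrative placeholders, not Vertica's actual schema; consult the
# product documentation for the real format.
import json

descriptor = {
    # Boston housing has 13 input features per row.
    "inputs":  [{"name": "features", "type": "float", "columns": 13}],
    # One output per row: the predicted median house value.
    "outputs": [{"name": "median_value", "type": "float", "columns": 1}],
}

descriptor_json = json.dumps(descriptor, indent=2)
print(descriptor_json)
```

With a descriptor like this saved alongside the frozen graph in one directory, that directory (named after the model) is what gets imported into the cluster.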
And this will output a bunch of predictions. In this case, values of houses that the network thinks are appropriate for all the input data. So, just a quick summary. We talked about what TensorFlow is and what neural networks are, and then we discussed that TensorFlow works best for training on GPUs, because training has very specific hardware demands, while Vertica is designed to use CPUs and is really good at storing and accessing a lot of data quickly. But it's not very well designed for having neural networks trained inside of it. Then, we talked about how neural models are powerful, and we want to use them in our production flow. And, since prediction is fast, we can go ahead and do that, but we just don't want to train there. And, finally, I presented Vertica's TensorFlow integration, which allows importing a trained TensorFlow model into Vertica and predicting on all the data that is inside Vertica with a few simple lines of SQL. So, thank you for listening. I'm going to take some questions now.
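The parallel, multi-threaded prediction described in the recap above can be illustrated in miniature: split the input rows into chunks and score each chunk on its own worker. This is only an illustrative sketch, not Vertica's execution engine; `score_chunk` is a stand-in for a per-node TensorFlow process, here replaced by a toy model.

```python
from concurrent.futures import ThreadPoolExecutor

def score_chunk(chunk):
    # Stand-in for a TensorFlow session scoring one slice of rows.
    # Toy "model": predicts 2*x + 1 for each input row.
    return [2 * x + 1 for x in chunk]

def parallel_predict(rows, n_workers=4):
    # Split rows into one strided chunk per worker.
    chunks = [rows[i::n_workers] for i in range(n_workers)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        results = list(pool.map(score_chunk, chunks))
    # Re-interleave so output order matches input order.
    out = [None] * len(rows)
    for w, res in enumerate(results):
        for j, val in enumerate(res):
            out[w + j * n_workers] = val
    return out

print(parallel_predict([0, 1, 2, 3, 4, 5]))  # [1, 3, 5, 7, 9, 11]
```

The point of the sketch is the shape of the computation: many small batches scored concurrently beats scoring the data line by line, which is the same reason Vertica's per-node, multi-threaded prediction is fast.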

Published Date : Mar 30 2020

UNLIST TILL 4/2 - Autonomous Log Monitoring


 

>> Sue: Hi everybody, thank you for joining us today for the virtual Vertica BDC 2020. Today's breakout session is entitled "Autonomous Monitoring Using Machine Learning". My name is Sue LeClaire, director of marketing at Vertica, and I'll be your host for this session. Joining me is Larry Lancaster, founder and CTO at Zebrium. Before we begin, I encourage you to submit questions or comments during the virtual session. You don't have to wait, just type your question or comment in the question box below the slide and click submit. There will be a Q&A session at the end of the presentation and we'll answer as many questions as we're able to during that time. Any questions that we don't address, we'll do our best to answer offline. Alternatively, you can also visit the Vertica forums to post your questions after the session. Our engineering team is planning to join the forums to keep the conversation going. Also, just a reminder that you can maximize your screen by clicking the double arrow button in the lower right corner of the slides. And yes, this virtual session is being recorded and will be available for you to view on demand later this week. We'll send you a notification as soon as it's ready. So, let's get started. Larry, over to you. >> Larry: Hey, thanks so much. So hi, my name's Larry Lancaster and I'm here to talk to you today about something whose time I think has come, and that's autonomous monitoring. So, with that, let's get into it. So, machine data is my life. I know that's a sad life, but it's true. So I've spent most of my career kind of taking telemetry data from products, either in the field, as we used to say, or, nowadays, deployed, and bringing that data back, like log file stats, and then building stuff on top of it. So, tools to run the business or services to sell back to users and customers.
And so, after doing that a few times, it kind of got to the point where I was really sort of sick of building the same kind of thing from scratch every time, so I figured, why not go start a company and do it so that we don't have to do it manually ever again. So, it's interesting to note, I've put a little sentence here saying, "companies where I got to use Vertica." So I've been actually kind of working with Vertica for a long time now, pretty much since they came out of alpha. And I've really been enjoying their technology ever since. So, our vision is basically that I want a system that will characterize incidents before I notice. So an incident is, you know, we used to call it a support case or a ticket in IT, or a support case in support. Nowadays, you may have a DevOps team, or a set of SREs who are monitoring a production sort of deployment. And so they'll call it an incident. So I'm looking for something that will notice and characterize an incident before I notice and have to go digging into log files and stats to figure out what happened. And so that's a pretty heady goal. And so I'm going to talk a little bit today about how we do that. So, let's look at logs in particular, and log monitoring today. So monitoring is kind of that whole umbrella term that we use to talk about how we monitor systems in the field that we've shipped, or how we monitor production deployments in a more modern stack. And so basically there are log monitoring tools. But they have a number of drawbacks. For one thing, they're kind of slow in the sense that if something breaks and I need to go to a log file, actually chances are really good that if you have a new issue, if it's an unknown unknown problem, you're going to end up in a log file. So the problem then becomes basically you're searching around looking for what's the root cause of the incident, right? And so that's kind of time-consuming.
So, they're also fragile and this is largely because log data is completely unstructured, right? So there's no formal grammar for a log file. So you have this situation where, if I write a parser today, and that parser is going to do something, it's going to execute some automation, it's going to open or update a ticket, it's going to maybe restart a service, or whatever it is that I want to happen. What'll happen is later upstream, someone who's writing the code that produces that log message, they might do something really useful for me, or for users. And they might go fix a spelling mistake in that log message. And then the next thing you know, all the automation breaks. So it's a very fragile source for automation. And finally, because of that, people will set alerts on, "Oh, well tell me how many thousands of errors are happening every hour." Or some horrible metric like that. And then that becomes the only visibility you have in the data. So because of all this, it's a very human-driven, slow, fragile process. So basically, we've set out to kind of up-level that a bit. So I touched on this already, right? The truth is if you do have an incident, you're going to end up in log files to do root cause. It's almost always the case. And so you have to wonder, if that's the case, why do most people use metrics only for monitoring? And the reason is related to the problems I just described. They're already structured, right? So for logs, you've got this mess of stuff, so you only want to dig in there when you absolutely have to. But ironically, it's where a lot of the information that you need actually is. So we have a model today, and this model used to work pretty well. And that model is called "index and search". And it basically means you treat log files like they're text documents. And so you index them and when there's some issue you have to drill into, then you go searching, right? So let's look at that model. 
So 20 years ago, we had sort of a shrink-wrap software delivery model. You had an incident. With that incident, maybe you had one customer and you had a monolithic application and a handful of log files. So it's perfectly natural, in fact, usually you could just vi the log file and search that way. Or if there's a lot of them, you could index them and search them that way. And that all worked very well because the developer or the support engineer had to be an expert in those few things, in those few log files, and understand what they meant. But today, everything has changed completely. So we live in a software as a service world. What that means is, for a given incident, first of all you're going to be affecting thousands of users. You're going to have, potentially, 100 services that are deployed in your environment. You're going to have 1,000 log streams to sift through. And yet, you're still kind of stuck in the situation where to go find out what's the matter, you're going to have to search through the log files. So this is kind of the unacceptable sort of position we're in today. So for us, the future will not be index and search. And that's simply because it cannot scale. And the reason I say that it can't scale is because it all kind of is bottlenecked by a person and their eyeball. So, you continue to drive up the amount of data that has to be sifted through, the complexity of the stack that has to be understood, and you still, at the end of the day, for MTTR purposes, you still have the same bottleneck, which is the eyeball. So this model, I believe, is fundamentally broken. And that's why I believe that in five years you're going to be in a situation where most monitoring of unknown unknown problems is going to be done autonomously. And those issues will be characterized autonomously because there's no other way it can happen. So now I'm going to talk a little bit about autonomous monitoring itself.
So, autonomous monitoring basically means this: imagine a monitoring platform that you normally watch yourself; maybe you watch the alerts coming from it or, more importantly, you kind of watch the dashboards and try to see if something looks weird. So autonomous monitoring is the notion that the platform should do the watching for you and only let you know when something is going wrong, and should kind of give you a window into what happened. So look at this example I have on screen, just to take it really slow and absorb the concept of autonomous monitoring. So here in this example, we've stopped the database. And as a result, down below you can see there was a bunch of fallout. This is an Atlassian stack, so you can imagine you've got a Postgres database. And then you've got sort of Bitbucket, and Confluence, and Jira, and these various other components that need the database operating in order to function. So what this is doing is it's calling out, "Hey, the root cause is the database stopped and here's the symptoms." Now, you might be wondering: so what? I mean, I could go write a script to do this sort of thing. Here's what's interesting about this very particular example, and I'll show a couple more examples that are a little more involved. But here's the interesting thing. So, in the software that came up with this incident and opened this incident and put this root cause and symptoms in there, there's no code that knows anything about timestamp formats, severities, Atlassian, Postgres, databases, Bitbucket, Confluence; there's no regexes that talk about starting, stopped, RDBMS, swallowed exception, and so on and so forth. So you might wonder how it's possible, then, that something which is completely ignorant of the stack could come up with this description, which is exactly what a human would have had to do to figure out what happened. And I'm going to get into how we do that. But that's what autonomous monitoring is about.
It's about getting into a set of telemetry from a stack with no prior information, and understanding when something breaks. And I could give you the punchline right now, which is there are fundamental ways that software behaves when it's breaking. And by looking at hundreds of data sets containing incidents that people have generously allowed us to use, we've been able to characterize that and now generalize it to apply it to any new data set and stack. So here's an interesting one right here. So there's a fella, David Gill, he's just a genius in the monitoring space. He's been working with us for the last couple of months. So he said, "You know what I'm going to do, is I'm going to run some chaos experiments." So for those of you who don't know what chaos engineering is, here's the idea. So basically, let's say I'm running a Kubernetes cluster and what I'll do is I'll use sort of a chaos injection test, something like Litmus. And basically it will inject issues, it'll break things in my application randomly to see if my monitoring picks it up. And so this is what chaos engineering is built around. It's built around sort of generating lots of random problems and seeing how the stack responds. So in this particular case, David went in and, basically, one of the tests presented through Litmus did a pod delete. And so that's going to basically take out some containers that are part of the service layer. And so then you'll see all kinds of things break. And so what you're seeing here, which is interesting, this is why I like to use this example. Because it's actually kind of eye-opening. So the chaos tool itself generates logs. And of course, through Kubernetes, all the log file locations that are on the host, and the container logs, are known. And those are all pulled back to us automatically. So one of the log files we have is actually the chaos tool that's doing the breaking, right?
And so what the tool said here, when it went to determine what the root cause was, was it noticed that there was this process that had these messages happen: initializing deletion lists, selecting a pod to kill, blah blah blah. It's saying that the root cause is the chaos test. And it's absolutely right, that is the root cause. But usually chaos tests don't get picked up themselves. You're supposed to be just kind of picking up the symptoms. But this is what happens when you're able to kind of tease out root cause from symptoms autonomously, is you end up getting a much more meaningful answer, right? So here's another example. So essentially, we collect the log files, but we also have a Prometheus scraper. So if you export Prometheus metrics, we'll scrape those and we'll collect those as well. And so we'll use those for our autonomous monitoring as well. So what you're seeing here is an issue where, I believe this is where we ran something out of disk space. So it opened an incident, but what's also interesting here is, you see that it pulled that metric to say that the spike in this metric was a symptom of this running out of space. So again, there's nothing that knows anything about file system usage, memory, CPU, any of that stuff. There's no actual hard-coded logic anywhere to explain any of this. And so the concept of autonomous monitoring is looking at a stack the way a human being would. If you can imagine how you would walk in and monitor something, how you would think about it. You'd go looking around for rare things. Things that are not normal. And you would look for indicators of breakage, and you would see, do those seem to be correlated in some dimension? That is how the system works. So as I mentioned a moment ago, metrics really do kind of complete the picture for us. We end up in a situation where we have a one-stop shop for incident root cause. So, how does that work? Well, we ingest and we structure the log files.
So if we're getting the logs, we'll ingest them and we'll structure them, and I'm going to show a little bit what that structure looks like and how that goes into the database in a moment. And then of course we ingest and structure the Prometheus metrics. But here, structure really should have an asterisk next to it, because metrics are mostly structured already. They have names. If you have your own scraper, as opposed to going into the time series Prometheus database and pulling metrics from there, you can keep a lot more metadata about those metrics from the exporter's perspective. So we keep all of that too. Then we do our anomaly detection on both of those sets of data. And then we cross-correlate metrics and log anomalies. And then we create incidents. So this is, at a high level, kind of what's happening without any sort of stack-specific logic built in. So we had some exciting recent validation. So MayaData's a pretty big player in the Kubernetes space. Essentially, they do Kubernetes as a managed service. They have tens of thousands of customers whose Kubernetes clusters they manage. And then they're also involved both in the OpenEBS project, as well as in the Litmus project I mentioned a moment ago. That's their tool for chaos engineering. So they're a pretty big player in the Kubernetes space. So essentially, they said, "Oh okay, let's see if this is real." So what they did was they set up our collectors, which took three minutes in Kubernetes. And then, using Litmus, they reproduced eight incidents that their actual, real-world customers had hit. And they were trying to remember the ones that were the hardest to figure out the root cause of at the time. And we picked up and posted a root cause indicator that was correct in 100% of these incidents, with no training, configuration, or metadata required. So this is kind of what autonomous monitoring is all about.
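The high-level pipeline just described (ingest and structure, detect anomalies on both logs and metrics, cross-correlate, open incidents) can be sketched with trivial stand-ins. Every stage here is a deliberate oversimplification of the real system; the threshold rule and the time-window join are illustrative assumptions only.

```python
def detect_anomalies(series, threshold=10):
    # Stand-in anomaly detector: anything above `threshold` is "anomalous".
    # series: list of (timestamp, value) pairs.
    return [t for t, v in series if v > threshold]

def correlate(log_anoms, metric_anoms, window=5):
    # Stand-in cross-correlation: pair log and metric anomalies that
    # occur close together in time.
    return [(lt, mt) for lt in log_anoms for mt in metric_anoms
            if abs(lt - mt) <= window]

log_counts = [(0, 2), (1, 3), (2, 50), (3, 2)]   # event-type counts per window
disk_usage = [(0, 5), (1, 6), (2, 7), (3, 95)]   # scraped metric samples

incidents = correlate(detect_anomalies(log_counts), detect_anomalies(disk_usage))
print(incidents)  # [(2, 3)] -> log spike at t=2 correlated with metric spike at t=3
```

The real system replaces each stand-in with learned structure and statistics, but the shape of the flow, anomalies from two channels joined in time to form an incident, is the same.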
So now I'm going to talk a little bit about how it works. So, like I said, there's no information included or required about the stack. So if you imagine a log file, for example: commonly, over to the left-hand side of every line, there will be some sort of a prefix. And what I mean by that is you'll see like a timestamp, or a severity, and maybe there's a PID, and maybe there's a function name, and maybe there's some other stuff there. So basically that's kind of, it's common data elements for a large portion of the lines in a given log file. But you know, of course, the contents change. So basically today, like if you look at a typical log manager, they'll talk about connectors. And what connectors means is, for an application it'll generate a certain prefix format in a log. And that means what's the format of the timestamp, and what else is in the prefix. And this lets the tool pick it up. And so if you have an app that doesn't have a connector, you're out of luck. Well, what we do is we learn those prefixes dynamically with machine learning. You do not have to have a connector, right? And what that means is that if you come in with your own application, the system will just work for it from day one. You don't have to have connectors, you don't have to describe the prefix format. That's so yesterday, right? So really what we want to be doing is up-leveling what the system is doing to the point where it's kind of working like a human would. You look at a log line, you know what's a timestamp. You know what's a PID. You know what's a function name. You know where the prefix ends and where the variable parts begin. You know what's a parameter over there in the variable parts. And sometimes you may need to see a couple examples to know what was a variable, but you'll figure it out as quickly as possible, and that's exactly how the system goes about it. As a result, we kind of embrace free-text logs, right?
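As a toy illustration of splitting a line into prefix fields and a variable message, here is a hand-written regex for one log flavor. The real system learns the prefix structure from the data with machine learning; this fixed pattern is only a stand-in for what one learned prefix might look like.

```python
import re

# A hand-written prefix pattern for one hypothetical log flavor.
PREFIX = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\s+"   # timestamp
    r"(?P<severity>[A-Z]+)\s+"                            # severity
    r"(?P<func>\w+):\s+"                                  # function name
    r"(?P<message>.*)$"                                   # variable part
)

def split_prefix(line):
    # Return prefix fields plus message, or just the raw message if the
    # line doesn't match this prefix flavor.
    m = PREFIX.match(line)
    return m.groupdict() if m else {"message": line}

parsed = split_prefix("2020-03-30 12:01:05 INFO scrubber: checkpoint 3 of 8 complete")
print(parsed["severity"], parsed["func"], parsed["message"])
```

A connector-based tool would require this pattern up front for every application; learning it per log stream is what removes that requirement.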
So if you look at a typical stack, most of the logs generated in a typical stack are usually free-text. Even structured logging typically will have a message attribute, which then inside of it has the free-text message. For us, that's not a bad thing. That's okay. The purpose of a log is to inform people. And so there's no need to go rewrite the whole logging stack just because you want a machine to handle it. They'll figure it out for themselves, right? So, you give us the logs and we'll figure out the grammar, not only for the prefix but also for the variable message part. So I already went into this, but there's more that's usually required for configuring a log manager with alerts. You have to give it keywords. You have to give it application behaviors. You have to tell it some prior knowledge. And of course the problem with all of that is that the most important events that you'll ever see in a log file are the rarest. Those are the ones that are one out of a billion. And so you may not know what's going to be the right keyword in advance to pick up the next breakage, right? So we don't want that information from you. We'll figure that out for ourselves. As the data comes in, essentially we parse it and we categorize it, as I've mentioned. And when I say categorize, what I mean is, if you look at a certain given log file, you'll notice that some of the lines are kind of the same thing. So this one will say "X happened five times" and then maybe a few lines below it'll say "X happened six times" but that's basically the same event type. It's just a different instance of that event type. And it has a different value for one of the parameters, right? So when I say categorization, what I mean is figuring out those unique types and I'll show an example of that next. Anomaly detection, we do on top of that. So anomaly detection on metrics in a very sort of time series by time series manner with lots of tunables is a well-understood problem. 
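The event-type categorization just described, plus a toy anomaly check on the resulting occurrence counts, can be sketched in a few lines. Collapsing digits into a placeholder is only one crude way to find the variable parts; the real system infers them statistically, and the z-score rule below is a deliberately simple stand-in for its anomaly detection.

```python
import re
import statistics
from collections import Counter

def event_type(line):
    # Collapse variable parts (numbers here) into placeholders so that
    # different instances of the same message map to one event type.
    return re.sub(r"\d+", "<N>", line)

def anomalous(counts, threshold=2.0):
    # Flag windows whose event count deviates from the mean by more
    # than `threshold` population standard deviations.
    mean = statistics.mean(counts)
    stdev = statistics.pstdev(counts) or 1.0
    return [i for i, c in enumerate(counts) if abs(c - mean) / stdev > threshold]

lines = ["X happened 5 times", "X happened 6 times", "Y failed with code 7"]
counts = Counter(event_type(l) for l in lines)
print(counts["X happened <N> times"])   # 2

# Occurrences of one event type per time window: steady rate, then a burst.
per_window = [2, 3, 2, 1, 2, 3, 40, 2]
print(anomalous(per_window))            # [6]
```

"X happened 5 times" and "X happened 6 times" land in the same bucket, which is exactly the categorization step, and the burst in window 6 is the kind of rare-event signal the anomaly detector looks for.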
So we also do this on the event type occurrences. So you can think of each event type occurring in time as sort of a point process. And then you can develop statistics and distributions on that, and you can do anomaly detection on those. Once we have all of that, we have extracted features, essentially, from metrics and from logs. We do pattern recognition on the correlations across different channels of information, so different event types, different log types, different hosts, different containers, and then of course across to the metrics. Based on all of this cross-correlation, we end up with a root cause identification. So that's essentially, at a high level, how it works. What's interesting, from the perspective of this call particularly, is that incident detection needs relationally structured data. It really does. You need to have all the instances of a certain event type that you've ever seen easily accessible. You need to have the values for a given sort of parameter easily, quickly available so you can figure out what's the distribution of this over time, how often does this event type happen. You can run analytical queries against that information so that you can quickly, in real-time, do anomaly detection against new data. So here's an example of what that looks like. And this is part of the work that we've done. At the top you see some examples of log lines, right? So that's kind of a snippet, it's three lines out of a log file. And you see one in the middle there that's kind of highlighted with colors, right? I mean, it's a little messy, but it's not atypical of the log files that you'll see pretty much anywhere. So there, you've got a timestamp, and a severity, and a function name. And then you've got some other information. And then finally, you have the variable part.
And that's going to have sort of this "checkpoint for memory scrubbers" message, probably something that's written in English, just so that the person who's reading the log file can understand. And then there are some parameters that are put in, right? So now, if you look at how we structure that, the way it looks is there are going to be three tables that correspond to the three event types that we see above. And so we're going to look at the one that corresponds to the one in the middle. So if we look at that table, there you'll see a table with columns, one for severity, for function name, for time zone, and so on. And date, and PID. And then you see over to the right, with the colored columns, the parameters that were pulled out from the variable part of that message. And so they're put in, they're typed, and they're in integer columns. So this is the way structuring needs to work with logs to be able to do efficient and effective anomaly detection. And as far as I know, we're the first people to do this inline. All right, so let's talk now about Vertica and why we take those tables and put them in Vertica. So Vertica really is an MPP column store, but it's more than that, because nowadays when you say "column store", people sort of think, for example, that Cassandra's a column store, but it's not. Cassandra's not a column store in the sense that Vertica is. So Vertica was kind of built from the ground up to be... So it's the original column store. Back in the C-Store project that Stonebraker was involved in, he said let's explore what kind of efficiencies we can get out of a real columnar database. And what he and his grad students, who went on to start Vertica, found was that they could build a database that gives orders of magnitude better query performance for the kinds of analytics I'm talking about here today, with orders of magnitude less data storage underneath.
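The per-event-type tables and the columnar storage win they enable can be sketched together as a toy: one Python list per column, with the variable parameters typed, and run-length encoding to show why a low-cardinality column collapses so dramatically. This is a cartoon of columnar storage, not Vertica's actual encoding machinery.

```python
def rle_encode(values):
    # Run-length encode a column into [value, run_length] pairs.
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return runs

# One parsed event instance per row; storage is per-column lists
# (a toy column store), with the pulled-out parameter typed as int.
events = [
    {"severity": "INFO", "func": "scrubber", "checkpoint": 3},
    {"severity": "INFO", "func": "scrubber", "checkpoint": 4},
    {"severity": "WARN", "func": "scrubber", "checkpoint": 5},
]
columns = {key: [e[key] for e in events] for key in events[0]}
print(columns["checkpoint"])   # [3, 4, 5]

# A function-name column sorted by value: long runs compress to a few pairs.
func_col = ["flush_cache"] * 4 + ["parse_line"] * 5
print(rle_encode(func_col))    # [['flush_cache', 4], ['parse_line', 5]]
```

Scale the second example to a million sorted rows with four distinct function names and the encoded column is still four pairs, which is the "orders of magnitude less storage" effect described above.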
So building on top of machine data, as I mentioned, is hard, because it doesn't have any defined schemas. But we can use an RDBMS like Vertica, once we've structured the data, to do the analytics that we need to do. So I talked a little bit about this, but if you think about machine data in general, it's perfectly suited for a columnar store. Because, if you imagine laying out sort of all the attributes of an event type, each occurrence is going to have those attributes. So there may be, say, three or four function names that are going to occur for all the instances of a given event type. And so if you were to sort all of those event instances by function name, what you would find is that you have sort of long, million-long runs of the same function name over and over. So what you have, in general, in machine data, is lots and lots of slowly varying attributes, lots of low-cardinality data that is almost completely compressed out when you use a real column store. So you end up with a massive footprint reduction on disk. And that also propagates through the analytical pipeline, because Vertica does late materialization, which means it tries to carry that data through memory with that same efficiency, right? So the scale-out architecture, of course, is really suitable for petascale workloads. Also, I should point out, I was going to mention it in another slide or two, but we use the Vertica Eon architecture, and we have had no problems scaling that in the cloud. It's a beautiful sort of rewrite of the entire data layer of Vertica. The performance and flexibility of Eon is just unbelievable. And so I've really been enjoying using it. I was skeptical that you could get a real column store to run in the cloud effectively, but I was completely wrong. So finally, I should mention that if you look at column stores, to me, Vertica is the one that has the full SQL support, it has the ODBC drivers, it has the ACID compliance.
Which means I don't need to worry about these things as an application developer. So I'm laying out the reasons that I like to use Vertica. So I touched on this already, but essentially what's amazing is that Vertica Eon is basically using S3 as an object store. And of course, there are other offerings, like the one that Vertica does with Pure Storage that doesn't use S3. But what I find amazing is how well the system performs using S3 as an object store, and how they manage to keep an actual consistent database. And they do. We've had issues where we've gone and shut down hosts, or hosts have been shut down on us, and we have to restart the database and we don't have any consistency issues. It's unbelievable, the work that they've done. Essentially, another thing that's great about the way it works is you can use the S3 as a shared object store. You can have query nodes kind of querying from that set of files largely independently of the nodes that are writing to them. So you avoid this sort of bottleneck issue where you've got contention over who's writing what, and who's reading what, and so on. So I've found the performance using separate subclusters for our UI and for the ingest has been amazing. Another couple of things: they have a lot of in-database machine learning libraries. There's actually some cool stuff on their GitHub that we've used. One thing that we make a lot of use of is the sequence and time series analytics. For example, in our product, even though we do all of this stuff autonomously, you can also go create alerts for yourself. And one of the kinds of alerts you can do, you can say, "Okay, if this kind of event happens within so much time, and then this kind of an event happens, but not this one," then you can be alerted. So you can have these kinds of sequences that you define of events that would indicate a problem. And we use their sequence analytics for that.
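The "A then B, but not C, within a time window" alert just described can be sketched directly. This Python sketch only illustrates the matching logic; in the product the same kind of pattern is expressed in SQL over a fact table using Vertica's sequence analytics.

```python
def sequence_alert(events, first, then, but_not, window):
    # events: list of (timestamp, event_type) sorted by timestamp.
    # Fire if `then` follows `first` within `window` seconds, and
    # `but_not` does not occur in between.
    for i, (t0, e0) in enumerate(events):
        if e0 != first:
            continue
        for t1, e1 in events[i + 1:]:
            if t1 - t0 > window:
                break          # past the window; give up on this `first`
            if e1 == but_not:
                break          # the excluded event intervened
            if e1 == then:
                return True
    return False

events = [(0, "db_stop"), (5, "conn_refused"), (60, "db_start")]
print(sequence_alert(events, "db_stop", "conn_refused", "db_start", window=30))  # True
```

With `db_start` arriving between the stop and the refused connection, the same call returns False: the database recovered before the symptom, so no alert fires.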
So it kind of gives you really good performance on some of these queries where you're wanting to pull out sequences of events from a fact table. And time series analytics is really useful if you want to do analytics on the metrics and you want to do gap-filling interpolation on that. It's actually really fast. And it's easy to use through SQL. So those are a couple of Vertica extensions that we use. So finally, I would like to encourage everybody, hey, come try us out. You should be up and running in a few minutes if you're using Kubernetes. If not, it's however long it takes you to run an installer. So you can just come to our website, pick it up and try out autonomous monitoring. And I want to thank everybody for your time. And we can open it up for Q and A.
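The gap-filling interpolation mentioned above can be sketched as simple linear interpolation over interior gaps in a regularly sampled series. Vertica's time series SQL handles this (and much more) natively; this Python sketch only illustrates the idea and assumes every gap has a known value on both sides.

```python
def gap_fill(samples):
    # samples: list of (t, value-or-None) at regular intervals.
    # Fill each None by linear interpolation between the nearest known
    # neighbors. Assumes interior gaps only (known endpoints exist).
    vals = [v for _, v in samples]
    out = list(vals)
    for i, v in enumerate(vals):
        if v is not None:
            continue
        lo = max(j for j in range(i) if vals[j] is not None)
        hi = min(j for j in range(i + 1, len(vals)) if vals[j] is not None)
        frac = (i - lo) / (hi - lo)
        out[i] = vals[lo] + frac * (vals[hi] - vals[lo])
    return [(t, v) for (t, _), v in zip(samples, out)]

series = [(0, 10.0), (1, None), (2, None), (3, 40.0)]
print(gap_fill(series))  # [(0, 10.0), (1, 20.0), (2, 30.0), (3, 40.0)]
```

Filled values make downstream anomaly detection on metrics far simpler, since every window has a value to compare against.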

Published Date : Mar 30 2020

Extending Vertica with the Latest Vertica Ecosystem and Open Source Initiatives


 

>> Sue: Hello everybody. Thank you for joining us today for the Virtual Vertica BDC 2020. Today's breakout session is entitled Extending Vertica with the Latest Vertica Ecosystem and Open Source Initiatives. My name is Sue LeClaire, Director of Marketing at Vertica, and I'll be your host for this webinar. Joining me is Tom Wall, a member of the Vertica engineering team. But before we begin, I encourage you to submit questions or comments during the virtual session. You don't have to wait. Just type your question or comment in the question box below the slides and click submit. There will be a Q and A session at the end of the presentation. We'll answer as many questions as we're able to during that time. Any questions that we don't get to, we'll do our best to answer offline. Alternatively, you can visit the Vertica forums to post your questions after the session. Our engineering team is planning to join the forums to keep the conversation going. Also a reminder that you can maximize your screen by clicking the double arrow button in the lower right corner of the slides. And yes, this virtual session is being recorded and will be available to view on demand later this week. We'll send you a notification as soon as it's ready. So let's get started. Tom, over to you. >> Tom: Hello everyone and thanks for joining us today for this talk. My name is Tom Wall and I am the leader of Vertica's ecosystem engineering team. We are the team that focuses on building out all the developer tools and third party integrations that enable the software ecosystem that surrounds Vertica to thrive. So today, we'll be talking about some of our new open source initiatives and how those can be really effective for you and make things easier for you to build and integrate Vertica with the rest of your technology stack. We've got several new libraries, integration projects and examples, all open source, to share, all being built out in the open on our GitHub page.
Whether you use these open source projects or not, this is a very exciting new effort that will really help to grow the developer community and enable lots of exciting new use cases. So, every developer out there has probably had to deal with a problem like this. You have some business requirements, to maybe build some new Vertica-powered application. Maybe you have to build some new system to visualize some data that's managed by Vertica. In various circumstances, lots of choices might be made for you that constrain your approach to solving a particular problem. These requirements can come from all different places. Maybe your solution has to work with a specific visualization tool, or web framework, because the business has already invested in the licensing and the tooling to use it. Maybe it has to be implemented in a specific programming language, since that's what all the developers on the team know how to write code with. While Vertica has many different integrations with lots of different programming languages and systems, there's a lot of them out there, and we don't have integrations for all of them. So how do you make ends meet when you don't have all the tools you need? Well, you have to get creative, using tools like PyODBC, for example, to bridge between programming languages and frameworks to solve the problems you need to solve. Most languages do have an ODBC-based database interface. ODBC is a C library, and most programming languages know how to call C code, somehow. So that's doable, but it often requires lots of configuration and troubleshooting to make all those moving parts work well together. That's enough to get the job done, but native integrations are usually a lot smoother and easier. So rather than, for example, fighting with PyODBC in Python, configuring things, getting Unicode working, and compiling all the different pieces the right way just to make it all work smoothly.
It would be much better if you could just pip install a library and get to work. And with Vertica-Python, a new Python client library, you can actually do that. So that story, I assume, probably sounds pretty familiar to you. It probably sounds familiar to a lot of the audience here because we're all using Vertica. And our challenge, as Big Data practitioners, is to make sense of all this stuff, despite those technical and non-technical hurdles. Vertica powers lots of different businesses and use cases across all kinds of different industries and verticals. While there's a lot different about us, we're all here together right now for this talk because we do have some things in common. We're all using Vertica, and we're probably also using Vertica with other systems and tools too, because it's important to use the right tool for the right job. That's a founding principle of Vertica and it's true today too. In this constantly changing technology landscape, we need lots of good tools and well established patterns, approaches, and advice on how to combine them so that we can be successful doing our jobs. Luckily for us, Vertica has been designed to be easy to build with and extend in this fashion. Databases as a whole have had this goal from the very beginning. They solve the hard problems of managing data so that you don't have to worry about it. Instead of worrying about those hard problems, you can focus on what matters most to you and your domain. So implementing that business logic, solving that problem, without having to worry about all of these intense details about what it takes to manage a database at scale. With the declarative syntax of SQL, you tell Vertica what the answer is that you want. You don't tell Vertica how to get it. Vertica will figure out the right way to do it for you so that you don't have to worry about it.
So this SQL abstraction is very nice because it's a well defined boundary where lots of developers know SQL, and it allows you to express what you need without having to worry about those details. So we can be the experts in data management while you worry about your problems. This goes beyond, though, what's accessible through SQL to Vertica. We've got well defined extension and integration points across the product that allow you to customize this experience even further. So if you want to do things like write your own SQL functions, or extend the database software with UDXs, you can do so. If you have a custom data format that might be a proprietary format, or some source system that Vertica doesn't natively support, we have extension points that allow you to use those. These make it very easy to do massive, parallel data movement, loading into Vertica but also exporting from Vertica to send data to other systems. And with these newer features, in time, we can also do the same kinds of things with Machine Learning models, importing and exporting to tools like TensorFlow. And it's these integration points that have enabled Vertica to build out this open architecture and a rich ecosystem of tools, both open source and closed source, of different varieties that solve all different problems that are common in this big data processing world. Whether it's open source streaming systems like Kafka or Spark, or more traditional ETL tools on the loading side, but also BI tools and visualizers and things like that to view and use the data that you keep in your database on the right side. And then of course, Vertica needs to be flexible enough to be able to run anywhere. So you can really take Vertica and use it the way you want it to solve the problems that you need to solve. So Vertica has always employed open standards, and integrated with all kinds of different open source systems.
What we're really excited to talk about now is that we are taking our new integration projects and making those open source too. In particular, we've got two new open source client libraries that allow you to build Vertica applications for Python and Go. These libraries act as a foundation for all kinds of interesting applications and tools. Upon those libraries, we've also built some integrations ourselves. And we're using these new libraries to power some new integrations with some third party products. Finally, we've got lots of new examples and reference implementations out on our GitHub page that can show you how to combine all these moving parts in exciting ways to solve new problems. And the code for all these things is available now on our GitHub page. And so you can use it however you like, and even help us make it better too. So the first such project that we have is called Vertica-Python. Vertica-Python began at our customer, Uber. And then in late 2018, we collaborated with them and we took it over and made Vertica-Python the first official open source client for Vertica. You can use this to build your own Python applications, or you can use it via tools that were written in Python. Python has grown a lot in recent years and it's a very common language to solve lots of different problems and use cases in the Big Data space, from things like DevOps administration and Data Science or Machine Learning, or just homegrown applications. We use Python a lot internally for our own QA testing and automation needs. And with the Python 2 End Of Life that happened at the end of 2019, it was important that we had a robust Python solution to help migrate our internal stuff off of Python 2. And also to provide a nice migration path for all of you, our users, that might be worried about the same problems with your own Python code. So Vertica-Python is used already for lots of different tools, including Vertica's admintools, now starting with 9.3.1.
It was also used by DataDog to build a Vertica-DataDog integration that allows you to monitor your Vertica infrastructure within DataDog. So here's a little example of how you might use the Python client to do some work. So here we open a connection, we run a query to find out what node we've connected to, and then we do a little data load by running a COPY statement. And this is designed to have a familiar look and feel if you've ever used a Python database client before. So we implement the DB API 2.0 standard and it feels like a Python package. That includes things like being part of the centralized package index, so you can just pip install this right now and go start using it. We also have our client for Golang. So this is called vertica-sql-go. And this is a very similar story, just in a different context for a different programming language. So vertica-sql-go began as a collaboration with the Micro Focus SecOps group, who builds Micro Focus's security products, some of which use Vertica internally to provide some of those analytics. So you can use this to build your own apps in the Go programming language, but you can also use it via tools that are written in Go. So most notably, we have our Grafana integration, which we'll talk a little bit more about later, that leverages this new client to provide Grafana visualizations for Vertica data. And Go is another programming language rising in popularity, 'cause it offers an interesting balance of different programming design trade-offs. So it's got good performance, good concurrency and memory safety. And we liked all those things and we're using it to power some internal monitoring stuff of our own. And here's an example of the code you can write with this client. So this is Go code that does a similar thing. It opens a connection, it runs a little test query, and then it iterates over those rows, processing them using Go data types.
You get that native look and feel just like you do in Python, except this time in the Go language. And you can go get it the way you usually package things with Go, by running that command there to acquire this package. And it's important to note here that for these projects, we're really doing open source development. We're not just putting code out on our GitHub page. So if you go out there and look, you can see that you can ask questions, you can report bugs, you can submit pull requests yourselves and you can collaborate directly with our engineering team and the other Vertica users out on our GitHub page. Because it's out on our GitHub page, it allows us to be a little bit faster with the way we ship and deliver functionality compared to the core Vertica release cycle. So in 2019, for example, as we were building features to prepare for the Python 3 migration, we shipped 11 different releases with 40 customer reported issues filed on GitHub. That was done over 78 different pull requests and with lots of community engagement as we did so. So lots of people are using this already, as our GitHub badge shows, with about 5,000 downloads a day of people using it in their software. And again, we want to make this easy, not just to use but also to contribute and understand and collaborate with us. So all these projects are built using the Apache 2.0 license. The master branch is always available and stable with the latest functionality. And you can always build it and test it the way we do, so that it's easy for you to understand how it works and to submit contributions or bug fixes or even features. It uses automated testing, both locally and with pull requests. And for vertica-python, it's fully automated with Travis CI. So we're really excited about doing this and we're really excited about where it can go in the future. 'Cause this offers some exciting opportunities for us to collaborate with you more directly than we have ever before.
You can contribute improvements and help us guide the direction of these projects, but you can also work with each other to share knowledge and implementation details and various best practices. And so maybe you think, "Well, I don't use Python, "I don't use Go, so maybe it doesn't matter to me." But I would argue it really does matter. Because even if you don't use these tools and languages, there's lots of amazing Vertica developers out there who do. And these clients do act as low level building blocks for all kinds of different interesting tools, both in these Python and Go worlds, but also well beyond that. Because these implementations and examples really generalize to lots of different use cases. And we're going to do a deeper dive now into some of these to understand exactly how that's the case and what you can do with these things. So let's take a deeper look at some of the details of what it takes to build one of these open source client libraries. So these database client interfaces, what are they exactly? Well, we all know SQL, but if you look at what SQL specifies, it really only talks about how to manipulate the data within the database. So once you're connected and in, you can run commands with SQL. But these database client interfaces address the rest of those needs. So what does the programmer need to do to actually process those SQL queries? So these interfaces are specific to a particular language or a technology stack. But the use cases and the architectures and design patterns are largely the same between different languages. They all have a need to do some networking and connect and authenticate and create a session. They all need to be able to run queries and load some data and deal with problems and errors. And then they also have a lot of metadata and type mapping, because you want to use these clients the way you use those programming languages. Which might be different than the way that Vertica's data types and semantics work.
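Those common needs, connect, run queries, iterate rows, are exactly what Python's DB API 2.0 standardizes. Here is the shape of that interface, shown with the standard library's sqlite3 driver, which implements the same standard; against Vertica, only the connect call and SQL dialect would differ (table and column names here are made up for illustration):

```python
import sqlite3

# Any DB API 2.0 driver follows this same shape; with vertica-python
# you would call vertica_python.connect(...) instead.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE events (node TEXT, count INTEGER)")
cur.executemany("INSERT INTO events VALUES (?, ?)",
                [("node01", 3), ("node02", 5)])
conn.commit()

cur.execute("SELECT node, count FROM events ORDER BY count DESC")
rows = cur.fetchall()   # native Python tuples, types already mapped
conn.close()
```

Because the interface is standardized, application code written this way ports between drivers with very few changes, which is the pluggable driver idea discussed next.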
So some of these client interfaces are truly standards. And they are robust enough in terms of what they design and call for to support a truly pluggable driver model. Where you might write an application that codes directly against the standard interface, and you can then plug in a different database driver, like a JDBC driver, to have that application work with any database that has a JDBC driver. So most of these interfaces aren't as robust as JDBC or ODBC, but that's okay. 'Cause as good as a standard is, every database is unique for a reason. And so you can't really expose all of those unique properties of a database through these standard interfaces. So Vertica's unique in that it can scale to the petabytes and beyond. And you can run it anywhere in any environment, whether it's on-prem or on clouds. So surely there's something about Vertica that's unique, and we want to be able to take advantage of that fact in our solutions. So even though these standards might not cover everything, there's often a need, and common patterns arise, to solve these problems in similar ways. When there isn't enough of a standard to define those common semantics that different databases might have in common, what you often see is tools will invent plugin layers or glue code to compensate, by defining an application-wide standard to cover some of these same semantics. Later on, we'll get into some of those details and show off what exactly that means. So if you connect to a Vertica database, what's actually happening under the covers? You have an application, you have a need to run some queries, so what does that actually look like? Well, probably as you would imagine, your application is going to invoke some API calls in some client library or tool. This library takes those API calls and implements them, usually by issuing some networking protocol operations, communicating over the network to ask Vertica to do the heavy lifting required for that particular API call.
And so these APIs usually do the same kinds of things, although some of the details might differ between these different interfaces. But you do things like establish a connection, run a query, iterate over your rows, manage your transactions, that sort of thing. Here's an example from vertica-python, which just goes into some of the details of what actually happens during the Connect API call. And you can see all these details in our GitHub implementation of this. There's actually a lot of moving parts in what happens during a connection. So let's walk through some of that and see what actually goes on. I might have my API call like this, where I say Connect and I give it a DNS name, which is my entire cluster. And I give it my connection details, my username and password. And I tell the Python client to get me a session, give me a connection so I can start doing some work. Well, in order to implement this, what needs to happen? First, we need to do some TCP networking to establish our connection. So we need to understand what the request is, where you're going to connect to and why, by parsing the connection string. And Vertica being a distributed system, we want to provide high availability, so we might need to do some DNS look-ups to resolve that DNS name, which might be an entire cluster and not just a single machine. So that you don't have to change your connection string every time you add or remove nodes to the database. So we do some high availability and DNS lookup stuff. And then once we connect, we might do load balancing too, to balance the connections across the different initiator nodes in the cluster, or in a subcluster, as needed. Once we land on the node we want to be at, we might do some TLS to secure our connections. And Vertica supports the industry standard TLS protocols, so this looks pretty familiar for everyone who's used TLS anywhere before.
So you're going to do a certificate exchange, and the client might send the server a certificate too, and then you're going to verify that the server is who it says it is, so that you can know that you trust it. Once you've established that connection and secured it, then you can start actually beginning to request a session within Vertica. So you're going to send over your user information like, "Here's my username, "here's the database I want to connect to." You might send some information about your application, like a session label, so that you can differentiate on the database, with monitoring queries, what the different connections are and what their purpose is. And then you might also send over some session settings to do things like auto commit, to change the state of your session for the duration of this connection. So that you don't have to remember to do that with every query that you have. Once you've asked Vertica for a session, before Vertica will give you one, it has to authenticate you. And Vertica has lots of different authentication mechanisms. So there's a negotiation that happens there to decide how to authenticate you. Vertica decides based on who you are and where you're coming from on the network. And then you'll do an auth-specific exchange, depending on what the auth mechanism calls for, until you are authenticated. Finally, Vertica trusts you and lets you in, so you're going to establish a session in Vertica, and you might do some bookkeeping on the client side just to know what happened. So you might log some information, you might record what the version of the database is, you might do some protocol feature negotiation. So if you connect to a version of the database that doesn't support all these protocols, you might decide to turn some functionality off and that sort of thing. But finally, after all that, you can return from this API call and then your connection is good to go. So that connection is just one example of many different APIs.
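The high-availability piece of that walkthrough, one connection string naming several candidate endpoints, can be sketched in a few lines of plain Python. This is an illustration of the idea only; the real client's parsing and load balancing logic is more involved, and the host list syntax here is assumed for the example:

```python
import random

def candidate_hosts(connection_string):
    """Parse 'host1:5433,host2,host3:5434/dbname' into (host, port) pairs.

    A client can shuffle this list and try each endpoint in turn, so the
    connection string need not change when nodes come and go.
    """
    hosts_part = connection_string.split("/", 1)[0]
    candidates = []
    for endpoint in hosts_part.split(","):
        host, _, port = endpoint.partition(":")
        # Fall back to Vertica's default client port when none is given.
        candidates.append((host, int(port) if port else 5433))
    return candidates

endpoints = candidate_hosts("node01:5433,node02,node03:5434/analytics")
random.shuffle(endpoints)  # naive client-side load balancing
```

Shuffling before connecting spreads sessions across initiator nodes; a real driver would also retry the next endpoint on failure.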
And we're excited here because with vertica-python we're really opening up the Vertica client wire protocol for the first time. And so if you're a low level Vertica developer and you might have used Postgres before, you might know that some of Vertica's client protocol is derived from Postgres. But they do differ in many significant ways. And this is the first time we've ever revealed those details about how it works and why. So not all Postgres protocol features work with Vertica, because Vertica doesn't support all the features that Postgres does. Postgres, for example, has a large object interface that allows you to stream very wide data values over. Whereas Vertica doesn't really have very wide data values; you have VARCHARs, you have LONG VARCHARs, but that's about as wide as you can get. Similarly, the Vertica protocol supports lots of features not present in Postgres. So load balancing, for example, which we just went through an example of. Postgres is a single node system, it doesn't really make sense for Postgres to have load balancing. But load balancing is really important for Vertica because it is a distributed system. Vertica-python serves as an open reference implementation of this protocol, with all kinds of new details and extension points that we haven't revealed before. So if you look at these boxes below, all these different things are new protocol features that we've implemented since August 2019, out in the open on our GitHub page for Python. Now, the vertica-sql-go implementation of these things is still in progress, but the core protocols are there for basic query operations. There's more to do there, but we'll get there soon. So this is really cool, 'cause not only do you now have a Python client implementation, and you have a Go client implementation of this, but you can use this protocol reference to do lots of other things, too. The obvious thing you could do is build more clients for other languages.
So if you have a need for a client in some other language that Vertica doesn't support yet, now you have everything available to solve that problem and to go about doing so if you need to. But beyond clients, it's also used for other things. So you might use it for mocking and testing things. So rather than connecting to a real Vertica database, you can simulate some of that. You can also use it to do things like query routing and proxies. So Uber, for example, this blog here in this link tells a great story of how they route different queries to different Vertica clusters by intercepting these protocol messages, parsing the queries in them and deciding which clusters to send them to. So a lot of these things are just ideas today, but now that you have the source code, there's no limit in sight to what you can do with this thing. And so we're very interested in hearing your ideas and requests and we're happy to offer advice and collaborate on building some of these things together. So let's take a look now at some of the things we've already built that do these things. So here's a picture of Vertica's Grafana connector, with some data powered from an example that we have in this blog link here. So this has an internet of things use case to it, where we have lots of different sensors recording flight data, feeding into Kafka, which then gets loaded into Vertica. And then finally, it gets visualized nicely here with Grafana. And Grafana's visualizations make it really easy to analyze the data with your eyes and see when something happens. So in these highlighted sections here, you notice a drop in some of the activity; that's probably a problem worth looking into. It might be a lot harder to see that just by staring at a large table yourself. So how does a picture like that get generated with a tool like Grafana? Well, Grafana specializes in visualizing time series data. And time can be really tricky for computers to do correctly.
You've got time zones, daylight savings, leap seconds, negative infinity timestamps, please don't ever use those. In every system, if it wasn't hard enough just with those problems, what makes it harder is that every system does it slightly differently. So if you're querying some time data, how do we deal with these semantic differences as we cross these domain boundaries, from Vertica to Grafana's back end architecture, which is implemented in Go, and its front end, which is implemented with JavaScript? Well, you read this from the bottom up in terms of the processing. First, you select the timestamp, and Vertica's timestamp has to be converted to a Go time object. And we have to reconcile the differences that there might be as we translate it. So Go time has a different time zone specifier format, and it also supports nanosecond precision, while Vertica only supports microsecond precision. So that's not too big of a deal when you're querying data, because you just see some extra zeros on the fractional seconds. But on the way in, if we're loading data, we have to find a way to resolve those things. Once it's in the Go process, it has to be converted further to render in the JavaScript UI. So there, the Go time object has to be converted to a JavaScript AngularJS Date object. And there too, we have to reconcile those differences. So a lot of these differences might just be presentation, and not so much the actual data changing, but you might want to choose to render the date in a more human readable format, like we've done in this example here. Here's another picture. This is another picture of some time series data, and this one shows you can actually write your own queries with Grafana to provide answers. So if you look closely here, you can see there's actually some functions that might not look too familiar to you if you know Vertica's functions. Vertica doesn't have a dollar underscore underscore time function or a time filter function.
So what's actually happening there? How does this actually provide an answer if it's not really real Vertica syntax? Well, it's not sufficient to just know how to manipulate data, it's also really important that you know how to operate with metadata. So information about how the data works in the data source, Vertica in this case. So Grafana needs to know how time works in detail for each data source, beyond doing that basic I/O that we just saw in the previous example. So it needs to know, how do you connect to the data source to get some time data? How do you know what time data types and functions there are and how they behave? How do you generate a query that references a time literal? And finally, once you've figured out how to do all that, how do you find the time in the database? How do you know which tables have time columns that might be worth rendering in this kind of UI? So Go's database standard doesn't actually really offer many metadata interfaces. Nevertheless, Grafana needs to know those answers. And so it has its own plugin layer that provides a standardizing layer, whereby every data source can implement the hints and metadata customization needed to have an extensible data source back end. So we have another open source project, the Vertica-Grafana data source, which is a plugin that uses Grafana's extension points, with JavaScript in the front end plugins and also with Go in the back end plugins, to provide Vertica connectivity inside Grafana. So the way this works is that the plugin framework defines those standardizing functions like time and time filter, and it's our plugin that's going to rewrite them in terms of Vertica syntax. So in this example, time gets rewritten to a Vertica cast. And time filter becomes a BETWEEN predicate. So that's one example of how you can use Grafana, but also how you might build any arbitrary visualization tool that works with data in Vertica.
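That macro rewriting, standardized placeholders becoming database-specific SQL, is essentially templated string substitution. A toy Python sketch of the idea follows; the macro names mirror Grafana's conventions, but the function and its rewrite rules are illustrative, not the plugin's actual implementation:

```python
import re

def rewrite_macros(sql, time_col_cast, start, end):
    """Rewrite Grafana-style $__time / $__timeFilter macros into plain SQL."""
    # $__time(col) -> a database-specific cast expression for the column.
    sql = re.sub(r"\$__time\((\w+)\)",
                 lambda m: time_col_cast.format(col=m.group(1)), sql)
    # $__timeFilter(col) -> a BETWEEN predicate over the dashboard's range.
    sql = re.sub(r"\$__timeFilter\((\w+)\)",
                 lambda m: "{} BETWEEN '{}' AND '{}'".format(m.group(1), start, end),
                 sql)
    return sql

query = "SELECT $__time(ts), avg(v) FROM m WHERE $__timeFilter(ts) GROUP BY 1"
rewritten = rewrite_macros(query, "CAST({col} AS TIMESTAMP) AS time",
                           "2020-03-01 00:00:00", "2020-03-02 00:00:00")
```

The same pattern, one rewrite rule per data source, is what lets a single dashboard query run against many different databases.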
So let's now look at some other examples and reference architectures that we have out on our GitHub page. For some advanced integrations, there's clearly a need to go beyond these standards. So SQL and the surrounding standards, like JDBC and ODBC, were really critical in the early days of Vertica, because they really enabled a lot of generic database tools. And those will always continue to play a really important role, but the Big Data technology space moves a lot faster than these old database standards can keep up with. So there's all kinds of new advanced analytics and query pushdown logic that were never possible 10 or 20 years ago, that Vertica can do natively. There's also all kinds of data-oriented application workflows doing things like streaming data, or parallel loading, or Machine Learning. And all of these things we need to build software with, but we don't really have standards to go by. So what do we do there? Well, open source implementations make for easier integrations and applications all over the place. So even if you're not using Grafana, for example, other tools have similar challenges that you need to overcome. And it helps to have an example there to show you how to do it. Take Machine Learning, for example. There have been many excellent Machine Learning tools that have arisen over the years to make data science and the task of Machine Learning a lot easier. And a lot of those have basic database connectivity, but they generally only treat the database as a source of data. So they do lots of data I/O to extract data from a database like Vertica for processing in some other engine. We all know that's not the most efficient way to do it. It's much better if you can leverage Vertica's scale and bring the processing to the data. So a lot of these tools don't take full advantage of Vertica, because there's not really a uniform way to go do so with these standards. So instead, we have a project called vertica-ml-python.
And this serves as a reference architecture of how you can do scalable machine learning with Vertica. So this project establishes a familiar machine learning workflow that scales with Vertica. So it feels similar to, say, a scikit-learn project, except all the processing and aggregation and heavy lifting and data processing happens in Vertica. So this makes for a much more lightweight, scalable approach than you might otherwise be used to. So with vertica-ml-python, you can probably use this yourself. But you could also see how it works. So if it doesn't meet all your needs, you could still see the code and customize it to build your own approach. We've also got lots of examples of our UDX framework. And so this is an older GitHub project. We've actually had this for a couple of years, but it is really useful and important, so I wanted to plug it here. With our User Defined eXtensions framework, or UDXs, this allows you to extend the operators that Vertica executes when it does a database load or a database query. So with UDXs, you can write your own domain logic in C++, Java, Python or R. And you can call them within the context of a SQL query. And Vertica brings your logic to that data, and makes it fast and scalable and fault tolerant and correct for you. So you don't have to worry about all those hard problems. So our UDX examples demonstrate how you can use our SDK to solve interesting problems. And some of these examples might be complete, totally usable packages or libraries. So for example, we have a curl source that allows you to extract data from any curlable endpoint and load it into Vertica. We've got things like an ODBC connector that allows you to access data in an external database via an ODBC driver within the context of a Vertica query, all kinds of parsers and string processors and things like that.
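"Bringing the processing to the data" is the difference between fetching every row into the client and asking the database for the aggregate. A small sketch of the contrast, using the stdlib sqlite3 driver as a stand-in for any SQL database (the schema is invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE samples (label TEXT, value REAL)")
cur.executemany("INSERT INTO samples VALUES (?, ?)",
                [("a", 1.0), ("a", 3.0), ("b", 10.0)])

# Anti-pattern: pull every row out and aggregate client-side.
rows = cur.execute("SELECT label, value FROM samples").fetchall()
groups = {}
for label, value in rows:
    groups.setdefault(label, []).append(value)
means_client = {k: sum(v) / len(v) for k, v in groups.items()}

# Pushdown: let the database scan and aggregate in place; only the
# tiny result set crosses the wire.
means_db = dict(cur.execute(
    "SELECT label, AVG(value) FROM samples GROUP BY label").fetchall())
conn.close()
```

Both paths produce the same answer, but the second moves a few rows instead of the whole table, which is what lets a columnar MPP engine like Vertica do the heavy lifting at scale.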
We also have more exciting and interesting things where you might not really think of Vertica being able to do that, like a heat map generator, which takes some XY coordinates and renders it on top of an image to show you the hotspots in it. So the image on the right was actually generated from one of our intern gaming sessions a few years back. So all these things are great examples that show you not just how you can solve problems, but also how you can use this SDK to solve neat things that maybe no one else has to solve, or maybe that are unique to your business and your needs. Another exciting benefit is with testing. So the test automation strategy that we have in vertica-python and these clients really generalizes well beyond the needs of a database client. Anyone that's ever built a Vertica integration or an application probably has a need to write some integration tests. And that could be hard to do with all the moving parts in a big data solution. But with our code being open source, you can see in vertica-python, in particular, how we've structured our tests to facilitate smooth testing that's fast, deterministic and easy to use. So we've automated the download process and the installation deployment process of a Vertica Community Edition. And with a single click, you can run through the tests locally and as part of the PR workflow via Travis CI. We also do this for multiple different Python environments. So for all Python versions from 2.7 up to 3.8, for different Python interpreters, and for different Linux distros, we're running through all of them very quickly with ease, thanks to all this automation. So today, you can see how we do it in vertica-python; in the future, we might want to spin that out into its own stand-alone testbed starter project so that if you're starting any new Vertica integration, this might be a good starting point for you to get going quickly. 
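The interpreter matrix described above, every Python from 2.7 through 3.8, is the kind of thing a tox- or CI-style environment list expresses. This is a small illustrative sketch, not the project's actual configuration:

```python
# Sketch of the kind of interpreter matrix the vertica-python tests run
# against; the exact version list and naming here are illustrative.

PYTHON_VERSIONS = ["2.7", "3.4", "3.5", "3.6", "3.7", "3.8"]

def tox_envlist(versions):
    """Turn version strings into tox-style environment names."""
    return ",".join("py" + v.replace(".", "") for v in versions)

print(tox_envlist(PYTHON_VERSIONS))
```

A CI job then fans out one deterministic test run per environment name, which is what makes the "single click" matrix runs cheap.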
So that brings us to some of the future work we want to do here in the open source space. Well, there's a lot of it. So in terms of the client stuff, for Python, we are marching towards our 1.0 release, which is when we aim to be protocol complete to support all of Vertica's unique protocols, including COPY LOCAL and some new protocols invented to support complex types, which is a new feature in Vertica 10. We have some cursor enhancements to do things like better streaming and improved performance. Beyond that we want to take it where you want to bring it, so send us your requests. On the Go client front, it's just about a year behind Python in terms of its protocol implementation, but the basic operations are there. We still have more work to do to implement things like load balancing, some of the advanced auths and other things. But there, too, we want to work with you and we want to focus on what's important to you so that we can continue to grow and be more useful and more powerful over time. Finally, this question of, "Well, what about beyond database clients? What else might we want to do with open source?" If you're building a very deep or a robust Vertica integration, you probably need to do a lot more exciting things than just run SQL queries and process the answers. Especially if you're an OEM or you're a vendor that resells Vertica packaged as a black box piece of a larger solution, you might have to manage the whole operational lifecycle of Vertica. There are even fewer standards for doing all these different things compared to the SQL clients. So we started with the SQL clients 'cause that's a well established pattern, and there's lots of downstream work that that can enable. But there's also clearly a need for lots of other open source protocols, architectures and examples to show you how to do these things that don't yet have real standards. 
So we talked a little bit about how you could do UDxs or testing or Machine Learning, but there's all sorts of other use cases too. That's why we're excited to announce here awesome-vertica, which is a new collection of open source resources available on our GitHub page. So if you haven't heard of this awesome manifesto before, I highly recommend you check out this GitHub page on the right. We're not unique here; there's lots of awesome projects for all kinds of different tools and systems out there. And it's a great way to establish a community and share different resources, whether they're open source projects, blogs, examples, references, community resources, and all that. And this list is itself an open source project. So it's an open source wiki, and you can contribute to it by submitting a PR yourself. So we've seeded it with some of our favorite tools and projects out there, but there's plenty more out there and we hope to see it grow over time. So definitely check this out and help us make it better. So with that, I'm going to wrap up. I wanted to thank you all. Special thanks to Siting Ren and Roger Huebner, who are the project leads for the Python and Go clients respectively. And also, thanks to all the customers out there who've already been contributing stuff. This has already been going on for a long time and we hope to keep it going and keep it growing with your help. So if you want to talk to us, you can find us at this email address here. But of course, you can also find us on the Vertica forums, or you could talk to us on GitHub too. And there you can find links to all the different projects I talked about today. And so with that, I think we're going to wrap up and now we're going to hand it off for some Q&A.

Published Date : Mar 30 2020


Model Management and Data Preparation


 

>> Sue: Hello, everybody, and thank you for joining us today for the virtual Vertica BDC 2020. Today's breakout session is entitled Machine Learning with Vertica, Data Preparation and Model Management. My name is Sue LeClaire, Director of Marketing at Vertica, and I'll be your host for this webinar. Joining me is Waqas Dhillon. He's part of the Vertica Product Management Team at Vertica. Before we begin, I want to encourage you to submit questions or comments during the virtual session. You don't have to wait. Just type your question or comment in the question box below the slides and click submit. There will be a Q and A session at the end of the presentation. We'll answer as many questions as we're able to during that time. Any questions that we don't address, we'll do our best to answer offline. Alternatively, you can visit Vertica Forums to post your questions there after the session. Our engineering team is planning to join the forums to keep the conversation going. Also, a reminder that you can maximize your screen by clicking the double arrow button in the lower right corner of the slides, and yes, this virtual session is being recorded and will be available to view on demand later this week. We'll send you a notification as soon as it's ready. So, let's get started. Waqas, over to you. >> Waqas: Thank you, Sue. Hi, everyone. My name is Waqas Dhillon and I'm a Product Manager here at Vertica. So today, we're going to go through data preparation and model management in Vertica, and the session would essentially be starting with some introduction and going through some of the machine learning considerations when you're doing machine learning at scale. After that, we have two main sections here. The first one is on data preparation, and so we'll go through what data preparation is, what the Vertica functions for data exploration and data preparation are, and then share an example with you. 
Similarly, in the second part of this talk we'll go through exporting different models using PMML and how that works with Vertica, and we'll share examples from that, as well. So yeah, let's dive right in. So, Vertica essentially is an open architecture with a rich ecosystem. So, you have a lot of options for data transformation and ingesting data from different tools, and then you also have options for connecting through ODBC, JDBC, and some other connectors to BI and visualization tools. There's a lot of them that Vertica connects to, and in the middle sits Vertica, which you can have on external tables or you can have in-place analytics, on cloud or on prem, so that choice is yours, but essentially what it does is it offers you a lot of options for performing your data and analytics at scale, and within that, machine learning is also a core component, and then you have a lot of options and functions for that. Now, machine learning in Vertica is actually built on top of the architecture that distributed data analytics offers, so it offers a lot of those capabilities and builds on top of them, so you eliminate the overhead of data transfer when you're working with Vertica machine learning, you keep your data secure, storing and managing the models is really easy and much more efficient. You can serve a lot of concurrent users all at the same time, and then it's really scalable and avoids the maintenance cost of a separate system, so essentially a lot of benefits here, but one important thing to mention here is that all the algorithms that you see, whether they're analytics functions, advanced analytics functions, or machine learning functions, are distributed not just across the cluster on different nodes, where each node gets part of the distributed workload, but also within each node, where there might be multiple threads and multiple processors running each of these functions. So, a highly distributed solution and one of its kind in this space. 
So, when we talk about Vertica machine learning, it essentially covers the whole machine learning process, and we see it as something starting with data ingestion and doing data analysis and understanding, going through the steps of data preparation, modeling, evaluation, and finally deployment, as well. So, when you're using Vertica for machine learning, it takes care of all these steps and you can do all of that inside of the Vertica database, but when we look at the three main pillars that Vertica machine learning aims to build on, the first one is to have Vertica as a platform for high performance machine learning. We have a lot of functions for data exploration and preparation and we'll go through some of them here. We have distributed in-database algorithms for model training and prediction, we have scalable functions for model evaluation, and finally we have distributed scoring functions, as well. Doing all of the stuff in the database, that's a really good thing, but we don't want it isolated in this space. We understand that a lot of our customers, our users, like to work with other tools as well as Vertica. So, they might use Vertica for data prep, another tool for model training, or use Vertica for model training and take those models out to other tools and do prediction there. So, integration is a really important part of our overall offering. So, it's a pretty flexible system. We have been offering UDx in four languages, and a lot of people have found value there over the past few years, but the new capability of importing PMML models for in-database scoring and exporting Vertica native models for external scoring is something that we have recently added, and another talk will actually go through the TensorFlow integration, a really exciting and important milestone that we have where you can bring TensorFlow models into Vertica for in-database scoring. 
For this talk, we'll focus on data exploration and preparation, importing PMML, and exporting PMML models, and finally, since Vertica is not just a query engine, but also a data store, we have a lot of really good capability for model storage and management, as well. So, yeah. Let's dive into the first part on machine learning at scale. So, when we say machine learning at scale, there are actually a few really important considerations, and they have their own implications. The first one is that we want to have speed, but we also want it to come at a reasonable cost. So, it's really important for us to pick the right scaling architecture. Secondly, it's not easy to move big data around. It might be easy to do that on a smaller data set, on an Excel sheet, or something of the like, but once you're talking about big data and data analytics at really big scale, it's really not easy to move that data around from one tool to another, so what you'd want to do is bring models to the data instead of having to move this data to the tools, and the third thing here is that sub-sampling can actually compromise your accuracy. A lot of tools out there still force you to take smaller samples of your data because they can only handle so much data, but that can impact your accuracy, and the need here is that you should be able to work with all of your data. We'll just go through each of these really quickly. So, the first factor here is scalability. Now, if you want to scale your architecture, you have two main options. The first is vertical scaling. Let's say you have a machine, a server, essentially, and you can keep on adding resources, like RAM and CPU, and keep increasing the performance as well as the capacity of that system, but there's a limit to what you can do here, and you can hit that limit in terms of cost, as well as in terms of technology. Beyond a certain point, you will not be able to scale further. 
So, the right solution to follow here is actually horizontal scaling, in which you can keep on adding more instances to have more computing power and more capacity. So, essentially what you get with this architecture is a super computer, which stitches together several nodes, and the workload is distributed on each of those nodes for massively parallel processing and really fast speeds, as well. The second aspect, of having big data and the difficulty around moving it around, can actually be clarified with this example. So, what usually happens is, and this is a simplified version, you have a lot of applications and tools for which you might be collecting the data, and this data then goes into an analytics database. That database then in turn might be connected to some BI tools, dashboards and applications, and some ad-hoc queries being done on the database. Then, you want to do machine learning in this architecture. What usually happens is that you have your machine learning tools, and the data that is coming in to the analytics database is actually being exported out to the machine learning tools. You're training your models there, and afterwards, when you have new incoming data, that data again goes out to the machine learning tools for prediction. The results that you get from those tools usually end up back in the analytics database because you want to put them on a dashboard or you want to power up some applications with them. So, there's essentially a lot of data overhead that's involved here. There are cons with that, including data governance, data movement, and other complications that you need to resolve here. 
One of the possible solutions to overcome that difficulty is that you have machine learning as part of the distributed analytical database, as well, so you get the benefits of having it applied on all of the data that's inside of the database and not having to care about all of the data movement there, but if there are some use cases where it still makes sense to at least train the models outside, that's where you can do your data preparation inside the database, then take the prepared data out, build your model, and then bring the model back to the analytics database. In this case, we'll talk about Vertica. So, the model would be archived, hosted by Vertica, and then you can keep on applying predictions on the new data that's incoming into the database. So, the third consideration here for machine learning at scale is sampling versus the full data set. As I mentioned, a lot of tools cannot handle big data and you are forced to sub-sample, but what happens here, as you can see in the leftmost figure, figure A, is that if you have a single data point, essentially any model can explain that, but if you have more data points, as in figure B, there would be a smaller number of models that could explain that, and in figure C, with even more data points, fewer models can explain them, but fewer here also means that these models would probably be more accurate, and the objective for building machine learning models is mostly to have prediction capability and generalization capability, essentially, on unseen data, so if you build a model that's accurate on one data point, it would not have very good generalization capabilities. The conventional wisdom with machine learning is that the more data points you have for learning, the better and more accurate the models you'll get out of your machine learning process. 
So, you need to pick a tool which can handle all of your data and does not force you to sub-sample it, and doing that, even a simpler model might be much better than a more complex model here. So, yeah. Let's go to the data exploration and data preparation part. Vertica's a really powerful tool and it offers a lot of scalability in this space, and as I mentioned, it supports the whole process. You can define the problem and you can gather your data and construct your data set inside Vertica, and then continue through data preparation, model training, deployment, and managing the model, but this is a really critical step in the overall machine learning process. Some estimate it takes between 60 and 80% of the overall effort of a machine learning process. So, there are a lot of functions here. You can use Vertica to do data exploration, de-duplication, outlier detection, balancing, normalization, and potentially a lot more. You can actually go to our Vertica documentation and find them there. Within Vertica we divide them into two parts. Within data prep, one is exploration functions, the second is transformation functions. Within exploration, you have a rich set of functions that you can use in DB, and then if you want to build your own you can use the UDX to do that. Similarly, for transformation there's a lot of functions around time series, pattern matching, outlier detection that you can use to transform that data, and this is just a snapshot of some of those functions that are available in Vertica right now. And again, the good thing about these functions is not just their presence in the database. The good thing is actually their ability to scale on really, really large data sets and be able to compute those results for you on that data set in an acceptable amount of time, which is really critical for your machine learning process. So, let's go to an example and see how we can use some of these functions. 
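Two of the in-database prep functions mentioned above, balancing and normalization, follow a common Vertica calling pattern: the function writes a prepared view from an input relation. The sketch below only builds the SQL strings; the view, table, and column names are illustrative, and you should check the current Vertica documentation for the exact parameters.

```python
# Hedged sketch of calling two of Vertica's in-database data prep
# functions; all object names here are illustrative.

def balance_sql(out_view, table, label_col, method="under_sampling"):
    """Build a BALANCE call that writes a class-balanced view."""
    return ("SELECT BALANCE('{o}', '{t}', '{l}', '{m}');"
            .format(o=out_view, t=table, l=label_col, m=method))

def normalize_sql(out_view, table, columns, method="minmax"):
    """Build a NORMALIZE call that writes a rescaled view."""
    return ("SELECT NORMALIZE('{o}', '{t}', '{c}', '{m}');"
            .format(o=out_view, t=table, c=columns, m=method))

print(balance_sql("balanced_flows", "unique_flows", "label"))
print(normalize_sql("norm_flows", "unique_flows", "duration,packets"))
```

Because both functions run inside the cluster, the full data set is prepared in place, with no sub-sampling forced by client memory.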
As I mentioned, there's a whole lot of them and we'll not be able to go through all of them, but just for our understanding we can go through some of them and see how they work. So, we have here a sample data set of network flows. It's a simulated attack from some source nodes, and then there are some victim nodes on which these attacks are happening. So yeah, let's just look at the data here real quick. We'll load the data, we'll browse the data, compute some statistics around it, ask some questions, make plots, and then clean the data. The objective here is not to make a prediction, per se, which is what we mostly do in machine learning algorithms, but to just go through the data prep process and see how easy it is to do that with Vertica and what kind of options might be there to help you through that process. So, the first step is loading the data. Since in this case we know the structure of the data, we create a table and define the column names and data types, but let's say you have a data set for which you do not already know the structure. There's a really cool feature in Vertica called flex tables, and you can use that to initially import the data into the database and then go through all of the variables and assign them variable types. You can also use that if your data is dynamic and it's changing, to load the data first and then create these definitions. So once we've done that, we load the data into the database. It's one week of data out of the whole data set right now, but once we've done that we'd like to look at the flows just to see how the data looks, and once we do a select star from flows, with just a limit here, we see that there's already some data duplication, and by duplication I mean rows which have the exact same data for each of the columns. So, as part of the cleaning process, the first thing we'd want to do is probably to remove that duplication. 
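The flex-table load and the de-duplication step just described can be sketched as SQL. The file path and table names below are assumptions for illustration; `fcsvparser` is Vertica's delimited-file flex parser.

```python
# Sketch of the schema-less flex-table load plus the de-duplication
# step described above; path and table names are made up.

def flex_load_sql(table, csv_path):
    """Create a flex table and load a CSV into it without a schema."""
    return [
        "CREATE FLEX TABLE {t}();".format(t=table),
        "COPY {t} FROM '{p}' PARSER fcsvparser();".format(t=table, p=csv_path),
    ]

def dedup_sql(src, dst):
    """Materialize only the distinct rows, dropping exact duplicates."""
    return "CREATE TABLE {d} AS SELECT DISTINCT * FROM {s};".format(d=dst, s=src)

print(dedup_sql("flows", "unique_flows"))
```

`SELECT DISTINCT *` removes only rows that match on every column, which is exactly the duplication described in the walkthrough.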
So, we create a table with distinct flows, and you can see here we have about a million flows which are unique. So, moving on. The next step we want to do here, this is essentially time series data and these times span days of the week, so we want to look at the trends of this data. So, the network traffic that's there, you can call it flows. So, based on hours of the day, how does the traffic move and how does it differ from one day to another? It's part of an exploration process. There might be a lot of further exploration that you want to do, but we can start with this one and see how it goes, and you can see in the graph here that we have seven days of data, and the weekend traffic, which is in pink and purple here, seems a little different from the rest of the days. Pretty close to each other, but yeah, definitely something we can look into and see if there's some real difference and if there's something we want to explore further here, but the thing is that this is just data for one week, as I mentioned. What if we load data for 70 days? You'd have a longer graph probably, but a lot of lines, and we would not really be able to make sense out of that data. It would be a really crowded plot, so we have to come up with a better way to be able to explore that, and we'll come back to that in a little bit. So, what are some other things that we can do? We can get some statistics, we can take one sample flow and look at some of the values here. We see that the forward column here and the ToS column here have zero values, and when we explore further we see that there's a lot of records here for which these columns are essentially zero, so probably not really helpful for our use case. Then, we can look at the flow end. 
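The hour-by-day trend plot described above comes from a simple aggregation. Here is a sketch of that query; the timestamp column name `flow_start` is an assumption, not from the talk.

```python
# A sketch of the hour-by-day aggregation behind the trend plot;
# the timestamp column name flow_start is assumed for illustration.

def hourly_trend_sql(table, ts_col="flow_start"):
    """Count flows per (day of week, hour of day) bucket."""
    return ("SELECT DATE_PART('dow', {ts}) AS day, "
            "DATE_PART('hour', {ts}) AS hour, COUNT(*) AS flows "
            "FROM {t} GROUP BY 1, 2 ORDER BY 1, 2;"
            .format(ts=ts_col, t=table))

print(hourly_trend_sql("unique_flows"))
```

One row per day-hour bucket is small enough to plot directly, even when the underlying table holds millions of flows.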
So, flow end is the end time when the last packet in a flow was sent, and you can do a select min flow and max flow to see when the data started and when it ended, and you can see it's about one week's worth of data, from the first till the eighth. Now, we also want to look at whether the data is balanced or not, because balanced data is really important for a lot of classification use cases that we want to try with this, and when you look at the source address, destination address, source port, and destination port, you see it's highly imbalanced data in the source versus destination address space, so that's probably something we need to address. There are really powerful Vertica balancing functions that you can use, with under-sampling, over-sampling, or hybrid sampling, and those can be really useful here. Another thing we can look at is summary statistics of these columns, so off the unique flows table that we created we just use the SUMMARIZE_NUMCOL function in Vertica and it gives us a lot of really cool (mumbling) and percentile information on that. Now, if we look at the duration, which is the last record here, we can see that the mean is about 4.6 seconds, but when we look at the percentile information, we see that the median is about 0.27. So, there's a lot of short flows that have a duration of less than 0.27 seconds. Yes, there would be longer ones too, and they probably bring the mean up to the 4.6 value, but the number of short flows is pretty high. We can ask some other questions of the data about the features. We can look at the protocols here and look at the count. So, we see that most of the traffic that we have is TCP and UDP, which is sort of expected for a data set like this, and then we want to look at what are the most popular network services here. So again, a simple query here, select destination port and count, and we get the destination port and count for each. 
So, we can see that most of the traffic here is web traffic, HTTP and HTTPS, followed by domain name resolution. So, let's explore some more. We can look at the label distributions. We see the labels that are given with this data set, because this is essentially data for which we already know whether a record was an anomaly or not, and we're creating our algorithm based on it. So, we see that there's this background label, a lot of records there, and then anomaly spam seems to be really high. There are anomaly UDP scans and SSH scans, as well. So, another question we can ask is among the SMTP flows, how are labels distributed, and we can see that anomaly spam is highest, and then comes the background spam. So, can we say from this that SMTP flows are spam? Maybe we can build a model that actually answers that question for us. That can be one machine learning model that you can build out of this data set. Again, we can also verify the destination port of flows that were labeled as spam. So, you would expect port 25 for the SMTP service here, and we can see that SMTP with destination port 25 has a lot of counts, but there are some other destination ports for which the count is really low, and essentially, when we're doing an analysis at this scale, these data points might not really be needed. So, as part of the data prep slash data cleaning we might want to get rid of these records. So now, what we can do is go back to the graph that I showed earlier and try to plot the daily trends by aggregating them. Again, we take the unique flows and convert them into flow counts, a manageable number that we can then feed into one of the algorithms. 
Now, PCA, principal component analysis, is a really powerful algorithm in Vertica, and what it essentially does is, a lot of times when you have a high number of columns, which might be highly (mumbling) with each other, you can feed them into the PCA algorithm and it will get you a list of principal components which are linearly independent from each other. Now, each of these components explains a certain extent of the variance of the overall data set that you have. So, you can see here component one explains about 73.9% of the variance, and component two explains about 16% of the variance. So, if you combine those two components alone, that would get you around 90% of the variance. Now, you can use PCA for a lot of different purposes, but in this specific example, we want to see if we combine all the data points that we have together, by day of the week, what sort of information can we get out of it? Is there any insight that this provides? Because once you have two data points, it's really easy to plot them. So, we just apply the PCA, we first (mumbling) it, and then apply it on our data set, and this is the graph we get as a result. Now, you can see component one is on the X axis here, component two on the Y axis, and each of these points represents a day of the week. Now, with just two points it's easy to plot that, and compare this to the graph that we saw earlier, which had a lot of lines; the more weeks or days that we added, the more lines we'd have, versus this graph in which you can clearly tell that five days' traffic, from Monday till Friday, is closely clustered together, so probably pretty similar to each other, and then Saturday traffic is pretty far apart from all of these days, and it's also further away from Sunday. 
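The train-then-apply PCA pattern described above can be sketched as two SQL calls. The model name, table, and column list below are assumptions for illustration; consult the Vertica documentation for the full `PCA` and `APPLY_PCA` parameter lists.

```python
# Hedged sketch of the Vertica PCA train/apply pattern described above;
# model, table, and column names are illustrative.

def pca_train_sql(model, table, columns, num_components=2):
    """Fit a PCA model inside the database."""
    return ("SELECT PCA('{m}', '{t}', '{c}' "
            "USING PARAMETERS num_components={n});"
            .format(m=model, t=table, c=columns, n=num_components))

def pca_apply_sql(model, table, columns):
    """Project rows onto the fitted components, still in-database."""
    return ("SELECT APPLY_PCA({c} USING PARAMETERS model_name='{m}') "
            "OVER () FROM {t};".format(c=columns, m=model, t=table))

print(pca_train_sql("pca_flows", "daily_counts", "h0,h1,h2"))
```

With `num_components=2`, each day of the week reduces to one (x, y) point, which is what makes the scatter plot in the walkthrough possible.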
So, these two days of traffic are different from the other days, and we can always dive deeper into this and look at exactly what's happening here and see how this traffic is actually different, but with just a few functions and some pretty simple SQL queries, we were already able to get a pretty good insight from the data set that we had. Now, let's move on to the next part of this talk, on importing and exporting PMML models to and from Vertica. So, the current common practice is, when you're putting your machine learning models into production, you'd have a dev or test environment, and in that you might be using a lot of different tools, Scikit-learn and Spark and R, and once you want to deploy these models into production, you'd put them into containers, and there would be a pool of containers in the production environment which would be talking to your database. That could be your analytical database, and all of the new data that's incoming would be coming into the database itself. So, as I mentioned in one of the slides earlier, there is a lot of data transfer happening between that pool of containers hosting your machine learning models and the database from which you'd be getting data for scoring and then sending the scores back to. So, why would you really need to transfer your models? The thing is that no machine learning platform provides everything. There might be some really cool algorithms in one tool, but then Spark might have its own benefits in terms of some additional algorithms or some other stuff that you're looking at, and that's the reason why a lot of these tools might be used in the same company at the same time, and then there might be some functional considerations, as well. You might want to isolate your data between the data science team and your production environment, and you might want to score your pre-trained models on some edge nodes here. 
You probably cannot host a big solution there, so there's a whole lot of use cases where model movement, or model transfer from one tool to another, makes sense. Now, one of the common methods for transferring models from one tool to another is the PMML standard. It's an XML-based model exchange format, sort of a standard way to define statistical and data mining models, and it helps you share models between the different applications that are PMML compliant. It's a really popular standard, and that's our choice for moving models to and from Vertica. Now, alongside this model movement capability, there are a lot of model management capabilities that Vertica offers. So, models are essentially first class citizens of Vertica. What that means is that each model is associated with a DB schema, so the user that initially creates a model is the owner of it, but they can transfer the ownership to other users and work with the ownership rights in much the same way that you would work with any other relation in a database. So, the same commands that you use for granting access to a relation, changing its owner, changing its name, or dropping it, you can use similar commands for models. There are a lot of functions for exploring the contents of models, and that really helps in putting these models into production. The metadata of these models is also available for model management and governance, and finally, the import/export part enables you to apply all of these operations to the models that you have imported or might want to export while they're in the database, and I think it would be nice to actually go through an example to showcase some of these capabilities in model management, including the PMML model import and export. So, the workflow for export would be that we'll train on some data, we'll train a logistic regression model, and we'll save it as an in-DB Vertica model. 
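The "models as first class citizens" point above can be sketched with the kind of ownership and access statements you would run. The model and user names are hypothetical; check the Vertica documentation for the exact privileges that GRANT supports on models.

```python
# Sketch of model-management statements in the spirit described above;
# model and user names are hypothetical placeholders.

def model_admin_sql(model, new_owner, grantee):
    """Emit ownership, rename, grant, and drop statements for a model."""
    return [
        "ALTER MODEL {m} OWNER TO {o};".format(m=model, o=new_owner),
        "ALTER MODEL {m} RENAME TO {m}_v2;".format(m=model),
        "GRANT USAGE ON MODEL {m}_v2 TO {u};".format(m=model, u=grantee),
        "DROP MODEL IF EXISTS {m}_v2;".format(m=model),
    ]

for s in model_admin_sql("myModel", "prod_user", "analyst"):
    print(s)
```

The point is that these mirror the statements you would use on a table or view, which is what "first class citizen" means in practice.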
Then, we'll explore the summary and attributes of the model, look at what's inside it, what the training parameters and coefficients are, and then we can export the model as PMML so that an external tool can import it. And similarly, we'll go through an example for import. We'll have an external PMML model trained outside of Vertica, we'll import that PMML model, and from there on we'll essentially treat it as an in-DB PMML model. We'll explore the summary and attributes of the model in much the same way as for an in-DB model, we'll apply the model for in-DB scoring and get the prediction results, and finally, we'll bring in some test data and use the model on the test data for which scoring needs to be done. So first, we want to create a connection with the database. In this case, we are using a Python Jupyter notebook. We have the Vertica Python connector here, a really powerful connector that lets you do a lot of cool things with the database from the Jupyter front end, but essentially you can use any other SQL front-end tool, or for that matter, any Python IDE that lets you connect to the database. So, exporting a model. First, we create a logistic regression model: SELECT LOGISTIC_REG, we give it a model name, then the input relation, which might be a table, a temp table, or a view, the response column, and the predictor columns. So, we get the logistic regression model that we built. Now, we look at the MODELS table and see that the model has been created. This is a table in Vertica that contains a list of all the models in the database. We can see here that myModel, the model we just created, has Vertica models as its category, its model type is logistic regression, and there's some other metadata around this model as well. So now, we can look at some of the summary statistics of the model. We can look at the details.
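Putting the training and catalog steps just described together, the queries might look roughly like this; the relation and column names are hypothetical, and the function names follow Vertica's in-DB ML API as I understand it:

```sql
-- Train an in-DB logistic regression model
-- (input_table, response_col, and predictors are illustrative names).
SELECT LOGISTIC_REG('myModel', 'input_table', 'response_col',
                    'pred1, pred2, pred3');

-- Confirm the model was created and inspect its catalog metadata.
SELECT model_name, category, model_type
FROM   v_catalog.models;

-- Look at the summary statistics and details of the model.
SELECT GET_MODEL_SUMMARY(USING PARAMETERS model_name='myModel');
```

The summary output is where the predictors, coefficients, and regularization settings described next show up.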
So, it gives us the predictors, coefficients, standard errors, z-values, and p-values. We can look at the regularization parameters; we didn't use any, so that shows a value of one, but if we had, it would show up here, along with the call string and additional information regarding the iteration count, rejected row count, and accepted row count. Now, we can also look at the list of attributes of the model: SELECT GET_MODEL_ATTRIBUTE, using the parameter model name, myModel. For this particular model that we just created, it gives us the names of all the attributes that are there. Similarly, you can look at the coefficients of the model in a column format, using the parameter name myModel, and in this case we add the attribute name 'details' because we want all the details for that particular model, and we get the predictor names, coefficients, standard errors, z-values, and p-values here. So now, what we can do is export this model. We use SELECT EXPORT_MODELS and give it a path to where we want the model to be exported, the name of the model that needs to be exported, because you might have a lot of models you've created, and the category, which in our example is PMML, and we get a status message that the export has been successful. So now, let's move on to the importing models example. In much the same way that we created a model in Vertica and exported it out, you might want to create a model outside of Vertica in another tool and then bring it into Vertica for scoring, because Vertica holds all of the data and it can make sense to host that model in Vertica, where scoring happens a lot more quickly than model training. So, in this particular case we do a SELECT IMPORT_MODELS and import a logistic regression model that was created in Spark. The category here again is PMML. So, we get the status message that the import was successful.
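The attribute inspection and the PMML export/import steps just walked through can be sketched as follows; the filesystem paths and the Spark model name are hypothetical:

```sql
-- List the attributes available for the model.
SELECT GET_MODEL_ATTRIBUTE(USING PARAMETERS model_name='myModel');

-- Pull the 'details' attribute: predictor names, coefficients,
-- standard errors, z-values, and p-values in column format.
SELECT GET_MODEL_ATTRIBUTE(USING PARAMETERS model_name='myModel',
                                            attr_name='details');

-- Export the model as PMML to a directory (path is illustrative).
SELECT EXPORT_MODELS('/home/dbadmin/models', 'myModel'
                     USING PARAMETERS category='PMML');

-- Import a PMML model trained elsewhere, e.g. in Spark
-- (path and model name are illustrative).
SELECT IMPORT_MODELS('/home/dbadmin/models/spark_logistic_reg'
                     USING PARAMETERS category='PMML');
```

Each of the last two statements returns a status message indicating whether the export or import succeeded.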
Now, let's look at the models table and see that the model is really present there. Previously when we ran this query, myModel was the only entry you saw, but now that this model has been imported, you can see it as line item number two here: the Spark logistic regression, in the public schema. The category here, however, is different, because it's not a model created in Vertica but an imported one, so you get PMML here, along with other metadata about the model as well. Now, let's do some of the same operations that we did with the in-DB model. We can look at the summary of the imported PMML model, and you can see the function name, data fields, predictors, and some additional information here. Moving on, let's look at the attributes of the PMML model with GET_MODEL_ATTRIBUTE, essentially the same query that we ran earlier; the only difference is the model name. So, you get the attribute names, attribute fields, and number of rows. We can also look at the coefficients of the PMML model: the name, exponent, and coefficient here. So yeah, it's pretty much similar to what you can do with an in-DB model. You can perform all of these operations on an imported model, and one additional thing we want to do here is use this imported model for prediction. So in this case, we do a SELECT PREDICT_PMML and give it some values, using the parameters model name, which is the logistic regression, and match by position, which is a really cool feature that we set to true in this case. Say you have a model being imported from another platform in which you have 50 columns; the names of the columns in the environment where you trained the model might be slightly different from the names of the columns you have set up in Vertica, but as long as the order is the same, Vertica can actually match those columns by position, and you don't need to have the exact same names for those columns.
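The scoring step with the imported model might look like the sketch below; the literal values, column names, and test table are hypothetical stand-ins:

```sql
-- Score a single record with the imported PMML model.
-- match_by_pos='true' matches predictors by position rather than by name,
-- so the column names need not match those used at training time.
SELECT PREDICT_PMML(1.5, 0.3, 2.0
                    USING PARAMETERS model_name='spark_logistic_reg',
                                     match_by_pos='true');

-- The same model applied to every row of a test table
-- (test_data and x1..x3 are illustrative names).
SELECT PREDICT_PMML(x1, x2, x3
                    USING PARAMETERS model_name='spark_logistic_reg',
                                     match_by_pos='true') AS prediction
FROM   test_data;
```

The second form is what makes in-DB scoring attractive: the prediction runs where the data already lives, row by row across the whole table.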
So in this case, we have set that to true, and we see that PREDICT_PMML gives us a prediction of one. Now, here we used the imported model on a single set of values that we gave it, but you can use it on a table as well. In that case, you get a prediction for each row, and you can look at the evaluation metrics to see how well you did. Now, just to wrap this up, it's really important to know the distinction between using your models in a single-node tool that you might already be using, like Python or R, versus in Vertica. Let's say you build a model in Python; it might be a single-node solution. Now, after building that model, suppose you want to do prediction on really large amounts of data, and you don't want the overhead of having to move that data out of the database every time you want to do prediction. What you can do is import that model into Vertica, and what Vertica does differently from Python is that the PMML model is actually distributed across each node in the cluster, so it's applied to the data segments on each of those nodes, with multiple threads running the prediction. So, the speed that you get for prediction is much, much faster. Similarly, when you build a machine learning model in Vertica, the objective usually is to use all of your data and build a model that's accurate, not one that's built on just a sample of the data. So, you can build that model, and the model building process goes through the same approach: it's distributed across all the nodes in the cluster, and it uses all the threads and processes available to it within those nodes.
So, you get really fast model training, but let's say you want to deploy the model on an edge node and do prediction closer to where the data is being generated; you can export that model in PMML format and deploy it on the edge node. So, it's really helpful for a lot of use cases. And just some closing takeaways from our discussion today. Vertica is a really powerful tool for machine learning: for data preparation, model training, prediction, and deployment. You might want to use Vertica for all of these steps or just some of them; either way, Vertica supports both approaches. In the upcoming releases, we are planning to add more import and export capability through PMML models. Initially, we're supporting k-means, linear, and logistic regression, but we'll keep adding more algorithms, and the plan is to move toward supporting custom models. If you want to do that with the current release, our TensorFlow integration is already there, which you can use, but with PMML this is the starting point for us, and we'll keep improving it. A Vertica model can be exported in PMML format for scoring on other platforms, and similarly, models that get built in other tools can be imported for in-DB machine learning and in-DB scoring within Vertica. There are a lot of model management capabilities provided in Vertica, and there are more on the roadmap as well, which we'll keep developing. Many ML functions and algorithms are already part of the in-DB library, and we keep adding to that as well. So, thank you so much for joining the discussion today, and if you have any questions, we'd love to take them now. Back to you, Sue.

Published Date : Mar 30 2020


Sandy Carter, AWS | AWS re:Invent 2019


 

(upbeat music) >> Narrator: Live from Las Vegas, it's theCUBE. Covering AWS re:Invent 2019, brought to you by Amazon Web Services and Intel, along with its ecosystem partners. >> Hello there and welcome back to theCUBE's live coverage here in Las Vegas for AWS re:Invent 2019. This is theCUBE's seventh year covering re:Invent. They've been doing this show for eight years; we missed the first year. I'm John Furrier, and my co-host is David Vellante. We're here extracting the signal from the noise, and we're here with an amazing guest, our friend. She's been with us from the beginning of theCUBE, since inception, and it's always great to talk with her: Sandy Carter, Vice President with Amazon Web Services. >> Thank you. >> Now in the public sector, handling partners. Great to see you, thanks for coming on again and sharing your content. >> So great to see you guys, so dressed up and looking good, I have to say. (laughs) >> You're looking good too, but I can't help but stare at our other guest here, the IoT suitcase. >> First, tell us-- >> Yes. >> About the IoT suitcase. >> Well, in public sector we have a partner program, and that program helps entrepreneurs, and we're especially keen on helping female entrepreneurs. So one of our entrepreneurs created this suitcase, an IoT-based suitcase. You can put your logos and that sort of thing on it, but more importantly for public sector, she created this safety ring, John. And so, right now I've de-activated it, but if I touch it, it will call the police for me if I'm being assaulted, or if I'm having an emergency, I can touch it and have an ambulance come for me as well. And the really cool thing about it is she worked backwards from the customer, figuring out how most people are assaulted, and if you have an emergency and you fall, what's the best way to get hold of someone. It's not your phone, because you don't always carry it; it's a device like this.
Or a bigger device that you can't carry, or that you leave on the table somewhere, whereas that's, you know, attractive. >> It's awesome. >> And it's, boom, simple. >> And it's pink. (laughs) >> What I love most about re:Invent as an event is that there's so much innovation going on, and one of the areas that's becoming modernized very rapidly is the public sector. You're now in this area, there's a lot of partners, a huge ecosystem going, and the modernization effort is real. >> It is. >> Could you share some commentary on what's going on? Give people a feel for the pace of change. What's accelerating? What are people doubling down on? What are some of the dynamics in public sector? >> Yeah, so if you know public sector, it actually has a lot of Windows, or Microsoft, workloads in it. And so we're seeing a lot of public sector customers looking to modernize their Windows workloads; in fact, we made several announcements just yesterday around helping more public sector customers modernize. For example, Windows Server 2003 and 2008 will go out of support, and so we have a great new offering, with technology, that can help customers not re-factor but actually abstract those layers and move quickly to 2016 and 2019, because both of those will go out of support in January. >> A lot of people don't know, and I've learned this from talking with Andy Jassy at the keynote, as well as hearing from some other folks, that Amazon runs a lot of Windows. >> Oh, we have 57% of Windows workloads on AWS in terms of market segment share, which is 2x the next nearest cloud provider. 2x. And most customers choose to run their Windows workloads on us because we are so innovative and we move really fast. We're more reliable: the latest public data from 2018 shows that the nearest cloud provider had seven times more downtime. If you're in public sector or even commercial, who can afford to be down that long? And then finally, we have better security.
So one of the things we've been focused on for public sector is FedRAMP solutions. We now have over 90 solutions that are FedRAMP ready, which is four times more than the next two cloud providers, four times more than the two combined. >> That's interesting, so I've got to ask the question that's popping up in my mind; I'm sure people are curious about it. >> Yeah. >> I get Windows working on Amazon, and that makes a lot of sense: why wouldn't you want to run on the best cloud? The question I would have is, how does the licensing work? Because that seems to be the lock-in play. Oracle does it, Microsoft does it. Does the license become the lock-in? So, when something expires, what happens on the licensing side? >> Licensing is really tricky, and in fact, on October 1st, Microsoft made some new licensing changes. And so, we have some announcements to help our customers still bring their own licenses, or what we fondly call BYOL, over to AWS, so they don't have to double invest in the license. >> So you can honor that license on AWS. >> Yeah, and you have to do it on a dedicated host. At midnight madness, we announced a new dedicated host solution that's very cloud-like. It makes it as easy to run a dedicated host instance as it is an EC2 instance. So, wicked easy, and very cost effective if you're moving those on-premises workloads over. >> I just want to point out, John, something that's really important here: a lot of times, software companies will use scare tactics, to your point. They'll jack up the cost of the license to say, ah, you've got to stay with us; if you run on our hardware or our platform, you pay half. And then they'll put out, "Oh, Amazon's twice as expensive." But these are all negotiable. I've talked to a number of customers, particularly on the Oracle side, who said, no, no, we just went to Oracle and said look, you've got a choice: either give us the same license price or we're migrating off your database. Okay, all right.
But some of it is scare tactics, and I think, you know, increasingly that's not working in the marketplace. So I just wanted to point that out. >> So what's the strategy for customers to take? I guess that's the question, because certainly with the licensing they get squeezed, I can see that. But what do customers do? Is there a playbook? >> Well, there is, and the best one is you buy your license from Microsoft, and then, using BYOL, you bring that over to AWS. It's faster, more performant, more reliable, that sort of thing. If you do get restricted, though, John, like they are doing for instance with their end of support, you could run that on Azure and get all the security fixes. We are trying to provide technical solutions, like the ability to abstract Windows Server 2003 and Server 2008 as they go out of support. >> I mean, certainly in the case of Oracle, you know, 10-15 years ago you didn't have a choice. Instead of one RDBMS, there's now so much optionality in databases. >> And I will also tell you that we have a lot of customers today who are migrating from SQL Server or Oracle over to Aurora. Aurora is equally performant at a tenth of the cost. So we actually have this team, called the database freedom team, that will help you do that migration. In fact, I was talking to a very large customer last night and explaining some of the options, and they're like, "Let's do the Aurora thing." Let's do it in two steps: let's start by migrating the database over, Oracle and SQL, and then I want to go to Aurora. It's a database built for the cloud; it's faster and it's cheaper. So why wouldn't you do that? >> Yeah, and I think the key is, to my question about friction, what's frictionless? How can they get it done quickly without going through the trip-wires of the licensing? >> Certain workloads are tough, right? You know, if you're running your business on high transaction volume.
But a lot of the analytics stuff, the data warehouse; you know, look at Amazon's own experience. You guys are just ticking it off, moving over from Oracle to Aurora; it's been fun to watch. >> I want to get you guys' perspective, Dave, you and Sandy, because I think you might have good insight on this. Everyone knows that I'm really passionate about public sector; I've been really enamored with Teresa's business from day one, but when she won the CIA deal, that really got my attention. As I dug into the JEDI deal, and that all went sideways, it really jumped out at me that public sector is probably the most transformative market, because they are modernizing at a record pace. I mean, this has been a glacier-moving market. They've got old ways, they've got the beltway bandits, old procurement, old technology, and literally in a short period of time they have to modernize. So they're becoming more enterprise-like. Can you guys, I mean pros in the enterprise, give your take? It just seems like a tsunami of change in the public sector, because the technology is driving it. What do you guys think about this? Am I on or off base? What are some of the trends that are going on? >> I mean, I have a perspective, but please. >> No, okay. So I'll start. I see so much transformation regardless of what industry you're looking at. If you're looking at government, for example, working with SAP NS2, we just took 26 different flavors of SAP ERP for the Navy and helped them migrate to the cloud. For the US Navy, which is awesome. Arkis Global did the same thing for the UK; we actually have Amazon Connect in there, so that's a cool call center driven by machine learning, for the health care system in the UK. Or you can even look at things like, here in the U.S., a company that really looks at how you do monitoring for children to keep them safe.
They've partnered up with the National Police Association, and they are bringing that to the cloud. So regardless of the area, education, non-profits, government, and it's around the world, not just the US, we are seeing governments, education, start-ups, and non-profits all moving to the cloud, taking their own legacy systems to Linux, to Aurora, and moving very rapidly. >> And I think Andy hit on it yesterday: it's got to start with top-down leadership. And in the government, if you can get somebody who's a leading thinker, a CIO saying we're going cloud first, mandating cloud; you know, you saw that years ago, but today I think it's becoming more mainstream. I think the one big challenge is obviously the disruption in defense, and that's why you talked about JEDI. Defense is very high risk, and it needs disruption; it's like healthcare, it's like certain parts of financial services, very high risk industries, so they need leadership, and they need the best platform underneath and a long-term strategy. >> Well, JEDI actually went differently; it was actually the right call, but I reported on that. What gets me is that Cerner on stage yesterday, in Andy's keynote, highlighted that it's not just inefficiencies that you can solve; there are multiple win-win-win benefits. So in that health care example: lower costs, better care, and the providers are in better shape. So in government, in public sector, there's really no excuse not to take the slack out of the system. >> Yeah. >> Well, there's regulation though. >> Yeah, and Dave mentioned cloud first strategies; we're also seeing a lot of movement around data. You know, data is really powerful. Andy mentioned this as well yesterday, but for example in our partner keynote, where I just came from, we had Avis on stage.
Now, Avis is not a public sector customer, but what they're doing, the gentleman said, is that your car can now talk to you, and that data is now being given to local state and city officials, who can use it for emergency response systems. So that public and private use of data coming together is also a big trend that we're seeing. >> I think that's a great example, because Avis, I think he said, is a 70-year-old company, and I think the fleet was an 18-billion-dollar fleet. >> 600,000 vehicles. >> 600,000 vehicles, 18 billion dollars worth of assets. This is not a born-in-the-cloud start-up, right? They've essentially transformed the entire fleet and made it intelligent. >> Right, and they're using data to drive a lot of their changes, like the way they manage fuel for 600,000 cars, and the way they exchange that with local officials is helping them not just be number two, but start to take over number one. >> But to your point, data is at the core, right? >> Yeah. >> If you are the incumbent and you want to transform, you've got to start with the data. >> Sandy, I want to get your reaction to two memes that have been developing on theCUBE this week. One is, if you take the T out of Cloud Native, it's Cloud Naive. (Sandy laughs) The other one is, if you're born in the cloud, that's great, you're winning, but the incumbents win at the price of becoming re-born in the cloud. This is the transformation. Some will make it, and some are not going to have a long shelf life. So there's a real enterprise, and now public sector, re-birth, a re-borning in the cloud, a new awakening. This is something that is happening. You're an industry veteran; you've seen a lot of waves. What's this re-born, this getting re-born in the cloud, really about? What is going on? >> It's really interesting, because now I'm in the partner business, and one of our most successful programs is called our partner transformation program.
And what that does is it's a hundred-day transformation program to get our partners drinking our own champagne, which is to be on the cloud. When we first started testing it out, we didn't have a lot of takers, but now, the partners who have gone through that transformation are seeing 70% year-over-year growth, versus other APN partners, who, even at an advanced tier, are only seeing 34% growth. So it's 2x the revenue growth, having transformed to the cloud. So I think, back to your question, some of this is showing the power. Why do you go to the cloud? It's not just about cost; it's about agility, it's about innovation, it's about that revenue growth, right? I mean, 2x, 70% growth, you can't sneeze at that. That's pretty impactful. >> And you know, this really hits something of passion for me and Dave and our team: the impact on society. This is a real focus across all generations now, not just millennials and the born-in-the-web crowd, but older folks like us, who have seen before the web. There's real impact, mission-driven things. This is tech for good, shaping technology for good, and you guys have it. This is a big part of what you guys are doing. >> Absolutely, this is one of the reasons why I really wanted to come work in the public sector, because it's fun helping customers make money, and we still do that, but it's really better when you can help them make money and do great things. So, you know, working with the Mayo Clinic, for example, and some of these non-profit hospitals so they can get better data; the GE example that Andy used yesterday, that data is used in public sector. Doing things like, I know that you guys are part of We Power Tech; you know, we brought 112 underrepresented minorities and women to the conference.
And I have to tell you, I got goosebumps when one person came up to me and said it's the first time he's stayed in a hotel, and he's coming here to enhance his coding. "You don't realize, when I go back to my country, you will have changed my life." And that's just, don't you get goosebumps from that? It's great to change a company, and we want to do that, but it's really great when you can impact people in that form or fashion. >> And the agility makes that happen faster; it's a communal activity. Tech for good is here. >> Absolutely, and we just announced today, right before this in the partner session, that we now have the public safety and disaster response competency for our partners. Because when a customer is dealing with some sort of disaster or emergency, they need a disconnected environment for long periods of time; they need a cloud solution to rally the troops. So we announced that, and we had 17 partners step up immediately to sign up for it. And again, that's all about giving back, helping in emergency situations, whether it's Ebola in Africa or Hurricane Dorian, right? >> Well, Sandy, congratulations. Not only are you a senior leader for AWS doing a great job-- >> Thank you. >> --you have just a great passion, and with Women in Tech and underrepresented minorities, you do an amazing job on tech for good. >> Thank you. Well, it's such an honor to always be on the show. I love what you guys do. I love the memes; I'm going to steal them, okay? >> Can I ask you another question? >> Absolutely. >> Before we wrap. You've had an opportunity to work with developers; you've experienced other clouds. Now you're with AWS in a couple of different roles. Can you describe what's different about AWS? Is it cultural, is it the innovation? I mean, what's tangible that you can share with our audience in terms of the difference? >> I think it's a couple of things. The first one is the way we hire. We hire builders, and you know, it really starts from that hiring.
I actually interviewed Werner the other day, and he and I had a debate about whether you can transform a company with all the same people, or whether you need to bring in some new talent as well. So I think it's the way we hire: we search for people who not only meet the leadership criteria but are also builders, innovators. And the second one is, you know, when Andy says we're customer obsessed, we're partner obsessed, we really are. We have the mechanisms in place, we have the product management discipline, we have the process to learn from customers. For my first service launch at AWS, I personally talked to 141 customers and another 100 partners. Think about that; that's almost two hundred and fifty customers and partners. At most large companies, as a senior executive you only spend about 20% of your time with customers; I spend about 80% of my time here with customers and partners. And that's a big difference. >> Well, we look forward to covering the partner network this year. >> Awesome. >> You're amazing. We'll see Teresa Carlson on theCUBE here at 3:30. We are going to ask her some tough questions. What should we ask Teresa? >> What to ask Teresa? Where did you get those red pants? (everyone laughs) >> She's amazing, and again-- >> She is amazing. >> We totally believe in what you're doing, and we love the impact, not only the technology advancement for modernizing the public sector across the board, but the real opportunity for the industry to shape technology for betterment. >> Yeah. >> You're doing a great job. Thank you so much. >> Thank you. I think we should start another hashtag for theCUBE too: #technologyforgood. >> Awesome. >> What do you think? >> Let's do it. >> I love that. >> Jonathan's been doing a lot of work in that area. >> I know he has. >> We love that. #technologyforgood, #techforgood. This is theCUBE, here live in Las Vegas for re:Invent. I want to thank Intel and AWS; this is the big stage.
We had two stages; without our sponsors, our mission wouldn't be here. Thank you, AWS and Intel. More coverage after this short break. (dramatic music)

Published Date : Dec 4 2019

SUMMARY :

to you by Amazon Web Services and Intel, We're here extracting the signal from the noise, Now in the public sector handling partners. So great to see you guys, so dressed up at our other guest here, the IoT suitcase. and you fall, what's the best way to get ahold of someone. Or a bigger device that you can't, And it's pink. and the modernization effort is real. Could you share some commentary on what's going on. Yeah, so if you know public sector, as well as hearing from some other folks, is that you got, So one of the things we've been focused on That's interesting, so I got to ask the question I get the Windows working on Amazon, Yeah, and you have to do it on a dedicated host. and I think you know increasingly, I guess that's the question. like the ability to abstract Windows Server 2003 to be you know 10-15 years ago, you didn't have a choice. the database freedom team that will help you do Yeah, and I think the key is, But a lot of the analytics stuff, the data warehouse, I mean pros in the enterprise, what's your take? and it's around the world, it's not just the US. And in the government, if you can get somebody that it's just not inefficiencies that you can solve, and that data is now being given to local state officials, I think the fleet was 18 billion dollar fleet. and made it intelligent. to you know not just be number two, you got to start with the data. This is the transformation. So I think, you know back to your question, This is a big part of what you guys are doing. And I have to tell you I got goosebumps And the agility makes that happen faster, Absolutely, and we just announced today, Well, Sandy congratulations, not only have you Underabridged Minorities, you do an amazing job I love the memes, I'm going to steal them, okay. I mean what's tangible that you can share And the second one is, you know when Andy says the partner network this year. We are going to ask her some tough questions. the public sector across the board. Thank you so much. 
I think we should start another hashtag for theCube too, Thank you AWS and Intel.

SENTIMENT ANALYSIS :

ENTITIES

Entity | Category | Confidence
David Vellante | PERSON | 0.99+
Andy | PERSON | 0.99+
Amazon | ORGANIZATION | 0.99+
Dave | PERSON | 0.99+
Teresa | PERSON | 0.99+
Amazon Web Services | ORGANIZATION | 0.99+
AWS | ORGANIZATION | 0.99+
John | PERSON | 0.99+
Jonathan | PERSON | 0.99+
Microsoft | ORGANIZATION | 0.99+
Sandy Carter | PERSON | 0.99+
Avis | ORGANIZATION | 0.99+
Sandy | PERSON | 0.99+
January | DATE | 0.99+
October 1st | DATE | 0.99+
John Furr | PERSON | 0.99+
70% | QUANTITY | 0.99+
Oracle | ORGANIZATION | 0.99+
2018 | DATE | 0.99+
Teresa Carson | PERSON | 0.99+
Andy Jassy | PERSON | 0.99+
CIA | ORGANIZATION | 0.99+
Four times | QUANTITY | 0.99+
National Police Association | ORGANIZATION | 0.99+
2019 | DATE | 0.99+
2x | QUANTITY | 0.99+
141 customers | QUANTITY | 0.99+
2016 | DATE | 0.99+
600,000 cars | QUANTITY | 0.99+
Arkis Global | ORGANIZATION | 0.99+
57% | QUANTITY | 0.99+
eight years | QUANTITY | 0.99+
Las Vegas | LOCATION | 0.99+
FedRamp | ORGANIZATION | 0.99+
17 partners | QUANTITY | 0.99+
18 billion dollars | QUANTITY | 0.99+
Intel | ORGANIZATION | 0.99+
yesterday | DATE | 0.99+
one | QUANTITY | 0.99+
GE | ORGANIZATION | 0.99+
two memes | QUANTITY | 0.99+
U.S. | LOCATION | 0.99+
Windows | TITLE | 0.99+
First | QUANTITY | 0.99+
18 billion dollar | QUANTITY | 0.99+
twice | QUANTITY | 0.99+
US Navy | ORGANIZATION | 0.99+
Server 2008 | TITLE | 0.99+
Windows Server 2003 | TITLE | 0.99+
first service | QUANTITY | 0.99+
seven times | QUANTITY | 0.99+

Francesca Lazzeri, Microsoft | Microsoft Ignite 2019


 

>> Commentator: Live from Orlando, Florida, it's theCUBE. Covering Microsoft Ignite. Brought to you by Cohesity. >> Hello everyone and welcome back to theCUBE's live coverage of Microsoft Ignite 2019. We are theCUBE, we are here at the Cohesity booth in the middle of the show floor at the Orange County Convention Center. 26,000 people from around the globe are here. It's a very exciting show. I'm your host, Rebecca Knight, along with my co-host, Stu Miniman. We are joined by Francesca Lazzeri. She is a Ph.D. Machine Learning Scientist and Cloud Advocate at Microsoft. Thank you so much for coming on the show. >> Thank you for having me. I'm very excited to be here. >> Rebecca: Direct from Cambridge, so we're an all-Boston table here. >> Exactly. >> I love it. I love it. >> We are in the densest technology cluster, I think, in the world, probably. >> So two words we're hearing a lot of here at the show, machine learning and deep learning. Can you describe and define them for us here, and tell us the difference between machine learning and deep learning? >> Yeah, this is a great question, and I have to say a lot of my customers ask me this question very, very often, because right now there are many different terms, such as deep learning, as you said, machine learning, AI, that have been used more or less in the same way, but they are not really the same thing. So machine learning is a portfolio, I would say, of algorithms, and when I say algorithms I mean really statistical models that you can use to run some data analysis. So you can use these algorithms on your data, and these are going to produce what we call an output. Outputs are the results. So deep learning is just a type of machine learning that has a different structure. We call it deep learning because there are many different layers in a neural network, which is again a type of machine learning algorithm.
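Her distinction ("deep learning is machine learning with many layers") can be sketched in a few lines; this example is ours, not from the interview, and uses scikit-learn with synthetic data. Both models expose the same fit-and-predict interface; only the second stacks hidden layers, which is what makes it "deep".

```python
# A sketch contrasting a classic machine learning model with a "deep"
# one on the same task. Both are estimators; the second is a neural
# network with stacked hidden layers.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# Classic ML: a single statistical model (logistic regression).
classic = LogisticRegression().fit(X, y)

# "Deep" learning: same interface, but the model stacks two hidden layers.
deep = MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=500,
                     random_state=0).fit(X, y)

print(classic.score(X, y), deep.score(X, y))
```

On a simple tabular task like this the two often score similarly; the layered model earns its keep on less linear structure, which is the point she makes about neural networks not assuming a linear relation between variables.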
And it's very interesting because it doesn't look at the linear relation within the different variables, but it looks at different ways to train itself and learn something. So you have to think about deep learning as just a type of machine learning, and then we have AI. AI is on top of everything: AI is a way of building applications on top of machine learning models, and those applications run on top of machine learning algorithms. So AI is a way of consuming intelligent models. >> Yeah, so Francesca, I know we're going to be talking to Jeffrey Stover tomorrow about a topic, responsible AI. Can you talk a little bit about how Microsoft is making sure that unintentional biases, or challenges with the data, don't leave the machine learning doing things, or carrying biases, that we wouldn't want otherwise? >> Yes, I think that Microsoft is actually investing a lot in responsible AI, because I have to say, as a data scientist, as a machine learning scientist, I think that it's very important to understand what the model is doing and why it's giving me a specific result. So, in my team, we have a toolkit, which is called the interpretability toolkit, and it's really a way to unpack machine learning models. It's a way of opening machine learning models and understanding what the relations are between the different variables, the different data points. So it's an easy way, through different types of visualizations of these relations, to understand why your model is giving you specific results. So you get that visibility as a data scientist, but also as a final consumer, a final user of these AI applications. And I think that visibility is the most important thing to prevent biased applications, and to make sure that our results are fair for everybody. So there are some technical tools that we can use, for sure. I can tell you, as a data scientist, that bias and unfairness start with the data.
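The "opening up the model" she describes can be illustrated with a generic technique, permutation importance. This is our own sketch on synthetic data, not Microsoft's interpretability toolkit itself: shuffle one feature at a time and see how much the model's score drops.

```python
# Permutation importance: features the model actually relies on show a
# large drop in score when their values are shuffled.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

X, y = make_classification(n_samples=400, n_features=5, n_informative=2,
                           random_state=0)
model = RandomForestClassifier(random_state=0).fit(X, y)

# Shuffle each feature in turn and measure the drop in accuracy.
result = permutation_importance(model, X, y, n_repeats=10, random_state=0)
for i, imp in enumerate(result.importances_mean):
    print(f"feature {i}: importance {imp:.3f}")
```

With only two informative features, the importances separate clearly: the visibility she argues for is exactly this kind of per-variable accounting.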
You have to make sure that the data is representative enough of the population that you are targeting with your AI applications. But this sometimes is not possible. That's why it's important to create some services, some toolkits, that are going to allow you, again, as a data scientist, as a user, to understand what the AI application, or the machine learning model, is doing. >> So what's the solution? If the problem, if the root of the problem, is the data in the first place, how do we fix this? Because this is such an important issue in technology today. >> Yes, and so there are a few ways that you can use... So first of all I want to say that it's not an issue that you can really fix. I would say that, again, as a data scientist, there are a few things that you can do in order to check that your AI application is doing a good job in terms of fairness, again. And so these few steps are, as you said, about the data. So most of the time, people, or customers, just use their own data. Something that is very helpful is also looking at external types of data, and also making sure that, again, as I said, the data is representative enough of the entire population. So for example, if you are collecting data from a specific category of people, of a specific age, from a specific geography, you have to understand that your results are not general results; they are results that the machine learning algorithm learned from that target population. And so it's important, again, to look at different types of data, different types of data sets, and to use, if you can, also external data. And then, of course, this is just the first step. There's a second step: you can always make sure that you check your model with a business expert, with a data expert. So sometimes we have data scientists that work in silos; they do not really communicate what they're doing.
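One concrete form of the check she describes can be sketched as comparing a trained model's accuracy across subgroups of the population. The grouping column and numbers below are invented for illustration; this is not a tool from the interview.

```python
# Fairness smoke test: does the model serve one subgroup noticeably
# worse than another? Data here is synthetic.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
group = rng.integers(0, 2, size=1000)   # e.g. two demographic groups
y = (X[:, 0] + 0.5 * rng.normal(size=1000) > 0).astype(int)

model = LogisticRegression().fit(X, y)

# Compare accuracy per subgroup; a large gap is a red flag worth
# investigating against the "is the data representative?" question.
for g in (0, 1):
    mask = group == g
    print(f"group {g}: accuracy {model.score(X[mask], y[mask]):.2f}")
```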
And I think that this is something that you need to change within your company, within your organization. You have to always make sure that data scientists, machine learning scientists are working closely with data experts, business experts, and that everybody's talking. Again, to make sure that we understand what we are doing. >> Okay, there were so many things announced at the show this week. In your space, what are some of the highlights that people should be taking away from Microsoft Ignite? >> So I think that Azure Machine Learning, as a platform, has been announcing a lot of updates. I love the product because I think it's a very dynamic product. There is what we now call the designer, which is a new version of the old Azure Machine Learning Studio. It's a drag-and-drop tool, so it's a tool that is great for people who do not want to code too much, or who are just getting started with machine learning. And you can really create end-to-end machine learning pipelines with these tools in just a matter of a few minutes. The nice thing is that you can also deploy your machine learning models, and this is going to create an API for you, and this API can be used by you, or by other developers in your company, to just call the model that you deployed. As I mentioned before, this is really the part where AI is arriving, and it's the part where you create applications on top of your models. So this is a great announcement, and we also created an algorithm cheat sheet, which is a really nice map that you can use to understand, based on your question, based on your data, what's the best machine learning algorithm, what's the best designer module that you can use to build your end-to-end machine learning solution. So this, I would say, is my highlight. And then of course, in terms of Azure Machine Learning, there are other updates.
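The train-then-deploy loop she describes can be imitated locally. This sketch uses scikit-learn and joblib rather than the Azure ML service, and the `score` function below only mimics the JSON-in, JSON-out contract a generated endpoint would expose; Azure ML creates the real REST API for you.

```python
# Train a model, persist it, and wrap it in an endpoint-style scoring
# function: JSON request in, JSON prediction out.
import json
import joblib
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
joblib.dump(RandomForestClassifier(random_state=0).fit(X, y), "model.joblib")

def score(request_body: str) -> str:
    """What a deployed endpoint handler might do (illustration only)."""
    model = joblib.load("model.joblib")
    features = json.loads(request_body)["data"]
    return json.dumps({"prediction": model.predict(features).tolist()})

print(score(json.dumps({"data": [[5.1, 3.5, 1.4, 0.2]]})))
```

Once the model lives behind a function (or URL) with that contract, other developers in the company can call it without knowing anything about how it was trained, which is her point about AI as applications built on top of models.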
We have the Azure Machine Learning Python SDK, which is more for pro data scientists who want to create customized models, models that they have to build from scratch. And for them it's very easy, because it's a Python-based environment where they can just build their models, train them, test them, deploy them. So that's why I say it's a very dynamic and flexible tool: it's really a tool on the cloud that is targeting business people and data analysts, but also pro data scientists and AI developers. So this is great to see and I'm very, very excited about that. >> So in addition to your work as a Cloud Advocate at Microsoft, you are also a mentor to research and post-doc students at the Massachusetts Institute of Technology, MIT. So tell us a little more about that work, in terms of what kind of mentorship you provide, and what your impressions are of this young generation of scientists that's now coming up. >> Yes. So that's another wonderful question, because one of the main goals of my team is actually working with an academic type of audience, and we started this about a year ago. So we are, again, a team of Cloud advocates, developers, data scientists, and we do not want to work only with big enterprises; we want to work with academic institutions as well. So when I say academics, of course I mean some of the best universities. I've been working a lot with MIT, the Massachusetts Institute of Technology, in Cambridge, with Harvard, and also now with Columbia University, in New York. And with all of them, I work with both PhD and post-doc students, and most of the time, what I try to help them with is changing their mindset. Because these are all brilliant students who just need to understand how they can translate what they have learned during their years of study, and also their technical skill set, into the real world. And when I say the real world, I mean more like building applications.
So there is this sort of skill transfer that needs to be done, and again, working with these brilliant people, I have to say, is something that is easy to do, because sometimes they just need to work on a specific project that I create for them. So I give data to them and then we work together in a sort of lab environment, and we build end-to-end solutions. But from a knowledge perspective, from a, I would say, technical perspective, these are all excellent students. So it's really, I find myself in a position in which I'm mentoring them, I prepare them for the industry, because most of them want to become data scientists, machine learning scientists. But I have to say that I also learn a lot from them, because at the end of the day, when we build these solutions, it's really a way to build something, a project, an app, together, and then the beauty of this is that we also see how other people are using that to build something even better. So it's an amazing experience, and I feel very lucky that I'm in Cambridge where, as you know, we have the best schools. >> Francesca, you've dug into some really interesting things. I'd love to get just a little bit, if you can share, about how machine learning is helping drive competitiveness and innovation in companies today, and any tips you have for companies and how they can get involved even more. >> Yeah, absolutely. So I think that everything really starts with the business problem, because, as we started this conversation, we were mentioning words such as deep learning, machine learning, AI, and a lot of companies just want to do this because they think that they're missing something. So my first suggestion for them is really trying to understand what's the business question that they have, if there is a business problem that they can solve, if there is an operation that they can improve. These are all interesting questions that they can ask themselves and their teams.
And then as soon as they have this question in mind, the second step is to understand if they have the data, the right data, that is needed to support this process and help them with the business question. So after that, once you understand that you have the right data, that is your starting point; of course you also have to understand if you have external data, and if you have enough data, as we were saying, because this is very, very important as a first step in your machine learning journey. And you know, it's important also to be able to translate the business question into a machine learning question. Like, for example, in supervised learning, which is an area of machine learning, we have what is called regression. Regression is a type of model that is great to answer questions such as, how many, how much? So if you are a retailer and you want to predict how many sales of a specific product you're going to have in the next two weeks, then, for example, a regression model is going to be a good first step for you to start your machine learning journey. So the translation of the business problem into a machine learning question, and consequently into a machine learning algorithm, is also very important. And then finally, I would say that you always have to make sure that you are able to deploy this machine learning model, so that your environment is ready for the deployment and for what we call the operationalization part. Because this is really the moment in which we are going to allow other people, meaning internal stakeholders, other teams in your company, to consume the machine learning model. That's the moment really in which you are going to add business value to your machine learning solution.
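Her "how many, how much" retailer example can be sketched directly; the prices, promotion spend, and sales figures below are invented for illustration.

```python
# Regression answers "how many / how much": predict units sold from
# price and promotion spend, then forecast the next two weeks.
import numpy as np
from sklearn.linear_model import LinearRegression

# columns: price, promotion spend (made-up weekly history)
X = np.array([[9.99, 100], [9.49, 200], [8.99, 150],
              [9.99, 300], [8.49, 250], [9.49, 120]])
y = np.array([120, 150, 160, 170, 190, 130])   # units sold per week

model = LinearRegression().fit(X, y)

# "How many units will we sell in the next two weeks at this price/promo?"
next_weeks = np.array([[8.99, 200], [8.99, 220]])
print(model.predict(next_weeks).round(1))
```

The translation she describes is visible in the shape of the problem: the business question ("how many sales next week?") becomes a numeric target, which selects the family of algorithms (regression) before any tooling choice is made.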
So yeah, my suggestion for companies who want to start this journey is really to make sure that they have cleared these steps, because I think that if they have cleared these steps, then their team, their developers, their data scientists, are going to work together to build these end-to-end solutions. >> Francesca Lazzeri, thank you so much for coming on theCUBE, it was a pleasure having you. >> Thank you. Thank you. >> I'm Rebecca Knight, for Stu Miniman. Stay tuned for more of theCUBE's live coverage of Microsoft Ignite. (upbeat music)

Published Date : Nov 5 2019

SUMMARY :

Brought to you by Cohesity. in the middle of the show floor Thank you for having me. so we're an all Boston table here. I love it. We are in the most technology cluster, I think, can you describe, So you can use these algorithms on your data, leave the machine learning to do things, that you can understand why your model is giving you is the data in the first place, And I think that this is something that you need to change announced at the show this week. and it's the part where you create application So in addition to your work and most of the time, what I try to help them with I'd love to get just a little bit, if you can share, and if you have enough data, as we were saying, thank you so much for coming on theCUBE, Thank you. live coverage of Microsoft Ignite.

SENTIMENT ANALYSIS :

ENTITIES

Entity | Category | Confidence
Francesca Lenzetti | PERSON | 0.99+
Francesca Lazzeri | PERSON | 0.99+
Rebecca Knight | PERSON | 0.99+
Francesca | PERSON | 0.99+
Stu Miniman | PERSON | 0.99+
Rebecca | PERSON | 0.99+
Massachusetts Institute of Technology | ORGANIZATION | 0.99+
Jeffrey Stover | PERSON | 0.99+
MIT | ORGANIZATION | 0.99+
New York | LOCATION | 0.99+
26,000 people | QUANTITY | 0.99+
first step | QUANTITY | 0.99+
Cambridge | LOCATION | 0.99+
Columbia University | ORGANIZATION | 0.99+
tomorrow | DATE | 0.99+
second step | QUANTITY | 0.99+
first | QUANTITY | 0.99+
two words | QUANTITY | 0.99+
Orlando, Florida | LOCATION | 0.99+
Microsoft | ORGANIZATION | 0.99+
Azure Machine Learning | TITLE | 0.99+
Orange County Convention Center | LOCATION | 0.99+
Cohesity | ORGANIZATION | 0.99+
Harvard | ORGANIZATION | 0.99+
Boston | LOCATION | 0.99+
first suggestion | QUANTITY | 0.98+
both | QUANTITY | 0.98+
this week | DATE | 0.98+
python | TITLE | 0.98+
today | DATE | 0.95+
Azure Machine Learning Studio | TITLE | 0.95+
one | QUANTITY | 0.95+
theCUBE | ORGANIZATION | 0.94+
idge | ORGANIZATION | 0.92+
Cambr | LOCATION | 0.92+
Azure Machine Learning python SDK | TITLE | 0.87+
first place | QUANTITY | 0.87+
Cloud | TITLE | 0.87+
about | DATE | 0.85+
a year ago | DATE | 0.8+
next two weeks | DATE | 0.79+
2019 | DATE | 0.68+
Ignite | TITLE | 0.62+
Ignite 2019 | TITLE | 0.46+
Ignite | COMMERCIAL_ITEM | 0.44+
Ignite | EVENT | 0.31+

Howie Xu, Zscaler | CUBEconversation, May 2019


 

(upbeat jazz music) >> From our studios in the heart of Silicon Valley, Palo Alto, California, this is a CUBEConversation. >> Hello everyone, welcome to this special CUBEConversation. I'm John Furrier in theCUBE Studios in Palo Alto, California. We're excited to have a great tech talk here with good friend Howie Xu, who's currently the Vice President of Machine Learning and AI at Zscaler. Formerly an entrepreneur who sold his company to Zscaler. Before that, entrepreneur in residence at Greylock. Before that, VMware, and a variety of other endeavors. Howie and I, we've known each other for a while. Great to have you come in and chat about-- >> Great to be here! >> Zoom, Zscaler, these are the new breed of modern-era companies, SaaS business models. Really interesting, and this is something that we were talking about over email and text; it's our topic. >> Yeah. >> Thanks for coming in. >> Great. >> So you've seen the waves at VMware, you saw the rapid growth there. And now you work for Zscaler, which is experiencing rapid growth. You saw Zoom go public, and I just interviewed Michael Dell. We were commenting about that over text as well. He said these big markets that have big total addressable dollars associated with them are ripe for disruption. They used to have high barriers to entry, the old way of looking at it, but now with cloud and with SaaS, with data, there are different innovation speeds. This has become a big deal. Talk about your view on this. >> Well, to me, when Zoom and then Zscaler were founded, many years ago, no one believed that they would become this big, right? When Zoom was founded, there was plenty of conferencing software, free even, available out there. When Jay founded Zscaler, people thought, "Well, there are enough security companies, security solutions." Clearly, they defied conventional wisdom, and they just fought on, and they saw something that other people didn't see, which is precisely what you were talking about. SaaS is so different, right?
The business model, the innovation speed, the data-driven kind of thing, it's so different. A lot of people say, "Hey, what's the difference between SaaS versus the conventional? Isn't that just moving that thing over to the cloud?" I actually used to think that way too, right? Isn't that just the virtual appliance, moving onto the Amazon cloud? After living and breathing in a SaaS company, and then also observing that in the VC industry as well, it's just totally different, day and night different. >> Well, I wanted to get into this with you 'cause I think you bring some good perspective and insight into the rocket success of, say, Zoom and Zscaler, but Zoom in particular, with its recent successful IPO. Among the recent class this past quarter, Zoom, Lyft, Uber, Zoom is standing out. They're getting profitable. This is video conferencing. You know, in the old days, if someone said, "Hey, I want to compete in video conferencing," well, the barriers were actually too high. But they took a very innovative approach. Cloud, data, simplicity, and the big 800-pound gorilla was the WebExes of the world, which were designed for sharing slides, not so much pure video. (laughing)
The release cost is much lower now as a SaaS product. So innovation is just accelerated because it's SaaS, because it's a true SaaS. >> And this is a unique thing, you said before: SaaS isn't just lifting an on-premises workload and moving it to the cloud. It's a completely different mindset. Talk about this dynamic, because it affords new kinds of risk taking. You and I were talking about this before we came on camera; share your insight on that. >> Well, you know, with kind of the traditional software you have a release cycle, you want to have a release date, right? And then once the product is in customers' hands, if you have a bug, if you have something, it's so costly to change it, right? But with SaaS, the form factor, you can take a little bit more risk. You can even give that feature set to 10% of your audience, not the entire set of the audience. You can do that kind of magic, so you can accelerate the innovation. Whereas with shrink-wrapped software, the traditional way, you have one shot; if that software is not good, then you are toast. >> So you can move quicker. You can push code, you don't have the on-premise dynamics. >> Yeah, innovation and risk taking are kind of correlated, right? The more you are willing to take risk, relatively, the more you can innovate. So, that's the thing. >> Well, you and I were talking, and one of the key things that you have been talking about publicly, and amongst friends, is innovation speed. Everyone wants the innovation fever. "I got to win, to innovate, digital transformation, rah rah." Easier said than done. Innovation speed is critical with cloud and SaaS. Why? What's the formula there for innovation speed? >> Well, one thing we discussed is the release cycle. Not necessarily for Zoom and Zscaler, but you know, for SaaS in general, it's possible for you to have daily, weekly, monthly releases. With traditional software, there is no way you can do that, so that's just the release-cycle side of it.
The other thing is, you can actually take a risk. You can say, "Hey, let's try to release this to 1% of the customers and then see how they are going to react to it." But in the traditional way you have product managers debating for six months, six years, on whether or how to do things. Here, let's not debate, let's just see. >> Let's ship it. >> Right, ship it. >> And Reid Hoffman always says, "If you're not embarrassed by your first shipment, then you're not doing it properly." Which begs the question, I want to get your thoughts on this because, again, with VMware, you saw how early that worked, and their transformation to cloud is now here, unlike when they started the company. What is the right way to do it? And what's the wrong way to do it? When you look at an entrepreneur or a friend who's trying to get a business off the ground, a SaaS business, when you look at what they're doing, and you look at their mechanisms and how they're organizing their team, their code, what jumps out at you as the wrong way, and what's the right way? >> Well, I think the culture is really it, right? You know, the kind of culture of incremental success and fast iteration is the culture for a SaaS company, right? With the traditional one, you cannot afford to do that, because once you make a small mistake, you are toast. So I think, you know, the culture is different; you really want to have faster iteration, basically. >> And that also comes down to the team, the people, right? >> Yes. >> The people selection. >> Yes, if you are kind of used to the waterfall thing, it's pretty hard to adapt to this kind of SaaS world. >> And what's your advice to entrepreneurs? Reset, because if you say speed is of the essence, resetting is probably something that's not hard to do, then. >> Well, I wouldn't say easy, but not easy-- >> I hate to use the word pivot, but you know, resetting means okay, stop, rebuild.
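The release-to-1%-of-customers idea he describes can be sketched as deterministic hash bucketing. This is our illustration, not Zoom's or Zscaler's actual mechanism: each user consistently lands in or out of the rollout, and raising the percentage later only adds users, never reshuffles the ones already exposed.

```python
# Deterministic percentage rollout: hash (feature, user) into one of
# 10,000 buckets and expose the feature to buckets below the threshold.
import hashlib

def in_rollout(user_id: str, feature: str, percent: float) -> bool:
    digest = hashlib.sha256(f"{feature}:{user_id}".encode()).hexdigest()
    bucket = int(digest[:8], 16) % 10_000      # stable bucket, 0..9999
    return bucket < percent * 100              # percent of 10,000 buckets

users = [f"user-{i}" for i in range(10_000)]
exposed = sum(in_rollout(u, "new-editor", 1.0) for u in users)
print(f"{exposed} of {len(users)} users see the feature (~1%)")
```

Because the threshold comparison is monotone, bumping `percent` from 1 to 10 keeps every already-exposed user exposed, which is what makes "let's not debate, let's just see" safe to operate.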
>> I think one way to think about it is actually looking at how to build enterprise software the consumer-product way, right? If you think of Facebook or Google, the traditional Google, of course Google now has enterprise products, but the traditional Google, Facebook kind of product is more for consumers to consume. I mean, they are fast iterations. How often? What's the criteria to release a product? Enterprise product is getting towards there. You need that kind of thing, so if you don't know how to do it, look at Facebook, how Facebook does it. Of course, Facebook and YouTube are pulled the other way around: they need to care more about privacy, care more about stability. So I think you are seeing the two sides of the world, the enterprise side and the consumer side. They are learning from each other. >> Well, I want to get to the enterprise talk track in a second, because I think you can give a lot of insight, so I want to stay on SaaS, cloud-native, or cloud specifically, 'cause that's where SaaS really shines, when you're really talking about cloud scale. Data, you're doing AI now, and you and I have both talked about data many times. >> Yes. >> You know I'm a data hardcore person. I love data. I think software and data, I wrote a blog post in 2007 that says data is the new developer kit. The phrase "developer kit" was used back then. You're now seeing where data is part of the developer's piece of their value creation. Highly addressable, available, usable, not stored in some silo, unaddressable, with high latency to get it. How important is the data for the SaaS piece? Because to make the kind of changes you're talking about, you need the data. Data's giving you insights; that's something that's near and dear to your heart. Explain your vision of the role of data. >> Yeah, I think you touched upon it. If you want to make sense out of something, you need the data, right?
And if it's not SaaS, I would go maybe a more extreme way: it's not clear to me the data's even useful to you, 'cause you know, some large software company may have hundreds of thousands of customers out there, but the data is spread around. I mean, how are you going to train a model with the data spread around hundreds of thousands of locations? So the real, the correct, or the optimal way is actually the SaaS model: you actually have the data with you, and then you can leverage the data. So I would say this is actually another benefit of SaaS, why SaaS is going to change the world, or eat the world. It owns the data for real, right? It may not be the private data, but it could actually be behavior data: how people are reacting to your features. From the VMware days we wanted to know, are people even using this feature? How often do people use this feature? You know, people are always debating, "Hey, what's the maximum policy we need to give this and that?" But in the SaaS world, no debate, just look at it. We always say, "Don't listen to what customers are telling you to do," but watch how they do things, so that you can understand what product you want to develop, right? Here you actually can really watch how customers are using your product. Don't listen to them; if you listen to them, you will give them a faster horse, as we all know.
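"Watch how they do things" can be sketched as a tiny feature-adoption report over an event log. The customers, features, and events below are invented for illustration of the behavior data he describes.

```python
# Count what share of customers actually touched each feature,
# answering "are people even using this feature?" from raw events.
from collections import defaultdict

events = [
    ("acme", "dashboard"), ("acme", "export"), ("acme", "dashboard"),
    ("globex", "dashboard"), ("globex", "api"),
    ("initech", "dashboard"),
]

customers = {c for c, _ in events}
users_of = defaultdict(set)            # feature -> set of customers
for customer, feature in events:
    users_of[feature].add(customer)

for feature, who in sorted(users_of.items()):
    print(f"{feature}: {len(who)}/{len(customers)} customers")
```

In a SaaS product this log already exists in one place, which is his point: the same question in a shrink-wrapped product would require collecting telemetry from every install site before any analysis could start.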
All right, let's talk about the customer alignment and customer expectations, you know customer user experience is driven by customer's expectations usually, right? As expectations change. And I think the Zoom thing jumped out at me, the Zoom IPO and their great success and were a customer as well, is that they really nailed the expectation of the user and cloud certainly helped them get that speed, but this is a key thing, if you could just deliver a great experience. >> Yeah. >> For those customers, you can actually win big part of the market. >> Yeah, if you Google, Eric. Eric doesn't speak to me as much, but if you Google Eric. >> We'll get him on theCUBE. >> What's sort of the jump? Hopefully I can help you to bring him here too. But what's going to be obvious if you Google search Eric he is sort of the notion of customer successes, my success. If customer is happy, I'm going to happy. So, my happiness hinges on the customer's happiness. So that's, kind of very important because only the SaaS model made that more natural. In traditional model, whether traditional on prime or we're not, you sort of celebrate when you have customer signing your PO and then you don't hear from the sales guy or three years, the sales guy may move on to another company, you don't know, right? But for the SaaS, it doesn't stop when sign the PO. You actually have to earn customers' happiness every single day. >> Adoption's critical. >> Yeah, customer success is important and then that's kind of the, so there is a huge alignment, very interesting alignment between customer's happiness, customer success, customer adoption of your product and you're sort of, the success, right? 'Cause you know, when I came to Zscaler, one of our first meeting is about, okay, we had a lot of customer interest us. They sign a PO. How to get them ramp up the actual first use, right? So, that kind of conversation doesn't happen in the traditional software company. You sign a PO. 
If the customer doesn't use your product for another 18 months, which is actually quite normal, no one is going to jump up and say, "This is crazy!" Right? >> You know, we're going to do that in our Part Two, about the impact on the enterprise. But you made a good point there. I want to just close out our last talk point: the data driving the experience isn't like the old way of throw it in, get the PO, and celebrate. You've got to, kind of, keep that going. The enterprise is changing, and the enterprise has a tsunami of onboarding of new types of developers. In some cases they grow. We just had Cisco in here on theCUBE this morning. They're turning network guys into programmers, from CLI command-line-prompt dudes, to gals, to coders. You're seeing developers now enter the enterprise to build the apps, so there's now a digital transformation initiative for enterprises to be, I guess, SaaS-like. But it's hard. >> Yeah, I think that's, you know, part of the digital transformation. Every company, Fortune 500 or Fortune 2000, needs to do it, right? So, another interesting part is, when they go on this journey of digitalization, you cannot possibly build all the infrastructure yourself. You will have to consume public cloud, you know, sometimes private and hybrid cloud, and you are actually going to consume lots of SaaS, right? Whether Zoom, Zscaler, or PagerDuty, I mean, you are not going to build all those things from scratch, but you want to have a very good sort of stack on top of it, and how you're going to take advantage of the SaaS is a very interesting aspect. >> Well, in Part Two of our chat, when we come back for our next discussion, I want to get into the enterprise. But to wrap up Part One here: innovation speed, leveraging data, and the beautiful risk taking and benefits of SaaS. Large scale, fast, high value, for targeting and developing an app or a venture. >> Yeah.
>> What is your advice to entrepreneurs out there, and/or someone who's doing a digital transformation? Where they want to leverage SaaS, what's the playbook, what's the starting point, what's your advice? >> Well, there are a number of things. One, there are so many SaaS companies out there, take advantage of them, right? In the old days you had to hire email admins, you had to do all this. Nowadays, with all the SaaS, you only need to worry about the business logic. You have some unique insight into the business, and then you just hire programmers to codify that, and then the rest will magically happen because of the public cloud, because of the SaaS. So, be very mindful about the new environment you are in, that's number one. The second thing I want to say is, how do you look at AI technology? The older way is to program something in a definitive way. I think there will be a limit to that. It has taken the software industry a long way to where we are. But, if you look at the next 20 years, I think a lot of the lift is going to be done by AI. But it's not going to be easy. You have to think about your data strategy: where are you going to get the massive, sustainable, unique, ideally even labeled data? If you don't have the labeled data, you have to have a strategy. How are you going to build some unique model with the data you have? So, the data strategy, right? So, essentially, how do you take advantage of the cloud? How do you take advantage of the data? And then on top of that you are going to do something that's solving an unmet, um-- >> Customer problem. >> Customer problem. >> An acute landing spot in the marketplace. >> Unmet need. >> In a big market. >> In a big, well, in a big market. >> There it is. >> Even if there is already a mature solution, I bet, since those mature solutions were not developed in the cloud-native era and the AI-native era, you have plenty of opportunities.
>> Howie, you and I are on the same page on this. I have been saying it, and truly believe, that we are living in an entrepreneurial era where, with your advice and what you just laid out, the better mousetrap can take down a big market. >> And I'm hopeful that you will also disrupt the media business, you know, we're-- >> Don't tell anyone! (laughing) We're still going to do that top secret, Silent Running. Howie, we're going to get to Part Two. We're going to dig deep into the enterprise, because the enterprise now has an opportunity, for the first time in tech history, to use tools and technologies to completely reset and re-architect for this kind of capability. >> Absolutely. >> So, we'll hit that in Part Two. >> I'm super passionate about it too. >> Howie Xu, here inside theCUBE. Friend of theCUBE, legend in the industry. Great entrepreneur and technologist here, sharing a CUBEConversation. I'm John Furrier, thanks for watching. (upbeat jazz music)

Published Date : May 17 2019

SUMMARY :

John Furrier sits down with Howie Xu in Palo Alto for a CUBEConversation about what makes SaaS companies different. They discuss how the SaaS model aligns the vendor's success with the customer's happiness and adoption, why product life cycles, innovation speed, and risk-taking differ from traditional on-prem software, and how enterprises on a digital transformation journey should consume public cloud and SaaS rather than build everything themselves. Xu closes with advice for entrepreneurs: take advantage of SaaS and the cloud, build a data strategy for AI, and solve an unmet customer need in a big market.

SENTIMENT ANALYSIS :

ENTITIES

Entity | Category | Confidence
Howie | PERSON | 0.99+
2007 | DATE | 0.99+
Eric | PERSON | 0.99+
Michael Dell | PERSON | 0.99+
John Furrier | PERSON | 0.99+
six years | QUANTITY | 0.99+
Reid Hoffman | PERSON | 0.99+
May 2019 | DATE | 0.99+
six months | QUANTITY | 0.99+
Howie Xu | PERSON | 0.99+
YouTube | ORGANIZATION | 0.99+
Uber | ORGANIZATION | 0.99+
10% | QUANTITY | 0.99+
Jay | PERSON | 0.99+
Zscaler | ORGANIZATION | 0.99+
Facebook | ORGANIZATION | 0.99+
Google | ORGANIZATION | 0.99+
Lyft | ORGANIZATION | 0.99+
Zoom | ORGANIZATION | 0.99+
two sides | QUANTITY | 0.99+
Cisco | ORGANIZATION | 0.99+
three years | QUANTITY | 0.99+
1% | QUANTITY | 0.99+
18 month | QUANTITY | 0.99+
Palo Alto, California | LOCATION | 0.99+
first shipment | QUANTITY | 0.99+
one | QUANTITY | 0.99+
both | QUANTITY | 0.98+
800 pound | QUANTITY | 0.98+
one shot | QUANTITY | 0.98+
SaaS | TITLE | 0.97+
second thing | QUANTITY | 0.97+
first meeting | QUANTITY | 0.97+
Zscaler | PERSON | 0.97+
Silent Running | TITLE | 0.96+
AI Center | ORGANIZATION | 0.96+
Part One | OTHER | 0.96+
hundreds of thousands | QUANTITY | 0.96+
Silicon Valley | LOCATION | 0.95+
Amazon | ORGANIZATION | 0.95+
One | QUANTITY | 0.95+
first use | QUANTITY | 0.95+
Saas | TITLE | 0.94+
hundreds of thousands of customers | QUANTITY | 0.94+
Fortune | ORGANIZATION | 0.93+
one spot | QUANTITY | 0.93+
CUBEConversation | EVENT | 0.92+
this morning | DATE | 0.91+
Part Two | OTHER | 0.91+
VMware | ORGANIZATION | 0.91+
Zscaler | TITLE | 0.9+
Fortune 500 | ORGANIZATION | 0.9+
second order | QUANTITY | 0.9+
WebEx | ORGANIZATION | 0.89+
PagerDuty | TITLE | 0.88+
many years ago | DATE | 0.86+
one way | QUANTITY | 0.82+
prime | COMMERCIAL_ITEM | 0.81+

Stefan Sigg, Software AG & Dave McCann, AWS | AWS re:Invent 2018


 

>> Live from Las Vegas, it's theCUBE covering AWS re:Invent 2018. (techy music) Brought to you by Amazon Web Services, Intel, and their ecosystem partners. (techy music) >> Welcome back, everyone, live coverage here of AWS re:Invent 2018, I'm John Furrier with Dave Vellante. Two sets, three days of wall-to-wall coverage. Hundreds of videos, great content. Three-hour keynote from Andy Jassy, 52,000 people here. This is where the industry now is getting together to set the agenda for the future. It's cloud-based, it's on-premise, it's all cloud all the time. Our next two guests are Dave McCann, who's with AWS Marketplace, and Stefan Sigg, chief R&D officer at Software AG, great to see you. >> Good being back. >> Great to come back. >> Thank you for having me. >> So, I've got to say, you know, the customer dynamic that you guys have is pretty impressive, you guys are a customer. The value creation of the cloud is pretty amazing. What's your world like these days in terms of your market? You're in Europe, you've got thousands of customers, what's the update? >> Well, I mean, we're operating worldwide, the US being the biggest market, so we're in 70 countries. From a customer perspective, all over the place with our labs, and obviously, I mean, the cloud and digitalization is a whole new ballgame, and we are... Just at this time, we are reinventing ourselves, for maybe the third time, in order to push, to help the customers make that transition, and with our middleware expertise, and the expertise we have newly added in IoT, it's just amazing how that momentum is showing.
>> The builder, the right tool for the right job. >> Yeah. >> There's a set of services now out there that can be merchandised and bought and sold. Marketplace, which you run, software design's changing, but also consumption on the buying side's changing. What's your analysis of that? >> Well, for us it's just... It couldn't be better, because now, again, software is coming into enterprises. It had been pushed aside for many years, because people would just implement standard software, would implement, you know, office software, and now all of a sudden, driven by the digital transformation and stuff like IoT, there's a demand for software, building software for their own needs, not just for the back office, but, you know, equipping products with sensors, with data, and enhanced software. So, that's exactly our play, helping those customers, those enterprises, to just start their software where it's necessary, and we provide the platform getting them there. >> Yeah. >> So, Dave, Software AG's almost as old as I am. Right, mainframe, you went through client-server, you dealt with the desktop, and now the cloud era. How are you helping companies like Software AG maintain their relevance, keep their infrastructure modern? How does that all work? Give us some insight on that. >> So, first of all, AWS broadly is obviously working with all the world's top software companies, and if you think of it, all the large enterprises in the world are moving their applications onto the cloud, and if you think of it, the average enterprise has got 1,000 applications, and those 1,000 applications are woven into a lot of third-party software. So as our AWS customers move onto AWS, they want to bring their software with them, and clearly we work with companies like Software AG, and these guys are modernizing and rearchitecting their software, and the launch we just did today on the container marketplace, so now we've launched Marketplace for containers.
It's a new way of packaging your software up in a microservices model, and Software AG has already refactored 10 of their product lines onto containers, so they're modernizing, our customers are modernizing, and we're working together. >> And so, Stefan, is it a case where you say to the customer, "Run it wherever you want it," or is it more aggressive, like, "Okay, we're moving "to the cloud, you're moving with us." How does it all work, what's the customer conversation like? >> Customer conversation is, you know, customers come and they already decided their pace of going into the cloud, their, you know, maturity level going into the cloud, and for the foreseeable future, there will be a hybrid world, there will be a hybrid world. Still some pieces on-premise, new things on the cloud, application integration within the cloud, application integration from the cloud to on-premise, device integration is coming up, the integration to edge use cases-- >> Yeah. >> Very much a big topic. So, it's a rebirth of our core technology that we are now seeing-- >> Yeah. >> And we are taking our customers with us, and they take us with them. >> You know, the thing that's interesting is that the whole software building market, development or builders, and right tool for the right job, needs to have a broad set of tools available, because if you go to an IoT edge application, for instance, right, that's a complete custom build, in a way, so you don't want to have it be a one-off, just have the tools available, then it's just how you build. >> Yeah, yeah. >> You build a unique solution for the unique use case for the unique workload, use the cloud as distribution, so you need a lot of services, so this is kind of the preferred model versus buying a general purpose application and stuffing it into a use case. (chuckling) >> Well, you've got to understand that when you go to the cloud you're going to redesign a lot of your applications. It's not a simple lift and shift. 
In some cases it's written new, and on some occasions the developers want to use the tools they love, so you know, you guys have got, what, 10,000 customers? >> Sure. >> Call it 10,000. Those 10,000 customers have all got skills and developers, so you've probably got a million developers that understand Software AG, and they're coming onto the cloud. They want to be familiar with what they're working with. >> Yep, yeah. >> So, what I want to give, and what AWS wants to give the developer, is a consumer experience, that when the developer has a project they can find the software. >> Yeah. >> And so, what we want to do is we're publishing Software AG's products right in Marketplace, and you know, yesterday we announced that we now have 200,000 customers in AWS Marketplace. Two years ago I announced for the first time that we had 100,000, so we've doubled the number of customers using Marketplace in two years, and the reason is that the developers are showing up and finding the software they want-- >> Yep. >> And the more software we add, the more developers come and use Marketplace. >> It's like going to Home Depot. I need a new tool. (chuckling) You know, I need a new service, hit the catalog. This is the preferred model, and with containers and Kubernetes you're seeing that explosive integration happen. People are integrating faster now because of, say, containers and Kubernetes, and with more compute, it's only more goodness to accelerate Kubernetes and containers, so that's got to be great glue for your business. >> Well, it is just the state of the art. I mean, this virtualization technology has evolved, and now it's there with Kubernetes and Docker and containers, so that's what customers even expect us to be doing, yeah, and then beyond that they expect us to be present in marketplaces, yeah. Like, the AWS Marketplace is the place to be. >> Yeah, it's good for-- >> That's where people are looking for us, so we better be there.
>> Containers are taking off for several reasons. You know, if you're a developer, one of the compelling things about containers is consistency of deployment. You can run Kubernetes on your laptop. You can run Kubernetes up on a server. You can run Kubernetes on the cloud. So, you can develop it on your laptop, provision it up on the server, and then deploy on AWS, so that consistency is very compelling to the developer. What we're doing by putting it in Marketplace is making it really easy, because with ECS and EKS, whether it's the Docker container model or the Kubernetes orchestrator, we allow the developer on AWS to be well-integrated into the AWS environment. >> So, add edge into that equation, and how does that consistency flow through? What's your edge strategy in terms of developing applications? >> Well, the edge strategy is clearly providing the... In the same way we provide the platform for our usual application development, there is a huge demand for edge development. >> Yeah. >> So, for example, we have a great customer out there in Germany. They're the world market leader in paint robots. >> Yeah. >> So, obviously, if you want to maintain a paint robot, it's an edge thing, yeah, so we want to make sure that the data is close to the edge, close to the device, so it can monitor and do the recognition of failures. >> The thing I want to just add to that, that you mentioned about Kubernetes and the software deployment, is that when you've got Lambda, you've got these services that are so fast, you can do a lot with that, so as a service you can bring that together. So, the idea of throwing more compute at it, in hundreds of milliseconds you can wrap VMs around things, you can do cool things, so almost a change of buyer behavior is built into the development process. So, that's good for your business, it's good for your business, and companies are changing their business model. So, Cisco, for instance, did a deal with you guys.
A couple weeks ago we covered it. They're using EKS for all the cloud stuff, so they have their stuff on premises, so they go, "Hey, great!" >> Yeah, so containers as a next generation of deployment is one of your choices, right? You can go SaaS, you can go serverless, you can go containers, and companies are going to have all three in the mix. All of the software companies are going to be repackaging for containers, and the other thing that we've done with containers in Marketplace is we're actually metering by the second. A lot of containers run for a very short space of time. I don't know if you know this, but 50% of containers don't run for a week. You spin them up, you shut them down. You spin them up, you shut them down, and so the consumption of the software is moving much more into pay for how much you use. >> And you're granular. >> And we're granular, so we're going to meter by the second. The vendors are typically going to price monthly and annually, or hourly, depending on what the vendor's choice is, and so we're going to make it easy for that to happen, and of course, the other thing we do is that with Software AG being in Marketplace, it goes on the developer's bill; the developer shows up with an account. The developer just gets the Software AG software and runs it, and what makes it really easy for Software AG is that the developer has a contract with AWS, but they're now using Software AG's software. >> Well, congratulations, a great opportunity. By the way, I saw the announcement about having a marketplace for machine learning, too. A lot of things happening. >> Right. >> So, the machine learning marketplace, in a way, actually leverages the same capability as the container marketplace, because if you think of it, in machine learning we're packaging up the model, or the algorithm, in a Docker container. >> Yep.
>> The difference, however, is that instead of rendering the container into ECS and EKS, we actually deploy the container right into the SageMaker console, so it's a different console, and the user over there is either a data scientist-- >> Yeah. >> Or a developer, but they're going to find that packaged in a container and provision it in SageMaker and then apply the model, and you're right, we announced today... Andy announced the marketplace with machine learning with over 200 different machine learning models. >> Yeah. >> So, we had 160 container packages and we had 200 machine learning models. So, now around the world developers suddenly have access to 300 new pieces of software that they didn't have yesterday. >> I love this market, web services. Going back to the old 2001 timeframe. It's now happening, service-oriented architectures are all happening, catalogs of services, it's what it is. It's being realized right now. >> It is. >> And it's impacting and the results are obvious. The business model evolution, opportunities, not a bad thing, marketplaces of the future. You're going to be all marketplace-driven. >> AWS Marketplace right now is probably the largest live, in production infrastructure library with third party software. >> Congratulations, Dave, nice to see the success. Great to hear about these success stories there, good job. >> And you know, ultimately we've got to remember that what we're delivering is a world class experience to the customer, but a marketplace only works if we have ISVs. >> Hm... >> Yeah. >> So, I want to thank Software AG, because now all of our customers have access to their software, thank you. >> Customers win. >> Thank you. >> Thanks very much. >> It's been a pleasure. >> It's a win-win, everyone wins with the cloud. That's the best part of co-creation and the cloud scale. I'm John Furrier with Dave Vellante. Stay with us, more coverage here, day two of AWS re:Invent after this short break. Stay with us. (techy music)
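The per-second metering model McCann describes above — vendors price monthly, annually, or hourly, but consumption is billed down to the second — can be sketched in a few lines. This is an illustrative toy only: the function name, the $0.36/hour rate, and the rounding rule are hypothetical, not AWS Marketplace's actual billing logic.

```python
from math import ceil

def metered_cost(run_seconds, price_per_hour, granularity_seconds=1):
    """Cost of one container run under metered billing.

    Illustrative sketch only: the rate and rounding rule here are
    hypothetical, not AWS Marketplace's real billing implementation.
    """
    # Round the run up to the billing granularity, then convert to hours.
    billed = ceil(run_seconds / granularity_seconds) * granularity_seconds
    return billed * price_per_hour / 3600

# Most containers run for seconds, not hours. Compare a 90-second run
# billed per second vs. rounded up to a full hour.
per_second = metered_cost(90, price_per_hour=0.36)                      # 90 s billed
hourly = metered_cost(90, price_per_hour=0.36, granularity_seconds=3600)  # 1 h billed
print(round(per_second, 4), round(hourly, 2))  # 0.009 0.36
```

The 40x gap between the two numbers is the point of the interview's "pay for how much you use" remark: for short-lived containers, per-second granularity changes the economics entirely.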

Published Date : Nov 29 2018

SUMMARY :

At AWS re:Invent 2018 in Las Vegas, John Furrier and Dave Vellante talk with Dave McCann of AWS Marketplace and Stefan Sigg, chief R&D officer of Software AG. They discuss Software AG's reinvention for the cloud and IoT, the newly launched container marketplace with per-second metering, the machine learning marketplace that deploys over 200 models through the SageMaker console, and how AWS Marketplace's 200,000 customers find and consume third-party software such as Software AG's directly on their AWS bill.

SENTIMENT ANALYSIS :

ENTITIES

Entity | Category | Confidence
AWS | ORGANIZATION | 0.99+
Dave | PERSON | 0.99+
Dave McCann | PERSON | 0.99+
Dave Vellante | PERSON | 0.99+
Stefan Sigg | PERSON | 0.99+
Amazon Web Services | ORGANIZATION | 0.99+
Andy Jassy | PERSON | 0.99+
Cisco | ORGANIZATION | 0.99+
Germany | LOCATION | 0.99+
Stefan | PERSON | 0.99+
50% | QUANTITY | 0.99+
Europe | LOCATION | 0.99+
2001 | DATE | 0.99+
Amazon | ORGANIZATION | 0.99+
John Furrier | PERSON | 0.99+
10 | QUANTITY | 0.99+
Andy | PERSON | 0.99+
10,000 customers | QUANTITY | 0.99+
100,000 | QUANTITY | 0.99+
200,000 customers | QUANTITY | 0.99+
Three-hour | QUANTITY | 0.99+
Intel | ORGANIZATION | 0.99+
yesterday | DATE | 0.99+
Home Depot | ORGANIZATION | 0.99+
70 countries | QUANTITY | 0.99+
1,000 applications | QUANTITY | 0.99+
300 new pieces | QUANTITY | 0.99+
10,000 | QUANTITY | 0.99+
52,000 people | QUANTITY | 0.99+
Software AG | ORGANIZATION | 0.99+
three days | QUANTITY | 0.99+
US | LOCATION | 0.99+
Two sets | QUANTITY | 0.99+
Las Vegas | LOCATION | 0.99+
third time | QUANTITY | 0.99+
Two years ago | DATE | 0.98+
Hundreds of videos | QUANTITY | 0.98+
a week | QUANTITY | 0.98+
first time | QUANTITY | 0.98+
second | QUANTITY | 0.98+
today | DATE | 0.98+
Lambda | TITLE | 0.97+
one | QUANTITY | 0.96+
two guests | QUANTITY | 0.96+
hundreds of milliseconds | QUANTITY | 0.96+
three | QUANTITY | 0.95+
one perspective | QUANTITY | 0.92+
two years | QUANTITY | 0.91+
over 200 different machine learning models | QUANTITY | 0.91+
SageMaker | TITLE | 0.9+
200 machine learning models | QUANTITY | 0.89+
Kubernetes Orchestrator | TITLE | 0.89+
thousands of customers | QUANTITY | 0.89+
160 container packages | QUANTITY | 0.87+
AWS re:Invent 2018 | EVENT | 0.85+
Docker | TITLE | 0.85+
day two | QUANTITY | 0.85+
ECS | TITLE | 0.84+
Kubernetes | TITLE | 0.84+
first | QUANTITY | 0.82+
couple weeks ago | DATE | 0.8+
EKS | TITLE | 0.79+
re:Invent 2018 | EVENT | 0.77+

Bob DeSantis & Jason Gabbard, Conga | Conga Connect West at Dreamforce 2018


 

(exciting electronic music) >> From San Francisco, it's theCUBE, covering Conga Connect West 2018. Brought to you by Conga. >> Hey, welcome back everybody, Jeff Frick here with theCUBE. We're in downtown San Francisco at the Thirsty Bear. We're at Dreamforce. I can't get an official number, I keep asking, but the number they're throwing around is 170,000 people, so if you're coming, do not bring your car. It will take you four days to get here from AT&T, and I think the Giants have a home game today, too, which just makes things even more interesting. But we're at a special side event, the Conga Connect West event here at the Thirsty Bear, three doors down from Moscone South, so we're excited to be here. It's our first time at Salesforce, and to kick things off, we've got Bob DeSantis, the chief operating officer of Conga, and with him, Jason Gabbard, the head of AI strategy. So gentlemen, welcome. >> Thank you. >> Good morning, great to be here with you. >> So what a cool event. You guys have this thing rented out for three days. >> Yep. You've got entertainment, you've got the silent disco. I think tomorrow night, some crazy bands. >> Yeah, we've got an open bar, food going all day and all night. Actually, we did this last year, and we were so crowded that this year we rented the parking lot behind and built two circus tents, so we actually extend all the way out to the next block. We have multiple sponsors here helping us to bring their customers and their partners in. So, open bar, open food, meeting rooms, demo stations, a place to come and relax and kick back a little bit from the chaos of those 170,000 people just a block away. >> It's just crazy, so come on down and meet the Conga crew and all the people, and have a good time. Let's jump into it. The topic at hand is AI. There's all this buzz about AI, AI, AI, machine learning, artificial intelligence, and what we hear time and time again is, "I just need to go buy some AI."
Really, that's not the way the implementation is going to work, but a great example of where we see it, one I like to use a lot that people are familiar with, is Gmail: those little automated responses back to an email. There's actually a ton of AI behind those, setting context and voice, and this, that, and the other. How are you guys leveraging AI in your solutions? You've been at this for a while. AI represents a great new opportunity. >> Yeah, it really is. Jason, do you want to? >> Yeah, sure. You may not be aware, but Conga has actually been developing AI inside of the contract management system for a few years now, and I came over to Conga in connection with the acquisition of a company I founded focused on AI, and so obviously, things are getting a lot more interesting, and the technology is getting a lot more robust. You know, I think you made a great analogy to Gmail. Inside of Conga CLM, Conga Contracts, you'll actually see that we're starting to make suggestions around contracts, so you may load a document in and you might see a popup in the margin that says, "Hey, is this a limitation of liability clause?" So that's one example of AI working in the background of CLM. >> Well, I was going to say, what are some of the things you look for? I had a friend years ago who had a contract management company, and I was like, "How?" And this was before OCR, and it was not good. "How? How are you doing this?" He goes, "No, if we just tell them where the document is and when it expires, huge value there." He sold the company, he made a ton of money. But obviously, time has moved along. A lot of different opportunities now, so what are some of the things you do in contract lifecycle management? >> Think of that example as phase one of contract lifecycle management. Just get all my contracts into a common repository, give me some key metadata, like what's the value, who are the counterparties, and what's the expiration date? That's huge.
So, ten years ago, 15 years ago, that was the cutting edge of CLM, contract lifecycle management. Now the evolution has continued, and we're in what we think of as the third phase of CLM. So now, how do we actually pull actionable data out of contracts? Having the contract, you mentioned OCR, having machine-readable data in a repository is great, but what's actually in the contract? What did we negotiate six months ago that could now have an impact on our business if we knew it? If we could act on it? And so with Conga AI, and the machine learning technology that Jason's company developed, which we've now embedded in our CLM products, we can unlock the data that's hidden in documents and make it actionable for our customers. >> So, one of the things that you use to trigger that action, because the other thing about contracts we always think about, right, is you negotiate them, it's a pain in the butt, you sign them, then you put them in the file cabinet, and nobody thinks about them again. So in terms of making that more of a living document, beyond it simply being time to renew, what are some of the things that you look for using the AI? Are you flagging bad things, are you looking for good things, are you seeing deltas? What are you looking for? >> I'll give you a really concrete example. We recently had a customer that negotiated a payment term to their benefit with one of their suppliers, but that payment term was embedded in the document, and their payables team was paying on net 30 when their negotiators had negotiated net 90. That data was locked in the contract. With Conga AI, we can pull that data out, update the system of record, in that case SAP, and now the payables team can take advantage of those hard-fought wins in that contract negotiation. That's just one example.
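The net 30 vs. net 90 example above boils down to pulling a structured term out of contract prose and comparing it against the system of record. As a toy illustration only: Conga AI uses trained models plus human review, not a single regex, and the function and variable names below are hypothetical.

```python
import re

def extract_payment_term(contract_text):
    """Pull a 'Net N' payment term (in days) out of contract prose.

    Hypothetical sketch: real CLM extraction combines trained models
    with human verification, not one regular expression.
    """
    m = re.search(r"\bnet\s+(\d+)\b", contract_text, re.IGNORECASE)
    return int(m.group(1)) if m else None

contract = ("Supplier invoices are payable Net 90 from date of receipt, "
            "notwithstanding any prior course of dealing.")

negotiated = extract_payment_term(contract)  # 90
erp_terms = 30  # what the payables system is configured to pay on

# Surface the mismatch so the system of record can be updated.
if negotiated is not None and negotiated != erp_terms:
    print(f"Flag for review: ERP pays net {erp_terms}, contract says net {negotiated}")
```

Once the term is a number instead of buried prose, reconciling it against SAP (or any ERP) becomes an ordinary data comparison, which is the "make it actionable" point of the interview.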
>> Yeah, so there are two obvious use cases we're seeing day in and day out right now. Number one, I'll call it an on-ramp to the CLM, so that's likely a new customer, or relatively new customer, at Conga that says, "Hey, I have 50,000 contracts." I was on the phone this morning with this precise use case. "I have 50,000 contracts, really happy to be part of the Conga family, got my CLM up and running, but now I've got to get those 50,000 contracts into the system, so how do we do that?" Well, there's one way to do that: get a bunch of people together, work for a couple of years, and we'll have it done. The other way is to use AI to accelerate some of that. The classic misconception is that the AI is going to do all of the work; that's just not the case. At Conga, we tend to take more of a human-computer symbiosis approach, sort of working side by side, and the AI can really do the first pass. You might be able to automate something like 75% of the fields, so you can then take your reduced team of people and get the rest of the information into the system and verified, but we may be able to cut that down from a couple of years to 30, 60 days, something like that. So that's one obvious use case for the technology, and then I think the second is more of a stare-and-compare exercise. Historically, you would see companies come in and say, "If I'm going to sign an NDA, it's got to have the following ten features, and I'll never accept x, y, and z." So we can sort of key in on that with our AI, and take the first pass of a document and really do the triage, and so again, while it may not be 100%, we'll get to 80-90% and say, "Here are the three or four areas where you need to let your knowledge workers focus."
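The triage first pass described above — flag which clause types a paragraph contains so knowledge workers know where to focus — can be caricatured with keyword patterns. To be clear, these patterns and names are stand-ins of my own, not Conga's API: the production system uses trained machine-learning models, which is why it generalizes where a keyword list would not.

```python
import re

# Hypothetical clause patterns; a trained classifier plays this role in
# a real CLM system such as the one described in the interview.
CLAUSE_PATTERNS = {
    "limitation_of_liability": r"limitation of liability|in no event shall .* be liable",
    "indemnification": r"\bindemnif(y|ies|ied|ication)\b",
    "non_disclosure": r"\bconfidential(ity)?\b",
}

def triage(paragraph):
    """Return the clause types whose patterns match this paragraph."""
    text = paragraph.lower()
    return [name for name, pat in CLAUSE_PATTERNS.items() if re.search(pat, text)]

clause = ("In no event shall either party be liable for indirect or "
          "consequential damages arising out of this Agreement.")
print(triage(clause))  # ['limitation_of_liability']
```

Even this crude first pass shows the workflow shape: the machine tags the bulk of a 50,000-document backlog, and humans review only the paragraphs that are ambiguous or high risk.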
>> And are there some really discrete data points that you call out in a defined field for every single contract? Because there always are payment terms, I imagine, and obviously dates and signatures, so some of those things are pretty consistent across the board, versus, I would imagine, all of the crazy, esoteric stuff, which is probably the corner cases that people focus on too much relative to the value you can get across that entire population. 50,000 contracts is a lot of contracts. >> I don't know what your view is, but for me, I think it's follow the money. Everyone always cares about dollars, and when I'm getting my dollars, and the other is follow the very high risk stuff. Like indemnities, limitations of liability, and occasionally you're seeing people interested in change of control: what happens if I sell my company or take on a bunch of financing, does that trigger anything? >> What's interesting about contracts is there are hundreds if not thousands of different potential clauses that could live in a contract, but in general, sort of the 90-10 rule is that there are about 40 clauses that you find in most commercial agreements, most business-to-business or even business-to-consumer commercial agreements. So with Conga Machine Learning, we train based on the sort of use cases that extend that for a specific domain. So for example, we've done a lot of work in commercial real estate, right? Those commercial real estate agreements have that core base, but then they have unique attributes that are unique to commercial real estate, so Conga Machine Learning, as part of the Conga AI suite, can be trained to learn, so that we can reduce the cycle time. You know, when we go into our tenth commercial real estate use case, it's going to be a lot more efficient, a lot faster, and have a lot higher initial hit rate than when we started training at the beginning. For us, it's about helping customers consume the documents that make sense for their business.
And machine learning is intuitively about learning, so there is this process that has to take place, but it's amazing how quickly it can learn. You used the Google example; I like to think of the Amazon.com suggestion service example. They literally know what I'm going to buy before I'm going to buy it. >> Right, right. >> That didn't just happen yesterday, they've been learning that from me for the last 15 or 20 years. We're at sort of the beginning of that phase right now in terms of B2B CLM, but it's amazing how quickly it's moving, and how quickly it's having an impact on our customers' businesses. >> Yeah, I was going to ask, so where are we on the lifecycle of the opportunity of using AI in these contracts, beyond just the signature date and the renewal date for some of these things? And also, I would imagine, you guys can tie some of that back into your document creation process >> That's right. >> So that you again remove a lot of anomalies, and get more of a standardized process >> Yeah, so Conga provides a full digital document transformation suite, and that includes, as you mentioned, document generation capabilities, contract management, Conga AI >> Signature, the whole thing, right? >> Conga Sign. So we're not there yet, but imagine if, through Conga AI, we're able to learn what type of clause structure actually has a higher close rate, or a faster cycle time, or a higher dollar value for a given book of business. So customer x is selling their products to consumers or other businesses, and if we can learn how their contracts streamline and improve their effectiveness, then we can feed that right back into the creation side of their business. So that's just over the horizon.
>> And then the other thing, I would imagine, is that you can get the best practices both inter-department and inter-company, and then I don't know where the legal limits are in terms of using anonymized, best-practice data to publish benchmarks and stuff, which we're seeing more and more because people want to know the benefits of using so many of these things. You know, what's next? And then do you see triggers? Will someday there be a trigger mechanism, or is it really more kind of an audit-and-adjust going forward? >> From my perspective, I think the someday is further out; we're extremely focused on the analytics and the kind of discovery of documents right now, but I think looking out over the one-year horizon, it's less about triggers and more about more touchpoints in the workflow, and so really optimizing the contracting process, so being able to walk into a company and say, "Hey, I know you would like for this to be in all your contracts, but as a matter of practice, it's not, so maybe we need to abandon that policy, and get to a signed document faster." So more of that type of exercise with AI, and also integrating with sibling systems and testing what you expected to happen in the document versus what actually happened. That may be vis-à-vis an integration with ERP or something like that. >> It's pretty amazing, because as we know, the stuff learns fast. >> It does. >> From watching that happen with the chess and the Go and everything else, and you read some of the books about exponential curves, you'll get down that path probably faster than we think. >> Yes. >> Well, Bob, Jason, thanks for taking a few minutes, and again thanks for inviting us to this cool event, and everybody come on down, there's lots of free food and drinks. >> Come down to the Thirsty Bear. >> Thanks so much. >> Alright, he's Bob, he's Jason, I'm Jeff, you're watching theCUBE. We're at the Conga Connect West event at Dreamforce at the Thirsty Bear, come on down and see us.
Thanks for watching. (energetic electronic music)

Published Date : Sep 25 2018



Rob Thomas, IBM | Change the Game: Winning With AI 2018


 

>> [Announcer] Live from Times Square in New York City, it's theCUBE covering IBM's Change the Game: Winning with AI, brought to you by IBM. >> Hello everybody, welcome to theCUBE's special presentation. We're covering IBM's announcements today around AI. IBM, as theCUBE does, runs a number of sessions and programs in conjunction with Strata, which is down at the Javits, and we're here with Rob Thomas, who's the General Manager of IBM Analytics. Long time Cube alum, Rob, great to see you. >> Dave, great to see you. >> So you guys got a lot going on today. We're here at the Westin Hotel, you've got an analyst event, you've got a partner meeting, you've got an event tonight, Change the Game: Winning with AI, at Terminal 5, check that out, ibm.com/WinWithAI, go register there. But Rob, let's start with what you guys have going on, give us the rundown. >> Yeah, it's a big week for us, and like many others, it's great when you have Strata, a lot of people in town. So, we've structured a week where, today, we're going to spend a lot of time with analysts and our business partners, talking about where we're going with data and AI. This evening, we've got a broadcast, it's called Winning with AI. What's unique about that broadcast is it's all clients. We've got clients on stage doing demonstrations, how they're using IBM technology to get to unique outcomes in their business. So I think it's going to be a pretty unique event, which should be a lot of fun. >> So this place, it looks like a cool event, a venue, Terminal 5, it's just up the street on the West Side Highway, probably a mile from the Javits Center, so definitely check that out. Alright, let's talk about, Rob, we've known each other for a long time, we've seen the early Hadoop days, you guys were very careful about diving in, you kind of let things settle and watched very carefully, and then came in at the right time.
But we saw the evolution of so-called Big Data go from a phase of really reducing investments, cheaper data warehousing, and what that did is allowed people to collect a lot more data, and kind of get ready for this era that we're in now. But maybe you can give us your perspective on the phases, the waves that we've seen of data, and where we are today and where we're going. >> I kind of think of it as a maturity curve. So when I go talk to clients, I say, look, you need to be on a journey towards AI. I think probably nobody disagrees that they need something there, the question is, how do you get there? So you think about the steps, it's about, a lot of people started with, we're going to reduce the cost of our operations, we're going to use data to take out cost, that was kind of the Hadoop thrust, I would say. Then they moved to, well, now we need to see more about our data, we need higher performance data, BI data warehousing. So, everybody, I would say, has dabbled in those two area. The next leap forward is self-service analytics, so how do you actually empower everybody in your organization to use and access data? And the next step beyond that is, can I use AI to drive new business models, new levers of growth, for my business? So, I ask clients, pin yourself on this journey, most are, depends on the division or the part of the company, they're at different areas, but as I tell everybody, if you don't know where you are and you don't know where you want to go, you're just going to wind around, so I try to get them to pin down, where are you versus where do you want to go? >> So four phases, basically, the sort of cheap data store, the BI data warehouse modernization, self-service analytics, a big part of that is data science and data science collaboration, you guys have a lot of investments there, and then new business models with AI automation running on top. Where are we today? 
Would you say we're kind of in-between BI/DW modernization and on our way to self-service analytics, or what's your sense? >> I'd say most are right in the middle between BI data warehousing and self-service analytics. Self-service analytics is hard, because it requires you, sometimes, to take a couple steps back and look at your data. It's hard to provide self-service if you don't have a data catalog, if you don't have data security, if you haven't gone through the processes around data governance. So, sometimes you have to take one step back to go two steps forward; that's why I see a lot of people, I'd say, stuck in the middle right now. And the examples that you're going to see tonight as part of the broadcast are clients that have figured out how to break through that wall, and I think that's pretty illustrative of what's possible. >> Okay, so you're saying they've got to maybe take a step back and get the infrastructure right with, let's say, a catalog, to give some basic things that they have to do, some x's and o's, you've got the Vince Lombardi playbook out here, and also, skillsets, I imagine, is a key part of that. So, that's what they've got to do to get prepared, and then, what's next? They start creating new business models, and I imagine this is where the chief data officer comes in, and it's at an executive level. What are you seeing with clients as part of digital transformation, what's the conversation like with customers? >> The biggest change, the great thing about the times we live in, is technology's become so accessible, you can do things very quickly. We created a team last year called Data Science Elite, and we've hired what we think are some of the best data scientists in the world. Their only job is to go work with clients and help them get to a first success with data science. So, we put a team in.
Normally, one month, two months, normally a team of two or three people, our investment, and we say, let's go build a model, let's get to an outcome, and you can do this incredibly quickly now. I tell clients, if I see somebody that says, we're going to spend six months evaluating and thinking about this, I'm like, why would you spend six months thinking about this when you could actually do it in one month? So you just need to get over the edge and go try it. >> So we're going to learn more about the Data Science Elite team. We've got John Thomas coming on today, who is a distinguished engineer at IBM, and he's very much involved in that team, and I think we have a customer who's actually gone through that, so we're going to talk about what their experience was with the Data Science Elite team. Alright, you've got some hard news coming up, you've actually made some news earlier with Hortonworks and Red Hat, I want to talk about that, but you've also got some hard news today. Take us through that. >> Yeah, let's talk about all three. First, Monday we announced the expanded relationship with both Hortonworks and Red Hat. This goes back to one of the core beliefs I talked about: every enterprise is modernizing their data and application estates, I don't think there's any debate about that. We are big believers in Kubernetes and containers as the architecture to drive that modernization. The announcement on Monday was, we're working closer with Red Hat to take all of our data services as part of Cloud Private for Data, which are basically microservices for data, and we're running those on OpenShift, and we're starting to see great customer traction with that. And where does Hortonworks come in? Hadoop has been the outlier on moving to microservices and containers; we're working with Hortonworks to help them make that move as well. So, it's really about the three of us getting together and helping clients with this modernization journey.
>> So, just to remind people, you remember ODPI, folks? It was all this kerfuffle about, why do we even need this? Well, what's interesting to me about this triumvirate is, well, first of all, Red Hat and Hortonworks are hardcore open source, and IBM's always been a big supporter of open source. You three got together, and you're now proving out the productivity of this relationship for customers. You guys don't talk about this, but Hortonworks said, on its public call, that the relationship with IBM drove many, many seven-figure deals, which obviously means that customers are getting value out of this, so it's great to see that come to fruition, and it wasn't just a Barney announcement a couple years ago, so congratulations on that. Now, there's this other news that you guys announced this morning, talk about that. >> Yeah, two other things. One is, we announced a relationship with Stack Overflow. 50 million developers go to Stack Overflow a month; it's an amazing environment for developers that are looking to do new things, and we're sponsoring a community around AI. Back to your point before, you asked, is there a skills gap in enterprises? There absolutely is, I don't think that's a surprise. Data science, AI developers, not every company has the skills they need, so we're sponsoring a community to help drive the growth of skills in and around data science and AI. So things like Python, R, Scala, these are the languages of data science, and it's a great relationship with us and Stack Overflow to build a community to get things going on skills. >> Okay, and then there was one more. >> Last one's a product announcement. This is one of the most interesting product announcements we've had in quite a while. Imagine this: you write a SQL query, and the traditional approach is, I've got a server, I point it at that server, I get the data, it's pretty limited. We're announcing technology where I write a query, and it can find data anywhere in the world.
I think of it as wide-area SQL. So it can find data on an automotive device, a telematics device, an IoT device, it could be a mobile device; we think of it as SQL for the whole world. You write a query, you can find the data anywhere it is, and we take advantage of the processing power on the edge. The biggest problem with IoT is, it's been the old mantra of, go find the data, bring it all back to a centralized warehouse, and that makes it impossible to do it real time. We're enabling real time because we can write a query once and find data anywhere. This is technology we've had in preview for the last year. We've been working with a lot of clients to prove out use cases to do it, and we're integrating it as a capability inside of IBM Cloud Private for Data. So if you buy IBM Cloud Private for Data, it's there. >> Interesting, so when you've been around as long as I have, long enough to see some of the pendulum swings, and it's clearly a pendulum swing back toward decentralization and the edge, but the key is, from what you just described, is you're sort of redefining the boundary, so I presume it's the edge, any Cloud, or on premises, where you can find that data, is that correct? >> Yeah, so it's multi-Cloud. I mean, look, every organization is going to be multi-Cloud, like 100%, that's going to happen, and that could be private, it could be multiple public Cloud providers, but the key point is, data on the edge is not just limited to what's in those Clouds. It could be anywhere that you're collecting data. And, we're enabling an architecture which performs incredibly well, because you take advantage of processing power on the edge, where you can get data anywhere that it sits. >> Okay, so, then, I'm setting up a Cloud, I'll call it a Cloud architecture, that encompasses the edge, where essentially, there are no boundaries, and you're bringing security. We talked about containers before, we've been talking about Kubernetes all week here at a Big Data show.
And then of course, Cloud, and what's interesting, I think many of the Hadoop distro vendors kind of missed Cloud early on, and now are sort of saying, oh wow, it's a hybrid world and we've got a part to play. You guys obviously made some moves, a couple billion-dollar moves, to do some acquisitions and get hardcore into Cloud, so that becomes a critical component. You're not just limiting your scope to the IBM Cloud. You're recognizing that it's a multi-Cloud world; that's what customers want to do. Your comments. >> It's multi-Cloud, and it's not just the IBM Cloud, I think the most predominant Cloud that's emerging is every client's private Cloud. Every client I talk to is building out a containerized architecture. They need their own Cloud, and they need seamless connectivity to any public Cloud that they may be using. This is why you see such a premium being put on things like data ingestion, data curation. It's not popular, it's not exciting, people don't want to talk about it, but the biggest inhibitors, to this AI point, come back to data curation, data ingestion, because if you're dealing with multiple Clouds, suddenly your data's in a bunch of different spots. >> Well, so you're basically, and we talked about this a lot on theCUBE, you're bringing the Cloud model to the data, wherever the data lives. Is that the right way to think about it? >> I think organizations have spoken, set aside what they say, look at their actions. Their actions say, we don't want to move all of our data to any particular Cloud, we'll move some of our data. We need to give them seamless connectivity so that they can leave their data where they want, we can bring Cloud-Native Architecture to their data, we could also help move their data to a Cloud-Native architecture if that's what they prefer.
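The "write a query once, find data anywhere" idea Rob describes can be illustrated with a toy federation: the same SQL is pushed down to several data sources standing in for edge devices, each evaluates the predicate locally, and only the small matching result sets come back to be merged. Here sqlite3 in-memory databases play the role of the edge; the telemetry schema and readings are invented for the example and have nothing to do with IBM's actual implementation.

```python
# Toy wide-area query: one SQL statement, many sources, predicate
# evaluated at each source so only matches travel to the center.
import sqlite3

def make_edge(readings):
    """Create an in-memory database standing in for one edge device."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE telemetry (device TEXT, temp REAL)")
    db.executemany("INSERT INTO telemetry VALUES (?, ?)", readings)
    return db

edges = [
    make_edge([("truck-1", 71.2), ("truck-2", 98.6)]),
    make_edge([("truck-3", 64.0), ("truck-4", 101.4)]),
]

QUERY = "SELECT device, temp FROM telemetry WHERE temp > 95"

# Each edge runs the filter itself; the center only merges the hits.
hot = [row for db in edges for row in db.execute(QUERY)]
```

The point of the pattern is the last line: instead of hauling all telemetry back to a warehouse, the filtering work rides out to where the data sits, which is what makes the real-time case feasible.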
>> Well, it makes sense, because you've got physics, latency, you've got economics, moving all the data into a public Cloud is expensive and just doesn't make economic sense, and then you've got things like GDPR, which says, well, you have to keep the data, certain laws of the land, if you will, that say, you've got to keep the data in whatever it is, in Germany, or whatever country. So those sort of edicts dictate how you approach managing workloads and what you put where, right? Okay, what's going on with Watson? Give us the update there. >> I get a lot of questions, people trying to peel back the onion of what exactly is it? So, I want to make that super clear here. Watson is a few things, start at the bottom. You need a runtime for models that you've built. So we have a product called Watson Machine Learning, runs anywhere you want, that is the runtime for how you execute models that you've built. Anytime you have a runtime, you need somewhere where you can build models, you need a development environment. That is called Watson Studio. So, we had a product called Data Science Experience, we've evolved that into Watson Studio, connecting in some of those features. So we have Watson Studio, that's the development environment, Watson Machine Learning, that's the runtime. Now you move further up the stack. We have a set of APIs that bring in human features, vision, natural language processing, audio analytics, those types of things. You can integrate those as part of a model that you build. And then on top of that, we've got things like Watson Applications, we've got Watson for call centers, doing customer service and chatbots, and then we've got a lot of clients who've taken pieces of that stack and built their own AI solutions. They've taken some of the APIs, they've taken some of the design time, the studio, they've taken some of the Watson Machine Learning. 
So, it is really a stack of capabilities, and where we're driving the greatest productivity, and this shows up in a lot of the examples you'll see tonight from clients, is with clients that have bought into this idea of, I need a development environment, I need a runtime, where I can deploy models anywhere. We're getting a lot of momentum on that, and then that raises the question of, well, do I have explainability, do I have trust and transparency, and that's another thing that we're working on. >> Okay, so there's an API-oriented architecture, exposing all these services, making it very easy for people to consume. Okay, so we've been talking all week at CUBE NYC about big data and AI: is this old wine, new bottle? I mean, it's clear, Rob, from the conversation here, there's a lot of substantive innovation, and early adoption, anyway, of some of these innovations, but a lot of potential going forward. Last thoughts? >> What people have to realize is AI is not magic, it's still computer science. So it actually requires some hard work. You need to roll up your sleeves, you need to understand how to get from point A to point B, you need a development environment, you need a runtime. I want people to really think about this, it's not magic. I think for a while, people have gotten the impression that there's some magic button. There's not, but if you put in the time, and it's not a lot of time, you'll see the examples tonight, most of them have been done in one or two months, there's great business value in starting to leverage AI in your business. >> Awesome, alright, so if you're in this city or you're at Strata, go to ibm.com/WinWithAI, register for the event tonight. Rob, we'll see you there, thanks so much for coming back. >> Yeah, it's going to be fun, thanks Dave, great to see you. >> Alright, keep it right there everybody, we'll be back with our next guest right after this short break, you're watching theCUBE.
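The separation Rob describes, a studio where models are built and a runtime that only executes them, reduces to a generic pattern: train, export an artifact, load it elsewhere, serve predictions. The sketch below is not the Watson Studio or Watson Machine Learning API, just a stdlib-only illustration of that build/run split using a trivial threshold "model" and an invented dataset.

```python
# Build/run split: the "studio" side produces a serialized artifact, and
# the "runtime" side knows nothing about training, it only loads the
# artifact and predicts. All numbers here are made up for the sketch.
import json

# --- studio side: "train" and export ---
labeled = [(0.2, 0), (0.8, 0), (1.9, 1), (2.7, 1)]     # (feature, label)
threshold = sum(x for x, _ in labeled) / len(labeled)  # toy training step
artifact = json.dumps({"threshold": threshold})        # the deployable model

# --- runtime side: load the artifact and predict only ---
params = json.loads(artifact)

def predict(x):
    return 1 if x > params["threshold"] else 0
```

A real stack swaps the JSON blob for a trained model artifact and the function for a scoring endpoint, but the contract between the two halves, an artifact the runtime can execute anywhere, is the same.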

Published Date : Sep 18 2018



Influencer Panel | theCUBE NYC 2018


 

- [Announcer] Live, from New York, it's theCUBE. Covering theCUBE New York City 2018. Brought to you by SiliconANGLE Media, and its ecosystem partners. - Hello everyone, welcome back to CUBE NYC. This is a CUBE special presentation of something that we've done now for the past couple of years. IBM has sponsored an influencer panel on some of the hottest topics in the industry, and of course, there's no hotter topic right now than AI. So, we've got nine of the top influencers in the AI space, and we're in Hell's Kitchen, and it's going to get hot in here. (laughing) And these guys, we're going to cover the gamut. So, first of all, folks, thanks so much for joining us today, really, as John said earlier, we love the collaboration with you all, and we'll definitely see you on social after the fact. I'm Dave Vellante, with my cohost for this session, Peter Burris, and again, thank you to IBM for sponsoring this and organizing this. IBM has a big event down here, in conjunction with Strata, called Change the Game, Winning with AI. We run theCUBE NYC, we've been here all week. So, here's the format. I'm going to kick it off, and then we'll see where it goes. So, I'm going to introduce each of the panelists, and then ask you guys to answer a question, I'm sorry, first, tell us a little bit about yourself, briefly, and then answer one of the following questions. Two big themes that have come up this week. One has been, because this is our ninth year covering what used to be Hadoop World, which kind of morphed into big data. Question is, AI, big data, same wine, new bottle? Or is it really substantive, and driving business value? So, that's one question to ponder. The other one is, you've heard the term, the phrase, data is the new oil. Is data really the new oil? Wonder what you think about that? Okay, so, Chris Penn, let's start with you. Chris is cofounder of Trust Insight, long time CUBE alum, and friend. Thanks for coming on. 
Tell us a little bit about yourself, and then pick one of those questions. - Sure, we're a data science consulting firm. We're an IBM business partner. When it comes to "data is the new oil," I love that expression because it's completely accurate. Crude oil is useless, you have to extract it out of the ground, refine it, and then bring it to distribution. Data is the same way, where you have to have developers and data architects get the data out. You need data scientists and tools, like Watson Studio, to refine it, and then you need to put it into production, and that's where marketing technologists, technologists, business analytics folks, and tools like Watson Machine Learning help bring the data and make it useful. - Okay, great, thank you. Tony Flath is a tech and media consultant, focus on cloud and cyber security, welcome. - Thank you. - Tell us a little bit about yourself and your thoughts on one of those questions. - Sure thing, well, thanks so much for having us on this show, really appreciate it. My background is in cloud, cyber security, and certainly in emerging tech with artificial intelligence. Certainly touched it from a cyber security play, how you can use machine learning, machine control, for better controlling security across the gamut. But I'll touch on your question about wine, is it a new bottle, new wine? Where does this come from, from artificial intelligence? And I really see it as a whole new wine that is coming along. When you look at emerging technology, and you look at all the deep learning that's happening, it's going just beyond being able to machine learn and know what's happening, it's making some meaning to that data. And things are being done with that data, from robotics, from automation, from all kinds of different things, where we're at a point in society where data, our technology is getting beyond us. Prior to this, it's always been command and control. You control data from a keyboard. Well, this is passing us. 
So, my passion and perspective on this is, the humanization of it, of IT. How do you ensure that people are in that process, right? - Excellent, and we're going to come back and talk about that. - Thanks so much. - Carla Gentry, @DataNerd? Great to see you live, as opposed to just in the ether on Twitter. Data scientist, and owner of Analytical Solution. Welcome, your thoughts? - Thank you for having us. Mine is, is data the new oil? And I'd like to rephrase that is, data equals human lives. So, with all the other artificial intelligence and everything that's going on, and all the algorithms and models that's being created, we have to think about things being biased, being fair, and understand that this data has impacts on people's lives. - Great. Steve Ardire, my paisan. - Paisan. - AI startup adviser, welcome, thanks for coming to theCUBE. - Thanks Dave. So, uh, my first career was geology, and I view AI as the new oil, but data is the new oil, but AI is the refinery. I've used that many times before. In fact, really, I've moved from just AI to augmented intelligence. So, augmented intelligence is really the way forward. This was a presentation I gave at IBM Think last spring, has almost 100,000 impressions right now, and the fundamental reason why is machines can attend to vastly more information than humans, but you still need humans in the loop, and we can talk about what they're bringing in terms of common sense reasoning, because big data does the who, what, when, and where, but not the why, and why is really the Holy Grail for causal analysis and reasoning. - Excellent, Bob Hayes, Business Over Broadway, welcome, great to see you again. - Thanks for having me. So, my background is in psychology, industrial psychology, and I'm interested in things like customer experience, data science, machine learning, so forth. And I'll answer the question around big data versus AI. 
And I think there are other terms we could talk about, big data, data science, machine learning, AI. And to me, it's kind of all the same. It's always been about analytics, and getting value from your data, big, small, what have you. And there's subtle differences among those terms. Machine learning is just about making a prediction, and knowing if things are classified correctly. Data science is more about understanding why things work, and understanding maybe the ethics behind it, what variables are predicting that outcome. But still, it's all the same thing, it's all about using data in a way that we can get value from it, as a society, in residences. - Excellent, thank you. Theo Lau, founder of Unconventional Ventures. What's your story? - Yeah, so, my background is driving technology innovation. So, together with my partner, what our work does is we work with organizations to try to help them leverage technology to drive systematic financial wellness. We connect founders, startup founders, with funders, we help them get money in the ecosystem. We also work with them to look at, how do we leverage emerging technology to do something good for the society. So, very much on point to what Bob was saying. So when I look at AI, it is not new, right, it's been around for quite a while. But what's different is the amount of technological power that we have allows us to do so much more than what we were able to do before. And so, what my mantra is, great ideas can come from anywhere in the society, but it's our job to be able to leverage technology to shine a spotlight on people who can use this to do something different, to help seniors in our country to do better in their financial planning. - Okay, so, in your mind, it's not just a same wine, new bottle, it's more substantive than that. - [Theo] It's more substantive, it's a much better bottle. - Karen Lopez, senior project manager for Architect InfoAdvisors, welcome. - Thank you.
So, I'm DataChick on Twitter, and that kind of tells you my focus. I also call myself a data evangelist, and that means I'm there at organizations helping stand up for the data, because to me, that's the proxy for standing up for the people, and the places and the events that that data describes. That means I have a focus on security, data privacy and protection as well. And I'm going to kind of combine your two questions, about the new oil and the new wine bottle. Oh, see, now I'm talking about alcohol. (laughing) But anyway, you know, all analogies are imperfect, so whether we say it's the new wine, or, you know, same wine, or whether it's oil, the analogy's good for both of them. But unlike oil, the amount of data's just growing like crazy, and with oil, we know at some point we'll hit peak oil; I kind of doubt that we're going to hit peak data, where we have not enough data, like we're going to do with oil. But that says to me that, how did we get here with big data, with machine learning and AI? And from my point of view, as someone who's been focused on data for 35 years, we have hit this perfect storm of open source technologies, cloud architectures and cloud services, data innovation, that if we didn't have those, we wouldn't be talking about large machine learning and deep learning-type things. So, because we have all these things coming together at the same time, we're now at explosions of data, which means we also have to protect it, and protect the people from those doing harm with data, we need to do data for good things, and all of that. - Great, definite differences, we're not running out of data, data's like the terrible tribbles. (laughing) - Yes, but it's very cuddly, data is. - Yeah, cuddly data. Mark Lynd, founder of Relevant Track? - That's right. - I like the name. What's your story? - Well, thank you, and it actually plays into what my interest is. It's mainly around AI in enterprise operations and cyber security.
You know, these teams, both in enterprise operations, it can be sales, marketing, all the way through the organization, as well as in cyber security, they're often under-sourced. And they need, what Steve pointed out, they need augmented intelligence, they need to take AI, the big data, all the information they have, and make use of that in a way where they're able to, even though they're under-sourced, make some use and some value for the organization, you know, make better use of the resources they have to grow and support the strategic goals of the organization. And oftentimes, when you get to budgeting, it doesn't really align, you know, you're short people, you're short time, but the data continues to grow, as Karen pointed out. So, when you take those together, using AI to augment, to provide augmented intelligence, to help them get through that data, make real tangible decisions based on information versus just raw data, especially around cyber security, which is a big hit right now, is really a great place to be, and there's a lot of stuff going on, and a lot of exciting stuff in that area. - Great, thank you. Kevin L. Jackson, author and founder of GovCloud. GovCloud, that's big. - Yeah, GovCloud Network. Thank you very much for having me on the show. I've been working on cloud computing, initially in the federal government, with the intelligence community, as they adopted cloud computing for a lot of the nation's major missions. And what has happened is now I'm working a lot with commercial organizations and with the security of that data. And I'm going to sort of, on your questions, piggyback on Karen. There was a time when you would get a couple of bottles of wine, and they would come in, and you would savor that wine, and sip it, and it would take a few days to get through it, and you would enjoy it. The problem now is that you don't get a couple of bottles of wine into your house, you get two or three tankers of data.
So, it's not that it's a new wine, you're just getting a lot of it. And the infrastructures that you need, before you could have a couple of computers, and a couple of people, now you need cloud, you need automated infrastructures, you need huge capabilities, and artificial intelligence, it's what we can use as the tool on top of these huge infrastructures to drink that, you know. - Fire hose of wine. - Fire hose of wine. (laughs) - Everybody's having a good time. - Everybody's having a great time. (laughs) - Yeah, things are booming right now. Excellent, well, thank you all for those intros. Peter, I want to ask you a question. So, I heard there are some similarities and some definite differences with regard to data being the new oil. You have a perspective on this, and I wonder if you could inject it into the conversation. - Sure, so, the perspective that we take in a lot of conversations, a lot of folks here in theCUBE, what we've learned, and I'll kind of answer both questions a little bit. First off, on the question of data as the new oil, we definitely think that data is the new asset that business is going to be built on, in fact, our perspective is that there really is a difference between business and digital business, and that difference is data as an asset. And if you want to understand digital transformation, you have to understand the degree to which business is reinstitutionalizing work, reorganizing its people, and reestablishing its mission around what you can do with data as an asset. The difference between data and oil is that oil still follows the economics of scarcity. Data is one of those things, you can copy it, you can share it, you can easily corrupt it, you can mess it up, you can do all kinds of awful things with it if you're not careful.
And it's that core fundamental proposition that as an asset, when we think about cyber security, we think, in many respects, that is the approach to how we can go about privatizing data so that we can predict who's actually going to be able to appropriate returns on it. So, it's a good analogy, but as you said, it's not entirely perfect, but it's not perfect in a really fundamental way. It's not following the laws of scarcity, and that has an enormous effect. - In other words, I could put oil in my car, or I could put oil in my house, but I can't put the same oil in both. - Can't put it in both places. And now, the issue of the wine, I think it's, we think that it is, in fact, it is a new wine, and very simple abstraction, or generalization we come up with is the issue of agency. That analytics has historically not taken on agency, it hasn't acted on behalf of the brand. AI is going to act on behalf of the brand. Now, you're going to need both of them, you can't separate them. - A lot of implications there in terms of bias. - Absolutely. - In terms of privacy. You have a thought, here, Chris? - Well, the scarcity is our compute power, and our ability for us to process it. I mean, it's the same as oil, there's a ton of oil under the ground, right, we can't get to it as efficiently, or without severe environmental consequences to use it. Yeah, when you use it, it's transformed, but our scarcity is compute power, and our ability to use it intelligently. - Or even when you find it. I have data, I can apply it to six different applications, I have oil, I can apply it to one, and that's going to matter in how we think about work. - But one thing I'd like to add, sort of, you're talking about data as an asset. The issue we're having right now is we're trying to learn how to manage that asset. Artificial intelligence is a way of managing that asset, and that's important if you're going to use and leverage big data. 
- Yeah, but see, everybody's talking about the quantity, the quantity, it's not always the quantity. You know, we can have just oodles and oodles of data, but if it's not clean data, if it's not alphanumeric data, which is what's needed for machine learning, it doesn't do you much good. So, having lots of data is great, but you have to think about the signal versus the noise. So, sometimes you get so much data, you're looking at over-fitting, sometimes you get so much data, you're looking at biases within the data. So, it's not the amount of data, it's, now that we have all of this data, making sure that we look at relevant data, making sure we look at clean data. - One more thought, and we have a lot to cover, I want to get inside your big brain. - I was just thinking about it from a cyber security perspective, one of my customers, they were looking at the data that just comes from the perimeter, your firewalls, routers, all of that, and then not even looking internally, just the perimeter alone, and the amount of data being pulled off of those. And then trying to correlate that data so it makes some type of business sense, or they can determine if there's incidents that may happen, and take a predictive action, or threats that might be there because they haven't taken a certain action prior, it's overwhelming to them. So, having AI now to be able to go through the logs, and there's so many different types of data that come into those logs, but being able to pull that information, as well as looking at endpoints, and all that, and people's houses, which are an extension of the network oftentimes, it's an amazing amount of data, and they're only looking at a small portion today because they know, there's not enough resources, there's not enough trained people to do all that work. So, AI is a wonderful way of doing that.
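Carla's signal-versus-noise point can be sketched in a few lines (a hypothetical Python toy, not something discussed on the panel): a model that simply memorizes pure noise looks perfect on its training data and falls to chance level on new data, which is exactly what over-fitting means.

```python
import random

random.seed(42)

def make_noise_data(n):
    # Pure noise: the features carry no information about the label.
    return [([random.random() for _ in range(5)], random.randint(0, 1))
            for _ in range(n)]

train = make_noise_data(200)
test = make_noise_data(200)

def predict(x, data):
    # 1-nearest-neighbour: effectively memorises the training set outright.
    nearest = min(data, key=lambda row: sum((a - b) ** 2 for a, b in zip(row[0], x)))
    return nearest[1]

def accuracy(data, model_data):
    return sum(predict(x, model_data) == y for x, y in data) / len(data)

print(accuracy(train, train))  # 1.0 -- perfectly "learned" the noise
print(accuracy(test, train))   # roughly 0.5 -- chance level on fresh data
```

More data does not fix this if the data itself is noise; only relevant, clean signal does.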
And some of the tools now are starting to mature and be sophisticated enough where they provide that augmented intelligence that Steve talked about earlier. - So, it's complicated. There's infrastructure, there's security, there's a lot of software, there's skills, and on and on. At IBM Think this year, Ginni Rometty talked about, there were a couple of themes, one was augmented intelligence, that was something that was clear. She also talked a lot about privacy, and you own your data, etc. One of the things that struck me was her discussion about incumbent disruptors. So, if you look at the top five companies, roughly, Facebook with fake news has dropped down a little bit, but top five companies in terms of market cap in the US. They're data companies, all right. Apple just hit a trillion, Amazon, Google, etc. How do those incumbents close the gap? Is that concept of incumbent disruptors actually something that is being put into practice? I mean, you guys work with a lot of practitioners. How are they going to close that gap with the data haves, meaning data at their core of their business, versus the data have-nots, it's not that they don't have a lot of data, but it's in silos, it's hard to get to? - Yeah, I got one more thing, so, you know, these companies, and whoever's going to be big next is, you have a digital persona, whether you want it or not. So, if you live in a farm out in the middle of Oklahoma, you still have a digital persona, people are collecting data on you, they're putting profiles of you, and the big companies know about you, and people that first interact with you, they're going to know that you have this digital persona. 
Personal AI, when AI from these companies could be used simply and easily, from a personal deal, to fill in those gaps, and to have a digital persona that supports your family, your growth, both personal and professional growth, and those type of things, there's a lot of applications for AI on a personal, enterprise, even small business, that have not been done yet, but the data is being collected now. So, you talk about the oil, the oil is being built right now, lots, and lots, and lots of it. It's the applications to use that, and turn that into something personally, professionally, educationally, powerful, that's what's missing. But it's coming. - Thank you, so, I'll add to that, and in answer to your question you raised. So, one example we always used in banking is, if you look at the big banks, right, and then you look at from a consumer perspective, and there's a lot of talk about Amazon being a bank. But the thing is, Amazon doesn't need to be a bank, they provide banking services, from a consumer perspective they don't really care if you're a bank or you're not a bank, but what's different between Amazon and some of the banks is that Amazon, like you say, has a lot of data, and they know how to make use of the data to offer something as relevant that consumers want. Whereas banks, they have a lot of data, but they're all silos, right. So, it's not just a matter of whether or not you have the data, it's also, can you actually access it and make something useful out of it so that you can create something that consumers want? Because otherwise, you're just a pipe. - Totally agree, like, when you look at it from a perspective of, there's a lot of terms out there, digital transformation is thrown out so much, right, and go to cloud, and you migrate to cloud, and you're going to take everything over, but really, when you look at it, and you both touched on it, it's the economics. 
You have to look at the data from an economics perspective, and how do you find some way to make this data meaningful to your customers, that's going to work effectively for them, that they're going to drive? So, when you look at the big, big cloud providers, I think the push in things that's going to happen in the next few years is there's just going to be a bigger migration to public cloud. So then, between those, they have to differentiate themselves. The obvious one is artificial intelligence, in a way that makes it easy to aggregate data from across platforms, to aggregate data from multi-cloud, effectively. To use that data in a meaningful way that's going to drive, not only better decisions for your business, and better outcomes, but drives opportunities for customers, drives opportunities for employees and how they work. We're at a really interesting point in technology where we get to tell technology what to do. It's going beyond us, it's no longer what we're telling it to do, it's going to go beyond us. So, how we effectively manage that is going to be where we see that data flow, and those big five or big four, really take that to the next level. - Now, one of the things that Ginni Rometty said was, I forget the exact stat, but it was like, 80% of the data is not searchable. Kind of implying that it's sitting somewhere behind a firewall, presumably on somebody's premises. So, it was kind of interesting. You're talking about, certainly, a lot of momentum for public cloud, but at the same time, a lot of data is going to stay where it is. - Yeah, we're assuming that a lot of this data is just sitting there, available and ready, and we look at the desperate, or disparate kind of database situation, where you have 29 databases, and two of them have unique identifiers that tie together, and the rest of them don't. So, there's nothing that you can do with that data.
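The 29-databases problem can be sketched in a toy example (hypothetical Python, with made-up identifiers and records, not any real system): records sitting in separate silos can only be joined where they actually share a common identifier, and everything else stays stranded.

```python
# Two "silos" keyed by different identifier schemes.
crm = {"C-1001": {"name": "Ada"}, "C-1002": {"name": "Grace"}}
billing = {"ACCT-77": {"balance": 120.0}, "C-1002": {"balance": 80.0}}

# Records can only be joined where the identifiers actually match.
linked = {k: {**crm[k], **billing[k]} for k in crm.keys() & billing.keys()}
print(linked)  # {'C-1002': {'name': 'Grace', 'balance': 80.0}}

# Everything else is stranded in its silo -- data you "have" but can't use.
orphaned = (crm.keys() | billing.keys()) - linked.keys()
print(sorted(orphaned))  # ['ACCT-77', 'C-1001']
```

This is why master data management comes up right after: without shared keys, having the data is not the same as being able to use it.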
So, artificial intelligence is just that, it's artificial intelligence, so, you know, that's machine learning, that's natural language, that's classification, there's a lot of different parts of that that are moving, but we also have to have IT, good data infrastructure, master data management, compliance, there's so many moving parts to this, that it's not just about the data anymore. - I want to ask Steve to chime in here, go ahead. - Yeah, so, we also have to change the mentality that it's not just enterprise data. There's data on the web, the biggest thing is Internet of Things, the amount of sensor data will make the current data look like chump change. So, data is moving faster, okay. And this is where the sophistication of machine learning needs to kick in, going from just mostly supervised learning today, to unsupervised learning. And in order to really get into it, as I said, big data and current AI do the who, what, where, when, and how, but not the why. And this is really the Holy Grail to crack, and it's actually under a new moniker, it's called explainable AI, because it moves beyond just correlation into root cause analysis. Once we have that, then you have the means to be able to tap into augmented intelligence, where humans are working with the machines. - Karen, please. - Yeah, so, one of the things, like what Carla was saying, and what a lot of us had said, I like to think the advent of ML technologies and AI is going to help me as a data architect to love my data better, right? So, that includes protecting it, but also, when you say that 80% of the data is unsearchable, it's not just an access problem, it's that no one knows what it was, what the sovereignty was, what the metadata was, what the quality was, or why there's huge anomalies in it. So, my favorite story about this is, in the 1980s, about, I forget the exact number, but like, 8 million children disappeared out of the US in April, on April 15th.
And that was when the IRS enacted a rule that, in order to have a dependent, a deduction for a dependent on your tax returns, they had to have a valid social security number, and people who had accidentally miscounted their children and over-claimed them over the years (laughter) stopped doing that. Well, some days it does feel like you have eight children running around. (laughter) - Agreed. - When that rule came about, literally, and they're not all children, because they're dependents, but literally millions of children disappeared off the face of the earth in April, but if you were doing analytics, or AI and ML, and you don't know that this anomaly happened, I can imagine in a hundred years, someone is saying some catastrophic event happened in April, 1983. (laughter) And what caused that, was it healthcare? Was it a meteor? Was it the clown attacking them? - That's where I was going. - Right. So, those are really important things that I want to use AI and ML to help me, not only document and capture that stuff, but to provide that information to the people, the data scientists and the analysts that are using the data. - Great story, thank you. Bob, you got a thought? You got the mic, go, jump in here. - Well, yeah, I do have a thought, actually. I was thinking about what Karen was talking about. I think it's really important that, not only that we understand AI, and machine learning, and data science, but that the regular folks and companies understand that, at the basic level. Because those are the people who will ask the questions, or who know what questions to ask of the data. And if they don't have the tools, and the knowledge of how to get access to that data, or even how to pose a question, then that data is going to be less valuable, I think, to companies. And the more that everybody knows about data, even people in congress. Remember when Zuckerberg talked about... (laughter) - That was scary. - How do you make money?
It's like, we all know this. But, we need to educate the masses on just basic data analytics. - We could have an hour-long panel on that. - Yeah, absolutely. - Peter, you and I were talking about, we had a couple of questions, sort of, how far can we take artificial intelligence? How far should we? You know, so that brings ethics and bias into the conversation, why don't you pick it up? - Yeah, so, one of the crucial things that we all are implying is that, at some point in time, AI is going to become a feature of the operations of our homes, our businesses. And as these technologies get more powerful, and they diffuse, and knowledge about how to use them diffuses more broadly, and you put more options into the hands of more people, the question slowly starts to turn from can we do it, to should we do it? And, one of the issues that I introduce is that I think the difference between big data and AI, specifically, is this notion of agency. The AI will act on behalf of, perhaps you, or it will act on behalf of your business. And that conversation is not being had, today. It's being had in arguments between Elon Musk and Mark Zuckerberg, which pretty quickly get pretty boring. (laughing) At the end of the day, the real question is, should this machine, whether in concert with others, or not, be acting on behalf of me, on behalf of my business, or, and when I say on behalf of me, I'm also talking about privacy. Because Facebook is acting on behalf of me, it's not just what's going on in my home. So, the question of, can it be done? A lot of things can be done, and an increasing number of things will be able to be done. We got to start having a conversation about should it be done? - So, humans exhibit tribal behavior, they exhibit bias. The machines are going to pick that up, go ahead, please. - Yeah, one thing that sort of tags onto the agency of artificial intelligence.
Every industry, every business is now about identifying information and data sources, and their appropriate sinks, and learning how to draw value out of connecting the sources with the sinks. Artificial intelligence enables you to identify those sources and sinks, and when it gets agency, it will be able to make decisions on your behalf about what data is good, what data means, and who it should be. - What actions are good. - Well, what actions are good. - And what data was used to make those actions. - Absolutely. - And was that the right data, and is there bias in the data? And all the way down, turtles all the way down. - So, all this, the data pedigree will be driven by the agency of artificial intelligence, and this is a big issue. - It's really fundamental to understand and educate people on, there are four fundamental types of bias in machine learning. There's intentional bias, "Hey, we're going to make the algorithm generate a certain outcome regardless of what the data says." There's the source of the data itself: if the models are trained on historical data that's flawed, the model will behave in a flawed way. There's target source, which is, for example, we know that if you pull data from a certain social network, that network itself has an inherent bias. No matter how representative you try to make the data, it's still going to have flaws in it. Or, if you pull healthcare data about, for example, African-Americans from the US healthcare system, because of societal biases, that data will always be flawed. And then there's tool bias, there's limitations to what the tools can do, and so we will intentionally exclude some kinds of data, or not use it because we don't know how to, our tools are not able to, and if we don't teach people what those biases are, they won't know to look for them, and I know.
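The "source of the data" bias described above can be shown in a toy sketch (hypothetical Python, with made-up groups, incomes, and approval labels, not anything from the panel): a naive model fitted to historically skewed decisions simply replays them for new, otherwise-identical applicants.

```python
from collections import defaultdict

# Historical lending decisions (the training data) encode a past bias:
# group "B" was approved far less often at comparable income levels.
# (group, income, approved?) -- all values invented for illustration.
history = [
    ("A", 50, 1), ("A", 55, 1), ("A", 45, 1), ("A", 40, 0),
    ("B", 50, 0), ("B", 55, 0), ("B", 45, 0), ("B", 60, 1),
]

# A naive "model": approve if applicants like this one were usually approved.
by_group = defaultdict(list)
for group, income, approved in history:
    by_group[group].append(approved)

def predict(group):
    votes = by_group[group]
    return int(sum(votes) / len(votes) >= 0.5)

# Two otherwise-identical applicants, different group:
# the model faithfully replays the old bias.
print(predict("A"), predict("B"))  # 1 0
```

Nothing in the algorithm is malicious; the flaw is entirely inherited from the training data, which is why the data's pedigree matters as much as the model.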
- Yeah, it's like, one of the things that we were talking about before, I mean, artificial intelligence is not going to just create itself, it's lines of code, it's input, and it spits out output. So, if it learns from these learning sets, we don't want AI to become another buzzword. We don't want everybody to be an "AI guru" that has no idea what AI is. It takes months, and months, and months for these machines to learn. These learning sets are so very important, because that input is how this machine learns. Think of it as your child; that's basically the way artificial intelligence is learning, like your child. You're feeding it these learning sets, and then eventually it will make its own decisions. So, we know from some of us having children that you teach them the best that you can, but then later on, when they're doing their own thing, they're really, it's like a little myna bird, they've heard everything that you've said. (laughing) Not only the things that you said to them directly, but the things that you said indirectly. - Well, there are some very good AI researchers that might disagree with that metaphor, exactly. (laughing) But, having said that, what I think is very interesting about this conversation is that this notion of bias, one of the things that fascinates me about where AI goes, are we going to find a situation where tribalism more deeply infects business? Because we know that human beings do not seek out the best information, they seek out information that reinforces their beliefs. And that happens in business today. My line of business versus your line of business, engineering versus sales, that happens today, but it happens at a planning level, and when we start talking about AI, we have to put in the appropriate dampers, understand the biases, so that we don't end up with deep tribalism inside of business. Because AI could have the deleterious effect that it actually starts ripping apart organizations.
- Well, input is data, and then the output is, could be a lot of things. - Could be a lot of things. - And that's where I said data equals human lives. So, we look at the case in New York where the penal system was using this artificial intelligence to make choices on people that were released from prison, and they saw that that was a miserable failure, because the people that were released actually re-offended, some committed murder and other things. So, I mean, it's, it's more than what anybody really thinks. It's not just, oh, well, we'll just train the machines, and a couple of weeks later they're good, we never have to touch them again. These things have to be continuously tweaked. So, just because you built an algorithm or a model doesn't mean you're done. You got to go back later, and continue to tweak these models. - Mark, you got the mic. - Yeah, no, I think, one thing, we've talked a lot about the data that's collected, but what about the data that's not collected? Incomplete profiles, incomplete datasets, that's a form of bias, and sometimes that's the worst. Because they'll fill that in, right, and then you can get some bias, but there's also a real issue for that around cyber security. Logs are not always complete, things are not always done, and when that happens, people make assumptions based on what they've collected, not what they didn't collect. So, when they're looking at this, and they're using the AI on it, that's only on the data collected, not on the data that wasn't collected. So, if something is down for a little while, and no data's collected off that, the assumption is, well, it was down, or it was impacted, or there was a breach, or whatever, it could be any of those.
So, you got to, there's still this human need, there's still the need for humans to look at the data and realize that there is the bias in there, there is, we're just looking at what data was collected, and you're going to have to make your own thoughts around that, and assumptions on how to actually use that data before you go make those decisions that can impact lots of people, at a human level, enterprise's profitability, things like that. And too often, people think of AI, when it comes out of there, that's the word. Well, it's not the word. - Can I ask a question about this? - Please. - Does that mean that we shouldn't act? - It does not. - Okay. - So, where's the fine line? - Yeah, I think. - Going back to this notion of can we do it, or should we do it? Should we act? - Yeah, I think you should do it, but you should use it for what it is. It's augmenting, it's helping you, assisting you to make a valued or good decision. And hopefully it's a better decision than you would've made without it. - I think it's great, I think also, your answer's right too, that you have to iterate faster, and faster, and faster, and discover sources of information, or sources of data that you're not currently using, and, that's why this thing starts getting really important. - I think you touch on a really good point about, should you or shouldn't you? You look at Google, and you look at the data that they've been using, and some of that out there, from a digital twin perspective, is not being approved, or not authorized, and even once they've made changes, it's still floating around out there. Where do you know where it is? So, there's this dilemma of, how do you have a digital twin that you want to have, and is going to work for you, and is going to do things for you to make your life easier, to do these things, mundane tasks, whatever? But how do you also control it to do things you don't want it to do? - Ad-based business models are inherently evil. 
(laughing) - Well, there are incentives to appropriate our data, and so, are things like blockchain potentially going to give users the ability to control their data? We'll see. - No, I, I'm sorry, but that's actually a really important point. The idea of consensus algorithms, whether it's blockchain or not, blockchain includes game theory, and something along those lines, whether it's Byzantine fault tolerance, or whether it's Paxos, consensus-based algorithms are going to be really, really important parts of this conversation, because the data's going to be more distributed, and you're going to have more elements participating in it. And so, something that allows, especially in the machine-to-machine world, which is a lot of what we're talking about right here, you may not have blockchain, because there's no need for a sense of incentive, which is what blockchain can help provide. - And there's no middleman. - And, well, all right, but there's really, the thing that makes blockchain so powerful is it liberates new classes of applications. But for a lot of the stuff that we're talking about, you can use a very powerful consensus algorithm without having the game-theory side, and do some really amazing things at scale. - So, looking at blockchain, that's a great thing to bring up, right. I think what's inherently wrong with the way we do things today, and the whole overall design of technology, whether it be on-prem, or off-prem, is that both the lock and the key are behind the same wall. Whether that wall is in a cloud, or behind a firewall. So, really, when there is an audit, or forensics, it always comes down to a sysadmin, or something else, and the system administrator will have the finger pointed at them, because it all resides in one place; you can edit it, you can augment it, or you can do things with it that you can't really determine. Now, take, as an example, blockchain, where you've got really the source of truth.
Now you can take and have the lock in one place, and the key in another place. So that's certainly going to be interesting to see how that unfolds. - So, one of the things, it's good that, we've hit a lot of buzzwords, right now, right? (laughing) AI, and ML, block. - Bingo. - We got the blockchain bingo, yeah, yeah. So, one of the things is, you also brought up, I mean, ethics and everything, and one of the things that I've noticed over the last year or so is that, as I attend briefings or demos, everyone is now claiming that their product is AI or ML-enabled, or blockchain-enabled. And when you try to get answers to the questions, what you really find out is that some things are being pushed as, because they have if-then statements somewhere in their code, and therefore that's artificial intelligence or machine learning. - [Peter] At least it's not "go-to." (laughing) - Yeah, you're that experienced as well. (laughing) So, I mean, this is part of the thing you try to do as a practitioner, as an analyst, as an influencer, is trying to, you know, cut through the hype of it all. And recently, I attended one where they said they use blockchain, and I couldn't figure it out, and it turns out they use GUIDs to identify things, and that's not blockchain, it's an identifier. (laughing) So, one of the ethics things that I think we, as an enterprise community, have to deal with, is the over-promising of AI, and ML, and deep learning, and recognition. It's not, I don't really consider it visual recognition services if they just look for red pixels. I mean, that's not quite the same thing. Yet, this is also making things much harder for your average CIO, or worse, CFO, to understand whether they're getting any value from these technologies. - Old bottle. - Old bottle, right.
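The distinction the panelist draws here, that a GUID is just a random identifier while a blockchain links records by cryptographic hashes, can be made concrete with a short sketch. This is plain illustrative Python, not any vendor's actual ledger implementation:

```python
import hashlib
import json
import uuid

# A GUID is just a random identifier -- it carries no linkage and no
# tamper-evidence. This is what the vendor in the anecdote was using.
record_id = str(uuid.uuid4())

# A hash chain (the core idea behind a blockchain ledger) links each record
# to the previous one, so editing any entry breaks every later link.
def add_block(chain, payload):
    prev_hash = chain[-1]["hash"] if chain else "0" * 64
    block = {"payload": payload, "prev_hash": prev_hash}
    block["hash"] = hashlib.sha256(
        json.dumps({"payload": payload, "prev_hash": prev_hash},
                   sort_keys=True).encode()
    ).hexdigest()
    chain.append(block)
    return chain

chain = []
add_block(chain, "sensor reading A")
add_block(chain, "sensor reading B")

# Tamper with the first block, then recompute its hash: it no longer matches
# the link stored in the second block, so the edit is detectable.
chain[0]["payload"] = "forged reading"
recomputed = hashlib.sha256(
    json.dumps({"payload": chain[0]["payload"],
                "prev_hash": chain[0]["prev_hash"]},
               sort_keys=True).encode()
).hexdigest()
print(recomputed == chain[1]["prev_hash"])  # False -- the chain detects the edit
```

This is also the mechanical basis of the "lock in one place, key in another" point: the integrity check can live apart from the data it protects.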
- And I wonder if the data companies, like that you talked about, or the top five, I'm more concerned about their nearly, or actual $1 trillion valuations having an impact on the ability of other companies to disrupt or enter into the field more so than their data technologies. Again, we're coming to another perfect storm of the companies that have data as their asset, even though it's still not on their financial statements, which is another indicator of whether it's really an asset, is that, do we need to think about the terms of AI, about whose hands it's in, and who's, like, once one large trillion-dollar company decides that you are not a profitable company, how many other companies are going to buy that data and make that decision about you? - Well, and for the first time in business history, I think, this is true, we're seeing, because of digital, because it's data, you're seeing tech companies traverse industries, get into, whether it's content, or music, or publishing, or groceries, and that's powerful, and that's awful scary. - If you're a manager, one of the things your ownership is asking you to do is to reduce asset specificities, so that their capital could be applied to more productive uses. Data reduces asset specificities. It brings into question the whole notion of vertical industry. You're absolutely right. But you know, one quick question I got for you, playing off of this is, again, it goes back to this notion of can we do it, and should we do it? I find it interesting, if you look at those top five, all data companies, but all of them are very different business models, or they can be classified into two different business models. Apple is transactional, Microsoft is transactional, Google is ad-based, Facebook is ad-based, before the fake news stuff. Amazon's kind of playing it both sides. - Yeah, they're kind of all on a collision course though, aren't they? - But, well, that's what's going to be interesting.
I think, at some point in time, the "can we do it, should we do it" question is, brands are going to be identified by whether or not they have gone through that process of thinking about, should we do it, and say no. Apple is clearly, for example, incorporating that into their brand. - Well, Silicon Valley, broadly defined, if I include Seattle, and maybe Armonk, not so much IBM. But they've got a dual disruption agenda, they've always disrupted horizontal tech. Now they're disrupting vertical industries. - I was actually just going to pick up on what she was talking about, we were talking about buzzwords, right. So, one we haven't heard yet is voice. Voice is another big buzzword right now, when you couple that with IoT and AI, here you go, bingo, do I got three points? (laughing) Voice recognition, voice technology, so all of the smart speakers, if you think about that in the world, there are 7,000 languages being spoken, but yet if you look at Google Home, you look at Siri, you look at any of the devices, I would challenge you, it would have a lot of problems understanding my accent, and even when my British accent creeps out, or it would have trouble understanding seniors, because the way they talk, it's very different than a typical 25-year-old person living in Silicon Valley, right. So, how do we solve that, especially going forward? We're seeing voice technology is going to be so much more prominent in our homes, we're going to have it in the cars, we have it in the kitchen, it does everything, it listens to everything that we are talking about, not talking about, and records it. And to your point, is it going to start making decisions on our behalf, but then my question is, how much does it actually understand us? - So, I just want to share one short story. Siri can't translate a word that I ask it to translate into French, because my phone's set to Canadian English, and that's not supported. So I live in a bilingual French English country, and it can't translate.
- But what this is really bringing up is if you look at society, and culture, what's legal, what's ethical, changes across the years. What was right 200 years ago is not right now, and what was right 50 years ago is not right now. - It changes across countries. - It changes across countries, it changes across regions. So, what does this mean when our AI has agency? How do we make ethical AI if we don't even know how to manage the change of what's right and what's wrong in human society? - One of the most important questions we have to worry about, right? - Absolutely. - But it also says one more thing, just before we go on. It also says that the issue of economies of scale, in the cloud. - Yes. - Are going to be strongly impacted, not just by how big you can build your data centers, but some of those regulatory issues that are going to influence strongly what constitutes good experience, good law, good acting on my behalf, agency. - And one thing that's underappreciated in the marketplace right now is the impact of data sovereignty, if you get back to data, countries are now recognizing the importance of managing that data, and they're implementing data sovereignty rules. Everyone talks about California issuing a new law that's aligned with GDPR, and you know what that meant. There are 30 other states in the United States alone that are modifying their laws to address this issue. - Steve. - So, um, so, we got a number of years, no matter what Ray Kurzweil says, until we get to artificial general intelligence. - The singularity's not so near? (laughing) - You know that he's changed the date over the last 10 years. - I did know it. - Quite a bit. And I don't even prognosticate where it's going to be. But really, where we're at right now, I keep coming back to, is that's why augmented intelligence is really going to be the new rage, humans working with machines. One of the hot topics, and the reason I chose to speak about it is, is the future of work. 
I don't care if you're a millennial, mid-career, or a baby boomer, people are paranoid. As machines get smarter, if your job is routine cognitive, yes, you have a higher propensity to be automated. So, this really shifts a number of things. A, you have to be a lifelong learner, you've got to learn new skillsets. And the dynamics are changing fast. Now, this is also a great equalizer for emerging startups, and even in SMBs. As the AI improves, they can become more nimble. So back to your point regarding colossal trillion dollar, wait a second, there's going to be quite a sea change going on right now, and regarding demographics, in 2020, millennials take over as the majority of the workforce, by 2025 it's 75%. - Great news. (laughing) - As a baby boomer, I try my damnedest to stay relevant. - Yeah, surround yourself with millennials is the takeaway there. - Or retire. (laughs) - Not yet. - One thing I think, this goes back to what Karen was saying, if you want a basic standard to put around the stuff, look at the old ISO 38500 framework. Business strategy, technology strategy. You have risk, compliance, change management, operations, and most importantly, the balance sheet and the financials. AI and what Tony was saying, digital transformation, if it's meaningful, it belongs on a balance sheet, and should factor into how you value your company. All the cyber security, and all of the compliance, and all of the regulation, is all stuff, this framework exists, so look it up, and every time you start some kind of new machine learning project, or data science project, say, have we checked the box on each of these standards that's within this machine? And if you haven't, maybe slow down and do your homework. - To see a day when data is going to be valued on the balance sheet. - It is. - It's already valued as part of the current, but it's goodwill. - Certainly market value, as we were just talking about.
- Well, we're talking about all of the companies that have opted in, right. There's tens of thousands of small businesses just in this region alone that are opt-out. They're small family businesses, or businesses that really aren't even technology-aware. But data's being collected about them, it's on Yelp, they're being rated, they're being reviewed, the success of their business is out of their hands. And I think what's really going to be interesting is, you look at the big data, you look at AI, you look at things like that, blockchain may even be a potential for some of that, because of immutability, but it's when all of those businesses, when the technology becomes a cost, it's cost-prohibitive now, for a lot of them, or they just don't want to do it, and they're proudly opt-out. In fact, we talked about that last night at dinner. But when they opt-in, the company that can do that, and can reach out to them in a way that is economically feasible, and bring them back in, where they control their data, where they control their information, and they do it in such a way where it helps them build their business, and it may be a generational business that's been passed on. Those kinds of things are going to make a big impact, not only on the cloud, but the data being stored in the cloud, the AI, the applications that you talked about earlier, we talked about that. And that's where this bias, and some of these other things are going to have a tremendous impact if they're not dealt with now, at least ethically. - Well, I feel like we just got started, we're out of time. Time for a couple more comments, and then officially we have to wrap up. - Yeah, I had one thing to say, I mean, really, Henry Ford, and the creation of the automobile, back in the early 1900s, changed everything, because now we're no longer stuck in the country, we can get away from our parents, we can date without grandma and grandpa sitting on the porch with us.
(laughing) We can take long trips, so now we're looked at, we've sprawled out, we're not all living in the country anymore, and it changed America. So, AI has those same capabilities, it will automate mundane routine tasks that nobody wanted to do anyway. So, a lot of that will change things, but it's not going to be any different than the way things changed in the early 1900s. - It's like you were saying, constant reinvention. - I think that's a great point, let me make one observation on that. Every period of significant industrial change was preceded by the formation, a period of formation of new assets that nobody knew what to do with. Whether it was, what do we do, you know, industrial manufacturing, it was row houses with long shafts tied to an engine that was coal-fired, and drove a bunch of looms. Same thing, railroads, large factories for Henry Ford, before he figured out how to do an information-based notion of mass production. This is the period of asset formation for the next generation of social structures. - Those ship-makers are going to be all over these cars, I mean, you're going to have augmented reality right there, on your windshield. - Karen, bring it home. Give us the drop-the-mic moment. (laughing) - No pressure. - Your AV guys are not happy with that. So, I think the, it all comes down to, it's a people problem, a challenge, let's say that. The whole AI ML thing, people, it's a legal compliance thing. Enterprises are going to struggle with trying to meet five billion different types of compliance rules around data and its uses, about enforcement, because ROI is going to mean risk of incarceration as well as return on investment, and we'll have to manage both of those. I think businesses are struggling with a lot of this complexity, and you just opened a whole bunch of questions that we didn't really have a solid, "Oh, you can fix it by doing this."
So, it's important that we think of this new world of data focus, data-driven, everything like that, is that the entire IT and business community needs to realize that focusing on data means we have to change how we do things and how we think about it, but we also have some of the same old challenges there. - Well, I have a feeling we're going to be talking about this for quite some time. What a great way to wrap up CUBE NYC here, our third day of activities down here at 37 Pillars, or Mercantile 37. Thank you all so much for joining us today. - Thank you. - Really, wonderful insights, really appreciate it, now, all this content is going to be available on theCUBE.net. We are exposing our video cloud, and our video search engine, so you'll be able to search our entire corpus of data. I can't wait to start searching and clipping up this session. Again, thank you so much, and thank you for watching. We'll see you next time.

Published Date : Sep 13 2018


Rob Thomas, IBM | Change the Game: Winning With AI


 

>> Live from Times Square in New York City, it's The Cube covering IBM's Change the Game: Winning with AI, brought to you by IBM. >> Hello everybody, welcome to The Cube's special presentation. We're covering IBM's announcements today around AI. IBM, as The Cube does, runs sessions and programs in conjunction with Strata, which is down at the Javits, and we're here with Rob Thomas, who's the General Manager of IBM Analytics. Long time Cube alum, Rob, great to see you. >> Dave, great to see you. >> So you guys got a lot going on today. We're here at the Westin Hotel, you've got an analyst event, you've got a partner meeting, you've got an event tonight, Change the game: winning with AI at Terminal 5, check that out, ibm.com/WinWithAI, go register there. But Rob, let's start with what you guys have going on, give us the rundown. >> Yeah, it's a big week for us, and like many others, it's great when you have Strata, a lot of people in town. So, we've structured a week where, today, we're going to spend a lot of time with analysts and our business partners, talking about where we're going with data and AI. This evening, we've got a broadcast, it's called Winning with AI. What's unique about that broadcast is it's all clients. We've got clients on stage doing demonstrations, how they're using IBM technology to get to unique outcomes in their business. So I think it's going to be a pretty unique event, which should be a lot of fun. >> So this place, it looks like a cool event, a venue, Terminal 5, it's just up the street on the west side highway, probably a mile from the Javits Center, so definitely check that out. Alright, let's talk about, Rob, we've known each other for a long time, we've seen the early Hadoop days, you guys were very careful about diving in, you kind of let things settle and watched very carefully, and then came in at the right time.
But we saw the evolution of so-called Big Data go from a phase of really reducing investments, cheaper data warehousing, and what that did is allowed people to collect a lot more data, and kind of get ready for this era that we're in now. But maybe you can give us your perspective on the phases, the waves that we've seen of data, and where we are today and where we're going. >> I kind of think of it as a maturity curve. So when I go talk to clients, I say, look, you need to be on a journey towards AI. I think probably nobody disagrees that they need something there, the question is, how do you get there? So you think about the steps, it's about, a lot of people started with, we're going to reduce the cost of our operations, we're going to use data to take out cost, that was kind of the Hadoop thrust, I would say. Then they moved to, well, now we need to see more about our data, we need higher performance data, BI data warehousing. So, everybody, I would say, has dabbled in those two areas. The next leap forward is self-service analytics, so how do you actually empower everybody in your organization to use and access data? And the next step beyond that is, can I use AI to drive new business models, new levers of growth, for my business? So, I ask clients, pin yourself on this journey, most are, depends on the division or the part of the company, they're at different areas, but as I tell everybody, if you don't know where you are and you don't know where you want to go, you're just going to wind around, so I try to get them to pin down, where are you versus where do you want to go? >> So four phases, basically, the sort of cheap data store, the BI data warehouse modernization, self-service analytics, a big part of that is data science and data science collaboration, you guys have a lot of investments there, and then new business models with AI automation running on top. Where are we today?
Would you say we're kind of in-between BI/DW modernization and on our way to self-service analytics, or what's your sense? >> I'd say most are right in the middle between BI data warehousing and self-service analytics. Self-service analytics is hard, because it requires you, sometimes to take a couple steps back, and look at your data. It's hard to provide self-service if you don't have a data catalog, if you don't have data security, if you haven't gone through the processes around data governance. So, sometimes you have to take one step back to go two steps forward, that's why I see a lot of people, I'd say, stuck in the middle right now. And the examples that you're going to see tonight as part of the broadcast are clients that have figured out how to break through that wall, and I think that's pretty illustrative of what's possible. >> Okay, so you're saying that, got to maybe take a step back and get the infrastructure right with, let's say a catalog, to give some basic things that they have to do, some x's and o's, you've got the Vince Lombardi played out here, and also, skillsets, I imagine, is a key part of that. So, that's what they've got to do to get prepared, and then, what's next? They start creating new business models, I imagine this is where the chief data officer comes in and it's an executive level, what are you seeing clients as part of digital transformation, what's the conversation like with customers? >> The biggest change, the great thing about the times we live in, is technology's become so accessible, you can do things very quickly. We created a team last year called Data Science Elite, and we've hired what we think are some of the best data scientists in the world. Their only job is to go work with clients and help them get to a first success with data science. So, we put a team in.
Normally, one month, two months, normally a team of two or three people, our investment, and we say, let's go build a model, let's get to an outcome, and you can do this incredibly quickly now. I tell clients, if I see somebody that says, we're going to spend six months evaluating and thinking about this, I was like, why would you spend six months thinking about this when you could actually do it in one month? So you just need to get over the edge and go try it. >> So we're going to learn more about the Data Science Elite team. We've got John Thomas coming on today, who is a distinguished engineer at IBM, and he's very much involved in that team, and I think we have a customer who's actually gone through that, so we're going to talk about what their experience was with the Data Science Elite team. Alright, you've got some hard news coming up, you've actually made some news earlier with Hortonworks and Red Hat, I want to talk about that, but you've also got some hard news today. Take us through that. >> Yeah, let's talk about all three. First, Monday we announced the expanded relationship with both Hortonworks and Red Hat. This goes back to one of the core beliefs I talked about, every enterprise is modernizing their data and application estates, I don't think there's any debate about that. We are big believers in Kubernetes and containers as the architecture to drive that modernization. The announcement on Monday was, we're working closer with Red Hat to take all of our data services as part of Cloud Private for Data, which are basically microservices for data, and we're running those on OpenShift, and we're starting to see great customer traction with that. And where does Hortonworks come in? Hadoop has been the outlier on moving to microservices and containers, we're working with Hortonworks to help them make that move as well. So, it's really about the three of us getting together and helping clients with this modernization journey.
>> So, just to remind people, you remember ODPI, folks? It was all this kerfuffle about, why do we even need this? Well, what's interesting to me about this triumvirate is, well, first of all, Red Hat and Hortonworks are hardcore open source, IBM's always been a big supporter of open source. You three got together and you're proving now the productivity for customers of this relationship. You guys don't talk about this, but Hortonworks noted, on its public call, that the relationship with IBM drove many, many seven-figure deals, which, obviously means that customers are getting value out of this, so it's great to see that come to fruition, and it wasn't just a Barney announcement a couple years ago, so congratulations on that. Now, there's this other news that you guys announced this morning, talk about that. >> Yeah, two other things. One is, we announced a relationship with Stack Overflow. 50 million developers go to Stack Overflow a month, it's an amazing environment for developers that are looking to do new things, and we're sponsoring a community around AI. Back to your point before, you said, is there a skills gap in enterprises, there absolutely is, I don't think that's a surprise. Data science, AI developers, not every company has the skills they need, so we're sponsoring a community to help drive the growth of skills in and around data science and AI. So things like Python, R, Scala, these are the languages of data science, and it's a great relationship with us and Stack Overflow to build a community to get things going on skills. >> Okay, and then there was one more. >> Last one's a product announcement. This is one of the most interesting product announcements we've had in quite a while. Imagine this, you write a SQL query, and the traditional approach is, I've got a server, I point it at that server, I get the data, it's pretty limited. We're announcing technology where I write a query, and it can find data anywhere in the world.
I think of it as wide-area SQL. So it can find data on an automotive device, a telematics device, an IoT device, it could be a mobile device, we think of it as SQL for the whole world. You write a query, you can find the data anywhere it is, and we take advantage of the processing power on the edge. The biggest problem with IoT is, it's been the old mantra of, go find the data, bring it all back to a centralized warehouse, that makes it impossible to do it real time. We're enabling real time because we can write a query once, find data anywhere, this is technology we've had in preview for the last year. We've been working with a lot of clients to prove out use cases to do it, we're integrating it as a capability inside of IBM Cloud Private for Data. So if you buy IBM Cloud Private for Data, it's there. >> Interesting, so when you've been around as long as I have, long enough to see some of the pendulum swings, and it's clearly a pendulum swing back toward decentralization in the edge, but the key is, from what you just described, is you're sort of redefining the boundary, so I presume it's the edge, any Cloud, or on premises, where you can find that data, is that correct? >> Yeah, so it's multi-Cloud. I mean, look, every organization is going to be multi-Cloud, like 100%, that's going to happen, and that could be private, it could be multiple public Cloud providers, but the key point is, data on the edge is not just limited to what's in those Clouds. It could be anywhere that you're collecting data. And, we're enabling an architecture which performs incredibly well, because you take advantage of processing power on the edge, where you can get data anywhere that it sits. >> Okay, so, then, I'm setting up a Cloud, I'll call it a Cloud architecture, that encompasses the edge, where essentially, there are no boundaries, and you're bringing security. We talked about containers before, we've been talking about Kubernetes all week here at a Big Data show.
And then of course, Cloud, and what's interesting, I think many of the Hadoop distro vendors kind of missed Cloud early on, and then now are sort of saying, oh wow, it's a hybrid world and we've got to be a part of it, you guys obviously made some moves, a couple billion dollar moves, to do some acquisitions and get hardcore into Cloud, so that becomes a critical component. You're not just limiting your scope to the IBM Cloud. You're recognizing that it's a multi-Cloud world, that's what customers want to do. Your comments. >> It's multi-Cloud, and it's not just the IBM Cloud, I think the most predominant Cloud that's emerging is every client's private Cloud. Every client I talk to is building out a containerized architecture. They need their own Cloud, and they need seamless connectivity to any public Cloud that they may be using. This is why you see such a premium being put on things like data ingestion, data curation. It's not popular, it's not exciting, people don't want to talk about it, but the biggest inhibitor, to this AI point, comes back to data curation, data ingestion, because if you're dealing with multiple Clouds, suddenly your data's in a bunch of different spots. >> Well, so you're basically, and we talked about this a lot on The Cube, you're bringing the Cloud model to the data, wherever the data lives. Is that the right way to think about it? >> I think organizations have spoken, set aside what they say, look at their actions. Their actions say, we don't want to move all of our data to any particular Cloud, we'll move some of our data. We need to give them seamless connectivity so that they can leave their data where they want, we can bring Cloud-Native Architecture to their data, we could also help move their data to a Cloud-Native architecture if that's what they prefer.
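The "write a query once, find data anywhere" capability Rob describes is, in general terms, query federation with work pushed to each source. A minimal, hypothetical sketch of the idea follows; the function names and data sources are illustrative only, not IBM's actual API:

```python
# Hypothetical sketch of query federation: one query fans out to several
# sources, each source filters locally (pushing work to the edge), and only
# matching rows travel back to be merged -- instead of shipping everything
# to a centralized warehouse first.
def federated_query(sources, predicate):
    results = []
    for name, rows in sources.items():
        # Each source evaluates the predicate locally; in a real system this
        # filter would run on the remote device or cloud, not here.
        matches = [dict(row, _source=name) for row in rows if predicate(row)]
        results.extend(matches)
    return results

# Toy stand-ins for an edge device and a cloud warehouse.
sources = {
    "telematics_edge": [{"device": "car-7", "temp_c": 91}],
    "cloud_warehouse": [{"device": "car-2", "temp_c": 54},
                        {"device": "car-9", "temp_c": 88}],
}

hot = federated_query(sources, lambda r: r["temp_c"] > 80)
print(sorted(r["device"] for r in hot))  # ['car-7', 'car-9']
```

The design point is the one made in the interview: the query moves to the data, so only the small answer set crosses the network, which is what makes real-time IoT queries feasible.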
>> Well, it makes sense, because you've got physics, latency, you've got economics, moving all the data into a public Cloud is expensive and just doesn't make economic sense, and then you've got things like GDPR, which says, well, you have to keep the data, certain laws of the land, if you will, that say, you've got to keep the data in whatever it is, in Germany, or whatever country. So those sort of edicts dictate how you approach managing workloads and what you put where, right? Okay, what's going on with Watson? Give us the update there. >> I get a lot of questions, people trying to peel back the onion of what exactly is it? So, I want to make that super clear here. Watson is a few things, start at the bottom. You need a runtime for models that you've built. So we have a product called Watson Machine Learning, runs anywhere you want, that is the runtime for how you execute models that you've built. Anytime you have a runtime, you need somewhere where you can build models, you need a development environment. That is called Watson Studio. So, we had a product called Data Science Experience, we've evolved that into Watson Studio, connecting in some of those features. So we have Watson Studio, that's the development environment, Watson Machine Learning, that's the runtime. Now you move further up the stack. We have a set of APIs that bring in human features, vision, natural language processing, audio analytics, those types of things. You can integrate those as part of a model that you build. And then on top of that, we've got things like Watson Applications, we've got Watson for call centers, doing customer service and chatbots, and then we've got a lot of clients who've taken pieces of that stack and built their own AI solutions. They've taken some of the APIs, they've taken some of the design time, the studio, they've taken some of the Watson Machine Learning. 
So, it is really a stack of capabilities, and where we're driving the greatest productivity, this is in a lot of the examples you'll see tonight, is clients that have bought into this idea of, I need a development environment, I need a runtime, where I can deploy models anywhere. We're getting a lot of momentum on that, and then that raises the question of, well, do I have explainability, do I have trust and transparency, and that's another thing that we're working on. >> Okay, so there's API-oriented architecture, exposing all these services make it very easy for people to consume. Okay, so we've been talking all week at Cube NYC, is Big Data and AI, is this old wine in a new bottle? I mean, it's clear, Rob, from the conversation here, there's a lot of substantive innovation, and early adoption, anyway, of some of these innovations, but a lot of potential going forward. Last thoughts? >> What people have to realize is AI is not magic, it's still computer science. So it actually requires some hard work. You need to roll up your sleeves, you need to understand how I get from point A to point B, you need a development environment, you need a runtime. I want people to really think about this, it's not magic. I think for a while, people have gotten the impression that there's some magic button. There's not, but if you put in the time, and it's not a lot of time, you'll see the examples tonight, most of them have been done in one or two months, there's great business value in starting to leverage AI in your business. >> Awesome, alright, so if you're in this city or you're at Strata, go to ibm.com/WinWithAI, register for the event tonight. Rob, we'll see you there, thanks so much for coming back. >> Yeah, it's going to be fun, thanks Dave, great to see you. >> Alright, keep it right there everybody, we'll be back with our next guest right after this short break, you're watching The Cube.
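The build-versus-runtime split Rob keeps returning to, a model built in one environment (the "studio"), exported, and executed in a separate runtime, follows a standard pattern that can be sketched in a few lines. The trivial threshold model below is a stand-in for whatever a real studio would produce, not Watson's actual model format:

```python
import json

class ThresholdModel:
    """Toy model: classifies a value as 1 if it is at or above a learned threshold."""

    def fit(self, values, labels):
        positives = [v for v, y in zip(values, labels) if y == 1]
        negatives = [v for v, y in zip(values, labels) if y == 0]
        # Midpoint between the class means -- deliberately trivial.
        self.threshold = (sum(positives) / len(positives) +
                          sum(negatives) / len(negatives)) / 2
        return self

    def to_artifact(self):
        # "Studio" side: export the trained model as a portable artifact.
        return json.dumps({"threshold": self.threshold})

    @classmethod
    def from_artifact(cls, blob):
        # "Runtime" side: rehydrate from the artifact alone, with no
        # training code or training data involved.
        model = cls()
        model.threshold = json.loads(blob)["threshold"]
        return model

    def predict(self, value):
        return 1 if value >= self.threshold else 0

# Build and export in one environment, score in another.
artifact = ThresholdModel().fit([1.0, 2.0, 8.0, 9.0], [0, 0, 1, 1]).to_artifact()
deployed = ThresholdModel.from_artifact(artifact)
print(deployed.predict(7.5))  # 1
print(deployed.predict(2.5))  # 0
```

The separation is the point: because the runtime only needs the artifact, the same model can be deployed to any cloud or edge location, which is the "deploy models anywhere" claim in the interview.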

Published Date : Sep 13 2018



Dr Matt Wood, AWS | AWS Summit NYC 2018


 

>> Announcer: Live from New York, it's theCUBE, covering AWS Summit New York 2018. Brought to you by Amazon Web Services and its ecosystem partners. >> Hello and welcome back to live CUBE coverage here in New York City for the AWS Summit. I'm John Furrier with Jeff Frick. Our next guest is Dr. Matt Wood, general manager of artificial intelligence with Amazon Web Services. A CUBE alumni who's been so busy for the past year. Thanks for coming back, appreciate you spending the time. So the promotions keep coming: now general manager of the AI group, AI operations, AI automation, machine learning. That's a big category of new things developing, and you guys have really taken AI and machine learning to a whole new level. It's one of the key value propositions that you now have, not just for the large enterprise but down to startups and developers. So congratulations, and what's the update? >> Well, the update is that this morning in the keynote I was lucky enough to introduce some new capabilities across our platform when it comes to machine learning. Our mission is that we want to take machine learning and make it available to all developers. We joke internally that we just want to make machine learning boring, we want to make it vanilla: it's just another tool in the tool chest of any developer and any data scientist. This idea of taking technology that has traditionally been within reach of only a very, very small number of well-funded organizations and making it as broadly distributed as possible, we've done that pretty successfully with compute, storage, databases, analytics, and data warehousing, and we want to do the exact same thing for machine learning. And to do that, we've had to build an entirely new stack, and we think of that stack in three different tiers. The bottom tier is really for academics, researchers, and data scientists. We provide a wide range of frameworks, the open source programming libraries that developers and data scientists use to build neural networks and intelligence, things like TensorFlow, Apache MXNet, and PyTorch, and they're very technical; you can build arbitrarily sophisticated systems. >> Mostly open source, right? >> That's right. We contribute a lot of our work back to MXNet, but we also contribute to PyTorch and to TensorFlow, and there are big, healthy open source projects growing up around all of these popular frameworks, plus more like Keras, Gluon, and Horovod. So that's a key area for researchers and academics. The next level up, we have machine learning platforms. This is for developers and data scientists who have data in the cloud, or who want to move it to the cloud quickly, and who want to use it for modeling, to build custom machine learning models. Here we try to remove as much of the undifferentiated heavy lifting associated with doing that as possible, and this is really where SageMaker fits in. SageMaker allows developers to quickly build, train, optimize, and host their machine learning models. And then at the top tier, we have a set of AI services for application developers that don't want to get into the weeds; they just want to get up and running really, really quickly. So today we announced four new services across that middle tier and that top tier. For SageMaker, we're very pleased to introduce a new streaming data protocol which allows you to take data straight from S3 and pump it straight into your algorithm, straight onto the compute infrastructure. What that means is you no longer have to copy data from S3 onto your compute infrastructure in order to start training; you just take away that step and stream it right on there. It's an approach we use inside SageMaker for a lot of our built-in algorithms, and it significantly increases the speed of the algorithm and, of course, significantly decreases the cost of running the training, because you pay by the second, so any second you can save is a saving for the customer. >> And it helps the machine learn more. >> That's right, you can put more data through it. You're no longer constrained by the amount of disk space, you're not even constrained by the amount of memory on the instance; you can just pump terabyte after terabyte after terabyte. We actually had another customer we talked about in the keynote this morning, Snap, who are routinely training on over 100 terabytes of image data using SageMaker. The ability to pump in lots of data is one of the keys to building successful machine learning applications. So we've brought that capability to everybody that's using TensorFlow: you can just take your TensorFlow model, bring it to SageMaker, do a little bit of wiring, click a button, and just start streaming your data to your training job. >> What's the impact on developer time and speed? >> It's the ability to pump more data, it's the decrease in the time it takes to start the training, but most importantly it decreases the training time all up. You'll see between a 10 and 25 percent decrease in training time, so you can train more models, or train more models in the same unit of time, or just decrease the cost. It's a completely different way of thinking about how to train over large amounts of data. We were doing it internally, and now we're making it available for everybody. That's the first thing. The second thing we're adding is the ability to batch process in SageMaker. SageMaker used to be great at real-time predictions, but there are a lot of use cases where you don't want to make a one-off prediction; you want to predict hundreds or thousands or even millions of things all at once. Let's say you've got all of your sales information at the end of the month and you want to use it to make a forecast for the next month. You don't need to do that in real time; you do it once and then place the order. So we added batch transforms to SageMaker. You can pull in large amounts of data, batch process it within a fully automated environment, and then spin down the infrastructure, and you're done. It's a very, very simple API; anyone that uses a Lambda function can take advantage of this, again dramatically decreasing the overhead and making it so much easier for everybody to take advantage of machine learning. And then at the top layer, we had new capabilities for our AI services. We announced 12 new language pairs for our translation service, and we announced new transcription capability which allows us to take multi-channel audio, such as might be recorded here, but more commonly in contact centers. Just like you have a left channel and a right channel for stereo, contact centers often record the agent and the customer on the same track. Today you can pass that through our transcribe service for long-form speech, and it will split up the channels, automatically transcribe them, analyze all the timestamps, and create a single script. From there you can see what was being talked about, you can check the topics automatically using Comprehend, or you can check compliance: did the agent say the words they have to say, for compliance reasons, at some point during the conversation? That's a material new capability. >> What are the top services being used? Obviously Comprehend, Transcribe, and a variety of others; you guys have put a lot of stuff out there. What's the top usage, as a proxy for uptake? >> We see a ton of adoption across all of these areas, but where a lot of the momentum is growing right now is SageMaker. Formula One racing just chose AWS and SageMaker as their machine learning platform; the National Football League, and Major League Baseball today announced they're reaffirming their relationship and their strategic partnership with AWS for machine learning. All of these groups are using the data which streams out of these races and games, whether it's the video, or the telemetry of the cars, or the telemetry of the players, and they're pumping that through SageMaker to drive more engaging experiences for their viewers. >> So streaming data is key, and SageMaker can handle video? >> Just get it all in, all of it. We love data. >> I'd love to follow up on that. So the question is, when will SageMaker overtake Aurora as the fastest-growing product in the history of Amazon? Because I predicted at re:Invent that SageMaker would. Is it looking good right now? On paper you say it's growing, but give us an indicator. >> Well, I mean, we don't break out revenue per service, but I'll say this: the same excitement that I see around SageMaker now, the same opportunity and the same momentum, really, really reminds me of AWS ten years ago. It's the same sort of transformative, democratizing approach which really engages builders, and I see the same level of excitement; the enthusiasm levels are super, super high, and the builders are building with it. >> So what's this toy you have here? I know we don't have a lot of time, but you've brought a little prop. >> This is the world's first deep-learning-enabled wireless video camera, DeepLens. We announced it and launched it at re:Invent 2017. I can hold it up to the camera; it's a cute little device, and we modeled it after WALL-E, the Pixar movie. There's an HD video camera on the front, and in the base we have an incredibly powerful custom piece of machine learning hardware, so it can process over a billion machine learning operations per second. You can take the video in real time, send it to the GPU on board, and just start processing the stream in real time. That's kind of interesting, but the real value, and why we designed it, is that we wanted to find a way for developers to get literally hands-on with machine learning. The way that builders work is that they're lifelong learners, right? They love to learn, they have an insatiable appetite for new information and new technologies, and the way they learn is they experiment. They spin this flywheel where you try something out, it works, you fiddle with it, it stops working, you learn a little bit more, and you go around and around. That's been tried and tested by developers for four decades. The challenge with machine learning is that doing that is still very, very difficult: you need labeled data, you need to understand the algorithms, it's just hard to do. But with DeepLens you can get up and running in ten minutes. It's connected back to the cloud and hooked up to SageMaker. You can deploy a pre-built model down onto the device in ten minutes to do object detection, we do some wacky visual effects with neural style transfer, and we do hot dog and not-hot-dog detection, of course. But the real value comes in that you can take any of those models, tear them apart in SageMaker, start fiddling around with them, and then immediately deploy them back down onto the camera. Every developer has things on their desk that they can detect, pens and cups and people, whatever it is, so they can very, very quickly spin this flywheel where they're experimenting, changing, succeeding, failing, and just going around and around. >> Developers are your target audience? >> Yes, right. >> Okay, and what are some of the things that have come out of it? Have you seen anything cool? >> Yes, it has been incredibly gratifying and really humbling to see developers with no machine learning experience take this out of the box and build some really wonderful projects. One really good example is exercise detection: when you're doing a workout, they built a model which detects the exercise and then counts the reps of the weights you're lifting. We saw skeletal mapping, so you could map a person in 3D space using a simple camera. We saw security features where you could put this on your door and it would send you a text message if it didn't recognize who was in front of the door. We saw one which was amazing, which would read books aloud to kids: you hold up the book, it detects the text, extracts the text, sends it to Polly, and then speaks it aloud for the kids. So there are games, educational tools, little security gizmos. One group even trained a dog detection model which detected individual breeds, plugged it into an enormous power pack, and took it to the local dog park so they could test it out. All of this from a cold start, with no machine learning experience. >> You having fun? >> Yes, absolutely. One of the great things about machine learning is you don't just get to work in one area: you get to work in Formula One and sports, and you get to work in healthcare, and you get to work in retail. >> And every developer and CTO is going to love this. Chief toy officers. I love it. So I've got to ask you, what's new in your world as GM of artificial intelligence? What does that mean? Just quickly explain it for our audience: what specifically are you overseeing? What's your purview within the realm of AWS? >> Yeah, that's a totally fair question. My purview is that I run the products for deep learning, machine learning, and artificial intelligence across the AWS machine learning team. I have a lot of fingers in a lot of pies. I get involved in the new products we're going to go build, I get involved in helping grow usage of existing products, I get to do a lot of invention, and I spend a ton of time with customers, but overall I work with the rest of the team on setting the technical and product strategy for machine learning at AWS. >> What are your top priorities this year? Adoption, uptake, new product introductions? You guys don't stop; you keep on introducing more and more things. Any high ground that you want to take? What's the vision? >> The vision is genuinely to continue to make it as easy as possible for developers to use machine learning. I can't overstate the importance, or the challenge. We're not at the point where you can just pull down some Python code and figure it out. We don't have a JVM for machine learning; there are no developer tools or debuggers, there are very few visualizers. It's still very hard. If you think of it in computing terms, we're still working in assembly language in machine learning. So there's this wealth of opportunity ahead of us, and the responsibility I feel very strongly is to continually improve on the stack and continually bring new capabilities to more builders. >> Cloud has been disrupting IT operations; "AIOps" they're calling it in Silicon Valley and on the venture circuit, and AutoML, automatic machine learning, is a term that has been kicked around. You've got to train the machines with something, and data seems to be it. What strikes me about this, compared to storage or compute or some of the other core Amazon foundational products, is that those were just better ways to do something that already existed. This is not a better way to do something that already exists. This is democratization at the start of the process: the application of machine learning and artificial intelligence to a plethora of applications. That is fundamentally different; it's a step up in terms of the power it puts in the hands of the people. >> Totally agree. It's an area which is very fast-moving and very fast-growing, but what's funny is that it totally builds on top of the cloud. You really can't do machine learning in any meaningful production way unless you have a way that is cheap and easy to collect large amounts of data, and a way which allows you to pull down high-performance computation at any scale you need it. So through the cloud we've actually laid the foundations for machine learning going forward. >> And other things are coming too, like search, as you guys announced. The cloud highlights the power that it brings to these new capabilities. >> Absolutely, and we get to build on them at AWS and at Amazon just like our customers do. SageMaker runs on EC2; we wouldn't be able to do SageMaker without EC2. And in the fullness of time, we see that the usage of machine learning could be as big, if not bigger, than the whole of the rest of AWS combined. That's our aspiration. >> Dr. Matt Wood, I wish we had more time to chat; I'd love to do a whole other segment on what you're doing with customers. I know you guys have great customer focus, as Andy always mentions when he's on theCUBE; you guys listen to customers. Maybe at re:Invent we'll circle back. Congratulations on your success, great to see you. >> Thanks. >> Dr. Matt Wood here in theCUBE, which is streaming all this data out to the Amazon cloud, which hosts all of our stuff, of course. It's theCUBE, bringing you live action here in New York City for coverage of AWS Summit 2018 in Manhattan. We'll be back with more after this short break.
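The streaming protocol Matt describes is, at the API level, a one-field switch on the training job. A hedged sketch of the CreateTrainingJob parameters involved; the parameter names follow SageMaker's API, while the job name, image URI, role ARN, bucket paths, and instance type are placeholders:

```python
def pipe_mode_training_request(job_name, image_uri, role_arn,
                               train_s3_uri, output_s3_uri):
    """Assemble the parameter dict for SageMaker's CreateTrainingJob
    call. TrainingInputMode='Pipe' streams records from S3 into the
    training container, rather than copying the whole dataset onto the
    instance's disk first ('File' mode)."""
    return {
        "TrainingJobName": job_name,
        "RoleArn": role_arn,
        "AlgorithmSpecification": {
            "TrainingImage": image_uri,
            "TrainingInputMode": "Pipe",   # stream, don't copy
        },
        "InputDataConfig": [{
            "ChannelName": "train",
            "DataSource": {"S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": train_s3_uri,
                "S3DataDistributionType": "FullyReplicated",
            }},
        }],
        "OutputDataConfig": {"S3OutputPath": output_s3_uri},
        "ResourceConfig": {"InstanceType": "ml.p3.2xlarge",
                           "InstanceCount": 1,
                           "VolumeSizeInGB": 50},
        "StoppingCondition": {"MaxRuntimeInSeconds": 3600},
    }

request = pipe_mode_training_request(
    "demo-job", "<training-image-uri>",
    "arn:aws:iam::123456789012:role/ExampleSageMakerRole",
    "s3://example-bucket/train/", "s3://example-bucket/output/")
```

A real run would pass this dict to boto3's SageMaker client, roughly `client.create_training_job(**request)`; batch scoring, as discussed above, goes through the analogous CreateTransformJob call instead of a real-time endpoint.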

Published Date : Jul 17 2018



David Abercrombie, Sharethrough & Michael Nixon, Snowflake | Big Data SV 2018


 

>> Narrator: Live from San Jose, it's theCUBE. Presenting Big Data, Silicon Valley. Brought to you by SiliconANGLE Media and its ecosystem partners. >> Hi, I'm George Gilbert, and we are broadcasting from the Strata Data Conference, right around the corner at the Forager Tasting Room & Eatery. We have this wonderful location here, and we are very lucky to have with us Michael Nixon, from Snowflake, which is a leading cloud data warehouse, and David Abercrombie from Sharethrough, which is a leading ad tech company. Between the two of them, they're going to tell us about some of the most advanced use cases we now have for cloud-native data warehousing. Michael, why don't you start by giving us some context for how, on a cloud platform, one might rethink a data warehouse? >> Yeah, thank you. That's a great question. Let me first answer it from the end-user, business-value perspective. When you run a workload in the cloud, there's a certain level of expectation you want out of the cloud. You want unlimited scalability, you want to be able to support all your users, you want to be able to support the data types, whatever they may be, that come into your organization. So there's a level of service one should expect once you're in the cloud. A lot of the technology built up to this point has been optimized for on-premises data warehousing, where perhaps that level of service, concurrency, and unlimited scalability was not really expected. But guess what? Once it comes to the cloud, it is expected. So those on-premises technologies aren't suitable in the cloud, and for enterprises, for companies and organizations of all types, from finance, banking, and manufacturing to ad tech, as we'll hear today, that want this level of service in the cloud, those technologies will not work. So it requires a rethinking of how those architectures are built.
And it requires being built for the cloud. >> Just to break this down and be really concrete about some of that rethinking: we separate compute from storage, which is a familiar pattern we've learned in the cloud, but we also then have to have this sort of independent elasticity between-- >> Yes. >> Storage and the compute, and then Snowflake has taken it even a step further, where you can spin out multiple compute clusters. >> Right. >> Tell us how that works and why that's so difficult and unique. >> Yeah, you know, that's taking us under the covers a little bit, but what makes our infrastructure unique is that we have a three-layer architecture. We separate, just as you said, the storage layer from the compute layer, from the services layer. That's really important because, as I mentioned before, you want unlimited capacity and unlimited resources. If you scale compute in today's on-premises MPP world, what that really means is that you have to bring the storage along with the compute, because compute is tied to the storage. So when you scale the storage along with the compute, that usually places a heavy burden on the data warehouse manager, because now they have to redistribute the data, and that means redistributing keys, managing keys, if you will. That's a burden. And in reverse, if all you wanted to do was increase storage, but compute is tied to storage, why should you have to buy additional compute nodes? That might add to the cost when, in fact, all you really wanted to pay for was additional storage. By separating those layers, you keep them independent, so you can scale storage apart from compute. And then, once the compute resources you've put in place, the virtual warehouses you're talking about, have completed the job, you spun them up, they've done their work, and you take them down, guess what?
You can release those resources, and of course, in releasing those resources, you cut your cost as well, because for us it's pure usage-based pricing. You only pay for what you use, and that's really fantastic. >> Very different from the on-prem model where, as you were saying, compute and storage are tied together. >> Yeah, think about what that means architecturally, right? If you have an on-premises data warehouse and you want to scale your capacity, chances are you have to have that hardware in place already. And having that hardware in place already means you're paying that expense, and you may pay for it six months prior to needing it. Let's take a retailer example. >> Yeah. >> You're gearing up for a peak season, which might be Christmas, and so you put that hardware in place sometime in June. You always put it in place in advance. Why? You have to bring up the environment, so you have to allow time for implementation or, if you will, deployment, to make sure everything is operational. >> Okay. >> And then, when that peak period comes, you can't expand beyond that capacity. And what happens once that peak period is over? You paid for that hardware, but you don't really need it. So our vision, or the vision we believe you should have when you move workloads to the cloud, is that you pay for resources when you need them. >> Okay. So now, David, help us understand: first, what was the business problem you were trying to solve, and why was Snowflake sort of uniquely suited for that? >> Well, let me talk a little bit about Sharethrough. We're ad tech; at the core of our business we run an ad exchange, where we're doing programmatic trading of bids under the real-time bidding spec. The data is very high in volume, with 12 billion impressions a month. That's a lot of bids that we have to process, a lot of bid requests.
The way it operates, the bids and the bid responses in programmatic trading are encoded as JSON, so our ad exchange is basically exchanging messages in JSON with our business partners. And the JSONs are very complicated; there's a lot of richness and detail, such that the advertisers can decide whether or not they want to bid. Well, this data is very complicated and very high-volume, and advertising, like any business, really needs good analytics to understand how the business is operating, how our publishers are doing, how our advertisers are doing. And it all depends upon this very high-volume, very complex JSON event data stream. So, Snowflake was able to ingest our high-volume data very gracefully. The JSON parsing techniques of Snowflake allow me to expose the complicated data structure in a way that's very transparent and usable to our analysts. Our use of Snowflake has replaced clunkier tools where the analysts basically had to be programmers, writing programs in Scala or something to do an analysis. And now, because we've transparently and easily exposed the complicated structures within Snowflake in a relational database, they can use good old-fashioned SQL to run their queries. Literally, an afternoon's analysis is now a five-minute query. >> So, as I'm listening to you describe this: we've had various vendors telling us about these workflows in the data prep and data science toolchain. It almost sounds to me like Snowflake is taking semi-structured or complex data and sort of unraveling it. Normalizing is kind of an overloaded term, but it's making the data business-ready, so you don't need as much of that manual data prep. >> Yeah, exactly, you don't need as much manual data prep, or as much expertise.
For instance, Snowflake's JSON capabilities, in terms of drilling down the JSON tree with dot-path notation, or expanding nested objects, are very expressive and very powerful, but still, your typical analyst or your BI tool certainly wouldn't know how to do that. So, in Snowflake, we sort of have our cake and eat it too. We can have our JSONs with their full richness in our database, yet we can simplify and expose the data elements that are needed for analysis, so that an analyst, on their first day on the job, can get right to work and start writing queries. >> So let me ask you a little more about the programmatic ad use case. If you have billions of impressions per month, I'm guessing that means you have quite a few times more bids, and then, once you have a successful one, you want to track what happens. >> Correct. >> So tell us a little more about what that workload looks like: what analytics are you trying to perform, what are you tracking? >> Yeah, well, you're right, there are different steps in our funnel. The impression request expands out by a factor of a dozen as we send it to all the different potential bidders. We track all that data; the responses come back, we track that; we track our decisions and why we selected the bidder. And then, once the ad is shown, of course, there are various beacons and tracking things that fire. We have to track all of that data, and the only way we can make sense of our business is by bringing all that data together, in a way that is reliable, transparent, and visible, and that also has data integrity. That's another thing I like about the Snowflake database: it's a good old-fashioned SQL database where I can declare my primary keys, I can run QC checks, I can ensure the high data integrity that is demanded by BI and other sorts of analytics.
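The dot-path access and nested-object expansion described above look roughly like this in practice. A sketch only: the table, the VARIANT column, and the OpenRTB-style field names are invented for illustration, not Sharethrough's actual schema.

```python
# Snowflake query text: colon/dot-path navigation into a VARIANT column,
# '::' casts to typed columns, and LATERAL FLATTEN to expand the nested
# seatbid array into one row per bid.
BID_PRICE_QUERY = """
SELECT
    ev.raw:bid_request.site.domain::STRING AS site_domain,
    bid.value:price::FLOAT                 AS bid_price
FROM bid_events ev,
     LATERAL FLATTEN(input => ev.raw:bid_response.seatbid) bid
WHERE ev.event_date = CURRENT_DATE()
"""
print(BID_PRICE_QUERY)
```

Wrapping a query like this in a view is what lets an analyst's BI tool see plain typed columns while the raw JSON stays intact underneath.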
>> What would you say, as you continue to push the boundaries of the ad tech service, is some functionality that you're looking to add with Snowflake as your partner, either something that's in there now that you still need to take advantage of, or things you're looking to in the future? >> Well, moving forward, it's very important for us to be able to quickly gauge the effectiveness of new products. The ad tech market is fast-changing; there are always new ways of bidding, new products being developed, new ways for the ad ecosystem to work. And so, as we roll those out, we need to be able to quickly analyze, you know, is this thing working or not? It's kind of an agile environment: pivot or prove it. Does this feature work or not? Having all the data in one place makes possible that very quick assessment of the viability of a new feature or product. >> And, dropping down a little under the covers of how that works, does that mean you still have the base JSON data that you've absorbed, but you're going to expose it with different schemas or access patterns? >> Yeah, indeed. For instance, we make use of the SQL schemas, roles, and permissions internally, where the different teams can have their own domain of data that they expose internally. And looking forward, there's the data sharehouse feature of Snowflake that we're looking to implement with our partners, where, rather than sending them data, like a daily dump of data, we can give them access to their data in our database through that top layer Michael mentioned, the services layer, which essentially allows me to create a view and grant select on it to another customer. So I no longer have to send daily data dumps to partners or have some sort of API for getting data. They can simply query the data themselves, so we'll be implementing that feature with our major partners.
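The view-plus-grant flow described above can be sketched as a short sequence of Snowflake statements on the provider side. All object and account names here are made up for illustration:

```python
# Provider-side statements to expose a curated view to a partner account
# without copying any data: a secure view, a share, grants on the share,
# and finally attaching the consumer account.
SHARE_STATEMENTS = [
    "CREATE SECURE VIEW analytics.public.partner_daily AS "
    "SELECT day, partner_id, impressions, spend "
    "FROM analytics.public.daily_rollup",
    "CREATE SHARE partner_share",
    "GRANT USAGE ON DATABASE analytics TO SHARE partner_share",
    "GRANT USAGE ON SCHEMA analytics.public TO SHARE partner_share",
    "GRANT SELECT ON VIEW analytics.public.partner_daily TO SHARE partner_share",
    "ALTER SHARE partner_share ADD ACCOUNTS = partner_account",
]
for stmt in SHARE_STATEMENTS:
    print(stmt)
```

The partner then creates a read-only database from the share on their side and queries it directly, which is what replaces the daily data dumps.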
>> I would be remiss in not asking at a data conference like this, now that there's the tie-in with CuBOL and Spark Integration and Machine Learning, is there anything along that front that you're planning to exploit in the near future? >> Well, yeah, Sharethrough, we're very experimental, playful, we're always examining new data technologies and new ways of doing things but now with Snowflake as sort of our data warehouse of curated data. I've got two petabytes of referential integrity data, and that is reliable. We can move forward into our other analyses and other uses of data knowing that we have captured every event exactly once, and we know exactly where it fits in a business context, in a relational manner. It's clean, good data integrity, reliable, accessible, visible, and it's just plain old SQL. (chuckles) >> That's actually a nice way to sum it up. We've got the integrity that we've come to expect and love from relational databases. We've got the flexibility of machine-oriented data, or JSON. But we don't have to give up the query engine, and then now you have more advanced features, analytic features that you can take advantage of coming down the pipe. >> Yeah, again we're a modern platform for the modern age, that's basically cloud-based computing. With a platform like Snowflake in the backend, you can now move those workloads that you're accustomed to to the cloud and have in the environment that you're familiar with, and it saves you a lot of time and effort. You can focus on more strategic projects. >> Okay, well, with that, we're going to take a short break. This has been George Gilbert, we're with Michael Nixon of Snowflake, and David Abercrombie of Sharethrough listening to how the most modern ad tech companies are taking advantage of the most modern cloud data warehouses. And we'll be back after a short break here at the Strata Data Conference, thanks. (quirky music)

Published Date : Mar 9 2018


Ian Swanson, DataScience.com | Big Data SV 2018


 

(royal music) >> Announcer: John Cleese. >> There's a lot of people out there who have no idea what they're doing, but they have absolutely no idea that they have no idea what they're doing. Those are the ones with the confidence and stupidity who finish up in power. That's why the planet doesn't work. >> Announcer: Knowledgeable, insightful, and a true gentleman. >> The guy at the counter recognized me and said... Are you listening? >> John Furrier: Yes, I'm tweeting away. >> No, you're not. >> I tweet, I'm tweeting away. >> He is kind of rude that way. >> You're on your (bleep) keyboard. >> Announcer: John Cleese joins the Cube alumni. Welcome, John. >> John Cleese: Have you got any phone calls you need to answer? >> John Furrier: Hold on, let me check. >> Announcer: Live from San Jose, it's the Cube, presenting Big Data Silicon Valley, brought to you by Silicon Angle Media and its ecosystem partners. (busy music) >> Hey, welcome back to the Cube's continuing coverage of our event, Big Data SV. I'm Lisa Martin with my co-host, George Gilbert. We are down the street from the Strata Data Conference. This is our second day, and we've been talking all things big data, cloud data science. We're now excited to be joined by the CEO of a company called Data Science, Ian Swanson. Ian, welcome to the Cube. >> Thanks so much for having me. I mean, it's been a awesome two days so far, and it's great to wrap up my trip here on the show. >> Yeah, so, tell us a little bit about your company, Data Science, what do you guys do? What are some of the key opportunities for you guys in the enterprise market? >> Yeah, absolutely. My company's called datascience.com, and what we do is we offer an enterprise data science platform where data scientists get to use all they tools they love in all the languages, all the libraries, leveraging everything that is open source to build models and put models in production. 
Then we also provide IT the ability to manage this massive stack of tools that data scientists require, and it all boils down to one thing, and that is, companies need to use the data that they've been storing for years. It's about how you put that data into action. We give the tools to data scientists to get that data into action. >> Let's drill down on that a bit. For a while, we thought if we just put all our data in this schema-on-read repository, that would be nirvana. But it wasn't all that transparent, and we recognized we have to sort of go in and structure it somewhat, help us take the next couple steps. >> Ian: Yeah, the journey. >> From these partially curated data sets to something that turns into a model that is actionable. >> That's actually been the theme in the show here at the Strata Data Conference. If we went back years ago, it was, how do we store data. Then it was, how do we not just store and manage it, but how do we transform it and get it into a shape that we can actually use. The theme of this year is how do we get it to that next step, the next step of putting it into action. To layer onto that, data scientists need to access data, yes, but then they need to be able to collaborate, work together, apply many different techniques, machine learning, AI, deep learning, these are all techniques a data scientist uses to build a model. But then there's that next step, and the next is, hey, I built this model, how do I actually get it in production? How does it actually get used? Here's the shocking thing. I was at an event where there were 500 data scientists in the audience, and I said, "Stand up if you worked on a model for more than nine months and it never went into production." 90% of the audience stood up. That's the last mile that we're all still working on, and what's exciting is, we can make it possible today.
But typically, to do a pipeline, you either need well-established APIs that everyone understands and plugs together with, or you need an end-to-end, single-vendor solution that becomes the collaboration backbone. How are you organized, how are you built? >> This might be self-serving, but at datascience.com we have an enterprise data science platform; we recommend a unified platform for data science. Now, that unified platform needs to be highly configurable. You need to make it so that in that workbench, you can use any tool that you want. Some data scientists might want to use a hammer, others want to be able to use a screwdriver. The power is in how configurable it is, how extensible it is, how much open source you can adopt. The amazing trend we've seen has been from proprietary solutions, going back decades, to now the rise of open source. Dozens if not hundreds of new machine learning libraries are being released every single day. We've got to give those capabilities to data scientists and make them scale. >> OK, and I think it's pretty easy to see how you would incorporate new machine learning libraries into a pipeline. But then there's also the tools for data preparation, and for feature extraction and feature engineering; you might even have some tools that help you figure out which algorithm to select. What holds all that together? >> Yeah, so orchestrating the enterprise data science stack is the hardest challenge right now. There has to be a company like us that is the glue, that addresses not just, do these solutions work together, but also, how do they collaborate, what is that workflow? What are those steps in that process? There's one thing that you might have left out, and that is model deployment, model interpretation, model management. >> George: That's the black art, yeah. >> That's where this whole thing is going next.
That was the exciting thing I heard in all these discussions with business leaders throughout the last two days: model deployment, model management. >> If I can kind of take this to maybe shift the conversation a little bit to the target audience. We've talked a lot about data scientists and needing to enable them. I'm curious about, we just talked with, a couple of guests ago, about the chief data officer. As you work with enterprises, how common is the chief data officer role today? What are some of the challenges they've got that datascience.com can help them eliminate? >> Yeah, the CIO and the chief data officer, we have CIOs that have been selecting tools for companies to use, and now the chief data officer is sitting down with the CEO and saying, "How do we actually drive business results?" We work very closely with both of those personas. But on the CDO side, it's really helping them educate their teams on the possibilities of what could be realized with the data at hand, and making sure that IT is enabling the data scientists with the right tools. We supply the tools, but we also like to go in there with our customers and help coach, help educate on what is possible, and that helps with the CDO's mission.
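The deployment "last mile" discussed above often begins with something as simple as serializing a fitted model and loading it in a separate serving process. A minimal stdlib-only sketch, with a hand-rolled linear model standing in for whatever a real team would actually train:

```python
import pickle

class LinearModel:
    """Toy stand-in for a trained model: score = dot(weights, x) + bias."""
    def __init__(self, weights, bias):
        self.weights = weights
        self.bias = bias

    def predict(self, x):
        # Plain dot product plus bias; a real model would be far richer.
        return sum(w * xi for w, xi in zip(self.weights, x)) + self.bias

# "Training" happens offline; deployment is serialize -> ship -> load.
model = LinearModel(weights=[0.5, -0.2], bias=1.0)
blob = pickle.dumps(model)        # the artifact shipped to the serving tier

served = pickle.loads(blob)       # what the production service loads
print(served.predict([2.0, 1.0])) # 0.5*2.0 - 0.2*1.0 + 1.0 = 1.8
```

Real deployments add versioning, monitoring, and model interpretation on top, which is exactly the management layer the conversation is pointing at.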
Digital transformation is not just, how do I make a taxi company become Uber, or how do I make a speaker company become Sonos, the smart speaker, it's how do I exploit all the sources of my data to get better and improved operational processes, new business models, increased revenue, reduced operation costs. You could start small, and so we work with plenty of smaller companies. They'll hire a couple data scientists, and they're able to do small quick wins. You don't have to go sit in the basement for a year having something that is the thing, the unicorn in the business, it's small quick wins. Now we, my company, we believe in writing code, trained, educated, data scientists. There are solutions out there that you throw data at, you push a button, it gets an output. It's this magic black box. There's risk in that. Model interpretation, what are the features it's scoring on, there's risk, but those companies are seeing some level of success. We firmly believe, though, in hiring a data science team that is trained, you can start small, two or three, and get some very quick wins. >> I was going to say, those quick wins are essential for survivability, like digital transformation is essential, but it's also, I mean, to survival at a minimum, right? >> Ian: Yes. >> Those quick wins are presumably transformative to an enterprise being able to sustain, and then eventually, or ideally, be able to take market share from their competition. >> That is key for the CDO. The CDO is there pitching what is possible, he's pitching, she's pitching the dream. In order to be able to help visualize what that dream and the outcome could be, we always say, start small, quick wins, then from there, you can build. What you don't want to do is go nine months working on something and you don't know if there's going to be outcome. A lot of data science is trial and error. This is science, we're testing hypotheses. 
There's not always an outcome that's to be there, so small quick wins is something we highly recommend. >> A question, one of the things that we see more and more is the idea that actionable insights are perishable, and that latency matters. In fact, you have a budget for latency, almost, like in that short amount of time, the more sort of features that you can dynamically feed into a model to get a score, are you seeing more of that? How are the use cases that you're seeing, how's that pattern unfolding? >> Yeah, so we're seeing more streaming data use cases. We work with some of the biggest technology companies in the world, so IoT, connected services, streaming real time decisions that are happening. But then, also, there are so many use cases around org that could be marketing, finance, HR related, not just tech related. On the marketing side, imagine if you're customer service, and somebody calls you, and you know instantly the lifetime value of that customer, and it kicks off a totally new talk track, maybe get escalated immediately to a new supervisor, because that supervisor can handle this top tier customer. These are decisions that can happen real time leveraging machine learning models, and these are things that, again, are small quick wins, but massive, massive impact. It's about decision process now. That's digital transformation. >> OK. Are you seeing patterns in terms of how much horsepower customers are budgeting for the training process, creating the model? Because we know it's very compute intensive, like, even Intel, some people call it, like, high performance compute, like a supercomputer type workload. How much should people be budgeting? Because we don't see any guidelines or rules of thumb for this. >> I still think the boundaries are being worked out. There's a lot of great work that Nvidia's doing with GPU, we're able to do things faster on compute power. 
But even if we just start from the basics, if you go and talk to a data scientist at a massive company where they have a team of over 1,000 data scientists, and you say to do this analysis, how do you spin up your compute power? Well, I go walk over to IT and I knock on the door, and I say, "Set up this machine, set up this cluster." That's ridiculous. A product like ours is able to instantly give them the compute power, scale it elastically with our cloud service partners or work with on-prem solutions to be able to say, get the power that you need to get the results in the time that's needed, quick, fast. In terms of the boundaries of the budget, that's still being defined. But at the end of the day, we are seeing return on investment, and that's what's key. >> Are you seeing a movement towards a greater scope of integration for the data science tool chain? Or is it that at the high end, where you have companies with 1,000 data scientists, they know how to deal with specialized components, whereas, when there's perhaps less of, a smaller pool of expertise, the desire for end to end integration is greater. >> I think there's this kind of thought that is not necessarily right, and that is, if you have a bigger data science team, you're more sophisticated. We actually see the same sophistication level of 1,000 person data science team, in many cases, to a 20 person data science team, and sometimes inverse, I mean, it's kind of crazy. But it's, how do we make sure that we give them the tools so they can drive value. Tools need to include collaboration and workflow, not just hammers and nails, but how do we work together, how do we scale knowledge, how do we get it in the hands of the line of business so they can use the results. It's that that is key. >> That's great, Ian. I also like that you really kind of articulated start small, quick ins can make massive impact. 
We want to thank you so much for stopping by the Cube and sharing that, and what you guys are doing at Data Science to help enterprises really take advantage of the value that data can really deliver. >> Thanks so much for having datascience.com on, really appreciate it. >> Lisa: Absolutely. George, thank you for being my co-host. >> You're always welcome. >> We want to thank you for watching the Cube. I'm Lisa Martin with George Gilbert, and we are at our event Big Data SV on day two. Stick around, we'll be right back with our next guest after a short break. (busy music)

Published Date : Mar 8 2018
