Tapping Vertica's Integration with TensorFlow for Advanced Machine Learning


 

>> Paige: Hello, everybody, and thank you for joining us today for the Virtual Vertica BDC 2020. Today's breakout session is entitled "Tapping Vertica's Integration with TensorFlow for Advanced Machine Learning." I'm Paige Roberts, Open Source Relations Manager at Vertica, and I'll be your host for this session. Joining me is Vertica Software Engineer, George Larionov. >> George: Hi. >> Paige: (chuckles) That's George. So, before we begin, I encourage you guys to submit questions or comments during the virtual session. You don't have to wait. Just type your question or comment in the question box below the slides and click submit. So, as soon as a question occurs to you, go ahead and type it in, and there will be a Q and A session at the end of the presentation. We'll answer as many questions as we're able to get to during that time. Any questions we don't get to, we'll do our best to answer offline. Now, alternatively, you can visit the Vertica Forum to post your questions there after the session. Our engineering team is planning to join the forums to keep the conversation going, so you can ask an engineer afterwards, just as if it were a regular in-person conference. Also, a reminder: you can maximize your screen by clicking the double-arrow button in the lower right corner of the slides. And, before you ask, yes, this virtual session is being recorded, and it will be available to view by the end of this week. We'll send you a notification as soon as it's ready. Now, let's get started. Over to you, George. >> George: Thank you, Paige. So, I've been introduced. I'm a Software Engineer at Vertica, and today I'm going to be talking about a new feature, Vertica's integration with TensorFlow. So, first, I'm going to go over what TensorFlow is and what neural networks are. Then, I'm going to talk about why integrating with TensorFlow is a useful feature, and, finally, I am going to talk about the integration itself and give an example.
So, as we get started here, what is TensorFlow? TensorFlow is an open source machine learning library, developed by Google, and it's actually one of many such libraries. And, the whole point of libraries like TensorFlow is to simplify the whole process of working with neural networks, such as creating, training, and using them, so that it's available to everyone, as opposed to just a small subset of researchers. So, neural networks are computing systems that allow us to solve various tasks. Traditionally, computing algorithms were designed completely from the ground up by engineers like me, and we had to manually sift through the data and decide which parts are important for the task and which are not. Neural networks aim to solve this problem, a little bit, by sifting through the data themselves, automatically, and finding traits and features which correlate to the right results. So, you can think of it as neural networks learning to solve a specific task by looking through the data, without human beings having to sit and sift through the data themselves. So, there are a couple of necessary parts to getting a trained neural model, which is the final goal. By the way, a neural model is the same as a neural network. Those are synonymous. So, first, you need this light blue circle, an untrained neural model, which is pretty easy to get in TensorFlow, and, in addition to that, you need your training data. Now, this involves both training inputs and training labels, and I'll talk about exactly what those two things are on the next slide. But, basically, you need to train your model with the training data, and, once it is trained, you can use your trained model to predict on just the purple circle, so new inputs. And, it will predict the labels for you. You don't have to label it anymore. So, a neural network can be thought of as... Training a neural network can be thought of as teaching a person how to do something.
For example, if I want to learn to speak a new language, let's say French, I would probably hire some sort of tutor to help me with that task, and I would need a lot of practice constructing and saying sentences in French, and a lot of feedback from my tutor on whether my pronunciation or grammar, et cetera, is correct. And, so, that would take me some time, but, finally, hopefully, I would be able to learn the language and speak it without any sort of feedback, and get it right. So, in a very similar manner, a neural network needs to practice on example training data first, and, along with that data, it needs labeled data. In this case, the labeled data is kind of analogous to the tutor. It is the correct answers, so that the network can learn what those look like. But, ultimately, the goal is to predict on unlabeled data, which is analogous to me knowing how to speak French. So, I went over most of the bullets. A neural network needs a lot of practice. To do that, it needs a lot of good labeled data, and, finally, since a neural network needs to iterate over the training data many, many times, it needs a powerful machine which can do that in a reasonable amount of time. So, here's a quick checklist of what you need if you have a specific task that you want to solve with a neural network. The first thing you need is a powerful machine for training. We discussed why this is important. Then, you need TensorFlow installed on the machine, of course, and you need a dataset and labels for your dataset. Now, this dataset can be hundreds of examples, thousands, sometimes even millions. I won't go into that because the dataset size really depends on the task at hand, but if you have these four things, you can train a good neural network that will predict whatever result you want it to predict at the end.
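As a hedged illustration of that train-then-predict loop, here is a minimal sketch in plain Python, not TensorFlow code and not anything from the talk: a single "neuron" starts untrained, practices on labeled examples (the tutor's feedback is the prediction error), and afterwards predicts on a new, unlabeled input.

```python
# Illustrative sketch only: the "untrained model + labeled data -> trained
# model -> predict" loop described above, shrunk to one neuron learning the
# line y = 2x + 1 by gradient descent.

def train(inputs, labels, epochs=500, lr=0.05):
    w, b = 0.0, 0.0                      # untrained model: arbitrary weights
    for _ in range(epochs):              # iterate over the data many times
        for x, y in zip(inputs, labels):
            pred = w * x + b
            err = pred - y               # feedback, like the tutor's corrections
            w -= lr * err * x            # nudge the weights toward the answer
            b -= lr * err
    return w, b

def predict(model, x):
    w, b = model
    return w * x + b

inputs = [0.0, 1.0, 2.0, 3.0, 4.0]
labels = [1.0, 3.0, 5.0, 7.0, 9.0]       # labels follow y = 2x + 1
model = train(inputs, labels)            # w ends near 2, b near 1
```

A real TensorFlow model has many layers and far more data, but the shape of the loop, untrained model plus labeled data in, trained model out, then prediction on unlabeled inputs, is the same.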
So, we've talked about neural networks and TensorFlow, but the question is: if we already have a lot of built-in machine learning algorithms in Vertica, then why do we need to use TensorFlow? And, to answer that question, let's look at this dataset. So, this is a pretty simple toy dataset with 20,000 points, but it simulates a more complex dataset with two different classes which are not related in a simple way. So, the existing machine learning algorithms that Vertica already has mostly fail on this pretty simple dataset. Linear models can't really draw a good line separating the two types of points. Naïve Bayes also performs pretty badly, and even the Random Forest algorithm, which is a pretty powerful algorithm, with 300 trees gets only 80% accuracy. However, a neural network with only two hidden layers gets 99% accuracy in about ten minutes of training. So, I hope that's a pretty compelling reason to use neural networks, at least sometimes. As an aside, there are plenty of tasks that do fit the existing machine learning algorithms in Vertica. That's why they're there, and if a task that you want to solve fits one of the existing algorithms, well, then I would recommend using that algorithm, not TensorFlow, because, while neural networks have their place and are very powerful, it's often easier to use an existing algorithm, if possible. Okay, so, now that we've talked about why neural networks are needed, let's talk about integrating them with Vertica. So, neural networks are best trained using GPUs, Graphics Processing Units, which are, basically, just a different kind of processing unit than a CPU. GPUs are good for training neural networks because they excel at doing many, many simple operations at the same time, which is needed for a neural network to be able to iterate through the training data many times. However, Vertica runs on CPUs and cannot run on GPUs at all, because that's not how it was designed.
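The slide's 20,000-point dataset isn't reproduced here, but a four-point XOR-style stand-in (an illustrative sketch, not the talk's actual data) shows the same failure mode: when two classes are not related in a simple way, no linear model can separate them, while a single nonlinear feature, the kind a hidden layer can learn, separates them perfectly.

```python
# Toy stand-in for "two classes not related in a simple way": XOR labels.
points = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [0, 1, 1, 0]

def linear_acc(w1, w2, b):
    # Accuracy of one candidate separating line w1*x + w2*y + b > 0.
    preds = [1 if w1 * x + w2 * y + b > 0 else 0 for x, y in points]
    return sum(p == t for p, t in zip(preds, labels)) / len(labels)

# Brute-force a grid of candidate lines; none of them can get past 75%.
grid = [i / 4 for i in range(-8, 9)]          # weights and bias in [-2, 2]
best_linear = max(linear_acc(w1, w2, b)
                  for w1 in grid for w2 in grid for b in grid)

# One nonlinear feature (x * y) makes the classes perfectly separable:
# the classifier x + y - 2*x*y > 0 gets every point right.
nonlinear_acc = sum((1 if x + y - 2 * x * y > 0 else 0) == t
                    for (x, y), t in zip(points, labels)) / len(labels)
```

This is the gap George is pointing at: a hidden layer effectively learns features like `x * y` on its own, which is why the two-hidden-layer network can reach 99% where linear models stall.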
So, to train our neural networks, we have to go outside of Vertica, and exporting a small batch of training data is pretty simple. So, that's not really a problem, but, given this information, why do we even need Vertica? If we train outside, then why not do everything outside of Vertica? So, to answer that question, here is a slide that Philips was nice enough to let us use. This is an example of a production system at Philips. So, it consists of two branches. On the left, we have a branch with historical device log data, and this can kind of be thought of as a bunch of training data. And, all that data goes through some data integration and data analysis. Basically, this is where you train your models, whether or not they are neural networks, but, for the purpose of this talk, this is where you would train your neural network. And, on the right, we have a branch which has live device log data coming in from various MRI machines, CAT scan machines, et cetera, and this is a ton of data. So, these machines are constantly running. They're constantly on, and there's a bunch of them. So, data just keeps streaming in, and, so, we don't want this data to have to take any unnecessary detours, because that would greatly slow down the whole system. So, this data in the right branch goes through an already trained predictive model, which needs to be pretty fast, and, finally, it allows Philips to do some maintenance on these machines before they actually break, which helps Philips, obviously, and definitely the medical industry as well. So, I hope this slide helped explain the complexity of a live production system and why it might not be reasonable to train your neural networks directly in the system with the live device log data. So, a quick summary of just the neural networks section: neural networks are powerful, but they need a lot of processing power to train, which can't really be done well in a production pipeline. However, they are cheap and fast to predict with.
Prediction with a neural network does not require a GPU anymore. And, they can be very useful in production, so we do want them there. We just don't want to train them there. So, the question now is, how do we get neural networks into production? So, we have, basically, two options. The first option is to take the data and export it to our machine with TensorFlow, our powerful GPU machine, or we can take our TensorFlow model and put it where the data is. In this case, let's say that that is Vertica. So, I'm going to go through some pros and cons of these two approaches. The first one is bringing the data to the analytics. The pros of this approach are that TensorFlow is already installed and running on this GPU machine, and we don't have to move the model at all. The cons, however, are that we have to transfer all the data to this machine, and if that data is big, if it's, I don't know, gigabytes, terabytes, et cetera, then that becomes a huge bottleneck, because you can only transfer in small quantities, since GPU machines tend to not be that big. Furthermore, TensorFlow prediction doesn't actually need a GPU, so you would end up paying for an expensive GPU for no reason. It's not parallelized, because you just have one GPU machine. You can't put your production system on this GPU machine, as we discussed. And, so, you're left with good results, but not fast and not where you need them. So, now, let's look at the second option. The second option is bringing the analytics to the data. The pros of this approach are that we can integrate with our production system. It's low impact, because prediction is not processor intensive. It's cheap, or, at least, it's pretty much as cheap as your system was before. It's parallelized, because Vertica was always parallelized, which we'll talk about in the next slide. There's no extra data movement.
You get the benefit of model management in Vertica, meaning, if you import multiple TensorFlow models, you can keep track of their various attributes, when they were imported, et cetera. And, the results are right where you need them, inside your production pipeline. The two cons are that TensorFlow is limited to just prediction inside Vertica, and, if you want to retrain your model, you need to do that outside of Vertica and, then, reimport. So, just as a recap of parallelization: everything in Vertica is parallelized and distributed, and TensorFlow is no exception. So, when you import your TensorFlow model to your Vertica cluster, it gets copied to all the nodes, automatically, and TensorFlow will run in fenced mode, which means that if the TensorFlow process fails for whatever reason, even though it shouldn't, but if it does, Vertica itself will not crash, which is obviously important. And, finally, prediction happens on each node. There are multiple threads of TensorFlow processes running, processing different little bits of data, which is much faster than processing the data line by line, because it happens all in a parallelized fashion. And, so, the result is fast prediction. So, here's an example which I hope is a little closer to what everyone is used to than the usual machine learning TensorFlow example. This is the Boston housing dataset, or, rather, a small subset of it. Now, on the left, we have the input data (to go back to, I think, the first slide), and, on the right, are the training labels. So, the input data consists of, each line is a plot of land in Boston, along with various attributes, such as the level of crime in that area, how much industry is in that area, whether it's on the Charles River, et cetera, and, on the right, we have as the labels the median house value on that plot of land. And, so, the goal is to put all this data into the neural network and, finally, get a model which can train...
I don't know, which can predict on new incoming data and predict a good housing value for that data. Now, I'm going to go through, step by step, how to actually use TensorFlow models in Vertica. So, the first step I won't go into much detail on, because there are countless tutorials and resources online on how to use TensorFlow to train a neural network, so that's the first step. The second step is to save the model in TensorFlow's 'frozen graph' format. Again, this information is available online. The third step is to create a small, simple JSON file describing the inputs and outputs of the model, what data type they are, et cetera. And, this is needed for Vertica to be able to translate from TensorFlow land into Vertica SQL land, so that it can use a SQL table instead of the input TensorFlow usually takes. So, once you have your model file and your JSON file, you want to put both of those files in a directory on a node, any node, in a Vertica cluster, and name that directory whatever you want your model to ultimately be called inside of Vertica. So, once you do that, you can go ahead and import that directory into Vertica. This IMPORT_MODELS function already exists in Vertica. All we added was a new category to be able to import. So, what you need to do is specify the path to your neural network directory and specify that the category of the model is TensorFlow. Once you successfully import, in order to predict, you run this brand new PREDICT_TENSORFLOW function. So, in this case, we're predicting on everything from the input table, which is what the star means. The model name is Boston housing net, which is the name of your directory, and, then, there's a little bit of boilerplate. And, the two names, ID and value, after the AS are just the names of the columns of your outputs, and, finally, the Boston housing data is whatever SQL table you want to predict on that fits the input type of your network.
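The third step, the small JSON file mapping tensors to SQL columns, can be sketched as follows. The field names below are illustrative assumptions, not Vertica's exact schema; the talk doesn't spell out the format, so consult the Vertica TensorFlow integration documentation for the real descriptor layout before using this.

```python
# Hedged sketch of step 3: write a JSON descriptor for the model's inputs and
# outputs so Vertica can translate between SQL columns and tensors.
# NOTE: the keys below ("frozen_graph", "input_desc", ...) are assumed names
# for illustration, not a documented Vertica schema.
import json

model_desc = {
    "frozen_graph": "boston_housing_net.pb",   # step 2's frozen-graph file
    "input_desc":  [{"name": "dense_input",  "dtype": "float32", "dims": [13]}],
    "output_desc": [{"name": "dense_output", "dtype": "float32", "dims": [1]}],
}

with open("tf_model_desc.json", "w") as f:
    json.dump(model_desc, f, indent=2)
```

The directory holding the frozen graph and this JSON file, named whatever you want the model called, is then what gets imported and queried as described above.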
And, this will output a bunch of predictions. In this case, values of houses that the network thinks are appropriate for all the input data. So, just a quick summary. We talked about what TensorFlow is and what neural networks are, and, then, we discussed that TensorFlow works best on GPUs, because training needs very specific characteristics. That is, TensorFlow works best for training on GPUs, while Vertica is designed to use CPUs, and it's really good at storing and accessing a lot of data quickly, but it's not very well designed for having neural networks trained inside of it. Then, we talked about how neural models are powerful, and we want to use them in our production flow. And, since prediction is fast, we can go ahead and do that, but we just don't want to train there. And, finally, I presented the Vertica TensorFlow integration, which allows importing a trained neural model, a trained TensorFlow model, into Vertica and predicting on all the data that is inside Vertica with a few simple lines of SQL. So, thank you for listening. I'm going to take some questions now.

Published Date : Mar 30 2020



Robert Nishihara, Anyscale | AWS Startup Showcase S3 E1


 

(upbeat music) >> Hello everyone. Welcome to theCube's presentation of the "AWS Startup Showcase." The topic this episode is AI and machine learning, top startups building foundational model infrastructure. This is season three, episode one of the ongoing series covering exciting startups from the AWS ecosystem. And this time we're talking about AI and machine learning. I'm your host, John Furrier. I'm excited I'm joined today by Robert Nishihara, who's the co-founder and CEO of a hot startup called Anyscale. He's here to talk about Ray, the open source project, and Anyscale's infrastructure for foundation models as well. Robert, thank you for joining us today. >> Yeah, thanks so much as well. >> I've been following your company since the founding pre-pandemic, and you guys really had a great vision, scaled up, and are in a perfect position for this big wave that we all see with ChatGPT and OpenAI that's gone mainstream. Finally, AI has broken out through the ropes and now gone mainstream, so I think you guys are really well positioned. I'm looking forward to talking with you today. But before we get into it, introduce the core mission for Anyscale. Why do you guys exist? What is the North Star for Anyscale? >> Yeah, like you mentioned, there's a tremendous amount of excitement about AI right now. You know, I think a lot of us believe that AI can transform just about every industry. So one of the things that was clear to us when we started this company was that the amount of compute needed to do AI was just exploding. To actually succeed with AI, companies like OpenAI or Google, you know, these companies getting a lot of value from AI, were not just running these machine learning models on their laptops or on a single machine. They were scaling these applications across hundreds or thousands or more machines and GPUs and other resources in the Cloud.
And so to actually succeed with AI, and this has been one of the biggest trends in computing, maybe the biggest trend in computing in, you know, in recent history, the amount of compute has been exploding. And so to actually succeed with that AI, to actually build these scalable applications and scale the AI applications, there's a tremendous software engineering lift to build the infrastructure to actually run these scalable applications. And that's very hard to do. So one of the reasons many AI projects and initiatives fail, or don't make it to production, is the need for this scale, the infrastructure lift, to actually make it happen. So our goal here with Anyscale and Ray is to make that easy, is to make scalable computing easy. So that as a developer or as a business, if you want to do AI, if you want to get value out of AI, all you need to know is how to program on your laptop. Like, all you need to know is how to program in Python. And if you can do that, then you're good to go. Then you can do what companies like OpenAI or Google do and get value out of machine learning. >> That programming example of how easy it is with Python reminds me of the early days of Cloud, when infrastructure as code was talked about: it was just code, the infrastructure made programmable. That's super important. That's what AI people wanted, first: program AI. That's the new trend. And I want to understand, if you don't mind explaining, the relationship that Anyscale has to these foundational models and in particular the large language models, also called LLMs, as seen with, like, OpenAI and ChatGPT. Before you get into the relationship that you have with them, can you explain why the hype around foundational models? Why are people going crazy over foundational models? What is it and why is it so important?
>> Yeah, so foundational models, or foundation models, are incredibly important because they enable businesses and developers to get value out of machine learning, to use machine learning off the shelf with these large models that have been trained on tons of data and that are useful out of the box. And then, of course, you know, as a business or as a developer, you can take those foundational models and repurpose them or fine tune them or adapt them to your specific use case and what you want to achieve. But it's much easier to do that than to train them from scratch. And I think, for people to actually use foundation models, there are three main types of workloads or problems that need to be solved. One is training these foundation models in the first place, like actually creating them. The second is fine tuning them and adapting them to your use case. And the third is serving them and actually deploying them. Okay, so Ray and Anyscale are used for all three of these workloads. Companies like OpenAI or Cohere that train large language models, or open source versions like GPT-J, do so on top of Ray. There are many startups and other businesses that don't want to train the large underlying foundation models, but that do want to fine tune them, do want to adapt them to their purposes, and build products around them and serve them; those are also using Ray and Anyscale for that fine tuning and that serving. And so the reason that Ray and Anyscale are important here is that, you know, building and using foundation models requires huge scale. It requires a lot of data. It requires a lot of compute, GPUs, TPUs, other resources. And to actually take advantage of that and actually build these scalable applications, there's a lot of infrastructure that needs to happen under the hood. And so you can either use Ray and Anyscale to take care of that and manage the infrastructure and solve those infrastructure problems.
Or you can build the infrastructure and manage the infrastructure yourself, which you can do, but it's going to slow your team down. You know, many of the businesses we work with simply don't want to be in the business of managing infrastructure and building infrastructure. They want to focus on product development and move faster. >> I know you've got a keynote presentation we're going to go to in a second, but I think you hit on something I think is the real tipping point: doing it yourself, hard to do. These are things where opportunities are, and the Cloud did that with data centers. Turned the data center into an API. The heavy lifting went away and went to the Cloud, so people could be more creative and build their product. In this case, build their creativity. Is that kind of what's the big deal? Is that kind of a big deal happening, that you guys are taking the learnings and making that available so people don't have to do that? >> That's exactly right. So today, if you want to succeed with AI, if you want to use AI in your business, infrastructure work is on the critical path for doing that. To do AI, you have to build infrastructure. You have to figure out how to scale your applications. That's going to change. We're going to get to the point, and you know, with Ray and Anyscale, we're going to remove the infrastructure from the critical path so that as a developer or as a business, all you need to focus on is your application logic, what you want the program to do, what you want your application to do, how you want the AI to actually interface with the rest of your product. Now the way that will happen is that the infrastructure work will still happen. It'll just be under the hood and taken care of by Ray and Anyscale. And so I think something like this is really necessary for AI to reach its potential, for AI to have the impact and the reach that we think it will. You have to make it easier to do.
>> And just for clarification, to point out, if you don't mind, explain the relationship of Ray and Anyscale real quick just before we get into the presentation. >> So Ray is an open source project. We created it. We were at Berkeley doing machine learning. We started Ray in order to provide an easy, simple open source tool for building and running scalable applications. And Anyscale is the managed version of Ray; basically we will run Ray for you in the Cloud, provide a lot of tools around the developer experience and managing the infrastructure and providing more performance and superior infrastructure. >> Awesome. I know you've got a presentation on Ray and Anyscale and you guys are positioning as the infrastructure for foundational models. So I'll let you take it away, and then when you're done presenting, we'll come back, I'll probably grill you with a few questions, and then we'll close it out, so take it away. >> Robert: Sounds great. So I'll say a little bit about how companies are using Ray and Anyscale for foundation models. The first thing I want to mention is just why we're doing this in the first place. And the underlying observation, the underlying trend here, and this is a plot from OpenAI, is that the amount of compute needed to do machine learning has been exploding. It's been growing at something like 35 times every 18 months. This is absolutely enormous. And other people have written papers measuring this trend, and you get different numbers. But the point is, no matter how you slice and dice it, it's an astronomical rate. Now if you compare that to something we're all familiar with, like Moore's Law, which says that, you know, processor performance doubles every roughly 18 months, you can see that there's just a tremendous gap between the compute needs of machine learning applications and what you can do with a single chip, right.
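To make that gap concrete with the figures quoted above (35x per 18 months for ML compute demand versus roughly 2x per 18 months for chip performance; illustrative numbers, not a forecast), compounding both rates over the same five-year window gives a shortfall of roughly four orders of magnitude:

```python
# Back-of-the-envelope version of the gap described above, using the talk's
# quoted rates. Both quantities compound every 18 months over five years.
years = 5
periods = years * 12 / 18            # number of 18-month periods (~3.33)

ml_demand_growth = 35 ** periods     # compute needed for ML (~140,000x)
chip_growth = 2 ** periods           # single-chip (Moore's-law) growth (~10x)

gap = ml_demand_growth / chip_growth # the shortfall one chip can't cover
```

The gap comes out above 10,000x over just five years, which is the arithmetic behind "doing AI requires scaling; there's no way around it."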
So even if Moore's Law were continuing strong and, you know, doing what it used to be doing, even if that were the case, there would still be a tremendous gap between what you can do with a chip and what you need in order to do machine learning. And so given this graph, what we've seen, and what has been clear to us since we started this company, is that doing AI requires scaling. There's no way around it. It's not a nice to have, it's really a requirement. And so that led us to start Ray, which is the open source project that we started to make it easy to build these scalable Python applications and scalable machine learning applications. And since we started the project, it's been adopted by a tremendous number of companies. Companies like OpenAI, which use Ray to train their large models like ChatGPT; companies like Uber, which run all of their deep learning and classical machine learning on top of Ray; companies like Shopify or Spotify or Instacart or Lyft or Netflix, ByteDance, which use Ray for their machine learning infrastructure. Companies like Ant Group, which makes Alipay, you know, they use Ray across the board for fraud detection, for online learning, for detecting money laundering, you know, for graph processing, stream processing. Companies like Amazon, you know, run Ray at a tremendous scale, processing petabytes of data every single day. And so the project has seen just enormous adoption over the past few years. And one of the most exciting use cases is really providing the infrastructure for building, training, fine tuning, and serving foundation models. So I'll say a little bit about, you know, here are some examples of companies using Ray for foundation models. Cohere trains large language models. OpenAI also trains large language models. You can think about the workloads required there: things like supervised pre-training, also reinforcement learning from human feedback.
So this is not only the regular supervised learning, but actually more complex reinforcement learning workloads that take human input about which response to a particular question, you know, is better than a certain other response, and incorporate that into the learning. There are open source versions as well, like GPT-J, also built on top of Ray, as well as projects like Alpa coming out of UC Berkeley. So these are some examples of exciting projects and organizations training and creating these large language models and serving them using Ray. Okay, so what actually is Ray? Well, there are two layers to Ray. At the lowest level, there's the core Ray system. This is essentially low level primitives for building scalable Python applications. Things like taking a Python function or a Python class and executing them in the cluster setting. So Ray core is extremely flexible, and you can build arbitrary scalable applications on top of Ray. So on top of Ray, on top of the core system, what really gives Ray a lot of its power is this ecosystem of scalable libraries. So on top of the core system you have scalable libraries for ingesting and pre-processing data, for training your models, for fine tuning those models, for hyperparameter tuning, for doing batch processing and batch inference, for doing model serving and deployment, right. And a lot of the Ray users, the reason they like Ray is that they want to run multiple workloads. They want to train and serve their models, right. They want to load their data and feed that into training. And Ray provides common infrastructure for all of these different workloads. So this is a little overview of the different components of Ray. So why do people choose to go with Ray? I think there are three main reasons. The first is the unified nature. The fact that it is common infrastructure for scaling arbitrary workloads, from data ingest to pre-processing to training to inference and serving, right.
This also includes the fact that it's future proof. AI is incredibly fast moving. And so many people, many companies that have built their own machine learning infrastructure and standardized on particular workflows for doing machine learning have found that their workflows are too rigid to enable new capabilities. If they want to do reinforcement learning, if they want to use graph neural networks, they don't have a way of doing that with their standard tooling. And so Ray, being future proof and being flexible and general, gives them that ability. Another reason people choose Ray and Anyscale is the scalability. This is really our bread and butter. This is the reason, the whole point of Ray, you know, making it easy to go from your laptop to running on thousands of GPUs, making it easy to scale your development workloads and run them in production, making it easy to scale, you know, training, to scale data ingest, pre-processing and so on. So scalability and performance, you know, are critical for doing machine learning, and that is something that Ray provides out of the box. And lastly, Ray is an open ecosystem. You can run it anywhere. You can run it on any Cloud provider. Google Cloud, AWS, Azure. You can run it on your Kubernetes cluster. You can run it on your laptop. It's extremely portable. And not only that, it's framework agnostic. You can use Ray to scale arbitrary Python workloads, and it integrates with libraries like TensorFlow or PyTorch or JAX or XGBoost or Hugging Face or PyTorch Lightning, right, or Scikit-learn, or just your own arbitrary Python code. It's open source. And in addition to integrating with the rest of the machine learning ecosystem and these machine learning frameworks, you can use Ray along with all of the other tooling in the machine learning ecosystem. That's things like Weights & Biases or MLflow, right.
Or you know, different data platforms like Databricks, you know, Delta Lake or Snowflake, or tools for model monitoring, for feature stores, all of these integrate with Ray. And that's, you know, Ray provides that kind of flexibility so that you can integrate it into the rest of your workflow. And then Anyscale is the scalable compute platform that's built on top, you know, that provides Ray. So Anyscale is a managed Ray service that runs in the Cloud. And what Anyscale does is it offers the best way to run Ray. And if you think about what you get with Anyscale, there are fundamentally two things. One is about moving faster, accelerating the time to market. And you get that by having the managed service so that as a developer you don't have to worry about managing infrastructure, you don't have to worry about configuring infrastructure. It also provides, you know, optimized developer workflows. Things like easily moving from development to production, things like having the observability tooling, the debuggability to actually easily diagnose what's going wrong in a distributed application. So things like the dashboards and the other kinds of tooling for collaboration, for monitoring and so on. And then on top of that, so that's the first bucket, developer productivity, moving faster, faster experimentation and iteration. The second reason that people choose Anyscale is superior infrastructure. So this is things like, you know, cost efficiency, being able to easily take advantage of spot instances, being able to get higher GPU utilization, things like faster cluster startup times and auto scaling. Things like just overall better performance and faster scheduling. And so these are the kinds of things that Anyscale provides on top of Ray. It's the managed infrastructure. It's fast, it's like the developer productivity and velocity as well as performance. So this is what I wanted to share about Ray and Anyscale. >> John: Awesome. >> Provide that context.
But John, I'm curious what you think. >> I love it. I love the, so first of all, it's a platform because that's the platform architecture right there. So just to clarify, this is an Anyscale platform, not- >> That's right. >> Tools. So you got tools in the platform. Okay, that's key. Love that managed service. Just curious, you mentioned Python multiple times, is that because of PyTorch and TensorFlow or Python's the most friendly with machine learning or it's because it's very common amongst all developers? >> That's a great question. Python is the language that people are using to do machine learning. So it's the natural starting point. Now, of course, Ray is actually designed in a language agnostic way and there are companies out there that use Ray to build scalable Java applications. But for the most part right now we're focused on Python and being the best way to build these scalable Python and machine learning applications. But, of course, down the road there always is that potential. >> So if you're slinging Python code out there and you're watching that, you're watching this video, get on the Anyscale bus quickly. Also, I just, while you were giving the presentation, I couldn't help, since you mentioned OpenAI, which by the way, congratulations 'cause they've had great scale, I've noticed in their rapid growth 'cause they were the fastest company to that number of users of anyone in the history of the computer industry, so major success for OpenAI and ChatGPT, huge fan. I'm not a skeptic at all. I think it's just the beginning, so congratulations. But I actually typed into ChatGPT, what are the top three benefits of Anyscale and came up with scalability, flexibility, and ease of use. Obviously, scalability is what you guys are called. >> That's pretty good. >> So that's what they came up with. So they nailed it. Did you have an inside prompt training, buy it there? Only kidding. (Robert laughs) >> Yeah, we hard coded that one.
>> But that's the kind of thing that came up really, really quickly, and if I asked it to write a sales document, it probably would, but this is the future interface. This is why people are getting excited about the foundational models and the large language models, because it's allowing the interface with the user, the consumer, to be more human, more natural. And this clearly will be in every application in the future. >> Absolutely. This is how people are going to interface with software, how they're going to interface with products in the future. It's not just something, you know, not just a chatbot that you talk to. This is going to be how you get things done, right. How you use your web browser or how you use, you know, how you use Photoshop or how you use other products. Like you're not going to spend hours learning all the APIs and how to use them. You're going to talk to it and tell it what you want it to do. And of course, you know, if it doesn't understand it, it's going to ask clarifying questions. You're going to have a conversation and then it'll figure it out. >> This is going to be one of those things, we're going to look back at this time, Robert, and say, "Yeah, from that company, that was the beginning of that wave." And just like AWS and Cloud Computing, the folks who got in early really were in position when, say, the pandemic came. So getting in early is a good thing and that's what everyone's talking about is getting in early and playing around, maybe replatforming or even picking one or a few apps to refactor with some staff and managed services. So people are definitely jumping in. So I have to ask you the ROI cost question. You mentioned some of those, Moore's Law versus what's going on in the industry. When you look at that kind of scale, the first thing that jumps out at people is, "Okay, I love it. Let's go play around." But what's it going to cost me? Am I going to be tied to certain GPUs?
What's the landscape look like from an operational standpoint, from the customer? Are they locked in, and the benefit was flexibility, are you flexible enough to handle any Cloud? What is the customers, what are they looking at? Basically, that's my question. What's the customer looking at? >> Cost is super important here and many of the companies, I mean, companies are spending a huge amount on their Cloud computing, on AWS, and on doing AI, right. And I think a lot of the advantage of Anyscale, what we can provide here is not only better performance, but cost efficiency. Because if we can run something faster and more efficiently, it can also use less resources and you can lower your Cloud spending, right. We've seen companies go from, you know, 20% GPU utilization with their current setup and the current tools they're using to running on Anyscale and getting more like 95, you know, 100% GPU utilization. That's something like a 5x improvement right there. So depending on the kind of application you're running, you know, it's a significant cost savings. We've seen companies that are, you know, processing petabytes of data every single day with Ray getting order-of-magnitude cost savings by switching from what they were previously doing to running their application on Ray. And when you have applications that are spending, you know, potentially $100 million a year, a 10x cost savings is just absolutely enormous. So these are some of the kinds of- >> Data infrastructure is super important. Again, if the customer, if you're a prospect to this and thinking about going in here, just like the Cloud, you got infrastructure, you got the platform, you got SaaS, same kind of thing's going to go on in AI. So I want to get into that, you know, ROI discussion and some of the impact with your customers that are leveraging the platform. But first I hear you got a demo. >> Robert: Yeah, so let me show you, let me give you a quick run through here.
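As a sanity check on the utilization numbers Robert quotes — 20% going to roughly 95-100% — here is the back-of-the-envelope arithmetic. The fleet size and hourly rate below are invented illustrative figures, not Anyscale or AWS pricing; only the utilization percentages come from the conversation.

```python
# Going from 20% to 95% GPU utilization means each GPU-hour delivers
# ~4.75x more useful work -- the "something like a 5x improvement" quoted
# above. Fleet size and price per GPU-hour are hypothetical.
gpus, price_per_gpu_hour = 32, 3.0
hourly_bill = gpus * price_per_gpu_hour        # fixed cost of the fleet

low_util, high_util = 0.20, 0.95
speedup = high_util / low_util                 # useful work per dollar ratio

cost_per_useful_hour_before = hourly_bill / (gpus * low_util)
cost_per_useful_hour_after = hourly_bill / (gpus * high_util)

print(round(speedup, 2))  # 4.75
print(round(cost_per_useful_hour_before / cost_per_useful_hour_after, 2))  # 4.75
```

The same ratio applies at any fleet size, which is why the quoted savings scale to the $100-million-a-year workloads mentioned next.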
So what I have open here is the Anyscale UI. I've started a little Anyscale Workspace. So Workspaces are the Anyscale concept for interactive development, right. So here, imagine, you want to have a familiar experience like you're developing on your laptop. And here I have a terminal. It's not on my laptop. It's actually in the cloud running on Anyscale. And I'm just going to kick this off. This is going to train a large language model, so OPT. And it's doing this on 32 GPUs. We've got a cluster here with a bunch of CPU cores, bunch of memory. And as that's running, and by the way, if I wanted to run this on, instead of 32 GPUs, 64 or 128, this is just a one-line change when I launch the Workspace. And what I can do is I can pull up VS Code, right. Remember this is the interactive development experience. I can look at the actual code. Here it's using Ray Train to train the PyTorch model. We've got the training loop and we're saying that each worker gets access to one GPU and four CPU cores. And, of course, as I make the model larger, this is using DeepSpeed, as I make the model larger, I could increase the number of GPUs that each worker gets access to, right. And how that is distributed across the cluster. And if I wanted to run on CPUs instead of GPUs or a different, you know, accelerator type, again, this is just a one-line change. And here we're using Ray Train to train the models, just taking my vanilla PyTorch model using Hugging Face and then scaling that across a bunch of GPUs. And, of course, if I want to look at the dashboard, I can go to the Ray dashboard. There are a bunch of different visualizations I can look at. I can look at the GPU utilization. I can look at, you know, the CPU utilization here where I think we're currently loading the model and running that actual application to start the training. And some of the things that are really convenient here about Anyscale, both I can get that interactive development experience with VS Code.
You know, I can look at the dashboards. I can monitor what's going on. It feels, I have a terminal, it feels like my laptop, but it's actually running on a large cluster. And I can do that with however many GPUs or other resources I want. And so it's really trying to combine the best of having the familiar experience of programming on your laptop, but with the benefits, you know, being able to take advantage of all the resources in the Cloud to scale. And it's like when, you know, you're talking about cost efficiency. One of the biggest reasons that people waste money, one of the silly reasons for wasting money is just forgetting to turn off your GPUs. And what you can do here is, of course, things will auto terminate if they're idle. But imagine you go to sleep, I have this big cluster. You can turn it off, shut off the cluster, come back tomorrow, restart the Workspace, and you know, your big cluster is back up and all of your code changes are still there. All of your local file edits. It's like you just closed your laptop and came back and opened it up again. And so this is the kind of experience we want to provide for our users. So that's what I wanted to share with you. >> Well, I think that whole, couple of things, lines of code change, single line of code change, that's game changing. And then the cost thing, I mean human error is a big deal. People pass out at their computer. They've been coding all night or they just forget about it. I mean, and then it's just like leaving the lights on or your water running in your house. It's just, at the scale that it is, the numbers will add up. That's a huge deal. So I think, you know, compute back in the old days, there's no compute. Okay, it's just compute sitting there idle. But you know, data cranking through the models, that's a big point.
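The "one line change" from the demo — how many workers, and what each worker gets — can be sketched as plain data. This is a hypothetical model of those knobs: the field names echo Ray Train's scaling configuration, but the block is illustrative Python, not the real Ray API.

```python
# Each worker gets 1 GPU and 4 CPU cores, as in the demo; scaling from
# 32 GPUs to 64 or 128 (or switching to CPU-only) is a single-field edit.
scaling_config = {
    "num_workers": 32,                         # change to 64 or 128 to scale up
    "use_gpu": True,                           # flip to False for CPU-only runs
    "resources_per_worker": {"GPU": 1, "CPU": 4},
}

# Total resources the cluster must supply for this run.
totals = {
    resource: scaling_config["num_workers"] * amount
    for resource, amount in scaling_config["resources_per_worker"].items()
}
print(totals)  # {'GPU': 32, 'CPU': 128}
```

Doubling `num_workers` doubles both totals — that is the one-line scale-up the demo shows, with the scheduler (rather than your training code) absorbing the difference.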
>> Another thing I want to add there about cost efficiency is that we make it really easy, if you're running on Anyscale, to use spot instances, these preemptible instances that can just be significantly cheaper than the on-demand instances. And so when we see our customers go from what they're doing before to using Anyscale, they go from not using these spot instances, 'cause they don't have the infrastructure around it, the fault tolerance to handle the preemption and things like that, to being able to just check a box and use spot instances and save a bunch of money. >> You know, this was my whole, my feature article at re:Invent last year when I met with Adam Selipsky, this next gen Cloud is here. I mean, it's not auto scale, it's infrastructure scale. It's agility. It's flexibility. I think this is where the world needs to go. Almost what DevOps did for Cloud, and what you were showing me in that demo had this whole SRE vibe. And remember Google had site reliability engineers to manage all those servers. This is kind of like an SRE vibe for data at scale. I mean, a similar kind of order of magnitude. I mean, I might be a little bit off base there, but how would you explain it? >> It's a nice analogy. I mean, what we are trying to do here is get to the point where developers don't think about infrastructure. Where developers only think about their application logic. And where businesses can do AI, can succeed with AI, and build these scalable applications, but they don't have to build, you know, an infrastructure team. They don't have to develop that expertise. They don't have to invest years in building their internal machine learning infrastructure. They can just focus on the Python code, on their application logic, and run the stuff out of the box. >> Awesome. Well, I appreciate the time. Before we wrap up here, give a plug for the company. I know you got a couple websites. Again, Ray's got its own website. You got Anyscale.
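The fault tolerance Robert describes — what you need before you can "just check a box" and use spot instances — boils down to checkpoint-and-resume. Below is a minimal, purely illustrative sketch: no real Ray or Anyscale APIs appear, and an in-memory dict stands in for durable checkpoint storage.

```python
# Simulate a training job on a spot instance: progress is checkpointed
# every step, so a preemption loses at most one step of work and the
# replacement instance resumes instead of restarting from zero.
checkpoint = {"step": 0}  # stand-in for a checkpoint in durable storage

def train(total_steps, preempt_at=None):
    """Run from the last checkpoint; return True if training finished."""
    step = checkpoint["step"]
    while step < total_steps:
        if preempt_at is not None and step == preempt_at:
            return False  # spot instance reclaimed mid-run
        step += 1
        checkpoint["step"] = step  # persist progress after each step
    return True

finished_first_try = train(10, preempt_at=5)   # preempted at step 5
resumed_from = checkpoint["step"]              # 5, not 0
finished_retry = train(10)                     # resumes and completes
print(finished_first_try, resumed_from, finished_retry, checkpoint["step"])
```

This is the plumbing that, per the conversation, platforms like Anyscale handle for you so that the cheaper preemptible capacity becomes safe to use.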
You got an event coming up. Give a plug for the company looking to hire. Put a plug in for the company. >> Yeah, absolutely. Thank you. So first of all, you know, we think AI is really going to transform every industry and the opportunity is there, right. We can be the infrastructure that enables all of that to happen, that makes it easy for companies to succeed with AI, and get value out of AI. Now we have, if you're interested in learning more about Ray, Ray has been emerging as the standard way to build scalable applications. Our adoption has been exploding. I mentioned companies like OpenAI using Ray to train their models. But really across the board companies like Netflix and Cruise and Instacart and Lyft and Uber, you know, just among tech companies. It's across every industry. You know, gaming companies, agriculture, you know, farming, robotics, drug discovery, you know, FinTech, we see it across the board. And all of these companies can get value out of AI, can really use AI to improve their businesses. So if you're interested in learning more about Ray and Anyscale, we have our Ray Summit coming up in September. This is going to highlight a lot of the most impressive use cases and stories across the industry. And if your business, if you want to use LLMs, you want to train these LLMs, these large language models, you want to fine tune them with your data, you want to deploy them, serve them, and build applications and products around them, give us a call, talk to us. You know, we can really take the infrastructure piece, you know, off the critical path and make that easy for you. So that's what I would say. And, you know, like you mentioned, we're hiring across the board, you know, engineering, product, go-to-market, and it's an exciting time. >> Robert Nishihara, co-founder and CEO of Anyscale, congratulations on a great company you've built and continuing to iterate on and you got growth ahead of you, you got a tailwind. I mean, the AI wave is here. 
I think OpenAI and ChatGPT, a customer of yours, have really opened up the mainstream visibility into this new generation of applications, user interface, role of data, large scale, how to make that programmable, so we're going to need that infrastructure. So thanks for coming on this season three, episode one of the ongoing series of hot startups. In this case, this episode is the top startups building foundational model infrastructure for AI and ML. I'm John Furrier, your host. Thanks for watching. (upbeat music)

Published Date : Mar 9 2023

SiliconANGLE News | Swami Sivasubramanian Extended Version


 

(bright upbeat music) >> Hello, everyone. Welcome to SiliconANGLE News breaking story here. Amazon Web Services expanding their relationship with Hugging Face, breaking news here on SiliconANGLE. I'm John Furrier, SiliconANGLE reporter, founder, and also co-host of theCUBE. And I have with me, Swami, from Amazon Web Services, vice president of database, analytics, machine learning with AWS. Swami, great to have you on for this breaking news segment on AWS's big news. Thanks for coming on and taking the time. >> Hey, John, pleasure to be here. >> You know- >> Looking forward to it. >> We've had many conversations on theCUBE over the years, we've watched Amazon really move fast into the large data modeling, SageMaker became a very smashing success, obviously you've been on this for a while. Now with OpenAI's ChatGPT, a lot of buzz going mainstream takes it from behind the curtain, inside the ropes, if you will, in the industry to the mainstream. And so this is a big moment, I think, in the industry, I want to get your perspective, because your news with Hugging Face, I think is another telltale sign that we're about to tip over into a new accelerated growth around making AI now application aware, application centric, more programmable, more API access. What's the big news about, with AWS Hugging Face, you know, what's going on with this announcement? >> Yeah. First of all, we're very excited to announce our expanded collaboration with Hugging Face, because with this partnership, our goal, as you all know, I mean, Hugging Face, I consider them like the GitHub for machine learning. And with this partnership, Hugging Face and AWS, we'll be able to democratize AI for a broad range of developers, not just specific deep AI startups. And now with this, we can accelerate the training, fine-tuning and deployment of these large language models, and vision models from Hugging Face in the cloud.
And the broader context, when you step back and see what customer problem we are trying to solve with this announcement, essentially if you see these foundational models, they are used to now create a huge number of applications, such as text summarization, question answering, or search, image generation, creative, other things. And these are all stuff we are seeing in the likes of these ChatGPT style applications. But there is a broad range of enterprise use cases that we don't even talk about. And it's because these kinds of transformative, generative AI capabilities and models are not available to, I mean, millions of developers. And because either training these models from scratch can be very expensive or time consuming and need deep expertise, or more importantly, they don't need these generic models, they need them to be fine-tuned for their specific use cases. And one of the biggest complaints we hear is that these models, when they try to use them for real production use cases, they are incredibly expensive to train and incredibly expensive to run inference on, to use at a production scale. So, and unlike web search style applications, where the margins can be really huge, here in production use cases and enterprises, you want efficiency at scale. That's where Hugging Face and AWS share our mission. And by integrating with Trainium and Inferentia, we're able to handle the cost-efficient training and inference at scale, I'll deep dive on it. And by teaming up on the SageMaker front, now the time it takes to build these models and fine-tune them is also coming down. So that's what makes this partnership very unique as well. So I'm very excited. >> I want to get into the time savings and the cost savings as well on the training and inference, it's a huge issue, but before we get into that, just how long have you guys been working with Hugging Face?
I know there's a previous relationship, this is an expansion of that relationship, can you comment on what's different about what's happened before and then now? >> Yeah. So, Hugging Face, we have had a great relationship in the past few years as well, where they have actually made their models available to run on AWS in, you know, an easy fashion. In fact, even their Bloom Project was something many of our customers even used. Bloom Project, for context, is their open-source project which builds GPT-3 style models. And now with this expanded collaboration, now Hugging Face selected AWS for that next generation of its generative AI models, building on their highly successful Bloom Project as well. And the nice thing is, now, by direct integration with Trainium and Inferentia, where you get cost savings in a really significant way, now, for instance, Trn1 can provide up to 50% cost-to-train savings, and Inferentia can deliver up to 60% better costs, and 4x higher throughput than (indistinct). Now, these models, especially as they train that next generation of generative AI models, it is going to be not only more accessible to all the developers, who use it in the open, it'll be a lot cheaper as well. And that's what makes this moment really exciting, because we can't democratize AI unless we make it broadly accessible and cost efficient and easy to program and use as well. >> Yeah. >> So very exciting. >> I'll get into the SageMaker and CodeWhisperer angle in a second, but you hit on some good points there. One, accessibility, which is, I call the democratization, which is getting this in the hands of developers, and/or AI to develop, we'll get into that in a second. So, access to coding and Git reasoning is a whole nother wave.
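To make the "up to" figures Swami quotes concrete, here is the arithmetic on a hypothetical spend. The dollar amounts below are invented for illustration; only the 50% and 60% ceilings come from the conversation, and real savings depend on the workload.

```python
# "Up to 50%" cost-to-train savings halves the training bill at the ceiling;
# "up to 60% better cost" on inference means paying 40 cents on the dollar.
# Annual spend figures are hypothetical.
annual_training_spend = 1_000_000
annual_inference_spend = 2_000_000

training_after = annual_training_spend * (1 - 0.50)    # Trainium ceiling
inference_after = annual_inference_spend * (1 - 0.60)  # Inferentia ceiling

saved = (annual_training_spend - training_after) + \
        (annual_inference_spend - inference_after)
print(int(training_after), int(inference_after), int(saved))
```

Note that inference typically dominates over a model's lifetime, which is why the inference-side percentage moves the combined bill more than the training-side one here.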
But the three things I know you've been working on, I want to put in the buckets here and comment, one, I know you've, over the years, been working on saving time to train, that's a big point, you mentioned some of those stats, also cost, 'cause now cost is an equation, you know, on bundling, whether you're uncoupling the hardware and software, that's a big issue. Where do I find the GPUs? Where's the horsepower cost? And then also sustainability. You've mentioned that in the past, is there a sustainability angle here? Can you talk about those three things, time, cost, and sustainability? >> Certainly. So if you look at it from the AWS perspective, we have been supporting customers doing machine learning for the past several years. Just for broader context, Amazon has been doing ML the past two decades, right from the early days of ML-powered recommendations to actually also supporting all kinds of generative AI applications. If you look at even generative AI applications within Amazon, Amazon search, when you go search for a product and so forth, we have a team called MFi within Amazon search that helps bring these large language models into creating highly accurate search results. And these are created with models, really large models with tens of billions of parameters, scaled to thousands of training jobs every month and trained on large amounts of hardware. And this is an example of a really good large language foundation model application running at production scale, and also, of course, Alexa, which uses a large generative model as well. And they actually even had a research paper that showed that they are more, and do better in accuracy than other systems like GPT-3 and whatnot. So, and we also touched on things like CodeWhisperer, which uses generative AI to improve developer productivity, but in a responsible manner, because some of the studies show 40% of this generated code had serious security flaws in it.
This is where we didn't just do generative AI, we combined it with automated reasoning capabilities, which is a very, very useful technique to identify these issues and couple them so that it produces highly secure code as well. Now, all these learnings taught us a few things, and which is what you put in these three buckets. And yeah, we have more than 100,000 customers using ML and AI services, including leading startups in the generative AI space, like Stability AI, AI21 Labs, or Hugging Face, or even Alexa, for that matter. They care about, I put them in three dimensions, one is around cost, which we touched on with Trainium and Inferentia, where actually the Trainium provides up to 50% better cost savings, but the other aspect is, Trainium is a lot more power efficient as well compared to traditional ones. And Inferentia is also better in terms of throughput, when it comes to what it is capable of. Like it is able to deliver up to 3x higher compute performance and 4x higher throughput, compared to its previous generation, and it is extremely cost efficient and power efficient as well. >> Well. >> Now, the second element that really is important is that, in this day, developers deeply value the time it takes to build these models, and they don't want to build models from scratch. And this is where SageMaker, which, even according to Kaggle usage, is the number one enterprise ML platform, comes in. What it did to traditional machine learning, where tens of thousands of customers use SageMaker today, including the ones I mentioned, is that what used to take like months to build these models has dropped down to now a matter of days, if not less. Now, with generative AI, the cost of building these models, if you look at the landscape, the model parameter size has jumped by more than a thousand x in the past three years, a thousand x. And that means the training is like a really big distributed systems problem. How do you actually scale this model training?
How do you actually ensure that you utilize these efficiently? Because these machines are very expensive, let alone they consume a lot of power. So, this is where SageMaker's capability to build, automatically train, tune, and deploy models really comes in, especially with this distributed training infrastructure, and those are some of the reasons why some of the leading generative AI startups are actually leveraging it, because they do not want a giant infrastructure team, which is constantly tuning and fine-tuning, and keeping these clusters alive. >> It sounds a lot like what startups did with the cloud in the early days, no data center, you move to the cloud. So, this is the trend we're seeing, right? You guys are making it easier for developers with Hugging Face, I get that. I love that GitHub for machine learning, large language models are complex and expensive to build, but not anymore, you got Trainium and Inferentia, developers can get faster time to value, but then you got the transformers, data sets, token libraries, all that optimized for generative AI. This is a perfect storm for startups. Jon Turow, a former AWS person, who used to work, I think for you, is now a VC at Madrona Venture, he and I were talking about the generative AI landscape, it's exploding with startups. Every alpha entrepreneur out there is seeing this as the next frontier, that's the 20 mile stairs, next 10 years is going to be huge. What is the big thing that's happened? 'Cause some people were saying, the founder of Yquem said, "Oh, the startups won't be real, because they don't all have AI experience." John Markoff, former New York Times writer told me that, AI, there's so much work done, this is going to explode, accelerate really fast, because it's almost like it's been waiting for this moment. What's your reaction?
>> I actually think there is going to be an explosion of startups, not because they need to be AI startups, but now finally AI is really accessible or going to be accessible, so that they can create remarkable applications, either for enterprises or for disrupting actually how customer service is being done or how creative tools are being built. And I mean, this is going to change in many ways. When we think about generative AI, we always like to think of how it generates like school homework or arts or music or whatnot, but when you look at it on the practical side, generative AI is being actually used across various industries. I'll give an example of like Autodesk. Autodesk is a customer who runs on AWS and SageMaker. They already have an offering that enables generative design, where designers can generate many structural designs for products, whereby you give a specific set of constraints and they actually can generate a structure accordingly. And we see similar kind of trend across various industries, where it can be around creative media editing or various others. I have the strong sense that literally, in the next few years, just like now, conventional machine learning is embedded in every application, every mobile app that we see, it is pervasive, and we don't even think twice about it, same way, like almost all apps are built on cloud. Generative AI is going to be part of every startup, and they are going to create remarkable experiences without needing actually, these deep generative AI scientists. But you won't get that until you actually make these models accessible. And I also don't think one model is going to rule the world, then you want these developers to have access to a broad range of models. Just like, go back to the early days of deep learning. Everybody thought it is going to be one framework that will rule the world, and it has been changing, from Caffe to TensorFlow to PyTorch to various other things.
And I have a suspicion we have to enable developers where they are, so. >> You know, Dave Vellante and I have been riffing on this concept called super cloud, and a lot of people have co-opted it to be multicloud, but we really were getting at this whole next layer on top of, say, AWS. You guys are the most comprehensive cloud, you guys are a super cloud, and even Adam and I are talking about ISVs evolving to ecosystem partners. I mean, your top customers have ecosystems building on top of it. This feels like a whole nother AWS. How are you guys leveraging the history of AWS, which, by the way, had the same trajectory, startups came in, they didn't want to provision a data center, the heavy lifting, all the things that have made Amazon successful culturally. And day one thinking is, provide the undifferentiated heavy lifting, and make it faster for developers to program code. AI's got the same thing. How are you guys taking this to the next level, because now, this is an opportunity for the competition to change the game and take it over? This is, I'm sure, a conversation, you guys have a lot of things going on in AWS that makes you unique. What's the internal and external positioning around how you take it to the next level? >> I mean, so I agree with you that generative AI has a very, very strong potential in terms of what it can enable in terms of next generation applications. But this is where Amazon's experience and expertise in putting these foundation models to work internally really has helped us quite a bit. If you look at it, amazon.com search is a very, very important application in terms of the customer impact, the number of customers who use that application openly, and the amount of dollar impact it drives for the organization. And we have been doing it silently for a while now. 
And the same thing is true for Alexa too, which actually not only uses it for natural language understanding, but even actually leverages it for creating stories and various other examples. And now, our approach to it from AWS is, we actually look at it in terms of the same three tiers, like we did in machine learning, because when you look at generative AI, we genuinely see three sets of customers. One is, like, really deep technical expert practitioner startups. These are the startups that are creating the next generation models, the likes of Stability AI, or Hugging Face with BLOOM, or AI21. And they generally want to build their own models, and they want the best price performance for their infrastructure for training and inference. That's where our investments in silicon and hardware and networking innovations, where Trainium and Inferentia, really play a big role. And we can clearly do that, and that is one. The second, middle tier is where I do think developers don't want to spend time building their own models, let alone, they actually want the model to be useful with their data. They don't need their models to create, like, high school homework or various other things. What they generally want is, hey, I have this data from my enterprise that I want to fine-tune and make it really work only for this, and make it work remarkably, whether it's for text summarization, to generate a report, or it can be for better Q&A, and so forth. This is where our investments in the middle tier with SageMaker, and our partnerships with Hugging Face and AI21 and Cohere, are all going to be very meaningful. And you'll see us investing, I mean, you already talked about CodeWhisperer, which is in open preview, but we are also partnering with a whole lot of top ISVs, and you'll see more on this front to enable the next wave of generative AI apps too, because this is an area where we do think a lot of innovation is yet to be done. 
It's like day one for us in this space, and we want to enable that huge ecosystem to flourish. >> You know, one of the things Dave Vellante and I were talking about in our first podcast, we just did it on Friday, we're going to do it weekly, is we highlighted the ChatGPT example as a horizontal use case, because everyone loves it, people are using it in all their different verticals, and horizontal scalable cloud plays perfectly into it. So I have to ask you, as you look at what AWS is going to bring to the table, a lot's changed over the past 13 years with AWS, a lot more services are available, how should someone rebuild or re-platform and refactor their application or business with AI, with AWS? What are some of the tools that you see and recommend? Is it Serverless, is it SageMaker, CodeWhisperer? What do you think's going to shine brightly within the AWS stack, if you will, or service list, that's going to be part of this? As you mentioned, CodeWhisperer and SageMaker, what else should people be looking at as they start tinkering and getting all these benefits, and scale up their apps? >> You know, if I were a startup, first, I would really work backwards from the customer problem I'm trying to solve, and pick and choose where I don't need to deal with the undifferentiated heavy lifting. And that's where the answer is going to change. If you look at it, the answer is not going to be, like, a one-size-fits-all. I mean, granted, on the compute front, if you can actually completely avoid it, Serverless, I will always recommend it, instead of running compute, for running your apps, because it takes care of all the undifferentiated heavy lifting. And then on the data side, that's where we provide a whole variety of databases, right from relational databases, to non-relational, or DynamoDB, and so forth. And of course, we also have a deep analytical stack, where data directly flows from our relational databases into data lakes and data warehouses. 
And you can get value along with partnerships with various analytical providers. The area where I do think things are fundamentally changing, in what people can do, is, like, with CodeWhisperer, I was literally trying to actually program code for sending a message through Twilio, and I was about to pull up and read the documentation, and in my IDE, I was actually saying, like, let's try sending a message through Twilio, or let's actually update a Route 53 record. All I had to do was type in just a comment, and it actually started generating the subroutine. And it is going to be a huge time saver, if I were a developer. And the goal is for us not to actually do it just for AWS developers, and not to just generate the code, but make sure the code is actually highly secure and follows the best practices. So, it's not always about machine learning, it's augmenting with automated reasoning as well. And generative AI is going to be changing not just how people write code, but also how it actually gets built and used as well. You'll see a lot more stuff coming on this front. >> Swami, thank you for your time. I know you're super busy. Thank you for sharing the news and giving commentary. Again, I think this is an AWS moment and an industry moment, heavy lifting, accelerated value, agility. AIOps is going to be probably redefined here. Thanks for sharing your commentary. And we'll see you next time, I'm looking forward to doing more follow up on this. It's going to be a big wave. Thanks. >> Okay. Thanks again, John, always a pleasure. >> Okay. This is SiliconANGLE's breaking news commentary. I'm John Furrier with SiliconANGLE News, as well as host of theCUBE. Swami, who's a leader in AWS, has been on theCUBE multiple times. We've been tracking how Amazon's journey has just been exploding over the past five years, in particular, the past three. You heard the numbers, great performance, great reviews. 
This is a watershed moment, I think, for the industry, and it's going to be a lot of fun for the next 10 years. Thanks for watching. (bright music)
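As an aside on the CodeWhisperer anecdote above, the comment-to-code flow Swami describes, typing a comment like "send a message through Twilio" and getting a generated subroutine, might produce something shaped roughly like the sketch below. This is a hypothetical illustration in Python, not actual CodeWhisperer output; the account SID and phone numbers are placeholders, and the helper only assembles the request for Twilio's Messages REST endpoint rather than sending anything.

```python
# Hypothetical sketch of the kind of helper an AI coding assistant might
# generate from the comment "send a message through Twilio".
# The account SID and phone numbers below are placeholders; the function
# only builds the URL and form payload for Twilio's Messages endpoint,
# it does not perform any network call.

def build_twilio_sms_request(account_sid, from_number, to_number, body):
    """Return the URL and form payload for Twilio's Messages endpoint."""
    url = (
        "https://api.twilio.com/2010-04-01/Accounts/"
        f"{account_sid}/Messages.json"
    )
    payload = {"From": from_number, "To": to_number, "Body": body}
    return url, payload

url, payload = build_twilio_sms_request(
    "ACxxxxxxxx", "+15550001111", "+15550002222", "Hello from the sketch"
)
print(url)
```

In practice the generated code would then POST that payload with the account's auth token; the point of the anecdote is that the developer writes only the comment, and the assistant fills in the boilerplate.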

Published Date : Feb 22 2023


Opher Kahane, Sonoma Ventures | CloudNativeSecurityCon 23


 

(uplifting music) >> Hello, welcome back to theCUBE's coverage of CloudNativeSecurityCon, the inaugural event, in Seattle. I'm John Furrier, host of theCUBE, here in the Palo Alto Studios. We're calling it theCUBE Center. It's kind of like our Sports Center for tech. It's kind of remote coverage. We've been doing this now for a few years. We're going to amp it up this year as more events are remote, and happening all around the world. So, we're going to continue the coverage with this segment focusing on the data stack, entrepreneurial opportunities around all things security, and as, obviously, data's involved. And our next guest is a friend of theCUBE, and CUBE alumni from 2013, entrepreneur himself, turned, now, venture capitalist angel investor, with his own firm, Opher Kahane, Managing Director, Sonoma Ventures. Formerly the founder of Origami, sold to Intuit a few years back. Focusing now on having a lot of fun, angel investing on boards, focusing on data-driven applications, and stacks around that, and all the stuff going on in, really, in the wheelhouse for what's going on around security data. Opher, great to see you. Thanks for coming on. >> My pleasure. Great to be back. It's been a while. >> So you're kind of on Easy Street now. You did the entrepreneurial venture, you've worked hard. We were on together in 2013 when theCUBE just started. XCEL Partners had an event in Stanford, XCEL, and they had all the features there. We interviewed Satya Nadella, who was just a manager at Microsoft at that time, he was there. He's now the CEO of Microsoft. >> Yeah, he was. >> A lot's changed in nine years. But congratulations on your venture you sold, and you got an exit there, and now you're doing a lot of investments. I'd love to get your take, because this is really the biggest change I've seen in the past 12 years, around an inflection point around a lot of converging forces. 
Data, which, big data, 10 years ago, was a big part of your career, but now it's accelerated, with cloud scale. You're seeing people building scale on top of other clouds, and becoming their own cloud. You're seeing data being a big part of it. Cybersecurity kind of has not really changed much, but it's the most important thing everyone's talking about. So, developers are involved, data's involved, a lot of entrepreneurial opportunities. So I'd love to get your take on how you see the current situation, as it relates to what's gone on in the past five years or so. What's the big story? >> So, a lot of big stories, but I think a lot of it has to do with a promise of making value from data, whether it's for cybersecurity, for Fintech, for DevOps, for RevTech startups and companies. There's a lot of challenges in actually driving and monetizing the value from data with velocity. Historically, the challenge has been more around, "How do I store data at massive scale?" And then you had the big data infrastructure company, like Cloudera, and MapR, and others, deal with it from a scale perspective, from a storage perspective. Then you had a whole layer of companies that evolved to deal with, "How do I index massive scales of data, for quick querying, and federated access, et cetera?" But now that a lot of those underlying problems, if you will, have been solved, to a certain extent, although they're always being stretched, given the scale of data, and its utility is becoming more and more massive, in particular with AI use cases being very prominent right now, the next level is how to actually make value from the data. How do I manage the full lifecycle of data in complex environments, with complex organizations, complex use cases? And having seen this from the inside, with Origami Logic, as we dealt with a lot of large corporations, and post-acquisition by Intuit, and a lot of the startups I'm involved with, it's clear that we're now onto that next step. 
And you have fundamental new paradigms, such as data mesh, that attempt to address that complexity, and responsibly scaling access, and democratizing access in the value monetization from data, across large organizations. You have a slew of startups that are evolving to help the entire lifecycle of data, from the data engineering side of it, to the data analytics side of it, to the AI use cases side of it. And it feels like the early days, to a certain extent, of the revolution that we've seen in transition from traditional databases, to data warehouses, to cloud-based data processing, and big data. It feels like we're at the genesis of that next wave. And it's super, super exciting, for me at least, as someone who's sitting more in the coach seat, rather than being on the pitch, and building startups, helping folks as they go through those motions. >> So that's awesome. I want to get into some of these data infrastructure dynamics you mentioned, but before that, talk to the audience around what you're working on now. You've been a successful entrepreneur, you're focused on angel investing, so, super-early seed stage. What kind of deals are you looking at? What's interesting to you? What is Sonoma Ventures looking for, and what are some of the entrepreneurial dynamics that you're seeing right now, from a startup standpoint? >> Cool, so, at a macro level, this is a little bit of background of my history, because it shapes very heavily what it is that I'm looking at. So, I've been very fortunate with entrepreneurial career. I founded three startups. All three of them are successful. Final two were sold, the first one merged and went public. And my third career has been about data, moving data, passing data, processing data, generating insights from it. And, at this phase, I wanted to really evolve from just going and building startup number four, from going through the same motions again. A 10 year adventure, I'm a little bit too old for that, I guess. 
But the next best thing is to sit from a point whereby I can be more elevated in where I'm dealing with, and broaden the variety of startups I'm focused on, rather than just do your own thing, and just go very, very deep into it. Now, what specifically am I focused on at Sonoma Ventures? So, basically, looking at what I refer to as a data-driven application stack. Anything from the low-level data infrastructure and cloud infrastructure, that helps any persona in the data universe maximize value for data, from their particular point of view, for their particular role, whether it's data analysts, data scientists, data engineers, cloud engineers, DevOps folks, et cetera. All the way up to the application layer, in applications that are very data-heavy. And what are very typical data-heavy applications? FinTech, cyber, Web3, revenue technologies, and product and DevOps. So these are the areas we're focused on. I have almost 23 or 24 startups in the portfolio that span all these different areas. And this is in terms of the aperture. Now, typically, focus on pre-seed, seed. Sometimes a little bit later stage, but this is the primary focus. And it's really about partnering with entrepreneurs, and helping them make, if you will, original mistakes, avoid the mistakes I made. >> Yeah. >> And take it to the next level, whatever the milestone they're driving with. So I'm very, very hands-on with many of those startups. Now, what is it that's happening right now, initially, and why is it so exciting? So, on one hand, you have this scaling of data and its complexity, yet lagging value creation from it, across those different personas we've touched on. So that's one fundamental opportunity which is secular. 
The other one, which is more a cyclic situation, is the fact that we're going through a down cycle in tech, as is very evident in the public markets, and everything we're hearing about funding going slower and lower, terms shifting more into the hands of typical VCs versus an entrepreneur-friendly market, and so on and so forth. And a very significant amount of layoffs. Now, when you combine these two trends together, you're observing a very interesting thing, that a lot of folks, really bright folks, who have sold a startup to a company, or have been in the guts of a large startup, or a large corporation, have, hands-on, experienced all those challenges we've spoken about earlier in terms of maximizing value from data, irrespective of their role, and the specific angle, or vantage point, they have on those challenges. So, for many of them, it's an opportunity to say, "Now, let me start a startup. I've been laid off, maybe, or my company's stock isn't doing as well as it used to. Now I have an opportunity to actually go and take my entrepreneurial passion, and apply it, along with the product experience I gained as part of this larger company." >> Yeah. >> And you see a slew of folks who are emerging with these great ideas. So it's a very, very exciting period of time to innovate. >> It's interesting, a lot of people look at, I mean, I look at Snowflake as an example of a company that refactored data warehouses. They just basically took the data warehouse, and put it on the cloud, and called it a data cloud. That, to me, was compelling. They didn't pay any CapEx. They rode Amazon's wave there. So, a similar thing is going on with data. You mentioned this, and I see it as an enabling opportunity. So whether it's cybersecurity, FinTech, whatever vertical, you have an enablement. Now, you mentioned data infrastructure. It's a super exciting area, as there's so many stacks emerging. 
We got an analytics stack, there's real-time stacks, there's data lakes, AI stack, foundational models. So, you're seeing an explosion of stacks, different tools probably will emerge. So, how do you look at that, as a seasoned entrepreneur, now investor? Is that a good thing? Is that just more of the market? 'Cause it just seems like more and more kind of decomposed stacks targeted at use cases seems to be a trend. >> Yeah. >> And how do you vet that, is it? >> So it's a great observation, and if you take a step back and look at the evolution of technology over the last 30 years, maybe longer, you always see these cycles of expansion, fragmentation, contraction, expansion, contraction. Go decentralize, go centralize, go decentralize, go centralize, as manifested in different types of technology paradigms. From client server, to storage, to microservices, to et cetera, et cetera. So I think we're going through another big bang, to a certain extent, whereby end up with more specialized data stacks for specific use cases, as you need performance, the data models, the tooling to best adapt to the particular task at hand, and the particular personas at hand. As the needs of the data analysts are quite different from the needs of an NL engineer, it's quite different from the needs of the data engineer. And what happens is, when you end up with these siloed stacks, you end up with new fragmentation, and new gaps that need to be filled with a new layer of innovation. And I suspect that, in part, that's what we're seeing right now, in terms of the next wave of data innovation. Whether it's in a service of FinTech use cases, or cyber use cases, or other, is a set of tools that end up having to try and stitch together those elements and bridge between them. So I see that as a fantastic gap to innovate around. 
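The stitching Opher describes, tools that bridge between siloed, per-persona data stacks, can be pictured with a toy sketch: a single canonical record that each persona's tooling projects into its own vocabulary. Everything below, the field names, the personas, and the mapping shape, is invented purely for illustration.

```python
# Toy sketch of a bridging layer across siloed data stacks.
# All field names and personas here are invented for illustration.

CANONICAL = {"customer_id": "c-42", "revenue_usd": 1250.0, "churned": False}

# Each persona's stack reads the same underlying fields under its own names.
PERSONA_VIEWS = {
    "analyst": {"customer": "customer_id", "revenue": "revenue_usd"},
    "ml":      {"label": "churned", "feature_revenue": "revenue_usd"},
}

def project(record, persona):
    """Rename canonical fields into the given persona's vocabulary."""
    mapping = PERSONA_VIEWS[persona]
    return {alias: record[field] for alias, field in mapping.items()}

print(project(CANONICAL, "ml"))  # {'label': False, 'feature_revenue': 1250.0}
```

A real semantic layer or data mesh product does far more (governance, lineage, access control), but the core idea, one consistent taxonomy underneath many persona-specific views, is the same gap he's pointing at.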
I see, also, a fundamental need in creating a common data language, and common data management processes and governance across those different personas, because, ultimately, the same underlying data these folks need, albeit in different mediums, different access models, different velocities, et cetera, the subject matter, if you will, the underlying raw data, and some of the taxonomies right on top of it, do need to be consistent. So, once again, a great opportunity to innovate, whether it's about semantic layers, whether it's about data mesh, whether it's about CI/CD tools for data engineers, and so on and so forth. >> I got to ask you, first of all, I see you have a friend you brought into the interview. You have a dog in the background who made a little cameo appearance. And that's awesome. Sitting right next to you, making sure everything's going well. On the AI thing, 'cause I think that's the hot trend here. >> Yeah. >> You're starting to see, ChatGPT's got everyone excited, because it's kind of that first time you see kind of next-gen functionality, large-language models, where you can bring data in, and it integrates well. So, to me, I think, connecting the dots, this kind of speaks to the beginning of what will be a trend of really blending data stacks together, or blending of models. And so, as more data modeling emerges, you start to have this AI stack kind of situation, where you have things out there that you can compose. It's almost very developer-friendly, conceptually. This is kind of new, but kind of the same concept that's been worked on at Google and others. How do you see this emerging, as an investor? What are some of the things that you're excited about, around the ChatGPT kind of things that are happening? 'Cause it brings it mainstream. Again, a million downloads, the fastest application to get to a million downloads, even among all the successes. So it's obviously hit a nerve. People are talking about it. What's your take on that? 
>> Yeah, so, I think that's a great point, and clearly, it feels like an iPhone moment, right, to the industry, in this case, AI, and lots of applications. And I think there's, at a high level, probably three different layers of innovation. One is on top of those platforms. What use cases can one bring to the table that would drive on top of a ChatGPT-like service? Whereby, the startup, the company, can bring some unique datasets to infuse and add value on top of it, by custom-focusing it and purpose-building it for a particular use case or particular vertical. Whether it's applying it to customer service, in a particular vertical, applying it to, I don't know, marketing content creation, and so on and so forth. That's one category. And I do know that, as one of my startups is in Y Combinator, this season, winter '23, they're saying that a very large chunk of the YC companies in this cycle are about GPT use cases. So we'll see a flurry of that. The next layer, the one below that, is those who actually provide those platforms, whether it's ChatGPT, whatever will emerge from the partnership with Microsoft, and any competitive players that emerge from other startups, or from the big cloud providers, whether it's Facebook, if they ever get into this, and Google, which clearly will, as they need to, to survive around search. The third layer is the enabling layer. As you're going to have more and more of those different large-language models and use case running on top of it, the underlying layers, all the way down to cloud infrastructure, the data infrastructure, and the entire set of tools and systems, that take raw data, and massage it into useful, labeled, contextualized features and data to feed the models, the AI models, whether it's during training, or during inference stages, in production. Personally, my focus is more on the infrastructure than on the application use cases. 
And I believe that there's going to be a massive amount of innovation opportunity around that, to reach cost-effective, quality, fair models that are deployed easily and maintained easily, or at least with as little pain as possible, at scale. So there are startups that are dealing with it, in various areas. Some are focusing on labeling automation, some on fairness, some, speaking about cyber, on protecting models from threats through data and other issues, and so on and so forth. And I believe that this, too, will be a big driver for massive innovation, the infrastructure layer. >> Awesome, and I love how you mentioned the iPhone moment. I call it the browser moment, 'cause it felt that way for me, personally. >> Yep. >> But I think, from a business model standpoint, there is that iPhone shift. It's not the BlackBerry. It's a whole 'nother thing. And I like that. But I do have to ask you, because this is interesting. You mentioned iPhone. iPhone's mostly proprietary. So, in these machine learning foundational models, >> Yeah. >> you're starting to see proprietary hardware, bolt-on, acceleration, bundled together, for faster uptake. And now you got open source emerging, as two things. It's almost an iPhone-Android situation happening. 
With certain use cases, I guess the fundamental infrastructure for ChatGPT-like, let's say, large-language models and all the use cases running on top of it, that's likely going to be more platform-oriented and open source, and will allow innovation. Think of it as the equivalent of iPhone apps or Android apps running on top of those platforms, as in AI apps. So we'll have a lot of that. Now, when you start going a little bit more into the guts, the lower layers, then it's clear that, for performance reasons, in particular, for certain use cases, we'll end up with more proprietary offerings, whether it's advanced silicon, such as some of the silicon that emerged from entrepreneurs who have left Google, around TensorFlow, and all the silicon that powers that. You'll see a lot of innovation in that area as well. It hopefully intends to improve the cost efficiency of running large AI-oriented workloads, both in inference and in learning stages. >> I got to ask you, because this has come up a lot around Azure and Microsoft. Microsoft, pretty good move getting into the ChatGPT >> Yep. >> and the open AI, because I was talking to someone who's a hardcore Amazon developer, and they said, they swore they would never use Azure, right? One of those types. And they're spinning up Azure servers to get access to the API. So, the developers are flocking, as you mentioned. The YC class is all doing large data things, because you can now program with data, which is amazing, which is amazing. So, what's your take on, I know you got to be kind of neutral 'cause you're an investor, but you got, Amazon has to respond, Google, essentially, did all the work, so they have to have a solution. So, I'm expecting Google to have something very compelling, but Microsoft, right now, is going to just, might run the table on developers, this new wave of data developers. What's your take on the cloud responses to this? What's Amazon, what do you think AWS is going to do? 
What should Google be doing? What's your take? >> So, each of them is coming from a slightly different angle, of course. I'll say, Google, I think, has massive assets in the AI space, and their underlying cloud platform, I think, has been designed to support such complicated workloads, but they have yet to go as far as opening it up the same way ChatGPT is now in that Microsoft partnership, and Azure. Good question regarding Amazon. AWS has had a significant investment in AI-related infrastructure. Seeing it through my startups, through other lens as well. How will they respond to that higher layer, above and beyond the low level, if you will, AI-enabling apparatuses? How do they elevate to at least one or two layers above, and get to the same ChatGPT layer, good question. Is there an acquisition that will make sense for them to accelerate it, maybe. Is there an in-house development that they can reapply from a different domain towards that, possibly. But I do suspect we'll end up with acquisitions as the arms race around the next level of cloud wars emerges, and it's going to be no longer just about the basic tooling for basic cloud-based applications, and the infrastructure, and the cost management, but rather, faster time to deliver AI in data-heavy applications. Once again, each one of those cloud suppliers, their vendor is coming with different assets, and different pros and cons. All of them will need to just elevate the level of the fight, if you will, in this case, to the AI layer. >> It's going to be very interesting, the different stacks on the data infrastructure, like I mentioned, analytics, data lake, AI, all happening. It's going to be interesting to see how this turns into this AI cloud, like data clouds, data operating systems. So, super fascinating area. Opher, thank you for coming on and sharing your expertise with us. Great to see you, and congratulations on the work. I'll give you the final word here. 
Give a plug for what you're looking for, for startup seeds, pre-seeds. What's the kind of profile that gets your attention, from a seed, pre-seed candidate or entrepreneur? >> Cool, first of all, it's my pleasure. I enjoy our chats, as always. Hopefully the next one's not going to be in nine years. As to what I'm looking for, ideally, smart data entrepreneurs, who have come from a particular domain problem, or problem domain, that they understand, they felt it in their own 10 fingers, or millions of neurons in their brains, and they figured out a way to solve it. Whether it's a data infrastructure play, a cloud infrastructure play, or a very, very smart application that takes advantage of data at scale. These are the things I'm looking for. >> One final, final question I have to ask you, because you're a seasoned entrepreneur, and now coach. What's different about the current entrepreneurial environment right now, vis-a-vis the past decade? What's new? Is it different, highly accelerated? What advice do you give entrepreneurs out there who are putting together their plan? Obviously, a global resource pool now of engineering. It might not be yesterday's formula for success, putting a venture together to get to that product-market fit. What's new and different, and what's your advice to the folks out there about what's different about the current environment for being an entrepreneur? >> Fantastic, so I think it's a great question. So I think there are a few axes of difference, compared to, let's say, five years ago, 10 years ago, 15 years ago. First and foremost, given the amount of infrastructure out there, the amount of open-source technologies, the amount of developer toolkits and frameworks, trying to develop an application, at least at the application layer, is much faster than ever. So, it's faster and cheaper, for the most part, unless you're building very fundamental, core, deep tech, where you still have a big technology challenge to deal with. 
And absent that, the challenge shifts more to how you manage your resources, to product-market fit, how you're integrating the GTM lens, the go-to-market lens, as early as possible in the product-market fit cycle, such that you reach from pre-seed to seed, from seed to A, from A to B, with an optimal amount of velocity, and a minimal amount of resources. One big difference, specifically as of, let's say, the beginning of this year, late last year, is that money is no longer free for entrepreneurs, which means that you need to operate and build a startup in an environment with a lot more constraints. And in my mind, some of the best startups that have ever been built, and some of the big market-changing, generation-changing, if you will, technology startups, in their respective industry verticals, have actually emerged from these times. And these tend to be the smartest, best startups that emerge, because they operate with a lot less money. Money is not as available for them, which means that they need to make tough decisions, and make trade-offs every day. Things you don't need to do, you can kick the can down the road when you have plenty of money, and it cushions a lot of mistakes. Now you don't have that cushion. And hopefully we'll end up with companies that are more agile, more resilient, if you will, with better cultures of making those tough decisions that startups need to make every day. Which is why I'm super, super excited to see the next batch of amazing unicorns, true unicorns, not just valuation, rising-with-the-water-type unicorns, emerge from this particular era, which we're in the beginning of. And I very much enjoy working with entrepreneurs during this difficult time, the times we're in. >> The next 24 months will be the next wave, like you said, the best time to do a company. Remember, Airbnb's pitch was, "We'll rent cots in apartments, and sell cereal."
Boy, a lot of people passed on that deal in that last down market, and that turned out to be a game-changer. So the crazy ideas might not be that bad. So it's all about the entrepreneurs, and >> 100%. >> this is a big wave, and it's certainly happening. Opher, thank you for sharing. Obviously, data is going to change all the markets. Refactoring, security, FinTech, user experience, applications are going to be changed by data, the data operating system. Thanks for coming on, and thanks for sharing. Appreciate it. >> My pleasure. Have a good one. >> Okay, more coverage of the CloudNativeSecurityCon inaugural event. Data will be the key for cybersecurity. theCUBE's coverage continues after this break. (uplifting music)

Published Date : Feb 2 2023



Dhabaleswar “DK” Panda, Ohio State University | SuperComputing 22


 

>>Welcome back to The Cube's coverage of Supercomputing Conference 2022, otherwise known as SC22, here in Dallas, Texas. This is day three of our coverage, the final day of coverage here on the exhibition floor. I'm Dave Nicholson, and I'm here with my co-host, tech journalist extraordinaire, Paul Gillin. How's it going, Paul? >>Hi, Dave. It's going good. >>And we have a wonderful guest with us this morning, Dr. Panda from the Ohio State University. Welcome, Dr. Panda, to the Cube. >>Thanks a lot. Thanks a lot to you, Paul. >>I know you're chomping at the bit. You have incredible credentials, over 500 papers published. The impact that you've had on HPC is truly remarkable. But I wanted to talk to you specifically about a project you've been working on for over 20 years now called MVAPICH, a high-performance MPI library that's used by more than 3,200 organizations across 90 countries. You've shepherded this from its infancy. What is the vision for what MVAPICH will be, and how is it a proof of concept that others can learn from? >>Yeah, Paul, that's a great question to start with. I mean, I started with this conference in 2001. That was the first time I came. It's very coincidental: if you remember, the InfiniBand networking technology was introduced in October of 2000. So in my group, we were working on MPI for Myrinet and Quadrics. Those are the old technologies, if you can recollect. When InfiniBand came out, we were the very first ones in the world to really jump in. Nobody knew how to use InfiniBand in an HPC system. So that's how the MVAPICH project was born. And in fact, at Supercomputing 2002, on this exhibition floor in Baltimore, we had the first demonstration of the open-source MVAPICH actually running on an eight-node InfiniBand cluster. And that was a big challenge. But now over the years, I mean, we have continuously worked with all the InfiniBand vendors, the MPI Forum.
>>We are a member of the MPI Forum, and we also work with all the other network interconnects. So we have steadily evolved this project over the last 21 years. I'm very proud of my team members working nonstop, continuously bringing not only performance, but scalability. If you see now, InfiniBand networks are being deployed in 8,000-, 10,000-node clusters, and many of these clusters actually use our software stack, MVAPICH. So we have done a lot. Our focus is, we first do research, because we are in academia. We come up with good designs, we publish, and in six to nine months, we actually bring it to the open-source version, and people can just download and then use it. And that's how currently it's been used by more than 3,000 organizations in 90 countries. But the interesting thing is happening in the second part of your question. Now, as you know, the field is moving into not just HPC, but AI and big data, and we have that support. This is where we look at the vision for the next 20 years: we want to design this MPI library so that not only HPC but also all other workloads can take advantage of it. >>We have seen libraries become critical development platforms supporting AI, TensorFlow and PyTorch, and the emergence of some sort of default frameworks that are driving the community. How important are these frameworks to making progress in the HPC world? >>Yeah, those are great. I mean, PyTorch or TensorFlow, those are now the bread and butter of deep learning and machine learning. Am I right? But the challenge is that people use these frameworks, but continuously, models are becoming larger. You need a very fast turnaround time. So how do you train faster? How do you do inferencing faster?
So this is where HPC comes in, and exactly what we have done is we have linked PyTorch to our MVAPICH, because now you see the MPI library is running on million-core systems. Now PyTorch and TensorFlow can also be scaled to that large number of cores and GPUs. So we have actually done that kind of tight coupling, and that helps the researchers to really take advantage of HPC. >>So if a high school student is thinking in terms of interesting computer science, looking for a place, looking for a university, Ohio State University, world renowned, widely known, talk about what that looks like on a day-to-day basis in terms of the opportunity for undergrad and graduate students to participate in the kind of work that you do. What does that look like? And is that a good pitch for people to consider the university? >>Yes. I mean, from a university perspective, by the way, the Ohio State University is one of the largest single campuses in the US, one of the top three, top four. We have 65,000 students. >>Wow. >>It's one of the very largest campuses. And especially within computer science, where I am located, high-performance computing is a very big focus. And we are, again, one of the top schools all over the world for high-performance computing. And we also have a very big strength in AI. So we always encourage the new students who like to really work on state-of-the-art solutions to get exposed to the concepts, principles, and also practice. So we encourage those people, and we can really bring them that kind of experience. And many of my past students and staff, they're all in top companies now; they've all become big managers. >>How long did you say you've been at this? >>31 years. >>31 years. So you've had people who weren't alive when you were already doing this stuff? >>That's correct.
>>They then grew up, yes, went to university, graduate school, and now they're on... >>Now they're in many top companies, national labs, at universities all over the world. So they have been trained very well. >>You've touched a lot of lives, sir. >>Yes, thank you. >>Thank you. We've seen really a burgeoning of AI-specific hardware emerge over the last five years or so, and architectures going beyond just CPUs and GPUs, to ASICs and FPGAs and accelerators. Does this excite you? I mean, are there innovations that you're seeing in this area that you think have great promise? >>Yeah, there is a lot of promise. I think every time in supercomputing technology, you see there is sometimes a big barrier jump. Rather, I'll say, some new, disruptive technology comes, and then you move to the next level. So that's what we are seeing now. A lot of these AI chips and AI systems are coming up, which take you to the next level. But the bigger challenge is whether it is cost-effective or not, and can that be sustained longer? And this is where commodity technology comes in, because commodity technology tries to take you much further. So we might see, like all these, like Gaudi, a lot of new chips coming up. Can they really bring down the cost? If that cost can be reduced, you will see a much bigger push for AI solutions which are cost-effective. >>What about on the interconnect side of things? Obviously, your start sort of coincided with the initial standards for InfiniBand; you know, Intel was really big in that architecture originally. Do you see interconnects like RDMA over Converged Ethernet playing a part in that sort of democratization or commoditization of things? >>Yes. Yes. >>What are your thoughts there, for Ethernet? >>No, this is a great thing. So we saw InfiniBand coming. Of course, InfiniBand is commodity, it is available.
But then over the years, people have been trying to see how those RDMA mechanisms can be used for Ethernet, and then RoCE was born. So RoCE has also been deployed. But besides these, I mean, now you talk about Slingshot, the Cray Slingshot; it is also an Ethernet-based system, and a lot of those RDMA principles are actually being used under the hood. So any modern network you see, whether it is an InfiniBand network, a RoCE network, a Slingshot network, a Rockport network, you name any of these networks, they are using all the very latest principles. And of course, everybody wants to make it commodity. And this is what you see on the show floor. Everybody's trying to compete against each other to give you the best performance with the lowest cost, and we'll see whoever wins over the years. >>Sort of a macroeconomic question: Japan, the US, and China have been leapfrogging each other for a number of years in terms of the fastest supercomputer performance. How important do you think it is for the US to maintain leadership in this area? >>Big, big thing, significantly, right? We are saying that, I think, for the last five to seven years, we lost that lead. But now, with Frontier being the number one starting from the June ranking, I think we are getting that leadership back. And I think it is very critical, not only for fundamental research, but for national security, trying to really move the US to the leading edge. So I hope the US will continue to lead the trend for the next few years, until another new system comes out. >>And one of the gating factors is there is a shortage of people with data science skills. Obviously you're doing what you can at the university level. What do you think can change at the secondary school level to prepare students better for data science careers? >>Yeah, I mean, that is also very important.
I mean, we always call it a pipeline, you know. That means, at the PhD level we are expecting this, but we even want students to get exposed to many of these concepts from the high school level. And things are actually changing. I mean, these days I see a lot of high school students who know Python, how to program in Python, how to program in C, object-oriented things. They're even being exposed to AI at that level. So I think that is a very healthy sign. And in fact, even from the Ohio State side, we are always engaged with all this K-to-12 outreach in many different programs, and then gradually trying to take them to the next level. And I think we need to accelerate that in a very significant manner, because we need that kind of workforce. It is not just building a system that is number one, but how do we really utilize it? How do we utilize that science? How do we propagate that to the community? Then we need all these trained personnel. So in fact, in my group, we are also involved in a lot of cyber-training activities for HPC professionals. In fact, today there is a BOF at, yeah, I think 12:15 to 1:15. We'll be talking more about that. >>About education. >>Yeah, cyber training: how do we do it for professionals? So we had funding, together with my co-PI, Dr. Karen Tomko from the Ohio Supercomputer Center. We have a grant from the National Science Foundation to really educate HPC professionals about cyberinfrastructure and AI. Even though they work on some of these things, they don't have the complete knowledge, they don't get the time to learn, and the field is moving so fast. So this is how it has been. We got the initial funding, and in fact, the first time we advertised, in 24 hours we got 120 applications. 24 hours! We couldn't even take all of them. So we are trying to offer that in multiple phases. There is a big need for those kinds of training sessions to take place. I also offer a lot of tutorials at all different conferences. We had a high-performance networking tutorial. Here we have a high-performance deep learning tutorial, a high-performance big data tutorial. So I've been offering tutorials, even at this conference, since 2001. >>So in the last 31 years at The Ohio State University, as my friends remind me it is properly called, you've seen the world get a lot smaller. >>Yes. >>Because 31 years ago, Ohio, roughly in the middle of North America and the United States, was not as connected as it is now to everywhere else in the globe. It kind of boggles the mind when you think of that progression over 31 years. But globally, as we talk about the world getting smaller, we're sort of in the thick of the celebratory seasons, where many groups of people exchange gifts for varieties of reasons. If I were to offer you a holiday gift that is the result of what AI can deliver the world, what would that be? What would the first thing be? This is like the genie, but you only get one wish. >>I know, I know. >>So what would the first one be? >>Yeah, it's very hard to answer in one way, but let me bring a little bit different context and I can answer this. I talked about the MVAPICH project and all, but recently, last year actually, we got awarded an NSF AI Institute award. It's a 20 million dollar award. I am the overall PI, but there are 14 universities involved. >>And what is that institute? >>Oh, it's ICICLE. You can just go to icicle.ai. And that aligns exactly with what you are trying to do: how to bring a lot of AI to the masses, democratizing AI. That's the overall goal of this institute. We have three verticals we are working on; one is digital agriculture. So that will be my, like, first answer.
How do we take HPC and AI to agriculture? The world just crossed 8 billion people. Yeah, that's right. We need continuous food and food security. How do we grow food with the lowest cost and with the highest yield? >>Water consumption. >>Water consumption. Can we minimize the water consumption, or the fertilization? Don't do it blindly; technologies are out there. Like, let's say there is a wheat field. A traditional farmer sees that, yeah, there is some disease, and they will just go and spray pesticides. It is not good for the environment. Now I can fly a drone, get images of the field in real time, check them against the models, and then it'll tell me: okay, this part of the field has disease one, this part of the field has disease two. I indicate to the tractor or the sprayer, saying, okay, spray only pesticide one here, and pesticide two there. That has a big impact. So this is what we are developing in that NSF AI institute, ICICLE. We have also chosen two additional verticals. One is animal ecology, because that is very much related to wildlife conservation and climate change: how do you understand how the animals move? Can we learn from them, and then see how human beings need to act in the future? And the third one is food insecurity and logistics, smart food distribution. So these are our three broad goals in that institute: how do we develop cyberinfrastructure from below, combining HPC, AI, and security? We have a large team; as I said, there are 40 PIs and 60 students. We are a hundred-member team, working together. So that will be my wish. >>Fantastic. I think that's a great place to wrap the conversation, here on day three at Supercomputing Conference 2022 on theCUBE. It was an honor. Dr. Panda, working tirelessly at the Ohio State University with his team for 31 years, toiling in the field of computer science, and the end result: improving the lives of everyone on Earth. That's not a stretch. If you're in high school thinking about a career in computer science, keep that in mind. It isn't just about the bits and the bobs and the speeds and the feeds. It's about serving humanity. Maybe a little too profound a statement? I would argue not even close. I'm Dave Nicholson with theCUBE, with my cohost Paul Gillin. Thank you again, Dr. Panda. Stay tuned for more coverage from theCUBE at Supercomputing 2022, coming up shortly. >>Thanks a lot.
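The targeted-spraying workflow Dr. Panda describes (image the field with a drone, score each patch against a disease model, and direct the sprayer only to the patches that need treatment) can be sketched in a few lines. This is a hedged illustration, not ICICLE's actual software: the patch scores below stand in for the output of a real image-classification model, and the 0.5 threshold is an invented tolerance.

```python
# A hypothetical sketch of the targeted-spraying decision described above.
# The (row, col) patch scores are stand-ins for the output of a real
# image-classification model run on drone imagery.

def build_spray_plan(disease_scores, threshold=0.5):
    """Return the sorted list of (row, col) patches that should be sprayed.

    disease_scores maps each patch to the model's confidence (0.0 to 1.0)
    that the patch is diseased. Sorting keeps the sprayer route deterministic.
    """
    return sorted(patch for patch, score in disease_scores.items()
                  if score >= threshold)

# Scores for a 2x3 field imaged by a drone (hypothetical model output).
scores = {
    (0, 0): 0.05, (0, 1): 0.82, (0, 2): 0.10,
    (1, 0): 0.49, (1, 1): 0.91, (1, 2): 0.30,
}

print(build_spray_plan(scores))  # -> [(0, 1), (1, 1)]
```

Only the two high-confidence patches get pesticide; the rest of the field is left untreated, which is the environmental win the interview points to.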

Published Date : Nov 17 2022



Felix Van de Maele, Collibra, Data Citizens 22


 

(upbeat techno music) >> Collibra is a company that was founded in 2008, right before the so-called modern big data era kicked into high gear. The company was one of the first to focus its business on data governance. Now, historically, data governance and data quality initiatives were back office functions, and they were largely confined to regulated industries that had to comply with public policy mandates. But as the cloud went mainstream, the tech giants showed us how valuable data could become, and the value proposition for data quality and trust evolved from primarily a compliance driven issue to becoming a linchpin of competitive advantage. But data in the decade of the 2010s was largely about getting the technology to work. You had these highly centralized technical teams that were formed, and they had hyper-specialized skills, to develop data architectures and processes, to serve the myriad data needs of organizations. And it resulted in a lot of frustration with data initiatives for most organizations that didn't have the resources of the cloud guys and the social media giants to really attack their data problems and turn data into gold. This is why today, for example, there's quite a bit of momentum toward rethinking monolithic data architectures. You hear about initiatives like Data Mesh and the idea of data as a product. They're gaining traction as a way to better serve the data needs of decentralized business users. You hear a lot about data democratization. So these decentralization efforts around data are great, but they create a new set of problems. Specifically, how do you deliver a self-service infrastructure to business users and domain experts? Now the cloud is definitely helping with that, but also, how do you automate governance? This becomes especially tricky as protecting data privacy has become more and more important.
In other words, while it's enticing to experiment and run fast and loose with data initiatives, kind of like the Wild West, to find new veins of gold, it has to be done responsibly. As such, the idea of data governance has had to evolve to become more automated and intelligent. Governance and data lineage are still fundamental to ensuring trust as data moves, like water, through an organization. No one is going to use data that isn't trusted. Metadata has become increasingly important for data discovery and data classification. As data flows through an organization, the ability to continuously check for data flaws and automate data quality has become a functional requirement of any modern data management platform. And finally, data privacy has become a critical adjacency to cybersecurity. So you can see how data governance has evolved into a much richer set of capabilities than it was 10 or 15 years ago. Hello and welcome to theCUBE's coverage of Data Citizens, made possible by Collibra, a leader in so-called data intelligence and the host of Data Citizens 2022, which is taking place in San Diego. My name is Dave Vellante and I'm one of the hosts of our program, which is running in parallel to Data Citizens. Now at theCUBE we like to say we extract the signal from the noise, and over the next couple of days we're going to feature some of the themes from the keynote speakers at Data Citizens, and we'll hear from several of the executives. Felix Van de Maele, who is the co-founder and CEO of Collibra, will join us. Along with one of the other founders of Collibra, Stan Christiaens, who's going to join my colleague Lisa Martin. I'm going to also sit down with Laura Sellers, she's the Chief Product Officer at Collibra. We'll talk about some of the announcements and innovations they're making at the event, and then we'll dig in further to data quality with Kirk Haslbeck. He's the Vice President of Data Quality at Collibra.
He's an amazingly smart dude who founded Owl DQ, a company that he sold to Collibra last year. Now, many companies didn't make it through the Hadoop era; you know, they missed the industry waves and they became driftwood. Collibra, on the other hand, has evolved its business: they've leveraged the cloud, expanded their product portfolio, and leaned in heavily to some major partnerships with cloud providers, as well as receiving a strategic investment from Snowflake earlier this year. So, it's a really interesting story that we're thrilled to be sharing with you. Thanks for watching and I hope you enjoy the program. (upbeat rock music) Last year theCUBE covered Data Citizens, Collibra's customer event, and the premise that we put forth prior to that event was that despite all the innovation that's gone on over the last decade or more with data, you know, starting with the Hadoop movement, we had data lakes, we had Spark, the ascendancy of programming languages like Python, the introduction of frameworks like TensorFlow, the rise of AI, low code, no code, et cetera, businesses still find it's too difficult to get more value from their data initiatives. And we said at the time, you know, maybe it's time to rethink data innovation. While a lot of the effort has been focused on, you know, more efficiently storing and processing data, perhaps more energy needs to go into thinking about the people and the process side of the equation. Meaning, making it easier for domain experts to both gain insights from data, trust the data, and begin to use that data in new ways, fueling data products, monetization, and insights. Data Citizens 2022 is back and we're pleased to have Felix Van de Maele, who is the founder and CEO of Collibra. He's on theCUBE. We're excited to have you, Felix. Good to see you again. >> Likewise, Dave. Thanks for having me again. >> You bet.
All right, we're going to get the update from Felix on the current data landscape, how he sees it, why data intelligence is more important now than ever, and get current on what Collibra has been up to over the past year and what's changed since Data Citizens 2021, and we may even touch on some of the product news. So Felix, we're living in a very different world today with businesses and consumers. They're struggling with things like supply chains, uncertain economic trends, and we're not just snapping back to the 2010s, that's clear, and that's really true as well in the world of data. So what's different in your mind, in the data landscape of the 2020s, from the previous decade, and what challenges does that bring for your customers? >> Yeah, absolutely, and I think you said it well, Dave, in the intro: that rising complexity and fragmentation in the broader data landscape hasn't gotten any better over the last couple of years. When we talk to our customers, that level of fragmentation, the complexity, how do we find data that we can trust, that we know we can use, has only gotten more difficult. So that trend is continuing; I think what is changing is that the trend has become much more acute. The other thing we've seen over the last couple of years is that the level of scrutiny that organizations are under with respect to data, as data becomes more mission critical, as data becomes more impactful and important, the level of scrutiny with respect to privacy, security, and regulatory compliance is only increasing as well. Which, again, is really difficult in this environment of continuous innovation, continuous change, and continuously growing complexity and fragmentation. So, it's become much more acute. And to your earlier point, we do live in a different world, and in the past couple of years we could probably just kind of brute force it, right? We could focus on the top line; there were enough investments to be had.
I think nowadays organizations find themselves in a very different environment, where there's much more focus on cost control, productivity, efficiency. How do we truly get the value from that data? So again, I think it's just another incentive for organizations to now truly look at data and how to scale with data, not just from a technology and infrastructure perspective, but how do we actually scale data from an organizational perspective, right? You said it at the top, the people and process, how do we do that at scale? And that's only becoming more important, and we do believe that the economic environment that we find ourselves in today is going to be a catalyst for organizations to really take that more seriously, if you will, than they maybe have in the past. >> You know, I don't know when you guys founded Collibra, if you had a sense as to how complicated it was going to get, but you've been on a mission to really address these problems from the beginning. How would you describe your mission, and what are you doing to address these challenges? >> Yeah, absolutely. We started Collibra in 2008, so in some sense during the last financial crisis, and that was really the start of Collibra, where we found product-market fit working with large financial institutions to help them cope with the increasing compliance requirements that they were faced with because of the financial crisis. And kind of here we are again, in a very different environment of course, almost 15 years later, but data only becoming more important. But our mission, to deliver trusted data for every user, every use case, and across every source, frankly, has only become more important. 
So, while it has been an incredible journey over the last 14, 15 years, I think we're still relatively early in our mission to, again, be able to provide everyone, and that's why we call it Data Citizens, we truly believe that everyone in the organization should be able to use trusted data in an easy manner. That mission is only becoming more important, more relevant. We definitely have a lot more work ahead of us, because we're still relatively early in that journey. >> Well, that's interesting, because, you know, in my observation it takes 7 to 10 years to actually build a company, and the fact that you're still in the early days is kind of interesting. I mean, Collibra's had a good 12 months or so since we last spoke at Data Citizens. Give us the latest update on your business. What do people need to know about your current momentum? >> Yeah, absolutely. Again, there's a lot of tailwind; organizations are only maturing their data practices, and we've seen that transform, or influence, a lot of our business growth, broader adoption of the platform. We work with some of the largest organizations in the world, like Adobe, Heineken, Bank of America, and many more. We have now over 600 enterprise customers, all industry leaders, in every single vertical. So it's really exciting to see that and to continue to partner with those organizations. On the partnership side, again, a lot of momentum in the market with some of the cloud partners like Google, Amazon, Snowflake, Databricks, and others, right? Those kind of new modern data infrastructures, modern data architectures, are definitely all moving to the cloud. A great opportunity for us, our partners, and of course our customers, to help them transition to the cloud even faster. And so we see a lot of excitement and momentum there. 
We did an acquisition about 18 months ago around data quality and data observability, which we believe is an enormous opportunity. Of course data quality isn't new, but I think there are a lot of reasons why we're so excited about quality and observability now. One is around leveraging AI and machine learning, again, to drive more automation. And a second is that those data pipelines that are now being created in the cloud, in these modern data architectures, have become mission critical. They've become real time. And so monitoring and observing those data pipelines continuously has become absolutely critical, so we're really excited about that as well. And on the organizational side, I'm sure you've heard the term data mesh, something that's gaining a lot of momentum, rightfully so. It's really the type of governance that we always believed in: federated, focused on domains, giving a lot of ownership to different teams. I think that's the way to scale data organizations, and so that aligns really well with our vision, and from a product perspective, we've seen a lot of momentum with our customers there as well. >> Yeah, you know, a couple things there. I mean, the acquisition of OwlDQ, you know, Kirk Haslbeck and their team. It's interesting: data quality used to be this back office function, really confined to highly regulated industries. It's come to the front office; it's top of mind for Chief Data Officers. Data mesh, you mentioned: you guys are a connective tissue for all these different nodes on the data mesh. That's key. And of course we see you at all the shows. You're a critical part of many ecosystems, and you're developing your own ecosystem. So, let's chat a little bit about the products. 
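The continuous pipeline monitoring Felix describes boils down to checking each run's output against learned expectations. A minimal sketch of that idea in Python, watching a single row-count metric (the threshold and metric are illustrative; a real observability product such as Collibra's tracks far richer signals):

```python
from statistics import mean, stdev

def check_row_count(history, current, threshold=3.0):
    """Flag a pipeline run whose row count deviates more than
    `threshold` standard deviations from the historical mean."""
    if len(history) < 2:
        return True  # not enough history to judge; pass by default
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return current == mu
    return abs(current - mu) <= threshold * sigma

# Daily row counts observed for a pipeline, then two candidate runs:
history = [10_050, 9_980, 10_120, 10_007, 9_995]
print(check_row_count(history, 10_040))  # normal volume -> True
print(check_row_count(history, 4_200))   # sudden drop   -> False
```

In practice a monitor like this would run after every pipeline execution and page the data owner on a failed check, which is exactly the "continuous" part Felix emphasizes.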
We're going to go deeper into products later on, at Data Citizens 22, but we know you're debuting some new innovations, you know, whether it's under the covers in security, making data more accessible for people, or dealing with workflows and processes, as you talked about earlier. Tell us a little bit about what you're introducing. >> Yeah, absolutely. We're super excited, a ton of innovation. And if we think about the big theme, like I said, we're still relatively early in this journey towards that mission of data intelligence. That's a really bold and compelling mission. Our customers are just starting on that journey, and we want to make it as easy as possible for organizations to actually get started, because we know it's important that they do. And for the organizations and customers that have been with us for some time, there's still a tremendous amount of opportunity to expand the platform further. And again, to make it easier to accomplish that mission and vision around the Data Citizen: that everyone has access to trustworthy data in a very easy way. So that's really the theme of a lot of the innovation that we're driving: a lot of ease of adoption, ease of use, but also, how do we make sure that, as Collibra becomes this kind of mission-critical enterprise platform, from a security, performance, architecture, scale, and supportability perspective, we're truly able to deliver that kind of enterprise mission-critical platform. And so that's the big theme. From an innovation perspective, from a product perspective, a lot of new innovation that we're really excited about. A couple of highlights. One is around the data marketplace. Again, a lot of our customers have plans in that direction. How do we make it easy? How do we make available a true kind of shopping experience? 
So that anybody in the organization can, in a very easy, search-first way, find the right data product, find the right dataset, that they can then consume. Usage analytics: how do we help organizations drive adoption, tell them where things are working really well and where they have opportunities? Homepages, again, to make things easy for anyone in your organization to get started with Collibra. You mentioned the Workflow Designer. Again, we have a very powerful enterprise platform, and one of our key differentiators is the ability to really drive a lot of automation through workflows. And now we've provided a new Low-Code, No-Code workflow designer experience, so customers can really take it to the next level. There's a lot more: a new product around Collibra Protect, which, in partnership with Snowflake, which has been a strategic investor in Collibra, is focused on how do we make access governance easier? How are we able to make sure that, as you move to the cloud, things like access management and masking around sensitive data, PII data, are managed in a much more effective way? Really excited about that product. There's more around data quality. Again, how do we get that deployed as easily, and quickly, and widely as we can? Moving that to the cloud has been a big part of our strategy. So, we launched our Data Quality Cloud product, as well as making use of the native compute capabilities in platforms like Snowflake, Databricks, Google, Amazon, and others. And so we are debuting a capability that we call push down, where we're actually pushing down the compute for data quality monitoring into the underlying platform, which, again, from a scale, performance, and ease-of-use perspective, is going to make a massive difference. And then, more broadly, we talked a little bit about the ecosystem. Again, integrations: we talk about being able to connect to every source. 
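For readers, the "push down" Felix mentions means the quality metric is computed by the warehouse itself, so only the aggregate result leaves the database, never the raw rows. A toy sketch of that pattern, using SQLite as a stand-in for a platform like Snowflake or Databricks (the function name and SQL are ours for illustration, not Collibra's API):

```python
import sqlite3

def null_rate_pushdown(conn, table, column):
    """Compute a null-rate metric *inside* the database: only the
    aggregate crosses the wire. (Identifiers are interpolated here
    for brevity; never do that with untrusted input.)"""
    sql = f"SELECT AVG(CASE WHEN {column} IS NULL THEN 1.0 ELSE 0.0 END) FROM {table}"
    return conn.execute(sql).fetchone()[0]

# Stand-in warehouse with a small customers table:
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, email TEXT)")
conn.executemany("INSERT INTO customers VALUES (?, ?)",
                 [(1, "a@x.com"), (2, None), (3, "c@x.com"), (4, None)])
print(null_rate_pushdown(conn, "customers", "email"))  # 0.5
```

The scale benefit Felix points to comes from exactly this shape: the check runs where the data already lives, instead of copying billions of rows out to a separate quality engine.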
Integrations are absolutely critical, and we're really excited to deliver new integrations with Snowflake, Azure, and Google Cloud Storage as well. So there's a lot coming out. The team has been hard at work, and we are really, really excited about what we're bringing to market. >> Yeah, a lot going on there. I wonder if you could give us your closing thoughts. I mean, you talked about the marketplace, you think about Data Mesh, you think of data as product, one of the key principles, you think about monetization. This is really different than what we've been used to in data, where just getting the technology to work has been so hard. So, how do you see the future? Give us your closing thoughts, please. >> Yeah, absolutely. And I think we're really at a pivotal moment, and I think you said it well. We all know the constraints and the challenges with data, how to actually do data at scale. And while we've seen a ton of innovation on the infrastructure side, we fundamentally believe that just getting a faster database is important, but it's not going to fully solve the challenges and truly deliver on the opportunity. And that's why now is really the time to deliver this data intelligence vision, this data intelligence platform. We are still early; making it as easy as we can is our mission. And so I'm really, really excited to see how the markets are going to evolve over the next few quarters and years. I think the trend is clearly there. We talked about Data Mesh, this kind of federated approach focused on data products; it's just another signal that we believe a lot of organizations are now at the point where they understand the need to go beyond just the technology. 
We really need to think about how to actually scale data as a business function, just like we've done with IT, with HR, with sales and marketing, with finance. That's how we need to think about data. I think now is the time, given the economic environment that we are in, with much more focus on control, much more focus on productivity and efficiency. Now is the time we need to look beyond just the technology and infrastructure and think about how to scale data, how to manage data at scale. >> Yeah, it's a new era. The next 10 years of data won't be like the last, as I always say. Felix, thanks so much. Good luck in San Diego. I know you're going to crush it out there. >> Thank you, Dave. >> Yeah, it's a great spot for an in-person event, and of course the content post-event is going to be available at collibra.com, and you can of course catch theCUBE coverage at theCUBE.net and all the news at siliconangle.com. This is Dave Vellante for theCUBE, your leader in enterprise and emerging tech coverage. (upbeat techno music)

Published Date : Nov 2 2022


Collibra Data Citizens 22


 

>>Collibra is a company that was founded in 2008, right before the so-called modern big data era kicked into high gear. The company was one of the first to focus its business on data governance. Now, historically, data governance and data quality initiatives were back office functions, largely confined to regulated industries that had to comply with public policy mandates. But as the cloud went mainstream, the tech giants showed us how valuable data could become, and the value proposition for data quality and trust evolved from a primarily compliance-driven issue to becoming a lynchpin of competitive advantage. But data in the decade of the 2010s was largely about getting the technology to work. You had these highly centralized technical teams that were formed, and they had hyper-specialized skills to develop data architectures and processes to serve the myriad data needs of organizations. >>And it resulted in a lot of frustration with data initiatives for most organizations that didn't have the resources of the cloud guys and the social media giants to really attack their data problems and turn data into gold. This is why today, for example, there's quite a bit of momentum toward rethinking monolithic data architectures. You see and hear about initiatives like data mesh and the idea of data as a product. They're gaining traction as a way to better serve the data needs of decentralized business unit users, and you hear a lot about data democratization. So these decentralization efforts around data are great, but they create a new set of problems. Specifically, how do you deliver a self-service infrastructure to business users and domain experts? Now the cloud is definitely helping with that, but also, how do you automate governance? This becomes especially tricky as protecting data privacy has become more and more important. 
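One concrete piece of that governance automation is classifying columns as sensitive so privacy policies can be applied without a human reviewing every table. A toy sketch of the idea, assuming simple regex patterns as stand-ins for the far more sophisticated classifiers real governance platforms use:

```python
import re

# Illustrative patterns only; production classifiers use ML plus many rules.
PATTERNS = {
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
    "us_ssn": re.compile(r"^\d{3}-\d{2}-\d{4}$"),
}

def classify_column(samples, min_match=0.8):
    """Label a column as PII if most sampled values match a known pattern."""
    for label, pattern in PATTERNS.items():
        hits = sum(bool(pattern.match(s)) for s in samples)
        if samples and hits / len(samples) >= min_match:
            return label
    return None

print(classify_column(["ann@corp.com", "bob@corp.com", "eve@corp.com"]))  # email
print(classify_column(["123-45-6789", "987-65-4321"]))                    # us_ssn
print(classify_column(["Ann", "Bob", "Eve"]))                             # None
```

Once a column is tagged this way, downstream policy (masking, access restrictions) can be attached to the tag rather than to each individual table, which is what makes governance automatable at scale.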
>>In other words, while it's enticing to experiment and run fast and loose with data initiatives, kind of like the Wild West, to find new veins of gold, it has to be done responsibly. As such, the idea of data governance has had to evolve to become more automated and intelligent. Governance and data lineage are still fundamental to ensuring trust as data moves like water through an organization. No one is going to use data that isn't trusted. Metadata has become increasingly important for data discovery and data classification. As data flows through an organization, the ability to continuously check for data flaws and automate data quality has become a functional requirement of any modern data management platform. And finally, data privacy has become a critical adjacency to cybersecurity. So you can see how data governance has evolved into a much richer set of capabilities than it was 10 or 15 years ago. >>Hello and welcome to theCUBE's coverage of Data Citizens, made possible by Collibra, a leader in so-called data intelligence and the host of Data Citizens 2022, which is taking place in San Diego. My name is Dave Vellante, and I'm one of the hosts of our program, which is running in parallel to Data Citizens. Now at theCUBE we like to say we extract the signal from the noise, and over the next couple of days we're going to feature some of the themes from the keynote speakers at Data Citizens, and we'll hear from several of the executives. Felix Van de Maele, who is the co-founder and CEO of Collibra, will join us, along with one of the other founders of Collibra, Stan Christiaens, who's going to join my colleague Lisa Martin. I'm also going to sit down with Laura Sellers; she's the Chief Product Officer at Collibra. We'll talk about some of the announcements and innovations they're making at the event, and then we'll dig in further to data quality with Kirk Haslbeck. 
I really, really think about how we actually scale data as a business function, just like we've done with IT, with HR, with sales and marketing, with finance. That's how we need to think about data. I think now is the time, given the economic environment that we're in, with much more focus on control, much more focus on productivity and efficiency. Now's the time. We need to look beyond just the technology and infrastructure to think about how to scale data, how to manage data at scale. >>Yeah, it's a new era. The next 10 years of data won't be like the last, as I always say. Felix, thanks so much, and good luck in San Diego. I know you're gonna crush it out there. >>Thank you, Dave. >>Yeah, it's a great spot for an in-person event, and of course the content post-event is gonna be available at collibra.com, and you can of course catch the Cube coverage at thecube.net and all the news at siliconangle.com. This is Dave Valante for the Cube, your leader in enterprise and emerging tech coverage. >>Hi, I'm Jay from Collibra's Data Office. Today I want to talk to you about Collibra's Data Intelligence Cloud. We often say Collibra is a single system of engagement for all of your data. Now, when I say data, I mean data in the broadest sense of the word, including reference data and metadata. Think of metrics, reports, APIs, systems, policies, and even business processes that produce or consume data. Now, the beauty of this platform is that it ensures all of your users have an easy way to find, understand, trust, and access data. But how do you get started? Well, here are seven steps to help you get going. One, start with the data. What's data intelligence without data? Leverage the Collibra Data Catalog to automatically profile and classify your enterprise data wherever that data lives: databases, data lakes, or data warehouses, whether in the cloud or on premise. >>Two, you'll then want to organize the data, and you'll do that with data communities. 
This can be by department, line of business, or functional team, however your organization organizes work and accountability. And for that you'll establish community owners. Communities make it easy for people to navigate through the platform and find the data, and they help create a sense of belonging for users. An important and related side note here: we find it's typical in many organizations that data is thought of as just an asset, and IT and data offices are viewed as its owners, really the central teams performing analytics as a service provider to the enterprise. We believe data is more than an asset; it's a true product that can be converted to value. And that also means establishing business ownership of data, where strategy and ROI come together with subject matter expertise. >>Okay, three. Next, back to those communities. There, the data owners should explain and define their data: not just the tables and columns, but also the related business terms, metrics, and KPIs. These objects, which we call assets, are typically organized into business glossaries and data dictionaries. I definitely recommend starting with the topics that are most important to the business. Four, the steps that enable you and your users to have some fun with it: linking everything together. Linking or relating these assets together builds your knowledge graph, also known as a metadata graph. For example, linking a data set to a KPI to a report now enables your users to see what we call the lineage diagram, which visualizes where the data in your dashboards actually came from, what the data means, and who's responsible for it. Speaking of which, here's five. Leverage the Collibra Trusted Business Reporting solution on the marketplace, which comes with workflows for those owners to certify their reports, KPIs, and data sets. 
Six, easy to navigate dashboards or landing pages right in your platform for your company's business processes are the most effective way for everyone to better understand and take action on data. Here's a pro tip, use the dashboard design kit on the marketplace to help you build compelling dashboards. Finally, seven, promote the value of this to your users and be sure to schedule enablement office hours and new employee onboarding sessions to get folks excited about what you've built and implemented. Better yet, invite all of those community and data owners to these sessions so that they can show off the value that they've created. Those are my seven tips to get going with Collibra. I hope these have been useful. For more information, be sure to visit collibra.com. >>Welcome to the Cube's coverage of Data Citizens 2022 Collibra's customer event. My name is Dave Valante. With us is Kirk Hasselbeck, who's the vice president of Data Quality of Collibra Kirk, good to see you. Welcome. >>Thanks for having me, Dave. Excited to be here. >>You bet. Okay, we're gonna discuss data quality observability. It's a hot trend right now. You founded a data quality company, OWL dq, and it was acquired by Collibra last year. Congratulations. And now you lead data quality at Collibra. So we're hearing a lot about data quality right now. Why is it such a priority? Take us through your thoughts on that. >>Yeah, absolutely. It's, it's definitely exciting times for data quality, which you're right, has been around for a long time. So why now and why is it so much more exciting than it used to be? I think it's a bit stale, but we all know that companies use more data than ever before and the variety has changed and the volume has grown. And, and while I think that remains true, there are a couple other hidden factors at play that everyone's so interested in as, as to why this is becoming so important now. 
And I guess you could break this down simply. Think about it, Dave: if you and I were gonna build, you know, a new healthcare application and monitor the heartbeat of individuals, imagine if we got that wrong, what the ramifications could be, what those incidents would look like. Or, maybe better yet, we try to build a new trading algorithm with a crossover strategy, where the 50-day average crosses the 10-day average. >>And imagine if the data underlying the inputs to that is incorrect. We would probably have major financial ramifications in that sense. So, you know, it kind of starts there, where everybody's realizing that we're all data companies, and if we are using bad data, we're likely making incorrect business decisions. But I think there are kind of two other things at play. You know, I bought a car not too long ago, and my dad called and said, "How many cylinders does it have?" And I realized in that moment I might have failed him, because I didn't know. I used to ask those types of questions, about anti-lock brakes and cylinders and whether it's manual or automatic, and I realized I now just buy a car that I hope works. It's so complicated, with all the computer chips, that I really don't know that much about it. >>And that's what's happening with data. We're just loading so much of it, and it's so complex, that the way companies consume it in the IT function is that they bring in a lot of data and then syndicate it out to the business. And it turns out that the individuals loading and consuming all of this data for the company actually may not know that much about the data itself, and that's not even their job anymore. So we'll talk more about that in a minute, but that's really what's setting the foreground for this observability play and why everybody's so interested. 
It's because we're becoming less close to the intricacies of the data, and we just expect it to always be there and be correct. >>You know, the other thing too about data quality: for years we did the MIT CDOIQ event; we didn't do it last year, Covid messed everything up. But the observation I would make there is, data quality, which used to be called information quality, used to be this back office function, and then it became sort of front office with financial services and government and healthcare, these highly regulated industries. And then the whole chief data officer thing happened, and people were realizing, well, they sort of flipped the bit from data as a risk to data as an asset. And now, as we say, we're gonna talk about observability. And so the whole quality issue has really become front and center, just because data's so fundamental, hasn't it? >>Yeah, absolutely. I mean, let's imagine we pull up our phones right now, and I go to my favorite stock ticker app and check out the NASDAQ market cap. I really have no idea if that's the correct number. I know it's a number, it looks large, it's in a numeric field. And that's kind of what's going on. There are so many numbers, and they're coming from all of these different sources and data providers, and they're getting consumed and passed along. But there isn't really a way to tactically put controls on every number and metric across every field we plan to monitor. But with the scale that we've achieved, even in the early days before Collibra, what's been so exciting is we have these types of observation techniques, these data monitors, that can actually track past performance of every field at scale. 
And why that's so interesting, and why I think the CDO is listening intently to this topic nowadays, is: maybe we could surface all of these problems with the right data observability solution, at the right scale, and then just be alerted on breaking trends. So we're sort of shifting away from this world where you must write a condition, and when that condition breaks, that was always known as a break record. But what about breaking trends and root cause analysis? And is it possible to do that, you know, with less human intervention? And so I think most people are seeing now that it's going to have to be a software tool and a computer system. It's not ever going to be based on one or two domain experts anymore. >>So how does data observability relate to data quality? Are they sort of two sides of the same coin? Are they cousins? What's your perspective on that? >>Yeah, it's super interesting. It's an emerging market, so the language is changing a lot, and the topics and areas are changing. The way that I like to break it down, because the lingo is constantly a moving target in this space, is really breaking records versus breaking trends. I could write a condition: when this thing happens, it's wrong, and when it doesn't, it's correct. Or I could look for a trend, and I'll give you a good example. You know, everybody's talking about fresh data and stale data, and why would that matter? Well, if your data never arrived, or only part of it arrived, or it didn't arrive on time, it's likely stale, and there is no condition that you could write that would show you all the goods and the bads. That was your traditional break-record approach to data quality. But your modern-day approach is: you lost a significant portion of your data, or it did not arrive on time for you to make that decision accurately, on time. And that's a hidden concern. 
Some people call this freshness; we call it stale data, but it all points to the same idea: the thing that you're observing may not be a data quality condition anymore. It may be a breakdown in the data pipeline. And with thousands of data pipelines in play for every company out there, there's more than a couple of these happening every day. >>So what's the Collibra angle on all this? You made the acquisition, you've got data quality and observability coming together, you guys have a lot of expertise in this area. But you hear about provenance of data, you just talked about, you know, stale data, the whole trend toward real time. How is Collibra approaching the problem, and what's unique about your approach? >>Well, I think where we're fortunate is, with our background, myself and the team sort of lived this problem for a long time, you know, in the Wall Street days, about a decade ago. And we saw it from many different angles. And what we came up with, before it was called data observability or reliability, was basically the underpinnings of that. So we're a little bit ahead of the curve there; when most people evaluate our solution, it's more advanced than some of the observation techniques that currently exist. But we've also always covered data quality, and we believe that people want to know more, they need more insights, and they want to see break records and breaking trends together so they can correlate the root cause. And we hear that all the time: I have so many things going wrong, just show me the big picture, help me find the thing that, if I were to fix it today, would make the most impact. So we're really focused on root cause analysis, business impact, and connecting it with lineage and catalog metadata. 
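Kirk's distinction between break records and breaking trends can be made concrete with a small sketch. The function names, the drop threshold, and the sample row counts below are all invented for illustration; this shows the idea, not Collibra's actual implementation.

```python
from statistics import mean

def break_record_check(value, low, high):
    """Classic data-quality rule: flag a single value outside a fixed range.
    Returns True when the value passes the condition."""
    return low <= value <= high

def breaking_trend_check(daily_row_counts, drop_threshold=0.5):
    """Trend-style check: flag today's load as stale or incomplete when its
    row count falls below a fraction of the trailing average. No fixed rule
    is written; the baseline comes from the data's own history."""
    *history, today = daily_row_counts
    baseline = mean(history)
    return today >= baseline * drop_threshold

# A fixed rule catches an out-of-range value...
assert break_record_check(101.0, 0.0, 100.0) is False
# ...while a trend check catches a partial load that no static rule
# would see: the values are all "valid", there are just too few of them.
assert breaking_trend_check([10_000, 10_200, 9_900, 4_000]) is False
assert breaking_trend_check([10_000, 10_200, 9_900, 9_800]) is True
```

The point of the second check is exactly the "freshness" scenario Kirk describes: every individual row can pass its condition while the pipeline as a whole has silently lost data.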
And as that grows, you can actually achieve total data governance. At this point, with the acquisition of what was a lineage company years ago, and then my company, OwlDQ, now Collibra Data Quality, Collibra may be the best positioned for total data governance and intelligence in the space. >>Well, you mentioned financial services a couple of times, and some examples. Remember the flash crash in 2010? Nobody had any idea what that was; they just said, "Oh, it's a glitch," so they didn't understand the root cause of it. So this is a really interesting topic to me. So we know at Data Citizens 22 that you're announcing... you've gotta announce new products at your yearly event, right? What's new? Give us a sense as to what products are coming out, but specifically around data quality and observability. >>Absolutely. There's always a next thing on the forefront, and the one right now is these hyperscalers in the cloud. So you have databases like Snowflake and BigQuery, and Databricks's Delta Lake, and SQL pushdown. And ultimately what that means is a lot of people are storing and loading data even faster, in a SaaS-like model. And we've started to hook into these databases. And while we've always worked with the same databases in the past, and they're supported today, we're now doing something called native database pushdown, where the entire compute and data activity happens in the database. And why that is so interesting and powerful now is everyone's concerned with something called egress. Did my data, that I've spent all this time and money with my security team securing, ever leave my hands? Did it ever leave my secure VPC, as they call it? 
And what that means is with no install and no configuration, you could log into the Collibra data quality app and have all of your data quality running inside the database that you've probably already picked as your your go forward team selection secured database of choice. So we're really excited about that. And I think if you look at the whole landscape of network cost, egress, cost, data storage and compute, what people are realizing is it's extremely efficient to do it in the way that we're about to release here next week. >>So this is interesting because what you just described, you know, you mentioned Snowflake, you mentioned Google, Oh actually you mentioned yeah, data bricks. You know, Snowflake has the data cloud. If you put everything in the data cloud, okay, you're cool, but then Google's got the open data cloud. If you heard, you know, Google next and now data bricks doesn't call it the data cloud, but they have like the open source data cloud. So you have all these different approaches and there's really no way up until now I'm, I'm hearing to, to really understand the relationships between all those and have confidence across, you know, it's like Jak Dani, you should just be a note on the mesh. And I don't care if it's a data warehouse or a data lake or where it comes from, but it's a point on that mesh and I need tooling to be able to have confidence that my data is governed and has the proper lineage, providence. And, and, and that's what you're bringing to the table, Is that right? Did I get that right? >>Yeah, that's right. And it's, for us, it's, it's not that we haven't been working with those great cloud databases, but it's the fact that we can send them the instructions now, we can send them the, the operating ability to crunch all of the calculations, the governance, the quality, and get the answers. And what that's doing, it's basically zero network costs, zero egress cost, zero latency of time. 
And so when you log into BigQuery tomorrow using our tool, or, say, Snowflake, for example, you have instant data quality metrics, instant profiling, instant lineage, and access privacy controls, things of that nature that just become less onerous. What we're seeing is there's so much technology out there, just like all of the major brands that you mentioned, but how do we make it easier? The future is about fewer clicks, faster time to value, faster scale, and eventually lower cost. And we think that this positions us to be the leader there. >>I love this example, because, you know, Barry talks about how the cloud guys are gonna own the world, and of course now we're seeing that the ecosystem is finding so much white space to add value and connect across clouds. Sometimes we call it supercloud, or interclouding. All right, Kirk, give us your final thoughts on the trends that we've talked about and Data Citizens 22. >>Absolutely. Well, I think, you know, one big trend is discovery and classification. We're seeing that across the board. People used to know if a field was a zip code, and nowadays, with the amount of data that's out there, they wanna know where everything is, where their sensitive data is, whether it's redundant; tell me everything, inside of three to five seconds. And with that, they want to know how fast they can get controls and insights out of their tools in all of these hyperscale databases. So I think we're gonna see more one-click solutions, more SaaS-based solutions, and solutions that hopefully prove faster time to value on all of these modern cloud platforms. >>Excellent. All right, Kirk Hasselbeck, thanks so much for coming on the Cube and previewing Data Citizens 22. Appreciate it. >>Thanks for having me, Dave. >>You're welcome. All right, and thank you for watching. Keep it right there for more coverage from the Cube. Welcome to the Cube's virtual coverage of Data Citizens 2022. 
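The pushdown idea that runs through this segment, sending the quality computation to the database rather than pulling the rows out, can be sketched in miniature. Here SQLite stands in for a warehouse like Snowflake or BigQuery, and the table and column names are made up; only the aggregate result crosses the wire, which is what makes the zero-egress argument work.

```python
import sqlite3

def null_rate_check_sql(table, column):
    """Build a data-quality metric as SQL so the compute 'pushes down' into
    the database engine itself. The raw rows never leave the warehouse;
    only the aggregate comes back. (Illustrative sketch; vendor pushdown
    integrations generate far richer checks than this.)"""
    return (f"SELECT COUNT(*) AS total, "
            f"SUM(CASE WHEN {column} IS NULL THEN 1 ELSE 0 END) AS nulls "
            f"FROM {table}")

# Demo against an in-memory SQLite database standing in for the warehouse.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(1, 9.5), (2, None), (3, 12.0), (4, None)])
total, nulls = conn.execute(null_rate_check_sql("orders", "amount")).fetchone()
print(total, nulls)  # 4 2
```

Note that the Python side never materializes the four rows; it ships one query and receives two numbers, which is the scale and egress property Kirk is describing.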
My name is Dave Valante, and I'm here with Laura Sellers, who's the Chief Product Officer at Collibra, the host of Data Citizens. Laura, welcome. Good to see you. >>Thank you. Nice to be here. >>Yeah, your keynote at Data Citizens this year focused on, you know, your mission to drive ease of use and scale. Now, historically, fast access to the right data at the right time, in a form that's easily consumable, has been kind of challenging, especially for business users. Can you explain to our audience why this matters so much, and what's actually different today in the data ecosystem to make this a reality? >>Yeah, definitely. So I think what we really need, and what I hear from customers every single day, is a new approach to data management. What inspired me to come to Collibra a little over a year ago was really the fact that they're very focused on bringing trusted data to more users, across more sources, for more use cases. And so as we look at what we're announcing with these innovations of ease of use and scale, it's really about making teams more productive in getting started with, and having the ability to manage, data across the entire organization. So we've been very focused on richer experiences, a broader ecosystem of partners, as well as a platform that delivers the performance, scale, and security that our users and teams need and demand. So as we look at... oh, go ahead. >>I was gonna say, you know, when I look back at, like, the last 10 years, it was all about getting the technology to work, and it was just so complicated. But please carry on. I'd love to hear more about this. >>Yeah. You know, Collibra is a system of engagement for data, and we really are working on bringing that entire system of engagement to life for everyone to leverage here and now. So what we're announcing on the ease-of-use side of the world is, first, our data marketplace. 
This is the ability for all users to discover and access data quickly and easily, to shop for it, if you will. The next thing that we're introducing is the new homepage. It's really about the ability to drive adoption and have users find data more quickly. And then there are two more areas on the ease-of-use side. The first is our world of usage analytics. One of the big pushes and passions we have at Collibra is to help with this data-driven culture that all companies are trying to create, and also to help with data literacy. With something like usage analytics, it's really about driving adoption of the Collibra platform: understanding what's working, who's accessing it, and what's not. And then finally, we're also introducing what's called workflow designer. We love our workflows at Collibra; it's a big differentiator to be able to automate business processes. The designer is really a way for more people to be able to create those workflows and collaborate on those workflows, as well as for people to be able to easily interact with them. So a lot of exciting things when it comes to ease of use, to make it easier for all users to find data. >>Yes, there's definitely a lot to unpack there. You know, you mentioned this idea of shopping for the data. That's interesting to me. Why this analogy... metaphor or analogy, I always get those confused. Let's go with analogy. Why is it so important to data consumers? >>I think when you look at the world of data, and I talked about this system of engagement, it's really about making it more accessible to the masses. And what users are used to is a shopping experience like your Amazon, if you will. 
And so having a consumer-grade experience, where users can quickly go in and find the data, trust that data, understand where the data's coming from, and then be able to quickly access it, is the idea of being able to shop for it: just making it as simple as possible, and really speeding the time to value for any of the business analysts and data analysts out there. >>Yeah, you see a lot of discussion about rethinking data architectures, putting data in the hands of the users and business people, decentralized data, and of course that's awesome. I love that. But of course then you have to have self-service infrastructure, and you have to have governance. And those are really challenging. And I think so many organizations are facing adoption challenges, you know, when it comes to enabling teams generally, especially domain experts, to adopt new data technologies. You know, the tech comes fast and furious. You've got all these open source projects, and it gets really confusing. Of course it risks security, governance, and all that good stuff. You've got all this jargon. So where do you see the friction in adopting new data technologies? What's your point of view, and how can organizations overcome these challenges? >>You're dead on. There's so much technology, and there's so much to stay on top of, which is part of the friction, right? It's just being able to stay ahead of and understand all the technologies that are coming. You also look at how there are so many more sources of data, and people are migrating data to the cloud and to new sources. Where the friction comes in is really the ability to understand where the data came from and where it's moving to, and then also to be able to put the access controls on top of it, so people are only getting access to the data that they should be getting access to. 
So one of the other things we're announcing, with all of the innovations that are coming, is what we're doing around performance and scale. With all of the data movement, with all of the data that's out there, the first thing we're launching in the world of performance and scale is our world of data quality. 
The cloud is really changed things in terms of performance and scale and of course partnering for, for, with Snowflake it's all about sharing data and monetization, anything but a back office function. So it was kind of smart that you guys were early on and of course attracting them and as a, as an investor as well was very strong validation. What can you tell us about the nature of the relationship with Snowflake and specifically inter interested in sort of joint engineering or, and product innovation efforts, you know, beyond the standard go to market stuff? >>Definitely. So you mentioned there were a strategic investor in Calibra about a year ago. A little less than that I guess. We've been working with them though for over a year really tightly with their product and engineering teams to make sure that Collibra is adding real value. Our unified platform is touching pieces of our unified platform or touching all pieces of Snowflake. And when I say that, what I mean is we're first, you know, able to ingest data with Snowflake, which, which has always existed. We're able to profile and classify that data we're announcing with Calibra Protect this week that you're now able to create those policies on top of Snowflake and have them enforce. So again, people can get more value out of their snowflake more quickly as far as time to value with, with our policies for all business users to be able to create. >>We're also announcing Snowflake Lineage 2.0. So this is the ability to take stored procedures in Snowflake and understand the lineage of where did the data come from, how was it transformed with within Snowflake as well as the data quality. Pushdown, as I mentioned, data quality, you brought it up. It is a new, it is a, a big industry push and you know, one of the things I think Gartner mentioned is people are losing up to $15 million without having great data quality. 
So this push down capability for Snowflake really is again, a big ease of use push for us at Collibra of that ability to, to push it into snowflake, take advantage of the data, the data source, and the engine that already lives there and get the right and make sure you have the right quality. >>I mean, the nice thing about Snowflake, if you play in the Snowflake sandbox, you, you, you, you can get sort of a, you know, high degree of confidence that the data sharing can be done in a safe way. Bringing, you know, Collibra into the, into the story allows me to have that data quality and, and that governance that I, that I need. You know, we've said many times on the cube that one of the notable differences in cloud this decade versus last decade, I mean ob there are obvious differences just in terms of scale and scope, but it's shaping up to be about the strength of the ecosystems. That's really a hallmark of these big cloud players. I mean they're, it's a key factor for innovating, accelerating product delivery, filling gaps in, in the hyperscale offerings cuz you got more stack, you know, mature stack capabilities and you know, it creates this flywheel momentum as we often say. But, so my question is, how do you work with the hyperscalers? Like whether it's AWS or Google, whomever, and what do you see as your role and what's the Collibra sweet spot? >>Yeah, definitely. So, you know, one of the things I mentioned early on is the broader ecosystem of partners is what it's all about. And so we have that strong partnership with Snowflake. We also are doing more with Google around, you know, GCP and kbra protect there, but also tighter data plex integration. So similar to what you've seen with our strategic moves around Snowflake and, and really covering the broad ecosystem of what Collibra can do on top of that data source. We're extending that to the world of Google as well and the world of data plex. 
We also have great partners in SI's Infosys is somebody we spoke with at the conference who's done a lot of great work with Levi's as they're really important to help people with their whole data strategy and driving that data driven culture and, and Collibra being the core of it. >>Hi Laura, we're gonna, we're gonna end it there, but I wonder if you could kind of put a bow on, you know, this year, the event your, your perspectives. So just give us your closing thoughts. >>Yeah, definitely. So I, I wanna say this is one of the biggest releases Collibra's ever had. Definitely the biggest one since I've been with the company a little over a year. We have all these great new product innovations coming to really drive the ease of use to make data more valuable for users everywhere and, and companies everywhere. And so it's all about everybody being able to easily find, understand, and trust and get access to that data going forward. >>Well congratulations on all the pro progress. It was great to have you on the cube first time I believe, and really appreciate you, you taking the time with us. >>Yes, thank you for your time. >>You're very welcome. Okay, you're watching the coverage of Data Citizens 2022 on the cube, your leader in enterprise and emerging tech coverage. >>So data modernization oftentimes means moving some of your storage and computer to the cloud where you get the benefit of scale and security and so on. But ultimately it doesn't take away the silos that you have. We have more locations, more tools and more processes with which we try to get value from this data. To do that at scale in an organization, people involved in this process, they have to understand each other. So you need to unite those people across those tools, processes, and systems with a shared language. When I say customer, do you understand the same thing as you hearing customer? 
Are we counting them in the same way? That shared language unites us, and it gives the organization as a whole the opportunity to get the maximum value out of its data assets. Then they can democratize data, so everyone can properly use that shared language to find, understand, and trust the data assets that are available. >> And that's where Collibra comes in. We provide a centralized system of engagement that works across all of those locations and combines all of those different user types across the whole business. At Collibra, we say "united by data," and that also means that we're united by data with our customers. So here is some data about some of our customers. There was the case of an online do-it-yourself platform who grew their revenue almost three times from a marketing campaign that put the right product in the hands of the right people. Another case that comes to mind is a financial services organization who saved over $800K every year, because they were able to reuse the same data in different kinds of reports. Before, it was spread out over different tools and processes and silos, and the platform brought them together, so they realized, "Oh, we're actually using the same data. Let's find a way to make this more efficient." And the last example that comes to mind is that of a large home mortgage loan provider with a very complex landscape, a very complex architecture, legacy and cloud, et cetera. They're using our platform to unite all the people, processes, and tools to get a common view of data and to manage their compliance at scale. >> Hey everyone, I'm Lisa Martin, covering Data Citizens '22, brought to you by Collibra. This next conversation is going to focus on the importance of data culture. One of our Cube alumni is back: Stan Christiaens is Collibra's co-founder and its Chief Data Citizen. Stan, it's great to have you back on the Cube.
>> Hey Lisa, nice to be back. >> So we're going to be talking about the importance of data culture, data intelligence, maturity, all those great things. When we think about the data revolution that every business is going through, you know, it's so much more than technology innovation. It also really requires cultural transformation and community transformation. Those are challenging for customers to undertake. Talk to us about what you mean by data citizenship and the role that creating a data culture plays in that journey. >> Right. So as you know, our event is called Data Citizens because we believe that, in the end, a data citizen is anyone who uses data to do their job. And we believe that in today's organizations, most of the employees are somehow going to be data citizens, right? So you need to make sure that these people are aware of it, and that people have the skills and competencies to do with data what's necessary. So what does it mean to have a good data culture? It means that if you're building a beautiful dashboard to try and convince your boss that we need to make this decision, your boss is also open to, and able to interpret, the data presented in that dashboard, to actually make that decision and take that action. Right? >> And once you have that throughout the organization, that's when you have a good data culture. Now, that's a continuous effort for most organizations, because they're always moving; they're hiring new people. And it has to be a continuous effort, because we've seen that, on the one hand, organizations are continuously challenged around their data sources and where all the data is flowing, right? Which in itself creates a lot of risk. But on the other hand of the equation, you have the benefit. You know, you might look at regulatory drivers, like, we have to do this, right?
But it's much better right now to consider the competitive drivers, for example. We did an IDC study earlier this year, quite interesting; I can recommend anyone to read it. And one of the conclusions they found, as they surveyed over a thousand people across organizations worldwide, is about the ones who are higher in maturity. >> So the organizations that really look at data as an asset, look at data as a product, and actively try to be better at it have three times as good a business outcome as the ones who are lower on the maturity scale, right? So you can say, okay, I'm doing this data culture for everyone, waking them up as data citizens; I'm doing this for competitive reasons; I'm doing this for regulatory reasons. You're trying to bring all of those together, and the ones that get data intelligence right are successful and competitive. And that's what we're seeing out there in the market. >> Absolutely. We know that generally, Stan, right, the organizations that are really creating a data culture and enabling everybody within the organization to become data citizens are, we know in theory, more competitive and more successful. But the IDC study that you just mentioned demonstrates they're three times more successful and competitive than their peers. Talk about how Collibra advises customers to create that community, that culture of data, when it might be challenging for an organization to adapt culturally. >> Of course it's difficult for an organization to adapt, but it's also necessary. As you just said, imagine that you're a modern-day organization, laptops, what have you, and you're not using those, right? Or you're delivering them throughout the organization but not enabling your colleagues to actually do something with that asset. The same thing is true with data today, right? If you're not properly using the data asset and competitors are, they're going to get more of an advantage.
So as to how you get this done: there are a couple of angles to look at, Lisa. One angle is obviously leadership, whereby whoever is the boss of data in the organization, and you typically have multiple bosses there, like chief data officers, sometimes there are multiple, but they may have a different title, right? So I'm just going to summarize it as a data leader for a second. >> So whoever that is, they need to make sure that there's a clear vision and a clear strategy for data. And that strategy needs to include the monetization aspect: how are you going to get value from data? Now, that's one part, because then you can show leadership in the organization and also the business value. And that's important, because those people, their job in essence really is to make everyone in the organization think about data as an asset. And I think the second part of the equation of getting that right is that it's not enough to just have that leadership out there; you also have to win the hearts and minds of the data champions across the organization. You really have to win them over. And if you have those two combined, and obviously a good technology to connect those people and have them execute on their responsibilities, such as a data intelligence platform like ours, then you're in place to really start upgrading that culture inch by inch, if you will. >> Yes, I like that, the recipe for success. So you are the co-founder of Collibra. You've worn many different hats along this journey. Now you're building Collibra's own data office. I like how, before we went live, we were talking about Collibra drinking its own champagne. I always love to hear stories about that. You're speaking at Data Citizens 2022. Talk to us about how you are building a data culture within Collibra and what maybe some of the specific projects are that Collibra's data office is working on. >> Yes, and it is indeed Data Citizens. There are a ton of speakers here; I'm very excited.
You know, we have Barb from MIT speaking about data monetization. We have Dilla at the last minute. So a really exciting agenda; can't wait to get back out there, essentially. So, over the years at Collibra, we've been doing this since 2008, so a good number of years, and I think we have another decade of work ahead in the market, just to be very clear. Data is here to stick around, as are we. And myself, you know, when you start a company, we were four people, if you will, so everybody's wearing all sorts of hats at the same time. But over the years I've run presales, sales, partnerships, product, et cetera. And as our company got a little bit bigger, we're now over a thousand people in the company. >> I believe systems and processes become a lot more important at that size. So we said, Collibra is nearing the size of our customers; we're getting there in terms of organization structure, processes, systems, et cetera. So we said it's really time for us to put our money where our mouth is and set up our own data office, which is what we were seeing at customer organizations worldwide. Organizations have HR units, they have a finance unit, and over time they'll all have a department, if you will, that is responsible somehow for the data. So we said, okay, let's try to set an example that other people can take away from. So we set up a data strategy, we started building data products, we took care of the data infrastructure, all that good stuff. And in doing all of that, Lisa, exactly as you said, we said, okay, we need to also use our own product and our own practices, and from that use, learn how we can make the product better, learn how we can make the practice better, and share that learning with everyone. On Monday mornings, we sometimes refer to it as eating our own dog food; on Friday evenings, we refer to that as drinking our own champagne. >> I like it. >> So we had a driver to do this. You know, there's a clear business reason.
So we included that in the data strategy, and that's a little bit of our origin. Now, how do we organize this? We have three pillars, and by no means is this a template that everyone should follow; this is just the organization that works at our company, but it can serve as inspiration. So we have a pillar which is data science: the data product builders, if you will, or the people who help the business build data products. We have the data engineers, who help keep the lights on for that data platform, to make sure that the data products can run, the data can flow, and the quality can be checked. >> And then we have the data intelligence, or data governance, builders, where we have those data governance and data intelligence stakeholders who help the business as a sort of data partner to the business stakeholders. So that's how we've organized it. And then we started following the Collibra approach, which is: well, what are the challenges that our business stakeholders have in HR, finance, sales, marketing, all over? And how can data help overcome those challenges? And from those use cases, we then just started to build a roadmap and started executing on the use cases. And the important ones are very simple; we see them with our customers as well: people talking about the catalog, right? The catalog for the data scientists to know what's in their data lake, for example, and for the people in privacy, so they have their process registry and can see how the data flows. >> So that's a starting place, and that turns into a marketplace, so that if new analysts and data citizens join Collibra, they immediately have a place to go to look and see, okay, what data is out there for me as an analyst or a data scientist to do my job, right? So they can immediately get access to data. And another one that we focus on is around trusted business reporting.
We're seeing that since self-service BI allowed everyone to make beautiful dashboards, you know, pie charts. My pet peeve is the pie chart, because I love pie and you shouldn't always be using pie charts. But essentially there's been a proliferation of those reports, and now executives don't really know, okay, should I trust this report or that report? They're reporting on the same thing, but the numbers seem different, right? So that's why we have trusted business reporting. So if a dashboard, a data product essentially, is built, we know that all the right steps were followed, and whoever is consuming it can be quite confident in the result, right? >> Absolutely key. >> Exactly. Yes. >> Absolutely. Talk a little bit about some of the key performance indicators that you're using to measure the success of the data office. What are some of those KPIs? >> KPIs and measuring is a big topic in the chief data officer profession, I would say, and again, it always varies with your organization, but there are a few that we use that might be of interest. We use those pillars, right? And we have metrics across those pillars. So, for example, a pillar on the data engineering side is going to be more related to uptime, right? Is the data platform up and running? Are the data products up and running? Is the quality in them good enough? Is it going up? Is it going down? What's the usage? But also, especially if you're in the cloud and consumption's a big thing, you have metrics around cost, for example, right? So that's one set of examples. Another one is around the data science side and the products: are people using them? Are they getting value from it? Can we calculate that value from an ROI perspective, right? Yeah.
So that we can continue to say to the rest of the business: we're tracking all those numbers, and those numbers indicate that value is being generated, and this is how much value we estimate in that region. And then you have some data intelligence and data governance metrics. For example, you have a number of domains in a data mesh; people talk about being the owner of a data domain, for example, like product or customer. So how many of those domains do you have covered? How many of them are already part of the program? How many of them have owners assigned? How well are those owners organized and executing on their responsibilities? How many tickets are open and closed? How many data products are built according to process? And so on and so forth. So those are a set of examples of KPIs. There are a lot more, but hopefully those can already inspire the audience. >> Absolutely. So we've talked about the rise of chief data offices, and it's only accelerating. You mentioned this is like a 10-year journey. So if you were to look into a crystal ball, what do you see in terms of the maturation of data offices over the next decade? >> So we've indeed seen the role sort of grow up. I think in 2010 there may have been like 10 chief data officers or something; Gartner has exact numbers on them. But then the role grew across industries, and the number is estimated to be about 20,000 right now. >> Wow. >> And they evolved through a sort of stack of competencies: defensive data strategy, because the first chief data officers were more regulatory driven; offensive data strategy; support for the digital program; and now it's all about data products, right? So as a data leader, you now need all of those competencies and need to include them in your strategy. How is that going to evolve over the next couple of years? I wish I had one of those crystal balls, right?
But essentially, I think for the next couple of years there are going to be a lot of people still moving along those four levels of the stack. A lot of people I see are still in version one and version two of the chief data officer role, so you'll see that evolve over the years toward more digital and more data products. For the next years, my prediction is it's all about data products, because there's an immediate link between data and value, essentially, right? So that's going to be important, and quite likely some new things will be added on which nobody can predict yet, but we'll see those pop up in a few years. I think there's going to be a continued challenge for the chief data officer role to become a real executive role, as opposed to, you know, somebody who claims that they're an executive but then they're not, right? >> So the real reporting level, into the board, into the CEO for example, will continue to be a challenging point. But the ones who do get that done will be the ones that are successful, and the ones who get that done will be the ones that do it on the basis of data monetization, right? Connecting value to the data and making that value clear to all the data citizens in the organization. And in that sense, they'll need to have both technical audiences and non-technical audiences aligned, of course. And they'll need to focus on adoption. Again, it's not enough to just have your data office be involved in this. It's really important that you're waking up data citizens across the organization and making everyone in the organization think about data as an asset. >> Absolutely, because there's so much value that can be extracted when organizations really strategically build that data office and democratize access across all those data citizens. Stan, this is an exciting arena. We're definitely going to keep our eyes on this. It sounds like a lot of evolution and maturation coming, from the data office perspective and from the data citizen perspective.
And as the data show, in that IDC study you mentioned, and you mentioned Gartner as well, organizations have so much more likelihood of being successful and competitive. So we're going to watch this space. Stan, thank you so much for joining me on the Cube at Data Citizens '22. We appreciate it. >> Thanks for having me over. >> From Data Citizens '22, I'm Lisa Martin. You're watching the Cube, the leader in live tech coverage. >> Okay, this concludes our coverage of Data Citizens 2022, brought to you by Collibra. Remember, all these videos are available on demand at thecube.net. And don't forget to check out siliconangle.com for all the news, and wikibon.com for our weekly Breaking Analysis series, where we cover many data topics and share survey research from our partner ETR, Enterprise Technology Research. If you want more information on the products announced at Data Citizens, go to collibra.com. There are tons of resources there; you'll find analyst reports and product demos. It's really worthwhile to check those out. Thanks for watching our program and digging into Data Citizens 2022 on the Cube, your leader in enterprise and emerging tech coverage. We'll see you soon.

Published Date : Nov 2 2022


Felix Van de Maele, Collibra | Data Citizens '22


 

(upbeat music) >> Last year, the Cube covered Data Citizens, Collibra's customer event. And the premise that we put forth prior to that event was that despite all the innovation that's gone on over the last decade or more with data, you know, starting with the Hadoop movement. We had data lakes, we had Spark, the ascendancy of programming languages like Python, the introduction of frameworks like TensorFlow, the rise of AI, low code, no code, et cetera. Businesses still find it's too difficult to get more value from their data initiatives. And we said at the time, you know, maybe it's time to rethink data innovation. While a lot of the effort has been focused on more efficiently storing and processing data, perhaps more energy needs to go into thinking about the people and the process side of the equation, meaning making it easier for domain experts to both gain insights from data, trust the data, and begin to use that data in new ways, fueling data products, monetization, and insights. Data Citizens 2022 is back, and we're pleased to have Felix Van de Maele, who is the founder and CEO of Collibra. He's on the Cube. We're excited to have you, Felix. Good to see you again. >> Likewise Dave. Thanks for having me again. >> You bet. All right, we're going to get the update from Felix on the current data landscape, how he sees it, why data intelligence is more important now than ever, and get current on what Collibra has been up to over the past year, and what's changed since Data Citizens 2021. And we may even touch on some of the product news. So Felix, we're living in a very different world today with businesses and consumers. They're struggling with things like supply chains, uncertain economic trends, and we're not just snapping back to the 2010s. That's clear. And that's really true, as well, in the world of data. So what's different in your mind in the data landscape of the 2020s from the previous decade, and what challenges does that bring for your customers? 
Yeah, absolutely. And I think you said it well, Dave, in the intro: the rising complexity and fragmentation in the broader data landscape hasn't gotten any better over the last couple of years. When we talk to our customers, that level of fragmentation and complexity, how do we find data that we can trust, that we know we can use, has only gotten more difficult. So that trend is continuing. I think what is changing is that the trend has become much more acute. The other thing we've seen over the last couple of years is the level of scrutiny that organizations are under with respect to data. As data becomes more mission critical, as data becomes more impactful and important, the level of scrutiny with respect to privacy, security, and regulatory compliance is only increasing as well. Which, again, is really difficult in this environment of continuous innovation, continuous change, and continuously growing complexity and fragmentation. So it's become much more acute. And to your earlier point, we do live in a different world. In the past couple of years, we could probably just kind of brute force it, right? We could focus on the top line. There were enough investments to be had. I think nowadays organizations find themselves in a very different environment, where there's much more focus on cost control, productivity, and efficiency. How do we truly get value from that data? So again, I think it's just another incentive for organizations to now truly look at that data and to scale that data, not just from a technology and infrastructure perspective, but how do we actually scale data from an organizational perspective, right? Like you said, the people and process. How do we do that at scale? And that's only becoming much more important. And we do believe that the economic environment that we find ourselves in today is going to be a catalyst for organizations to really take that more seriously, if you will, than they maybe have in the past.
You know, I don't know, when you guys founded Collibra, if you had a sense as to how complicated it was going to get, but you've been on a mission to really address these problems from the beginning. How would you describe your mission, and what are you doing to address these challenges? >> Yeah, absolutely. We started Collibra in 2008, so in some sense during the last financial crisis. And that was really the start of Collibra, where we found product-market fit working with large financial institutions to help them cope with the increasing compliance requirements that they were faced with because of the financial crisis. And here we are again in a very different environment, of course, almost 15 years later, but with data only becoming more important. Our mission to deliver trusted data for every user, every use case, and across every source frankly has only become more important. So while it's been an incredible journey over the last 14, 15 years, I think we're still relatively early in our mission to, again, be able to provide everyone, and that's why we call it Data Citizens, we truly believe that everyone in the organization should be able to use trusted data in an easy manner. That mission is only becoming more important and more relevant, and we definitely have a lot more work ahead of us, because we're still relatively early in that journey. >> Well, that's interesting, because, you know, in my observation it takes seven to 10 years to actually build a company, and the fact that you're still in the early days is kind of interesting. I mean, Collibra's had a good 12 months or so since we last spoke at Data Citizens. Give us the latest update on your business. What do people need to know about your current momentum? >> Yeah, absolutely.
Again, there's a lot of tailwinds, organizations are only maturing their data practices, and we've seen it kind of transform, or influence a lot of our business growth that we've seen, broader adoption of the platform. We work at some of the largest organizations in the world, whether it's Adobe, Heineken, Bank of America, and many more. We have now over 600 enterprise customers, all industry leaders, across every single vertical. So it's really exciting to see that and continue to partner with those organizations. On the partnership side, again, a lot of momentum in the market with some of the cloud partners like Google, Amazon, Snowflake, Databricks, and others, right? As those kind of new modern data infrastructures, modern data architectures, are definitely all moving to the cloud. A great opportunity for us, our partners, and of course our customers, to help them kind of transition to the cloud even faster. And so we see a lot of excitement and momentum there. We did an acquisition about 18 months ago around data quality, data observability, which we believe is an enormous opportunity. Of course data quality isn't new, but I think there's a lot of reasons why we're so excited about quality and observability now. One is around leveraging AI, machine learning, again to drive more automation. And the second is that those data pipelines that are now being created in the cloud, in these modern data architectures, they've become mission critical. They've become real time. And so monitoring, observing those data pipelines continuously has become absolutely critical. So we're really excited about that as well. And on the organizational side, I'm sure you've heard a term around kind of data mesh, something that's gaining a lot of momentum, rightfully so. It's really the type of governance that we always believed in. Federated, focused on domains, giving a lot of ownership to different teams. 
I think that's the way to scale the data organizations, and so that aligns really well with our vision, and from a product perspective, we've seen a lot of momentum with our customers there as well. >> Yeah, you know, a couple things there. I mean, the acquisition of OwlDQ, you know, Kirk Haslbeck and their team, it's interesting, you know, the whole data quality used to be this back office function and really confined to highly regulated industries. It's come to the front office, it's top of mind for chief data officers, data mesh, you mentioned. You guys are a connective tissue for all these different nodes on the data mesh. That's key. And of course we see you at all the shows. You're a critical part of many ecosystems, and you're developing your own ecosystem. So let's chat a little bit about the products. We're going to go deeper into products later on at Data Citizens '22, but we know you're debuting some new innovations, you know, whether it's, you know, the under the covers in security, sort of making data more accessible for people, just dealing with workflows and processes as you talked about earlier. Tell us a little bit about what you're introducing. >> Yeah, absolutely. We're super excited, a ton of innovation. And if we think about the big theme, and like I said, we're still relatively early in this journey towards kind of that mission of data intelligence, that really bold and compelling mission. Whether customers are just starting on that journey, we want to make it as easy as possible for the organization to actually get started, because we know it's important that they do. And for the organizations and customers that have been with us for some time, there's still a tremendous amount of opportunity to kind of expand the platform further. And again, to make it easier to really accomplish that mission and vision around the data citizen, that everyone has access to trustworthy data in a very easy way. 
So that's really the theme of a lot of the innovation that we're driving, a lot of kind of ease of adoption, ease of use, but also then, how do we make sure that as Collibra becomes this kind of mission critical enterprise platform, from a security, performance, architecture, scale, and supportability perspective, we're truly able to deliver that kind of enterprise mission critical platform. And so that's the big theme. From an innovation perspective, from a product perspective, a lot of new innovation that we're really excited about. A couple of highlights. One is around data marketplace. Again, a lot of our customers have plans in that direction. How do we make it easy? How do we make available a true kind of shopping experience so that anybody in your organization can, in a very easy, search first way, find the right data product, find the right data set that they can then consume and use in analytics. How do we help organizations drive adoption, tell them where things are working really well, and where they have opportunities. Home pages, again, to make things easy for people, for anyone in your organization, to kind of get started with Collibra. You mentioned workflow designer, again, we have a very powerful enterprise platform. One of our key differentiators is the ability to really drive a lot of automation through workflows. And now we've provided a new low code, no code kind of workflow designer experience. So really customers can take it to the next level. There's a lot more new product around Collibra Protect, which, in partnership with Snowflake, which has been a strategic investor in Collibra, is focused on how do we make access governance easier? How are we able to make sure that as you move to the cloud, things like access management, masking around sensitive data, PII data, are managed in a much more effective way. Really excited about that product. There's more around data quality. Again, how do we get that deployed as easily and quickly and widely as we can? 
Moving that to the cloud has been a big part of our strategy. So we launched our data quality cloud product as well as making use of those native compute capabilities in platforms like Snowflake, Databricks, Google, Amazon, and others. And so we are debuting a capability that we call push down. So we're actually pushing down the compute for data quality, the monitoring, into the underlying platform, which again, from a scale, performance, and ease of use perspective is going to make a massive difference. And then more broadly, we talked a little bit about the ecosystem. Again, integrations that we talk about, being able to connect to every source. Integrations are absolutely critical, and we're really excited to deliver new integrations with Snowflake, Azure, and Google Cloud Storage as well. So there's a lot coming out. The team has been hard at work, and we are really, really excited about what we're bringing to market. >> Yeah, a lot going on there. I wonder if you could give us your closing thoughts. I mean, you talked about the marketplace, you know, you think about data mesh, you think of data as product, one of the key principles. You think about monetization. This is really different than what we've been used to in data, which is just getting the technology to work has been so hard, so how do you see sort of the future? And, you know, give us your closing thoughts please. >> Yeah, absolutely. And I think we're really at this pivotal moment, and I think you said it well. We all know the constraints and the challenges with data, how to actually do data at scale. And while we've seen a ton of innovation on the infrastructure side, we fundamentally believe that just getting a faster database is important, but it's not going to fully solve the challenges and truly kind of deliver on the opportunity. And that's why now is really the time to deliver this data intelligence vision, the data intelligence platform. 
We are still early, making it as easy as we can. It's kind of our mission. And so I'm really, really excited to see how the markets are going to evolve over the next few quarters and years. I think the trend is clearly there, when we talk about data mesh, this kind of federated approach, focus on data products is just another signal that we believe that a lot of organizations are now at the point where they understand the need to go beyond just the technology, how to really, really think about how to actually scale data as a business function, just like we've done with IT, with HR, with sales and marketing, with finance. That's how we need to think about data. I think now's the time, given the economic environment that we are in, much more focus on control, much more focus on productivity, efficiency, and now's the time we need to look beyond just the technology and infrastructure to think of how to scale data, how to manage data at scale. >> Yeah, it's a new era. The next 10 years of data won't be like the last, as I always say. Felix, thanks so much, and good luck in San Diego. I know you're going to crush it out there. >> Thank you Dave. >> Yeah, it's a great spot for an in person event, and of course, the content post event is going to be available at collibra.com, and you can of course catch the Cube coverage at thecube.net, and all the news at siliconangle.com. This is Dave Vellante for the Cube, your leader in enterprise and emerging tech coverage. (light music)
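As a rough illustration of the "push down" capability Felix described, the idea is to compile a data-quality check into SQL that runs natively inside the underlying platform, rather than extracting rows for client-side checking. The table, column, and threshold below are hypothetical, and this sketch shows only the general pattern, not Collibra's actual implementation:

```python
# Sketch of push-down data quality: generate SQL for a null-rate rule and
# let the warehouse (Snowflake, Databricks, BigQuery, ...) do the compute.
# Table, column, and threshold names are hypothetical.

def null_rate_check_sql(table: str, column: str, threshold: float) -> str:
    """Build a SQL probe returning the column's null rate and a pass/fail flag."""
    null_rate = f"AVG(CASE WHEN {column} IS NULL THEN 1.0 ELSE 0.0 END)"
    return (
        f"SELECT {null_rate} AS null_rate, "
        f"CASE WHEN {null_rate} <= {threshold} THEN 'PASS' ELSE 'FAIL' END AS status "
        f"FROM {table}"
    )

sql = null_rate_check_sql("orders", "customer_id", 0.01)
print(sql)
```

Only the one-row result travels back to the monitoring tool; the scan over the data stays in the platform, which is where the scale and performance benefit comes from.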

Published Date : Oct 24 2022


Breaking Analysis: We Have the Data…What Private Tech Companies Don’t Tell you About Their Business


 

>> From The Cube Studios in Palo Alto and Boston, bringing you data driven insights from The Cube at ETR. This is "Breaking Analysis" with Dave Vellante. >> The reverse momentum in tech stocks caused by rising interest rates, less attractive discounted cash flow models, and more tepid forward guidance, can be easily measured by public market valuations. And while there's lots of discussion about the impact on private companies and cash runway and 409A valuations, measuring the performance of non-public companies isn't as easy. IPOs have dried up and public statements by private companies, of course, they accentuate the good and they kind of hide the bad. Real data, unless you're an insider, is hard to find. Hello and welcome to this week's "Wikibon Cube Insights" powered by ETR. In this "Breaking Analysis", we unlock some of the secrets that non-public, emerging tech companies may or may not be sharing. And we do this by introducing you to a capability from ETR that we've not exposed you to over the past couple of years, it's called the Emerging Technologies Survey, and it is packed with sentiment data and performance data based on surveys of more than a thousand CIOs and IT buyers covering more than 400 companies. And we've invited back our colleague, Erik Bradley of ETR to help explain the survey and the data that we're going to cover today. Erik, this survey is something that I've not personally spent much time on, but I'm blown away at the data. It's really unique and detailed. First of all, welcome. Good to see you again. >> Great to see you too, Dave, and I'm really happy to be talking about the ETS or the Emerging Technology Survey. Even our own clients of constituents probably don't spend as much time in here as they should. >> Yeah, because there's so much in the mainstream, but let's pull up a slide to bring out the survey composition. Tell us about the study. How often do you run it? What's the background and the methodology? 
Yeah, you were just spot on the way you were talking about the private tech companies out there. So what we did is we decided to take all the vendors that we track that are not yet public and move 'em over to the ETS. And there isn't a lot of information out there. If you're not in Silicon (indistinct), you're not going to get this stuff. So PitchBook and Tech Crunch are two out there that give some data on these guys. But what we really wanted to do was go out to our community. We have 6,000 ITDMs in our community. We wanted to ask them, "Are you aware of these companies? And if so, are you allocating any resources to them? Are you planning to evaluate them," and really just kind of figure out what we can do. So this particular survey, as you can see, 1,000 plus responses, over 450 vendors that we track. And essentially what we're trying to do here is talk about your evaluation and awareness of these companies and also your utilization. And also if you're not utilizing 'em, then we can also figure out your sales conversion or churn. So this is interesting, not only for the ITDMs themselves to figure out what their peers are evaluating and what they should put in POCs against the big guys when contracts come up. But it's also really interesting for the tech vendors themselves to see how they're performing. 
And then I'm going to put those two groups together and we're going to look at two dimensions, actually three dimensions: first, which companies are being evaluated the most. Second, which companies are getting the most usage and adoption of their offerings. And then third, which companies are seeing the highest churn rates, which of course is a silent killer of companies. And then finally, we're going to look at the sentiment and mindshare for two key areas that we like to cover often here on "Breaking Analysis", security and data. And data comprises database, including data warehousing, and then big data analytics is the second part of data. And then machine learning and AI is the third section within data that we're going to look at. Now, one other thing before we get into it, ETR very often will include open source offerings in the mix, even though they're not companies, like TensorFlow or Kubernetes, for example. And we'll call that out during this discussion. The reason this is done is for context, because everyone is using open source. It is the heart of innovation and many business models are super glued to an open source offering, like take MariaDB, for example. There's the foundation with the open source code, and then there's, of course, the company that sells services around the offering. Okay, so let's first look at the highest and lowest sentiment among these private firms, the ones that have the highest mindshare. So they're naturally going to be somewhat larger. And we do this on two dimensions, sentiment on the vertical axis and mindshare on the horizontal axis, and note the open source tools: Kubernetes, Postgres, Kafka, TensorFlow, Jenkins, Grafana, et cetera. So Erik, please explain what we're looking at here, how it's derived and what the data tells us. 
We have so many evaluation metrics, but we need to aggregate them into one so that way we can rank against each other. Net sentiment is really the aggregation of all the positive and subtracting out the negative. So the net sentiment is a very quick way of looking at where these companies stand versus their peers in their sectors and sub sectors. Mindshare is basically the awareness of them, which is good for very early stage companies. And you'll see some names on here that are obviously been around for a very long time. And they're clearly be the bigger on the axis on the outside. Kubernetes, for instance, as you mentioned, is open source. This de facto standard for all container orchestration, and it should be that far up into the right, because that's what everyone's using. In fact, the open source leaders are so prevalent in the emerging technology survey that we break them out later in our analysis, 'cause it's really not fair to include them and compare them to the actual companies that are providing the support and the security around that open source technology. But no survey, no analysis, no research would be complete without including these open source tech. So what we're looking at here, if I can just get away from the open source names, we see other things like Databricks and OneTrust . They're repeating as top net sentiment performers here. And then also the design vendors. People don't spend a lot of time on 'em, but Miro and Figma. This is their third survey in a row where they're just dominating that sentiment overall. And Adobe should probably take note of that because they're really coming after them. But Databricks, we all know probably would've been a public company by now if the market hadn't turned, but you can see just how dominant they are in a survey of nothing but private companies. And we'll see that again when we talk about the database later. 
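The net sentiment and mindshare calculations Erik describes can be sketched roughly as follows. The response categories and the exact aggregation here are assumptions for illustration; ETR's actual methodology aggregates many more evaluation metrics than this:

```python
# Toy survey responses per vendor: each respondent reports one disposition.
# Categories and weighting are illustrative assumptions, not ETR's model.
responses = {
    "VendorA": ["adopt", "adopt", "evaluate", "churn", "unaware"],
    "VendorB": ["evaluate", "churn", "churn", "unaware", "unaware"],
}
POSITIVE = {"adopt", "evaluate"}
NEGATIVE = {"churn"}

def net_sentiment(answers):
    """Positives minus negatives, as a share of respondents aware of the vendor."""
    aware = [a for a in answers if a != "unaware"]
    pos = sum(a in POSITIVE for a in aware)
    neg = sum(a in NEGATIVE for a in aware)
    return (pos - neg) / len(aware) if aware else 0.0

def mindshare(answers):
    """Share of all respondents who are at least aware of the vendor."""
    return sum(a != "unaware" for a in answers) / len(answers)

for vendor, answers in responses.items():
    print(vendor, round(net_sentiment(answers), 2), round(mindshare(answers), 2))
```

A young company can score high on net sentiment (the few people who know it love it) while still sitting low on mindshare, which is exactly the pattern Erik calls out for the early-stage names.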
>> And I'll just add, so you see Automation Anywhere on there, the big UiPath competitor company that was not able to get to the public markets. They've been trying. Snyk, Peter McKay's company, they've raised a bunch of money, big security player. They're doing some really interesting things in developer security, helping developers secure the data flow, H2O.ai, Dataiku, AI companies. We saw them at the Snowflake Summit. Redis Labs, Netskope in security. So a lot of names that we know that ultimately we think are probably going to be hitting the public market. Okay, here's the same view for private companies with less mindshare, Erik. Take us through this one. >> On the previous slide too, real quickly, I wanted to pull out that SecurityScorecard and we'll get back into it. But this is a newcomer that I couldn't believe how strong their data was, but we'll bring that up in a second. Now, when we go to the ones of lower mindshare, it's interesting to talk about open source, right? Kubernetes was all the way on the top right. Everyone uses containers. Here we see Istio up there. Not everyone is using service mesh as much. And that's why Istio is in the smaller breakout. But still when you talk about net sentiment, it's about the leader, it's the highest one there is. So really interesting to point out. Then we see other names like Collibra on the data side really performing well. And again, as always, security is very well represented here. We have Aqua, Wiz, Armis, which is a standout in this survey this time around. They do IoT security. I hadn't even heard of them until I started digging into the data here. And I couldn't believe how well they were doing. And then of course you have AnyScale, which is doing second best in this, and the best name in the survey, Hugging Face, which is a machine learning AI tool. Also doing really well on net sentiment, but they're not as far along on that axis of mindshare just yet. 
So these are again, emerging companies that might not be as well represented in the enterprise as they will be in a couple of years. >> Hugging Face sounds like something you do with your two year old. Like you said, you see high performers; AnyScale does machine learning, and you mentioned them. They came out of Berkeley. Collibra Governance, InfluxData is on there. InfluxDB's a time series database. And yeah, of course, Alex, if you bring that back up, you get a big group of red dots, right? That's the bad zone, I guess, which Sisense does vis, Yellowbrick Data is an MPP database. How should we interpret the red dots, Erik? I mean, is it necessarily a bad thing? Could it be misinterpreted? What's your take on that? >> Sure, well, let me just explain the definition of it first from a data science perspective, right? We're a data company first. So the gray dots that you're seeing that aren't named, that's the mean, that's the average. So in order for you to be on this chart, you have to be at least one standard deviation above or below that average. So that gray is where we're saying, "Hey, this is where the lump of average comes in. This is where everyone normally stands." So you either have to be an outperformer or an underperformer to even show up in this analysis. So by definition, yes, the red dots are bad. You're at least one standard deviation below the average of your peers. It's not where you want to be. And if you're on the lower left, not only are you not performing well from a utilization or an actual usage rate, but people don't even know who you are. So that's a problem, obviously. And the VCs and the PEs out there that are backing these companies, they're the ones who mostly are interested in this data. >> Yeah. Oh, that's a great explanation. Thank you for that. No, nice benchmarking there and yeah, you don't want to be in the red. All right, let's get into the next segment here. We're going to look at evaluation rates, adoption and the all important churn. 
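The one-standard-deviation cutoff Erik just explained, where only outperformers and underperformers are named and everything within a standard deviation of the mean stays in the unnamed gray middle, can be sketched like this (the scores are invented, and whether to use a population or sample standard deviation is an assumption here):

```python
# Flag vendors at least one standard deviation above (green) or below (red)
# the peer average; everything else is the unnamed gray middle.
import statistics

scores = {"A": 0.62, "B": 0.55, "C": 0.48, "D": 0.50, "E": 0.20, "F": 0.85}

mean = statistics.mean(scores.values())
sd = statistics.pstdev(scores.values())  # population std dev, an assumption

def flag(score):
    if score >= mean + sd:
        return "green"  # outperformer
    if score <= mean - sd:
        return "red"    # underperformer
    return "gray"       # within one standard deviation of the mean

labels = {vendor: flag(score) for vendor, score in scores.items()}
print(labels)
```

By construction most vendors land in gray, which is why only a handful of names show up above or below the pack on the chart.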
First, new evaluations. Let's bring up that slide. And Erik, take us through this. >> So essentially I just want to explain what evaluation means: people will cite that they either plan to evaluate the company or they're currently evaluating. So that means we're aware of 'em and we are choosing to do a POC of them. And then we'll see later how that turns into utilization, which is what a company wants to see, awareness, evaluation, and then actually utilizing them. That's sort of the life cycle for these emerging companies. So what we're seeing here, again, with very high evaluation rates. H2O, we mentioned. SecurityScorecard jumped up again. Chargebee, Snyk, Salt Security, Armis. A lot of security names are up here, Aqua, Netskope, which, God, has been around forever. I still can't believe it's in an Emerging Technology Survey. But so many of these names fall in data and security again, which is why we decided to pick those out, Dave. And on the lower side, Vena, Acton, those unfortunately took the dubious award of the lowest evaluations in our survey, but I prefer to focus on the positive. So SecurityScorecard, again, a real standout in this one, they're in the security assessment space, basically. They'll come in and assess for you how your security hygiene is. And it's an area of real interest right now amongst our ITDM community. >> Yeah, I mean, I think those, and then Arctic Wolf is up there too. They're doing managed services. You had mentioned Netskope. Yeah, okay. All right, let's look now at adoption. These are the companies whose offerings are being used the most and are above that standard deviation in the green. Take us through this, Erik. 
Not surprising, again, a lot of open source, the reason why, it's free. So it's really easy to grow your utilization on something that's free. But as you and I both know, as Red Hat proved, there's a lot of money to be made once the open source is adopted, right? You need the governance, you need the security, you need the support wrapped around it. So here we're seeing Kubernetes, Postgres, Apache Kafka, Jenkins, Grafana. These are all open source based names. But if we're looking at names that are non open source, we're going to see Databricks, Automation Anywhere, Rubrik all have the highest mindshare. So these are the names, not surprisingly, all names that probably should have been public by now. Everyone's expecting an IPO imminently. These are the names that have the highest mindshare. If we talk about the highest utilization rates, again, Miro and Figma pop up, and I know they're not household names, but they are just dominant in this survey. These are applications that are meant for design software and, again, they're going after an Autodesk or a CAD or Adobe type of thing. It is just dominant how high the utilization rates are here, which again is something Adobe should be paying attention to. And then you'll see a little bit lower, but also interesting, we see Collibra again, we see Hugging Face again. And these are names that are obviously in the data governance, ML, AI side. So we're seeing a ton of data, a ton of security and Rubrik was interesting in this one, too, high utilization and high mindshare. We know how pervasive they are in the enterprise already. >> Erik, Alex, keep that up for a second, if you would. So yeah, you mentioned Rubrik. Cohesity's not on there. They're sort of the big one. We're going to talk about them in a moment. Puppet is interesting to me because you remember the early days of that sort of space, you had Puppet and Chef and then you had Ansible. Red Hat bought Ansible and then Ansible really took off. 
So it's interesting to see Puppet on there as well. Okay. So now let's look at the churn because this one is where you don't want to be. It's, of course, all red 'cause churn is bad. Take us through this, Erik. >> Yeah, definitely don't want to be here and I don't love to dwell on the negative. So we won't spend as much time. But to your point, there's one thing I want to point out that think it's important. So you see Rubrik in the same spot, but Rubrik has so many citations in our survey that it actually would make sense that they're both being high utilization and churn just because they're so well represented. They have such a high overall representation in our survey. And the reason I call that out is Cohesity. Cohesity has an extremely high churn rate here about 17% and unlike Rubrik, they were not on the utilization side. So Rubrik is seeing both, Cohesity is not. It's not being utilized, but it's seeing a high churn. So that's the way you can look at this data and say, "Hm." Same thing with Puppet. You noticed that it was on the other slide. It's also on this one. So basically what it means is a lot of people are giving Puppet a shot, but it's starting to churn, which means it's not as sticky as we would like. One that was surprising on here for me was Tanium. It's kind of jumbled in there. It's hard to see in the middle, but Tanium, I was very surprised to see as high of a churn because what I do hear from our end user community is that people that use it, like it. It really kind of spreads into not only vulnerability management, but also that endpoint detection and response side. So I was surprised by that one, mostly to see Tanium in here. Mural, again, was another one of those application design softwares that's seeing a very high churn as well. >> So you're saying if you're in both... Alex, bring that back up if you would. So if you're in both like MariaDB is for example, I think, yeah, they're in both. 
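Erik's reading of churn alongside representation, where a heavily cited vendor can plausibly show both high utilization and high churn while high churn without utilization is the worrying signal, can be sketched as a simple cross-tab. The citation counts and the cutoff thresholds below are invented for illustration; only the roughly 17% churn figure for Cohesity comes from the discussion:

```python
# Toy citation counts per vendor: total survey citations, how many report
# active use, and how many report churning away. Figures are invented.
vendors = {
    #            (citations, using, churned)
    "Rubrik":   (200, 90, 30),
    "Cohesity": (120, 25, 20),
}

signals = {}
for name, (citations, using, churned) in vendors.items():
    util_rate = using / citations
    churn_rate = churned / citations
    # High churn is only the worrying signal when utilization is also low;
    # a well-represented vendor can show high absolute churn and high use.
    signals[name] = "worrying" if churn_rate > 0.15 and util_rate < 0.30 else "ok"
    print(f"{name}: utilization {util_rate:.0%}, churn {churn_rate:.0%} -> {signals[name]}")
```

The point of normalizing by total citations is exactly Erik's caveat: raw counts reward whoever has the most survey representation, so rates are what separate a Rubrik pattern from a Cohesity pattern.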
They're both green in the previous one and red here, that's not as bad. You mentioned Rubrik is going to be in both. Cohesity is a bit of a concern. Cohesity just brought on Sanjay Poonen. So this could be a go to market issue, right? I mean, 'cause Cohesity has got a great product and they got really happy customers. So they're just maybe having to figure out, okay, what's the right ideal customer profile, and Sanjay Poonen, I guarantee, is going to have that company cranking. I mean they had been doing very well on the surveys and had fallen off a bit. The other interesting thing is, in the previous survey I saw Cvent, which is an event platform. My only reason I pay attention to that is 'cause we actually have an event platform. We don't sell it separately. We bundle it as part of our offerings. And you see Hopin on here. Hopin raised a billion dollars during the pandemic. And we were like, "Wow, that's going to blow up." And so you see Hopin on the churn and you didn't see 'em in the previous chart, but that's sort of interesting. Like you said, let's not kind of dwell on the negative, but you really don't want to be there. You know, churn is a real big concern. Okay, now we're going to drill down into two sectors, security and data. Where data comprises three areas, database and data warehousing, machine learning and AI and big data analytics. So first let's take a look at the security sector. Now this is interesting because not only is it a sector drill down, but it also gives an indicator of how much money the firm has raised, which is the size of that bubble. And to tell us if a company is punching above its weight and efficiently using its venture capital. Erik, take us through this slide. Explain the dots, the size of the dots. Set this up please. 
So again, the axis is still the same, net sentiment and mindshare, but what we've done this time is we've taken publicly available information on how much capital company is raised and that'll be the size of the circle you see around the name. And then whether it's green or red is basically saying relative to the amount of money they've raised, how are they doing in our data? So when you see a Netskope, which has been around forever, raised a lot of money, that's why you're going to see them more leading towards red, 'cause it's just been around forever and kind of would expect it. Versus a name like SecurityScorecard, which is only raised a little bit of money and it's actually performing just as well, if not better than a name, like a Netskope. OneTrust doing absolutely incredible right now. BeyondTrust. We've seen the issues with Okta, right. So those are two names that play in that space that obviously are probably getting some looks about what's going on right now. Wiz, we've all heard about right? So raised a ton of money. It's doing well on net sentiment, but the mindshare isn't as well as you'd want, which is why you're going to see a little bit of that red versus a name like Aqua, which is doing container and application security. And hasn't raised as much money, but is really neck and neck with a name like Wiz. So that is why on a relative basis, you'll see that more green. As we all know, information security is never going away. But as we'll get to later in the program, Dave, I'm not sure in this current market environment, if people are as willing to do POCs and switch away from their security provider, right. There's a little bit of tepidness out there, a little trepidation. So right now we're seeing overall a slight pause, a slight cooling in overall evaluations on the security side versus historical levels a year ago. >> Now let's stay on here for a second. So a couple things I want to point out. So it's interesting. 
Now Snyk has raised over, I think, $800 million, but you can see them, they're high on the vertical and the horizontal. But now compare that to Lacework. It's hard to see, but they're kind of buried in the middle there. That's the biggest dot in this whole thing, if I'm interpreting this correctly. They've raised over a billion dollars. It's a Mike Speiser company. He was the founding investor in Snowflake. So people watch that very closely, but that's an example of where they're not punching above their weight. They recently had a layoff and they got to fine-tune things, but I'm still confident they're going to do well. 'Cause they're approaching security as a data problem, and people are probably having trouble getting their arms around that. And then again, I see Arctic Wolf. They're not red, they're not green, but they've raised a fair amount of money, and they're showing up to the right at a decent level there. And a couple of the other ones that you mentioned, Netskope. Yeah, they've raised a lot of money, but they're actually performing where you want. What you don't want is where Lacework is, right. They've got some work to do to really take advantage of the money that they raised last November and prior to that. >> Yeah, if you're seeing that more neutral color, like you're calling out with an Arctic Wolf, that means relative to their peers, this is where they should be. It's when you're seeing that red on a Lacework where we all know, wow, you raised a ton of money and your mindshare isn't where it should be. Your net sentiment is not where it should be, comparatively. And then you see these great standouts, like Salt Security and SecurityScorecard and Abnormal. You know they haven't raised that much money yet, but their net sentiment's higher and their mindshare's doing well. So basically, in a nutshell: if you're a PE or a VC and you see a small green circle, then you're doing well, it means you made a good investment.
>> Some of these guys, I don't know, but you see these small green circles. Those are the ones you want to start digging into and maybe help them catch a wave. Okay, let's get into the data discussion. And again, three areas: database slash data warehousing, big data analytics, and ML/AI. First, we're going to look at the database sector. So Alex, thank you for bringing that up. Alright, take us through this, Erik. Actually, let me just say PostgreSQL. I got to ask you about this. It shows some funding, but that actually could be a mix of EDB, the company that commercializes Postgres, and Postgres the open source database, which is a transaction system and kind of an open source Oracle. You see MariaDB, an open source database. But the company, they've raised over $200 million and they filed an S-4. So Erik, it looks like this might be a little bit of a mashup of companies and open source products. Help us understand this. >> Yeah, it's tough when you start dealing with the open source side, and I'll be honest with you, there is a little bit of a mashup here. There are certain names here that are a hundred percent for-profit companies. And then there are others that are obviously open source based; like Redis is open source, but Redis Labs is the one trying to monetize the support around it. So you're a hundred percent accurate on this slide. I think one of the things here that's important to note, though, is just how important open source is to data. If you're going to be going into any of these areas, it's going to be open source based to begin with. And Neo4j is one I want to call out here. It's not one everyone's familiar with, but it's basically a graph database, which is a name that we're seeing on the net sentiment side actually really, really high. When you think about it, it's the third overall net sentiment for a niche database play.
It's not as big on the mindshare 'cause its use cases aren't as common, but third biggest play on net sentiment, which I found really interesting on this slide. >> And again, so MariaDB, as I said, they filed an S-4, I think $50 million in revenue, and that might even be ARR. So they're not huge, but they're getting there. And by the way, MariaDB, if you don't know, was the company that was formed the day that Oracle bought Sun, in which they got MySQL, and MariaDB has done a really good job of replacing a lot of MySQL instances. Oracle has responded with MySQL HeatWave, which was kind of the Oracle version of MySQL. So there's some interesting battles going on there. If you think about the LAMP stack, the M in the LAMP stack was MySQL. And so now it's all MariaDB replacing that MySQL for a large part. And then you see again the red, you know, you got to have some concerns there. Aerospike's been around for a long time. SingleStore changed their name a couple years ago, last year. Yellowbrick Data, Firebolt was kind of going after Snowflake for a while, but yeah, you want to get out of that red zone. So they got some work to do. >> And Dave, real quick, for the people that aren't aware, I just want to let them know that we can cut this data with the public company data as well. So we can cross over this with that, because some of these names are competing with the larger public company names as well. So we can go ahead and cross-reference, like, a MariaDB with a Mongo, for instance, or something of that nature. So it's not in this slide, but at another point we can certainly explain on a relative basis how these private names are doing compared to the other ones as well. >> All right, let's take a quick look at analytics. Alex, bring that up if you would. Go ahead, Erik. >> Yeah, I mean, essentially here, I can't see it on my screen, my apologies. I just kind of went blank on that. So gimme one second to catch up. >> So I could set it up while you're doing that.
You got Grafana up and to the right. I mean, this is huge, right? >> Got it, thank you. I lost my screen there for a second. Yep. Again, open source name Grafana, absolutely up and to the right. But as we know, Grafana Labs is actually picking up a lot of speed based on Grafana, of course. And I think we might actually hear some noise from them coming this year. The names that are actually a little bit more disappointing that I want to call out are names like ThoughtSpot. It's been around forever. Their mindshare of course is second best here, but based on the amount of time they've been around and the amount of money they've raised, it's not actually outperforming the way it should be. We're seeing Moogsoft obviously make some waves. That's very high net sentiment for that company. It's, you know, what, third, fourth position overall in this entire area. Another name like Fivetran, Matillion is doing well. Fivetran, even though it's got a high net sentiment, again, it's raised so much money that we would've expected a little bit more at this point. I know you know this space extremely well, but basically what we're looking at here: in the bottom left, you're going to see some names with a lot of red, large circles that really just aren't performing that well. InfluxData, however, second highest net sentiment. And it's really pretty early on in this stage, and the feedback we're getting on this name is the use cases are great, the efficacy's great. And I think it's one to watch out for. >> InfluxData, time series database. The other interesting thing I just noticed here: you got Tamr on here, which is that little small green circle. Those are the ones we were saying before, look for those guys. They might be some of the interesting companies out there. And then Observe, Jeremy Burton's company. They do observability on top of Snowflake; not green, but kind of in that gray. So that's kind of cool. Monte Carlo is another one; they're sort of slightly green.
They are doing some really interesting things in data and data mesh. So yeah, okay. So I can spend all day on this stuff, Erik, phenomenal data. I got to get back and really dig in. Let's end with machine learning and AI. Now this chart, it's similar in its dimensions, of course, except for the money raised. We're not showing the size of the bubble, but AI is so hot, we wanted to cover that here. Erik, explain this please. Why TensorFlow is highlighted, and walk us through this chart. >> Yeah, it's funny, yet again, right? Another open source name, TensorFlow, being up there. And I just want to explain, we do break out machine learning, AI as its own sector. A lot of this of course really is intertwined with the data side, but it is its own area. And one of the things I think that's most important here to break out is Databricks. We started to cover Databricks in machine learning, AI. That company has grown into much, much more than that. So I do want to state to you, Dave, and also the audience out there, that moving forward, we're going to be moving Databricks out of only the ML/AI into other sectors, so we can kind of value them against their peers a little bit better. But in this instance, you could just see how dominant they are in this area. And one thing that's not here, but I do want to point out, is that we have the ability to break this down by industry vertical and organization size. And when I break this down into the Fortune 500 and Fortune 1000, both Databricks and TensorFlow are even better than you see here. So it's quite interesting to see that the names that are succeeding are also succeeding with the largest organizations in the world. And as we know, large organizations mean large budgets. So this is one area that I just thought was really interesting to point out: that as we break down the data by vertical, these two names still are the outstanding players. >> I also just want to call out H2O.ai.
They're getting a lot of buzz in the marketplace and I'm seeing them a lot more. Anaconda, another one. Dataiku consistently popping up. DataRobot is also interesting because of all the kerfuffle that's going on there. The Cube guy, Cube alum, Chris Lynch stepped down as executive chairman. All this stuff came out about how the executives were taking money off the table and didn't allow the employees to participate in that money-raising deal. So that's pissed a lot of people off. And so they're now going through some kind of uncomfortable things, which is unfortunate because DataRobot, I noticed, we haven't covered them that much in "Breaking Analysis", but I've noticed them oftentimes, Erik, in the surveys doing really well. So you would think that company has a lot of potential. But yeah, it's an important space that we're going to continue to watch. Let me ask you, Erik, can you contextualize this from a time series standpoint? I mean, how has this changed over time? >> Yeah, again, not shown here, but in the data. I'm sorry, go ahead. >> No, I'm sorry. What I meant, I should have interjected: in other words, you would think in a downturn that these emerging companies would be less interesting to buyers 'cause they're more risky. What have you seen? >> Yeah, and it was interesting, before we went live, you and I were having this conversation about "Is the downturn stopping people from evaluating these private companies or not," right. In a larger sense, that's really what we're doing here. How are these private companies doing when it comes down to the actual practitioners? The people with the budget, the people with the decision making. And so what I did is, we have historical data, as you know; I went back to the Emerging Technology Survey we did in November of '21, right at the crest, right before the market started to really fall and everything kind of started to fall apart there.
And what I noticed is on the security side, very much so, we're seeing less evaluations than we were in November 21. So I broke it down. On cloud security, net sentiment went from 21% to 16% from November '21. That's a pretty big drop. And again, that sentiment is our one aggregate metric for overall positivity, meaning utilization and actual evaluation of the name. Again in database, we saw it drop a little bit from 19% to 13%. However, in analytics we actually saw it stay steady. So it's pretty interesting that yes, cloud security and security in general is always going to be important. But right now we're seeing less overall net sentiment in that space. But within analytics, we're seeing steady with growing mindshare. And also to your point earlier in machine learning, AI, we're seeing steady net sentiment and mindshare has grown a whopping 25% to 30%. So despite the downturn, we're seeing more awareness of these companies in analytics and machine learning and a steady, actual utilization of them. I can't say the same in security and database. They're actually shrinking a little bit since the end of last year. >> You know it's interesting, we were on a round table, Erik does these round tables with CISOs and CIOs, and I remember one time you had asked the question, "How do you think about some of these emerging tech companies?" And one of the executives said, "I always include somebody in the bottom left of the Gartner Magic Quadrant in my RFPs. I think he said, "That's how I found," I don't know, it was Zscaler or something like that years before anybody ever knew of them "Because they're going to help me get to the next level." So it's interesting to see Erik in these sectors, how they're holding up in many cases. >> Yeah. It's a very important part for the actual IT practitioners themselves. There's always contracts coming up and you always have to worry about your next round of negotiations. And that's one of the roles these guys play. 
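The period-over-period comparison Erik is describing is just percentage-point arithmetic between the November '21 survey and the current one. A tiny sketch using the figures quoted in the conversation (the helper function itself is illustrative, not an ETR tool):

```python
# Period-over-period net sentiment change, using the numbers quoted above:
# cloud security 21% -> 16%, database 19% -> 13% since the November '21 survey.

def sentiment_delta(prior, current):
    """Return (percentage-point change, relative change) for net sentiment."""
    pp_change = current - prior
    rel_change = pp_change / prior
    return pp_change, rel_change

sectors = {
    "cloud security": (21, 16),
    "database":       (19, 13),
}

for name, (prior, current) in sectors.items():
    pp, rel = sentiment_delta(prior, current)
    print(f"{name}: {pp:+d} points ({rel:+.0%})")
```

The point of separating the two figures is that a "pretty big drop" reads differently in percentage points (five or six) than in relative terms, where both sectors shed roughly a quarter to a third of their prior sentiment.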
You have to do a POC when contracts come up, but it's also their job to stay on top of the new technology. You can't fall behind. Like everyone's a software company. Now everyone's a tech company, no matter what you're doing. So these guys have to stay in on top of it. And that's what this ETS can do. You can go in here and look and say, "All right, I'm going to evaluate their technology," and it could be twofold. It might be that you're ready to upgrade your technology and they're actually pushing the envelope or it simply might be I'm using them as a negotiation ploy. So when I go back to the big guy who I have full intentions of writing that contract to, at least I have some negotiation leverage. >> Erik, we got to leave it there. I could spend all day. I'm going to definitely dig into this on my own time. Thank you for introducing this, really appreciate your time today. >> I always enjoy it, Dave and I hope everyone out there has a great holiday weekend. Enjoy the rest of the summer. And, you know, I love to talk data. So anytime you want, just point the camera on me and I'll start talking data. >> You got it. I also want to thank the team at ETR, not only Erik, but Darren Bramen who's a data scientist, really helped prepare this data, the entire team over at ETR. I cannot tell you how much additional data there is. We are just scratching the surface in this "Breaking Analysis". So great job guys. I want to thank Alex Myerson. Who's on production and he manages the podcast. Ken Shifman as well, who's just coming back from VMware Explore. Kristen Martin and Cheryl Knight help get the word out on social media and in our newsletters. And Rob Hof is our editor in chief over at SiliconANGLE. Does some great editing for us. Thank you. All of you guys. Remember these episodes, they're all available as podcast, wherever you listen. All you got to do is just search "Breaking Analysis" podcast. I publish each week on wikibon.com and siliconangle.com. 
Or you can email me to get in touch david.vellante@siliconangle.com. You can DM me at dvellante or comment on my LinkedIn posts and please do check out etr.ai for the best survey data in the enterprise tech business. This is Dave Vellante for Erik Bradley and The Cube Insights powered by ETR. Thanks for watching. Be well. And we'll see you next time on "Breaking Analysis". (upbeat music)

Published Date : Sep 7 2022


Jason Bloomberg, Intellyx | VMware Explore 2022


 

>> Welcome back, everyone, to theCUBE's coverage of VMware Explore 2022, formerly VMworld, theCUBE's 12th year covering the annual conference. I'm John Furrier with Dave Vellante. We got Jason Bloomberg here, who's a SiliconANGLE contributing guest author and president of analyst firm Intellyx. Great to see you, Jason. Thanks for coming on theCUBE. >> Yeah, it's great to be here. Thanks a lot. >> And thanks for contributing to SiliconANGLE. We really appreciate your articles and, and so does the audience, so thanks for that. >> Very good. We're happy >> To help. All right. So I gotta ask you, okay. We've been here on the desk. We haven't had a chance to really scour the landscape here at Moscone. What's going, what's your take on what's going on with VMware Explore, not World? Yeah. You gotta see the name change. You got the overhang of the, the cloud, Broadcom, which for us, it seems like it's energized people, like a shock to the system; something's gonna happen. What's your take? >> Yeah, something's definitely going to happen. Well, I've been struggling with VMware's messaging, you know, how they're messaging to the market. They seem to be downplaying cloud native computing in favor of multi-cloud, which is really quite different from the Tanzu-centric messaging from a year or two ago. So Tanzu is still obviously part of the story, but it's really, they're relegating the cloud native story to an architectural pattern, which it is, but I believe it's much more than that. It's really more of a paradigm shift in how organizations implement IT. Broadly speaking, where virtualization is part of the cloud native story, but VMware is making cloud native part of the virtualization story. >> So do you think that's the, the mischaracterization of cloud native, or a bad strategy, or both? >> Well, I think they're missing an opportunity, right? I think they're missing an opportunity to be a cloud native leader.
They're well positioned to do that with Tanzu and where the technology was going, and the technology is still there. Right? It's not that... >> They're just downplaying it. >> They're just downplaying it. Right. So... >> As, as they did with security too, they didn't really pump up security at >> All. Yeah. And you know, vSphere is still gonna be based on Kubernetes. So it's, they're going to be cloud native in terms of Kubernetes support across their product line. Anyway. So, but they're, they're really focusing on multi-cloud and betting the farm on multi-cloud, and that ties to the change of the name of the conference. Although it's hard to see really how they're connecting the dots. Right. >> It's a bridge you can't cross; you can't see that bridge crossing, is what you're saying. Yeah. I mean, I thought that was a clever way of saying, oh, we're exploring new frontiers, which is kinda like, we don't really know what it is >> Yet. Yeah. Yeah. I think the, the term Explore was probably concocted by a committee where, you know, they eliminated all the more interesting names and that was the one that was left. But, you know, Raghu explained that Explore is supposed to expand the audience for the conference beyond the VMware customer to this broader multi-cloud audience. But it's hard to say whether you >> Think it worked. Were there people that you recognized here or identified as a new audience? >> I don't think so. Not, not at this show, but over time, they're hoping to have this broader audience now where it's a multi-cloud audience, where it's more than just VMware. It's more than just individual clouds, you know, we'll see if that works. >> You heard the cloud chaos. Right. Do you, do you think their multi-cloud cross-cloud services is a solution looking for a problem, or is the problem real? Is there a market there? >> Oh, oh, the cloud chaos. That's a real problem. Right? Multi-cloud is, is a reality.
Many organizations are leveraging different clouds for different reasons. And as a result, you have management, security, other issues, which lead to this chaos challenge. So the, the problem is real. Aria, if they can get it up and running and, you know, straightened out, it's gonna be a great solution, but there are other products on the market that are more mature and more well integrated than Aria. So they're going to, you know, have to compete, but VMware is very good at that. So, you know, I don't, I don't count them out. >> Who do you see as the competition? Lay out the horses on the track from your perspective. >> Well, you know, there's, there's a lot of different companies. I, I don't wanna mention any particular ones cuz, cuz I don't want to, you know, favor certain ones over others, cuz then I get into trouble. But there's a, a lot of companies that... >> Okay, I will. So you got Red Hat, you got obvious ones, Cisco. Cisco, I guess HashiCorp plays a role? Well... >> Cisco's been talking about this. >> Anybody we missed? >> Well, there's a number of smaller players, including some of the exhibitors at the, at the show, that are putting together this, you know, I guess cloud native control plane that covers more than just a single cloud, or covers on-premises virtualization as well as multiple clouds. And that's sort of the big challenge, right? This control plane. How do we come up with a way of managing all of this heterogeneous IT in a unified way that meets the business need and allows the technology organization, both IT and the application development folks, to move quickly and to do what they need to do to meet business needs? Right? So it's difficult for large organizations to get out of their own way and achieve that, you know, level of speed and scalability that, that, that technology promises. But they're organizationally challenged to >> Accomplish it. I think I've always looked at multi-cloud as a reality.
I do see that as a situational analysis on the landscape. Yeah, I got Azure because I got Microsoft in my enterprise and they converted everything to the cloud, and so I didn't really change that. I got Amazon 'cause that's where almost all my action is, and I gotta use Google Cloud for some AI stuff. Right. All good. Right. I mean, that's not really spanning anything. There's no ring. It's not really, it's like point solutions within the ecosystem, but it's interesting to see how people are globbing onto multi-cloud, because to me it feels like a broken strategy trying to get straightened out. Right. Like, you know, groping toward multi-cloud, it feels that way. And, and that makes a lot of sense, cuz if you're not on the right side of this historic shift right now, you're gonna be dead. >> So which side of the street do you wanna be on? I think it's becoming clear. I think the good news is this year, it's like, if you're on this side of the street, you're gonna be, be alive. Yeah. And this side of the street, not so much. So, you know, that's cloud native, obviously, hybrid steady state, and how multi-cloud shakes out. I don't think the market's ready, personally, in terms of true multi-cloud. I think it's, it's an opportunity to have the conversation. That's why we're having the supercloud narrative. Cause it's a little more attention-getting, but it focuses on, it has to do something specific. Right? It can't be vaporware. The market won't tolerate vaporware in the new cloud architecture, at least that's my opinion. What's your reaction? Yeah. >> Well the, well, you're quite right that a lot of the multiple cloud scenarios involve, you know, picking and choosing the various capabilities each of the cloud providers offers. Right? So you want TensorFlow, you have a little bit of Google, and you want Amazon for something, but then Amazon's too expensive for something else, so you go with Azure for that, or you have Microsoft 365 as well as Amazon. Right? So you're, that's sort of a multi-cloud right there. But I think the more strategic question is organizations who are combining clouds for more architectural reasons. So for example, you know, backup or failover or data sovereignty issues, right, where you, you can go into a single cloud and say, well, I want, you know, different data in different regions, but a, a particular cloud might not have all the answers for you. So you may say, okay, well, I want one of the big clouds, or there's specialty cloud providers that focus on data sovereignty solutions for particular markets, and, and that might be part of the mix, right? It isn't necessarily all the big clouds. >> I think that's an interesting observation. Cause when you look at, you know, hybrid, right, when you really dig into it, a lot of the hybrid was DR. Right? Yeah. Well, we got, we're gonna use the cloud for backup. And that, and that, what you're saying is multi-cloud could be sort of a similar dynamic. >> The low-hanging fruit. >> Which is fine, which is not that interesting. >> It's the low-hanging fruit, though. It's the easy, is it risk free? I won't say risk free, but it's the easiest way not to get killed. >> But does that translate into just sort of more interesting and lucrative and monetizable opportunities? You know, it's kind of a big leap to go from DR to actually building new applications that cross clouds and delivering new monetization value on top of data, and, you know, that's the nerve. >> Yeah. Whether that would be the best way to build such applications, the jury's still out. Why would you actually want to? >> Well, I was gonna ask you, is there an advantage? We talked to Marianna Tessel, who's, you know, she's CTO of Intuit now. Of course, Intuit's a, you know, different kind of application, but she's like, yeah, we kinda looked hard at that multiple cloud thing. We found it too complex. And so we just picked one cloud, you know, for kind of the same thing. So, you know, is there an advantage? Now, the one advantage, John, you pointed this out, is if I run on Microsoft, I'll make more money. If I run on Amazon, you know, they'll, they'll help me sell. So, so that's a business justification, but is there a technical reason to do it? You know, global presence, there >> Could be technical reasons not to do it either, too. So >> There's more because of complexity. >> You mean? Well, and/or technical debt; on some services it might not be there at this point. I mean, the puzzle pieces gotta be there, assuming that all clouds have the pieces. Right. Then it's a matter of composability. I think AJ, who came on, AJ Patel, who runs modern applications development, would agree with your assessment of cloud native being probably the driving front car on this messaging, because that's the issue: like, once you have everything there, then you're composing, it's the orchestra model, Dave. It's like, okay, we got everything here. How do I stitch it together? Not so much coding, writing code, cuz you got everything in building blocks and patterns and, and recipes. >> Yeah. And that's really what VMware has in mind when they talk about multi-cloud, right? From VMware's perspective, you can put their virtual machine technology in any cloud. So if you, if you do that and you put it in multiple clouds, then you have, you know, this common, familiar environment, right? It's VMware everywhere. Doesn't really matter which cloud it's in, because you get all the goodness that VMware has and you have the expertise on staff. And so now you have, you know, the workload portability across clouds, which can give you added benefits. But one of the straw men of this argument is that price arbitrage, right? I'm gonna, you know, put workloads in Amazon if it's cheaper. But then if, you know, Azure has a different pricing structure for something I'm doing, then maybe I'll, I'll move a workload over there to get better pricing. That's difficult to implement in practice. Right. So while people like to talk about that, yeah, I'm gonna optimize my cost by moving workloads across clouds, the practicalities at this point make it difficult. Yeah. But with, if you have VMware in any of your clouds, it may be more straightforward, but you still might not do it in order to save money on a particular cloud bill. >> It's still, people don't want data. They really, really don't want to move >> Data. This audience does not want to do it. I mean, if you look at the evolution, this customer base, even their, their affinity towards cloud native, that's years in the making, just to put it in perspective. Yeah. So I like how VMware's reality is on crawl, walk, run; their clients, no matter what they want 'em to do, you can't make 'em run when they're still in diapers, right, or still in the crib. Right. So you gotta get the customers in a mode of saying, I can see how VMware could operate that, I know how to run it in an environment, because the people who come through this show, they're like teams; it's like an offsite meeting meets a conference, and it's institutionalized for 15-plus years of mainstream enterprise workload management. So I like, that's just not going away. So okay. Given that, how do you connect to the next thing? >> Well, I think the, the missing piece of the puzzle is, is the edge, right? Because it's not just about connecting one hyperscaler to another hyperscaler, or even to on-premises or a private cloud; it's also the edge, the edge computing and the edge computing data center requirements. Right. Because you have, you could have an edge data center in a, a phone tower or a point of presence, a telco point of presence, which are those nondescript buildings every town has. Right? Yeah, yeah. Yeah. And you know, we have that >> Little colo that no one knows about. >> Right, exactly. That, you know, used to be your DSL endpoint.
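On the price-arbitrage point raised above, the reason it's "difficult to implement in practice" is partly just back-of-the-envelope economics: the compute savings have to beat the one-time migration and data-egress costs before a move pays off. The little model below is a hypothetical sketch; every price in it is made up for illustration.

```python
# Toy model of cross-cloud price arbitrage: move a workload only when the
# projected compute savings exceed the one-time costs of moving it.
# All dollar figures are invented for illustration.

def worth_moving(monthly_cost_here, monthly_cost_there,
                 egress_cost, migration_cost, horizon_months=12):
    """Return True when moving pays for itself within the horizon."""
    savings = (monthly_cost_here - monthly_cost_there) * horizon_months
    return savings > (egress_cost + migration_cost)

# A modest 10% discount rarely beats a big one-time egress bill...
print(worth_moving(10_000, 9_000, egress_cost=20_000, migration_cost=15_000))  # False
# ...while a 40% discount over the same horizon can.
print(worth_moving(10_000, 6_000, egress_cost=20_000, migration_cost=15_000))  # True
```

This is also why the point about data follows naturally: the egress term usually dominates, so workloads that carry a lot of data with them almost never clear the bar.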
And now it's just a mini data center for the cloud, or it could be, you know, the factory computer room, or the computer room in a retailer. You know, every retailer has that computer room, the modern retailers, Target, Home Depot. They will have thousands of these little mini cloud data centers handling their point of sale systems, their, you know, local wifi and all these other local systems. That's where the interesting part of this cloud story is going, because that is inherently heterogeneous, inherently mixed, in terms of the hardware requirements, the software requirements, and how you're going to build applications to support that, including AI-based applications, which are one of the areas of major innovation today: how are we going to do AI on the edge, and why would we do it? And there's huge, huge opportunity to >> Well, real-time inferencing at the edge. Exactly. Absolutely. With all the data. My question is, is the cloud gonna be part of that? Or is the edge gonna actually bring new architectures and new economics that completely disrupt the economics that we've known in the cloud and in the data center? >> Well, this is where hardware matters. If form factor matters, you can put a data center the size of four, you know, 4U boxes and then you're done >> Nice. I, >> I think it's a semantic question. It's something for the marketers to come up with the right jargon for. Yeah, is the edge part of the cloud, is the cloud part of the edge? Are we gonna come up with a new term, super cloud, HyperCloud? >> Yeah. >> Wonder Woman cloud, who knows? Yeah. But what, what >> Covers everything. But what might not be semantic is, I come back to the silicon inside the, you know, Apple Macs, the M1, M2, M2 Ultras, what Tesla's doing with NPUs, what you're seeing, you know, in ARM-based innovations. It could completely change the economics of computing, the security model.
>>As we say, with the AJ >>Power consumption, >>Cloud's the hardware middleware. And then you got the application is the business everything's completely technology. The business is the app. I >>Mean we're 15 years into the cloud. You know, it's like every 15 years something gets blown up. >>We have two minutes left Jason. So I want to get into what you're working on for when your firm, you had a great, great traction, great practice over there. But before that, what's the, what's your scorecard on the event? How would you, what, what would be your constructive analysis? Positive, good, bad, ugly for VMwares team around this event. What'd they get right? What'd they need to work on >>Well as a smaller event, right? So about one third, the size of previous worlds. I mean, it's, it's, it's been a reasonably well run event for a smaller event. I, you know, in terms of the logistics and everything everything's handled well, I think their market messaging, they need to sort of revisit, but in terms of the ecosystem, you know, I think the ecosystem is, is, is, is doing well. You know, met with a number of the exhibitors over the last few days. And I think there's a lot of, a lot of positive things going on there. >>They see a wave coming and that's cloud native in your mind. >>Well, some of them are talking about cloud native. Some of them aren't, it's a variety of different >>Potentially you're talking where they are in this dag are on the hardware. Okay, cool. What's going on with your research? Tell us what you're focused on right now. What are you digging into? What's going on? Well, >>Cloud native, obviously a big part of what we do, but cybersecurity as well, mainframe modernization, believe it or not. It's a hot topic. DevOps continues to be a hot topic. So a variety of different things. And I'll be writing an article for Silicon angle on this conference. So highlights from the show. Great. 
Focusing on not just the VMware story, but some of the hot spots among the exhibitors. >> And what's your take on the whole crypto DeFi world that's emerging? >> It's all a scam. A hundred >> Percent. All right. We're now back to enterprise. >> Wait a minute. Hold on. >> We're out of time. >> Gotta go. >> We'll make that a virtual, there are >> A lot of scams. >> I'll admit that, you gotta, it's a lot of cool stuff. You gotta get through the underbelly that grows the old bolt. >> You heard Kit earlier. He's like, yeah, well, forget about crypto, let's talk blockchain. But I'm like, no, let's talk crypto. >> Yeah. All good stuff, Jason. Thanks for coming on theCUBE. Thanks for spending time. I know you've been busy in meetings, and thanks for coming back. Yeah. Happy to help. All right. We're wrapping up day two. I'm John Furrier with Dave Vellante. theCUBE coverage, two sets, three days of live coverage, 12th year covering VMware's user conference, now called Explore, formerly VMworld, onto the next level. That's what it's all about. Just theCUBE signing off for day two. Thanks for watching.

Published Date : Sep 1 2022


Luis Ceze, OctoML | Amazon re:MARS 2022


 

(upbeat music) >> Welcome back, everyone, to theCUBE's coverage here live on the floor at AWS re:MARS 2022. I'm John Furrier, host for theCUBE. Great event: machine learning, automation, robotics, space, that's MARS. It's part of the re-series of events; re:Invent's the big event at the end of the year, re:Inforce, security, re:MARS, really the intersection of the future of space, industrial automation, which is very heavily DevOps, machine learning, of course, machine learning, which is AI. We have Luis Ceze here, who's the CEO and co-founder of OctoML. Welcome to theCUBE. >> Thank you very much for having me on the show, John. >> So we've been following you guys. You guys are a growing startup funded by Madrona Venture Capital, one of your backers. You guys are here at the show. This is, I would say, a small show relative to what it's going to be, but a lot of robotics, a lot of space, a lot of industrial kind of edge, but machine learning is the centerpiece of this trend. You guys are in the middle of it. Tell us your story.
And we grew from Apache TVM into a whole platform that essentially supports any model on any hardware, cloud and edge. >> So is the thesis that, when it first started, you want to be agnostic on platform? >> Agnostic on hardware, that's right. >> Hardware, hardware. >> Yeah. >> What was it like back then? What kind of hardware were you talking about back then? Cause a lot's changed, certainly on the silicon side. >> Luis: Absolutely, yeah. >> So take me through the journey, 'cause I could see the progression. I'm connecting the dots here. >> So once upon a time, yeah, no... (both chuckling) >> I walked in the snow with my bare feet. >> You have to be careful, because if you wake up the professor in me, then you're going to be here for two hours, you know. >> Fast forward. >> The abridged version here is that, clearly, machine learning has shown to actually solve real, interesting, high value problems. And where machine learning runs in the end, it becomes code that runs on different hardware, right? And when we started Apache TVM, which stands for tensor virtual machine, at that time it was just beginning to start using GPUs for machine learning. We already saw that, with a bunch of machine learning models popping up and CPUs and GPUs starting to be used for machine learning, it was clear that there was an opportunity to run everywhere. >> And GPUs were coming fast. >> GPUs were coming, and a huge diversity of CPUs, GPUs and accelerators now, and the ecosystem and the system software that maps models to hardware is still very fragmented today. So hardware vendors have their own specific stacks. So Nvidia has its own software stack, and so does Intel, AMD. And honestly, I mean, I hope I'm not being, you know, too controversial here to say that it kind of looks like the mainframe era. We had tight coupling between hardware and software. You know, if you bought IBM hardware, you had to buy IBM OS and IBM database, IBM applications, it all tightly coupled.
And if you want to use IBM software, you had to buy IBM hardware. So that's kind of like what machine learning systems look like today. If you buy a certain big name GPU, you've got to use their software. Even if you use their software, which is pretty good, you have to buy their GPUs, right? So, but you know, we wanted to help peel away the model and the software infrastructure from the hardware to give people choice, the ability to run the models where it best suits them. Right? So that includes picking the best instance in the cloud that's going to give you the right, you know, cost properties, performance properties, or you might want to run it on the edge. You might run it on an accelerator. >> What year was that roughly, when you were doing this? >> We started that project in 2015, 2016. >> Yeah. So that was pre-conventional wisdom. I think TensorFlow wasn't even around yet. >> Luis: No, it wasn't. >> It was, I'm thinking, like 2017 or so. >> Luis: Right. So that was the beginning of, okay, this is an opportunity. AWS, I don't think they had released some of the Nitro stuff that Hamilton was working on. So, they were already kind of going that way. It's kind of like converging. >> Luis: Yeah. >> The space was happening, exploding. >> Right. And the way that was dealt with, and to this day, you know, to a large extent as well, is by backing machine learning models with a bunch of hardware-specific libraries. And we were some of the first ones to say, like, you know what, let's take a compilation approach: take a model and compile it to very efficient code for that specific hardware. And what underpins all of that is using machine learning for machine learning code optimization. Right? But it was way back when. We can talk about where we are today. >> No, let's fast forward. >> That's the beginning of the open source project. >> But that was a fundamental belief, worldview there.
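The compilation approach Luis describes, "using machine learning for machine learning code optimization," is roughly the idea behind Apache TVM's auto-tuning: generate several implementation variants of the same operator, benchmark them on the target hardware, and keep the fastest. A toy, pure-Python sketch of that search loop follows; the candidate kernels and names here are illustrative only, not TVM's actual API:

```python
import time

# Two "schedules" (implementation variants) of the same matrix multiply.
def matmul_naive(a, b):
    n, m, p = len(a), len(b[0]), len(b)
    out = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            s = 0.0
            for k in range(p):
                s += a[i][k] * b[k][j]
            out[i][j] = s
    return out

def matmul_reordered(a, b):
    # Same math, different loop order: better locality in row-major layouts.
    n, m, p = len(a), len(b[0]), len(b)
    out = [[0.0] * m for _ in range(n)]
    for i in range(n):
        oi = out[i]
        for k in range(p):
            aik = a[i][k]
            bk = b[k]
            for j in range(m):
                oi[j] += aik * bk[j]
    return out

def pick_fastest(candidates, a, b, repeats=3):
    """Benchmark each candidate on this machine and return the winner's name."""
    timings = {}
    for name, fn in candidates.items():
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            fn(a, b)
            best = min(best, time.perf_counter() - start)
        timings[name] = best
    return min(timings, key=timings.get)

n = 32
a = [[float(i + j) for j in range(n)] for i in range(n)]
b = [[float(i * j % 7) for j in range(n)] for i in range(n)]
print(pick_fastest({"naive": matmul_naive, "reordered": matmul_reordered}, a, b))
```

Real auto-tuners search thousands of variants per operator and use a learned cost model to avoid benchmarking every one, but the select-by-measurement loop is the core idea.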
I mean, you have a real world view that was logical when you compare it to the mainframe, but not obvious to the machine learning community. Okay, good call, check. Now let's fast forward, okay. Evolution, we'll go through the speed of the years. More chips are coming, you got GPUs, and seeing what's going on in AWS. Wow! Now it's booming. Now I got unlimited processors, I got silicon on chips, I got, everywhere >> Yeah. And what's interesting is that the ecosystem got even more complex, in fact. Because now there's a cross product between machine learning models, frameworks like TensorFlow, PyTorch, Keras, and so on, and then hardware targets. So how do you navigate that? What we want here, our vision, is to say, folks should focus on making the machine learning models do what they want to do, that solves a problem of high value to them. Right? So model deployment should be completely automatic. Today, it's very, very manual to a large extent. So once you're serious about deploying a machine learning model, you've got to have a good understanding of where you're going to deploy it, how you're going to deploy it, and then, you know, pick out the right libraries and compilers, and we automated the whole thing in our platform. This is why you see the tagline, the booth is right there, like bringing DevOps agility for machine learning, because our mission is to make that fully transparent. >> Well, I think that, first of all, I use that line here, cause I'm looking at it here live on camera. People can't see, but it's like, I use it on a couple of my interviews because the word agility is very interesting, because that's kind of the test on any kind of approach these days. Agility could be, and I talked to the robotics guys, just having their product be more agile.
I talked to Pepsi here just before you came on. They had this large scale data environment because they built an architecture, but that fostered agility. So again, this is an architectural concept. It's a systems view of agility being the output, and removing dependencies, which I think is what you guys were trying to do. >> Only part of what we do. Right? So agility means a bunch of things. First, you know-- >> Yeah, explain. >> Today it takes a couple months to get a model from, when the model's ready, to production. Why not turn that into two hours? Agile, literally, physically agile, in terms of wall clock time. Right? And then the other thing is giving you flexibility to choose where your model should run. So, in our deployment, between the demo and the platform expansion that we announced yesterday, you know, we give the ability of getting your model, you know, compiled, optimized for any instance in the cloud, and automatically moving it around. Today, that's not the case. You have to pick one instance and that's what you do. And then you might auto scale with that one instance. So we give the agility of actually running and scaling the model the way you want, in a way that gives you the right SLAs.
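The flexibility Luis describes, compiling a model for any instance and then choosing one, comes down to a simple selection once benchmark numbers exist: meet the latency SLA first, then optimize for cost. A minimal sketch, with invented instance names and numbers for illustration (later in the conversation he mentions a 50 millisecond interactivity budget as an example SLA):

```python
def pick_instance(benchmarks, max_latency_ms=50.0):
    """Return the cheapest instance whose measured latency meets the SLA, else None."""
    eligible = [b for b in benchmarks if b["latency_ms"] <= max_latency_ms]
    return min(eligible, key=lambda b: b["cost_per_hr"]) if eligible else None

# Hypothetical benchmark results, as if gathered by running the compiled
# model on each candidate instance type.
results = [
    {"name": "gpu.large",  "latency_ms": 12.0, "cost_per_hr": 3.06},
    {"name": "cpu.xlarge", "latency_ms": 48.0, "cost_per_hr": 0.68},
    {"name": "cpu.medium", "latency_ms": 95.0, "cost_per_hr": 0.17},
]
print(pick_instance(results)["name"])  # cpu.xlarge
```

Here the GPU instance is fast but expensive, the medium CPU is cheap but misses the 50 ms budget, so the large CPU instance wins; tighten the SLA and the answer changes.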
I want to ask this specific question because you made me think of something. So we were just having a data mesh conversation. And one of the comments that's come out of a few of these data-as-code conversations is data's the product now. So if you can move data to the edge, which everyone's talking about, you know, why move data if you don't have to? But I can move a machine learning algorithm to the edge. Cause it's costly to move data. I can move compute, everyone knows that. But now I can move machine learning anywhere else and not worry about integrating on the fly. So the model is the code. >> It is the product. >> Yeah. And since you said the model is the code, okay, now we're talking even more here. So machine learning models today are not treated as code, by the way. So they do not have any of the typical properties of code. Whenever you write a piece of code, you run the code, you don't even think about what CPU it is, where it runs, what kind of instance it runs on. But with a machine learning model, you do. So what we did is create this fully transparent, automated way of allowing you to treat your machine learning models as if they were a regular function that you call, and then that function could run anywhere. >> Yeah. >> Right. >> That's why-- >> That's better. >> Bringing DevOps agility-- >> That's better. >> Yeah. And you can use existing-- >> That's better, because I can run it on the Artemis too, in space. >> You could, yeah. >> If they have the hardware. (both laugh) >> And that allows you to continue to use your existing DevOps infrastructure and your existing people. >> So I have to ask you, cause since you're a professor, this is like a masterclass on theCUBE. Thank you for coming on. Professor. (Luis laughing) I'm a hardware guy. I'm building hardware for Boston Dynamics, Spot, the dog. That's the diversity in hardware; it tends to be purpose driven.
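The "treat your model as a regular function" idea can be sketched in a few lines. Everything below is hypothetical and for illustration only, not OctoML's actual API: a registry hides which hardware target a "compiled" model was built for, so the call site reads like ordinary code with no device-specific branches.

```python
# Hypothetical sketch: a model "compiled" once per hardware target, then
# invoked like any plain function, so callers never think about devices.
def compile_for(target):
    # Stand-in for a real compiler backend; the "model" just doubles inputs.
    def compiled_model(xs):
        return [2 * x for x in xs]
    compiled_model.target = target  # record where this build is meant to run
    return compiled_model

REGISTRY = {t: compile_for(t) for t in ("cpu", "gpu", "edge-arm")}

def predict(xs, target="cpu"):
    # The call site reads like ordinary code: same function, any target.
    return REGISTRY[target](xs)

print(predict([1, 2, 3], target="edge-arm"))  # [2, 4, 6]
```

The point of the sketch is the shape of the interface: `predict` is just a function call, and swapping `target` swaps the hardware build without touching the caller.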
I got a spaceship, I'm going to have hardware on there. >> Luis: Right. >> It's generally viewed in the community here, by everyone I talk to, and other communities, that open source is going to drive all software. That's a check. But the scale and integration is super important. And they're also recognizing that hardware is really about the software. And they even said on stage here: hardware is not about the hardware, it's about the software. So if you believe that to be true, then your model checks all the boxes. Are people getting this? >> I think they're starting to. Here is why, right. A lot of companies that were hardware first, that thought about software too late, aren't making it. Right? There's a large number of hardware companies, AI chip companies, that aren't making it. Probably some of them won't make it, unfortunately, just because they started thinking about software too late. I'm so glad to see a lot of the early, I hope I'm not just tooting our own horn here, but Apache TVM, the infrastructure that we built to map models to different hardware, is very flexible. So we see a lot of emerging chip companies, like SiMa.ai's been doing fantastic work, and they use Apache TVM to map algorithms to their hardware. And there's a bunch of others that are also using Apache TVM. That's because you have, you know, an open infrastructure that keeps it up to date with all the machine learning frameworks and models and allows you to extend to the chips that you want. So these companies paying attention that early gives them a much higher fighting chance, I'd say. >> Well, first of all, not only are you backable by the VCs cause you have pedigree, you're a professor, you're smart, and you get good recruiting-- >> Luis: I don't know about the smart part. >> And you get good recruiting for PhDs out of University of Washington, which is not too shabby a computer science department. But they want to make money. The VCs want to make money. >> Right.
>> So you have to make money. So what's the pitch? What's the business model? >> Yeah. Absolutely. >> Share with us what you're thinking there. >> Yeah. The value of using our solution is shorter time to value for your model, from months to hours. Second, you shrink OpEx, because you don't need a specialized, expensive team. Talk about expensive: expensive engineers who can understand machine learning hardware and software engineering to deploy models. You don't need those teams if you use this automated solution, right? Then you reduce that. And also, in the process of actually getting a model specialized to the hardware, making it hardware-aware, we're talking about a very significant performance improvement that leads to lower cost of deployment in the cloud. We're talking about a very significant reduction in costs in cloud deployment. And also enabling new applications on the edge that weren't possible before. It creates, you know, latent value opportunities. Right? So, that's the high level value pitch. But how do we make money? Well, we charge for access to the platform. Right? >> Usage. Consumption. >> Yeah, and value based. Yeah, so it's consumption and value based. So it depends on the scale of the deployment. If you're going to deploy a machine learning model at larger scale, chances are that it produces a lot of value. So then we'll capture some of that value in our pricing scale. >> So, you have a direct sales force then to work those deals. >> Exactly. >> Got it. How many customers do you have? Just curious. >> So we started, the SaaS platform just launched now. So we started onboarding customers. We've been building this for a while. We have a bunch of, you know, partners that we can talk about openly, like, you know, revenue generating partners, that's fair to say. We work closely with Qualcomm to enable Snapdragon on TVM and hence our platform. We're close with AMD as well, enabling AMD hardware on the platform.
We've been working closely with two hyperscaler cloud providers that-- >> I wonder who they are. >> I don't know who they are, right. >> Both start with the letter A. >> And they're both here, right. What is that? >> They both start with the letter A. >> Oh, that's right. >> I won't give it away. (laughing) >> Don't give it away. >> One has three, one has four. (both laugh) >> I'm guessing, by the way. >> Then we have customers in the, actually, early customers have been using the platform from the beginning in the consumer electronics space, in Japan, you know, self driving car technology, as well. As well as some AI first companies that actually, whose core value, the core business come from AI models. >> So, serious, serious customers. They got deep tech chops. They're integrating, they see this as a strategic part of their architecture. >> That's what I call AI native, exactly. But now there's, we have several enterprise customers in line now, we've been talking to. Of course, because now we launched the platform, now we started onboarding and exploring how we're going to serve it to these customers. But it's pretty clear that our technology can solve a lot of other pain points right now. And we're going to work with them as early customers to go and refine them. >> So, do you sell to the little guys, like us? Will we be customers if we wanted to be? >> You could, absolutely, yeah. >> What we have to do, have machine learning folks on staff? >> So, here's what you're going to have to do. Since you can see the booth, others can't. No, but they can certainly, you can try our demo. >> OctoML. >> And you should look at the transparent AI app that's compiled and optimized with our flow, and deployed and built with our flow. That allows you to get your image and do style transfer. You know, you can get you and a pineapple and see how you look like with a pineapple texture. >> We got a lot of transcript and video data. >> Right. Yeah. Right, exactly. 
So, you can use that. Then there's a very clear-- >> But I could use it. You're not blocking me from using it. It's pretty much democratized. >> You can try the demo, and then you can request access to the platform. >> But you've got a lot of more serious, deeper customers. But you can serve anybody, is what you're saying. >> Luis: We can serve anybody, yeah. >> All right, so what's the vision going forward? Let me ask this. When did people start getting the epiphany of removing the machine learning from the hardware? Was it recently, a couple years ago? >> Well, on the research side, we helped start that trend a while ago. I don't need to repeat that. But I think the vision that's important here, that I want the audience to take away, is that there's a lot of progress being made in creating machine learning models. So, there's fantastic tools to deal with training data, and creating the models, and so on. And now there's a bunch of models that can solve real problems there. The question is, how do you very easily integrate that into your intelligent applications? Madrona Venture Group has been very vocal and investing heavily in intelligent applications, both end-user applications as well as enablers. So we're an enabler of that, because it's so easy to use our flow to get a model integrated into your application. Now, any regular software developer can integrate that. And that's just the beginning, right? Because, you know, now we have CI/CD integration to keep your models up to date, to continue to integrate, and then there's more downstream support for other features that you normally have in regular software development. >> I've been thinking about this for a long, long time. And I think this whole code, no one thinks about code. Like, I write code, I'm deploying it. I think this idea of machine learning as code, independent of other dependencies, is really amazing. It's so obvious now that you say it. What are the choices now?
Let's just say that, I buy it, I love it, I'm using it. Now what do I got to do if I want to deploy it? Do I have to pick processors? Are there verified platforms that you support? Is there a short list? Is there every piece of hardware? >> We actually can help you. I hope we're not saying we can do everything in the world here, but we can help you with that. So, here's how. When you have the model in the platform, you can actually see how this model runs on any instance of any cloud, by the way. So we support all three major cloud providers. And then you can make decisions. For example, if you care about latency, your model has to run in, at most, 50 milliseconds, because you're going to have interactivity. And then, after that, you don't care if it's faster. All you care is, is it going to run cheap enough. So we can help you navigate. And we're also going to make it automatic. >> It's like tire kicking in the dealer showroom. >> Right. >> You can test everything out, you can see the simulation. Are they simulations, or are they real tests? >> Oh, no, we run it all on real hardware. So, we have, as I said, we support any instances of any of the major clouds. We actually run on the cloud. But we also support a select number of edge devices today, like ARMs and Nvidia Jetsons. And we have the OctoML cloud, which is a bunch of racks with a bunch of Raspberry Pis and Nvidia Jetsons, and very soon, a bunch of mobile phones there too, that can actually run the real hardware, and validate it, and test it out, so you can see that your model runs performant and economically enough in the cloud. And it can run on the edge devices-- >> You're a machine learning as a service. Would that be accurate? >> That's part of it, because we're not doing the machine learning model itself. You come with a model and we make it deployable and make it ready to deploy. So, here's why it's important. Let me try.
There's a large number of really interesting companies that do API models, as in API as a service. You have an NLP model, you have computer vision models, where you call an API endpoint in the cloud. You send an image and you get a description, for example. But it is using a third party. Now, if you want to have your model on your infrastructure but have the same convenience as an API, you can use our service. So, today, chances are that, if you have a model that does what you want to do, there might not be an API for it; we actually automatically create the API for you. >> Okay, so that's why I get that DevOps agility for machine learning is a better description. Cause it's not, you're not providing the service. You're providing the service of deploying it, like DevOps infrastructure as code. You're now ML as code. >> It's your model, your API, your infrastructure, but all of the convenience of having it ready to go, fully automatic, hands off. >> Cause I think what's interesting about this is that it brings the craftsmanship back to machine learning. Cause it's a craft. I mean, let's face it. >> Yeah. I want human brains, which are very precious resources, to focus on building those models that are going to solve business problems. I don't want these very smart human brains figuring out how to get this thing to actually run the right way. This should be automatic. That's why we use machine learning, for machine learning, to solve that. >> Here's an idea for you. We should write a book called The Lean Machine Learning. Cause the lean startup was all about DevOps. >> Luis: Machine leaning. No, that's not going to work. (laughs) >> Remember when iteration was the big mantra. Oh, yeah, iterate. You know, that was from DevOps. >> Yeah, that's right. >> This code allowed for standing up stuff fast, double down, we all know the history, how it turned out. That was a good value for developers. >> I couldn't agree more.
If you don't mind me building on that point. You know, something we see at OctoML, but we also see at Madrona as well. We're seeing that there's a trend towards best of breed for each one of the stages of getting a model deployed. From the data aspect of creating the data, to the model creation aspect, to the model deployment, and even model monitoring. Right? We develop integrations with all the major pieces of the ecosystem, such that you can integrate, say, with model monitoring to go and monitor how a model is doing. Just like you monitor how code is doing in deployment in the cloud. >> It's evolution. I think it's a great step. And again, I love the analogy to the mainstream. I lived during those days. I remember the monolithic, proprietary days, and then, you know, the OSI model kind of blew it open. But that OSI stack never went full stack; it only stopped at TCP/IP. So, I think the same thing's going on here. You see some scalability around it to try to uncouple it, free it. >> Absolutely. And sustainability and accessibility to make it run faster and make it run on any device that you want, by any developer. So, that's the tagline. >> Luis Ceze, thanks for coming on. Professor. >> Thank you. >> I didn't know you were a professor. That's great to have you on. It was a masterclass in DevOps agility for machine learning. Thanks for coming on. Appreciate it. >> Thank you very much. Thank you. >> Congratulations, again. All right. OctoML here on theCube. Really important. Uncoupling the machine learning from the hardware specifically. That's only going to make this space faster and safer, and more reliable. And that's where the whole theme of re:MARS is. Let's see how they fit in. I'm John for theCube. Thanks for watching. More coverage after this short break. >> Luis: Thank you. (gentle music)
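The hardware-navigation idea Luis describes earlier in the interview — first filter instances by a latency budget, then pick the cheapest one that qualifies — can be sketched in a few lines. The instance names, latencies, and prices below are invented for illustration, not real OctoML benchmark output.

```python
# Hypothetical benchmark results for one model across cloud instances.
# All names and numbers here are made up, not real measurements.
BENCHMARKS = [
    {"instance": "cloud-a.small", "latency_ms": 72.0, "usd_per_hour": 0.10},
    {"instance": "cloud-a.large", "latency_ms": 31.0, "usd_per_hour": 0.45},
    {"instance": "cloud-b.gpu",   "latency_ms": 12.0, "usd_per_hour": 1.20},
    {"instance": "cloud-c.mid",   "latency_ms": 48.0, "usd_per_hour": 0.30},
]

def cheapest_within_budget(benchmarks, max_latency_ms):
    """Among instances that meet the latency budget, return the cheapest."""
    ok = [b for b in benchmarks if b["latency_ms"] <= max_latency_ms]
    if not ok:
        return None  # nothing meets the interactivity requirement
    return min(ok, key=lambda b: b["usd_per_hour"])

# A 50 ms budget, as in the interactivity example above.
choice = cheapest_within_budget(BENCHMARKS, max_latency_ms=50.0)
```

Once latency is satisfied, faster instances stop mattering and only cost does, which is why the filter comes before the `min`.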

Published Date : Jun 24 2022


SENTIMENT ANALYSIS :

ENTITIES

Entity | Category | Confidence
Luis Ceze | PERSON | 0.99+
Qualcomm | ORGANIZATION | 0.99+
Luis | PERSON | 0.99+
2015 | DATE | 0.99+
John | PERSON | 0.99+
John Furrier | PERSON | 0.99+
Boston Dynamics | ORGANIZATION | 0.99+
two hours | QUANTITY | 0.99+
Nvidia | ORGANIZATION | 0.99+
2017 | DATE | 0.99+
Japan | LOCATION | 0.99+
Madrona Venture Capital | ORGANIZATION | 0.99+
AMD | ORGANIZATION | 0.99+
one | QUANTITY | 0.99+
Amazon | ORGANIZATION | 0.99+
three | QUANTITY | 0.99+
IBM | ORGANIZATION | 0.99+
One | QUANTITY | 0.99+
AWS | ORGANIZATION | 0.99+
four | QUANTITY | 0.99+
2016 | DATE | 0.99+
University of Washington | ORGANIZATION | 0.99+
Today | DATE | 0.99+
Pepsi | ORGANIZATION | 0.99+
Both | QUANTITY | 0.99+
yesterday | DATE | 0.99+
First | QUANTITY | 0.99+
both | QUANTITY | 0.99+
Second | QUANTITY | 0.99+
today | DATE | 0.99+
SiMa.ai | ORGANIZATION | 0.99+
OctoML | TITLE | 0.99+
OctoML | ORGANIZATION | 0.99+
Intel | ORGANIZATION | 0.98+
one instance | QUANTITY | 0.98+
DevOps | TITLE | 0.98+
Madrona Venture Group | ORGANIZATION | 0.98+
Swami | PERSON | 0.98+
Madrona | ORGANIZATION | 0.98+
about six years | QUANTITY | 0.96+
Spot | ORGANIZATION | 0.96+
The Lean Machine Learning | TITLE | 0.95+
first | QUANTITY | 0.95+
theCUBE | ORGANIZATION | 0.94+
ARMs | ORGANIZATION | 0.94+
pineapple | ORGANIZATION | 0.94+
Raspberry Pis | ORGANIZATION | 0.92+
TensorFlow | TITLE | 0.89+
Snapdragon | ORGANIZATION | 0.89+
about three years old | QUANTITY | 0.89+
a couple years ago | DATE | 0.88+
two hyperscaler cloud providers | QUANTITY | 0.88+
first ones | QUANTITY | 0.87+
one of | QUANTITY | 0.85+
50 milliseconds | QUANTITY | 0.83+
Apache TVM | ORGANIZATION | 0.82+
both laugh | QUANTITY | 0.82+
three major cloud providers | QUANTITY | 0.81+

Chris Wright, Red Hat | Red Hat Summit 2022


 

(bright upbeat music) >> We're back at the Red Hat Summit at the Seaport in Boston, theCUBE's coverage. This is day two. Dave Vellante and Paul Gillin. Chris Wright is here, the chief technology officer at Red Hat. Chris, welcome back to theCUBE. Good to see you. >> Yeah, likewise. Thanks for having me. >> You're very welcome. So, you were saying today in your keynote. We got a lot of ground to cover here, Chris. You were saying that, you know, software, Andreessen's software is eating the world. Software ate the world, is what you said. And now we have to think about AI. AI is eating the world. What does that mean? What's the implication for customers and developers? >> Well, a lot of implications. I mean, to start with, just acknowledging that software isn't this future dream. It is the reality of how businesses run today. It's an important part of understanding what you need to invest in to make yourself successful, essentially, as a software company, where all companies are building technology to differentiate themselves. Take that, all that discipline, everything we've learned in that context, bring in AI. So, we have a whole new set of skills to learn, tools to create and discipline processes to build around delivering data-driven value into the company, just the way we've built software value into companies. >> I'm going to cut right to the chase because I would say data is eating software. Data and AI, to me, are like, you know, kissing cousins. So here's what I want to ask you as a technologist. So we have the application development stack, if you will. And it's separate from the data and analytics stack. All we talk about is injecting AI into applications, making them data-driven. You just used that term. But they're totally two totally separate stacks, organizationally and technically. Are those worlds coming together? Do they have to come together in order for the AI vision to be real? >> Absolutely, so, totally agree with you on the data piece. 
It's inextricably linked to AI and analytics and all of the, kind of, machine learning that goes on in creating intelligence for applications. The application connection to a machine learning model is fundamental. So, you got to think about not just the software developer or the data scientist, but also there's a line of business in there that's saying, "Here's the business outcomes I'm looking for." It's that trifecta that has to come together to make advancements and really make change in the business. So, you know, some of the folks we had on stage today were talking about exactly that. Which is, how do you bring together those three different roles? And there's technology that can help bridge gaps. So, we look at what we call intelligent applications. Embed intelligence into the application. That means you surface a machine learning model with APIs to make it accessible into applications, so that developers can query a machine learning model. You need to do that with some discipline and rigor around, you know, what does it mean to develop this thing and life cycle it and integrate it into this bigger picture. >> So the technology is capable of coming together. You know, Amanda Purnell is coming on next. >> Oh, great. >> 'Cause she was talking about, you know, getting, you know, insights in the hands of nurses and they're not coders. >> That's right. >> But they need data. But I feel like it's, well, I feel very strongly that it's an organizational challenge, more so. I think you're confirming. It's not really a technical challenge. I can insert a column into the application development stack and bring TensorFlow in or AI or data, whatever it is. It's not a technical issue. Is that fair? >> Well, there are some technical challenges. So, for example, data scientists. Kind of a scarce kind of skillset within any business. So, how do you scale data scientists into the developer population? Which will be a large population within an organization. 
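The pattern Chris describes above — surface a machine learning model with an API so application developers can query it without knowing its internals — can be sketched minimally. This is an assumption-laden illustration, not Red Hat tooling: the route layout, the "readmission" model, and its feature names and weights are all invented, and a real deployment would sit behind an HTTP server rather than a plain class.

```python
# Minimal sketch of "surfacing a model behind an API". The model here is a
# stand-in (a hard-coded scoring function), not a real trained network.

class ModelService:
    def __init__(self):
        self._routes = {}

    def register(self, name, model_fn):
        """Expose a model callable under a route name."""
        self._routes[name] = model_fn

    def predict(self, name, payload):
        """What an HTTP handler would call for POST /models/<name>/predict."""
        if name not in self._routes:
            raise KeyError(f"no model registered under {name!r}")
        return self._routes[name](payload)

# A toy "readmission risk" scorer; features and weights are invented.
def risk_model(features):
    score = 0.3 * features["age"] / 100 + 0.7 * features["prior_visits"] / 10
    return {"risk": round(min(score, 1.0), 3)}

service = ModelService()
service.register("readmission", risk_model)
result = service.predict("readmission", {"age": 60, "prior_visits": 4})
```

The point of the indirection is the trifecta above: the data scientist owns `risk_model`, the developer only sees `predict`, and the line of business defines what the score is for.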
So, there's tools that we can use to bring those worlds together. So, you know, it's not just TensorFlow but it's the entire workflow and platform of how you share the data, the data training models and then just deploying models into a runtime production environment. That looks similar to software development processes but it's slightly different. So, that's where a common platform can help bridge the gaps between that developer world and the data science world. >> Where is Red Hat's position in this evolving AI stack? I mean, you're not into developing tool sets like TensorFlow, right? >> Yeah, that's right. If you think about a lot of what we do, it's aggregate content together, bring a distribution of tools, giving flexibility to the user. Whether that's a developer, a system administrator, or a data scientist. So our role here is, one, make sure we work with our hardware partners to create accelerated environments for AI. So, that's sort of an enablement thing. The other is bring together those disparate tools into a workflow and give a platform that enables data scientists to choose which, is it PyTorch, is it TensorFlow? What's the best tool for you? And assemble that tool into your workflow and then proceed training, doing inference, and, you know, tuning and lather, rinse, repeat. >> So, to make your platform then, as receptive as possible, right? You're not trying to pick winners in what languages to work with or what frameworks? >> Yeah, that's right. I mean, picking winners is difficult. The world changes so rapidly. So we make big bets on key areas and certainly TensorFlow would be a great example. A lot of community attraction there. But our goal isn't to say that's the one tool that everybody should use. It's just one of the many tools in your toolbox. >> There are risks of not pursuing this, from an organization's perspective. A customer, they kind of get complacent and, you know, they could get disrupted, but there's also an industry risk. 
If the industry can't deliver this capability, what are the implications if the industry doesn't step up? I believe the industry will, just 'cause it always does. But what about customer complacency? We certainly saw that a lot with digital transformation and COVID sort of forced us to march to digital. What should we be thinking about of the implications of not leaning in? >> Well, I think that the disruption piece is key because there's always that spectrum of businesses. Some are more leaning in, invested in the future. Some are more laggards and kind of wait and see. Those leaning in tend to be separating themselves, wheat from the chaff. So, that's an important way to look at it. Also, if you think about it, many data science experiments fail within businesses. I think part of that is not having the rigor and discipline around connecting, not just the tools and data scientists together, but also looking at what business outcomes are you trying to drive? If you don't bring those things together then it sort of can be too academic and the business doesn't see the value. And so there's also the question of transparency. How do you understand why is a model predicting you should take a certain action or do a certain thing? As an industry, I think we need to focus on bringing tools together, bringing data together, and building better transparency into how models work. >> There's also a lot of activity around governance right now, AI governance. Particularly removing bias from ML models. Is that something that you are guiding your customers on? Or, how important do you feel this is at this point of AI's development? >> It's really important. I mean, the challenge is finding it and understanding, you know, we bring data that maybe already carrying a bias into a training process and building a model around that. How do you understand what the bias is in that model? 
There's a lot of open questions there and academic research to try to understand how you can ferret out, you know, essentially biased data and make it less biased or unbiased. Our role is really just bringing the toolset together so that you have the ability to do that as a business. So, we're not necessarily building the next machine learning algorithm or models or ways of building transparency into models, as much as building the platform and bringing the tools together that can give you that for your own organization. >> So, it brings up the question of architectures. I've been sort of a casual or even active observer of data architectures over the last, whatever, 15 years. They've been really centralized. Our data teams are highly specialized. You mentioned data scientists, but there's data engineers and there's data analysts and very hyper specialized roles that don't really scale that well. So there seems to be a move, talk about edge. We're going to talk about edge. The ultimate edge, which is space, very cool. But data is distributed by its very nature. We have this tendency to try to force it into this, you know, monolithic system. And I know that's a pejorative, but for good reason. So I feel like there's this push in organizations to enable scale, to decentralize data architectures. Okay, great. And put data in the hands of those business owners that you talked about earlier. The domain experts that have business context. Two things, two problems that brings up, is you need infrastructure that's self-service, in that instance. And you need, to your point, automated and computational governance. Those are real challenges. What do you see in terms of the trends to decentralize data architectures? Is it even feasible that everybody wants a single version of the truth, centralized data team, right? And they seem to be at odds. >> Yeah, well I think we're coming from a history informed by centralization. That's what we understand. 
That's what we kind of gravitate towards, but the reality, as you put it, the world's just distributed. So, what we can do is look at federation. So, it's not necessarily centralization but create connections between data sources which requires some policy and governance. Like, who gets access to what? And also think about those domain experts maybe being the primary source of surfacing a model that you don't necessarily have to know how it was trained or what the internals are. You're using it more to query it as a, you know, the domain expert produces this model, you're in a different part of the organization just leveraging some work that somebody else has done. Which is how we build software, reusable components in software. So, you know, I think building that mindset into data and the whole process of creating value from data is going to be a really critical part of how we roll forward. >> So, there are two things in your keynote. One, that I was kind of in awe of. You wanted to be an astronaut when you were a kid. You know, I mean, I watched the moon landing and I was like, "I'm never going up into space." So, I'm in awe of that. >> Oh, I got the space helmet picture and all that. >> That's awesome, really, you know, hat's off to you. The other one really pissed me off, which was that you're a better skier 'cause you got some device in your boot. >> Oh, it's amazing. >> And the reason it angered me is 'cause I feel like it's the mathematicians taking over baseball, you know. Now, you're saying, you're a better skier because of that. But those are two great edge examples and there's a billion of them, right? So, talk about your edge strategy. Kind of, your passion there, how you see that all evolving. >> Well, first of all, we see the edge as a fundamental part of the future of computing. So in that centralization, decentralization pendulum swing, we're definitely on the path towards distributed computing and that is edge and that's because of data. 
And also because of the compute capabilities that we have in hardware. Hardware gets more capable, lower power, can bring certain types of accelerators into the mix. And you really create this world where what's happening in a virtual context and what's happening in a physical context can come together through this distributed computing system. Our view is, that's hybrid. That's what we've been working on for years. Just the difference was maybe, originally it was focused on data center, cloud, multi-cloud and now we're just extending that view out to the edge and you need the same kind of consistency for development, for operations, in the edge that you do in that hybrid world. So that's really where we're placing our focus and then it gets into all the different use cases. And you know, really, that's the fun part. >> I'd like to shift gears a little bit 'cause another remarkable statistic you cited during your keynote was, it was a Forrester study that said 99% of all applications now have open source in them. What are the implications of that for those who are building applications? In terms of license compliance and more importantly, I think, confidence in the code that they're borrowing from open source projects. >> Well, I think, first and foremost, it says open source has won. We see that that was audited code bases which means there's mission critical code bases. We see that it's pervasive, it's absolutely everywhere. And that means developers are pulling dependencies into their applications based on all of the genius that's happening in open source communities. Which I think we should celebrate. Right after we're finished celebrating we got to look at what are the implications, right? And that shows up as, are there security vulnerabilities that become ubiquitous because we're using similar dependencies? What is your process for vetting code that you bring into your organization and push into production? 
You know that process for the code you author, what about your dependencies? And I think that's an important part of understanding and certainly there are some license implications. What are you required to do when you use that code? You've been given that code on a license from the open source community, are you compliant with that license? Some of those are reasonably well understood. Some of those are, you know, newer to the enterprise. So I think we have to look at this holistically and really help enterprises build safe application code that goes into production and runs their business. >> We saw Intel up in the keynotes today. We heard from Nvidia, both companies are coming on. We know you've done a lot of work with ARM over the years. I think Graviton was one of the announcements this week. So, love to see that. I want to run something by you as a technologist. The premise is, you know, we used to live in this CPU centric world. We marched to the cadence of Moore's Law and now we're seeing the combinatorial factors of CPU, GPU, NPU, accelerators and other supporting components. With IO and controllers and NICs all adding up. It seems like we're shifting from a processor centric world to a connect centric world on the hardware side. That first of all, do you buy that premise? And does hardware matter anymore with all the cloud? >> Hardware totally matters. I mean the cloud tried to convince us that hardware doesn't matter and it actually failed. And the reason I say that is because if you go to a cloud, you'll find 100s of different instance types that are all reflections of different types of assemblies of hardware. Faster IO, better storage, certain sizes of memory. All of that is a reflection of, applications need certain types of environments for acceleration, for performance, to do their job. 
Now I do think there's an element of, we're decomposing compute into all of these different sort of accelerators and the only way to bring that back together is connectivity through the network. But there's also SOCs when you get to the edge where you can integrate the entire system onto a pretty small device. I think the important part here is, we're leveraging hardware to do interesting work on behalf of applications that makes hardware exciting. And as an operating system geek, I couldn't be more thrilled, because that's what we do. We enable hardware, we get down into the bits and bytes and poke registers and bring things to life. There's a lot happening in the hardware world and applications can't always follow it directly. They need that level of indirection through a software abstraction and that's really what we're bringing to life here. >> We've seen now hardware specific AI, you know, AI chips and AI SOCs emerge. How do you make decisions about what you're going to support or do you try to support all of them? >> Well, we definitely have a breadth view of support and we're also just driven by customer demand. Where our customers are interested we work closely with our partners. We understand what their roadmaps are. We plan together ahead of time and we know where they're making investments and we work with our customers. What are the best chips that support their business needs and we focus there first but it ends up being a pretty broad list of hardware that we support. >> I could pick your brain for an hour. We didn't even get into super cloud, Chris. But, thanks so much for coming on theCUBE. It's great to have you. >> Absolutely, thanks for having me. >> All right. Thank you for watching. Keep it right there. Paul Gillin, Dave Vellante, theCUBE's live coverage of Red Hat Summit 2022 from Boston. We'll be right back. (mellow music)
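The dependency-vetting discipline Chris describes — knowing what code you pull in and what security or license implications come with it — reduces, at its simplest, to checking pinned packages against a list of known advisories. A hedged sketch: the package names, versions, and advisory texts below are invented for illustration, and real vetting would query a live advisory database rather than a hard-coded dict.

```python
# Invented advisory data keyed by (package, version); not real CVEs.
ADVISORIES = {
    ("libfoo", "1.2.0"): "CVE-XXXX-0001: buffer overflow",
    ("libbar", "0.9.1"): "GPL-3.0 dependency pulled into proprietary build",
}

def vet(dependencies):
    """Return the advisories that apply to a list of (name, version) pins."""
    return {dep: ADVISORIES[dep] for dep in dependencies if dep in ADVISORIES}

# Vet an application's pinned dependency list.
flagged = vet([("libfoo", "1.2.0"), ("libbaz", "2.0.0")])
```

The same lookup shape covers both concerns in the interview: security vulnerabilities and license obligations are just different advisory payloads attached to the same (package, version) key.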

Published Date : May 11 2022


SENTIMENT ANALYSIS :

ENTITIES

Entity | Category | Confidence
Chris | PERSON | 0.99+
Dave Vellante | PERSON | 0.99+
Paul Gillin | PERSON | 0.99+
Dave Vellante | PERSON | 0.99+
Amanda Purnell | PERSON | 0.99+
Nvidia | ORGANIZATION | 0.99+
Chris Wright | PERSON | 0.99+
99% | QUANTITY | 0.99+
100s | QUANTITY | 0.99+
Boston | LOCATION | 0.99+
Red Hat | ORGANIZATION | 0.99+
15 years | QUANTITY | 0.99+
two problems | QUANTITY | 0.99+
today | DATE | 0.99+
Intel | ORGANIZATION | 0.99+
Forrester | ORGANIZATION | 0.99+
both companies | QUANTITY | 0.99+
Red Hat Summit 2022 | EVENT | 0.99+
two | QUANTITY | 0.99+
ARM | ORGANIZATION | 0.99+
Seaport | LOCATION | 0.99+
one | QUANTITY | 0.98+
two things | QUANTITY | 0.98+
theCUBE | ORGANIZATION | 0.98+
one tool | QUANTITY | 0.98+
One | QUANTITY | 0.97+
first | QUANTITY | 0.96+
Two things | QUANTITY | 0.96+
this week | DATE | 0.95+
Red Hat Summit | EVENT | 0.95+
an hour | QUANTITY | 0.93+
TensorFlow | TITLE | 0.92+
Graviton | ORGANIZATION | 0.87+
PyTorch | TITLE | 0.87+
separate stacks | QUANTITY | 0.85+
single version | QUANTITY | 0.83+
Andreessen | PERSON | 0.82+
day two | QUANTITY | 0.81+
Moore | TITLE | 0.79+
three different roles | QUANTITY | 0.76+
years | QUANTITY | 0.75+
COVID | OTHER | 0.7+
edge | QUANTITY | 0.6+
billion | QUANTITY | 0.54+

Ian Buck, NVIDIA | AWS re:Invent 2021


 

>>Well, welcome back to the cubes coverage of AWS reinvent 2021. We're here joined by Ian buck, general manager and vice president of accelerated computing at Nvidia I'm. John Ford, your host of the QB. And thanks for coming on. So in video, obviously, great brand congratulates on all your continued success. Everyone who has does anything in graphics knows the GPU's are hot and you guys get great brand great success in the company, but AI and machine learning was seeing the trend significantly being powered by the GPU's and other systems. So it's a key part of everything. So what's the trends that you're seeing, uh, in ML and AI, that's accelerating computing to the cloud. Yeah, >>I mean, AI is kind of drape bragging breakthroughs innovations across so many segments, so many different use cases. We see it showing up with things like credit card, fraud prevention and product and content recommendations. Really it's the new engine behind search engines is AI. Uh, people are applying AI to things like, um, meeting transcriptions, uh, virtual calls like this using AI to actually capture what was said. Um, and that gets applied in person to person interactions. We also see it in intelligence systems assistance for a contact center, automation or chat bots, uh, medical imaging, um, and intelligence stores and warehouses and everywhere. It's really, it's really amazing what AI has been demonstrated, what it can do. And, uh, it's new use cases are showing up all the time. >>Yeah. I'd love to get your thoughts on, on how the world's evolved just in the past few years, along with cloud, and certainly the pandemics proven it. You had this whole kind of full stack mindset initially, and now you're seeing more of a horizontal scale, but yet enabling this vertical specialization in applications. I mean, you mentioned some of those apps, the new enablers, this kind of the horizontal play with enablement for specialization, with data, this is a huge shift that's going on. 
It's been happening. What's your reaction to that? >>Yeah, it's the innovations on two fronts. There's a horizontal front, which is basically the different kinds of neural networks or AIS as well as machine learning techniques that are, um, just being invented by researchers for, uh, and the community at large, including Amazon. Um, you know, it started with these convolutional neural networks, which are great for image processing, but as it expanded more recently into, uh, recurrent neural networks, transformer models, which are great for language and language and understanding, and then the new hot topic graph neural networks, where the actual graph now is trained as a, as a neural network, you have this underpinning of great AI technologies that are being adventure around the world in videos role is try to productize that and provide a platform for people to do that innovation and then take the next step and innovate vertically. Um, take it, take it and apply it to two particular field, um, like medical, like healthcare and medical imaging applying AI, so that radiologists can have an AI assistant with them and highlight different parts of the scan. >>Then maybe troublesome worrying, or requires more investigation, um, using it for robotics, building virtual worlds, where robots can be trained in a virtual environment, their AI being constantly trained, reinforced, and learn how to do certain activities and techniques. So that the first time it's ever downloaded into a real robot, it works right out of the box, um, to do, to activate that we co we are creating different vertical solutions, vertical stacks for products that talk the languages of those businesses, of those users, uh, in medical imaging, it's processing medical data, which is obviously a very complicated large format data, often three-dimensional boxes in robotics. 
It's building combining both our graphics and simulation technologies, along with the, you know, the AI training capabilities and different capabilities in order to run in real time. Those are, >>Yeah. I mean, it's just so cutting edge. It's so relevant. I mean, I think one of the things you mentioned about the neural networks, specifically, the graph neural networks, I mean, we saw, I mean, just to go back to the late two thousands, you know, how unstructured data or object store created, a lot of people realize that the value out of that now you've got graph graph value, you got graph network effect, you've got all kinds of new patterns. You guys have this notion of graph neural networks. Um, that's, that's, that's out there. What is, what is a graph neural network and what does it actually mean for deep learning and an AI perspective? >>Yeah, we have a graph is exactly what it sounds like. You have points that are connected to each other, that established relationships and the example of amazon.com. You might have buyers, distributors, sellers, um, and all of them are buying or recommending or selling different products. And they're represented in a graph if I buy something from you and from you, I'm connected to those end points and likewise more deeply across a supply chain or warehouse or other buyers and sellers across the network. What's new right now is that those connections now can be treated and trained like a neural network, understanding the relationship. How strong is that connection between that buyer and seller or that distributor and supplier, and then build up a network that figure out and understand patterns across them. For example, what products I may like. 
Cause I have this connection in my graph, what other products may meet those requirements, or also identifying things like fraud when, when patterns and buying patterns don't match, what a graph neural networks should say would be the typical kind of graph connectivity, the different kind of weights and connections between the two captured by the frequency half I buy things or how I rate them or give them stars as she used cases, uh, this application graph neural networks, which is basically capturing the connections of all things with all people, especially in the world of e-commerce, it's very exciting to a new application, but applying AI to optimizing business, to reducing fraud and letting us, you know, get access to the products that we want, the products that they have, our recommendations be things that, that excited us and want us to buy things >>Great setup for the real conversation that's going on here at re-invent, which is new kinds of workloads are changing. The game. People are refactoring their business with not just replatform, but actually using this to identify value and see cloud scale allows you to have the compute power to, you know, look at a note on an arc and actually code that. It's all, it's all science, all computer science, all at scale. So with that, that brings up the whole AWS relationship. Can you tell us how you're working with AWS before? >>Yeah. 80 of us has been a great partner and one of the first cloud providers to ever provide GPS the cloud, uh, we most more recently we've announced two new instances, uh, the instance, which is based on the RA 10 G GPU, which has it was supports the Nvidia RTX technology or rendering technology, uh, for real-time Ray tracing and graphics and game streaming is their highest performance graphics, enhanced replicate without allows for those high performance graphics applications to be directly hosted in the cloud. 
And of course it runs everything else as well, including our AI. It has access to our AI technology and runs all of our AI stacks. We also announced with AWS the G5g instance. This is exciting because it's the first Graviton, or Arm-based, processor connected to a GPU available in the cloud. The focus here is Android gaming and machine learning inference. And we're excited to see the advancements that Amazon is making and AWS is making with Arm in the cloud. And we're glad to be part of that journey. >> Well, congratulations. I remember I was just watching my interview with James Hamilton from AWS in 2013 and 2014. He was teasing this out, that they're going to build their own, get in there and build their own connections, take that latency down and do other things. This is kind of the harvest of all that. As you start looking at these new interfaces and the new servers, the new technology that you guys are doing, you're enabling applications. What do you see this enabling? As this new capability comes out, new speed, more performance, but also now it's enabling more capabilities so that new workloads can be realized. What would you say to folks who want to ask that question? >> Well, so first off, I think Arm is here to stay, and you can see the growth and explosion of Arm, led of course by Graviton and AWS, but many others. And by bringing all of NVIDIA's rendering, graphics, machine learning and AI technologies to Arm, we can help bring that innovation that Arm allows, that open innovation, because there's an open architecture, to the entire ecosystem. We can help bring it forward to the state of the art in AI, machine learning and graphics. All of our software that we release is supported both on x86 and on Arm equally, including all of our AI stacks. So most notably, for inference, the deployment of AI models, we have the NVIDIA Triton inference server.
This is our inference serving software, where after you've trained a model, you want to deploy it at scale on any CPU or GPU instance, for that matter. So we support both CPUs and GPUs with Triton. It's natively integrated with SageMaker and provides the benefit of all those performance optimizations, features like dynamic batching. It supports all the different AI frameworks, from PyTorch to TensorFlow, even generalized Python code. We're activating the Arm ecosystem, as well as bringing all those new AI use cases and all those different performance levels, with our partnership with AWS and all the different clouds. >> And you guys are making it really easy for people to use the technology. That brings up the next kind of question I want to ask you. I mean, a lot of people are really jumping in big time into this. They're adopting AI, or they're moving from prototype to production. There's always some gaps, whether it's knowledge gaps, skills gaps, or whatever, but people are accelerating into AI and leaning into it hard. What advancements has NVIDIA made to make it more accessible for people to move faster through the system, through the process? >> Yeah, it's one of the biggest challenges. The promise of AI, all the publications that are coming out, all the great research: how can you make it more accessible or easier to use by more people, rather than just being an AI researcher, which is obviously a very challenging and interesting field, but not one that's directly connected to the business? NVIDIA is trying to provide a full-stack approach to AI. So as we discover or see these AI technologies become available, we produce SDKs to help activate them or connect them with developers around the world. We have over 150 different SDKs at this point, serving industries from gaming to design, to life sciences, to earth sciences.
We even have stuff to help simulate quantum computing. And of course all the work we're doing with AI, 5G and robotics. So we actually just introduced about 65 new updates, just this past month, on all those SDKs. Some of the newer stuff that's really exciting is the large language models. People are building some amazing AI that's capable of understanding the corpus of, like, human understanding, these language models that are trained on literally the content of the internet to provide general-purpose or open-domain chatbots. So the customer is going to have a new kind of experience with a computer or the cloud. We're offering those large language models, as well as AI frameworks, to help companies take advantage of this new kind of technology. >> You know, each and every time I do an interview with NVIDIA or talk about NVIDIA, my kids and their friends, the first thing they say is, "Can you get me a good graphics card?" They all want the best thing in their rig. Obviously the gaming market's hot and known for that, but there's a huge software team behind NVIDIA. This is well known. Your CEO is always talking about it on his keynotes: you're in the software business. And then you do have hardware. You are integrating with Graviton and other things. But it's a software practice. This is all about software. Could you share kind of more about the NVIDIA culture and the cloud culture, and specifically around the scale? I mean, you hit every use case. So what's the software culture there at NVIDIA? >> Yeah, NVIDIA is actually bigger in software. We have more software people than hardware people. People don't often realize this.
And in fact, it just starts with the chip. Obviously, building great silicon is necessary to provide that level of innovation, but it's expanded dramatically from there. Not just the silicon and the GPU, but the server designs themselves. We actually do entire server designs ourselves to help build out this infrastructure. We consume it and use it ourselves and build our own supercomputers to use AI to improve our products. And then all that software that we build on top, we make it available, as I mentioned before, as containers on our NGC container registry, which is accessible from AWS, to connect to those vertical markets. Instead of just opening up the hardware and letting the ecosystem develop on it, they can, with the low-level and programmatic stacks that we provide with CUDA. We believe that those vertical stacks are the ways we can help accelerate and advance AI, and that's why we make them so available. >> And programmable software is so much easier. I want to get that plug in, for I think it's worth noting that you guys are heavy hardcore, especially on the AI side, and it's worth calling out. Getting back to the customers who are bridging that gap and getting out there, what are the metrics they should consider as they're deploying AI? What are success metrics? What does success look like? Can you share any insight into what they should be thinking about and looking at how they're doing? >> Yeah. For training, it's all about time to solution. It's not the hardware that's the cost, it's the opportunity that AI can provide your business, and the productivity of those data scientists who are developing the models, who are not easy to come by.
So what we hear from customers is they need a fast time to solution, to allow people to prototype very quickly, to train a model to convergence, to get into production quickly, and of course, move on to the next or continue to refine it, often. So in training, it's time to solution. For inference, it's about your ability to deploy at scale. Often people have real-time requirements. They want to run within a certain amount of latency, a certain amount of time. And typically, most companies don't have a single AI model. They have a collection of them they want to run for a single service or across multiple services. That's where you can aggregate some of your infrastructure. Leveraging the Triton inference server I mentioned before, you can actually run multiple models on a single GPU, saving costs, optimizing for efficiency, yet still meeting the requirements for latency and the real-time experience, so that your customers have a good interaction with the AI. >> Awesome. Great. Let's get into the customer examples. You guys have obviously great customers. Can you share some of the use cases, examples with customers, notable customers? >> Yeah. One great part about working at NVIDIA as a technology company is you get to engage with such amazing customers across many verticals. Some of the ones that are pretty exciting right now: Netflix is using the G4 instances to do video effects and animation content from anywhere in the world, in the cloud, as a cloud content creation platform. We work in the energy field. Siemens Energy is actually using AI combined with simulation to do predictive maintenance on their energy plants, preventing or optimizing onsite inspection activities and eliminating downtime, which is saving a lot of money for the energy industry.
We have worked with Oxford University, which actually has over 20 million artifacts and specimens in collections across its gardens and museums and libraries. They're actually using NVIDIA GPUs and Amazon to do enhanced image recognition, to classify all these things, which would take literally years going through each of these artifacts manually. Using AI, we can quickly catalog all of them and connect them with their users. Great stories across graphics, across industries, across research. It's just so exciting to see what people are doing with our technology, together with Amazon. >> And thank you so much for coming on theCUBE. I really appreciate it. A lot of great content there. We could probably go another hour, all the great stuff going on at NVIDIA. Any closing remarks you want to share as we wrap this last minute up? >> You know, really what NVIDIA is about is accelerating cloud computing, whether it be AI, machine learning, graphics, or high-performance computing and simulation. And AWS was one of the first with this, in the beginning, and they continue to bring out great instances to help connect the cloud and accelerated computing with all the different opportunities: integrations with SageMaker, with EKS and ECS, the new instances with G5 and G5g. Very excited to see all the work that we're doing together. >> Ian Buck, general manager and vice president of Accelerated Computing. I mean, how can you not love that title? We want more power, more, faster, come on. More computing. No one's going to complain about more computing. Ian, thanks for coming on. >> Thank you. Appreciate it. >> I'm John Furrier, host of theCUBE. You're watching Amazon's coverage of re:Invent 2021. Thanks for watching.

Published Date : Nov 30 2021



PA3 Ian Buck


 

(bright music) >> Well, welcome back to theCUBE's coverage of AWS re:Invent 2021. We're here joined by Ian Buck, general manager and vice president of Accelerated Computing at NVIDIA. I'm John Furrier, host of theCUBE. Ian, thanks for coming on. >> Oh, thanks for having me. >> So NVIDIA, obviously, great brand. Congratulations on all your continued success. Everyone who does anything in graphics knows that GPUs are hot, and you guys have a great brand, great success in the company. But AI and machine learning, we're seeing the trend significantly being powered by the GPUs and other systems. So it's a key part of everything. So what are the trends that you're seeing in ML and AI that's accelerating computing to the cloud? >> Yeah. I mean, AI is kind of driving breakthroughs and innovations across so many segments, so many different use cases. We see it showing up with things like credit card fraud prevention, and product and content recommendations. Really, it's the new engine behind search engines, is AI. People are applying AI to things like meeting transcriptions, virtual calls like this, using AI to actually capture what was said. And that gets applied in person-to-person interactions. We also see it in intelligent assistants for contact center automation, or chat bots, medical imaging, and intelligent stores, and warehouses, and everywhere. It's really amazing what AI has been demonstrating, what it can do, and its new use cases are showing up all the time. >> You know, Ian, I'd love to get your thoughts on how the world's evolved, just in the past few years alone, with cloud. And certainly, the pandemic's proven it. You had this whole kind of fullstack mindset, initially, and now you're seeing more of a horizontal scale, but yet, enabling this vertical specialization in applications. I mean, you mentioned some of those apps.
The new enablers, this kind of, the horizontal play with enablement for, you know, specialization with data, this is a huge shift that's going on. It's been happening. What's your reaction to that? >> Yeah. The innovation's on two fronts. There's a horizontal front, which is basically the different kinds of neural networks or AIs, as well as machine learning techniques, that are just being invented by researchers and the community at large, including Amazon. You know, it started with these convolutional neural networks, which are great for image processing, but it has expanded more recently into recurrent neural networks, transformer models, which are great for language and language understanding, and then the new hot topic, graph neural networks, where the actual graph now is trained as a neural network. You have this underpinning of great AI technologies that are being invented around the world. NVIDIA's role is to try to productize that and provide a platform for people to do that innovation. And then, take the next step and innovate vertically. Take it and apply it to a particular field, like medical, like healthcare and medical imaging, applying AI so that radiologists can have an AI assistant with them and highlight different parts of the scan that may be troublesome or worrying, or require some more investigation. Using it for robotics, building virtual worlds where robots can be trained in a virtual environment, their AI being constantly trained and reinforced, learning how to do certain activities and techniques. So that the first time it's ever downloaded into a real robot, it works right out of the box. To activate that, we are creating different vertical solutions, vertical stacks, vertical products, that talk the languages of those businesses, of those users. In medical imaging, it's processing medical data, which is obviously a very complicated, large format data, often three-dimensional voxels.
In robotics, it's building, combining both our graphics and simulation technologies, along with the AI training capabilities and different capabilities, in order to run in real time. Those are just two simple- >> Yeah, no. I mean, it's just so cutting-edge, it's so relevant. I mean, I think one of the things you mentioned about the neural networks, specifically, the graph neural networks, I mean, we saw, I mean, just go back to the late 2000s, how unstructured data, or object storage created, a lot of people realized a lot of value out of that. Now you got graph value, you got network effect, you got all kinds of new patterns. You guys have this notion of graph neural networks that's out there. What is a graph neural network, and what does it actually mean from a deep learning and an AI perspective? >> Yeah. I mean, a graph is exactly what it sounds like. You have points that are connected to each other, that establish relationships. In the example of Amazon.com, you might have buyers, distributors, sellers, and all of them are buying, or recommending, or selling different products. And they're represented in a graph. If I buy something from you and from you, I'm connected to those endpoints, and likewise, more deeply across a supply chain, or warehouse, or other buyers and sellers across the network. What's new right now is that those connections now can be treated and trained like a neural network, understanding the relationship, how strong is that connection between that buyer and seller, or the distributor and supplier, and then build up a network to figure out and understand patterns across them. For example, what products I may like, 'cause I have this connection in my graph, what other products may meet those requirements?
Or also, identifying things like fraud, when patterns and buying patterns don't match what a graph neural network would say would be the typical kind of graph connectivity, the different kind of weights and connections between the two, captured by the frequency of how often I buy things, or how I rate them or give them stars, or other such use cases. This application, graph neural networks, which is basically capturing the connections of all things with all people, especially in the world of e-commerce, is a very exciting new application of applying AI to optimizing business, to reducing fraud, and letting us, you know, get access to the products that we want, having our recommendations be things that excite us and want us to buy things, and buy more. >> That's a great setup for the real conversation that's going on here at re:Invent, which is new kinds of workloads are changing the game, people are refactoring their business with, not just re-platforming, but actually using this to identify value. And also, your cloud scale allows you to have the compute power to, you know, look at a node in an arc and actually code that. It's all science, it's all computer science, all at scale. So with that, that brings up the whole AWS relationship. Can you tell us how you're working with AWS, specifically? >> Yeah, AWS has been a great partner, and one of the first cloud providers to ever provide GPUs in the cloud. More recently, we've announced two new instances, the G5 instance, which is based on our A10G GPU, which supports the NVIDIA RTX technology, our rendering technology, for real-time ray tracing in graphics and game streaming. This is our highest-performance graphics-enhanced instance, and allows for those high-performance graphics applications to be directly hosted in the cloud. And, of course, it runs everything else as well. It has access to our AI technology and runs all of our AI stacks. We also announced, with AWS, the G5g instance.
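As an editor's aside, the buyer-product graph Ian describes above can be sketched in a few lines. This is a toy illustration of one message-passing step over an invented interaction matrix, not NVIDIA's or Amazon's actual recommender; every name and number here is made up for illustration.

```python
import numpy as np

# Toy bipartite buyer-product graph: an edge means "this buyer bought
# or rated this product", with a strength (frequency or star rating).
buyers, products, dim = 3, 4, 8
adjacency = np.array([
    [5.0, 0.0, 1.0, 0.0],
    [0.0, 3.0, 0.0, 2.0],
    [4.0, 0.0, 0.0, 1.0],
])  # shape: (buyers, products)

# Random initial product embeddings stand in for learned features.
rng = np.random.default_rng(0)
product_emb = rng.normal(size=(products, dim))

# One message-passing step: each buyer aggregates the embeddings of the
# products it interacted with, weighted by connection strength.
weights = adjacency / adjacency.sum(axis=1, keepdims=True)
buyer_emb = weights @ product_emb            # shape: (buyers, dim)

# Score every buyer-product pair by embedding similarity; high scores on
# products the buyer has NOT touched are candidate recommendations.
scores = buyer_emb @ product_emb.T           # shape: (buyers, products)
recommend = np.where(adjacency == 0, scores, -np.inf)
print(recommend.argmax(axis=1))              # best new product per buyer
```

A trained graph neural network would learn the aggregation weights and run several such rounds; this fixed, single-round version only shows the shape of the computation.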
This is exciting because it's the first Graviton, or Arm-based, processor connected to a GPU available in the cloud. The focus here is Android gaming and machine learning inference. And we're excited to see the advancements that Amazon is making and AWS is making, with Arm in the cloud. And we're glad to be part of that journey. >> Well, congratulations. I remember, I was just watching my interview with James Hamilton from AWS in 2013 and 2014. He was teasing this out, that they're going to build their own, get in there, and build their own connections to take that latency down and do other things. This is kind of the harvest of all that. As you start looking at these new interfaces, and the new servers, new technology that you guys are doing, you're enabling applications. What do you see this enabling? As this new capability comes out, new speed, more performance, but also, now it's enabling more capabilities so that new workloads can be realized. What would you say to folks who want to ask that question? >> Well, so first off, I think Arm is here to stay. We can see the growth and explosion of Arm, led of course, by Graviton and AWS, but many others. And by bringing all of NVIDIA's rendering, graphics, machine learning and AI technologies to Arm, we can help bring that innovation that Arm allows, that open innovation, because there's an open architecture, to the entire ecosystem. We can help bring it forward to the state of the art in AI, machine learning and graphics. All of our software that we release is supported both on x86 and on Arm equally, including all of our AI stacks. So most notably, for inference, the deployment of AI models, we have the NVIDIA Triton inference server. This is our inference serving software, where after you've trained a model, you want to deploy it at scale on any CPU, or GPU instance, for that matter. So we support both CPUs and GPUs with Triton.
It's natively integrated with SageMaker and provides the benefit of all those performance optimizations, features like dynamic batching. It supports all the different AI frameworks, from PyTorch to TensorFlow, even generalized Python code. We're activating, and helping to activate, the Arm ecosystem, as well as bringing all those new AI use cases, and all those different performance levels, with our partnership with AWS and all the different cloud instances. >> And you guys are making it really easy for people to use the technology. That brings up the next, kind of, question I wanted to ask you. I mean, a lot of people are really going in, jumping in big-time into this. They're adopting AI, or they're moving it from prototype to production. There's always some gaps, whether it's, you know, knowledge gaps, skills gaps, or whatever. But people are accelerating into AI and leaning into it hard. What advancements has NVIDIA made to make it more accessible for people to move faster through the system, through the process? >> Yeah. It's one of the biggest challenges. You know, the promise of AI, all the publications that are coming out, all the great research, you know, how can you make it more accessible or easier to use by more people, rather than just being an AI researcher, which is obviously a very challenging and interesting field, but not one that's directly connected to the business? NVIDIA is trying to provide a fullstack approach to AI. So as we discover or see these AI technologies become available, we produce SDKs to help activate them or connect them with developers around the world. We have over 150 different SDKs at this point, serving industries from gaming, to design, to life sciences, to earth sciences. We even have stuff to help simulate quantum computing. And of course, all the work we're doing with AI, 5G, and robotics. So we actually just introduced about 65 new updates, just this past month, on all those SDKs.
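The dynamic batching feature mentioned above can be illustrated conceptually: collect incoming requests until a batch is full or a small wait budget expires, then hand the whole batch to the model at once. This sketch only shows the idea, not Triton's actual implementation; the function name, batch size, and timeout are invented.

```python
import time
from queue import Queue, Empty

def dynamic_batcher(requests: Queue, max_batch: int = 8, timeout_s: float = 0.005):
    """Pull requests off a queue into one batch, stopping when the batch
    is full or the wait budget is spent. Illustrative only."""
    batch = []
    deadline = time.monotonic() + timeout_s
    while len(batch) < max_batch:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break
        try:
            batch.append(requests.get(timeout=remaining))
        except Empty:  # queue drained within the wait budget
            break
    return batch

q = Queue()
for i in range(5):
    q.put(f"req-{i}")
batch = dynamic_batcher(q)
print(batch)  # the five queued requests, grouped into one batch
```

Batching like this trades a few milliseconds of added latency for much higher GPU throughput, which is the trade-off the feature manages automatically.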
Some of the newer stuff that's really exciting is the large language models. People are building some amazing AI that's capable of understanding the corpus of, like, human understanding. These language models that are trained on literally the content of the internet to provide general-purpose or open-domain chatbots, so the customer is going to have a new kind of experience with the computer or the cloud. We're offering those large language models, as well as AI frameworks, to help companies take advantage of this new kind of technology. >> You know, Ian, every time I do an interview with NVIDIA or talk about NVIDIA, my kids and friends, the first thing they say is, "Can you get me a good graphics card?" They all want the best thing in their rig. Obviously the gaming market's hot and known for that. But there's a huge software team behind NVIDIA. This is well-known. Your CEO is always talking about it on his keynotes. You're in the software business. And you do have hardware, you are integrating with Graviton and other things. But it's a software practice. This is software. This is all about software. >> Right. >> Can you share, kind of, more about the NVIDIA culture and the cloud culture, and specifically around the scale? I mean, you hit every use case. So what's the software culture there at NVIDIA? >> Yeah, NVIDIA is actually bigger in software. We have more software people than hardware people. But people don't often realize this.
And then, all that software that we build on top, we make it available, as I mentioned before, as containers on our NGC container store, container registry, which is accessible from AWS, to connect to those vertical markets. Instead of just opening up the hardware and letting the ecosystem develop on it, they can, with the low-level and programmatic stacks that we provide with CUDA. We believe that those vertical stacks are the ways we can help accelerate and advance AI. And that's why we make them so available. >> And programmable software is so much easier. I want to get that plug in for, I think it's worth noting that you guys are heavy hardcore, especially on the AI side, and it's worth calling out. Getting back to the customers who are bridging that gap and getting out there, what are the metrics they should consider as they're deploying AI? What are success metrics? What does success look like? Can you share any insight into what they should be thinking about, and looking at how they're doing? >> Yeah. For training, it's all about time-to-solution. It's not the hardware that's the cost, it's the opportunity that AI can provide to your business, and the productivity of those data scientists which are developing them, which are not easy to come by. So what we hear from customers is they need a fast time-to-solution to allow people to prototype very quickly, to train a model to convergence, to get into production quickly, and of course, move on to the next or continue to refine it. >> John Furrier: Often. >> So in training, it's time-to-solution. For inference, it's about your ability to deploy at scale. Often people need to have real-time requirements. They want to run in a certain amount of latency, in a certain amount of time. And typically, most companies don't have a single AI model. They have a collection of them they want to run for a single service or across multiple services. That's where you can aggregate some of your infrastructure. 
Leveraging the Triton inference server I mentioned before, you can actually run multiple models on a single GPU, saving costs, optimizing for efficiency, yet still meeting the requirements for latency and the real-time experience, so that our customers have a good interaction with the AI. >> Awesome. Great. Let's get into the customer examples. You guys have, obviously, great customers. Can you share some of the use cases, examples with customers, notable customers? >> Yeah. One great part about working at NVIDIA is, as a technology company, you get to engage with such amazing customers across many verticals. Some of the ones that are pretty exciting right now, Netflix is using the G4 instances to do video effects and animation content from anywhere in the world, in the cloud, as a cloud content creation platform. We work in the energy field. Siemens Energy is actually using AI combined with simulation to do predictive maintenance on their energy plants, preventing, or optimizing, onsite inspection activities and eliminating downtime, which is saving a lot of money for the energy industry. We have worked with Oxford University. Oxford University actually has over 20 million artifacts and specimens and collections, across its gardens and museums and libraries. They're actually using NVIDIA GPUs and Amazon to do enhanced image recognition to classify all these things, which would take literally years going through each of these artifacts manually. Using AI, we can quickly catalog all of them and connect them with their users. Great stories across graphics, across industries, across research. It's just so exciting to see what people are doing with our technology, together with Amazon. >> Ian, thank you so much for coming on theCUBE. I really appreciate it. A lot of great content there. We probably could go another hour. All the great stuff going on at NVIDIA. Any closing remarks you want to share, as we wrap this last minute up?
>> You know, really what NVIDIA's about is accelerating cloud computing. Whether it be AI, machine learning, graphics, or high-performance computing and simulation. And AWS was one of the first with this, in the beginning, and they continue to bring out great instances to help connect the cloud and accelerated computing with all the different opportunities. The integrations with EC2, with SageMaker, with EKS, and ECS. The new instances with G5 and G5g. Very excited to see all the work that we're doing together. >> Ian Buck, general manager and vice president of Accelerated Computing. I mean, how can you not love that title? We want more power, more, faster, come on. More computing. No one's going to complain about more computing. Ian, thanks for coming on. >> Thank you. >> Appreciate it. I'm John Furrier, host of theCUBE. You're watching Amazon's coverage of re:Invent 2021. Thanks for watching. (bright music)

Published Date : Nov 18 2021

Mark Hinkle | KubeCon + CloudNativeCon NA 2021


 

(upbeat music) >> Greetings from Los Angeles, Lisa Martin here with Dave Nicholson. We are on day three of theCUBE's wall-to-wall coverage of KubeCon CloudNativeCon North America 21. We're pleased to welcome Mark Hinkle to the program, the co-founder and CEO of TriggerMesh. Mark, welcome. >> Thank you, it's nice to be here. >> Lisa: Love the name, very interesting, TriggerMesh. Talk to us about what TriggerMesh does, when you were founded, and what some of the gaps were that you saw in the market. >> Yeah, so TriggerMesh, actually the genesis of the name is in cloud event-driven architecture. You trigger workloads, so that's the trigger in TriggerMesh, and then mesh, we mesh services together, so that's why we're called TriggerMesh. So we're a cloud native open source integration platform. And the idea is that the number of cloud services is proliferating. You still have stuff in your data center that you can't decommission and just wholesale lift and shift to the cloud. So we wanted to provide a platform to create workflows from the data center to the cloud, from cloud to cloud, and use all the cloud native design principles, but not leave your past behind. So that's what we do. We were cloud operators and developers, and we wanted the experience to be very similar to the way that DevOps folks are doing infrastructure as code and deploying that; we want to make it easy to do integration as code. So we follow the same design patterns, use the same domain languages and some of the same tools, like HashiCorp's Terraform, and that's what we do and how we go about doing it. >> Lisa: And when were you guys founded? >> September, 2018. >> Oh, so you're young, you're three years young. >> Three years, it feels like 21 >> I bet. >> in startup years. A lot has happened, but yeah, my co-founder and I were former early cloud folks.
We were at cloud.com, worked through the OpenStack years and the CloudStack, and we just saw the pattern of abstraction coming about. So first you abstract the hardware, then you abstract the operating system. And now, with the Kubernetes container, you know, evolution, you're abstracting it up to the application layer, and we wanted to be able to provide tooling that lets you take full advantage of that. >> Dave: So being founded in 2018, what's your perception of that? The shift that happened during the pandemic in terms of the drive towards cloud adoption and the demands for services like you provide? >> Mark: Yeah, I think it's a mixed blessing. So, people became more remote. They needed to enable digital transformation. Biggest thing, I think, for us is, you know, you don't go to the bank anymore. And the banking industry is doing, you know, exponentially more remote, online transactions than in person. And it's very important. So we decided that financial services is where we were going to start first, because they have a lot of legacy architecture. They have a lot of need to move to the cloud to have better digital experiences. And we wanted to enable them to, you know, keep their mainframes online while they were still doing cutting edge, you know, mobile applications, that kind of thing. >> Lisa: And of course the legacy institutions like the BofAs, the Wells Fargos, they're competing with the fintechs, who are much more nimble, much more agile and able to sort of disrupt the financial services industry. Was that part of also your decision to start in financial services? >> It was a little bit of luck, because we started with our network, and it turned out, you know, we started talking to our friends early on, because we're a startup, and said, this is what we're going to do. And where it really resonated was PNC Bank, which was one of our first customers.
You know, another financial regulatory company was another one, a couple of banks in Europe. And, you know, as we started talking about what we were doing, we just gravitated there because they had the biggest need. Even though everybody has the need, their businesses are, you know, critically tied to digital transformation. >> So starting with financial services. >> It's counterintuitive, isn't it? >> It was counterintuitive, but it lends credibility to any other industry vertical that you're going to approach. >> Yeah, yeah it does. They're going to be our hardest customers, and they have more at stake than a lot of others; like, transactions are millions and millions of dollars per hour for these folks. So they don't want to play around; they have no tolerance for failure. So it's a good start, but it's sort of like taking up jogging and running a marathon in your first week. It's very, very grueling in that sense, but it really has made us a lot better and gave us a lot of insight into the kinds of things we need to do, from not just functionality, but security and that kind of thing. >> Where are you finding these customers with respect to adoption of Kubernetes? Are they leading? Are they knowing we've got to get there eventually from an infrastructure perspective? >> So the interesting thing is Kubernetes is a platform for us to deliver on, so we don't require you to be a Kubernetes expert; we offer it as a SaaS. But what happens is that the Kubernetes folks are the ones that we end up really engaging with earlier on. And I think that we find that they're in this phase of containerizing their apps, that's the first step. And then they're putting them on Kubernetes, and then their next step is a security and integration path. So, I think they call it, and this is my buzzword of the show, day two operations, right?
So they get to day two, and then they have a security and an integration concern before they go live. So they want to be able to make sure that they don't increase their attack surface. And then they also want to make sure that this newly deployed containerized infrastructure is as well integrated as the previous, you know, virtualized or even, you know, on-the-server infrastructure that they had before. >> So TriggerMesh doesn't solely work in the containerized world; you're sort of bridging the divide. >> Mark: Yes. >> What percentage of the workloads that you're seeing are the result of modernization migration, as opposed to standing up net new application environments in Kubernetes? Do you have a sense for that? >> I think we live a lot in the brownfield. So, you know, folks that have an existing project that they're trying to bridge to it, versus the greenfield kind of, you know, the huge wins that you saw in the early cloud days of the Netflixes and the Twitters doing scale. Now we're talking to the enterprises who have, you know, existing concerns. So I would say that it's mostly people that are, you know... very few net new projects, unless it's a modernization and they're getting ready to decommission an old one. >> Dave: So brownfield financial services. You just said, you know, let's just go after that. >> You know, yeah. I mean, it wasn't as if we had a dartboard and we put up buzzwords, but no, it was actually just... and you know, we're still finding our way. As far as early on, we're open source folks, and we did not open source from day one, which is very weird when your whole identity is, you know... I worked, I was the VP of marketing for the Linux Foundation and Node.js and all these open source projects. And my co-founder and I are Apache committers.
And our project wasn't open yet, because we had to get to the point where it could be open and people could be productive in its use and contribution, and we had to staff up engineers. And now, I think this week, we open-sourced our entire platform. And I think that's going to open things up; you know, where we started was not necessarily the lowest hanging fruit, but the profitable, or less profitable, lowest hanging fruit was financial services. Now we are letting our code out into the wild, and I think it'll be interesting to see what comes back. >> So you just announced this week the TriggerMesh integration platform as an open source project here at KubeCon. What's been some of the feedback? >> It's all been positive. I haven't heard anything negative. The culture around open source is very tough; it's very critical if you don't do it right. So I think we did a good job. We used an OSI-approved open source license, the Apache Software License, version 2. We hired someone who was well-respected in the DevRel world, from Chef, who understands the DevOps sort of culture and methodologies. We staffed up our engineers who are going to be helping the free and open source users, so they're successful, and we're betting that that will yield business results down the road. >> Lisa: And what are the two... I see on your website two primary use cases that you guys support. Can you dig into details on that?
And there isn't a good way for clouds to communicate from one to the other, without writing custom blue, which is really what, what we're helping to get rid of is there's a lot of blue written to put together cloud native applications. So that's a workflow, you know, triggering a server less function is the workflow. The other thing is actually breaking up data gravity. So I have a warehouse of data, in my data center, and I want to start replicating some portion of that. As it changes to a database as a service, we can based on an event flow, which is passive. We're not, we're not making, having a conversation like you would with an API where there's an event stream. That's like drinking from the fire hose and TriggerMesh is the nozzle. And we can direct that data to a DBaaS. We can direct that data to snowflake. We can direct that data to a cloud-based data lake on Microsoft Azure, or we can split it up, so some events could go to Splunk and all of the events can go to your data lake or some of those, those things can be used to trigger workloads on other systems. And that event driven architecture is really the design pattern of the individual clouds. We're just making it multi-cloud and on-prem. >> Lisa: Do you have a favorite customer example that you think really articulates that the value of that use case? >> Mark: Yeah I think a PNC is probably our, well for the, for the data flow one, I would say we have a regular to Oracle and one of their customers it was their biggest SMB customer of last year. The Oracle cloud is very, very important, but it's not as tool. It doesn't have the same level of tooling as a lot of the other ones. And to, to close that deal, their regulatory customer wanted to use Datadog. So they have hundreds and hundreds of metrics. And what TriggerMesh did was ingest the hundreds and hundreds of metrics and filter them and connect them to Datadog so that, they could, use Datadog to measure, to monitor workloads on Oracle cloud. 
So that would be an example of the data flow. On the workflow side, PNC Bank is probably our best example. PNC Bank, they want to do... I talked about infrastructure as code and integration as code; they want to do policy as code. They're very highly regulated. And what they used to do is they had policies that they applied against all their systems once a month to determine how much they were in compliance. Well, theoretically, if you do that once a month, it could be 30 days before you knew where you were out of compliance. What we did was provide them a way to take all of the changes within their systems and forward them to a serverless cluster. And they codified all of these policies into serverless functions, and TriggerMesh is triggering their policies as code. So upon change, they're getting almost real-time updates on whether or not they're in compliance. And that's a huge thing. Within their first division, the one we worked with, there are, you know, tens of policies; throughout PNC they have thousands of policies. And so that's really going to revolutionize what they're able to do as far as compliance. And that's a huge use case across the whole banking system. >> That's also a huge business outcome. >> Yes. >> So Mark, where can folks go to learn more about TriggerMesh, maybe even read more specifically about the announcement that you made this week? >> TriggerMesh.com is the best way to get an overview. The open source project is github.com/triggermesh/triggermesh. >> Awesome. Mark, thank you for joining Dave and me, talking to us about TriggerMesh, what you guys are doing, and the use cases that you're enabling for customers. We appreciate your time, and we wish you best of luck as you continue to forge into financial services and other industries. >> Thanks, it was great to be here. >> All right.
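The policy-as-code pattern Mark describes, codifying each compliance rule as a function and re-running all of them on every change event instead of once a month, can be sketched like this. The policies and state fields are invented for illustration; they are not PNC's actual rules.

```python
# Policies as code: each policy is a function from a system's state to
# pass/fail. Evaluating them on every change event gives near-real-time
# compliance status instead of a monthly batch check.
# Policy names and state fields are hypothetical.

POLICIES = {
    "encryption_enabled": lambda s: s.get("encrypted") is True,
    "mfa_required": lambda s: s.get("mfa") is True,
}

def evaluate(state):
    """Return the sorted names of policies the current state violates."""
    return sorted(name for name, check in POLICIES.items() if not check(state))

# A change event updates the state; compliance is re-evaluated immediately.
state = {"encrypted": True, "mfa": False}
violations = evaluate(state)

state["mfa"] = True                    # remediation event arrives
violations_after = evaluate(state)
```

The point of the design is that the check runs on the change, not on a schedule, so the 30-day blind spot of a monthly audit disappears.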
For Dave Nicholson, I'm Lisa Martin, coming to you live from Los Angeles at KubeCon and CloudNativeCon North America 21. Stick around, Dave and I will be right back with our next guest.

Published Date : Oct 15 2021

Ajay Singh, Pure Storage | CUBEconversation


 

(upbeat music) >> The Cloud essentially turned the data center into an API and ushered in the era of programmable infrastructure. No longer do we think about deploying infrastructure in rigid silos with a hardened outer shell; rather, infrastructure has to facilitate digital business strategies. And what this means is putting data at the core of your organization, irrespective of its physical location. It also means infrastructure generally, and storage specifically, must be accessed as sets of services that can be discovered, deployed, managed, secured, and governed in a DevOps model, or OpsDev, if you prefer. Now, this has specific implications as to how vendor product strategies will evolve and how they'll meet modern data requirements. Welcome to this Cube conversation, everybody. This is Dave Vellante. And with me to discuss these sea changes is Ajay Singh, the Chief Product Officer of Pure Storage. Ajay, welcome. >> Thank you, David, glad to be on. >> Yeah, great to have you, so let's talk about your role at Pure. I think you're the first CPO, what's the vision there? >> That's right, I just joined Pure about eight months ago from VMware as the chief product officer, and you're right, I'm the first chief product officer at Pure. And at VMware I ran the Cloud management business unit, which was a lot about automation and infrastructure as code. And it's just great to join Pure, which has a phenomenal all flash product set. I kind of call it the iPhone of flash storage: super easy to use. And how do we take that same ease of use, which is at the heart of the Cloud operating principle, and how do we actually take it up to really deliver a modern data experience, which includes infrastructure and storage as code, but then even more beyond that, how do you do modern operations and then modern data services. So super excited to be at Pure.
And the vision, if you may, at the end of the day, is to provide, leveraging this modern data experience, a connected and effortless data experience, which allows customers to ultimately focus on what matters for them, their business, by really leveraging and managing and winning with their data, because ultimately data is the new oil, if you may, and if you can mine it, get insights from it, you can really drive a competitive edge in the digital transformation ahead. And that's what we intend to help our customers do. >> So you joined earlier this year, kind of, I guess, middle of the pandemic. I'm really interested in kind of your first 100 days, what that was like, what key milestones you set, and now you're into your second 100-plus days. How's that all going? What can you share with us? And that's interesting timing, because with the effects of the pandemic, you came in kind of post that, so you had experience from VMware, and then you had to apply that to the product organization. So tell us about that sort of first 100 days and the sort of mission now. >> Absolutely. So as we talked about, the vision around the modern data experience kind of has three components to it: modernizing the infrastructure, and really, kudos to the team for the work we've been doing, a ton of work in modernizing the infrastructure, I'll briefly talk to that; then, much more than modernizing the data, modernizing the operations, I'll talk to that as well; and then of course, down the pike, modernizing data services. So if you think about it from modernizing the infrastructure, if you think about Pure for a minute, Pure is the first company that took flash to mainstream, essentially bringing what we call consumer simplicity to enterprise storage. The manual for the product fits on the front and back of a business card, that's it; you plug it in, boom, it's up and running, and then you get proactive AI driven support, right? So that was kind of the heart of Pure.
Now, you think about Pure again: what's unique about Pure is that a lot of our competition has dealt with flash at the SSD level, hey, because guess what? All this software was built for hard drives. And so if I can treat NAND as a solid state drive, an SSD, then my software would easily work on it. But with Pure, because we started with flash, we really went straight to the NAND level, as opposed to the SSD layer, and what that does is it gives you greater efficiency, greater reliability and greater performance compared to an SSD, because you can optimize at the chip level as opposed to at the SSD module level. That's one big advantage that Pure has going for itself. And if you look at the physics in the industry for a minute, there's recent data put out by Wikibon early this year, effectively showing that by the year 2026, flash, on a dollar per terabyte basis, just the economics of the semiconductor versus the hard disk, is going to be cheaper than hard disk. So this big inflection point is slowly but surely coming that's going to disrupt the hard disk industry; already the high end has been taken over by flash, but hybrid is next, and then even the long tail is coming up over there. And to that extent we lead, if you may, with the introduction of QLC NAND, which our competition is barely introducing; we've been at it for a while. We just recently this year, in my first 100 days, introduced the FlashArray//C C40 and C60 drives, which really start to open up our ability to go after the hybrid storage market in a big way. It opens up a big new market for us. So great work there by the team. Also, at the heart of it, if you think about it, on the NAND side we have FlashArray, which is a scale-up, latency-centric architecture, and FlashBlade, which is a scale-out, throughput architecture, all operating with NAND. And what that does is it allows us to cover both structured data, unstructured data, tier one apps and tier two apps.
So, pretty broad data coverage in that journey to the all flash data center; slowly but surely we're heading over there to the all flash data center, based on the NAND economics that we just talked about, and we've done a bunch of releases. And then the team has done a bunch of things around introducing NVMe over Fabrics, the kind of thing that you expect them to do. A lot of recognition in the industry for the team, from the likes of TrustRadius and Gartner: FlashArray was named a Gartner Peer Insights Customers' Choice in primary storage, and in the MQ we were a leader. So a lot of kudos and recognition coming to the team as a result. FlashBlade just hit a billion dollars in cumulative revenue, kind of a leader by far in the unstructured data, fast file and object marketplace. And then of course, all the work we're doing around what we call ESG, environmental, social and governance, around reducing carbon footprint, reducing waste, our whole notion of evergreen and non-disruptive upgrades. We also did a lot of work there, where we actually announced that over 2,700 customers have actually done non-disruptive upgrades with the technology. >> Yeah, a lot to unpack there. And a lot of this, sometimes people say, oh, it's the plumbing, but the plumbing is actually very important too, 'cause we're in a major inflection point, when we went from spinning disk to NAND. And it's all about volumes; you're seeing this all over the industry now, you see your old boss, Pat Gelsinger, is dealing with this at Intel. And it's all about consumer volumes in my view anyway, because thanks to Steve Jobs, NAND volumes are enormous, and there are what, two hard disk drive makers left on the planet? I don't know, maybe there's two and a half. But so those volumes drive costs down. And so you're on that curve, and you can debate as to when it's going to happen, but it's not an if, it's a when. Let me shift gears a little bit.
Because Cloud, as I was saying, it's ushered in this API economy, this as a service model, and a lot of infrastructure companies have responded. How are you thinking at Pure about the as a service model for your customers? What's the strategy? How is it evolving, and how does it differentiate from the competition? >> Absolutely, a great question. It kind of segues into the second part of the modern data experience, which is how do you modernize the operations? And that's where automation as a service comes in, because ultimately the Cloud has validated the attractiveness of this model, right? People are looking for outcomes. They care less about how you get there. They just want the outcome. And the as a service model actually delivers these outcomes. And this whole notion of infrastructure as code is kind of the start of it. Imagine if my infrastructure for a developer is just a line of code in a Git repository, in a program that goes through a CI/CD process and automatically kind of is configured and set up; it fits in with the Terraforms, the Ansibles, all the different automation frameworks. And so what we've done is we've gone down the path of really building out what I think is modern operations, with this ability to have storage as code. In addition, modern operations is not just storage as code; we've also recently introduced some comprehensive ransomware protection, that's part of modern operations. There are all the threats you hear in the news about ransomware. We introduced what we call safe mode snapshots, which allow you to recover in literally seconds when you have a ransomware attack. We also have, in modern operations, Pure1, which is maybe the leader in AI driven support to prevent downtime. We actually call you 80% of the time and fix the problems without you knowing about it. That's what modern operations is all about.
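The safe mode idea, snapshots that even an attacker holding admin credentials cannot delete before a retention window expires, can be illustrated with a toy model. This is my sketch of the concept only, not Pure's implementation or API; all names are invented.

```python
# Toy model of "safe mode" snapshots: deletion is refused until the
# retention window has elapsed, so ransomware with admin credentials
# cannot destroy the recovery points. Illustrative only.

class SafeModeSnapshots:
    def __init__(self, retention_seconds):
        self.retention = retention_seconds
        self.snapshots = {}                # name -> creation timestamp

    def take(self, name, now):
        self.snapshots[name] = now

    def delete(self, name, now):
        created = self.snapshots[name]
        if now - created < self.retention:
            raise PermissionError(f"{name} is retention-locked")
        del self.snapshots[name]           # only allowed after retention

store = SafeModeSnapshots(retention_seconds=3600)
store.take("hourly-01", now=0)

try:
    store.delete("hourly-01", now=600)     # attacker tries at t = 10 min
    deleted_early = True
except PermissionError:
    deleted_early = False                  # refused: still locked

store.delete("hourly-01", now=4000)        # routine cleanup after retention
```

The design choice worth noting is that the lock is enforced by the storage layer itself, below any credential an attacker could steal, which is what makes the recovery point trustworthy.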
And then modern operations also asks, okay, you've got flash on your on-prem side, but you're maybe even using flash in the public Cloud; how can I have a seamless multi-Cloud experience? Our Cloud Block Store, which we've introduced on Amazon AWS and Azure, allows one to do that. And then finally, for modern applications, if you think about it, this whole notion of infrastructure as code, as a service, software driven storage: the Kubernetes infrastructure enables one to really deliver a great automation framework that reduces the labor required to manage the storage infrastructure and delivers it as code. And, kudos to Charlie and the Pure Storage team before my time with the acquisition of Portworx, Portworx today truly delivers storage as code, orchestrated entirely through Kubernetes, in a multi-Cloud hybrid situation. So it can run on EKS, GKE, OpenShift, Rancher, Tanzu; it was recently announced as the leader by GigaOm for enterprise Kubernetes storage. We were really proud about that asset. And then finally, the last piece is Pure as a Service. That's also all outcome oriented: SLAs. What matters is you sign up for SLAs, and then you get those SLAs, very different from our competition, right? Our competition tends to be a lot more around financial engineering, hey, you can buy it OPEX versus CapEx, but you get the same thing with a lot of professional services. We've really got, I'd say, a couple of years' lead on actually delivering and managing to the SLAs with SRE engineers. So a lot of great work there. We recently also introduced Cisco FlashStack as a service, again, as a service, a validation of that. And then finally, we also recently did an announcement with Equinix, with their bare metal as a service, where we are a key part of their bare metal as a service offering, again, pushing the as-a-service strategy.
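The "storage as code" idea Ajay describes, a declarative spec kept in Git that an orchestrator continually reconciles against actual state, is the same controller pattern Kubernetes operators use. A minimal sketch, with invented field names rather than any real CRD schema:

```python
# Sketch of declarative storage as code: compare the desired spec (the
# kind of thing you'd keep in a Git repository) with actual state and
# compute the actions a controller would take. Field names are made up.

desired = {"vol-app1": {"size_gb": 100}, "vol-app2": {"size_gb": 50}}
actual = {"vol-app1": {"size_gb": 50}}          # drifted and incomplete

def reconcile(desired, actual):
    """Return the sorted list of actions needed to make actual match desired."""
    actions = []
    for name, spec in desired.items():
        if name not in actual:
            actions.append(("create", name, spec["size_gb"]))
        elif actual[name]["size_gb"] != spec["size_gb"]:
            actions.append(("resize", name, spec["size_gb"]))
    for name in actual:
        if name not in desired:
            actions.append(("delete", name, None))
    return sorted(actions)

plan = reconcile(desired, actual)
```

Because the loop compares desired and actual state rather than replaying an imperative script, the same spec can be applied repeatedly and across clusters, which is what makes the approach multi-cloud friendly.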
So yes, this is big for us; that's where the puck is skating, with enterprises, even on-prem, wanting to consume things in the Cloud operating model. And so that's where we're putting a lot. >> I see, so your contention is, it's not just this CapEx to OPEX thing; that was kind of the big thing for CFOs during the economic downturn of 2007, 2008, the economic crisis. So that's kind of yesterday's news. What you're saying is you're creating a Cloud-like operating model, as I was saying upfront, irrespective of physical location. And I see that as your challenge, the industry's challenge: if I'm going to effect the digital transformation, I don't want to deal with the Cloud primitives. I want you to hide the underlying complexity of that Cloud. I want to deal with higher level problems. But so that brings me to digital transformation, which is kind of the now initiative, or I even sometimes call it the mandate. There's not a one size fits all for digital transformation, but I'm interested in your thoughts on the must-take steps, universal steps that everybody needs to think about in a digital transformation journey. >> Yeah, so ultimately the digital transformation is all about how companies gain a competitive edge in this new digital world, where the companies and the competition are changing the game, right? So you want to make sure that you can rapidly try new things, fail fast, innovate and invest, but speed is of the essence, and agility and the Cloud operating model enable that agility. And so what we're also doing is not only are we driving agility in a multicloud kind of data infrastructure, data operation fashion, but we're also taking it a step further. We're also on the journey to deliver modern data services.
Imagine, on a Pure on-prem infrastructure, along with the different public Clouds that you're working on with the Kubernetes infrastructures, you could, with a few clicks, run Kafka as a service, TensorFlow as a service, Mongo as a service. So, as a technology team, I can truly become a service provider, and not just an on-prem service provider, but a multi-Cloud service provider, such that these services can be used to analyze the data that you have, not only your data, but partner data and third party public data, and how you can marry those different data sets, analyze them to deliver new insights that ultimately give you a competitive edge in the digital transformation. So you can see data plays a big role there. The data is what generates those insights. Your ability to match that data with partner data and public data, with the analysis services ready to go, is how you get the insights. You can really start to separate yourself from your competition and get on the leaderboard a decade from now, when this digital transformation settles down. >> All right, so bring us home, Ajay, summarize: what does a modern data strategy look like, and how does it fit into a digital business or a digital organization? >> So look, at the end of the day, data and analysis both play a big role in the digital transformation. And it really comes down to how do I leverage this data, my data, partner data, public data, to really get that edge. And that links back to our vision: how do we provide that connected and effortless modern data experience that allows our customers to focus on their business, to get the edge in the digital transformation, by easily leveraging, managing and winning with their data. And that's the heart of where Pure is headed. >> Ajay Singh, thanks so much for coming inside theCUBE and sharing your vision. >> Thank you, Dave, it was a real pleasure. >> And thank you for watching this Cube conversation.
This is Dave Vellante and we'll see you next time. (upbeat music)

Published Date : Aug 18 2021



2021 AWSSQ2 069 AWS Krishna Gade and Amit Paka


 

(upbeat music) >> Hello and welcome to theCUBE as we present AWS Startup Showcase, The Next Big Thing in AI, Security & Life Sciences, the hottest startups. And today's session is in the AI track, which is really a big one, maybe the most important. And we have a featured company, fiddler.ai. I'm your host, John Furrier with theCUBE. And we're joined by the founders, Krishna Gade, founder and CEO, and Amit Paka, founder and Chief Product Officer. Great to have the founders on. Gentlemen, thank you for coming on this Cube segment for the AWS Startup Showcase. >> Thanks, John... >> Good to be here. >> So the topic of this session is staying compliant and accelerating AI adoption and model performance monitoring. Basically, bottom line is how to be innovative with AI and stay (John laughs) within the rules of the road, if you will. So, super important topic. Everyone knows the benefits of what AI can do. Everyone sees machine learning being embedded in every single application, but the business drivers of compliance and all kinds of new kinds of regulations are popping up. The question is how do you stay compliant? Which is essentially how do you not foreclose the future opportunities? That's really the question on everyone's mind these days. So let's get into it. But before we start let's take a minute to explain what you guys do. Krishna, we'll start with you first. What does fiddler.ai do? >> Absolutely, yeah. Fiddler is a model performance management platform company. We help, you know, enterprises and mid-market companies to build responsible AI by helping them continuously monitor their AI, analyze it, explain it, so that they know what's going on with their AI solutions at any given point of time. And they can ensure that, you know, their businesses are intact and they're compliant with all the regulations that they have in their industry. >> Everyone thinks AI is a secret sauce.
It's magic beans that will automatically just change the company. (John laughs) So it's kind of like, it's almost like a hope. But the reality is there is some value there, but there's something that has to be done first. So let's get into what this model performance management is, because it's a concept that needs to be understood well, but also you've got to implement it properly. There's some foundational things, you've got to, you know, crawl before you walk and walk before you run kind of thing. So let's get into it. What is model performance management? >> Yeah, that's a great question. So the core software artifact of most AI systems is called an AI model. It essentially represents the patterns inside data in such a manner that it can actually predict the future. Now, for example, let's say I'm trying to build an AI based credit underwriting system. What I would do is I would look at the historical, you know, loans data. You know, good loans and bad loans. And then, I would build a model that can capture those patterns so that when a new customer comes in I can actually predict, you know, how likely they are to default on the loan much more accurately. And this helps me as a bank or fintech company to produce more good loans for my company and ensure that my customer is, you know, getting the right customer service. Now, the problem though is this AI model is a black box. Unlike regular software code, you cannot really open it up and read its code and see its patterns and how it is working. And so that's where the risks around the AI models come along. And so you need ways to actually explain it. You need to understand it and you need to monitor it. And this is where a model performance management system like Fiddler can help you look into that black box. Understand how it's doing it, monitor its predictions continuously, so that you know what these models are doing at any given point of time.
>> I mean, I'd love to get your thoughts on this, because on the product side, first of all, totally awesome concept. No one debates that. But now you've got more and more companies integrating with each other, more data's being shared. And so, you know, everyone knows what an app sec review is, right? But now they're thinking about this concept of how do you do a review of models, right? So understanding what's inside the black box is a huge thing. How do you do this? What does it mean? >> Yeah, so typically what you would do is, it's just like software, where you would validate software code going through QA and analysis. In case of models you would try to probe the model at different granularities to really understand how the model is behaving. This could be at a model prediction level, in case of the loans example Krishna just gave: why is my model saying high-risk for this particular loan? Or it might be in case of explaining groups of loans. For example, why is my model making high-risk predictions for loans made in California, or loans made to all men versus loans made to all women? And it could also be at the global level. What are the key data factors important to my model? So the ability to probe the model deeper, and really open up the black box, and then use that knowledge to explain how the model is working to non-technical folks in compliance. Or to folks who are regulators, who just want to ensure that they know how the model works, to make sure that it's keeping up with kind of lending regulations, to ensure that it's not biased and so on. So that's typically the way you would do it with a machine learning model. >> Krishna, talk about the potential embarrassments that could happen. You just mentioned some of the use cases you heard from Amit, you know, female, male. I mean, machines aren't that smart. (John laughs) >> Yeah. >> If they don't have the data. >> Yeah.
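The group-level probing Amit describes, comparing a model's behavior across slices like loans made in California or to men versus women, can be sketched in a few lines. This is an illustrative sketch, not Fiddler's implementation; the data, labels and function name are all hypothetical:

```python
import numpy as np

def slice_report(y_true, y_pred, groups):
    """Compare high-risk prediction rate and error rate across data slices."""
    report = {}
    for g in np.unique(groups):
        mask = groups == g
        report[g] = {
            "n": int(mask.sum()),
            "high_risk_rate": float(y_pred[mask].mean()),
            "error_rate": float((y_pred[mask] != y_true[mask]).mean()),
        }
    return report

# Hypothetical loan decisions (1 = high risk), sliced by state
y_true = np.array([0, 1, 0, 0, 1, 0, 1, 0])
y_pred = np.array([0, 1, 1, 0, 1, 0, 1, 1])
groups = np.array(["CA", "CA", "CA", "CA", "TX", "TX", "TX", "TX"])
print(slice_report(y_true, y_pred, groups))
```

A real tool would run a report like this over every sensitive attribute and surface slices whose rates diverge sharply from the global average.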
>> And data is fragmented, you've got silos with all kinds of challenges just on the data problem, right? >> Yeah. >> So never mind the machine learning problems. So, this is huge. I mean, the embarrassment opportunities. >> Yeah. >> And the risk management, whether it's a hack or something else. So you've got public embarrassment from doing something that really went wrong. And then, you've got the real business impact that could be damaging. >> Absolutely. You know, AI has come forward a lot, right? I mean, you know, you have lots of data these days. You have a lot of computing power and amazing algorithms, so you can actually build really sophisticated models. Some of these models were known to beat humans in image recognition and whatnot. However, the problem is there are risks in using AI, you know, without properly testing it, without properly monitoring it. For example, a couple of years ago, Apple and Goldman Sachs launched a credit card, right? For their users, where they were using algorithms, presumably AI or machine learning algorithms, to set credit limits. What happened was, within the same household, husband and wife got 10 times difference in the credit limits being set for them. And some of these people had similar FICO scores, similar salary ranges. And some of them went online and complained about it, and that included the likes of Steve Wozniak as well. >> Yeah. >> So these kinds of stories are hugely embarrassing, where you could lose customer trust overnight, right? And, you know, you have to do a lot of PR damage control. Eventually, there was a regulatory probe into Goldman Sachs. So there are these problems if you're not properly monitoring AI systems, properly validating and testing them before you launch to the users. And that is why tools like Fiddler are coming forward, so that, you know, enterprises can do this. So that they can ensure responsible AI for both their organization as well as their customers.
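A pre-launch check for the kind of disparity Krishna mentions, husband and wife getting wildly different limits, can start as simply as comparing favorable-outcome rates between groups. This is a toy sketch with made-up numbers; the 0.8 cutoff is the common "four-fifths" rule of thumb, not a claim about any specific regulation or about how Apple or Goldman Sachs actually worked:

```python
def disparate_impact(outcomes_a, outcomes_b):
    """Ratio of favorable-outcome rates between two groups.
    Values well below 1.0 flag a potential bias problem."""
    rate_a = sum(outcomes_a) / len(outcomes_a)
    rate_b = sum(outcomes_b) / len(outcomes_b)
    return rate_a / rate_b

# Hypothetical high-limit approvals (1 = approved) by gender
women = [1, 0, 0, 1, 0, 0, 0, 0]  # 25% approved
men   = [1, 1, 0, 1, 1, 0, 1, 0]  # 62.5% approved
ratio = disparate_impact(women, men)
print(ratio, "flag" if ratio < 0.8 else "ok")  # 0.4 flag
```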
>> That's a great point, I want to get into this. What it kind of means, and the industry side of it? And then, how that impacts customers? If you guys don't mind: in machine learning operations, a term, MLOps, has been coined in the industry, as you know. Basically, operations around machine learning, which kind of gets into the workflows and development life cycles. But ultimately, as you mentioned, there's this black box and this model being made. There's a heavy reliance on data. So Amit, what does this mean? Because now it becomes operational with MLOps. There are now internal workflows and activities and roles and responsibilities. How is this changing organizations? You know, separate from the embarrassment, which is totally true. Now I've got an internal operational aspect and there's dev involved. What's the issue? >> Yeah, so typically, if you look at the whole life cycle of machine learning ops, in some ways it mirrors the traditional life cycle of kind of DevOps, but in some ways it introduces new complexities. Specifically, because the models can be a black box. That's one thing to kind of watch out for. And secondly, because these models are probabilistic artifacts, which means they are trained on data to capture relationships, so that they can potentially make high-accuracy predictions. But the data that they see in live use might actually differ, and that might hurt their performance, especially because machine learning is applied towards these high ROI use cases. So this process of MLOps needs to change to incorporate the fact that machine learning models can be black boxes and machine learning models can decay. And so the second part I think that's also relevant is, because machine learning models can decay, you don't just create one model, you create multiple versions of these models. And so you have to constantly stay on top of how your model is deviating from actual reality, and kind of bring it back to that representation of reality.
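The decay Amit describes is usually caught by comparing the distribution of live inputs against the distribution the model was trained on. One common metric for this is the Population Stability Index (PSI); the sketch below is illustrative only, with synthetic data standing in for real applicant features, and is not a description of Fiddler's internals:

```python
import numpy as np

def psi(expected, actual, bins=10):
    """Population Stability Index between training-time and live values.
    Rule of thumb: < 0.1 stable, 0.1-0.25 drifting, > 0.25 act now."""
    edges = np.histogram_bin_edges(expected, bins=bins)
    e_pct = np.histogram(expected, bins=edges)[0] / len(expected)
    a_pct = np.histogram(actual, bins=edges)[0] / len(actual)
    e_pct = np.clip(e_pct, 1e-6, None)  # avoid division by, or log of, zero
    a_pct = np.clip(a_pct, 1e-6, None)
    return float(np.sum((a_pct - e_pct) * np.log(a_pct / e_pct)))

rng = np.random.default_rng(0)
train_ages = rng.normal(55, 8, 10_000)  # historical applicant ages
live_ages = rng.normal(30, 5, 10_000)   # a new, younger applicant pool
print(psi(train_ages, live_ages))       # a large value signals drift
```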
So this is interesting, I like this. So now there's a model for the model. So this is interesting. You guys have innovated on this model performance management idea. Can you explain the framework and how you guys solve that regulatory compliance piece? Because you can be a model of the model, if you will. >> Then. >> Then you can have some stability around maintaining the code base, or the integrity of the model. >> Okay. >> How does that? What do you guys offer? Take us through the framework and how it works, and then how it ties to that regulatory piece? >> So the MPM system, or the model performance management system, really sits at the heart of the machine learning workflow. Keeping track of the data that is flowing through your ML life cycle, keeping track of the models that are, you know, getting created and getting deployed, and how they're performing. Keeping track of all the parts of the models. So it gives you a centralized way of managing all of this information in one place, right? It gives you an oversight, from a compliance standpoint, from an operational standpoint, of what's going on with your models in production. Imagine you're a bank, you're probably creating hundreds of these models for a variety of use cases: credit risk, fraud, anti-money laundering. How are you going to know which models are actually working very well? Which models are stale? Which models are expired? How do you know which models are underperforming? You know, are you getting alerts? So this kind of governance, this performance management, is what the system offers. It's a visual interface, lots of dashboards, that the developers, operations folks, compliance folks can go and look into. And then they would get alerts when things go wrong with respect to their models. In terms of how it can be helpful in meeting compliance regulations: for example, let's say I'm starting to create a new credit risk model in a bank.
Now, I'm innovating on different AI algorithms here. Immediately, before I even deploy that model, I have to validate it. I have to explain it and create a report so that I can submit it to my internal risk management team, which can then review it, you know, understand all kinds of risks around it. And then potentially share it with the audit team, and then keep a log of these reports, so that when a regulator comes and visits them, you know, they can share these reports. These are the model reports that show how the model was created. Fiddler helps them create these reports, keep all of these reports in one place. And then once the model is deployed, you know, it basically can help them monitor these models continuously. So that they don't just have one ad hoc report from when the model was created upfront, they can have continuous monitoring, a continuous dashboard, of what it was doing over however many months it was running for. >> You know what? >> Historically, if you were to regulate, like, all AI applications in the U.S., the legacy regulations are the ones that are applied today, like equal credit opportunity, or the Fed guidelines like SR 11-7, guidance that's applicable to all banks. So there is no purpose-built AI regulation, but the EU released a proposed regulation just about three weeks back. That classifies risk within applications, and specifically for high-risk applications they propose new oversight, adding mandates for explainability, helping teams understand how the models are working, and monitoring, to ensure that when a model is trained for high accuracy, it maintains that. So those two mandatory needs for high-risk applications, those are the ones that are solved by Fiddler. >> Yeah, this is, you mentioned explainable AI. Could you just quickly define that for the audience? Because this is a trend we're seeing a lot more of. Take a minute to explain what is explainable AI?
Yeah, as I said in the beginning, you know, an AI model is a new software artifact that is being created. It is the core of an AI system. It's what represents all the patterns in the data, encodes them, and then uses that knowledge to predict the future. Now, how it encodes all of these patterns is black magic, right? >> Yeah. >> You really don't know how the model is working. And so explainable AI is a set of technologies that can help you unlock that black box. You know, quote-unquote debug that model, where the model is introspected, inspected, probed, whatever you want to call it, to understand how it works. For example, let's say I created an AI model that, again, predicts, you know, loan risk. Now let's say a person comes to my bank and applies for a $10,000 loan, and the bank rejects the loan, or the model rejects the loan. Now, why did it do it, right? That's a question that explainable AI can answer. It can answer, hey, you know, the person's salary range, you know, is contributing 20% of the loan risk, or this person's previous debt is contributing 30% of the loan risk. So you can get a detailed set of dashboards, in terms of taking the composite loan risk and attributing it to all the inputs that the model is observing. And so, you now know how the model is treating each of these inputs. And so now you have an idea of where the person is getting affected by this loan risk. So now as a human, as an underwriter or a lending officer, I have knowledge about how the model is working. I can then layer my human intuition on top of it. I can approve the model's decision sometimes, I can disapprove it sometimes. I can use this feedback and deliver it to the data science team, the AI team, so they can actually make the model better over time. So this unlocking of the black box has several benefits throughout the life cycle. >> That's awesome. Great definition.
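For intuition, with a simple linear risk score the kind of attributions Krishna describes fall out directly from the weights. This toy sketch is not how Fiddler computes explanations (production systems use techniques such as Shapley values that also handle non-linear models); the coefficients and feature names here are made up:

```python
def attributions(weights, x, baseline):
    """Per-feature contribution of input x relative to a baseline applicant,
    for a linear score: risk = sum(w[f] * x[f])."""
    return {f: w * (x[f] - baseline[f]) for f, w in weights.items()}

weights   = {"salary": -0.4, "existing_debt": 0.6}  # hypothetical model
applicant = {"salary": 0.3, "existing_debt": 0.9}   # normalized features
baseline  = {"salary": 0.5, "existing_debt": 0.5}   # an 'average' applicant
print(attributions(weights, applicant, baseline))
# salary adds about +0.08 and existing_debt about +0.24 to the risk score
```

Each number answers exactly the underwriter's question above: how much did this input push this applicant's risk above or below the average applicant's.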
Great call. I want to get that on the record for the audience. Also, we'll make a clip out of that too. One of the things that, Amit, you brought up that I love and want to get into is this MLOps impact. So as we were just talking earlier, debugging models in production, totally cool, relevant, unpacking the black box. But model decay, that's an interesting concept. Can you explain more? Because this to me, I think, is potentially a big blind spot for the industry, because, you know, I talked to Swami at Amazon, who runs their AI group, and, you know, they want to make AI easier and ML easier with SageMaker and other tools. But you can fall into a trap of thinking everything's one and done. It's iterative, and you've got leverage here. You've got to keep track of the performance of the models, not just debug them. Are they actually working? Is there new data? This is a whole other practice. Could you explain this concept of model decay? >> Yeah, so let's look at the lending example Krishna was just talking about. Say you expect your customers to be, you know, typical applicants, right? So you will have examples in your training set which might have historical loans made to people between the ages of 40 and, let's say, 70. And so you will train your model, and your model will be trained for highest accuracy in making loans to these types of applicants. But now let's say you introduce a new loan product that you're targeting at, let's say, younger, college-going folks. So that model is not trained to work well in those kinds of scenarios. Or it could also happen that you could get a lot more older people coming in to apply for these loans. So the data that the model can see in live use might not represent the data that you trained the model with. And the model has recognized relationships in this data, and it might not recognize relationships in this new data.
So this is a constant, I would say, an ongoing challenge that you would face when you have a live model: ensuring that reality meets the representation of reality you trained the model with. And so this is something that's unique to machine learning models, and it has not been a problem historically in the world of DevOps. But it is a very key problem in MLOps. >> This is a really great topic. And most people who are watching might know of some of these problems when they see the mainstream press talk about fairness, black versus white skin, and bias in algorithms. I mean, that's kind of the way the press talks about those kinds of big high-level topics. But what it really means is that the data (John laughs) has fairness and bias and skewing problems, and all kinds of new things come up that the machines just can't handle. This is a big deal. So this is happening to every part of data in an organization. So, great problem statement. I guess the next segue would be, why Fiddler, why now? What are you guys doing? How are you solving these problems? Take us through some use cases. How do people engage with you guys? How do you solve the problem, and how do you guys see this evolving? >> Great, so Fiddler is a purpose-built platform to solve for model explainability, model monitoring and model bias detection. This is the only thing that we do, right? So we are super focused on building this tool to be useful across a variety of, you know, AI problems, from financial services to retail, to advertising, to human resources, healthcare and so on and so forth. And so we have found a lot of commonalities around how data scientists are solving these problems across these industries. And we've created a system that can be plugged into their workflows. For example, I could be a bank, you know, creating anti-money laundering models on a modern AI platform like TensorFlow.
Or I could be a retail company that is building recommendation models in, you know, a library like PyTorch. You can bring all of those models under one sort of umbrella using Fiddler. We can support a variety of heterogeneous types of models. And that is a very, very hard technical problem to solve. To be able to ingest and digest all these different model types, and then provide a single pane of glass in terms of how the model is performing, explaining the model, tracking the model life cycle throughout its existence, right? And so that is the value prop that Fiddler offers the MLOps team, so they can get this oversight. And so this plugs in nicely with their MLOps, so they don't have to change anything, and it gives the additional benefit... >> So, you're basically creating faster outcomes because the teams can work on real problems. >> Right. >> And not have to deal with the maintenance of model management. >> Right. >> Whether it's debugging or decay evaluations, right? >> Right, we take care of all of their model operations from a monitoring standpoint, analysis standpoint, debuggability, alerting. So that they can just build the right kind of models for their customers. And we give them all the insights and intelligence to know the problems behind those models and behind their datasets. So that they can actually build more accurate, more responsible models for their customers. >> Okay, Amit, give us the secret sauce. What's going on in the product? How does it all work? What's the secret sauce? >> So there are three key kind of pillars to the Fiddler product. One is, of course, we leverage the latest research, and we actually productize that in amazing ways, where when you explain models you get the explanation within a second. So this activates new use cases like, let's say, counterfactual analysis. You can not only get explanations for your loan, you can also see hypothetically.
What if this loan applicant, you know, had a higher income? What would the model do? So, that's one part: productizing the latest research. The second part is infrastructure at scale. So we are not just building something that would work for SMBs. We are building something that works at enterprise scale. So billions and billions of predictions, right? Flowing through the system. We want to make sure that we can handle as large a scale as seamlessly as possible. So we are trying to activate that and making sure we are the best enterprise-grade product on the market. And thirdly, user experience. What you'll see when you use Fiddler. Finally, when we do demos to customers, what they really see is the product. They don't see the scale right then and there. They don't see the deep research. What they see, what they see are these beautiful experiences that are very intuitive to them. Where we've merged explainability and monitoring and bias detection in a seamless way. So you get the most intuitive experiences, that are not just designed for the technical user, but also for the non-technical users, who are also stakeholders within AI. >> So the scale thing is a huge point, by the way. I think that's something that you see in successful companies. That's a differentiator, and frankly, it's the new sustainability. So new lock-in, if you will, not in a bad way but in a good way. You do a good job. You get scale, you get leverage. I want to just point out and get your guys' thoughts on your approach on the framing, where you guys are centralized. >> Right. >> So as decentralization continues to be a wave, you guys are taking much more of a centralized approach. Why is that done? Take us through the decision on that. >> Yeah. So, I mean, in terms of, you know, decentralization, in terms of running models on different, you know, containers, and, you know, scoring them on multiple nodes, that absolutely makes sense, right?
From a deployment standpoint, from an inference standpoint. But when it comes to actually, you know, understanding how the models are working, visualizing them, monitoring them, knowing what's going on with the models, you need a centralized dashboard that an MLOps user can actually use, or a head of AI governance inside a bank, to see: what are all the models that my team is shipping? You know, which models carry risk, you know? How are these models performing last week? For this, you need a centralized repository. Otherwise, it'll be very, very hard to track these models, right? Because the models are going to grow really, really fast. You know, there are so many open source libraries, open source model architectures being produced. And so many data scientists coming out of grad schools and whatnot. And the number of models in an enterprise is just going to grow many, many fold in the coming years. Now, how are you going to track all of these things without having a centralized platform? And that's what we envisaged a few years ago, that every team will need an oversight tool like Fiddler, which can keep track of all of their models in one place. And that's what we are finding from our customers. >> As long as you don't get in the way of them creating value, which is the goal, right? >> Right. >> And be frictionless, take away the friction. >> Yeah. >> And enable it. Love the concept. I think you guys are on to something big there, great products. Great vision. The question I have for you, to kind of wrap things up here, is that this is all new, right? And new, it's all goodness, right? If you've got scale in the Cloud, all these new benefits. Again, more techies coming out of grad school in Computer Science and Engineering, and just data analysis in general is changing. And there's more people to be democratized, to be contributing. >> Right. >> How do you operationalize it? How do companies get this going? Because you've got a new thing happening. It's a new wave. >> Okay.
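The centralized oversight Krishna describes, one place that knows every model, its version, and whether it is underperforming, can be caricatured in a few lines. This is a toy registry for illustration only; the class, fields, model names and the AUC threshold are all invented, not Fiddler's schema:

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional

@dataclass
class ModelRecord:
    name: str
    version: int
    use_case: str
    deployed: date
    latest_auc: Optional[float] = None  # most recent monitored performance

class ModelRegistry:
    """One place to see every deployed model and flag degraded ones."""
    def __init__(self):
        self.records: List[ModelRecord] = []

    def register(self, record: ModelRecord):
        self.records.append(record)

    def underperforming(self, auc_floor: float = 0.7):
        return [r.name for r in self.records
                if r.latest_auc is not None and r.latest_auc < auc_floor]

reg = ModelRegistry()
reg.register(ModelRecord("credit_risk", 3, "lending", date(2021, 1, 5), 0.81))
reg.register(ModelRecord("aml_screen", 1, "anti-money laundering",
                         date(2020, 11, 2), 0.64))
print(reg.underperforming())  # ['aml_screen']
```

The point of the sketch is the single pane of glass: one query answers "which of my hundreds of models needs attention," regardless of which framework each model was built in.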
>> But it's still the same game, make business run better. >> Right. >> So you've got to deploy something new. What's the operational playbook for companies to get started? >> Absolutely. First step is, if a company is trying to install AI, to incorporate AI into their workflow. You know, most companies, I would say, are still in early stages, right? A lot of enterprises are still, you know, developing these models. Some of them may have been in labs. ML operationalization is starting to happen, and it probably started a year or two ago, right? So now when it comes to, you know, putting AI into practice: so far, you know, you could have AI models in labs. They're not going to hurt anyone. They're not going to hurt your business. They're not going to hurt your users. But once you operationalize them, then you have to do it in a proper manner, in a responsible manner, in a trustworthy manner. And so we actually have a playbook in terms of how you would have to do this, right? How are you going to test these models? How are you going to analyze and validate them before they actually are deployed? How are you going to analyze, you know, look into data bias and training set bias, or test set bias? And once they are deployed to production, are you tracking, you know, model performance over time? Are you tracking drifting models? You know, the decay part that we talked about. Do you have alerts in place for when model performance goes all over the place? Now, all of a sudden, you get a lot of false positives in your fraud models. Are you able to track them? Do you have the personnel in place? The data scientists, the ML engineers, the MLOps engineers, the governance teams in place, if it's in a regulated industry, to use these tools. And then, tools like Fiddler will add value, will help them, you know, do their job, institutionalize this process of responsible AI. So that they're not only reaping the benefits of this great technology.
There's no doubt about the AI, right? It's actually, it's going to be game-changing, but then they can also do it in a responsible and trustworthy manner. >> Yeah, it's really: get some wins, get some momentum, see it. This is the Cloud way. It gets them some value immediately, and they grow from there. I was talking to a friend the other day, Amit, about IT, the department. He said, I don't worry about IT, it's all in the Cloud. I go, there's no longer IT, IT is dead. It's an AI department now. (Amit laughs) So this is kind of what you guys are getting at. Now it's data, now it's AI. It's kind of like what IT used to be, enabling organizations to be successful. You guys are looking at it from the perspective of the same way IT enabled success. The point is that you provision (John laughs) algorithms instead of servers, they're algorithms now. This is the new model. >> Yeah, we believe that all companies in the future, as happened with this wave of data, are going to be AI companies, right? So it's really just a matter of time. And the companies that are first movers in this are going to have a significant advantage, like we're seeing that in banking already. Where the banks that have made the leap into AI are reaping the benefits of enabling a lot more models at the same risk profile using deep learning models. As long as you're able to, like, validate these to ensure that they're meeting kind of like the regulations. But it's going to give significant advantages to a lot of companies as they move faster with respect to others in the same industry.
I mean, the real reality today is, you know less than 10% of the CEOs care about ethical AI, right? And that has to change. And I think, you know, and I think that has to change for the better, because at the end of the day, if you are using AI, if you're not using in a responsible and trustworthy manner then there is like regulation. There is compliance risk, there's operational business risk. You know, customer trust. Losing customers trust can be huge. So I think, you know, we want to provide that you know, insurance, or like, you know like a preventative mechanism. So that, you know, if you have these tools in place then you're less likely to get into those situations. >> Awesome. Great, great conversation, Krishna, Amit. Thank you for sharing both the founders of Fiddler.ai. Great company. On the right side of history in my opinion, the next big thing in AI. AI departments, AI compliance, AI reporting. (John laughs) Explainable AI, ethical AI, all part of this next revolution. Gentlemen, thank you for joining us on theCUBE Amazon Startup Showcase. >> Thanks for having us, John. >> Okay, it's theCUBE coverage. Thank you for watching. (upbeat music)

Published Date : May 28 2021


Brian Gracely, Red Hat | KubeCon + CloudNativeCon Europe 2021 - Virtual


 

>> From around the globe, it's theCUBE, with coverage of KubeCon and CloudNativeCon Europe 2021 Virtual. Brought to you by Red Hat, the Cloud Native Computing Foundation and ecosystem partners. >> Hello, welcome back to theCUBE's coverage of KubeCon 2021 CloudNativeCon Europe Virtual, I'm John Furrier your host, here for a preview with Brian Gracely from Red Hat, Senior Director Product Strategy, Cloud Business Unit. Brian Gracely, great to see you. Former CUBE host, CUBE alumni, big time strategist at Red Hat, great to see you, always great. And also the founder of Cloudcast, which is an amazing podcast on cloud, part of the cloud (indistinct), great to see you Brian. Hope all's well. >> Great to see you too, you know for years, theCUBE was always sort of the ESPN of tech, I feel like, you know ESPN has become nothing but highlights. This is where all the good conversation is. theCUBE has become sort of the clubhouse of tech, if you will. I know that's an area you're focused on, so yeah I'm excited to be back on and good to talk to you. >> It's funny you know, with all the events going away we loved going out extracting the signal from the noise, you know, game day kind of vibe. CUBE Virtual has really expanded, so it's been so much more fun because we can get more people to dial in easily. So we're going to keep that feature post COVID. You're going to hear more about theCUBE Virtual; hybrid events are going to be a big part of it, which is great because as you know, and we've talked about, communities and ecosystems are a huge advantage right now, and it's been a big part of the Red Hat story. Now part of IBM, bringing that mojo to the table, the role of ecosystems with hybrid cloud is so critical. Can you share your thoughts on this? Because I know you study it, you have podcasts, you've had one for many years, you understand that democratization and this new direct to audience kind of concept. Share your thoughts on this new ecosystem.
>> Yeah, I think so, you know, we're sort of putting this in the context of what we all sort of familiarly call KubeCon but you know, if we think about it, it started as KubeCon it was sort of about this one technology but it's always been CloudNativeCon and we've sort of downplayed the cloud native part of it. But even if we think about it now, you know Kubernetes to a certain extent has kind of, you know there's this feeling around the community that, that piece of the puzzle is kind of boring. You know, it's 21 releases in, and there's lots of different offerings that you can get access to. There's still, you know, a lot of innovation but the rest of the ecosystem has just exploded. So it's, you know, there are ecosystem partners and companies that are working on edge and miniaturization. You know, we're seeing things like Kubernetes now getting into outer space and it's in the space station. We're seeing, you know, Linux get on Mars. But we're also seeing, you know, stuff on the other side of the spectrum. We're sort of seeing, you know awesome people doing database work and streaming and AI and ML on top of Kubernetes. So, you know, the ecosystem is doing what you'd expect it to do once one part of it gets stable. The innovation sort of builds on top of it. And, you know, even though we're virtual, we're still seeing just tons and tons of contributions, different companies different people stepping up and leading. So it's been really cool to watch the last few years. >> Yes, interesting point about the CloudNativeCon. That's an interesting insight, and I totally agree with you. And I think it's worth double clicking on. Let me just ask you, because when you look at like, say Kubernetes, okay, it's enabled a lot. Okay, it's been called the dial tone of Cloud native. I think Pat Gelsinger of VMware used that term. We call it the kind of the interoperability layer it enables more large scale deployments. 
So you're seeing a lot more Kubernetes enablement on clusters. Which is causing more hybrid cloud which means more Cloud native. So it actually is creating a network effect in and of itself with more Cloud native components and it's changing the development cycle. So the question I want to ask you is one how does a customer deal with that? Because people are saying, I like hybrid. I agree, Multicloud is coming around the corner. And of course, Multicloud is just a subsystem of resource underneath hybrid. How do I connect it all? Now I have multiple vendors, I have multiple clusters. I'm cross-cloud, I'm connecting multiple clouds multiple services, Kubernetes clusters, some get stood up some gets to down, it's very dynamic. >> Yeah, it's very dynamic. It's actually, you know, just coincidentally, you know, our lead architect, a guy named Clayton Coleman, who was one of the Kubernetes founders, is going to give a talk on sort of Kubernetes is this hybrid control plane. So we're already starting to see the tentacles come out of it. So you know how we do cross cloud networking how we do cross cloud provisioning of services. So like, how do I go discover what's in other clouds? You know and I think like you said, it took people a few years to figure out, like how do I use this new thing, this Kubernetes thing. How do I harness it. And, but the demand has since become "I have to do multi-cloud." And that means, you know, hey our company acquires companies, so you know, we don't necessarily know where that next company we acquire is going to run. Are they going to run on AWS? Are they going to, you know, run on Azure I've got to be able to run in multiple places. You know, we're seeing banking industries say, "hey, look cloud's now a viable target for you to put your applications, but you have to treat multiple clouds as if they're your backup domains." 
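On the client side, the multi-cloud picture being described here is typically expressed as a single kubeconfig with one context per cluster, so one operator can pivot between clouds. A minimal, purely illustrative sketch (server URLs, names, and credentials are invented, not from this conversation):

```yaml
# Hypothetical kubeconfig excerpt: one credential set, one context per cloud.
# Switching clouds is then just `kubectl config use-context azure-dr`.
apiVersion: v1
kind: Config
clusters:
- name: aws-prod
  cluster:
    server: https://aws-prod.example.com:6443
- name: azure-dr
  cluster:
    server: https://azure-dr.example.com:6443
contexts:
- name: aws-prod
  context:
    cluster: aws-prod
    user: platform-admin
- name: azure-dr
  context:
    cluster: azure-dr
    user: platform-admin
current-context: aws-prod
users:
- name: platform-admin
  user: {}
```

Cross-cloud control planes of the kind mentioned above automate what this file only hints at: discovery, provisioning, and policy across many such contexts.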
And so we're, you know, we're seeing both, you know, the way business operates, whether it's acquisitions or new things, driving it. We're seeing regulations driving hybrid and multi-cloud, and even, you know, even if the stalwarts were to, you know, insist for a long time that the world's only going to be public cloud and sort of, you know, legacy data centers, even those folks are now coming around to "I've got to bring hybrid to these places." So it's been more than just technology. It's been, you know, industries pushing it, regulations pushing it, a lot of stuff. So, but like I said, we're going to be talking about kind of our future, our vision on that, our future on that. And, you know, Red Hat, everything we end up doing is a community activity. So we expect a lot of people will get on board with it. >> You know, for all the old timers out there, they can relate to this. But I remember in the 80's the OSI, Open Systems Interconnect, and I was chatting with Paul Cormier about this because we kind of grew up through that generation. That disrupted network protocols that were proprietary, and that opened the door for massive, massive growth, massive innovation around just getting that interoperability with TCP/IP, and then everything else happened. So Kubernetes does that, that's a phenomenal impact. So Cloud native to me is at that stage where it's totally next-gen and it's happening really fast. And a lot of people are getting caught off guard, Brian. So you know, I got to ask you as a product strategist, what's your, how would you give them the navigation of where that North star is? If I'm a customer, okay, I got to figure out where I got to navigate now. I know it's super volatile, changing super fast. What's your advice? >> I think it's a couple of pieces, you know, we're seeing more and more that, you know, the technology decisions don't get driven out of sort of central IT as much anymore, right?
We sort of talk all the time that every business opportunity, every business project has a technology component to it. And I think what we're seeing is the companies that tend to be successful with it have built up the muscle, built up the skill set to say, okay, when this line of business says, I need to do something new and innovative I've got the capabilities to sort of stand behind that. They're not out trying to learn it new they're not chasing it. So that's a big piece of it, is letting the business drive your technology decisions as opposed to what happened for a long time which was we built out technology, we hope they would come. You know, the other piece of it is I think because we're seeing so much push from different directions. So we're seeing, you know people put technology out at the edge. We're able to do some, you know unique scalable things, you know in the cloud and so forth That, you know more and more companies are having to say, "hey, look, I'm not, I'm not in the pharmaceutical business. I'm not in the automotive business, I'm in software." And so, you know the companies that realize that faster, and then, you know once they sort of come to those realizations they realize, that's my new normal, those are the ones that are investing in software skills. And they're not afraid to say, look, you know even if my existing staff is, you know, 30 years of sort of history, I'm not afraid to bring in some folks that that'll break a few eggs and, you know, and use them as a lighthouse within their organization to retrain and sort of reset, you know, what's possible. So it's the business doesn't move. That's the the thing that drives all of them. And it's, if you embrace it, we see a lot of success. It's the ones that, that push back on it really hard. And, you know the market tends to sort of push back on them as well. >> Well we're previewing KubeCon CloudNativeCon. We'll amplify that it's CloudNativeCon as well. 
You guys bought StackRox, okay, so interesting company, not an open source company they have soon to be, I'm assuring, but Advanced Cluster Security, ACS, as it's known it's really been a key part of Red Hat. Can you give us the strategy behind that deal? What does that product, how does it fit in that's a lot of people are really talking about this acquisition. >> Yeah so here's the way we looked at it, is we've learned a couple of things over the last say five years that we've been really head down in Kubernetes, right? One is, we've always embedded a lot of security capabilities in the platform. So OpenShift being our core Kubernetes platform. And then what's happened over time is customers have said to us, "that's great, you've made the platform very secure" but the reality is, you know, our software supply chain. So the way that we build applications that, you know we need to secure that better. We need to deal with these more dynamic environments. And then once the applications are deployed they interact with various types of networks. I need to better secure those environments too. So we realized that we needed to expand our functionality beyond the core platform of OpenShift. And then the second thing that we've learned over the last number of years is to be successful in this space, it's really hard to take technology that wasn't designed for containers, or it wasn't designed for Kubernetes and kind of retrofit it back into that. And so when we were looking at potential acquisition targets, we really narrowed down to companies whose fundamental technologies were you know, Kubernetes-centric, you know having had to modify something to get to Kubernetes, and StackRox was really the leader in that space. They really, you know have been the leader in enterprise Kubernetes security. And the great thing about them was, you know not only did they have this Kubernetes expertise but on top of that, probably half of their customers were already OpenShift customers. 
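For readers who want a concrete anchor for the "secure how deployed applications interact with networks" point above: the Kubernetes-native primitive that container-aware security tooling builds on is the NetworkPolicy resource. A minimal, hypothetical example (namespace and label names are invented for illustration; this is not a StackRox or ACS artifact):

```yaml
# Hypothetical policy: pods labeled app=payments accept ingress traffic
# only from pods labeled app=frontend; all other ingress is denied.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: payments-allow-frontend-only
  namespace: shop
spec:
  podSelector:
    matchLabels:
      app: payments
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: frontend
```

Tools designed for Kubernetes from the start can reason about and generate policies like this from observed traffic, which is the kind of retrofit-resistant capability being described.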
And about 3/4 of their customers were using you know, native Kubernetes services and other clouds. So, you know, when we went and talked to them and said, "Hey we believe in Kubernetes, we believe in multi-cloud. We believe in open source," they said, "yeah, those are all the foundational things for us." And to your point about it, you know, maybe not being an open source company, they actually had a number of sort of ancillary projects that were open source. So they weren't unfamiliar to it. And then now that the acquisition's closed, we will do what we do with every piece of Red Hat technology. We'll make sure that within a reasonable period of time that it's made open source. And so you know, it's good for the community. It allows them to keep focusing on their innovation. >> Yeah you've got to get that code out there cool. Brian, I'm hearing about Platform Plus what is that about? Take us through that. >> Yeah, so you know, one of the things that our customers, you know, have come to us over time is it's you know, it's like, I've been saying kind of throughout this discussion, right? Kubernetes is foundational, but it's become pretty stable. The things that people are solving for now are like, you highlighted lots and lots of clusters, they're all over the place. That was something that our advanced cluster management capabilities were able to solve for people. Once you start getting into lots of places you've got to be able to secure things everywhere you go. And so OpenShift for us really allows us to bundle together, you know, sort of the complete set of the portfolio. So the platform, security management, and it also gives us the foundational pieces or it allows our customers to buy the foundational pieces that are going to help them do multi and hybrid cloud. And, you know, when we bundle that we can save them probably 25% in terms of sort of product acquisition. And then obviously the integration work we do you know, saves a ton on the operational side. 
So it's a new way for us to not only bundle the platform and the technologies, but it gets customers in a mindset that says, "hey, we've moved past sort of single environments to hybrid and multi-cloud environments." >> Awesome, well thanks for the update on that, appreciate it. One of the things going into KubeCon, and that we're watching closely, is this Cloud native developer action. Certainly end users want to get that in a separate section with you, but the end user contribution, which is like exploding. But on the developer side there's a real trend towards adding stronger consistency, programmability, support for more use cases, okay. Where it's becoming more of a data platform as a requirement. >> Brian: Right. >> So how, so that's a trend, so I'm kind of thinking, there's no disagreement on that. >> Brian: No, absolutely. >> What does that mean? Like I'm a customer, that sounds good. How do I make that happen? 'Cause that's the critical discussion right now in the DevOps, DevSecOps, day-two operations, whatever you want to call it. This is the number one concern for developers and that solution architect: consistency, programmability, more use cases with data as a platform. >> Yeah, I think, you know, the way I kind of frame this up was, you know, for any organization, the last thing you want to do is sort of keep investing in lots of platforms, right? So platforms are great on their surface, but once you're having to manage five and six and, you know, 10 or however many you're managing, the economies of scale go away. And so what's been really interesting to watch with Kubernetes is, you know, when we first got started everything was Cloud native application, but that really was sort of, you know, shorthand for stateless applications. We quickly saw a move to, you know, people that said, "Hey, I can modernize something, you know, a Stateful application," and we add that into Kubernetes, right?
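For readers unfamiliar with the mechanism being referenced, the community feature that made stateful workloads practical is the StatefulSet controller, which gives each replica a stable identity and its own persistent volume. A minimal, hypothetical sketch (the image, names, and sizes here are invented for illustration):

```yaml
# Hypothetical StatefulSet: three broker replicas, each with a stable
# network identity (demo-broker-0, -1, -2) and its own persistent volume.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: demo-broker
spec:
  serviceName: demo-broker        # headless Service that provides per-pod DNS
  replicas: 3
  selector:
    matchLabels:
      app: demo-broker
  template:
    metadata:
      labels:
        app: demo-broker
    spec:
      containers:
      - name: broker
        image: example.io/broker:1.0   # placeholder image
        volumeMounts:
        - name: data
          mountPath: /var/lib/broker
  volumeClaimTemplates:           # one PersistentVolumeClaim created per replica
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 10Gi
```

Stable pod names plus per-replica storage are what let databases, brokers, and similar systems keep their state across rescheduling.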
The community added the ability to do Stateful applications and that got people a certain amount of the way. And they sort of started saying, okay, maybe Kubernetes can help me peel off some things of an existing platform. So I can peel off, you know, Java workloads, or, I can peel off, what's been this explosion is the data community, if you will. So, you know, the TensorFlows, the PyTorches, you know, the Apache community with things like Couchbase and Kafka, all these things that, you know, maybe in the past didn't necessarily have their own sort of underlying system, are now defaulting to Kubernetes. And what we see because of that is, you know, people now can say, okay, these data workloads, these AI and ML workloads, are so important to my business, right? Like I can directly point to cost savings. I can point to, you know, driving innovation, and because Kubernetes is now their default sort of way of running, you know, we're seeing just sort of what used to be, you know, small islands of clusters become these enormous footprints, whether they're in the cloud or in their data center. And that's almost become, you know, the most prevalent, most widely used use case. And again, it makes total sense. It's exactly the trends that we've seen in our industry, even before Kubernetes. And now people are saying, okay, I can consolidate a lot of stuff on Kubernetes. I can get away from all those silos. So, you know, that's been a huge thing over the last probably year plus. And the cool thing is we've also seen, you know, the hardware vendors, so whether it's Intel or Nvidia, especially around GPUs, really getting on board and trying to make that simpler. So it's not just the software ecosystem. It's also the hardware ecosystem, really getting on board. >> Awesome, Brian, let me get your thoughts on the cloud versus the power dynamics between the cloud players and the open source software vendors.
So what's the Red Hat relationship with the cloud players with the hybrid architecture, 'cause you want to set up the modern day developer environment, we get that right. And it's hybrid, what's the relationship with the cloud players? >> You know, I think so we we've always had two philosophies that haven't really changed. One is, we believe in open source and open licensing. So you haven't seen us look at the cloud as, a competitive threat, right? We didn't want to make our business, and the way we compete in business, you know change our philosophy in software. So we've always sort of maintained open licenses permissive licenses, but the second piece is you know, we've looked at the cloud providers as very much partners. And mostly because our customers look at them as partners. So, you know, if Delta Airlines or Deutsche Bank or somebody says, "hey that cloud provider is going to be our partner and we want you to be part of that journey, we need to be partners with that cloud as well." And you've seen that sort of manifest itself in terms of, you know, we haven't gone and set up new SaaS offerings that are Red Hat offerings. We've actually taken a different approach than a lot of the open source companies. And we've said we're going to embed our capabilities, especially, you know OpenShift into AWS, into Azure into IBM cloud working with Google cloud. So we'd look at them very much as a partner. I think it aligns to how Red Hat's done things in the past. And you know, we think, you know even though it maybe easy to sort of see a way of monetizing things you know, changing licensing, we've always found that, you've got to allow the ecosystem to compete. You've got to allow customers to go where they want to go. And we try and be there in the most consumable way possible. So that's worked out really well for us. >> So I got to bring up the end user participation component. 
That's a big theme here at KubeCon going into it and around the event is, and we've seen this trend happen. I mean, Envoy, Lyft the laying examples are out there. But they're more end-use enterprises coming in. So the enterprise class I call classic enterprise end user participation is at an all time high in opensource. You guys have the biggest portfolio of enterprises in the business. What's the trend that you're seeing because it used to be limited to the hyperscalers the Lyfts and the Facebooks and the big guys. Now you have, you know enterprises coming in the business model is working, can you just share your thoughts on CloudNativeCons participation for end users? >> Yeah, I think we're definitely seeing a blurring of lines between what used to be the Silicon Valley companies were the ones that would create innovation. So like you mentioned Lyft, or, you know LinkedIn doing Kafka or Twitter doing you know, whatever. But as we've seen more and more especially enterprises look at themselves as software companies right. So, you know if you talk about, you know, Ford or Volkswagen they think of themselves as a software company, almost more than they think about themselves as a car company, right. They're a sort of mobile transportation company you know, something like that. And so they look at themselves as I've got to I've got to have software as an expertise. I've got to compete for the best talent, no matter where that talent is, right? So it doesn't have to be in Detroit or in Germany or wherever I can go get that anywhere. And I think what they really, they look for us to do is you know, they've got great technology chops but they don't always understand kind of the the nuances and the dynamics of open-source right. They're used to having their own proprietary internal stuff. And so a lot of times they'll come to us, not you know, "Hey how do we work with the project?" But you know like here's new technology. 
But they'll come to us and they'll say "how do we be good, good stewards in this community? How do we make sure that we can set up our own internal open source office and have that group, work with communities?" And so the dynamics have really changed. I think a lot of them have, you know they've looked at Silicon Valley for years and now they're modeling it, but it's, you know, for us it's great because now we're talking the same language, you know we're able to share sort of experiences we're able to share best practices. So it is really, really interesting in terms of, you know, how far that whole sort of software is eating the world thing is materialized in sort of every industry. >> Yeah and it's the workloads of expanding Cloud native everywhere edge is blowing up big time. Brian, final question for you before we break. >> You bet. >> Thanks for coming on and always great to chat with you. It's always riffing and getting the data out too. What's your expectation for KubeCon CloudNativeCon this year? What are you expecting to see? What highlights do you expect will come out of CloudNativeCon KubeCon this year? >> Yeah, I think, you know like I said, I think it's going to be much more on the Cloud native side, you know we're seeing a ton of new communities come out. I think that's going to be the big headline is the number of new communities that are, you know have sort of built up a following. So whether it's Crossplane or whether it's, you know get-ops or whether it's, you know expanding around the work that's going on in operators we're going to see a whole bunch of projects around, you know, developer sort of frameworks and developer experience and so forth. So I think the big thing we're going to see is sort of this next stage of, you know a thousand flowers are blooming and we're going to see probably a half dozen or so new communities come out of this one really strong and you know the trends around those are going to accelerate. 
So I think that'll probably be the biggest takeaway. And then I think just the fact that the community is going to come out stronger after the pandemic than maybe it did before, because we're learning you know, new ways to work remotely, and that, that brings in a ton of new companies and contributors. So I think those two big things will be the headlines. And, you know, the state of the community is strong as they, as they like to say >> Yeah, love the ecosystem, I think the values are going to be network effect, ecosystems, integration standards evolving very quickly out in the open. Great to see Brian Gracely Senior Director Product Strategy at Red Hat for the cloud business unit, also podcasts are over a million episode downloads for the cloud cast podcast, thecloudcast.net. What's it Brian, what's the stats now. >> Yeah, I think we've, we've done over 500 shows. We're you know, about a million and a half listeners a year. So it's, you know again, it's great to have community followings and, you know, and meet people from around the world. So, you know, so many of these things intersect it's a real pleasure to work with everybody >> You're going to create a culture, well done. We're all been there, done that great job. >> Thank you >> Check out the cloud cast, of course, Red Hat's got the great OpenShift mojo going on into KubeCon. Brian, thanks for coming on. >> Thanks John. >> Okay so CUBE coverage of KubeCon, CloudNativeCon Europe 2021 Virtual, I'm John Furrier with theCUBE virtual. Thanks for watching. (upbeat music)

Published Date : Apr 26 2021


Compute Session 05


 

>> Thank you for joining us today for this session entitled, Deploy any Workload as a Service, When General Purpose Technology isn't Enough. This session today will be on our HPE GreenLake platform. And my name is Mark Seamans, and I'm a member of our GreenLake cloud services team. And I'll be kind of leading you through the material today which will include both a slide presentation as well as an interactive demo to get some experience in terms of how the process goes for interacting with your initial experience with our GreenLake system. So, let's go ahead and get started. One of the things that we've noticed over the last decade and I'm sure that you have as well has been the tremendous focus on accelerating business while concurrently trying to increase agility and to reduce costs. And one of the ways a lot of businesses have gone about doing that has been leveraging a cloud based technology set. And in many cases, that's involved moving some of the workloads to the public cloud. And so with that much said, though, while organizations have been able to enjoy that cost control and the agility associated with the public cloud. What we've seen is that the easy to move workloads have been moved but there's a significant amount as much as 70% in many cases of workloads that organizations run which still remain on prem. And there's reasons for that. Some cases it's due to data privacy and security concerns. Other times it's due to latency of really needing high-performance access to data. And the other times, it's really just related to the interconnected nature of systems and that you need to have a whole bunch of systems which form an overall experience and they need to be located close together. 
So, one of the challenges that we've worked with customers on, and have actually developed our GreenLake solution to address, is this idea of trying to achieve a cloud-like experience for all of your apps and data in a way that leverages the best of the public cloud, with that same type of experience delivered on premise. So as you think about some of the challenges, again, that customers are trying to address, one of them is this idea of agility: being able to move quickly, and to be able to take a set of IT resources that you have and deploy them for different use cases and different models. So one of the things that, as we built GreenLake, we really had a strong focus on is how we provide a common foundation, a common framework, to deliver that kind of agility. The next one is this term on the top right called scale. One of the words you may hear as you hear cloud talked about regularly is this notion of what's called elasticity: the ability to have something stretch and get larger on an on-demand basis. That's another challenge that we've really tried to work through, and you'll see how we've addressed it. Now, obviously, as you do this, you can achieve scale if you just put a ton of equipment in place, much more maybe than you need at any given time, but with that comes a lot of cost. And so as you think about wanting to have an agile and flexible system, what you'd also like is something where the cost flexes as your needs grow: elastic, so that it can get larger and then get smaller as needed as well. So, we'll talk about how we do that with our GreenLake solution. And then finally there's complexity: trying to abstract away, for people, all the complexity it takes to build these systems, and provide a single interface, a single experience, for people to manage all of their IT assets.
So we do that through this solution called HPE GreenLake, and really we call it the cloud that comes to you. What we're really trying to do here is take the notion of the cloud from being a place, the way people have thought about the public cloud, and turn it into the idea of the cloud being an experience. And so regardless of whether it's in the public cloud, or running on premise, or, as is the case with GreenLake, whether it's a mixture of those, maybe even a mixture of multiple public clouds with an on-prem experience, the cloud now becomes something you experience and that you leverage, as opposed to a place where you have an account. And that can include edge computing combined with co-location or data center based computing, it can include equipment stored in your own data center, and certainly it can include resources in the public cloud. So, let's take a look at how we go about delivering the experience and what some of those benefits are as we put these solutions in place. As you think about why you'd want to do this and the benefits you get from GreenLake, what we've seen, in terms of both working with customers and actually having studies done with analysts, is that the benefits are numerous, but they come in the areas that are shown here. One is time to deployment: once you get this flexible and easy-to-manage environment in place, with what we'll show you are these prebuilt, pre-configured, managed-as-a-service solutions, your time to deployment for putting new workloads in place can shrink dramatically. The next is that by having these pre-configured solutions, and combining both the hardware and software technology with a set of managed services through our GreenLake managed services team, what you can do is dramatically reduce the risk of putting a new workload in place.
So for example, if you wanted to deploy virtual desktop infrastructure, and maybe you haven't done that in the past, you can leverage a GreenLake VDI solution along with GreenLake management services to very predictably and very reliably put that solution in place. So you're up and running, focusing on the needs of your users, with incredibly lowered risk, because this was built on a pre-validated and pre-certified foundation. Obviously, I talked earlier about the idea with GreenLake that you have flexibility in terms of scaling up your use of the resources, even though they're computers that may be in your data center or a colo, and also scaling them back down. So if you have workloads that vary over time, maybe on an end-of-month cycle or an end-of-quarter cycle, where certain workloads get larger and then get smaller again, the ability is there with GreenLake, on a consumption billing basis, for your costs to flex as your use of the systems flexes. And again, I'll show you a screen in just a few minutes that illustrates what that looks like. And then the last piece is the single pane of glass for control and insight into what's going on. And what we mean by that is not just what's going on from a cost perspective, but also what's going on from a system utilization perspective. You'll see in one of the screens I'll show that there's a system utilization report of all of your GreenLake resources that you can view at any time. And so you get visibility, for example with storage capacity: as your storage capacity is being consumed over time, as you generate more data, the system will tell you, hey, you're getting up to about 60, 70% utilized. And then at that point, we would be able to work with you to automatically deploy additional storage capacity, even though you won't be paying for it yet, so it's ready as your needs grow to encompass it. So, what are some of these services that we deliver as part of GreenLake?
Well, they range, and you see here a portfolio of the services that we offer. If you start at the bottom, it's simple things, right? Things like compute as a service, and I'll show you examples of that today, networking as a service, hyper-converged infrastructure as a service. And then as we work our way up the stack, we move from basic services to platform services, things like VMware and containers as a service. And then at the top layer of this, we can actually offer complete solutions for targeted workloads. So if your need was, for example, to run machine learning and AI, and you wanted to have a complete environment put in place that you could leverage for machine learning and AI, and use it and consume it on a consumption, as-a-service basis, we've got our MLOps solution that delivers that. And similarly, as I mentioned earlier, VDI for virtual desktops, or a solution for SAP HANA. So the solutions range from very basic compute at the foundation all the way up to complete workload solutions. And the portfolio of these is expanding all the time; as you'll see, you can go out to our hpe.com site and see a complete catalog of all the GreenLake services that are available. So let's take a minute and drill in on that MLOps solution, and we can take a look at how it fits together and what makes it up. So, if you think about GreenLake for MLOps, it's a fast path for data scientists, and it's really oriented around the needs of data scientists within your organization who want to be able to get in and start to analyze data for advantage in your business. So, what comes with an MLOps solution from GreenLake starts, at the left side of the slide here, with a fully curated hardware platform, including GPU-based nodes, data-science-optimized hardware, and all the storage that you're going to need to run at scale, with the performance to make these workloads work.
And so that's one piece of it: a curated hardware stack for machine learning. Next is the software component, where we've pre-validated a whole bunch of the common stack elements that you would need. So beyond operating systems, things for doing continuous integration, things like TensorFlow and Jupyter notebooks, are already pre-validated and delivered with this solution. So the tools that your data scientists will need come with this, ready to go, out of the box. And then finally, as this solution gets delivered, there's a services component to it, beyond just us installing the full thing and delivering a complete solution to you. There are the GreenLake management services options, where our services teams can work side by side with data scientists to assist them in getting up to speed on the solution, in leveraging the tools, and in understanding best practices, if you want that assistance for deploying MLOps. And the whole thing's delivered as a service. Similarly, we have solutions for other workloads, like SAP HANA, that would leverage different compute building blocks, but always in a way that's done for workload-optimized solutions, with best practices as we build up that stack. And so your experience in consuming this is always consistent, but what's running under the hood isn't just a generic solution that you might see in, for example, a public cloud environment; it's a best practice, hardware-optimized, software-optimized environment built for each one of the workloads that we can deploy. So what I'd like to do at this point is actually show you what the process is like for specifying a GreenLake solution, and maybe we'll take a look at compute as our example today. So, what I've got here is a browser experience. I'm just in my web browser, on the hpe.com website. I'm in the GreenLake section, and I've actually clicked on this services menu, and I'm going to go ahead and scroll down.
And one of the things you can see here is that catalog of GreenLake services that I referenced. So, just like we showed you on the slide, this is the catalog of services that you can consume. I'm going to go to compute, and we'll go about quoting a GreenLake compute solution. So we see, when I clicked on that, one of the options I have is to get a price in my inbox. And I'll click on that to go into our GreenLake quick quote environment, where, in my case here for our demonstration, I'll specify that I'd like to purchase, to add to my GreenLake environment, some additional general compute capability for some workloads that I might like to run. If I click on this, I go in, and you notice here that I'm not going to specify server types. I'm really going to tell the system about the types of workloads that I'd like to run and the characteristics of those workloads. So for example, my workload choices would be adaptable performance, or maybe densely optimized compute for highly scalable and high performance computing requirements. So, I'll select adaptable performance. I have a choice of processor types; in my case, I'll pick Intel. And I then say how many servers, for the workloads that I want to run, would be part of the solution. Again, in my case, maybe we'll quote a 20 server configuration. Now, as we think about the plans here, what you can see is we're really looking at the different options in terms of a balanced performance and price option, which is the recommended option. But if I knew that the workloads I was going to run were more performance optimized, I could simply click on that option, and the system under the hood does all the work to reconfigure the solution. I'm not having to pick individual server options, as you see. So once I've picked between cost optimized, balanced, or performance, I can go in here and select the rest of the options.
Now, we'll start at the top right, and you see here, from a services perspective, this is where I specify how much services content and services assistance I'd like, all the way from just doing proactive metering of my solution, all the way through actual workload deployment and assistance, versus me physically managing the equipment myself. The other piece I'll focus on is this variable usage. And this comes back to how much variable capacity, additional capacity, I'd like to have available in my data center for this solution. So if I know that my capacity needs could flex up and down more in the future, I might pick a slightly larger amount of flex capacity at my location as part of this solution. With that, I'd select that workload. And the last step would be, I could click on get price, and this whole thing will be packaged up and shipped to you, in terms of the price of the solution and any other details that you might like to see. And I encourage you to go out to hpe.com and go through this process yourself for one of the workloads that might be of interest to you, to get a flavor of that experience. So if we move forward, once you've deployed your GreenLake solution, one of the things you see here is that single pane of glass experience in terms of managing the system, right? We've got a single panel that, all in one place, provides you access to your cost information for billing, and what's driving that billing. In the middle of the top center, you can see we've got information on capacity planning, but then we can actually drill in and look at additional things, like services we offer around continuous compliance, capacity planning data for you to see how things like storage are filling, and cost control information, with recommendations around how you could reduce or minimize your costs based on the usage profile that you have.
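The consumption billing and the utilization alerting shown on that dashboard reduce to two small rules: bill for the greater of the reserved baseline and the metered usage, and flag utilization as it approaches the installed buffer. The sketch below illustrates the concept only; the function names, rates, and exact threshold are invented for the example and are not HPE's actual terms (the transcript mentions an alert at "about 60, 70% utilized").

```python
def monthly_charge(reserved_units, used_units, unit_rate):
    """Consumption billing in miniature: pay for the reserved baseline,
    plus any metered usage above it. Names and rates are illustrative."""
    billable = max(reserved_units, used_units)
    return billable * unit_rate

def utilization_alert(used_units, installed_units, threshold=0.70):
    """Flag when usage nears installed capacity, the point at which
    extra buffer capacity would be deployed ahead of need."""
    return used_units / installed_units >= threshold

# A quiet month bills only the baseline; a busy month flexes up.
assert monthly_charge(100, 80, 2.0) == 200.0
assert monthly_charge(100, 130, 2.0) == 260.0
# At 70% of installed capacity, the alert fires.
assert utilization_alert(70, 100) is True
assert utilization_alert(50, 100) is False
```

The point of the second rule is the lead time described in the session: capacity is deployed when the alert fires, but billing only starts once usage actually grows into it.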
So, all of this is a fully integrated experience that can span components running on-premise while also incorporating services that could be in the public cloud. Now, when we think about who's using this and why it's becoming attractive: you can imagine, just looking at this capability, that this ability to blend public cloud capabilities with on-premise, or co-location, private data center capabilities provides tremendous power and tremendous flexibility for users. And so we're seeing this adopted broadly as kind of a new way: people are looking to take the advantages of cloud, but bring them into a much more self-managed or on-premise experience. Some example customers here include deployments in the automotive field, both at Porsche and, over on the right, at Zenseact, which is the autonomous driving division of Volvo, where they're doing research with tremendous amounts of data to produce the best possible autonomous driving experience. And then in the center, Danfoss, who is one of the world's leading manufacturers of both electric and hydraulic control components. As they produce components that drive optimized management of physical infrastructure, power, liquids and cooling, they're leveraging GreenLake for the same type of control and best practice deployment of their data centers and of their IT infrastructure. So again, somebody who's innovating in their own world, taking advantage of compute innovations to get the benefits of the cloud and the flexibility of a cloud-like environment, but running within their own premises. And it's not just those three customers, clearly. What we're seeing is, as you see on the slide, that it's a unique solution in the market today. It provides the true benefits of the cloud, but with your own on-premise experience, and it provides expertise, in terms of services, to help you take best advantage of it.
And if you look at the adoption by customers, over a thousand customers in 50 countries have now deployed GreenLake based solutions as the foundation on which they're building their next generation IT architecture. So, there's a lot of unique capabilities that we built into GreenLake that really make this a single pane of glass and a very, very unified and elegant experience. So as we wrap up, there's three things I want to call your attention to. One is GreenLake, which we focused a lot on today. I'd also like to call your attention to the Pointnext services, which are an extension of those GreenLake services that I talked about earlier; there's a much broader portfolio of what Pointnext can do in delivering value for your organization. And then again, HPE Financial Services, who, much like what we do with GreenLake in this as-a-service consumption environment, can provide a lot of financial flexibility in other models and other use cases. So, I'd encourage you to take time to learn about each of those three areas. And then there's obviously many, many resources available online, and some are listed here, but as a single takeaway from this slide, I encourage you to go to hpe.com. If you're interested in GreenLake, click on our GreenLake icon and you can take yourself through that quoting experience for whatever would be interesting to you, and certainly, as well, for our compute solutions, there's a tremendous amount of information about the leading solutions that HPE brings to market. So with that, I hope this has been an informative session. Thank you for spending a little bit of time with us today, and hopefully you'll take some time to learn more about GreenLake and how it might be a benefit for you within your organization. Thanks again.

Published Date : Apr 9 2021


Bratin Saha, Amazon | AWS re:Invent 2020


 

>>From around the globe, it's theCUBE, with digital coverage of AWS re:Invent 2020, sponsored by Intel and AWS. >>Welcome back to theCUBE's ongoing coverage of AWS re:Invent virtual. theCUBE has gone virtual too, and continues to bring our digital coverage of events across the globe. It's been a big week, a big couple of weeks at re:Invent, and a big week for machine intelligence, machine learning, and AI, and new services for customers. And with me to discuss the trends in this space is Bratin Saha, who is the Vice President and General Manager of machine learning services at AWS. Bratin, great to see you. Thanks for coming on theCUBE. >>Thank you, Dave. Thank you for having me. >>You're very welcome. Let's get right into it. I mean, I remember when SageMaker was announced in 2017. It was really a seminal moment in the whole machine learning space, but take us through the journey over the last few years. What can you tell us? >>So, you know, when we came out with SageMaker, customers were telling us that machine learning is hard, and it was, you know, only a few large organizations that could truly deploy machine learning at scale. And so we released SageMaker in 2017, and we have seen really broad adoption of SageMaker across the entire spectrum of industries. And today, most of the machine learning in the cloud, the vast majority of it, happens on AWS. In fact, AWS has more than twice the machine learning of any other provider. And, you know, we saw this morning that more than 90% of the TensorFlow in the cloud and more than 92% of the PyTorch in the cloud happens on AWS. So what has happened is, customers saw that it was much easier to do machine learning once they were using tools like SageMaker. >>And so many customers started applying a handful of models, and they started to see that they were getting real business value. You know, machine learning was no longer a niche thing, machine learning was no longer a fictional thing.
It was something from which they were getting real business value. And then they started to proliferate it across use cases. And so these customers went from deploying tens of models to deploying hundreds and thousands of models. We have one customer that is deploying more than a million models. And so that is what we have seen: really making machine learning broadly accessible to our customers through the use of SageMaker. >>Yeah. So you probably very quickly went through the experimentation phase, and people said, wow, you got the aha moments, and so adoption went through the roof. What kind of patterns have you seen in terms of the way in which people are using data, and maybe some of the problems and challenges that has created for organizations that they've asked you to help them rectify? >>Yes. And in fact, SageMaker is today one of the fastest growing services in AWS history. And what we have seen happen is, as customers scaled out their machine learning deployments, they asked us to help them solve the issues that come when you deploy machine learning at scale. So one of the things that happens is, when you're doing machine learning, you spend a lot of time preparing the data, cleaning the data, making sure the data is done correctly, so it can train your models. And customers wanted to be able to do the data prep in the same service in which they were doing machine learning. And hence we launched SageMaker Data Wrangler, where with a few clicks you can connect a variety of data stores, AWS data stores or third party data stores, and do all of your data preparation.
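The kind of preparation being described here, the cleanup work that consumes most of a project before training can start, looks like this in miniature. This is a plain-Python sketch of typical prep steps (dropping incomplete rows, coercing types, scaling), not the Data Wrangler interface itself; the field names are invented for illustration.

```python
def clean_rows(rows, required, numeric):
    """Toy data prep: drop rows with missing required fields,
    coerce numeric columns to floats, then min-max scale them."""
    kept = [r for r in rows if all(r.get(k) not in (None, "") for k in required)]
    for r in kept:
        for k in numeric:
            r[k] = float(r[k])
    for k in numeric:
        vals = [r[k] for r in kept]
        lo, hi = min(vals), max(vals)
        for r in kept:
            r[k] = (r[k] - lo) / (hi - lo) if hi > lo else 0.0
    return kept

rows = [
    {"income": "40000", "age": "25"},
    {"income": "80000", "age": "35"},
    {"income": "", "age": "50"},       # dropped: missing income
]
cleaned = clean_rows(rows, required=["income", "age"], numeric=["income", "age"])
assert len(cleaned) == 2
assert cleaned[0]["income"] == 0.0 and cleaned[1]["income"] == 1.0
```

Each of these steps corresponds to an interactive transform in the visual flow described below; the point of putting them in one service is that the same chain can later be sent off as an automated pipeline job.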
And then one of the things that customers have asked us to help them address is this issue of statistical bias and explainability. And so we released SageMaker clarify that actually helps customers look at statistical bias to the entire machine learning workflow before you do, when you're doing a data processing before you train your model. And even after you have deployed your model and it gives us insights into why your model is behaving in a particular way. And then we had machine learning in the cloud and many customers have started deploying machine learning at the edge, and they want to be able to deploy these models at the edge and wanted a solution that says, Hey, can I take all of these machine learning capabilities that I have in the cloud, specifically, the model management and the MLR SKP abilities and deploy them to the edge devices. >>And that is why we launched SageMaker edge manager. And then customers said, you know, we still need our basic functionality of training and so on to be faster. And so we released a number of enhancements to SageMaker distributed training in terms of new data, parallel models and new model parallelism models that give the fastest training time on SageMaker across both the frameworks. And, you know, that is one of the key things that we have at AWS is we give customers choice. We don't force them onto a single framework. >>Okay, great. And we, I think we hit them all except, uh, I don't know if you talked about SageMaker debugger, but we will. So I want to come back to and ask you a couple of questions about these features. So it's funny. Sometimes people make fun of your names, but I like them because they said, it says what it does because, because people tell me that I spend all my time wrangling data. So you have data Wrangler, it's, you know, it's all about transformation cleaning. 
And, and because you don't want to spend 80% of your time wrangling data, you want to spend 80 of your time, you know, driving insights and, and monetization. So, so how, how does one engage with, with data Wrangler and how do you see the possibilities there? >>So data angler is part of SageMaker studio. SageMaker studio was the world's first, fully integrated development run for machine learning. So you come to SageMaker studio, you have a tab there, which you SageMaker data angler, and then you have a visual UI. So that visual UI with just a single click, you can connect to AWS data stores like, you know, red shift or a Tina or third party data stores like snowflake and Databricks and Mongo DB, which will be coming. And then you have a set of built-in data processes for machine learning. So you get that data and you do some interactive processing. Once you're happy with the results of your data, you can just send it off as an automated data pipeline job. And, you know, it's really today the easiest and fastest way to do machine learning and really take out that 80% that you were talking about. >>Has it been so hard to automate the Sage, the pipelines to bring CIC D uh, to, uh, data pipelines? Why has that been such a challenge? And how did you resolve that? >>You know, what has happened is when you look at machine learning, machine learning deals with both code and data, okay. Unlike software, which really has to deal with only code. And so we had the CIC D tools for software, but someone needed to extend it to operating on both data and code. And at the same time, you know, you want to provide reproducibility and lineage and trackability, and really getting that whole end to end system to work across code and data across multiple capabilities was what made it hard. And, you know, that is where we brought in SageMaker pipelines to make this easy for our customers. >>Got it. Thank you. And then let me ask you about, uh, clarify. 
And this is a huge issue in, in machine intelligence, uh, you know, humans by the very nature of bias that they build models, the models of bias in them. Uh, and so you bringing transplant the other problem with, with AI, and I'm not sure that you're solving this problem, but please clarify if you are no pun intended, but it's that black box AI is a black box. I don't know how the answer, how we got to the answer. It seems like you're attacking that, bringing more transparency and really trying to deal with the biases. I wonder if you could talk about how you do that and how people can expect this to affect their operations. >>I'm glad you asked this question because you know, customers have also asked us about the SageMaker clarify is really intended to address the questions that you brought up. One is it gives you the tools to provide a lot of statistical analysis on the data set that you started with. So let's say you were creating a model for loan approvals, and you want to make sure that, you know, you have equal number of male applicants and equal number of female applicants and so on. So SageMaker clarify, lets you run these kinds of analysis to make sure that your data set is balanced to start with. Now, once that happens, you have trained the model. Once you've trained the model, you want to make sure that the training process did not introduce any unintended statistical bias. So then you can use, SageMaker clarify to again, say, well, is the model behaving in the way I expected it to behave based on the training data I had. >>So let's say your training data set, you know, 50% of all the male applicants got the loans approved after training, you can use, clarify to say, does this model actually predict that 50% of the male applicants will get approved? And if it's more than less, you know, you have a problem. And then after that, we get to the problem you mentioned, which is how do we unravel the black box nature of this? 
And you know, we took the first steps on it last year with Autopilot, where we actually gave you notebooks. But SageMaker Clarify really makes it much better, because it tells you why your model is predicting the way it's predicting. It gives you the reasons, and it tells you, you know, here is why the model predicted that you would be approved for a loan, and here's why the model said that you may or may not get a loan. So it really makes it easier, gives visibility and transparency, and helps to convert the insights that you get from model predictions into actionable insights, because you now know why the model is predicting what it's predicting. >>That brings up the confidence level. Okay, thank you for that. Let me ask you about distributed training on SageMaker. Help us understand what problem you're solving. You're injecting auto-parallelism. Is that about scale? Help us understand that. >>Yeah. So one of the things that's happening is, you know, our customers are starting to train really large models. Like, three years back, they would train models with 20 million parameters. You know, last year, they would train models with a couple of hundred million parameters. Now customers are actually training models with billions of parameters, and when you have such large models, the training can take days and sometimes weeks. And so what we have done here are two things. One is, we introduced a way of taking a model and training it in parallel on multiple GPUs, and that's, you know, what we call a data parallel implementation. We have our own custom libraries for this, which give you the fastest performance on AWS. And then the other thing that happens is, customers take some of these models that are fairly large, you know, like billions of parameters, and we showed one of them today, called T5, and these models are so big that they cannot fit in the memory of a single GPU. And so what happens today is, customers have to train such a model.
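Before the large-model case continues, the data-parallel approach just mentioned can be simulated in a few lines: each worker computes gradients on its own data shard, and the results are averaged before every synchronized update. This is a plain-Python simulation of the idea, not the SageMaker data parallelism library; the one-parameter model and learning rate are chosen purely for illustration.

```python
def worker_gradient(shard, weight):
    """Each 'worker' computes the gradient of squared error on its
    own data shard, for a one-parameter model y = weight * x."""
    return sum(2 * x * (weight * x - y) for x, y in shard) / len(shard)

def data_parallel_step(shards, weight, lr=0.01):
    """All-reduce in miniature: average the per-worker gradients,
    then apply one synchronized update to the shared weight."""
    grads = [worker_gradient(s, weight) for s in shards]
    avg = sum(grads) / len(grads)
    return weight - lr * avg

# Two workers, data drawn from y = 3x; the weight converges toward 3.
shards = [[(1.0, 3.0), (2.0, 6.0)], [(3.0, 9.0), (4.0, 12.0)]]
w = 0.0
for _ in range(200):
    w = data_parallel_step(shards, w)
assert abs(w - 3.0) < 1e-3
```

The averaging step is where the real engineering lives at scale: the custom libraries mentioned here exist to make that gradient exchange across many GPUs as fast as possible.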
They spend weeks of effort trying to parallelize that model. What we introduced in SageMaker today is a mechanism that automatically takes these large models and distributes them across multiple GPUs, the auto-parallelization that you were talking about, making it much easier and much faster for customers to really work with these big models. >>Well, the GPU is a very expensive resource, and prior to this, you would have the GPU waiting, waiting, waiting, load me up. And you don't want to do that with an expensive resource. >>Yeah. And, you know, one of the things I mentioned before is SageMaker Debugger. So one of the things that we also came out with today is the SageMaker profiler, which is part of Debugger, and it lets you look at your GPU utilization, your CPU utilization, your network utilization, and so on. And so now, you know, you can see at which point after your training job has started the GPU utilization has gone down, and you can go in and fix it. So this really lets you utilize your resources much better, ultimately reducing your cost of training and making it more efficient. >>Awesome. Let's talk about Edge Manager, because, you know, Andy Jassy's keynote was interesting, where he's talking about hybrid, and his vision, basically Amazon's vision, is: we want to bring AWS to the edge; we see the data center as just another edge node. And so this is, to me, another example of AWS's edge strategy. Talk about how that works in practice. Am I doing inference at the edge and then bringing data back into the cloud? Am I doing things locally? >>Yes. So, you know, what SageMaker Edge Manager does is it helps you deploy and manage models at the edge. The inference is happening on the edge device. Now, consider this case: Lenovo has been working with us.
And what Lenovo wants to do is take these models and do predictive maintenance on laptops. Say you run an IT shop and you have a couple of hundred thousand laptops; you would want to know when something may go down. So they deploy predictive-maintenance models on the laptops. They're doing inference locally on the laptop, but you want to see whether the models are getting degraded, and you want to be able to see whether the quality is holding up. So what Edge Manager does is, number one, it takes your models and optimizes them so they can run on an edge device, and we get up to a 25x benefit. And then, once you've deployed them, it helps you monitor the quality of the models by letting you upload data samples to SageMaker, so that you can see if there is drift in your models, or if there's any other degradation. >>All right. And JumpStart is kind of the portal I go to, to access all these cool tools. Is that right? Yep. >>And we have a lot of getting-started material, lots of first-party models, lots of open-source models and solutions. >>We're probably out of time, but I could go on forever. Thanks so much for bringing this knowledge to theCUBE audience. Really appreciate your time. >>Thank you. Thank you, Dave, for having me. >>And you're very welcome, and good luck with the announcements. And thank you for watching, everybody. This is Dave Vellante for theCUBE, and our coverage of AWS re:Invent 2020 continues right after this short break.
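The model-quality monitoring described in the Lenovo example boils down to a drift check: compare statistics of the data samples uploaded from the edge device against the statistics the model was trained on. Below is a minimal, hypothetical sketch of that idea; it is not SageMaker Edge Manager's actual implementation, and the readings are made up.

```python
# Minimal sketch of a drift check of the kind edge-model monitoring enables:
# compare data samples uploaded from the edge device against the statistics
# the model was trained on. Hypothetical logic, not SageMaker Edge Manager.
from statistics import mean, stdev

def drifted(baseline, live, threshold=3.0):
    """Flag drift when the live mean sits more than `threshold`
    baseline standard deviations away from the baseline mean."""
    mu, sigma = mean(baseline), stdev(baseline)
    return abs(mean(live) - mu) > threshold * sigma

baseline = [0.9, 1.0, 1.1, 1.0, 0.95, 1.05]  # e.g. sensor readings at training time
healthy  = [1.0, 1.02, 0.97]                  # similar distribution: no drift
failing  = [2.4, 2.6, 2.5]                    # degrading device: drift
print(drifted(baseline, healthy), drifted(baseline, failing))  # False True
```

In a real deployment the comparison would run over whole feature distributions rather than a single mean, but the principle (baseline statistics versus live samples) is the same.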

Published Date : Dec 10 2020


December 8th Keynote Analysis | AWS re:Invent 2020


 

>>From around the globe, it's theCUBE, with digital coverage of AWS re:Invent 2020, sponsored by Intel, AWS, and our community partners. >>Hi everyone. Welcome back to theCUBE's virtual coverage of AWS re:Invent 2020. We are theCUBE Virtual. I'm John Furrier, your host, with my co-host Dave Vellante, for keynote analysis of Swami's machine learning keynote: all things data, and a huge set of announcements, the first-ever machine learning keynote at a re:Invent. Dave, great to see you. >>Thanks, John. >>And Dave's in Boston, I'm here in Palo Alto; we're doing the remote CUBE, the CUBE Virtual. Great to see you. >>Yeah, good to be here, John, as always. Wall-to-wall; love it. So, John, how about I give you my key highlights from the keynote today? I had four curated takeaways. The first is that AWS is really trying to simplify machine learning and infuse machine intelligence into all applications. And if you think about it, that's good news for organizations, because they don't have to become machine learning experts or invent machine learning; they can buy it from Amazon. I think the second is that they're trying to simplify the data pipeline. The data pipeline today is characterized by a series of hyper-specialized individuals: IT engineers, data scientists, quality engineers, analysts, developers. These are folks that largely live in their own swim lanes, and while they collaborate, there's still a fairly linear and complicated data pipeline that a business person or a data-product builder has to go through. Amazon is making some moves to simplify that. Third, they're expanding data access to the line of business, and I think that's a key point: increasingly, as people build data products and data services that they can monetize for their business, either to cut costs or to generate revenue, they can expand that into the line of business, where there's domain context.
And I think the last thing is this theme that we talked about the other day, John, of extending Amazon, AWS, to the edge. We saw that as well in a number of the machine learning tools that Swami talked about. >>Yeah, it was great. By the way, we're live here in Palo Alto and Boston covering the analysis; there's tons of content on theCUBE, so check out thecube.net, and also check out the re:Invent cube section, where there are links to on-demand videos with all the content we've had. Dave, I've got to say, one of the things that's apparent to me, and this came out of my one-on-one with Andy Jassy, and he talked about it in his keynote too, is that he kind of teased out this idea of training versus more value-add machine learning. And you saw that in today's announcements. To me, the big revelation was that the training aspect of machine learning is what can be automated away, and there's a lot of controversy around that: recently a Google paper came out about it, and the person behind it was essentially let go over it.
And to the expansion of the personas that are going to be using machine learning. So clearly this is a, to me, a big trend wave that we're seeing that validates some of the startups and I'll see their SageMaker and some of their products. >>Well, as I was saying at the top, I think Amazon's really trying, working hard on simplifying the whole process. And you mentioned training and, and a lot of times people are starting from scratch when they have to train models and retrain models. And so what they're doing is they're trying to create reusable components, uh, and allow people to, as you pointed out to automate and streamline some of that heavy lifting, uh, and as well, they talked a lot about, uh, doing, doing AI inferencing at the edge. And you're seeing, you know, they, they, uh, Swami talked about several foundational premises and the first being a foundation of frameworks. And you think about that at the, at the lowest level of their S their ML stack. They've got, you know, GPU's different processors, inferential, all these alternative processes, processors, not just the, the Xav six. And so these are very expensive resources and Swami talked a lot about, uh, and his colleagues talked a lot about, well, a lot of times the alternative processor is sitting there, you know, waiting, waiting, waiting. And so they're really trying to drive efficiency and speed. They talked a lot about compressing the time that it takes to, to run these, these models, uh, from, from sometimes weeks down to days, sometimes days down to hours and minutes. >>Yeah. Let's, let's unpack these four areas. Let's stay on the firm foundation because that's their core competency infrastructure as a service. Clearly they're laying that down. You put the processors, but what's interesting is the TensorFlow 92% of tensor flows on Amazon. The other thing is that pie torch surprisingly is back up there, um, with massive adoption and the numbers on pie torch literally is on fire. 
I was joking on Twitter that the PyTorch number is telling, because it means TensorFlow, originally part of Google, is getting a little bit diluted by other frameworks, and then you've got MXNet and some other things out there. So the fact that you've got PyTorch at 91% and TensorFlow at 92% on AWS is a huge validation. That means that the majority of machine learning and deep learning development is happening on AWS. >>Yeah, cloud-based, by the way, just to clarify: that's 92% of cloud-based TensorFlow and 91% of cloud-based PyTorch running on AWS; amazingly massive numbers. >>Yeah. And I think the processor side shows that it's not trivial to do machine learning, but that's where Inferentia came in; that's where they want to lay down that foundation. They had Trainium, they had Inferentia on the chip side, and then distributed training on SageMaker. So you've got the chips, and then you've got SageMaker as the middleware, almost like a machine learning stack. That's what they're putting out there.
And I think here you lays out the complexity, Dave butts, mostly around methodology, and, you know, the value activities required to execute. And again, this points to the complexity problem that they have. What's your take on this? >>Yeah. Well you think about, again, I'm talking about the pipeline, you collect data, you just data, you prepare that data, you analyze that data. You, you, you make sure that it's it's high quality and then you start the training and then you're iterating. And so they really trying to automate as much as possible and simplify as much as possible. What I really liked about that segment of foundation, number two, if you will, is the example, the customer example of the speaker from the NFL, you know, talked about, uh, you know, the AWS stats that we see in the commercials, uh, next gen stats. Uh, and, and she talked about the ways in which they've, well, we all know they've, they've rearchitected helmets. Uh, they've been, it's really a very much database. It was interesting to see they had the spectrum of the helmets that were, you know, the safest, most safe to the least safe and how they've migrated everybody in the NFL to those that they, she started a 24%. >>It was interesting how she wanted a 24% reduction in reported concussions. You know, you got to give the benefit of the doubt and assume some of that's through, through the data. But you know, some of that could be like, you know, Julian Edelman popping up off the ground. When, you know, we had a concussion, he doesn't want to come out of the game with the new protocol, but no doubt, they're collecting more data on this stuff, and it's not just head injuries. And she talked about ankle injuries, knee injuries. So all this comes from training models and reducing the time it takes to actually go from raw data to insights. >>Yeah. I mean, I think the NFL is a great example. You and I both know how hard it is to get the NFL to come on and do an interview. They're very coy. 
They don't really put their name on anything much because of the value of the NFL, this a meaningful partnership. You had the, the person onstage virtually really going into some real detail around the depth of the partnership. So to me, it's real, first of all, I love stat cast 11, anything to do with what they do with the stats is phenomenal at this point. So the real world example, Dave, that you starting to see sports as one metaphor, healthcare, and others are going to see those coming in to me, totally a tale sign that Amazon's continued to lead. The thing that got my attention was is that it is an IOT problem, and there's no reason why they shouldn't get to it. I mean, some say that, Oh, concussion, NFL is just covering their butt. They don't have to, this is actually really working. So you got the tech, why not use it? And they are. So that, to me, that's impressive. And I think that's, again, a digital transformation sign that, that, you know, in the NFL is doing it. It's real. Um, because it's just easier. >>I think, look, I think, I think it's easy to criticize the NFL, but the re the reality is, is there anything old days? It was like, Hey, you get your bell rung and get back out there. That's just the way it was a football players, you know, but Ted Johnson was one of the first and, you know, bill Bellacheck was, was, you know, the guy who sent him back out there with a concussion, but, but he was very much outspoken. You've got to give the NFL credit. Uh, it didn't just ignore the problem. Yeah. Maybe it, it took a little while, but you know, these things take some time because, you know, it's generally was generally accepted, you know, back in the day that, okay, Hey, you'd get right back out there, but, but the NFL has made big investments there. And you can say, you got to give him, give him props for that. And especially given that they're collecting all this data. 
That to me is the most interesting angle here: letting the data inform the actions. >>And the next step: after the NFL, they had the Data Wrangler data-prep news, that they're now integrating Snowflake, Databricks, and MongoDB into SageMaker, which is a theme there, along with Redshift, S3, and Lake Formation: data flowing into SageMaker, not the other way around. So again, you've been following this pretty closely, specifically Snowflake's recent IPO and their success. This is an ecosystem play for Amazon. What does it mean? >>Well, a couple of things. As you well know, John, when you first called me up, I was in Dallas, and I flew into New York in an ice storm to get to one of the early Hadoop Worlds. Back then it was all batch; big data was this big batch job. Today you want to combine that batch (there's still a lot of need for batch) with real-time inferencing, and AWS is bringing those together, and they're bringing in multiple data sources. You mentioned Databricks, Snowflake, and Mongo; these are three platforms that are doing very well in the market and holding a lot of data, and AWS is saying: okay, hey, we want to be the brain in the middle. You can import data from any of those sources, and I'm sure they're going to add more over time. They talked about 300 pre-configured data transformations that now come with SageMaker Studio, essentially abstracting away, as I've talked about a lot, the IT complexity, the whole IT-operations piece. I mean, it's the same old theme: AWS is pointing its platform and its cloud at undifferentiated heavy lifting, and it's moving up the stack now into the data life cycle and data pipeline, which is one of the biggest blockers to monetizing data.
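The "pre-configured transformation" idea behind Data Wrangler can be sketched as small reusable steps composed into a pipeline, instead of one-off scripts at each stage. The step names and data below are made up for illustration; Data Wrangler itself ships hundreds of built-in transforms.

```python
# Sketch of composable, reusable data-prep steps -- the concept behind
# pre-configured transformations. Hypothetical step names and toy data.

def drop_missing(rows):
    # Cleaning step: discard records with any missing field.
    return [r for r in rows if all(v is not None for v in r.values())]

def normalize_plays(rows):
    # Scaling step: rescale "plays" to the 0..1 range.
    top = max(r["plays"] for r in rows)
    return [{**r, "plays": r["plays"] / top} for r in rows]

def pipeline(rows, steps):
    # Each step is just a rows -> rows function, applied in order.
    for step in steps:
        rows = step(rows)
    return rows

podcasts = [
    {"title": "a", "plays": 50},
    {"title": "b", "plays": None},   # dropped by the cleaning step
    {"title": "c", "plays": 200},
]
out = pipeline(podcasts, [drop_missing, normalize_plays])
print(out)  # [{'title': 'a', 'plays': 0.25}, {'title': 'c', 'plays': 1.0}]
```

The value of pre-configured steps is exactly what the discussion describes: the specialized glue work between data sources and the model becomes reusable instead of being rebuilt for every new data source.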
>>So today, if you're, if you're a business person and you want, you want the answers, right, and you want say to adjust a new data source, so let's say you want to build a new, new product. Um, let me give an example. Let's say you're like a Spotify, make it up. And, and you do music today, but let's say you want to add, you know, movies, or you want to add podcasts and you want to start monetizing that you want to, you want to identify, who's watching what you want to create new metadata. Well, you need new data sources. So what you do as a business person that wants to create that new data product, let's say for podcasts, you have to knock on the door, get to the front of the data pipeline line and say, okay, Hey, can you please add this data source? >>And then everybody else down the line has to get in line and Hey, this becomes a new data source. And it's this linear process where very specialized individuals have to do their part. And then at the other end, you know, it comes to self-serve capability that somebody can use to either build dashboards or build a data product. In a lot of that middle part is our operational details around deploying infrastructure, deploying, you know, training machine learning models that a lot of Python coding. Yeah. There's SQL queries that have to be done. So a lot of very highly specialized activities, what Amazon is doing, my takeaway is they're really streamlining a lot of those activities, removing what they always call the non undifferentiated, heavy lifting abstracting away that it complexity to me, this is a real positive sign, because it's all about the technology serving the business, as opposed to historically, it's the business begging the technology department to please help me. The technology department obviously evolving from, you know, the, the glass house, if you will, to this new data, data pipeline data, life cycle. >>Yeah. I mean, it's classic agility to take down those. 
I mean, it's undifferentiated, I guess, but if it actually works, just create a differentiated product. So, but it's just log it's that it's, you can debate that kind of aspect of it, but I hear what you're saying, just get rid of it and make it simpler. Um, the impact of machine learning is Dave is one came out clear on this, uh, SageMaker clarify announcement, which is a bias decision algorithm. They had an expert, uh, nationally CFUs presented essentially how they're dealing with the, the, the bias piece of it. I thought that was very interesting. What'd you think? >>Well, so humans are biased and so humans build models or models are inherently biased. And so I thought it was, you know, this is a huge problem to big problems in artificial intelligence. One is the inherent bias in the models. And the second is the lack of transparency that, you know, they call it the black box problem, like, okay, I know there was an answer there, but how did it get to that answer and how do I trace it back? Uh, and so Amazon is really trying to attack those, uh, with, with, with clarify. I wasn't sure if it was clarity or clarified, I think it's clarity clarify, um, a lot of entirely certain how it works. So we really have to dig more into that, but it's essentially identifying situations where there is bias flagging those, and then, you know, I believe making recommendations as to how it can be stamped. >>Nope. Yeah. And also some other news deep profiling for debugger. So you could make a debugger, which is a deep profile on neural network training, um, which is very cool again on that same theme of profiling. The other thing that I found >>That remind me, John, if I may interrupt there reminded me of like grammar corrections and, you know, when you're typing, it's like, you know, bug code corrections and automated debugging, try this. >>It wasn't like a better debugger come on. 
We, first of all, it should be bug free code, but, um, you know, there's always biases of the data is critical. Um, the other news I thought was interesting and then Amazon's claiming this is the first SageMaker pipelines for purpose-built CIC D uh, for machine learning, bringing machine learning into a developer construct. And I think this started bringing in this idea of the edge manager where you have, you know, and they call it the about machine, uh, uh, SageMaker store storing your functions of this idea of managing and monitoring machine learning modules effectively is on the edge. And, and through the development process is interesting and really targeting that developer, Dave, >>Yeah, applying CIC D to the machine learning and machine intelligence has always been very challenging because again, there's so many piece parts. And so, you know, I said it the other day, it's like a lot of the innovations that Amazon comes out with are things that have problems that have come up given the pace of innovation that they're putting forth. And, and it's like the customers drinking from a fire hose. We've talked about this at previous reinvents and the, and the customers keep up with the pace of Amazon. So I see this as Amazon trying to reduce friction, you know, across its entire stack. Most, for example, >>Let me lay it out. A slide ahead, build machine learning, gurus developers, and then database and data analysts, clearly database developers and data analysts are on their radar. This is not the first time we've heard that. But we, as the kind of it is the first time we're starting to see products materialized where you have machine learning for databases, data warehouse, and data lakes, and then BI tools. So again, three different segments, the databases, the data warehouse and data lakes, and then the BI tools, three areas of machine learning, innovation, where you're seeing some product news, your, your take on this natural evolution. 
>>Well, well, it's what I'm saying up front is that the good news for, for, for our customers is you don't have to be a Google or Amazon or Facebook to be a super expert at AI. Uh, companies like Amazon are going to be providing products that you can then apply to your business. And, and it's allowed you to infuse AI across your entire application portfolio. Amazon Redshift ML was another, um, example of them, abstracting complexity. They're taking, they're taking S3 Redshift and SageMaker complexity and abstracting that and presenting it to the data analysts. So that, that, that individual can worry about, you know, again, getting to the insights, it's injecting ML into the database much in the same way, frankly, the big query has done that. And so that's a huge, huge positive. When you talk to customers, they, they love the fact that when, when ML can be embedded into the, into the database and it simplifies, uh, that, that all that, uh, uh, uh, complexity, they absolutely love it because they can focus on more important things. >>Clearly I'm this tenant, and this is part of the keynote. They were laying out all their announcements, quick excitement and ML insights out of the box, quick, quick site cue available in preview all the announcements. And then they moved on to the next, the fourth tenant day solving real problems end to end, kind of reminds me of the theme we heard at Dell technology worlds last year end to end it. So we are starting to see the, the, the land grab my opinion, Amazon really going after, beyond I, as in pass, they talked about contact content, contact centers, Kendra, uh, lookout for metrics, and that'll maintain men. Then Matt would came on, talk about all the massive disruption on the, in the industries. And he said, literally machine learning will disrupt every industry. They spent a lot of time on that and they went into the computer vision at the edge, which I'm a big fan of. I just loved that product. 
Clearly, every innovation, I mean, every vertical Dave is up for grabs. That's the key. Dr. Matt would message. >>Yeah. I mean, I totally agree. I mean, I see that machine intelligence as a top layer of, you know, the S the stack. And as I said, it's going to be infused into all areas. It's not some kind of separate thing, you know, like, Coobernetti's, we think it's some separate thing. It's not, it's going to be embedded everywhere. And I really like Amazon's edge strategy. It's this, you, you are the first to sort of write about it and your keynote preview, Andy Jassy said, we see, we see, we want to bring AWS to the edge. And we see data center as just another edge node. And so what they're doing is they're bringing SDKs. They've got a package of sensors. They're bringing appliances. I've said many, many times the developers are going to be, you know, the linchpin to the edge. And so Amazon is bringing its entire, you know, data plane is control plane, it's API APIs to the edge and giving builders or slash developers, the ability to innovate. And I really liked the strategy versus, Hey, here's a box it's, it's got an x86 processor inside on a, throw it over the edge, give it a cool name that has edge in it. And here you go, >>That sounds call it hyper edge. You know, I mean, the thing that's true is the data aspect at the edge. I mean, everything's got a database data warehouse and data lakes are involved in everything. And then, and some sort of BI or tools to get the data and work with the data or the data analyst, data feeds, machine learning, critical piece to all this, Dave, I mean, this is like databases used to be boring, like boring field. Like, you know, if you were a database, I have a degree in a database design, one of my degrees who do science degrees back then no one really cared. If you were a database person. Now it's like, man data, everything. This is a whole new field. This is an opportunity. 
But also, I mean, are there enough people out there to do all this? >>Well, it's a great point. And I think this is why Amazon is trying to extract some of the abstract. Some of the complexity I sat in on a private session around databases today and listened to a number of customers. And I will say this, you know, some of it I think was NDA. So I can't, I can't say too much, but I will say this Amazon's philosophy of the database. And you address this in your conversation with Andy Jassy across its entire portfolio is to have really, really fine grain access to the deep level API APIs across all their services. And he said, he said this to you. We don't necessarily want to be the abstraction layer per se, because when the market changes, that's harder for us to change. We want to have that fine-grained access. And so you're seeing that with database, whether it's, you know, no sequel, sequel, you know, the, the Aurora the different flavors of Aurora dynamo, DV, uh, red shift, uh, you know, already S on and on and on. There's just a number of data stores. And you're seeing, for instance, Oracle take a completely different approach. Yes, they have my SQL cause they know got that with the sun acquisition. But, but this is they're really about put, is putting as much capability into a single database as possible. Oh, you only need one database only different philosophy. >>Yeah. And then obviously a health Lake. And then that was pretty much the end of the, the announcements big impact to health care. Again, the theme of horizontal data, vertical specialization with data science and software playing out in real time. >>Yeah. Well, so I have asked this question many times in the cube, when is it that machines will be able to make better diagnoses than doctors and you know, that day is coming. If it's not here, uh, you know, I think helped like is really interesting. I've got an interview later on with one of the practitioners in that space. 
And so, you know, healthcare is something that is an industry that's ripe for disruption. It really hasn't been disruption disrupted. It's a very high, high risk obviously industry. Uh, but look at healthcare as we all know, it's too expensive. It's too slow. It's too cumbersome. It's too long sometimes to get to a diagnosis or be seen, Amazon's trying to attack with its partners, all of those problems. >>Well, Dave, let's, let's summarize our take on Amazon keynote with machine learning, I'll say pretty historic in the sense that there was so much content in first keynote last year with Andy Jassy, he spent like 75 minutes. He told me on machine learning, they had to kind of create their own category Swami, who we interviewed many times on the cube was awesome. But a lot of still a lot more stuff, more, 215 announcements this year, machine learning more capabilities than ever before. Um, moving faster, solving real problems, targeting the builders, um, fraud platform set of things is the Amazon cadence. What's your analysis of the keynote? >>Well, so I think a couple of things, one is, you know, we've said for a while now that the new innovation cocktail is cloud plus data, plus AI, it's really data machine intelligence or AI applied to that data. And the scale at cloud Amazon Naylor obviously has nailed the cloud infrastructure. It's got the data. That's why database is so important and it's gotta be a leader in machine intelligence. And you're seeing this in the, in the spending data, you know, with our partner ETR, you see that, uh, that AI and ML in terms of spending momentum is, is at the highest or, or at the highest, along with automation, uh, and containers. And so in. Why is that? It's because everybody is trying to infuse AI into their application portfolios. They're trying to automate as much as possible. They're trying to get insights that, that the systems can take action on. 
>>And, and, and actually it's really augmented intelligence in a big way, but, but really driving insights, speeding that time to insight and Amazon, they have to be a leader there that it's Amazon it's, it's, it's Google, it's the Facebook's, it's obviously Microsoft, you know, IBM's Tron trying to get in there. They were kind of first with, with Watson, but with they're far behind, I think, uh, the, the hyper hyper scale guys. Uh, but, but I guess like the key point is you're going to be buying this. Most companies are going to be buying this, not building it. And that's good news for organizations. >>Yeah. I mean, you get 80% there with the product. Why not go that way? The alternative is try to find some machine learning people to build it. They're hard to find. Um, so the seeing the scale of kind of replicating machine learning expertise with SageMaker, then ultimately into databases and tools, and then ultimately built into applications. I think, you know, this is the thing that I think they, my opinion is that Amazon continues to move up the stack, uh, with their capabilities. And I think machine learning is interesting because it's a whole new set of it's kind of its own little monster building block. That's just not one thing it's going to be super important. I think it's going to have an impact on the startup scene and innovation is going, gonna have an impact on incumbent companies that are currently leaders that are under threat from new entrance entering the business. >>So I think it's going to be a very entrepreneurial opportunity. And I think it's going to be interesting to see is how machine learning plays that role. Is it a defining feature that's core to the intellectual property, or is it enabling new intellectual property? So to me, I just don't see how that's going to fall yet. I would bet that today intellectual property will be built on top of Amazon's machine learning, where the new algorithms and the new things will be built separately. 
If you compete head to head with that scale, you could be on the wrong side of history. Again, this is a bet that the startups and the venture capitalists will have to make: who's going to end up being on the right wave here? Because if you make the wrong design choice, you can have a very complex environment, with IoT or whatever your app is serving. If you can narrow it down and get a wedge in the marketplace, if you're a company, um, I think that's going to be an advantage. It'll be great just to see what the impact on the ecosystem will be. >>Well, I think something you said just now gives a clue. You talked about, you know, the difficulty of finding the skills. And I think that's a big part of what Amazon and others who are innovating in machine learning are trying to do: close the gap between those that are qualified to actually do this stuff, the data scientists, the quality engineers, the data engineers, et cetera, and everybody else. And so companies, you know, over the last 10 years, went out and tried to hire these people. They couldn't find them, so they tried to train them, and it's taking too long. And now I think they're looking toward machine intelligence to really solve that problem, because that scales. As we know, outsourcing to services companies and just, you know, hardcore heavy lifting doesn't scale that well. >>Well, you know what, give me some machine learning, give it to me faster. I want to take the 80% there and allow us to build, certainly on the media cloud and the cube virtual that we're doing. Again, every vertical is going to be impacted, Dave. Great to see you, uh, great stuff. So far, week two. So, you know, we're cube live; we're live covering the keynotes. Tomorrow we'll be covering the keynotes for the public sector day. That should be chock-full of action. That environment has been impacted the most by COVID: a lot of innovation, a lot of coverage. I'm John Ferrari. And with Dave Alante, thanks for watching.

Published Date : Dec 9 2020


SENTIMENT ANALYSIS :

ENTITIES

Entity | Category | Confidence
Ted Johnson | PERSON | 0.99+
Dave Alante | PERSON | 0.99+
Julian Edelman | PERSON | 0.99+
Amazon | ORGANIZATION | 0.99+
Andy Jassy | PERSON | 0.99+
New York | LOCATION | 0.99+
Johnny | PERSON | 0.99+
AWS | ORGANIZATION | 0.99+
Dallas | LOCATION | 0.99+
John | PERSON | 0.99+
Palo Alto | LOCATION | 0.99+
Microsoft | ORGANIZATION | 0.99+
Swami | PERSON | 0.99+
Dave | PERSON | 0.99+
John Ferrari | PERSON | 0.99+
Facebook | ORGANIZATION | 0.99+
80% | QUANTITY | 0.99+
24% | QUANTITY | 0.99+
90% | QUANTITY | 0.99+
Google | ORGANIZATION | 0.99+
Boston | LOCATION | 0.99+
December 8th | DATE | 0.99+
IBM | ORGANIZATION | 0.99+
Matt | PERSON | 0.99+
NFL | ORGANIZATION | 0.99+
80 bucks | QUANTITY | 0.99+
Python | TITLE | 0.99+
91% | QUANTITY | 0.99+
92% | QUANTITY | 0.99+
75 minutes | QUANTITY | 0.99+
Oracle | ORGANIZATION | 0.99+
today | DATE | 0.99+
last year | DATE | 0.99+
cube.net | OTHER | 0.99+
Intel | ORGANIZATION | 0.99+

Debanjan Saha, Google Cloud | October 2020


 

(gentle music) >> From the cube studios in Palo Alto and Boston, connecting with thought leaders all around the world, this is a Cube conversation. >> With Snowflake's enormously successful IPO, it's clear that data warehousing in the cloud has come of age, and few companies know more about data and analytics than Google. Hi, I'm Paul Gillen. This is a Cube conversation, and today we're going to talk about data warehousing and data analytics in the cloud. Google BigQuery, of course, is a popular, fully managed, serverless data warehouse that enables rapid SQL queries and interactive analysis of massive data sets. This summer, Google previewed BigQuery Omni, which essentially brings the capabilities of BigQuery to additional platforms, including Amazon Web Services and soon Microsoft Azure. It's all part of Google's multicloud strategy, and no one knows more about this strategy than Debanjan Saha, General Manager and Vice President of engineering for data analytics at Google Cloud. And he joins me today. Debanjan, thanks so much for joining me. >> Paul, nice to meet you, and thank you for having me today. >> So it's clear that data warehousing is now part of many enterprise data strategies. How has the rise of cloud changed the way organizations are using data science, in your view? >> Well, I mean, you know, the cloud definitely is a big enabler of data warehousing and data science, as you mentioned. I mean, it has enabled things that people couldn't do on-prem. For example, if you think about data science, the key ingredient of data science, before you can start anything, is access to data, and you need a massive amount of data in order to build the right model that you want to use. And this was a big problem on-prem, because people were always thinking about what data to keep and what to discard. That's not an issue in cloud: you can keep as much data as you want, and that has been a big boon for data science.
And it's not only your data; you can also have access to other data, for example your partners' data, public data sets, and many other things that people have access to, right? That's number one. Number two, of course, it's a very compute-intensive operation, and, you know, large enterprises can of course afford to build a large data center and bring in tens of thousands of CPU cores, GPU cores, TPU cores, what have you, but it is difficult, especially for smaller enterprises, to have access to that amount of computing power, which is very, very important for data science. Cloud makes it easy. I mean, you know, it has in many ways democratized the use of data science; not only the big enterprises, everyone can take advantage of the computing power that various different cloud vendors make available on their platforms. And the third thing, not to overlook, is that cloud also makes available to customers and users lots of different data science platforms, for example Google's own TensorFlow, and you have many other platforms, Spark being one example, right? Both cloud native platforms as well as open source platforms, which is very, very useful for people doing data science, and managed open source Spark also makes it very, very affordable. And all of these things have contributed to a massive boom in data science in the cloud, from my perspective. >> Now, of course, we've seen over the last seven months a rush to the cloud, triggered by the COVID-19 pandemic. How has that played out in the analytics field? Do you see any long-term changes to the landscape, to the way customers are using analytics, as a result of what's happened these last seven months? >> You know, I think, as you know, digitization of business has been happening over a long period of time, right? And people are using AI, ML, and analytics in increasing numbers.
What I've seen because of COVID-19 is that that trend has accelerated, both in terms of people moving to cloud and in terms of their use of advanced analytics and AI and ML. And they have to do that, right? Pretty much every business is leaning heavily on their data infrastructure in order to gain insight into what's coming next. A lot of the models that people are used to are no longer valid; things are changing very, very rapidly, right? So in order to survive and thrive, people have to lean on data, lean on analytics, to figure out what's coming around the corner. And that trend, in my view, is only going to accelerate. It's not going to go the other way round. >> One of the problems we often hear complaints about with cloud databases is that there are so many of them. Do you see any resolution to that proliferation? >> Well, you know, I do think one size does not fit all, right? So it is important to have choice. It's important to have specialization. And that's why you see a lot of cloud databases. I don't think the number of cloud databases is going to go down. What I do expect to happen is that people are going to use interoperable data formats. They are going to use open APIs, so that it's very, very portable as people want to move from one database to another. The way I think the convergence is going to come is two ways. One, you know, a lot of databases, for example, use federation. If you look at BigQuery, for example, you can start with BigQuery, but with BigQuery you also have access to data in other databases, not only in GCP, or Google Cloud, but also in AWS, with BigQuery Omni, for example, right? So that provides a layer of federation, which kind of creates convergence with respect to viewing the various different data assets people may have.
I have also seen, for example with Looker, that the creation of enterprise-wide data models and data APIs gives people a platform so that they can build their custom data apps and data solutions on top of, and even from, that data API. Those, I believe, are going to be the points of convergence. I think data is probably going to be in different databases, because different databases do different things well; that does not mean people won't have access to all their data through one API or one set of models. >> Well, since we're on the subject of BigQuery: this summer you introduced BigQuery Omni, which is essentially a version of BigQuery that can query data in other cloud platforms. What is the strategy there, and what has the customer reaction been so far? >> Well, I mean, you know, as you probably have seen talking to customers, more than 80% of the customers that we talk to use multiple clouds, and that trend is probably not going to change. It happens for various different reasons: sometimes because of compliance, sometimes because they want to have different tools and different platforms, sometimes because of M&A. We are a big believer in a multicloud strategy, and that's what we are trying to do with BigQuery Omni. We do realize people have choices. Customers will have their data in various different places, and we will take our analytics wherever the data is, so customers won't have to worry about moving data from one place to another. That's what we are trying to do with BigQuery Omni. You know, for example, with Anthos, we have created a platform over which you can build these various different data stacks and applications which span multiple clouds. I believe we are going to see more of that, and BigQuery Omni is just the beginning. >> And how have your customers reacted to that announcement? >> Oh, they reacted very, very positively.
This is the first time they have a major cloud vendor offering a fully managed, serverless data warehouse platform on multiple clouds. And as I mentioned, we have many customers who have some of their data assets, for example, in GCP, and they really love BigQuery. And they also have, for example, applications running on AWS and Azure. And today the only option they have is to essentially shuttle their data between various different clouds in order to gain insight across the collective pool of data sets that they have. With BigQuery Omni, they don't need to do that. They can keep their data wherever it is, they can still join across that data, and they can get insights irrespective of which cloud their data is in. >> You recently wrote on Forbes about the shortage of data scientists and the need to make data analytics more accessible to the average business user. What is Google doing in that respect? >> So, I mean, you know, one of our goals is to make the data, and the insight from data, available to everybody in the business, right? That is the way you can democratize the use of analytics and AI and ML. And, you know, one way to do that is to teach everybody R or Python or some specific tools, but that's going to take a long time. So our approach is to make the power of data analytics and AI and ML available to our users no matter what tools they're comfortable with. So, for example, if you look at BQML, BigQuery ML, we have made it possible for our users who like SQL very much to use the power of ML without having to learn anything else, or without having to move their data anywhere else. We have a lot of business users, for example, who prefer spreadsheets, and with Connected Sheets we have made the spreadsheet interface available on top of BigQuery, so they can use the power of BigQuery without having to learn anything else. Better yet, we recently launched BigQuery Q&A.
And what Q&A allows you to do is to use natural language on top of BigQuery data, right? So the goal, I mean, if you can do that, that I think is the nirvana, where anyone, for example somebody working in a call center talking to a customer, can use a simple query to figure out what's going on with the bill, right? And we believe that if we can democratize the use of data, insight, and analytics, that's not only going to accelerate the digital transformation of businesses, it's also going to grow consumption. And that's good for both the users and the business. >> Now, you bought Looker last year. What would you say is different about the way Google is coming at the data analytics market from the way other cloud vendors are doing it? >> So Looker is a great addition to the already strong portfolio of products that we have, but, you know, a lot of people think about Looker as a business intelligence platform. It's actually much more than that. What is unique about Looker is the governed semantic model that Looker can build on top of data assets, which may be in BigQuery, maybe in Cloud SQL, maybe, you know, in another cloud, for example in Redshift or SQL Data Warehouse. And once you have the data model, you can create a data API and essentially an IDE, or integrated development environment, on top of which you can build your custom workflows. You can build your custom dashboards, you can build your custom data applications. And that is, I think, where we are moving. I don't think people want the old dashboards anymore. They want their data experience to be immersive, within the workflow and within the context in which they are using the data. And that's where I see a lot of customers now using the power of Looker and BigQuery and the other platforms that we have, building these custom data apps.
And again, like BigQuery, Looker is also multi-platform: it supports multiple data warehouses and databases, and that aligns very well with our philosophy of having an open platform that is multicloud as well as hybrid. >> Certainly, with Anthos and with BigQuery Omni, you've demonstrated your commitment to multicloud, but not all cloud vendors have an interest in being multicloud. Do you see any change in that standoff, and are you really in a position to influence it? >> Absolutely. I think, more than us, it's the customers who are going to influence that, right? And almost every customer I talk to, they don't want to be in a walled garden. They want to be on an open platform where they have the choice, they have the flexibility, and I believe these customers are going to push the adoption of platforms which are open and multicloud. And, you know, I believe over time the successful platforms have to be open platforms. A closed platform, if you look at history, has never been very successful, right? And, you know, I sincerely think that we are on the right path, and we are on the side of customers in this philosophy. >> Final question. What's your most important priority right now? >> You know, I wake up every day thinking about how we can make our customers successful. And the best way to make our customers successful is to make sure that they can get business outcomes out of the data that they have. And that's what we are trying to do. We want to accelerate time to value from data, you know, so that people can keep their data in a governed way and gain insight by using the tools that we provide them. A lot of them we have used internally for many years, and those tools are now available to our customers. We also believe we need to democratize the use of analytics and AI and ML, and that's why we are trying to give customers tools where they don't have to learn a lot of new things and new skills in order to use them.
And if we can do that successfully, I think we are going to help our customers get more value out of their data and create businesses which can use that value. I'll give you a couple of quick examples. For example, if you look at Home Depot, they used our platform to improve the predictability of their inventory by 2x. If you look at HSBC, for example, they have been able to use our platform to detect financial fraud 10x faster. If you look at, for example, Juan Perez, who's the CIO of UPS, they have used our AI, ML, and analytics to do better logistics and route planning, and they have been able to save 10 million gallons of fuel every year, which amounts to $400 million in cost savings. Those are the kinds of business outcomes we would like to drive with the power of our platform. >> Powerful stuff: democratized data, multicloud, data in any cloud. Who can argue with that? Debanjan Saha, General Manager and Vice President of engineering for data analytics at Google Cloud, thanks so much for joining me today. >> Paul, thank you, thank you for inviting me. >> I'm Paul Gillen. This has been a cube conversation. >> Debanjan: Thank you. (soft music)
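Debanjan's two technical points in this conversation, training models in plain SQL with BigQuery ML and federating queries to data that lives elsewhere, can be made concrete with a short sketch. The statements below use BigQuery syntax (CREATE MODEL and ML.PREDICT for BQML, EXTERNAL_QUERY for federated queries), but every project, dataset, table, and connection name is a hypothetical placeholder, and the Python here only composes the SQL a user would submit to BigQuery.

```python
# Sketch: the SQL an analyst might run, per the interview. All dataset,
# table, and connection names are hypothetical placeholders.

# BQML: train a logistic-regression model entirely in SQL.
TRAIN_SQL = """
CREATE OR REPLACE MODEL `mydataset.churn_model`
OPTIONS (model_type = 'logistic_reg', input_label_cols = ['churned']) AS
SELECT tenure_months, monthly_spend, churned
FROM `mydataset.customers`
"""

# BQML: score new rows with the trained model.
PREDICT_SQL = """
SELECT *
FROM ML.PREDICT(
  MODEL `mydataset.churn_model`,
  (SELECT tenure_months, monthly_spend FROM `mydataset.new_customers`))
"""

# Federation: have BigQuery run a query against an external source
# (e.g. Cloud SQL) through a named connection and join the result locally.
def federated_query(connection_id: str, external_sql: str) -> str:
    """Wrap an external query in BigQuery's EXTERNAL_QUERY function."""
    return f"SELECT * FROM EXTERNAL_QUERY('{connection_id}', '{external_sql}')"

FEDERATED_SQL = federated_query(
    "my-project.us.orders-connection",  # hypothetical connection ID
    "SELECT order_id, total FROM orders",
)

print(TRAIN_SQL.strip())
print(PREDICT_SQL.strip())
print(FEDERATED_SQL)
```

With a configured client, each string would be passed to something like `google.cloud.bigquery.Client().query(...)`; none of that is needed just to see the shape of the SQL, which is the point of the sketch.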

Published Date : Nov 7 2020


SENTIMENT ANALYSIS :

ENTITIES

Entity | Category | Confidence
Paul Gillen | PERSON | 0.99+
Paul | PERSON | 0.99+
Debanjan | PERSON | 0.99+
Juan Perez | PERSON | 0.99+
October 2020 | DATE | 0.99+
Palo Alto | LOCATION | 0.99+
Boston | LOCATION | 0.99+
HSBC | ORGANIZATION | 0.99+
Google | ORGANIZATION | 0.99+
UPS | ORGANIZATION | 0.99+
BigQuery | TITLE | 0.99+
Home Depot | ORGANIZATION | 0.99+
400 million | QUANTITY | 0.99+
last year | DATE | 0.99+
two ways | QUANTITY | 0.99+
Debanjan Saha | PERSON | 0.99+
more than 80% | QUANTITY | 0.99+
Nevada | LOCATION | 0.99+
Python | TITLE | 0.99+
today | DATE | 0.99+
Amazon | ORGANIZATION | 0.99+
AWS | ORGANIZATION | 0.99+
third | QUANTITY | 0.99+
SQL | TITLE | 0.99+
BigQuery Omni | TITLE | 0.99+
Looker | ORGANIZATION | 0.99+
Microsoft | ORGANIZATION | 0.98+
Redshift | TITLE | 0.98+
BigQuery Omni | TITLE | 0.98+
one database | QUANTITY | 0.98+
10 million gallons | QUANTITY | 0.98+
one set | QUANTITY | 0.98+
both | QUANTITY | 0.97+
first time | QUANTITY | 0.97+
Snowflake | ORGANIZATION | 0.97+
One | QUANTITY | 0.97+
one | QUANTITY | 0.97+
COVID-19 pandemic | EVENT | 0.96+
10 X | QUANTITY | 0.96+
Both | QUANTITY | 0.95+
one example | QUANTITY | 0.95+
GCP | TITLE | 0.95+
Anthos | ORGANIZATION | 0.93+
This summer | DATE | 0.92+
this summer | DATE | 0.92+
tens of thousands | QUANTITY | 0.91+
last seven months | DATE | 0.89+
COVID-19 | OTHER | 0.88+
CPU | QUANTITY | 0.86+
two X. | QUANTITY | 0.86+
one size | QUANTITY | 0.86+
Spark | TITLE | 0.82+
Google cloud | ORGANIZATION | 0.79+

Stuti Deshpande, AWS | Smart Data Marketplaces


 

>> Announcer: From around the globe, it's theCUBE, with digital coverage of smart data marketplaces, brought to you by Io Tahoe. >> Hi everybody, this is Dave Vellante. And welcome back. We've been talking about smart data. We've been hearing Io Tahoe talk about putting data to work, and at the heart of building great data outcomes is the Cloud, of course, and also Cloud native tooling. Stuti Deshpande is here. She's a partner solutions architect for Amazon Web Services and an expert in this area. Stuti, great to see you. Thanks so much for coming on theCUBE. >> Thank you so much for having me here. >> You're very welcome. So let's talk a little bit about Amazon. I mean, you have been on this machine learning journey for quite some time. Take us through how this whole evolution has occurred in technology over this period of time, as the Cloud has been evolving. >> Amazon itself is an example of a company that has gone through a multi-year machine learning transformation to become the machine-learning-driven company that you see today: improving on the original personalization model, using robotics in all the different fulfillment centers, developing a forecasting system to predict customer needs and iterating on that, and raising customer expectations on convenience, fast delivery, and speed; from developing natural language processing technology for end-user interaction, to developing groundbreaking technology such as Prime Air drones to deliver packages to customers. So our goal at Amazon Web Services is to take this rich expertise and experience with machine learning technology across Amazon, and to work with thousands of customers and partners to put this powerful technology into the hands of developers and data engineers of all levels. >> Great. So, okay. So if I'm a customer or a partner of AWS, give me the sales pitch on why I should choose you for machine learning.
What are the benefits that I'm going to get specifically from AWS? >> Well, there are three main reasons why partners choose us. First and foremost, we provide the broadest and deepest set of machine learning and AI services and features for your business. The velocity at which we innovate is truly unmatched: over the last year, we launched 200 different services and features. So not only is our pace accelerating, but we provide fully managed services to our customers and partners, who can easily build sophisticated AI-driven applications, and, utilizing those fully managed services, can build, train, and deploy machine learning models, which is both valuable and differentiating. Secondly, we can accelerate the adoption of machine learning. As I mentioned, for fully managed machine learning we have Amazon SageMaker. SageMaker is a fully managed service that any developer of any level, or any data scientist, can utilize to build complex machine learning algorithms and models and deploy them at scale, with much less effort and at a much lower cost. Before SageMaker, it used to take so much time, expertise, and specialization to build all these extensive models; with SageMaker, you can literally build complex models in just days or weeks. And to increase adoption, AWS has acceleration programs such as the ML Solutions Lab, and we also have education and training programs such as DeepRacer, which focuses on reinforcement learning, and Embark, which actually help organizations adopt machine learning very readily. We also support three major frameworks, such as TensorFlow, PyTorch, and MXNet; we have separate teams who are dedicated to focusing on each of these frameworks and improving their support for a wide variety of workloads. And finally, we provide the most comprehensive platform that is optimized for machine learning.
So when you think about machine learning, you need to have a data store where you can keep your training sets and your test sets, one that is highly reliable, highly scalable, and secure. Most of our customers want to store all of their data, and any kind of data, in a centralized repository that can be treated as the central source of truth, in this case the Amazon S3 data store, to build an end-to-end machine learning workflow. So we believe that we provide this capability of having the most comprehensive platform to build the machine learning workflow end to end. >> Great. Thank you for that. So my next question is: this is a complicated situation for a lot of customers. You know, having the technology is one thing, but adoption is sort of everything. So I wonder if you could paint a picture for us and help us understand how you're helping customers think about machine learning, thinking about that journey, and maybe give us the context of what the ecosystem looks like? >> Sure. If someone can put up the slide, I would like to provide a pictorial representation of how AWS envisions machine learning as three layers of a stack. Moving on to the next slide, I can talk about the bottom tier. The bottom tier, as you can see on this screen, is basically for advanced technologists, advanced data scientists who are machine learning practitioners and work at the framework level. 90% of data scientists use multiple frameworks, because different frameworks are suited to different kinds of workloads. So at this layer, we provide support for all of the different types of frameworks. The bottom layer is for the advanced scientists and developers who actually want to build, train, and deploy these machine learning models by themselves. Moving on to the next level, the middle layer: this layer is well suited for non-experts.
So here we have SageMaker, which provides a fully managed service where you can build, tune, train, and deploy your machine learning models at a very low cost, with very minimal effort, and at a higher scale. It removes all the complexity, heavy lifting, and guesswork from each stage of machine learning, and Amazon SageMaker has been a game changer; many of our customers are actually standardizing on top of Amazon SageMaker. And then, moving on to the next layer, the topmost layer: we call these AI services, because they mimic human cognition. These are services such as Amazon Rekognition, which is basically a deep learning service optimized for image and video analysis, and Amazon Polly, which does text-to-speech conversion, and so on and so forth. So these are the AI services that can be embedded into applications so that the end user or the end customer can build AI-driven applications. >> Love it. Okay. So you've got the experts at the bottom with the frameworks, the hardcore data scientists; you kind of get the self-driving machine learning in the middle; and then you have all the ingredients. I'm like an AI chef, or a machine learning chef: I can pull in vision, speech, chatbots, fraud detection, and sort of compile my own solutions. That's cool. We hear a lot about SageMaker Studio. I wonder if you could tell us a little bit more. Can we double-click a little bit on SageMaker? That seems to be a pretty important component of the stack that you just showed us. >> I think that was an absolutely great summarization of all the different layers of the machine learning stack, so thank you for providing the gist of that. Of course, I'll be really happy to talk about Amazon SageMaker, because most of our customers are actually standardizing on top of SageMaker.
As I've spoken about, machine learning traditionally has so many complications; it's a very complex, expensive, and iterative process, which is made even harder because, if you do machine learning deployment the traditional way, there are no integrated tools for the entire workflow and deployment. And that is where SageMaker comes into the picture. SageMaker removes all the heavy lifting and complexity from each step of the deployment of the machine learning workflow. It solves these challenges by providing all of the different components that are optimized for every stage of the workflow in one single tool set, so that models get to production faster, with much less effort, and at a lower cost. We really continue to add important capabilities to Amazon SageMaker; I think last year we announced more than 50 new capabilities for SageMaker, improving its features and functionality. And I would love to call out a couple of those here. SageMaker notebooks are one-click Jupyter notebooks that come along with EC2 instances, I'm sorry for using jargon here, Amazon Elastic Compute Cloud instances. So you just need a one-click deployment, and you have the entire SageMaker notebook interface running along with the Elastic Compute instances, which gives you faster time to production. If you are a data scientist or a data engineer who has worked extensively on machine learning, you must be aware that building training data sets is really complex. So there we have Amazon SageMaker Ground Truth, which is for building machine learning training data sets and can reduce your labeling cost by 70%. And when you run machine learning models in general, there are some workflows where you need to do inference. So there we have Elastic Inference, with which you can reduce the cost by 75% by adding a little GPU acceleration.
Or you can reduce the cost of training by using managed spot training, which utilizes EC2 Spot Instances. So there are multiple ways you can reduce costs, and multiple ways you can improve and speed up your machine learning deployment and workflow. >> So one of the things I love about, I mean, I'm a Prime member, who isn't, right? I love to shop at Amazon. And what I like about it is the consumer experience. It kind of helps me find things that maybe I wasn't aware of, maybe based on other patterns that are going on in the buying community with people that are similar. If I want to find a good book, it always gives me great reviews and recommendations. So I'm wondering if that applies to sort of the tech world and machine learning. Are you seeing any patterns emerge across the various use cases? You have such scale. What can you tell us about that? >> Sure. One of the patterns that we have seen all the time is building a scalable layer for any kind of use case. So as I said before, customers are really looking to put their data into a single repository where they have a single source of truth. Storing any kind of data, at any velocity, into a single source of truth actually helps them build models that run on that data and get useful insights out of it. So when you speak about an end-to-end workflow, using Amazon SageMaker along with a scalable analytical tool is what we have seen as one of those patterns, where they can perform some analysis using Amazon SageMaker and build predictive models. To cite an example, if you take a healthcare use case, they can build a predictive model that can minimize readmissions using Amazon SageMaker. So what I mean to say is, by not moving data around, and by connecting different services to the same source of data, that's how you avoid creating copies of data, which is very crucial when you are working with training datasets and test datasets in Amazon SageMaker.
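The cost levers mentioned above can be sketched with some illustrative arithmetic. The percentages are the ones quoted in this conversation (70% labeling savings with Ground Truth, up to 75% inference savings with Elastic Inference, variable training savings with Spot); the baseline dollar figures are hypothetical, chosen purely for illustration.

```python
# Illustrative arithmetic only: the discount percentages come from the
# conversation above; the baseline monthly costs are hypothetical.
def discounted(baseline: float, percent_saved: float) -> float:
    """Return the cost after applying a percentage saving, rounded to cents."""
    return round(baseline * (1 - percent_saved / 100), 2)

labeling_cost = 10_000   # manual labeling baseline, reduced ~70% by Ground Truth
inference_cost = 8_000   # dedicated GPU inference, reduced ~75% by Elastic Inference
training_cost = 20_000   # on-demand training; Spot savings vary with capacity

print(discounted(labeling_cost, 70))   # 3000.0
print(discounted(inference_cost, 75))  # 2000.0
print(discounted(training_cost, 60))   # 8000.0 (assuming a 60% Spot discount)
```

The point of the sketch is simply that each workflow stage has its own independent lever, so the savings compound across the pipeline rather than applying once.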
And it is highly important to consider this. So the pattern that we have seen is to utilize a central repository of data, which could be Amazon S3 in this scenario, as a scalable analytical layer along with SageMaker. I have to quote Intuit for a success story here. Using Amazon SageMaker, Intuit reduced their machine learning deployment time by 90%, from six months to one week. And if you think about the healthcare industry, there has been a shift from reactive to predictive care, utilizing predictive models to accelerate research and discovery of new drugs and new treatments. We've also observed that nurses supported by AI tools increased their productivity by 50%. I would like to say that one of our customers is really diving deep into the AWS portfolio of machine learning and AI services, including Amazon Transcribe Medical, where they are able to provide insights so that their customers are getting benefits from them. Most of their customers are healthcare providers, and they are able to give them insights so that they can deliver more personalized and improved patient care. So there you have the end user benefits as well. One of the patterns that I can speak about, and what we have seen as well, is that pairing a predictive model with real-time integration into healthcare records will actually help their healthcare provider customers with informed decision-making and improved, personalized patient care. >> That's a great example, several there, and I appreciate that. I mean, healthcare is one of those industries that is just so ripe for technology ingestion and transformation. That is a great example of how the cloud has really enabled major changes in healthcare, with proactive versus reactive care. We're talking about lower costs, better health, longer lives. It's really inspiring to see that evolve.
We're going to watch it over the next several years. I wonder if we could close on the marketplace. I've had the pleasure of interviewing Dave McCann a number of times. He and his team have built just an awesome capability for Amazon and its ecosystem. What about the data products, whether it's SageMaker or other data products in the marketplace? What can you tell us? >> Sure. The marketplaces are an interesting thing. So let me first talk about the AWS Marketplace. In AWS Marketplace you can browse and search for hundreds of machine learning algorithms and model packages in a broad range of categories: computer vision, text analysis, voice, video, predictive models, and so on and so forth. And all of these models and algorithms can be deployed to a Jupyter notebook, which comes as part of the SageMaker platform. And you can integrate all of these different models and algorithms into our fully managed service, which is Amazon SageMaker, through Jupyter notebooks, the SageMaker SDK, and even the command line as well. And this experience is fronted by the marketplace catalog and API. So you get the same benefits as any other marketplace product, such as seamless deployment and consolidated billing. So you get the same benefits as the other products in the AWS Marketplace for your machine learning algorithms and model packages. And this is really important, because these can be directly integrated into our SageMaker platform. And I should mention the data products as well. I'm really happy to quote one example here: because we are in unprecedented times, we collaborated with our partners to provide some data products, and one of them is a data hub by Tableau that gives you time-series data of cases and deaths gathered from multiple trusted sources.
And this is to provide better and more informed knowledge, so that everyone who is utilizing this product can make informed decisions and help the community in the end. >> I love it. I love this concept of being able to access the data, the algorithms, the tooling. And it's not just about the data, it's being able to do something with the data. That's what we've been talking about with injecting intelligence into those data marketplaces. That's what we mean by smart data marketplaces. Stuti Deshpande, thanks so much for coming to theCUBE, sharing your knowledge, and telling us a little bit about AWS. It's been a pleasure having you. >> It's my pleasure too. Thank you so much for having me here. >> You're very welcome. And thank you for watching. Keep it right there. We will be right back after this short break. (soft orchestral music)

Published Date : Sep 17 2020


Peter Guagenti, Cockroach Labs | DockerCon 2020


 

>> Male narrator: From around the globe, it's the CUBE, with digital coverage of DockerCon Live 2020, brought to you by Docker and its ecosystem partners. >> Hey, welcome back everyone, to the DockerCon virtual conference. DockerCon 20 is being held digitally online, and this is the CUBE's coverage. I'm John Furrier, your host from the CUBE. This is the CUBE virtual, the CUBE digital. We're doing all remote interviews, here in our Palo Alto studio with our quarantined crew, getting all the data for you. We've got Peter Guagenti, who's the Chief Marketing Officer at Cockroach Labs, a company that we became familiar with last year. They had the first multicloud event in the history of the industry last year, a notable milestone. Hey, it's always good to see you're still around, and you got the first slot, Peter. Great to see you. Thanks for coming on the CUBE for DockerCon 20. >> Thank you, John. Thanks for having me. >> So it's kind of interesting, I mentioned that tidbit to give you a little bit of love on the fact that you guys ran, or were a part of, the first multicloud conference in the industry. Okay, now that's all everyone's talking about. You guys saw this early. Take a minute to explain Cockroach Labs. Why did you see this trend? Why did you guys take the initiative, and the risk, to have the first-ever multicloud conference last year? >> So that's news to me that we were the first, actually. That's a bit of a surprise, because for us, we see multicloud and hybrid cloud as the obvious. I think the credit really for this belongs with folks like Gartner and others who took the time to listen to their customers, right? Took the time to understand what was the need in the market. What I hear when I talk to CEOs is, cloud is a capability, not a place, right? They're looking at us and saying, "yes, I have a go-to-cloud strategy, but I also have made massive investments in my data center."
"I believe I don't want to be locked in yet again to another vendor, with proprietary APIs, proprietary systems, et cetera." So what I hear when I talk to customers is, "I want to be multicloud, show me how. Show me how to do that in a way that isn't just buying from multiple vendors, right, where I cost-arbitrage. Show me a way where I actually use the infrastructure in a creative way." And that really resonates with us, and it resonates with us for a few reasons. First is, we built a distributed SQL database for a reason, right? We believed that what you really need in the modern age for global applications is something that is truly diverse and distributed, right? You can have a database that behaves like a single database but lives in multiple locations around the world. But then you also have things like data locality. It's, okay, German data stays in Germany because of German law, but when I write my application, I never write each of these things differently. Now, the other reason is, customers are coming to us and saying, "I want a single database that I can deploy in any of the cloud providers." Azure SQL is a phenomenal product. Google Spanner is a phenomenal product. But once I do that, I'm locked in; then all I have is theirs. But if I'm a large global auto manufacturer, or if I'm a startup that's trying to enter multiple markets at the same time, I don't want that. I want to be able to pick my infrastructure and deploy where I want, how I want. And increasingly, we talk to the large banks and they're saying, "I spent tens or even hundreds of millions of dollars on data centers. I don't want to throw them out. I just want better utilization, beyond the 15 to 20% that I get from deploying software on bare metal, right? I want to be able to containerize."
"I want to be able to cloudify my data center, and then ultimately have what we see more and more as what they call a tripod strategy, where your own data center and two cloud providers behave as a single unit for your most important applications." >> That's awesome. I want to thank you for coming on for DockerCon 20, because this is an interesting time where developers are going to be called to the table in a very aggressive way, because the COVID-19 crisis is going to accelerate things and pull the future forward faster than most people thought. I mean, in the industry we're inside the ropes, if you will, so we've been talking about stateless applications, stateful databases, and all the architectural things that have that longer horizon. But this is an interesting time, because now companies are realizing, whether it's the shelter-in-place-at-scale problems that emerged, or the fact that I've got to have high availability at a whole nother level, that this exposes a major challenge and a major opportunity. We're expecting projects to be funded, some not to be funded, things to move around. I think it's going to really change the conversation as developers get called in and say, "I really got to look at my resources at scale. The database is a critical one, because you want data to be part of this data plane, if you will, across clouds." What's your reaction to this? Do you agree that the future has been pulled forward? And what's Cockroach doing to help developers manage this? >> Yeah, John, I think you're exactly right. And I think that is a story that I'm glad you're telling, because I think there's a lot of signal that's happening right now, but we're not really thinking about what the implications are. And we're seeing something that's, I think, quite remarkable. We're seeing, within our existing customer base and the people we've been talking to, feast or famine. And in some cases, feast and famine in the same company.
And what does that really mean? We've looked at these graphs for what's going to happen, for example, with online delivery services. And we've seen the growth rates, and this is why they're all so valued, why Uber invested so big in Uber Eats and these other vendors. And we've seen these growth rates and said, this is going to be amazing, in the next 10 years we're going to have this adoption. That five, 10 years happened overnight, right? We were so desperate to hold onto the things that matter to us, the things that make us happy on any given day. We're seeing that acceleration, like you said. It's all of that, the future got pulled forward, like you had said. >> Yeah. >> That's remarkable, but were you prepared for it? Many people were absolutely not prepared for it, right? They were on a steady-state growth plan. And we have been very lucky, because we built an architecture that is truly distributed and dynamic. So scaling and adding more resilience to a database is something we all learned to do over the last 20 years, as data-intensive applications mattered. But with distributed SQL, and things like containerization on the stateless side, we know we can truly elastically scale, right? If you need more support for the application on something like Cockroach, you literally just add more nodes and we absorb it, right? Just like we did with containerization, where if you need more concurrency, you just add more containers. And thank goodness, right? Because those who were prepared for those things benefited. We work with one of the large delivery services; overnight, they saw a jump to what was their peak day at any point in time now happening every single day. And they were prepared for that, because they had already made these architectural decisions. >> Yeah.
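The "just add more nodes and we absorb it" behavior Peter describes can be sketched with a toy placement function. This is a simplified hash-based placement, not CockroachDB's actual range-based rebalancing, but it shows the core idea: capacity grows by extending the node list, and per-node load drops accordingly.

```python
import hashlib

def node_for_key(key: str, nodes: list[str]) -> str:
    """Toy placement: hash the key and map it onto the current node list."""
    digest = int(hashlib.sha256(key.encode()).hexdigest(), 16)
    return nodes[digest % len(nodes)]

nodes = ["node-1", "node-2", "node-3"]
keys = [f"order-{i}" for i in range(9000)]

# With three nodes, load spreads roughly evenly across them.
load = {n: sum(1 for k in keys if node_for_key(k, nodes) == n) for n in nodes}

# "Scaling" is just growing the node list; the hottest node cools off.
nodes.append("node-4")
load_after = {n: sum(1 for k in keys if node_for_key(k, nodes) == n) for n in nodes}
print(max(load.values()), max(load_after.values()))
```

A real system would move only a fraction of the data on membership change (consistent hashing, or Cockroach's range splits and rebalancing), but the elasticity property is the same: adding a node is the scaling operation.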
But if you weren't in that position, if you were still on legacy infrastructure, if you were still trying to do this stuff manually, or you're manually sharding databases and having to increase the compute on your model, you are in trouble, and you're feeling it. >> That's interesting, Peter, and it reminds me of a time, if you go back in history a little bit, just not too far back. I mean, I'm old enough to go back to the 80s. I remember all the different inflection points, and they all had their key characteristics as a computer revolution: TCP/IP, and you pick your spots, there's always been that demarcation point or line where things change. But let's go back to around 2004 and then 2008. During that time, those legacy players out there were kind of sitting around, sleeping at the switch, and in comes open-source, in comes Facebook, in comes roll-your-own. Hey, I'm just going to run open-source. I'm going to build my own database. And that was because there was nothing in the market. And most companies were buying from general-purpose vendors because they didn't have to do all the due diligence. But the tech-savvy folks could build their own and scale. And that changed the game; that became the hyperscalers, and the rest is history. Fast forward to today, because what you're getting at is this new inflection point. There's going to be another tipping point, a trajectory of knowledge and skill that's completely different from what we saw just a year ago. What's your reaction to that? >> I think you're exactly right. And I've been lucky enough, same as you, to have been involved in the web since the very early days. I started my career at the beginning. And what we saw with web 1.0 and the shift to web 2.0, web 2.0 would not have happened without open-source.
And I don't think we give it enough credit. If it wasn't for the LAMP stack, if it wasn't for Linux, if it wasn't for this wave of innovation. And it wasn't even necessarily about rolling your own. Yeah, the Facebooks of the world could go hire their own engineers to go and improve MySQL to make it scale. That was of course a possibility. But the democratization of that software is where all of the success really came from. And I lived on both sides of it in my career, as both an app developer and then as a software executive. I was in that window, got to see it from both sides, and saw the benefit. I think what we're entering now is yet another inflection point, like you said. We were already working at it. The move from traditional applications with simple logic and simple rules to highly data-intensive applications, where data is driving the experience and models are driving the experience. I think we were already at a point where ML and AI and data-intensive decision-making were going to make us rewrite every application we had, and that needed new infrastructure. But I think this is going to really force the issue, and it's going to force the issue at two levels. First, the people who were already innovating in each of these industries and categories were already doing this. They were already cloud native. They were already built on top of very modern third-generation databases and third-generation programming languages, doing really interesting things with machine learning. So they were already out-innovating, but now they have a bigger audience, right? And then, if you're a traditional business, all of a sudden your business is under duress because of substantial changes in what is happening in the market. Retail still had strength in its physical footprint as of last year, right? We'd been thinking about e-commerce versus traditional retail; yeah, it was on a slow decline, there were lots of problems, but there was still strength there. That changed overnight.
Now those revenue sources have dried up, so what are you going to do? And how are you going to act? If you've built your entire business, for example, on legacy databases from folks like Oracle, and on old monolithic ways of building applications, you're simply not adaptable enough to move with the changing times. You're going to have to start over. We used to talk about how every company needed to become a software company. That mostly happened, but they weren't all very good software companies. I would argue that the next generation needs to be great software companies and great data science companies. Look at the software companies that have risen to prominence in the last five to 10 years: folks like Facebook, folks like Google, folks like Uber, folks like Netflix. They use data better than anyone else in their category. So they have this amazing app experience, and they leverage data and innovate in a way that allows them to just dominate their category. And I think that is going to be the change we see over the next 10 years. And we'll see who exits what is obviously going to be a difficult period; we'll see who exits on top. >> Well, it's interesting to have you on. I love the perspective and the insights. I think that's great for the folks out there who haven't seen those waves before. Again, this wave is coming. Let's go back to the top, when we were talking about what's in it for the developer. Because I believe there's going to be, not a renaissance, because it's always been great, but the developers even more are going to be called to the front lines for solutions. I mean, these are first-generation scale problems that are going to carry into this whole next generation, the modern era that's upon us. What are some of the things that are going to be that LAMP-stack-like experience? What are some of the things that you see? Because you guys are kind of a telltale sign, in my opinion, Cockroach, because you're thinking about things in a different construct. You're thinking about multicloud.
You're thinking about state, which is a database challenge. Stateless has kind of been worked out: RESTful APIs, stateless data services. Kubernetes is also showing that cloud native, and microservices or service orientation, is the future. There's no debate on that; I think that's done. Okay, so now I'm a developer. What the hell am I going to be dealing with for the next five years? What are your thoughts? >> Well, I think the developer knows what they're already facing from an app perspective. I think you see the rapid evolution in languages and in deployment, and all of those things are super obvious. You just need to go to all the DockerCon sessions to see what the change to deployment looks like. I think there are a few other key trends that developers should start paying attention to that are really critical. The first one, and only loosely related to us, is MLOps, right? Just like we saw dev and ops suddenly come together so we could develop and deploy in a super fast, iterative manner, the same things are now going to start happening with data and all of the work that we do around deploying models. And I think that's going to be a pretty massive change. You think about the rise of tools like TensorFlow, and some of the developments that have happened inside of the cloud providers; I think you're seeing a lot there. As a developer, you have to start thinking as much like a data scientist and a data engineer as simply somebody writing front-end code, right? And I think that's a critical skill that the best developers already have and will continue to build. I think the data layer has become as important or more important than any other layer in the stack because of this. You think about, once again, how the leaders are using data and the interesting things that they're doing: the tools you use matter, right?
If you are spending a lot of your time trying to figure out how to shard something, how to make it scale, how to make it durable, when instead you should be focused on just the pure capability, that's a ridiculous use of your time, right? That is not a good use of your time. We're still using 20-to-25-year-old open-source databases for many of these applications, when they gave up their value probably 10 years ago. Honestly, you know, we keep papering over it, but it's not a great solution. And unfortunately, NoSQL only fixes some of the issues with scaling and elasticity. It's like you and I starting a business and saying, "okay, everyone speaks English, but because we're global, everyone's going to learn Esperanto, right?" That doesn't work, right? It works for a developer, but not if you're trying to do something where everyone can interact. This is why this entire new third generation of SQL databases has risen. We took distributed architecture to SQL. >> Hold up for a second. Can you explain what that means? Because I think it's a key topic, and I want to just call that out. What does "third-generation database" mean? Sorry, I speak about it like everyone sees it. >> I think it's super important, so it's a good thing to highlight. I'll take a minute to explain it and then we can get into it. There is an entire new wave of database infrastructure that has risen in the last five years, and it started actually with Google. So it started with Google Spanner. Google was the first to face most of these problems, right? They were the first to face web scale, at least at the scale we now know it. They were the first to really understand the complexity of working with data. They had their own NoSQL, they had their own way of doing things internally, and they realized it wasn't working. What they really needed was a relational database that spoke traditional ANSI SQL but scaled like its NoSQL counterparts. And there was a white paper that was released.
That was the birth of Spanner. Spanner was an internal product for many, many years. They released the thinking into the wild, and then they just started this wave of innovation. That's where our company came from. And there were others like us who said, "you're right, let's go build something that behaves like we expect a database to behave," with structure and the relational model, where anyone who can write simple SQL can use it. It's the simplest API for most people working with data, but it behaves like all the best distributed software that we've been using. And so that's how we were born. Our company was founded by ex-Googlers who had lived in this space and decided to go and scratch the itch, right? And instead of building a product that would be locked into a single cloud provider, they built a database that could be open-source, that could be deployed anywhere, that could cross cloud providers without hiccups. And that's been the movement. And it's not just us; there are other vendors in this space, and we're all focused on trying to take the best of both worlds that came before us: the traditional relational structure, the consistency and ACID compliance that we all loved from tools like Oracle, right, and Microsoft, which we really enjoyed, but then the developer-friendly nature and the simple, elastic scalability of distributed software. That's what we're all seeing. Our company, for example, has only been selling a product for the last two years. We founded it five years ago; it took us three years just to write software that we would be happy selling to a customer. We're on what we believe is probably a 10-to-15-year product journey to really go and replace things like Oracle. But we started selling the product two years ago, and there's been 300% growth year over year. We're probably one of the fastest-growing software companies in America, right? And it's all because of the latent demand for this kind of tool. >> Yeah, that's a great point.
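To make "consistency plus distribution" a little more concrete, here is a heavily simplified sketch of the majority-quorum idea that consensus protocols like Raft (which this class of database builds on) rely on. Real implementations involve leader election, replicated logs, and recovery, all of which this toy omits; the names and regions below are made up.

```python
from dataclasses import dataclass, field

@dataclass
class Replica:
    """One copy of a slice of data; it may be down."""
    name: str
    alive: bool = True
    store: dict = field(default_factory=dict)

def quorum_write(replicas, key, value):
    """A write commits only if a majority of replicas acknowledge it."""
    acks = 0
    for r in replicas:
        if r.alive:
            r.store[key] = value
            acks += 1
    return acks > len(replicas) // 2   # strict majority required

replicas = [Replica("us-east"), Replica("eu-west"), Replica("ap-south")]
replicas[2].alive = False                    # one region is down
print(quorum_write(replicas, "k", "v1"))     # True: 2 of 3 acknowledged

replicas[1].alive = False                    # now two regions are down
print(quorum_write(replicas, "k", "v2"))     # False: no majority, write rejected
```

The takeaway is that the database stays consistent and available through a minority of failures, which is what lets a single logical database span data centers and cloud providers.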
I'm a big fan of this third wave, and I can see it. If you look at just the macro tailwinds in the industry, billions of edge devices, the emergence of all kinds of software, that means you can't have one database. I always said to someone at (mumbles) and others: you can't have one database, it's physically impossible. You need data, and whatever database fits the scene, wherever you want that data stored. But you've got to have it real-time, you've got to have it actionable, and you have to have software intelligence in how you manage the data. So I think the data control plane, or that layer, is the next interoperability wave, because without data, nothing really works. Machine learning doesn't really work well; you want the most data. I think cybersecurity is a great early use case, because they have to leverage data fast. And so you start to see some interesting use cases: financial services, cyber. What are your thoughts on this? Can you share, from the Cockroach Labs perspective, from your database and your cloud, what are some of the adoption use cases? Who are those leaders? You can name names if you have them; if not, name the use case. What's the Cockroach approach? Who's winning with it? What does it look like? >> Yeah, that's a great question. And you nailed it, right? The data volumes are so large, and they're so globally distributed. And then you start layering in the data streaming in from devices that then have to be weighed against all of these things. You want a single database, but you need one that will behave in a way that's going to support all of that and that actually lives at the edge, like you're saying. And that's where we have been shining. And so our use cases are, and unfortunately I can't name any names, but, for example, retail. We're seeing retailers who have that elasticity and scale challenge with commerce. And what they're using us for is, we're in all of the locations where they do business, right?
And so we're able to have data locality associated with the businesses and the purchases in those countries, and yet have single apps that actually bridge across all of those environments. And with the distributed nature, we're able to scale up and scale down truly elastically, right? Because we spread the data out across the nodes automatically. And what we see there is, you know, retailers do have up-and-down moments, and they can leverage the financial structure of the cloud in a really thoughtful way; retail is a shining example of that. I remember having customers that had 64 times the amount of traffic on Cyber Monday that they had on the average day. In the old data center world, that's what you bought for; that was horrendous. In a cloud environment, it's still horrendous, even with public cloud providers, if you're having to go and change your app to ramp every time. That's the problem. With something like a distributed database, and with containerization, you can scale up much more quickly and scale down much more easily. That's a big one. Streaming media is another one, with the same data locality concerns in each of these countries. You think about it, somebody like Netflix or Hulu, right? They have shows that are unique to specific countries, and they have all that user behavior, all that user data. You know, data sovereignty: what you watch on Netflix is some very rich personal data, and we all know how that metadata has been used against people. So it's no surprise that you now have countries where there's going to be regulation around where that data can live and how it can be used. And so, once again, something like Cockroach, where you can have that global distribution but take advantage of locality, where we can lock data to certain nodes in certain locations. That's a big one. >> There's no doubt in my mind. I think it's such a big topic, we could probably do more interviews just on the COVID-19 data problem.
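The data-locality pattern described above ("German data stays in Germany") can be sketched as simple region-aware routing. This is an illustration of the concept only, not CockroachDB's actual partitioning or locality syntax, and all node and region names here are hypothetical.

```python
# Hypothetical region routing: each user row is pinned to the node group
# serving its home region, while the application logic stays region-agnostic.
REGION_NODES = {
    "de": ["frankfurt-1", "frankfurt-2"],   # German data stays on German nodes
    "us": ["virginia-1", "oregon-1"],
}

def home_nodes(user: dict) -> list[str]:
    """Route a row to the node group for its region; default to 'us'."""
    return REGION_NODES.get(user.get("region", "us"), REGION_NODES["us"])

users = [
    {"id": 1, "region": "de"},
    {"id": 2, "region": "us"},
    {"id": 3},                 # no region recorded, falls back to "us"
]
placements = {u["id"]: home_nodes(u) for u in users}
print(placements[1])  # ['frankfurt-1', 'frankfurt-2']
```

The point Peter is making is that this placement decision lives in the database layer, so the application issues one query against one logical database while the rows themselves stay pinned to compliant locations.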
The impact of getting this right is a nerd problem today, but it is a technology solution for society globally in the future. Zero doubt in my mind on that. So, Peter, I want you to get the last word and give a plug to the developers that are watching out there about Cockroach. Why should they engage with you guys? What can you offer? Is there anything new you want to share about the company with the audience here at DockerCon 2020? Take us home in this last segment. >> Thank you, John. I'll keep the sales pitch to a minimum. I'm a former developer myself; I don't like being sold to, so I appreciate it. We believe we're building what is the right database for the coming wave of cognitive applications, and specifically we've built what we believe is the ideal database for distributed applications and for containerized applications. So I would strongly encourage you to try it. It is open-source. It is truly cloud native. We have free education, so you can try it yourself. And once you get into it, it's traditional SQL that behaves like Postgres and other tools that you've already known, so it should be very familiar. If you've come up through any of these other spaces, it will be very natural. It's Postgres-compatible and integrates with a number of ORMs, so as a developer, it plugs right into the tools you already use. And we're on a rapid journey. We believe we can replace that first generation of technology built by the Oracles of the world, and we're committed to doing it. We're committed to spending the next five to 10 years in hard engineering to build the most powerful database to solve this problem. >> Well, thanks for coming on and sharing your awesome insight and historical perspective, which comes out of real experience. We believe, and we want to share with the audience, that in this time of crisis it's more important than ever to focus on the critical nature of operations, because coming out of this, it is going to be a whole new reality.
And I think the best tech will win the day, and people will be building new things to grow, whether it's for profit or for societal benefit. The impact of what we do in the next year or two will determine a big trajectory in new technology, new approaches that are dealing with the realities of infrastructure, scale, working at home, sheltering in place, and coming back to the hybrid world. We're all virtualized, Peter. We've been virtualized: the media, the lifestyle, not just virtualization in the networking sense. Fun times; it's going to be challenging. So thanks for coming on. >> Thank you very much, John. >> Okay, we're here for the DockerCon 20 virtual conference, the CUBE Virtual segment. I want to thank you for watching. Stay with us. We've got streams all day today, so check out the sessions. Jump in; it's going to be on demand. There are a lot of videos, and it's all going to live on. Thanks for watching, and stay with us for more coverage and analysis. Here at DockerCon 20, I'm John Furrier. Thanks for watching. >> Narrator: From the CUBE studios in Palo Alto and Boston, connecting with thought leaders all around the world, this is the CUBE conversation.

Published Date: May 29, 2020


SENTIMENT ANALYSIS :

ENTITIES

Entity | Category | Confidence
John | PERSON | 0.99+
Uber | ORGANIZATION | 0.99+
Peter | PERSON | 0.99+
15 | QUANTITY | 0.99+
20 | QUANTITY | 0.99+
John Furrier | PERSON | 0.99+
Peter Guangeti | PERSON | 0.99+
Microsoft | ORGANIZATION | 0.99+
Germany | LOCATION | 0.99+
Peter Guagenti | PERSON | 0.99+
America | LOCATION | 0.99+
10 | QUANTITY | 0.99+
Cockroach Labs | ORGANIZATION | 0.99+
Palo Alto | LOCATION | 0.99+
64 times | QUANTITY | 0.99+
five | QUANTITY | 0.99+
Netflix | ORGANIZATION | 0.99+
2008 | DATE | 0.99+
Google | ORGANIZATION | 0.99+
First | QUANTITY | 0.99+
tens | QUANTITY | 0.99+
Docker | ORGANIZATION | 0.99+
three years | QUANTITY | 0.99+
Hulu | ORGANIZATION | 0.99+
both sides | QUANTITY | 0.99+
Gartner | ORGANIZATION | 0.99+
last year | DATE | 0.99+
Oracle | ORGANIZATION | 0.99+
today | DATE | 0.99+
first | QUANTITY | 0.99+
both sides | QUANTITY | 0.99+
Facebook | ORGANIZATION | 0.99+
Cockroach | ORGANIZATION | 0.99+
2004 | DATE | 0.99+
two levels | QUANTITY | 0.99+
two years ago | DATE | 0.99+
DockerCon 20 | EVENT | 0.99+
COVID-19 | OTHER | 0.99+
15 year | QUANTITY | 0.99+
DockerCon | EVENT | 0.99+
each | QUANTITY | 0.99+
both | QUANTITY | 0.99+
five years ago | DATE | 0.98+
20% | QUANTITY | 0.98+
25 year | QUANTITY | 0.98+
next year | DATE | 0.98+
80s | DATE | 0.98+
English | OTHER | 0.98+
single apps | QUANTITY | 0.98+
Boston | LOCATION | 0.98+
first one | QUANTITY | 0.98+
both worlds | QUANTITY | 0.98+
first position | QUANTITY | 0.97+
first generation | QUANTITY | 0.97+
two cloud providers | QUANTITY | 0.97+
third generation | QUANTITY | 0.97+
DockerCon Live 2020 | EVENT | 0.97+
hundreds of millions of dollars | QUANTITY | 0.97+
CUBE | ORGANIZATION | 0.97+
a year ago | DATE | 0.96+
10 years | QUANTITY | 0.96+
SQL | TITLE | 0.96+
Linux | TITLE | 0.96+
single database | QUANTITY | 0.96+

Hui Xue, National Heart, Lung, and Blood Institute | DockerCon Live 2020


 

>> Narrator: From around the globe, it's theCUBE, with digital coverage of DockerCon Live 2020. Brought to you by Docker and its ecosystem partners. >> Hi, I'm Stu Miniman, and welcome to theCUBE's coverage of DockerCon Live 2020. Really excited to be part of this online event. We've been involved with DockerCon for a long time; of course, one of my favorite things is always to be able to talk to the practitioners. We remember how, over the years, Docker exploded onto the marketplace, with millions of people downloading and using it. So joining me is Hui Xue, who is a Principal Deputy Director of Medical Signal Processing at the National Heart, Lung, and Blood Institute, which is part of the National Institutes of Health. Hui, thank you so much for joining us. >> Thank you for inviting me. >> So let's start. Of course, the name of your institute is very specific; I think anyone in the United States knows the NIH. Tell us a little bit about your role there and the scope of what your team covers. >> So I'm basically a researcher and developer of medical imaging technology. We are the heart, lung, and blood institute, so we work on and focus on imaging the heart. What we exactly do is develop new and novel imaging technology and deploy it to the front line of our clinical collaborations, and Docker plays an essential role in that process. So, yeah, that's what we do at NHLBI. >> Okay, excellent. So research, of course, in the medical field, with the global pandemic, gets a lot of attention. So you keyed it up there. Let's understand: where do containerization and Docker specifically play into the work that your team is doing? >> So maybe I'd like to give an example to illustrate. For example, we're working on magnetic resonance imaging, MRI. Many of us may already have been scanned. So we're using MRI to image the heart. Where Docker comes in is that it allows us to deploy our imaging technology to the clinical hospitals.
So we have a global deployment at around 40 hospitals, a bit more, around the world. If we, for example, develop a new AI-based image analysis for heart images, what we do with Docker is put our model and software into a Docker image, so that our collaborating sites can pull the software that contains the latest technology, then use it for patients, of course under the research agreement with NIH. Because Docker is so efficient and available globally, we can actually implement continuous integration and testing, and update the framework based on Docker. Then our collaborators have the latest technology, instead of, you know, the traditional situation in medical imaging, where the iteration of technology is pretty slow. But with all this latest technology, such as container tech like Docker, coming into the field, which is relatively new, in the past two to three years this whole paradigm has been changing. It's certainly very exciting to us. It gives us flexibility we never had before to reach our customers, to reach other people in the world to help them. They also help us, so it's a very good experience to have. >> Yeah, that's pretty powerful, what you're talking about there, rather than, you know, we install some equipment, who knows how often things get updated, how do you make sure to synchronize between different locations? Obviously, the medical field is highly regulated, and you're a government agency. Talk a little bit about how you make sure you have the right version control, that security is in place; how do all of those things sort out? >> Yes, that's an essential question. So firstly, I want to clarify one thing: it's not NIH who endorses Docker, it's us as researchers. We have used Docker extensively, and we trust its performance. This container technology is efficient, it's globally available, and it's very secure. All the communication between the container and the imaging equipment is encrypted.
We also have all the paperwork in place to allow us to provide the technology to our clinicians. When we post the latest software, every version we put into a Docker image goes through an automated integration test system. So every time we make a change, the newer version of the software runs through rigorous tests: something like 200 gigabytes of data runs through it, and we check that everything is still working. The basic principle is that we don't allow any version of the software to be delivered to a customer without testing in Docker. This container technology lets us 100% automate the whole process, which gives us a lot of freedom, so we have a rather small team here at NIH. Many people are actually very impressed by how many customers we support with such a small team. The key reason is that we heavily utilize container technology; its automation is unparalleled, certainly much better than anything I had before using containers. So that's actually the key to maintaining the quality and the continuous service to our customers. >> Yeah, absolutely. Automation is something we've been talking about in the industry for a long time, but if we implement it properly it can have a huge impact. Can you bring us inside a little bit: what tools are you using? How is that automation set up and managed? And how does that fit into the Docker environment? >> So let me describe it more specifically. We are using a continuous testing framework. There are several options; the specific one we build on is an open-source Python tool, rather small actually. What it does is set up a service, and this service watches, for example, our GitHub repo.
Whenever I make a change, or someone in the team makes a change, for example fixing a bug, adding a new feature, or maybe updating an AI model, we push the change to GitHub. Then the continuous build system notices, and it triggers the integration tests, which run entirely inside the Docker environment. So this is the key: what container technology offers is a 100% reproducible runtime environment for our customers. As the software provider, in our particular use case we don't set up customers with uniform hardware; they bought their own servers around the world, so everyone may have slightly different hardware. We don't want that to get into our software experience. So Docker offers us 100% control of the runtime environment, which is essential if we want to deliver a consistent medical imaging experience, because most applications are rather computationally intensive, and we don't want something to run for, like, one minute at one site and maybe three minutes at another site. So Docker runs all the integration tests; if everything passes, we pack the Docker image and send it to Docker Hub. Then all our collaborators around the world have the new image; we coordinate with them so they find a proper time to update, and then they have the newer technology in time. So that's why Docker is such a useful tool for us. >> Yeah, absolutely. Okay, containerization and Docker really transformed the way a lot of those computational solutions happen. I'm wondering if you can explain a little bit more of the stack that you're using, for people that might not have looked at solutions for a couple of years and think, oh, it's containers, it's stateless architectures, I'm not sure how it fits into my other network environment. Can you tell us what you are doing for storage and networking?
>> So we actually have rather vertical integration in this medical imaging application. We built our own service as the software; its backbone is C++ for higher computational efficiency. There's lots of Python, because these days AI models are essential. What Docker provides, as I mentioned, is a uniform runtime environment, so we have a fixed GCC version and, if we want to go into that detail, specific versions of the numerical libraries and certain versions of Python. We use PyTorch a lot; that's our AI backbone. Another way we use Docker is that we deploy the same container into the Microsoft Azure cloud. That's another ability I found in Docker: we never need to change anything in our software development process, and the same container we give you works everywhere, on the cloud, on site, for our customers. This reduces the development cost and also improves our efficiency a lot. Another important aspect is that this improves customer acceptance a lot, because we can go to one customer and tell them the software they are running is actually running on 30 other sites, exactly the same, bit by bit consistent. This actually helps us convince many people; every time I describe this process, I think most people accept the idea. They appreciate the way we deliver software to them, because we can always fall back. So yes, here is another aspect: we have many Docker images in Docker Hub, so if one deployment fails, they can easily fall back. That's very important for medical imaging applications, because hospitals need to maintain their continuous level of service.
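The pinned runtime environment described above (a fixed GCC version, specific numerical libraries, specific versions of Python and PyTorch) is exactly what a Dockerfile captures. A minimal sketch might look like the following; the base image, package names, and every version number here are illustrative assumptions, not NHLBI's actual pins:

```dockerfile
# Every collaborating site runs this exact environment, bit for bit,
# regardless of the server hardware underneath.
FROM ubuntu:20.04

# Pin the compiler and Python version instead of taking whatever is current.
RUN apt-get update && apt-get install -y --no-install-recommends \
        gcc-9 g++-9 python3.8 python3-pip && \
    rm -rf /var/lib/apt/lists/*

# Pin the numerical stack; a version bump only ever ships as a new image.
RUN pip3 install numpy==1.18.5 torch==1.5.0

# The reconstruction service itself (hypothetical path and entry point).
COPY recon_service/ /opt/recon_service/
CMD ["python3.8", "/opt/recon_service/main.py"]
```

Because the image is built once and pulled everywhere, the one-minute-here, three-minutes-there variability Hui mentions is reduced to the hardware itself; the software layer is identical at every site.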
So even though we want to avoid this completely, occasionally, very occasionally, there will be some function not working or some new test case never covered before, and then we give them a way to fall back. That's also our policy, and it's made possible by the container technology. >> Yeah, absolutely. You brought up, as many have said, that the container is that atomic unit, that building block, with portability across any platform environment. What about container orchestration? How are you managing these environments you talked about, in the public cloud or in different environments? What are you doing for container orchestration? >> Actually, our setup might be the simplest case. We basically have a private Docker repo, which the Institute has paid for. We have something like 50 or 100 private repos, and for every repo we have one specific Docker setup with different software versions: for example, some images are for PyTorch, others for TensorFlow, depending on the application. Maybe some customer has a requirement for a rather small Docker image size, so they get a trimmed-down version of the image. Because it's still a small number, like 20 or 30 active repos, we are managing this semi-automatically: we have the service running to push, pull, and load back images, but we configure this process here at the Institute whenever we feel we have something new to offer to the customers. Regarding managing these Docker images, there is another aspect for medical imaging. On the customer side, we had a lot of discussion about whether we wanted to set up fully automated continuous updates, but in the end they decided they'd better have the customer involved; better have some people. So we finally settled on notifying the customer that there is something new to update, and then they decide when to update and how to test. So this is another aspect.
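The release-and-fallback policy described above (no image reaches Docker Hub without passing the integration tests, and a failing deployment rolls back to an earlier image) can be sketched roughly as follows. This is a hypothetical illustration: the repo name, the `pytest` test command, and the helper functions are assumptions, not NHLBI's actual tooling:

```python
import subprocess

def run_integration_tests(test_cmd=("pytest", "-q", "integration")):
    """Return True only if the full integration suite passes."""
    return subprocess.run(test_cmd).returncode == 0

def release(repo, version):
    """Build and test one image; refuse to push it to Docker Hub on failure."""
    tag = f"{repo}:{version}"
    subprocess.run(["docker", "build", "-t", tag, "."], check=True)
    if not run_integration_tests():
        raise RuntimeError(f"integration tests failed; {tag} will not be pushed")
    subprocess.run(["docker", "push", tag], check=True)
    return tag

def fallback_tag(repo, pushed_versions):
    """Pick the previous known-good tag when the newest deployment fails.

    pushed_versions is the ordered list of versions already released
    (oldest first); every entry passed the test gate, so the
    next-to-last one is a safe image to fall back to.
    """
    if len(pushed_versions) < 2:
        raise RuntimeError("no earlier image to fall back to")
    return f"{repo}:{pushed_versions[-2]}"
```

A site that hits a problem with, say, the hypothetical `lab/recon:1.2` would redeploy `fallback_tag("lab/recon", ["1.0", "1.1", "1.2"])` while the issue is fixed upstream; because every tag in Docker Hub already passed the gate, the rollback target is known to work.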
Even though we have a very high level of confidence in the container technology, we found it's not 100%. At some sites, it's still better to have human supervision to help, because if the goal is to maintain 100% continuous service, then in the end they need some experts in the field to test and verify. So that's the current stage of deployment of these Docker images. We found it rather lightweight, so even with a few people in our team at NIH, we can manage a rather large network globally. It's really exciting for us. >> Excellent, great. I guess, final question: give us a little bit of a roadmap. You've already talked about leveraging AI in there and the various pieces. What are you looking for from Docker and the ecosystem, and for your solution, for the rest of the year? >> I would say the future definitely is in the cloud. One major direction we are trying to push is for clinical hospitals to link to and use the cloud as a routine. In the current status, some sites and hospitals may be very conservative; they are afraid of the security, the connection, all kinds of issues related to the cloud. But this scenario is changing rapidly, and container technology contributes a lot in the cloud; it makes the whole thing so easy, so reliable. So our next push is to move a lot of the applications into the cloud only. The model will be, for example, that we have new AI applications that may be available only on the cloud. If a customer wants to use them, they will have to be willing to connect to the cloud, sending data there and receiving, for example, the AI results from our running Docker image in the cloud. What we need to do is make the Docker builds even more efficient and make the computation 100% stable, so we can utilize the huge computational power in the cloud. Also the price; the key here is the price.
So if we have one setup in the cloud, a data center for example: we currently maintain two data centers, one in Europe, another in the United States. If we have one data center and 50 hospitals using it every day, then we run the numbers, and the average price comes to a few dollars per patient. If we consider the costs of the health care system, the cost of using cloud computing can be truly trivial, but what we can offer to patients and doctors has never been possible before. The computation we can bring is something they never saw before and never experienced. So I believe that's the future. In the old model, everyone has their own computational server, and maintaining that costs a lot of work. Even if we make the software aspects much easier, someone still needs to set up the hardware. But using the cloud will change all of that. So I think the future is definitely to fully utilize the cloud with the container technology. >> Excellent. Well, we thank you so much. I know everyone appreciates the work your team's doing, and absolutely, if things can be done to allow scalability and lower the cost per patient, that would be a huge benefit. Thank you so much for joining us. >> Thank you. >> All right, stay tuned for lots more coverage from theCUBE at DockerCon Live 2020. I'm Stu Miniman, and thank you for watching theCUBE. (gentle music)

Published Date: May 29, 2020


SENTIMENT ANALYSIS :

ENTITIES

Entity | Category | Confidence
NIH | ORGANIZATION | 0.99+
National Institute of Health | ORGANIZATION | 0.99+
100% | QUANTITY | 0.99+
Europe | LOCATION | 0.99+
United States | LOCATION | 0.99+
200 gigabytes | QUANTITY | 0.99+
one minute | QUANTITY | 0.99+
three minutes | QUANTITY | 0.99+
Stu Miniman | PERSON | 0.99+
Python | TITLE | 0.99+
Hui Xue | PERSON | 0.99+
50 hospitals | QUANTITY | 0.99+
Docker | ORGANIZATION | 0.99+
DockerCon | EVENT | 0.99+
20 | QUANTITY | 0.99+
one patient | QUANTITY | 0.99+
30 other sites | QUANTITY | 0.99+
PyTorch | TITLE | 0.99+
Microsoft | ORGANIZATION | 0.99+
Docker | TITLE | 0.99+
one data center | QUANTITY | 0.98+
one | QUANTITY | 0.98+
one site | QUANTITY | 0.98+
millions of people | QUANTITY | 0.98+
two data centers | QUANTITY | 0.97+
DockerCon Live 2020 | EVENT | 0.97+
firstly | QUANTITY | 0.97+
Hui | PERSON | 0.97+
NHLBI | ORGANIZATION | 0.97+
National Heart, Lung, and Blood Institute | ORGANIZATION | 0.97+
theCUBE | ORGANIZATION | 0.97+
one customer | QUANTITY | 0.96+
National Heart, Lung, and Blood Institute | ORGANIZATION | 0.96+
one thing | QUANTITY | 0.96+
50 | QUANTITY | 0.96+
100 private repos | QUANTITY | 0.93+
around 40 hospitals | QUANTITY | 0.91+
30 active repo | QUANTITY | 0.85+
pandemic | EVENT | 0.82+
three years | QUANTITY | 0.82+
C++ | TITLE | 0.81+
Hui Xue | ORGANIZATION | 0.8+
TensorFlow | TITLE | 0.75+
One major | QUANTITY | 0.71+
Azure cloud | TITLE | 0.7+
Docker | PERSON | 0.7+
Medical Signal Processing | ORGANIZATION | 0.66+
few dollars per patient | QUANTITY | 0.65+
couple of years | QUANTITY | 0.58+