Robert Nishihara, Anyscale | AWS Startup Showcase S3 E1
(upbeat music)

>> Hello everyone. Welcome to theCUBE's presentation of the "AWS Startup Showcase." The topic this episode is AI and machine learning, top startups building foundational model infrastructure. This is season three, episode one of the ongoing series covering exciting startups from the AWS ecosystem, and this time we're talking about AI and machine learning. I'm your host, John Furrier. I'm excited to be joined today by Robert Nishihara, who's the co-founder and CEO of a hot startup called Anyscale. He's here to talk about Ray, the open source project, and Anyscale's infrastructure for foundation models as well. Robert, thank you for joining us today.

>> Yeah, thanks so much as well.

>> I've been following your company since the founding pre-pandemic, and you guys really had a great vision, scaled up, and are in a perfect position for this big wave that we all see with ChatGPT and OpenAI that's gone mainstream. Finally, AI has broken out through the ropes and now gone mainstream, so I think you guys are really well positioned. I'm looking forward to talking with you today. But before we get into it, introduce the core mission for Anyscale. Why do you guys exist? What is the North Star for Anyscale?

>> Yeah, like you mentioned, there's a tremendous amount of excitement about AI right now. You know, I think a lot of us believe that AI can transform just about every industry. So one of the things that was clear to us when we started this company was that the amount of compute needed to do AI was just exploding. To actually succeed with AI, companies like OpenAI or Google, you know, these companies getting a lot of value from AI, were not just running these machine learning models on their laptops or on a single machine. They were scaling these applications across hundreds or thousands or more machines and GPUs and other resources in the Cloud. And so to actually succeed with AI, and this has been one of the biggest trends in computing, maybe the biggest trend in computing in recent history, the amount of compute has been exploding. And so to actually succeed with AI, to actually build these scalable applications and scale the AI applications, there's a tremendous software engineering lift to build the infrastructure to actually run these scalable applications. And that's very hard to do. So one of the reasons many AI projects and initiatives fail, or don't make it to production, is the need for this scale, the infrastructure lift, to actually make it happen. So our goal here with Anyscale and Ray is to make that easy, to make scalable computing easy. So that as a developer or as a business, if you want to do AI, if you want to get value out of AI, all you need to know is how to program on your laptop. Like, all you need to know is how to program in Python. And if you can do that, then you're good to go. Then you can do what companies like OpenAI or Google do and get value out of machine learning.

>> That programming example of how easy it is with Python reminds me of the early days of Cloud, when infrastructure as code was first being talked about; the idea was to just code the infrastructure, make it programmable. That's super important. That's what AI people want: to program AI first. That's the new trend. And I want to understand, if you don't mind explaining, the relationship that Anyscale has to these foundational models and in particular the large language models, also called LLMs, as seen with OpenAI and ChatGPT.
Before you get into the relationship that you have with them, can you explain why the hype around foundational models? Why are people going crazy over foundational models? What is it and why is it so important?

>> Yeah, so foundation models are incredibly important because they enable businesses and developers to get value out of machine learning, to use machine learning off the shelf with these large models that have been trained on tons of data and that are useful out of the box. And then, of course, you know, as a business or as a developer, you can take those foundational models and repurpose them or fine tune them or adapt them to your specific use case and what you want to achieve. But it's much easier to do that than to train them from scratch. And I think, for people to actually use foundation models, there are three main types of workloads or problems that need to be solved. One is training these foundation models in the first place, like actually creating them. The second is fine tuning them and adapting them to your use case. And the third is serving them and actually deploying them. Okay, so Ray and Anyscale are used for all three of these different workloads. Companies like OpenAI or Cohere that train large language models, or open source versions like GPT-J, do that on top of Ray. There are many startups and other businesses that fine tune, that, you know, don't want to train the large underlying foundation models, but that do want to fine tune them, do want to adapt them to their purposes, and build products around them and serve them; those are also using Ray and Anyscale for that fine tuning and that serving. And so the reason that Ray and Anyscale are important here is that, you know, building and using foundation models requires huge scale. It requires a lot of data. It requires a lot of compute, GPUs, TPUs, other resources. And to actually take advantage of that and actually build these scalable applications, there's a lot of infrastructure that needs to happen under the hood. And so you can either use Ray and Anyscale to take care of that and manage the infrastructure and solve those infrastructure problems, or you can build the infrastructure and manage the infrastructure yourself, which you can do, but it's going to slow your team down. You know, many of the businesses we work with simply don't want to be in the business of managing infrastructure and building infrastructure. They want to focus on product development and move faster.

>> I know you got a keynote presentation we're going to go to in a second, but I think you hit on something I think is the real tipping point: doing it yourself, hard to do. These are the places where the opportunities are, and the Cloud did that with data centers. It turned the data center into an API. The heavy lifting went away and went to the Cloud so people could be more creative and build their product. In this case, build their creativity. Is that kind of the big deal? Is that the big deal happening here, that you guys are taking the learnings and making them available so people don't have to do that?

>> That's exactly right. So today, if you want to succeed with AI, if you want to use AI in your business, infrastructure work is on the critical path for doing that. To do AI, you have to build infrastructure. You have to figure out how to scale your applications. That's going to change.
We're going to get to the point, and you know, with Ray and Anyscale, we're going to remove the infrastructure from the critical path so that as a developer or as a business, all you need to focus on is your application logic, what you want the program to do, what you want your application to do, how you want the AI to actually interface with the rest of your product. Now the way that will happen is that the infrastructure work will still happen. It'll just be under the hood and taken care of by Ray and Anyscale. And so I think something like this is really necessary for AI to reach its potential, for AI to have the impact and the reach that we think it will; you have to make it easier to do.

>> And just for clarification, to point out, if you don't mind explaining the relationship of Ray and Anyscale real quick just before we get into the presentation.

>> So Ray is an open source project. We created it. We were at Berkeley doing machine learning. We started Ray in order to provide a simple open source tool for building and running scalable applications. And Anyscale is the managed version of Ray; basically, we will run Ray for you in the Cloud, provide a lot of tools around the developer experience and managing the infrastructure, and provide more performance and superior infrastructure.

>> Awesome. I know you got a presentation on Ray and Anyscale, and you guys are positioning it as the infrastructure for foundational models. So I'll let you take it away, and then when you're done presenting, we'll come back, I'll probably grill you with a few questions, and then we'll close it out, so take it away.

>> Robert: Sounds great. So I'll say a little bit about how companies are using Ray and Anyscale for foundation models. The first thing I want to mention is just why we're doing this in the first place. And the underlying observation, the underlying trend here, and this is a plot from OpenAI, is that the amount of compute needed to do machine learning has been exploding. It's been growing at something like 35 times every 18 months. This is absolutely enormous. And other people have written papers measuring this trend and you get different numbers. But the point is, no matter how you slice and dice it, it's an astronomical rate. Now if you compare that to something we're all familiar with, like Moore's Law, which says that, you know, processor performance doubles roughly every 18 months, you can see that there's just a tremendous gap between the needs, the compute needs of machine learning applications, and what you can do with a single chip, right. So even if Moore's Law were continuing strong and, you know, doing what it used to be doing, even if that were the case, there would still be a tremendous gap between what you can do with the chip and what you need in order to do machine learning. And so given this graph, what we've seen, and what has been clear to us since we started this company, is that doing AI requires scaling. There's no way around it. It's not a nice-to-have, it's really a requirement. And so that led us to start Ray, which is the open source project that we started to make it easy to build these scalable Python applications and scalable machine learning applications. And since we started the project, it's been adopted by a tremendous number of companies.
Companies like OpenAI, which use Ray to train their large models like ChatGPT; companies like Uber, which run all of their deep learning and classical machine learning on top of Ray; companies like Shopify or Spotify or Instacart or Lyft or Netflix, ByteDance, which use Ray for their machine learning infrastructure. Companies like Ant Group, which makes Alipay, you know, they use Ray across the board for fraud detection, for online learning, for detecting money laundering, you know, for graph processing, stream processing. Companies like Amazon, you know, run Ray at a tremendous scale, processing petabytes of data every single day. And so the project has seen just enormous adoption over the past few years. And one of the most exciting use cases is really providing the infrastructure for building, training, fine tuning, and serving foundation models. So I'll say a little bit about, you know, here are some examples of companies using Ray for foundation models. Cohere trains large language models. OpenAI also trains large language models. You can think of the workloads required there as things like supervised pre-training, and also reinforcement learning from human feedback. So this is not only the regular supervised learning, but actually more complex reinforcement learning workloads that take human input about which response to a particular question is better than another response, and incorporate that into the learning. There are open source versions as well, like GPT-J, also built on top of Ray, as well as projects like Alpa coming out of UC Berkeley. So these are some examples of exciting projects and organizations training and creating these large language models and serving them using Ray. Okay, so what actually is Ray? Well, there are two layers to Ray. At the lowest level, there's the core Ray system. This is essentially low level primitives for building scalable Python applications, things like taking a Python function or a Python class and executing them in the cluster setting. So Ray core is extremely flexible and you can build arbitrary scalable applications on top of Ray. On top of Ray, on top of the core system, what really gives Ray a lot of its power is this ecosystem of scalable libraries. So on top of the core system you have scalable libraries for ingesting and pre-processing data, for training your models, for fine tuning those models, for hyperparameter tuning, for doing batch processing and batch inference, for doing model serving and deployment, right. And a lot of the Ray users, the reason they like Ray is that they want to run multiple workloads. They want to train and serve their models, right. They want to load their data and feed that into training. And Ray provides common infrastructure for all of these different workloads. So this is a little overview of the different components of Ray. So why do people choose to go with Ray? I think there are three main reasons. The first is the unified nature, the fact that it is common infrastructure for scaling arbitrary workloads, from data ingest to pre-processing to training to inference and serving, right. This also includes the fact that it's future proof. AI is incredibly fast moving. And so many people, many companies that have built their own machine learning infrastructure and standardized on particular workflows for doing machine learning have found that their workflows are too rigid to enable new capabilities.
If they want to do reinforcement learning, if they want to use graph neural networks, they don't have a way of doing that with their standard tooling. And so Ray, being future proof and being flexible and general, gives them that ability. Another reason people choose Ray and Anyscale is the scalability. This is really our bread and butter. This is the whole point of Ray, you know, making it easy to go from your laptop to running on thousands of GPUs, making it easy to scale your development workloads and run them in production, making it easy to scale, you know, training, to scale data ingest, pre-processing, and so on. So scalability and performance, you know, are critical for doing machine learning, and that is something that Ray provides out of the box. And lastly, Ray is an open ecosystem. You can run it anywhere. You can run it on any Cloud provider: Google, you know, Google Cloud, AWS, Azure. You can run it on your Kubernetes cluster. You can run it on your laptop. It's extremely portable. And not only that, it's framework agnostic. You can use Ray to scale arbitrary Python workloads. You can use it to scale, and it integrates with, libraries like TensorFlow or PyTorch or JAX or XGBoost or Hugging Face or PyTorch Lightning, right, or scikit-learn, or just your own arbitrary Python code. It's open source. And in addition to integrating with the rest of the machine learning ecosystem and these machine learning frameworks, you can use Ray along with all of the other tooling in the machine learning ecosystem. That's things like Weights & Biases or MLflow, right, or, you know, different data platforms like Databricks, you know, Delta Lake or Snowflake, or tools for model monitoring, for feature stores; all of these integrate with Ray. And that's, you know, Ray provides that kind of flexibility so that you can integrate it into the rest of your workflow. And then Anyscale is the scalable compute platform that's built on top, you know, that provides Ray. So Anyscale is a managed Ray service that runs in the Cloud. And what Anyscale does is it offers the best way to run Ray. And if you think about what you get with Anyscale, there are fundamentally two things. One is about moving faster, accelerating the time to market. And you get that by having the managed service, so that as a developer you don't have to worry about managing infrastructure, you don't have to worry about configuring infrastructure. It also provides, you know, optimized developer workflows, things like easily moving from development to production, things like having the observability tooling, the debuggability, to actually easily diagnose what's going wrong in a distributed application. So things like the dashboards and the other kinds of tooling for collaboration, for monitoring, and so on. So that's the first bucket: developer productivity, moving faster, faster experimentation and iteration. The second reason that people choose Anyscale is superior infrastructure. So this is things like, you know, cost efficiency, being able to easily take advantage of spot instances, being able to get higher GPU utilization, things like faster cluster startup times and auto scaling, things like just overall better performance and faster scheduling. And so these are the kinds of things that Anyscale provides on top of Ray. It's the managed infrastructure. It's fast. It's the developer productivity and velocity as well as performance. So this is what I wanted to share about Ray and Anyscale.
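To make the core primitives described in the overview concrete, here is a minimal, illustrative Ray sketch of turning an ordinary Python function into a task and an ordinary Python class into an actor that can run anywhere on a cluster. The function and class names are made up for the example; this is not code from the presentation.

```python
import ray

# Start Ray locally; on a cluster, ray.init(address="auto") would connect to the existing cluster.
ray.init()

# A plain Python function becomes a task that Ray can schedule anywhere in the cluster.
@ray.remote
def preprocess(record):
    return record * 2

# A plain Python class becomes an actor: a stateful worker process.
@ray.remote
class Counter:
    def __init__(self):
        self.total = 0

    def add(self, value):
        self.total += value
        return self.total

# Launch tasks in parallel and gather the results.
futures = [preprocess.remote(i) for i in range(8)]
print(ray.get(futures))  # [0, 2, 4, 6, 8, 10, 12, 14]

# Create an actor and call its methods remotely.
counter = Counter.remote()
print(ray.get(counter.add.remote(5)))  # 5
```

The scalable libraries mentioned in the overview (data, training, tuning, serving) are built on this same task and actor model.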
>> John: Awesome.

>> Provide that context. But John, I'm curious what you think.

>> I love it. I love the, so first of all, it's a platform, because that's the platform architecture right there. So just to clarify, this is an Anyscale platform, not-

>> That's right.

>> Tools. So you got tools in the platform. Okay, that's key. Love that managed service. Just curious, you mentioned Python multiple times. Is that because of PyTorch and TensorFlow, or because Python's the most friendly with machine learning, or because it's very common amongst all developers?

>> That's a great question. Python is the language that people are using to do machine learning, so it's the natural starting point. Now, of course, Ray is actually designed in a language agnostic way, and there are companies out there that use Ray to build scalable Java applications. But for the most part right now we're focused on Python and being the best way to build these scalable Python and machine learning applications. But, of course, down the road there always is that potential.

>> So if you're slinging Python code out there and you're watching this video, get on the Anyscale bus quickly. Also, while you were giving the presentation, I couldn't help it, since you mentioned OpenAI, which by the way, congratulations, 'cause they've had great scale. I've noticed their rapid growth, 'cause they were the fastest company to reach that number of users of anyone in the history of the computer industry, so a major success. OpenAI and ChatGPT, huge fan. I'm not a skeptic at all. I think it's just the beginning, so congratulations. But I actually typed into ChatGPT, what are the top three benefits of Anyscale, and it came up with scalability, flexibility, and ease of use. Obviously, scalability is what you guys are called.

>> That's pretty good.

>> So that's what they came up with. So they nailed it. Did you have an inside prompt training, buy it there? Only kidding. (Robert laughs)

>> Yeah, we hard coded that one.

>> But that's the kind of thing that came up really, really quickly. If I asked it to write a sales document, it probably will, but this is the future interface. This is why people are getting excited about the foundational models and the large language models, because it's allowing the interface with the user, the consumer, to be more human, more natural. And this clearly will be in every application in the future.

>> Absolutely. This is how people are going to interface with software, how they're going to interface with products in the future. It's not just, you know, a chat bot that you talk to. This is going to be how you get things done, right: how you use your web browser, or how you use, you know, Photoshop, or how you use other products. Like, you're not going to spend hours learning all the APIs and how to use them. You're going to talk to it and tell it what you want it to do. And of course, you know, if it doesn't understand it, it's going to ask clarifying questions. You're going to have a conversation and then it'll figure it out.

>> This is going to be one of those things we're going to look back at, Robert, and say, "Yeah, from that company, that was the beginning of that wave." And just like AWS and Cloud Computing, the folks who got in early really were in position when, say, the pandemic came.
So getting in early is a good thing, and that's what everyone's talking about: getting in early and playing around, maybe replatforming or even picking one or a few apps to refactor with some staff and managed services. So people are definitely jumping in. So I have to ask you the ROI cost question. You mentioned some of those, Moore's Law versus what's going on in the industry. When you look at that kind of scale, the first thing that jumps out at people is, "Okay, I love it. Let's go play around." But what's it going to cost me? Am I going to be tied to certain GPUs? What's the landscape look like from an operational standpoint, from the customer? Are they locked in, and if the benefit is flexibility, are you flexible enough to handle any Cloud? What is the customer, what are they looking at? Basically, that's my question. What's the customer looking at?

>> Cost is super important here, and many of the companies, I mean, companies are spending a huge amount on their Cloud computing, on AWS, and on doing AI, right. And I think a lot of the advantage of Anyscale, what we can provide here, is not only better performance, but cost efficiency. Because if we can run something faster and more efficiently, it can also use less resources and you can lower your Cloud spending, right. We've seen companies go from, you know, 20% GPU utilization with their current setup and the current tools they're using, to running on Anyscale and getting more like 95, you know, 100% GPU utilization. That's something like a 5x improvement right there. So depending on the kind of application you're running, you know, it's a significant cost savings. We've seen companies that are, you know, processing petabytes of data every single day with Ray getting order of magnitude cost savings by switching from what they were previously doing to running their application on Ray. And when you have applications that are spending, you know, potentially $100 million a year, getting a 10x cost savings is just absolutely enormous. So these are some of the kinds of-

>> Data infrastructure is super important. Again, if you're the customer, if you're a prospect looking at this and thinking about going in here, just like the Cloud, you got infrastructure, you got the platform, you got SaaS; the same kind of thing's going to go on in AI. So I want to get into that, you know, ROI discussion and some of the impact with your customers that are leveraging the platform. But first I hear you got a demo.

>> Robert: Yeah, so let me show you, let me give you a quick run through here. So what I have open here is the Anyscale UI. I've started a little Anyscale Workspace. So Workspaces are the Anyscale concept for interactive development, right. So here, imagine you want to have a familiar experience, like you're developing on your laptop. And here I have a terminal. It's not on my laptop. It's actually in the cloud, running on Anyscale. And I'm just going to kick this off. This is going to train a large language model, OPT. And it's doing this on 32 GPUs. We've got a cluster here with a bunch of CPU cores, a bunch of memory. And as that's running, and by the way, if I wanted to run this on, instead of 32 GPUs, 64 or 128, this is just a one line change when I launch the Workspace. And what I can do is I can pull up VS Code, right. Remember, this is the interactive development experience. I can look at the actual code. Here it's using Ray Train to train the PyTorch model.
We've got the training loop, and we're saying that each worker gets access to one GPU and four CPU cores. And, of course, as I make the model larger, this is using DeepSpeed, as I make the model larger, I could increase the number of GPUs that each worker gets access to, right, and how that is distributed across the cluster. And if I wanted to run on CPUs instead of GPUs, or a different, you know, accelerator type, again, this is just a one line change. And here we're using Ray Train to train the models, just taking my vanilla PyTorch model using Hugging Face and then scaling that across a bunch of GPUs. And, of course, if I want to look at the dashboard, I can go to the Ray dashboard. There are a bunch of different visualizations I can look at. I can look at the GPU utilization. I can look at, you know, the CPU utilization here, where I think we're currently loading the model and running the actual application to start the training. And some of the things that are really convenient here about Anyscale: I can get that interactive development experience with VS Code, you know, I can look at the dashboards, I can monitor what's going on. I have a terminal, it feels like my laptop, but it's actually running on a large cluster, with however many GPUs or other resources I want. And so it's really trying to combine the best of having the familiar experience of programming on your laptop with the benefits, you know, of being able to take advantage of all the resources in the Cloud to scale. And, you know, you were talking about cost efficiency: one of the biggest reasons that people waste money, one of the silly reasons for wasting money, is just forgetting to turn off your GPUs. And what you can do here is, of course, things will auto terminate if they're idle. But imagine you go to sleep and I have this big cluster. You can turn it off, shut off the cluster, come back tomorrow, restart the Workspace, and, you know, your big cluster is back up and all of your code changes are still there, all of your local file edits. It's like you just closed your laptop and came back and opened it up again. And so this is the kind of experience we want to provide for our users. So that's what I wanted to share with you.

>> Well, I think that whole, couple of things, lines of code change, single line of code change, that's game changing. And then the cost thing, I mean, human error is a big deal. People pass out at their computer. They've been coding all night or they just forget about it. I mean, and then it's just like leaving the lights on or your water running in your house. It's just, at the scale that it is, the numbers will add up. That's a huge deal. So I think, you know, compute back in the old days, if there's no compute going on, it's just compute sitting there idle. But, you know, data cranking through the models, that's a big point.

>> Another thing I want to add there about cost efficiency is that we make it really easy, if you're running on Anyscale, to use spot instances and these preemptible instances that can just be significantly cheaper than the on-demand instances. And so when we see our customers go from what they were doing before to using Anyscale, they go from not using these spot instances, 'cause they don't have the infrastructure around it, the fault tolerance to handle the preemption and things like that, to being able to just check a box and use spot instances and save a bunch of money.
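For readers curious what the training setup shown in the demo might look like in code, here is a rough, hypothetical sketch using Ray Train's TorchTrainer (recent Ray 2.x style; exact imports and arguments vary across Ray versions, and the toy model below is only a stand-in for the Hugging Face and DeepSpeed model actually demoed):

```python
import torch
from ray.train import ScalingConfig
from ray.train.torch import TorchTrainer, prepare_model

def train_loop_per_worker(config):
    # In the real demo this is where the Hugging Face / DeepSpeed model and
    # training loop would live; a tiny linear model stands in for it here.
    model = prepare_model(torch.nn.Linear(10, 1))
    optimizer = torch.optim.SGD(model.parameters(), lr=config["lr"])
    for _ in range(config["epochs"]):
        batch = torch.randn(32, 10)
        loss = model(batch).pow(2).mean()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

trainer = TorchTrainer(
    train_loop_per_worker,
    train_loop_config={"lr": 1e-3, "epochs": 3},
    # Matching the demo's description: each worker gets 1 GPU and 4 CPU cores.
    # Changing num_workers (to 64, 128, ...) is the "one line change" mentioned above.
    scaling_config=ScalingConfig(
        num_workers=32,
        use_gpu=True,
        resources_per_worker={"CPU": 4, "GPU": 1},
    ),
)
result = trainer.fit()
```

Running the same script with a different ScalingConfig is what makes jumping from 32 to 64 or 128 GPUs a single line edit.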
>> You know, this was my whole, my feature article at re:Invent last year when I met with Adam Selipsky: this next gen Cloud is here. I mean, it's not auto scale, it's infrastructure scale. It's agility. It's flexibility. I think this is where the world needs to go. Almost what DevOps did for Cloud, and what you were showing me in that demo had this whole SRE vibe. And remember, Google had site reliability engineers to manage all those servers. This is kind of like an SRE vibe for data at scale. I mean, a similar kind of order of magnitude. I mean, I might be a little bit off base there, but how would you explain it?

>> It's a nice analogy. I mean, what we are trying to do here is get to the point where developers don't think about infrastructure, where developers only think about their application logic, and where businesses can do AI, can succeed with AI, and build these scalable applications, but they don't have to build, you know, an infrastructure team. They don't have to develop that expertise. They don't have to invest years in building their internal machine learning infrastructure. They can just focus on the Python code, on their application logic, and run the stuff out of the box.

>> Awesome. Well, I appreciate the time. Before we wrap up here, give a plug for the company. I know you got a couple of websites. Again, Ray's got its own website. You got Anyscale. You got an event coming up. Give a plug for the company, you're looking to hire. Put a plug in for the company.

>> Yeah, absolutely. Thank you. So first of all, you know, we think AI is really going to transform every industry, and the opportunity is there, right. We can be the infrastructure that enables all of that to happen, that makes it easy for companies to succeed with AI and get value out of AI. Now, if you're interested in learning more about Ray, Ray has been emerging as the standard way to build scalable applications. Our adoption has been exploding. I mentioned companies like OpenAI using Ray to train their models. But really, across the board, companies like Netflix and Cruise and Instacart and Lyft and Uber, you know, just among tech companies. It's across every industry. You know, gaming companies, agriculture, you know, farming, robotics, drug discovery, you know, FinTech, we see it across the board. And all of these companies can get value out of AI, can really use AI to improve their businesses. So if you're interested in learning more about Ray and Anyscale, we have our Ray Summit coming up in September. This is going to highlight a lot of the most impressive use cases and stories across the industry. And if your business, if you want to use LLMs, you want to train these LLMs, these large language models, you want to fine tune them with your data, you want to deploy them, serve them, and build applications and products around them, give us a call, talk to us. You know, we can really take the infrastructure piece, you know, off the critical path and make that easy for you. So that's what I would say. And, you know, like you mentioned, we're hiring across the board, you know, engineering, product, go-to-market, and it's an exciting time.

>> Robert Nishihara, co-founder and CEO of Anyscale, congratulations on a great company you've built and continue to iterate on, and you got growth ahead of you, you got a tailwind. I mean, the AI wave is here.
I think OpenAI and ChatGPT, a customer of yours, have really opened up the mainstream visibility into this new generation of applications, user interface, role of data, large scale, how to make that programmable, so we're going to need that infrastructure. So thanks for coming on this season three, episode one of the ongoing series of the hot startups. In this case, this episode is the top startups building foundational model infrastructure for AI and ML. I'm John Furrier, your host. Thanks for watching. (upbeat music)
Robert Nishihara, Anyscale | CUBE Conversation
(upbeat instrumental)

>> Hello and welcome to this CUBE Conversation. I'm John Furrier, host of theCUBE, here in Palo Alto, California. Got a great conversation with Robert Nishihara, who's the co-founder and CEO of Anyscale. Robert, great to have you on this CUBE Conversation. It's great to see you. We did your first Ray Summit a couple years ago, and congratulations on your venture. Great to have you on.

>> Thank you. Thanks for inviting me.

>> So you're a first-time CEO out of Berkeley in data. You've got Databricks coming out of there. You got a bunch of activity coming from Berkeley. It really is kind of where a lot of innovation is going on in data. Anyscale has been one of those startups that has risen out of that scene, right? You look at the success of what the data lakes are now. Now you've got the generative AI. This has been a really interesting innovation market. This new wave is coming. Tell us what's going on with Anyscale right now, as you guys are gearing up and getting some growth. What's happening with the company?

>> Yeah, well, one of the most exciting things that's been happening in computing recently is the rise of AI and the excitement about AI, and the potential for AI to really transform every industry. Now of course, one of the biggest challenges to actually making that happen is that doing AI is incredibly computationally intensive, right? To actually succeed with AI, to actually get value out of AI, you're typically not just running it on your laptop, you're often running it and scaling it across thousands of machines, or hundreds of machines or GPUs. And so organizations and companies and businesses that do AI often end up building a large infrastructure team to manage the distributed systems, the computing, to actually scale these applications. And that's a huge software engineering lift, right? And so one of the goals for Anyscale is really to make that easy, to get to the point where developers and teams and companies can succeed with AI, can build these scalable AI applications, without a huge investment in infrastructure, without a lot of expertise in infrastructure, where really all they need to know is how to program on their laptop, how to program in Python. And if you have that, then that's really all you need to succeed with AI. So that's what we've been focused on. We're building Ray, which is an open source project that's been starting to get adopted by tons of companies, to actually train these models, to deploy these models, to do inference with these models, you know, to ingest and pre-process their data. And our goals, you know, here with the company are really to make Ray successful, to grow the Ray community, and then to build a great product around it and simplify the development and deployment and productionization of machine learning for all these businesses.

>> It's a great trend. Everyone wants developer productivity, we're seeing that clearly right now. And plus, developers are voting literally on what standards become. As you look at how the market is open source driven, I love the model, love the Ray project, love the Anyscale value proposition. How big are you guys now, and how is that value proposition of Ray and Anyscale and foundational models coming together?
Because it seems like you guys are in a perfect storm situation where you guys could get a real tailwind and draft off the mega trend that everyone's getting excited about. The new toy is ChatGPT. So you got to look at that and say, hey, I mean, come on, you guys did all the heavy lifting.

>> Absolutely.

>> You know, how many people are you, and what's the proposition for you guys these days?

>> You know, our company's about a hundred people, a bit larger than that. Ray's been growing really quickly. You know, companies like OpenAI use Ray to train their models, like ChatGPT. Companies like Uber run all their deep learning, you know, and classical machine learning on top of Ray. Companies like Shopify, Spotify, Netflix, Cruise, Lyft, Instacart, you know, ByteDance. A lot of these companies are investing heavily in Ray for their machine learning infrastructure. And I think it's gotten to the point where, if you're one of these, you know, types of businesses, and you're looking to revamp your machine learning infrastructure, if you're looking to enable new capabilities, you know, make your teams more productive, speed up the experimentation cycle, you know, make it more performant, like build, you know, run applications that are more scalable, run them faster, run them in a more cost efficient way, all of these types of companies are at least evaluating Ray, and Ray is an increasingly common choice there. I think if many of these companies end up not using Ray, they often end up building their own infrastructure. So the growth there has been incredibly exciting. You know, we had our first in-person Ray Summit just back in August, and we're planning the next one for this coming September. And so when you ask about the value proposition, I think there are really two main things when people choose to go with Ray and Anyscale. One reason is about moving faster, right? It's about developer productivity, it's about speeding up the experimentation cycle, easily getting their models in production. You know, we hear many companies say that once they prototype a model, once they develop a model, it's another eight weeks or 12 weeks to actually get that model in production. And that's a reason they talk to us. We hear companies say that, you know, they've been training their models and doing inference on a single machine, and they've been sort of scaling vertically, like using bigger and bigger machines. But, you know, you can only do that for so long, and at some point you need to go beyond a single machine, and that's when they start talking to us. Right? So one of the main value propositions is around moving faster. I think probably the phrase I hear the most is companies saying that they don't want their machine learning people to have to spend all their time configuring infrastructure. All this is about productivity.

>> Yeah.

>> The other-

>> It's the big brains in the company that are being used to do remedial tasks that should be automated, right? I mean, that's-

>> Yeah, and I mean, it's hard stuff, right? It's also not these people's area of expertise, or where they're adding the most value. So all of this is around developer productivity, moving faster, getting to market faster. The other big value prop, and the reason people choose Ray and choose Anyscale, is around just providing superior infrastructure. This is really, can we scale more?
You know, can we run it faster, right? Can we run it in a more cost effective way? We hear people saying that they're not getting good GPU utilization with the existing tools they're using, or they can't scale beyond a certain point, or, you know, they don't have a way to efficiently use spot instances to save costs, right? Or their clusters, you know, can't auto scale up and down fast enough, right? These are all the kinds of areas where Ray and Anyscale add value and solve these kinds of problems.

>> You know, you bring up great points. The auto scaling concept, early days, it was easy getting more compute. Now it's complicated. They're built into more integrated apps in the cloud. And you mentioned those companies that you're working with, that's impressive. Those are like the big hardcore, I call them hardcore. They have good technical teams. And as the wave starts to move from these companies that were hyper scaling up all the time, the mainstream are just developers, right? So you need an interface in. So I see the dots connecting with you guys and I want to get your reaction. Is that how you see it? That you got the alphas out there kind of kicking butt, building their own stuff, alpha developers and infrastructure, but the mainstream just wants programmability. They want that heavy lifting taken care of for them. Is that kind of how you guys see it? I mean, take us through that. Because to get crossover, to be democratized, the automation's got to be there. And for developer productivity to be in, it's got to be coding and programmability.

>> That's right. Ultimately, for AI to really be successful and really, you know, transform every industry in the way we think it has the potential to, it has to be easier to use, right? And being easier to use, there are many dimensions to that. But an important one is that as a developer, to do AI, you shouldn't have to be an expert in distributed systems. You shouldn't have to be an expert in infrastructure. If you do have to be, that's going to really limit the number of people who can do this, right? And with so many of the companies we talk to, they don't want to be in the business of building and managing infrastructure. It's not that they can't do it. But it's going to slow them down, right? They want to allocate their time and their energy toward building their product, right? To building a better product, getting their product to market faster. And if we can take the infrastructure work off of the critical path for them, that's going to speed them up, it's going to simplify their lives. And I think that is critical for really enabling all of these companies to succeed with AI.

>> Talk about the customers you guys are talking to right now, and how that translates over. Because I think you hit a good thread there. Data infrastructure is critical. Managed services are coming online, open source is continuing to grow. You have these people building their own, and then if they abandon it or don't scale it properly, there are kind of consequences, 'cause it's a system, as you mentioned, it's a distributed system architecture. It's not as easy as standing up a monolithic app these days. So when you guys go to the marketplace and talk to customers, put the customers in buckets. So you got the ones that are kind of leaning in, that are pretty peaked, probably working with you now, open source. And then what's the customer profile look like as you go mainstream?
Are they looking for a managed service, or looking for more of a system architecture approach? What's the Anyscale progression? How do you engage with your customers? What are they telling you?

>> Yeah, so many of these companies, yes, they're looking for managed infrastructure 'cause they want to move faster, right? Now, in terms of the profiles of these different customers, there are three main workloads that companies run on Anyscale, run with Ray. It's training related workloads, it's serving and deployment related workloads, like actually deploying your models, and it's batch processing, batch inference related workloads. Like imagine you want to do computer vision on tons and tons of images or videos, or you want to do natural language processing on millions of documents or audio, or speech, or things like that, right? So I would say there's a pretty large variety of use cases, but most commonly, you know, we see tons of people working with computer vision data, you know, computer vision problems, natural language processing problems. And it's across many different industries. We work with companies doing drug discovery, companies doing, you know, gaming or e-commerce, right? Companies doing robotics or agriculture. So there's a huge variety of the types of industries that can benefit from AI and can really get a lot of value out of AI. But the problems are the same problems that they all want to solve. It's like, how do you make your team move faster, you know, succeed with AI, be more productive, speed up the experimentation, and also how do you do this in a more performant way, in a faster, cheaper, more cost efficient, more scalable way?

>> It's almost like the cloud game is coming back to AI and these foundational models, because I was just on a podcast, we recorded our weekly podcast, and I was just riffing with Dave Vellante, my co-host, on this. We were like, hey, in the early days of Amazon, if you wanted to build an app, you had to build a data center, and now you go to the cloud, cloud's easier, pay a little money, pennies on the dollar, you get your app up and running. Cloud computing is born. With foundation models and generative AI, the old model was hard, heavy lifting, expensive build out before you get to do anything, and as you mentioned, time. So I got to think that you're pretty much in a good position with this foundational model trend in generative AI, because I just looked at the foundation models map of the ecosystem. You're starting to see layers of, you got the tooling, you got platform, you got cloud. It's filling out really quickly. So why is Anyscale important to this new trend? How do you talk to people when they ask you, you know, what does ChatGPT mean for Anyscale? And how does this foundational model growth fit into your plan?

>> Well, foundational models are hugely important for the industry broadly. Because you're going to have these really powerful models that, you know, have been trained on tremendous amounts of data, tremendous amounts of compute, and that are useful out of the box, right? That people can start to use, and query, and get value out of, without necessarily training these huge models themselves. Now Ray and Anyscale fit in, in a number of places. First of all, they're useful for creating these foundation models. Companies like OpenAI, you know, use Ray for this purpose. Companies like Cohere use Ray for these purposes.
You know, IBM. And if you look at it, there are of course also open source versions, like GPT-J, you know, created using Ray. So a lot of these large language models, large foundation models, benefit from training on top of Ray. But of course, for every company training and creating these huge foundation models, you're going to have many more that are fine tuning these models with their own data, that are deploying and serving these models for their own applications, that are building other applications and business logic around these models. And that's where Ray also really shines, because Ray, you know, can provide common infrastructure for all of these workloads. The training, the fine tuning, the serving, the data ingest and pre-processing, right? The hyperparameter tuning, and so on. And so the reason Ray and Anyscale are important here is that, again, foundation models are large, foundation models are compute intensive, and both creating and using these foundation models requires tremendous amounts of compute. And there's a big infrastructure lift to make that happen. So either you are using Ray and Anyscale to do this, or you are building the infrastructure and managing the infrastructure yourself. Which you can do, but it's hard.

>> Good luck with that. I always say good luck with that. I mean, I think if you really need to build that hardened foundation, you got to go all the way. And I think this idea of composability is interesting. How is Ray working with OpenAI, for instance? Take us through that. Because I think you're going to see a lot of people talking about, okay, I got trained models, but I'm going to have not one, I'm going to have many. There's a big debate about whether OpenAI is going to be the mother of all LLMs, but really people are also saying there are going to be many more, either purpose-built or specific. The fusion, the way these things come together, there's like a blending of data, and that seems to be a value proposition. How does Ray help these guys get their models up? Can you take us through what Ray's doing for, say, OpenAI and others, and how do you see the models interacting with each other?

>> Yeah, great question. So where OpenAI uses Ray right now is for the training workloads, training both to create ChatGPT and models like that. There's a supervised learning component, where you're doing supervised pre-training with example data. There's also a reinforcement learning component, where you are fine-tuning the model and continuing to train the model, but based on human feedback, based on input from humans saying that, you know, this response to this question is better than this other response to this question, right? And so Ray provides the infrastructure for scaling the training across many, many GPUs, many, many machines, and really running that in an efficient, you know, performant, fault tolerant way, right? And so, you know, this is not the first version of OpenAI's infrastructure, right? They've gone through iterations where they did start with building the infrastructure themselves. They were using tools like MPI. But at some point, you know, given the complexity, given the scale of what they're trying to do, you hit a wall with MPI, and that's going to happen with a lot of other companies in this space. And at that point you don't have many other options other than to use Ray or to build your own infrastructure.

>> That's awesome.
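As a concrete illustration of the serving side of the workloads Robert describes, here is a rough, hypothetical sketch of what deploying a fine tuned model behind an HTTP endpoint might look like with Ray Serve. The model is a stand-in, the resource settings are illustrative, and exact API details vary across Ray versions; this is not code from OpenAI or from the conversation.

```python
from ray import serve
from starlette.requests import Request

# Illustrative resource settings: two replicas, one GPU each.
@serve.deployment(num_replicas=2, ray_actor_options={"num_gpus": 1})
class FineTunedLLM:
    def __init__(self):
        # A real deployment would load a fine tuned model checkpoint here
        # (for example, with Hugging Face Transformers); this lambda is a stand-in.
        self.generate = lambda prompt: f"echo: {prompt}"

    async def __call__(self, request: Request) -> str:
        payload = await request.json()
        return self.generate(payload["prompt"])

# Stand up the deployment behind Ray Serve's HTTP proxy.
serve.run(FineTunedLLM.bind())
```

Scaling out is then mostly a matter of changing num_replicas or the per-replica resources, with Ray handling scheduling across the cluster.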
And then your vision on this data interaction, because in the old days monolithic models were very rigid. You couldn't really interface with them. But we're kind of seeing this future of data fusion, data interaction, data blending at large scale. What's your vision of where this goes? Because if this goes the way people think, you can have this data chemistry kind of thing going on, where people are integrating all kinds of data with each other at large scale. So you need infrastructure, intelligence, reasoning, a lot of code. Is this something that you see? What's your vision in all this? Take us through.

>> AI is going to be used everywhere, right? We see this as a technology that's going to be ubiquitous and is going to transform every business. I mean, imagine you make a product, maybe you're making a tool like Photoshop, or whatever the tool is. The way that people are going to use your tool is not by investing, you know, hundreds of hours into learning all of the different specific buttons they need to press and workflows they need to go through. They're going to talk to it, right? They're going to ask it to do the thing they want it to do, right? And it's going to do it. And if it doesn't know what's being asked of it, it's going to ask clarifying questions, right? And then you're going to clarify, and you're going to have a conversation. And this is going to make many, many kinds of tools and technology and products easier to use, and lower the barrier to entry. And so, you know, many companies fit into this category of trying to build products and trying to make them easier to use; this is just one kind of way that AI will be used. But I think it's something that's pretty ubiquitous.

>> Yeah. It'll be efficiency up and down the stack, and it will change the productivity equation completely. You just highlighted one: I don't want to fill out forms, just stand up my environment for me, and then start coding away. Okay, well, this is great stuff. Final word for the folks out there watching, obviously a new kind of skill set for hiring. You guys got engineers. Give a plug for the company, for Anyscale. What are you looking for? What are you guys working on? Take the last minute to put a plug in for the company.

>> Yeah, well, if you're interested in AI, and if you think AI is really going to be transformative and really be useful for all these different industries, we are trying to provide the infrastructure to enable that to happen, right? So I think there's the potential here to really solve an important problem, to get to the point where developers don't need to think about infrastructure, don't need to think about distributed systems. All they think about is their application logic, and what they want their application to do. And I think if we can achieve that, you know, we can be the foundation or the platform that enables all of these other companies to succeed with AI. So that's where we're going. I think something like this has to happen if AI is going to achieve its potential. We're hiring across the board, you know, great engineers, people on the go-to-market side, product managers, you know, people who want to really, you know, make this happen.

>> Awesome, well, congratulations. I know you got some good funding behind you. You're in a good spot. I think this is happening.
I think generative AI and foundation models are going to be the next big inflection point, as big as the PC, inter-networking, the internet, and smartphones. This is a whole other application framework, a whole other set of things. So this is the ground floor. Robert, you and your team are right there. Well done.

>> Thank you so much.

>> All right. Thanks for coming on this CUBE Conversation. I'm John Furrier with theCUBE, breaking down a conversation around AI and scaling up in this next major inflection point. This next wave is foundational models, generative AI. And thanks to ChatGPT, the whole world now knows about it. So it really is changing the game, and Anyscale is right there, one of the hot startups that is in a good position to ride this next wave. Thanks for watching. (upbeat instrumental)
Robert Nishihara, Anyscale | AWS re:Invent 2022 - Global Startup Program
>>Well, hello everybody. John Walls here, continuing our coverage here at AWS re:Invent 22 on theCUBE. We continue our segments here in the Global Startup Program, which of course is sponsored by AWS Startup Showcase, and with us to talk about Anyscale is the co-founder and CEO of the company, Robert Nishihara. Robert, good to see you. Thanks for joining us. >>Yeah, great. And thank you. >>You bet. Glad to have you aboard here. So let's talk about Anyscale, first off, for those at home who might not be familiar with what you do, because you've only been around for a short period of time, you're telling me. >>The company's about >>Three years now. >>Three years old. Yeah. So tell us all about it. >>Absolutely. So one of the biggest things happening in computing right now is the proliferation of AI. AI is spreading throughout every industry and has the potential to transform every industry. But the thing about doing AI is that it's incredibly computationally intensive. So if you want to do AI, you're probably not just doing it on your laptop, you're doing it across many machines, many GPUs, many compute resources, and that's incredibly hard to do. It requires a lot of software engineering expertise, a lot of infrastructure expertise, a lot of cloud computing expertise to build the software infrastructure and distributed systems to really scale AI across the cloud, and to do it in a way where you're really getting value out of AI. And so that is the problem statement: AI has tremendous potential, but it's incredibly hard to do because of the scale required. >>And what we are building at Anyscale is really trying to make that easy. So trying to get to the point where, as a developer, if you know how to program on your laptop, say in Python, then that's enough, right? Then you can do AI, you can get value out of it, you can scale it, you can build the kinds of incredibly powerful AI applications that companies like Google and Facebook and others can build. But you don't have to learn about all of the distributed systems and infrastructure; we'll handle that for you. If we're successful, that's what we're trying to achieve here. >>Yeah. What makes AI so hard to work with? I mean, you talk about the complexity, a lot of moving parts, literally moving parts, but what is it in your mind that gets people's eyes spinning a little bit when they look at the great potential, but also the downside of maybe having to work their way through a quagmire of sorts? >>So the potential is definitely there, but it's important to remember that a lot of AI initiatives fail. Something like 80 or 90% don't make it out of the research or prototyping phase and into production. Some of the things that are hard about AI, and the reasons that AI initiatives can fail: one is the scale required. It's one thing to develop something on your laptop; it's another thing to run it across thousands of machines. So that's scale, right? Another is the transition from development and prototyping to production. Those are very different and have very different requirements. A lot of times it's different teams within a company, with different tech stacks and different software they're using.
You know, we hear companies say that once they prototype and develop a model, it could take six to 12 weeks to get that model into production. >>And that often involves rewriting a lot of code and handing it off to another team. So the transition from development to production is a big challenge. So that's the scale challenge and the development-to-production handoff. And then lastly, a big challenge is around flexibility. AI is a fast-moving field; you see new developments, new algorithms, new models coming out all the time. And a lot of teams we work with have built infrastructure, or they're using products out there to do AI, but they've found that it's sort of locking them into rigid workflows or specific tools, and they don't have the flexibility to adopt new algorithms or new strategies or approaches as they come out. But their developers want the flexibility to use the latest tools, the latest strategies. And so those are some of the main problems we see. It's really: how do you scale? How do you move easily between development and production and back? And how do you remain flexible, how do you adapt and use the best tools that are coming out? Those are often the reasons that people start to use Ray, which is our open source project, and Anyscale, which is our product. >>So tell me about Ray. An open source project; I think you said you worked on it at Berkeley. >>That's right. So before this company, I did a PhD in machine learning at Berkeley, and one of the challenges we were running into ourselves: we were trying to do machine learning. We actually weren't infrastructure or distributed systems people, but in order to do machine learning, we found ourselves building all sorts of ad hoc tools and systems to scale the machine learning, to be able to run it in a reasonable amount of time and to be able to leverage the compute that we needed. And it wasn't just us. People all across machine learning, researchers and practitioners, were building their own tooling and infrastructure, and that was one of the things we felt was really holding back progress. And so that's how we slowly and gradually got into saying, hey, we could build better tools here. >>We could try to make this easier to do so that all of these people don't have to build their own infrastructure. They can focus on the actual machine learning applications that they're trying to build. And so Ray started as this open source project for basically scaling Python applications and scaling machine learning applications. Initially we were running around Berkeley trying to get all of our friends to try it out, adopt it, and give us feedback, and if it didn't work, we would debug it right away. That gradually turned into more companies starting to adopt it, bigger teams starting to adopt it, external contributors starting to contribute back to the open source project and make it better. And before you know it, we were hosting meetups, giving talks, running tutorials, and the project was just taking off.
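To make the "program in Python on your laptop, then scale it out" idea concrete, here is a minimal illustrative sketch of the open source Ray API. The workload, function name, and numbers are placeholders chosen for this example, not anything Anyscale describes in the interview.

```python
# Minimal sketch of the open source Ray API (the project discussed above).
# The work done here is a placeholder; the point is that the same Python code
# runs on a laptop or, by pointing ray.init() at a cluster, across many machines.
import ray

ray.init()  # local by default; on a cluster the rest of the code is unchanged

@ray.remote
def score_chunk(chunk_id: int) -> int:
    # Stand-in for real work, e.g. featurizing or scoring one shard of data.
    return chunk_id * chunk_id

# Launch 1,000 parallel tasks; Ray schedules them across available cores or nodes.
futures = [score_chunk.remote(i) for i in range(1000)]
print(sum(ray.get(futures)))
```

That decorator-plus-`.remote()` pattern is essentially what "scaling a Python application" means in the conversation above: the parallelism is expressed in ordinary Python, and the cluster plumbing is handled underneath.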
And so that's a big part of what we continue to develop today at Anyscale: really fostering this open source community, growing the open source user base, and making sure Ray is just the best way to scale Python applications and machine learning applications. >>So this was a graduate school project, on your way to getting your doctorate, and now you're commercializing it, right? I mean, what a journey that was. Who would've thought? I guess you probably did think that at some point, but... >>No, you know, when we started working on Ray, we actually didn't anticipate becoming a company, or at least we just weren't looking that far ahead. We were really excited about solving this problem of making distributed computing easy, getting to the point where developers just don't have to learn about infrastructure and distributed systems but get all the benefits. It wasn't until later on, as we were graduating from Berkeley and wanted to continue taking this project further and really solving this problem, that we realized it made sense to start a company. >>So help me out, and I might have missed this, so I apologize if I did, but in terms of Ray being that building block, essential for your ML or AI work down the road, what is it doing for me, or what will it allow me to do in either one of those realms that I can't do now? >>Yeah, so why use Ray versus not using Ray? I think the answer is that if you're doing AI, you need to scale. If you don't find that to be the case today, you probably will tomorrow, or the day after that. So increasingly it's a requirement, not an option. And if you're trying to build these scalable applications, you're either going to use Ray or something like Ray, or you're going to build the infrastructure yourself, and building the infrastructure yourself is a long journey. >>So why take that on, right? >>And many of the companies we work with don't want to be in the business of building and managing infrastructure, because they want their best engineers to build their product, right? To get their product to market faster. >>I want you to do that for me. >>Right, exactly. And so we can really accelerate what these teams can do, and if we can make the infrastructure something they just don't have to think about, that's why you would choose to use Ray. >>Okay. You know, between AI and ML, are they different animals in terms of what you're trying to get done or what Ray can do? >>Yeah, and actually I should say it's not just new teams that are starting out that are using Ray. Many companies that have already built their own infrastructure will then switch to using Ray. To give you a few examples, Uber runs all their deep learning on Ray. OpenAI, which is really at the frontier of training large models and pushing the boundaries of AI, trains their largest models using Ray. Companies like Shopify rebuilt their entire machine learning platform using Ray. >>But they started somewhere else.
>>They had done it before; this is not the v1 of their machine learning infrastructure. They did it a different way before, and this is the second version or the third iteration of how they're doing it. And often they realize, I mean, in the case of Uber, just to give you one example, they built a system called Horovod for scaling deep learning on a bunch of GPUs. Now, as they scaled GPU training, the bottleneck shifted away from training and to the data ingest and pre-processing, and they wanted to scale data ingest and pre-processing on CPUs. Horovod is a deep learning framework; it doesn't do the data ingest and pre-processing on CPUs. But if you run Horovod on top of Ray, you can scale training on GPUs. >>And then Ray has another library called Ray Data that lets you scale the ingest and pre-processing on CPUs, and you can pipeline them together. That allowed them to train larger models on more data. Just to take one example, ETA prediction: if you get in an Uber, it tells you what time you're supposed to arrive. That uses a deep learning model called DeepETA. Before, they were able to train on about two weeks' worth of data. Now, using Ray for scaling the data ingest, pre-processing, and training, they can train on much more data and get more accurate ETA predictions. So that's just one example of the kind of benefit they were able to get. Also, because it's running on top of Ray, and Ray has this ecosystem of libraries, they can use Ray's hyperparameter tuning library to do hyperparameter tuning for their deep learning models. >>They can also use it for inference, and because these are all built on top of Ray, they inherit the elasticity and fault tolerance of running on top of Ray. So it really simplifies things on the infrastructure side, because if you have Ray as common infrastructure for your machine learning workloads, there's just one system to manage and operate. And it simplifies things for the end users, the developers, because from their perspective they're just writing a Python application. They don't have to learn how to use three different distributed systems and stitch them together. >>So AWS, before I let you go, how do they come into play here for you? I mean, you're part of the Startup Showcase, so obviously a major partner and a major figure in the offering that you're presenting people. >>Yeah. So Anyscale is a managed Ray service; Anyscale is just the best way to run Ray and deploy Ray, and we run on top of AWS. So many of our customers are using Ray through Anyscale on AWS. We work very closely together, we have joint customers, and a lot of the value that Anyscale is adding on top of Ray is around the production story: things like high availability, failure handling, retries, alerting, persistence, reproducibility. These are a lot of the value that our platform adds on top of the open source project.
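As an illustration of the pattern described for Uber above, CPU-side ingest and pre-processing feeding GPU training, here is a hedged sketch using Ray Data. The dataset, the pre-processing function, and the Trainer actor are hypothetical stand-ins; Uber's actual pipeline uses Horovod on Ray for the training step, which is not reproduced here.

```python
# Illustrative sketch only: Ray Data does ingest/pre-processing on CPU workers
# while a GPU actor consumes the batches. Data, column names, and Trainer are
# made up for this example.
import ray

ray.init()

def preprocess(batch):
    # Placeholder CPU transform applied to each batch by Ray Data workers.
    batch["feature"] = batch["raw_value"] * 2.0
    return batch

# Stand-in for something like ray.data.read_parquet("s3://...") on real trip data.
ds = ray.data.from_items([{"raw_value": float(i)} for i in range(10_000)])
ds = ds.map_batches(preprocess)

@ray.remote(num_gpus=1)  # assumes a GPU node is available; drop num_gpus=1 to try locally
class Trainer:
    def train_on(self, batch) -> int:
        # Placeholder for a real training step (e.g. a Horovod/PyTorch update).
        return len(batch["feature"])

trainer = Trainer.remote()
for batch in ds.iter_batches(batch_size=1024):
    ray.get(trainer.train_on.remote(batch))
```

Because the pre-processing runs on CPU workers while the actor holds the GPU, the two stages can overlap, which is the pipelining benefit described in the interview.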
A lot of stuff as well around collaboration. Imagine something goes wrong with your application, your production job, and you want to debug it: you can just share the URL with your coworker, they can click a button, reproduce the exact same thing, look at the same logs, and figure out what's going on. And one thing that's important for a lot of our customers is efficiency around cost. And so we... >>Support every customer. >>Exactly. A lot of people are spending a lot of money on AWS, right? And so Anyscale supports running out of the box on cheaper spot instances, these preemptible instances, which just reduce costs by quite a bit. Things like that. >>Well, the company is Anyscale and you're on the show floor, right? So if you're having a chance to watch this during re:Invent, go down and check them out. Robert Nishihara joining us here, the co-founder and CEO. Robert, thanks for being with us here on theCUBE. Really enjoyed it. >>Me too. Thanks so much. >>Boy, three years from a graduate program and boom, here you are, off to the enterprise you go. Very nicely done. All right, we're going to continue our coverage here on theCUBE with more from Las Vegas. We're at the Venetian, we're at AWS re:Invent 22, and you're watching theCUBE, the leader in high tech coverage.
Breaking Analysis: We Have the Data…What Private Tech Companies Don’t Tell you About Their Business
>> From The Cube Studios in Palo Alto and Boston, bringing you data-driven insights from The Cube and ETR. This is "Breaking Analysis" with Dave Vellante. >> The reverse momentum in tech stocks caused by rising interest rates, less attractive discounted cash flow models, and more tepid forward guidance can be easily measured by public market valuations. And while there's lots of discussion about the impact on private companies and cash runway and 409A valuations, measuring the performance of non-public companies isn't as easy. IPOs have dried up, and public statements by private companies, of course, accentuate the good and kind of hide the bad. Real data, unless you're an insider, is hard to find. Hello and welcome to this week's "Wikibon Cube Insights" powered by ETR. In this "Breaking Analysis", we unlock some of the secrets that non-public, emerging tech companies may or may not be sharing. And we do this by introducing you to a capability from ETR that we've not exposed you to over the past couple of years. It's called the Emerging Technologies Survey, and it is packed with sentiment data and performance data based on surveys of more than a thousand CIOs and IT buyers covering more than 400 companies. And we've invited back our colleague, Erik Bradley of ETR, to help explain the survey and the data that we're going to cover today. Erik, this survey is something that I've not personally spent much time on, but I'm blown away at the data. It's really unique and detailed. First of all, welcome. Good to see you again. >> Great to see you too, Dave, and I'm really happy to be talking about the ETS, or the Emerging Technology Survey. Even our own clients and constituents probably don't spend as much time in here as they should. >> Yeah, because there's so much in the mainstream, but let's pull up a slide to bring out the survey composition. Tell us about the study. How often do you run it? What's the background and the methodology? >> Yeah, you were just spot on the way you were talking about the private tech companies out there. So what we did is we decided to take all the vendors that we track that are not yet public and move 'em over to the ETS. And there isn't a lot of information out there. If you're not in Silicon (indistinct), you're not going to get this stuff. So PitchBook and TechCrunch are two out there that give some data on these guys. But what we really wanted to do was go out to our community. We have 6,000 ITDMs in our community. We wanted to ask them, "Are you aware of these companies? And if so, are you allocating any resources to them? Are you planning to evaluate them?" and really just kind of figure out what we can do. So this particular survey, as you can see, has 1,000-plus responses and over 450 vendors that we track. And essentially what we're trying to do here is talk about your evaluation and awareness of these companies and also your utilization. And if you're not utilizing 'em, then we can also figure out your sales conversion or churn. So this is interesting, not only for the ITDMs themselves, to figure out what their peers are evaluating and what they should put in POCs against the big guys when contracts come up, but it's also really interesting for the tech vendors themselves to see how they're performing. >> And you can see 2/3 of the respondents are director level or above. You got 28% C-suite. There is of course a North America bias, 70, 75% is North America. But these smaller companies, you know, that's when they start doing business. So, okay.
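To make the funnel Erik just described concrete, awareness, then evaluation, then utilization, with churn as the drop-off, here is a toy sketch. The column names, answer categories, and rate definitions are hypothetical illustrations, not ETR's actual survey schema or methodology.

```python
# Toy illustration of the awareness -> evaluation -> utilization -> churn funnel.
# All names and categories are made up for this sketch; they are not ETR's schema.
import pandas as pd

responses = pd.DataFrame({
    "vendor": ["VendorA", "VendorA", "VendorA", "VendorB", "VendorB", "VendorB"],
    "answer": ["evaluating", "using", "churned", "aware_only", "using", "using"],
})

def funnel(group: pd.DataFrame) -> pd.Series:
    # Each rate is the share of that vendor's citations falling in one stage.
    return pd.Series({
        "evaluation_rate": (group["answer"] == "evaluating").mean(),
        "utilization_rate": (group["answer"] == "using").mean(),
        "churn_rate": (group["answer"] == "churned").mean(),
        "citations": len(group),
    })

print(responses.groupby("vendor").apply(funnel))
```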
We're going to do a couple of things here today. First, we're going to give you the big picture across the sectors that ETR covers within the ETS survey. And then we're going to look at the high and low sentiment for the larger private companies. And then we're going to do the same for the smaller private companies, the ones that don't have as much mindshare. And then I'm going to put those two groups together and we're going to look at two dimensions, actually three dimensions: first, which companies are being evaluated the most; second, which companies are getting the most usage and adoption of their offerings; and then third, which companies are seeing the highest churn rates, which of course is a silent killer of companies. And then finally, we're going to look at the sentiment and mindshare for two key areas that we like to cover often here on "Breaking Analysis", security and data. And data comprises database, including data warehousing, then big data analytics as the second part of data, and then machine learning and AI as the third section within data that we're going to look at. Now, one other thing before we get into it. ETR very often will include open source offerings in the mix, even though they're not companies, like TensorFlow or Kubernetes, for example. And we'll call that out during this discussion. The reason this is done is for context, because everyone is using open source. It is the heart of innovation, and many business models are superglued to an open source offering. Take MariaDB, for example: there's the foundation with the open source code, and then there's, of course, the company that sells services around the offering. Okay, so let's first look at the highest and lowest sentiment among these private firms, the ones that have the highest mindshare. So they're naturally going to be somewhat larger. And we do this on two dimensions, sentiment on the vertical axis and mindshare on the horizontal axis, and note the open source tools: Kubernetes, Postgres, Kafka, TensorFlow, Jenkins, Grafana, et cetera. So Erik, please explain what we're looking at here, how it's derived and what the data tells us. >> Certainly, so there is a lot here, so we're going to break it down first of all by explaining just what mindshare and net sentiment are. You explained the axes. We have so many evaluation metrics, but we need to aggregate them into one so that we can rank vendors against each other. Net sentiment is really the aggregation of all the positives, subtracting out the negatives. So net sentiment is a very quick way of looking at where these companies stand versus their peers in their sectors and sub-sectors. Mindshare is basically the awareness of them, which is good for very early stage companies. And you'll see some names on here that have obviously been around for a very long time, and they're clearly the bigger ones, out on the edges of the chart. Kubernetes, for instance, as you mentioned, is open source. It's the de facto standard for all container orchestration, and it should be that far up and to the right, because that's what everyone's using. In fact, the open source leaders are so prevalent in the Emerging Technology Survey that we break them out later in our analysis, 'cause it's really not fair to include them and compare them to the actual companies that are providing the support and the security around that open source technology. But no survey, no analysis, no research would be complete without including this open source tech.
So what we're looking at here, if I can just get away from the open source names: we see other things like Databricks and OneTrust. They're repeating as top net sentiment performers here. And then also the design vendors. People don't spend a lot of time on 'em, but Miro and Figma, this is their third survey in a row where they're just dominating that sentiment overall. And Adobe should probably take note of that, because they're really coming after them. But Databricks, we all know, probably would've been a public company by now if the market hadn't turned, but you can see just how dominant they are in a survey of nothing but private companies. And we'll see that again when we talk about the database side later. >> And I'll just add, so you see Automation Anywhere on there, the big UiPath competitor, a company that was not able to get to the public markets. They've been trying. Snyk, Peter McKay's company, they've raised a bunch of money, big security player. They're doing some really interesting things in developer security, helping developers secure the data flow. H2O.ai and Dataiku, AI companies. We saw them at the Snowflake Summit. Redis Labs, Netskope in security. So a lot of names that we know that ultimately we think are probably going to be hitting the public market. Okay, here's the same view for private companies with less mindshare, Erik. Take us through this one. >> On the previous slide too, real quickly, I wanted to pull up SecurityScorecard, and we'll get back into it. But this is a newcomer that I couldn't believe how strong their data was, but we'll bring that up in a second. Now, when we go to the ones of lower mindshare, it's interesting to talk about open source, right? Kubernetes was all the way on the top right. Everyone uses containers. Here we see Istio up there. Not everyone is using service mesh as much, and that's why Istio is in the smaller breakout. But still, when you talk about net sentiment, it's the leader, it's the highest one there is. So really interesting to point out. Then we see other names like Collibra on the data side really performing well. And again, as always, security is very well represented here. We have Aqua, Wiz, Armis, which is a standout in this survey this time around. They do IoT security. I hadn't even heard of them until I started digging into the data here, and I couldn't believe how well they were doing. And then of course you have AnyScale, which is doing second best in this, and the best name in the survey, Hugging Face, which is a machine learning AI tool. They're also doing really well on net sentiment, but they're not as far along on that axis of mindshare just yet. So these are, again, emerging companies that might not be as well represented in the enterprise as they will be in a couple of years. >> Hugging Face sounds like something you do with your two year old. Like you said, you see high performers, AnyScale, who do machine learning, and you mentioned them. They came out of Berkeley. Collibra governance, InfluxData is on there. InfluxDB's a time series database. And yeah, of course, Alex, if you bring that back up, you get a big group of red dots, right? That's the bad zone, I guess: Sisense, which does visualization, Yellowbrick Data, an MPP database. How should we interpret the red dots, Erik? I mean, is it necessarily a bad thing? Could it be misinterpreted? What's your take on that? >> Sure, well, let me just explain the definition of it first from a data science perspective, right? We're a data company first.
So the gray dots that you're seeing that aren't named, that's the mean, that's the average. So in order for you to be on this chart, you have to be at least one standard deviation above or below that average. So that gray is where we're saying, "Hey, this is where the lump of average comes in. This is where everyone normally stands." You either have to be an outperformer or an underperformer to even show up in this analysis. So by definition, yes, the red dots are bad. You're at least one standard deviation below the average of your peers. It's not where you want to be. And if you're on the lower left, not only are you not performing well from a utilization or an actual usage rate, but people don't even know who you are. So that's a problem, obviously. And the VCs and the PEs out there that are backing these companies, they're the ones who are mostly interested in this data. >> Yeah. Oh, that's a great explanation. Thank you for that. Nice benchmarking there, and yeah, you don't want to be in the red. All right, let's get into the next segment here. We're going to look at evaluation rates, adoption, and the all-important churn. First, new evaluations. Let's bring up that slide. And Erik, take us through this. >> So essentially I just want to explain what evaluation means: people will cite that they either plan to evaluate the company or they're currently evaluating. So that means we're aware of 'em and we are choosing to do a POC of them. And then we'll see later how that turns into utilization, which is what a company wants to see: awareness, evaluation, and then actually utilizing them. That's sort of the life cycle for these emerging companies. So what we're seeing here, again, is very high evaluation rates. H2O, we mentioned. SecurityScorecard jumped up again. Chargebee, Snyk, Salt Security, Armis. A lot of security names are up here: Aqua, Netskope, which, God, has been around forever. I still can't believe it's in an Emerging Technology Survey. But so many of these names fall in data and security again, which is why we decided to pick those out, Dave. And on the lower side, Vena and Acton unfortunately took the dubious award of the lowest evaluations in our survey, but I prefer to focus on the positive. So SecurityScorecard, again, a real standout in this one. They're in the security assessment space, basically. They'll come in and assess for you how your security hygiene is. And it's an area of real interest right now amongst our ITDM community. >> Yeah, I mean, I think those, and then Arctic Wolf is up there too. They're doing managed services. You had mentioned Netskope. Yeah, okay. All right, now let's look at adoption. These are the companies whose offerings are being used the most and are above that standard deviation in the green. Take us through this, Erik. >> Sure, yet again, what we're looking at is, okay, we went from awareness, we went to evaluation. Now it's about utilization, which means a survey respondent's going to state "Yes, we evaluated and we plan to utilize it" or "It's already in our enterprise and we're actually allocating further resources to it." Not surprising, again, a lot of open source; the reason why is it's free. So it's really easy to grow your utilization on something that's free. But as you and I both know, and as Red Hat proved, there's a lot of money to be made once the open source is adopted, right? You need the governance, you need the security, you need the support wrapped around it.
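Pulling together the definitions Erik has walked through, net sentiment as positives minus negatives and the one-standard-deviation cutoff for a dot to get named, here is a toy sketch. The vendor names, values, and exact formula are illustrative assumptions, not ETR's actual calculation.

```python
# Toy illustration of the chart logic: net sentiment plus a +/- 1 std dev cutoff.
# The vendors, numbers, and formula are assumptions for this sketch only.
import pandas as pd

df = pd.DataFrame({
    "vendor":   ["A", "B", "C", "D", "E"],
    "positive": [0.40, 0.25, 0.10, 0.30, 0.05],
    "negative": [0.05, 0.10, 0.20, 0.08, 0.30],
})
df["net_sentiment"] = df["positive"] - df["negative"]

mean, std = df["net_sentiment"].mean(), df["net_sentiment"].std()
df["flag"] = "average"  # the unnamed gray dots
df.loc[df["net_sentiment"] > mean + std, "flag"] = "outperformer"
df.loc[df["net_sentiment"] < mean - std, "flag"] = "underperformer"
print(df[["vendor", "net_sentiment", "flag"]])
```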
So here we're seeing Kubernetes, Postgres, Apache Kafka, Jenkins, Grafana. These are all open source based names. But if we're looking at names that are non open source, we're going to see Databricks, Automation Anywhere, Rubrik all have the highest mindshare. So these are the names, not surprisingly, all names that probably should have been public by now. Everyone's expecting an IPO imminently. These are the names that have the highest mindshare. If we talk about the highest utilization rates, again, Miro and Figma pop up, and I know they're not household names, but they are just dominant in this survey. These are applications that are meant for design software and, again, they're going after an Autodesk or a CAD or Adobe type of thing. It is just dominant how high the utilization rates are here, which again is something Adobe should be paying attention to. And then you'll see a little bit lower, but also interesting, we see Collibra again, we see Hugging Face again. And these are names that are obviously in the data governance, ML, AI side. So we're seeing a ton of data, a ton of security and Rubrik was interesting in this one, too, high utilization and high mindshare. We know how pervasive they are in the enterprise already. >> Erik, Alex, keep that up for a second, if you would. So yeah, you mentioned Rubrik. Cohesity's not on there. They're sort of the big one. We're going to talk about them in a moment. Puppet is interesting to me because you remember the early days of that sort of space, you had Puppet and Chef and then you had Ansible. Red Hat bought Ansible and then Ansible really took off. So it's interesting to see Puppet on there as well. Okay. So now let's look at the churn because this one is where you don't want to be. It's, of course, all red 'cause churn is bad. Take us through this, Erik. >> Yeah, definitely don't want to be here and I don't love to dwell on the negative. So we won't spend as much time. But to your point, there's one thing I want to point out that think it's important. So you see Rubrik in the same spot, but Rubrik has so many citations in our survey that it actually would make sense that they're both being high utilization and churn just because they're so well represented. They have such a high overall representation in our survey. And the reason I call that out is Cohesity. Cohesity has an extremely high churn rate here about 17% and unlike Rubrik, they were not on the utilization side. So Rubrik is seeing both, Cohesity is not. It's not being utilized, but it's seeing a high churn. So that's the way you can look at this data and say, "Hm." Same thing with Puppet. You noticed that it was on the other slide. It's also on this one. So basically what it means is a lot of people are giving Puppet a shot, but it's starting to churn, which means it's not as sticky as we would like. One that was surprising on here for me was Tanium. It's kind of jumbled in there. It's hard to see in the middle, but Tanium, I was very surprised to see as high of a churn because what I do hear from our end user community is that people that use it, like it. It really kind of spreads into not only vulnerability management, but also that endpoint detection and response side. So I was surprised by that one, mostly to see Tanium in here. Mural, again, was another one of those application design softwares that's seeing a very high churn as well. >> So you're saying if you're in both... Alex, bring that back up if you would. 
So if you're in both, like MariaDB is, for example, I think, yeah, they're in both. They're green in the previous one and red here; that's not as bad. You mentioned Rubrik is going to be in both. Cohesity is a bit of a concern. Cohesity just brought on Sanjay Poonen, so this could be a go-to-market issue, right? I mean, 'cause Cohesity has got a great product and they've got really happy customers. So they're just maybe having to figure out, okay, what's the right ideal customer profile, and Sanjay Poonen, I guarantee, is going to have that company cranking. I mean, they had been doing very well on the surveys and had fallen off a bit. The other interesting thing: in the previous survey I saw Cvent, which is an event platform. The only reason I pay attention to that is 'cause we actually have an event platform. We don't sell it separately; we bundle it as part of our offerings. And you see Hopin on here. Hopin raised a billion dollars during the pandemic, and we were like, "Wow, that's going to blow up." And so you see Hopin on the churn side and you didn't see 'em in the previous chart, but that's sort of interesting. Like you said, let's not kind of dwell on the negative, but you really don't want to be there; churn is a real big concern. Okay, now we're going to drill down into two sectors, security and data, where data comprises three areas: database and data warehousing, machine learning and AI, and big data analytics. So first let's take a look at the security sector. Now this is interesting because not only is it a sector drill down, but it also gives an indicator of how much money the firm has raised, which is the size of that bubble, and tells us if a company is punching above its weight and efficiently using its venture capital. Erik, take us through this slide. Explain the dots, the size of the dots. Set this up please. >> Yeah. So again, the axes are still the same, net sentiment and mindshare, but what we've done this time is we've taken publicly available information on how much capital a company has raised, and that'll be the size of the circle you see around the name. And then whether it's green or red is basically saying, relative to the amount of money they've raised, how are they doing in our data? So when you see a Netskope, which has been around forever and raised a lot of money, that's why you're going to see them leaning more towards red, 'cause it's just been around forever and you kind of would expect it. Versus a name like SecurityScorecard, which has only raised a little bit of money and is actually performing just as well, if not better, than a name like Netskope. OneTrust is doing absolutely incredible right now. BeyondTrust. We've seen the issues with Okta, right? So those are two names that play in that space that obviously are probably getting some looks about what's going on right now. Wiz, we've all heard about, right? They raised a ton of money. It's doing well on net sentiment, but the mindshare isn't as high as you'd want, which is why you're going to see a little bit of that red, versus a name like Aqua, which is doing container and application security, hasn't raised as much money, but is really neck and neck with a name like Wiz. So that is why, on a relative basis, you'll see that more green. As we all know, information security is never going away. But as we'll get to later in the program, Dave, I'm not sure, in this current market environment, if people are as willing to do POCs and switch away from their security provider, right.
There's a little bit of tepidness out there, a little trepidation. So right now we're seeing overall a slight pause, a slight cooling in overall evaluations on the security side versus historical levels a year ago. >> Now let's stay on here for a second. So a couple things I want to point out. So it's interesting. Now Snyk has raised over, I think, $800 million, and you can see them, they're high on the vertical and the horizontal, but now compare that to Lacework. It's hard to see, but they're kind of buried in the middle there. That's the biggest dot in this whole thing. I think I'm interpreting this correctly. They've raised over a billion dollars. It's a Mike Speiser company. He was the founding investor in Snowflake. So people watch that very closely, but that's an example of where they're not punching above their weight. They recently had a layoff and they've got to fine-tune things, but I'm still confident they're going to do well, 'cause they're approaching security as a data problem, which is probably something people are having trouble getting their arms around. And then again, I see Arctic Wolf. They're not red, they're not green, but they've raised a fair amount of money, and they're showing up to the right at a decent level there. And a couple of the other ones that you mentioned, Netskope: yeah, they've raised a lot of money, but they're actually performing where you want. What you don't want is where Lacework is, right? They've got some work to do to really take advantage of the money that they raised last November and prior to that. >> Yeah, if you're seeing that more neutral color, like you're calling out with an Arctic Wolf, that means relative to their peers, this is where they should be. It's when you're seeing that red on a Lacework where we all know, wow, you raised a ton of money and your mindshare isn't where it should be, your net sentiment is not where it should be comparatively. And then you see these great standouts, like Salt Security and SecurityScorecard and Abnormal. They haven't raised that much money yet, but their net sentiment's higher and their mindshare's doing well. So basically, in a nutshell, if you're a PE or a VC and you see a small green circle, then you're doing well; it means you made a good investment. >> Some of these guys, I don't know, but you see these small green circles. Those are the ones you want to start digging into and maybe help them catch a wave. Okay, let's get into the data discussion. And again, three areas: database slash data warehousing, big data analytics, and ML and AI. First, we're going to look at the database sector. So Alex, thank you for bringing that up. All right, take us through this, Erik. Actually, let me just say PostgreSQL, I've got to ask you about this. It shows some funding, but that actually could be a mix of EDB, the company that commercializes Postgres, and Postgres the open source database, which is a transaction system and kind of an open source Oracle. You see MariaDB on here, an open source database, but the company has raised over $200 million and they filed an S-4. So Erik, it looks like this might be a little bit of a mashup of companies and open source products. Help us understand this. >> Yeah, it's tough when you start dealing with the open source side, and I'll be honest with you, there is a little bit of a mashup here. There are certain names here that are a hundred percent for-profit companies.
And then there are others that are obviously open source based; Redis is open source, but Redis Labs is the one trying to monetize the support around it. So you're a hundred percent accurate on this slide. I think one of the things here that's important to note, though, is just how important open source is to data. If you're going into any of these areas, it's going to be open source based to begin with. And Neo4j is one I want to call out here. It's not one everyone's familiar with, but it's basically a graph database, and it's a name that we're seeing on the net sentiment side actually really, really high. When you think about it, it's the third overall net sentiment for a niche database play. It's not as big on the mindshare side 'cause its use cases aren't as common, but it's the third biggest play on net sentiment, which I found really interesting on this slide. >> And again, so MariaDB, as I said, they filed an S-4, I think $50 million in revenue, and that might even be ARR. So they're not huge, but they're getting there. And by the way, MariaDB, if you don't know, was the company that was formed the day that Oracle bought Sun, in which they got MySQL, and MariaDB has done a really good job of replacing a lot of MySQL instances. Oracle has responded with MySQL HeatWave, which was kind of the Oracle version of MySQL. So there's some interesting battles going on there. If you think about the LAMP stack, the M in the LAMP stack was MySQL, and now it's largely MariaDB replacing that MySQL. And then you see, again, the red; you've got to have some concerns there. Aerospike's been around for a long time. SingleStore changed their name a couple years ago, last year. Yellowbrick Data, Firebolt was kind of going after Snowflake for a while, but yeah, you want to get out of that red zone. So they've got some work to do. >> And Dave, real quick, for the people that aren't aware, I just want to let them know that we can cut this data with the public company data as well. So we can cross over this with that, because some of these names are competing with the larger public company names as well. So we can go ahead and cross-reference, like, a MariaDB with a Mongo, for instance, or something of that nature. So it's not in this slide, but at another point we can certainly explain, on a relative basis, how these private names are doing compared to the other ones as well. >> All right, let's take a quick look at analytics. Alex, bring that up if you would. Go ahead, Erik. >> Yeah, I mean, essentially here, I can't see it on my screen, my apologies. I just kind of went blank on that. So give me one second to catch up. >> So I could set it up while you're doing that. You've got Grafana up and to the right. I mean, this is huge, right? >> Got it, thank you. I lost my screen there for a second. Yep. Again, open source name Grafana, absolutely up and to the right. But as we know, Grafana Labs is actually picking up a lot of speed based on Grafana, of course, and I think we might actually hear some noise from them coming this year. The names that are actually a little bit more disappointing that I want to call out are names like ThoughtSpot. It's been around forever. Their mindshare of course is second best here, but based on the amount of time they've been around and the amount of money they've raised, it's not actually outperforming the way it should be. We're seeing Moogsoft obviously make some waves. That's very high net sentiment for that company.
It's, you know, what, third or fourth position overall in this entire area. Other names like Fivetran and Matillion are doing well. Fivetran, even though it's got a high net sentiment, again, has raised so much money that we would've expected a little bit more at this point. I know you know this space extremely well, but basically what we're looking at here, and to the bottom left, you're going to see some names with a lot of red, large circles that really just aren't performing that well. InfluxData, however, second highest net sentiment. And it's really pretty early on at this stage, and the feedback we're getting on this name is the use cases are great, the efficacy's great, and I think it's one to watch out for. >> InfluxData, time series database. The other interesting thing I just noticed here: you've got Tamr on here, which is that little small green one. Those are the ones we were saying before, look for those guys. They might be some of the interesting companies out there. And then Observe, Jeremy Burton's company. They do observability on top of Snowflake; not green, but kind of in that gray. So that's kind of cool. Monte Carlo is another one, they're sort of slightly green. They are doing some really interesting things in data and data mesh. So yeah, okay. So I can spend all day on this stuff, Erik, phenomenal data. I've got to get back and really dig in. Let's end with machine learning and AI. Now this chart is similar in its dimensions, of course, except for the money raised; we're not showing that with the size of the bubble. But AI is so hot, we wanted to cover that here. Erik, explain this please: why TensorFlow is highlighted, and walk us through this chart. >> Yeah, it's funny, yet again, right? Another open source name, TensorFlow, being up there. And I just want to explain, we do break out machine learning and AI as its own sector. A lot of this of course really is intertwined with the data side, but it is its own area. And one of the things I think is most important here to break out is Databricks. We started to cover Databricks in machine learning and AI. That company has grown into much, much more than that. So I do want to state to you, Dave, and also the audience out there, that moving forward we're going to be moving Databricks out of only the ML/AI sector and into other sectors, so we can kind of value them against their peers a little bit better. But in this instance, you can just see how dominant they are in this area. And one thing that's not here, but I do want to point out, is that we have the ability to break this down by industry vertical and organization size. And when I break this down into the Fortune 500 and Fortune 1000, both Databricks and TensorFlow are even better than you see here. So it's quite interesting to see that the names that are succeeding are also succeeding with the largest organizations in the world. And as we know, large organizations mean large budgets. So this is one area that I just thought was really interesting to point out: as we break the data down by vertical, these two names still are the outstanding players. >> I also just want to call out H2O.ai. They're getting a lot of buzz in the marketplace and I'm seeing them a lot more. Anaconda, another one. Dataiku consistently popping up. DataRobot is also interesting because of all the kerfuffle that's going on there. The Cube guy, Cube alum, Chris Lynch stepped down as executive chairman.
All this stuff came out about how the executives were taking money off the table and didn't allow the employees to participate in that money-raising deal. So that's pissed a lot of people off. And so they're now going through some kind of uncomfortable things, which is unfortunate, because DataRobot, I noticed, we haven't covered them that much in "Breaking Analysis", but I've noticed them oftentimes, Erik, in the surveys doing really well. So you would think that company has a lot of potential. But yeah, it's an important space that we're going to continue to watch. Let me ask you, Erik, can you contextualize this from a time series standpoint? I mean, how has this changed over time? >> Yeah, again, it's not shown here, but it's in the data. I'm sorry, go ahead. >> No, I'm sorry, I should have interjected. What I meant is, in other words, you would think in a downturn that these emerging companies would be less interesting to buyers 'cause they're more risky. What have you seen? >> Yeah, and it was interesting, before we went live, you and I were having this conversation about whether the downturn is stopping people from evaluating these private companies or not, right? In a larger sense, that's really what we're doing here. How are these private companies doing when it comes down to the actual practitioners, the people with the budget, the people with the decision making? And so what I did is, we have historical data, as you know, and I went back to the Emerging Technology Survey we did in November of '21, right at the crest, right before the market started to really fall and everything kind of started to fall apart there. And what I noticed is, on the security side very much so, we're seeing fewer evaluations than we were in November '21. So I broke it down. On cloud security, net sentiment went from 21% to 16% from November '21. That's a pretty big drop. And again, that sentiment is our one aggregate metric for overall positivity, meaning utilization and actual evaluation of the name. Again, in database we saw it drop a little bit, from 19% to 13%. However, in analytics we actually saw it stay steady. So it's pretty interesting that, yes, cloud security and security in general is always going to be important, but right now we're seeing less overall net sentiment in that space. Within analytics, we're seeing it hold steady with growing mindshare. And also, to your point earlier, in machine learning and AI, we're seeing steady net sentiment, and mindshare has grown a whopping 25% to 30%. So despite the downturn, we're seeing more awareness of these companies in analytics and machine learning and a steady, actual utilization of them. I can't say the same in security and database. They're actually shrinking a little bit since the end of last year. >> You know, it's interesting, we were on a round table, Erik does these round tables with CISOs and CIOs, and I remember one time you had asked the question, "How do you think about some of these emerging tech companies?" And one of the executives said, "I always include somebody in the bottom left of the Gartner Magic Quadrant in my RFPs." I think he said, "That's how I found," I don't know, it was Zscaler or something like that, years before anybody ever knew of them, "because they're going to help me get to the next level." So it's interesting to see, Erik, in these sectors, how they're holding up in many cases. >> Yeah. It's a very important part for the actual IT practitioners themselves. There's always contracts coming up, and you always have to worry about your next round of negotiations.
And that's one of the roles these guys play. You have to do a POC when contracts come up, but it's also their job to stay on top of the new technology. You can't fall behind. Like everyone's a software company. Now everyone's a tech company, no matter what you're doing. So these guys have to stay in on top of it. And that's what this ETS can do. You can go in here and look and say, "All right, I'm going to evaluate their technology," and it could be twofold. It might be that you're ready to upgrade your technology and they're actually pushing the envelope or it simply might be I'm using them as a negotiation ploy. So when I go back to the big guy who I have full intentions of writing that contract to, at least I have some negotiation leverage. >> Erik, we got to leave it there. I could spend all day. I'm going to definitely dig into this on my own time. Thank you for introducing this, really appreciate your time today. >> I always enjoy it, Dave and I hope everyone out there has a great holiday weekend. Enjoy the rest of the summer. And, you know, I love to talk data. So anytime you want, just point the camera on me and I'll start talking data. >> You got it. I also want to thank the team at ETR, not only Erik, but Darren Bramen who's a data scientist, really helped prepare this data, the entire team over at ETR. I cannot tell you how much additional data there is. We are just scratching the surface in this "Breaking Analysis". So great job guys. I want to thank Alex Myerson. Who's on production and he manages the podcast. Ken Shifman as well, who's just coming back from VMware Explore. Kristen Martin and Cheryl Knight help get the word out on social media and in our newsletters. And Rob Hof is our editor in chief over at SiliconANGLE. Does some great editing for us. Thank you. All of you guys. Remember these episodes, they're all available as podcast, wherever you listen. All you got to do is just search "Breaking Analysis" podcast. I publish each week on wikibon.com and siliconangle.com. Or you can email me to get in touch david.vellante@siliconangle.com. You can DM me at dvellante or comment on my LinkedIn posts and please do check out etr.ai for the best survey data in the enterprise tech business. This is Dave Vellante for Erik Bradley and The Cube Insights powered by ETR. Thanks for watching. Be well. And we'll see you next time on "Breaking Analysis". (upbeat music)