Robert Nishihara, Anyscale | AWS Startup Showcase S3 E1
(upbeat music) >> Hello everyone. Welcome to theCUBE's presentation of the "AWS Startup Showcase." The topic this episode is AI and machine learning, top startups building foundational model infrastructure. This is season three, episode one of the ongoing series covering exciting startups from the AWS ecosystem. And this time we're talking about AI and machine learning. I'm your host, John Furrier. I'm excited I'm joined today by Robert Nishihara, who's the co-founder and CEO of a hot startup called Anyscale. He's here to talk about Ray, the open source project, and Anyscale's infrastructure for foundation models as well. Robert, thank you for joining us today. >> Yeah, thanks so much. >> I've been following your company since the founding pre-pandemic, and you guys really had a great vision, scaled up, and are in a perfect position for this big wave that we all see with ChatGPT and OpenAI that's gone mainstream. Finally, AI has broken out through the ropes and now gone mainstream, so I think you guys are really well positioned. I'm looking forward to talking with you today. But before we get into it, introduce the core mission for Anyscale. Why do you guys exist? What is the North Star for Anyscale? >> Yeah, like you mentioned, there's a tremendous amount of excitement about AI right now. You know, I think a lot of us believe that AI can transform just about every industry. So one of the things that was clear to us when we started this company was that the amount of compute needed to do AI was just exploding. To actually succeed with AI, companies like OpenAI or Google, you know, these companies getting a lot of value from AI, were not just running these machine learning models on their laptops or on a single machine. They were scaling these applications across hundreds or thousands or more machines and GPUs and other resources in the Cloud. And this has been one of the biggest trends in computing, maybe the biggest trend in computing in recent history: the amount of compute has been exploding. And so to actually succeed with AI, to actually build these scalable applications and scale the AI applications, there's a tremendous software engineering lift to build the infrastructure to actually run these scalable applications. And that's very hard to do. So one of the reasons many AI projects and initiatives fail, or don't make it to production, is the need for this scale, the infrastructure lift, to actually make it happen. So our goal here with Anyscale and Ray is to make that easy, to make scalable computing easy. So that as a developer or as a business, if you want to do AI, if you want to get value out of AI, all you need to know is how to program on your laptop. Like, all you need to know is how to program in Python. And if you can do that, then you're good to go. Then you can do what companies like OpenAI or Google do and get value out of machine learning. >> That programming example of how easy it is with Python reminds me of the early days of Cloud, when infrastructure as code was first talked about: it was making the infrastructure programmable through code. That's super important. That's what AI people want: programmable AI. That's the new trend. And I want to understand, if you don't mind explaining, the relationship that Anyscale has to these foundational models, and in particular the large language models, also called LLMs, as seen with OpenAI and ChatGPT.
Before you get into the relationship that you have with them, can you explain why the hype around foundational models? Why are people going crazy over foundational models? What is it and why is it so important? >> Yeah, so foundation models are incredibly important because they enable businesses and developers to get value out of machine learning, to use machine learning off the shelf with these large models that have been trained on tons of data and that are useful out of the box. And then, of course, you know, as a business or as a developer, you can take those foundational models and repurpose them or fine-tune them or adapt them to your specific use case and what you want to achieve. But it's much easier to do that than to train them from scratch. And for people to actually use foundation models, I think there are three main types of workloads or problems that need to be solved. One is training these foundation models in the first place, like actually creating them. The second is fine-tuning them and adapting them to your use case. And the third is serving them and actually deploying them. Okay, so Ray and Anyscale are used for all three of these workloads. Companies like OpenAI or Cohere train large language models on top of Ray, as are open source versions like GPT-J. There are many startups and other businesses that don't want to train the large underlying foundation models, but that do want to fine-tune them, do want to adapt them to their purposes, and build products around them and serve them. Those are also using Ray and Anyscale for that fine-tuning and that serving. And so the reason that Ray and Anyscale are important here is that, you know, building and using foundation models requires huge scale. It requires a lot of data. It requires a lot of compute: GPUs, TPUs, other resources. And to actually take advantage of that and actually build these scalable applications, there's a lot of infrastructure that needs to happen under the hood. And so you can either use Ray and Anyscale to take care of that, manage the infrastructure, and solve those infrastructure problems, or you can build the infrastructure and manage the infrastructure yourself, which you can do, but it's going to slow your team down. You know, many of the businesses we work with simply don't want to be in the business of managing infrastructure and building infrastructure. They want to focus on product development and move faster. >> I know you got a keynote presentation we're going to go to in a second, but I think you hit on something I think is the real tipping point: doing it yourself is hard to do. These are things where opportunities are, and the Cloud did that with data centers. Turned a data center and made it an API. The heavy lifting went away and went to the Cloud so people could be more creative and build their product. In this case, build their creativity. Is that kind of the big deal? Is that kind of a big deal happening, that you guys are taking the learnings and making that available so people don't have to do that? >> That's exactly right. So today, if you want to succeed with AI, if you want to use AI in your business, infrastructure work is on the critical path for doing that. To do AI, you have to build infrastructure. You have to figure out how to scale your applications. That's going to change.
We're going to get to the point, and you know, with Ray and Anyscale, we're going to remove the infrastructure from the critical path, so that as a developer or as a business, all you need to focus on is your application logic: what you want the program to do, what you want your application to do, how you want the AI to actually interface with the rest of your product. Now, the way that will happen is that the infrastructure work will still happen. It'll just be under the hood and taken care of by Ray and Anyscale. And so I think something like this is really necessary for AI to reach its potential, for AI to have the impact and the reach that we think it will. You have to make it easier to do. >> And just for clarification, to point out, if you don't mind explaining the relationship of Ray and Anyscale real quick just before we get into the presentation. >> So Ray is an open source project. We created it. We were at Berkeley doing machine learning. We started Ray in order to provide a simple open source tool for building and running scalable applications. And Anyscale is the managed version of Ray. Basically, we will run Ray for you in the Cloud, and provide a lot of tools around the developer experience, managing the infrastructure, and providing more performance and superior infrastructure. >> Awesome. I know you got a presentation on Ray and Anyscale, and you guys are positioning as the infrastructure for foundational models. So I'll let you take it away, and then when you're done presenting, we'll come back, I'll probably grill you with a few questions, and then we'll close it out, so take it away. >> Robert: Sounds great. So I'll say a little bit about how companies are using Ray and Anyscale for foundation models. The first thing I want to mention is just why we're doing this in the first place. And the underlying observation, the underlying trend here, and this is a plot from OpenAI, is that the amount of compute needed to do machine learning has been exploding. It's been growing at something like 35 times every 18 months. This is absolutely enormous. And other people have written papers measuring this trend and you get different numbers. But the point is, no matter how you slice and dice it, it's an astronomical rate. Now if you compare that to something we're all familiar with, like Moore's Law, which says that, you know, processor performance doubles roughly every 18 months, you can see that there's just a tremendous gap between the compute needs of machine learning applications and what you can do with a single chip, right. So even if Moore's Law were continuing strong, doing what it used to be doing, there would still be a tremendous gap between what you can do with a chip and what you need in order to do machine learning. And so given this graph, what we've seen, and what has been clear to us since we started this company, is that doing AI requires scaling. There's no way around it. It's not a nice to have, it's really a requirement. And so that led us to start Ray, which is the open source project that we started to make it easy to build these scalable Python applications and scalable machine learning applications. And since we started the project, it's been adopted by a tremendous number of companies.
Companies like OpenAI, which use Ray to train their large models like ChatGPT. Companies like Uber, which run all of their deep learning and classical machine learning on top of Ray. Companies like Shopify or Spotify or Instacart or Lyft or Netflix or ByteDance, which use Ray for their machine learning infrastructure. Companies like Ant Group, which makes Alipay: you know, they use Ray across the board for fraud detection, for online learning, for detecting money laundering, for graph processing, stream processing. Companies like Amazon run Ray at a tremendous scale, processing petabytes of data every single day. And so the project has seen just enormous adoption over the past few years. And one of the most exciting use cases is really providing the infrastructure for building, training, fine-tuning, and serving foundation models. So I'll say a little bit about that. Here are some examples of companies using Ray for foundation models. Cohere trains large language models. OpenAI also trains large language models. You can think about the workloads required there: things like supervised pre-training, and also reinforcement learning from human feedback. So this is not only the regular supervised learning, but actually more complex reinforcement learning workloads that take human input about which response to a particular question is better than another response, and incorporate that into the learning. There are open source versions as well, like GPT-J, also built on top of Ray, as well as projects like Alpa coming out of UC Berkeley. So these are some examples of exciting projects and organizations training and creating these large language models and serving them using Ray. Okay, so what actually is Ray? Well, there are two layers to Ray. At the lowest level, there's the core Ray system. This is essentially low level primitives for building scalable Python applications: things like taking a Python function or a Python class and executing them in the cluster setting. So Ray Core is extremely flexible, and you can build arbitrary scalable applications on top of Ray. On top of the core system, what really gives Ray a lot of its power is this ecosystem of scalable libraries. So on top of the core system you have scalable libraries for ingesting and pre-processing data, for training your models, for fine-tuning those models, for hyperparameter tuning, for doing batch processing and batch inference, for doing model serving and deployment, right. And for a lot of Ray users, the reason they like Ray is that they want to run multiple workloads. They want to train and serve their models, right. They want to load their data and feed that into training. And Ray provides common infrastructure for all of these different workloads. So that's a little overview of the different components of Ray. So why do people choose to go with Ray? I think there are three main reasons. The first is the unified nature, the fact that it is common infrastructure for scaling arbitrary workloads, from data ingest to pre-processing to training to inference and serving, right. This also includes the fact that it's future proof. AI is incredibly fast moving. And so many people, many companies that have built their own machine learning infrastructure and standardized on particular workflows for doing machine learning have found that their workflows are too rigid to enable new capabilities.
If they want to do reinforcement learning, if they want to use graph neural networks, they don't have a way of doing that with their standard tooling. And so Ray, being future proof and being flexible and general, gives them that ability. Another reason people choose Ray and Anyscale is the scalability. This is really our bread and butter. This is the whole point of Ray: making it easy to go from your laptop to running on thousands of GPUs, making it easy to scale your development workloads and run them in production, making it easy to scale training, to scale data ingest, pre-processing and so on. So scalability and performance are critical for doing machine learning, and that is something that Ray provides out of the box. And lastly, Ray is an open ecosystem. You can run it anywhere. You can run it on any Cloud provider: Google Cloud, AWS, Azure. You can run it on your Kubernetes cluster. You can run it on your laptop. It's extremely portable. And not only that, it's framework agnostic. You can use Ray to scale arbitrary Python workloads. It integrates with libraries like TensorFlow or PyTorch or JAX or XGBoost or Hugging Face or PyTorch Lightning, right, or Scikit-learn, or just your own arbitrary Python code. It's open source. And in addition to integrating with the rest of the machine learning ecosystem and these machine learning frameworks, you can use Ray along with all of the other tooling in the machine learning ecosystem. That's things like Weights & Biases or MLflow, right. Or, you know, different data platforms like Databricks, Delta Lake or Snowflake, or tools for model monitoring, or feature stores. All of these integrate with Ray. And, you know, Ray provides that kind of flexibility so that you can integrate it into the rest of your workflow. And then Anyscale is the scalable compute platform that's built on top, you know, that provides Ray. So Anyscale is a managed Ray service that runs in the Cloud. And what Anyscale does is it offers the best way to run Ray. And if you think about what you get with Anyscale, there are fundamentally two things. One is about moving faster, accelerating the time to market. And you get that by having the managed service, so that as a developer you don't have to worry about managing infrastructure, you don't have to worry about configuring infrastructure. It also provides optimized developer workflows: things like easily moving from development to production, things like having the observability tooling, the debuggability to actually easily diagnose what's going wrong in a distributed application. So things like the dashboards and the other kinds of tooling for collaboration, for monitoring and so on. And then on top of that, so that's the first bucket, developer productivity: moving faster, faster experimentation and iteration. The second reason that people choose Anyscale is superior infrastructure. So this is things like cost efficiency, being able to easily take advantage of spot instances, being able to get higher GPU utilization, things like faster cluster startup times and autoscaling, things like just overall better performance and faster scheduling. And so these are the kinds of things that Anyscale provides on top of Ray. It's the managed infrastructure. It's the developer productivity and velocity as well as performance. So this is what I wanted to share about Ray and Anyscale.
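For readers who want to see what that core layer looks like in practice, here is a minimal sketch of Ray's task and actor primitives. This is illustrative code based on Ray's public API, not something shown in the presentation; the function and class are placeholders:

```python
import ray

ray.init()  # connects to an existing cluster, or starts one locally

@ray.remote
def square(x):
    # an ordinary Python function; Ray schedules calls to it
    # on any machine in the cluster
    return x * x

@ray.remote
class Counter:
    # an actor: a stateful class whose methods run remotely
    def __init__(self):
        self.count = 0

    def increment(self):
        self.count += 1
        return self.count

# calls return futures immediately; ray.get blocks for the results
futures = [square.remote(i) for i in range(100)]
print(sum(ray.get(futures)))  # 0^2 + 1^2 + ... + 99^2

counter = Counter.remote()
print(ray.get(counter.increment.remote()))  # 1
```

The same code runs unchanged on a laptop or a multi-node cluster, which is the portability point made above; the scalable libraries (Ray Train, Ray Serve, Ray Data, Ray Tune) are built out of these primitives.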
>> John: Awesome. >> Provide that context. But John, I'm curious what you think. >> I love it. I love the, so first of all, it's a platform, because that's the platform architecture right there. So just to clarify, this is an Anyscale platform, not- >> That's right. >> Tools. So you got tools in the platform. Okay, that's key. Love that managed service. Just curious, you mentioned Python multiple times. Is that because of PyTorch and TensorFlow, or is Python the most friendly with machine learning, or is it because it's very common amongst all developers? >> That's a great question. Python is the language that people are using to do machine learning, so it's the natural starting point. Now, of course, Ray is actually designed in a language agnostic way, and there are companies out there that use Ray to build scalable Java applications. But for the most part right now we're focused on Python and being the best way to build these scalable Python and machine learning applications. But, of course, down the road there always is that potential. >> So if you're slinging Python code out there and you're watching this video, get on the Anyscale bus quickly. Also, while you were giving the presentation, I couldn't help, since you mentioned OpenAI, which by the way, congratulations 'cause they've had great scale. I've noticed their rapid growth 'cause they were the fastest company to reach that number of users in the history of the computer industry, so a major success. OpenAI and ChatGPT, huge fan. I'm not a skeptic at all. I think it's just the beginning, so congratulations. But I actually typed into ChatGPT, what are the top three benefits of Anyscale, and it came up with scalability, flexibility, and ease of use. Obviously, scalability is what you guys are called. >> That's pretty good. >> So that's what they came up with. So they nailed it. Did you have an inside prompt baked in there? Only kidding. (Robert laughs) >> Yeah, we hard coded that one. >> But that's the kind of thing that came up really, really quickly. If I asked it to write a sales document, it probably would. But this is the future interface. This is why people are getting excited about the foundational models and the large language models: because it's allowing the interface with the user, the consumer, to be more human, more natural. And this clearly will be in every application in the future. >> Absolutely. This is how people are going to interface with software, how they're going to interface with products in the future. It's not just a chatbot that you talk to. This is going to be how you get things done, right. How you use your web browser, or how you use Photoshop, or how you use other products. Like you're not going to spend hours learning all the APIs and how to use them. You're going to talk to it and tell it what you want it to do. And of course, you know, if it doesn't understand, it's going to ask clarifying questions. You're going to have a conversation and then it'll figure it out. >> This is going to be one of those things we're going to look back at, Robert, saying, "Yeah, from that company, that was the beginning of that wave." And just like AWS and Cloud Computing, the folks who got in early really were in position when, say, the pandemic came.
So getting in early is a good thing, and that's what everyone's talking about: getting in early and playing around, maybe replatforming or even picking one or a few apps to refactor with some staff and managed services. So people are definitely jumping in. So I have to ask you the ROI cost question. You mentioned some of those, Moore's Law versus what's going on in the industry. When you look at that kind of scale, the first thing that jumps out at people is, "Okay, I love it. Let's go play around." But what's it going to cost me? Am I going to be tied to certain GPUs? What's the landscape look like from an operational standpoint, from the customer? Are they locked in, or is the benefit flexibility? Are you flexible enough to handle any Cloud? Basically, that's my question: what's the customer looking at? >> Cost is super important here, and many of these companies, I mean, companies are spending a huge amount on their Cloud computing, on AWS, and on doing AI, right. And I think a lot of the advantage of Anyscale, what we can provide here, is not only better performance, but cost efficiency. Because if we can run something faster and more efficiently, it can also use fewer resources and you can lower your Cloud spending, right. We've seen companies go from, you know, 20% GPU utilization with their current setup and the current tools they're using, to running on Anyscale and getting more like 95, you know, 100% GPU utilization. That's something like a 5x improvement right there. So depending on the kind of application you're running, it's a significant cost savings. We've seen companies that are processing petabytes of data every single day with Ray get order-of-magnitude cost savings by switching from what they were previously doing to running their application on Ray. And when you have applications that are spending, you know, potentially $100 million a year, getting a 10x cost savings is just absolutely enormous. So these are some of the kinds of- >> Data infrastructure is super important. Again, if you're a prospect to this and thinking about going in here, just like the Cloud, you got infrastructure, you got the platform, you got SaaS. The same kind of thing's going to go on in AI. So I want to get into that ROI discussion and some of the impact with your customers that are leveraging the platform. But first I hear you got a demo. >> Robert: Yeah, so let me show you, let me give you a quick run through here. So what I have open here is the Anyscale UI. I've started a little Anyscale Workspace. So Workspaces are the Anyscale concept for interactive development, right. So here, imagine you want to have a familiar experience, like you're developing on your laptop. And here I have a terminal. It's not on my laptop. It's actually in the cloud, running on Anyscale. And I'm just going to kick this off. This is going to train a large language model, OPT. And it's doing this on 32 GPUs. We've got a cluster here with a bunch of CPU cores, a bunch of memory. And as that's running, and by the way, if I wanted to run this on 64 or 128 GPUs instead of 32, this is just a one line change when I launch the Workspace. And what I can do is I can pull up VS Code, right. Remember, this is the interactive development experience. I can look at the actual code. Here it's using Ray Train to train the PyTorch model.
We've got the training loop, and we're saying that each worker gets access to one GPU and four CPU cores. And, of course, as I make the model larger, this is using DeepSpeed, as I make the model larger, I could increase the number of GPUs that each worker gets access to, right, and how that is distributed across the cluster. And if I wanted to run on CPUs instead of GPUs, or a different accelerator type, again, this is just a one line change. And here we're using Ray Train to train the models, just taking my vanilla PyTorch model using Hugging Face and then scaling that across a bunch of GPUs. And, of course, if I want to look at the dashboard, I can go to the Ray dashboard. There are a bunch of different visualizations I can look at. I can look at the GPU utilization. I can look at the CPU utilization here, where I think we're currently loading the model and running the actual application to start the training. And some of the things that are really convenient here about Anyscale: I can get that interactive development experience with VS Code. You know, I can look at the dashboards. I can monitor what's going on. I have a terminal, it feels like my laptop, but it's actually running on a large cluster, with however many GPUs or other resources I want. And so it's really trying to combine the best of having the familiar experience of programming on your laptop with the benefits of being able to take advantage of all the resources in the Cloud to scale. And, you know, you're talking about cost efficiency. One of the biggest reasons that people waste money, one of the silly reasons for wasting money, is just forgetting to turn off your GPUs. And what you can do here is, of course, things will auto-terminate if they're idle. But imagine you go to sleep and I have this big cluster. You can turn it off, shut off the cluster, come back tomorrow, restart the Workspace, and your big cluster is back up and all of your code changes are still there, all of your local file edits. It's like you just closed your laptop and came back and opened it up again. And so this is the kind of experience we want to provide for our users. So that's what I wanted to share with you. >> Well, I think that whole, couple of things: lines of code change, single line of code change, that's game changing. And then the cost thing. I mean, human error is a big deal. People pass out at their computer. They've been coding all night or they just forget about it. I mean, it's just like leaving the lights on or your water running in your house. At the scale that it is, the numbers will add up. That's a huge deal. So I think, you know, back in the old days it was just compute sitting there idle. But now, data cranking through the models, that's a big point. >> Another thing I want to add there about cost efficiency is that we make it really easy, if you're running on Anyscale, to use spot instances, these preemptible instances that can just be significantly cheaper than the on-demand instances. And so we see our customers go from what they were doing before, not using these spot instances 'cause they don't have the infrastructure around it, the fault tolerance to handle the preemption and things like that, to being able to just check a box and use spot instances and save a bunch of money.
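As a rough illustration of the demo being described, a Hugging Face PyTorch model trained across 32 workers, each with one GPU and four CPU cores, a Ray Train job might be set up like the sketch below. The model name and import paths are assumptions (exact import paths vary across Ray versions), but the scaling configuration is the kind of one-line change Robert mentions:

```python
from ray.train import ScalingConfig
from ray.train.torch import TorchTrainer, prepare_model
from transformers import AutoModelForCausalLM

def train_loop_per_worker(config):
    # each worker runs this loop on its own shard of the data
    model = AutoModelForCausalLM.from_pretrained("facebook/opt-1.3b")
    model = prepare_model(model)  # wraps the model for distributed training
    # ... a standard PyTorch (or DeepSpeed-backed) training loop goes here ...

trainer = TorchTrainer(
    train_loop_per_worker,
    scaling_config=ScalingConfig(
        num_workers=32,  # the one-line change: 64, 128, ...
        use_gpu=True,    # set False to run the same job on CPUs
        resources_per_worker={"GPU": 1, "CPU": 4},
    ),
)
result = trainer.fit()
```

The point of this pattern is that the training loop itself stays ordinary single-process code; scale-out lives entirely in the `ScalingConfig`, which is why changing the GPU count or accelerator type is a one-line edit.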
>> You know, this was my whole, my feature article at re:Invent last year when I met with Adam Selipsky: this next gen Cloud is here. I mean, it's not just autoscale, it's infrastructure at scale. It's agility. It's flexibility. I think this is where the world needs to go. Almost what DevOps did for Cloud, and what you were showing me in that demo had this whole SRE vibe. And remember, Google had site reliability engineers to manage all those servers. This is kind of like an SRE vibe for data at scale. I mean, a similar kind of order of magnitude. I mean, I might be a little bit off base there, but how would you explain it? >> It's a nice analogy. I mean, what we are trying to do here is get to the point where developers don't think about infrastructure, where developers only think about their application logic, and where businesses can do AI, can succeed with AI, and build these scalable applications, but they don't have to build, you know, an infrastructure team. They don't have to develop that expertise. They don't have to invest years in building their internal machine learning infrastructure. They can just focus on the Python code, on their application logic, and run the stuff out of the box. >> Awesome. Well, I appreciate the time. Before we wrap up here, give a plug for the company. I know you got a couple of websites. Ray's got its own website. You got Anyscale. You got an event coming up. Give a plug for the company; I know you're looking to hire. Put a plug in for the company. >> Yeah, absolutely. Thank you. So first of all, you know, we think AI is really going to transform every industry, and the opportunity is there, right. We can be the infrastructure that enables all of that to happen, that makes it easy for companies to succeed with AI and get value out of AI. Now, if you're interested in learning more about Ray: Ray has been emerging as the standard way to build scalable applications. Our adoption has been exploding. I mentioned companies like OpenAI using Ray to train their models. But really it's across the board, companies like Netflix and Cruise and Instacart and Lyft and Uber, you know, just among tech companies. It's across every industry. You know, gaming companies, agriculture, farming, robotics, drug discovery, FinTech, we see it across the board. And all of these companies can get value out of AI, can really use AI to improve their businesses. So if you're interested in learning more about Ray and Anyscale, we have our Ray Summit coming up in September. This is going to highlight a lot of the most impressive use cases and stories across the industry. And if your business, if you want to use LLMs, you want to train these LLMs, these large language models, you want to fine-tune them with your data, you want to deploy them, serve them, and build applications and products around them, give us a call, talk to us. You know, we can really take the infrastructure piece off the critical path and make that easy for you. So that's what I would say. And, you know, like you mentioned, we're hiring across the board: engineering, product, go-to-market, and it's an exciting time. >> Robert Nishihara, co-founder and CEO of Anyscale, congratulations on a great company you've built and are continuing to iterate on. You got growth ahead of you, you got a tailwind. I mean, the AI wave is here.
I think OpenAI and ChatGPT, a customer of yours, have really opened up the mainstream visibility into this new generation of applications: user interface, role of data, large scale, how to make that programmable, so we're going to need that infrastructure. So thanks for coming on this season three, episode one of the ongoing series of the hot startups. In this case, this episode is the top startups building foundational model infrastructure for AI and ML. I'm John Furrier, your host. Thanks for watching. (upbeat music)
Robert Nishihara, Anyscale | CUBE Conversation
(upbeat instrumental) >> Hello and welcome to this CUBE conversation. I'm John Furrier, host of theCUBE, here in Palo Alto, California. Got a great conversation with Robert Nishihara, who's the co-founder and CEO of Anyscale. Robert, great to have you on this CUBE conversation. It's great to see you. We did your first Ray Summit a couple years ago, and congratulations on your venture. Great to have you on. >> Thank you. Thanks for inviting me. >> So you're a first-time CEO out of Berkeley, in data. You got Databricks coming out of there. You got a bunch of activity coming from Berkeley. It really is kind of where a lot of the innovation in data is going on, and Anyscale has been one of those startups that has risen out of that scene. Right? You look at the success of what the data lakes are now. Now you've got generative AI. This has been a really interesting innovation market. This new wave is coming. Tell us what's going on with Anyscale right now, as you guys are gearing up and getting some growth. What's happening with the company? >> Yeah, well, one of the most exciting things that's been happening in computing recently is the rise of AI and the excitement about AI, and the potential for AI to really transform every industry. Now, of course, one of the biggest challenges to actually making that happen is that doing AI is incredibly computationally intensive, right? To actually succeed with AI, to actually get value out of AI, you're typically not just running it on your laptop. You're often running it and scaling it across thousands of machines, or hundreds of machines, or GPUs. And so organizations and companies and businesses that do AI often end up building a large infrastructure team to manage the distributed systems, the computing, to actually scale these applications. And that's a huge software engineering lift, right? And so one of the goals for Anyscale is really to make that easy: to get to the point where developers and teams and companies can succeed with AI, can build these scalable AI applications, without a huge investment in infrastructure, without a lot of expertise in infrastructure, where really all they need to know is how to program on their laptop, how to program in Python. And if you have that, then that's really all you need to succeed with AI. So that's what we've been focused on. We're building Ray, which is an open source project that's been starting to get adopted by tons of companies, to actually train these models, to deploy these models, to do inference with these models, you know, to ingest and pre-process their data. And our goals here with the company are really to make Ray successful, to grow the Ray community, and then to build a great product around it and simplify the development and deployment, and productionization of machine learning for all these businesses. >> It's a great trend. Everyone wants developer productivity, we're seeing that clearly right now. And plus, developers are voting literally on what standards become. As you look at how the market is open source driven, I love the model, love the Ray project, love the Anyscale value proposition. How big are you guys now, and how is that value proposition of Ray and Anyscale and foundational models coming together?
Because it seems like you guys are in a perfect storm situation where you could get a real tailwind and draft off the mega trend that everyone's getting excited about. The new toy is ChatGPT. So you got to look at that and say, hey, I mean, come on, you guys did all the heavy lifting. >> Absolutely. >> How many people are you, and what's the proposition for you guys these days? >> You know, our company's about a hundred people, maybe a bit larger than that. Ray's been growing really quickly. You know, companies like OpenAI use Ray to train their models, like ChatGPT. Companies like Uber run all their deep learning, you know, and classical machine learning on top of Ray. Companies like Shopify, Spotify, Netflix, Cruise, Lyft, Instacart, ByteDance. A lot of these companies are investing heavily in Ray for their machine learning infrastructure. And I think it's gotten to the point where, if you're one of these types of businesses and you're looking to revamp your machine learning infrastructure, if you're looking to enable new capabilities, you know, make your teams more productive, speed up the experimentation cycle, make it more performant, run applications that are more scalable, run them faster, run them in a more cost efficient way: all of these types of companies are at least evaluating Ray, and Ray is an increasingly common choice there. Many of the companies that end up not using Ray often end up building their own infrastructure instead. So the growth there has been incredibly exciting. You know, we had our first in-person Ray Summit just back in August, and we're planning the next one for the coming September. And so when you asked about the value proposition, I think there are really two main things when people choose to go with Ray and Anyscale. One reason is about moving faster, right? It's about developer productivity. It's about speeding up the experimentation cycle, easily getting their models in production. You know, we hear many companies say that once they prototype a model, once they develop a model, it's another eight weeks, or 12 weeks, to actually get that model in production. And that's a reason they talk to us. We hear companies say that they've been training their models and doing inference on a single machine, and they've been scaling vertically, like using bigger and bigger machines. But you can only do that for so long, and at some point you need to go beyond a single machine, and that's when they start talking to us. Right? So one of the main value propositions is around moving faster. I think probably the phrase I hear the most is companies saying that they don't want their machine learning people to have to spend all their time configuring infrastructure. All this is about productivity. >> Yeah. >> The other- >> It's the big brains in the company that are being used to do remedial tasks that should be automated, right? I mean, that's- >> Yeah, and I mean, it's hard stuff, right? It's also not these people's area of expertise, or where they're adding the most value. So all of this is around developer productivity, moving faster, getting to market faster. The other big value prop, and the reason people choose Ray and choose Anyscale, is around just providing superior infrastructure. This is really, can we scale more?
You know, can we run it faster, right? Can we run it in a more cost effective way? We hear people saying that they're not getting good GPU utilization with the existing tools they're using, or they can't scale beyond a certain point, or, you know, they don't have a way to efficiently use spot instances to save costs, right? Or their clusters can't autoscale up and down fast enough, right? These are all the kinds of problems where Ray and Anyscale add value. >> You know, you bring up great points. The autoscaling concept: in the early days, it was easy getting more compute. Now it's complicated. They're built into more integrated apps in the cloud. And you mentioned those companies that you're working with, and that's impressive. Those are like the big hardcore, I call them hardcore. They have good technical teams. And as the wave starts to move from these companies that were hyper scaling up all the time, the mainstream are just developers, right? So you need an interface in, so I see the dots connecting with you guys, and I want to get your reaction. Is that how you see it? That you got the alphas out there kind of kicking butt, building their own stuff, alpha developers and infrastructure. But mainstream just wants programmability. They want that heavy lifting taken care of for them. Is that kind of how you guys see it? I mean, take us through that. Because to get crossover, to be democratized, the automation's got to be there. And for developer productivity to be in, it's got to be coding and programmability. >> That's right. Ultimately, for AI to really be successful and, you know, transform every industry in the way we think it has the potential to, it has to be easier to use, right? And there are many dimensions to being easier to use. But an important one is that as a developer, to do AI, you shouldn't have to be an expert in distributed systems. You shouldn't have to be an expert in infrastructure. If you do have to be, that's going to really limit the number of people who can do this, right? And all of the companies we talk to, they don't want to be in the business of building and managing infrastructure. It's not that they can't do it. But it's going to slow them down, right? They want to allocate their time and their energy toward building their product, right? Toward building a better product, getting their product to market faster. And if we can take the infrastructure work off of the critical path for them, that's going to speed them up, it's going to simplify their lives. And I think that is critical for really enabling all of these companies to succeed with AI. >> Talk about the customers you guys are talking to right now, and how that translates over. Because I think you hit a good thread there. Data infrastructure is critical. Managed services are coming online, open source is continuing to grow. You have these people building their own, and then if they abandon it or don't scale it properly, there are kind of consequences. 'Cause it's a system, as you mentioned, it's a distributed system architecture. It's not as easy as standing up a monolithic app these days. So when you guys go to the marketplace and talk to customers, put the customers in buckets. So you got the ones that are kind of leaning in, whose interest is piqued, probably working with you now, open source. And then what's the customer profile look like as you go mainstream?
Are they looking for a managed service, looking for more of a system architecture approach? What's the Anyscale progression? How do you engage with your customers? What are they telling you? >> Yeah, so many of these companies, yes, they're looking for managed infrastructure, 'cause they want to move faster, right? Now, in terms of the profiles of these different customers, there are three main workloads that companies run on Anyscale, run with Ray. It's training related workloads; it's serving and deployment related workloads, like actually deploying your models; and it's batch processing, batch inference related workloads. Like imagine you want to do computer vision on tons and tons of images or videos, or you want to do natural language processing on millions of documents, or audio, or speech, or things like that, right? So I would say there's a pretty large variety of use cases, but most commonly we see tons of people working with computer vision problems and natural language processing problems. And it's across many different industries. We work with companies doing drug discovery, companies doing gaming or e-commerce, right? Companies doing robotics or agriculture. So there's a huge variety of the types of industries that can benefit from AI and can really get a lot of value out of AI. But the problems they all want to solve are the same: how do you make your team move faster, succeed with AI, be more productive, speed up the experimentation, and also how do you do this in a more performant way, in a faster, cheaper, more cost efficient, more scalable way? >> It's almost like the cloud game is coming back to AI and these foundational models, because I was just on a podcast, we recorded our weekly podcast, and I was just riffing with Dave Vellante, my co-host, on this. We're like, hey, in the early days of Amazon, if you wanted to build an app, you had to build a data center. Now you go to the cloud, the cloud's easier, pay a little money, pennies on the dollar, you get your app up and running. Cloud computing is born. With foundation models and generative AI, the old model was hard: heavy lifting, expensive build-out before you get to do anything, and as you mentioned, time. So I got to think that you're pretty much in a good position with this foundational model trend in generative AI, because I just looked at the foundation models map of the ecosystem. You're starting to see layers: you got the tooling, you got the platform, you got the cloud. It's filling out really quickly. So why is Anyscale important to this new trend? How do you talk to people when they ask you, you know, what does ChatGPT mean for Anyscale? And how does the foundational model growth fit into your plan? >> Well, foundational models are hugely important for the industry broadly, because you're going to have these really powerful models that have been trained on tremendous amounts of data and tremendous amounts of compute, and that are useful out of the box, right? That people can start to use, and query, and get value out of, without necessarily training these huge models themselves. Now, Ray and Anyscale fit in, in a number of places. First of all, they're useful for creating these foundation models. Companies like OpenAI, you know, use Ray for this purpose. Companies like Cohere use Ray for these purposes.
You know, IBM. If you look at, there are of course also open source versions like GPT-J, you know, created using Ray. So a lot of these large language models, large foundation models, benefit from training on top of Ray. But of course, for every company training and creating these huge foundation models, you're going to have many more that are fine-tuning these models with their own data, that are deploying and serving these models for their own applications, that are building other application and business logic around these models. And that's where Ray also really shines, because Ray can provide common infrastructure for all of these workloads: the training, the fine-tuning, the serving, the data ingest and pre-processing, right? The hyperparameter tuning, and so on. And so the reason Ray and Anyscale are important here is that, again, foundation models are large, foundation models are compute intensive. Both creating and using these foundation models requires tremendous amounts of compute, and there's a big infrastructure lift to make that happen. So either you are using Ray and Anyscale to do this, or you are building the infrastructure and managing the infrastructure yourself. Which you can do, but it's hard. >> Good luck with that. I always say good luck with that. I mean, I think if you really need to build that hardened foundation, you got to go all the way. And I think this idea of composability is interesting. How is Ray working with OpenAI, for instance? Take us through that. Because I think you're going to see a lot of people talking about, okay, I got trained models, but I'm going to have not one, I'm going to have many. There's a big debate about whether OpenAI is going to be the mother of all LLMs, but really people are also saying there are going to be many more, either purpose-built or specific. The fusion, these things coming together, the blending of data, that seems to be a value proposition. How does Ray help these guys get their models up? Can you take us through what Ray's doing for, say, OpenAI and others, and how do you see the models interacting with each other? >> Yeah, great question. So where OpenAI uses Ray right now is for the training workloads. Training both to create ChatGPT and models like that. There's both a supervised learning component, where you're doing supervised pre-training with example data. There's also a reinforcement learning component, where you are fine-tuning the model and continuing to train it, but based on human feedback, based on input from humans saying that this response to this question is better than this other response to this question, right? And so Ray provides the infrastructure for scaling the training across many, many GPUs, many, many machines, and really running that in an efficient, performant, fault tolerant way, right? And so, you know, this is not the first version of OpenAI's infrastructure, right? They've gone through iterations where they did start with building the infrastructure themselves. They were using tools like MPI. But at some point, given the complexity, given the scale of what they're trying to do, you hit a wall with MPI, and that's going to happen with a lot of other companies in this space. And at that point you don't have many other options other than to use Ray or to build your own infrastructure. >> That's awesome.
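To make the serving piece of that picture concrete, a minimal Ray Serve deployment for a language model might look like the following sketch. The choice of GPT-J (since it comes up in the conversation), the replica count, and the request schema are assumptions for illustration; the @serve.deployment pattern itself is Ray Serve's standard API:

```python
from ray import serve
from transformers import pipeline

@serve.deployment(num_replicas=2, ray_actor_options={"num_gpus": 1})
class Generator:
    def __init__(self):
        # loading GPT-J here is illustrative; any Hugging Face model works,
        # assuming a GPU with enough memory for it
        self.pipe = pipeline("text-generation", model="EleutherAI/gpt-j-6b")

    async def __call__(self, request):
        # expects a JSON body like {"prompt": "..."}
        prompt = (await request.json())["prompt"]
        return self.pipe(prompt, max_new_tokens=64)[0]["generated_text"]

app = Generator.bind()
serve.run(app)  # serves HTTP requests, on port 8000 by default
```

Because the deployment is just a Ray actor under the hood, scaling it is a matter of changing `num_replicas`, which mirrors the common-infrastructure point Robert makes: the same cluster that trains or fine-tunes a model can also serve it.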
And then your vision on this data interaction. Because in the old days, monolithic models were very rigid. You couldn't really interface with them. But we're kind of seeing this future of data fusion, data interaction, data blending at large scale. What's your vision of where this goes? Because if this goes the way people think, you can have this data chemistry kind of thing going on, where people are integrating all kinds of data with each other at large scale. So you need infrastructure, intelligence, reasoning, a lot of code. Is this something that you see? What's your vision in all this? Take us through. >> AI is going to be used everywhere, right? We see this as a technology that's going to be ubiquitous and is going to transform every business. I mean, imagine you make a product, maybe you're making a tool like Photoshop, or whatever the tool is. The way that people are going to use your tool is not by investing hundreds of hours into learning all of the different specific buttons they need to press and workflows they need to go through. They're going to talk to it, right? They're going to ask it to do the thing they want it to do, right? And it's going to do it. And if it doesn't know what's being asked of it, it's going to ask clarifying questions, right? And then you're going to clarify, and you're going to have a conversation. And this is going to make many, many kinds of tools and technology and products easier to use, and lower the barrier to entry. And many companies fit into this category of trying to build products and make them easier to use; this is just one kind of way that AI will be used. But I think it's something that's pretty ubiquitous. >> Yeah. It'll be efficiency up and down the stack, and it will change the productivity equation completely. You just highlighted one: I don't want to fill out forms, just stand up my environment for me, and then start coding away. Okay, well, this is great stuff. Final word for the folks out there watching: obviously, a new kind of skill set for hiring. You guys got engineers. Give a plug for the company, for Anyscale. What are you looking for? What are you guys working on? Take the last minute to put a plug in for the company. >> Yeah, well, if you're interested in AI, and if you think AI is really going to be transformative and really be useful for all these different industries, we are trying to provide the infrastructure to enable that to happen, right? So I think there's the potential here to really solve an important problem, to get to the point where developers don't need to think about infrastructure, don't need to think about distributed systems. All they think about is their application logic and what they want their application to do. And I think if we can achieve that, you know, we can be the foundation or the platform that enables all of these other companies to succeed with AI. So that's where we're going. I think something like this has to happen if AI is going to achieve its potential. We're hiring across the board: you know, great engineers, people on the go-to-market side, product managers, people who want to really make this happen. >> Awesome, well, congratulations. I know you got some good funding behind you. You're in a good spot. I think this is happening.
I think generative AI and foundation models are going to be the next big inflection point, as big as the PC, inter-networking, the internet, and smartphones. This is a whole other application framework, a whole other set of things. So this is the ground floor. Robert, you and your team are right there. Well done. >> Thank you so much. >> All right. Thanks for coming on this CUBE conversation. I'm John Furrier with theCUBE, breaking down a conversation around AI and scaling up in this next major inflection point. This next wave is foundational models, generative AI. And thanks to ChatGPT, the whole world now knows about it. So it really is changing the game, and Anyscale is right there, one of the hot startups that is in good position to ride this next wave. Thanks for watching. (upbeat instrumental)
Robert Nishihara, Anyscale | AWS re:Invent 2022 - Global Startup Program
>> Well, hello everybody. John Walls here, continuing our coverage at AWS re:Invent 22 on theCUBE. We continue our segments here in the Global Startup Program, which of course is sponsored by the AWS Startup Showcase, and with us to talk about Anyscale is the co-founder and CEO of the company, Robert Nishihara. Robert, good to see you. Thanks for joining us. >> Yeah, great. Thank you. >> You bet. Glad to have you aboard. So let's talk about Anyscale, first off, for those at home who might not be familiar with what you do, because you've only been around for a short period of time, you're telling me. >> The company's about three years now. >> Three years old, yeah. So tell us all about it. >> Yeah, absolutely. So one of the biggest things happening in computing right now is the proliferation of AI. AI is spreading throughout every industry and has the potential to transform every industry. But the thing about doing AI is that it's incredibly computationally intensive. So if you want to do AI, you're probably not just doing it on your laptop. You're doing it across many machines, many GPUs, many compute resources, and that's incredibly hard to do. It requires a lot of software engineering expertise, a lot of infrastructure expertise, a lot of cloud computing expertise to build the software infrastructure and distributed systems to really scale AI across the cloud, and to do it in a way where you're really getting value out of AI. So that is the problem statement: AI has tremendous potential, but it's incredibly hard to do because of the scale required. And what we are building at Anyscale is really trying to make that easy. We're trying to get to the point where, as a developer, if you know how to program on your laptop, say in Python, then that's enough. Then you can do AI, you can get value out of it, you can scale it, and you can build the kinds of incredibly powerful AI applications that companies like Google and Facebook and others can build. But you don't have to learn about all of the distributed systems and infrastructure. We'll handle that for you. If we're successful, that's what we're trying to achieve here.
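To make the claim concrete that knowing Python is enough, here is a minimal sketch of Ray's core task API, assuming only that Ray is installed. The toy `score` function and the record strings are illustrative stand-ins, not anything from the interview; `ray.init`, `@ray.remote`, `.remote()`, and `ray.get` are Ray's actual core primitives.

```python
import ray

ray.init()  # starts Ray locally; pass an address to attach to a cluster instead

@ray.remote
def score(record: str) -> int:
    # Stand-in for any CPU-bound work, e.g. featurizing one record.
    return sum(ord(c) for c in record)

# Fan out 1,000 tasks; Ray schedules them across however many cores
# (or machines) are available.
futures = [score.remote(f"record-{i}") for i in range(1000)]
results = ray.get(futures)
print(len(results), "records scored")
```

Nothing in the script names machines or processes, which is the point Robert is making: Ray decides where the thousand tasks actually run.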
>> Yeah. What makes AI so hard to work with? I mean, you talk about the complexity, and there are a lot of moving parts, literally moving parts. But what is it, in your mind, that gets people's eyes spinning a little bit when they look at the great potential, but also at the downside of maybe having to work your way through a quagmire of sorts? >> So the potential is definitely there, but it's important to remember that a lot of AI initiatives fail. Something like 80 or 90% don't make it out of the research or prototyping phase and into production. Some of the things that are hard about AI, and the reasons AI initiatives can fail: one is the scale required. It's one thing to develop something on your laptop; it's another thing to run it across thousands of machines. Another is the transition from development and prototyping to production. Those are very different and have very different requirements, and a lot of times it's different teams within a company, with different tech stacks and different software they're using. We hear companies say that once they prototype and develop a model, it can take six to 12 weeks to get that model into production, and that often involves rewriting a lot of code and handing it off to another team. So the development-to-production transition is a big challenge. And lastly, a big challenge is around flexibility. AI is a fast-moving field; you see new developments, new algorithms, new models coming out all the time. A lot of teams we work with have built infrastructure, or are using products out there to do AI, but they've found that it locks them into rigid workflows or specific tools, and they don't have the flexibility to adopt new algorithms, strategies, or approaches as they come out. And their developers want the flexibility to use the latest tools and strategies. So those are some of the main problems we see. It's really: how do you scale? How do you move easily between development and production? And how do you remain flexible and adopt the best tools as they come out? Those are often the reasons people start to use Ray, which is our open source project, and Anyscale, which is our product.
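A hedged sketch of how Ray narrows the development-to-production handoff he describes: the same application code can target a local Ray runtime or a remote cluster by changing only what `ray.init` points at. The `ENV` variable and the `ray://head-node:10001` address below are hypothetical placeholders for a real deployment, not values from the interview.

```python
import os
import ray

# Identical application code runs in both environments; only the target differs.
# "ray://head-node:10001" is a hypothetical Ray Client address; substitute the
# address of your own cluster's head node.
if os.environ.get("ENV") == "production":
    ray.init(address="ray://head-node:10001")
else:
    ray.init()  # local development on a laptop

@ray.remote
def preprocess(shard_id: int) -> int:
    # Stand-in for real feature engineering on one data shard.
    return shard_id * shard_id

print(ray.get([preprocess.remote(i) for i in range(8)]))
```

Under this model, "getting to production" does not mean rewriting for a different stack; the prototype and the production job are the same Python program.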
>> So tell me about Ray. It's an open source project, and I think you said you worked on it at Berkeley? >> That's right. So before this company, I did a PhD in machine learning at Berkeley. One of the challenges we were running into ourselves, trying to do machine learning (we actually weren't infrastructure or distributed systems people), was that in order to do machine learning, we found ourselves building all sorts of ad hoc tools and systems to scale it, to be able to run it in a reasonable amount of time and to leverage the compute we needed. And it wasn't just us. Machine learning researchers and practitioners all across the field were building their own tooling and infrastructure, and that was one of the things we felt was really holding back progress. So that's how we slowly and gradually got into saying, hey, we could build better tools here. We could try to make this easier to do, so that all of these people don't have to build their own infrastructure and can focus on the actual machine learning applications they're trying to build. And so Ray started as an open source project for basically scaling Python applications and scaling machine learning applications. Initially we were running around Berkeley trying to get all of our friends to try it out, adopt it, and give us feedback, and if it didn't work, we would debug it right away. That gradually turned into more companies adopting it, bigger teams adopting it, and external contributors contributing back to the open source project and making it better. Before you knew it, we were hosting meetups, giving talks, running tutorials, and the project was just taking off. And that's a big part of what we continue to develop today at Anyscale: really fostering this open source community, growing the open source user base, and making sure Ray is just the best way to scale Python applications and machine learning applications. >> So this was a graduate school project on your way to getting your doctorate, and now you're commercializing it. What a journey that was, right? Who would've thought? I guess you probably did think that at some point. >> No, you know, when we were working on Ray, we actually didn't anticipate becoming a company, or at least we just weren't looking that far ahead. We were really excited about solving this problem of making distributed computing easy, getting to the point where developers just don't have to learn about infrastructure and distributed systems but get all the benefits. It wasn't until later on, as we were graduating from Berkeley and wanted to keep taking this project further and really solve this problem, that we realized it made sense to start a company. >> So help me out, and I might have missed this, so I apologize if I did, but in terms of Ray as that building block, essential for your ML or AI work down the road: what is it doing for me, or what will it allow me to do in either one of those realms that I can't do now? >> So, why use Ray versus not using Ray? The answer is that if you're doing AI, you need to scale. If you don't find that to be the case today, you probably will tomorrow, or the day after that. Increasingly it's a requirement, not an option. And if you're scaling, if you're trying to build these scalable applications, you're either going to use Ray, or something like Ray, or you're going to build the infrastructure yourself, and building the infrastructure yourself is a long journey. >> So why take that on, right? >> And many of the companies we work with don't want to be in the business of building and managing infrastructure, because they want their best engineers building their product, right? To get their product to market faster. >> I want you to do that for me. >> Right, exactly. We can really accelerate what these teams can do, and if we can make the infrastructure something they just don't have to think about, that's why you would choose to use Ray. >> Okay. Between AI and ML, are they different animals in terms of what you're trying to get done, or what Ray can do? >> Yeah, and I should say it's not just new teams starting out that are using Ray. Many companies that have already built their own infrastructure will then switch to using Ray. To give you a few examples: Uber runs all their deep learning on Ray. OpenAI, which is really at the frontier of training large models and pushing the boundaries of AI, trains their largest models using Ray. And companies like Shopify rebuilt their entire machine learning platform using Ray. >> But they started somewhere else.
>> They had, right. This is not the v1 of their machine learning infrastructure. They did it a different way before; this is the second version or the third iteration of how they're doing it. And often they realize, and in the case of Uber, just to give you one example, they built a system called Horovod for scaling deep learning across a bunch of GPUs. Now, as you scale deep learning on GPUs, the bottleneck shifted away from training and toward the data ingest and pre-processing, and they wanted to scale data ingest and pre-processing on CPUs. Horovod is a deep learning framework; it doesn't do the data ingest and pre-processing on CPUs. But if you run Horovod on top of Ray, you can scale training on GPUs, and Ray has another library called Ray Data that lets you scale the ingest and pre-processing on CPUs, and you can pipeline the two together. That allowed them to train larger models on more data. To take one example, ETA prediction, the feature where you get in an Uber and it tells you what time you're supposed to arrive, uses a deep learning model called DeepETA. Before, they were able to train on about two weeks' worth of data. Now, using Ray for scaling the data ingest, pre-processing, and training, they can train on much more data and get more accurate ETA predictions. So that's just one example of the kind of benefit they were able to get. Also, because it's running on top of Ray, and Ray has this ecosystem of libraries, they can use Ray's hyperparameter tuning library to do hyperparameter tuning for their deep learning models, and they can use it for inference as well. And because these are all built on top of Ray, they inherit the elasticity and fault tolerance of running on top of Ray. So it really simplifies things on the infrastructure side, because with Ray as common infrastructure for your machine learning workloads, there's just one system to manage and operate. And it simplifies things for the end users, the developers, because from their perspective they're just writing a Python application. They don't have to learn how to use three different distributed systems and stitch them together.
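Here is a hedged sketch of the CPU-ingest, GPU-training split described above, assuming a machine or cluster with at least one GPU available. The in-memory range dataset, the square-root featurization, and the `Trainer` actor are illustrative stand-ins rather than Uber's pipeline; Ray Data's `map_batches` and `iter_batches`, combined with a GPU-scheduled actor, are the real Ray pieces being put together.

```python
import numpy as np
import ray

ray.init()

# Ingest and featurize on CPU workers with Ray Data.
# ray.data.range is a stand-in for e.g. reading Parquet from object storage.
ds = ray.data.range(100_000)
ds = ds.map_batches(
    lambda batch: {"x": np.sqrt(batch["id"].astype(np.float64))},
    batch_format="numpy",  # batches arrive as dicts of NumPy arrays
)

# Training runs in an actor that reserves a GPU; max_restarts asks Ray to
# recreate the actor if its process dies (for example on a reclaimed node).
@ray.remote(num_gpus=1, max_restarts=1)
class Trainer:
    def train_step(self, batch: dict) -> float:
        # Stand-in for one real SGD step on the GPU.
        return float(batch["x"].mean())

trainer = Trainer.remote()
# Iterating batches overlaps CPU pre-processing with GPU training steps.
for batch in ds.iter_batches(batch_size=4096, batch_format="numpy"):
    last = trainer.train_step.remote(batch)
print("last step value:", ray.get(last))
```

One Python program expresses both halves of the pipeline, which is the simplification he is pointing at: there is no separate ingest system to stitch to a separate training system.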
>> So, AWS, before I let you go: how do they come into play here for you? You're part of the Startup Showcase, so obviously a major partner and a major figure in the offering that you're presenting people. >> Yeah. So Anyscale is a managed Ray service; Anyscale is just the best way to run and deploy Ray, and we run on top of AWS. Many of our customers are using Ray through Anyscale on AWS, so we work very closely together, and we have joint customers. A lot of the value that Anyscale adds on top of Ray is around the production story: things like high availability, failure handling, retries, alerting, persistence, and reproducibility. Those are a lot of the value our platform adds on top of the open source project. There's a lot as well around collaboration. Imagine something goes wrong with your application, your production job, and you want to debug it. You can just share the URL with your coworker, they can click a button, reproduce the exact same thing, look at the same logs, and figure out what's going on. And one thing that's important for a lot of our customers is efficiency around cost. >> You support every customer. >> Exactly, and a lot of people are spending a lot of money on AWS, right? So Anyscale supports running out of the box on cheaper spot instances, these preemptible instances, which can reduce costs by quite a bit. So, things like that. >> Well, the company is Anyscale, and they're on the show floor, so if you have a chance while watching this during re:Invent, go down and check them out. Robert Nishihara joining us here, the co-founder and CEO. Robert, thanks for being with us here on theCUBE. Really enjoyed it. >> Me too. Thanks so much. >> Boy, a three-year graduate program and boom, here you are, off to the enterprise you go. Very nicely done. All right, we're going to continue our coverage here on theCUBE with more from Las Vegas. We're at the Venetian for AWS re:Invent 22, and you're watching theCUBE, the leader in high tech coverage.