Luis Ceze, OctoML | Cube Conversation
(gentle music)

>> Hello, everyone. Welcome to this Cube Conversation. I'm John Furrier, host of theCUBE, here in our Palo Alto Studios. We're featuring OctoML, and I'm with the CEO and Co-founder, Luis Ceze. Thanks for joining us today. Luis, great to see you. Last time we spoke was at "re:MARS," Amazon's event, kind of a joint event between (indistinct) and Amazon that put a lot together. Great to see you.

>> Great to see you again, John. I really have good memories of that interview. You know, that was definitely a great time. Great to chat with you again.

>> The world of ML and AI, machine learning and AI, is really hot. Everyone's talking about it, and it's really great to see that advance. So I'm looking forward to this conversation, but before we get started, introduce who you are and OctoML.

>> Sure. I'm Luis Ceze, Co-founder and CEO at OctoML. I'm also a professor of Computer Science at the University of Washington. You know, OctoML grew out of our efforts on the Apache TVM project, which is a compiler and runtime system that enables folks to run machine learning models on a broad set of hardware, in the Edge and in the Cloud, very efficiently. You know, we grew that project and grew that community, and definitely saw there was a real pain point there. And then we built OctoML. OctoML is about three and a half years old now, and the mission of the company is to enable customers to deploy models very efficiently in the Cloud. And make them, you know, run. Do it quickly, run fast, and run at a low cost, which is something that's especially timely right now.

>> I like to point out also for the folks, 'cause they should know, that you're also a professor in the Computer Science department at the University of Washington, a great program there. This is really an inflection point with AI and machine learning. The computer science industry has been waiting for decades to advance AI, and with all this new cloud computing, all the hardware and silicon advancements, GPUs, this is the perfect storm. And you know, in computer science now we're seeing an acceleration. Can you share your view? You're obviously a professor in that department, but also an entrepreneur. This is a great time for computer science. Explain why.

>> Absolutely, yeah. It's the confluence of, you know, advances in what computers can do as devices to compute information, plus, you know, advances in AI that enable applications we thought were highly futuristic, and now they're just right there today. You know, AI that can generate photorealistic images from descriptions, you know, can write text that's pretty good, can help augment, you know, human creativity in a really meaningful way. So you see this confluence of capabilities and the creativity of humankind into new applications, and it's just extremely exciting, both from a researcher point of view as well as an entrepreneur point of view, right.

>> What should people know about these large language models we're seeing with ChatGPT, and how Google has got a lot of work going on in that area? There's been a lot of work recently. What's different now about these models, and why are they so popular and effective now? What's the difference between now and, say, five years ago, that makes it more-

>> Oh, yeah. It's a huge inflection in their capabilities. I always say there's emergent behavior, right?
So as these models got more complex, and our ability to train and deploy them, you know, got to this point, they really crossed a threshold into doing things that are truly surprising, right? In terms of generating, you know, explanations for things, generating text, summarizing text, expanding text. And, you know, exhibiting what to some may look like reasoning. They're not quite reasoning fundamentally; they're generating text that looks like they're reasoning, but they do it so well that it feels like it was done by a human, right. So I would say the biggest change is that, you know, now they can actually do things that are extremely useful for business and people's lives today. And that wasn't the case five years ago. So that's in the model capabilities, and that is being paired with huge advances in computing, which means, you know, you can actually see a line of sight to these being deployed at scale, right. And that's where we come in, by the way, but yeah.

>> Yeah, I want to get into that. And also, you know, the fusion of data, integrating data sets at scale, is another one we're seeing a lot of happening now. It's not just some, you know, siloed, pre-built data modeling. There's a lot of agility and a lot of new integration capabilities for data. How is that impacting the dynamics?

>> Yeah, absolutely. So I'll say that the ability to take the data that exists and train a model to do something useful with it, and more interestingly, I would say, to use baseline foundational models and, with a little bit of data, turn them into something that can do a specialized task really, really well, has created this really fast proliferation of really impactful applications, right?

>> Every company now is looking at this trend, and I'm seeing a lot... I think every company will rebuild their business with machine learning, if they're not already doing it. And the folks that aren't will probably be dinosaurs, will be out of business. This is a real business transformation moment, where machine learning and AI go mainstream, and I think it's just the beginning. This is where you guys come in, and you guys are poised for handling this frenzy to change business with machine learning models. How do you guys help customers as they look at this, you know, transition to get from concept to production with machine learning?

>> Great questions, yeah. So I would say it's fair to say there are a bunch of models out there that can do useful things right out of the box, right? And also, the ability to create models has improved quite a bit. So the challenge now has shifted to customers, you know. Everyone is looking to incorporate AI into their applications. So what we do for them is, first of all, how do you do that quickly, without needing highly specialized, difficult-to-find engineering? And very importantly, how do you do that at a cost that's accessible, right? All of these fantastic models that we just talked about use an amount of computing that's just astronomical compared to anything else we've done in the past. It means the costs that come with it are also very, very high. So it's important to enable customers to, you know, incorporate AI into their applications and their use cases in a way that they can do with the people that they have and at costs that they can afford, such that they can have, you know, the maximum impact they can possibly have. And finally, you know, helping them deal with hardware availability, because, as you know, even though we've made a lot of progress in making computing cheaper and cheaper...
Even to this day, you know, you can never get enough. And getting an allocation, getting the right hardware to run these incredibly hungry models, is hard. So we help customers deal with, you know, hardware availability as well.

>> Yeah, for the folks watching, if you search YouTube, there's an interview we did last year at "re:MARS," I mentioned that earlier, just a great interview. You talked about this hardware independence, and it's getting traction. I want to get into that, because if you look at all the foundation models that are out there right now that are getting traction, you're seeing two trends. You're seeing proprietary and open source. And obviously, open source always wins, in my opinion. But, you know, there's this iPhone moment and Android moment that one of your investors, John Torrey from Madrona, talked about: this iPhone versus Android moment where, you know, one is proprietary hardware, very specialized, high performance, and then there's open source. This is an important distinction, and you guys are hardware independent. Explain what all this means.

>> Yeah. Great set of questions. First of all, you know, OpenAI, of course, created ChatGPT, and they offer an API to run these models that does amazing things. But customers have to be able to go and send their data over to OpenAI, right? So, run the model there and get the outputs. Now, there are open source models that can do amazing things as well, right? And typically the open source models don't lag behind, you know, these proprietary closed models by more than, say, you know, six months or so. And it means that enabling customers to take the models that they want and deploy them under their control is something that's very valuable. Because one, you don't have to expose your data externally. Two, you can customize the model even more to the things that you want it to do. And then three, you can run on infrastructure that can be much more cost-effective than having to, you know, pay somebody else's, you know, cost and markup, right? So where we help is essentially to enable customers to take machine learning models, say an open source model, and automate the process of putting them into production, optimize them to run with the right performance, and more importantly, give them the independence to run where they need to run, where they can run best, right?
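To make that compile-and-package step concrete, here is a minimal sketch using the open source Apache TVM stack that OctoML grew out of. The model file name, input name and shape, and target string are illustrative assumptions, not details from the conversation:

```python
# A sketch of the "model to deployable artifact" step with Apache TVM.
# The file name, input shape, and target string are assumed for illustration.
import onnx
import tvm
from tvm import relay

# Load a trained model that was exported to ONNX.
onnx_model = onnx.load("model.onnx")

# Import it into TVM's Relay intermediate representation.
mod, params = relay.frontend.from_onnx(onnx_model, shape={"input": (1, 3, 224, 224)})

# Compile with optimizations for a specific hardware target. Swapping the
# target string (e.g. "cuda" for a GPU) rebuilds the same model for
# different hardware, which is the hardware independence discussed above.
with tvm.transform.PassContext(opt_level=3):
    lib = relay.build(mod, target="llvm -mcpu=skylake-avx512", params=params)

# Export a self-contained shared library, the artifact you ship in a container.
lib.export_library("model_deploy.so")
```

The key design point is that the model itself never changes; only the compiled artifact is retargeted, so the same validated model can run wherever it runs best.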
>> Yeah, and also, you know, I point out all the time that there's never any stopping the innovation of hardware and silicon, and you're seeing more of it coming into cloud computing. So, you know, being hardware independent has some advantages. And if you look at OpenAI, for instance, you mentioned ChatGPT, I think this is interesting, because everyone is scratching their head going, "Okay, I need to move to this new generation." What's your pro tip and advice for folks, or businesses, that want to move to machine learning? How do they get started? What are some of the considerations they need to think about to deploy these models into production?

>> Yeah, great set of questions. First of all, I mean, I'm sure they're very aware of the kinds of things they want to do with AI, right? It could be interacting with customers, you know, automating interactions with customers. It could be, you know, finding issues in production lines. It could be, you know, making it easier to produce content, and so on. Like, you know, customers, users, will have an idea of what they want to do. From that, you can actually determine what kind of machine learning models would solve the problem, that would, you know, fit that use case. But then, that's when the hard thing begins, right? When you find a model, identify the model that can do the thing you want it to do, you need to turn that into a thing you can deploy. So how do you go from a machine learning model that does the thing you need, to a container with the right executor, the artifact you can actually go and deploy, right? We've seen customers doing that on their own, and it's quite a bit of work. That's why we're excited about the automation we can offer to turn that into a turnkey process.

>> Luis, talk about the use cases, if you don't mind going and doubling down on the previous answer. You've got existing services, and then there are new AI applications, AI-native applications. What are the use cases with existing stuff, and the new applications that are being built?

>> Yeah, I mean, existing stuff is, for example, how do you do very smart search and auto-completion, you know, when you're editing documents? Very, very smart search of documents, summarization of text, expanding bullets into prose in a way that, you know, you don't have to spend as much human time. Those are some of the existing applications, right? Some of the new ones are truly AI-native ways of producing content. Like there's a company that, you know, we share investors with and love what they're doing, called RunwayML, for example. It's sort of like an AI-first way of editing and creating visual content, right? Say you have a video: you could say, make this video look like it's night as opposed to day, or remove that dog in the corner. You can do that in a way you couldn't do otherwise. So there are definitely AI-native use cases. And then in life sciences, you know, there are quite a few advances in AI-based, you know, therapies and diagnostic processes that are designed using automated processes. And this is something where I feel like we're just scratching the surface. There are huge opportunities there, right?

>> Talk about inference, the AI-in-production angle here, because cost is a huge concern when you look at it, and there's the hardware and that flexibility there. So I can see how that could help, but is there a cost freight train that can get out of control here if you don't deploy properly? Talk about the scale problem around cost in AI.

>> Yeah, absolutely. So, you know, very quickly, one thing people tend to think about is the cost of training. It has really high dollar amounts, so it tends to get over-indexed on. But what you have to think about is that for every model that's actually useful, you're going to train it once and then run it a large number of times in inference. That means that over the lifetime of a model, the vast majority of the compute cycles and the cost are going to go to inference. And that's what we address, right? To give you some idea, if you're talking about using a large language model today, you know, it's going to cost a couple of cents per, you know, 2,000 words of output. If you have a million users active a day, you know, if you're lucky and you have that, this cost can actually balloon very quickly to millions of dollars a month, just in inferencing costs. And that's assuming, you know, that you actually have access to the infrastructure to run it, right?
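As a back-of-the-envelope version of that arithmetic, with the per-response price and usage figures assumed purely for illustration:

```python
# Rough sketch of the inference cost math described above.
# All figures are illustrative assumptions, not quoted prices.
cost_per_response = 0.02        # dollars per ~2,000-word output (assumed)
daily_active_users = 1_000_000  # "a million users active a day"
responses_per_user = 5          # assumed: a handful of generations per user per day

daily_cost = cost_per_response * daily_active_users * responses_per_user
monthly_cost = daily_cost * 30

print(f"Daily inference cost:   ${daily_cost:,.0f}")    # $100,000
print(f"Monthly inference cost: ${monthly_cost:,.0f}")  # $3,000,000
```

Even at a couple of cents per response, heavy usage puts the monthly bill in the millions, which is why inference, not training, dominates lifetime cost.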
So it means that if you don't pay attention to these inference costs, that's definitely going to be a surprise. And it affects the economics of the product this is embedded in, right? There's quite a bit of attention right now on how you do search with large language models, and if you don't pay attention to the economics, you know, you can have a surprise and have to change the business model there.

>> Yeah, I think that's important to call out, because you don't want it to be a runaway cost structure where you architected it wrong, and then the next thing you know, you've got to unwind it. I mean, it's more than technical debt, it's actually real debt, it's real money. So talk about some of the dynamics with the customers. How are they architecting this? How do they get ahead of that problem? What do you guys do specifically to solve it?

>> Yeah, well, we help customers, first of all, be hyper-aware, you know, understanding what the cost is going to be for them to deploy the models into production, and showing them the possibilities of deploying the model with different cost structures, right? That's where, you know, the ability to have hardware independence is so important, because once you have hardware independence, after you optimize models, you have a new, you know, dimension of freedom to choose what is the right throughput per dollar for you, and where, and what the options are. And once you make that decision, you want to automate the process of putting it into production. So the way we help customers is by showing very clearly, in their use case, you know, how they can deploy their models in a much more cost-effective way. There's a case study we put out recently showing a 4x reduction in deployment costs, right? And this is by doing a mix of optimization and choosing the right hardware.

>> How do you address the concern that someone might say, "Hey Luis, you know, I don't want to degrade performance and latency, and I don't want the user experience to suffer." What's the answer there?

>> Two things. First of all, all of the manipulations we do to the model turn it into efficient code without changing its behavior. We wouldn't degrade the experience of the user by having the model be wrong more often. We don't change that at all; the model behaves the way it was validated for. And the second thing is, you know, user experience with respect to latency is all about a maximum. Like, you could say, I want a model to run in 50 milliseconds or less. If it's much faster than that, you're not going to notice the difference. But if it's slower, you're going to notice a difference. So the key here is, how do you find a set of options to deploy such that you're not overshooting performance in a way that's going to lead to costs with no additional benefit? And this provides a very significant margin, a set of choices you can optimize for cost without degrading the customer experience, right. The end user experience.
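A toy sketch of that "don't overshoot the latency budget" selection logic; every hardware name, latency, and price below is invented for illustration, and real numbers would come from benchmarking the actual model:

```python
# Among candidate deployment options, pick the cheapest one that still
# meets the latency target. All figures here are made up for illustration.
from dataclasses import dataclass

@dataclass
class DeployOption:
    name: str
    p95_latency_ms: float   # measured tail latency for this model (assumed)
    cost_per_hour: float    # instance price in dollars (assumed)

candidates = [
    DeployOption("big-gpu",    p95_latency_ms=12.0, cost_per_hour=3.00),
    DeployOption("small-gpu",  p95_latency_ms=38.0, cost_per_hour=0.90),
    DeployOption("cpu-avx512", p95_latency_ms=47.0, cost_per_hour=0.35),
    DeployOption("cpu-basic",  p95_latency_ms=95.0, cost_per_hour=0.20),
]

LATENCY_BUDGET_MS = 50.0  # e.g. "run in 50 milliseconds or less"

# Anything under the budget is indistinguishable to the end user,
# so among the feasible options only cost matters.
feasible = [c for c in candidates if c.p95_latency_ms <= LATENCY_BUDGET_MS]
best = min(feasible, key=lambda c: c.cost_per_hour)
print(f"Cheapest option meeting {LATENCY_BUDGET_MS} ms: {best.name} "
      f"(${best.cost_per_hour}/hr, {best.p95_latency_ms} ms p95)")
```

Here the fastest option is nearly 10x the price of the slowest feasible one, yet buys no perceptible benefit, which is the margin for cost optimization being described.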
>> Yeah, and I'd also point out that the large language models, like the ChatGPTs of the world that are coming out, are over 10X more computationally intensive; Dave Moth and I were talking about this on a Breaking Analysis. So this hardware independence is a huge thing. And also supply chain: some people can't get servers, or hardware, these days.

>> Or even more interestingly, right? They don't grow on trees, John. GPUs are not the kind of thing where you plant an orchard and then you have a bunch. No, these things, you know, take a while, and you can't increase supply overnight. So being able to live with the cycles that are available to you is not just important for cost, but also important for being able to scale and serve more users at, you know, whatever pace they come, right?

>> You know, it's really great to talk to you, and congratulations on OctoML. Looking forward to the startup showcase; we'll be featuring you guys there. But I want to get your personal opinion, as someone in the industry and also someone who's been in the computer science area for your career. You know, computer science has always been great, and there are more people enrolling in computer science, more diversity than ever before, but there are also more computer-science-related fields. How is this opening up computer science, and where is AI going with the computers, with the science? Can you share your vision on, you know, the aperture, or the landscape, of CompSci, or CS students, and opportunities?

>> Yeah, absolutely. I think it's fair to say that computing has been embedded in pretty much every aspect of human life these days, right? For everything. And AI has been a counterpart; it's been an integral component of computer science for a while. And these advances that happened in the last 10, 15 years in AI have shown, you know, new applications, and I think re-energized how people see what computers can do. You know, there's this picture in our department, called the flower picture, that shows computer science at the center, and then all the different petals: life sciences, social sciences, and then, you know, mechanical engineering, all these other things. And I feel like you can put AI at that center as well; you see AI, you know, touching all these applications. AI in healthcare and diagnostics. AI in discovery in the sciences, right? But then also AI doing things that, you know, humans won't have to do anymore, so they can do better things with their brains, right? It's permeating every single aspect of human life, from intellectual endeavor to day-to-day work, right?

>> Yeah. And I think ChatGPT and OpenAI have really kind of created a mainstream view that everyone sees value in it. You could be in the data center, you could be in bio, you could be in healthcare. I mean, every industry sees value. So this brings up what I call the horizontally scalable use case. And this opens up the conversation: what's going to change from this? Because if you go horizontally scalable, which is a cloud concept as you know, that's going to create a lot of opportunities and some shifting of how you think about architecture around data, for instance. What's your opinion on what this will do to change the inflection of the role of architecting platforms, and the role of data specifically?

>> Yeah, good question. There's a lot in there. By the way, I should have added to the previous question that you can use AI to do better AI as well, which is what we do, and other folks are doing as well.
And so the point I wanted to make here is that it's pretty clear you have a cloud-focused component with an edge-focused counterpart. Like, you have AI models both in the Cloud and in the Edge, right? So the ability to run your AI model where it runs best also has a data advantage to it, say, from a privacy point of view. You could say, "Hey, I want to run something, you know, strictly locally, such that I don't expose the data to an infrastructure." And you know that the data never leaves you, right? It never leaves the device. Now, you can imagine things that are already starting to happen, like doing some forms of training and model customization in the model architecture itself and the system architecture, such that you do this as close to the user as possible. And there's something called federated learning, which has been around for some time and is finally happening: how do you get data from a bunch of places, do, you know, some common learning, and then send a model out to the Edges, where it gets refined for final use, in a way that you get the advantage of aggregating data, but you don't get the disadvantage of privacy issues and so on.

>> It's super exciting.

>> And some of the considerations, yeah.

>> It's a super exciting area around data infrastructure, data science, computer science. Luis, congratulations on your success at OctoML. You're in the middle of it. And the best thing about it is businesses are looking at this and really reinventing themselves, and if a business isn't thinking about restructuring their business around AI, they probably will be out of business. So this is a great time to be in the field. Thank you for sharing your insights here on theCUBE.

>> Great. Thank you very much, John. Always a pleasure talking to you. Always have a lot of fun. And we both speak really fast, I can tell, you know, so. (both laughing)

>> I know. We'll have the transcript available; we'll integrate it into our CubeGPT model that we have, Luis.

>> That's right.

>> Great.

>> Great to talk to you, thank you, John. Thanks, man, bye.

>> Hey, this is theCUBE. I'm John Furrier, here in Palo Alto. Cube Conversation. Thanks for watching. (gentle music)