Luis Ceze & Anna Connolly, OctoML | AWS Startup Showcase S3 E1
(soft music) >> Hello, everyone. Welcome to theCUBE's presentation of the AWS Startup Showcase. AI and Machine Learning: Top Startups Building Foundational Model Infrastructure. This is season 3, episode 1 of the ongoing series covering the exciting stuff from the AWS ecosystem, talking about machine learning and AI. I'm your host, John Furrier, and today we are excited to be joined by Luis Ceze, who's the CEO of OctoML, and Anna Connolly, VP of Customer Success and Experience at OctoML. Great to have you on again, Luis. Anna, thanks for coming on. Appreciate it. >> Thank you, John. It's great to be here. >> Thanks for having us. >> I love the company. We had a CUBE conversation about this. You guys are really addressing how to run foundational models faster for less. And this is like the key theme. But before we get into it, this is a hot trend, but let's explain what you guys do. Can you set the narrative of what the company's about, why it was founded, what's your North Star and your mission? >> Yeah, so John, our mission is to make AI sustainable and accessible for everyone. And what we offer customers is, you know, a way of taking their models into production in the most efficient way possible, by automating the process of getting a model and optimizing it for a variety of hardware, and making it cost-effective. So better, faster, cheaper model deployment. >> You know, the big trend here is AI. Everyone's seeing ChatGPT, kind of the shot heard around the world. The Bing AI fiasco and the ongoing experimentation. People are into it, and I think the business impact is clear. In all of my career in the technology industry, I haven't seen this kind of inflection point. And every senior leader I talk to is rethinking how to rebuild their business with AI, because now the large language models have come in, these foundational models are here, and they can see value in their data. This has been a 10-year journey in the big data world. Now it's impacting that, and everyone's rebuilding their company around this idea of being AI first, 'cause they see ways to eliminate things and make things more efficient. And so now they're telling 'em to go do it. And they're like, what do we do? So what do you guys think? Can you explain what is this wave of AI and why is it happening, why now, and what should people pay attention to? What does it mean to them? >> Yeah, I mean, it's pretty clear by now that AI can do amazing things that capture people's imaginations. And it can also now show things that are really impactful in businesses, right? So what people have the opportunity to do today is to either train their own model that adds value to their business, or find open models out there that can do very valuable things for them. So the next step really is, how do you take that model and put it into production in a cost-effective way so that the business can actually get value out of it, right? >> Anna, what's your take? Because customers are there, you're there to make 'em successful, you got the new secret weapon for their business. >> Yeah, I think we just see a lot of companies struggle to get from a trained model into a model that is deployed in a cost-effective way that actually makes sense for the application they're building. I think that's a huge challenge we see today, kind of across the board, across all of our customers. >> Well, I see this, everyone's asking the same question. I have data, I want to get value out of it. I got to get these big models, I got to train them. What's it going to cost?
So I think there's a reality of, okay, I got to do it. Then no one has any visibility into what it costs, and when they get into it, it's going to break the bank. So I have to ask you guys, the cost of training these models is on everyone's mind. OctoML, your company, focuses on the cost side of it as well as the efficiency side of running these models in production. Why are the production costs such a concern, and where specifically are people looking at it, and why did it get here? >> Yeah, so training costs get a lot of attention because it's normally a large number, but we shouldn't forget that it's a large, typically one-time, upfront cost that customers pay. But, you know, when the model is put into production, the cost grows directly with model usage, and you actually want your model to be used, because it's adding value, right? So, you know, the question that a customer faces is, you know, they have a trained model, and now what? How much would it cost to run in production, right? And now with the big wave in generative AI, which rightfully is getting a lot of attention because of the amazing things that it can do, it's important for us to keep in mind that generative AI models like ChatGPT are huge, expensive energy hogs. They cost a lot to run, right? And given that model cost grows directly with usage, what you want to do is make sure that once you put a model into production, you have the best cost structure possible, so that you're not surprised when it gets popular, right? So let me give you an example. Say you have a model that costs, say, $1 to $2 million to train, but then it costs about one to two cents per session to use, right? If you have a million active users, even if they use it just once a day, it's $10,000 to $20,000 a day to operate that model in production. And that very, very quickly, you know, gets beyond what you paid to train it. >> Anna, these aren't small numbers, and it's cost to train and cost to operate. It kind of reminds me of when the cloud came around and the data center versus cloud options. Like, wait a minute, one, it costs a ton of cash to deploy, and then there's running it. This is kind of a similar dynamic. What are you seeing? >> Yeah, absolutely. I think we are going to see, increasingly, the costs in production outpacing the costs in training by a lot. I mean, people talk about training costs now because that's what they're confronting, because people have been so focused on getting models performant enough to even use in an application. And now that we have them and they're that capable, we're really going to start to see production costs go up a lot. >> Yeah, Luis, if you don't mind, I know this might be a little bit of a tangent, but, you know, training's super important. I get that. That's what people are doing now, but then there's the deployment side of production. Where do people get caught up and miss the boat or misconfigure? What's the gotcha? Where's the trip wire, so to speak? Where do people mess up on the cost side? What do they do? Is it that they don't think about it, or they tie it to proprietary hardware? What's the issue? >> Yeah, several things, right? So without getting really technical, which, you know, I might get into, you know, you have to understand the relationship between performance, you know, both in terms of latency and throughput, and cost, right? So reducing latency is important because you improve the responsiveness of the model.
But it's really important to keep in mind that it often leads to diminishing returns. Below a certain latency, making it faster won't make a measurable difference in experience, but it's going to cost a lot more. So understanding that is important. Now, if you care more about throughput, which is, you know, units processed per period of time, you care about time to solution, and you should think about throughput per dollar. And understand that what you want is the highest throughput per dollar, which may come at the cost of higher latency, which you're not going to care about, right? And the reality here, John, is that, you know, humans, and especially folks in this space, want to have the latest and greatest hardware. And often they commit a lot of money to get access to it, and they have to commit upfront, before they understand the needs that their models have, right? So common mistakes here: one is not spending time to understand what you really need, and then two, over-committing and using more hardware than you actually need, and not giving yourself enough freedom to get your workload to move around to the more cost-effective choice, right? So it's really a matter of choice. And then another thing that's important here too is that making a model run faster on the hardware directly translates to lower cost, right? But it takes a lot of engineering; you need to think of ways of producing very efficient versions of your model for the target hardware that you're going to use. >> Anna, what's the customer angle here? Because price performance has been around for a long time, people get that, but now latency and throughput, that's key, because we're starting to see this in apps. I mean, there's an end user piece. I'm even seeing it on the infrastructure side, where they're taking heavy lifting away from operational costs. So you got, you know, application-specific needs for the user and/or top of the stack, and then you got it actually being used in operations, where they want both. >> Yeah, absolutely. Maybe I can illustrate this with a quick story about a customer that we had recently been working with. So this customer was planning to run kind of a transformer-based model for text generation at super high scale on Nvidia T4 GPUs, so kind of a commodity GPU. And the scale was so high that they would've been paying hundreds of thousands of dollars in cloud costs per year just to serve this model alone. You know, one of many models in their application stack. So we worked with this team to optimize their model and then benchmark across several possible targets, so that's matching to the hardware that Luis was just talking about, including the newer kind of Nvidia A10 GPUs. And what they found during this process was pretty interesting. First, the team was able to shave a quarter off their spend just by using better optimization techniques on the T4, the older hardware. But actually moving to a newer GPU would allow them to serve this model at sub-two-millisecond latency, so super fast, which was able to unlock an entirely new kind of user experience. So they were able to kind of change the value they're delivering in their application just because they were able to move to this new hardware easily. So they ultimately decided to plan their deployment on the more expensive A10 because of this, but because of the hardware-specific optimizations that we helped them with, they managed to even, you know, bring costs down from what they had originally planned.
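[Editor's note: a back-of-envelope sketch, in Python, of the kind of throughput-per-dollar comparison described above. The instance prices, throughputs, and latencies below are invented placeholders for illustration, not benchmark results from OctoML, Nvidia, or AWS; real numbers come from benchmarking your own model on each target.]

```python
# "Look before you leap": compare cost per million requests across hardware
# options, then keep only the options that meet the latency SLA.
# All figures here are hypothetical placeholders, not measurements.

SLA_MS = 10.0  # example latency budget per request, in milliseconds

# (name, dollars per instance-hour, sustained requests/sec, p95 latency ms)
options = [
    ("T4, unoptimized", 0.53,  90, 12.0),
    ("T4, optimized",   0.53, 120,  9.0),
    ("A10, optimized",  1.00, 400,  1.8),
]

for name, dollars_per_hr, rps, p95_ms in options:
    cost_per_million = dollars_per_hr / (rps * 3600) * 1_000_000
    meets_sla = p95_ms <= SLA_MS
    print(f"{name:16s} ${cost_per_million:6.2f} per 1M requests, "
          f"p95 {p95_ms:4.1f} ms, meets SLA: {meets_sla}")
```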
And so if you extend this kind of example to everything that's happening with generative AI, I think the story we just talked about is super relevant, but the scale can be even higher, you know, it can be tenfold that. We were recently conducting kind of an internal study using GPT-J as a proxy to illustrate the experience of a company trying to use one of these large language models, with an example scenario of creating a chatbot to help job seekers prepare for interviews. So imagine kind of a conservative usage scenario where the model generates just 3,000 words per user per day, which is, you know, pretty conservative for how people are interacting with these models. That costs 5 cents a session. And if you're a company and your app goes viral, so at the beginning of the year there's nobody and at the end of the year there's a million daily active users, going from zero to a million in that year alone, you'll be spending about $6 million a year, which is pretty unmanageable. That's crazy, right? >> Yeah. >> For a company or a product that's just launching. So I think, you know, for us, we see the real way to make these kinds of advancements accessible and sustainable, as we said, is to bring down the cost to serve using these techniques. >> That's a great story, and I think that illustrates this idea that deployment cost can vary from situation to situation, from model to model, and that the efficiency is so strong with this new wave: it eliminates heavy lifting, creates more efficiency, automates intellect. I mean, this is the trend, this is radical, this is going to increase. So the cost could go from nominal to millions, literally, potentially. So, this is what customers are doing. Yeah, that's a great story. What makes sense financially? Is there a cost of ownership? Is there a pattern for best practice for training? What do you guys advise? 'Cause there's a lot of time and money involved in all the potential, you know, good scenarios of upside. But you can get over your skis, as they say, and be successful yet be out of business if you don't manage it. I mean, that's what people are talking about, right? >> Yeah, absolutely. I think, you know, we see kind of three main vectors to reduce cost. One is make your deployment process easier overall, so that your engineering effort to even get your app running goes down. Two would be get more from the compute you're already paying for; you're already paying, you know, for your instances in the cloud, but can you do more with that? And then three would be shop around for lower-cost hardware to match your use case. So on the first one, making the deployment easier overall: there's a lot of manual work that goes into benchmarking, optimizing, and packaging models for deployment. And because the performance of machine learning models can be really hardware-dependent, you have to go through this process for each target you want to consider running your model on. And this is hard, you know, we see that every day. But for teams who want to incorporate some of these large language models into their applications, doing this themselves might be desirable, because licensing a model from a large vendor like OpenAI can leave you, you know, over-provisioned, kind of paying for capabilities you don't need in your application, or it can lock you into them and you lose flexibility. So we have a customer whose team actually prepares models for deployment in a SaaS application that many of us use every day.
And they told us recently that without kind of an automated benchmarking and experimentation platform, they were spending several days each time to benchmark a single model on a single hardware type. So this is really, you know, manually intensive. Then on getting more from the compute you're already paying for: we do see customers who leave money on the table by running models that haven't been optimized specifically for the hardware target they're using, like Luis was mentioning. And some teams just don't have the time to go through an optimization process, and others might lack kind of specialized expertise, and this is something we can bring. And then on shopping around for different hardware types, we really see a huge variation in model performance across hardware, not just CPU vs. GPU, which is, you know, what people normally think of, but across CPU vendors themselves, high-memory instances, and even across cloud providers. So the best strategy here is for teams to really be able to, as we say, look before you leap, by running real-world benchmarking, and not just simulations or predictions, to find the best software-hardware combination for their workload. >> Yeah. You guys sound like you have a very impressive customer base deploying large language models. Where would you categorize your current customer base? And as you look out, as you guys are growing, you have new customers coming in; take me through the progression. Take me through the profile of some of your customers you have now, size, are they hyperscalers, are they big app folks, are they kicking the tires? And then as people are out there scratching their heads, I got to get in this game, what's their psychology like? Are they coming in with specific problems, or do they have a specific orientation or point of view about what they want to do? Can you share some data around what you're seeing? >> Yeah, I think, you know, we have customers that kind of range across the spectrum of sophistication, from teams that basically don't have MLOps expertise in their company at all, and so they're really looking for us to kind of give a full service: how should I do everything, from, you know, optimization, to finding the hardware, to preparing for deployment. And then we have teams that, you know, maybe already have their serving and hosting infrastructure up and ready, and they already have models in production, and they're really just looking to, you know, take the extra juice out of the hardware and focus really specifically on that optimization piece. I think one place where we're doing a lot more work now is kind of in the developer tooling, you know, model selection space. And that's kind of an area that we're creating more tools for, particularly within the PyTorch ecosystem, to bring kind of this power earlier in the development cycle, so that as people are grabbing a model off the shelf, they can, you know, see how it might perform and use that to inform their development process. >> Luis, I like this idea of picking the models, because isn't that like going to the market and picking the best model for your data? Isn't there a certain approach to it? What's your view on this? 'Cause I think it's going to be a land rush for this, and I want to get your thoughts. >> For sure, yeah.
So, you know, I guess I'll start with saying the one main takeaway that we got from the GPT-J study is that, you know, having a good understanding of what your model's compute and memory requirements are, very quickly, early on, helps with much smarter AI model deployments, right? And in fact, you know, Anna just touched on this, but I want to, you know, make sure that it's clear that OctoML is putting that power into users' hands right now. So in partnership with AWS, we are launching a new PyTorch-native profiler that, with a single, you know, one-line code decorator, allows you to see how your code runs on a variety of different hardware after acceleration. So it gives you very clear, you know, data on how you should think about your model deployments. And this ties back to choices of models. So if you have a set of model choices that are equally good in terms of functionality, and you want to understand, after acceleration, how you're going to deploy them and how much they're going to cost, or what the options are, using an automated process to make that decision is really, really useful. And in fact, folks at this event can get early access to this by signing up for the Octopod, you know, this is an exclusive group for insiders here, so you can go to OctoML.ai/pods to sign up. >> So that Octopod, is that a program? What is that, is that access to code? Is that a beta, what is that? Explain, take a minute and explain Octopod. >> I think the Octopod would be a group of people who are interested in experiencing this functionality. So it is the friends and users of OctoML that would be the Octopod. And then yes, after you sign up, we would provide you essentially the tool in code form for you to try out on your own. I mean, part of the benefit of this is that it happens in your own local environment, and you're in control of everything, kind of within the workflow that developers are already using to create and begin putting these models into their applications. So it would all be within your control. >> Got it. I think the big question I have for you is, when does one of your customers know they need to call you? What does their environment look like? What are they struggling with? What are the conversations they might be having on their side of the fence? If anyone's watching this, they're like, "Hey, you know what, I've got my team, we have a lot of data. Do we have our own language model, or do I use someone else's?" There's a lot of, I will say, discovery going on around what to do, what path to take. What does that customer look like? If someone's listening, when do they know to call you guys, OctoML? >> Well, I mean, the most obvious one is that you have a significant spend on AI/ML. Come and talk to us, you know, about putting AI/ML into production. So that's the clear one. In fact, just this morning I was talking to someone who is in the life sciences space and has, you know, $15 to $20 million a year in cloud costs related to AI/ML deployment. That's a pretty clear match right there, right? So that's on the cost side. But I also want to emphasize something that Anna said earlier, that, you know, the hardware and software complexity involved in putting a model into production is really high. So we've been able to abstract that away, offering a clean automation flow that enables, one, experimenting early on with, you know, how models would run, and getting them to production.
And then two, once they are in production, it gives you an automated flow for continuously updating your model and taking advantage of all this acceleration and the ability to run the model on the right hardware. So anyway, let's say one is cost, you know, you have significant cost, and then two, you have automation needs. And Anna, please complement that. >> Yeah, Anna, you can please- >> Yeah, I think that's exactly right. Maybe the other time is when you are expecting a big scale-up in serving your application, right? You're launching a new feature, you expect to get a lot of usage, and you want to kind of anticipate that maybe your CTO, your CIO, whoever pays your cloud bills, is going to come after you, right? And so they want to know, you know, what's the return on putting this model into my application stack? Is the usage going to match what I'm paying for it? And then you can understand that. >> So you guys have a lot of the early adopters. They've got big data teams, they're pushing into production, they want to get a little QA, test the waters, understand, use your technology to figure it out. Are there any cases where people have gone into production and had to pull it out? It's like the old lemon laws with your car: you buy a car and, oh my god, it's not the way I wanted it. I mean, I can imagine the early people through the wall, so to speak, in the wave here are going to be bloody, in the sense that they've gone in and tried stuff and got stuck with huge bills. Are you seeing that? Are people pulling stuff out of production and redeploying? Or I can imagine that if I had a bad deployment, I'd want to refactor that or actually replatform that. Do you see that too? >> Definitely. After a sticker shock, yes, customers will come and make sure that, you know, the sticker shock won't happen again. >> Yeah. >> But then there's another aspect here that I think we lightly touched on that's worth elaborating a bit more, which is just how you are going to scale in a way that's feasible depending on the allocation that you get, right? So as we mentioned several times here, you know, model deployment is so hardware-dependent and so complex that you tend to get a model working for a hardware choice, and then you want to scale that specific type of instance. But what if, when you want to scale because it suddenly, luckily, got popular, you don't have that instance available anymore? How do you live with whatever you have at that moment? That's something that we see customers needing as well. You know, so in fact, ideally what we want is for customers to not think about what kind of specific instances they want. What they want is to know what their models need. Say they know the SLA, and then they find a set of hardware targets and instances that hit the SLA. Then when they're scaling, they're going to scale with more freedom, right? Instead of having to wait for AWS to give them more allocation for a specific instance, what if you could live with other types of hardware and scale up in a more free way, right? So that's another thing that we see with customers, you know, they need more freedom to be able to scale with whatever is available. >> Anna, you touched on this with the business model impact of that $6 million cost. If that goes out of control, there's a business model aspect, and there's a technical operations aspect to the cost side too. You want to be mindful of riding the wave in a good way, but not getting over your skis.
So that brings up the point around, you know, confidence, right? And teamwork. Because if you're in production, there's probably a team behind it. Talk about the team aspect of your customers. I mean, they're dedicated, they go put stuff into production, they're developers, they're data people. What's in it for them? Are they getting better? Are they at the beach, you know, reading a book? Is it, you know, easy street for them? What's the customer benefit to the teams? >> Yeah, absolutely. With just a few clicks of a button, you're in production, right? That's the dream. So yeah, I mean, I think that, you know, we illustrated it before a little bit. With the automated kind of benchmarking and optimization process, when you think about the effort it takes to get that data by hand, which is what people are doing today, they just don't do it. So they're making decisions without the best information, because, you know, there just isn't the bandwidth to get the information that they need to make the best decision and then know exactly how to deploy it. So I think it's actually bringing kind of a new insight and capability to these teams that they didn't have before. And then maybe another aspect on the team side is that it's making the hand-off of the models from the data science teams to the model deployment teams more seamless. So we have, you know, we have seen in the past that this kind of transition point is the place where there are a lot of hiccups, right? The data science team will give a model to the production team, and it'll be too slow for the application, or it'll be too expensive to run, and it has to go back and be changed, and kind of this loop. And so, you know, with the PyTorch profiler that Luis was talking about, and then also, you know, the other ways we do optimization, that kind of prevents that hand-off problem from happening. >> Luis and Anna, you guys have a great company. Final couple minutes left. Talk about the company, the people there. What's the culture like? You know, if Intel has Moore's law, which is, you know, doubling the performance every few years, what's the culture like there? Is it, you know, more throughput, better pricing? Explain what's going on with the company, and put a plug in. Luis, we'll start with you. >> Yeah, absolutely. I'm extremely proud of the team that we built here. You know, we have a people-first culture, you know, very, very collaborative, and we all have a shared mission here of making AI more accessible and sustainable. We have a very diverse team in terms of backgrounds and life stories. You know, to do what we do here, we need a team that has expertise in software engineering, in machine learning, in computer architecture. Even though we don't build chips, we need to understand how they work, right? And then, you know, the fact that we have this really, really varied set of backgrounds makes the environment, you know, very exciting for learning about, you know, systems end-to-end. But it also makes for a very interesting, you know, work environment, right? People have different backgrounds, different stories. Some of them went to grad school; others, you know, were in intelligence agencies and now are working here, you know. So we have a really interesting set of people, and, you know, life is too short not to work with interesting humans. You know, that's something that I like to think about, you know.
>> I'm sure your off-site meetings are a lot of fun, people talking about computer architectures, silicon advances, the next GPU, the big data models coming in. Anna, what's your take? What's the culture like? What's the company vibe, and what are you guys looking to do? What's the customer success pattern? What's up? >> Yeah, absolutely. I mean, I, you know, second all of the great things that Luis just said about the team. An additional one that I'd really like to underscore is kind of this customer obsession, to use a term you all know well, and focus on the end users, really making the experiences that we're bringing to our users, who are developers, you know, useful and valuable for them. And so I think, you know, with all of these tools that we're trying to put in the hands of users, the industry and the market are changing so rapidly that our products across the board, you know, all of the companies that, you know, are part of the showcase today, we're all evolving them so quickly, and we can only do that kind of really hand in glove with our users. So that would be another thing I'd emphasize. >> I think the change dynamic, the power dynamics of this industry, is just the beginning. I'm very bullish that this is going to be probably one of the biggest inflection points in the history of the computer industry, because of all the dynamics of the confluence of all the forces, and you mentioned some of them. I mean, the PC, you know, interoperability with internetworking, and you got, you know, the web and then mobile. Now we have this; I mean, I wouldn't even put social media close to this. Like, this changes user experience, changes infrastructure. There are going to be massive accelerations in performance on the hardware side from the AWS's of the world and cloud, and you got the edge and more data. This is really what big data was going to look like. This is the beginning. Final question: what do you guys see going forward in the future? >> Well, it's undeniable that machine learning and AI models are becoming an integral part of any interesting application today, right? And the clear trends here are, you know, more and more computational needs for these models, because they're only getting more and more powerful. And then two, you know, the complexity of the infrastructure where they run; you know, just considering the cloud, there's a wide variety of choices there, right? So being able to live with that and making the most out of it in a way that does not require, you know, an impossible-to-find team is something that's pretty clear. So the need for automation, abstracting away the complexity, is definitely here. And we are seeing, you know, trends where models are starting to move to the edge as well. So it's clear that we are going to live in a world where there are large models living in the cloud and, you know, edge models that talk to those models in the cloud to form, you know, end-to-end, truly intelligent applications. >> Anna? >> Yeah, I think, you know, Luis said it at the beginning: our vision is to make AI sustainable and accessible. And I think as this technology just expands into every company and every team, that's going to happen kind of on its own, and we're here to help support that. And I think you can't do that without tools like OctoML's.
>> I think it's going to be an era of massive invention, creativity, where a lot of the heavy lifting gets automated, which is going to allow the talented people to automate their intellect. I mean, this is really kind of what we see going on. And Luis, thank you so much. Anna, thanks for coming on this segment. Thanks for coming on theCUBE and being part of the AWS Startup Showcase. I'm John Furrier, your host. Thanks for watching. (upbeat music)
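[Editor's note: a minimal back-of-envelope sketch, in Python, of the training-versus-inference cost arithmetic Luis walks through above. The dollar figures are the hypothetical ones from the conversation, not measured data.]

```python
# Luis's example: a $1-2M one-time training cost versus 1-2 cents per session
# in production, with a million active users using the model once a day.
train_cost = 1_500_000        # one-time training cost, midpoint of $1-2M
cost_per_session = 0.015      # midpoint of the 1-2 cents per session figure
daily_users = 1_000_000       # a million active users, one session each per day

daily_inference = daily_users * cost_per_session
print(f"Inference spend: ${daily_inference:,.0f} per day")         # $15,000/day
print(f"Inference passes the training cost after "
      f"{train_cost / daily_inference:.0f} days")                  # ~100 days
print(f"Annualized inference spend: ${daily_inference * 365:,.0f}")
```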
Luis Ceze, OctoML | Cube Conversation
(gentle music) >> Hello, everyone. Welcome to this Cube Conversation. I'm John Furrier, host of theCUBE, here in our Palo Alto Studios. We're featuring OctoML, and I'm with the CEO, Luis Ceze, Chief Executive Officer and Co-founder of OctoML. Thanks for joining us today. Luis, great to see you. Last time we spoke was at "re:MARS," Amazon's event. Kind of a joint event between (indistinct) and Amazon; they kind of put a lot together. Great to see you. >> Great to see you again, John. I really have good memories of that interview. You know, that was definitely a great time. Great to chat with you again. >> The world of ML and AI, machine learning and AI, is really hot. Everyone's talking about it. It's really great to see that advance. So I'm looking forward to this conversation, but before we get started, introduce who you are and OctoML. >> Sure. I'm Luis Ceze, Co-founder and CEO at OctoML. I'm also a professor of Computer Science at the University of Washington. You know, OctoML grew out of our efforts on the Apache TVM project, which is a compiler and runtime system that enables folks to run machine learning models on a broad set of hardware, at the edge and in the cloud, very efficiently. You know, we grew that project and grew that community, and definitely saw there was a real pain point there. And then we built OctoML; OctoML is about three and a half years old now. And the mission of the company is to enable customers to deploy models very efficiently in the cloud, and make them, you know, run. Do it quickly, run fast, and run at a low cost, which is something that's especially timely right now. >> I like to point out also, for the folks, 'cause they should know, that you're also a professor in the Computer Science department at the University of Washington. A great program there. This is really an inflection point with AI and machine learning. The computer science industry has been waiting for decades to advance AI, and with all this new cloud computing, all the hardware and silicon advancements, GPUs, this is the perfect storm. And you know, in computer science now we're seeing an acceleration. Can you share your view? You're obviously a professor in that department, but also an entrepreneur. This is a great time for computer science. Explain why. >> Absolutely, yeah, no. It's just the confluence of, you know, advances in what, you know, computers can do as devices to compute information, plus, you know, advances in AI that enable applications that, you know, we thought were highly futuristic, and now they're just right there today. You know, AI that can generate photorealistic images from descriptions, you know, can write text that's pretty good, can help augment, you know, human creativity in a really meaningful way. So you see, the confluence of capabilities and the creativity of humankind into new applications is just extremely exciting, both from a researcher point of view as well as an entrepreneur point of view, right. >> What should people know about these large language models we're seeing with ChatGPT, and how Google has got a lot of work going on in that area? There's been a lot of work recently. What's different now about these models, and why are they so popular and effective now? What's the difference between now and, say, five years ago, that makes it more- >> Oh, yeah. It's a huge inflection in their capabilities; I always say it's like emergent behavior, right? So as these models got more complex, and our ability to train and deploy them, you know, got to this point...
You know, they really crossed a threshold into doing things that are truly surprising, right? In terms of generating, you know, explanations for things, generating text, summarizing text, expanding text. And, you know, exhibiting what to some may look like reasoning. They're not quite reasoning fundamentally; they're generating text that looks like they're reasoning, but they do it so well that it feels like it was done by a human, right. So I would say that the biggest change is that, you know, now they can actually do things that are extremely useful for business and people's lives today. And that wasn't the case five years ago. So that's on the model capabilities side, and that is being paired with huge advances in computing that enable, you know, an actual line of sight to deploying these at scale, right. And that's where we come in, by the way, but yeah. >> Yeah, I want to get into that. And also, you know, the fusion of data, integrating data sets at scale. That's another one we're seeing a lot of happening now. It's not just some, you know, siloed, pre-built data modeling. There's a lot of agility and a lot of new integration capabilities for data. How is that impacting the dynamics? >> Yeah, absolutely. So I'll say that the ability to take the data that exists and train a model to do something useful with it, and, more interestingly I would say, take baseline foundational models and, with a little bit of data, turn them into something that can do a specialized task really, really well, has created this really fast proliferation of really impactful applications, right? >> Every company now is looking at this trend, and I'm seeing a lot... I think every company will rebuild their business with machine learning, if they're not already doing it. And the folks that aren't will probably be dinosaurs, will be out of business. This is a real business transformation moment, where machine learning and AI go mainstream, and I think it's just the beginning. This is where you guys come in, and you guys are poised for handling this frenzy to change business with machine learning models. How do you guys help customers as they look at this, you know, transition to get from, you know, concept to production with machine learning? >> Great questions, yeah. So I would say that it's fair to say there's a bunch of models out there that can do useful things right out of the box, right? And also, the ability to create models has improved quite a bit. So the challenge has now shifted to customers, you know; everyone is looking to incorporate AI into their applications. So what we do for them is, first of all, help them do that quickly, without needing highly specialized, difficult-to-find engineering. And very importantly, how do you do that at a cost that's accessible, right? So all of these fantastic models that we just talked about, they use an amount of computing that's just astronomical compared to anything else we've done in the past. It means the costs that come with it are also very, very high. So it's important to enable customers to, you know, incorporate AI into their applications, into their use cases, in a way that they can do with the people that they have and the costs that they can afford, such that they can have, you know, the maximum impact they can possibly have. And finally, you know, helping them deal with hardware availability. As you know, even though we've made a lot of progress in making computing cheaper and cheaper...
Even to this day, you know, you can never get enough. And getting an allocation, getting the right hardware to run these incredibly hungry models, is hard. And we help customers deal with, you know, hardware availability as well. >> Yeah, for the folks watching: if you search YouTube, there's an interview we did last year at "re:MARS," I mentioned that earlier, just a great interview. You talked about this hardware independence, this traction. I want to get into that, because if you look at all the foundation models that are out there right now that are getting traction, you're seeing two trends: you're seeing proprietary and open source. And obviously, open source always wins, in my opinion, but, you know, there's this iPhone-versus-Android moment that one of your investors, John Torrey from Madrona, talked about: you know, one's proprietary hardware, very specialized, high performance, and then there's open source. This is an important distinction, and you guys are hardware independent. Explain what all this means. >> Yeah. Great set of questions. First of all, you know, OpenAI, of course, they created ChatGPT, and they offer an API to run these models that do amazing things. But customers have to go and send their data over to OpenAI, right? Run the model there and get the outputs. Now, there are open source models that can do amazing things as well, right? And typically, open source models don't lag behind, you know, these proprietary closed models by more than, say, six months or so. And it means that enabling customers to take the models that they want and deploy them under their control is something that's very valuable. Because one, you don't have to expose your data externally. Two, you can customize the model even more for the things that you want it to do. And then three, you can run on infrastructure that can be much more cost-effective than having to, you know, pay somebody else's, you know, costs and markup, right? And where we help them is essentially enabling customers to take machine learning models, say an open source model, and automate the process of putting them into production, optimizing them to run with the right performance, and, more importantly, giving them the independence to run where they need to run, where they can run best, right? >> Yeah, and also, you know, I point out all the time that, you know, there's no stopping the innovation of hardware and silicon, and you're seeing more cloud computing coming in there. So, you know, being hardware independent has some advantages. And if you look at OpenAI, for instance, you mentioned ChatGPT; I think this is interesting, because I think everyone is scratching their head, going, "Okay, I need to move to this new generation." What's your pro tip and advice for folks or businesses that want to, say, move to machine learning? How do they get started? What are some of the considerations they need to think about to deploy these models into production? >> Yeah, great thought. Great set of questions. First of all, I mean, I'm sure they're very aware of the kinds of things they want to do with AI, right? So it could be, you know, automating interactions with customers. It could be, you know, finding issues in production lines. It could be, you know, making it easier to produce content, and so on.
Like, you know, customers, users, would have an idea of what they want to do. You know, from that you can actually determine what kind of machine learning model would solve the problem, that, you know, fits that use case. But then, that's when the hard thing begins, right? When you find a model, identify the model that can do the thing that you want it to do, you need to turn that into a thing that you can deploy. So how do you go from a machine learning model that does the thing you need, to a container with the right executor, the artifact you can actually go and deploy, right? We've seen customers doing that on their own, and it's quite a bit of work, and that's why we are excited about the automation that we can offer, to turn that into a turnkey process. >> Luis, talk about the use cases, if you don't mind doubling down on the previous answer. You've got existing services, and then there are new AI applications, AI for applications. What are the use cases with existing stuff, and the new applications that are being built? >> Yeah, I mean, existing ones are, for example, how you do very smart search and auto-completion, you know, when you are editing documents. Very, very smart search of documents, summarization of text, expanding bullets into prose in a way where, you know, you don't have to spend as much human time. Those are just some of the existing applications, right? Some of the new ones are truly AI-native ways of producing content. Like, there's a company that, you know, we share investors with and love what they're doing, called RunwayML, for example. It's sort of like an AI-first way of editing and creating visual content, right? So you could have a video and say, make this video look like it's night as opposed to day, or remove that dog in the corner. You can do that in a way that you couldn't do otherwise. So there are definitely AI-native use cases. And then also in life sciences, you know, there are quite a bit of advances in AI-based, you know, therapies and diagnostic processes that are designed using automated processes. And this is something where I feel like we're just scratching the surface. There are huge opportunities there, right? >> Talk about the inference-in-production kind of angle here, because cost is a huge concern when you look at it, and there's the hardware and that flexibility there. So I can see how that could help, but is there a cost freight train that can get out of control here if you don't deploy properly? Talk about the scale problem around cost in AI. >> Yeah, absolutely. So, you know, very quickly: one thing that people tend to think about is the training cost. You know, training has really high dollar amounts, so people tend to over-index on that. But what you have to think about is that for every model that's actually useful, you're going to train it once and then run it a large number of times in inference. That means that over the lifetime of a model, the vast majority of the compute cycles and the cost are going to go to inference. And that's what we address, right? And to give you some idea: if you're talking about using a large language model today, you know, you can say it's going to cost a couple of cents per, you know, 2,000 words of output. If you have a million users active a day, you know, if you're lucky enough to have that, this cost can actually balloon very quickly to millions of dollars a month, just in inferencing costs.
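[Editor's note: the inference-cost arithmetic above, sketched in Python. The per-2,000-words cost is Luis's rough figure from the conversation; the sessions-per-user count is an added assumption for illustration, not a measured number.]

```python
# "A couple of cents per 2,000 words" times a million daily users balloons
# to millions of dollars a month in inference spend.
cost_per_2k_words = 0.03        # "a couple of cents", illustrative midpoint
daily_active_users = 1_000_000  # "a million users active a day"
sessions_per_user = 2           # assumption: a couple of sessions per day
words_per_session = 2_000

daily = (daily_active_users * sessions_per_user
         * (words_per_session / 2_000) * cost_per_2k_words)
print(f"Daily inference spend:   ${daily:,.0f}")       # $60,000 per day
print(f"Monthly inference spend: ${daily * 30:,.0f}")  # $1.8M per month
```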
That's assuming, you know, that you actually have access to the infrastructure to run it, right? So it means that if you don't pay attention to these inference costs, that's definitely going to be a surprise. And it affects the economics of the product this is embedded in, right? There's quite a bit of attention being put right now on how you do search with large language models, and if you don't pay attention to the economics, you know, you can have a surprise. You'd have to change the business model there. >> Yeah. I think that's important to call out, because you don't want it to be a runaway cost structure where you architected it wrong, and then, next thing you know, you've got to unwind that. I mean, it's more than technical debt, it's actually real debt, it's real money. So, talk about some of the dynamics with the customers. How are they architecting this? How do they get ahead of that problem? What do you guys do specifically to solve that? >> Yeah, I mean, well, we help customers, first of all, be hyper-aware, you know, understanding what the cost is going to be for them to deploy the models into production, and showing them the possibilities of how they can deploy the model with a different cost structure, right? So that's where, you know, the ability to have hardware independence is so important, because once you have hardware independence, after you optimize models, you have a new, you know, dimension of freedom to choose, you know, what is the right throughput per dollar for you, and then where, and what the options are. And once you make that decision, you want to automate the process of putting it into production. So the way we help customers is showing very clearly, in their use case, you know, how they can deploy their models in a much more cost-effective way. There's a case study that we put out recently showing a 4x reduction in deployment costs, right? This is by doing a mix of optimization and choosing the right hardware. >> How do you address the concern that someone might say, Luis, "Hey, you know, I don't want to degrade performance and latency, and I don't want the user experience to suffer." What's the answer there? >> Two things. First of all, all of the manipulations that we do to the model are to turn the model into efficient code without changing the behavior of the model. We wouldn't degrade the experience of the user by having the model be wrong more often; we don't change that at all. The model behaves the way it was validated for. And then the second thing is, you know, user experience with respect to latency is all about a maximum. Like, you could say, I want a model to run at 50 milliseconds or less. If it's much faster than that, you're not going to notice the difference, but if it's slower, you're going to notice a difference. So the key here is, how do you find a set of options to deploy such that you're not overshooting performance in a way that leads to costs with no additional benefits? And this provides a very significant set of choices where you can optimize for cost without degrading customer experience, right. End user experience. >> Yeah, and I'd also point out that the large language models, like the ChatGPTs of the world, that are coming out... Dave Moth and I were talking on this Breaking Analysis about this being, like, over 10X more computationally intensive on capabilities.
So this hardware independence is a huge thing. And also supply chain: some people can't get servers, by the way, or hardware these days. >> Or even more interestingly, right? They do not grow on trees, John. Like, GPUs are not the kind of stuff where you plant an orchard until you have a bunch and then you can increase it; no, these things, you know, take a while, and you can't increase supply overnight. So being able to live with the cycles that are available to you is not just important for cost, but also important for people to scale and serve more users at, you know, whatever pace they come, right? >> You know, it's really great to talk to you, and congratulations on OctoML. Looking forward to the startup showcase; we'll be featuring you guys there. But I want to get your personal opinion, as someone in the industry and also someone who's been in the computer science area for your career. You know, computer science has always been great, and there are more people enrolling in computer science, more diversity than ever before, but there are also more computer science-related fields. How is this opening up computer science, and where's AI going with the computers, with the science? Can you share your vision on, you know, the aperture, or the landscape, of CompSci, or CS students, and opportunities? >> Yeah, no, absolutely. I think it's fair to say that computing has been embedded in pretty much every aspect of human life these days, right? For everything. And AI has been a counterpart; it's been an integral component of computer science for a while. And these advances that happened in the last 10, 15 years in AI have shown, you know, new applications, and have, I think, re-energized how people see what computers can do. You know, there is this picture in our department that shows computer science at the center, called the flower picture, with all the different petals, like life sciences, social sciences, and then, you know, mechanical engineering, all these other things. And I feel like you could put AI in that center as well; you see AI, you know, touching all these applications. AI in healthcare, diagnostics. AI in discovery in the sciences, right? But then also AI doing things that, you know, humans don't have to do anymore, so they can do better things with their brains, right? So it's permeating every single aspect of human life, from intellectual endeavor to day-to-day work, right? >> Yeah. And I think ChatGPT and OpenAI have really kind of created a mainstream view that everyone sees value in it. Like, you could be in the data center, you could be in bio, you could be in healthcare. I mean, every industry sees value. So this brings up what I call the horizontally scalable use cases. And so this opens up the conversation: what's going to change from this? Because if you go horizontally scalable, which is a cloud concept as you know, that's going to create a lot of opportunities and some shifting of how you think about architecture around data, for instance. What's your opinion on what this will do to change the inflection of the role of architecting platforms, and the role of data specifically? >> Yeah, so good question. There is a lot in there. By the way, I should have added to the previous question that you can use AI to do better AI as well, which is what we do, and other folks are doing as well.
And so the point I wanted to make here is that it's pretty clear that you have a cloud-focused component with an edge-focused counterpart. Like, you have AI models both in the cloud and at the edge, right? So being able to run your AI model where it runs best also has a data advantage to it from, say, a privacy point of view. You could inherently say, "Hey, I want to run something, you know, locally, strictly locally, such that I don't expose the data to any infrastructure," and you know that the data never leaves you, right? Never leaves the device. Now, you can imagine things that are already starting to happen, like doing some forms of training and model customization in the model architecture itself and the system architecture, such that you do this as close to the user as possible. And there's something called federated learning, which has been around for some time and is finally happening: how do you get data from multiple places, do, you know, some common learning, and then send models to the edges, where they get refined for their final use, in a way that you get the advantage of aggregating data but you don't get the disadvantage of privacy issues, and so on. >> It's super exciting. >> And some of the considerations, yeah. >> It's a super exciting area around data infrastructure, data science, computer science. Luis, congratulations on your success at OctoML. You're in the middle of it. And the best thing about it is that businesses are looking at this and really reinventing themselves, and if a business isn't thinking about restructuring their business around AI, they probably will be out of business. So this is a great time to be in the field. So thank you for sharing your insights here on theCUBE. >> Great. Thank you very much, John. Always a pleasure talking to you. Always have a lot of fun. And we both speak really fast, I can tell, you know, so. (both laughing) >> I know. We'll have the transcript available; we'll integrate it into our CubeGPT model that we have, Luis. >> That's right. >> Great. >> Great to talk to you. Thank you, John. Thanks, man, bye. >> Hey, this is theCUBE. I'm John Furrier, here in Palo Alto. Cube Conversation. Thanks for watching. (gentle music)
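[Editor's note: a toy sketch, in Python, of the federated learning pattern Luis mentions above, where training data stays on each device and only model updates are aggregated centrally. It assumes a simple linear model and FedAvg-style weight averaging purely for illustration; it does not represent any particular production framework.]

```python
# Federated averaging in miniature: each "device" refines a copy of the model
# on local data that never leaves it; the server only averages the weights.
import numpy as np

def local_update(weights, local_x, local_y, lr=0.01):
    """One SGD step of linear regression on data that stays on the device."""
    grad = 2 * local_x.T @ (local_x @ weights - local_y) / len(local_y)
    return weights - lr * grad

rng = np.random.default_rng(0)
global_w = np.zeros(3)
# Five devices, each holding its own private (x, y) data.
devices = [(rng.normal(size=(20, 3)), rng.normal(size=20)) for _ in range(5)]

for _ in range(10):  # ten federated rounds
    # Each device trains locally; only the resulting weights are shared.
    updates = [local_update(global_w.copy(), x, y) for x, y in devices]
    # The server aggregates by averaging -- raw data is never centralized.
    global_w = np.mean(updates, axis=0)

print("Aggregated global weights:", global_w)
```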
Luis Ceze, OctoML | Amazon re:MARS 2022
(upbeat music) >> Welcome back, everyone, to theCUBE's coverage here live on the floor at AWS re:MARS 2022. I'm John Furrier, host for theCUBE. Great event: machine learning, automation, robotics, space, that's MARS. It's part of the re-series of events; re:Invent's the big event at the end of the year, re:Inforce is security, and re:MARS is really the intersection of the future of space and industrial automation, which is very heavily DevOps and machine learning, which of course is AI. We have Luis Ceze here, who's the CEO and co-founder of OctoML. Welcome to theCUBE. >> Thank you very much for having me on the show, John. >> So we've been following you guys. You guys are a growing startup funded by Madrona Venture Capital, one of your backers. You guys are here at the show. This is, I would say, a small show relative to what it's going to be, but a lot of robotics, a lot of space, a lot of industrial kind of edge, but machine learning is the centerpiece of this trend. You guys are in the middle of it. Tell us your story. >> Absolutely, yeah. So our mission is to make machine learning sustainable and accessible to everyone. I say sustainable because it means we're going to make it faster and more efficient, using less human effort; and accessible to everyone means accessible to as many developers as possible, and also accessible on any device. So, we started from an open source project that began at the University of Washington, where I'm a professor. Several of the co-founders were PhD students there. We started with this open source project called Apache TVM that had contributions and collaborations from Amazon and a bunch of other big tech companies. And that allows you to get a machine learning model and run it on any hardware: CPUs, various GPUs, accelerators, and so on. It was the kernel of our company, and the project's been around for about six years or so. The company is about three years old. And we grew from Apache TVM into a whole platform that essentially supports any model on any hardware, cloud and edge. >> So is the thesis that, when it first started, that you want to be agnostic on platform? >> Agnostic on hardware, that's right. >> Hardware, hardware. >> Yeah. >> What was it like back then? What kind of hardware were you talking about back then? Cause a lot's changed, certainly on the silicon side. >> Luis: Absolutely, yeah. >> So take me through the journey, 'cause I could see the progression. I'm connecting the dots here. >> So once upon a time, yeah, no... (both chuckling) >> I walked in the snow with my bare feet. >> You have to be careful, because if you wake up the professor in me, then you're going to be here for two hours, you know. >> Fast forward. >> The abridged version here is that, clearly, machine learning has shown it can actually solve real, interesting, high-value problems. And wherever machine learning runs, in the end it becomes code that runs on different hardware, right? When we started Apache TVM, which stands for tensor virtual machine, people were just beginning to use GPUs for machine learning. We already saw that, with a bunch of machine learning models popping up and CPUs and GPUs starting to be used for machine learning, it was clear that an opportunity was coming to run everywhere. >> And GPUs were coming fast. >> GPUs were coming, and a huge diversity of CPUs, GPUs, and accelerators now, and the ecosystem and the system software that maps models to hardware is still very fragmented today.
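To make the Apache TVM idea above concrete, here is a rough sketch based on TVM's documented Python API (TVM 0.8-era); the model file, input name, and shape are illustrative, and exact calls vary somewhat across TVM releases. The point is that only the target string changes between hardware backends.

```python
# Sketch: compile one trained model for two different hardware targets
# with Apache TVM. Only the target string changes between backends.
import onnx
import tvm
from tvm import relay

# Load a trained model (file name illustrative) into TVM's Relay IR.
onnx_model = onnx.load("resnet50.onnx")
mod, params = relay.frontend.from_onnx(
    onnx_model, shape={"data": (1, 3, 224, 224)})

for target in ["llvm", "cuda"]:  # x86 CPU and Nvidia GPU, respectively
    with tvm.transform.PassContext(opt_level=3):
        lib = relay.build(mod, target=target, params=params)
    lib.export_library(f"model_{target}.so")  # deployable compiled artifact
```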
So hardware vendors have their own specific stacks. Nvidia has its own software stack, and so do Intel and AMD. And honestly, I hope I'm not being too controversial here when I say that it kind of looks like the mainframe era. We had tight coupling between hardware and software. If you bought IBM hardware, you had to buy the IBM OS, IBM database, IBM applications; it was all tightly coupled. And if you wanted to use IBM software, you had to buy IBM hardware. That's kind of what machine learning systems look like today. If you buy a certain big-name GPU, you've got to use their software. And even if you use their software, which is pretty good, you have to buy their GPUs, right? So, you know, we wanted to help peel the model and the software infrastructure away from the hardware to give people choice, the ability to run their models where it best suits them. Right? That includes picking the best instance in the cloud, the one that gives you the right cost properties and performance properties, or you might want to run it on the edge, or on an accelerator. >> What year was that, roughly, when you were doing this? >> We started that project in 2015, 2016. >> Yeah. So that was pre-conventional wisdom. I think TensorFlow wasn't even around yet. >> Luis: No, it wasn't. >> It was, I'm thinking, like 2017 or so. >> Luis: Right. >> So that was the beginning of, okay, this is an opportunity. AWS, I don't think they had released some of the Nitro stuff that Hamilton was working on. So, they were already kind of going that way. It's kind of like converging. >> Luis: Yeah. >> The space was happening, exploding. >> Right. And the way that was dealt with, and to this day, to a large extent, still is, is by backing machine learning models with a bunch of hardware-specific libraries. And we were some of the first ones to say, you know what, let's take a compilation approach: take a model and compile it to very efficient code for that specific hardware. And what underpins all of that is using machine learning for machine learning code optimization. Right? But that was way back when. We can talk about where we are today. >> No, let's fast forward. >> That's the beginning of the open source project. >> But that was a fundamental belief, a real worldview there. I mean, you had a worldview that was logical when you compare it to the mainframe, but not obvious to the machine learning community. Okay, good call, check. Now let's fast forward, okay. Evolution, we'll go through the speed of the years. More chips are coming, you got GPUs, and look what's going on in AWS. Wow! Now it's booming. Now I got unlimited processors, I got silicon on chips, I got it everywhere. >> Yeah. And what's interesting is that the ecosystem got even more complex, in fact. Because now you have a cross product between machine learning models, frameworks like TensorFlow, PyTorch, Keras, and so on, and hardware targets. So how do you navigate that? Our vision here is to say: people should focus on making machine learning models do what they want them to do, something that solves a problem of high value to them. Right? And the deployment should be completely automatic. Today, it's very, very manual to a large extent.
So once you're serious about deploying a machine learning model, you've got to get a good understanding of where you're going to deploy it and how you're going to deploy it, and then pick the right libraries and compilers. We automated the whole thing in our platform. This is why you see the tagline, the booth is right there, bringing DevOps agility for machine learning, because our mission is to make that fully transparent. >> Well, first of all, I use that line here, 'cause I'm looking at it live on camera. People can't see it, but I use it in a couple of my interviews, because the word agility is very interesting. That's kind of the test on any kind of approach these days. Agility could be, and I talked to the robotics guys, just having their product be more agile. I talked to Pepsi here just before you came on; they had this large-scale data environment because they built an architecture, but that fostered agility. So again, this is an architectural concept, it's a systems view of agility being the output, and removing dependencies, which I think is what you guys are trying to do. >> Only part of what we do. Right? So agility means a bunch of things. First, you know-- >> Yeah, explain. >> Today it takes a couple of months to get a model from, when the model's ready, to production. Why not turn that into two hours? Agile, literally, physically agile, in terms of wall-clock time. Right? And then the other thing is giving you the flexibility to choose where your model should run. So, in our deployment, between the demo and the platform expansion that we announced yesterday, we give you the ability to get your model compiled, optimized for any instance in the cloud, and automatically moved around. Today, that's not the case. You have to pick one instance, and that's what you do. And then you might auto-scale with that one instance. So we give the agility of actually running and scaling the model the way you want, in the way that gives you the right SLAs. >> Yeah, I think Swami was mentioning that, not specifically that use case for you, but that use case generally: scale being moving things around, making them faster, not having to do that integration work. >> Scale, and run the models where they need to run. Like, some day you want to have a large-scale deployment in the cloud. You're going to have models on the edge for various reasons, because the speed of light is limited. We cannot make light faster. So, you know, that's physics there, you cannot change it. There's privacy reasons. You want to keep data local, not send it around, so you run the model locally. So anyway, it's about giving the flexibility. >> Let me jump in real quick. I want to ask this specific question, because you made me think of something. So we were just having a data mesh conversation. And one of the comments that's come out of a few of these data-as-code conversations is that data's the product now. So if you can move data to the edge, which everyone's talking about, you know, why move data if you don't have to? But I can move a machine learning algorithm to the edge. 'Cause it's costly to move data. I can move compute, everyone knows that. But now I can move machine learning anywhere else and not worry about integrating on the fly. So the model is the code. >> It is the product. >> Yeah. And since you said the model is the code, okay, now we're talking even more here. So machine learning models today are not treated as code, by the way.
Models do not have any of the typical properties of code. Whenever you write a piece of code and run it, you don't even think about what CPU it runs on, or what kind of instance it runs on. But with a machine learning model, you do. So what we've done is create this fully transparent, automated way of allowing you to treat your machine learning model as if it were a regular function that you call, and the function can run anywhere. >> Yeah. >> Right. >> That's why-- >> That's better. >> Bringing DevOps agility-- >> That's better. >> Yeah. And you can use existing-- >> That's better, because I can run it on the Artemis too, in space. >> You could, yeah. >> If they have the hardware. (both laugh) >> And that allows you to continue to use your existing DevOps infrastructure and your existing people. >> So I have to ask you, 'cause since you're a professor, this is like a masterclass on theCUBE. Thank you for coming on, Professor. (Luis laughing) I'm a hardware guy. I'm building hardware for Boston Dynamics' Spot, the dog. That's the diversity in hardware; it tends to be purpose-driven. I've got a spaceship, I'm going to have hardware on there. >> Luis: Right. >> It's generally viewed in the community here, by everyone I talk to and in other communities, that open source is going to drive all software. That's a check. But scale and integration are super important. And they're also recognizing that hardware is really about the software. They even said on stage here: hardware is not about the hardware, it's about the software. So if you believe that to be true, then your model checks all the boxes. Are people getting this? >> I think they're starting to. Here is why. A lot of companies that were hardware-first, that thought about software too late, aren't making it. Right? There's a large number of hardware companies, AI chip companies, that aren't making it, and probably some that won't make it, unfortunately, just because they started thinking about software too late. I'm glad to see a lot of the early movers here. I hope I'm not just tooting our own horn, but Apache TVM, the infrastructure that we built to map models to different hardware, is very flexible. So we see a lot of emerging chip companies, like SiMa.ai, that have been doing fantastic work, and they use Apache TVM to map algorithms to their hardware. And there's a bunch of others that are also using Apache TVM. That's because you have an open infrastructure that keeps up to date with all the machine learning frameworks and models and allows you to extend to the chips that you want. These companies paying attention that early gives them a much higher fighting chance, I'd say. >> Well, first of all, not only are you backable by the VCs 'cause you have pedigree, you're a professor, you're smart, and you get good recruiting-- >> Luis: I don't know about the smart part. >> And you get good recruiting for PhDs out of the University of Washington, which is not too shabby a computer science department. But they want to make money. The VCs want to make money. >> Right. >> So you have to make money. So what's the pitch? What's the business model? >> Yeah. Absolutely. >> Share with us what you're thinking there. >> Yeah. The value of using our solution is shorter time to value for your model, from months to hours. Second, you shrink operating expenses, OpEx, because you don't need a specialized, expensive team.
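A sketch of what "treat the model as a regular function" can look like in practice, continuing the hypothetical TVM example above (the artifact name and input name are carried over from that sketch, not taken from OctoML's platform):

```python
# Sketch: a compiled model loaded and wrapped as an ordinary function.
# The caller never thinks about which CPU or instance it runs on.
import numpy as np
import tvm
from tvm.contrib import graph_executor

lib = tvm.runtime.load_module("model_llvm.so")  # artifact from the earlier sketch
device = tvm.cpu(0)
runtime = graph_executor.GraphModule(lib["default"](device))

def predict(batch: np.ndarray) -> np.ndarray:
    # Plain function call in, NumPy array out.
    runtime.set_input("data", batch)
    runtime.run()
    return runtime.get_output(0).numpy()

scores = predict(np.random.rand(1, 3, 224, 224).astype("float32"))
```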
Talk about expensive: expensive engineers who understand both machine learning and hardware and software engineering, just to deploy models. You don't need those teams if you use this automated solution, right? So you reduce that. And also, in the process of actually getting a model specialized to the hardware, making it hardware-aware, we're talking about a very significant performance improvement that leads to lower cost of deployment in the cloud. We're talking about a very significant reduction in cloud deployment costs. And also enabling new applications on the edge that weren't possible before. It creates latent value opportunities. Right? So, that's the high-level value pitch. But how do we make money? Well, we charge for access to the platform. Right? >> Usage. Consumption. >> Yeah, consumption and value-based. So it depends on the scale of the deployment. If you're going to deploy a machine learning model at a larger scale, chances are it produces a lot of value. So then we'll capture some of that value in our pricing scale. >> So, you have a direct sales force, then, to work those deals. >> Exactly. >> Got it. How many customers do you have? Just curious. >> So, the SaaS platform just launched, so we've started onboarding customers. We've been building this for a while. We have a bunch of partners that we can talk about openly, revenue-generating partners, that's fair to say. We work closely with Qualcomm to enable Snapdragon on TVM and hence our platform. We're close with AMD as well, enabling AMD hardware on the platform. And we've been working closely with two hyperscaler cloud providers that-- >> I wonder who they are. >> I don't know who they are, right. >> Both start with the letter A. >> And they're both here, right. What is that? >> They both start with the letter A. >> Oh, that's right. >> I won't give it away. (laughing) >> Don't give it away. >> One has three, one has four. (both laugh) >> I'm guessing, by the way. >> Then we have early customers who've been using the platform from the beginning, in the consumer electronics space in Japan, in self-driving car technology, as well as some AI-first companies whose core business comes from AI models. >> So, serious, serious customers. They've got deep tech chops. They're integrating, they see this as a strategic part of their architecture. >> That's what I call AI-native, exactly. But now we have several enterprise customers in line that we've been talking to. Of course, now that we've launched the platform, we've started onboarding and exploring how we're going to serve it to these customers. But it's pretty clear that our technology can solve a lot of other pain points right now, and we're going to work with them as early customers to go and refine it. >> So, do you sell to the little guys, like us? Would we be customers if we wanted to be? >> You could, absolutely, yeah. >> What would we have to do? Have machine learning folks on staff? >> So, here's what you're going to have to do. You can see the booth; others can't. No, but they certainly can try our demo. >> OctoML. >> And you should look at the transparent AI app that's compiled, optimized, deployed, and built with our flow. It lets you take your image and do style transfer. You know, you can take a picture of yourself and a pineapple and see what you'd look like with a pineapple texture.
>> We got a lot of transcript and video data. >> Right. Yeah. Right, exactly. So, you could use that. Then there's a very clear-- >> But I could use it. You're not blocking me from using it. It's pretty much democratized. >> You can try the demo, and then you can request access to the platform. >> But you get a lot of more serious, deeper customers. But you can serve anybody, is what you're saying. >> Luis: We can serve anybody, yeah. >> All right, so what's the vision going forward? Let me ask this. When did people start getting the epiphany of decoupling the machine learning from the hardware? Was it recently, a couple years ago? >> Well, on the research side, we helped start that trend a while ago. I don't need to repeat that. But the vision that's important here, that I want the audience to take away, is that there's a lot of progress being made in creating machine learning models. There are fantastic tools to deal with training data, creating the models, and so on. And now there's a bunch of models that can solve real problems. The question is, how do you very easily integrate them into your intelligent applications? Madrona Venture Group has been very vocal about, and investing heavily in, intelligent applications, both end-user applications and enablers. And we're an enabler of that, because it's so easy to use our flow to get a model integrated into your application. Now any regular software developer can integrate it. And that's just the beginning, right? Because now we have CI/CD integration to keep your models up to date, to continue to integrate, and there's more downstream support for the other features you normally have in regular software development. >> I've been thinking about this for a long, long time. And I think this whole code, no one thinks about code. Like, I write code, I'm deploying it. This idea of machine learning as code, independent of other dependencies, is really amazing. It's so obvious now that you say it. What are the choices now? Let's just say that I buy it, I love it, I'm using it. Now what do I have to do if I want to deploy it? Do I have to pick processors? Are there verified platforms that you support? Is there a short list? Is it every piece of hardware? >> We can actually help you with that. I hope we're not saying we can do everything in the world here, but we can help you with it. So, here's how. When you have your model in the platform, you can actually see how that model runs on any instance of any cloud, by the way. We support all three major cloud providers. And then you can make decisions. For example, if you care about latency, say your model has to run in at most 50 milliseconds because you're going to have interactivity, then beyond that you don't care if it's faster. All you care about is whether it's going to run cheaply enough. So we can help you navigate that. And we also make it automatic. >> It's like tire kicking in the dealer showroom. >> Right. >> You can test everything out, you can see the simulation. Are they simulations, or are they real tests? >> Oh, no, we run everything on real hardware. So, as I said, we support any instance on any of the major clouds. We actually run on the cloud. But we also support a select number of edge devices today, like Arm devices and Nvidia Jetsons.
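The latency-versus-cost navigation Luis just described reduces to a simple selection rule once benchmark numbers exist: keep every instance under the SLA, then take the cheapest. Here is a sketch with invented numbers; the instance names and prices are illustrative, not OctoML measurements.

```python
# Sketch: pick the cheapest cloud instance whose measured latency meets
# the SLA. All numbers here are hypothetical.
benchmarks = [
    # (instance, p95 latency in ms, $ per hour)
    ("c5.xlarge",   62.0, 0.170),
    ("c6i.2xlarge", 41.0, 0.340),
    ("g4dn.xlarge", 11.0, 0.526),
    ("inf1.xlarge", 18.0, 0.228),
]

def pick_instance(benchmarks, sla_ms=50.0):
    eligible = [b for b in benchmarks if b[1] <= sla_ms]
    if not eligible:
        raise ValueError("no benchmarked instance meets the latency SLA")
    return min(eligible, key=lambda b: b[2])  # cheapest that is fast enough

name, latency_ms, dollars_per_hr = pick_instance(benchmarks)
print(f"{name}: {latency_ms} ms at ${dollars_per_hr}/hr")
# -> inf1.xlarge: 18.0 ms at $0.228/hr
```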
And we have the OctoML cloud, which is a bunch of racks with a bunch of Raspberry Pis and Nvidia Jetsons, and very soon a bunch of mobile phones there too, that can run on the real hardware and validate and test it out, so you can see that your model runs performantly and economically enough in the cloud, and that it can run on the edge devices-- >> You're machine learning as a service. Would that be accurate? >> That's part of it, because we're not doing the machine learning model itself. You come with a model, and we make it deployable, make it ready to deploy. So, here's why that's important. Let me try. There's a large number of really interesting companies that do API models, as in API as a service. You have NLP models, you have computer vision models, where you call an API endpoint in the cloud. You send an image, and you get a description back, for example. But that is using a third party. Now, if you want to have your model on your own infrastructure, but with the same convenience as an API, you can use our service. So, today, chances are that if you have a model you know you want to run, there might not be an API for it; we actually automatically create the API for you. >> Okay, so that's why I said the DevOps agility for machine learning is a better description. Cause you're not providing the service. You're providing the service of deploying it, like DevOps infrastructure as code. You're now ML as code. >> It's your model, your API, your infrastructure, but all the convenience of having it ready to go, fully automatic, hands off. >> Cause I think what's interesting about this is that it brings the craftsmanship back to machine learning. Cause it's a craft. I mean, let's face it. >> Yeah. I want human brains, which are very precious resources, to focus on building the models that are going to solve business problems. I don't want these very smart human brains figuring out the plumbing of actually getting models to run the right way. That should be automatic. That's why we use machine learning for machine learning, to solve exactly that. >> Here's an idea for you. We should write a book called The Lean Machine Learning. Cause the lean startup was all about DevOps. >> Luis: We'd call it machine leaning. No, that's not going to work. (laughs) >> Remember when iteration was the big mantra? Oh, yeah, iterate. You know, that was from DevOps. >> Yeah, that's right. >> That code allowed for standing up stuff fast, doubling down, we all know the history, how it turned out. That was a good value for developers. >> I really agree. If you don't mind me building on that point: something we see at OctoML, and also at Madrona, is that there's a trend toward best-of-breed for each one of the stages of getting a model deployed. From the data aspect of creating the data, to model creation, to model deployment, and even model monitoring. Right? We develop integrations with all the major pieces of the ecosystem, such that you can integrate, say, with model monitoring to go and monitor how a model is doing, just like you monitor how code is doing in deployment in the cloud. >> It's evolution. I think it's a great step. And again, I love the analogy to the mainframe. I lived during those days. I remember the monolithic, proprietary era, and then, you know, the OSI model kind of blew it open. But that OSI stack never went full stack; it only stopped at TCP/IP. So, I think the same thing's going on here.
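On the "your model, your API, your infrastructure" point a few exchanges back: the following is not OctoML's product code, just a hypothetical sketch of what self-hosting a model behind your own endpoint can amount to, reusing the predict() function from the earlier sketch via an assumed local module. The endpoint name and payload shape are invented for illustration.

```python
# Sketch: the compiled model from earlier, served behind your own HTTP
# endpoint instead of a third party's. Run with: uvicorn serve:app
import numpy as np
from fastapi import FastAPI

# predict() is the wrapper from the earlier TVM sketch; here imagine it
# lives in a local module (module name hypothetical).
from model_runtime import predict

app = FastAPI()

@app.post("/predict")
def predict_endpoint(payload: dict) -> dict:
    # JSON in, JSON out: the caller never sees the hardware underneath.
    batch = np.asarray(payload["input"], dtype="float32")
    return {"output": predict(batch).tolist()}
```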
You see some scalability around it, trying to uncouple it, free it up. >> Absolutely. And sustainability and accessibility, to make it run faster and make it run on any device you want, by any developer. So, that's the tagline. >> Luis Ceze, thanks for coming on, Professor. >> Thank you. >> I didn't know you were a professor. That's great to have you on. It was a masterclass in DevOps agility for machine learning. Thanks for coming on. Appreciate it. >> Thank you very much. Thank you. >> Congratulations again. All right. OctoML here on theCUBE. Really important: uncoupling the machine learning from the hardware, specifically. That's only going to make space faster and safer, and more reliable. And that's what the whole theme of re:MARS is. Let's see how they fit in. I'm John for theCUBE. Thanks for watching. More coverage after this short break. >> Luis: Thank you. (gentle music)