(soft music)

>> Hello, everyone. Welcome to theCUBE's presentation of the AWS Startup Showcase. AI and Machine Learning: Top Startups Building Foundational Model Infrastructure. This is season 3, episode 1 of the ongoing series covering the exciting stuff from the AWS ecosystem, talking about machine learning and AI. I'm your host, John Furrier, and today we are excited to be joined by Luis Ceze, who's the CEO of OctoML, and Anna Connolly, VP of customer success and experience at OctoML. Great to have you on again, Luis. Anna, thanks for coming on. Appreciate it.

>> Thank you, John. It's great to be here.

>> Thanks for having us.

>> I love the company. We had a CUBE conversation about this. You guys are really addressing how to run foundational models faster for less. And this is like the key theme. But before we get into it, this is a hot trend, but let's explain what you guys do. Can you set the narrative of what the company's about, why it was founded, what's your North Star and your mission?

>> Yeah, so John, our mission is to make AI sustainable and accessible for everyone. And what we offer customers is, you know, a way of taking their models into production in the most efficient way possible by automating the process of getting a model, optimizing it for a variety of hardware, and making it cost-effective. So better, faster, cheaper model deployment.

>> You know, the big trend here is AI. Everyone's seeing ChatGPT, kind of the shot heard around the world. The Bing AI fiasco and the ongoing experimentation. People are into it, and I think the business impact is clear. I haven't seen this kind of inflection point in all of my career in the technology industry. And every senior leader I talk to is rethinking how to rebuild their business with AI, because now the large language models have come in, these foundational models are here, they can see value in their data. This is a 10-year journey in the big data world. Now it's impacting that, and everyone's rebuilding their company around this idea of being AI first, 'cause they see ways to eliminate things and make things more efficient. And so now they're telling 'em to go do it. And they're like, what do we do? So what do you guys think? Can you explain what is this wave of AI and why is it happening, why now, and what should people pay attention to? What does it mean to them?

>> Yeah, I mean, it's pretty clear by now that AI can do amazing things that capture people's imaginations. And also now can show things that are really impactful in businesses, right? So what people have the opportunity to do today is to either train their own model that adds value to their business or find open models out there that can do very valuable things for them. So the next step really is how do you take that model and put it into production in a cost-effective way so that the business can actually get value out of it, right?

>> Anna, what's your take? Because customers are there, you're there to make 'em successful, you got the new secret weapon for their business.

>> Yeah, I think we just see a lot of companies struggle to get from a trained model into a model that is deployed in a cost-effective way that actually makes sense for the application they're building. I think that's a huge challenge we see today, kind of across the board across all of our customers.

>> Well, I see this, everyone asking the same question. I have data, I want to get value out of it. I got to get these big models, I got to train them. What's it going to cost?
So I think there's a reality of, okay, I got to do it. Then no one has any visibility on what it costs. When they get into it, this is going to break the bank. So I have to ask you guys, the cost of training these models is on everyone's mind. OctoML, your company's focused on the cost side of it as well as the efficiency side of running these models in production. Why are the production costs such a concern, and where specifically are people looking at it, and why did it get here?

>> Yeah, so training costs get a lot of attention because it's normally a large number, but we shouldn't forget that it's typically a large, one-time, upfront cost that customers pay. But, you know, when the model is put into production, the cost grows directly with model usage, and you actually want your model to be used because it's adding value, right? So, you know, the question that a customer faces is, you know, they have a model, they have a trained model, and now what? So how much would it cost to run in production, right? And now with the big wave in generative AI, which rightfully is getting a lot of attention because of the amazing things that it can do, it's important for us to keep in mind that generative AI models like ChatGPT are huge, expensive energy hogs. They cost a lot to run, right? And given that model cost grows directly with usage, what you want to do is make sure that once you put a model into production, you have the best cost structure possible so that you're not surprised when it gets popular, right? So let me give you an example. If you have a model that costs, say, $1 to $2 million to train, but then it costs about one to two cents per session to use it, right? So if you have a million active users, even if they use it just once a day, it's 10 to $20,000 a day to operate that model in production. And that very, very quickly, you know, gets beyond what you paid to train it.

>> Anna, these aren't small numbers, and it's cost to train and cost to operate. It kind of reminds me of when the cloud came around and the data center versus cloud options. Like, wait a minute, one, it costs a ton of cash to deploy, and then running it. This is kind of a similar dynamic. What are you seeing?

>> Yeah, absolutely. I think we are going to see increasingly the cost in production outpacing the cost in training by a lot. I mean, people talk about training costs now because that's what they're confronting now, because people are so focused on getting models performant enough to even use in an application. And now that we have them and they're that capable, we're really going to start to see production costs go up a lot.

>> Yeah, Luis, if you don't mind, I know this might be a little bit of a tangent, but, you know, training's super important. I get that. That's what people are doing now, but then there's the deployment side of production. Where do people get caught up and miss the boat or misconfigure? What's the gotcha? Where's the trip wire, so to speak? Where do people mess up on the cost side? What do they do? Is it that they don't think about it, they tie it to proprietary hardware? What's the issue?

>> Yeah, several things, right? So without getting really technical, which, you know, I might get into, you know, you have to understand the relationship between performance, both in terms of latency and throughput, and cost, right? So reducing latency is important because you improve responsiveness of the model.
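(To make the production-cost math Luis walks through above concrete, here is a minimal back-of-the-envelope sketch. It simply restates the illustrative figures from the conversation — $1-2 million to train, one to two cents per session, a million sessions a day — and is not customer data; the function and variable names are ours.)

```python
# Back-of-the-envelope: when does serving cost overtake training cost?
# Figures are the illustrative numbers from the conversation, not real customer data.

def serving_cost_per_day(cost_per_session: float, sessions_per_day: float) -> float:
    """Daily cost to serve the model in production."""
    return cost_per_session * sessions_per_day

def days_until_serving_exceeds_training(training_cost: float,
                                        cost_per_session: float,
                                        sessions_per_day: float) -> float:
    """Days of production use before cumulative serving cost passes the one-time training cost."""
    return training_cost / serving_cost_per_day(cost_per_session, sessions_per_day)

if __name__ == "__main__":
    training_cost = 1_500_000        # "say $1 to $2 million to train" -- midpoint
    cost_per_session = 0.015         # "one to two cents per session" -- midpoint
    sessions_per_day = 1_000_000     # a million active users, once a day

    daily = serving_cost_per_day(cost_per_session, sessions_per_day)
    breakeven = days_until_serving_exceeds_training(training_cost, cost_per_session, sessions_per_day)

    print(f"Serving cost: ${daily:,.0f}/day")                              # ~$15,000/day, i.e. the $10-20K range
    print(f"Serving passes training cost after ~{breakeven:.0f} days")     # on the order of 100 days
```

At those rates the recurring serving bill overtakes the one-time training bill within a few months, which is the point Luis is making about production cost structure.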
But it's really important to keep in mind that reducing latency often leads to diminishing returns. Below a certain latency, making it faster won't make a measurable difference in experience, but it's going to cost a lot more. So understanding that is important. Now, if you care more about throughput, which is, you know, the number of units you process per period of time, you care about time to solution, and you should think about throughput per dollar. And understand that what you want is the highest throughput per dollar, which may come at the cost of higher latency, which you're not going to care about, right? And the reality here, John, is that, you know, humans, and especially folks in this space, want to have the latest and greatest hardware. And often they commit a lot of money to get access to them and have to commit upfront before they understand the needs that their models have, right? So a common mistake here, one, is not spending time to understand what you really need, and then two, over-committing and using more hardware than you actually need, and not giving yourself enough freedom to get your workload to move around to the more cost-effective choice, right? So this is just an illustrative example. And then another thing that's important here too is that making a model run faster on the hardware directly translates to lower cost, right? But it takes a lot of engineering — you need to think of ways of producing very efficient versions of your model for the target hardware that you're going to use.

>> Anna, what's the customer angle here? Because price performance has been around for a long time, people get that, but now latency and throughput, that's key because we're starting to see this in apps. I mean, there's an end-user piece. I'm even seeing it on the infrastructure side, where they're taking heavy lifting away from operational costs. So you got, you know, application-specific to the user and/or top of the stack, and then you got it actually being used in operations, where they want both.

>> Yeah, absolutely. Maybe I can illustrate this with a quick story with a customer that we had recently been working with. So this customer was planning to run kind of a transformer-based model for text generation at super high scale on Nvidia T4 GPUs, so kind of a commodity GPU. And the scale was so high that they would've been paying hundreds of thousands of dollars in cloud costs per year just to serve this model alone — you know, one of many models in their application stack. So we worked with this team to optimize their model and then benchmark across several possible targets — so that matching to hardware that Luis was just talking about — including the newer kind of Nvidia A10 GPUs. And what they found during this process was pretty interesting. First, the team was able to shave a quarter of their spend just by using better optimization techniques on the T4, the older hardware. But actually moving to a newer GPU would allow them to serve this model at sub-two-millisecond latency, so super fast, which was able to unlock an entirely new kind of user experience. So they were able to kind of change the value they're delivering in their application just because they were able to move to this new hardware easily. So they ultimately decided to plan their deployment on the more expensive A10 because of this, but because of the hardware-specific optimizations that we helped them with, they managed to even, you know, bring costs down from what they had originally planned.
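(In the spirit of the benchmarking exercise Anna describes, here is a rough sketch of how a team might compare hardware targets on cost to serve and throughput per dollar. The instance names, prices, and measured throughputs below are made-up placeholders, not the customer's actual numbers; real decisions would use measured benchmarks, which is exactly the "look before you leap" point made later in the conversation.)

```python
# Sketch: rank candidate hardware targets by cost to serve a fixed workload.
# All numbers are hypothetical placeholders; substitute real measured benchmarks.

from dataclasses import dataclass

@dataclass
class Target:
    name: str
    hourly_price_usd: float   # on-demand instance price
    measured_rps: float       # requests/sec the optimized model sustains on this target
    p95_latency_ms: float     # measured tail latency

def cost_per_million_requests(t: Target) -> float:
    """Dollars to serve one million requests at the measured throughput."""
    seconds = 1_000_000 / t.measured_rps
    return t.hourly_price_usd * seconds / 3600

def throughput_per_dollar(t: Target) -> float:
    """Requests served per dollar of instance time."""
    return t.measured_rps * 3600 / t.hourly_price_usd

# Hypothetical candidates: an older commodity GPU vs. a newer one.
candidates = [
    Target("older GPU instance (T4-class)", hourly_price_usd=0.53, measured_rps=180, p95_latency_ms=9.0),
    Target("newer GPU instance (A10-class)", hourly_price_usd=1.01, measured_rps=520, p95_latency_ms=1.8),
]

for t in sorted(candidates, key=cost_per_million_requests):
    print(f"{t.name:32s}  ${cost_per_million_requests(t):6.2f}/M req  "
          f"{throughput_per_dollar(t):>10,.0f} req/$  p95 {t.p95_latency_ms:.1f} ms")
```

The point of the exercise isn't the specific numbers but that the ranking can flip once you measure real throughput per dollar on each target — which is why a more expensive instance, as in Anna's story, can still win on both cost and latency.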
And so if you extend this kind of example to everything that's happening with generative AI, I think the story we just talked about was super relevant, but the scale can be even higher — you know, it can be tenfold that. We were recently conducting kind of this internal study using GPT-J as a proxy to illustrate the experience of a company trying to use one of these large language models, with an example scenario of creating a chatbot to help job seekers prepare for interviews. So if you imagine kind of a conservative usage scenario where the model generates just 3,000 words per user per day, which is, you know, pretty conservative for how people are interacting with these models, it costs 5 cents a session. And if you're a company and your app goes viral — so from, you know, nobody at the beginning of the year to a million daily active users at the end of the year, going from zero to a million in that year alone — you'll be spending about $6 million a year, which is pretty unmanageable. That's crazy, right?

>> Yeah.

>> For a company or a product that's just launching. So I think, you know, for us, we see the real way to make these kinds of advancements accessible and sustainable, as we said, is to bring down the cost to serve using these techniques.

>> That's a great story, and I think it illustrates this idea that deployment cost can vary from situation to situation, from model to model, and that the efficiency is so strong with this new wave — it eliminates heavy lifting, creates more efficiency, automates intellect. I mean, this is the trend, this is radical, this is going to increase. So the cost could go from nominal to millions, literally, potentially. So, this is what customers are doing. Yeah, that's a great story. What makes sense on a financial basis? Is there a cost of ownership? Is there a pattern for best practice for training? What do you guys advise? 'Cause there's a lot of time and money involved in all the potential, you know, good scenarios of upside. But you can get over your skis, as they say, and be successful and still be out of business if you don't manage it. I mean, that's what people are talking about, right?

>> Yeah, absolutely. I think, you know, we see kind of three main vectors to reduce cost. I think one is make your deployment process easier overall, so that your engineering effort to even get your app running goes down. Two would be get more from the compute you're already paying for — you're already paying, you know, for your instances in the cloud, but can you do more with that? And then three would be shop around for lower-cost hardware to match your use case. So on the first one, I think, making the deployment easier overall: there's a lot of manual work that goes into benchmarking, optimizing and packaging models for deployment. And because the performance of machine learning models can be really hardware dependent, you have to go through this process for each target you want to consider running your model on. And this is hard, you know, we see that every day. But for teams who want to incorporate some of these large language models into their applications, it might be desirable to do this themselves, because licensing a model from a large vendor like OpenAI can leave you, you know, over-provisioned, kind of paying for capabilities you don't need in your application, or can lock you into them and you lose flexibility. So we have a customer whose team actually prepares models for deployment in a SaaS application that many of us use every day.
And they told us recently that, without kind of an automated benchmarking and experimentation platform, they were spending several days each to benchmark a single model on a single hardware type. So this is really, you know, manually intensive. And then, getting more from the compute you're already paying for: we do see customers who leave money on the table by running models that haven't been optimized specifically for the hardware target they're using, like Luis was mentioning. And for some teams, they just don't have the time to go through an optimization process, and for others they might lack kind of specialized expertise, and this is something we can bring. And then on shopping around for different hardware types, we really see a huge variation in model performance across hardware — not just CPU vs. GPU, which is, you know, what people normally think of, but across CPU vendors themselves, high-memory instances, and even across cloud providers. So the best strategy here is for teams to really be able to, as we say, look before you leap, by running real-world benchmarking — and not just simulations or predictions — to find the best software-hardware combination for their workload.

>> Yeah. You guys sound like you have a very impressive customer base deploying large language models. Where would you categorize your current customer base? And as you look out, as you guys are growing, you have new customers coming in, take me through the progression. Take me through the profile of some of your customers you have now — size, are they hyperscalers, are they big app folks, are they kicking the tires? And then as people are out there scratching their heads — I got to get in this game — what's their psychology like? Are they coming in with specific problems, or do they have a specific orientation or point of view about what they want to do? Can you share some data around what you're seeing?

>> Yeah, I think, you know, we have customers that kind of range across the spectrum of sophistication, from teams that basically don't have MLOps expertise in their company at all — and so they're really looking for us to kind of give full service: how should I do everything, from, you know, optimization, to finding the hardware, to preparing for deployment. And then we have teams that, you know, maybe already have their serving and hosting infrastructure up and ready, and they already have models in production, and they're really just looking to, you know, take the extra juice out of the hardware and focus really specifically on that optimization piece. I think one place where we're doing a lot more work now is kind of in the developer tooling, you know, model selection space. And that's kind of an area that we're creating more tools for, particularly within the PyTorch ecosystem, to bring kind of this power earlier in the development cycle, so that as people are grabbing a model off the shelf, they can, you know, see how it might perform and use that to inform their development process.

>> Luis, what's the big — I like this idea of picking the models, because isn't that like going to the market and picking the best model for your data? It's like, you know, aren't there certain approaches? What's your view on this? 'Cause this is where everyone — I think it's going to be a land rush for this, and I want to get your thoughts.

>> For sure, yeah.
So, you know, I guess I'll start with saying the one main takeaway that we got from the GPT-J study is that, you know, having a clear understanding of what your model's compute and memory requirements are, very quickly, early on, helps with much smarter AI model deployments, right? And in fact, you know, Anna just touched on this, but I want to, you know, make sure that it's clear that OctoML is putting that power into users' hands right now. So in partnership with AWS, we are launching this new PyTorch-native profiler that, with a single, you know, one-line code decorator, allows you to see how your code runs on a variety of different hardware after acceleration. So it gives you very clear, you know, data on how you should think about your model deployments. And this ties back to choices of models. So like, if you have a set of model choices that are equally good in terms of functionality, and you want to understand, after acceleration, how you're going to deploy them, how much they're going to cost, or what the options are — using an automated process to make that decision is really, really useful. And in fact, I think folks at this event can get early access to this by signing up for the Octopod — you know, this is an exclusive group for insiders here — so you can go to OctoML.ai/pods to sign up.

>> So that Octopod, is that a program? What is that — is that access to code? Is that a beta? What is that? Explain, take a minute and explain the Octopod.

>> I think the Octopod would be a group of people who are interested in experiencing this functionality. So it is the friends and users of OctoML that would be the Octopod. And then yes, after you sign up, we would provide you essentially the tool in code form for you to try out on your own. I mean, part of the benefit of this is that it happens in your own local environment, and you're in control of everything, kind of within the workflow that developers are already using to create and begin putting these models into their applications. So it would all be within your control.

>> Got it. I think the big question I have for you is, when does one of your customers know they need to call you? What's their environment look like? What are they struggling with? What are the conversations they might be having on their side of the fence? If anyone's watching this, they're like, "Hey, you know what, I've got my team, we have a lot of data. Do we have our own language model, or do I use someone else's?" There's a lot of, I will say, discovery going on around what to do, what path to take. What does that customer look like? If someone's listening, when do they know to call you guys, OctoML?

>> Well, I mean, the most obvious one is that you have a significant spend on AI/ML — come and talk to us, you know, about putting AI/ML into production. So that's the clear one. In fact, just this morning I was talking to someone who is in the life sciences space and has, you know, 15 to $20 million a year in cloud costs related to AI/ML deployment — that's a pretty clear match right there, right? So that's on the cost side. But I also want to emphasize something that Anna said earlier: that, you know, the hardware and software complexity involved in putting a model into production is really high. So we've been able to abstract that away. Offering a clean automation flow enables, one, experimenting early on with, you know, how models would run, and getting them to production.
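(The decorator-based workflow Luis describes might look something like the sketch below. This is purely illustrative: the decorator, target names, and output here are hypothetical placeholders we wrote to show the shape of the idea, not the actual OctoML profiler API — the toy helper only times the call locally, whereas the real tool, per the conversation, reports how the code runs on a variety of hardware after acceleration. Access to the actual tool is via the Octopod sign-up.)

```python
# Toy stand-in for a one-line, decorator-driven profiling workflow.
# "profile_on" is a made-up local helper, not the OctoML profiler API.

import time
from functools import wraps

import torch
import torch.nn as nn

def profile_on(*targets):
    """Hypothetical decorator: record latency for each named target.
    In this sketch every 'target' is just simulated on the local device."""
    def decorator(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            result = None
            for target in targets:
                start = time.perf_counter()
                result = fn(*args, **kwargs)
                elapsed_ms = (time.perf_counter() - start) * 1000
                print(f"[profile] {fn.__name__} on {target}: {elapsed_ms:.2f} ms (simulated locally)")
            return result
        return wrapper
    return decorator

model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 10))

@profile_on("cpu-target", "gpu-t4-target", "gpu-a10-target")   # hypothetical target names
def predict(batch: torch.Tensor) -> torch.Tensor:
    with torch.no_grad():
        return model(batch)

if __name__ == "__main__":
    predict(torch.randn(32, 512))
```

The value Luis points to is that this kind of per-target latency and cost data arrives before you commit to an instance type, rather than after the bill does.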
And then the second thing is that once models are in production, it gives you an automated flow for continuously updating your model and taking advantage of all this acceleration and the ability to run the model on the right hardware. So anyways, let's say one, then, is cost — you know, you have significant cost — and then two, you have automation needs. And Anna, please complement that.

>> Yeah, Anna, you can please-

>> Yeah, I think that's exactly right. Maybe the other time is when you are expecting a big scale-up in serving your application, right? You're launching a new feature, you expect to get a lot of usage, and you want to kind of anticipate that maybe your CTO, your CIO, whoever pays your cloud bills, is going to come after you, right? And so they want to know, you know, what's the return on putting this model essentially into my application stack? Is the usage going to match what I'm paying for it? And then you can understand that.

>> So you guys have a lot of the early adopters — they got big data teams, they're pushing into production, they want to get a little QA, test the waters, understand, use your technology to figure it out. Is there any case where people have gone into production and they have to pull it out? It's like the old lemon laws with your car: you buy a car and, oh my god, it's not the way I wanted it. I mean, I can imagine the early people through the wall, so to speak, in the wave here are going to be bloody, in the sense that they've gone in and tried stuff and get stuck with huge bills. Are you seeing that? Are people pulling stuff out of production and redeploying? Or, I can imagine that if I had a bad deployment, I'd want to refactor that or actually replatform that. Do you see that too?

>> Definitely. After a sticker shock, yes, customers will come and make sure that, you know, the sticker shock won't happen again.

>> Yeah.

>> But then there's another, more thorough aspect here that I think we lightly touched on, and it's worth elaborating a bit more: just how are you going to scale in a way that's feasible, depending on the allocation that you get, right? So as we mentioned several times here, you know, model deployment is so hardware dependent and so complex that you tend to get a model for a hardware choice, and then you want to scale that specific type of instance. But what if, when you want to scale — because suddenly you luckily got popular and, you know, you want to scale it up — you don't have that instance anymore? So how do you live with whatever you have at that moment is something that we see customers needing as well. You know, so in fact, ideally what we want is customers to not think about what kind of specific instances they want. What they want is to know what their models need. Say they know the SLA, and then find a set of hardware targets and instances that hit the SLA, so that when they're scaling, they're going to scale with more freedom, right? Instead of having to wait for AWS to give them more allocation for a specific instance, what if you could live with other types of hardware and scale up in a more free way, right? So that's another thing that we see customers, you know — they need more freedom to be able to scale with whatever is available.

>> Anna, you touched on this with the business model impact of that $6 million cost — if that goes out of control, there's a business model aspect, and there's a technical operation aspect to the cost side too. You want to be mindful of riding the wave in a good way, but not getting over your skis.
So that brings up the point around, you know, confidence, right? And teamwork. Because if you're in production, there's probably a team behind it. Talk about the team aspect of your customers. I mean, they're dedicated, they go put stuff into production, they're developers, they're data folks. What's in it for them? Are they getting better? Are they on the beach, you know, reading a book? Are they, you know, is it easy street for them? What's the customer benefit to the teams?

>> Yeah, absolutely. With just a few clicks of a button, you're in production, right? That's the dream. So yeah, I mean, I think that, you know, we illustrated it before a little bit. I think the automated kind of benchmarking and optimization process — like, when you think about the effort it takes to get that data by hand, which is what people are doing today, they just don't do it. So they're making decisions without the best information, because, you know, there just isn't the bandwidth to get the information that they need to make the best decision and then know exactly how to deploy it. So I think it's actually bringing kind of a new insight and capability to these teams that they didn't have before. And then maybe another aspect on the team side is that it's making the hand-off of the models from the data science teams to the model deployment teams more seamless. So we have, you know, we have seen in the past that this kind of transition point is the place where there are a lot of hiccups, right? The data science team will give a model to the production team, and it'll be too slow for the application or it'll be too expensive to run, and it has to go back and be changed, and you get kind of this loop. And so, you know, with the PyTorch profiler that Luis was talking about, and then also, you know, the other ways we do optimization, that kind of prevents that hand-off problem from happening.

>> Luis and Anna, you guys have a great company. Final couple minutes left. Talk about the company, the people there. What's the culture like? You know, if Intel has Moore's law, which is, you know, doubling the performance every couple of years, what's the culture like there? Is it, you know, more throughput, better pricing? Explain what's going on with the company and put a plug in. Luis, we'll start with you.

>> Yeah, absolutely. I'm extremely proud of the team that we built here. You know, we have a people-first culture — you know, very, very collaborative — and folks, we all have a shared mission here of making AI more accessible and sustainable. We have a very diverse team in terms of backgrounds and life stories. You know, to do what we do here, we need a team that has expertise in software engineering, in machine learning, in computer architecture. Even though we don't build chips, we need to understand how they work, right? And then, you know, the fact that we have this really, really varied set of backgrounds makes the environment, you know, let's say, very exciting — to learn more about, you know, systems end to end. But it also makes for a very interesting, you know, work environment, right? So people have different backgrounds, different stories. Some of them went to grad school; others, you know, were in intelligence agencies and now are working here, you know. So we have a really interesting set of people, and, you know, life is too short not to work with interesting humans. You know, that's something that I like to think about, you know.
>> I'm sure your off-site meetings are a lot of fun — people talking about computer architectures, silicon advances, the next GPU, the big data models coming in. Anna, what's your take? What's the culture like? What's the company vibe, and what are you guys looking to do? What's the customer success pattern? What's up?

>> Yeah, absolutely. I mean, I, you know, second all of the great things that Luis just said about the team. I think an additional one that I'd really like to underscore is kind of this customer obsession, to use a term you all know well, and focus on the end users — really making the experiences that we're bringing to our users, who are developers, really, you know, useful and valuable for them. And so I think, you know, for all of these tools that we're trying to put in the hands of users — the industry and the market are changing so rapidly that our products across the board, you know, all of the companies that, you know, are part of the showcase today, we're all evolving them so quickly, and we can only do that kind of really hand in glove with our users. So that would be another thing I'd emphasize.

>> I think the change dynamic, the power dynamics of this industry — this is just the beginning. I'm very bullish that this is going to be probably one of the biggest inflection points in the history of the computer industry, because of all the dynamics of the confluence of all the forces — and you mentioned some of them. I mean, the PC, you know, interoperability, internetworking, and you got, you know, the web and then mobile. Now we have this. I mean, I wouldn't even put social media even close to this. Like, this changes user experience, changes infrastructure. There's going to be massive accelerations in performance on the hardware side from the AWSes of the world and cloud, and you got the edge and more data. This is really what big data was going to look like. This is the beginning. Final question: what do you guys see going forward in the future?

>> Well, it's undeniable that machine learning and AI models are becoming an integral part of any interesting application today, right? And the clear trends here are, one, you know, more and more computational needs for these models, because they're only getting more and more powerful. And then two, you know, the complexity of the infrastructure where they run — you know, just considering the cloud, there's a wide variety of choices there, right? So being able to live with that and make the most out of it in a way that does not require, you know, an impossible-to-find team is something that's pretty clear. So the need for automation, abstracting away the complexity, is definitely here. And we are seeing, you know, trends where models are also starting to move to the edge as well. So it's clear that we are going to live in a world where there are large models living in the cloud and then, you know, edge models that talk to these models in the cloud to form, you know, an end-to-end, truly intelligent application.

>> Anna?

>> Yeah, I think, you know — Luis said it at the beginning — our vision is to make AI sustainable and accessible. And I think as this technology just expands into every company and every team, that's going to happen kind of on its own, and we're here to help support that. And I think you can't do that without tools like those from OctoML.
>> I think it's going to be an era of massive invention, creativity — a lot of the heavy lifting is going to be eliminated, which is going to allow the talented people to automate their intellect. I mean, this is really kind of what we see going on. And Luis, thank you so much. Anna, thanks for coming on this segment. Thanks for coming on theCUBE and being part of the AWS Startup Showcase. I'm John Furrier, your host. Thanks for watching. (upbeat music)