Jay Marshall, Neural Magic | AWS Startup Showcase S3E1

(upbeat music) >> Hello, everyone, and welcome to theCUBE's presentation of the "AWS Startup Showcase." This is season three, episode one. The focus of this episode is AI/ML: Top Startups Building Foundational Models, Infrastructure, and AI. It's great topics, super-relevant, and it's part of our ongoing coverage of startups in the AWS ecosystem. I'm your host, John Furrier, with theCUBE. Today, we're excited to be joined by Jay Marshall, VP of Business Development at Neural Magic. Jay, thanks for coming on theCUBE. >> Hey, John, thanks so much. Thanks for having us. >> We had a great CUBE conversation with you guys. This is very much about the company focuses. It's a feature presentation for the "Startup Showcase," and the machine learning at scale is the topic, but in general, it's more, (laughs) and we should call it "Machine Learning and AI: How to Get Started," because everybody is retooling their business. Companies that aren't retooling their business right now with AI first will be out of business, in my opinion. You're seeing massive shift. This is really truly the beginning of the next-gen machine learning AI trend. It's really seeing ChatGPT. Everyone sees that. That went mainstream. But this is just the beginning. This is scratching the surface of this next-generation AI with machine learning powering it, and with all the goodness of cloud, cloud scale, and how horizontally scalable it is. The resources are there. You got the Edge. Everything's perfect for AI 'cause data infrastructure's exploding in value. AI is just the applications. This is a super topic, so what do you guys see in this general area of opportunities right now in the headlines? And I'm sure you guys' phone must be ringing off the hook, metaphorically speaking, or emails and meetings and Zooms. What's going on over there at Neural Magic? >> No, absolutely, and you pretty much nailed most of it. I think that, you know, my background, we've seen for the last 20-plus years. 
Even just getting enterprise applications kind of built and delivered at scale, obviously, amazing things with AWS and the cloud to help accelerate that. And we just kind of figured out in the last five or so years how to do that productively and efficiently, kind of from an operations perspective. Got development and operations teams. We even came up with DevOps, right? But now, we kind of have this new kind of persona and new workload that developers have to talk to, and then it has to be deployed on those ITOps solutions. And so you pretty much nailed it. Folks are saying, "Well, how do I do this?" These big, generational models or foundational models, as we're calling them, they're great, but enterprises want to do that with their data, on their infrastructure, at scale, at the edge. So for us, yeah, we're helping enterprises accelerate that through optimizing models and then delivering them at scale in a more cost-effective fashion. >> Yeah, and I think one of the things, the benefits of OpenAI we saw, was not only is it open source, then you got also other models that are more proprietary, is that it shows the world that this is really happening, right? It's a whole nother level, and there's also new landscape kind of maps coming out. You got the generative AI, and you got the foundational models, large LLMs. Where do you guys fit into the landscape? Because you guys are in the middle of this. How do you talk to customers when they say, "I'm going down this road. I need help. I'm going to stand this up." This new AI infrastructure and applications, where do you guys fit in the landscape? >> Right, and really, the answer is both. I think today, when it comes to a lot of what for some folks would still be considered kind of cutting edge around computer vision and natural language processing, a lot of our optimization tools and our runtime are based around most of the common computer vision and natural language processing models. 
So your YOLOs, your BERTs, you know, your DistilBERTs and what have you, so we work to help optimize those, again, we've gotten great performance and great value for customers trying to get those into production. But when you get into the LLMs, and you mentioned some of the open source components there, our research teams have kind of been right in the trenches with those. So kind of the GPT open source equivalent being OPT, being able to actually take, you know, a multi-hundred-billion parameter model and sparsify that or optimize that down, shaving away a ton of parameters, and being able to run it on smaller infrastructure. So I think the evolution here, you know, all this stuff came out in the last six months in terms of being turned loose into the wild, but we're staying in the trenches with folks so that we can help optimize those as well and not require, again, the heavy compute, the heavy cost, the heavy power consumption as those models evolve as well. So we're staying right in with everybody while they're being built, but trying to get folks into production today with things that help with business value today. >> Jay, I really appreciate you coming on theCUBE, and before we came on camera, you said you just were on a customer call. I know you got a lot of activity. What specific things are you helping enterprises solve? What kind of problems? Take us through the spectrum from the beginning, people jumping in the deep end of the pool, some people kind of coming in, starting out slow. What's the scale? Can you scope the kind of use cases and problems that are emerging that people are calling you for? >> Absolutely, so I think if I break it down to kind of, like, your startup, or I maybe call 'em AI native to kind of steal from cloud native years ago, that group, it's pretty much, you know, part and parcel for how that group already runs. 
So if you have a data science team and an ML engineering team, you're building models, you're training models, you're deploying models. You're seeing firsthand the expense of starting to try to do that at scale. So it's really just a pure operational efficiency play. They kind of speak natively to our tools, which we're doing in the open source. So it's really helping, again, with the optimization of the models they've built, and then, again, giving them an alternative to expensive proprietary hardware accelerators to have to run them. Now, on the enterprise side, it varies, right? You have some kind of AI native folks there that already have these teams, but you also have kind of, like, AI curious, right? Like, they want to do it, but they don't really know where to start, and so for there, we actually have an open source toolkit that can help you get into this optimization, and then again, that runtime, that inferencing runtime, purpose-built for CPUs. It allows you to not have to worry, again, about do I have a hardware accelerator available? How do I integrate that into my application stack? If I don't already know how to build this into my infrastructure, does my ITOps teams, do they know how to do this, and what does that runway look like? How do I cost for this? How do I plan for this? When it's just x86 compute, we've been doing that for a while, right? So it obviously still requires more, but at least it's a little bit more predictable. >> It's funny you mentioned AI native. You know, born in the cloud was a phrase that was out there. Now, you have startups that are born in AI companies. So I think you have this kind of cloud kind of vibe going on. You have lift and shift was a big discussion. Then you had cloud native, kind of in the cloud, kind of making it all work. Is there a existing set of things? People will throw on this hat, and then what's the difference between AI native and kind of providing it to existing stuff? 
'Cause we see a lot of people take some of these tools and apply it to either existing stuff almost, and it's not really a lift and shift, but it's kind of like bolting on AI to something else, and then starting with AI first or native AI. >> Absolutely. It's a- >> How would you- >> It's a great question. I think that probably, where I'd probably pull back to kind of a lot of kind of retail-type scenarios where, you know, for five, seven, nine years or more even, a lot of these folks already have data science teams, you know? I mean, they've been doing this for quite some time. The difference is the introduction of these neural networks and deep learning, right? Those kinds of models are just a little bit of a paradigm shift. So, you know, I obviously was trying to be fun with the term AI native, but I think it's more folks that kind of came up in that neural network world, so it's a little bit more second nature, whereas I think for maybe some traditional data scientists starting to get into neural networks, you have the complexity there and the training overhead, and a lot of the aspects of getting a model finely tuned and hyperparameterization and all of these aspects of it. It just adds a layer of complexity that they're just not as used to dealing with. And so our goal is to help make that easy, and then of course, make it easier to run anywhere that you have just kind of standard infrastructure. >> Well, the other point I'd bring out, and I'd love to get your reaction to, is not only is that a neural network team, people who have been focused on that, but also, if you look at some of the DataOps lately, AIOps markets, a lot of data engineering, a lot of scale, folks who have been kind of, like, in that data tsunami cloud world are seeing, they've kind of been in this, right? They've, like, been experiencing that. >> No doubt. I think it's funny the data lake concept, right? And you got data oceans now. 
Like, the metaphors just keep growing on us, but where it is valuable in terms of trying to shift the mindset, I've always kind of been a fan of some of the naming shift. I know with AWS, they always talk about purpose-built databases. And I always liked that because, you know, you don't have one database that can do everything. Even ones that say they can, like, you still have to do implementation detail differences. So sitting back and saying, "What is my use case, and then which database will I use it for?" I think it's kind of similar here. And when you're building those data teams, if you don't have folks that are doing data engineering, kind of that data harvesting, pre-processing, you got to do all that before a model's even going to care about it. So yeah, it's definitely a central piece of this as well, and again, whether or not you're going to be AI native as you're making your way to kind of, you know, on that journey, you know, data's definitely a huge component of it. >> Yeah, you would have loved our Supercloud event we had. Talk about naming and, you know, around data meshes was talked about a lot. You're starting to see the control plane layers of data. I think that was the beginning of what I saw as that data infrastructure shift, to be horizontally scalable. So I have to ask you, with Neural Magic, when your customers and the people that are prospects for you guys, they're probably asking a lot of questions because I think the general thing that we see is, "How do I get started? Which GPU do I use?" I mean, there's a lot of things that are kind of, I won't say technical or targeted towards people who are living in that world, but, like, as the mainstream enterprises come in, they're going to need a playbook. What do you guys see, what do you guys offer your clients when they come in, and what do you recommend? >> Absolutely, and I think where we hook in specifically tends to be on the training side. So again, I've built a model. 
Now, I want to really optimize that model. And then on the runtime side when you want to deploy it, you know, we run that optimized model. And so that's where we're able to provide. We even have a labs offering in terms of being able to pair up our engineering teams with a customer's engineering teams, and we can actually help with most of that pipeline. So even if it is something where you have a dataset and you want some help in picking a model, you want some help training it, you want some help deploying that, we can actually help there as well. You know, there's also a great partner ecosystem out there, like a lot of folks even in the "Startup Showcase" here, that extend beyond into kind of your earlier comment around data engineering or downstream ITOps or the all-up MLOps umbrella. So we can absolutely engage with our labs, and then, of course, you know, again, partners, which are always kind of key to this. So you are spot on. I think what's happened with the kind of this, they talk about a hockey stick. This is almost like a flat wall now with the rate of innovation right now in this space. And so we do have a lot of folks wanting to go straight from curious to native. And so that's definitely where the partner ecosystem comes in so hard 'cause there just isn't anybody or any teams out there that, I literally do from, "Here's my blank database, and I want an API that does all the stuff," right? Like, that's a big chunk, but we can definitely help with the model to delivery piece. >> Well, you guys are obviously a featured company in this space. Talk about the expertise. A lot of companies are like, I won't say faking it till they make it. You can't really fake security. You can't really fake AI, right? So there's going to be a learning curve. They'll be a few startups who'll come out of the gate early. You guys are one of 'em. Talk about what you guys have as expertise as a company, why you're successful, and what problems do you solve for customers? 
>> No, appreciate that. Yeah, we actually, we love to tell the story of our founder, Nir Shavit. So he's a 20-year professor at MIT. Actually, he was doing a lot of work on kind of multicore processing before there were even physical multicores, and actually even did a stint in computational neurobiology in the 2010s, and the impetus for this whole technology, has a great talk on YouTube about it, where he talks about the fact that his work there, he kind of realized that the way neural networks encode and how they're executed by kind of ramming data layer by layer through these kind of HPC-style platforms, actually was not analogous to how the human brain actually works. So we're on one side, we're building neural networks, and we're trying to emulate neurons. We're not really executing them that way. So our team, which one of the co-founders, also an ex-MIT, that was kind of the birth of why can't we leverage this super-performance CPU platform, which has those really fat, fast caches attached to each core, and actually start to find a way to break that model down in a way that I can execute things in parallel, not having to do them sequentially? So it is a lot of amazing, like, talks and stuff that show kind of the magic, if you will, pardon the pun of Neural Magic, but that's kind of the foundational layer of all the engineering that we do here. And in terms of how we're able to bring it to reality for customers, I'll give one customer quote where it's a large retailer, and it's a people-counting application. So a very common application. And that customer's actually been able to show literally double the amount of cameras being run with the same amount of compute. So for a one-to-one perspective, two-to-one, business leaders usually like that math, right? 
So we're able to show pure cost savings, but even performance-wise, you know, we have some of the common models like your ResNets and your YOLOs, where we can actually even perform better than hardware-accelerated solutions. So we're trying to do, I need to just dumb it down to better, faster, cheaper, but from a commodity perspective, that's where we're accelerating. >> That's not a bad business model. Make things easier to use, faster, and reduce the steps it takes to do stuff. So, you know, that's always going to be a good market. Now, you guys have DeepSparse, which we've talked about on our CUBE conversation prior to this interview, delivers ML models through the software so the hardware allows for a decoupling, right? >> Yep. >> Which is going to drive probably a cost advantage. Also, it's also probably from a deployment standpoint it must be easier. Can you share the benefits? Is it a cost side? Is it more of a deployment? What are the benefits of the DeepSparse when you guys decouple the software from the hardware on the ML models? >> No you actually, you hit 'em both 'cause that really is primarily the value. Because ultimately, again, we're so early. And I came from this world in a prior life where I'm doing Java development, WebSphere, WebLogic, Tomcat open source, right? When we were trying to do innovation, we had innovation buckets, 'cause everybody wanted to be on the web and have their app and a browser, right? We got all the money we needed to build something and show, hey, look at the thing on the web, right? But when you had to get in production, that was the challenge. So to what you're speaking to here, in this situation, we're able to show we're just a Python package. So whether you just install it on the operating system itself, or we also have a containerized version you can drop on any container orchestration platform, so ECS or EKS on AWS. And so you get all the auto-scaling features. 
So when you think about that kind of a world where you have everything from real-time inferencing to kind of after hours batch processing inferencing, the fact that you can auto scale that hardware up and down and it's CPU based, so you're paying by the minute instead of maybe paying by the hour at a lower cost shelf, it does everything from pure cost to, again, I can have my standard IT team say, "Hey, here's the Kubernetes in the container," and it just runs on the infrastructure we're already managing. So yeah, operational, cost and again, and many times even performance. (audio warbles) CPUs if I want to. >> Yeah, so that's easier on the deployment too. And you don't have this kind of, you know, blank check kind of situation where you don't know what's on the backend on the cost side. >> Exactly. >> And you control the actual hardware and you can manage that supply chain. >> And keep in mind, exactly. Because the other thing that sometimes gets lost in the conversation, depending on where a customer is, some of these workloads, like, you know, you and I remember a world where even like the roundtrip to the cloud and back was a problem for folks, right? We're used to extremely low latency. And some of these workloads absolutely also adhere to that. But there's some workloads where the latency isn't as important. And we actually even provide the tuning. Now, if we're giving you five milliseconds of latency and you don't need that, you can tune that back. So less CPU, lower cost. Now, throughput and other things come into play. But that's the kind of configurability and flexibility we give for operations. >> All right, so why should I call you if I'm a customer or prospect Neural Magic, what problem do I have or when do I know I need you guys? When do I call you in and what does my environment look like? When do I know? What are some of the signals that would tell me that I need Neural Magic? >> No, absolutely. 
So I think in general, any neural network, you know, the process I mentioned before called sparsification, it's, you know, an optimization process that we specialize in. Any neural network, you know, can be sparsified. So I think if it's a deep-learning neural network type model. If you're trying to get AI into production, you have cost concerns even performance-wise. I certainly hate to be too generic and say, "Hey, we'll talk to everybody." But really in this world right now, if it's a neural network, it's something where you're trying to get into production, you know, we are definitely offering, you know, kind of an at-scale performant deployable solution for deep learning models. >> So neural network you would define as what? Just devices that are connected that need to know about each other? What's the state-of-the-art current definition of neural network for customers that may think they have a neural network or might not know they have a neural network architecture? What is that definition for neural network? >> That's a great question. So basically, machine learning models that fall under this kind of category, you hear about transformers a lot, or I mentioned about YOLO, the YOLO family of computer vision models, or natural language processing models like BERT. If you have a data science team or even developers, some even regular, I used to call myself a nine to five developer 'cause I worked in the enterprise, right? So like, hey, we found a new open source framework, you know, I used to use Spring back in the day and I had to go figure it out. There's developers that are pulling these models down and they're figuring out how to get 'em into production, okay? So I think all of those kinds of situations, you know, if it's a machine learning model of the deep learning variety that's, you know, really specifically where we shine. >> Okay, so let me pretend I'm a customer for a minute. 
I have all these videos, like all these transcripts, I have all these people that we've interviewed, CUBE alumni, and I say to my team, "Let's AI-ify, sparsify theCUBE." >> Yep. >> What do I do? I mean, do I just like, my developers got to get involved and they're going to be like, "Well, how do I upload it to the cloud? Do I use a GPU?" So there's a thought process. And I think a lot of companies are going through that example of let's get on this AI, how can it help our business? >> Absolutely. >> What does that progression look like? Take me through that example. I mean, I made theCUBE example up, but we do have a lot of data. We have large data models and we have people and connect to the internet and so we kind of seem like there's a neural network. I think every company might have a neural network in place. >> Well, and I was going to say, I think in general, you all probably do represent even the standard enterprise more than most. 'Cause even the enterprise is going to have a ton of video content, a ton of text content. So I think it's a great example. So I think that that kind of sea or I'll even go ahead and use that term data lake again, of data that you have, you're probably going to want to be setting up kind of machine learning pipelines that are going to be doing all of the pre-processing from kind of the raw data to kind of prepare it into the format that say a YOLO would actually use or let's say BERT for natural language processing. So you have all these transcripts, right? So we would do a pre-processing path where we would create that into the file format that BERT, the machine learning model would know how to train off of. So that's kind of all the pre-processing steps. And then for training itself, we actually enable what's called sparse transfer learning. So transfer learning is a very popular method of doing training with existing models. 
So we would be able to retrain that BERT model with your transcript data that we have now done the pre-processing with to get it into the proper format. And now we have a BERT natural language processing model that's been trained on your data. And now we can deploy that onto DeepSparse runtime so that now you can ask that model whatever questions, or I should say pass, you're not going to ask it those kinds of questions ChatGPT, although we can do that too. But you're going to pass text through the BERT model and it's going to give you answers back. It could be things like sentiment analysis or text classification. You just call the model, and now when you pass text through it, you get the answers better, faster or cheaper. I'll use that reference again. >> Okay, we can create a CUBE bot to give us questions on the fly from the AI bot, you know, from our previous guests. >> Well, and I will tell you using that as an example. So I had mentioned OPT before, kind of the open source version of ChatGPT. So, you know, typically that requires multiple GPUs to run. So our research team, I may have mentioned earlier, we've been able to sparsify that over 50% already and run it on only a single GPU. And so in that situation, you could train OPT with that corpus of data and do exactly what you say. Actually we could use Alexa, we could use Alexa to actually respond back with voice. How about that? We'll do an API call and we'll actually have an interactive Alexa-enabled bot. >> Okay, we're going to be a customer, let's put it on the list. But this is a great example of what you guys call software delivered AI, a topic we chatted about on theCUBE conversation. This really means this is a developer opportunity. This really is the convergence of the data growth, the restructuring, how data is going to be horizontally scalable, meets developers. So this is an AI developer model going on right now, which is kind of unique. >> It is, John, I will tell you what's interesting. 
And again, folks don't always think of it this way, you know, the AI magical goodness is now getting pushed in the middle where the developers and IT are operating. And so it again, that paradigm, although for some folks seem obvious, again, if you've been around for 20 years, that whole all that plumbing is a thing, right? And so what we basically help with is when you deploy the DeepSparse runtime, we have a very rich API footprint. And so the developers can call the API, ITOps can run it, or to your point, it's developer friendly enough that you could actually deploy our off-the-shelf models. We have something called the SparseZoo where we actually publish pre-optimized or pre-sparsified models. And so developers could literally grab those right off the shelf with the training they've already had and just put 'em right into their applications and deploy them as containers. So yeah, we enable that for sure as well. >> It's interesting, DevOps was infrastructure as code and we had a last season, a series on data as code, which we kind of coined. This is data as code. This is a whole nother level of opportunity where developers just want to have programmable data and apps with AI. This is a whole new- >> Absolutely. >> Well, absolutely great, great stuff. Our news team at SiliconANGLE and theCUBE said you guys had a little bit of a launch announcement you wanted to make here on the "AWS Startup Showcase." So Jay, you have something that you want to launch here? >> Yes, and thank you John for teeing me up. So I'm going to try to put this in like, you know, the vein of like an AWS, like main stage keynote launch, okay? So we're going to try this out. So, you know, a lot of our product has obviously been built on top of x86. I've been sharing that the past 15 minutes or so. And with that, you know, we're seeing a lot of acceleration for folks wanting to run on commodity infrastructure. 
But we've had customers and prospects and partners tell us that, you know, ARM and all of its kind of variants are very compelling, both cost performance-wise and also obviously with Edge. And wanted to know if there was anything we could do from a runtime perspective with ARM. And so we got to work and, you know, it's a hard problem to solve 'cause the instruction set for ARM is very different than the instruction set for x86, and our deep tensor column technology has to be able to work with that lower level instruction spec. But working really hard, the engineering team's been at it and we are happy to announce here at the "AWS Startup Showcase," that DeepSparse inference now has, or inference runtime now has support for AWS Graviton instances. So it's no longer just x86, it is also ARM and that obviously also opens up the door to Edge and further out the stack so that optimize once, run anywhere, we're now going to open up. So it is an early access. So if you go to neuralmagic.com/graviton, you can sign up for early access, but we're excited to now get into the ARM side of the fence as well on top of Graviton. >> That's awesome. Our news team is going to jump on that news. We'll get it right up. We get a little scoop here on the "Startup Showcase." Jay Marshall, great job. That really highlights the flexibility that you guys have when you decouple the software from the hardware. And again, we're seeing open source driving a lot more in AI ops now with machine learning and AI. So to me, that makes a lot of sense. And congratulations on that announcement. Final minute or so we have left, give a summary of what you guys are all about. Put a plug in for the company, what you guys are looking to do. I'm sure you're probably hiring like crazy. Take the last few minutes to give a plug for the company and give a summary. >> No, I appreciate that so much. 
So yeah, join us out at neuralmagic.com, you know, part of what we didn't spend a lot of time on here, our optimization tools, we are doing all of that in the open source. It's called SparseML and I mentioned SparseZoo briefly. So we really want the data scientist community and ML engineering community to join us out there. And again, the DeepSparse runtime, it's actually free to use for trial purposes and for personal use. So you can actually run all this on your own laptop or on an AWS instance of your choice. We are now live in the AWS marketplace. So push button, deploy, come try us out and reach out to us on neuralmagic.com. And again, sign up for the Graviton early access. >> All right, Jay Marshall, Vice President of Business Development at Neural Magic here, talking about performant, cost effective machine learning at scale. This is season three, episode one, focusing on foundational models as far as building data infrastructure and AI, AI native. I'm John Furrier with theCUBE. Thanks for watching. (bright upbeat music)
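
(Editor's sketch, not part of the interview: Jay describes sparsification as shaving parameters away from a neural network so that it can run on smaller infrastructure. The Python below is a minimal illustration of one generic way to do that, magnitude pruning, on a toy list of weights. Everything in it is an assumption made for illustration: the function names `sparsify` and `sparse_dot` and the sample numbers are hypothetical, and this is not Neural Magic's algorithm, nor the SparseML or DeepSparse API.)

```python
# Illustrative only: generic magnitude pruning on a toy weight list.
# This is NOT Neural Magic's method or any real library's API.

def sparsify(weights, sparsity):
    """Return a copy of `weights` with the smallest-magnitude fraction set to zero."""
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be between 0 and 1")
    n_prune = int(len(weights) * sparsity)
    # Indices ordered from smallest to largest absolute value.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned


def sparse_dot(weights, inputs):
    """Dot product that skips zeroed weights and counts the multiplies it performs."""
    total, multiplies = 0.0, 0
    for w, x in zip(weights, inputs):
        if w != 0.0:
            total += w * x
            multiplies += 1
    return total, multiplies


# Hypothetical toy data: eight weights and eight inputs.
weights = [0.9, -0.05, 0.4, 0.01, -0.7, 0.02, 0.3, -0.03]
inputs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]

pruned = sparsify(weights, 0.5)  # drop the 50% of weights closest to zero
dense_out, dense_ops = sparse_dot(weights, inputs)
sparse_out, sparse_ops = sparse_dot(pruned, inputs)

print(pruned)
print(dense_ops, sparse_ops)  # 8 4
```

(With half of the toy weights zeroed, the dot product performs 4 multiplies instead of 8. Whether a real model keeps its accuracy after pruning depends on retraining, which this sketch does not attempt to show.)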

Published Date : Mar 9 2023



Brian Stevens, Neural Magic | Cube Conversation

>> John: Hello and welcome to this Cube Conversation here in Palo Alto, California. I'm John Furrier, host of theCUBE. We've got a great conversation on making machine learning easier and more affordable in an era where everybody wants more machine learning and AI. We're featuring Neural Magic with its CEO, who is also a CUBE alumnus, Brian Stevens. Great to see you, Brian. Thanks for coming on this Cube Conversation to talk about machine learning. >> Brian: Hey John, happy to be here again. >> John: What a buzz that's going on right now. Machine learning is one of the hottest topics, AI front and center, kind of going mainstream. We're seeing the success of the kind of next-gen capabilities in the enterprise and in apps. It's a really exciting time, so perfect timing. Great to have this conversation. Let's start with taking a minute to explain what you guys are doing over there at Neural Magic. I know there's some history there, neural networks, MIT. But with the convergence of what's going on, this big wave hitting, it's an exciting time for you guys. Take a minute to explain the company and your mission. >> Brian: Sure, sure. So, as you said, the company's Neural Magic, and it spun out of MIT four-plus years ago, along with some people and some intellectual property. And you summarized it better than I can, 'cause you said we're just trying to make, you know, AI that much easier. But another level of specificity around it is this. You know, in the world you have a lot of data scientists really focusing on making AI work for whatever their use case is. And then in the next phase of that, they're looking at optimizing the models that they built. And then it's not good enough just to work on models. You've got to put 'em into production. So what we do is we make it easier to optimize the models that have been developed and trained, and then we try to make it super simple when it comes time to deploying those in production and managing them.
>> John: You know, we've seen this movie before with the cloud. You start to see abstractions come out. Data science, we saw, was like the secret art of being a data scientist, and now there's democratization of data. You're kind of seeing a similar wave with machine learning models, foundational models, some call it. Developers are getting involved. Model complexity's still there, but it's getting easier. There's almost like a democratization happening. You've got complexity, you've got deployment challenges, cost, you've got developers involved. So it's like, how do you grow it? How do you get more horsepower? And then how do you make developers productive, right? So this seems to be the thread. So where do you see this going? Because there's going to be a massive demand for, I want to do more with my machine learning. But what's the data source? What's the formatting? There's kind of a stack developing. What are you guys doing to address this? Can you take us through and demystify this wave that's hitting, that everyone's seeing? >> Brian: Yeah. Now, like you said, you know, the democratization of all of it. And that brings me all the way back to the roots of open source, right? When you think about it, back in the day you had to build your own tech stack yourself. A lot of people probably don't remember that. And then it went to where you're always starting on a body of code or a module that was out there with open source. And I think that's what I equate to where AI has gotten to with what you were talking about, the foundational models that didn't really exist years ago. So you really were putting the layers of your models together in the formulas, and it was a lot of heavy lifting. And so there was so much time spent on development, with far too few success cases, you know, that got into production to solve a business or a technical need.
But what's happening is, as these models are becoming foundational, it means people don't have to start from scratch. They're actually able to, you know, the avant-garde now is to start with an existing model that almost does what you want, but then apply your dataset to it. So it's, you know, it's really the industry moving forward. And the best thing about it is open source plays a new dimension, but this time, you know, in the realm of AI. And so to us, though, you know, I spent a career focusing, I think, on not just the technical side, but the consumption of the technology and how it's still way too hard for somebody to actually operationalize technology that all those vendors throw at them. So I've always been empathetic to the user around, you know, what their job is once you give them great technology. And so it's still too difficult even with the foundational models, because what happens is there's really this impedance mismatch between the development of the model and then where the model has to live and run and be deployed, and the life cycle of the model, if you will. And so what we've done in our research is we've developed techniques to introduce what's known as sparsity into a machine learning model that's already been developed and trained. And what that sparsity does is unlock a lot by making that model so much smaller. So in many cases we can make a model 90 to 95% smaller, even smaller than that in research. And we do that in a way that preserves all the accuracy of the foundational model, as you talked about. So now all of a sudden you get this much smaller model, just as accurate. And then the even more exciting part about it is we developed a software-based engine called DeepSparse.
And what that inference runtime does is take that now-sparsified model and run it, but because you sparsified it, it only needs a fraction of the compute that it would've needed otherwise. So what we've done is make these models much faster, much smaller, and then by pairing that with an inference runtime, you now can actually deploy that model anywhere you want on commodity hardware, right? So x86 in the cloud, x86 in the data center, Arm at the edge. It's like this massive unlock that happens, because you get the state-of-the-art models, but you get 'em, you know, on the IT assets and the commodity infrastructure that is where all the applications are running today. >> John: I want to get into the inference piece and the DeepSparse you mentioned, but I first have to ask. You mentioned open source. Dave and I, with some fellow CUBE alumni, were having a chat about, you know, the iPhone and Android moment, where you've got proprietary versus open source. You've got a similar thing happening with some of these machine learning models, where there's a lot of proprietary things happening and the open source movement is growing. So is there a balance there? Are they all trying to do the same thing? Is it more like a chip, you know, silicon's involved, all kinds of things going on that are really fascinating from a science perspective. What's your reaction to that? >> Brian: I think it's like anything. You know, the way we talk about AI, you'd think it had been around for decades, but the reality is it's been some of the deep learning models. When we first started taking models that the Brain team was working on at Google and building APIs around them on Google Cloud, we were the first cloud to even have AI services; that was 2015, 2016. So when you think about it, it's really been what, six years since this thing even got liftoff. So I think with that, everybody's throwing everything at it.
You know, there's tons of funded hardware thrown at specialty for training or inference from new companies. There are legacy companies that are getting into AI now, whether it's, you know, a CPU company that's now building specialized ASICs for training. There are new tech stacks, proprietary software, and there's a ton of as-a-service. So it really is, you know, what's gone from nascent eight years ago to the wild, wild west out there. So there's a little bit of everything right now, and I think that makes sense, because at the early part of any industry it really becomes really specialized. And that's, you know, showing my age, like the early part of the 2000s, you know, Red Hat. People weren't running x86 in the enterprise back then and they thought it was a toy, and they certainly weren't running open source. And it made sense that they weren't, because it didn't deliver what they needed at that time. So they needed specialty stacks, they needed expensive hardware that did what an Oracle database needed to do. They needed proprietary software. But what happens is that commoditizes through both hardware and open source, and the same thing's really just starting with AI. >> John: Yeah. And I think that's a great point to call out, because in any industry timing's everything, right? I mean, I remember back in the 80s, late 80s and 90s, AI, you know, stuff was going on and it just wasn't there. There wasn't enough horsepower, there wasn't enough tech. >> Brian: Yep. >> John: You mentioned some of the processing. So AI is this industry that has all these experts who have been scratching that itch for decades. And now with cloud and custom silicon, the tech fundamentals at the lower end of the stack, if you will, on the performance side are significantly more performant. It's there. You've got more capabilities. >> Brian: Yeah.
>> John: Now you're kicking into more software, faster software. So it just seems like we're at a tipping point where finally it's here, like that AI moment or machine learning, and now data is involved. So this is where I see organizations really jumping in with the CEO mandate: hey team, make ML work for us. Go figure it out. It's got to be an advantage for us. >> Brian: Yeah. >> John: So now they go, okay boss, we will. So what do they do? What steps does an enterprise take to get machine learning into their organization? 'Cause you know, it's coming down from the boards, you know, how does this work for rob? >> Brian: Yeah. You know, what we're seeing is it's like anything, whether that was open source adoption or whether that was cloud adoption: it always starts usually with one person. And increasingly it is the CEO, who realizes they're getting further behind the competition because they're not leaning in, you know, faster. But typically it really comes down to a really strong practitioner that's inside the organization, right? One that realizes that the number one goal isn't doing more and just training more models and necessarily being proprietary about it. It's really around understanding the art of the possible. Something that's grounded in the art of the possible, what deep learning can do today and what business outcomes you can deliver, you know, if you can employ it. And then there are well-proven paths through that. It's just that because of where it's been, it's not that industrialized today. It's very much, you know, ML project by ML project, very snowflakey, right? And that was kind of the early days of open source as well. And so we're just starting to get to the point where it's getting easier, it's getting more industrialized, there are fewer steps, it's less burdensome on developers, it's less burdensome on the deployment side.
And we're trying to bring that whole last mile by saying, you know what? Deploying deep learning and AI models should be as easy as it is to deploy your application, right? You shouldn't have to take an extra step to deploy an AI model. It shouldn't require new hardware, it shouldn't require a new process, a new DevOps model. It should be as simple as what you're already doing. >> John: What is the best practice for companies to effectively bring an acceptable level of machine learning and performance into their organizations? >> Brian: Yeah, I think the number one start is, like what you hinted at before, they have to know the use case. In most cases, you're going to find across every industry, you know, that problem's been tackled by some company, right? And then you have to have the best practice around fine-tuning the models that already exist. So fine-tuning that existing model, that foundational model, on your unique dataset. You know, if you are in medical instruments, it's not good enough to identify that it's a medical instrument in the picture. You've got to know what type of medical instrument. So there's always a fine-tuning step. And so we've created open source tools that make it easy for you to do two things at once. You can fine-tune that existing foundational model, whether that's in the language space or whether that's in the vision space. You can fine-tune that on your dataset, and at the same time you get an optimized model that comes out the other end. So you get kind of both things. So we're freeing you from worrying about the complexity of that transfer learning, if you will. And we're freeing you from worrying about, well, where am I going to deploy the model? Where does it need to be? Does it need to be on a device, an edge, a data center, a cloud edge? What kind of hardware is it? Is there enough hardware there?
We're liberating you from all of that. Because what you can count on is there'll always be commodity capability, commodity CPUs in abundance where you want to deploy, 'cause that's where your application is. And so all of a sudden we're just freeing you of that whole step. >> John: Okay. Let's get into DeepSparse, because you mentioned that earlier. What inspired the creation of DeepSparse, and how does it differ from any other solutions in the market that are out there? >> Brian: Sure. So where we're unique is, it starts with two things. One is, what the industry's pretty good at from the optimization side is this thing called quantization, which turns, you know, big numbers into small numbers, lower precision. So a 32-bit representation of an AI weight goes into 8 bits. And they're good at cutting out layers, which also takes away accuracy. What we've figured out is to take those industry techniques that are best practice, but we combined them with unstructured sparsity. So by reducing that model by 90 to 95% in size, that's great because it's made it smaller. But we've taken that further with the DeepSparse engine, which, when you deploy it, looks at that model and says, because it's so much smaller, I no longer have to run the part of the model that's been essentially sparsified. So what that's meant is that you no longer need a supercomputer to run models, because there's not nearly as much math and processing as there was before the model was optimized. So now what happens is every CPU platform out there has an enormous amount of compute, because we've sparsified the rest of it away. So you can pick your laptop, and you have enough compute to run state-of-the-art models. And you need a software engine to do that, 'cause it ignores the parts of the model it doesn't need to run, which is what specialized hardware can't do.
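To make the two optimizations described above concrete, here is a minimal Python sketch. It is not Neural Magic's implementation: it assumes plain magnitude-based pruning as the form of unstructured sparsity and symmetric 8-bit quantization, and the helper names (`prune_by_magnitude`, `quantize_int8`, `sparse_dot`) and the toy weights are hypothetical. It shows that at 90% sparsity a dot product only has to perform a tenth of the multiplications.

```python
# Toy illustration only; all helper names are hypothetical, not a real library API.

def prune_by_magnitude(weights, sparsity):
    """Zero out the smallest-magnitude weights so `sparsity` of them become 0."""
    n_zero = round(len(weights) * sparsity)
    # Indices ordered from smallest to largest absolute value.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    drop = set(order[:n_zero])
    return [0.0 if i in drop else w for i, w in enumerate(weights)]

def quantize_int8(weights):
    """Symmetric linear quantization of floats to integers in [-127, 127]."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    return [round(w / scale) for w in weights], scale

def sparse_dot(q_weights, scale, inputs):
    """Dot product that skips zero weights; returns (result, multiplications)."""
    total, mults = 0.0, 0
    for q, x in zip(q_weights, inputs):
        if q == 0:
            continue  # pruned weight: no work needed
        total += q * scale * x
        mults += 1
    return total, mults

weights = [0.9, -0.02, 0.01, 0.5, -0.03, 0.004, -0.7, 0.02, 0.015, -0.001]
inputs = [1.0] * len(weights)

pruned = prune_by_magnitude(weights, sparsity=0.9)   # keep only the largest weight
q_weights, scale = quantize_int8(pruned)             # 32-bit floats -> 8-bit range
result, mults = sparse_dot(q_weights, scale, inputs)
print(f"multiplications: {mults} of {len(weights)}, result: {result:.3f}")
```

The Python loop is only there to count the work that remains after pruning; it says nothing about how a production runtime schedules that work or how it preserves accuracy.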
The second part is it's then turned into a memory efficiency problem. So it's really around getting the models loaded into the cache of the computer and keeping them there, never having to go back out to memory. So our techniques are both: we reduce the model size, and then we only run the part of the model that matters, and then we keep it all in cache. And so what that does is it gets us to these low, low latencies, and we're able to increase, you know, the CPU processing by an order of magnitude. >> John: Yeah. That low latency is key. And you've got developers, you know, coding super fast. We'll get to the developer angle in a second. I want to just follow up on this motivation behind DeepSparse, because, you know, as we were talking earlier before we came on camera about the old days, I mean, not too long ago, virtualization and VMware abstracted away the OS from the hardware, right, and server virtualization changed the game. >> Brian: Yeah. >> John: And that basically invented cloud computing as we know it today. So we see that abstraction. >> Brian: Yeah. >> John: There seems to be a motivation behind abstracting the machine learning models away from the hardware. And that seems to be bringing advantages to the AI growth. Can you elaborate on that? Is that true? What's your comment? >> Brian: It's true. I think it's true for us. I don't think the industry's there yet, honestly. 'Cause I think the industry still is of that mindset that if it took these expensive GPUs to train my model, then I want to run my model on those same expensive GPUs. Because there's often not a separation between the people that are developing AI and the people that have to manage and deploy it where you need it. So the reality is that that's everything that we're after. Like, do we decrease the cost? Yes. Do we make the models smaller? Yes. Do we make them faster? Yes.
But I think the most amazing power is that we've turned AI into a Docker-based microservice. And so, like, who in the industry wants to deploy their apps the old way, on an OS without virtualization, without Docker, without Kubernetes, without microservices, without service mesh, without serverless? You want all those tools for your apps. By converting AI models so they can be run inside a Docker container with no apologies around latency and performance, 'cause it's faster, you get the best of that whole world that you just talked about, which is, you know, what we're calling software-delivered AI. So now the AI lives in the same world. For organizations that have gone through that digital cloud transformation with their app infrastructure, AI fits into that world. >> John: And this is where the abstraction concepts matter. When you have these inflection points, the convergence of compute, data, and machine learning that powers AI, it really becomes a developer opportunity. Because now applications and businesses, when they actually go through the digital transformation, their businesses are completely transformed. There is no IT. Developers are the application. They are the company, right? So AI will be part of whatever business or app will be out there. So there is an application developer angle here. Brian, can you explain >> Brian: Oh completely. >> John: how they're going to use this? Because you mentioned Docker container, microservice. I mean, this really is an insane flipping of the script for developers. >> Brian: Yeah. >> John: So what's that look like? >> Brian: Well, it's because AI's kind of, I mean, again, it's come so fast. So you figure there's my app team and here's my AI team, right? And they're in different places, and the AI team is dragging in specialized infrastructure in support of that as well. And that's not how app developers think. They've run on fungible infrastructure that's been abstracted and virtualized forever, right?
And so what we've done is, in addition to fitting into that world that they like, we've also made it simple for them, so they don't have to be a machine learning engineer to be able to experiment with these foundational models and transfer learn 'em. We've done that. So they can do that in a couple of commands, and it has a simple API that they can either link to their application directly as a library to make inference calls, or they can stand it up as a standalone, you know, scale-up, scale-out inference server. They get two choices. But it really fits into that, you know, that world of the modern developer. Whether they're just using Python or C or otherwise, we made it just simple. So as opposed to, like, go learn something else, they kind of don't have to. So in a way, though, it's almost made it hard, because people expect when we talk to 'em for the first time for it to be the old way. Like, how do you look like a piece of hardware? Are you compatible with my existing hardware that runs ML? Like, no, we're not. Because you don't need that stack anymore. All you need is a library call to make your prediction, and that's it. >> John: Well, I mean, we were joking on Twitter the other day with someone saying, is AI a pet or cattle? Right? Because they love their AI bots right now. So I'd say pet there. But there's going to be a lot of AI. So on a more serious note, you mentioned microservices. Will DeepSparse have an API for developers? And what does that look like? What do I do? >> Brian: Yeah. >> John: Tell me, as a developer, what's the roadmap look like? What's the >> Brian: Yeah, it really can go in both modes. It can go in a standalone server mode where it handles, you know, a REST API, and it can scale out with K8s as the workload comes up and scale back. And, like, try to make hardware do that.
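The two deployment modes described above, an in-process library call and a standalone inference server behind a REST API, can be sketched with the Python standard library alone. This is not the DeepSparse API: `predict` is a hypothetical keyword-matching stand-in for a model, and `PredictHandler`, the `/predict` path, and the JSON shape are all assumptions made for illustration.

```python
import http.client
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

def predict(text):
    """Hypothetical stand-in for a model's prediction call."""
    positive = {"great", "good", "fast"}
    score = sum(word in positive for word in text.lower().split())
    return {"label": "positive" if score > 0 else "negative"}

class PredictHandler(BaseHTTPRequestHandler):
    """Wraps the same predict() function behind an HTTP POST endpoint."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length))
        body = json.dumps(predict(payload["text"])).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the example quiet

def call_server(port, text):
    """Client side of server mode: POST JSON, read the JSON prediction back."""
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    conn.request("POST", "/predict", body=json.dumps({"text": text}),
                 headers={"Content-Type": "application/json"})
    data = json.loads(conn.getresponse().read())
    conn.close()
    return data

# Mode 1: library call, the model runs inside the application process.
library_result = predict("this engine is fast")

# Mode 2: standalone server, the application talks to it over HTTP.
server = HTTPServer(("127.0.0.1", 0), PredictHandler)  # port 0 = any free port
threading.Thread(target=server.serve_forever, daemon=True).start()
server_result = call_server(server.server_address[1], "this engine is fast")
server.shutdown()
server.server_close()

print(library_result, server_result)
```

Both modes share one prediction function, so an application could start with the library call and move to the server later without changing the model code.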
Hardware may scale back, but it's just sitting there dormant, you know. So with this, it scales the same way your application needs to. And then for a developer, they basically just do the pip install deepsparse, you know, it has one command to do an install, and then they do two calls, really. The first call is a library call that the app makes to create the model. The model's really already trained, but it's called a model create call. And the second call is to do a prediction. And it's as simple as that. So AI's as simple as using any other library that the developers are already using, which sounds hard to fathom because it is just so simplified. >> John: Software-delivered AI. Okay, that's a cool thing. I believe in it personally. I think that's the way to go. I think there's going to be plenty of hardware options if you look at the advances of cloud players that have more silicon coming out. More GPUs. I mean, there are more instances. I mean, everything's out there right now. So the question is, how does that evolve in your mind? Because that seems to be key. You have open source projects emerging. What path does this take? Is there a parallel mental model that you see, Brian, that is similar? You mentioned open source earlier. Is it more like a VMware virtualization thing, or is it more of a cloud thing? Is it going to evolve in a trajectory that looks similar to what we might've seen in the past? >> Brian: Yeah, you know, when I got involved with the company, I thought about it and I was reasoning about it, like we all do when you want to join something full-time. I thought about it and said, where will the industry eventually get to, right, to fully realize the value of deep learning and what's plausible as it evolves?
And to me, I know it's the old adage of, you know, software, its hardware, cloudy software. But it truly was like, you know, we can solve these problems in software. There's nothing special that's happening at the hardware layer in the processing of AI. The reality is that it's just early in the industry. So the view that we had was, this is eventually the best place where the industry will be: the liberation of being able to run AI anywhere. You know, you democratize the model, but if you can't run the model anywhere you want, because these models are getting bigger and bigger with these large language models, then you're kind of not democratizing, if you've got to go and buy a cluster to run this thing on. So the democratization comes if all of a sudden that model can be consumed anywhere on demand, without planning, without provisioning, wherever infrastructure is. And so I think, with or without Neural Magic, that's where the industry will go and will get to. I think we're the leaders in getting it there, because we're more advanced on these techniques. >> John: Yeah. And your background too. You've seen OpenStack, pre-cloud, you saw open source grow, and it's still exponentially growing. And so you have the same dynamic with machine learning models growing. And they're also segmenting into almost an ML stack, or foundational models, as we talk about. So you're starting to see the formation of tooling, inference. So a lot of components are coming. It's almost a stack. It literally is like an operating system problem space, you know? How do you run things, how do you link things? How do you bring things together? Is that what's going on here? Is this like a data modeling operating environment, kind of a Red Hat type thing going on? >> Brian: Yeah. Yeah. I think there is, you know, I thought about that too.
And I think there is the role of, like, a distribution, because the industrialization of this is not happening fast enough. Like, to go back to it, every customer, every user does it in their own kind of way. Everyone's a little bit of a snowflake. And I think that's okay. There are definitely plenty of companies that want to come in and say, well, this is the way it's going to be, and we industrialize it as long as you do it our way. The reality is technology doesn't get industrialized by one company just saying, do it our way. And so that's why we've taken the approach through open source of saying, hey, you haven't really industrialized it if you said, we made it simple, but you always got to run AI here. Right? You only really industrialize it if you break it down into components that are simple to use and that work integrated in the stack the way you want them to. And so to me, that first principle was getting things into microservices and Docker containers that could be run on VMware, OpenShift, on the cloud, at the edge. And so that's the real part that we're helping with. The other part, I do agree, I think it's going to quickly move to be less about the model, less about the training of the model and the transfer learning, you know, the dataset of the model. We're taking away the complexity of optimization, liberating deployment to be anywhere. And I think the last mile, John, is going to be around the MLOps around that. Because now that we've turned it into a software problem, it's easy to think of software as kind of a point release, but that's not the reality, right? It's a life cycle. And so I think MLOps very much brings in, what is the life cycle of that deployment?
And, you know, you get into more interesting conversations, to be honest, once you've deployed in a Docker container, around model drift and accuracy, and the dataset changes and the user changes: how do you become, from an ML perspective, aware of that, sending signal back and retraining? And that's where I think a lot more of the innovation's going to start to move. >> John: Yeah. And the software problem, the software opportunity as well, is developer focused. And if you look at the cloud native landscape now, there are similar stacks developing, a lot of components, a lot of things to stitch together, a lot of things that are automating under the hood, a lot of developer productivity conversations. I think this is going to go down that same road. I want to get your thoughts, because developers will set the pace. And this is something that's clear in this next wave: developer productivity. They're the de facto standards bodies. They will decide what microservices, check, API, check. Now, the skills gap is going to be a problem because it's relatively new. So model sprawl, model sizes, proprietary versus open. There has to be a way to kind of crunch that down into, like, a DevOps, like just make it, get the developer out of the muck. So what's your view? Are we early days like that? Or what about the young kid in college studying CS or whatever degree who comes into this with both feet? What are they doing? >> Brian: I'll probably give the non-popular answer to that a little bit, which is it's happening so fast that it's going to get kind of boring fast. Meaning, yeah, you could go to school and go to MIT, right? And you could go all the way through and become a model architect, inventing the next model, right? And the layers, and combining 'em, et cetera, et cetera. And then what operators, and building a model that's bigger than the last one and trains faster, right?
And there will be those people, right, that are actually building the engines, the same way, you know, I grew up as an infrastructure software developer. There's not a lot of companies that hire those anymore, because they're all sitting inside of three big clouds. Right? So you better be a good app developer. But I think what you're going to see is, before, you had to be everything. If you were going to use infrastructure, you had to know how to build infrastructure. And I think the same thing's true around ML, and it's quickly exiting: to be able to use ML in your company, you had better be great at every aspect of ML, including every intricacy inside of the model and what every operation's doing. That's quickly changing. You're going to start with a starting point. You know, in the future you're not going to be cracking open these GPT models. You're going to just be pulling them off the shelf, fine-tuning 'em, and going. You don't have to invent it. You don't have to understand it. And I think that's going to be a pivot point, you know, in the industry: what does the future of a data scientist, ML engineer, researcher look like? >> John: I think the outcome's going to be determined. I mean, you mentioned, you know, doing it yourself. What an SRE is for Google with the servers, the scale's huge. So yeah, it might, at the beginning, get boring, and you get obsolete quickly, but that means it's progressing. So the scale becomes huge. And that's where I think it's going to be interesting, when we see that scale. >> Brian: Yep. Yeah, I think that's right. And what I've always said, again, to my ML team, is that I want every developer, as a non-ML engineer, to be adept at being able to take advantage of ML, right? It's got to be that simple. And I think it's getting there. I really do.
>> John: Well, Brian, great to have you on theCUBE here on this CUBE conversation, as part of the Startup Showcase that's coming up. You're going to be featured, or your company will be featured, on the upcoming AWS Startup Showcase on making machine learning easier and more affordable as more machine learning models come in. You guys got DeepSparse and some great technology. We're going to dig into that next time. I'll give you the final word right now. What do you see for the company? What are you guys looking for? Give a plug for the company right now. >> Brian: Oh, give a plug that I haven't already doubled in as the plug. >> John: You're hiring engineers, I assume, from MIT and other places. >> Brian: Yep. I think the biggest thing is, we're on the developer side. We're here to make this easy. The majority of inference today is on CPUs already, believe it or not, as much as we like to talk about hardware and specialized hardware. The majority is already on CPUs. We're basically bringing 95% cost savings to CPUs through this acceleration. But we're trying to do it in a way that makes it community first. So I think the shout-out would be: come find the Neural Magic community and engage with us, and you'll find, you know, a thousand other like-minded people in Slack that are willing to help you, as well as our engineers. And let's go take on some successful AI deployments. >> John: Exciting times. This is, I think, one of the pivotal moments: next-gen data, machine learning, and now starting to see AI not be that chatbot, just, you know, customer support or some basic natural language processing thing. You're starting to see real innovation. Brian Stevens, CEO of Neural Magic, bringing the magic here. Thanks for the time. Great conversation. >> Brian: Thanks, John. >> John: Thanks for joining me. >> Brian: Cheers. Thank you. >> John: Okay. 
I'm John Furrier, host of theCUBE, here in Palo Alto, California, for this CUBE conversation with Brian Stevens. Thanks for watching.

Published Date : Feb 13 2023


