
Jay Marshall, Neural Magic | AWS Startup Showcase S3E1


 

(upbeat music) >> Hello, everyone, and welcome to theCUBE's presentation of the "AWS Startup Showcase." This is season three, episode one. The focus of this episode is AI/ML: Top Startups Building Foundational Models, Infrastructure, and AI. These are great topics, super-relevant, and it's part of our ongoing coverage of startups in the AWS ecosystem. I'm your host, John Furrier, with theCUBE. Today, we're excited to be joined by Jay Marshall, VP of Business Development at Neural Magic. Jay, thanks for coming on theCUBE. >> Hey, John, thanks so much. Thanks for having us. >> We had a great CUBE conversation with you guys. This is very much about the company's focus. It's a feature presentation for the "Startup Showcase," and machine learning at scale is the topic, but in general, it's more, (laughs) and we should call it "Machine Learning and AI: How to Get Started," because everybody is retooling their business. Companies that aren't retooling their business right now with AI first will be out of business, in my opinion. You're seeing a massive shift. This is really truly the beginning of the next-gen machine learning AI trend. You're really seeing it with ChatGPT. Everyone sees that. That went mainstream. But this is just the beginning. This is scratching the surface of this next-generation AI with machine learning powering it, and with all the goodness of cloud, cloud scale, and how horizontally scalable it is. The resources are there. You got the Edge. Everything's perfect for AI 'cause data infrastructure's exploding in value. AI is just the applications. This is a super topic, so what do you guys see in this general area of opportunities right now in the headlines? And I'm sure you guys' phone must be ringing off the hook, metaphorically speaking, or emails and meetings and Zooms. What's going on over there at Neural Magic? >> No, absolutely, and you pretty much nailed most of it. I think that, you know, from my background, we've seen it for the last 20-plus years: even just getting enterprise applications kind of built and delivered at scale, obviously, amazing things with AWS and the cloud to help accelerate that. And we just kind of figured out in the last five or so years how to do that productively and efficiently, kind of from an operations perspective. Got development and operations teams. We even came up with DevOps, right? But now, we kind of have this new kind of persona and new workload that developers have to talk to, and then it has to be deployed on those ITOps solutions. And so you pretty much nailed it. Folks are saying, "Well, how do I do this?" These big, generational models or foundational models, as we're calling them, they're great, but enterprises want to do that with their data, on their infrastructure, at scale, at the edge. So for us, yeah, we're helping enterprises accelerate that through optimizing models and then delivering them at scale in a more cost-effective fashion. >> Yeah, and I think one of the things, one of the benefits we saw with OpenAI, not only is it open source, you've also got other models that are more proprietary, is that it shows the world that this is really happening, right? It's a whole nother level, and there's also new landscape kind of maps coming out. You got the generative AI, and you got the foundational models, large language models. Where do you guys fit into the landscape? Because you guys are in the middle of this. How do you talk to customers when they say, "I'm going down this road. I need help. I'm going to stand this up."
This new AI infrastructure and applications, where do you guys fit in the landscape? >> Right, and really, the answer is both. I think today, when it comes to a lot of what for some folks would still be considered kind of cutting edge around computer vision and natural language processing, a lot of our optimization tools and our runtime are based around most of the common computer vision and natural language processing models. So your YOLOs, your BERTs, you know, your DistilBERTs and what have you, so we work to help optimize those, which, again, have gotten great performance and great value for customers trying to get those into production. But when you get into the LLMs, and you mentioned some of the open source components there, our research teams have kind of been right in the trenches with those. So with kind of the GPT open source equivalent being OPT, we're able to actually take, you know, a multi-hundred-billion-parameter model and sparsify that, or optimize that down, shaving away a ton of parameters, and being able to run it on smaller infrastructure. So I think the evolution here, you know, all this stuff came out in the last six months in terms of being turned loose into the wild, but we're staying in the trenches with folks so that we can help optimize those as well and not require, again, the heavy compute, the heavy cost, the heavy power consumption as those models evolve. So we're staying right in with everybody while they're being built, but trying to get folks into production today with things that help with business value today.
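(A quick technical aside on the sparsification Jay describes: "shaving away a ton of parameters" is, at its core, unstructured pruning. Below is a toy NumPy sketch of magnitude pruning for readers who want to see the idea in code; it is an illustration of the concept, not Neural Magic's actual algorithm, and real pipelines prune gradually during training and retrain to recover accuracy.)

```python
import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the smallest-magnitude weights until `sparsity` fraction is zero."""
    k = int(weights.size * sparsity)
    if k == 0:
        return weights.copy()
    # The k-th smallest absolute value becomes the pruning threshold.
    threshold = np.partition(np.abs(weights), k - 1, axis=None)[k - 1]
    pruned = weights.copy()
    pruned[np.abs(pruned) <= threshold] = 0.0
    return pruned

layer = np.random.randn(768, 768).astype(np.float32)  # a BERT-sized weight matrix
sparse_layer = magnitude_prune(layer, sparsity=0.90)  # remove ~90% of the weights
print(f"zeros: {np.mean(sparse_layer == 0):.2%}")     # ~90%
```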
>> Jay, I really appreciate you coming on theCUBE, and before we came on camera, you said you were just on a customer call. I know you've got a lot of activity. What specific things are you helping enterprises solve? What kind of problems? Take us through the spectrum from the beginning: people jumping in the deep end of the pool, some people kind of coming in, starting out slow. What's the scale? Can you scope the kind of use cases and problems that are emerging that people are calling you for? >> Absolutely. So I think if I break it down to, like, your startups, or maybe I'll call 'em AI native to kind of steal from cloud native years ago, that group, it's pretty much, you know, part and parcel for how that group already runs. So if you have a data science team and an ML engineering team, you're building models, you're training models, you're deploying models. You're seeing firsthand the expense of starting to try to do that at scale. So it's really just a pure operational efficiency play. They kind of speak natively to our tools, which we're doing in the open source. So it's really helping, again, with the optimization of the models they've built, and then, again, giving them an alternative to the expensive proprietary hardware accelerators they'd otherwise have to run them on. Now, on the enterprise side, it varies, right? You have some kind of AI native folks there that already have these teams, but you also have, kind of, the AI curious, right? Like, they want to do it, but they don't really know where to start, and so for them, we actually have an open source toolkit that can help you get into this optimization, and then again, that runtime, that inferencing runtime, purpose-built for CPUs. It allows you to not have to worry, again, about do I have a hardware accelerator available? How do I integrate that into my application stack? If I don't already know how to build this into my infrastructure, do my ITOps teams know how to do this, and what does that runway look like? How do I cost for this? How do I plan for this? When it's just x86 compute, we've been doing that for a while, right? So it obviously still requires more, but at least it's a little bit more predictable. >> It's funny you mentioned AI native. You know, born in the cloud was a phrase that was out there. Now, you have startups that are born-in-AI companies. So I think you have this kind of cloud vibe going on. Lift and shift was a big discussion. Then you had cloud native, kind of in the cloud, kind of making it all work. Is there an existing set of things people will throw on this, and then what's the difference between AI native and kind of providing it to existing stuff? 'Cause a lot of people take some of these tools and apply them to existing stuff, and it's not really a lift and shift, but it's kind of like bolting on AI to something else, versus starting with AI first, or native AI. >> Absolutely. It's a- >> How would you- >> It's a great question. I think that probably, where I'd pull back to is kind of retail-type scenarios where, you know, for five, seven, nine years or more even, a lot of these folks already have data science teams, you know? I mean, they've been doing this for quite some time. The difference is the introduction of these neural networks and deep learning, right? Those kinds of models are just a little bit of a paradigm shift. So, you know, I obviously was trying to be fun with the term AI native, but I think it's more folks that kind of came up in that neural network world, so it's a little bit more second nature, whereas I think for maybe some traditional data scientists starting to get into neural networks, you have the complexity there and the training overhead, and a lot of the aspects of getting a model finely tuned, and hyperparameterization, and all of these aspects of it. It just adds a layer of complexity that they're just not as used to dealing with. And so our goal is to help make that easy, and then of course, make it easier to run anywhere that you have just kind of standard infrastructure. >> Well, the other point I'd bring out, and I'd love to get your reaction to, is that it's not only the neural network teams, people who have been focused on that, but also, if you look at some of the DataOps lately, the AIOps markets, a lot of data engineering, a lot of scale, folks who have been, like, in that data tsunami cloud world are seeing it; they've kind of been in this, right? They've, like, been experiencing that. >> No doubt. I think it's funny, the data lake concept, right? And you got data oceans now. Like, the metaphors just keep growing on us, but where it is valuable in terms of trying to shift the mindset, I've always kind of been a fan of some of the naming shifts. I know with AWS, they always talk about purpose-built databases. And I always liked that because, you know, you don't have one database that can do everything. Even ones that say they can, like, you still have implementation detail differences. So sitting back and saying, "What is my use case, and then which database will I use for it?" I think it's kind of similar here.
And when you're building those data teams, if you don't have folks that are doing data engineering, kind of that data harvesting and pre-processing, you got to do all that before a model's even going to care about it. So yeah, it's definitely a central piece of this as well, and again, whether or not you're going to be AI native, as you're making your way on that journey, you know, data's definitely a huge component of it. >> Yeah, you would have loved our Supercloud event we had. Talk about naming, and, you know, data meshes were talked about a lot. You're starting to see the control plane layers of data. I think that was the beginning of what I saw as that data infrastructure shift, to be horizontally scalable. So I have to ask you, with Neural Magic, when your customers and the people that are prospects for you guys, they're probably asking a lot of questions, because I think the general thing that we see is, "How do I get started? Which GPU do I use?" I mean, there's a lot of things that are kind of, I won't say technical, or targeted towards people who are living in that world, but, like, as the mainstream enterprises come in, they're going to need a playbook. What do you guys see, what do you guys offer your clients when they come in, and what do you recommend? >> Absolutely, and I think where we hook in specifically tends to be on the training side. So again, I've built a model. Now, I want to really optimize that model. And then on the runtime side, when you want to deploy it, you know, we run that optimized model. And so that's where we're able to provide value. We even have a labs offering in terms of being able to pair up our engineering teams with a customer's engineering teams, and we can actually help with most of that pipeline. So even if it is something where you have a dataset and you want some help in picking a model, you want some help training it, you want some help deploying that, we can actually help there as well. You know, there's also a great partner ecosystem out there, like a lot of folks even in the "Startup Showcase" here, that extend beyond into kind of your earlier comment around data engineering or downstream ITOps or the all-up MLOps umbrella. So we can absolutely engage with our labs, and then, of course, you know, again, partners, which are always kind of key to this. So you are spot on. I think what's happened with this, they talk about a hockey stick. This is almost like a flat wall now with the rate of innovation right now in this space. And so we do have a lot of folks wanting to go straight from curious to native. And so that's definitely where the partner ecosystem comes in so hard, 'cause there just isn't anybody or any team out there that literally does everything from "Here's my blank database" to "I want an API that does all the stuff," right? Like, that's a big chunk, but we can definitely help with the model-to-delivery piece. >> Well, you guys are obviously a featured company in this space. Talk about the expertise. A lot of companies are, I won't say faking it till they make it. You can't really fake security. You can't really fake AI, right? So there's going to be a learning curve. There'll be a few startups who'll come out of the gate early. You guys are one of 'em. Talk about what you guys have as expertise as a company, why you're successful, and what problems do you solve for customers? >> No, appreciate that. Yeah, we actually, we love to tell the story of our founder, Nir Shavit.
So he's a 20-year professor at MIT. Actually, he was doing a lot of work on kind of multicore processing before there were even physical multicores, and he actually even did a stint in computational neurobiology in the 2010s. And the impetus for this whole technology, he has a great talk on YouTube about it, where he talks about the fact that through his work there, he kind of realized that the way neural networks encode, and how they're executed by kind of ramming data layer by layer through these kind of HPC-style platforms, actually was not analogous to how the human brain actually works. So on one side, we're building neural networks, and we're trying to emulate neurons, but we're not really executing them that way. So with our team, which includes one of the co-founders, also ex-MIT, that was kind of the birth of: why can't we leverage this super-performant CPU platform, which has those really fat, fast caches attached to each core, and actually start to find a way to break that model down in a way that we can execute things in parallel, not having to do them sequentially? So there are a lot of amazing talks and stuff that show kind of the magic, if you will, pardon the pun of Neural Magic, but that's kind of the foundational layer of all the engineering that we do here. And in terms of how we're able to bring it to reality for customers, I'll give one customer quote: it's a large retailer, and it's a people-counting application. So a very common application. And that customer's actually been able to show literally double the amount of cameras being run with the same amount of compute. So from a one-to-one perspective, now two-to-one; business leaders usually like that math, right? So we're able to show pure cost savings, but even performance-wise, you know, we have some of the common models, like your ResNets and your YOLOs, where we can actually even perform better than hardware-accelerated solutions. So what we're trying to do, I'll just dumb it down to better, faster, cheaper, but from a commodity perspective, that's where we're accelerating. >> That's not a bad business model. Make things easier to use, faster, and reduce the steps it takes to do stuff. So, you know, that's always going to be a good market. Now, you guys have DeepSparse, which we've talked about on our CUBE conversation prior to this interview. It delivers ML models through software, so the hardware allows for a decoupling, right? >> Yep. >> Which is going to drive probably a cost advantage. Also, from a deployment standpoint it must be easier. Can you share the benefits? Is it on the cost side? Is it more of a deployment? What are the benefits of DeepSparse when you guys decouple the software from the hardware on the ML models? >> No, you actually hit 'em both, 'cause that really is primarily the value. Because ultimately, again, we're so early. And I came from this world in a prior life where I'm doing Java development, WebSphere, WebLogic, Tomcat, open source, right? When we were trying to do innovation, we had innovation buckets, 'cause everybody wanted to be on the web and have their app in a browser, right? We got all the money we needed to build something and show, hey, look at the thing on the web, right? But when you had to get into production, that was the challenge. So to what you're speaking to here, in this situation, we're able to show we're just a Python package.
So whether you just install it on the operating system itself, or use the containerized version you can drop on any container orchestration platform, so ECS or EKS on AWS, you get all the auto-scaling features. So when you think about that kind of a world where you have everything from real-time inferencing to kind of after-hours batch-processing inferencing, the fact that you can auto-scale that hardware up and down, and it's CPU-based, so you're paying by the minute instead of maybe paying by the hour at a lower cost shelf, it does everything from pure cost to, again, I can have my standard IT team say, "Hey, here's the Kubernetes and the container," and it just runs on the infrastructure we're already managing. So yeah, operational, cost, and again, many times even performance. (audio warbles) CPUs if I want to. >> Yeah, so that's easier on the deployment too. And you don't have this kind of, you know, blank-check kind of situation where you don't know what's on the backend on the cost side. >> Exactly. >> And you control the actual hardware and you can manage that supply chain. >> And keep in mind, exactly. Because the other thing that sometimes gets lost in the conversation, depending on where a customer is: some of these workloads, like, you know, you and I remember a world where even the roundtrip to the cloud and back was a problem for folks, right? We're used to extremely low latency. And some of these workloads absolutely also adhere to that. But there's some workloads where the latency isn't as important. And we actually even provide the tuning. Now, if we're giving you five milliseconds of latency and you don't need that, you can tune that back. So less CPU, lower cost. Now, throughput and other things come into play. But that's the kind of configurability and flexibility we give for operations. >> All right, so why should I call you if I'm a customer or prospect of Neural Magic? What problem do I have, or when do I know I need you guys? When do I call you in, and what does my environment look like? When do I know? What are some of the signals that would tell me that I need Neural Magic? >> No, absolutely. So I think in general, any neural network, you know, the process I mentioned before called sparsification, it's, you know, an optimization process that we specialize in. Any neural network, you know, can be sparsified. So I think if it's a deep-learning neural network type model, if you're trying to get AI into production, and you have cost concerns, even performance-wise, I certainly hate to be too generic and say, "Hey, we'll talk to everybody." But really, in this world right now, if it's a neural network and it's something where you're trying to get into production, you know, we are definitely offering, you know, kind of an at-scale, performant, deployable solution for deep learning models. >> So a neural network you would define as what? Just devices that are connected that need to know about each other? What's the state-of-the-art current definition of neural network for customers that may think they have a neural network, or might not know they have a neural network architecture? What is that definition for neural network? >> That's a great question. So basically, machine learning models that fall under this kind of category: you hear about transformers a lot, or I mentioned the YOLO family of computer vision models, or natural language processing models like BERT. If you have a data science team or even developers, some even regular, I used to call myself a nine-to-five developer 'cause I worked in the enterprise, right? So like, hey, we found a new open source framework, you know, I used to use Spring back in the day, and I had to go figure it out. There are developers that are pulling these models down, and they're figuring out how to get 'em into production, okay? So I think all of those kinds of situations, you know, if it's a machine learning model of the deep learning variety, that's, you know, really specifically where we shine.
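(To make "purpose-built for CPUs" concrete from the developer's seat, here is a minimal sketch using DeepSparse's Python Pipeline interface with a pre-sparsified BERT pulled by a SparseZoo stub. The exact stub string and task name are assumptions for illustration; check the SparseZoo and DeepSparse docs for current identifiers.)

```python
from deepsparse import Pipeline

# "zoo:" stubs identify pre-optimized models in the SparseZoo; this one is
# illustrative -- browse sparsezoo.neuralmagic.com for real identifiers.
pipeline = Pipeline.create(
    task="sentiment-analysis",
    model_path="zoo:nlp/sentiment_analysis/bert-base/pytorch/huggingface/sst2/pruned90_quant-none",
)

# Runs on plain x86/ARM CPUs -- no GPU or hardware accelerator required.
print(pipeline("This software-delivered approach keeps our costs predictable."))
```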
>> Okay, so let me pretend I'm a customer for a minute. I have all these videos, like all these transcripts, I have all these people that we've interviewed, CUBE alumni, and I say to my team, "Let's AI-ify, sparsify theCUBE." >> Yep. >> What do I do? I mean, do I just, like, my developers got to get involved, and they're going to be like, "Well, how do I upload it to the cloud? Do I use a GPU?" So there's a thought process. And I think a lot of companies are going through that example of let's get on this AI, how can it help our business? >> Absolutely. >> What does that progression look like? Take me through that example. I mean, I made theCUBE example up, but we do have a lot of data. We have large data models, and we have people, and we're connected to the internet, so it kind of seems like there's a neural network. I think every company might have a neural network in place. >> Well, and I was going to say, I think in general, you all probably do represent even the standard enterprise more than most, 'cause even the enterprise is going to have a ton of video content, a ton of text content. So I think it's a great example. So that kind of sea, or I'll even go ahead and use that term data lake again, of data that you have, you're probably going to want to be setting up kind of machine learning pipelines that are going to be doing all of the pre-processing, from kind of the raw data, to prepare it into the format that, say, a YOLO would actually use, or let's say BERT for natural language processing. So you have all these transcripts, right? We would do a pre-processing pass where we would convert that into the file format that BERT, the machine learning model, would know how to train off of. So those are kind of all the pre-processing steps. And then for training itself, we actually enable what's called sparse transfer learning. Transfer learning is a very popular method of doing training with existing models. So we would be able to retrain that BERT model with your transcript data that we have now done the pre-processing with, to get it into the proper format. And now we have a BERT natural language processing model that's been trained on your data. And now we can deploy that onto the DeepSparse runtime, so that now you can ask that model whatever questions, or I should say pass, you're not going to ask it those kinds of questions like ChatGPT, although we can do that too. But you're going to pass text through the BERT model, and it's going to give you answers back. It could be things like sentiment analysis or text classification. You just call the model, and now when you pass text through it, you get the answers better, faster, or cheaper. I'll use that reference again.
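(As a concrete sketch of the pre-processing pass Jay describes, turning raw transcripts into the tokenized format a BERT-family model trains on; it uses the standard Hugging Face tokenizer, and the label IDs are illustrative assumptions.)

```python
from transformers import AutoTokenizer

# BERT consumes token IDs, not raw text, so transcripts must be tokenized
# (and usually chunked and labeled) before any fine-tuning step.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

transcripts = [
    "Welcome to theCUBE's presentation of the AWS Startup Showcase...",
    "We're helping enterprises optimize models and deploy them at scale...",
]
labels = [0, 1]  # hypothetical class IDs, e.g. 0 = intro, 1 = product discussion

encodings = tokenizer(
    transcripts,
    truncation=True,       # BERT has a 512-token context limit
    max_length=512,
    padding="max_length",
    return_tensors="pt",
)
print(encodings["input_ids"].shape)  # (2, 512) -- ready for a training loop
```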
>> Okay, we can create a CUBE bot to give us questions on the fly from the AI bot, you know, from our previous guests. >> Well, and I will tell you, using that as an example: so I had mentioned OPT before, kind of the open source version of ChatGPT. So, you know, typically that requires multiple GPUs to run. Our research team, as I may have mentioned earlier, has been able to sparsify that over 50% already and run it on only a single GPU. And so in that situation, you could train OPT with that corpus of data and do exactly what you say. Actually, we could use Alexa, we could use Alexa to actually respond back with voice. How about that? We'll do an API call, and we'll actually have an interactive Alexa-enabled bot. >> Okay, we're going to be a customer; let's put it on the list. But this is a great example of what you guys call software-delivered AI, a topic we chatted about on theCUBE conversation. This really means this is a developer opportunity. This really is the convergence of the data growth, the restructuring, how data is going to be horizontally scalable, meets developers. So this is an AI developer model going on right now, which is kind of unique. >> It is, John, and I will tell you what's interesting. And again, folks don't always think of it this way: you know, the AI magical goodness is now getting pushed into the middle, where the developers and IT are operating. And again, that paradigm, although for some folks it seems obvious, if you've been around for 20 years, all that plumbing is a thing, right? And so what we basically help with is, when you deploy the DeepSparse runtime, we have a very rich API footprint. And so the developers can call the API, ITOps can run it, or to your point, it's developer-friendly enough that you could actually deploy our off-the-shelf models. We have something called the SparseZoo, where we actually publish pre-optimized or pre-sparsified models. And so developers could literally grab those right off the shelf, with the training they've already had, and just put 'em right into their applications and deploy them as containers. So yeah, we enable that for sure as well. >> It's interesting. DevOps was infrastructure as code, and last season we had a series on data as code, which we kind of coined. This is data as code. This is a whole nother level of opportunity, where developers just want to have programmable data and apps with AI. This is a whole new- >> Absolutely. >> Well, absolutely great, great stuff. Our news team at SiliconANGLE and theCUBE said you guys had a little bit of a launch announcement you wanted to make here on the "AWS Startup Showcase." So Jay, you have something that you want to launch here? >> Yes, and thank you, John, for teeing me up. So I'm going to try to put this in, like, you know, the vein of an AWS main-stage keynote launch, okay? So we're going to try this out. So, you know, a lot of our product has obviously been built on top of x86. I've been sharing that for the past 15 minutes or so. And with that, you know, we're seeing a lot of acceleration for folks wanting to run on commodity infrastructure. But we've had customers and prospects and partners tell us that, you know, ARM and all of its kind of variants are very compelling, both cost-performance-wise and also obviously with Edge, and they wanted to know if there was anything we could do from a runtime perspective with ARM. And so we got to work, and, you know, it's a hard problem to solve, 'cause the instruction set for ARM is very different from the instruction set for x86, and our deep tensor column technology has to be able to work with that lower-level instruction spec.
But the engineering team's been working really hard at it, and we are happy to announce here at the "AWS Startup Showcase" that the DeepSparse inference runtime now has support for AWS Graviton instances. So it's no longer just x86, it is also ARM, and that obviously also opens up the door to Edge and further out the stack, so that optimize-once-run-anywhere story we're now opening up. So it is in early access. If you go to neuralmagic.com/graviton, you can sign up for early access, but we're excited to now get into the ARM side of the fence as well, on top of Graviton. >> That's awesome. Our news team is going to jump on that news. We'll get it right up. We get a little scoop here on the "Startup Showcase." Jay Marshall, great job. That really highlights the flexibility that you guys have when you decouple the software from the hardware. And again, we're seeing open source driving a lot more in AIOps now, with machine learning and AI. So to me, that makes a lot of sense. And congratulations on that announcement. Final minute or so we have left, give a summary of what you guys are all about. Put a plug in for the company, what you guys are looking to do. I'm sure you're probably hiring like crazy. Take the last few minutes to give a plug for the company and give a summary. >> No, I appreciate that so much. So yeah, join us at neuralmagic.com. You know, part of what we didn't spend a lot of time on here, our optimization tools, we are doing all of that in the open source. It's called SparseML, and I mentioned SparseZoo briefly. So we really want the data science community and ML engineering community to join us out there. And again, the DeepSparse runtime is actually free to use for trial purposes and for personal use. So you can actually run all this on your own laptop or on an AWS instance of your choice. We are now live in the AWS Marketplace. So push-button deploy, come try us out, and reach out to us at neuralmagic.com. And again, sign up for the Graviton early access. >> All right, Jay Marshall, Vice President of Business Development at Neural Magic here, talking about performant, cost-effective machine learning at scale. This is season three, episode one, focusing on foundational models, building data infrastructure and AI, AI native. I'm John Furrier with theCUBE. Thanks for watching. (bright upbeat music)

Published Date: Mar 9, 2023



Brian Stevens, Neural Magic | Cube Conversation


 

>> John: Hello and welcome to this cube conversation here in Palo Alto, California. I'm John Furrier, host of theCUBE. We've got a great conversation on making machine learning easier and more affordable in an era where everybody wants more machine learning and AI. We're featuring Neural Magic, whose CEO is also a Cube alumnus, Brian Stevens. Great to see you, Brian. Thanks for coming on this cube conversation. Talk about machine learning. >> Brian: Hey John, happy to be here again. >> John: What a buzz going on right now. Machine learning, one of the hottest topics, AI front and center, kind of going mainstream. We're seeing the success of the kind of NextGen capabilities in the enterprise and in apps. It's a really exciting time. So perfect timing. Great, great to have this conversation. Let's start with taking a minute to explain what you guys are doing over there at Neural Magic. I know there's some history there: neural networks, MIT. But the convergence of what's going on, this big wave hitting, it's an exciting time for you guys. Take a minute to explain the company and your mission. >> Brian: Sure, sure, sure. So, as you said, the company's Neural Magic, and it spun out of MIT four-plus years ago, along with some people and some intellectual property. And you summarized it better than I can, 'cause you said we're just trying to make, you know, AI that much easier. But another level of specificity around it is: you know, in the world you have a lot of data scientists really focusing on making AI work for whatever their use case is. And then the next phase of that, they're looking at optimizing the models that they built. And then it's not good enough just to work on models. You got to put 'em into production. So, what we do is we make it easier to optimize the models that have been developed and trained, and then we try to make it super simple when it comes time to deploying those in production and managing them. >> John: You know, we've seen this movie before with the cloud. You start to see abstractions come out. Data science, we saw, was like the secret art of being a data scientist; now there's democratization of data. You're kind of seeing a similar wave with machine learning models, foundational models, some call it; developers are getting involved. Model complexity's still there, but it's getting easier. There's almost like a democratization happening. You got complexity, you got deployment challenges, cost, you got developers involved. So it's like, how do you grow it? How do you get more horsepower? And then how do you make developers productive, right? So like, this seems to be the thread. So where do you see this going? Because there's going to be a massive demand for "I want to do more with my machine learning." But what's the data source? What's the formatting? This kind of a stack develops. What are you guys doing to address this? Can you take us through and demystify this wave that's hitting, that everyone's seeing? >> Brian: Yeah. Now like you said, you know, the democratization of all of it. And that brings me all the way back to the roots of open source, right? When you think about it, back in the day you had to build your own tech stack yourself. A lot of people probably don't remember that. And then you moved to where you're always starting on a body of code or a module that was out there with open source.
And I think that's what I equate to where AI has gotten to, with what you were talking about: the foundational models that didn't really exist years ago. So you really were putting the layers of your models together and the formulas, and it was a lot of heavy lifting. And so there was so much time spent on development, with far too few success cases, you know, getting into production to solve a business or technical need. But what's happening is, as these models are becoming foundational, it's meaning people don't have to start from scratch. They're actually able to, you know, the avant-garde now is: start with an existing model that almost does what you want, and then apply your data set to it. So it's, you know, it's really the industry moving forward. And then, you know, the best thing about it is open source plays a new dimension, but this time, you know, in the realm of AI. And so to us, though, you know, I've spent a career focusing on, I think, not just the technical side, but the consumption of the technology, and how it's still way too hard for somebody to actually operationalize technology that all those vendors throw at them. So I've always been empathetic to the user, around, you know, what their job is once you give them great technology. And so it's still too difficult, even with the foundational models, because what happens is there's really this impedance mismatch between the development of the model and then where the model has to live and run and be deployed, and the life cycle of the model, if you will. And so what we've done in our research is we've developed techniques to introduce what's known as sparsity into a machine learning model that's already been developed and trained. And what that sparsity does is it unlocks, by making that model so much smaller. So in many cases we can make a model 90 to 95% smaller, even smaller than that in research. And we do that in a way that preserves all the accuracy of the foundational model, as you talked about. So now all of a sudden you get this much smaller model, just as accurate. And then the even more exciting part about it is we developed a software-based engine called DeepSparse. And what that inference runtime does is take that now-sparsified model and run it, but because you sparsified it, it only needs a fraction of the compute that it would've needed otherwise. So what we've done is make these models much faster, much smaller, and then by pairing that with an inference runtime, you now can actually deploy that model anywhere you want on commodity hardware, right? So x86 in the cloud, x86 in the data center, ARM at the edge. It's like this massive unlock that happens, because you get the state-of-the-art models, but you get 'em, you know, on the IT assets and the commodity infrastructure that is where all the applications are running today. >> John: I want to get into the inference piece and the DeepSparse engine you mentioned, but I first have to ask, you mentioned open source. Dave and I, with some fellow Cube alumni, were having a chat about, you know, the iPhone and Android moment, where you got proprietary versus open source. You got a similar thing happening with some of these machine learning models, where there's a lot of proprietary things happening, and the open source movement is growing. So is there a balance there? Are they all trying to do the same thing?
Is it more like a chip thing, you know, silicon's involved, all kinds of things going on that are really fascinating from a science standpoint. What's your reaction to that? >> Brian: I think it's like anything: you know, the way we talk about AI, you'd think it had been around for decades, but the reality is, some of the deep learning models are recent. When we first started taking models that the Brain team was working on at Google and building APIs around them on Google Cloud, which was the first cloud to even have AI services, that was 2015, 2016. So when you think about it, it's really been, what, six years since this thing even got liftoff. So I think with that, everybody's throwing everything at it. You know, there's tons of funded hardware thrown at it, specialty for training or inference, new companies. There's legacy companies that are getting into AI now, whether it's, you know, a CPU company that's now building specialized ASICs for training. There's new tech stacks, proprietary software, and there's a ton of as-a-service. So it really is, you know, what was nascent 8 years ago is the wild, wild west out there. So there's a little bit of everything right now, and I think that makes sense, because in the early part of any industry it becomes really specialized. And that's, you know, showing my age: in the early part of the 2000s, you know, Red Hat; people weren't running x86 in the enterprise back then, and they thought it was a toy, and they certainly weren't running open source. And it made sense that they weren't, because it didn't deliver what they needed it to at that time. So they needed specialty stacks, they needed expensive hardware that did what an Oracle database needed to do, they needed proprietary software. But what happens is that commoditizes, through both hardware and through open source, and the same thing's really just starting with AI. >> John: Yeah. And I think that's a great point to call out, because in any industry timing's everything, right? I mean, I remember back in the late 80s and 90s, AI, you know, stuff was going on, and there just wasn't enough horsepower, there wasn't enough tech. >> Brian: Yep. >> John: You mentioned some of the processing. So AI is this industry that has all these experts who have been scratching that itch for decades. And now with cloud and custom silicon, the tech fundamentals at the lower end of the stack, if you will, on the performance side, are significantly more performant. It's there; you've got more capabilities. >> Brian: Yeah. >> John: Now you're kicking into more software, faster software. So it just seems like we're at a tipping point where finally it's here, like that AI moment, or machine learning, and now data is involved. So this is where organizations I see really jumping in with the CEO mandate: hey team, make ML work for us. Go figure it out. It's got to be an advantage for us. >> Brian: Yeah. >> John: So now they go, okay boss, we will. So what do they do? What steps does an enterprise take to get machine learning into their organization? 'Cause you know, it's coming down from the boards. How does this work? >> Brian: Yeah. You know, what we're seeing is, it's like anything, whether that was open source adoption or whether that was cloud adoption: it always starts usually with one person.
And increasingly it is the CEO, who realizes they're getting further behind the competition because they're not leaning in, you know, faster. But typically it really comes down to a really strong practitioner that's inside the organization, right? One that realizes that the number one goal isn't doing more and just training more models and necessarily being proprietary about it. It's really around understanding the art of the possible, something that's grounded in what deep learning can do today and what business outcomes you can deliver, you know, if you employ it. And then there's well-proven paths through that. It's just that because of where it's been, it's not that industrialized today. It's very much, you know, you see ML project by ML project; it's very snowflakey, right? And that was kind of the early days of open source as well. And so we're just starting to get to the point where it's getting easier, it's getting more industrialized, there's fewer steps, there's less burden on developers, there's less burden on the deployment side. And we're trying to bring that whole last mile by saying, you know what? Deploying deep learning and AI models should be as easy as it is to deploy your application, right? You shouldn't have to take an extra step to deploy an AI model. It shouldn't require new hardware, it shouldn't require a new process, a new DevOps model. It should be as simple as what you're already doing. >> John: What is the best practice for companies to effectively bring an acceptable level of machine learning and performance into their organizations? >> Brian: Yeah, I think the number one start, like what you hinted at before, is they have to know the use case. And in most cases, you're going to find, across every industry, you know, that that problem's been tackled by some company, right? And then the best practices around fine-tuning the models already exist: fine-tuning that existing model, that foundational model, on your unique dataset. You know, if you are in medical instruments, it's not good enough to identify that it's a medical instrument in the picture. You got to know what type of medical instrument. So there's always a fine-tuning step. And so we've created open source tools that make it easy for you to do two things at once: you can fine-tune that existing foundational model, whether that's in the language space or whether that's in the vision space, on your dataset, and at the same time you get an optimized model that comes out the other end. So you get both things, and we're freeing you from worrying about the complexity of that transfer learning, if you will. And we're freeing you from worrying about, well, where am I going to deploy the model? Where does it need to be? Does it need to be on a device, an edge, a data center, a cloud edge? What kind of hardware is it? Is there enough hardware there? We're liberating you from all of that. Because what you can count on is there'll always be commodity capability, commodity CPUs, where you want to deploy, in abundance, 'cause that's where your application is. And so all of a sudden we're just freeing you of that whole step.
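(To ground the fine-tuning step Brian describes, here is a minimal transfer-learning sketch in the common Hugging Face Trainer style. Neural Magic's open source SparseML tooling wraps a similar flow and additionally applies sparsification recipes during training; the code below is the generic pattern, not their exact API, and the CSV file, column names, and label count are assumptions.)

```python
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

# Start from a foundational model and adapt it to your classes
# (e.g. types of medical instruments rather than generic labels).
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=3  # 3 classes is an assumption
)
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# Hypothetical file with "text" and "label" columns.
dataset = load_dataset("csv", data_files={"train": "my_labeled_data.csv"})
tokenized = dataset.map(
    lambda batch: tokenizer(batch["text"], truncation=True, padding="max_length"),
    batched=True,
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="finetuned-model", num_train_epochs=3),
    train_dataset=tokenized["train"],
)
trainer.train()  # afterwards the model can be exported (e.g. to ONNX) and optimized
```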
>> John: Okay. Let's get into DeepSparse, because you mentioned that earlier. What inspired the creation of DeepSparse, and how does it differ from any other solutions in the market that are out there? >> Brian: Sure. So where is it unique? It starts with two things. One is, what the industry's pretty good at from the optimization side is this thing called quantization, which turns, you know, big numbers into small numbers, lower precision; so a 32-bit representation of an AI weight into 8 bits. And they're good at cutting out layers, which also takes away accuracy. What we've figured out is to take those industry techniques that are best practice, but combine them with unstructured sparsity. So reducing that model by 90 to 95% in size is great because it's made it smaller. But we've taken that further: the DeepSparse engine, when you deploy it, looks at that model and says, because it's so much smaller, I no longer have to run the part of the model that's been essentially sparsified away. So what that's done is, it's meant that you no longer need a supercomputer to run models, because there's not nearly as much math and processing as there was before the model was optimized. So now what happens is, every CPU platform out there has an enormous amount of compute, because we've sparsified the rest of it away. So you can pick your laptop, and you have enough compute to run state-of-the-art models. And you need a software engine to do that, 'cause it ignores the parts of the model it doesn't need to run, which is what specialized hardware can't do. The second part is, it's then turned into a memory efficiency problem. So it's really around just getting the models loaded into the cache of the computer and keeping them there, never having to go back out to memory. So our techniques are both: we reduce the model size, then we only run the part of the model that matters, and then we keep it all in cache. And so what that does is it gets us to these low, low latencies, and we're able to increase, you know, the CPU processing by an order of magnitude.
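(A back-of-the-envelope illustration of the quantization half of what Brian just described: mapping 32-bit floats to 8-bit integers. The symmetric linear scheme below is one common choice, not necessarily the one Neural Magic uses; it shows the 4x size reduction quantization alone buys, before sparsity removes work entirely.)

```python
import numpy as np

def quantize_int8(weights: np.ndarray) -> tuple[np.ndarray, float]:
    """Symmetric linear quantization: float32 -> int8 plus a scale factor."""
    scale = np.abs(weights).max() / 127.0  # map the largest magnitude to 127
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

w = np.random.randn(1024).astype(np.float32)
q, scale = quantize_int8(w)
print(f"size: {w.nbytes} B -> {q.nbytes} B")  # 4096 B -> 1024 B, a 4x reduction
print(f"max round-trip error: {np.abs(w - dequantize(q, scale)).max():.4f}")
```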
>> John: Yeah. That low latency is key. And you got developers, you know, coding super fast. We'll get to the developer angle in a second. I want to just follow up on this motivation behind DeepSparse, because, you know, as we were talking earlier before we came on camera about the old days, I mean, not too long ago, virtualization and VMware abstracted away the OS from the hardware, right? And server virtualization changed the game. >> Brian: Yeah. >> John: And that basically invented cloud computing as we know it today. So we see that abstraction. >> Brian: Yeah. >> John: There seems to be a motivation behind abstracting the machine learning models away from the hardware. And that seems to be bringing advantages to the AI growth. Can you elaborate? Is that true? What's your comment? >> Brian: It's true. I think it's true for us. I don't think the industry's there yet, honestly. 'Cause I think the industry still is of that mindset that if it took these expensive GPUs to train my model, then I want to run my model on those same expensive GPUs. Because there's often not a separation between the people that are developing AI and the people that have to manage and deploy it where you need it. So the reality is, that's everything that we're after. Like, do we decrease the cost? Yes. Do we make the models smaller? Yes. Do we make them faster? Also yes. But I think the most amazing power is that we've turned AI into a Docker-based microservice. And so, like, who in the industry wants to deploy their apps the old way, on an OS, without virtualization, without Docker, without Kubernetes, without microservices, without service mesh, without serverless? You want all those tools for your apps, and by converting AI models so they can be run inside a Docker container, with no apologies around latency and performance 'cause it's faster, you get the best of that whole world that you just talked about, which is, you know, what we're calling software-delivered AI. So now the AI lives in the same world: organizations that have gone through that digital cloud transformation with their app infrastructure, AI fits into that world. >> John: And this is where the abstraction concepts matter. When you have these inflection points, the convergence of compute, data, and machine learning that powers AI, it really becomes a developer opportunity. Because now applications and businesses, when they actually go through the digital transformation, their businesses are completely transformed. There is no IT. Developers are the application. They are the company, right? So AI will be part of whatever business or app will be out there. So there is an application developer angle here. Brian, can you explain- >> Brian: Oh, completely. >> John: how they're going to use this? Because you mentioned Docker container microservices; I mean, this really is an insane flipping of the script for developers. >> Brian: Yeah. >> John: So what's that look like? >> Brian: Well, it's because, like, AI's kind of, I mean, again, it's come so fast. So you figure there's my app team and here's my AI team, right? And they're in different places, and the AI team is dragging in specialized infrastructure in support of that as well. And that's not how app developers think. They've run on fungible infrastructure that's abstracted and virtualized forever, right? And so what we've done is, in addition to fitting into that world that they like, we've also made it simple for them: they don't have to be a machine learning engineer to be able to experiment with these foundational models and transfer learn 'em. We've done that. So they can do that in a couple of commands, and it has a simple API that they can either link to their application directly, as a library, to make inference calls, or they can stand it up as a standalone, you know, scale-up, scale-out inference server. They get two choices. But it really fits into that world of the modern developer; whether they're just using Python or C or otherwise, we made it just simple. So as opposed to go learn something else, they kind of don't have to. In a way, though, it's almost made it hard, because people expect, when we talk to 'em for the first time, the old way. Like, how do you look like a piece of hardware? Are you compatible with my existing hardware that runs ML? Like, no, we're not, because you don't need that stack anymore. All you need is a library call to make your prediction, and that's it. That's it. >> John: Well, I mean, we were joking on Twitter the other day with someone saying, is AI a pet or cattle? Right? Because they love their AI bots right now. So I'd say pet there. But there's going to be a lot of AI.
So on a more serious note, you mentioned microservices: will DeepSparse have an API for developers? And what does that look like? What do I do? >> Brian: Yeah. >> John: Tell me, as a developer, what does the roadmap look like? What's the- >> Brian: Yeah, it really can go in both modes. It can go in a standalone server mode, where it handles, you know, REST APIs, and it can scale out with Kubernetes as the workload comes up and scale back; like, try to make hardware do that. Hardware may scale back, but it's just sitting there dormant, you know, so with this, it scales the same way your application needs to. And then for a developer, they basically just pip install deepsparse, you know, one command to do an install, and then they do two calls, really. The first call is a library call that the app makes to create the model. The model's really already trained; it's called a model create call. And the second call they make is a call to do a prediction. And it's as simple as that. So AI's as simple as using any other library that the developers are already using, which sounds hard to fathom, because it is just so simplified.
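(Brian's "two calls" map to something like the sketch below, using DeepSparse's engine interface: compile the model once for the local CPU, then call it for predictions. The ONNX path and input shape are hypothetical; verify the exact names against current DeepSparse docs.)

```python
import numpy as np
from deepsparse import compile_model

# Call 1: create/compile the model for this machine's CPU.
# The file path is hypothetical; a SparseZoo "zoo:" stub also works here.
engine = compile_model("my-sparsified-model.onnx", batch_size=1)

# Call 2: make a prediction. Inputs are plain NumPy arrays matching the
# model's expected shapes (here, a made-up 1x3x224x224 image batch).
image = np.random.rand(1, 3, 224, 224).astype(np.float32)
outputs = engine.run([image])
print([o.shape for o in outputs])
```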
>> John: Software-delivered AI. Okay, that's a cool thing. I believe in it personally. I think that's the way to go. There are going to be plenty of hardware options if you look at the advances of the cloud players, with more silicon coming out, more GPUs, more instance types. Everything's out there right now. So the question is, how does that evolve in your mind? Because that seems to be key. You have open source projects emerging. What path does this take? Is there a parallel mental model that you see, Brian, that is similar? You mentioned open source earlier. Is it more like a VMware virtualization thing, or is it more of a cloud thing? Is it going to evolve in a trajectory that looks similar to what we might've seen in the past?

>> Brian: Yeah. When I got involved with the company, when I was reasoning about it — like we all do when we want to join something full-time — I thought about where the industry will eventually get to, right? To fully realize the value of deep learning, and what's plausible as it evolves. And to me, I know it's the old adage of software eats hardware, but it truly was: we can solve these problems in software. There's nothing special happening at the hardware layer in processing AI. The reality is that it's just early in the industry. So the view we had was that the best place the industry will eventually be is the liberation of being able to run AI anywhere. Otherwise you're not really democratizing: you democratize the model, but if you can't run the model anywhere you want, because these models are getting bigger and bigger with these large language models, then you're kind of not democratizing — you've got to go buy a cluster to run the thing on. The democratization comes when, all of a sudden, that model can be consumed anywhere, on demand, without planning, without provisioning, wherever the infrastructure is. And so I think that, with or without Neural Magic, that's where the industry will go and will get to. I think we're the leaders in getting it there, because we're more advanced on these techniques.

>> John: Yeah. And your background too. You've seen OpenStack, pre-cloud, you saw open source grow, and it's still growing exponentially. And so you have a similar dynamic with machine learning models growing, and they're also segmenting into almost an ML stack, or foundational models, as we talked about. So you're starting to see the formation of tooling, inference, a lot of components coming. It literally is like an operating system problem space, you know? How do you run things, how do you link things, how do you bring things together? Is that what's going on here? Is this like a data-model operating environment, kind of a Red Hat type thing going on?

>> Brian: Yeah, I think there is — I've thought about that too. I think there is the role of distribution, because the industrialization of this isn't happening fast enough. Every customer, every user does it in their own kind of way; everyone's a little bit of a snowflake. And I think that's okay. There are definitely plenty of companies that want to come in and say, well, this is the way it's going to be, and we'll industrialize it as long as you do it our way. The reality is that technology doesn't get industrialized by one company saying, do it our way. That's why we've taken the approach through open source, by saying: you haven't really industrialized it if you've made it simple but you always have to run the AI here. You only really industrialize it if you break it down into components that are simple to use and work integrated into the stack the way you want them to. So to me, the first principle was getting things into microservices and Docker containers that could be run on VMware, on OpenShift, on the cloud, at the edge. That's the real part we're making happen. The other part — and I do agree — is that I think it's going to quickly move to being less about the model: less about the training of the model, the transfer learning, the dataset of the model. We're taking away the complexity of optimization and liberating deployment to be anywhere. And I think the last mile, John, is going to be around the MLOps around that. Because now that we've turned it into a software problem, it's easy to think of software as a point release, but that's not the reality, right? It's a life cycle. And ML very much brings in the question of what the life cycle of that deployment is. You get into more interesting conversations, to be honest, once you've deployed in a Docker container: model drift, accuracy, the dataset changes, the users change. How do you, from an ML perspective, send the signal back for retraining? That's where I think a lot more of the innovation is going to start to move.

>> John: Yeah. And the software problem — the software opportunity, as well — is developer-focused. And if you look at the cloud native landscape now, similar stacks are developing: a lot of components, a lot of things to stitch together, a lot of things automating under the hood, a lot of developer productivity conversations. I think this is going to go down that same road.
I want to get your thoughts, because developers will set the pace, and this is something that's clear in this next wave: developer productivity. They're the de facto standards bodies. They will decide what: microservices, check; API, check. Now, the skills gap is going to be a problem, because it's relatively new. So model sprawl, model sizes, proprietary versus open — there has to be a way to crunch that down into something like DevOps: just make it work, get the developer out of the muck. So what's your view? Are we in early days like that? And the young kid in college studying CS, or whatever degree, who comes into this with both feet — what are they doing?

>> Brian: I'll probably give the unpopular answer to that, a little bit, which is that it's happening so fast that it's going to get kind of boring fast. Meaning, yeah, you could go to school, go to MIT, and go all the way through to becoming a model architect: inventing the next model, right, the layers and combining them, et cetera, and the operators, and building a model that's bigger than the last one and trains faster. And there will be those people — they're building the engines, the same way that, you know, I grew up as an infrastructure software developer, and there are not a lot of companies that hire those anymore, because they're all sitting inside of three big clouds. Right? So you'd better be a good app developer. But I think what you're going to see is this: before, you had to be everything — if you were going to use infrastructure, you had to know how to build infrastructure. I think the same thing is quickly exiting ML. To be able to use ML in your company, you used to have to be great at every aspect of ML, including every intricacy inside the model and every operation it's doing. That's quickly changing. You're going to start from a starting point. In the future you're not going to be cracking open these GPT models; you're going to just pull them off the shelf, fine-tune them, and go. You don't have to invent them. You don't have to understand them. And I think that's going to be a pivot point in the industry around what the future of a data scientist, an ML engineer, a researcher looks like.

>> John: I think the outcome's going to be determined. I mean, you mentioned doing it yourself — it's what an SRE is for Google, where the server scale is huge. So yeah, at the beginning it might get boring, you get obsolete quickly, but that means it's progressing. The scale becomes huge, and that's where I think it's going to be interesting, when we see that scale.

>> Brian: Yep. Yeah, I think that's right. And what I've always said, including to my ML team, is that I want every developer to be as adept at taking advantage of ML as a non-ML engineer. It's got to be that simple. And I think it's getting there. I really do.

>> John: Well, Brian, great to have you on theCUBE here for this CUBE conversation, as part of the startup showcase that's coming up. Your company will be featured in the upcoming AWS Startup Showcase, on making machine learning easier and more affordable as more machine learning models come in. You guys have DeepSparse and some great technology.
We're going to dig into that next time. I'll give you the final word right now. What do you see for the company? What are you guys looking for? Give a plug for the company.

>> Brian: Oh, give a plug that I haven't already worked in as a plug?

>> John: You're hiring engineers, I assume, from MIT and other places.

>> Brian: Yep. I think the biggest thing is that we're on the developer side. We're here to make this easy. The majority of inference today is on CPUs already, believe it or not, as much as we like to talk about hardware and specialized hardware. The majority is already on CPUs. We're basically bringing 95% cost savings to CPUs through this acceleration. But we're trying to do it in a way that's community first. So the shout-out would be: come find the Neural Magic community and engage with us, and you'll find a thousand other like-minded people in Slack who are willing to help you, as well as our engineers. And let's go take on some successful AI deployments.

>> John: Exciting times. This is, I think, one of the pivotal moments: next-gen data, machine learning, and now starting to see AI not be that chatbot — just customer support or some basic natural language processing thing. You're starting to see real innovation. Brian Stevens, CEO of Neural Magic, bringing the magic here. Thanks for the time. Great conversation.

>> Brian: Thanks, John.

>> John: Thanks for joining me.

>> Brian: Cheers. Thank you.

>> John: Okay. I'm John Furrier, host of theCUBE, here in Palo Alto, California, for this CUBE conversation with Brian Stevens. Thanks for watching.

Published Date : Feb 13 2023

Machine Learning Applied to Computationally Difficult Problems in Quantum Physics


 

>> My name is Franco Nori. It is a great pleasure to be here; I thank you for attending this meeting, and I'll be talking about some of the work we are doing within the NTT-PHI group. I would like to thank the organizers for putting together this very interesting event. The topics studied by NTT-PHI are very exciting, and I'm glad to be part of this great team. Let me first start with a brief overview of just a few interactions between our team and other groups within NTT-PHI. After this brief overview of these interactions, I'm going to start talking about machine learning and neural networks applied to computationally difficult problems in quantum physics.

The first interaction I would like to raise is the following: is it possible to have decoherence-free interaction between qubits? The solution proposed some years ago by a postdoc, a visitor, and myself was to study decoherence-free interaction between giant atoms made of superconducting qubits, in the context of waveguide quantum electrodynamics. The theoretical prediction was confirmed by a very nice experiment performed by Will Oliver's group at MIT, published a few months ago in Nature under the title "Waveguide quantum electrodynamics with superconducting artificial giant atoms." This is the first joint MIT-Michigan Nature paper during this NTT-PHI grant period, and we're very pleased with it. I look forward to additional collaborations like this one, also with other NTT-PHI groups.

Another collaboration inside NTT-PHI concerns the quantum Hall effect in rapidly rotating polariton condensates. This work is mainly driven by two people, Michael Fraser and Yoshihisa Yamamoto; they are the main driving forces of this project, and it has been great fun. We're also interacting inside the NTT-PHI environment with the groups of Marandi at Caltech, McMahon at Cornell, Oliver at MIT, and, as I mentioned before, Fraser and Yamamoto at NTT; others at NTT-PHI are also very welcome to interact with us.

NTT-PHI is interested in various topics, including how to use neural networks to solve computationally difficult and important problems. Let us now look at one example of using neural networks to study such problems. Everything I'll be talking about today is mostly work in progress, to be extended and improved in the future. The first example I would like to discuss is topological quantum phase transitions retrieved through manifold learning, which is a variety of machine learning. This work was done in collaboration with Che, Gneiting, and Liu, all members of the group; a preprint is available on the arXiv.

Some groups are studying quantum-enhanced machine learning, where machine learning is supposed to run on actual quantum computers, exploiting exponential speed-ups and quantum error correction. We're not working on those kinds of things; we're doing something different. We're studying machine learning applied to quantum problems. For example: how to identify quantum phases and phase transitions, which I shall be talking about right now; how to perform quantum state tomography in a more efficient manner, which is another work of ours that I'll be showing later on; and how to assist experimental data analysis, which is a separate project we recently published but which I will not discuss today. Experiments can produce massive amounts of data, and machine learning can help to understand the huge tsunami of data these experiments provide.
Machine learning can be either supervised or unsupervised. Supervised learning requires human-labeled data: here the blue dots have one label, the red dots have a different label, and the question is whether new data corresponds to the blue category or the red category. Many machine learning examples use identifying cats and dogs; that is the typical example. However, there are also cases where there are no labels. There you look at the cluster structure, and you need to define a metric — a distance between the different points — to be able to correlate them together into these clusters. Manifold learning is ideally suited to problems that are nonlinear and unsupervised.

When you use principal component analysis along the principal axes, you can identify a simple structure with a linear projection: you get the red dots in one area and the blue dots in another. But in general you could get red, green, yellow, and blue dots mixed in a complicated manner, and the correlations are better seen when you do a nonlinear embedding. In unsupervised learning the colors represent similarities, not labels, because there are no prior labels.

We are interested in using machine learning to identify topological quantum phases. This requires looking at the actual phases and their boundaries, starting from a set of Hamiltonians or wave functions. Recall that this is difficult to do because there is no symmetry breaking and there are no local order parameters, and in complicated cases you cannot compute the topological properties analytically, while numerically it is very hard. Machine learning is therefore enriching the toolbox for studying topological quantum phase transitions. Before our work, quite a few groups were looking at supervised machine learning. The shortcoming is that you need prior knowledge of the system, and the data must be labeled for each phase; this is needed in order to train the neural networks. More recently, there has been an increased push toward unsupervised learning and nonlinear embeddings. One shortcoming we have seen is that they all use the Euclidean distance, which is a natural way to construct the similarity matrix, but we have shown that it is suboptimal: the Chebyshev distance provides better performance.

The difficulty is that detecting topological quantum phase transitions is a challenge because there are no local order parameters. Three or so years ago we thought machine learning might provide effective methods for identifying topological features, and in the past two years several groups have been moving in this direction. We have shown that one type of machine learning, called manifold learning, can successfully retrieve topological quantum phase transitions in momentum space and real space. We have also shown that if you use the Chebyshev distance between data points, as opposed to the Euclidean distance, you sharpen the characteristic features of these topological quantum phases in momentum space; afterwards, a diffusion map or isometric map can be applied to implement the dimensionality reduction and to learn about these phases and phase transitions in an unsupervised manner.
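The pipeline described here — pick a similarity metric, then apply a nonlinear embedding — can be sketched with off-the-shelf tools. Below is a minimal sketch using scikit-learn's Isomap, with random vectors standing in for the feature vectors one would actually extract from the Hamiltonians or wave functions; the finding that the Chebyshev metric sharpens phase boundaries is the group's result, and this only shows where the metric plugs in:

```python
import numpy as np
from sklearn.manifold import Isomap

# Stand-in data: each row would be a feature vector derived from the
# model at one parameter value (e.g. sampled over the Brillouin zone).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 64))

# Chebyshev (L-infinity) distance in place of the default Euclidean
# metric; the embedding then groups points by phase without any labels.
embedding = Isomap(n_components=2, n_neighbors=10, metric="chebyshev")
Y = embedding.fit_transform(X)

# In the 2-D embedding, separate clusters correspond to distinct
# topological phases; the gap between them marks the transition.
print(Y.shape)  # (200, 2)
```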
So this is a summary of this work on how to characterize and study topological phases. The examples we used are canonical, famous models like the SSH model, the QWZ model, and the quenched SSH model. We looked at momentum space and real space, and we found that the method works very well in all of these models. Moreover, it provides implications and demonstrations for learning in real space, where the topological invariants may be unknown or hard to compute. So it provides insight in both momentum space and real space, and the capability of manifold learning in exploring topological quantum phase transitions is very good, especially when you have a suitable metric. This is one area we would like to keep working on: topological phases and how to detect them.

Of course, there are other problems where neural networks can be useful for solving computationally hard and important problems in quantum physics. One of them is quantum state tomography, which is important to evaluate the quality of state-production experiments. The problem is that quantum state tomography scales really badly: it is impossible to perform for 16 or 20 qubits, and beyond that, forget it, it's not going to work. So a very important procedure — tomography — cannot be done, because there is a computationally hard bottleneck. Machine learning is designed to efficiently handle big data, so the question we asked a few years ago is: can machine learning help us to solve this bottleneck in quantum state tomography? This is a project called eigenstate extraction with neural-network tomography, with a student, Melkani, and a research scientist of the group, Clemens Gneiting, and I'll be brief in summarizing it now.

The specific machine learning paradigm is standard artificial neural networks, which in the past couple of years have been shown to be successful for tomography of pure states. Our approach is to carry this over to mixed states, by successively reconstructing the eigenstates of the mixed states. It is an iterative procedure where you slowly approach the desired target state. If you wish to see more details, this was recently published in Phys. Rev. A and selected as an Editors' Suggestion — some of the referees liked it. Tomography is very hard to do, but it's important, and machine learning can help us do it using neural networks, achieving mixed-state tomography through iterative eigenstate reconstruction.

Why is it so challenging? Because you're trying to reconstruct quantum states from measurements. If you have a single qubit, there are a few Pauli matrices, so there are very few measurements to make; when you have N qubits, the N appears in the exponent. The number of measurements grows exponentially, and this exponential scaling makes the computation very difficult — prohibitively expensive for large system sizes. This exponential dependence on the number of qubits is the bottleneck: by the time you get to 20 or 24 qubits, it is impossible. And it gets even worse: experimental data is noisy, so you need maximum-likelihood estimation to reconstruct the quantum state that fits the measurements best, and again this is expensive. There was a seminal work some time ago on ion traps where the post-processing for eight qubits took an entire week.
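The exponential wall is easy to make concrete: full tomography of an N-qubit state involves on the order of four-to-the-N Pauli expectation values. A back-of-the-envelope sketch, annotated with the talk's own data points for scale:

```python
# Why brute-force tomography hits a wall: the number of Pauli
# expectation values (and density-matrix parameters) grows as 4**n.
for n in (1, 8, 14, 20):
    print(f"{n:>2} qubits -> ~{4 ** n:,} Pauli expectation values")

# Output:
#  1 qubits -> ~4
#  8 qubits -> ~65,536            (the ion-trap data that took a week to post-process)
# 14 qubits -> ~268,435,456       (the regime brought down to hours by linear regression)
# 20 qubits -> ~1,099,511,627,776 (effectively out of reach for direct methods)
```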
Different ideas were proposed — compressed sensing to reduce measurements, linear regression, et cetera — but they all have problems, and you quickly hit a wall. There's no way to avoid it. Indeed, the initial estimate was that tomography of a 14-qubit state would take centuries, and you cannot support a graduate student for a century, because you need to pay their retirement benefits, and it is simply complicated. So a team here, some time ago, looked at how to do a full reconstruction of 14-qubit states within four hours — actually, it was 3.3 hours. Many experimental groups told us it was a very popular paper to read and study, because they wanted to do fast quantum state tomography; they could not support a student for one or two centuries, they wanted the results quickly. You need these density matrices, and you need to do these measurements, but with N qubits the number of expectation values of the Pauli matrices goes like four to the N, so it becomes much bigger, and maximum likelihood makes it even more time-consuming. There is the paper by the group in Innsbruck with the one-week post-processing; then came speed-ups from different groups, down to hours — including how to do 14-qubit tomography in four hours using linear regression.

But the next question is: can machine learning help with quantum state tomography? Can it give us the tools to take the next step and improve it even further? The standard picture is this: in a neural network there are some inputs, X1, X2, X3; there are some weighting factors; and you get an output through a function phi, a nonlinear activation function that could be Heaviside, sigmoid, piecewise linear, logistic, or hyperbolic. This creates a decision boundary in input space where, say, the red dots end up on the left and the blue dots on the right, with some separation between them. You could have two layers, three layers, or any number of layers — shallow or deep — and this can allow you to approximate any continuous function. You train on data via some cost-function minimization. There are different varieties of neural nets; we're looking at the so-called restricted Boltzmann machine. Restricted means that the input-layer spins are not talking to each other, and the output-layer spins are not talking to each other. We got reasonably good results with an input layer and an output layer, no hidden layer, and the probability of finding a spin configuration given by the Boltzmann factor.

So we try to leverage pure-state tomography for mixed-state tomography, through an iterative process. Picture the mixed states in a blue region bounded by the pure states: the initial state starts outside, and with the iterative process you get closer and closer to the actual mixed state, and eventually you make the final jump inside. You look at the dominant eigenstate, which is the closest pure state, then compute some measurements, and then run an iterative algorithm that makes you approach the desired state. After you do that, you can compare the results with data.
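The restricted Boltzmann machine he refers to can be written down in a few lines. Here is a toy sketch in NumPy of the Boltzmann-factor probability over spin configurations, with the hidden units summed out analytically; the sizes and random parameters are arbitrary, and a real tomography run would fit these weights to measurement data rather than leave them random:

```python
import numpy as np

rng = np.random.default_rng(1)
n_visible, n_hidden = 4, 8                      # arbitrary toy sizes
W = rng.normal(scale=0.1, size=(n_visible, n_hidden))
a = rng.normal(scale=0.1, size=n_visible)       # visible (spin) biases
b = rng.normal(scale=0.1, size=n_hidden)        # hidden biases

def boltzmann_weight(v):
    # Unnormalized probability of spin configuration v in {-1,+1}^n.
    # The "restricted" structure (no intra-layer couplings) lets the
    # hidden units be summed out exactly, giving a product of cosh terms.
    return np.exp(a @ v) * np.prod(2.0 * np.cosh(b + v @ W))

# Normalize by enumerating all 2^n configurations (fine at toy sizes;
# larger systems would sample instead of enumerate).
configs = np.array([[1 if (i >> k) & 1 else -1 for k in range(n_visible)]
                    for i in range(2 ** n_visible)])
weights = np.array([boltzmann_weight(v) for v in configs])
probs = weights / weights.sum()
print(probs.sum())  # 1.0
```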
We got some data for four to eight trapped-ion qubits, where approximate W states were produced. The dominant eigenstate is reliably reconstructed for N equal to four, five, six, seven, and eight; for the eigenvalues we're still working, because we're getting some results that are not as accurate as we would like. So this is still work in progress, but for the eigenstates it is working really well. There is a cost scaling which is beneficial: it goes like N times R, as opposed to N squared. The most relevant information on the quality of the state production is retrieved directly, and this works for flexible rank. So it is possible to extract the eigenstates with neural-network tomography; it is cost-effective and scalable, it delivers the most relevant information about state generation, and it's an interesting and viable use case for machine learning in quantum physics.

More recently we have also been working on quantum state tomography using conditional generative adversarial networks, with a master's student, a PhD student, and two former postdocs. CGAN refers to these conditional generative adversarial networks. In this framework you have two neural networks which are essentially dueling, competing with each other: one is called the generator, the other is called the discriminator, and they learn multi-modal models from the data. We improved on this by adding custom neural network layers that enable the conversion of outputs from any standard neural network into a physical density matrix. So to reconstruct the density matrix, the generator and the discriminator networks must train each other on data using standard gradient-based methods. We demonstrate that our quantum state tomography with these adversarial networks can reconstruct the optical quantum state with very high fidelity, orders of magnitude faster, and from less data, than a standard maximum-likelihood method. We're excited about this. We also show that this tomography with adversarial networks can reconstruct a quantum state in a single evaluation of the generator network, if it has been pre-trained on similar quantum states, though that requires some additional training. All of this is still work in progress, with some preliminary results written up, but we're continuing. I would like to thank all of you for attending this talk, and thanks again for the invitation.

Published Date : Sep 26 2020

Naveen Rao, Intel | AWS re:Invent 2019


 

>> Announcer: Live from Las Vegas, it's theCUBE! Covering AWS re:Invent 2019. Brought to you by Amazon Web Services and Intel, along with its ecosystem partners.

>> Welcome back to the Sands Convention Center in Las Vegas everybody, you're watching theCUBE, the leader in live tech coverage. My name is Dave Vellante, I'm here with my cohost Justin Warren. This is day one of our coverage of AWS re:Invent 2019. Naveen Rao is here; he's the corporate vice president and general manager of the AI Products Group at Intel. Good to see you again, thanks for coming to theCUBE.

>> Thanks for having me.

>> Dave: You're very welcome. So what's going on with Intel and AI? Give us the big picture.

>> Yeah, the very big picture is that the world of computing is really shifting. The purpose of what a computer is made for is actually shifting, and I think from its very conception, from Alan Turing, the machine was really meant to be something that recapitulated intelligence. We took a divergent path where we built applications for productivity, but now we're coming back to that original intent, and I think that hits everything that Intel does, because we're a computing company; we supply computing to the world. So everything we do is impacted by AI, and will be in service of building better AI platforms — for intelligence at the edge, intelligence in the cloud, and everything in between.

>> It's really come full circle. I mean, when I first started in this industry, AI was the big hot topic, and really Intel's ascendancy was around personal productivity. But now we're seeing machines replacing cognitive functions for humans, and that has implications for society. There's a whole new set of workloads emerging, and that's driving, presumably, different requirements. So what do you see as the infrastructure requirements for those new workloads? What's Intel's point of view on that?

>> Well, maybe let's focus that on the cloud first. Any kind of machine learning algorithm typically has two phases to it. One is called training or learning, where we're really iterating over large data sets to fit model parameters. Once that's been done to the satisfaction of whatever performance metrics are relevant to your application, the model is rolled out and deployed; that phase is called inference. These two are actually quite different in their requirements, in that inference is all about the best performance per watt: how much processing can I shove into a particular time and power budget? On the training side, it's much more about what kind of flexibility I have for exploring different types of models, and training them very, very fast. When this field started taking off in 2013, 2014, training a model would typically take a month or so. Those models now take minutes to train, but the models have grown substantially in size, so we've kind of gone back to a couple of weeks of training time. Anything we can do to reduce that is very important.

>> And why the compression? Is that because of just so much data?

>> It's the sheer amount of data, the complexity of the data, and the complexity of the models. A very rough measure of the complexity is the number of parameters in a model. Back in 2013, there were, call it, 10 million to 20 million parameters, which was very large for a machine learning model.
Now they're in the billions; one or two billion is sort of the state of the art. To give you bearings on that, the human brain is about a 300-to-500-trillion-parameter model, so we're still pretty far away from that. We've got a long way to go.

>> Yeah, so one of the things about these models is that once you've trained them, they do things — but understanding how they work, these incredibly complex mathematical models: are we at a point where we just don't understand how these machines actually work, or do we have a pretty good idea that, "No no no, when this model's trained to do this thing, this is how it behaves"?

>> Well, it really depends on what you mean by how much understanding we have. At one extreme, we trust humans to do certain things, and we don't really understand what's happening in their brain. We trust that there's a process in place that has tested them enough. A neurosurgeon's cutting into your head; you say, you know what, there's a system where that neurosurgeon probably had to go through a ton of training, be tested over and over again, and now we trust that he or she is doing the right thing. I think the same thing is happening in AI. Some aspects we can bound and say, I have analytical methods by which I can measure performance. In other places it's not so easy to measure the performance analytically; we have to do it empirically, which means we have data sets that we test against: does it stand up to all the different tests? One area we're seeing that in is autonomous driving. Autonomous driving is a bit of a black box, and the number of situations one can encounter on the road is almost limitless, so what we do for a 16-year-old is say "go out and drive," and eventually you sort of learn it. The same thing is happening now for autonomous systems: we have these training data sets where we ask, "Do you do the right thing in these scenarios?" And we say, okay, we trust that you'll probably do the right thing in the real world.

>> But we know that Intel has partnered with AWS on autonomous driving with their DeepRacer project, and I believe Thursday is the grand final. It was announced on theCUBE last year, and there have been a whole bunch of competitions running all year, basically training models that run on this Intel chip inside a little model car that drives around a race track. So, speaking of empirical testing of whether or not it works, lap times give you a pretty good idea. What have you learned from that experience of having all of these people go out and learn how to use these ML models on a real live race car, racing around a track?

>> I think there are several things. One is, when you turn loose a number of developers on a competitive thing, you get really interesting results, where people find creative ways to use the tools to try to win, and I always love that process. I think competition is how you push technology forward. The tool side is actually more interesting to me: we had to come up with something that was adequately simple, so that a large number of people could get going on it quickly. You can't have somebody spend a year just getting the basic infrastructure to work, so we had to put that in place. And really, I think that's still an iterative process. We're still learning what we can expose as knobs, what kinds of areas of innovation we allow the user to explore, and where we lock it down to make it easy to use.
So I think that's the biggest learning we get from this: how to deploy AI in the real world, and what's really needed from a tool chain standpoint.

>> Can you talk more specifically about what you guys each bring to the table with your collaboration with AWS?

>> Yeah, AWS has been a great partner. Obviously AWS has a huge ecosystem of developers, all kinds of different developers — web developers are one sort, database developers another, and AI developers yet another — and we're partnering together to empower that AI base. What we bring from a technological standpoint is of course the hardware: our CPUs are AI-ready now, with a lot of software that we've been putting out in open source, and then other tools like OpenVINO, which make it very easy to start using AI models on our hardware. We tie that into the infrastructure that AWS is building for something like DeepRacer, and then help build a community around it, an ecosystem of developers.

>> I want to go back to the point you were making about the black box. People are concerned about that in AI; they're concerned about explainability. Do you feel like that's a function of just the newness, something we'll eventually get over? I mean, I can think of so many examples in my life where I can't really explain how I know something, but I know it, and I trust it. Do you feel like it's sort of a tempest in a teapot?

>> Yeah, I think it depends on what you're talking about. If you're talking about the traceability of a financial transaction, we kind of need that, maybe for legal reasons — even for humans we do that. You've got to write down everything you did: why did you do this, why did you do that? So we actually want traceability even for humans. In other places, I think it really is about the newness. Do I really trust this thing? I don't know what it's doing. Trust comes with use; after a while it becomes pretty straightforward. I think that was probably true for the cell phone — I remember the first smartphones coming out in the early 2000s. I didn't trust how they worked; I would never do a credit card transaction on them, these kinds of things. Now it's taken for granted. I've done it a million times, and I never had any problems, right?

>> It's the opposite in social media, most people.

>> Maybe that's the opposite — let's not go down that path.

>> I quite like Dr. Kate Darling's analogy from the MIT lab, which is that we already have AI, and we're quite used to them: they're called dogs. We don't fully understand how a dog makes a decision, and yet we use them every day, in collaboration with humans. A dog sort of replaces a particular job — but then again they don't; I don't particularly want to go and sniff things all day long. So having AI systems that can actually replace some of those jobs — actually, that's kind of great.

>> Exactly, and think about it like this: if we can build systems that are tireless, and we can basically give them more power and they keep going, that's a big win for us. And actually, the dog analogy is great, because at least my eventual goal as an AI researcher is to make the interface for intelligent agents be like a dog: to train it like a dog, reinforce it for the behaviors you want, and keep pushing it in new directions that way, as opposed to having to write code that's kind of esoteric.

>> Can you talk about GANs? What does GAN stand for, what does it mean?
>> Generative Adversarial Networks. What this means is that you can think of it as two competing sides solving a problem. So if I'm trying to make a fake picture of you that makes it look like you have no hair, like me, you can see a Photoshop job, and you can kind of tell that's not so great. One side is trying to make the picture, and the other side is trying to guess whether it's fake or not. We have two neural networks working against each other: one is generating stuff, and the other one is saying, is it fake or not, and then they keep improving each other. This one tells that one, "No, I can tell." This one goes and tries something else; this one says, "No, I can still tell." Once the discerning network can't tell anymore, you've built something that's really good. That's the general principle here: we have two things fighting each other to get better and better at a particular task.

>> Like deepfakes.

>> I use that because it is relevant in this case, and that's kind of where it came from — from GANs.
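The two-network game described here fits in a few dozen lines. Below is a toy sketch in Python with PyTorch, where the generator learns to mimic a simple one-dimensional "real" distribution and the discriminator supplies its training signal; all the sizes and learning rates are arbitrary illustration:

```python
import torch
import torch.nn as nn

def real_data(n):              # the "real" distribution: N(mean=2.0, std=0.5)
    return torch.randn(n, 1) * 0.5 + 2.0

def noise(n):                  # random input the generator shapes into fakes
    return torch.randn(n, 8)

G = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))  # generator
D = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1))  # discriminator
loss = nn.BCEWithLogitsLoss()
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)

for step in range(2000):
    # Discriminator turn: label real as 1 and fake as 0, and get
    # better at telling them apart.
    real, fake = real_data(64), G(noise(64)).detach()
    d_loss = loss(D(real), torch.ones(64, 1)) + loss(D(fake), torch.zeros(64, 1))
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()

    # Generator turn: try to make the discriminator call fakes real.
    g_loss = loss(D(G(noise(64))), torch.ones(64, 1))
    opt_g.zero_grad()
    g_loss.backward()
    opt_g.step()

# The generated samples should drift toward the real mean of 2.0 as G fools D.
print(G(noise(1000)).mean().item())
```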
>> All right, okay — and wow, obviously relevant with 2020 coming up. I'm going to ask you a two-part question: how far can we take AI in the near to mid term — let's say in our lifetimes — and how far should we take it? Maybe you can address some of those thoughts.

>> So, how far can we take it? Well, we often have the sci-fi narrative out there of building killer machines and this and that. I don't know that that's actually going to happen anytime soon, for several reasons. One is, we build machines for a purpose; they don't come from an embattled evolutionary past like we do, so their motivations are a little bit different, say. So that's one piece: they're really purpose-driven. Also, building something that's as general as a human or a dog is very hard, and we're not anywhere close to that. When I talked about the trillions of parameters that a human brain has, we might be able to get close to that from an engineering standpoint, but we're not really close to making those trillions of parameters work together in as coherent a way as a human brain does — and efficiently. A human brain does it in 20 watts; to do it today would take multiple megawatts, so it's not really something that's easily found just lying around. Now, how far should we take it? I look at AI as a way to push humanity to the next level. Let me explain what that means a little bit. A simple equation I always write down: people say, "Radiologists aren't going to have a job." No, no, no — what it means is that one radiologist plus AI equals 100 radiologists. I can take that person's capabilities and scale them almost freely to millions of other people. It basically increases the accessibility of expertise — we can scale expertise, and that's a good thing. It solves problems like we have in healthcare today. That's where we should be going with this.

>> Well, a good example would be — and probably part of the answer's today — when will machines make better diagnoses than doctors? In some cases it probably exists today, but not broadly. That's a good example, right?

>> It is. It's a tool, though, so I look at it as giving a human doctor more data to make a better decision on. What AI really does for us is remove the limit on the amount of data on which we can make decisions. As a human, all I can do is read so much, or hear so much, or touch so much; that's my limit of input. If I have an AI system out there listening to billions of observations, and presenting data in a form on which I can make better decisions, that's a win. It allows us to move science forward, to move the accessibility of technologies forward.

>> So, keeping the context of that timeframe I mentioned — someday in our lifetimes, however you want to define that — do you think that driving your own car will become obsolete?

>> I don't know that it'll ever be obsolete, and I'm a little bit biased on this, because I actually race cars.

>> Me too, and I drive a stick, so.

>> I race them semi-professionally, so I don't want that to go away. But it's the same thing: we don't need to ride horses anymore, but we still do for fun, so I don't think it'll completely go away. Now, what I think will happen is that commutes will change; we will use autonomous systems for that, and I think five to seven years from now, we will be using autonomy much more on prescribed routes. It won't completely replace a human driver, even in that timeframe, because it's a very hard problem to solve in a completely general sense. So it's going to be a kind of gentle evolution over the next 20 to 30 years.

>> Do you think that AI will change the manufacturing pendulum, and perhaps some of that would swing back to on-shore manufacturing in this country, anyway?

>> Yeah, perhaps. I was in Taiwan a couple of months ago, and we're actually seeing that already. Things that were much more labor-intensive before are becoming more mechanized using AI, because of economic constraints. AI does inspection: did this machine install this thing right? So you have an inspector tool and you have an AI machine building it — it's a little bit like a GAN, you can think of it, right? So this is happening already, and I think that's one of the good parts of AI: it takes away those harsh conditions that humans had to be in before to build devices.

>> Do you think AI will eventually make large retail stores go away?

>> Well, as long as there are humans who want immediate satisfaction, I don't know that it'll completely go away.

>> Some humans enjoy shopping.

>> Naveen: Some people like browsing, yeah.

>> Depends how fast you need to get it. And then, my last AI question: do you think banks — traditional banks — will lose control of the payment systems as a result of things like machine intelligence?

>> Yeah, I do think there are going to be some significant shifts there. We're already seeing many payment companies out there automate several aspects of this and reduce the friction of moving money: moving money between people, moving money between different types of assets, like stocks and Bitcoins and things like that. And AI is a critical component that people don't see, because it actually allows you to make sure that you're doing a transaction that makes sense — when I move from this currency to that one, I have some sense of what's a real number. It's much harder to defraud, and that's a critical element to making these technologies work. So you need AI to actually make that happen.

>> All right, we'll give you the last word. Maybe you want to talk a little bit about what we can expect — AI futures, or anything else you'd like to share.
>> I think we're at a really critical inflection point, where we have something that works, basically, and we're going to scale it, scale it, scale it to bring on new capabilities. It's going to be really expensive for the next few years, but then we're going to throw more engineering at it and start bringing the cost down. So I see this starting to look a lot more like a brain: something where we can have intelligence everywhere, at various levels — very low-power, ubiquitous compute, and then very high-power compute in the cloud — bringing these intelligent capabilities everywhere.

>> Naveen, great guest. Thanks so much for coming on theCUBE.

>> Thank you, thanks for having me.

>> You're really welcome. All right, keep it right there everybody, we'll be back with our next guest. Dave Vellante for Justin Warren, you're watching theCUBE, live from AWS re:Invent 2019. We'll be right back. (techno music)

Published Date : Dec 3 2019

Jonathan Ballon, Intel | AWS re:Invent 2018


 

>> Announcer: Live from Las Vegas, it's theCUBE, covering AWS re:Invent 2018. Brought to you by Amazon Web Services, Intel, and their ecosystem partners.

>> Oh, welcome back to theCUBE. Continuing coverage here from AWS re:Invent, as we start to wind down our coverage here on the second day. We'll be here tomorrow as well, live on theCUBE, bringing you interviews from Hall D at the Sands Expo. Along with Justin Warren, I'm John Walls, and we're joined by Jonathan Ballon, who's the vice president of the Internet of Things group at Intel. Jonathan, thank you for being with us today. Good to see you.

>> Thanks for having me, guys.

>> All right, interesting announcement today, and last year it was all about DeepLens. This year it's about DeepRacer. Tell us about that.

>> What we're really trying to do is make AI accessible to developers and democratize various AI tools. Last year it was about computer vision. The DeepLens camera was a way for developers to very inexpensively get hold of a camera — the first deep-learning-enabled, cloud-connected camera — so that they could start experimenting and see what they could do with that type of device. This year we took the camera and put it in a car, and we thought, what could they do if we add mobility to the equation? Specifically, we wanted to introduce a relatively obscure form of AI called reinforcement learning. Historically this has been an area of AI that hasn't really been accessible to most developers, because they haven't had the compute resources at their disposal, or the scale, to do it. And so now what we've done is we've built a car, and a set of tools that help the car run.

>> And it's a little miniature car, right? I mean, it's a scale model.

>> It's a 1/18th-scale RC car. It's four-wheel drive, four-wheel steering. It's got GPS, and it's got two batteries: one that runs the car itself, one that runs the compute platform and the camera. It's got expansion capabilities — we've got plans for next year for how we can turbo-charge the car.

>> I love it.

>> Right now it's baby steps, so to speak: basically giving the developer the chance to write a reinforcement learning model, an algorithm that helps the car determine the optimum way to move around a track — but you're not telling the car what the optimum way is, you're letting the car figure it out on its own. And that's really the key to reinforcement learning: you don't need a large pre-labeled dataset to begin with. You're letting, in this case, a device figure it out for itself. This becomes a very powerful tool when you think about it being applied to various industries or use cases where we don't know the answer today, but we can let vast amounts of computing resources run a reinforcement model over and over, perhaps millions of times, until they find the optimum solution.
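Concretely, the knob a DeepRacer developer writes is a reward function: the model is never told the racing line, only scored step by step, and the reinforcement learning loop works out the rest over many simulated laps. A sketch in the shape the DeepRacer console expects — the parameter names follow AWS's documented interface, while the shaping logic is just one illustrative choice:

```python
def reward_function(params):
    """Score one simulation step. Training maximizes the cumulative
    reward over millions of steps, so the car discovers its own line
    rather than being told one."""
    if not params["all_wheels_on_track"]:
        return 1e-3  # near-zero reward for leaving the track

    # Encourage staying near the center line...
    half_width = 0.5 * params["track_width"]
    centered = 1.0 - (params["distance_from_center"] / half_width)

    # ...while still rewarding speed, so the policy doesn't learn to crawl.
    return float(max(centered, 1e-3) * (1.0 + params["speed"]))
```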
>> So how do you do that? I mean, that's a crazy number of variables. How do you, in this case, provide a car with all the multiple variables that come into play — how fast it goes, which direction it goes, on different axes and all those things — to make its own determinations? And how will that then translate to a real, specific case in the workplace?

>> Well, the obvious parallel is of course autonomous driving. AWS had Formula One on stage today during Andy Jassy's keynote — that's also an Intel customer — and what Formula One does is, they have the fastest cars in the world, with over 120 sensors on each car bringing in over a million pieces of data per second. Being able to process that vast amount of data that quickly — and it's a variety of data: audio data, visual data — and to use it to inform decisions in close to real time requires very powerful compute resources. Those resources exist both in the cloud and close to the source of the data itself, at the edge, in the physical environment.

>> So tell us a bit about the software that's involved here, 'cause people think of Intel, and some people don't know about the software heritage that Intel has. The Intel inside isn't just the hardware chips; there's a lot of software that goes into this. So what's the Intel angle here on the software that powers this kind of distributed learning?

>> Absolutely. Software is a very important part of any AI architecture, and for us there is a tremendous amount of investment — almost equal investment in software as in hardware. In the case of what we announced today with DeepRacer and AWS, there are some toolkits that allow developers to better harness the compute resources on the car itself. Two things specifically. One is we have a tool called RL Coach, or Reinforcement Learning Coach, that is integrated into SageMaker, AWS's machine learning toolkit, and allows them to get better performance in the cloud from the data coming off their model. And then we also have a toolkit called OpenVINO. It's not about drinking wine.

>> Oh, darn.

>> Alright. "Open" means it's an open source contribution that we made to the industry. VINO — V-I-N-O — stands for Visual Inference and Neural Network Optimization. This is a powerful tool, because so much of AI is about harnessing compute resources efficiently, and as more and more of the data that we bring into our compute environments comes from the physical world, it's really important to be able to do that in a cost-effective and power-efficient way. OpenVINO allows developers to isolate individual cores or an integrated GPU on a CPU without knowing anything about hardware architecture, and it allows them to apply different applications, algorithms, or inference workloads very efficiently onto that compute architecture, abstracted away from any knowledge of it. It's really designed for an application developer who may be working with a data scientist who built a neural network in a framework like TensorFlow, ONNX, or PyTorch — any tool they're already comfortable with — to abstract away from the silicon and optimize their model onto this hardware platform, so it runs at orders of magnitude better performance than what you would get from a more traditional GPU approach.
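In practice the abstraction looks something like the following. This is a sketch of OpenVINO's Python inference flow — the API has changed across releases, so treat the exact calls as indicative (this follows the newer openvino.runtime flavor), and the model file and input shape are placeholders:

```python
# Sketch of the workflow described above: take a model trained in a
# framework like TensorFlow or PyTorch (converted to OpenVINO IR), and
# let the toolkit map it onto whatever silicon is present -- CPU cores,
# an integrated GPU, or a Neural Compute Stick -- without hand-tuning.
import numpy as np
from openvino.runtime import Core

core = Core()
model = core.read_model("model.xml")          # IR produced by the model optimizer
compiled = core.compile_model(model, "CPU")   # swap in "GPU" or "MYRIAD" to target
                                              # the integrated GPU or Compute Stick

infer = compiled.create_infer_request()
frame = np.random.rand(1, 3, 224, 224).astype(np.float32)  # stand-in image batch
result = infer.infer({0: frame})
print(list(result.values())[0].shape)
```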
>> Yeah, and that kind of decision-making about understanding chip architectures in order to optimize how that works, that's some deep magic, really. The amount of understanding you would need to do that as a human is enormous, but as a developer, I don't know anything about chip architectures. So it sounds like, and it's a thing we've been hearing over the last couple of days, these tools allow developers to have essentially superpowers, so you become an augmented intelligence yourself. Rather than just giving everything to an artificial intelligence, these tools actually augment the human intelligence and allow you to do things you wouldn't otherwise be able to do. >> And that's, I think, the key to getting mass-market adoption of some of these AI implementations. For the last four or five years, since ImageNet solved the image recognition problem, and now we have greater accuracy from computer models than we do from our own human eyes, AI was really limited to academia, large IT tech companies, or proof-of-concepts. It didn't really scale into production environments. But what we've seen over the last couple of years is a real democratization of AI by companies like AWS and Intel that are making tools available to developers, so they don't need to know how to code in Python to optimize a compute module, or, in many cases, understand the fundamental underlying architectures. They can focus on whatever business problem they're trying to solve, or whatever AI use-case they're working on. >> I know you talked about DeepLens last year, and now we've got DeepRacer this year, and you've got the contest going on throughout the coming year with DeepRacer, with a big race at AWS re:Invent 2019. So what's next? I mean, what are you thinking about conceptually to, I guess, build on what you've already started there? >> Well, I can't reveal what next year's, >> Well, that I understand. >> project will be. But what I can tell you is that what's available today in these DeepRacer cars is a level playing field. Everyone's getting the same car, and they have essentially the same tool sets, but I've got a couple of pro-tips for your viewers if they want to win at some of these AWS Summit races that are going to be around the world in 2019. Two pro-tips. One: they can leverage the OpenVINO toolkit to get much higher inference performance from what's already on that car, so I encourage them to work with OpenVINO. It's integrated into SageMaker, so they have easy access to it if they're an AWS developer. But also, we're going to allow an expansion, almost an accelerator, of the car itself, by letting them plug in an Intel Neural Compute Stick. We just released the second version of this stick. It's a USB form factor, with a Movidius Myriad X vision processing unit (VPU) inside. This year's version is eight times more powerful than last year's, and when they plug it into the car, all of that inference workload, all of those images and the information coming off the sensors, will be put onto the VPU, freeing the CPU and GPU resources for other activities. It's going to allow that car to go at turbo speed. >> To really cook. >> Yeah. (laughing) >> All right, so now you know, you have no excuse, right? I mean, Jonathan has shared the secret sauce, although I still think when you said OpenVINO you got Justin really excited. >> It is vino time. >> It is five o'clock, actually. >> All right, thank you for being with us. >> Thanks for having me, guys. >> And good luck with DeepRacer for the coming year.
>> Thank you. >> It looks like a really, really fun project. We're back with more here at AWS re:Invent on theCUBE, live in Las Vegas. (rhythmic digital music)

Published Date : Nov 29 2018
