SiliconANGLE News | Swami Sivasubramanian Extended Version
(bright upbeat music) >> Hello, everyone. Welcome to SiliconANGLE News with a breaking story: Amazon Web Services is expanding its relationship with Hugging Face. I'm John Furrier, SiliconANGLE reporter, founder, and co-host of theCUBE. With me is Swami Sivasubramanian, vice president of database, analytics, and machine learning at AWS. Swami, great to have you on for this breaking news segment on AWS's big news. Thanks for coming on and taking the time. >> Hey, John, pleasure to be here. >> You know- >> Looking forward to it. >> We've had many conversations on theCUBE over the years. We've watched Amazon move fast into large-scale machine learning; SageMaker became a smashing success, and obviously you've been on this for a while. Now, with ChatGPT and OpenAI, a lot of buzz is going mainstream, taking this from behind the curtain, inside the ropes of the industry, into the mainstream. So this is a big moment for the industry, and I want to get your perspective, because your news with Hugging Face, I think, is another tell sign that we're about to tip over into a new accelerated growth around making AI application aware, application centric, more programmable, with more API access. What's the big news with AWS and Hugging Face? What's going on with this announcement? >> Yeah. First of all, we're very excited to announce our expanded collaboration with Hugging Face. I consider Hugging Face like the GitHub for machine learning. With this partnership, Hugging Face and AWS will be able to democratize AI for a broad range of developers, not just specific deep AI startups. And now we can accelerate the training, fine-tuning, and deployment of these large language models and vision models from Hugging Face in the cloud.
And for broader context, when you step back and look at what customer problem we're trying to solve with this announcement: these foundation models are now used to create a huge number of applications, such as text summarization, question answering, search, image generation, and other creative work. That's the stuff we're seeing in these ChatGPT-style applications. But there's a broad range of enterprise use cases that we don't even talk about, and that's because these kinds of transformative generative AI capabilities and models are not available to millions of developers. Either training these models from scratch is very expensive and time consuming and needs deep expertise, or, more importantly, developers don't need generic models; they need them fine-tuned for their specific use cases. One of the biggest complaints we hear is that when they try to use these models for real production use cases, they are incredibly expensive to train and incredibly expensive to run inference on at production scale. And unlike web-search-style applications, where the margins can be really huge, in enterprise production use cases you want efficiency at scale. That's where Hugging Face and AWS share a mission. By integrating with Trainium and Inferentia, we can handle cost-efficient training and inference at scale; I'll deep dive on that. And by teaming up on the SageMaker front, the time it takes to build and fine-tune these models is also coming down. That's what makes this partnership unique. So I'm very excited. >> I want to get into the time savings and the cost savings on training and inference, which is a huge issue, but before we get into that, just how long have you guys been working with Hugging Face?
I know there's a previous relationship; this is an expansion of that relationship. Can you comment on what's different about what happened before and now? >> Yeah. We've had a great relationship with Hugging Face over the past few years, where they have made their models available to run on AWS. In fact, their BLOOM project was something many of our customers used. BLOOM, for context, is their open source project that builds a GPT-3-style model. Now, with this expanded collaboration, Hugging Face has selected AWS to build the next generation of its generative AI models, building on their highly successful BLOOM project. And the nice thing is, with direct integration with Trainium and Inferentia, you get cost savings in a really significant way. For instance, Trn1 can provide up to 50% cost-to-train savings, and Inferentia can deliver up to 60% better cost and 4x higher throughput than (indistinct). Now, as they train the next generation of generative AI models, those models will be not only more accessible to all the developers who use them in the open, but a lot cheaper as well. And that's what makes this moment really exciting, because we can't democratize AI unless we make it broadly accessible, cost efficient, and easy to program and use. >> Yeah. >> So, very exciting. >> I'll get into the SageMaker and CodeWhisperer angle in a second, but you hit on some good points there. One is accessibility, which I call democratization: getting this into the hands of developers, and/or AI to develop; we'll get into that in a second. So, access to coding and Git reasoning is a whole other wave.
But the three things I know you've been working on, I want to put in buckets here and have you comment. One, I know you've been working over the years on saving time to train; that's a big point, and you mentioned some of those stats. Two, cost, because cost is now part of the equation around coupling or uncoupling hardware and software; that's a big issue. Where do I find the GPUs? What's the horsepower cost? And then three, sustainability. You've mentioned that in the past; is there a sustainability angle here? Can you talk about those three things: time, cost, and sustainability? >> Certainly. So if you look at it from the AWS perspective, we have been supporting customers doing machine learning for years. For broader context, Amazon has been doing ML for the past two decades, right from the early days of ML-powered recommendations to supporting all kinds of generative AI applications. If you look at generative AI applications within Amazon, take Amazon search: when you go search for a product, we have a team called MFi within Amazon search that brings these large language models into creating highly accurate search results. These are really large models with tens of billions of parameters, scaling to thousands of training jobs every month and trained on large fleets of hardware. That's an example of a really good large language foundation model application running at production scale. And of course there's Alexa, which uses a large generative model as well; they even published a research paper showing they do better in accuracy than other systems like GPT-3. We also touched on things like CodeWhisperer, which uses generative AI to improve developer productivity, but in a responsible manner, because some studies show that 40% of generated code has serious security flaws in it.
This is where we didn't just do generative AI; we combined it with automated reasoning capabilities, a very useful technique to identify these issues, and coupled the two so that it produces highly secure code as well. Now, all these learnings taught us a few things, which map to the three buckets you mentioned. We have more than 100,000 customers using our ML and AI services, including leading startups in the generative AI space like Stability AI, AI21 Labs, and Hugging Face, and even Alexa, for that matter. What they care about, I put in three dimensions. One is cost, which we touched on with Trainium and Inferentia: Trainium provides up to 50% better cost savings, and it is a lot more power efficient as well compared to traditional alternatives. And Inferentia is also better in terms of throughput; it can deliver up to 3x higher compute performance and 4x higher throughput compared to its previous generation, and it is extremely cost efficient and power efficient as well. >> Well. >> Now, the second element that is really important is that, at the end of the day, developers deeply value the time it takes to build these models, and they don't want to build models from scratch. This is where SageMaker comes in, which, even going by Kaggle usage data, is the number one enterprise ML platform. What it did for traditional machine learning, where tens of thousands of customers use SageMaker today, including the ones I mentioned, is that what used to take months to build these models has dropped down to a matter of days, if not less. Now, for generative AI, if you look at the cost of building these models across the landscape, model parameter sizes have jumped by more than 1,000x in the past three years. And that means training is a really big distributed systems problem. How do you actually scale this model training?
How do you ensure that you utilize these machines efficiently? Because they are very expensive, let alone the power they consume. This is where SageMaker's capability to build, automatically train, tune, and deploy models really comes in, especially with its distributed training infrastructure. Those are some of the reasons why leading generative AI startups are leveraging it: they do not want a giant infrastructure team constantly tuning and fine-tuning and keeping these clusters alive. >> It sounds a lot like what startups did with the cloud in the early days: no data center, you move to the cloud. So this is the trend we're seeing, right? You guys are making it easier for developers with Hugging Face, I get that. I love that "GitHub for machine learning" framing. Large language models are complex and expensive to build, but not anymore; you've got Trainium and Inferentia, developers can get faster time to value, and then you've got the transformers, data sets, and tokenizer libraries, all optimized for generative AI. This is a perfect storm for startups. Jon Turow, a former AWS person who I think used to work for you, is now a VC at Madrona Ventures. He and I were talking about the generative AI landscape; it's exploding with startups. Every alpha entrepreneur out there sees this as the next frontier, the 20-mile march, and the next 10 years are going to be huge. What is the big thing that's happened? Because some people were saying, the founder of Yquem said, "Oh, the startups won't be real, because they don't all have AI experience." John Markoff, the former New York Times writer, told me that with AI there's so much work already done, this is going to explode and accelerate really fast, because it's almost like it's been waiting for this moment. What's your reaction?
>> I actually think there is going to be an explosion of startups, not because they need to be AI startups, but because AI is finally really accessible, or going to be accessible, so that they can create remarkable applications, whether for enterprises, or for disrupting how customer service is done, or how creative tools are built. This is going to change many things. When we think about generative AI, we always like to think of how it generates school homework or art or music or whatnot, but on the practical side, generative AI is being used across various industries. I'll give an example: Autodesk. Autodesk is a customer who runs on AWS and SageMaker. They already have an offering that enables generative design, where designers can generate many structural designs for products: you give a specific set of constraints, and it generates structures accordingly. We see a similar trend across industries, whether in creative media editing or various others. I have a strong sense that in the next few years, just as conventional machine learning is now embedded in every application and every mobile app we see, so pervasive we don't even think twice about it, the same way almost all apps are built on cloud, generative AI is going to be part of every startup, and they are going to create remarkable experiences without needing deep generative AI scientists. But you won't get that until you actually make these models accessible. And I also don't think one model is going to rule the world; you want these developers to have access to a broad range of models. Go back to the early days of deep learning: everybody thought one framework would rule the world, and it has kept changing, from Caffe to TensorFlow to PyTorch to various other things.
And I have a suspicion it will keep changing here too, so we have to enable developers where they are. >> You know, Dave Vellante and I have been riffing on this concept called supercloud, and a lot of people have co-opted that to mean multicloud, but we were really getting at this whole next layer on top of, say, AWS. You guys are the most comprehensive cloud; you guys are a supercloud, and even Adam and I have talked about ISVs evolving into ecosystem partners. I mean, your top customers have ecosystems building on top of it. This feels like a whole other AWS. How are you leveraging the history of AWS, which, by the way, had the same trajectory: startups came in, they didn't want to provision a data center or do the heavy lifting, and the things that have made Amazon successful culturally, the day-one thinking, were to provide the undifferentiated heavy lifting and make it faster for developers to write code. AI has the same dynamic. How are you taking this to the next level? Because now this is an opportunity for the competition to change the game and take it over. I'm sure this is a conversation internally; you have a lot of things going on in AWS that make you unique. What's the internal and external positioning around how you take it to the next level? >> I agree with you that generative AI has very strong potential in terms of the next generation of applications it can enable. And this is where Amazon's experience and expertise in putting these foundation models to work internally has really helped us quite a bit. If you look at it, amazon.com search is a very important application in terms of customer impact, the number of customers who use it openly, and the dollar impact it has for the organization. And we have been doing this silently for a while now.
And the same is true for Alexa, which not only uses it for natural language understanding but even leverages it for creating stories and various other examples. Now, our approach from the AWS side is to look at it in terms of the same three tiers we did for machine learning, because when you look at generative AI, we genuinely see three sets of customers. One is the really deep technical expert practitioner startups. These are the startups creating the next generation of models, the likes of Stability AI, or Hugging Face with BLOOM, or AI21. They generally want to build their own models, and they want the best price performance for their training and inference infrastructure. That's where our investments in silicon, hardware, and networking innovations come in, where Trainium and Inferentia really play a big role. And we can clearly do that; that is one. The second, middle tier is where developers don't want to spend time building their own models; rather, they want the models made useful with their data. They don't need their models to write high school homework or various other things. What they generally want is: I have this data from my enterprise that I want to fine-tune on and make work remarkably well just for this, whether for text summarization, generating a report, better Q&A, and so forth. This is where our investments in the middle tier with SageMaker, and our partnerships with Hugging Face and AI21 and Cohere, are all going to be very meaningful. And you'll see us investing at the top tier too. I already talked about CodeWhisperer, which is in open preview, but we are also partnering with a whole lot of top ISVs, and you'll see more on this front to enable the next wave of generative AI apps, because this is an area where we think a lot of innovation is yet to be done.
It's like day one for us in this space, and we want to enable that huge ecosystem to flourish. >> You know, one of the things Dave Vellante and I talked about in our first podcast, which we just did on Friday and will do weekly, is that we highlighted the ChatGPT example as a horizontal use case, because everyone loves it, people are using it across all their different verticals, and horizontally scalable cloud plays perfectly into it. So I have to ask you: as you look at what AWS brings to the table, a lot has changed over the past 13 years, and a lot more services are available. How should someone rebuild, re-platform, and refactor their application or business with AI on AWS? What are some of the tools that you see and recommend? Is it serverless, is it SageMaker, CodeWhisperer? What do you think is going to shine brightly within the AWS stack, or service list, as part of this? You mentioned CodeWhisperer and SageMaker; what else should people be looking at as they start tinkering, getting all these benefits, and scaling up their apps? >> You know, if I were a startup, first I would really work backwards from the customer problem I'm trying to solve, and pick and choose so that I don't need to deal with the undifferentiated heavy lifting. And that's where the answer changes; it's not going to be one size fits all. Granted, on the compute front, if you can go completely serverless, I will always recommend that for running your apps, because it takes care of all the undifferentiated heavy lifting. Then on the data front, we provide a whole variety of databases, right from relational data to non-relational stores like DynamoDB and so forth. And of course, we also have a deep analytical stack, where data flows directly from our relational databases into data lakes and data warehouses.
And you can get value along with partnerships with various analytical providers. The area where I think things are fundamentally changing in what people can do is with CodeWhisperer. I was literally trying to write code to send a message through Twilio, and I was about to pull up the documentation, when in my IDE I just wrote a comment saying, let's try sending a message through Twilio, or let's update a Route 53 record. All I had to do was type in a comment, and it started generating the subroutine. That is going to be a huge time saver for developers. And the goal for us is not to do it just for AWS developers, and not to just generate the code, but to make sure the code is highly secure and follows best practices. So it's not always about machine learning; it's augmenting with automated reasoning as well. Generative AI is going to change not just how people write code, but how software gets built and used. You'll see a lot more coming on this front. >> Swami, thank you for your time. I know you're super busy. Thank you for sharing the news and giving commentary. Again, I think this is an AWS moment and an industry moment: heavy lifting, accelerated value, agility. AIOps is probably going to be redefined here. Thanks for sharing your commentary. And we'll see you next time; I'm looking forward to doing more follow-up on this. It's going to be a big wave. Thanks. >> Okay. Thanks again, John, always a pleasure. >> Okay. This is SiliconANGLE's breaking news commentary. I'm John Furrier with SiliconANGLE News, as well as host of theCUBE. Swami, who's a leader at AWS, has been on theCUBE multiple times. We've been tracking how Amazon's journey has just been exploding over the past five years, in particular the past three. You heard the numbers: great performance, great reviews.
This is a watershed moment, I think, for the industry, and it's going to be a lot of fun for the next 10 years. Thanks for watching. (bright music)
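To make the CodeWhisperer anecdote from the interview concrete: the comment-to-code workflow Swami describes, where the developer types only a comment and the assistant generates the subroutine, looks roughly like the sketch below. This is a hypothetical illustration, not actual CodeWhisperer output; the helper only assembles the request a Twilio send-SMS call would use and never touches the live API.

```python
# Hypothetical illustration of comment-driven code generation of the kind
# described in the interview: the developer types only the comment line,
# and the assistant fills in the function body. This is NOT real
# CodeWhisperer output, and it does not call the live Twilio API; it just
# builds the endpoint URL and form payload a send-SMS request would use.

# send a text message through Twilio
def build_sms_request(account_sid, from_number, to_number, body):
    """Return the (url, payload) pair for a Twilio Messages API call."""
    url = ("https://api.twilio.com/2010-04-01/Accounts/"
           f"{account_sid}/Messages.json")
    payload = {"From": from_number, "To": to_number, "Body": body}
    return url, payload

url, payload = build_sms_request("ACxxxx", "+15550100", "+15550199",
                                 "Hello from the IDE")
print(payload["Body"])  # → Hello from the IDE
```

The point of the anecdote is that the comment alone, not the documentation lookup, drives the boilerplate.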
Patrick Osborne, HPE | CUBEConversation, November 2018
>> From the SiliconANGLE Media Office in Boston, Massachusetts, it's theCUBE. Now, here's your host, Dave Vellante. >> Hi everybody, welcome to this preview of HPE's Discover Madrid storage news. We're going to unpack that. My name is Dave Vellante, and Hewlett Packard Enterprise has a six-month cadence of shows: one in the June timeframe in Las Vegas, and then one in Europe. This year, again, it's in Madrid, and you always see them announce products and innovations coinciding with those big user shows. With me is Patrick Osborne, Vice President and General Manager of Big Data and Secondary Storage at HPE. Patrick, great to see you again. >> Great to be here, love theCUBE, thanks for having us. >> Oh, you're very welcome. So let's unpack some of these announcements. As I said, you're on this six-month cadence, and you've got three big themes that you're vectoring into; maybe you could start there. >> Yeah, so within HPE Storage and Big Data, our point of view is around intelligent storage and intelligent data management, and underneath that we've vectored in on the three pillars you talked about. AI-driven: essentially bringing intelligence, self-managing and self-healing, to all of our storage and big data platforms. Built for the cloud: we've got a lot of use cases and user stories there, and you've seen from an HPE perspective that hybrid cloud is a big investment we're making, in addition to the edge. And the last is delivering all of our capabilities, from products to solutions and services, as a service. GreenLake is something we started a few years ago, and being able to provide that type of elastic purchasing experience for our customers is going to weave itself into further products and solutions that we announce. >> So I like your strategy around AI. AI, of course, gets a lot of buzz these days. You guys are taking a practical approach.
The Nimble acquisition gave you some capabilities there in predictive maintenance, and you've pushed it into your automation capabilities. So let's talk about the hard news, specifically around InfoSight. >> Yeah, so InfoSight is an incredible platform, and we've been giving customers richer experiences on top of InfoSight that go further up the stack. We're providing recommendation engines, and we've got this whole concept of cross-stack analytics that goes from your app and your virtualization layer down through the physical infrastructure. We've announced a number of pieces of that to give very rich, AI-driven guidance to customers to fix specific problems. We're also extending it to more platforms: we just announced last week the ability to run InfoSight on our server platforms, so we're starting a journey of taking what we've done at the storage and networking layer and weaving in our server platforms, essentially ProLiant, Synergy, Apollo, all of our value compute platforms. So we're doing some really cool stuff, not only providing the experience on new platforms but also richer experiences, certainly around performance bottlenecks on 3PAR, where we're getting deeper AI-driven recommendation engines, as well as what we call an AI-driven resource planner for Nimble. So if you take a look at it from a top-down view, this isn't AI marketing. We're actually applying these machine learning techniques within our install base, our fleet, which grows larger as we extend support across our platforms, to make people's lives easier from a storage administration perspective. >> And that was a big part of the acquisition: that IP, that machine intelligence IP. Obviously you had to evaluate that, and the complexity of bringing it across the portfolio.
You know, we live in this API-driven world, and Nimble was a very modern platform, so that facilitated the injection of that intelligence across the portfolio, and that's what we're seeing now, isn't it? >> Yeah, absolutely. You go from essentially tooling up these platforms for very rich telemetry, to delivering a differentiated support experience that takes a lot of the manual interactions and interventions out of it, and now, with these three announcements we've made, we're moving into things that do predictive analytics, recommendations, and automation. At the end of the day, we're really trying to make people's lives easier from an admin perspective and give them time back to work on higher-value activities. >> Well, let's talk about cloud. HPE doesn't have a public cloud like Amazon or Azure; you partner with those guys. But you have Cloud Volumes, which is cloud-like; it's actually cloud from a business-model perspective. Explain what Cloud Volumes is, and what's the news here? >> Yeah, so we've got a great service called HPE Cloud Volumes, and you'll see us throughout the year extending more user stories and experiences for hybrid cloud. We have CloudBank, which focuses on secondary storage; Cloud Volumes is for primary storage users. It is public-cloud-adjacent storage as a service: you go into the portal with your credentials, enter your credit card number, and essentially get storage as a service as an adjacent, or replacement, data service for, for example, EBS from Amazon.
So you're able to stand up storage as a service within a co-location facility that we manage, and it's completely delivered as a service. And our announcement is this: in the Americas, you can apply compute instances from the public cloud to that storage, which sits in a co-location facility very close, from a latency standpoint, to the public cloud. Now we're extending that service into Europe, so the UK and Ireland for EMEA users, and we can now also support persistent storage workloads for Docker and Kubernetes. This is a big win for a lot of customers that want to do continuous integration and continuous delivery with those containerized frameworks: you can integrate your on-prem storage with your off-prem storage and then pull in the compute from the cloud. >> Okay, so you've got that write-once, run-anywhere sort of model. I was going to ask you, why would I do this instead of EBS? I think you just answered that question: it's because you now can do it anywhere. Hybrid is a key theme here, right? >> Yeah, and also from a resiliency, performance, and durability perspective, the service that we provide is certainly six-nines, and very high performant from a latency perspective. We've been in the enterprise storage game for quite some time, so we feel we've got a really good service from the technology perspective as well. >> And the European piece, I presume a lot of that is, well, of course, GDPR; the fines went into effect in May of 2018. There's a lot of discussion about data not being allowed to leave a particular locality; it's especially onerous in Europe, but probably in other places as well. So there's a data locality, governance, and compliance angle here too, is there not?
>> Yeah, absolutely. For us, if you take a specific industry like healthcare, for example, you have to have a pretty clear line of sight into your data provenance, so this allows us to provide the service in these locations for a healthcare customer, or a healthcare ISV or SaaS provider, who needs to be able to point to where that data is. So for us, it's going to be an entrance into that vertical for hybrid cloud use cases. >> Alright, so again, we've got the AI-driven piece and the cloud piece, and I see as-a-service as the third piece; really I see cloud as one, and as-a-service as one-A, almost like a feature of cloud. So let's unpack that a little bit. What are you announcing in as-a-service, and what's your position there?
So for us it really allows us to provide customers an experience, you know, Cloud is an experience, not just a destination, so we're providing a multi-Cloud, Hybrid-Cloud experience not only from a technology perspective, but also from a purchasing flex up, flex down, flex out perspective, and we're gonna keep on doing that over and over for the, you know, foreseeable future. >> So you've been doing GreenLake for a while here-- >> Yeah, absolutely. >> So how's that going and what's new here? >> Yeah, so that's been going great. We have well over, I think at this point, 500 petabytes under management with GreenLake, and the service is, it's interesting when you think about it, when we were designing this we thought, just like the public Cloud, the compute as a service would take off, but from our perspective I think one of the biggest pain points for customers is managing data, you know, storage and Big Data, so storage as a service has grown very rapidly. So these services are very popular and we'll keep on iterating on them to create maximum velocity. One of the other things that's interesting, about some of these accounting rules that have taken place, is that customers cede to us the ability to do the architecture, right, so we're essentially creating no snowflakes for our customers and they get better outcomes from a business perspective, so we help them with planning and architecting the actual equipment, and then they get a very defined business outcome and SLA that they pay for as a service, right? So it's a win-win across the board, it's really good. >> Okay, so no snowflakes as in, not everything's custom-- >> Absolutely. >> And then that, so that lowers not only your cost, it lowers the customer's cost. So let's take an example like that, let's take backup as a service which is part of GreenLake. How does that work if I wanna engage with you on backup as a service?
>> Yeah, so we have a team of folks in Pointnext that can engage very far up in the front end, right, so they say, hey, listen, I know that I need to do a major re-architecture for my secondary storage, HPE, can you help me out? So we provide advisory services, we have well-known architectures that fit a set of well-known mission critical, business critical applications at a typical customer site, so we can drive that all the way from the inception of that project to implementation. We can take a more customized view, or a road-mapped approach with customers where they want to bite off a little bit at a time, and use things like Flex Capacity, and then weave in a full GreenLake implementation, so it's very flexible in terms of the way we can implement it. So we can go soup to nuts, or we can get down to very small, granular pieces of infrastructure. >> Just sticking on data protection for a second, I saw a stat the other day, it's a fairly well-known, often quoted stat, it was Gartner I think, that 50% of customers are gonna change their backup platform by like 2023 or something. And you think about, and by the way, I think that's a legitimate stat, and when you talk to customers about why, well things are changing, the Cloud, Multicloud, things like GDPR, Ransomware, digital transformation, I wanna get more out of my data than just insurance, more out of my backup than just insurance, I wanna do analytics. So there's all these other sort of evolving things. I presume your backup as a service is evolving with that? >> Absolutely. >> What are you seeing there? >> Yeah, we're definitely seeing that the secondary storage market is very dynamic in terms of the expectations from customers, you know, they're changing, and changing very rapidly.
And so not only are we providing things like GreenLake and backup as a service, we're also seeking new partners in this space, so one of the big announcements that we'll make at Discover is we are doing a pretty big amplification of our partnership in an OEM relationship with Cohesity, right, so a lot of customers are looking for a secondary platform from a consolidation standpoint, so being able to run a number of very different, disparate workloads from a secondary storage perspective and make them, you know, work. So it's a great scale-out platform. It's gonna run on a number of our HPE platforms, right, so we're gonna be able to provide customers that whole solution from HPE partnering with Cohesity. So, you know, in general this secondary storage market's hot and we're making some bets in our ecosystem right now. >> You also have Big Data in your title so you're responsible for that portfolio. I know Apollo in the HPC world has had a foothold there. There's a lot of synergies between high-performance computing and Big Data-- >> Absolutely. >> What's going on in the Big Data world? >> Yeah, so Big Data is one of our fastest growing segments within HPE. I'd say Big Data and Analytics and some of the things that are going on with AI, and commercial high-performance applications. So for us, we have a new platform that we're announcing, our Gen10 version of the Apollo 4200, it's definitely the workhorse of our Apollo server line for applications like Cloudera, Hortonworks, MapR, we see Apache Spark, Kafka, a number of these, as well as some of these newer workloads around HPC, so TensorFlow, Caffe, H2O, and that platform gives us a really good compute, memory and storage mix from a footprint perspective, and it certainly scales into rack-level infrastructure. That part of the business for us is growing very quickly.
I think a lot of customers are using these Big Data Analytics techniques to transform their business and, you know, as we go along and help them, it's been a really cool ride to see all this implemented at customer sites. >> You know with all this talk about sort of Big Data and Analytics, and Cloud, and AI, the infrastructure kinda gets lost, but you know, the plumbing still matters, right, underneath all this. So we saw the flash trend, and that really had a major impact on certainly the storage business specifically, but generally the overall marketplace, I mean, it'd be really hard to support a lot of these emerging workloads without flash, and that stack continues to evolve, the pyramid if you will. So you've got flash memory now replacing much of the spinning disk space, you've got DRAM which obviously is the most expensive, highest performance, and there seems to be this layer emerging in the middle, this storage-class memory layer. What are you guys doing there? Is there anything new there? >> Yeah, so we've got a couple things cooking in that space. In general, like when you talk about the infrastructure, it is important, right, and we're trying to help customers not only by providing really good product in scalable infrastructure, things like Apollo and, you know, our systems Nimble and 3PAR. We're also trying to provide experience around that too. So, you know, combining things like InfoSight, InfoSight on storage, InfoSight on servers and Apollo for Big Data workloads, is something that we're gonna be delivering in the future. The platforms really matter. So we're gonna be introducing NVMe and storage class memory into what we feel is the industry-leading portfolio for flash storage. So between Nimble and 3PAR, those platforms are NVMe-ready, and we'll be making some product announcements on the availability of that type of medium.
So if you think about using it in a platform like 3PAR, right, industry leading from a performance perspective, it allows you to get sub-200-microsecond performance for very mission-critical, latency-intolerant applications, and it's a great architecture. It scales in parallel, active, active, active, right, so you can get quite a bit of performance from a very large 3PAR system, and we're gonna be introducing NVMe into that equation as a part of this announcement. >> So, we've seen this as critical for years, in the storage business, you talk about how storage is growing, storage is growing, storage is growing, and we'd show the charts up and to the right, but it was always like yeah, and somehow you gotta store it, you gotta manage it, you might have to move it, it's a real pain. The whole equation is changing now because of things like flash, things like GPUs, storage class memory, NVMe, and of course all this ML and deep learning tech, and now you're seeing things that you're able to do with the data that you've never been able to do before-- >> Absolutely. >> And emerging use cases, and so it's not just lots of data, it's completely new use cases, and it's driving new demands for infrastructure, isn't it? >> Absolutely, I mean, there were some macroeconomic tailwinds that we had this year, but HPE had a phenomenal year this year and we're looking at some pretty good outlooks into next year as well. So, yeah, from our perspective the requirement from customers for latency improvements, bandwidth improvements, and total addressable capacity improvements never stops, right? So it's always going on, and the data pipeline is getting longer. The amount of services and experiences that you're tying on to existing applications keeps on augmenting, right? So for us there's always new capabilities, always new ways that we can improve our products.
With things like InfoSight and a lot of the predictive analytics, we're using those techniques ourselves to improve our customers' experience with our products. So it's been, it's a very, you know, virtuous cycle in the industry right now. >> Well Patrick, thanks for coming in to theCUBE and unpacking these announcements at Discover Madrid. You're doing a great job sort of executing on the storage plan. Every time I see you there's new announcements, new innovations, you guys are hittin' all your marks, so congratulations on that. >> HPE, intelligent storage, intelligent data management, so if you guys have data needs you know where to come to. >> Alright, thanks again Patrick. >> Great, thank you so much. >> Talk to you soon. Alright, thanks for watching everybody. This is Dave Vellante from theCUBE. We'll see ya next time. (upbeat music)
DDN CrowdChat | October 11, 2018
(uptempo orchestral music) >> Hi, I'm Peter Burris and welcome to another Wikibon theCUBE special feature. A special digital community event on the relationship between AI, infrastructure and business value. Now it's sponsored by DDN with participation from NVIDIA, and over the course of the next hour, we're going to reveal something about this special and evolving relationship between sometimes tried and true storage technologies and the emerging potential of AI as we try to achieve these new business outcomes. So to do that we're going to start off with a series of conversations with some thought leaders from DDN and from NVIDIA, and at the end, we're going to go into a crowd chat, and this is going to be your opportunity to engage these experts directly. Ask your questions, share your stories, find out what your peers are thinking and how they're achieving their AI objectives. That's at the very end, but to start, let's begin the conversation with Kurt Kuckein, who is a senior director of marketing at DDN. >> Thanks Peter, happy to be here. >> So tell us a little bit about DDN at the start. >> So DDN is a storage company that's been around for 20 years. We've got a legacy in high performance computing, and that's where we see a lot of similarities with these new AI workloads. DDN is well known in that HPC community. If you look at the top 100 supercomputers in the world, we're attached to 75% of them. And so we have a fundamental understanding of that type of scalable need, and that's where we're focused. We're focused on performance requirements. We're focused on scalability requirements, which can mean multiple things. It can mean the scaling of performance. It can mean the scaling of capacity, and we're very flexible. >> Well let me stop you and say, so you've got a lot of customers in the high performance world, and a lot of those customers are at the vanguard of moving to some of these new AI workloads. What are customers saying?
With this significant engagement that you have with the best and the brightest out there, what are they saying about this transition to AI? >> Well I think it's fascinating that we have a bifurcated customer base here, where we have those traditionalists who probably have been looking at AI for over 40 years, and they've been exploring this idea and they've gone through the peaks and troughs in the promise of AI, and then contraction because CPUs weren't powerful enough. Now we've got this emergence of GPUs in the super computing world. And if you look at how the super computing world has expanded in the last few years, it is through investment in GPUs. And then we've got an entirely different segment, which is a much more commercial segment, and they may be newly invested in this AI arena. They don't have the legacy of 30, 40 years of research behind them, and they are trying to figure out exactly, what do I do here? A lot of companies are coming to us: hey, I have an AI initiative. Well, what's behind it? We don't know yet, but we've got to have something. And they don't yet understand where this infrastructure is going to come from. >> So the general availability of AI technologies, and obviously flash has been a big part of that, very high speed networks within data centers, virtualization certainly helps as well, now opens up the possibility of bringing these algorithms, some of which have been around for a long time and required very specialized, bespoke configurations of hardware, to the enterprise. That still begs the question. There are some differences between high performance computing workloads and AI workloads. Let's start with some of the, what are the similarities, and let's explore some of the differences. >> So the biggest similarity I think is it's an intractably hard IO problem. At least from the storage perspective, it requires a lot of high throughput, depending on what those IO characteristics are.
It can be very small file, IOPS-intensive type workflows, but it needs the entire infrastructure to deliver all of that seamlessly from end to end. >> So really high performance throughput so that you can get to the data you need and keep this computing element saturated. >> Keeping the GPU saturated is really the key. That's where the huge investment is. >> So how do AI and HPC workloads differ? >> So how they are fundamentally different is often AI workloads operate on a smaller scale in terms of the amount of capacity, at least today's AI workloads, right? As soon as a project encounters success, our forecast is those things will take off and you'll want to apply those algorithms against bigger and bigger data sets. But today, we encounter things like 10 terabyte data sets, 50 terabyte data sets, and a lot of customers are focused only on that, but what happens when you're successful? How do you scale your current infrastructure to petabytes and multi petabytes when you'll need it in the future? >> So when I think of HPC, I think of often very, very big batch jobs. Very, very large complex datasets. When I think about AI, like image processing or voice processing, whatever else it might be, I think of a lot of small files, randomly accessed, that nonetheless require some very complex processing that you don't want to have to restart all the time, and a degree of tuning that's required to make sure you have the throughput you need. Have I got that right? >> You've got that right. Now one misconception, I think, is on the HPC side, that whole random small file thing has come in in the last five, 10 years, and it's something DDN has been working on quite a bit. Our legacy was in high performance throughput workloads, but the workloads have evolved so much on the HPC side as well, and as you posited at the beginning, so much of it has become AI and deep learning research.
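The point about keeping the GPU saturated comes down to overlapping storage reads with compute so the accelerator never idles waiting on IO. As a rough editorial illustration only (plain Python stand-ins, not DDN or NVIDIA code), a data loader can prefetch samples on a background thread through a bounded queue:

```python
import queue
import threading

def prefetching_loader(read_sample, sample_ids, depth=8):
    """Yield samples while a background thread keeps reading ahead."""
    buf = queue.Queue(maxsize=depth)   # bounded: caps memory, applies backpressure
    DONE = object()

    def producer():
        for sid in sample_ids:
            buf.put(read_sample(sid))  # blocks when the consumer falls behind
        buf.put(DONE)

    threading.Thread(target=producer, daemon=True).start()
    while True:
        item = buf.get()
        if item is DONE:
            break
        yield item

# Stand-ins: a "file read" and a trivial "compute" step over 100 samples.
samples = list(range(100))
read = lambda sid: sid * 2
total = sum(prefetching_loader(read, samples))
print(total)  # 9900
```

The bounded queue is the key design choice: a deeper queue absorbs the latency spikes of small random file reads, while the `maxsize` bound keeps memory in check when compute falls behind.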
>> They do look a lot more alike. >> So if we think about the revolving relationship now between some of these new data first workloads, AI oriented change the way the business operates type of stuff. What do you anticipate is going to be the future of the relationship between AI and storage? >> Well, what we foresee really is that the explosion in AI needs and AI capability is going to mimic what we already see, and really drive what we see on the storage side. We've been showing that graph for years and years of just everything going up into the right but as AI starts working on itself and improving itself, as the collection means keep getting better and more sophisticated, and have increased resolutions whether you're talking about cameras or in life sciences, acquisition. Capabilities just keep getting better and better and the resolutions get better and better. It's more and more data right and you want to be able to expose a wide variety of data to these algorithms. That's how they're going to learn faster. And so what we see is that the data centric part of the infrastructure is going to need the scale even if you're starting today with a small workload. >> Kurt, thank you very much, great conversation. How did this turn into value for users? Well let's take a look at some use cases that come out of these technologies. >> DDN A3I within video DGX-1 is a fully integrated and optimized technology solution that provides an enable into acceleration for a wide variety of AI and the use cases in any scale. The platform provides tremendous flexibility and supports a wide variety of workflows and data types. Already today, customers in the industry, academia and government all around the globe are leveraging DDN A3I within video DGX-1 for their AI and DL efforts. In this first example used case, DDN A3I enables the life sciences research laboratory to accelerate through microscopic capture and analysis pipeline. 
On the top half of the slide is the legacy pipeline, which displays low resolution results from a microscope with a three minute delay. On the bottom half of the slide is the accelerated pipeline, where DDN A3I with NVIDIA DGX-1 delivers results in real time, 200 times faster and with much higher resolution than the legacy pipeline. This use case demonstrates how a single unit deployment of the solution can enable researchers to achieve better science and the fastest time to results without the need to build out complex IT infrastructure. The white paper for this example use case is available on the DDN website. In the second example use case, DDN A3I with NVIDIA DGX-1 enables an autonomous vehicle development program. The process begins in the field, where an experimental vehicle generates a wide range of telemetry that's captured on a mobile deployment of the solution. The vehicle data is used to train capabilities locally in the field, which are transmitted to the experimental vehicle. Vehicle data from the fleet is captured to a central location, where a large DDN A3I with NVIDIA DGX-1 solution is used to train more advanced capabilities, which are transferred back to experimental vehicles in the field. The central facility also uses the large data sets in the repository to train experimental vehicles in simulated environments to further advance the AV program. This use case demonstrates the scalability, flexibility and edge-to-data-center capability of the solution. DDN A3I with NVIDIA DGX-1 brings together industry leading compute, storage and network technologies in a fully integrated and optimized package that makes it easy for customers in all industries around the world to pursue breakthrough business innovation using AI and DL. >> Ultimately, this industry is driven by what users must do, the outcomes they try to seek. But it's always made easier and faster when you've got great partnerships working on some of these hard technologies together.
Let's hear how DDN and NVIDIA are working together to try to deliver new classes of technology capable of making these AI workloads scream. Specifically, we've got Kurt Kuckein coming back. He's a senior director of marketing for DDN, and Darrin Johnson, who is global director of technical marketing for NVIDIA in the enterprise and deep learning. Today, we're going to be talking about what infrastructure can do to accelerate AI. And specifically we're going to use a relationship, a burgeoning relationship between DDN and NVIDIA, to describe what we can do to accelerate AI workloads by using higher performance, smarter and more focused infrastructure for computing. Now to have this conversation, we've got two great guests here. We've got Kurt Kuckein, who is the senior director of marketing at DDN. And also Darrin Johnson, who's the global director of technical marketing for enterprise at NVIDIA. Kurt, Darrin, welcome to theCUBE. >> Thank you very much. >> So let's get going on this 'cause this is a very, very important topic, and I think it all starts with this notion that there is a relationship that you guys put forward. Kurt, why don't you describe it. >> Sure, well so what we're announcing today is DDN's A3I architecture powered by NVIDIA. So it is a full rack level solution, a reference architecture that's been fully integrated and fully tested to deliver an AI infrastructure very simply, very completely. >> So if we think about why this is important. AI workloads clearly put special stress on underlying technology. Darrin, talk to us a little bit about the nature of these workloads and why in particular things like GPUs and other technologies are so important to make them go fast? >> Absolutely, and as you probably know, AI is all about the data. Whether you're doing medical imaging, whether you're doing natural language processing, whatever it is, it's all driven by the data.
The more data that you have, the better results you get, but to drive that data into the GPUs, you need greater IO, and that's why we're here today to talk about DDN and the partnership of how to bring that IO to the GPUs on our DGX platforms. >> So if we think about what you describe. A lot of small files, often randomly distributed, with nonetheless very high profile jobs that just can't stop midstream and start over. >> Absolutely, and if you think about the history of high performance computing, which is very similar to AI, really IO is just that. Lots of files. You have to get it there. Low latency, high throughput, and that's why DDN's nearly 20 years of experience working in that exact same domain is perfect, because you get the parallel file system, which gives you that throughput, gives you that low latency. Just helps drive the GPU. >> So you mentioned HPC from 20 years of experience. Now it used to be that in HPC, you'd have a scientist with a bunch of graduate students setting up some of these big, honking machines, but now we're moving into the commercial domain. You don't have graduate students running around. You have very low cost, high quality people. A lot of administrators, nonetheless quick people, but with a lot to learn. So how does this relationship actually start making or bringing AI within reach of the commercial world? Kurt, why don't you--
>> And NVIDIA has done more than the DGX-1. It's more than hardware. You've don't a lot of optimization of different AI toolkits et cetera so talk a little bit about that Darrin. >> Talking about the example that used researchers in the past with HPC. What we have today are data scientists. A scientist understand pie charts, they understand TensorFlow, they understand the frameworks. They don't want to understand the underlying file system, networking, RDM, a InfiniBand any of that. They just want to be able to come in, run their TensorFlow, get the data, get the results, and just keep turning that whether it's a single GPU or 90 DGXs or as many DGXs as you want. So this solution helps bring that to customers much easier so those data scientist don't have to be system administrators. >> So roughly it's the architecture that makes things easier but it's more than just for some of these commercial things. It's also the overall ecosystem. New application fires up, application developers. How is this going to impact the aggregate ecosystem is growing up around the need to do AI related outcomes? >> Well, I think one point that Darrin was getting to there in one of the bigg effects is also as these ecosystems reach a point where they're going to need to scale. There's somewhere where DDN has tons of experience. So many customers are starting off with smaller datasets. They still need the performance, a parallel file system in that case is going to deliver that performance. But then also as they grow, going from one GBU to 90 GXs is going to be an incredible amount of both performance scalability that they're going to need from their IO as well as probably capacity, scalability. 
And that's another thing that we've made easy with A3I is being able to scale that environment seamlessly within a single namespace, so that people don't have to deal with a lot of, again, tuning and turning of knobs to make this stuff work really well and drive those outcomes that they need as they're successful. In the end, it is the application that's most important to both of us, right? It's not the infrastructure. It's making the discoveries faster. It's processing information out in the field faster. It's doing analysis of the MRI faster. Helping the doctors, helping anybody who is using this to really make faster decisions, better decisions. >> Exactly. >> And just to add to that. In the automotive industry, you have datasets that are 50 to 500 petabytes, and you need access to all that data, all the time, because you're constantly training and retraining to create better models to create better autonomous vehicles, and you need the performance to do that. DDN helps bring that to bear, and this reference architecture simplifies it, so you get the value add of NVIDIA GPUs plus its ecosystem software plus DDN. It's a match made in heaven. >> Kurt, Darrin, thank you very much. Great conversation. To learn more about what they're talking about, let's take a look at a video created by DDN to explain the product and the offering. >> DDN A3I with NVIDIA DGX-1 is a fully integrated and optimized technology solution that enables and accelerates end to end data pipelines for AI and DL workloads of any scale. It is designed to provide extreme amounts of performance and capacity, backed by a jointly engineered and validated architecture. Compute is the first component of the solution. The DGX-1 delivers over one petaflop of DL training performance leveraging eight NVIDIA Tesla V100 GPUs in a 3RU appliance. The GPUs are configured in a hybrid cube mesh topology using the NVIDIA NVLink interconnect.
DGX-1 delivers linearly predictable application performance and is powered by the NVIDIA DGX software stack. DDN A3I solutions can scale from single to multiple DGX-1s. Storage is the second component of the solution. The DDN AI200 is an all-NVMe parallel file storage appliance that's optimized for performance. The AI200 is specifically engineered to keep GPU computing resources fully utilized. The AI200 ensures maximum application productivity while easily managing data operations. It's offered in three capacity options and a compact 2U chassis. The AI200 appliance can deliver up to 20 gigabytes a second of throughput and 350,000 IOPS. The DDN A3I architecture can scale up and out seamlessly over multiple appliances. The third component of the solution is a high performance, low latency, RDMA-capable network. Both EDR InfiniBand and 100 gigabit Ethernet options are available. This provides flexibility, ensuring seamless scaling and easy integration of the solution within any IT infrastructure. DDN A3I solutions with NVIDIA DGX-1 bring together industry leading compute, storage and network technologies in a fully integrated and optimized package that's easy to deploy and manage. It's backed by deep expertise and enables customers to focus on what really matters. Extracting the most value from their data with unprecedented accuracy and velocity. >> Always great to hear the product. Let's hear the analyst's perspective. Now I'm joined by Dave Vellante, colleague here at Wikibon and co-CEO of SiliconANGLE. Dave, welcome to theCUBE. Dave, a lot of conversations about AI. What is it about today that is making AI so important to so many businesses? >> Well I think it's three things Peter. The first is the data. We've been on this decade-long Hadoop bandwagon, and what that did is really focused organizations on putting data at the center of their business, and now they're trying to figure out, okay, how do we get more value out of that?
So the second piece of that is the technology is now becoming available. AI, of course, has been around forever, but the infrastructure to support it, the GPUs, the processing power, flash storage, deep learning frameworks like TensorFlow, has really started to come to the marketplace. So the technology is now available to act on that data. And I think the third is that people are trying to get digital right. This is about digital transformation. Digital means data. We talk about that all the time, and every corner office is trying to figure out what their digital strategy should be. So they're trying to remain competitive, and they see automation and artificial intelligence, machine intelligence applied to that data, as a linchpin of their competitiveness. >> So a lot of people talk about the notion of data as a source of value, and there's been some presumption that it's all going to the cloud. Is that accurate? >> Oh, it's funny that you say that, because as you know, we've done a lot of work on this, and I think the thing that organizations have realized in the last 10 years is that the idea of bringing five megabytes of compute to a petabyte of data is far more viable. And as a result, the pendulum is really swinging in many different directions. One being the edge: data is going to stay there. And certainly the cloud is a major force. But most of the data still today lives on premises, and that's where most of the data is likely going to stay. And so, no, all the data is not going to go into the cloud. >> At least not the central cloud? >> That's right, the central public cloud. You can redefine the boundaries of the cloud, and the key is you want to bring that cloud-like experience to the data. We've talked about that a lot in the Wikibon and CUBE communities, and that's all about simplification and cloud business models.
>> So that suggests pretty strongly that there is going to continue to be a relationship between choices about hardware infrastructure on premises and the success at making some of these advanced, complex workloads run and scream and really drive some of those innovative business capabilities. As you think about that, what is it about AI technologies, or AI algorithms and applications, that has an impact on storage decisions? >> Well, the characteristics of the workloads are oftentimes going to be largely unstructured data. There's going to be small files, there's going to be a lot of those small files, and they're going to be randomly distributed, and as a result, that's going to change the way in which people design systems to accommodate those workloads. There's going to be a lot more bandwidth. There's going to be a lot more parallelism in those systems in order to accommodate and keep those CPUs busy. We're going to talk more about that, but the workload characteristics are changing, so the fundamental infrastructure has to change as well. >> And so our goal ultimately is to ensure that we keep these new high-performing GPUs saturated by flowing data to them without a lot of spiky performance throughout the entire subsystem. Have I got that right? >> Yeah, I think that's right, and that's what I was talking about with parallelism. That's what you want to do. You want to be able to load up that processor, especially these alternative processors like GPUs, and make sure that they stay busy. The other thing is, when there's a problem, you don't want to have to restart the job. So you want to have real time error recovery, if you will. And that's been crucial in the high performance world for a long, long time, because these jobs, as you know, take a long, long time. To the extent that you don't have to restart a job from ground zero, you can save a lot of money.
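The parallelism Dave describes, overlapping storage reads with compute so the processor never sits idle, comes down at its smallest to a bounded prefetch queue. A minimal sketch in Python (all names are illustrative; this is not any vendor's API, and the GPU is stood in for by a plain worker function):

```python
import queue
import threading
import time

def produce(files, q):
    # IO thread: read (simulated) samples and stage them ahead of the consumer.
    for f in files:
        time.sleep(0.001)           # stand-in for disk/network latency
        q.put(f"data:{f}")
    q.put(None)                      # sentinel: no more work

def consume(q, results):
    # Compute thread (the "GPU"): never waits on storage while the queue stays full.
    while True:
        item = q.get()
        if item is None:
            break
        results.append(item.upper())  # stand-in for a training step

files = [f"sample_{i}" for i in range(8)]
prefetch = queue.Queue(maxsize=4)    # bounded queue = prefetch depth
results = []

t = threading.Thread(target=produce, args=(files, prefetch))
t.start()
consume(prefetch, results)
t.join()
```

The bounded `maxsize` is the design choice that matters: deep enough to absorb spiky IO, shallow enough not to hoard memory.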
>> Yeah, especially, as you said, as we start to integrate some of these AI applications with some of the operational applications that are actually recording the results of the work that's being performed, or the prediction that's being made, or the recommendation that's being proffered. So I think ultimately, if we start thinking about this crucial role that AI workloads are going to have in business, and that storage is going to have on AI, moving more processing closer to data, et cetera, that suggests that there are going to be some changes in the offing for the storage industry. What are you thinking about how the storage industry is going to evolve over time? >> Well, there's certainly a lot of hardware stuff that's going on. We always talk about software-defined, but hardware still matters. Obviously flash storage changed the game from spinning mechanical disk, and that's part of this. Also, as I said before, you're seeing a lot more parallelism; high bandwidth is critical. A lot of the discussion that we're having in our community is the affinity between HPC, high performance computing, and big data, and I think that was pretty clear, and now that's evolving to AI. So the internal network, things like InfiniBand, are pretty important. NVMe is coming onto the scene. So those are some of the things that we see. I think the other one is file systems. NFS tends to deal really well with unstructured data and data that is sequential. When you have all the-- >> Streaming. >> Exactly, and when you have all this, what we just described, this sort of random nature, and you have the need for parallelism, you really need to rethink file systems. File systems are, again, a linchpin of getting the most out of these AI workloads. And the other is, if we talk about the cloud model, you've got to make this stuff simple. If we're going to bring AI and machine intelligence workloads to the enterprise, it's got to be manageable by enterprise admins.
You're not going to be able to have a scientist deploy this stuff, so it's got to be simpler, cloud-like. >> Fantastic. Dave Vellante, Wikibon. Thanks so much for being on theCUBE. >> My pleasure. >> We've had the analyst's perspective. Now let's take a look at some real numbers. Not a lot of companies have delivered a rich set of benchmarks relating AI, storage and business outcomes. DDN has. Let's look at a video that they prepared describing the benchmarks associated with these new products. >> DDN A3I with NVIDIA DGX-1 is a fully integrated and optimized technology solution that provides massive acceleration for AI and DL applications. DDN has engaged in extensive performance and interoperability testing programs in close collaboration with expert technology partners and customers. Performance testing has been conducted with synthetic throughput and IOPS workloads. The results demonstrate that the DDN A3I parallel architecture delivers over 100,000 IOPS and over 10 gigabytes per second of throughput to a single DGX-1 application container. Testing with multiple containers demonstrates linear scaling up to full saturation of the DGX-1's IO capabilities. These results show concurrent IO activity from four containers with an aggregate delivered performance of 40 gigabytes per second. The DDN A3I parallel architecture delivers true application acceleration. Extensive interoperability and performance testing has been completed with a dozen popular DL frameworks on DGX-1. The results show that with the DDN A3I parallel architecture, DL applications consistently achieve a higher training throughput and faster completion times. In this example, Caffe achieves almost eight times higher training throughput on DDN A3I, and it completes over five times faster than when using a legacy file sharing architecture and protocol. Comprehensive tests and results are fully documented in the DDN A3I solutions guide available from the DDN website.
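The linear-scaling claim in the video reduces to simple arithmetic, which is worth sanity-checking. In the sketch below, only the ratios come from the video; the ten-hour legacy runtime is a made-up example input:

```python
# Sanity-checking the scaling numbers quoted in the video.
per_container_gbps = 10            # delivered to a single DGX-1 container
containers = 4
aggregate_gbps = per_container_gbps * containers  # the linear-scaling claim

legacy_runtime_hours = 10.0        # hypothetical job length on legacy NFS
speedup = 5                        # "completes over five times faster"
a3i_runtime_hours = legacy_runtime_hours / speedup
```

Four containers at roughly 10 gigabytes per second apiece is consistent with the 40 gigabytes per second aggregate figure the narration quotes.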
This test illustrates the DGX-1 GPU utilization and read activity from the AI200 parallel storage appliance during a TensorFlow training iteration. The green line shows that the DGX-1 GPUs achieve maximum utilization throughout the test. The red line shows the AI200 delivers a steady stream of data to the application during the training process. In the graph below, we show the same test using a legacy file sharing architecture and protocol. The green line shows that the DGX-1 never achieves full GPU utilization, and that the legacy file sharing architecture and protocol fails to sustain consistent IO performance. These results show that with DDN A3I, this DL application on the DGX-1 achieves maximum GPU productivity and completes twice as fast. This test and result are also documented in the DDN A3I solutions guide available from the DDN website. DDN A3I solutions with NVIDIA DGX-1 bring together industry leading compute, storage and network technologies in a fully integrated and optimized package that enables widely used DL frameworks to run faster, better and more reliably. >> You know, it's great to see real benchmarking data, because this is a very important domain, and there is not a lot of benchmarking information out there around some of these other products that are available. But let's try to turn that benchmarking information into business outcomes, and to do that we've got Kurt Kuckein back from DDN. Kurt, welcome back. Let's talk a bit about how these high-value outcomes that businesses seek with AI are going to be achieved as a consequence of this new performance, these faster capabilities, et cetera. >> So there are a couple of considerations. The first consideration, I think, is just the selection of AI infrastructure itself. Right, we have customers telling us constantly that they don't know where to start.
Now they have readily available reference architectures that tell them: hey, here's something you can implement and get installed quickly, and you're up and running your AI from day one. >> So the decision process for what to get is reduced. >> Exactly. >> Okay. >> Number two is, you're unlocking both ends of the investment with something like this, right? You're maximizing the performance on the GPU side, you're maximizing the performance on the ingest side for the storage, you're maximizing the throughput of the entire system. So you're really gaining the most out of your investment there. And not just gaining the most out of your investment, but truly accelerating the application, and that's the end goal, right, that we're looking for with customers. Plenty of people can deliver fast storage, but if it doesn't impact the application and deliver faster results, cut run times down, then what are you really gaining from having fast storage? And so that's where we're focused. We're focused on application acceleration. >> So simpler architecture, faster implementation based on that, integrated capabilities, ultimately all resulting in better application performance. >> Better application performance, and in the end, something that's more reliable as well. >> Kurt Kuckein, thanks so much for being on theCUBE again. So that ends our prepared remarks. We've heard a lot of great stuff about the relationship between AI, infrastructure, especially storage, and business outcomes, but here's your opportunity to go into the crowd chat and ask your questions, get your answers, share your stories, and engage your peers and some of the experts that we've been talking with about this evolving relationship between these key technologies and what it's going to mean for business. So I'm Peter Burris. Thank you very much for listening. Let's step into the crowd chat and really engage and get those key issues addressed.
Partha Seetala, Robin Systems | DataWorks Summit 2018
>> Live from San Jose, in the heart of Silicon Valley, it's theCUBE, covering DataWorks Summit 2018. Brought to you by Hortonworks. >> Welcome back, everyone. You are watching day two of theCUBE's live coverage of DataWorks here in San Jose, California. I'm your host, Rebecca Knight, coming at you with my cohost James Kobielus. We're joined by Partha Seetala, he is the Chief Technology Officer at Robin Systems. Thanks so much for coming on theCUBE. >> Pleasure to be here. >> You're a first timer, so we promise we don't bite. >> Actually I'm not, I was on theCUBE- >> Oh! >> At DockerCon in 2016. >> Oh, well, excellent. Okay, so now you're a veteran, right? >> Yes, ma'am. >> So Robin Systems, as we were talking about before the cameras were rolling, is about four years old, based here in San Jose, a venture-backed company. Tell us a little bit more about the company and what you do. >> Absolutely. First of all, thanks for hosting me here. Like you said, Robin is a Silicon Valley based company. Our focus is on allowing applications, such as big data, databases, NoSQL and AI ML, to run within the Kubernetes platform. What we have built is a product that converges compute, storage, networking and application workflow management, along with Kubernetes, to create a one click experience where users can get a managed services kind of feel when they're deploying these applications. They can also do one click lifecycle management on these apps. Our thesis has from the beginning been, instead of looking at this problem from the infrastructure up into the applications, to actually look at it from the applications down, and then say, "Let the applications drive the underlying infrastructure to meet the user's requirements." >> Is that your differentiating factor, would you say?
>> Yeah, I think it is, because most of the folks out there today are looking at it as if it's a component based play. It's like they want to bring storage to Kubernetes or networking to Kubernetes, but the challenges are not really around storage and networking. If you talk to the operations folks, they say, "You know what? Those are underlying problems, but my challenge is more along the lines of: okay, my CIO says the initiative is to make my applications mobile. They want to go across different clouds. That's my challenge." The line of business user says, "I want to get a managed service experience." Yes, storage is the thing that you want to manage underneath, but I want to go and click and create, let's say, an Oracle database or distributions log. >> In terms of the developer experience here, from the application down, give us a sense for how Robin Systems' tooling, your product, enables that degree of specification of the application logic that will then get containerized within. >> Absolutely. Like I said, we want applications to drive the infrastructure. What that means is that Robin is a software platform. We layer ourselves on top of the machines that we sit on, whether it is bare metal machines on premises, or VMs, or even Azure, Google Cloud, as well as AWS. Then we make the underlying compute, storage and network resources almost invisible. We treat them as a pool of resources. Now, once you have this pool of resources, they can be attached to the applications that are being deployed inside containers. I mean, it's a software play: you install it on machines, and once it's installed, the experience moves away from infrastructure into applications. You log in, you see a portal, you have a lot of applications in that portal. We ship support for about 25 applications or so. >> So these are templates? >> Yes. >> That the developer can then customize to their specific requirements? Or no?
>> Absolutely, we ship reference templates for a wide variety of the most popular big data, NoSQL, database and AI ML applications today. But again, as I said, it's a reference implementation. Typically customers take the reference implementation and enhance it, or they use it to onboard their custom apps, for example, or the apps that we don't ship out of the box. So it's a very open, extensible platform, but the goal being that whatever the application might be, in fact we keep saying that if it runs somewhere else, it runs on Robin, right? So the idea here is that you can bring anything, and with the flip of a switch you can make it a one click deploy, one click manage, one click mobile across clouds. >> You keep mentioning this one click, and this idea of it being so easy, so convenient, so seamless. Is that what you'd say is the biggest concern of your customers, this ease and speed? Or what are some other things that are on their minds that you want to deliver? >> Right, so one click of course is the user experience part, but what is the real challenge? The real challenge is that there's a wide variety of tools being used by enterprises today. Even in the data analytics pipeline, there's a lot across the data store and processing pipeline. Users don't want to deal with setting it up and keeping it up and running. They don't want that; they want to get the job done, right? Now, when you want to get the job done, you really want to hide the underlying details of those platforms, and the best way to give that experience is to make it a single click experience from the UI. So I keep calling it all one click, because that is the experience you get to hide the underlying complexity of these apps. >> Does your environment actually compile executable code based on that one click experience? Or where does the compilation and containerization actually happen in your distributed architecture?
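The template-driven, application-down flow being described can be pictured as a function from a reference template plus user overrides to a Kubernetes manifest. The sketch below is purely illustrative: Robin's actual product, template names, images and APIs are not described in this transcript, so every identifier here is a hypothetical stand-in.

```python
# Hypothetical template registry; all names and images are illustrative only.
TEMPLATES = {
    "hortonworks": {"image": "example/hdp:3.0", "replicas": 3, "storage_gb": 500},
    "oracle":      {"image": "example/oracle:19", "replicas": 1, "storage_gb": 200},
}

def one_click_deploy(app, overrides=None):
    """Merge a reference template with user overrides into a Kubernetes manifest."""
    spec = {**TEMPLATES[app], **(overrides or {})}
    return {
        "apiVersion": "apps/v1",
        "kind": "StatefulSet",  # data-centric apps want stable identity and storage
        "metadata": {"name": app},
        "spec": {
            "replicas": spec["replicas"],
            "template": {"spec": {"containers": [
                {"name": app, "image": spec["image"]}]}},
            # The storage request is what lets the application drive the
            # underlying resource pool, rather than the other way around.
            "volumeClaimTemplates": [{"spec": {"resources": {
                "requests": {"storage": f'{spec["storage_gb"]}Gi'}}}}],
        },
    }

manifest = one_click_deploy("hortonworks", {"replicas": 5})
```

The point of the sketch is only the shape of the idea: the user touches the template and overrides, never the infrastructure objects underneath.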
>> Alright, so, I think the simplest- >> You're a prem-based offering, right? You're not in the cloud yourself? >> No, we are. We work on all three big public clouds. >> Oh, okay. >> Whether it is Azure, AWS or Google. >> So your entire application is containerized itself for deployment into these clouds? >> Yes, it is. >> Okay. >> So the idea here is, let's simplify it significantly, right? You have Kubernetes today; it can run anywhere, on premises, in the public cloud and so on. Kubernetes is a great platform for orchestrating containers, but it is largely inaccessible to a certain class of data centric applications. >> Yeah. >> We make that possible. But our take is, just onboarding those applications onto Kubernetes does not solve your CXO's or your line of business user's problems. You ought to make the management, from an application point of view, not from a container management point of view, a lot easier, and that is where we create this experience that I'm talking about, the one click experience. >> Give us a sense for how, we're here at DataWorks and it's the Hortonworks show, discuss with us your partnership with Hortonworks. You know, we've heard the announcement of HDP 3.0 and containerization support; just give us a rough sense for how you align or partner with Hortonworks in this area. >> Absolutely. It's kind of interesting, because Hortonworks is a data management platform, if you think about it from that point of view, and when we engaged with them first- So some of our customers have been using the product, Hortonworks, on top of Robin, so orchestrating Hortonworks, making it a lot easier to use. >> Right. >> One of the requirements was, "Are you certified with Hortonworks?" And the challenge that Hortonworks also had is they had never certified a container based deployment of Hortonworks before. They actually were very skeptical, you know: "You guys are saying all these things.
Can you actually containerize and run Hortonworks?" So we worked with Hortonworks, and, I mean, if you go to the Hortonworks website, you'll see that we are the first in the entire industry to have been certified as a container based play that can actually deploy and manage Hortonworks. They have certified us by running a wide variety of tests, which they call the QATS test suite, and when we got certified, the only other players in the market that got that stamp of approval were Microsoft with Azure and EMC with Isilon. >> So you're in good company? >> I think we are in great company. >> You're certified to work with HDP 3.0, or the prior version, or both? >> When we got certified, we were still on the 2.x version of Hortonworks; HDP 3.0 is a relatively newer version. But our plan is that we want to continue working with Hortonworks to get certified as they release new versions, and also to help them, because HDP 3.0 also has some container based orchestration and deployment, so we want to help them provide the underlying infrastructure so that it becomes easier for them to spin up more containers.
We, of course, provide security in terms of isolating these different apps that are running on the Robin platform, where the security or the access into the application itself is left to the apps themselves. When I say apps, I'm talking about Hortonworks. >> Yeah, sure. >> Or any other databases. >> Moving forward, as you think about ways you're going to augment and enhance and alter the Robin platform, what are some of the biggest trends that are driving your decision making around that, in the sense of, as we know that companies are living with this deluge of data, how are you helping them manage it better? >> Sure. I think there are a few trends that we are closely watching. One is around Cloud mobility. CIOs want their applications along with their data to be available where their end users are. It's almost like a follow-the-sun model, where you might have generated the data in one Cloud, and at a different time, in a different time zone, you'll basically want to keep the app as well as the data moving. So we are following that very closely, looking at how we can make the mobility of data and apps a lot easier in that world. The other one is around the general AI and ML workflow. One of the challenges there, of course: you have great toolkits like TensorFlow or Theano or Caffe, these are very good AI and ML toolkits, but one of the challenges that people face is they are buying these very expensive boxes, let's say an NVIDIA DGX Box, which costs about $150,000 each. How do you keep these boxes busy so that you're getting a good return on investment? It will require you to better manage the resources offered by these boxes. We are also monitoring that space, and we're looking at how we can take the Robin platform and enable better utilization of GPUs, or the sharing of GPUs, for running your AI and ML kind of workloads. >> Great. >> Those are, I think, two key trends that we are closely watching.
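The GPU-utilization trend described here maps onto how stock Kubernetes schedules GPUs: pods request whole devices through an extended resource, which is exactly why better sharing needs an extra layer. A hedged sketch (the resource name is the one published by NVIDIA's device plugin; the image and counts are illustrative):

```python
# Sketch: how a training pod asks the Kubernetes scheduler for GPUs.
# "nvidia.com/gpu" is the extended-resource name exposed by NVIDIA's
# device plugin; requests are whole GPUs only, which is why fractional
# sharing (the utilization gap discussed above) needs another layer.
gpu_pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "tf-train"},
    "spec": {
        "containers": [{
            "name": "trainer",
            "image": "example/tensorflow-train:latest",  # placeholder
            "resources": {"limits": {"nvidia.com/gpu": 2}},  # 2 whole GPUs
        }],
        "restartPolicy": "Never",
    },
}

# A DGX-class box with 8 GPUs can therefore fit at most 4 such pods:
gpus_per_box = 8
pods_per_box = gpus_per_box // gpu_pod["spec"]["containers"][0]["resources"]["limits"]["nvidia.com/gpu"]
print(pods_per_box)  # 4
```

Because the unit of allocation is a whole device, keeping a $150,000 box busy is a bin-packing problem on top of this, which is the opportunity the interview points at.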
>> We'll be discussing those at the next DataWorks Summit, I'm sure, at some other time in the future. >> Absolutely. >> Thank you so much for coming on theCUBE, Partha. >> Thank you. >> Thank you, my pleasure. Thanks. >> I'm Rebecca Knight, for James Kobielus. We will have more from DataWorks coming up in just a little bit. (techno beat music)
Adrian Cockcroft, AWS | KubeCon + CloudNativeCon 2018
>> Announcer: From Copenhagen, Denmark, it's theCUBE. Covering KubeCon and CloudNativeCon Europe 2018. Brought to you by the Cloud Native Computing Foundation and its ecosystem partners. >> Hello and welcome back to the live CUBE coverage here in Copenhagen, Denmark, for KubeCon 2018, the Kubernetes European conference. This is theCUBE, I'm John Furrier, my co-host Lauren Cooney here with Adrian Cockcroft, who is the Vice President of Cloud Architecture and Strategy for Amazon Web Services, AWS. CUBE alumni, great to see you, a legend in the industry, great to have you on board today. Thanks for coming on. >> Thanks very much. >> Quick update, Amazon, we were at AWS Summit recently, I was at re:Invent last year, it gets bigger and bigger, it just continues to grow. Congratulations on great earnings. You guys posted last week, just continuing to show the scale and leverage that the cloud has. So, again, nothing really new here, cloud is winning and is the model of choice. So you guys are doing a great job, so congratulations. Open source, you're handling a lot of that now. This community here is all about driving cloud standards. >> Adrian: Yeah. >> You guys' position on that is? Standards are great, you do what customers want, as Andy Jassy always says. What's the update? I mean, what's new since Austin last year? >> Yeah, well, it's been great to be back on; we had a great video of us talking at Austin, and it's been very helpful to get the message out of what we're doing in containers and what the open source team that I lead has been up to. It's been very nice. Since then we've done quite a lot. We were talking about doing things then which we've now actually done and delivered on. We're getting closer to getting our Kubernetes service out, EKS. We hired Bob Wise, he started with us in January, he's the general manager of EKS. Some of you may know Bob has been working with Kubernetes since the early days. He was on the CNCF board before he joined us.
He's working very hard, they have a team cranking away on all the things we need to do to get the EKS service out. So that's been the major focus, just getting it out. We have a lot of people signed up for the preview. Huge interest, we're onboarding a lot of people every week, and we're getting good feedback from people. We have demos of it in the booth here this week. >> So you guys are very customer-centric, following you guys closely as you know. What's the feedback that you're hearing, and what are you guys ingesting from an intelligence standpoint from the field? Obviously, a new constituent, not new, but a major constituent is open source communities, as well as paying enterprise customers. What's the feedback? What are you hearing? I would say beyond tire kicking, there's general interest in what Kubernetes has enabled. What's Amazon's view of that? >> Yeah, well, open source in general is always getting a larger slice of what people want to do. Generally, people are trying to get off of their enterprise solutions and evolving into an open source space, and then you kind of evolve from that into buying it as a service. So that's kind of the evolution from one trend, custom or enterprise software, to open source, to as a service. And we're standing up all of these tools as a service to make them easier to consume for people. Just, everybody's happy to do that. What I'm hearing from customers is that that's what they're looking for. They want it to be easy to use, they want it to scale, they want it to be reliable and work, and that's what we're good at doing. And then they want to track the latest moves in the industry and run with the latest technologies, and that's what Kubernetes and the CNCF are doing, gathering together a lot of technologies, building the community around it, just able to move faster than we'd move on our own. We're leveraging all of those things into what we're doing. >> And the status of EKS right now is in preview?
And the estimated timetable for GA? >> In the next few months. >> Next few months. >> You know, get it out; right now it's running in Oregon, in our Oregon data center, so the previews are all happening there. That gets us our initial thing, and then everyone goes, okay, we want it in our other regions, so we have to do that. So another service we have is Fargate, which basically says: here's a container, I want to run it, and you don't have to declare a node or an instance to run it first. We launched that at re:Invent, that's already in production obviously, and we just rolled that out to four regions. That's in Virginia, Oregon, Dublin and Ohio right now. A huge interest in Fargate, it lets you simplify your deployments a little bit. We just posted a new blog post; we have an open source blog you can find if you want to keep up with what's going on with the open source team at AWS. There was another post this morning, and it's a first pass at getting Fargate to work with Kubernetes using Virtual Kubelet, which is an experimental project, not part of the core Kubernetes system, running on the side. It's something that Microsoft came up with a little while ago. So now we're working with them. We did a pull request, they accepted it, so that team and AWS and a few other customers and other people in the community are working together to provide you a way to start up Fargate as the underlying layer for provisioning containers underneath Kubernetes, with Kubernetes as the API for doing the management of that. >> So who do you work with mostly when you're working in open source? Who do you partner with? What communities are you engaging with in particular? >> It's all over. >> All over? >> Wherever the communities are, we're engaging with them. >> Lauren: Okay, any particular ones that stand out? >> Other than CNCF, we have a lot of engagement with the Apache Hadoop ecosystem.
A lot of work in data science, there's many, many projects in that space. In AI and machine learning, we've sponsored, we've spent a lot of time working with Apache MXNet, and we also work with TensorFlow, PyTorch and Caffe; those are all open source frameworks, so there's lots of contributions there. In the serverless arena, we have our own SAM, the Serverless Application Model. We've been open sourcing more of that recently ourselves, and we're working with various other people. Across these different groups there's different conferences you go to, there's different things we do. We just sponsored the Rails conference. My team sponsors and manages most of the open source conference events we go to now. We just did RailsConf, we're doing a Rust conference soon, I think, and there's Python conferences. I forget when all these are. There's a massive calendar of conferences that we're supporting. >> Make sure you email us that list, we're interested actually in looking at what the news and action is. >> So the language ones, AltCon's our flagship one, we'll be top-level sponsor there. When we get to the U.S., KubeCon in Seattle, it's right there, it's two weeks after re:Invent. It's going to be much easier to manage. When we go to re:Invent it's like everyone just wants to take that week off, right. We get a week for everyone to recover and then it's in the hometown. >> You still have that look in your eyes; when we interviewed you in Austin you came down, we both were pretty exhausted after re:Invent. >> Yeah, so we announced a bunch of things on Wednesday and Thursday, and I had to turn it into a keynote by Tuesday and get everyone to agree. That's what was going on, that was very compressed. We have more time, and all of the engineering teams that really want to be at an event like this were right in the hometown for a lot of it. >> What's it like working at Amazon, I've got to ask you since you brought it up.
I mean, you guys run hard at Amazon, you're releasing stuff at a pace that's unbelievable. I mean, I get blown away every year. It almost seems inhuman that you guys can run at that pace. And earnings, obviously, the business results speak for themselves. What's it like there? I mean, you put your running shoes on, you run a marathon every day. >> It's lots of small teams working relatively independently, and that scales, and that's something other engineering organizations have trouble with. They build hierarchies that slow down. We have a really good engineering culture where every time you start a new team, it runs at its own speed. We've shown that as we add more and more resources, more teams, they are just executing. In fact, they're accelerating, they're building on top of other things. We get to build higher and higher level abstractions to layer into. Just getting easier and easier to build things. We're accelerating our pace of innovation; there's no slowing down. >> I was telling Jassy they're going to write a Harvard Business School case study on a lot of the management practices, but certainly the impact on the business side with the model that you guys do. But I got to ask you, on the momentum side, super impressed with SageMaker. I predicted on theCUBE at AWS Summit that that will be the fastest growing service. It will overtake Aurora, which I think is currently presented on stage as the fastest growing service. SageMaker is really popular. Updates there, its role in the community. Obviously, Kubernetes is a good fit for orchestrating things. We heard about Kubeflow, it's an interesting model. What's going on with SageMaker, and how is it interplaying with Kubernetes? >> People that want to run, if you're running an on-premise cluster of GPU-enabled machines, then Kubeflow is a great way of doing that. You're on TensorFlow, that manages your cluster, you run Kubeflow on top.
SageMaker is running at very low scale and, like a lot of things we do at AWS, what you need to run an individual cluster for any one customer is different from running a multi-tenant service. SageMaker sits on top of ECS, and it's now one of the largest generators of traffic to ECS, which is Amazon's horizontally scaled, multi-tenant cluster management system, which is now doing hundreds of millions of container launches a week. That is continuing to grow. We see Kubernetes as a more portable abstraction. It has some more, different layers of APIs and a big community around it. But for the heavy lifting of running tens of thousands of containers for a single application, we're still at the level where ECS does that every day, and with Kubernetes that's kind of the extreme case, where a few people are pushing it. It'll gradually grow scale. >> It's evolution. >> There's an evolution here. But the interesting thing is, we're starting to get some convergence on some of the interfaces. Like the networking interface, CNI; CNI is the way you do networking on containers, and there is one way of doing that, that is shared by everybody through CNI. EKS uses it, ECS uses it and Kubernetes uses it. >> And the impact for customers is what? What's the impact? >> It means the networking structures you want to set up will be the same. And the capabilities and the interfaces. But what happens on AWS is, because it has a direct plug-in, you can hook it up to our accelerated networking infrastructure. So, AWS's instances right now, we've offloaded most of the network traffic processing. If you're running 25 gigabits of traffic, that's quite a lot of work even for a big CPU, but it's handled by the Nitro plug-in architecture we have in our latest instance types. We talked a bit about that at re:Invent, but what you're getting is enormous, complete hypervisor offload at the core machine level. You get to use that accelerated networking.
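The shared networking interface referred to here is CNI, the Container Network Interface: every plugin, whether the AWS VPC plugin or Calico, is driven by a small JSON network configuration handed to it by the container runtime. A sketch of the standard shape, using the reference bridge plugin with illustrative values:

```python
import json

# Sketch: a CNI network configuration. The runtime hands this JSON to
# whichever plugin binary is named in "type" -- the AWS VPC plugin,
# Calico, or the reference "bridge" plugin shown here. The subnet and
# names are illustrative values, not a production config.
cni_config = {
    "cniVersion": "0.3.1",
    "name": "example-net",
    "type": "bridge",          # which plugin binary to invoke
    "bridge": "cni0",
    "isGateway": True,
    "ipMasq": True,
    "ipam": {                  # IP address management is delegated
        "type": "host-local",  # to a second plugin
        "subnet": "10.22.0.0/16",
    },
}

print(json.dumps(cni_config, indent=2))
```

Because every orchestrator consumes this same config shape, swapping Calico for an accelerated provider-specific plugin changes the `type` and its options, not the application or the pod spec.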
You're plugging into that interface. But if you want to have a huge number of containers on a machine and you're not really trying to drive very high throughput, then you can use Calico, and we support that as well. So, multiple different ways, but all through the same thing, the same plug-ins on both. >> System portability. You mentioned some stats, what are the numbers you mentioned? How many containers are you launching a week, hundreds of thousands? >> On ECS, our container platform that's been out for a few years, it's hundreds of millions a week. It's really growing very fast. The containers are taking off everywhere. >> Microservices growth is, again, that's the architecture. As architecture is a big part of the conversation, what's your dialogue with customers? Because the modern software architecture in cloud looks a lot different than what it was in the three-layered approach that used to be the web stack. >> Yeah, and I think to add to that, you know, we were just talking to folks about how in large enterprise organizations, you're still finding groups that do waterfall development. How are you working to kind of bring these customers and these developers into the future, per se? >> Yeah, that's actually, I spend about half my time managing the open source team and recruiting. The other half is talking to customers about this topic. I spend my time traveling around the world, talking at summits and events like this and meeting with customers. There's lots of different problems slowing people down. I think you see three phases of adoption of cloud, in general. One is just speed. I want to get something done quickly, I have a business need, I want to do it. I want machines in minutes instead of months, right, and that speeds everything up so you get something done quickly. The second phase is where you're starting to do stuff at scale, and that's where you need cloud native.
You really need to have elastic services, you can scale down as well as up, otherwise you just end up with a lot of idle machines that cost you too much, and it's not giving you the flexibility. The third phase we're getting into is complete data center shutdown. If you look at investing in a new data center or a data center refresh versus just opening an AWS account, it really doesn't make sense nowadays. We're seeing lots of large enterprises either considering it or well into it. Some are a long way into this. When you shut down the data center, all of the backend core infrastructure starts coming out. So we're starting to see sort of mainframe replacement and the really critical business systems being replaced. Those are the interesting conversations; that's one of the areas that I'm particularly interested in right now, and it's leading into this other buzzword, if you like, called chaos engineering. Think of it as the availability model for cloud native and microservices. We're just starting a working group at CNCF around chaos engineering, which is being started this week. So you can get a bit involved in how we can build some standards. >> That's going to be at Stanford? >> It's here, I mean it's a working group. >> Okay, online. >> The CNCF working group, they are wherever the people are, right. >> So, what is that conversation like, when you talk about that mainframe kind of conversation, or shutting down data centers and moving to the cloud? What is the key thing that you promote, up front, that needs to get done by the customer? I mean, obviously you have the pillars, the key pillars, but you think about microservices, it's a global platform, it's not a lift and shift situation, kind of is, with the shutdown, but I mean not at that scale. But, security, identity, authentication, there's no perimeter, so you know microservices are potentially going to scale. What are the things that you promote upfront, that they have to do up front?
What are the up front, table stakes decisions? >> At the management level, the real problem is people problems, and it's a technology problem somewhere down in the weeds. Really, if you don't get the people structures right, then you'll spend forever going through these migrations. So if you sort of bite the bullet and do the reorganization that's needed first, and get the right people in the right place, then you move much faster through it. I say a lot of the time, we're way upstream of picking a technology; it's much more about understanding the sort of DevOps, Agile and the organizational structures for these more cellular-based organizations. You know, AWS is a great example of that. Netflix are another good example of that. Capital One is becoming a good example of that too. In banking, they're going much faster because they've already gone through that. >> So they're taking the Amazon model, small teams. Is that your general recommendation? What's your general recommendation? >> Well, this is the whole point of microservices, is that they're built by these small teams. It's called Conway's law, which says that the code will end up looking like the team, the org structure, that built it. So, if you set up lots of small teams, you will end up with microservices. That's just the way it works, right. If you try to take your existing siloed architecture with your long waterfall things, it's very hard not to build a monolith. Getting the org structure done first is right. Then we get into kind of the landing zone thing. You could spend years just debating what your architecture should be, and some people have, and then every year they come back and it's changing faster than they can decide what to do. That's another kind of analysis paralysis mode you see some larger enterprises in. I always think just do it. There's a standard best practice: lay out my accounts like this, my networks like this, my structures; we call it a landing zone.
We get somebody up to speed incredibly quickly and it's the beaten path. We're starting to build automation around these onboarding things, we're just getting stuff going. >> That's great. >> Yeah, and then going back to the sort of chaos engineering kind of idea, one of the first things I think you should put into this infrastructure is the disaster recovery automation. Because if that gets there before the apps do, then the apps learn to live with the chaos monkeys and things like that. Really, one of the first apps we installed at Netflix was Chaos Monkey. It wasn't added later, it was there when you arrived. Your app had to survive the chaos that was in the system. So, think of it this way: it used to be that disaster recovery was incredibly expensive, hard to build, custom, and very difficult to test. People very rarely run through their disaster recovery testing data center failover, but if you build it in on day one, you can build it automated. I think Kubernetes is particularly interesting because the APIs to do that automation are there. So we're looking at automating injecting failure at the Kubernetes level, and also injecting it into the underlying machines that are running Kubernetes, like attacking the control plane to make sure that the control plane recovery works. I think there's a lot we can do there to automate it and make it into a low-cost, productized, safe, reliable thing that you do a lot, rather than something that everyone's scared of doing. >> Or they bolt it on after they make decisions, and retrofit pre-existing conditions into a disaster recovery. Which is chaotic in and of itself. >> So, get the org chart right, and then actually get the disaster recovery patterns. If you need something highly available, do that first, before the apps turn up. >> Adrian, thanks for coming on, chaos engineering, congratulations and again, we know you know a little about Netflix, you know that environment, and you've been a big Amazon customer.
Congratulations on your success, looking forward to keeping in touch. Thanks for coming on and sharing the AWS perspective on theCUBE. I'm John Furrier, with Lauren Cooney, live in Denmark for KubeCon 2018, part of the CNCF, the Cloud Native Compute Foundation. We'll be back with more live coverage, stay with us. We'll be right back. (upbeat music)
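The chaos-engineering loop Adrian describes in the interview above, injecting failure routinely so recovery is exercised rather than feared, reduces to something very small. A toy sketch with in-memory stand-ins (no real cloud APIs are called; the safeguard of skipping singleton groups is an illustrative assumption, not Netflix's actual policy):

```python
import random

# Toy sketch of the chaos-monkey loop: once per round, pick one random
# instance from each group and terminate it. Stand-in objects only.
def chaos_round(groups, rng):
    """Return {group: terminated_instance} for one round of chaos."""
    killed = {}
    for name, instances in groups.items():
        if len(instances) < 2:    # never take down a group's only instance
            continue
        victim = rng.choice(instances)
        instances.remove(victim)  # simulate termination
        killed[name] = victim
    return killed

groups = {
    "api":    ["api-1", "api-2", "api-3"],
    "worker": ["worker-1", "worker-2"],
    "cron":   ["cron-1"],         # singleton: left alone
}
killed = chaos_round(groups, random.Random(42))
print(sorted(killed))  # ['api', 'worker']
```

The point of the interview's argument is that if this loop runs from day one, applications are forced to be restartable and stateless-or-replicated before they ever reach production scale.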
David Aronchick & JD Velasquez, Google | KubeCon + CloudNativeCon 2018
>> Announcer: Live, from Copenhagen, Denmark. It's theCUBE! Covering KubeCon and CloudNativeCon Europe 2018. Brought to you by the Cloud Native Computing Foundation, and its Ecosystem partners. >> Hi everyone, welcome back, this is theCUBE's exclusive coverage of the Linux Foundation's Cloud Native Compute Foundation KubeCon 2018 in Europe. I'm John Furrier, host of theCUBE, and we're here with two Google folks. JD Velazquez, who's the Product Manager for Stackdriver, got some news on that we're going to cover, and David Aronchick, who's the co-founder of Kubeflow, also with Google, news here on that. Guys, welcome to theCUBE, thanks for coming on. >> Thank you John. >> Thank you very much. >> So we're going to have Google Next coming out, theCUBE will be there this summer, looking forward to digging into all the enterprise traction you guys have, and we had some good briefings at Google. Ton of movement on the Cloud for Google, so congratulations. >> JD: Thank you. >> Open source is not new to Google. This is a big show for you guys. What's the focus, you've got some news on Stackdriver, and Kubeflow. Kubeflow, not Cube flow, that's our flow. (laughing) David, share some of the news and then we'll get into Stackdriver. >> Absolutely, so Kubeflow is a brand new project. We launched it in December, and it is basically how to make machine learning stacks easy to use and deploy and maintain on Kubernetes. So we're not launching anything new. We support TensorFlow and PyTorch, Caffe, all the tools that you're familiar with today. But we use all the native APIs and constructs that Kubernetes provides to make it very easy, and to let data scientists and researchers focus on what they do great, and let the I.T. Ops people deploy and manage these stacks. >> So simplifying the interactions and cross-functionality of the apps. Using Kubernetes.
>> Exactly, when you go and talk to any researcher out there or data scientist, what you'll find is that while the model, TensorFlow or PyTorch or whatever, gets a little bit of the attention, 95% of the time is spent in all the other elements of the pipeline. Transforming your data, ingesting it, experimenting, visualizing. And then rolling it out toward production. What we want to do with Kubeflow is give everyone a standard way to interact with all of those components, and give them a great workflow for doing so. >> That's great, and the Stackdriver news, what's the news we got going on? >> We're excited, we just announced the beta release of Stackdriver Kubernetes monitoring, which provides very rich and comprehensive observability for Kubernetes. So this is essentially simplifying operations for developers and operators. It's a very cool solution, it integrates many signals across the Kubernetes environment, including metrics, logs, events, as well as metadata. So what it allows is for you to really inspect your Kubernetes environment, regardless of the role, and regardless of where your deployment is running. >> David is bringing up just the use cases. I just, my mind is exploding, 'cause you think about what TensorFlow is to a developer, and all the goodness that's going on with the app layer. The monitoring and the instrumentation is a critical piece, because Kubernetes is going to bring people thousands and thousands of new services. So, how do you instrument that? I mean, you got to know, I want to provision this service dynamically, that didn't exist. How do you measure that, I mean this is, is this the challenge you guys are trying to figure out here? >> Yeah, for sure John. The great thing here is that at Google, many of our practices go beyond monitoring. It really is about observability, which I would describe more as a property of a system.
It's how you are able to collect all these many signals to help you diagnose a production failure, and to get information about usage and so forth. So we do all of that for you in your Kubernetes environment, right. We take that toil away from the developer or the operator. Now, a cool thing is that you can also instrument your application in open source. You can use Prometheus, and we have an integration for that, so anything you've done with Prometheus instrumentation, you can now bring into the cloud as needed. >> Tell us about this notion, everyone gets that, oh my God, Google's huge. You guys are very open, you're integrating well. Talk about the guiding principles you guys have when you think about Prometheus as an example. Integrating in with these other projects. How are you guys treating these other projects? What's the standard practice? API base? Are there integration plans? How do you guys address that question? >> Yeah, at a high level I would say, at Google, we really believe in contributing to and helping grow open communities. I think that the best way to keep a community open and portable is to help it grow. And Prometheus particularly, and Kubernetes of course, is a very vibrant community in that sense. So we are, from the start, designing our systems to be able to have integration, via APIs and so on, but also contributing directly to the projects. >> And I think one thing that's just leveraging off that exact point, y'know, we realize what the world looks like. There's literally zero customers out there, like, "Well, I want to be all in on one cloud. Y'know, that 25 million dollar data center I spent last year building. Yeah, I'll toss that out so that I can get, y'know, some special thing." The reality is, people are multi-cloud. And the only way to solve any problem is with these very open standards that work wherever people are. And that's very much core to our philosophy.
>> Well, I mean, I've been critical of multi-cloud, by the definition. Statistically, if I'm on Azure, with 365, that's Azure. If I'm running something on Amazon, those are two clouds, they're not multi-cloud, by my definition. Which brings up where this is going, which is latency and portability, which you guys are really behind. How are you guys looking at that? Because you mentioned observability, let's talk about the observability space of clouds, 'cause that's what people are talking about. When are we going to get to the future state, which is, I need to have workload portability, in real time? If I want to move something from Azure to AWS or Google Cloud, that would be cool. Can't do that today. >> That is actually the core of what we did around Kubeflow. What we are able to do is describe in code all the layers of your pipeline, all the steps of your pipeline. That works on any conformant Kubernetes cluster. So, if you have a Kubernetes conformant cluster on Azure, or on AWS, or on Google Cloud, or on your laptop, or in your private data center, that's great. And to be clear, I totally agree. Having single workloads spread across clouds is just unrealistic, because of all the things you identified. Latency, variability, unknown failures, y'know. CAP theorem is a thing because, y'know, it's well-known. But what people want to do is take advantage of different clouds for what they provide. Maybe my data is here, maybe I have a legal reason, maybe this particular cloud has a unique chip, or unique service-- >> Use cases can drive it. >> Exactly, and then I can take my workload, which has been described in code, and deploy it to that place where it makes sense. Keeping it within a single cloud, but as an organization I'll use multiple clouds together.
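That per-workload placement argument, each workload stays in one cloud but the organization picks the cloud that fits the workload's data, legal, or hardware constraints, can be sketched as a toy decision function. Cloud names and capabilities here are invented for illustration.

```python
# Toy placement decision: pick one cloud per workload based on its constraints,
# while each individual workload stays within a single cloud. All names are made up.

CLOUDS = {
    "cloud-a": {"regions": {"eu"}, "accelerators": {"tpu"}},
    "cloud-b": {"regions": {"us", "eu"}, "accelerators": {"gpu"}},
}

def place(workload):
    """Return the first cloud satisfying the workload's region and hardware needs."""
    for name, caps in CLOUDS.items():
        if workload["region"] in caps["regions"] and workload["accel"] in caps["accelerators"]:
            return name
    return None  # no cloud satisfies the constraints

print(place({"region": "eu", "accel": "tpu"}))  # cloud-a
print(place({"region": "us", "accel": "gpu"}))  # cloud-b
```

The pipeline-as-code point is what makes this realistic: once the workload is fully described in code, redeploying it to whichever target `place` selects is mechanical rather than a rewrite.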
>> Yeah, I agree, and the data's key, because if you can have data moving between clouds, I think that's something I would like to see, because the metadata you mentioned is a real critical piece of all these apps. Whether it's instrumentation logging, and/or, y'know, provisioning new services. >> Yeah, and as David is mentioning, as soon as you have deployments on, y'know, public or private clouds, then the difficult part is that observability that we were talking about before. Because now you're trying to stitch together data, and tools to help you get that diagnosed, or get signals when you need them. This is what we're doing with Stackdriver Kubernetes Monitoring, precisely. >> Y'know, we're early days in the cloud. It still feels like we're 10 years in, but, y'know, a lot of people are now coming to realize cloud native, so. Y'know, I'm not a big fan of the whole, y'know, Amazon thing, although they do say Amazon's winning, and they are doing quite well with the cloud, 'cause they're a cloud. It's early days, and you guys are doing some really specific good things with the cloud, but you don't have the breadth of services, say, Amazon has. And you guys are above board about that. You're like, "Hey, we're not trying to meet them "speed for speed on services." But you do certain things really, really well. You mentioned SRE. Site Reliability Engineers. This is a scale best practice that you guys are bringing to the table. But yet the customers are learning about Kubernetes. Some people who have never heard of it before say, "Hey, what's this Kubernetes thing?" >> Right. >> What is your perspective on the relevance of Kubernetes at this point in history? Because it really feels like a critical mass, de facto, standard movement where everyone's getting behind Kubernetes, for all the right reasons. It feels a lot like interoperability is here. Thoughts on Kubernetes' relevance.
>> Well, I think that Alexis Richardson, the chairperson of the technical oversight committee, summed it up great today. The reality is that what we're looking for, what operators and software engineers have been looking for forever, is clean lines between the various concerns. So as you think about the underlying infrastructure, and then you think about the applications that run on top of that, potentially services that run on top of that, then you think about applications, then you think about how that shows up to end users. Before, if you're old like me, you remember that you'd buy a $50,000 machine and stick it in the corner, and you'd stack everything on there, right? That never works, right? The power supply goes out, the memory goes out, this particular database goes out. Failure will happen. The only way to actually build a system that is reliable, that can meet your business needs, is by adopting something more cloud native, where if any particular component fails, your system can recover. If you have business requirements that change, you can move very quickly and adapt. Kubernetes provides a rich, portable, common set of APIs that do work everywhere. And as a result, you're starting to see a lot of adoption, because it gives people that opportunity. But I think, y'know, and let me hand off to JD here, the next layer up is about observability. Because without observing what's going on in each of those stacks, you're not going to have any kind of-- >> Well, programmability comes behind it, to your point. Talk about that, that's a huge point. >> Yeah, and just to build on what David is saying, one thing that is unique about Google is that, for more than a decade now, we've been very good at being able to provide innovative services without compromising reliability.
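The "failure will happen, the system recovers" property described above is, at its core, a reconcile loop: compare desired state with observed state and act on the difference. A toy version follows; this is not the real Kubernetes controller machinery, and the workload names are invented.

```python
# Toy reconcile loop: compare desired vs. observed replica counts and converge.
# Names and counts are made up; real controllers watch the API server continuously.

desired = {"web": 3, "db": 1}

def reconcile(observed):
    """Return the actions needed to bring observed state to the desired state."""
    actions = []
    for name, want in desired.items():
        have = observed.get(name, 0)
        if have < want:
            actions.append(("start", name, want - have))
        elif have > want:
            actions.append(("stop", name, have - want))
    return actions

# A pod died: "web" dropped from 3 to 2. The loop notices and starts one replacement.
print(reconcile({"web": 2, "db": 1}))  # [('start', 'web', 1)]
print(reconcile({"web": 3, "db": 1}))  # [] -- already converged
```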
Right, and so what we're doing is continuing that commitment, and you see that with Kubernetes and Istio: we're externalizing many of our, y'know, opinionated infrastructure and platforms in that sense, but it's not just the platforms. You need those methodologies and best practices. And now the toolset. So that's what we're doing now, precisely. >> And you guys have made great strides, just to kind of point out to the folks watching, in the enterprise. I know you've got a lot more work to do, but you're pedaling as fast as you can. I want to ask you specifically around this, because again, we're still early days with the cloud. If you think about it, there are now table stakes on the table that you got to get done. Check boxes, if you will. Certainly on the government side there's like, compliance issues, and you guys are now checking those boxes. What is the key thing? 'Cause you guys are operating at a scale that enterprises can't even fathom. I mean, millions of services, on and on at a huge scale. That's going to be helpful for them down the road, no doubt about it. But today, what are the Google table stakes that are done, and what do enterprises need to have as table stakes to do cloud native right, from your perspective? >> Well, I think more than anything, y'know, I agree with you. The reality is all the hyperscale cloud providers have the same table stakes, all the check boxes are checked, we're ready to go. I think what will really differentiate and move the ball forward for so many people is this adoption of cloud native. And really, how cloud native is your cloud, right? How much do you need to spin up an entire SRE team like Netflix in order to operate in the Netflix model of, y'know, complete automation and building your own services and things like that? Does your cloud help you get cloud native? And I think that's where we really want to lean in.
It's not about IaaS anymore, it's about: does your cloud support the reliability, support the distribution, all the various services, in order to help you move even faster and achieve higher velocity. >> And standing up that is critical, because now these applications are the business model of companies, when you talk about digital. So I tweeted something I want to get your reaction to, a quote I overheard yesterday from a person here in the hallways. "I need to get away from VPNs and firewalls. "I need user application layer security "with unphishable access, otherwise I'm never safe." Again, this talks about the perimeterless cloud. Spearphishing is really hot right now, people are getting killed with security concerns. So if I'm an enterprise, I'm going to stop and say, "Hold on," y'know, I'm going to proceed with caution. What are you guys doing to take away the fear, and also address the reality that as you provision all these, stand up all this infrastructure and services for customers, what are you guys doing to prevent phishing attacks from happening, security concerns, what's the Google story? >> So I think that more than anything, what we're trying to do is exactly what JD just said, which is externalize all the practices that we have. So, for example, at Google we have all sorts of internal tools that we've used, and internal practices. For example, we just published a whitepaper about our security practices, where you need to have two vulnerabilities in order to break out of any system. We have all that written up there. We just published a whitepaper about encryption and how to do encryption by default, encryption between machines and so on. But I think what we're really doing is, we're helping people to operate like Google without having to spin up an entire SRE team as big as Google's to do it. An example is something we have internally called BeyondCorp.
It's a non-firewall, non-VPN based way for you to authenticate against any Google system, using two-factor authentication, for our internal employees. Externally, we just released it, it's called Identity-Aware Proxy. You can use it with literally any service that you have. You can provision a domain name, you can integrate with OAuth, including Google OAuth or your own private OAuth. All those various things. That's simply a service that we offer, and so, really, y'know, I think-- >> And there's also multi, more than two-factor coming down the road, right? >> Exactly, actually Identity-Aware Proxy already supports two-factor. But I will say, one of the things that I always tell people is, a lot of enterprises say exactly what you said. "Jeez, this new world looks very scary to me. "I'm going to slow down." The problem is they're under the mistaken impression that they're secure today. More than likely, they're not. They already have firewalls, they already have VPNs, and it's not great. In many ways, the enterprises that are going to win are the ones that lean in and move faster to the new world. >> Well, they have to, otherwise they're going to die. With IoT and all these benefits, they're exposed even as they are, just operationally. >> Yep. >> Just to support it. Okay, I want to get your thoughts, guys, on Google's role here at the Linux Foundation's CNCF KubeCon event. You guys do a lot of work in open source. You've got a lot of great fan base. I'm a fan of what you guys do, love the tech Google brings to the table. How do people get involved? What are you guys connecting with here, what's going on at the show, and how does someone get on board with the Google train? Certainly TensorFlow has been, it's like, great open source goodness, developers are loving it, what's going on?
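The BeyondCorp-style access model described above, ignore network location, verify identity and a second factor on every request, can be sketched roughly like this. The policy sets, user names, and outcomes are invented; this is not the Identity-Aware Proxy API.

```python
# Sketch of per-request, identity-aware access control (BeyondCorp-style).
# All data here is hypothetical; real systems also check device state, context, etc.

AUTHORIZED = {"ada@example.com", "bob@example.com"}  # who may reach the app at all
SECOND_FACTOR_OK = {"ada@example.com"}               # who completed 2FA this session

def check_request(user, came_from_corp_network):
    # Network location is deliberately ignored: being "inside" proves nothing.
    del came_from_corp_network
    if user not in AUTHORIZED:
        return "deny"
    if user not in SECOND_FACTOR_OK:
        return "challenge"  # prompt for the second factor before proceeding
    return "allow"

print(check_request("ada@example.com", came_from_corp_network=False))      # allow
print(check_request("mallory@example.com", came_from_corp_network=True))   # deny
print(check_request("bob@example.com", came_from_corp_network=True))       # challenge
```

Note how the corporate-network flag never affects the outcome: that is the whole point of replacing the VPN/firewall perimeter with per-request identity checks.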
>> Well, we have almost 200 people from Google here at the show, helping and connecting with people. We have a Google booth, which I invite people to stop by to learn about the different projects we have. >> Yeah, and exactly like you said, we have an entire repo on GitHub. Anyone can jump in, all our things are open source and available for everyone to use no matter where they are. Obviously I've been on Kubernetes for a while. The Kubernetes project is on fire, TensorFlow is on fire, Kubeflow, that we mentioned earlier, is completely open source, and we're integrating with Prometheus, which is a CNCF project. We are huge fans of these open source foundations and we think that's the direction that most software projects are going to go. >> Well, congratulations, I know you guys invested a lot. I just want to highlight that. Again, to show my age, y'know, these younger generations have no idea how hard open source was in the early days. I call it the open bar of open source: you guys are bringing so much, y'know, everyone's drunk on all this goodness. Y'know, just these libraries you guys are bringing to the table. >> David: Right. >> I mean, TensorFlow is just the classic poster-child example. I mean, you're bringing a lot of stuff to the table. I mean, you invented Kubernetes. So much good stuff coming in. >> Yeah, I couldn't agree more. I hesitate to say we invented it. It really was a community effort, but yeah, absolutely-- >> But you opened it up, and you did it right, and did a good job. Congratulations. Thanks for coming on theCUBE. I'm going to see you at Google Next. theCUBE will be broadcasting live at Google Next in July. Of course we'll do a big drill-down on Google Cloud Platform at that show. It's theCUBE here at KubeCon 2018 in Copenhagen, Denmark. More live coverage after this short break, stay with us. (upbeat music)
Ritika Gunnar, IBM | IBM Think 2018
>> Narrator: Live from Las Vegas, it's theCUBE! Covering IBM Think 2018. Brought to you by IBM. >> Hello, I'm John Furrier. We're here in theCUBE studios at IBM Think 2018 in Mandalay Bay, in Las Vegas. We're extracting the signal from the noise, talking to all the executives, customers, and thought leaders inside the community of IBM and theCUBE. Our next guest is Ritika Gunnar, who is the VP of Product for Watson and AI, cloud data platforms, all the goodness of the product side. Welcome to theCUBE. >> Thank you, great to be here again. >> So, we love talking to the product people because we want to know what the product strategy is. What's available, what are the hottest features. Obviously, we've been talking about, these are our words, Ginni introduced the innovation sandwich. >> Ritika: She did. >> The data's in the middle, and you have blockchain and AI on both sides of it. This is really the future. This is where you're going to see automation. This is where you're going to see efficiencies being created, inefficiencies being abstracted away. Obviously blockchain's got more of an infrastructure, futuristic piece to it. AI is in play now, machine learning. You got Cloud underneath it all. How has the product morphed? What is the product today? We've heard of World of Watson in the past. You got Watson for this, you got Watson for IoT, you got Watson for that. What is the current offering? What's the product? Can you take a minute, just to explain what, semantically, it is? >> Sure. I'll start off by saying, what is Watson? Watson is AI for smarter business. I want to start there, because Watson is about how we really get AI infused in our enterprise organizations, and that is the core foundation of what Watson is. You heard a couple of announcements at the conference this week about what we're doing with Watson Studio, which is about providing that framework for what it means to infuse AI in our clients' applications.
And you talked about machine learning. It's not just about machine learning anymore. It really is about how we pair what machine learning is, which is about tweaking and tuning single algorithms, with what we're doing with deep learning. And that's one of the core components of what we're doing with Watson Studio: how do we make AI truly accessible? Not just machine learning, but deep learning, to be able to infuse those in our client environments really seamlessly. And so the deep learning as a service piece of what we're doing in the Studio was a big part of the announcements this week, because deep learning allows our clients to really have it in a very accessible way. And there were a few things we announced with deep learning as a service. We said, look, just like with predictive analytics, we have capabilities that easily allow you to democratize that to knowledge workers and to business analysts by adding drag-and-drop capabilities. We can do the same thing with deep learning and deep learning capabilities. So we have taken a lot of things that have come from our research area and started putting those into the product to really bring about enterprise capabilities for deep learning, but in a really de-skilled way. >> Yeah, and also to remind the folks, there's a platform involved here. Maybe you can say it's been re-platformed, I don't know, maybe you can answer that. Has it been re-platformed, or is it just the platformization of existing stuff? Because there's certainly demand. TensorFlow at Google showed that there's demand for machine learning libraries, with deep learning behind them. You got Amazon Web Services touting SageMaker. As-a-service models for AI are definitely in demand. So talk about the platform piece underneath. What is it? How does it get rendered? And then we'll come back and talk about the user consumption side. >> So it definitely is not a re-platformization.
You recall what we have done, with a focus initially on data science and machine learning. And the number one thing that we did was, we were about supporting open source and open frameworks. So it's not just one framework, like a TensorFlow framework; it's about what we can do with TensorFlow, Keras, PyTorch, Caffe, being able to use all of our builders' favorite open-source frameworks, and being able to use them in a way where we can then add additional value on top, and help them accelerate what it means to actually have that in the enterprise, and what it means to actually de-skill that for the organization. So we started there. But really, if you look at where Watson has focused on the APIs and the API services, it's bringing together those capabilities of what we're doing with unstructured, pre-trained services, and then allowing clients to bring the structured and unstructured together on one platform, and adding the deep learning as a service capabilities, which is truly differentiating. >> Well, I think the important point there, just to amplify for the people to know, is it's not just your version of the tools for the data; you're looking at bringing data in from anywhere your customer wants it. And that's super critical. You don't want to ignore data. You can't. You got to have access to the data that matters. >> Yeah, you know, I think one of the other critical pieces that we're talking about here is, data without AI is meaningless, and AI without data is really not useful or very accurate. So having both of them in a yin yang, and then bringing them together as we're doing in the Watson Studio, is extremely important. >> The other thing I want to get to now: the user side, the consumption side. You mentioned making it easier, but one of the things we've been hearing, that's been a theme in the hallways and certainly on theCUBE here, is bad data equals bad AI. >> Bad data equals bad AI.
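To make the machine-learning-versus-deep-learning distinction above concrete, tuning a single algorithm versus composing learned layers, here is a toy two-layer forward pass in plain Python with invented weights. No framework, purely illustrative; real deep learning trains these weights rather than hand-picking them.

```python
import math

# Toy two-layer network forward pass: the "stacked layers" that distinguish
# deep learning from tuning a single algorithm. All weights are made up.

def relu(x):
    return max(0.0, x)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(features):
    # Layer 1: two hidden units, each a weighted sum passed through ReLU.
    h1 = relu(0.5 * features[0] - 0.2 * features[1])
    h2 = relu(-0.3 * features[0] + 0.8 * features[1])
    # Layer 2: combine hidden units into a single probability-like score.
    return sigmoid(1.2 * h1 + 0.7 * h2 - 0.1)

score = forward([1.0, 2.0])
print(round(score, 3))
```

A "drag-and-drop" layer on top of this, as described for Watson Studio, would amount to letting a business analyst assemble `forward` from prebuilt layer blocks instead of writing the arithmetic.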
>> It's not just about bolting AI on; you really got to take a holistic approach, a hygiene approach, to the data, and understand where the data is contextually relevant to the application. That's kind of nuanced, but break that down. What's your reaction to that, and how do you talk to customers? Saying, okay, look, you want to do AI, here's the playbook. How do you explain that in a very simple way? >> Well, you've heard of the AI ladder: making your data ready for AI. This is a really important concept, because you need to be able to have trust in the data that you have, relevancy in the data that you have. So it is about not just the connectivity to that data, but can you start having curated and rich data that is really valuable, that's accurate, that you can trust, that you can leverage. It becomes not just about the data, but about the governance and the self-service capabilities that you can have around that data, and then it is about the machine learning and the deep learning characteristics that you can put on there. But all three of those components are absolutely essential. What we're seeing is it's not even about the data that you have within the firewall of your organization; it's about what you're doing to really augment that with external data. That's another area where having pre-trained, enriched data sets, with what we're doing with the Watson data kits, is extremely important; industry-specific data. >> Well, you know my pet peeve: I love data. I'm a data geek, I love innovation, I love data-driven, but you can't have data without good human interaction. The human component is critical, and certainly we're seeing trends where startups like Alation, that we've interviewed, are taking this social approach to data, where they're looking at it like you don't need to be a data geek or data scientist.
The average business person's creating the value. And especially in blockchain, we were just talking in theCUBE about how it's the business model innovations, it's universal property, and the technology can be enabled and managed appropriately. This is where the value is. What's the human component? Is there like... You want to know who's using the data? >> Well-- >> Why are they using data? It's like, do I share the data? Can you leverage other people's data? This is kind of a melting pot. >> It is. >> What's the human piece of it? >> It truly is about enabling more people access to what it means to infuse AI into their organization. When I said it's not about re-platforming, but about expanding: we started with the data scientists, and we're adding to that the application developer. The third piece of that is, how do you get the knowledge worker? The subject matter expert? The person who understands the actual machine, or equipment that needs to be inspected? How do you get them to start customizing models without having to know anything about the data science element? That's extremely important, because I can auto-tag and auto-classify stuff and use AI to get them started, but there is that human element of not needing to be a data scientist, but still having input into that AI, and that's a very beautiful thing. >> You know, it's interesting, in the security industry you've seen groups, birds of a feather flocking together, where they share hats, and it's a super important community aspect of it. Data has now, and now with AI, you get the AI ladder, but this points to AI literacy within organizations. >> Exactly. >> So you're seeing people saying, hey, we need AI literacy. Not coding per se, but how do we manage data? But it's also understanding who within your peer group is evolving. So you're seeing now a whole formation of a user base out there, users who want to know who their peers are; the birds of a feather flocking together.
This is now a social gamification opportunity, because they're growing together. >> There're-- >> What's your thought on that? >> There are two things there I would say. First, we often go to the technology, and as a product person I just spoke to you a lot about the technology. But what we find in talking to our clients is that it really is about helping them with the skills, the culture, the process transformation that needs to happen within the organization to break down the boundaries and the silos that exist, to truly get AI into an organization. That's the first thing. The second is, when you think about AI and what it means to actually infuse AI into an enterprise organization, there's an ethics component of this. There are ethics and bias components which you need to mitigate and detect, and those are real problems. And by the way, IBM, especially with the work that we're doing within Watson, with the work that we're doing in research, we're taking this on front and center, and it's extremely important to what we do. >> You guys used to talk about that as cognitive, but I think you're so right on. I think this is such a progressive topic, love to do a deeper dive on it, but really you nailed it. Data has to have a consensus algorithm built into it. Meaning you need to have, that's why I brought up this social dynamic, because I'm seeing people within organizations address regulatory issues, legal issues, ethical, societal issues all together, and it requires a group. >> That's right. >> Not just algorithms, people to synthesize. >> Exactly. >> And that's diversity, diverse groups from different places and experiences, whether it's an expert here, a user there, all coming together. This is not really talked about much. How are you guys-- >> I think it will be more. >> John: It will, you think so? >> Absolutely, it will be more. >> What do you see from customers? You've done a lot of client meetings. Are they talking about this?
Or are they still more in the "how do I stand up AI" literacy phase? >> They are starting to talk about it, because look, imagine if you train your model on bad data. You actually have bias in your model then, and that means that the accuracy of that model is not where you need it to be if you're going to run it in an enterprise organization. So, being able to do things like detect it and proactively mitigate it are at the forefront, and by the way, this is where our teams are really focusing on what we can do to further the AI practice in the enterprise, and it is where we really believe that the ethics part of this is so important for that enterprise or smarter business component. >> Iterating through the quality of the data is really good. Okay, so now, I was talking to Rob Thomas about data containers. We were kind of nerding out on Kubernetes and all that good stuff. You can almost imagine Kubernetes and containers making data really easy to move around and manage effectively with software, but I mentioned consensus on understanding the quality of the data and understanding the impact of the data. When you say consensus, the first thing that jumps to my mind is blockchain, cryptocurrency. Is there a tokenization economics model in data somewhere? Because all the best stuff going on in blockchain and cryptocurrency that's technically more impactful is the changing of the economics, the changing of the technical architectures. You almost can say, hmm. >> You can actually see over time that there is a business model that puts more value not just on the data and the data assets themselves, but on the models and the insights that are actually created from the AI assets themselves. I do believe that is a transformation just like what we're seeing in blockchain, and the kind of cryptocurrency that exists within there, and where the value is. We will see the same shift within data and AI.
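The earlier point that a model trained on skewed data inherits the skew can be checked mechanically. Here is a rough, demographic-parity style test in plain Python; the data and the flagging threshold are made up, and this is nothing like production bias tooling, just the shape of the idea.

```python
# Illustrative bias check on training data: compare positive-label rates per group.
# Rows, groups, and threshold are invented; real bias detection is far more involved.

training_rows = [
    {"group": "a", "label": 1}, {"group": "a", "label": 1},
    {"group": "a", "label": 1}, {"group": "a", "label": 0},
    {"group": "b", "label": 1}, {"group": "b", "label": 0},
    {"group": "b", "label": 0}, {"group": "b", "label": 0},
]

def positive_rate(rows, group):
    members = [r for r in rows if r["group"] == group]
    return sum(r["label"] for r in members) / len(members)

def parity_gap(rows, g1, g2):
    """Gap between groups' positive-label rates; large gaps flag skewed data."""
    return abs(positive_rate(rows, g1) - positive_rate(rows, g2))

gap = parity_gap(training_rows, "a", "b")
print(gap)          # group "a" is 75% positive, "b" is 25%, so the gap is 0.5
biased = gap > 0.2  # made-up threshold for flagging the dataset for review
print(biased)
```

Running a check like this before training is the "proactively mitigate" step: the skew is caught in the data, before it ever becomes model bias.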
>> Well, you know, we're really interested in exploring this, and if you guys have any input, we'd love to get more access to thought leaders around the relationship people and things have to data. Obviously the internet of things is one piece, but the human relationship to data, you're seeing it play out in real time. Uber had a death this week, that was tragic. The first self-driving car fatality. You're seeing Facebook really get handed huge negative press on the fact that they mismanaged data that was optimized for advertising, not user experience. You're starting to see a shift, an evolution, where people are starting to recognize the role of the human and their data and other people's data. This is a big topic. >> It's a huge topic, and I think we'll see a lot more of it in the weeks, and months, and years ahead. I think it becomes a really important point as to how we start to really innovate in and around not just the data, but the AI we apply to it, and then the implications of it, and what it means if the data's not right, if the algorithms aren't right, if the bias is there. It has big implications for society and for the environment as a whole. >> I really appreciate you taking the time to speak with us. I know you're super busy. My final question is much more about sharing some color commentary on IBM Think this week: the event, your reaction to it, obviously it's massive, and also the customer conversations you've had. You've told me that you're in client briefings and meetings. What are they talking about? What are they asking for? What are some of the low-hanging fruit use cases? Where's the starting point? Where are people jumping in? Can you just share any data you have on--
It actually is integrated with things like data, with the systems, with how we actually integrate that in a hybrid way across what we're doing on premises, what we're doing in private cloud, and what we're doing in public cloud. So, having a forum where we're talking about all of that together in a unified manner has drawn great feedback from many customers and many analysts, and in general, from an IBM perspective, I believe it has been extremely valuable. The types of questions I'm hearing, and the types of inputs and conversations we're having, are ones where clients want to be able to innovate and really do Horizon three type things. What are the things they should be doing in Horizon one, Horizon two, and Horizon three when it comes to AI and how they treat their data? This is really important because-- >> What's Horizon one, two and three? >> Think about Horizon one: those are things you should be doing immediately to get immediate value in your business. Horizon two is kind of mid-term, 18 to 24 months. 24-plus months out is Horizon three. So when you think about an AI journey, what does your AI journey really look like in terms of what you should be doing in the immediate term? Small, quick wins. >> Foundational. >> What are the kinds of projects that will pan out in a year, and what are the two to three year projects that we should be doing? These are the most frequent conversations I've been having with a lot of our clients: what is the AI journey we should be thinking about, what are the projects right now, how do we work with you on the projects right now on H1 and H2, and what are the things we can start incubating that are longer term. And those are extremely transformational in nature.
It's kind of like, what do we do to really automate self-driving, not just for cars, but what do we do for trains, and what do we do to really revolutionize certain industries and professions. >> How does your product roadmap map to your Horizons? Can you share a little bit about the priorities on the roadmap? I know you don't want to share a lot of data, competitive information. But, can you give an anecdotal view, or at least a trajectory of what the priorities are and some guiding principles? >> I hinted at some of it, but I only talked about the Studio during this discussion. Still, the Studio is just one part of a three-pronged approach that we have in Watson. The Studio really is about laying the foundation for how we get AI into our enterprises for the builders, and it's a place where builders go to be able to create, build, and deploy those models, machine learning and deep learning models, and be able to do so in a de-skilled way. On top of that, as you know, we've done thousands of engagements and we know the most common ways that clients are trying to use Watson and AI in their organizations. So, taking our learnings from that, we're starting to harden those into applications so that clients can easily infuse them into their businesses. We have capabilities like Watson Assistant, which was announced this week at the conference, that really help clients with pre-existing skills: how do you have a customer care solution, and then how can you extend it to other industries like automotive, or hospitality, or retail. So, we're working not just within Watson but within broader IBM to bring solutions like that. We also have talked about compliance. Every organization has a regulatory, compliance, or legal department that deals with SOWs, legal documents, technical documents. How do you then start making sure that you're adhering to the types of regulations or legal requirements that you have on those documents?
Compare and Comply actually uses a lot of the Watson technologies to be able to do that. And scaling this out, in terms of how clients are really using AI in their business, is the other point where Watson will absolutely focus going forward. >> That's awesome, Ritika. Thank you for coming on theCUBE, sharing the awesome work, again cutting across IBM and also outside in the industry. The more data, the better the potential. >> Absolutely. >> Well, thanks for sharing the data. We're putting the data out there for you. theCUBE is one big data machine, we're data driven. We love doing these interviews; of course, getting the experts and the product folks on theCUBE is super important to us. I'm John Furrier, more coverage from IBM Think after this short break. (upbeat music)
Ziya Ma, Intel | Big Data SV 2018
>> Live from San Jose, it's theCUBE! Presenting Big Data Silicon Valley, brought to you by SiliconANGLE Media and its ecosystem partners. >> Welcome back to theCUBE, our continuing coverage of our event, Big Data SV. I'm Lisa Martin with my co-host George Gilbert. We're down the street from the Strata Data Conference, hearing a lot of interesting insights on big data, peeling back the layers, looking at opportunities, some of the challenges and barriers to overcome, but also the plethora of opportunities that enterprises have to take advantage of. Our next guest is no stranger to theCUBE; she was just on with me a couple days ago at the Women in Data Science Conference. Please welcome back to theCUBE Ziya Ma, Vice President of the Software and Services Group and Director of Big Data Technologies at Intel. Hi Ziya! >> Hi Lisa. >> Long time, no see. >> I know, it was really just two to three days ago. >> It was, and now I can say happy International Women's Day. >> The same to you, Lisa. >> Thank you, it's great to have you here. So as I mentioned, we are down the street from the Strata Data Conference. You've been up there over the last couple days. What are some of the things that you're hearing with respect to big data? Trends, barriers, opportunities? >> Yeah, so first, it's very exciting to be back at the conference again. The one biggest trend, one topic that's hit on really hard by many presenters, is the power of bringing big data systems and data science solutions together. You know, we're definitely seeing in the last few years the advancement of big data, and the advancement of data science, you know, machine learning and deep learning, truly pushing forward business differentiation and improving our quality of life. So that's definitely one of the biggest trends. Another thing I noticed is there was a lot of discussion on big data and data science getting deployed into the cloud: what are the learnings, what are the use cases?
So I think that's another noticeable trend. And also, there were some presentations on doing data science or running business intelligence on edge devices. That's another noticeable trend. And of course, there was discussion on security and privacy for data science and big data, so that continues to be one of the topics. >> So we were talking earlier, 'cause there's so many concepts and products to get your arms around. If someone is looking at AI and machine learning on the back end, you know, we'll worry about edge intelligence some other time, but we know that Intel has the CPU with the Xeon and then this lower-power one with Atom. There's the GPU, there's ASICs, FPGAs, and then there are these software layers, you know, at a higher abstraction level. Help us put some of those pieces together for people who are saying, okay, I know I've got a lot of data, I've got to train these sophisticated models, you know, explain this to me. >> Right, so Intel is a real solution provider for data science and big data. At the hardware level, George, as you mentioned, we offer a wide range of products, from general purpose like Xeon to targeted silicon such as FPGAs and ASIC chips like Nervana. And we also provide adjacencies like networking hardware, non-volatile memory, and mobile. You know, those are the other adjacent products that we offer. Now on top of the hardware layer, we deliver a fully optimized software solution stack, from libraries and frameworks to tools and solutions, so that we can help engineers and developers create AI solutions with greater ease and productivity. For instance, we deliver the Intel-optimized Math Kernel Library, which leverages the latest instruction sets to give significant performance boosts when you are running your software on Intel hardware. We also deliver frameworks like BigDL for Spark and big data customers who are looking for deep learning capabilities.
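For a sense of what an optimized math kernel library like the one mentioned above replaces, here is the textbook triple-loop matrix multiply in pure Python. A tuned BLAS kernel computes exactly the same result, but gets its speed from cache blocking and the latest SIMD instruction sets. A sketch for illustration only, not Intel code:

```python
def matmul(a, b):
    """Naive O(n^3) matrix multiply; libraries like MKL implement the
    same contract with cache blocking and vectorized instructions."""
    rows, inner, cols = len(a), len(b), len(b[0])
    assert len(a[0]) == inner, "inner dimensions must match"
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for k in range(inner):
            aik = a[i][k]
            for j in range(cols):
                out[i][j] += aik * b[k][j]
    return out

print(matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
# → [[19.0, 22.0], [43.0, 50.0]]
```

The hot inner loop here is the part that an instruction-set-aware kernel, or NumPy linked against one, can run orders of magnitude faster on the same hardware.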
We also optimize some popular open source deep learning frameworks like Caffe, TensorFlow, MXNet, and a few others. So our goal is to provide all the necessary solutions so that, in the end, our customers can create the applications and solutions they really need to address their biggest pain points. >> Help us think about the maturity level now. We know that the most sophisticated internet service providers have been all over this machine learning for quite a few years now. Banks, insurance companies, people who've had this, statisticians and actuaries who have that sort of skillset, are beginning to deploy some of these early production apps. Where are we in terms of getting this out to the mainstream? What are some of the things that have to happen? >> To get it to mainstream, there are so many things we could do. First, I think we will continue to see the wide range of silicon products, but then there are a few things Intel is pushing. For example, we're developing the Nervana Graph compiler, which will encapsulate the hardware integration details and present a consistent API for developers to work with. This is one thing that we hope can eventually help the developer community. And also, we are collaborating with end users from the enterprise segment. For example, we're working with the financial services industry, we're working with the manufacturing sector, and also customers from the medical field. And online retailers, trying to help them deliver or create data science and analytics solutions on Intel-based hardware with Intel-optimized software. So that's another thing that we do, and we're actually seeing very good progress in this area. Now we're also collaborating with many cloud service providers. For instance, we work with some of the top seven cloud service providers, both in the U.S.
and also in China, to democratize not only our hardware but also our libraries and tools, BigDL, MKL, and other frameworks and libraries, so that our customers, including individuals and businesses, can easily access those building blocks from the cloud. So definitely we're working on different fronts. >> So, last question in the last couple of minutes. Let's kind of vibe on this collaboration theme. Tell us a little bit about the collaboration that you're having with, as you mentioned, customers in some highly regulated industries, as an example. Help us understand, what's that symbiosis? What is Intel learning from your customers that's driving Intel's innovation of your technologies in big data? >> That's an excellent question. So Lisa, maybe I can start by sharing a couple of customer use cases, the kinds of problems we help our customers address. I think it's always wise not to start a conversation with the customer on the technology that you deliver. You want to understand the customer's needs first, so that you can provide a solution that really addresses their biggest pain point rather than simply selling technology. So for example, we have worked with an online retailer to better understand their customers' shopping behavior and to assess their customers' preferences and interests. Based upon that analysis, the online retailer made different product recommendations and maximized its customers' purchase potential, and it drove up the retailer's sales. You know, that's one type of use case we have worked on. We have also partnered with customers from the medical field. Actually, today at the Strata Conference we had a joint presentation with UCSF, where we helped the medical center automate the diagnosis and grading of meniscus lesions. Today that's all done manually by a radiologist, but now that entire process is automated.
The result is much more accurate, much more consistent, and much more timely, because you don't have to wait for the availability of a radiologist to read all the 3D MRI images; that can all be done by machines. You know, those are the areas where we work with our customers, understand their business needs, and give them the solution they are looking for. >> Wow, the impact there. I wish we had more time to dive into some of those examples, but we thank you so much, Ziya, for stopping by theCUBE twice in one week and sharing your insights. And we look forward to having you back on the show in the near future. >> Thanks, so thanks Lisa, thanks George, for having me. >> And for my co-host George Gilbert, I'm Lisa Martin. We are live at Big Data SV in San Jose. Come down, join us for the rest of the afternoon. We're at this cool place called Forager Tasting and Eatery. We will be right back with our next guest after a short break. (electronic outro music)
Armughan Ahmad, Dell EMC | Super Computing 2017
>> Announcer: From Denver, Colorado, it's theCUBE, covering Super Computing 17. Brought to you by Intel. (soft electronic music) >> Hey, welcome back, everybody. Jeff Frick here with theCUBE. We're gettin' towards the end of the day here at Super Computing 2017 in Denver, Colorado. 12,000 people talkin' really about the outer limits of what you can do with compute power, lookin' out into the universe and black holes and all kinds of exciting stuff. We're kind of bringin' it back, right? We're all about democratization of technology for people to solve real problems. We're really excited to have our last guest of the day, bringin' the energy, Armughan Ahmad. He's SVP and GM, Hybrid Cloud and Ready Solutions for Dell EMC, and a many-time CUBE alum. Armughan, great to see you. >> Yeah, good to see you, Jeff. >> So, first off, just impressions of the show. 12,000 people, we had no idea. We've never been to this show before. This is great. >> This is a show that has been around. If you know the history of the show, this was an IEEE engineering show that actually turned into high-performance computing, around research-based analytics and other things that came out of it. But, it's just grown. We're seeing now, just yesterday, the super computing top 500 petaflops list was released here. So, it's fascinating. You have some of the brightest minds in the world that actually come to this event, 12,000 of them. >> Yeah, and Dell EMC is here in force, so a lot of announcements, a lot of excitement. What are you guys excited about participating in this type of show? >> Yeah, Jeff, so when we come to an event like this, HPC-- We know that HPC has also evolved from your traditional HPC, which was around modeling and simulation, and how it started from engineering and moved to clusters. It's now evolving more towards machine learning, deep learning, and artificial intelligence. So, what we announced here-- Yesterday, our press release went out.
It was really related to how our strategy of advancing HPC, but also democratizing HPC, is working. So, on the advancing side, on the HPC side, the top 500 super computing list came out. We're powering some of the top 500 on that list. One big one is TACC, the Texas Advanced Computing Center at the University of Texas. They now have, I believe, the number 12 spot among the top 500 super computers in the world, running 8.2 petaflops of compute. >> So, a lot of zeros. I have no idea what a petaflop is. >> It's very, very big. It's very big. It's available for machine learning, but also eventually going to be available for deep learning. But, more importantly, we're also moving towards democratizing HPC, because we feel that democratizing is also very important, where HPC should not only be for research and academia, but should also be focused towards manufacturing customers, financial customers, our commercial customers, so that they can actually take the complexity of HPC out. And that's where our-- We call it our HPC 2.0 strategy, of learning from the advancements that we continue to drive, to then also democratize it for our customers. >> It's interesting, I think back to the old days of Intel microprocessors getting better and better and better, and you had SPARC and you had Silicon Graphics, and these things that were way better. This huge differentiation. But, the Intel IA-32 just kept pluggin' along, and it really begs the question, where is the distinction now? You have huge clusters of computers you can put together with virtualization. Where is the difference between just a really big cluster and HPC and super computing? >> So, I think, if you look at HPC, HPC is also evolving, so let's look at the customer view, right? So, the other part of our announcement here was artificial intelligence, which is really, what is artificial intelligence? If you look at a retail customer, a retailer has-- They start with data, for example.
You buy beer and chips at J's Retailer, for example. When you come in and do that, you usually used to run a SQL database, an RDBMS, and that would basically tell you, these are the people who purchase from me; you know their purchase history. But then you evolved into BI, and then if that data got really large, you had an HPC cluster, which would basically analyze a lot of that data for you and show you trends and things. That would then tell you, you know what, these are my customers, and this is how frequent they are. But now it's moving more towards machine learning and deep learning as well. So, as the data gets larger and larger, we're seeing data becoming larger not just from social media, but from your traditional computational frameworks, your traditional applications, and others. We're finding that data is also growing at the edge, so by 2020, about 20 billion devices are going to wake up at the edge and start generating data. So now, Internet data is going to look very small over the next three, four years as the edge data comes up. So, you actually need to start thinking about machine learning and deep learning a lot more. You asked the question, how do you see that evolving? You see a traditional RDBMS with SQL evolving to BI. BI then evolves into either HPC or Hadoop. Then, from HPC and Hadoop, what do you do next? What you do next is you start to feed predictive analytics into machine learning kinds of solutions, and then once those predictive analytics are there, you really, truly start thinking about the full deep learning frameworks. >> Right, well and clearly the data in motion. I think it's funny, we used to make decisions on a sample of data in the past. Now, we have the opportunity to take all the data in real time and make those decisions with Kafka and Spark and Flink and all these crazy systems that are comin' into play.
Makes Hadoop look ancient, tired, and yesterday, right? But, it's still valid, right? >> A lot of customers are still paying for it. Customers are using it, and that's where we feel we need to simplify the complex for our customers. That's why we announced our Machine Learning Ready Bundle and our Deep Learning Ready Bundle. We announced them with Intel and Nvidia together, because we feel like our customers either go the GPU route, which is your accelerator route-- We announced-- You were talking to Ravi, from our server team, earlier, where he talked about the C4140, which has quad GPU power, and it's perfect for deep learning. But, with Intel, we've also worked on the same, where we worked on the AI software with Intel. Why are we doing all of this? We're saying that if you thought RDBMS was difficult, and if you thought building a Hadoop cluster or HPC was a little challenging and time consuming, as customers move to machine learning and deep learning, you now have to think about the whole stack. So, let me explain the stack to you. You think of a compute, storage, and network stack, then you think of-- >> The whole entirety. >> Yeah, that's right, the whole entirety of our data center. Then you talk about these frameworks, like Theano, Caffe, TensorFlow, right? These are new frameworks. They are machine learning and deep learning frameworks, open source and others. Then you go to libraries. Then you go to accelerators, and which accelerators you choose, then you go to your operating systems. Now, you haven't even talked about your use case, a retail use case or a genomic sequencing use case. All you're trying to do is figure out whether TensorFlow works with this accelerator or does not work with this accelerator. Or, does Caffe or Theano work with this operating system or not? And, that complexity is far greater than what came before.
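The combinatorial pain described here, figuring out which framework builds against which accelerator and operating system, is at its heart a compatibility-matrix lookup. A toy sketch with a hypothetical support matrix (illustrative entries, not vendor-validated data); the value proposition of a pre-integrated bundle is that someone has validated every cell for you:

```python
# Hypothetical support matrix -- illustrative only, not vendor data.
SUPPORT = {
    ("tensorflow", "gpu"): True,
    ("tensorflow", "fpga"): False,
    ("caffe", "gpu"): True,
    ("theano", "fpga"): False,
}

def validate_stack(framework, accelerator):
    """Return True only if the pairing is known-good; anything
    absent from the matrix is treated as unvalidated."""
    return SUPPORT.get((framework, accelerator), False)

print(validate_stack("tensorflow", "gpu"))  # → True
print(validate_stack("theano", "fpga"))     # → False
```

Multiply this little table by every framework, library, accelerator, operating system, and use case, and the size of the validation job customers face becomes clear.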
So, that's where we felt that we really needed to launch these new solutions, and we prelaunched them here at Super Computing, because we feel the evolution of HPC towards AI is happening. We're going to start shipping these Ready Bundles for machine learning and deep learning in the first half of 2018. >> So, that's what the Ready Solutions are? You're basically putting the solution together for the client, then they can start-- You work together to build the application to fix whatever it is they're trying to do. >> That's exactly it. But, not just fix it. It's an outcome. So, I'm going to go back to the retailer. If you are the CEO of the biggest retailer and you are saying, hey, I don't just want to know who buys from me, I want to do predictive analytics, that's not just who buys chips and beer, but who can I sell more things to, right? So, you now start thinking about demographic data. You start thinking about payroll data and other data that surrounds it-- You start feeding that data into it, so your machine now starts to learn a lot more from those frameworks, and then can actually give you predictive analytics. But, imagine a day where the machine or the deep learning AI actually tells you that it's not just who you want to sell chips and beer to, it's who's going to buy the 4K TV. >> You're makin' a lot of presumptions. >> Well, there you go, and the 4K-- But, I'm glad you're doin' the 4K TV. So, that's important, right? That is where our customers need to understand how predictive analytics are going to move towards cognitive analytics. So, this is complex, but we're trying to make the complex simple with these Ready Solutions for machine learning and deep learning. >> So, I want to just get your take on-- You've kind of talked about these three things a couple times: how do you delineate between AI, machine learning, and deep learning? >> So, as I said, there is an evolution.
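The retailer shift described here, from "who buys chips and beer" lookups to predicting who to sell more to, can be sketched as a scoring function over purchase history and demographic features. The weights below are invented purely for illustration; a real model would learn them from data:

```python
import math

def purchase_propensity(visits_per_month, avg_basket, bought_category_before):
    """Toy logistic score: the RDBMS told us *who* bought;
    a learned model estimates who is *likely* to buy next.
    Weights here are made up for illustration."""
    z = (0.4 * visits_per_month
         + 0.02 * avg_basket
         + 1.5 * bought_category_before
         - 3.0)
    return 1.0 / (1.0 + math.exp(-z))

# A frequent shopper who has bought the category before...
loyal = purchase_propensity(visits_per_month=6, avg_basket=40.0, bought_category_before=1)
# ...versus a rare visitor who has not.
rare = purchase_propensity(visits_per_month=1, avg_basket=10.0, bought_category_before=0)
print(round(loyal, 2), round(rare, 2))  # → 0.85 0.08
```

Feeding in richer data (demographics, payroll, and so on, as described above) just means more features and more learned weights; the predictive-analytics idea is the same.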
I don't think a customer can achieve artificial intelligence unless they go through the whole crawl, walk, run progression. There are no shortcuts there, right? So what do you do? Think about Mastercard, a great customer of ours. They do an incredible number of transactions per day, (laughs) as you can imagine, right? In the millions. They want to do facial recognition at kiosks, or they're looking at different policies based on your buying behavior-- That, hey, Jeff doesn't buy $20,000 Rolexes every year. >> Maybe once every week, you know, (laughs) it just depends how my mood is. I was in the Emirates. >> Exactly, you were in Dubai (laughs). Then, you think about where his credit card is being used. Based on your behaviors, that's important. Now, think about it: even for Mastercard, they have traditional RDBMS databases. They went to BI. They have high-performance computing clusters. Then, they developed the Hadoop cluster. So, what we did with them, we said, okay. All that is good. The data that has been generated for you through customers and through internal IT organizations, those things are all very important. But, at the same time, you now need to start going through this data and analyzing it for predictive analytics. So, they had 1.2 million policies, for example, that they had to crunch. Think about 1.2 million policies that they had to take decisions on. One of the policies could be, hey, does Jeff go to Dubai to buy a Rolex or not? Or, does Jeff follow these other patterns, or is Armughan taking his card and having a field day with it? So, those are policies that they feed into machine learning frameworks, and then machine learning actually gives you patterns, so they can now see what your behavior is. Then, based on that, deep learning is what they eventually move to next.
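Mastercard's 1.2 million policies are proprietary, but the simplest form of such a policy, "flag spending far outside the cardholder's history," reduces to an outlier test. A stdlib-only sketch with an illustrative threshold, not Mastercard's actual logic:

```python
import statistics

def is_suspicious(history, new_amount, threshold=3.0):
    """Flag a transaction more than `threshold` standard deviations
    from the cardholder's historical mean -- a hand-written policy
    of the kind a learned model would generalize at scale."""
    mean = statistics.mean(history)
    stdev = statistics.pstdev(history)
    if stdev == 0:
        return new_amount != mean
    return abs(new_amount - mean) / stdev > threshold

history = [25.0, 40.0, 32.0, 28.0, 35.0]  # typical purchases
print(is_suspicious(history, 30.0))       # → False
print(is_suspicious(history, 20000.0))    # → True  (the $20,000 Rolex)
```

A machine learning system replaces the hand-picked threshold and single feature with patterns learned across many features and many cardholders, which is exactly the evolution described in this exchange.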
With deep learning, you're not only talking about your behavior patterns on the credit card; your other life data starts to come into that as well. Then, for something like catching fraud, you can actually be a lot more predictive and cognitive about it. So, that's what our Ready Solutions around machine learning and deep learning are really geared towards: taking HPC, democratizing it, advancing it, and then helping our customers move towards machine learning and deep learning, 'cause these buzzwords of AI are out there. If you're a financial institution and you're trying to figure out, who is the customer who's going to buy the next mortgage from you? Or, who are you going to lend to next? You want the machine and others to tell you this, not to take over your life, but to actually help you make these decisions so that your bottom line can go up along with your top line. Revenue and margins are important to every customer. >> It's amazing on the credit card example, because people get so pissed if there's a false positive. With the amount of effort they've put in to keep you from making fraudulent transactions, if your credit card ever gets denied, people go bananas, right? The behavior is just amazing. But, I want to ask you-- We're comin' to the end of 2017, which is hard to believe. Things are rolling at Dell EMC. Michael Dell, ever since he took that thing private, you could see the sparkle in his eye. We got him on a CUBE interview a few years back. A year from now, 2018, what are we going to talk about? What are your top priorities for 2018? >> So, number one, Michael continues to talk about how our vision is advancing human progress through technology, right? That's our vision. We want to get there.
But, at the same time we know that we have to drive IT transformation, we have to drive workforce transformation, we have to drive digital transformation, and we have to drive security transformation. All those things are important because lots of customers-- I mean, Jeff, do you know like 75% of the S&P 500 companies will not exist by 2027 because they're either not going to be able to make that shift from Blockbuster to Netflix, or Uber taxi-- It's happened to our friends at GE over the last little while. >> You can think about any customer-- That's what Michael did. Michael actually disrupted Dell with Dell Technologies and the acquisition of EMC and Pivotal and VMware. In a year from now, our strategy is really about edge to core to the cloud. We think the world is going to be all three, because the rise of 20 billion devices at the edge is going to require new computational frameworks. But, at the same time, people are going to bring them into the core, and then cloud will still exist. But, a lot of times-- Let me ask you, if you were driving an autonomous vehicle, do you want that data-- I'm an Edge guy. I know where you're going with this. It's not going to go, right? You want it at the edge, because data gravity is important. That's where we're going, so it's going to be huge. We feel data gravity is going to be big. We think core is going to be big. We think cloud's going to be big. And we really want to play in all three of those areas.
>> All right, this is Armughan, I'm Jeff Frick. You're watching theCUBE from Super Computing Summit 2017. Thanks for watching. We'll see you next time. (soft electronic music)
Raja Mukhopadhyay & Stefanie Chiras - Nutanix .NEXTconf 2017 - #NEXTconf - #theCUBE
[Voiceover] - Live from Washington D.C. It's theCUBE covering the .NEXT Conference. Brought to you by Nutanix. >> Welcome back to the district everybody. This is Nutanix NEXTconf, hashtag NEXTconf. And this is theCUBE, the leader in live tech coverage. Stephanie Chiras is here. She's the Vice President of IBM Power Systems Offering Management, and she's joined by Raja Mukhopadhyay who is the VP of Product Management at Nutanix. Great to see you guys again. Thanks for coming on. >> Yeah thank you. Thanks for having us. >> So Stephanie, you're welcome, so Stephanie I'm excited about you guys getting into this whole hyper converged space. But I'm also excited about the cognitive systems group. It's kind of a new play on Power. Give us the update on what's going on with you guys. >> Yeah so we've been through some interesting changes here. IBM Power Systems, while we still maintain that branding around our architecture, from a division standpoint we're now IBM Cognitive Systems. We've been through a change in leadership. We have now Senior Vice President Bob Picciano leading IBM Cognitive Systems, which is foundationally built upon the technology that comes from Power Systems. So our portfolio remains IBM Power Systems, but really what it means is we've set our sights on how to take our technology into really those cognitive workloads. It's a focus on clients going to the cognitive era and driving their business into the cognitive era. It's changed everything we do from how we deliver and pull together our offerings. We have offerings like PowerAI, which is an offering built upon a differentiated accelerated product with Power technology inside. It has NVIDIA GPUs, it has NVLink capability, and we have all the optimized frameworks. So you have Caffe, Torch, TensorFlow, Chainer, Theano. All of those are optimized for the server, downloadable right in a binary.
So it's really about how do we bring ease of use for cognitive workloads and allow clients to work in machine learning and deep learning. >> So Raja, again, part of the reason I'm so excited is IBM has a $15 billion analytics business. You guys talk, you guys talked to the analysts this morning about one of the next waves of workloads is this sort of data oriented, AI, machine learning workloads. IBM obviously has a lot of experience in that space. How did this relationship come together, and let's talk about what it brings to customers. >> It was all like customer driven, right? So all our customers they told us that, look Nutanix we have used your software to bring really unprecedented levels of like agility and simplicity to our data center infrastructure. But, you know, they run at certain sets of workloads on, sort of, non IBM platforms. But a lot of mission critical applications, a lot of the, you know, the cognitive applications. They want to leverage IBM for that, and they said, look can we get the same Nutanix one click simplicity all across my data center. And that is a promise that we see, can we bring all of the AHV goodness that abstracts the underlying platform no matter whether you're running on x86, or your cognitive applications, or your mission critical applications on IBM Power. You know, it's a fantastic thing for a joint customer. >> So Stephanie come on, couldn't you reach somewhere into the IBM portfolio and pull out a hyper converged, you know, solution? Why Nutanix? >> Clients love it. Look what the hyper converged market is doing. It's growing at incredible rates, and clients love Nutanix, right? We see incredible repurchases around Nutanix. Clients buy three, next they buy 10. Those repurchases are a real sign that clients like the experience. Now you can take that experience, and under the same simplicity and elegance right of the Prism platform for clients. You can pull in and choose the infrastructure that's best for your workload.
So I look at a single Prism experience, if I'm running a database, I can pull that onto a Power based offering. If I'm running a VDI I can pull that onto an alternative. But I can now with the simplicity of action under Prism, right for clients who love that look and feel, pick the best infrastructure for the workloads you're running, simply. That's the beauty of it. >> Raja, you know, Nutanix is spread beyond the initial platform that you had. You have Supermicro inside, you've got a few OEMs. This one was a little different. Can you bring us inside a little bit? You know, what kind of engineering work had to happen here? And then I want to understand from a workload perspective, it used to be, okay what kind of general purpose? What do you want on Power, and what should you say isn't for Power? >> Yeah, yeah, it's actually, I think, a testament to, you know, the power of our engineering teams that the level of abstraction that they were able to sort of imbue into our software. The transition from supporting x86 platforms to making the leap onto Power, it has not been a significant lift from an engineering standpoint. So because the right abstractions were put in from the get go. You know, literally within a matter of mere months, something like six to eight months, we were able to have our software ported onto the IBM Power platform. And that is kind of the promise that our customers saw that look, for the first time as they are going through a re-platforming of their data center. They see the power in Nutanix as software to abstract all these different platforms. Now in terms of the applications that, you know, they are hoping to run. I think, you know, we're at the cusp of a big transition. If you look at enterprise applications, you could have framed them as systems of record, and systems of engagement. If you look forward over the next 10 years, we'll see this big shift, and this new class of applications around systems of intelligence.
And that is what a lot-- >> David: Say that again, systems of-- >> Systems of intelligence, right? And that is where a lot of like IBM Power platform, and the things that the Power architecture provides. You know, things around better GPU capabilities. It's going to drive those applications. So our customers are thinking of running both the classical mission critical applications that IBM is known for, but as well as the more sort of forward leaning cognitive and data analytics driven applications. >> So Stephanie, on one hand I look at this just as an extension of what IBM's done for years with Linux. But why is it more, what's it going to accelerate from your customers and what applications that they want to deploy? >> So first, one of the additional reasons Nutanix was key to us is they support the Acropolis platform, which is KVM based. Very much supports our focus on being open around our playing in the Linux space, playing in the KVM space, supporting open. So now as you've seen, throughout since we launched POWER8 back in early 2014 we went Little Endian. We've been very focused on getting a strategic set of ISV's ported to the platform. Right, Hortonworks, MongoDB, EnterpriseDB. Now it's about being able to take the value propositions that we have and, you know, we're pretty bullish on our value propositions. We have a two x price performance guarantee on MongoDB that runs better on Power than it runs on the alternative competition. So we're pretty bullish. Now for clients who have taken a stance that their data center will be a hyper converged data center because they like the simplicity of it. Now they can pull in that value in a seamless way. To me it's really all about compatibility. Pick the best architecture, and all compatible within your data center. >> So you talked about, six to eight months you were able to do the integration. Was that Open Power that allowed you to do that, was it Little Endian, you know, advancements? 
>> I think it was a combination of both, right? We have done a lot from our Linux side to be compatible within the broad Linux ecosystem particularly around KVM. That was critical for this integration into Acropolis. So we've done a lot from the bottoms up to be, you know, Linux is Linux is Linux. And just as Raja said, right, they've done a lot in their platform to be able to abstract from the underlying and provide a seamless experience that, you know, I think you guys used the term invisible infrastructure, right? The experience to the client is simple, right? And in a simple way, pick the best, right for the workload I run. >> You talked about systems of intelligence. Bob Picciano a lot of times would talk about the insight economy. And so we're, you're right we have the systems of record, systems of engagement. Systems of intelligence, let's talk about those workloads a little bit. I infer from that, that you're essentially basically affecting outcomes, while the transaction is occurring. Maybe it's bringing transactions and analytics together. And doing so in a fashion that maybe humans aren't as involved. Maybe they're not involved at all. What do you mean by systems of intelligence, and how do your joint solutions address those? >> Yeah so, you know, one way to look at it is, I mean, so far if you look at how, sort of decisions are made and insights are gathered. It's we look at data, and between a combination of mostly, you know we try to get structured data, and then we try to draw inferences from it. And mostly it's human beings drawing the inferences. If you look at the promise of technologies like machine learning and deep learning. It is precisely that you can throw unstructured data where no patterns are obvious, and software will find patterns therein. And what we mean by systems of intelligence is imagine you're going through your business, and literally hundreds of terabytes of your transactional data is flowing through a system.
The software will be able to come up with insights that would be very hard for human beings to otherwise kind of, you know infer, right? So that's one dimension, and it speaks to kind of the fact that there needs to be a more real time aspect to that sort of system. >> Is part of your strategy to drive specific solutions, I mean integrating certain IBM software on Power, or are you sort of stepping back and say, okay customers do whatever you want. Maybe you can talk about that. >> No we're very keen to take this up to a solution value level, right? We have architected our ISV strategy. We have architected our software strategy for this space, right? It is all around the cognitive workloads that we're focused on. But it's about not just being a platform and an infrastructure platform, it's about being able to bring that solution level above and target it. So when a client runs that workload they know this is the infrastructure they should put it on. >> What's the impact on the go to market then for that offering? >> So from a solutions level or when the-- >> Just how you know it's more complicated than the traditional, okay here is your platform for infrastructure. You know, what channel, maybe it's a question for Raja, but yeah. >> Yeah sure, so clearly, you know, the product will be sold by, you know, the community of Nutanix's channel partners as well as IBM's channel partners, right? So, and, you know, we'll both make the appropriate investments to make sure that the, you know, the broader channel community is enabled around how they essentially talk about the value proposition of the solution in front of our joint customers. >> Alright we have to leave there, Stephanie, Raja, thanks so much for coming back on theCUBE. It's great to see you guys. >> Raja: Thank you. >> Stephanie: Great to see you both, thank you. >> Alright keep it right there everybody we'll be back with our next guest we're live from D.C. Nutanix .NEXT, be right back. (electronic music)
Day One Wrap - #SparkSummit - #theCUBE
>> Announcer: Live from San Francisco, it's theCUBE covering Spark Summit 2017, brought to you by Databricks. (energetic music plays) >> And what an exciting day we've had here at theCUBE. We've been at Spark Summit 2017, talking to partners, to customers, to founders, technologists, data scientists. It's been a load of information, right? >> Yeah, an overload of information. >> Well, George, you've been here in the studio with me talking with a lot of the guests. I'm going to ask you to maybe recap some of the top things you've heard today for our guests. >> Okay so, well, Databricks laid down, sort of, three themes that they wanted folks to take away. Deep learning, Structured Streaming, and serverless. Now, deep learning is not entirely new to Spark. But they've dramatically improved their support for it. I think, going beyond the frameworks that were written specifically for Spark, like Deeplearning4j and BigDL by Intel. And now TensorFlow, which is the open source framework from Google, has gotten much better support. Structured Streaming, it was not clear how much more news we were going to get, because it's been talked about for 18 months. And they really, really surprised a lot of people, including me, where they took, essentially, the processing time for an event or a small batch of events down to 1 millisecond. Whereas, before, it was in the hundreds if not higher. And that changes the type of apps you can build. And also, the Databricks guys had coined the term continuous apps, which means they operate on a never-ending stream of data, which is different from what we've had in the past where it's batch or with a user interface, request-response. So they definitely turned up the volume on what they can do with continuous apps. And serverless, they'll talk about more tomorrow. And Jim, I think, is going to weigh in. But it, basically, greatly simplifies the ability to run this infrastructure, because you don't think of it as a cluster of resources.
You just know that it's sort of out there, and you ask requests of it, and it figures out how to fulfill it. I will say, the other big surprise for me was when we have Matei, who's the creator of Spark and the chief technologist at Databricks, come on the show and say, when we asked him about how Spark was going to deal with, essentially, more advanced storage of data so that you could update things, so that you could get queries back, so that you could do analytics, and not just of stuff that's stored in Spark but stuff that Spark stores essentially below it. And he said, "You know, Databricks, you can expect to see come out with or partner with a database to do these advanced scenarios." And I got the distinct impression, after listening to the tape again, that he was talking about for Apache Spark, which is separate from Databricks, that they would do some sort of key-value store. So in other words, when you look at competitors or quasi-competitors like Confluent with Kafka or data Artisans with Flink, they don't, they're not perfect competitors. They overlap some. Now Spark is pushing its way more into overlapping with some of those solutions. >> Alright. Well, Jim Kobielus. And thank you for that, George. You've been mingling with the masses today. (laughs) And you've been here all day as well. >> Educated masses, yeah, (David laughs) who are really engaged in this stuff, yes. >> Well, great, maybe give us some of your top takeaways after all the conversations you've had today. >> They're not all that dissimilar from George's. What Databricks, Databricks of course being the center, the developer, the primary committer in the Spark open source community. They've done a number of very important things in terms of the announcements today at this event that push Spark, the Spark ecosystem, where it needs to go to expand the range of capabilities and their deployability into production environments.
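The "continuous app" pattern George describes, a long-running computation that holds state over a never-ending stream instead of answering one-off requests, can be illustrated with a toy event loop. This plain-Python generator is only a conceptual sketch; Spark's Structured Streaming does the same thing per-event at cluster scale, with fault tolerance that a single-process loop obviously lacks.

```python
def running_counts(events):
    """Process an unbounded stream of (user, action) events one at a
    time, emitting the updated per-user event count after each one:
    per-event updates rather than request-response or batch."""
    state = {}  # mutable state the app maintains for its whole lifetime
    for user, _action in events:
        state[user] = state.get(user, 0) + 1
        yield user, state[user]

# In a real continuous app the source never ends; a list stands in here.
stream = [("alice", "login"), ("bob", "login"), ("alice", "download")]
updates = list(running_counts(stream))
```

The key property is that results are emitted continuously as events arrive, which is what distinguishes this style from a batch job that recomputes from scratch.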
I feel the deep-learning side announcement, in terms of the deep-learning pipeline API, is very, very important. Now, as George indicated, Spark has been used in a fair number of deep-learning development environments. But not as a modeling tool so much as a training tool, a tool for in-memory distributed training of deep-learning models that we developed in TensorFlow, in Caffe, and other frameworks. Now this announcement is essentially bringing support for deep learning directly into the Spark modeling pipeline, the machine-learning modeling pipeline, being able to call out to deep learning, you know, TensorFlow and so forth, from within MLlib. That's very important. That means that Spark developers, of which there are many, far more than there are TensorFlow developers, will now have an easy path to bring more deep learning into their projects. That's critically important to democratize deep learning. I hope, and from what I've seen what Databricks has indicated, that they have support currently in the API reaching out to both TensorFlow and Keras, that they have plans to bring in API support for access to other leading DL toolkits such as Caffe, Caffe 2, which is Facebook-developed, such as MXNet, which is Amazon-developed, and so forth. That's very encouraging. Structured Streaming is very important in terms of what they announced, which is an API to enable access to faster, or higher-throughput Structured Streaming in their cloud environment. And they also announced that they have gone beyond, in terms of the code that they've built, the micro-batch architecture of Structured Streaming, to enable it to evolve into a more true streaming environment to be able to contend credibly with the likes of Flink. 'Cause I think that the Spark community has, sort of, had their back against the wall with Structured Streaming that they couldn't fully provide a true sub-millisecond end-to-end latency environment heretofore.
But it sounds like with this R&D that Databricks is addressing that, and that's critically important for the Spark community to continue to evolve in terms of continuous computation. And then the serverless-apps announcement is also very important, 'cause I see it as really being, it's a fully-managed multi-tenant Spark-development environment, as an enabler for continuous Build, Deploy, and Testing DevOps within a Spark machine-learning and now deep-learning context. The Spark community as it evolves and matures needs robust DevOps tools to production-ize these machine-learning and deep-learning models. Because really, in many ways, many customers, many developers are now using, or developing, Spark applications that are real 24-by-7 enterprise application artifacts that need a robust DevOps environment. And I think that Databricks has indicated they know where this market needs to go and they're pushing it with R&D. And I'm encouraged by all those signs. >> So, great. Well thank you, Jim. I hope both you gentlemen are looking forward to tomorrow. I certainly am. >> Oh yeah. >> And to you out there, tune in again around 10:00 a.m. Pacific Time. We're going to be broadcasting live here. From Spark Summit 2017, I'm David Goad with Jim and George, saying goodbye for now. And we'll see you in the morning. (sparse percussion music playing) (wind humming and waves crashing).
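The deep-learning pipeline integration discussed in this wrap-up boils down to a familiar pattern: a pretrained network becomes one transformer stage inside a larger ML pipeline. The sketch below shows only that chaining pattern in plain Python; `DLFeaturizer` is a made-up stub standing in for a real TensorFlow or Keras model, and none of these class names come from the actual Databricks API.

```python
class Stage:
    """Minimal pipeline stage: consumes rows, produces rows."""
    def transform(self, rows):
        raise NotImplementedError

class DLFeaturizer(Stage):
    # Stand-in for a pretrained deep net: maps a raw record to a
    # fixed-length feature vector (here, just length and word gaps).
    def transform(self, rows):
        return [[len(r), r.count(" ")] for r in rows]

class Classifier(Stage):
    # Trivial threshold model over the first feature.
    def transform(self, vectors):
        return [1 if v[0] > 10 else 0 for v in vectors]

class Pipeline:
    """Chain stages so the output of each feeds the next."""
    def __init__(self, stages):
        self.stages = stages
    def transform(self, data):
        for stage in self.stages:
            data = stage.transform(data)
        return data

pipe = Pipeline([DLFeaturizer(), Classifier()])
preds = pipe.transform(["short", "a much longer record here"])
```

The point of the announced API, as described above, is that the deep-learning stage plugs into the same pipeline abstraction Spark ML developers already use, so no separate serving path is needed.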
Dr. Jisheng Wang, Hewlett Packard Enterprise, Spark Summit 2017 - #SparkSummit - #theCUBE
>> Announcer: Live from San Francisco, it's theCUBE covering Spark Summit 2017, brought to you by Databricks. >> You are watching theCUBE at Spark Summit 2017. We continue our coverage here talking with developers, partners, customers, all things Spark, and today we're honored now to have our next guest Dr. Jisheng Wang who's the Senior Director of Data Science at the CTO Office at Hewlett Packard Enterprise. Dr. Wang, welcome to the show. >> Yeah, thanks for having me here. >> All right and also to my right we have Mr. Jim Kobielus who's the Lead Analyst for Data Science at Wikibon. Welcome, Jim. >> Great to be here like always. >> Well let's jump into it. First I want to ask about your background a little bit. We were talking about the organization, maybe you could do a better job (laughs) of telling me where you came from and you just recently joined HPE. >> Yes. I actually recently joined HPE earlier this year through the Niara acquisition, and now I'm the Senior Director of Data Science in the CTO Office of Aruba. Actually, Aruba you probably know like two years back, HP acquired Aruba as a wireless networking company, and now Aruba takes charge of the whole enterprise networking business in HP, which is now about over three billion in annual revenue. >> Host: That's not confusing at all. I can follow you (laughs). >> Yes, okay. >> Well all I know is you're doing some exciting stuff with Spark, so maybe tell us about this new solution that you're developing. >> Yes, actually most of my experience with Spark goes back to the Niara time, so Niara was a three and a half year old startup that invented, reinvented the enterprise security using big data and data science. So what is the problem we solved, we tried to solve in Niara is called UEBA, user and entity behavioral analytics. So I'll just try to be very brief here.
Most of the traditional security solutions focus on detecting attackers from outside, but what if the origin of the attacker is inside the enterprise, say Snowden, what can you do? So you probably heard of many cases today of employees leaving the company by stealing lots of the company's IP and sensitive data. So UEBA is a new solution that tries to monitor the behavioral change of the enterprise users to detect both this kind of malicious insider and also the compromised user. >> Host: Behavioral analytics. >> Yes, so it sounds like it's a native analytics which we run like a product. >> Yeah and Jim you've done a lot of work in the industry on this, so any questions you might have for him around UEBA? >> Yeah, give us a sense for how you're incorporating streaming analytics and machine learning into that UEBA solution and then where Spark fits into the overall approach that you take? >> Right, okay. So actually when we started three and a half years back, the first version when we developed the first version of the data pipeline, we used a mix of Hadoop, YARN, Spark, even Apache Storm for different kinds of stream and batch analytics work. But soon after, with increased maturity and also the momentum from this open source Apache Spark community, we migrated all our stream and batch, you know the ETL and data analytics work, into Spark. And it's not just Spark. It's Spark, Spark Streaming, MLlib, the whole ecosystem of that. So there are at least a couple advantages we have experienced through this kind of a transition. The first thing which really helped us is the simplification of the infrastructure and also the reduction of the DevOps efforts there. >> So simplification around Spark, the whole stack of Spark that you mentioned. >> Yes. >> Okay. >> So for the Niara solution originally, we supported, even here today, we supported both the on-premise and the cloud deployment. For the cloud we also supported the public cloud like AWS, Microsoft Azure, and also private cloud.
So you can understand, if we have to maintain a stack of different open source tools across this many different deployments, the overhead of doing the DevOps work to monitor, alarm on, and debug that infrastructure across deployments is very high. Spark gives us a unified platform. We can integrate the streaming, batch, real-time, near real-time, and even long-term batch jobs all together. That heavily reduced both the expertise and the effort required for DevOps. This is one of the biggest advantages we experienced, and certainly we also experienced the scalability, performance, and convenience for developers to develop new applications, all of this, from Spark. >> So are you using the Spark Structured Streaming runtime inside of your application? Is that true? >> We actually use Spark in the streaming processing. In the UEBA solution, the first thing is collecting a lot of data from different sources: account data, network data, cloud application data. When the data comes in, the first step is a streaming job for the ETL, to process the data. After that, we also developed analytics jobs at different frequencies, like one minute, 10 minutes, one hour, one day, on top of that. And recently we have started some early adoption of deep learning in this, using deep learning to monitor user behavior change over time. Especially after a user gives notice: is the user going to access more servers or download some of the sensitive data? All of this requires a very complex analytics infrastructure. >> Now there were some announcements today here at Spark Summit by Databricks of adding deep learning support to their core Spark code base. What are your thoughts about the deep learning pipelines API that they announced this morning?
It's new news, so I'll understand if you haven't digested it totally, but you probably have some good thoughts on the topic. >> Yes, actually this is also news for me, so I can just speak from my current experience. How to integrate deep learning into Spark has actually been a big challenge for us, because for the deep learning piece we used TensorFlow, and certainly most of our other stream and data massaging or ETL work is done by Spark. In this case, there are a couple of ways to manage it. One is to set up two separate resource pools, one for Spark, the other for TensorFlow. But in our deployments there are some very small on-premise environments with only four- or five-node clusters. It's not efficient to split resources that way. So we are also looking for some closer integration between deep learning and Spark. One thing we looked at before is called TensorFlowOnSpark, which was open-sourced a couple of months ago by Yahoo. >> Right. >> So maybe this is certainly more exciting news, for the Spark team to develop this native integration. >> Jim: Very good. >> Okay, we talked about the UEBA solution, but let's go back to the broader HPE perspective. You have this concept called the intelligent edge. What's that all about? >> So that's a very cool name. Actually, let me come back a little bit. I come from the enterprise background, and enterprise applications actually lag behind consumer applications in terms of the adoption of new data science technology. There are some inherent challenges for that. For example, collecting and storing large amounts of this sensitive enterprise data is a huge concern, especially in European countries. Also, for similar reasons, when you develop enterprise applications you often lack good quantity and quality of training data.
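Coming back to the two-resource-pool problem for a moment: once a model is trained in a separate framework, the pipeline side typically just maps it over partitions of records, which is part of why a unified platform is attractive. The sketch below is a toy illustration, with plain Python functions standing in for Spark transformations and a deliberately trivial "model"; all names and numbers are hypothetical.

```python
def train_threshold_model(labeled):
    """'Train' a trivially simple classifier: the midpoint between the
    mean of normal examples and the mean of anomalous examples."""
    normal = [x for x, y in labeled if y == 0]
    anomalous = [x for x, y in labeled if y == 1]
    return (sum(normal) / len(normal) + sum(anomalous) / len(anomalous)) / 2

def score_partition(threshold, records):
    """What a mapPartitions call would do with a broadcast model:
    apply inference to every record in the partition."""
    return [(r, int(r > threshold)) for r in records]

labeled = [(1.0, 0), (2.0, 0), (9.0, 1), (10.0, 1)]
threshold = train_threshold_model(labeled)   # trained "elsewhere"
partitions = [[1.5, 9.5], [0.5, 12.0]]       # data already in the pipeline
scored = [score_partition(threshold, p) for p in partitions]
```

When training and inference live on the same platform, the "trained elsewhere" step disappears and no model or data has to cross cluster boundaries.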
So these are some inherent challenges when you develop enterprise applications. But even despite this, HPE and Aruba recently made several acquisitions of analytics companies to accelerate the adoption of analytics into different product lines. Actually, the intelligent edge comes from IoT, the internet of things, which is expected to be the fastest-growing market in the next few years. >> So are you going to be integrating the UEBA behavioral analytics and Spark capability into your IoT portfolio at HP? Is that a strategy or direction for you? >> Yes, for the big picture that certainly is. You can think, I believe some of the Gartner reports expect the number of IoT devices to grow to over 20 billion by 2020. Since all of these IoT devices are connected to either an intranet or the internet, either through wire or wireless, as a networking company we have the advantage of collecting data and even taking some actions in the first place. The idea of the intelligent edge is that we want to turn each of these IoT devices, the small devices like IP cameras, like motion detectors, into distributed sensors for data collection and also inline actors that make real-time or close to real-time decisions. For example, behavioral anomaly detection is a very good example here. If an IoT device is compromised, if an IP camera has been compromised and used to steal your internal data, we should detect and stop that in the first place. >> Can you tell me about the challenges of putting deep learning algorithms natively on resource-constrained endpoints in the IoT? That must be really challenging, to get them to perform well considering that there may be just a little bit of memory or flash capacity or whatever on the endpoints. Any thoughts about how that can be done effectively and efficiently? >> Very good question. >> And at low cost. >> Yes, very good question.
So there are two aspects to this. First is the global training of the intelligence, which is not going to be done on each of the devices. In that case, each device is more like a sensor for data collection. We are going to collect the data, send it to the cloud, and build this giant pool of computing resources to train the classifier, to train the model. But once we train the model, we are going to ship the model out, so the inference and detection of those behavioral anomalies really happen on the endpoint. >> Do the training centrally and then push the trained algorithms down to the edge devices. >> Yes. But there is the second aspect, as you said: for some of the devices, say people trying to put small chips in a spoon in a hospital to make it more intelligent, you cannot put even just the detection piece there. So we are also looking into some new technologies. I know Caffe recently released some lightweight deep learning models. Also, as you probably know, there are improvements coming from the chip industry. >> Jim: Yes. >> How to optimize chip design for these kinds of more analytics-driven tasks. So we are looking into these different areas now. >> We have just a couple minutes left, and Jim, you get one last question after this, but I have to ask you, what's on your wishlist? What do you wish you could learn, or maybe what did you come to Spark Summit hoping to take away? >> I've always treated myself as a technical developer. One thing I am very excited about these days is the emergence of new technologies, like Spark, like TensorFlow, like Caffe, even BigDL, which was announced this morning. So the first goal, when I come to these big industry events, is to learn the new technology.
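The split described here, train centrally and push only inference to the endpoint, can be sketched with nothing more than model serialization. This is a hypothetical stdlib-only illustration: the "cloud" side fits a simple Gaussian baseline and serializes it, and the "edge" side deserializes it and runs only the cheap detection step.

```python
import pickle
from statistics import mean, stdev

# --- Cloud side: train on pooled data from many devices ---
def train_baseline(samples):
    """Fit a simple Gaussian baseline; this is the expensive, central step."""
    return {"mu": mean(samples), "sigma": stdev(samples)}

def serialize(model):
    return pickle.dumps(model)   # the artifact shipped to the endpoint

# --- Edge side: cheap inference only, no training ---
def detect(model_bytes, value, threshold=3.0):
    m = pickle.loads(model_bytes)
    if m["sigma"] == 0:
        return value != m["mu"]
    return abs(value - m["mu"]) / m["sigma"] > threshold

payload = serialize(train_baseline([10, 11, 9, 10, 10, 12, 8]))
```

A real system would ship a quantized or pruned network rather than a pickle, which is what the lightweight-model work he mentions is about, but the division of labor is the same: heavy fitting in the cloud, a small frozen artifact at the edge.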
And the second thing is mostly to share our experience adopting these new technologies, and also to learn from colleagues in different industries how people change lives and disrupt old industries by taking advantage of the new technologies here. >> The community's growing fast. I'm sure you're going to receive what you're looking for. And Jim, final question? >> Yeah, I heard you mention DevOps and Spark in the same context, and that's a huge theme we're seeing: more DevOps is being wrapped around the lifecycle of development, training, and deployment of machine learning models. If you could have your ideal DevOps tool for Spark developers, what would it look like? What would it do, in a nutshell? >> I'll just share my personal experience. In Niara, we actually developed a lot of in-house DevOps tools. For example, when you run a lot of different Spark jobs, stream and batch, like a one-minute batch versus a one-day batch job, how do you monitor the status of those workflows? How do you know when the data stops coming? How do you know when a workflow fails? Monitoring is a big thing, and then alarming: when you have a failure or something wrong, how do you alarm on it? And debugging is another big challenge. So I certainly see growing effort from both Databricks and the community on different aspects of that. >> Jim: Very good. >> All right, so I'm going to ask you for kind of a soundbite summary. I'm going to put you on the spot here: you're in an elevator, and I want you to answer this one question. Spark has enabled me to do blank better than ever before. >> Certainly, certainly. As I explained before, it helps a lot, for both developers and even start-ups trying to disrupt an industry. It helps a lot, and I'm really excited to see this deep learning integration and all the different roadmap items down the road. I think they're on the right track. >> All right. Dr.
Wang, thank you so much for spending some time with us. We appreciate it, and go enjoy the rest of your day. >> Yeah, thanks for having me here. >> And thank you for watching theCUBE. We're here at Spark Summit 2017. We'll be back after the break with another guest. (easygoing electronic music)
Ziya Ma, Intel - Spark Summit East 2017 - #sparksummit - #theCUBE
>> [Narrator] Live from Boston, Massachusetts, this is theCUBE, covering Spark Summit East 2017. Brought to you by Databricks. Now here are your hosts, Dave Alante and George Gilbert. >> Welcome back to Boston, everybody. This is theCUBE, and we're here live at Spark Summit East, #SparkSummit. Ziya Ma is here. She's the Vice President of Big Data at Intel. Ziya, thanks for coming to theCUBE. >> Thanks for having me. >> You're welcome. So software is our topic. Software at Intel. You know, people don't always associate Intel with software, but what's the story there? >> So actually there are many things that we do for software. Since I manage the Big Data engineering organization, I'll just say a little bit more about what we do for Big Data. >> [Dave] Great. >> So you know, Intel does all the processors, all the hardware. But when our customers are using the hardware, they like to get the best performance out of Intel hardware. So for the Big Data space, we optimize the Big Data solution stack, including Spark and Hadoop, on top of Intel hardware, and make sure that we leverage the latest instruction set so that customers get the most performance out of the newest Intel hardware. And we also collaborate very extensively with the open source community on Big Data ecosystem advancement. For example, we're a leading contributor to the Apache Spark ecosystem. We're also a top contributor to the Apache Hadoop ecosystem. And lately we're getting into the machine learning, deep learning, and AI space, especially integrating those capabilities into the Big Data ecosystem.
So Intel has dominated the general purpose market. So as Big Data became more mainstream, was there a discussion okay, we have to develop specialized processors, which I know Intel can do as well, or did you say, okay, we can actually optimize through software. Was that how you got here? Or am I understanding that? >> We believe definitely software optimization, optimizing through software is one thing that we do. That's why Intel actually have, you may not know this, Intel has one of the largest software divisions that focus on enabling and optimizing the solutions in Intel hardware. And of course we also have very aggressive product roadmap for advancing continuously our hardware products. And actually, you mentioned a general purpose computing. CPU today, in the Big Data market, still has more than 95% of the market. So that's still the biggest portion of the Big Data market. And will continue our advancement in that area. And obviously as the Ai and machine learning, deep learning use cases getting added into the Big Data domain and we are expanding our product portfolio into some other Silicon products. >> And of course that was kind of the big bet of, we want to bet on Intel. And I guess, I guess-- >> You should still do. >> And still do. And I guess, at the time, Seagate or other disk mounts. Now flash comes in. And of course now Spark with memory, it's really changing the game, isn't it? What does that mean for you and the software group? >> Right, so what do we... Actually, still we focus on the optimi-- Obviously at the hardware level, like Intel now, is not just offering the computing capability. We also offer very powerful network capability. We offer very good memory solutions, memory hardware. Like we keep talking about this non-volatile memory technologies. So for Big Data, we're trying to leverage all those newest hardware. 
And we're already working with many of our customers to help them improve their Big Data memory solutions, the in-memory analytics type of capability, on Intel hardware, to give them the best performance and the most secure results using Intel hardware. That's definitely one thing that we continue to do, and it's going to remain our top priority. But we don't just limit our work to optimization, because giving users the best experience, the complete experience, on the Intel platform is our ultimate goal. So we work with our customers from financial services companies, from manufacturing, from transportation, and from other IoT, internet-of-things, segments, and make sure that we give them the easiest Big Data analytics experience on Intel hardware. So when they are running those solutions, they don't have to worry too much about how to make their application work with Intel hardware and how to make it more performant on Intel hardware, because the Intel software solution bridges that gap. We do that part of the job, so that our customers' experience is easier and more complete. >> You serve as the accelerant to the marketplace. Go ahead, George. >> So Intel's BigDL is the new product, as of the last month or so, an open source solution. Tell us how there are other deep learning frameworks that aren't as fully integrated with Spark yet, and where BigDL fits in, since we're at a Spark conference. How it backfills some functionality and how it really takes advantage of Intel hardware. >> George, just like you said, BigDL we just open-sourced a month ago. It's a deep learning framework that we organically built on top of Apache Spark. And it has quite some differences from the other mainstream deep learning frameworks like Caffe, TensorFlow, Torch, and Theano, you name it.
The reason we decided to work on this project was, again, through our experience working with our analytics customers, especially Big Data analytics customers. As they build AI solutions or AI modules within their analytics applications, we found it's getting more and more difficult to build and integrate AI capability into their existing Big Data analytics ecosystem. They had to set up a different cluster and build a different set of AI capabilities using, let's say, one of the deep learning frameworks. And then they have to overcome a lot of challenges, for example, moving the model and data between the two different clusters, and then making sure the AI results get integrated into the existing analytics platform or application. So that was the primary driver: how do we make our customers' experience easier? Do they have to leave their existing infrastructure and build a separate AI module? Can we do something organic on top of the existing Big Data platform, let's say Apache Spark, so that users can just leverage the existing infrastructure and make AI a natural, integral part of the overall analytics ecosystem they already have? That was the primary driver. The other benefit we see from integrating the BigDL framework natively with the Big Data platform is that it enables efficient scale-out, fault tolerance, elasticity, and dynamic resource management. Those are the benefits naturally brought by the Big Data platform. And today, just within this short period of time, we have already tested that BigDL can scale easily to tens or hundreds of nodes. So the scalability is also quite good. And another benefit of a solution like BigDL, especially because it eliminates the need to set up a separate cluster and move models between different hardware clusters, is that it lowers your total cost of ownership. You can just leverage your existing infrastructure.
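The scale-out behavior described here follows the standard data-parallel pattern: each partition computes a gradient on its shard of the mini-batch, and a driver-side reduce averages them into one weight update. The sketch below simulates one such synchronous step in plain Python on a one-parameter linear model; it is an illustration of the pattern, not BigDL's actual API.

```python
def local_gradient(weight, shard):
    """Gradient of mean squared error for y = weight * x on one data shard."""
    n = len(shard)
    return sum(2 * (weight * x - y) * x for x, y in shard) / n

def distributed_sgd_step(weight, shards, lr):
    """One synchronous data-parallel update: average per-shard gradients."""
    grads = [local_gradient(weight, s) for s in shards]   # runs per partition
    avg = sum(grads) / len(grads)                         # driver-side reduce
    return weight - lr * avg

# Data generated by y = 3x, split across two "partitions".
shards = [[(1.0, 3.0), (2.0, 6.0)], [(3.0, 9.0)]]
w = 0.0
for _ in range(200):
    w = distributed_sgd_step(w, shards, lr=0.05)
```

Because the reduce step is just an aggregation over partitions, the same pattern inherits the platform's fault tolerance and elasticity, which is the point being made about building training natively on Spark.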
There is no need to buy an additional set of hardware and build another environment just for training the model. So that's another benefit that we see. And performance-wise, we also tested BigDL against Caffe, Torch, and TensorFlow. The performance of BigDL on a single-node Xeon is orders of magnitude faster than out-of-the-box open source Caffe, TensorFlow, or Torch. So it's definitely very promising. >> Without the heavy lifting. >> And a useful solution, yeah. >> Okay, can you talk about some of the use cases that you expect to see from your partners and your customers? >> Actually, very good question. We have already started a few engagements with some interested customers. The first customer is from the steel industry, where improving the accuracy of steel-surface defect recognition is very important to quality control. We worked with this customer over the last few months and built an end-to-end image recognition pipeline using BigDL and Spark. And just through phase-one work, the customer already improved its defect recognition accuracy to 90%, and they're seeing a real yield improvement in steel production. >> And it used to be done by humans? >> It used to be done by humans, yes. >> And you said, what was the degree of improvement? >> 90, nine-zero. So now the accuracy is up to 90%. Another use case is in financial services, especially for fraud detection. This customer, and at the customer's request I won't name them, the financial industry is very sensitive about releasing names, was seeing its fraud risks increase tremendously with its wide range of products, services, and customer interaction channels. So they implemented an end-to-end deep learning solution using BigDL and Spark. And again, through phase-one work, they are seeing the fraud detection rate improved 40 times, four-zero times.
We think there is more improvement we can do, because this is just a collaboration from the last few months, and we'll continue this collaboration with this customer. And we expect more use cases from other business segments, but those are the two that already have BigDL running in production today. >> Well, the first one, that's amazing, essentially replacing the human and being much more accurate. The fraud detection is interesting, because fraud detection has come a long way in the last 10 years, as you know. It used to take six months, if they found the fraud at all. Now it's minutes, seconds, but there are still a lot of false positives. So do you see this technology helping address that problem? >> Yeah, continuously improving the prediction accuracy is actually one of the goals. This is another reason why we need to bring AI and Big Data together. You need to train your model, your AI capabilities, with more and more training data, so that you get much better accuracy. That is the biggest lever for improving accuracy. So you need a huge infrastructure, a big data platform, so that you can host and manage your training data sets well and feed them into your deep learning solution or module to continuously improve its accuracy. So yes. >> This is a really key point, it seems like. I would like to unpack that a little bit. When we talk to customers and application vendors, it's that training feedback loop that gets the models smarter and smarter. So if you had one cluster for training with another framework, and Spark was the rest of your analytics, how would training with feedback data work when you had two separate environments? >> You know, that's one of the drivers for creating BigDL. We did not come to BigDL at the very beginning.
We tried to port the existing deep learning frameworks, like Caffe and TensorFlow, onto Spark. And you probably also saw the research papers; there are other teams out there also trying to port Caffe, TensorFlow, and other deep learning frameworks onto Spark, because you have that need. You need to bring the two capabilities together. But the problem is that those systems were developed in a very traditional way, with Big Data not yet in consideration when those frameworks were created. Now the need for converging the two becomes more and more clear, and more necessary. And when we ported them over, we said, gosh, this is so difficult. First, it's very challenging to integrate the two. And second, the experience after you've moved it over is awkward. You're literally using Spark as a dispatcher. The integration is not coherent; they're superficially integrated. So this is where we said we have got to do something different. We cannot just superficially integrate two systems together. Can we do something organic on top of the Big Data platform, on top of Apache Spark, so that the integration between the training system, the feature engineering, and the data management can be more consistent, more integrated? So that's exactly the driver for this work. >> That's huge. Seamless integration is one of the most overused phrases in the technology business. Superficial integration is maybe a better description for a lot of those so-called seamless integrations. You're claiming here that it's truly seamless integration. We're out of time, but last word, Intel and Spark Summit. What do you guys have going here? What's the vibe like? >> So actually, tomorrow I have a keynote. I'm going to talk a little bit more about what we're doing with BigDL. This is one of the big things that we're doing.
And of course, for a system like BigDL, or even other deep learning frameworks, to get optimum performance on Intel hardware, there's another item that we're highlighting: MKL, the Intel-optimized Math Kernel Library. It has a lot of common math routines optimized for Intel processors using the latest instruction set, and it's already integrated into the BigDL ecosystem today. So that's another thing we're highlighting. And another thing is that those are just software. At the hardware level, during Intel's AI Day in November, our executives, BK, Diane Bryant, and Doug Fisher, highlighted the Nirvana product portfolio that's coming out. That will give you different hardware choices for AI. You can look at FPGAs, Xeon Phi, Xeon, and our new Nirvana-based silicon like Lake Crest. Those are some of the silicon products you can expect in the future. >> Intel, taking us to Nirvana, touching every part of the ecosystem. Like you said, 95% share and in all parts of the business. Thanks very much for coming on theCUBE. >> Thank you, thank you for having me. >> You're welcome. All right, keep it right there. George and I will be back with our next guest. This is Spark Summit, #SparkSummit. We're theCUBE. We'll be right back.