Swami Sivasubramanian, AWS | AWS Summit Online 2020


 

>> Narrator: From theCUBE Studios in Palo Alto and Boston, connecting with thought leaders all around the world, this is a CUBE Conversation.

>> Hello everyone, welcome to this special CUBE interview. We are here at theCUBE Virtual covering AWS Summit Virtual Online. These are the Amazon Summits they normally hold all around the world; they're doing them virtually now. We are here with the Palo Alto COVID-19 quarantine crew getting all the interviews, with a special guest: Vice President of Machine Learning, Swami, a CUBE alumni who's been involved not only in machine learning, but in all of the major activity around AWS and how machine learning has evolved, and all the services around machine learning workflows, from Transcribe to Rekognition, you name it. Swami, you've been at the helm for many years, and we've chatted about that before. Welcome to the virtual CUBE covering AWS Summit.

>> Hey, pleasure to be here, John.

>> Great to see you. I know times are tough. Everything okay at Amazon? You guys are certainly cloud-scaled, and not too unfamiliar with working remotely. You do a lot of travel, but what's it like for you guys right now?

>> We're actually doing well. We are working hard to make sure we continue to serve our customers, even remotely. We had taken measures to prepare, and we are confident that we will be able to meet customer capacity demands during this time. We're also helping customers react quickly and nimbly to the current challenges. There are various examples, from amazing startups working in this area to companies reorganizing themselves to serve their customers. We can talk about that later.
>> Large scale. You guys have done a great job, and it's been fun watching and chronicling the journey of AWS as it now goes to a whole 'nother level. Post-pandemic, we're expecting even more surge in everything from VPNs to workspaces, you name it, and all these workloads are going to be under a lot of pressure to deliver more and more value. You've been at the heart of one of the key areas, which is the tooling and the scale around machine learning workflows. And this is where customers are really trying to figure out: what are the adequate tools? How do my teams effectively deploy machine learning? Because now, more than ever, the data is going to start flowing in as the virtualization, if you will, of life happens. We're going to be in a hybrid world. We're going to be online most of the time. And I think COVID-19 has proven that on this new trajectory of virtualization and virtual work, applications are going to have to flex, adjust, scale, and be reinvented. This is a key thing. What's going on with machine learning, what's new? Tell us what you guys are doing right now.

>> Yeah, in AWS, we offer the broadest-- (poor audio capture obscures speech) All the way from expert practitioners, we offer framework and infrastructure-layer support for all popular frameworks, from TensorFlow, to Apache MXNet, to PyTorch, (poor audio capture obscures speech) custom chips like Inferentia. And then, for aspiring ML developers who want to build their own custom machine learning models, we offer SageMaker, which is our end-to-end machine learning service that makes it easy for customers to build, train, tune, and debug machine learning models. It is one of our fastest-growing machine learning services, and many startups and enterprises are starting to standardize their machine learning building on it.
And then, the final tier is geared towards application developers who do not want to get into model building and just want easy APIs to build capabilities like transcription, voice recognition, and so forth. And I wanted to talk about one of the new capabilities we are about to launch: an enterprise search service called Kendra, and--

>> So actually, just from a news standpoint, that's GA now; that's being announced at the Summit.

>> Yeah.

>> That was a big hit at re:Invent, Kendra.

>> Yeah.

>> A lot of buzz! It's available.

>> Yep, so I'm excited to say that Kendra, our new machine learning-powered, highly accurate enterprise search service, has been made generally available. If you look at what Kendra is, we have reimagined the traditional enterprise search service, which has historically been an underserved market segment, so to speak. On the public web search front, it is a relatively well-served area, whereas enterprise search has been an area with a huge number of data silos: data spread across file systems, SharePoint, Salesforce, and various other places. And deploying a traditional search index has always been hard. Even the answers to simple questions, like when the IT desk opens, or what the security policy is, have historically been hard for people to find within an enterprise. Let alone in a materials science company like 3M, which was trying to enable collaboration among researchers spread across the world so they could search their experiment archives and so forth.
It has been super hard for them to do these things, and this is one of those areas where Kendra enables something new. Kendra is a deep learning-powered search service for enterprises which breaks down data silos and collects data across various sources, all the way from S3, to file systems, to SharePoint, and various other data sources, and uses state-of-the-art NLP techniques to index them. Then you can query using natural language queries, such as "when does my IT desk open?", and it won't just give you a bunch of random links. It'll tell you it opens at 8:30 a.m.

>> Yeah.

>> Or, what is the credit card cashback return for my corporate credit card? It won't give you a long list of links related to it. Instead, it'll give you the answer: 2%. It's that highly accurate. (poor audio capture obscures speech)

>> People who have been in the enterprise search or data business know how hard this is. It's been a super hard problem in the old guard models, because databases were limited to schemas and whatnot. Now you have a data-driven world, and this becomes interesting. I think the big takeaway I took from Kendra was not only the new kind of discovery navigation that's possible, in terms of low latency and getting relevant content, but really the under-the-covers impact. And I'd like to get your perspective on this, because this has been an active conversation inside the community at cloud scale: data silos have been a problem. People have built these data silos, and they talk about breaking them down, but it's really hard. There are legacy problems, and applications that are tied to them. How do I break my silos down? Or how do I leverage the silos? So I think you guys really solve a problem here around data silos and scale.

>> Yeah.

>> So talk about the data silos.
And then, I'm going to follow up and get your take on the size of the data, megabytes, petabytes. Talk about data silos, and the scale behind it.

>> Perfect. So if you look at how to set up something like a Kendra search cluster, even from your Management Console in AWS, you'll be able to point Kendra to various data sources, such as Amazon S3, SharePoint, Salesforce, and various others, and say: these are the kinds of data I want to index. Kendra automatically pulls in this data, indexes it using its deep learning and NLP models, and then automatically builds a corpus. Then I, as a user of the search index, can start querying it using natural language, and I don't have to worry about where it comes from. Kendra takes care of things like access control, and it uses finely tuned machine learning algorithms under the hood to understand the context of a natural language query and return the most relevant results. I'll give a real-world example from some of the customers in the field who are using Kendra. If you take a look at 3M, 3M is using Kendra to support its materials science R&D by enabling natural language search of their expansive repositories of past research documents that may be relevant to a new product. Imagine what this does for a company like 3M. Instead of researchers who are spread around the world repeating the same experiments on materials research over and over again, now their engineers and researchers can quickly search through documents. And they can innovate faster instead of literally reinventing the wheel all the time. So it means better acceleration to market. Even in this situation we're in, one piece of interesting work you might be interested in is from the Semantic Scholar team at the Allen Institute for AI, which recently opened up a repository of scientific research called the COVID-19 Open Research Dataset. These are expert research articles.
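The point-Kendra-at-your-data-and-query-it flow Swami describes maps to a small amount of code. A minimal sketch, assuming an index already exists (the index ID below is hypothetical); `kendra.query` is the real boto3 call, and the response-parsing helper is kept separate so it can be exercised without AWS credentials:

```python
def extract_answer(response):
    """Pull the text of the first ANSWER-type result out of a Kendra
    Query response, or None if Kendra found no direct answer."""
    for item in response.get("ResultItems", []):
        if item.get("Type") == "ANSWER":
            return item["DocumentExcerpt"]["Text"]
    return None

def ask_kendra(index_id, question, region="us-east-1"):
    """Send a natural-language question to a Kendra index.
    Requires AWS credentials and an existing index (index_id is
    whatever CreateIndex / the console gave you)."""
    import boto3  # deferred so extract_answer stays dependency-free
    kendra = boto3.client("kendra", region_name=region)
    return extract_answer(kendra.query(IndexId=index_id, QueryText=question))

# Abridged shape of a response -- a question like "when does my IT desk
# open?" comes back with a direct ANSWER result, not just document links:
sample = {"ResultItems": [
    {"Type": "ANSWER",
     "DocumentExcerpt": {"Text": "The IT desk opens at 8:30 a.m."}},
    {"Type": "DOCUMENT",
     "DocumentExcerpt": {"Text": "Full IT support policy..."}},
]}
print(extract_answer(sample))  # -> The IT desk opens at 8:30 a.m.
```

This is the "answer, not a list of links" behavior from the interview: the caller checks for an `ANSWER` result first and only falls back to document links when there isn't one.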
(poor audio capture obscures speech) And now, the index uses Kendra, and it helps scientists, academics, and technologists quickly find information in a sea of scientific literature. You can even ask questions like, "Hey, how different is convalescent plasma treatment compared to a vaccine?", and Kendra automatically understands the context and gets a summary answer to these questions for the customers. And this is one of the things where, when we talk about breaking down the data silos, Kendra takes care of getting the data and putting it in a central location, understanding the context behind each of these documents, and then being able to quickly answer customers' queries using simple natural language as well.

>> So what's the scale? Talk about the scale behind this. What are the scale numbers? What are you guys seeing? You guys always do a good job of running a great announcement and then following up with general availability, which means I know you've got some customers using it. What are we talking about in terms of scale? Petabytes? Can you give some insight into the kind of data scale you're talking about here?

>> So the nice thing about Kendra is that it is easily, linearly scalable. I, as a developer, can keep adding more and more data, and it linearly scales to whatever scale our customers want. That is one of the underpinnings of the Kendra search engine. This is why customers like PricewaterhouseCoopers are using Kendra to power their regulatory application, to help customers search through regulatory information quickly and easily. Instead of sifting through hundreds of pages of documents manually to answer certain questions, Kendra now allows them to ask natural language questions. I'll give another example, which speaks to the scale. Baker Tilly, a leading advisory, tax, and assurance firm, is using Kendra to index documents.
Compared to a traditional SharePoint-based full-text search, they are now using Kendra to quickly search product manuals and so forth, and they're able to get answers up to 10x faster. Look at the kind of impact Kendra has: being able to index vast amounts of data in a linearly scalable fashion, keep adding data in the order of terabytes and keep going, and search 10x faster than traditional keyword search-based algorithms is a big deal for these customers. They're very excited.

>> So what is the main problem you're solving with Kendra? What's the use case? If I'm the customer, what's my problem that you're solving? Is it just response to data, whether it's a call center, or support, or an app? What's the main focus that you came out with? What was the vector of the problem you're solving here?

>> So when we talked to customers before we started building Kendra, one of the things that constantly came back was that they wanted the same ease of use and ability they have searching the world wide web when searching within an enterprise. It can be in the form of an internal search, to search within HR documents or internal wiki pages and so forth; or it can be searching internal technical documentation, or the public documentation, to help the contact centers; or it can be external search in terms of customer support; or enabling collaboration by sharing knowledge bases. So we really dissected each of these. Why is this a problem? Why is it not being solved by traditional search techniques? One thing that became obvious was that, unlike the external world, where web pages are linked easily with a very well-defined structure, the internal world within an enterprise is very messy. Documents are put in SharePoint, or in a file system, or in a storage service like S3, or in tools like Box, or various other places.
And what customers really wanted was a system that knows how to pull the data from these various data silos, still understand the access controls behind it and enforce them in the search, and then understand the real data behind it, not just do simple keyword search, so that we can build a remarkable search service that really answers queries in natural language. This has been the premise of Kendra, and this is what has started to resonate with our customers. I talked about some of the other examples, even in areas like contact centers. For instance, Magellan Health is using Kendra for its contact centers. They are able to seamlessly tie member-, provider-, or client-specific information with other information about health care for their agents, so that they can quickly resolve calls. Or it can be used internally, or for things like external search as well. So, very satisfied clients.

>> So you guys took the basic concept of discovery navigation, which is the consumer web, finding what you're looking for as fast as possible, but also took advantage of building intelligence around understanding all the nuances and configuration, schemas, and access under the covers, allowing things to be discovered in a new way. So you basically make data discoverable, and then provide an interface.

>> Yeah.

>> For discovery and navigation. So it's a broad use case, then.

>> Right, yeah, that sounds about right, except we did one thing more. We didn't just do discovery and make it easy for people to find information; they are sifting through terabytes or hundreds of terabytes of internal documentation. Sometimes, one of the things that happens is that throwing a bunch of hundreds of links to these documents at someone is not good enough.
For instance, if I'm trying to find out what the ALS marker is in a health care setting, for a particular research project, then I don't want to sift through thousands of links. Instead, I want to be able to pinpoint exactly which document contains the answer. So that is the final element: really understanding the context behind each and every document using natural language processing techniques, so that you not only discover the information that is relevant, but you also get highly accurate, precise answers to some of your questions.

>> Well, that's great stuff; I'm a big fan. I really liked the announcement of Kendra. Congratulations on the GA of that. We'll make some room on our CUBE Virtual site for your team to put more Kendra information up. I think it's fascinating. I think that's going to be the beginning of how the world changes, certainly with voice activation and API-based applications integrating this in. I just see a ton of activity; this is going to have a lot of headroom. So appreciate that. The other thing I want to get to, while I have you here, is the news around augmented artificial intelligence, which has been brought out as well.

>> Yeah.

>> So the GA of that is out. You guys are GA-ing everything, which is right on track with your cadence of AWS launches, I'd say. What is this about? Give us the headline story. What's the main thing to pay attention to in the GA? What have you learned? What's the learning curve, what are the results?

>> So the augmented artificial intelligence service, I call it A2I, the Amazon A2I service, we made generally available. It is a very unique service that makes it easy for developers to augment machine learning predictions with human intelligence. This has historically been a very challenging problem. So let me take a step back and explain the general idea behind it.
If you look at any developer building a machine learning application, there are use cases where even 99% accuracy in machine learning is not going to be good enough to directly use the result as the response back to the customer. Instead, you want to be able to augment it with human intelligence, to make sure that, hey, if my machine learning model's confidence for this prediction is less than 70%, I would like it to be augmented with human intelligence. A2I makes it super easy for developers to use a human reviewer workflow that comes in between. I can send the prediction either to a public workforce using Mechanical Turk, where we have more than 500,000 Turkers, or I can use a private workforce or a vendor workforce. A2I seamlessly integrates with our Textract, Rekognition, or SageMaker custom models. So now, for instance, the NHS has integrated A2I with Textract, and they are building document processing workflows. In the areas where the machine learning model's confidence is not as high, they are able to augment that with their human reviewer workflows, so that they can build highly accurate document processing workflows as well. This, we think, is a powerful capability.

>> So this really gets to what I've been feeling in some of the stuff we've worked with you guys on, our machine learning piece. It's hard for companies to hire machine learning people. This has been a real challenge. So I like this idea of human augmentation, because humans and machines have to have that relationship, and if you build good abstraction layers and abstract away the complexity, which is what you guys do, and that's the vision of cloud, then you're going to need to have that relationship solidified. So at what point do you think we're going to be ready for theCUBE team, or any customer that doesn't have, or can't find, a machine learning person?
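The human-in-the-loop routing Swami describes — return the model's answer when it's confident, hand it to a reviewer when it isn't — can be sketched in a few lines. `start_human_loop` is the real A2I runtime API, but the 70% threshold mirrors his example, and the loop name and flow definition ARN here are hypothetical stand-ins:

```python
import json

CONFIDENCE_THRESHOLD = 0.70  # the "less than 70%" case from the interview

def needs_human_review(confidence, threshold=CONFIDENCE_THRESHOLD):
    """True when a prediction is not confident enough to return directly."""
    return confidence < threshold

def route_prediction(prediction, a2i_client=None, flow_definition_arn=None):
    """Return 'auto' when the model is confident enough to answer directly;
    otherwise start an A2I human loop (when a client is supplied) and
    return 'human'."""
    if not needs_human_review(prediction["confidence"]):
        return "auto"
    if a2i_client is not None:
        # a2i_client would be boto3.client("sagemaker-a2i-runtime");
        # the loop name and flow-definition ARN are illustrative only.
        a2i_client.start_human_loop(
            HumanLoopName="doc-review-001",
            FlowDefinitionArn=flow_definition_arn,
            HumanLoopInput={"InputContent": json.dumps(prediction)},
        )
    return "human"

print(route_prediction({"label": "invoice", "confidence": 0.95}))  # -> auto
print(route_prediction({"label": "invoice", "confidence": 0.55}))  # -> human
```

The same pattern works whether the prediction came from Textract, Rekognition, or a custom SageMaker model: the flow definition decides who the reviewers are, and this routing decides which predictions they ever see.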
Or may not want to pay the wages that are required? It's hard to find a machine learning engineer. And when does the data science piece come in, with visualization, across the spectrum from pure computer science, math, and machine learning gurus to full end-user productivity? Machine learning is where you guys are doing a lot of work. Can you share your opinion on that evolution of where we are? Because people want to get to the point where they don't have to hire machine learning folks.

>> Yeah.

>> And have that kind of support too.

>> If you look at the history of technology, I have always believed that many of these highly disruptive technologies start out available only to experts, and then they quickly go through cycles where they become almost commonplace. I'll give an example from something totally outside the IT space. Let's take photography. Probably more than 150 years ago, the first professional camera was invented, and it took like three to four years of training to actually take a really good picture. There were only very few expert photographers in the world. Fast forward to where we are now: even my five-year-old daughter takes very good portraits, and gives them as a gift to her mom for Mother's Day. And if you look at Instagram, everyone is a professional photographer. I think the same thing is about to happen in machine learning too. Compared to 2012, when there were very few deep learning experts who could really build these amazing applications, now we are starting to see tens of thousands of customers using machine learning in production on AWS, not just proofs of concept, but in production. And this number is rapidly growing. I'll give one example. Internally at Amazon, to help our entire company transform and make machine learning a natural part of the business, six years ago we started a Machine Learning University.
And since then, we have been training all our engineers with machine learning courses in this ML University, and a year ago, we made this coursework available through our Training and Certification platform in AWS. Within 48 hours, more than 100,000 people registered. Think about it, that's like a big all-time record. That's why I always like to believe that developers are eager to learn; they're very hungry to pick up new technology. And I wouldn't be surprised if, four or five years from now, machine learning becomes a normal feature of apps, the same way databases are, and becomes less special. If that day happens, then I would see my job as done.

>> Well, you've got a lot more work to do, because I know from the conversations I've been having around this COVID-19 pandemic that there's general consensus and validation that the future got pulled forward, and what used to be an inside-industry conversation that we used to have around machine learning, and some of the visions you're talking about, has been accelerated by the pace of the new cloud scale. Now that people recognize it, having experienced virtual life firsthand, globally, everyone, there is going to be an acceleration of applications. We believe there's going to be a Cambrian explosion of new applications that have to reimagine and reinvent some of the plumbing and abstractions in cloud to deliver new experiences, because the expectations have changed. And I think one of the things we're seeing is that machine learning combined with cloud scale will create a whole new trajectory, a Cambrian explosion of applications. So this has kind of been validated. What's your reaction to that? Do you see something similar? What are some of the things you're seeing as we come into this world, this virtualization of our lives? It's every vertical; it's not one vertical anymore that's maybe moving faster.
I think everyone sees the impact. They see where the gaps are in this new reality. What are your thoughts?

>> Yeah, if you look at the history of machine learning, specifically around deep learning, the technology is really not new; the early deep learning papers were written probably almost 30 years ago. So why didn't we see deep learning take off sooner? It is because, historically, deep learning technologies have been hungry for compute resources and hungry for huge amounts of data, and the abstractions were not easy enough. As you rightfully pointed out, the cloud has come in and made it super easy to get access to huge amounts of compute and huge amounts of data, and you can literally pay by the hour or by the minute. And with new tools being made available to developers, like SageMaker and all the AI services we are talking about now, there is an explosion of options that are easy for developers to use, and we are starting to see a huge amount of innovation popping up. Unlike traditional disruptive technologies, which you usually see crashing into one or two industry segments before crossing the chasm and going mainstream, with machine learning we are starting to see traction in almost every industry segment: in the financial sector, where fintech companies like Intuit are using it to forecast call center volume and do personalization; in the health care sector, where companies like Aidoc are using computer vision to assist radiologists; and in areas like the public sector, where NASA has partnered with AWS to use machine learning for anomaly detection, with algorithms to detect solar flares in space. The examples are plenty.
It is because machine learning has now become so commonplace that in almost every industry segment, every CIO is already looking at how they can reimagine and reinvent their business and make their customer experience better with machine learning, the same way Amazon asked itself that question eight or ten years ago. So, very exciting.

>> Well, you guys continue to do the work, and I agree it's not just machine learning by itself; it's the integration and the perfect storm of elements that have come together at this time. Although it's pretty disastrous, I think ultimately we're going to come out of this on a whole 'nother trajectory. Creativity will emerge. You're going to start seeing those builders thinking, "Okay, hey, I've got to get out there. I can deliver. Solve the gaps we've exposed. Solve the problems. Create new expectations, new experiences." I think it's going to be great for software developers. I think it's going to change the computer science field, and it's really bringing in the lifestyle aspect of things. Applications have to have a recognition of this convergence, this virtualization of life.

>> Yeah.

>> The applications are going to have to have that. And remember, virtualization helped Amazon form the cloud. Maybe we'll get some new kinds of virtualization, Swami. (laughs) Thanks for coming on, really appreciate it. Always great to see you. Thanks for taking the time.

>> Okay, great to see you too, John. Thank you, thanks again.

>> We've been with Swami, the Vice President of Machine Learning at AWS, a theCUBE alumni who's been on before, sharing his insights around this virtualization and this online event, the Amazon Summit, which we're covering with the Virtual CUBE. As we go forward, more important than ever, the data is going to be important: searching it, finding it, and, more importantly, having humans use it to build applications.
So theCUBE coverage continues, for AWS Summit Virtual Online. I'm John Furrier, thanks for watching. (enlightening music)

Published: May 13, 2020



CUBE Insights from re:Invent 2018


 

(upbeat music)

>> Live from Las Vegas, it's theCUBE, covering AWS re:Invent 2018. Brought to you by Amazon Web Services, Intel, and their ecosystem partners.

>> Okay, welcome back everyone. Live coverage here in Las Vegas for Amazon re:Invent 2018. Day three, and we're winding down; over 150 videos, and we'll have over 500 clips. Losing the voice. Dave Vellante, my co-host, and the analysts with whom we're going to extract theCUBE insights: James Kobielus and David Floyer from Wikibon. Jim, you've been prolific on the blogs at SiliconANGLE.com, great stories. David, you've got some research. What's your take? Jim, you're all over what's going on in the news. What's the impact?

>> Well, I think what this year's re:Invent shows is that AWS is doubling down on A.I. If you look at the sheer range of innovative A.I. capabilities they've introduced into their portfolio, in terms of their announcements, it's really significant. A: they have optimized TensorFlow for their cloud. B: they now have an automated labeling capability, called Ground Truth, that leverages Mechanical Turk, which has been an Amazon capability for a while. They've also got the industry's first reinforcement learning plug-in to a data science tool chain, in this case SageMaker. Reinforcement learning is becoming so important for robotics, and gaming, and lots of other applications of A.I., and I'm just scratching the surface. So they've announced a lot of things, and David can discuss others, but I'm seeing the depth of A.I. Their investment in it shows that they've really got their fingers on what enterprises are doing, and will be doing, to differentiate themselves with this technology over the next five to ten years.

>> What's an area that you see that people are getting? Clearly A.I. What areas are people missing that are compelling, that you've observed here?

>> When you say people are missing, you mean the general...?

>> Journalists.

>> Oh.

>> Audience. There's so much news.
>> Yeah. Yeah.

>> Where are the nuggets hidden in the news? (laughing) What are you seeing that people might not see, that's different?

>> Getting back to the point I was raising: robotics is becoming a predominant application realm for A.I. Outside the laboratory, and outside of the industrial IoT, robots are coming into everything, and there's a special type of A.I. you build into robots; reinforcement learning is a big part of it. So I think the journalists have missed what I've seen in the past couple of years: robotics and reinforcement learning are almost on the verge of being mainstream, and AWS gets it. Just look at the depth of their investments. Like DeepRacer, that cute little autonomous vehicle they rolled out here at this event; that just shows they totally get it. That will be a huge growth sector.

>> David Floyer, Outposts is their on-premises cloud. You've been calling for this for I don't know how many years.

>> (laughing) Three years.

>> Three years?

>> Yeah.

>> What's the impact?

>> And people said, no way, Floyer's wrong. (laughing)

>> So you get vindication, but...

>> And people in particular in AWS. (laughing)

>> So you're right. So you're right, but is it going to be out in a year?

>> Yeah, next year, in 2019.

>> Will this thing actually make it to the market? And if it does, what is the impact? Who wins and who loses?

>> Well, let's start with: will it get to the market? Absolutely. AWS Outposts is the name. It is taking AWS in the cloud and putting it on premises. The same APIs, the same services; it'll eventually be identical between the two. And that enormously increases the range and reach of AWS, and the market that AWS can go after.
It is a major, major impact on the marketplace, puts pressure on a whole number of people, the traditional vendors who are supplying that marketplace at the moment, and in my opinion it's going to be wildly successful. People have been waiting for that, wanting that, particularly in the enterprise market. The reasons for it are simple. Latency, low latency, you've got to have the data and the compute very close together. Moving data is very, very expensive over long distances, and the third one is many people want, or need, to have the data in certain places. So the combination is meeting the requirements; they've taken a long time to get there. I think it's going to be, however, wildly successful. It's going to be coming out in 2019. They'll have their alphas, their betas in the beginning of it. They'll have some announcements, probably about mid 2019. >> Who's threatened by this? Everybody? Cisco? HP? Dell? >> The integration of everything, storage, networking, compute, all in the same box is obviously a threat to all suppliers within that. And they're going to have to adapt to that pretty strongly. It's going to be a declining market. Declining markets are good if you adapt properly. A lot of people make a lot of money from, like IBM, from mainframe. 
They have to have utility pricing. They have to connect to the cloud. So either they go hard after AWS, connecting to AWS, or they belly up to Microsoft >> With Azure Stack, >> Microsoft Azure. that's clearly going to be their fallback place, so in a way, Microsoft with Azure Stack is also threatened by this, but in a way it's goodness for them because the ecosystem is going to evolve to that. So listen, these guys don't just give up. >> No, no I know. >> They're hard competitors, they're fighters. It's also to me a confirmation of Oracle's same-same strategy. On paper Oracle's got that down, they're executing on that, even though it's in a narrow Oracle world. So I think it does sort of indicate that that iPhone-for-the-enterprise strategy is actually quite viable. If I may jump in here, four things stood out to me. The satellite as a service was to me amazing. What's next? Amazon with scale, there's just so many opportunities for them. The Edge, if we have time. >> I was going to talk about the Edge. >> Love to talk about the Edge. The hybrid evolution, and open source. Amazon used to make it easy for the enterprise players to compete. They had limited sales and service capabilities, they had no open source give-back, they were hybrid deniers. Everything's going to go into the public cloud. That's all changed. They're making it much, much more difficult for what they call the old guard to compete. >> So they're taking away the objections? >> Yeah, they're removing those barriers, those objections. >> Awesome. Edge. >> Yeah, and to comment on one of the things you were talking about, which is the Edge, they have completely changed their approach to the Edge. They have put in Neo as part of SageMaker, which allows them to push out inference code, and they themselves are pointing out that inference code is 90% of all the compute, into... >> Not the training. >> Not the training, but the inference code after that, that's 90% of the compute. 
They're pushing that into the devices at the Edge, all sorts of architectures. That's a major shift in mindset about that. >> Yeah, and in fact I was really impressed by Elastic Inference for the same reasons, because it very much is a validation of a trend I've been seeing in the A.I. space for the last several years, which is, you can increasingly build A.I. in your preferred visual, declarative environment with Python code, and the abstraction layers of the A.I. ecosystem have developed to a point where the ecosystem increasingly will auto-compile to TensorFlow, or MXNet, or PyTorch, and then from there further auto-compile your deployed trained model to the most efficient format for the Edge device, for the GPU, or whatever. Wherever it's going to be executed, that's already a well established trend. The fact that AWS has productized that, with this Elastic Inference in their cloud, shows that not only do they get that trend, they're going to push really hard, making sure that AWS becomes, in many ways, the hub of efficient inferencing for everybody. >> One more quick point on the Edge, if I may. What's going on on the Edge reminds me of the days when Microsoft was trying to take Windows and stick it on mobile. Right, the Windows Phone. Top down, I.T. guys coming at it, >> Oh that's right. >> and that's what a lot of people are doing today in IT. It's not going to work. What Amazon is doing, see, is we're going to build an environment that you can build applications on, that are secure, and you can manage them from a bottoms-up approach. >> Yeah. Absolutely. >> Identifying what the operations technology developers want. Giving them the tools to do that. That's a winning strategy. >> And focusing on them producing the devices, not themselves. >> Right. >> And not declaring where the boundaries are. >> Spot on. >> Very very important. >> Yep. 
>> And they're obviously inferencing; you get the most value out of the data if you put that inferencing as close as you possibly can to that data, and within a camera means in the camera itself. >> And I alluded to it earlier, another key announcement from AWS here is, first of all, the investment in SageMaker itself is super impressive. In the year since they've introduced it, look at what they've already added; they have that slide with all the feature enhancements and new modules. SageMaker Ground Truth, really important, the fully managed service for automating labeling of training datasets, using Mechanical Turk. The vast majority of the costs in a lot of A.I. initiatives involves human annotators of training data, and without human-annotated training data you can't do supervised learning, which is the magic in a lot of A.I. AWS gets the fact that their customers want to automate that to the nth degree. Now they've got that. >> We sound like fanboys (laughing). >> That's going to be wildly popular. >> As we say, clean data makes good M.L., and good M.L. makes great A.I. >> Yeah. (laughing) >> So you don't want any dirty data out there. Cube, more coverage here. Cube insights panel, here in theCUBE at re:Invent. Stay with us for more after this short break. (upbeat music)
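The labeling workflow Jim describes, several human annotators per example with disagreements resolved automatically, can be sketched in a few lines. This is a hedged illustration of the general majority-vote idea, not AWS's actual Ground Truth implementation (which also does confidence-weighted consolidation and active learning); the example IDs and labels are made up.

```python
from collections import Counter

def consolidate_labels(annotations):
    """Given {example_id: [labels from several annotators]}, return
    {example_id: (majority_label, agreement_fraction)}."""
    consolidated = {}
    for example_id, labels in annotations.items():
        # most_common(1) yields the single most frequent label and its count.
        (label, count), = Counter(labels).most_common(1)
        consolidated[example_id] = (label, count / len(labels))
    return consolidated

# Three annotators label two images; img-001 resolves to 'cat' with 2/3 agreement.
raw = {
    "img-001": ["cat", "cat", "dog"],
    "img-002": ["dog", "dog", "dog"],
}
print(consolidate_labels(raw))
```

Low agreement fractions are the usual signal for routing an example to more annotators, which is where most of the labeling cost goes.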

Published Date : Nov 29 2018


Dell EMC AI Lab Tour | Dell EMC: Get Ready For AI


 

(upbeat music) >> Thank you for coming to the HPC and AI Innovation Lab. So, I'm sure that you've heard a lot of excitement in the industry about what we can do with AI and machine learning and deep learning. And our team in our lab has been building solutions for this space. So, very similar to what we do with our other solutions, including high performance computing, we take servers, storage, networking, software, and put it all together to build and design targeted solutions for a particular use case, and then bring in services and support along with that, so that we have a complete product. That's what we're doing for the AI space as well. So, whether you're doing machine learning algorithms on your data, say for example in Hadoop, or whether you're doing deep learning, convolutional neural networks, RNNs. And no matter what technology you're using, right? So, you have different choices for compute, and those compute choices can be CPUs, GPUs, FPGAs, custom ASICs. There's all sorts of different choices for compute. Similarly you have a lot of different choices for networking, for storage, and your actual use case. Right, are you doing image recognition, fraud detection, what are you trying to do? So our goal is multi-fold. First, we want to bring in all these new technologies, all these different technologies, and see how they work well together. Specifically in the AI space, we want to make sure that we have the right software framework, because a big piece of putting these solutions together is making sure that your MXNet, and Caffe, and TensorFlow, and all these frameworks are working well together, along with all these different neural network models. So putting all these things together means making sure that we can run standard benchmark datasets so we can do comparisons across configurations, and then as a result of all that work, share best practices and tuning. Including the storage piece as well. 
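The "run standard benchmarks, compare across configurations" workflow described above can be sketched generically. This is a minimal, hedged illustration, with a toy matrix-multiply workload standing in for a real framework benchmark; the configuration names are hypothetical, not Dell's actual test matrix.

```python
import time

def benchmark(workload, repeats=3):
    """Best wall-clock time over several runs; taking the minimum
    is the usual way to reduce noise in micro-benchmarks."""
    times = []
    for _ in range(repeats):
        start = time.perf_counter()
        workload()
        times.append(time.perf_counter() - start)
    return min(times)

def matmul_workload(n=100):
    # Toy stand-in for a real training step: naive n x n matrix multiply.
    a = [[1.0] * n for _ in range(n)]
    b = [[2.0] * n for _ in range(n)]
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

# Hypothetical configurations, reported relative to a baseline.
results = {"config-A": benchmark(lambda: matmul_workload(100)),
           "config-B": benchmark(lambda: matmul_workload(50))}
baseline = results["config-A"]
for name, t in sorted(results.items()):
    print(f"{name}: {t:.4f}s ({baseline / t:.2f}x vs config-A)")
```

Holding the dataset and workload fixed while swapping compute, network, or storage is what makes the cross-configuration comparison meaningful.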
Our Top500 cluster is over here, so multiple racks; this is a cluster that is more than 500 servers today, around 560 servers. And it's on the latest Top500 list, which is a list, published twice a year, of the 500 fastest supercomputers in the world. We started with a smaller number of CPUs. We had 128 servers. And then we added more servers, we swapped over to the next generation of CPUs, then we added even more servers, and now we have the latest generation Intel CPUs in this cluster. One of the questions we've been getting more and more is, what do you see with liquid cooling? So, Dell has had the capability to do liquid cooled systems for a while now, but we recently added this capability into the factory as well. So you can order systems that are direct contact liquid cooled directly from the factory. Let's compare the two, right? Right over here, you have an air cooled rack. Here we have the exact same configuration, so the same compute infrastructure, but liquid cooled. The CPU has a cold plate on it, and that's cooled with facilities water. So these pipes actually have water flowing through them, and each sled has two pipes coming out of it for the water loop, and these pipes from each server, each sled, go into these rack manifolds, and at the bottom of the rack over there is where we have our heat exchanger. In our early studies, we have seen that your efficiency, in terms of how much performance you get out of the server, should not depend on whether you're air cooled or liquid cooled, as long as your air cooling solution can provide enough cooling for your components. So, what that means is, if you have a well air cooled solution, it's not going to perform any worse than a liquid cooled solution. 
What the liquid cooling allows you to do is, in the same rack space, put in a higher-level configuration, higher TDP processors, more disks, a configuration that you cannot adequately air cool; that configuration, in the same space in your data center, with the same air flow, you will be able to liquid cool. The biggest advantage of liquid cooling today is to do with PUE ratios: how much of your data center power is going to compute and IT infrastructure versus to cooling and power distribution. This is production, this is part of the cluster. What we are doing right now is running rack level studies, right? So we've done single chassis studies in our thermal lab, along with our thermal engineers, on the advantages of liquid cooling and what we can do and how it works for particular workloads. But now we have a rack level solution, and so we are running different types of workloads, manufacturing workloads, weather simulation, some AI workloads, standard high performance LINPACK benchmarks, on an entire rack of liquid cooled and an entire rack of air cooled. All these racks have metered PDUs, where we can measure power, so we're going to measure power consumption as well, and then we have sensors which will allow us to measure temperature, and then we can tell you the whole story. And of course, we have a really, you know, phenomenal group of people in our thermal team, our architects, and we also have the ability to come in and evaluate a data center to see, does liquid cooling make sense for you today. It's not one size fits all, and it's not that liquid cooling is what everybody must do and you must do it today, no. And that's the value of this lab, right? Actual quantitative results, for liquid cooling, for all our technologies, for all our solutions, so that we can give you the right configuration, the right optimizations, with the data backing it up, for the right decision for you, instead of forcing you into the one solution that we do have. 
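The PUE ratio mentioned above has a simple definition: total facility power divided by the power that actually reaches the IT equipment. A quick sketch, with made-up numbers purely for illustration (these are not Dell's measured results):

```python
def pue(total_facility_kw, it_equipment_kw):
    """Power Usage Effectiveness: total facility power (IT load plus
    cooling plus power-distribution losses) divided by IT equipment power.
    1.0 is the theoretical ideal; air-cooled data centers typically run
    well above it."""
    if it_equipment_kw <= 0:
        raise ValueError("IT load must be positive")
    return total_facility_kw / it_equipment_kw

# Hypothetical rack-level comparison at the same 100 kW IT load:
# air cooling carries more cooling overhead than direct liquid cooling.
print(pue(160.0, 100.0))  # air cooled    -> 1.6
print(pue(115.0, 100.0))  # liquid cooled -> 1.15
```

This is why the lab measures power at the metered PDUs: the IT-load denominator has to come from real measurement, not nameplate ratings, for the comparison to mean anything.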
So now we're actually standing right in the middle of our Zenith supercomputer, so all the racks around you are Zenith. You can hear that the noise level is higher; that's because this is one cluster, and it's running workload right now, both from our team and our engineers, as well as from customers who can get access into the lab and run their workloads. So that noise level you hear is an actual supercomputer. We have C6420 servers in here today, with the Intel Xeon Scalable family processors, and that's what you see in these racks behind you and in front of you. And this cluster is interconnected using the Omni-Path interconnect. There are thousands and thousands of applications in the HPC space, and over the years we've added more and more capability. So today in the lab we do a lot of work with manufacturing applications, that's computational fluid dynamics, CFD, CAE, structural mechanics, you know, things like that. We do a lot of work with life sciences, that's next generation sequencing applications, molecular dynamics, cryogenic electron microscopy; we do weather simulation applications, and a whole bunch more. Quantum chromodynamics, and a whole bunch of benchmarking of subsystems. So tests for compute, for network, for memory, for storage; we do a lot of parallel file system and I/O tests, and when I talk about application benchmarking, we're doing that across different compute, network, and storage to see what the full picture looks like. The list that I've given you is not a complete list. This switch is a Dell Networking H-Series switch, which supports the Omni-Path fabric, the Omni-Path interconnect, that today runs at a hundred gigabits per second. What you have is all the clusters, all the Zenith servers in the lab, are connected to this switch. Because we started with a small number of servers and then scaled, we knew we were going to grow. We chose to start with a director-class switch, which allowed us to add leaf modules as we grew. 
So the servers, the racks, that are closest to the switch have copper cables; the ones that are coming from across the lab have fiber cables. So, you know, this switch is what allows us to call this an HPC cluster, where we have a high-speed interconnect for our parallel and distributed computations, and a lot of our current deep learning work is being done on this cluster as well, on the Intel Xeon side. (upbeat music)

Published Date : Aug 7 2018


Adrian Cockcroft, AWS | KubeCon + CloudNativeCon 2018


 

>> Announcer: From Copenhagen, Denmark, it's theCUBE. Covering KubeCon and CloudNativeCon Europe 2018. Brought to you by the Cloud Native Computing Foundation and its ecosystem partners. >> Hello and welcome back to the live CUBE coverage here in Copenhagen, Denmark, for KubeCon 2018, the Kubernetes European conference. This is theCUBE, I'm John Furrier, my co-host Lauren Cooney here with Adrian Cockcroft, who is the Vice President of Cloud Architecture and Strategy for Amazon Web Services, AWS. CUBE alumnus, great to see you, a legend in the industry, great to have you on board today. Thanks for coming on. >> Thanks very much. >> Quick update, Amazon, we were at AWS Summit recently, I was at re:Invent last year, it gets bigger and bigger, it just continues to grow. Congratulations on the great earnings you guys posted last week, just continuing to show the scale and leverage that the cloud has. So, again, nothing really new here, cloud is winning and is the model of choice. You guys are doing a great job, so congratulations. Open source, you're handling a lot of that now. This community here is all about driving cloud standards. >> Adrian: Yeah. >> Your guys' position on that is? Standards are great, you do what customers want, as Andy Jassy always says. What's the update? I mean, what's new since Austin last year? >> Yeah, well, it's been great to be back on. We had a great video of us talking at Austin; it's been very helpful to get the message out of what we're doing in containers and what the open source team that I lead has been up to. It's been very nice. Since then we've done quite a lot. We were talking about doing things then, which we've now actually done and delivered on. We're getting closer to getting our Kubernetes service out, EKS. We hired Bob Wise, he started with us in January, he's the general manager of EKS. Some of you may know Bob has been working with Kubernetes since the early days. He was on the CNCF board before he joined us. 
He's working very hard, they have a team cranking away on all the things we need to do to get the EKS service out. So that's been the major focus, just getting it out. We have a lot of people signed up for the preview. Huge interest, we're onboarding a lot of people every week, and we're getting good feedback from people. We have demos of it in the booth here this week. >> So you guys are very customer-centric, following you guys closely as you know. What's the feedback that you're hearing, and what are you guys ingesting from an intelligence standpoint from the field? Obviously, a new constituent, not new, but a major constituent is open source communities, as well as paying enterprise customers. What's the feedback? What are you hearing? I would say beyond tire kicking, there's general interest in what Kubernetes has enabled. What's Amazon's view of that? >> Yeah, well, open source in general is always getting a larger slice of what people want to do. Generally, people are trying to get off of their enterprise solutions and evolving into an open source space, and then you kind of evolve from that into buying it as a service. So that's the evolution from one trend, custom or enterprise software, to open source, to as a service. And we're standing up all of these tools as a service to make them easier to consume for people. Everybody's happy to do that. What I'm hearing from customers is that that's what they're looking for. They want it to be easy to use, they want it to scale, they want it to be reliable and work, and that's what we're good at doing. And then they want to track the latest moves in the industry and run with the latest technologies, and that's what Kubernetes and the CNCF are doing, gathering together a lot of technologies and building the community around it, able to move faster than we'd move on our own. We're leveraging all of those things into what we're doing. >> And the status of EKS right now is in preview? 
And the estimated timetable for GA? >> In the next few months. >> Next few months. >> You know, get it out. Right now it's running in Oregon, in our Oregon data center, so the previews are all happening there. That gets us our initial thing, and then everyone will go, okay, we want it in our other regions, so we have to do that. So another service we have is Fargate, which basically says, here's a container, I want to run it; you don't have to declare a node or an instance to run it first. We launched that at re:Invent, that's already in production obviously, and we just rolled that out to four regions. That's in Virginia, Oregon, Dublin and Ohio right now. There's huge interest in Fargate, it lets you simplify your deployments a little bit. We just posted a new blog post; we have an open source blog you can find if you want to keep up with what's going on with the open source team at AWS. There was another post this morning, and it's a first pass at getting Fargate to work with Kubernetes using Virtual Kubelet, which is an experimental project, not part of the core Kubernetes system, running on the side. It's something that Microsoft came up with a little while ago. So we're now working with them. We did a pull request, they accepted it, so that team and AWS and a few other customers and other people in the community are working together to provide you a way to start up Fargate as the underlying layer for provisioning containers underneath Kubernetes, with Kubernetes as the API for doing the management of that. >> So who do you work with mostly when you're working in open source? Who do you partner with? What communities are you engaging with in particular? >> It's all over. >> All over? >> Wherever the communities are, we're engaging with them. >> Lauren: Okay, any particular ones that stand out? >> Other than CNCF, we have a lot of engagement with the Apache Hadoop ecosystem. 
A lot of work in data science, there's many, many projects in that space. In AI and machine learning, we've sponsored, we've spent a lot of time working with Apache MXNet, and we also work with TensorFlow, PyTorch and Caffe; those are all open source frameworks, so there's lots of contributions there. In the serverless arena, we have our own SAM, the serverless application model. We've been open sourcing more of that recently ourselves, and we're working with various other people. Across these different groups there's different conferences you go to, there's different things we do. We just sponsored RailsConf. My team sponsors and manages most of the open source conference events we go to now. We just did RailsConf, we're doing a Rust conference soon, I think, and there's Python conferences. I forget when all these are. There's a massive calendar of conferences that we're supporting. >> Make sure you email us that list, we're interested actually in looking at where the news and action is. >> So the language ones, AltCon's our flagship one, we'll be a top-level sponsor there. When we get to the U.S., KubeCon in Seattle, it's right there, it's two weeks after re:Invent. It's going to be much easier to manage. When we go to re:Invent it's like everyone just wants to take that week off, right. We get a week for everyone to recover, and then it's in the hometown. >> You still have that look in your eyes; when we interviewed you in Austin, we both were pretty exhausted after re:Invent. >> Yeah, so we announced a bunch of things on Wednesday and Thursday and I had to turn it into a keynote by Tuesday and get everyone to agree. That's what was going on, that was very compressed. We have more time, and all of the engineering teams that really want to be at an event like this are right in the hometown for a lot of it. >> What's it like working at Amazon? I've got to ask you since you brought it up. 
I mean, you guys run hard at Amazon, you're releasing stuff at a pace that's unbelievable. I mean, I get blown away every year. It almost seems inhuman that you guys can run at that pace. And earnings, obviously, the business results speak for themselves. What's it like there? I mean, you put your running shoes on, you run a marathon every day. >> It's lots of small teams working relatively independently, and that scales, and that's something other engineering organizations have trouble with. They build hierarchies that slow down. We have a really good engineering culture where every time you start a new team, it runs at its own speed. We've shown that as we add more and more resources, more teams, they are just executing. In fact, they're accelerating, they're building on top of other things. We get to build higher and higher level abstractions to layer into. It's just getting easier and easier to build things. We're accelerating our pace of innovation, there's no slowing down. >> I was telling Jassy they're going to write a Harvard Business School case study on a lot of the management practices, but certainly the impact on the business side with the model that you guys do. But I've got to ask you, on the momentum side, super impressed with SageMaker. I predicted on theCUBE at AWS Summit that that will be the fastest growing service. It will overtake Aurora, which I think is currently presented on stage as the fastest growing service. SageMaker is really popular. Updates there, its role in the community? Obviously, Kubernetes is a good fit for orchestrating things. We heard about Kubeflow, it's an interesting model. What's going on with SageMaker, and how is it interplaying with Kubernetes? >> If you're running an on-premise cluster of GPU-enabled machines, then Kubeflow is a great way of doing that. You run TensorFlow, Kubernetes manages your cluster, and you run Kubeflow on top. 
SageMaker is running at very large scale, and like a lot of things we do at AWS, what you need to run an individual cluster for any one customer is different from running a multi-tenant service. SageMaker sits on top of ECS, and it's now one of the largest generators of traffic to ECS, which is Amazon's horizontally scaled, multi-tenant cluster management system, which is now doing hundreds of millions of container launches a week. That is continuing to grow. We see Kubernetes as a more portable abstraction. It has some more, different layers of APIs and a big community around it. But for the heavy lifting of running tens of thousands of containers for a single application, we're still at the level where ECS does that every day, and with Kubernetes that's kind of the extreme case, where a few people are pushing it. It'll gradually grow in scale. >> It's evolution. >> There's an evolution here. But the interesting thing is, we're starting to get some convergence on some of the interfaces. Like the interface at CNI; CNI is the way you do networking on containers, and there is one way of doing that, that is shared by everybody through CNI. EKS uses it, ECS uses it, and Kubernetes uses it. >> And the impact for customers is what? What's the impact? >> It means the networking structures you want to set up will be the same. And the capabilities and the interfaces. But what happens on AWS is, because it has a direct plug-in, you can hook it up to our accelerated networking infrastructure. So, on AWS's instances right now, we've offloaded most of the network traffic processing. You're running 25 gigabits of traffic, that's quite a lot of work even for a big CPU, but it's handled by the Nitro plug-in architecture we have in our latest instance types. We talked a bit about that at re:Invent, but what you're getting is enormous, complete hypervisor offload at the core machine level. You get to use that accelerated networking. 
You're plugging into that interface. But, if you want to have a huge number of containers on a machine and you're not really trying to drive very high throughput, then you can use Calico, and we support that as well. So, multiple different ways, but all through the same thing, the same plug-ins on both. >> System portability. You mentioned some stats, what's the numbers you mentioned? How many containers are you launching a week, hundreds of thousands? >> On ECS, our container platform that's been out for a few years, hundreds of millions a week. It's really growing very fast. The containers are taking off everywhere. >> Microservices growth is, again, the architecture. As architecture is a big part of the conversation, what's your dialogue with customers? Because modern software architecture in cloud looks a lot different than the three-layered approach that used to be the web stack. >> Yeah, and I think to add to that, you know, we were just talking to folks about how in large enterprise organizations, you're still finding groups that do waterfall development. How are you working to kind of bring these customers and these developers into the future, per se? >> Yeah, so I spend about half my time managing the open source team and recruiting. The other half is talking to customers about this topic. I spend my time traveling around the world, talking at summits and events like this and meeting with customers. There's lots of different problems slowing people down. I think you see three phases of adoption of cloud, in general. One is just speed. I want to get something done quickly, I have a business need, I want to do it. I want machines in minutes instead of months, right, and that speeds everything up, so you get something done quickly. The second phase is where you're starting to do stuff at scale, and that's where you need cloud native. 
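The launch rates quoted above are easier to feel as a per-second number. A back-of-the-envelope check (the 300 million figure is an assumption for illustration, not Adrian's exact number):

```python
SECONDS_PER_WEEK = 7 * 24 * 60 * 60  # 604,800

def launches_per_second(launches_per_week):
    """Average sustained rate implied by a weekly launch count."""
    return launches_per_week / SECONDS_PER_WEEK

# 300 million launches a week works out to roughly 500 every second,
# sustained around the clock.
print(round(launches_per_second(300_000_000)))
```

Sustained averages like this understate the real engineering problem, since launch traffic is bursty; peak rates on a multi-tenant service run well above the mean.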
You really need to have elastic services, where you can scale down as well as up; otherwise, you just end up with a lot of idle machines that cost you too much, and it's not giving you the flexibility. The third phase we're getting into is complete data center shutdown. If you look at investing in a new data center or a data center refresh versus just opening an AWS account, the data center really doesn't make sense nowadays. We're seeing lots of large enterprises either considering it or well into it. Some are a long way into this. When you shut down the data center, all of the backend core infrastructure starts coming out. So we're starting to see sort of mainframe replacement and the really critical business systems being replaced. Those are the interesting conversations; that's one of the areas that I'm particularly interested in right now, and it's leading into this other buzzword, if you like, called chaos engineering. Think of it as the availability model for cloud native and microservices. We're just starting a working group at CNCF around chaos engineering; it's being started this week. So you can get a bit involved in how we can build some standards. >> That's going to be at Stanford? >> It's here, I mean it's a working group. >> Okay, online. >> The CNCF working group, they are wherever the people are, right. >> So, what is that conversation when you talk about that mainframe kind of conversation, or shutting down data centers and moving to the cloud? What is the key thing that you promote, up front, that needs to get done by the customer? I mean, obviously you have the key pillars, but when you think about microservices it's a global platform; it's not a lift and shift situation, well, kind of it is, if you shut down, but I mean not at that scale. But security, identity, authentication, there's no perimeter, so you know microservices are potentially going to scale. What are the things that you promote up front, that they have to do up front? 
What are the up front, table stakes decisions? >> At the management level, the real problem is people problems; the technology problem is somewhere down in the weeds. Really, if you don't get the people structures right, then you'll spend forever going through these migrations. So if you sort of bite the bullet and do the reorganization that's needed first, and get the right people in the right place, then you move much faster through it. I say a lot of the time, we're way upstream of picking a technology; it's much more about understanding the sort of DevOps, Agile and the organizational structures for these more cellular based organizations. You know, AWS is a great example of that. Netflix is another good example of that. Capital One is becoming a good example of that too; in banking, they're going much faster because they've already gone through that. >> So they're taking the Amazon model, small teams. Is that your general recommendation? >> Well, this is the whole point of microservices: they're built by these small teams. It's called Conway's law, which says that the code will end up looking like the org structure of the team that built it. So, if you set up lots of small teams, you will end up with microservices. That's just the way it works, right. If you try to take your existing siloed architecture with your long waterfall processes, it's very hard not to build a monolith. Getting the org structure done first is right. Then we get into kind of the landing zone thing. You could spend years just debating what your architecture should be, and some people have, and then every year they come back and it's changing faster than they can decide what to do. That's another kind of analysis paralysis mode you see some larger enterprises in. I always think: just do it. What's the standard best practice? Lay out my accounts like this, my networks like this, my structures; we call it a landing zone. 
We get somebody up to speed incredibly quickly and it's the beaten path. We're starting to build automation around these onboarding things; we're just getting stuff going. >> That's great. >> Yeah, and then going back to the sort of chaos engineering kind of idea, one of the first things I think you should put into this infrastructure is the disaster recovery automation. Because if that gets there before the apps do, then the apps learn to live with the chaos monkeys and things like that. Really, one of the first apps we installed at Netflix was Chaos Monkey. It wasn't added later, it was there when you arrived. Your app had to survive the chaos that was in the system. So, think of it this way: it used to be that disaster recovery was incredibly expensive, hard to build, custom, and very difficult to test. People very rarely run through their disaster recovery testing data center failover, but if you build it in on day one, you can build it automated. I think Kubernetes is particularly interesting because the APIs to do that automation are there. So we're looking at automating injecting failure at the Kubernetes level and also injecting into the underlying machines that are running Kubernetes, like attacking the control plane to make sure that the control plane recovery works. I think there's a lot we can do there to automate it and make it into a low-cost, productized, safe, reliable thing that you do a lot, rather than something that everyone's scared of doing. >> Or they bolt it on after they make decisions and retrofit pre-existing conditions into a disaster recovery, which is chaotic in and of itself. >> So, get the org chart right and then actually get the disaster recovery patterns in. If you need something highly available, do that first, before the apps turn up. >> Adrian, thanks for coming on; chaos engineering, congratulations. And again, we know you know a little about Netflix, you know that environment, and you've been a big Amazon customer. 
Congratulations on your success, looking forward to keeping in touch. Thanks for coming on and sharing the AWS perspective on theCUBE. I'm John Furrier, with Lauren Cooney, live in Denmark for KubeCon 2018, part of CNCF, the Cloud Native Computing Foundation. We'll be back with more live coverage, stay with us. We'll be right back. (upbeat music)
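The automated failure injection Cockcroft describes, kill something at random and verify the service still meets its availability invariant, can be sketched as a toy simulation. Everything here is hypothetical (service names, replica counts, the recovery step); it illustrates the pattern, not Netflix's or CNCF's actual tooling:

```python
import random

# Toy chaos-engineering loop, a minimal sketch of "inject failure,
# assert the service survives". All names and replica counts are
# hypothetical; this is not the actual Chaos Monkey code.

def inject_failure(cluster, rng):
    """Terminate one replica of a randomly chosen service."""
    service = rng.choice(sorted(cluster))
    cluster[service] = max(cluster[service] - 1, 0)
    return service

def survives(cluster, min_replicas=1):
    """The availability invariant: every service keeps serving."""
    return all(count >= min_replicas for count in cluster.values())

def run_chaos_test(cluster, rounds, rng=None):
    rng = rng or random.Random(0)
    for _ in range(rounds):
        service = inject_failure(cluster, rng)
        if not survives(cluster):
            return f"{service} lost availability"
        # A real scheduler would restore the replica; we model that too.
        cluster[service] += 1
    return "survived"

fleet = {"api": 3, "payments": 2, "search": 3}
print(run_chaos_test(fleet, rounds=10))  # prints "survived"
```

Built in from day one, as the interview suggests, the same invariant check can run continuously instead of being a feared annual failover drill.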

Published Date : May 2 2018


Ziya Ma, Intel | Big Data SV 2018


 

>> Live from San Jose, it's theCUBE! Presenting Big Data Silicon Valley, brought to you by SiliconANGLE Media and its ecosystem partners. >> Welcome back to theCUBE, our continuing coverage of our event, Big Data SV. I'm Lisa Martin with my co-host George Gilbert. We're down the street from the Strata Data Conference, hearing a lot of interesting insights on big data. Peeling back the layers, looking at opportunities, some of the challenges, barriers to overcome, but also the plethora of opportunities that enterprises of all kinds have that they can take advantage of. Our next guest is no stranger to theCUBE; she was just on with me a couple days ago at the Women in Data Science Conference. Please welcome back to theCUBE, Ziya Ma, Vice President of the Software and Services Group and Director of Big Data Technologies at Intel. Hi Ziya! >> Hi Lisa. >> Long time, no see. >> I know, it was just really two to three days ago. >> It was, well and now I can say happy International Women's Day. >> The same to you, Lisa. >> Thank you, it's great to have you here. So as I mentioned, we are down the street from the Strata Data Conference. You've been up there over the last couple days. What are some of the things that you're hearing with respect to big data? Trends, barriers, opportunities? >> Yeah, so first it's very exciting to be back at the conference again. The one biggest trend, or one topic that's hit really hard by many presenters, is the power of bringing the big data system and data science solutions together. You know, we're definitely seeing in the last few years the advancement of big data and the advancement of data science, or, you know, machine learning, deep learning, truly pushing forward business differentiation and improving our quality of life. So that's definitely one of the biggest trends. Another thing I noticed is there was a lot of discussion on big data and data science getting deployed into the cloud. What are the learnings, what are the use cases? 
So I think that's another noticeable trend. And also, there were some presentations on doing the data science or having the business intelligence on the edge devices. That's another noticeable trend. And of course, there were discussions on security and privacy for data science and big data, so that continued to be one of the topics. >> So we were talking earlier, 'cause there's so many concepts and products to get your arms around. If someone is looking at AI and machine learning on the back end, you know, we'll worry about edge intelligence some other time, but we know that Intel has the CPU with the Xeon and then this lower power one with Atom. There's the GPU, there's ASICs, FPGAs, and then there are these software layers, you know, at higher abstraction levels. Help us put some of those pieces together for people who are like saying, okay, I know I've got a lot of data, I've got to train these sophisticated models, you know, explain this to me. >> Right, so Intel is a real solution provider for data science and big data. So at the hardware level, and George, as you mentioned, we offer a wide range of products, from general purpose like Xeon to targeted silicon such as FPGAs and ASIC chips like Nervana. And also we provide adjacencies like networking hardware, non-volatile memory and mobile. You know, those are the other adjacent products that we offer. Now on top of the hardware layer, we deliver a fully optimized software solution stack, from libraries and frameworks to tools and solutions, so that we can help engineers or developers create AI solutions with greater ease and productivity. For instance, we deliver the Intel-optimized Math Kernel Library. It leverages the latest instruction sets to give significant performance boosts when you are running your software on Intel hardware. We also deliver frameworks like BigDL, for Spark and big data type customers if they are looking for deep learning capabilities. 
We also optimize some popular open source deep learning frameworks like Caffe, TensorFlow, MXNet, and a few others. So our goal is to provide all the necessary solutions so that, in the end, our customers can create the applications, the solutions, that they really need to address their biggest pain points. >> Help us think about the maturity level now. Like, we know that the most sophisticated internet service providers have been all over machine learning for quite a few years now. Banks, insurance companies, people who've had this, statisticians and actuaries who have that sort of skillset, are beginning to deploy some of these early production apps. Where are we in terms of getting this out to the mainstream? What are some of the things that have to happen? >> To get it to mainstream, there are so many things we could do. First, I think we will continue to see the wide range of silicon products, but then there are a few things Intel is pushing. For example, we're developing the Nervana Graph compiler, which will encapsulate the hardware integration details and present a consistent API for developers to work with. And this is one thing that we hope we can eventually help the developer community with. And also, we are collaborating with the end user, like from the enterprise segment. For example, we're working with the financial services industry, we're working with the manufacturing sector and also customers from the medical field. And online retailers, trying to help them deliver or create the data science and analytics solutions on Intel-based hardware or Intel-optimized software. So that's another thing that we do. And we're seeing actually very good progress in this area. Now we're also collaborating with many cloud service providers. For instance, we work with some of the top seven cloud service providers, both in the U.S. 
and also in China, to democratize not only our hardware but also our libraries and tools, BigDL, MKL, and other frameworks and libraries, so that our customers, including individuals and businesses, can easily access those building blocks from the cloud. So definitely we're working from different angles. >> So, last question in the last couple of minutes. Let's kind of vibe on this collaboration theme. Tell us a little bit about the collaboration that you're having with, as you mentioned, customers in some highly regulated industries, for example. But help us understand, what's that symbiosis? What is Intel learning from your customers that's driving Intel's innovation of your technologies in big data? >> That's an excellent question. So Lisa, maybe I can start by sharing a couple of customer use cases, the kinds of solutions we help our customers address. I think it's always wise not to start a conversation with the customer on the technology that you deliver. You want to understand the customer's needs first, so that you can provide a solution that really addresses their biggest pain point rather than simply selling technology. So for example, we have worked with an online retailer to better understand their customers' shopping behavior and to assess their customers' preferences and interests. And based upon that analysis, the online retailer made different product recommendations and maximized its customers' purchase potential, and it drove up the retailer's sales. You know, that's one type of use case that we have worked on. We have also partnered with customers from the medical field. Actually, today at the Strata Conference we had a joint presentation with UCSF, where we helped the medical center automate the diagnosis and grading of meniscus lesions. Today that's all done manually by the radiologist, but now that entire process is automated. 
The result is much more accurate, much more consistent, and much more timely, because you don't have to wait for the availability of a radiologist to read all the 3D MRI images. That can all be done by machines. You know, so those are the areas where we work with our customers: understand their business need, and give them the solution they are looking for. >> Wow, the impact there. I wish we had more time to dive into some of those examples. But we thank you so much, Ziya, for stopping by twice in one week to theCUBE and sharing your insights. And we look forward to having you back on the show in the near future. >> Thanks, Lisa; thanks George, for having me. >> And for my co-host George Gilbert, I'm Lisa Martin. We are live at Big Data SV in San Jose. Come down, join us for the rest of the afternoon. We're at this cool place called Forager Tasting and Eatery. We will be right back with our next guest after a short break. (electronic outro music)
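As a hedged aside on the optimized math kernel point Ma makes: NumPy hands dense linear algebra to whatever BLAS/LAPACK library it was built against, which may be Intel's MKL, so unchanged Python code picks up hardware-tuned kernels. The shapes and seed below are arbitrary:

```python
import numpy as np

# NumPy delegates dense linear algebra to the BLAS/LAPACK backend it
# was built against (OpenBLAS, Intel MKL, ...), so the same Python code
# gets hardware-tuned kernels without changes. Which backend you
# actually have is visible via np.show_config().

rng = np.random.default_rng(42)
a = rng.standard_normal((256, 128))
b = rng.standard_normal((128, 64))

c = a @ b  # dispatched to the backend's GEMM kernel

# Sanity-check one entry against a plain-Python dot product.
manual = sum(a[0, k] * b[k, 0] for k in range(128))
assert c.shape == (256, 64)
assert abs(c[0, 0] - manual) < 1e-9
```

The design point is that the optimization lives below the API: swapping in a faster BLAS changes performance, not the calling code.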

Published Date : Mar 8 2018

Wikibon Predictions Webinar with Slides


 

(upbeat music) >> Hi, welcome to this year's Annual Wikibon Predictions. This is our 2018 version. Last year, we had a very successful webinar describing what we thought was going to happen in 2017 and beyond, and we've assembled a team to do the same thing again this year. I'm very excited to be joined by the folks listed here on the screen. My name is Peter Burris. With me is David Floyer; Jim Kobielus is remote. George Gilbert's here in our Palo Alto studio with me. Neil Raden is remote. David Vellante is here in the studio with me. And Stuart Miniman is back in our Marlborough office. So thank you, analysts, for attending, and we look forward to a great teleconference today. Now what we're going to do over the course of the next 45 minutes or so is hit about 13 of the 22 predictions that we have for the coming year. So I want to reinforce this: if you have additional questions or things that don't get answered, if you're a client, give us a call. Reach out to us. We'll leave you with the contact information at the end of the session. But to start things off, we just want to make sure that everybody understands where we're coming from, and let you know who Wikibon is. So Wikibon is a company that starts with the idea of what's important to research communities. Communities are where the action is. Community is where the change is happening. And community is where the trends are being established. And so we use digital technologies like theCUBE, CrowdChat and others to really ensure that we are surfacing the best ideas that are in a community and making them available to our clients so that they can be more successful in their endeavors. When we do that, our focus has always been on a very simple premise, and that is that we're moving to an era of digital business. For many people, digital business can mean virtually anything. For us it means something very specific. 
To us, the difference between business and digital business is data. A digital business uses data to differentially create and keep a customer. So, borrowing from what Peter Drucker said, if the goal of business is to create and keep customers, the goal of digital business is to use data to do that. And that's going to inform an enormous number of conversations and an enormous number of decisions and strategies over the next few years. We specifically believe that all businesses are going to have to establish what we regard as the five core digital business capabilities. First, they're going to have to put in place concrete approaches to turning more data into work. It's not enough to just accrete data, to capture data or to move data around. You have to be very purposeful and planful in how you establish the means by which you turn that data into work, so that you can create and keep more customers. Secondly, it's absolutely essential that we build the three core technology capabilities. Effectively doing a better job of capturing data, and IoT and people, or the internet of things and people, mobile computing for example, is going to be a crucial feature of that. Then, once you capture that data, you have to turn it into value, and we think this is the essence of what big data, and in many respects AI, is going to be all about. And then once you have the possibility, kind of the potential energy of that data in place, then you have to turn it into kinetic energy and generate work in your business through what we call systems of agency. Now, all of this is made possible by a significant transformation that happens to be conterminous with this transition to digital business, and that is the emergence of the cloud. The technology industry has always been defined by the problems it was able to solve, catalyzed by the characteristics of the technology that made it possible to solve them. 
And cloud is crucial to almost all of the new types of problems that we're going to solve. So these are the five digital business capabilities that we're going to talk about, where we're going to have our predictions. Let's start first and foremost with this notion of turning more data into work. So our first prediction relates to how data governance is likely to change on a global basis. If we believe that we need to turn more data into work, well, businesses haven't generally adopted many of the principles associated with those practices. They haven't optimized to do that better. They haven't elevated those concepts within the business as broadly and successfully as they could have, or as they should. We think that's going to change, in part through the emergence of GDPR, the General Data Protection Regulation. It's going to go into full effect in May 2018. A lot has been written about it, a lot has been talked about. But our core point ultimately is that the dictates associated with GDPR are going to elevate the conversation on a global basis. And it mandates something that's now called the data protection officer. We're going to talk about that in a second, David Vellante. But it is going to have real teeth. We were talking with one chief privacy officer not too long ago who suggested that had the Equifax breach occurred under the rules of GDPR, the fines that would have been levied would have been in excess of 160 billion dollars, which is a little bit more than the zero dollars that has been fined thus far. Now we've seen new bills introduced in Congress, but ultimately our observation, and our conversations with a lot of chief privacy officers or data protection officers, is that in the B2B world, GDPR is going to strongly influence businesses' behavior regarding data not just in Europe, but on a global basis. 
Now that has an enormous implication David Vellante because it certainly suggest this notion of a data protection officer is something now we've got another potential chief here. How do we think that's going to organize itself over the course of the next few years? >> Well thank you Peter. There are a lot of chiefs (laughs) in the house and sometimes it gets confusing as the CIO, there's the CDO and that's either chief digital officer or chief data officer. There's the CSO, could be strategy, sometimes that could be security. There's the CPO, is that privacy or product. As he says, it gets confusing sometimes. On theCUbE we talked to all of these roles so we wanted to try to add some clarity to that. First thing we want to say is that the CIO, the chief information officer, that role is not going away. A lot of people predict that, we think that's nonsense. They will continue to have a critical role. Digital transformations are the priority in organizations. And so the chief digital officer is evolving from more than just a strategy role to much more of an operation role. Generally speaking, these chiefs tend to report in our observation to the chief operating officer, president COO. And we see the chief digital officer as increasing operational responsibility aligning with the COO and getting incremental responsibility that's more operational in nature. So the prediction really is that the chief digital officer is going to emerge as a charismatic leader amongst these chiefs. And by 2022, nearly 50% of organizations will position the chief digital officer in a more prominent role than the CIO, the CISO, the CDO and the CPO. Those will still be critical roles. The CIO will be an enabler. The chief information security officer has a huge role obviously to play especially in terms of making security a teams sport and not just falling on IT's shoulders or the security team's shoulders. 
The chief data officer, who in many cases really emerged from a records and data management role, particularly within regulated industries, will still be responsible for that data architecture and data access, working very closely with the emerging chief privacy officer and maybe even the chief data protection officer. Those roles will be pretty closely aligned. So again, these roles remain critical, but the chief digital officer we see as increasing in prominence. >> Great, thank you very much David. So when we think about these two activities, what we're really describing is that over the course of the next few years, we strongly believe that data will be regarded more as an asset within business, and we'll see resources devoted to it, and we'll certainly see management devoted to it. Now, that leads to the next set of questions: as data becomes an asset, the pressure to acquire data becomes that much more acute. We believe strongly that IoT has an enormous implication longer term as a basis for thinking about how data gets acquired. Now, operational technology has been in place for a long time. We're not limiting ourselves just to operational technology when we talk about this. We're really talking about the full range of devices that are going to provide and extend information and digital services out to consumers, out to the Edge, out to a number of other places. So let's start here. Over the course of the next few years, Edge analytics is going to be an increasingly important feature overall of how technology decisions get made, how technology or digital business gets conceived, and even ultimately how business gets defined. Now David Floyer's done a significant amount of work in this domain, and we've provided that key finding on the right hand side. 
And what it shows is that if you take a look at a stylized Edge-based application, and you presume that all the data moves back to a centralized cloud, you're going to increase your costs dramatically over a three-year period. Now that motivates the need, ultimately, for an approach that brings greater autonomy, greater intelligence, down to the Edge itself, and we think that ultimately IoT and Edge analytics become increasingly synonymous. The challenge, though, is that as we evolve, while this creates pressure to keep more of the data at the Edge, ultimately a lot of the data exhaust can someday become regarded as valuable data. And so as a consequence of that, there's still a countervailing pressure to try to move all data, not at the moment of automation but for modeling and integration purposes, back to some other location. The thing that's going to determine that is the rate at which the costs of moving the data around go down. And our expectation over the next few years, when we think about the implications of some of the big cloud suppliers, Amazon, Google and others, that are building out significant networks to facilitate their business services, is that they may in fact have as great an impact on the common carriers as they have had on any server or other infrastructure company. So our prediction over the next few years is: watch what Amazon and Google do as they try to drive costs down inside their networks, because that will have an impact on how much data moves from the Edge back to the cloud. It won't necessarily have an impact on the need for automation at the Edge, because latency doesn't change, but it will have a cost impact. Now that leads to a second consideration, and the second consideration is ultimately that when we talk about greater autonomy at the Edge, we need to think about how that's going to play out. Jim Kobielus. 
>> Jim: Hey, thanks a lot Peter. Yeah, so what we're seeing at Wikibon is that more and more application development involves AI, and more and more of the AI involves deployment of those models, deep learning, machine learning and so forth, to the Edges of the internet of things and people. And much of that AI will be operating autonomously, with little or no round-tripping back to the cloud. In fact, we're seeing really about a quarter of the AI development projects (static interference with web-conference) as Edge deployment. What that involves is that more and more of those AI applications will be bespoke. They'll be one of a kind, unique or unprecedented applications, and what that means is that there are a lot of different deployment scenarios within which organizations will need to use new forms of learning to ready those AI applications to do their jobs effectively, be it making predictions in real time, guiding an autonomous vehicle, and so forth. Reinforcement learning is at the core of many of these kinds of projects, especially those that involve robotics. So really, software is eating the world, and the biggest parts are being pushed out to the Edge; much of that is AI, much of it autonomous, with adaptive, AI-infused components that can learn by doing. From environmental variables, they can adapt their own algorithms to take the right actions. So they'll have far-reaching impacts on application development in 2018. For the developer, the new developer really is a data scientist at heart. They're going to have to tap into a new range of sources of data, especially Edge-sourced data from the sensors on those devices. 
They're going to need to do model training and testing, especially reinforcement learning, which doesn't involve training data so much as it involves being able to build an algorithm that can learn to maximize what's called a cumulative reward function, doing that training adaptively in real time at the Edge and so forth. So really, much of this will be bespoke in the sense that every Edge device increasingly will have its own set of parameters and its own set of objective functions which will need to be optimized. So that's one of the leading edge forces, trends, in development that we see in the coming year. Back to you Peter. >> Excellent Jim, thank you very much. The next question here is how are you going to create value from data? We've gone through a couple of trends, and we have multiple others, about what's going to happen at the Edge. But as we think about how we're going to create value from data, Neil Raden. >> Neil: You know, the problem is that data science emerged rapidly out of sort of a perfect storm of big data and cloud computing and so forth. And people who had been involved in quantitative methods rapidly glommed onto the title because, let's face it, it was very glamorous and paid very well. But there weren't really good best practices. So what we have in data science is a pretty wide field of things that are called data science. My opinion is that the true data scientists are people who are scientists and are involved in developing new or improving algorithms, as opposed to prepping data and applying models. So the whole field really generated very quickly, in just a few years. To me, I call it generation zero, which is more like data prep and model management all done manually. And it wasn't really sustainable in most organizations, for obvious reasons. 
So generation one, then, some vendors stepped up with tool kits or benchmarks or whatever for data scientists and made it a little better. And generation two is what we're going to see in 2018: the need for data scientists to no longer prep data, or at least not spend very much time on it. And not to do model management, because the software will not only manage the progression of the models but even recommend them and generate them and select the data and so forth. So it's in for a very big change, and I think what you're going to see is that the ranks of data scientists are going to sort of bifurcate into the old style, let me sit down and write some spaghetti code in R or Java or something, and those that use these advanced tool kits to really get the work done. >> That's great Neil, and of course, when we start talking about getting the work done, we are becoming increasingly dependent upon tools, aren't we George? But the tool marketplace for data science, for big data, has been somewhat fragmented and fractured. And it hasn't necessarily focused on solving the problems of the data scientists, but in many respects on the problems that the tools themselves have. What's going to happen in the coming year, when we think about Neil's prescription, as the tools improve? >> Okay so, the big thing that we see supporting what Neil was talking about is partly a symptom of a product issue and a go to market issue, where the product issue was that we had a lot of best of breed products that weren't all designed to fit together. In the broader big data space, that's the same issue that we faced more narrowly with on-prem Hadoop, where we were trying to fit together a bunch of open source packages that had an admin and developer burden. 
More broadly, what Neil is talking about is sort of richer end to end tools that handle everything from ingest all the way to the operationalization and feedback of the models. But part of what has to go on here is that with these open source tools, the price point and the functional footprints that many of the vendors are supporting right now can't feed an enterprise sales force. Everyone talks with their open source business models about land and expand and inside sales. But the problem is, once you want to go to wide deployment in an enterprise, you still need someone negotiating commercial terms at a senior level. You still need the technical people fitting the tools into a broader architecture. And most of the vendors that we have who are open source vendors today don't have either the product breadth or the deal size to support traditional enterprise software. An account team would typically carry a million and a half to two million dollar quota every year, so we see consolidation, and the consolidation again driven by the need for simplicity for the admins and the developers, and for business model reasons, to support an enterprise sales force. >> All right, so what we're going to see happen in the course of the coming year is a lot of specialization and recognition of what is data science, what are the practices, how is it going to work, supported by an increasing quality of tools, and a lot of tool vendors are going to be left behind. Now the third notion here, for those core technology capabilities, is we still have to act based on data. The good news is that big data is starting to show some returns, in part because of some of the things that AI and other technologies are capable of doing. But we have to move beyond just creating the potential; we have to turn that into work, and that's what we mean ultimately by this notion of systems of agency. 
The idea that data driven applications will increasingly act on behalf of a brand, on behalf of a company, and building those systems out is going to be crucial. It's going to have a whole new set of disciplines and expertise required. So when we think about what's going to be required, it always starts with this notion of AI. A lot of folks are presuming, however, that AI is going to be relatively easy to build or relatively easy to put together. We have a different opinion, George. What do we think is going to happen as these next few years unfold related to AI adoption in large enterprises? >> Okay so, let's go back to the lessons we learned from sort of the big data, the raw, you know, let's put a data lake in place, which was at the top of everyone's agenda for several years. The expectation was it was going to cure cancer, taste like chocolate and cost a dollar. And uh. (laughing) It didn't quite work out that way. Partly because we had a burden on the administrator, again, of so many tools that weren't all designed to fit together, even though they were distributed together. And then the data scientists, the guys who had to take all this data that wasn't carefully curated yet, and turn it into advanced analytics and machine learning models. We have many of the same problems now, with tool sets that are becoming more integrated, but at lower levels. This is partly what Neil Raden was just talking about. What we have to recognize is something that we've seen all along, I mean since the beginning of (laughs) corporate computing. We have different levels of abstraction, and at the very bottom, when you're dealing with things like TensorFlow or MXNet, that's not for mainstream enterprises. That's for the big sophisticated tech companies who are building new algorithms on those frameworks. There's a level above that where you're using something like a Spark cluster and the machine learning built into that. 
That's slightly more accessible, but when we talk about mainstream enterprises taking advantage of AI, the low hanging fruit is for them to use the pre-trained models that the public cloud vendors have created with all the consumer data on speech, image recognition, natural language processing. And then some of those capabilities can be further combined into applications like managing a contact center, and we'll see more from the likes of Amazon: recommendation engines, fulfillment optimization, pricing optimization. >> So our expectation ultimately, George, is that we're going to see a lot of this AI adoption happen through existing applications, because the software vendors that are capable of acquiring talent, experimenting, and creating value are going to be where a lot of the talent ends up. So Neil, we have an example of that. Give us an example of what we think is going to happen in 2018 when we start thinking about exploiting AI in applications. >> Neil: I think the path is fairly clear: the application of what's called advanced analytics and data science and even machine learning is rapidly becoming commonplace in organizations, not just at the bottom of the triangle here. But I like the example of SalesForce.com. What they've done with Einstein is they've made machine learning and, I guess you can say, AI applications available to their customer base, and why is that a good thing? Because their customer base already has a giant database of clean data that they can use. So you're going to see a huge number of applications being built with Einstein against SalesForce.com data. But there's another thing to consider, and that is that a long time ago SalesForce.com built connectors to a zillion types of external data. So, if you're a SalesForce.com customer using Einstein, you're going to be able to use those advanced tools without knowing anything about how to train a machine learning model, and start to build those things. 
And I think that they're going to lead the industry in that sense. That's going to push their revenue next year to, I don't know, 11 billion or 12 billion dollars. >> Great, thanks Neil. All right, so when we think about further evidence of this and further impacts, we ultimately have to consider some of the challenges associated with how we're going to create application value continually from these tools. And that leads to the idea that one of the cobbler's children that's going to benefit from AI will in fact be the developer organization. Jim, what's our prediction for how auto-programming impacts development? >> Jim: Thank you very much Peter. Yeah, automation, wow. Auto-programming, like I said, is the epitome of enterprise application development for us going forward. People know it as code generation, but that really understates the scope of auto-programming as it's evolving. In 2018, what we're going to see is machine learning driven code generation approaches coming to the forefront of innovation. We're seeing a lot of activity in the industry in which applications use ML to drive the productivity of developers for all kinds of applications. We're also seeing a fair amount of what's called RPA, robotic process automation. And really, how they differ is that ML will drive code generation from what I call the inside out, meaning creating reams of code that are geared to optimize a particular application scenario. RPA, by contrast, takes the outside in approach: it's essentially the evolution of screen scraping, in that it's able to infer the underlying code needed for applications of various sorts from the external artifacts, the screens, and from sort of the flow of interactions and clicks for a given application. We're going to see that ML and RPA will complement each other in the next generation of auto-programming capabilities. 
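Jim's outside-in idea, inferring code from external artifacts rather than from an internal spec, can be sketched with a deliberately tiny example: generating validation code from a screen's field descriptions, in the spirit of RPA-style screen scraping. The field names and the generated function are hypothetical, and real RPA tools add ML on top of this kind of template emission.

```python
# Toy "outside-in" code generation: emit Python source from an external
# UI artifact (a screen's field descriptions). Field names are made up.
screen_fields = [
    {"name": "age",   "type": "int", "required": True},
    {"name": "email", "type": "str", "required": False},
]

def generate_validator(fields):
    """Emit source for a validate(record) function from the screen spec."""
    lines = ["def validate(record):", "    errors = []"]
    for f in fields:
        if f["required"]:
            lines.append(f"    if {f['name']!r} not in record:")
            lines.append(f"        errors.append('missing {f['name']}')")
    lines.append("    return errors")
    return "\n".join(lines)

src = generate_validator(screen_fields)
namespace = {}
exec(src, namespace)                 # compile the generated code
validate = namespace["validate"]

print(validate({"email": "a@b.c"}))  # age is required, so one error
```

The inside-out style Jim contrasts this with would start instead from an application scenario and optimize the generated code for it; here the code is derived entirely from the external screen description.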
And so, you know, application development tedium is really one of the enemies of productivity (static interference with web-conference). This is a lot of work, very detailed, painstaking work. And what developers need are better, more nuanced and more adaptive auto-programming tools to be able to build code at the pace that's absolutely necessary for this new environment of cloud computing. So really, AI-related technologies can be applied and are being applied to application development productivity challenges of all sorts. AI is fundamental to RPA as well. We're seeing a fair number of the vendors in that space incorporate ML driven OCR and natural language processing and screen scraping and so forth into their core tools, to be able to quickly build up the logic to drive outside in automation of fairly complex orchestration scenarios. In 2018, we'll see more of these technologies come together. But you know, they're not a silver bullet. 'Cause fundamentally, for organizations that are considering going deeply down into auto-programming, they're going to have to factor AI into their overall plans. They need to get knowledgeable about AI. They're going to need to bring more AI specialists into their core development teams to be able to select from the growing range of tools that are out there, RPA and ML driven auto-programming. Overall, really what we're seeing is that the data scientists, who have been the fundamental developers of AI, are coming into the core of development tools and skills in organizations. And they're going to be fundamental to this whole trend in 2018 and beyond. If AI gets proven out in auto-programming, these developers will then be able to evangelize the core utility of this technology, AI, in a variety of other backend but critically important investments that organizations will be making in 2018 and beyond. 
Especially in IT operations and in management; AI is big in that area as well. Back to you there, Peter. >> Yeah, we'll come to that a little bit later in the presentation Jim, that's a crucial point. But the other thing we want to note here, regarding ultimately how folks will create value out of these technologies, is to consider the simple question of okay, how much will developers need to know about infrastructure? And one of the big things we see happening is this notion of serverless. And here we've called it serverless, developer more. Jim, why don't you take us through why we think serverless is going to have a significant impact on the industry, at least certainly from a developer perspective and developer productivity perspective. >> Jim: Yeah, thanks. Serverless is really having an impact already, and has for the last several years now. Many in the developer world are familiar with AWS Lambda, which is really the ground breaking public cloud service that incorporates the serverless capabilities, which essentially is an abstraction layer that enables developers to build stateless code that executes in a cloud environment, and to build microservices, without having to worry about the underlying management of containers and virtual machines and so forth. So in many ways, you know, serverless is a simplification strategy for developers. They don't have to worry about the underlying plumbing. They need to worry about the code, of course: what are called Lambda functions, or functional methods, and so forth. Now functional programming has been around for quite a while, but now it's coming to the fore in this new era of serverless environments. What we're predicting for 2018 is that more than 50% of lean microservices deployments in the public cloud will be in serverless environments. There's AWS, and Microsoft has Azure Functions. IBM has their own. Google has their own. 
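The stateless, functional style Jim describes can be sketched in a few lines. This mimics the general shape of an AWS Lambda Python handler (an event dict in, a result out, no state carried between invocations); the event fields here are invented for illustration and are not any specific AWS service's payload.

```python
import json

# A stateless, Lambda-style function: everything it needs arrives in
# the event, and nothing persists between invocations. The event shape
# ("readings") is a hypothetical example.
def handler(event, context=None):
    readings = event.get("readings", [])
    if not readings:
        return {"statusCode": 400, "body": json.dumps({"error": "no readings"})}
    avg = sum(readings) / len(readings)
    return {"statusCode": 200, "body": json.dumps({"average": avg})}

# Locally, invoking it is just a function call -- part of what makes
# this style friendly to rapid, iterative development.
resp = handler({"readings": [10, 20, 30]})
print(resp["statusCode"], json.loads(resp["body"]))
```

Because the function holds no state of its own, the platform is free to spin instances up and down behind the scenes, which is exactly the plumbing the developer no longer has to manage.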
There's a variety of serverless cloud code bases for private deployment of serverless environments that we're seeing evolving and beginning to deploy in 2018. They all involve functional programming, which, when coupled with serverless clouds, enables greater scale and speed in terms of development. And it's very agile friendly in the sense that you can quickly stand up a functionally programmed serverless microservice in a hurry without having to manage state and so forth. It's very DevOps friendly. In a very real sense it's a lot faster than having to build and manage and tune, you know, containers and VMs and so forth. So it can enable a more real time, rapid and iterative development pipeline going forward in cloud computing. And really, fundamentally, what serverless is doing is pushing more of these Lambda functions to the Edge, to the Edges. If you were at AWS re:Invent last week or the week before, you'd notice AWS is putting a big push on putting Lambda functions at the Edge, in devices for the IoT, as we're going to see in 2018 across pretty much the entire cloud arena. Everybody will push more of the serverless, functional programming to the Edge devices. It's just a simplification strategy. And that actually is a powerful tool for speeding up some of the development metabolism. >> All right, so Jim, let me jump in here and say that we've now introduced some of these benefits and really highlighted the role that the cloud is going to play. So, let's turn our attention to this question of cloud optimization. And Stu, I'm going to ask you to start us off by talking about what we mean by true private cloud and ultimately our prediction for private cloud. Why don't you take us through what we think is going to happen in this world of true private cloud? >> Stuart: Sure Peter, thanks a lot. 
So when Wikibon launched the true private cloud terminology, which was two years ago next week, it was in some ways a coming together of a lot of trends similar to the things that George, Neil and James have been talking about. So, it is nothing new to say that we needed to simplify the IT stack. We all know the tried and true discussion of how way too much of the budget is spent kind of keeping the lights on, versus what we'd like, which is kind of running the business. If you squint through this beautiful chart that we have on here, a big piece of this is operational staffing, which is where we need to be able to make a significant change. And what we've been really excited about, what led us to this initial market segment, and what we're continuing to see good growth on, is the move from traditional, really siloed infrastructure to infrastructure that is software based. You want IT to really be able to focus on the application services that they're running. And our focus for 2018 is, of course, the central point: it's the data that matters here. The whole reason we have infrastructure is to be able to run applications, and one of the things that is a key determiner as to where and what I use is the data, and how I can not only store that data but actually gain value from it. That's something we've talked about time and again, and it is a major determining factor as to whether I am building this in a public cloud, or doing it in, you know, my core, or whether it is something that is going to live on the Edge. So what we were saying here with the true private cloud is not only are we going to simplify our environment; it's really the operational model that we talked about. So we often say the line, cloud is not a destination, it's an operational model. 
So a true private cloud gives me some of the feel and management type of capability that I have had in the public cloud. It's, as I said, not just virtualization. It's much more than that. But how can I start getting services? And one of the extensions is that true private cloud does not live in isolation. When we have kind of a core public cloud and Edge deployments, I need to think about the operational models: where data lives, what processing happens in each environment, and what data we'll need to move between them, and of course there are fundamental laws of physics that we need to consider in that. So, the prediction of course is that we know how much gear and focus has been on the traditional data center, and true private cloud helps that transformation to modernization, and the big focus is that many of these applications we've been talking about and uses of data sets are starting to come into these true private cloud environments. So, you know, we've had discussions: there's Spark, there are modern databases. For many of these, there are going to be many reasons why they might live in the private cloud environment. And therefore that's something where we're going to see tremendous growth and a lot of focus. And we're seeing a new wave of companies that are focusing on this to deliver solutions that will do more than just a step function for infrastructure or get us outside of our silos, but really help us deliver on those cloud native applications, where we pull in things like what Jim was talking about with serverless and the like. >> All right, so Stu, what that suggests ultimately is that data is going to dictate that everything's not going to end up in the centralized public clouds, because of latency costs, data governance and IP protection reasons. And there will be some others. At bare minimum, that means that in most large enterprises we're going to have at least a couple of clouds. 
Talk to us about what this impact of multi cloud is going to look like over the course of the next few years. >> Stuart: Yeah, critical point there Peter. Because, right, unfortunately, we don't have one solution. There's nobody that we run into that says, oh, you know, I just do a single environment. It would be great if we only had one application to worry about. But as you've done in this lovely diagram here, we all use lots of SaaS, and increasingly, you know, Oracle, Microsoft, SalesForce are all pushing everybody to multiple SaaS environments, and that has major impacts on my security and where my data lives. Public cloud, no doubt, is growing at leaps and bounds, and many customers are choosing applications to live in different places. So just as in data centers, I would kind of look at it from an application standpoint and build up what I need. Often, you know, Amazon is doing phenomenally, but maybe there are things that I'm doing with Azure, maybe there are things that I'm doing with Google or others, as well as my service providers, for locality, for specialized services; there are reasons why people are doing it. And what customers would love is an operational model that can actually span between those. So we are very early in trying to attack this multi cloud environment. There's everything from licensing to security to, you know, just operationally how do I manage those. And a piece of this that we're touching on in this year's predictions is that Kubernetes actually can be a key enabler for that cloud native environment. As Jim talked about with serverless, what we'd really like is for our developer to be able to focus on building their application and not think as much about the underlying infrastructure, whether that be racks of servers that I built myself or public cloud infrastructure. So we really want to think more at the data and application level. 
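One concrete illustration of why Kubernetes helps with the spanning operational model Stu describes: the same declarative manifest can be applied unchanged to a conformant cluster running on premises, on AWS, Azure or Google. A minimal, hypothetical Deployment might look like this (the name and image are placeholders, not from any real system):

```yaml
# Hypothetical example -- the same manifest works on any conformant
# Kubernetes cluster, public or private.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: orders-api                 # placeholder name
spec:
  replicas: 3
  selector:
    matchLabels:
      app: orders-api
  template:
    metadata:
      labels:
        app: orders-api
    spec:
      containers:
      - name: orders-api
        image: registry.example.com/orders-api:1.0   # placeholder image
        ports:
        - containerPort: 8080
```

The developer declares the desired state once; which cloud's machines actually run the three replicas becomes an operational detail rather than an application concern.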
It's SaaS and PaaS as the model, and Kubernetes holds the promise to solve a piece of this puzzle. Now Kubernetes is by no means a silver bullet for everything that we need, but it absolutely is doing very well. Our team was at the Linux Foundation's CNCF show, KubeCon, last week, and there is broad adoption from over 40 of the leading providers, including Amazon, which is now a piece of it. Even SalesForce signed up to the CNCF. So Kubernetes is allowing me to be able to manage multi cloud workflows, and therefore the prediction we have here, Peter, is that 50% of development teams will be building and sustaining multi cloud with Kubernetes as a foundational component of that. >> That's excellent Stu. But when we think about it, the hardware technologies, especially because of the opportunities associated with true private cloud, are also going to evolve. There will be enough money here to sustain that investment. David Floyer, we do see another architecture on the horizon, where for certain classes of workloads we will be able to collapse and replicate many of these things in an economical, practical way on premise. We call that UniGrid, and NVMe over fabrics is a crucial feature of UniGrid. >> Absolutely. So, NVMe over fabrics, or NVMe-oF, takes NVMe, which is out there as storage, and turns it into a system framework. It's a major change in system architecture. We call this UniGrid, and it's going to be a focus of our research in 2018. Vendors are already out there. This is the fastest movement from early standards into products themselves. You can see on the chart that IBM have come out with NVMe over fabrics with the FlashSystem 900 storage connected to Power9 systems. NetApp have the EF750. A lot of other companies are there. Mellanox is out there with high speed networks. Excelero has a major part of the storage software. So it's going to be used in particular with things like AI. 
So what are the drivers and benefits of this architecture? The key is that data is the bottleneck for applications. We've talked about data; the amount of data is key to making applications more effective and higher value. So NVMe and NVMe over fabrics allow data to be accessed in microseconds as opposed to milliseconds, and they allow gigabytes of data per second as opposed to megabytes of data per second. They also allow thousands of processes to access all of the data at very, very low latencies. And that gives us amazing parallelism. So what this is about is disaggregation of storage and network and processors. There are some huge benefits from that, not least of which is that you get back about 50% of the processor, because you don't have to do storage and networking on it. And you save from stranded storage; you save from stranded processor and networking capabilities. So overall, it's going to be cheaper. But more importantly, it becomes a basis for delivering systems of intelligence. And systems of intelligence are bringing together systems of record, the traditional systems, not rewriting them but attaching them to real time analytics, real time AI, and being able to blend those two systems together, because you've got all of that additional data you can bring to bear on a particular problem. Systems themselves have reached pretty well the limit of human management. So, one of the great benefits of UniGrid is to have a single metadata layer across all of that data, all of those processes. >> Peter: All those infrastructure elements. >> All those infrastructure elements. >> Peter: And applications. >> And applications themselves. So what that leads to is a huge potential to improve automation of the data center and the application of AI to operations, operational AI. >> So George, it sounds like it's going to be one of the key potential areas where we'll see AI be practically adopted within business. 
What do we think is going to happen here as we think about the role that AI is going to play in IT operations management? >> Well, if we go back to the analogy with big data, which we thought was going to cure cancer, taste like chocolate, cost a dollar, it turned out that the most widespread application of big data was to offload ETL from expensive data warehouses. And what we expect is that the first widespread application of AI will be embedded in applications for horizontal use, where Neil mentioned SalesForce and the ability to use Einstein on SalesForce data and connected data. Now, because the applications we're building are so complex, as Stu mentioned with this operational model of a true private cloud, it's actually not just the legacy stuff that's sucking up all the admin overhead. It's the complexity of the new applications and the stringency of the SLAs, which means that we would have to turn millions of people into admins, like when the telephone networks started and it looked like everyone was going to have to be an operator. The only way we can get past this is if we apply machine learning to IT Ops and application performance management. The key here is that the models can learn how the infrastructure is laid out and how it operates. And they can also learn how all the application services and middleware work, behaving independently and with each other, and how they tie into the infrastructure. The reason that's important is because all of a sudden you can get very high fidelity root cause analysis. With the old management technology, if you had an underlying problem, you'd have a whole storm of alerts, because there was no reliable way to really triangulate on or triage the root cause. 
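George's point about triangulating on the root cause can be made concrete with a toy sketch: once a system has learned the dependency topology, an alert storm can be collapsed by walking each alert back to the most upstream alerting component. The topology and service names here are invented, and real AIOps tools learn the topology from machine data rather than hard-coding it as this sketch does.

```python
# Toy alert triage over a known dependency topology. A storm of alerts
# on downstream services is collapsed to the upstream alerting
# component(s). Service names are invented for illustration.
depends_on = {                 # service -> the services it depends on
    "web":   ["api"],
    "api":   ["db", "cache"],
    "cache": [],
    "db":    ["san"],
    "san":   [],
}

def root_causes(alerts):
    """Keep only alerting components with no alerting dependency."""
    roots = []
    for svc in alerts:
        upstream = depends_on.get(svc, [])
        if not any(dep in alerts for dep in upstream):
            roots.append(svc)
    return sorted(roots)

# An underlying SAN fault lights up everything stacked above it...
storm = {"web", "api", "db", "san"}
print(root_causes(storm))      # ...but triage points at the SAN alone
```

The high-fidelity version of this, as George notes, is what makes precise remediation recommendations possible: instead of a storm of symptoms, the operator (or the automation) is handed one candidate cause.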
Now, what's critical is that if you have high fidelity root cause analysis, you can have really precise recommendations for remediation, or automated remediation, which is something that people will get comfortable with over time; that's not going to happen right away. But this is critical. And this is also the first large scale application of not just machine learning but machine data, and so this topology of collecting widely disparate machine data and then applying models and then reconfiguring the software is training wheels for IoT apps, where you're going to have it far more distributed, actuating devices instead of software. >> That's great, George. So let me sum up and then we'll take some questions. So very quickly, the action items that we have out of this overall session; and again, we have another 15 or so predictions that we didn't get to today. One is, as we said, digital business is the use of data assets to compete. And so ultimately, this notion is starting to diffuse rapidly. We're seeing it on theCUBE. We're seeing it on the CrowdChats. We're seeing it in the increase of our customers. Ultimately, we believe that users need to start preparing for even more business scrutiny over their technology management. For example, something very simple, and David Floyer, you and I have talked about this extensively in our weekly action item research meeting: in a digital business world, it's not just backing up and restoring a system or an application; we're talking about restoring the entire business. That's going to require greater business scrutiny over technology management. It's going to lead to new organizational structures, new challenges of adopting systems, et cetera. 
But, ultimately, our observation is that data is going to indicate technology directions across the board, whether we talk about how businesses evolve, or the roles that technology takes in business, or the key digital business capabilities of capturing data, turning it into value and then turning it into work, or whether we talk about how we think about cloud architecture and which organization of cloud resources we're going to utilize. It all comes back to the role that data's going to play in helping us drive decisions. The last action item we want to put here before we get to the questions is: clients, if we don't get to your question right now, contact us. Send us an inquiry. Support@siliconangle.freshdesk.com. And we'll respond to you as fast as we can over the course of the next day or two, to try to answer your question. All right, David Vellante, you've been collecting some questions here. Why don't we see if we can take a couple of them before we close out. >> Yeah, we've got about five or six minutes. In the chat room, Jim Kobielus has been awesome helping out, and so there's a lot of detailed answers there. The first, there's some questions and comments. The first one was, are there too many chiefs? And I guess, yeah, there's some title inflation. My comment there would be: titles are cheap, results aren't. So if you're creating chief X officers just to check a box, you're probably wasting money. You've got to give them clear roles. But I think each of these chiefs has clear roles to the extent that they are empowered. Another comment came up, which is: we don't want Hadoop spaghetti soup all over again. Well, true that. Are we at risk of having Hadoop spaghetti soup as the centricity of big data moves from Hadoop to AI and ML and deep learning? >> Well, my answer is we are at risk of that, but there's customer pressure and vendor economic pressure to start consolidating. 
And we'll also see, what we didn't see in the on-prem big data era, with cloud vendors, they're just going to start making it easier to use some of the key services together. That's just natural. >> And I'll speak for Neil on this one too, very quickly, that the idea ultimately is, as the discipline starts to mature, we won't have people that probably aren't really capable of doing some of this data science stuff running around and buying a tool to try to supplement their knowledge and their experience. So, that's going to be another factor that I think ultimately leads to clarity in how we utilize these tools as we move into an AI oriented world. >> Okay, Jim is on mute, so if you wouldn't mind unmuting him. There was a question, is ML a more informative way of describing AI? Jim, when you and I were in our Boston studio, I sort of asked a similar question. AI is sort of the uber category. Machine learning is math. Deep learning is a more sophisticated math. You have a detailed answer in the chat. But maybe you can give a brief summary. >> Jim: Sure, sure. I don't want to be too pedantic here, but deep learning is essentially, it's a lot more hierarchical: deeper stacks of neural-network layers to be able to infer high-level abstractions from data, you know, face recognition, sentiment analysis and so forth. Machine learning is the broader phenomenon. That simply spans various approaches for distilling patterns, correlations and algorithms from the data itself. What we've seen in the last five, six, ten years, let's say, is that all of the neural network approaches for AI have come to the forefront, and in fact are now often the core of the marketplace and the state of the art. AI is an ancient paradigm, older than probably you or me, that began as, and for the longest time was, rules based systems, expert systems. Those haven't gone away.
The new era of AI we see as a combination of both statistical approaches as well as rules based approaches, and possibly even orchestration based approaches like graph models for building broader context for AI, for a variety of applications, especially distributed Edge applications. >> Okay, thank you, and then another question slash comment: AI, like graphics in 1985, we move from a separate category to a core part of all apps. AI infused apps. Again, Jim, you have a very detailed answer in the chat room, but maybe you can give the summary version. >> Jim: Well, quickly now, the most disruptive applications we see across the world, enterprise, consumer and so forth, the advanced ones involve AI. You know, at the heart of it is machine learning, that is, neural networking. I wouldn't say that every single application is doing AI. But the ones that are really blazing the trail in terms of changing the fabric of our lives very much, most of them have AI at their heart. That will continue as the state of the art of AI continues to advance. So really, one of the things we've been saying in our research at Wikibon is that the data scientists, or those skills and tools, are the nucleus of the next generation application developer, really in every sphere of our lives. >> Great, quick comment is we will be sending out these slides to all participants. We'll be posting these slides. So thank you Kip for that question.
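Jim's distinction above, deep learning as deeper stacks of neural-network layers versus machine learning as the broader family of pattern-distilling approaches, can be sketched in a few lines of plain Python. This is a toy illustration with made-up weights, not any particular framework's API:

```python
def dense(x, weights, bias):
    """One fully connected layer: weighted sum of inputs plus bias per output unit."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def relu(v):
    return [max(0.0, a) for a in v]

# Classical "machine learning" in the narrow sense: a single linear model.
shallow = lambda x: dense(x, [[0.5, -0.2]], [0.1])

# "Deep learning": the same primitive stacked into a hierarchy of layers,
# each one inferring higher-level features from the layer below it.
def deep(x):
    h1 = relu(dense(x, [[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0]))
    h2 = relu(dense(h1, [[0.7, 0.3]], [0.0]))
    return h2

print(shallow([1.0, 2.0]))  # one linear transform
print(deep([1.0, 2.0]))     # composition of layers
```

The "depth" is literally just the number of stacked transforms; real networks add many more layers and learn the weights from data instead of hard-coding them.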
The way we like to say it is, bringing the cloud operating model to your data versus trying to force fit your business into the cloud. So we've got detailed definitions there that frankly are evolving. On PaaS, there's a question about PaaS. I think we have a prediction in one of our, you know, appendices predictions, but maybe a quick word on PaaS. >> Yeah, very quick word on PaaS is that there's been an enormous amount of effort put on the idea of the PaaS marketplace. Cloud Foundry, others suggested that there would be a PaaS market that would evolve because you want to be able to effectively have mobility and migration and portability for these large cloud applications. We're not seeing that happen necessarily, but what we are seeing is that developers are increasingly becoming a force in dictating and driving cloud decision making, and developers will start biasing their choices to the platforms that demonstrate that they have the best developer experience. So whether we call it PaaS, whether we call it something else, providing the best developer experience is going to be really important to the future of the cloud marketplace. >> Okay, great, and then George, George O, George Gilbert, you'll follow up with George O with that other question we need some clarification on. There's a question, really David, I think it's for you. Will persistent DIMMs emerge first on public clouds? >> Almost certainly. But public clouds are where everything is going first. And when we talked about UniGrid, that's where it's going first. And then, the NVMe over fabrics, that architecture is going to be in public clouds. And it has the same sort of benefits there. And NVDIMMs will again develop pretty rapidly as a part of NVMe over fabrics. >> Okay, we're out of time. We'll look through the chat and follow up with any other questions. Peter, back to you. >> Great, thanks very much Dave. So once again, we want to thank everybody here that has participated in the webinar today.
I apologize for, I feel like Han Solo in saying it wasn't my fault. But having said that, nonetheless, I apologize to Neil Raden and everybody who had to deal with us finding and unmuting people, but we hope you got a lot out of today's conversation. Look for those additional pieces of research on Wikibon that pertain to the specific predictions on each of these different things that we're talking about. And by all means, support@siliconangle.freshdesk.com, if you have an additional question, but we will follow up with as many as we can from that significant list that's starting to queue up. So thank you very much. This closes out our webinar. We appreciate your time. We look forward to working with you more in 2018. (upbeat music)

Published Date : Dec 16 2017

Action Item | 2018 Predictions Addendum


 

>> Hi I'm Peter Burris. Welcome to Action Item. (upbeat electronic music) Every week I bring the Wikibon research team together to talk about some of the issues that are most important in the computing industry, and this week is no different. This week I'm joined by four esteemed Wikibon analysts, David Floyer, Neil Raden, Jim Kobielus, Ralph Finos, and what we're going to do is we're going to talk a few minutes about some of the predictions that we did not get into our recent predictions webinar. So, I'd like to start off with Jim Kobielus. Jim, one of the things that we didn't get a chance to talk about yesterday in the overall predictions webinar was some of the new AI frameworks that are on the horizon for developers. So, let's take a look at it. What's the prediction? >> Prediction for 2018, Peter, is that the AI community will converge on an open framework. An open framework for developing, training and deploying deep learning and machine learning applications. In fact, in 2017, we've seen the momentum in this direction, strong momentum. If you were at AWS re:Invent just a few weeks ago, you'll notice that on the main stage, they discussed what they're doing in terms of catalyzing an open API for building AI, an open model interchange format, and an open model compilation framework, and they're not the only vendor who's behind this. Microsoft has been working with AWS, as well as independently and with other partners, to catalyze various aspects of this open framework.
We also see Intel and Google and IBM and others marching behind a variety of specifications such as Gluon, (mumbles) NNVM and so forth, so we expect continued progress along these lines in 2018, and we expect that other AI solution providers, as well as users and developers, will increasingly converge on this, basically the abstraction framework that will make it irrelevant whether you build your model in TensorFlow or MXNet or whatever; you'd be able to compile it and run it in anybody else's back end. >> So Jim, one question, then we'll move on to Neil really quickly, but one question that I have is the relationship between tool choice and role in the organization has always been pretty tight. Roles have changed as a consequence of the availability of tools. Now we talked about some of the other predictions, how the data scientist role is going to change. As we think about some of these open AI development frameworks, how are they going to accommodate the different people that are going to be responsible for building and creating business value out of AI and data? >> Pete, that hits on another level that I didn't raise in my recent predictions document, but I'll just quickly touch on it. We're also seeing the development of open devops environments within which teams of collaborators, data scientists, subject matter experts, data engineers and so forth will be able to build and model and train and deploy deep learning and so forth within a standard workflow, where each one of them has task-oriented tools to enable their piece, but they all share a common governance around the models, the data and so forth. In fact, we published a report several months ago at Wikibon talking about devops for data science, and this is a huge research focus for us going forward, and really, for the industry as a whole. It's the productionizing of AI in terms of building and deploying the most critical applications, the most innovative applications now in business.
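The open model interchange idea Jim describes, build in TensorFlow or MXNet, then compile and run in anybody else's back end, boils down to exporting a model as framework-neutral data. The miniature "format" below is invented purely for illustration; the real efforts he names (ONNX-style interchange formats, NNVM) define far richer operator sets:

```python
import json

# Hypothetical, illustration-only interchange document: a linear model
# "exported" from one framework as plain data rather than framework objects.
exported = json.dumps({
    "op": "linear",
    "weights": [[2.0, -1.0]],
    "bias": [0.5],
})

# A second, independent "backend" that knows nothing about the exporting
# framework parses the neutral document and executes the model.
def run_backend(doc, x):
    spec = json.loads(doc)
    assert spec["op"] == "linear"
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(spec["weights"], spec["bias"])]

print(run_backend(exported, [3.0, 4.0]))  # 2*3 - 1*4 + 0.5 -> [2.5]
```

The convergence debate is essentially about standardizing what goes in that document: which operators exist, how tensors are typed, and how compilers may rewrite the graph.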
>> Great, Jim, thanks very much for that. So Neil, I want to turn to you now. One of the challenges that the big data and the computing industry faces overall is, how much longer are we going to be able to utilize the technologies that have taken us through the first 50 years at the hardware level? And there is some promise in some new approaches to thinking about computing. What's your prediction? >> Well, in 2018, you're going to see a demonstration of an actual quantum computer chip that's built on top of existing silicon technology and fabrication. This is a real big deal, because what this group at the University of New South Wales came up with was a way to layer traditional transistors and silicon on top of those wacky quantum bits to control them, and to deal with, I don't want to get too technical about that, but the point is that quantum computing has the promise of moving computing light years ahead of where we are now. We've managed to build lots of great software on things that go on or off, and quantum computing is much more than that. I think what you're going to see in 2018 is a demonstration of actual quantum computing chips built on this, and the big deal in that is that we can take these existing machines and factories and capital equipment designed for silicon, and start to produce quantum chips without basically developing a whole new industry. Now why is this important? It's only the first step, because these things are not going to be based on the existing Intel x86 instruction set, so all new software will have to be developed, software engineers are going to have to learn a whole new way of doing things, but the possibilities are endless. If you can think about drug discovery, or curing disease, or dealing with the climate, or new forms of energy to propel us into space, that's where quantum computing is likely to take this.
>> Yeah, quantum computing, just to bring a, kind of a fine point on it, allows, at any given time, the machine to be in multiple different states, and it's that fact that allows, in many respects, a problem to be attacked from a large number of directions at the same time, and then test each of them out, so it has a natural affinity with some of the things that we think about in AI, so it's going to have an enormous impact over the course of the next few years and it's going to be interesting to see how this plays out. So David Floyer, I now want to turn to you. We're not likely to see quantum computing at the edge anytime soon, by virtue of some of the technologies we face. More likely it'll be specialized processors up in the cloud service provider in the near term. But what are you going to talk about when we think about the role that the edge is going to play in the industry, and the impacts it's going to have on, quite frankly, the evolution of de facto standards? >> Well, I'd like to focus on the economics of edge devices. And my prediction is that the economics of consumer-led volume will dominate the design of IoT devices at the edge. If you take an IoT device, it's made up of sensors and advanced analytics and AI, and specifically designed compute elements, and together with the physical setup of fitting it into wherever you're going to put it, that is the overall device that will be put into the edge, and that's where all of the data is going to be generated, and obviously, if you generate data somewhere, the most efficient way of processing that data is actually at the edge itself, so you don't have to transport huge amounts of data. So the prediction is that new vendors with deep knowledge of the technology itself, using all the tools that Jim was talking about, and deep knowledge of the end user environments and the specific solutions that they're going to offer, they will come out with much lower cost solutions than traditional vendors. 
So to put a little bit of color around it, let's take a couple of real-world examples where this is already in place in the consumer world, and will be the basis of solutions in the enterprise. If we take the Apple iPhone X, it has facial recognition built-in, and it has facial recognition built-in on their A11 chips, their Bionic chips. They've got GPUs, they've got neural networks all in the chip itself, and the total cost of that solution is around a hundred dollars in terms of these parts, and that includes the software. So if we take that hundred dollars and put it into what it would actually be priced at, that's around $300. So that's a much, much lower cost than a traditional IT vendor could ever do, at least an order of magnitude, and probably two orders of magnitude, cheaper than an IT department could produce for its own use. So that leads to (mumbles) conclusions: there's going to be a lot of new vendors. People like Sony, for example, Hitachi, Fujitsu, Honeywell. Possibly people like Apple and Microsoft. Nvidia, Samsung, and many companies that we predict are going to come out of India, China and Russia, who have strong mathematical educational programs. So the action item for CIOs is to really look carefully at the projects that you are looking at, and determine, do I really have the volume to be unique in this area? If it's a problem which is going to be industry-wide, the advice we would give is wait for that device to come out from a specialized vendor rather than develop it yourself. And focus investment on areas where you have both the volume of devices and the volume of data that will allow you to be successful. >> All right, David, thank you very much.
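David's back-of-the-envelope economics can be written out directly. The numbers below simply restate his example: a roughly $100 bill of materials, a roughly 3x consumer price, and his "two orders of magnitude" estimate for an in-house enterprise build. They are illustrative assumptions, not measured figures:

```python
# Illustrative numbers only, restating David's iPhone X example.
bom_cost = 100                           # dollars of parts, ML silicon included
consumer_price = bom_cost * 3            # consumer-volume pricing, ~3x BOM
enterprise_build = consumer_price * 100  # his "two orders of magnitude" case

print(consumer_price)                     # priced around $300
print(enterprise_build)                   # what an in-house build might cost
print(enterprise_build / consumer_price)  # the cost advantage, as a multiple
```

The multiple, not the exact dollar figures, is the point: consumer volume amortizes design cost in a way a bespoke enterprise device never can.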
So let me wrap this week's Action Item, which has been kind of a bridge, but we've looked specifically at some of the predictions that didn't make it into our recent predictions webinar, and if I want to try to summarize or try to bring all these things together, here's what I think we'd say. Number one, we'd say that the development community has to prepare itself for some pretty significant changes as a consequence of having an application development environment that's more probabilistic, driven by data and driven by AI and related technologies, and we think that there will be new frameworks that are deployed in 2018, and that's just where it's going to start, and they will mature over the next few years, as we heard from Jim Kobielus. We've also heard that there is going to be a new computing architecture that's going to drive change, perhaps for the next 50 years, and the whole concept of quantum computing is very, very real, and it's going to have significant implications. Now it will take some time to roll out, but again, software developers have to think about the implications of some of these new architectures on their work, because not only are they going to have to deal with technology approaches that are driven by data, but they're also going to have to look at entirely new ways of framing problems, because it used to be about something different than it is today. The next thing that we need to think about is that there still is going to be the economics of computing that are going to ultimately shape how all of this plays out. David Floyer talked about, specifically at the edge, where Wikibon believes it's going to have an enormous implication on the true cost of computing and how well some of these complex problems actually find their way into commercial and other domains. So with a background of those three things, we think, ultimately, that's an addendum to the predictions that we have, and once again, I'm Peter Burris.
Thank you very much for joining us for Action Item, and we look forward to working with you more closely over the course of the next year, 2018, as we envision the new changes and the practice of how to make those changes a reality. From our Palo Alto theCUBE studios, this has been Action Item. (bright electronic music)

Published Date : Dec 15 2017


Adrian Cockcroft, AWS | KubeCon 2017


 

>> Announcer: Live from Austin, Texas, it's The Cube. Covering KubeCon 2017 and CloudNativeCon 2017. Brought to you by Red Hat, the Linux Foundation, and The Cube's ecosystem partners. >> Okay, welcome back everyone. Live here in Austin, Texas, this is The Cube's exclusive coverage of the CNCF CloudNativeCon, which was yesterday, and today is KubeCon, for the Kubernetes conference, and a little bit tomorrow as well, some sessions. Our next guest is Adrian Cockcroft, VP of Cloud Architecture Strategy at AWS, Amazon Web Services, and my co-host Stu Miniman. Obviously, Adrian, an industry legend on Twitter and in the industry, formerly with Netflix, knows a lot about AWS, now VP of Cloud Architecture, thanks for joining us. Appreciate it. >> Thanks very much. >> This is your first time as an AWS employee on The Cube. You've been verified. >> I've been on The Cube before. >> Many times. You've been verified. What's going on now with you guys, obviously coming off a hugely successful Reinvent, there's a ton of video of me ranting and raving about how you guys are winning, and there's no second place in the rear-view mirror, certainly Amazon's doing great. But CloudNative's got the formula here. This is a cultural shift. What is going on here that's similar to what you guys are doing architecturally, why are you guys here, are you evangelizing, are you recruiting, are you proposing anything? What's the story? >> Yeah, it's really all of those things. We've been doing CloudNative for a long time, and the key thing with AWS, we always listen to our customers, and go wherever they take us. That's a big piece of the way we've always managed to keep on top of everything. And in this case, the whole container industry, there's a whole market there, there's a lot of different pieces, we've been working on that for a long time, and we found more and more people interested in CNCF and Kubernetes, and really started to engage.
Part of my role is to host the open source team that does outbound engagement with all the different open source communities. So I've hired a few people, I hired Arun Gupta, who's very active in CNCF, earlier this year, and internally we were looking at, we need to join CNCF at some point. We got to do that eventually, so let's go make it happen. So last summer we just did all the internal paperwork, and running around talking to people, and got everyone on the same page. And then in August we announced, hey, we're joining. So we got that done. I'm on the board of CNCF, Arun's my alternate for the board and technical, running around, and really deeply involved in as much of the technology and everything. And then that was largely so that we could kind of get our contributions from engineering on a clear footing. We were starting to contribute to Kubernetes, like as an outsider to the whole thing. So that's why we're, what's going on here? So getting that in place was the basis for getting the contributions in place; we start hiring, we get the teams in place, and then getting our ducks in a row, if you like. And then last week at Reinvent, we announced EKS, the EC2 Kubernetes Service. And this week, we all had to be here. Like, last week after Reinvent, everyone at AWS wants to go and sleep for a week. But no, we're going to go to Austin, we're going to do this. So we have about 20 people here, we came in, I did a little keynote yesterday. I could talk through the different topics there, but fundamentally we wanted to be here where we've got the engineering teams here, we've got the engineering managers, they're in full-on hiring mode, because we've got the basic teams in place, but there's a lot more we want to do, and we're just going out and engaging, really getting to know the customers in detail. So that's really what drives it. Customer interactions, a little bit of hiring, and just being present in this community.
>> Adrian, you're very well known in the open source community, everything that you've done. Netflix, when you were on the VC side, you evangelized a bunch of it, if I can use the term. Amazon, many of us from the outside looked in, trying to understand. Obviously Amazon uses lots of open source, Amazon's participated in a number of open source projects. MXNet got a lot of attention, joining the CNCF is something, I know this community, it's been very positively received, everybody's been waiting for it. What can you tell us about how Amazon, how do they think about open source? Is that something that fits into the strategy, or is it a tactic? Obviously, you're building out your teams, that sends certain signals to market, but can you help clarify for those of us that are watching what Amazon thinks about when it comes to this space? >> I think we've been, so, we didn't really have a team focused on outbound communication of what we were doing in open source until I started building this team a year ago. I think that was the missing link. We were actually doing a lot more than most people realized. I'd summarize it as saying, we were doing more than most people expected, but less than we probably could have been, given the scale of what we are, the scale that AWS is at. So part of what we're doing is unlocking some internal demand where engineering teams were going, we'd like to open source something, we don't know how to engage with the communities. We're trying to build trust with these communities, and I've hired a team, I've got several people now, who are mostly from the open source community; we were also kind of interviewing people like crazy. That was our sourcing for this team. So we get these people in and then we kind of say, all right, we have somebody that understands how to build these communities, how to respond, how to engage with the open source community.
It's a little different to a standard customer, enterprise, start up, those are different entities that you'd want to relate to. But from a customer point of view, being customer-obsessed as AWS is, how do we get AWS to listen to an open source community and work with them, and meet all their concerns. So we've been, I think, doing a better job of that now we've pretty much got the team in place. >> That's your point, is customer focus is the ethos there. The communities are your customers in this case. So you're formalizing, you're formalizing that for Amazon, which has been so busy building out, and contributing here and there, so it sounds like there was a lot of activity going on within AWS, it was just kind of like contributing, but so much work on building out cloud ... >> Well there's a lot going on, but if no one was out there telling the story, you didn't know about it. Actually one of the best analogies we have for the EKS is actually our EMR, our Hadoop service, which launched 2010 or something, 2009, we've had it forever. But from the first few years when we did EMR, it was actually in a fork. We kept just sort of building our own version of it to do things, but about three or four years ago, we started upstreaming everything, and it's a completely clean, upstreamed version of all the Hadoop and all the related projects. But you make one API call, a cluster appears. Hey, give me a Hadoop cluster. Voom, and I want Spark and I want all these other things on it. And we're basically taking Kubernetes, it's very similar, we're going to reduce that to a single API call, a cluster appears, and it's a fully upstreamed experience. So that's, in terms of an engineering relationship to open source, we've already got a pretty good success story that nobody really knew about. And we're following a very similar path. >> Adrian, can you help us kind of unpack the Amazon Kubernetes stack a little bit? 
One of the announcements had a lot of attention, definitely got our attention, Fargate, kind of sits underneath what Kubernetes is doing, my understanding. Where are you sitting with the service meshes, kind of bring us through the Amazon stack. What does Amazon do on its own versus the open source, and how those all fit together. >> Yeah, so everyone knows Amazon is a place where you can get virtual machines. It's easy to get me a virtual machine, from ten years ago, everyone gets that, right? And then about three years ago, I think it was three years ago, we announced Lambda - was that two or three years ago? I lose track of how many Reinvents ago it was. But with Lambda it's like, well, just give me a function. But as a first class entity, there's a, give me a function, here's the code I want you to run. We've now added two new ways that you can deploy to, two things you can deploy to. One of them's Bare Metal, which is already announced, one of the many, many, many announcements last week that might have slipped by without you noticing, but Bare Metal is a service. People go, 'those machines are really big'. Yes, of course they're really big! You get the whole machine and you can be able to bring your own virtualization or run whatever you want. You could run Kubernetes on that if you wanted, but we don't really care what you run it on. So we had Bare Metal, and then we have container. So Fargate is container as a first class entity that you deploy to. So here's my container registry, point you at it, and run one of these for me. And you don't have to think about deploying the underlying machines it's running on, you don't have to think about what version of Linux it is, having to build an AMI, all of the agents and fussing around, and you can get it in much smaller chunks. So you can say you get a CPU and half a gig of RAM, and have that as just a small container.
So it becomes much more granular, and you can get a broader range of mixes. A lot of our instances are sort of powers of two of a ratio of CPU to memory, and with Fargate you can ask for a much broader ratio. So you can have more CPU, less memory, and go back the other way, as well. 'Cause we can mix it up more easily at the container level. So it gives you a lot more flexibility, and if you buy into this, basically you'll get to do a lot of cost reduction for the sort of smaller scale things that you're running. Maybe test environments, you could shrink them down to just the containers and not have a lot of wasted space where you're trying to, you have too many instances running that you want to put it in. So it's partly the finer grain giving you more ability to say -- >> John: Or consumption choice. >> Yeah, and the other thing that we did recently was move to per-second billing, after the first minute, it's per-second. So the granularity of Cloud is now getting to be extremely fine-grained, and Lambda is per hundred millisecond, so it's just a little bit -- >> $4.03 for your bill, I mean this is the key thing. You guys have simplified the consumption experience. Bare Metal, VM's, containers, and functions. I mean pick one. >> Or pick all of them, it's fine. And when you look at the way Fargate's deployed in ECS it's a mixture. It's not all one or all the other, you deploy a number of instances with your containers on them, plus Fargate to deploy some additional containers that maybe didn't fit those instances. Maybe you've got a fleet of GPU enhanced machines, but you want to run a bit of Logic around it, some other containers in the same execution environment, but these don't need to be on the GPU. That kind of thing, you can mix it up. The other part of the question was, so how does this play into Kubernetes, and the discussions are just that we had to release the thing first, and then we can start talking, okay, how does this fit. 
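The billing shift described above — per-second metering after a one-minute minimum, versus the old round-up-to-the-hour model — is easy to make concrete with a little arithmetic. The hourly rate below is invented for illustration, not an actual AWS price:

```python
def per_second_cost(hourly_rate, seconds, minimum_seconds=60):
    """Per-second billing: bill at least the first minute,
    then every second after that."""
    billed = max(seconds, minimum_seconds)
    return hourly_rate * billed / 3600.0

def per_hour_cost(hourly_rate, seconds):
    """Old-style billing: round up to whole hours."""
    hours = -(-seconds // 3600)   # ceiling division
    return hourly_rate * hours

# A 5-minute test environment at an illustrative $0.10/hour:
fine = per_second_cost(0.10, 300)    # billed for exactly 300 seconds
coarse = per_hour_cost(0.10, 300)    # billed for a full hour
```

For short-lived workloads like the test environments mentioned above, the fine-grained model bills a small fraction of what hourly rounding would.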
Parts of the model fit into Kubernetes, parts don't. So we have to expose some more functionality in Fargate for this to make sense, 'cause we've got a really minimal initial release right now, we're going to expose it and add some more features. And then we possibly have to look at ways that we mutate Kubernetes a little bit for it to fit. So the initial EKS release won't include Fargate, because we're just trying to get it out based on what everyone knows today, we'd rather get that out earlier. But we'll be doing development work in the meantime, so a subsequent release we'll have done the integration work, which will all happen in public, in discussion with the community, and we'll have a debate about, okay, this is the features Fargate needs to properly integrate into Kubernetes, and there are other similar services from other top providers that want to integrate to the same API. So it's all going to be done as a public development, how we architect this. >> I saw a tweet here, I want to hear your comments on, it's from your keynote, someone retweeted, "managing over 100,000 clusters on ACS, hashtag Fargate," integrated into ECS, your hashtag, open, ADM's open. What is that hundred thousand number. Is that the total number, is that an example? On elastic container service, what does that mean? >> So ECS is a very large scale, multi-tenant container operation service that we've had for several years. It's in production, if you compare it to Kubernetes it's running much larger clusters, and it's been running at production-grade for longer. So it's a little bit more robust and secure and all those kinds of things. So I think it's missing some Kubernetes features, and there's a few places where we want to bring in capabilities from Kubernetes and make ECS a better experience for people. Think of Kubernetes as some what optimized for the developer experience, and ECS for more the operations experience, and we're trying to bring all this together. 
It is operating over a hundred thousand clusters of containers, over a hundred thousand clusters. And I think the other number was hundreds of millions of new containers are launched every week, or something like that. I think it was hundreds of millions a week. So, it's a very large scale system that is already deployed, and we're running some extremely large customers on it, like Expedia and Mapbox. Some of these people are running tens of thousands of containers in production; we have single clusters in the tens of thousands range. So it's a different beast, right? And it meets a certain need, and we're going to evolve it forwards, and Kubernetes is serving a very different purpose. If you look at our data science space, if you want exactly the same Hadoop thing you can get on prem, you can run EMR. But we have Athena and Redshift and all these other ways that are more native to the way we think, where we can go iterate and build something very specific to AWS, so you blend these two together and it depends on what you're trying to achieve.
It went into the incubation phase last January, we're walking it through, helping it on its way. It's something where 30, 40% of that project is AWS contribution. So we're not dominating it, but we're one of its main sponsors, and we're working with other companies. There's joint work across lots of open source projects around here. We're working with Microsoft on Gluon, we're working with Facebook and Microsoft on ONNX, which is an open neural network exchange. There's a whole lot of things going on here. And I have somebody on my team who hasn't started yet, can't tell you who it is, but they're starting pretty soon, who's going to be focusing on that open source, deep learning AI space. And the final area I think is interesting is IoT, serverless, Edge, that whole space. One announcement recently is FreeRTOS. So again, we sort of acquired the founder of this thing, this free real-time operating system. Everything you have, you probably personally own hundreds of instances of this without knowing it, it's in everything. Just about every little thing that sits there, that runs itself, every light bulb, probably, in your house that has a processor in it, those are all FreeRTOS. So it's incredibly pervasive, and we did an open source announcement last week where we switched its license to be a pure MIT license, to be more friendly for the community, and announced an Amazon version of it with better Amazon integration, but also some upgrades to the open source version.
Thanks for coming on The Cube and congratulations on your continued success, and looking forward to following up on the Amazon Web Services open source collaboration, contribution, and of course, innovation. The Cube doing its part here with its open source content, three days of coverage of CloudNativeCon and KubeCon. It's our second day, I'm John Furrier, Stu Miniman, we'll be back with more live coverage in Austin, Texas, after this short break. >> Offscreen: Thank you.

Published Date : Dec 7 2017



Swami Sivasubramanian, AWS | AWS re:Invent 2017


 

>> Announcer: Live from Las Vegas, it's theCUBE. Covering AWS re:Invent 2017. Presented by AWS, Intel and our ecosystem of partners. >> Hey, welcome back everyone. We're live here in Las Vegas. It's theCUBE's exclusive coverage of AWS, Amazon Web Services re:Invent 2017, Amazon Web Services' annual conference, 45,000 people here. Five years in a row for theCUBE, and we're going to be continuing to cover years and decades after, it's on a tear. I'm John Furrier, my co-host Stu Miniman. Exciting times, one of the biggest themes here is AI, IoT, data, Deep Learning, DeepLens, all the stuff that's been really trending has been really popular at the show. And the person behind that at Amazon is Swami. He's the Vice President of Machine Learning at AWS, among other things, Deep Learning and data. Welcome to theCUBE. >> Stu: Good to see you. >> Excited to be here. >> Thanks for coming on. You're the star of the show. Your team put out some great announcements, congratulations. We're seeing new abstraction layers, complexity going away. You guys have made it easy to do voice, Machine Learning, all that great stuff. >> Swami: Yeah. >> What are you most excited about, so many good things? Can you pick a child? >> I don't want to pick my favorite child among all my children. Our goal is to actually put Machine Learning capabilities in the hands of all developers and data scientists. That's why, I mean, we want to actually provide different kinds of capabilities, right from, like, ML developers who want to build their own Machine Learning models. That's where SageMaker comes in, an end-to-end platform that lets people build, train and deploy these models in a one-click fashion. It supports all popular Deep Learning frameworks. It can be TensorFlow, MXNet or PyTorch. We also not only help train but automatically tune, where we use Machine Learning for Machine Learning to build these things. It's very powerful.
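The "automatically tune" step mentioned above — using machine learning to pick hyperparameters for machine learning — can be illustrated with the simplest possible stand-in: random search over a small parameter space against a scoring function. This is a toy sketch, not SageMaker's actual tuning algorithm (which is closer to Bayesian optimization), and the objective function is invented:

```python
import random

def random_search(score_fn, space, trials=200, seed=0):
    """Try random hyperparameter combinations and keep the best.

    space: dict mapping parameter name -> list of candidate values.
    score_fn: takes a params dict, returns a score (higher is better).
    """
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(trials):
        params = {name: rng.choice(values) for name, values in space.items()}
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

# Toy objective: pretend validation accuracy peaks at lr=0.1, depth=3.
def toy_score(p):
    return -abs(p["lr"] - 0.1) - abs(p["depth"] - 3)

best, score = random_search(
    toy_score,
    {"lr": [0.001, 0.01, 0.1, 1.0], "depth": [1, 2, 3, 4, 5]},
)
```

A real tuner replaces the blind sampling with a model of which regions of the space look promising, but the loop structure — propose, train, score, keep the best — is the same.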
The other thing we're excited about is the API services that you talked about, the new abstraction layer for app developers who do not want to know anything about Machine Learning but want to transcribe their audio to convert from speech to text, or translate it or understand the text, or analyze videos. The other thing, coming from academia, that I'm excited about is I want to teach developers and students Machine Learning in a fun fashion, where they should be excited about Machine Learning. It's such a transformative capability. That's why actually we built a device meant for learning Machine Learning in a hands-on fashion that's called DeepLens. We have developers right here at re:Invent where, from the time they take to un-box it to actually building a computer vision application like Hotdog or Not Hotdog, they can do it in less than 10 minutes. It's an amazing time to be a developer. >> John: Yeah. >> Stu: Oh my God, Swami. I've had so many friends that have sat through that session. First of all, the people that sit through it they get like a kit. >> Swami: That's awesome. >> Stu: They're super excited. Last year it was the Echo Dot and everybody with new skills. This year, DeepLens definitely seems to be the one that all the geeks are playing with, really programming stuff. There's a bunch of other things here, but definitely some huge buzz and excitement. >> That's awesome, glad to hear. >> Talk about the culture at Amazon. Because I know in covering you guys for so many years and now being intimate with a lot of the developers in your teams. You guys just don't launch products, you actually listen to customers. You brought up Machine Learning for developers. What specifically jumped out at you from talking to customers around making it easier? It was too hard, was it, or it was confined to hardcore math-driven data scientists? Was it just the thirst and desire for Machine Learning? Or you're just doing this for side benefits, it's like a philanthropy project?
>> No, in Amazon we don't build technology because it's cool. We build technology because that's what our customers want. Like 90 to 95% of our roadmap is influenced by listening to customers. The other 5 to 10% is us reading between the lines. One of the things I actually ... When I started playing with Machine Learning, having built a bunch of database, storage and analytics products, when I started getting into Deep Learning and various things, I realized there's a transformative capability in these technologies. It was too hard for developers to use it in a day-to-day fashion, because these models are too hard to build and train. They didn't have the right level of abstraction. That's why we actually think of it as a multi-layered strategy where we cater to expert practitioners and data scientists. For them we have SageMaker. Then for app developers who do not want to know anything about Machine Learning, they say, "I'll give you an audio file, transcribe it for me," or "I'll give you text, get me insights or translate it." For them we actually provide simple to use API services, so that they can actually get going without having to know anything about what is TensorFlow or PyTorch. >> TensorFlow got a lot of attention, because that really engaged the developer community in the current Machine Learning, because we're like, "Oh wow, this is cool." >> Swami: Yeah. >> Then it got, I won't say hard to use, but it was high end. Are you guys responding to TensorFlow in particular or you're responding to other forces? What was the driver? >> In Amazon we have been using Machine Learning for like 20 years. Since the year of like 1995 we have been leveraging Machine Learning for the recommendation engine, fulfillment centers where we use robots to pick packages, and then Alexa of course and Amazon Go. One of the things we actually hear is, while frameworks like TensorFlow, MXNet or PyTorch are cool, it is just too hard for developers to make use of it.
We actually don't mind, our users use Caffe or TensorFlow. We want them to be successful in taking it from idea to production. And when we talk to developers, this process took anywhere from 6 to 18 months and it should not be this hard. We wanted to do what AWS did to the IT industry for compute, storage and databases. We want to do the same for Machine Learning by making it really easy to get started and consume it as a utility. That was our intent. >> Swami, I wonder if you can tell us. We've been talking for years about the flywheel of customers for Amazon. What are the economies of scale that you get for the data that you have there? I think of all the training of all the Machine Learning, the developers. How can you leverage the economies of scale that Amazon has in all those kind of environments? >> When you look at Machine Learning, Machine Learning tends to be mostly the icing on the cake. Even when we talk to the expert professors who are the top 10 scientists in the world, the data that goes into the Machine Learning is going to be the determining factor for how good it is, in terms of how well you train it and so forth. This is where data scientists keep saying the breadth of storage and database and analytics offerings that exist really matters for them to build highly accurate models. When you talk about not just the data, but actually the underlying database technology and storage technology, it really is important. S3 is the world's most powerful data lake that exists, that is highly secure, reliable, scalable and cost effective. We really wanted to make sure customers like Glacier Cloud, who store high resolution satellite imagery on S3 and Glacier ... We wanted them to leverage ML capabilities in a really easy one-click fashion. That's important. >> I got to ask you about the roadmap, because you say customers are having input on that. I would agree with you that that would be true, because you guys have a track record there.
But I got to put the dots that I'm connecting in my mind right now forward by saying, you guys ... And telegraphing here, we certainly heard Werner say it, and Andy, data is key, and opening up that data, and we're seeing New Relic here, Sumo Logic. They're sharing anonymous data from usage, workloads, really instructive. Data is instructive for the marketplace, but you got to feed the models on the data. The question for you is you guys get so much data. It's really a systems management dream, it's an application performance dream. You got more use case data. Are you going to open that up and what's the vision behind it? Because it seems like you could offer more and more services. >> Actually we already have. If you look at X-Ray, a service that we launched last year, that is one of the coolest capabilities; even I am a developer during the weekends when I cool out. Being able to dive into specific capabilities, so one of the performance insights, where is the borderline. It's so important that actually we are able to do things like X-raying into an application. We are just getting started. The Cloud transformed how we are building applications. Now with Machine Learning, what is going to happen is we can even do various things like ... which is going to be the borderline on what kind of datasets. It's just going to be such an amazing time. >> You can literally reimagine applications that were once dominant with all the data you have, if you opened it up and then let me bring my data in. Then that will open up a bigger aperture of data. Wouldn't that make the Machine Learning and then AI more effective? >> Actually, you already can do similar things with Lex. Lex, think of it as an automatic speech recognition and natural language understanding service where we are pre-trained on our data.
But then to customize it for your own chat bots or voice applications, you can actually add your own intents and several things, and we customize the underlying Deep Learning model specific to your data. You're leveraging the amount of data that we have trained on, in addition to specifically tuning for yours. It's only going to get better and better, to your point. >> It's going to happen, it's already happening. >> It's already happening, yeah. >> Swami, great slate of announcements on the Machine Learning side. We're seeing the products get all updated. I'm wondering if you can talk to us a little bit about the human side of things. Because we've seen a lot of focus, right, it's not just these tools but it's the tools and the people putting those together. How is Amazon going to help the data scientists, help retrain, help them get ready to be able to leverage and work even better with all these tools? >> Machine Learning, we have seen some amazing usage of how developers are using Machine Learning. For example, Marinus Analytics is a non-profit organization whose goal is to fight human trafficking. They use Rekognition, which is our image processing service. They actually identify persons of interest and victims so that they can notify law enforcement officers. Or the Royal National Institute of Blind People. They are using Polly text to speech to generate audio books for the visually impaired. I'm really excited about all the innovative applications that we can do to simply improve our everyday lives using Machine Learning, and it's such early days. >> Swami, the innovation is endless in my mind. But I want to get two thoughts from you, one startup and one practitioner. Because we've heard here in theCUBE, people come here and saying, "I can do so much more now. I've got my EMR, it's so awesome. I can solve this problem." Obviously making it easy to use is super cool, that's one. I want to get your thoughts on where that goes next. And two, startups.
We're seeing a lot of startups retooling on Cloud economics. I call it post-2013. >> Swami: Yeah. >> They don't need a lot of money, they can hit critical mass. They can get product-market fit earlier. They can get economic value quicker. So they're changing the dynamics. But the worry is, how do I leverage the benefit of Amazon? Because we know Amazon is going to grow, and all Clouds grow, and just for you guys. How do I play with Amazon? Where is the white space? How do I engage, do I just ...? Once I'm on the platform, how do I become the New Relic or Splunk? How can I grow my marketplace and differentiate? Because Amazon might come out with something similar. How do I stay in that cadence of growth, even as a startup? >> If you see, in AWS we have tens of thousands of partners of course, right from ISVs, SIs and whatnot. The software industry is an amazing industry where it's not a winner-take-all market. For example, in the document management space, even though we have S3 and WorkDocs, it doesn't mean Dropbox and Box are not successful either, and so forth. What we provide in AWS is the same infrastructure for any startup or for my team, even though I built probably many of the underlying infrastructure pieces. Nowadays for my AI team, it's literally like a startup, except I probably stay in an AWS building, but otherwise I don't get any internal APIs, it's the same APIs, like S3. >> John: It's a level playing field. >> It's a level playing field. >> By the way, everyone should know, he wrote DynamoDB. As an intern or was that ...? (Swami laughs) And then SQS, rockstar techie here, so it's great to have you. You're what we call a tech athlete. Great to have you on. No white space, just go for it. >> Innovation is the key. The key thing, what we have seen with amazing startups who have done exceptionally well is they intently listen to customers and innovate, and really look for what matters for their customers and go for it. >> The biggest buzz of the show from your group.
What's your biggest buzz from the show here? DeepLens? >> DeepLens has been ... Our idea was to actually come up with a fun way to learn Machine Learning. Machine Learning, it used to be, even until recently, actually as well as last week, it was actually an intimidating thing for developers to learn, even while it's all the buzz. It's not really straightforward for developers to use it. We thought, "Hey, what is a fun way for developers to get engaged and build Machine Learning?" That's why we actually conceived DeepLens, so that you can actually build fun applications. I talked about Hotdog, Not Hotdog. I'm personally going to be building what I call a Bear Cam. Because I live in the suburbs of Seattle where we actually have bears visiting our backyard, digging through our trash. I want to actually have DeepLens with a pre-trained model that I'm going to train to detect bears, so that it sends me a message through SQS and SNS and I get a text. >> Here's an idea we want to do, maybe your team can build it for us. CUBE Cam, we put the DeepLens here and then as anyone goes by, if they're a Twitter follower of theCUBE they can send me a message. (John and Swami laughing) Swami, great stuff. Deep Learning again, more goodness coming. >> Swami: That's awesome. >> What are you most excited about? >> In Amazon we have a phrase called, "It's Day One." Even though we are a 22-year-old company, I jokingly tell my team that, "It's day one for us, except we just woke up and we haven't even had a cup of coffee yet." We have just scratched the surface with Machine Learning, there is so much stuff to do. I'm super excited about this space. >> Your goals for this year is what? What's your goals? >> Our goals for this year were to put Machine Learning capabilities in the hands of all developers of all skill levels. I think we have done pretty well so far, I think. >> Well, congratulations Swami here on theCUBE.
Vice President of Machine Learning and a lot more, all those applications that were announced Wednesday along with the Deep Learning and the AI and the DeepLens, all part of his innovative team here at Amazon. Changing the game is theCUBE, doing our part bringing data to you, video and more coverage. Go to SiliconANGLE.com for all the stories, Wikibon.com for research and of course theCUBE.net. I'm John Furrier and Stu Miniman. Thanks for watching, we'll be right back.

Published Date : Dec 1 2017



Nayaki Nayyar, BMC Software | AWS re:Invent


 

>> Narrator: Live from Las Vegas, it's theCUBE, covering AWS re:Invent 2017. Presented by AWS, Intel and our ecosystem of partners. >> Welcome back, we are live here in Las Vegas, located at the Sands. Day three of our coverage here at re:Invent. AWS starting to wrap things up, but still, I think, making a very major statement about the progress they're making in their market. 45,000 plus attendees here, thousands of exhibitors, and exhibit space being used here in hundreds of thousands of square feet. Sort of a reflection of the vibrancy of that market. I'm with James Kobielus, who's the lead analyst at Wikibon, and we're joined, once again, second appearance on theCUBE in one day, how 'bout that, for Nayaki Nayyar, who is the President of Digital Services Management at BMC. Glad to have you back, we appreciate the time. >> Thank you, John, thank you, Jim. Great to be here and I'm becoming a pro at this, right? >> You are. >> My second time of the day. >> We'll punch your card and you win a prize by being on theCUBE more than once a day. >> Twice in four hours, I mean, that's a pretty good track record. >> We'll pick up your toy, you know. >> Tell me about, first off, just your thought about the show in general. I mean, you've been in this environment for some time now, but I'm kind of curious what you think about what you're seeing here and the sense of how this thing's really taking off. >> So, first of all, it's just the energy, the vibe, the fun that we're having here is just amazing. But, I do want to go back to the keynote that Andy did yesterday, it's just phenomenal the pace at which AWS is innovating. Just to be releasing over 1300 features in a year, that is phenomenal.
For customers like us, vendors, it's just phenomenal. >> We hear a lot about, I mean, it's the buzzword, digital transformation and all that. So, what does it really mean to service? What transformation is happening in that, what is that pushing you on that side of the fence to have to be thinking about now? >> You said the word, digital, and sometimes it's very hyper-used. And what we have done at BMC, since our core is service management, we have defined what service management looks like for our customers in this digital age. And we have defined it, because we were primarily in I.T. service management for the last 10-15 years, the future of service management in this digital world is what we call cognitive service management. Where service management is no longer just reactive, it is proactive and it is also conversational, through various agents like chatbots, or Alexa or virtual agents. So, it's a complete transformation that we are experiencing and we are driving most of that change for our customers right now. >> And, of course, the word cognitive signals the fact that there's some artificial intelligence going on behind the scenes, possibly to drive that conversational UI. With that in mind, I believe that, at BMC, you are one of AWS's partners for Alexa for Business, is that true? And you're bringing it into an I.T. service management context. That sounds like an innovation, can you tell us more about that? >> Absolutely, so we announced partnership with AWS on multiple fronts. One of them is with Alexa, Alexa for Business, where we do integrate with Alexa for providing that end user experience. So, Alexa was known for the consumer world, my son used it all the time. >> Tell me the temperature? >> But now, we are looking at how we could bring it into the enterprise world, especially to provide service to all employees.
So that, you don't actually have to send an email or pick up the phone to call a service agent, now you can actually interact with Alexa or a chatbot to get any service you need. So that's what we call omni-channel experience for providing that experience for end users, employees, customers, partners, anyone. >> So, do you have, right now, any reference customers, it's so new? Or, can you give us a sense for how this capability is working in the field in terms of your testing? Do business people understand, or are they comfortable, with using essentially a consumer appliance as an interface to some serious business infrastructure? Like, being able to report a fault in a server, or whatnot. There's a risk there of bringing in a technology, like a consumer technology, before it's really been accepted as a potential business tool. Tell us how that's working. >> That's a very interesting. We are actually seeing a very fast pace at which customers are adopting it. As we speak, I have three customers I'm working with right now, who not only wants to use a chatbot, or a virtual agent, for providing service, not just to employees but to the end customers, also want to use Alexa inside their company for providing service to their employees. So, it's starting the journey, we already have the integration that is working with Alexa. Customers have gotten very excited about it, they're doing POCs, they're starting their journey. I think in the next couple of years, we'll see a huge uptake with customers wanting to do that across the board. >> Well, give me an example, if I'm working and I need to go to Alexa Business, how deep can I go? What kind of problems can be solved? And then, at what point where does that shut off and then we trip over to the human element? >> James: Don't forget where the A.I. fits in to the picture. If you could just have a little bit of the plumbing, not too much. 
>> So, let me give you like two segments, one is the experience through Alexa, the second one is, where does deep learning get embedded into the process. So, usually every company has level one, level two service desk agents who are taking the calls, are responding to emails for resetting passwords or fixing foreign issues, laptop issues. So, that level one, level two service desk process is what is being replaced through a chatbot or an Alexa. So, now you can take the routine kind of a task away from having a human respond to it, you can have Alexa or a chatbot respond, do that work. The second piece, for high-complex scenarios, is where it switches. So, being able to automatically switch between an Alexa to a live agent, is where the beauty comes in and how we handle the transition. It has all the historical interaction through the whole journey for the customer. >> But then, Alexa forwards any information it has gained from the conversations- >> That interaction history we call it. >> To a human being who takes it to the next step. >> Nayaki: So when I- >> Can a human kick it back to Alexa at some point? >> No, no, we haven't seen that go back. It's usually, level one, level two is where Alexa takes care and then level three is where the human takes care and goes forward. Now, the second piece, the A.I.-ML piece. In a service management, there are a lot of processes that are very, I would say, routine and very manual. Like, every ticket that comes in, customers have millions of tickets that come in on a periodic basis. Every ticket that comes in, how you assign the ticket to the right individual, log the ticket and categorize the ticket is a very labor intensive and expensive process. So, we are embedding deep learning capabilities into that so we can automate, customers can automate all of those. >> James: Natural language processing, is that? >> With NLP embedded into it. 
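The level one/level two triage flow described here — categorize the incoming ticket, let the bot handle routine requests, escalate everything else to a live agent — can be sketched in a few lines. This is an illustrative toy only: the keyword table stands in for the NLP categorization engine a real deployment would call, and every name here is hypothetical rather than any vendor's actual API.

```python
# Toy sketch of the level-1/level-2 triage flow described above.
# The keyword table is a stand-in for a real NLP engine; all names
# are illustrative.

ROUTINE_CATEGORIES = {
    "password": "reset_password",   # level-one routine task
    "laptop": "hardware_support",   # level-one routine task
    "vpn": "network_support",       # level-one routine task
}

def categorize(ticket_text: str) -> str:
    """Assign a category to a ticket -- the step NLP would automate."""
    text = ticket_text.lower()
    for keyword, category in ROUTINE_CATEGORIES.items():
        if keyword in text:
            return category
    return "unknown"

def route(ticket_text: str) -> str:
    """Routine tickets go to the bot; complex ones to a live agent."""
    return "bot" if categorize(ticket_text) != "unknown" else "live_agent"

print(route("I forgot my password again"))   # bot
print(route("Our ERP integration is down"))  # live_agent
```

The design point worth noting is the handoff itself: when `categorize` comes back empty-handed, the ticket — along with the interaction history mentioned above — travels to the human agent rather than looping in the bot.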
Now, customers can choose to use an NLP engine of their choice, like Watson, or Amazon, or Cortana. And then, that gets fed back into the service management process. >> In fact, that's consistent with what AWS is saying about the whole deep learning space. They are agnostic as to the underlying deep learning framework you use to build this logic, whether it be TensorFlow or MXNet, or whatever. So, what you're saying is very consistent with that sort of open framework for plugging deep learning, or A.I., into the, in this case, the business application. Very good. So, developers within your customer base, what are you doing, BMC, to get developers up to speed on what they'll need to do to build the skills to be able to drive this whole service management workflow? >> So, all this work that we're doing with, what we call these cognitive services, they're all micro services that we are built into our platform. That, not only we are using in our own applications, like in Remedy, like in, what we call digital workplace, but also we have made it available for all the developers, partners, ecosystems, to consume it in their own applications. Just like what Amazon is doing with their micro services strategy, we have micro services for every one of these processes that developers can now consume and build their own special use cases, or use cases that are very unique to their business or to their customers. >> So who, I mean we were talking about this before we started the interview, about invent versus innovation, so, on the innovation side, what's driving that? I mean, are these interactions that you're having with customers and so you're trying to absorb whatever that input is, that feedback? Or, are you innovating almost in a vacuum, or in space a little bit, and are providing tools that you think could get traction? >> No, in fact, no, we are not just dreaming in our labs and saying, "This is what we should go do." (laughing) >> James: Dreaming in our labs. 
>> That's not where the driver is. What's really happening, independent of the industry, you pick any industry like telcos or financial industry, any industry is going through a major transformation where they are under competitive pressure to provide a service at the highest efficiency, highest speed, at the lowest cost. So if I'm a bank, or if I'm a telco, when a customer calls me and they have an issue, the pace at which I provide the service, the speed, and the cost at which I provide that service, and the accuracy at which I provide that service, is my competitive advantage. So, that is what is actually driving the innovations that we are bringing to market. And, all the three things that I talked about, end user experience through bots or through virtual agents, how we are automating the processes inside the service management, and how we are also providing it for the developers. All these three, create a package for our customers in every one of those industries, to address the speed, the efficiency and the cost for their service management. >> John: Go ahead James. >> At this show, AWS, among their many announcements that are building on their A.I., they have a new product called, and it's related to this, the accuracy, it's called Amazon Comprehend. Which is able to build on Polly, their NLP, their Natural Language Processing, to be able to identify in a natural language, entities like, "Hey, my PC doesn't work "and I think it's the hard drive," those are entities. But, also identify sentiment, whether the customer is very angry, mildly miffed, and so forth. Conceivably, you could use, or your customers could use that information in building out skills that are more fine-grained in terms of handing off to level two or level three support, "Okay, we've identified with a high degree of confidence "that the problem might reside in this particular component "of the system, the customer is really out of joint, "you need to put somebody on this right away." 
So forth and so on. Any thoughts about possibly using this new functionality within the context of Alexa for Business as you were deploying it at BMC? In the future? Your thoughts? >> Absolutely, in fact that was what I was very excited about that, when they announced that. You know, in an NLP, NLP has been around for many years now and there's been a lot of experiments around NLP. >> The first patent for NLP was like in the late '50s. >> But the maturity of NLP now, and the pace at which, like Amazon, they're innovating is just phenomenal. And the real beauty of it would be, when an NLP engine can really become intelligent when it can understand the sentiment of the customer, when the customer is saying something, it should detect that the customer is angry, happy, or on the edge. We are not there yet, I'm really excited to see the announcements from AWS on the Comprehend side. If they really can deliver on that understanding sentiment, I think it would be phenomenal. >> I don't want to get us off the tracks, but it's a fascinating point. Because, as you know words, in a static environment can be misinterpreted one of 50,000 ways. So, how do you get this A.I. to apply to emotional pitch, tone, agitation? How do you recognize that? >> That is where NLP, the maturity of an NLP, is what's gonna be game changing in the long term. For it to be able to know what the underlying sentiment. >> Anger, excitement, joy, despair, I mean, all those things. "I've had enough," can be said many different ways. >> And that's when we'll switch to a live agent, if it's not able to do it, we will quickly switch to a live agent. (laughing) >> The bot gives up, right? (laughing) >> Or is it emotion threshold where a human being might be the best immediate front-line support. >> Just curious, it's fascinating. Well, thank you for the time, we certainly appreciate that. And, we promise, this'll be it for the day. (laughing) All right, no more CUBE duty. 
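The emotion-threshold handoff discussed above reduces to a small rule: escalate only when negative sentiment is detected with high confidence. In practice the label/score pair would come from a sentiment service (Amazon Comprehend's DetectSentiment returns a sentiment label plus per-label confidence scores); the 0.8 cutoff below is an arbitrary placeholder, not a recommended value.

```python
# Emotion-threshold handoff sketch: switch from bot to live agent
# when sentiment is clearly negative. The (label, score) inputs
# would come from a sentiment API; 0.8 is an arbitrary cutoff.

def should_escalate(label: str, score: float, threshold: float = 0.8) -> bool:
    """Hand off to a human when the customer is confidently angry."""
    return label == "NEGATIVE" and score >= threshold

print(should_escalate("NEGATIVE", 0.95))  # True  -> live agent takes over
print(should_escalate("NEUTRAL", 0.99))   # False -> bot continues
print(should_escalate("NEGATIVE", 0.55))  # False -> too uncertain to escalate
```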
But, we certainly wish you all the best down the road. And, like you, I think we've certainly seen, and have a deeper appreciation for what's happening in this marketplace with what we've seen here this week. It was extraordinary. >> Fascinating. >> Thank you, John, it was a pleasure. And really excited to have two CUBE interviews in a day. >> John: How 'bout that? >> But, I think it's a great forum for us to get our message out and get the world to know what we are doing as BMC and the innovations we're beginning. >> We're excited to talk to real innovators in the business world, so, all power to you. >> Thanks for the time. >> Thank you. >> Nice to meet you. Back with more, we are live here at re:Invent AWS in Las Vegas. Back with more live here on theCUBE right after this break. (upbeat music)

Published Date : Nov 30 2017

Day Two Wrap Up | PentahoWorld 2017


 

>> Narrator: Live from Orlando, Florida it's theCUBE covering PentahoWorld 2017. Brought to you by Hitachi Vantara. >> Welcome back to sunny Orlando everybody. This is theCUBE, the leader in live tech coverage, and this is our second day covering PentahoWorld 2017. theCUBE was here in 2015 when Pentaho had just been recently acquired by Hitachi. We then, let's see, around September timeframe we saw Hitachi rebrand, Hitachi Data Systems rebrand as Hitachi Vantara, bringing together three components of its business, the Hitachi Data Systems business, the Hitachi Insights business, and of course, the Pentaho Analytics platform. We heard yesterday from Brian Householder, the president and COO of Hitachi Vantara, what the strategy was. I thought he was a very crisp, clear presenter. The strategy made a lot of sense, it resonated. Obviously a lot of execution to be done. And then subsequently at the last two days we've heard largely from Pentaho practitioners who are applying this end to end analytics platform to really transform their businesses, to really become data driven supporting those digital transformations. So pretty positive story overall. A lot of work to be done. We got to see how this whole edge to outcome plays out. Sounds good. There's got to be some execution there. We got to see the ecosystem grow for sure. These guys got a great story. This conference should explode. >> It's really a validation for Pentaho. They've been on the market for more than a decade now as the spearhead for the open source analytics revolution in business analytics, and in predictive modeling, and in data integration, all of it open source. And they've come very far and they're really a blue chip solution program. I think this show has been a great validation of Pentaho's portfolio presence in the market. Now Hitachi Vantara has a gem of a core asset. 
Clearly, the storage market, the data center converged infrastructure, the core Hitachi Data Systems product lines, are starting to experience the low growth that such a mature space experiences. And clearly they're placing a strong bet on Hitachi Vantara that the IoT, that the edge analytics market, will just boom wide open. Hitachi Insight Group, which was only created last year by their corporate parent, was chartered to explore opportunities in IoT. They've got the Lumada platform. They had, Hitachi Next, their conference last month, focused on IoT. I think that's really the capstone, the Lumada portfolio, in this overall story. Now, I think what we're hearing this week is that great, they've got the components, the building blocks, of potential growth, but I don't think they're going to be able to achieve takeoff growth until such time, Hitachi Vantara, they have a stronger, more credible reach out to the developer community, specifically the developers who are building the AI and machine learning for deployment to the edge. That will require them to have credibility in that space. Clearly it's going to have to be the new set of frameworks, such as TensorFlow, and MXNet, and Theano, and so forth. They're going to need some sort of a modeling framework or abstraction from it that sits on top of the Pentaho platform or really across all of their offerings, including Lumada, and enables a developer, the mainstream application developer, to use code, whether it be Python or R or Java, whatever, to build the deep learning and AI models at the highest level of abstraction, the business level of abstraction, then to automatically compile those models, which are computational graphs, down to formats that are optimized and efficient to run on devices of all sorts, chip sets of all sorts, that are increasingly resource constrained. They're not there yet. I'm not hearing that overall developer story at this show. 
I think they've got a lot of smart people, including Brian, pushing them in that direction. Hopefully next year's PentahoWorld or however they may rebrand this show, I think they'll probably have more of that put together, but we'll keep on waiting to see. >> And that's something that I pushed on a little bit this week. In particular, that requires a whole new go to market where the starting point is developers and then you're nurturing those developers. And certainly Pentaho has experience with community editions, but that was more to get enterprise buyers to kind of try before they buy. As you know well, Jim, the developer community is, they're very fickle, they're persnickety, they're demanding, and they're super smart, and they can be your best advocates or they'll just ignore you. That's just kind of the way it is with developers. And if you can appeal to them you can get a foothold in markets. We've seen it. Look at what Microsoft has done, look at what Amazon has done, certainly Docker, you know, on and on and on. >> Community marketing that's full bore (mumbles) user groups, developer days, hackathons, the whole nine yards, I'm not seeing a huge emphasis on community marketing in that really evangelistic sense. They need to go there seriously. They need to win the hearts and minds of the next generation developer, the next generation developer who actually won't care about whether it's TensorFlow backends or the other ones. What they will care is the high level framework, and really a collaborative framework, that's a solution provider gives them for their teams to collaborate on building and training and deploying all this stuff. I'm not hearing from this solution provider, devops really, here this year. Hopefully in the coming years there will be. Other vendors are a bit further along than they are. We see a bit further along IBM is. 
We see a bit further along like Cloudera and others are in putting together really a developer friendly ecosystem of components within a broader data lake framework. >> Yeah, and that's not been the historical Pentaho DNA. However, as you know, to reach out, have a community effort to reach out to developers requires resources and commitment, and it's not a one shot deal. But, it also requires a platform, and what we're seeing today is the formation of that. The reformation of Hitachi into Hitachi Vantara with a lot of resources that has a vision of a platform, of which Pentaho is a critical component, but it's going to take a lot of effort, a lot of cultivating. I presume they're having those conversations internally. They're not ready to have them externally, which is I presume why they're not having them. But that's something that we're going to certainly watch for in the coming years. What else? You gave a talk this afternoon. >> Yeah, AI is Eating the Edge, and it was well received. In fact, when I prepared my thoughts and my research about a month ago for this event I was thinking, "Am I way too far ahead?" This is Pentaho. I've been of course familiar with them since their inception. I thought, "Are there other users? "Are there developers? "Is their community going deep into AI "and all the IoT stuff?" And the last day or so here at this event it's like, "Whoa, everybody here is into that. "They know this stuff." So, not only was I relieved that I wouldn't have to explain the ABCs of all that, they were ahead of me in terms of the questions I got. The questions are, once again, what framework should we adopt for AI, the whole TensorFlow, all those framework wars, which I think are sort of overblown and they will be fairly soon, it'll be irrelevant, but those kinds of questions. Those are actually developer level questions that people are just here and they're coming to me with. 
>> Well, you know, I tell you, I'm no expert in frameworks, but my advice would be whatever framework you adopt you're probably not going to be using that same framework down the road. So you have to be flexible as an organization. A lot of technical leaders tell me this is look, technology is going to come and it's going to go. We got to have great people. We've got to be able to respond to the market requirements. We have to have processes that allow us to be proactive and responsive, and that your choice of framework should ensure that it doesn't constrict you in those areas. >> And you know, the framework that actually appeals to this crowd, including the people in my room, it's a Wikibon framework, it's also what Brian Hopkins of Forrester presented, the three tier architecture. There's the edge devices. There are the gateways or hubs. There's the cloud. We call them primary, secondary, tertiaries. Whatever you call them, you put different data, you put different analytics on each of those tiers. And then really in many ways in a modular fashion then you begin to orchestrate with Kubernetes and so forth these AI infused apps and these distributed architectures, like self driving vehicles or whatever. And the buzz I've been getting here, including in my session, everybody is saying, "Yeah, that's exactly the way to go." In other words, thinking in those terms prevents you as a developer from thinking that AI has to be some monolithic frigging stack on one single node. No, it actually has to be massively parallel and distributed, because these are potentially very compute intensive applications. I think there's a growing realization in the developer community that when you're talking about developing AI you're really talking about developing two core workloads. 
There's the inferencing, which is where the magic happens in terms of predictions and classifications, but even more resource consumptive is the training that has to happen in the cloud, and that's data, that's exabytes, petabytes intensive potentially. That's compute intensive. Very different workload. That definitely needs to happen in the cloud primarily. There's a little bit of federated training that goes out to the edge, but that's really the exception right now. So there's a growing realization in the developer community that boy, we better get a really good platform for training. And we've seen in our research at Wikibon that many AI developers, many deep learning developers, actually leverage their Spark clusters for training of TensorFlow and so forth, because of in memory massive parallelism, so forth and so on. I think there will be a growing realization in the developer community that the investments they've been making in Hadoop and Spark will just be leveraged for this growing stack, for training if nothing else. 
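The tiering and workload split described above — inference pushed out toward the edge, resource-hungry training kept in the cloud, gateways reducing the data in flight — can be summarized as a simple placement table. The tier names and workload labels here are illustrative, not any product's actual taxonomy.

```python
# Illustrative placement of analytics workloads across the
# three-tier edge/gateway/cloud architecture described above.

TIER_WORKLOADS = {
    "edge":    ["inference"],                   # low-latency scoring on devices
    "gateway": ["filtering", "aggregation"],    # hubs reduce the data in flight
    "cloud":   ["training", "batch_analytics"], # compute- and data-intensive
}

def placement(workload: str) -> str:
    """Return the tier where a workload typically runs."""
    for tier, workloads in TIER_WORKLOADS.items():
        if workload in workloads:
            return tier
    return "unassigned"

print(placement("inference"))  # edge
print(placement("training"))   # cloud
```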
And actually what I like about 8.0 is that it focuses on streaming, bringing open source analytic streaming more completely into the Pentaho data integration platform, in other words, their stronger interoperability with Spark streaming, with Kafka, and so forth, but also they have the ability within 8.0 to better match realtime streaming workloads to execution engines in a distributed fabric. In other words, what I think that represents not only in terms of Hitachi Vantara's portfolio, but in terms of where the industry is going with all things to do with big data applications whether or not they involve AI is streaming is coming into the mainstream, pun intended, and data at rest platforms are starting to become marginalized in a lot of applications. In other words, Hadoop is data at rest par excellence, so are a fair number of other no SQL platforms. Those are not going away. Those are the core of your data lakes. But most development is being developed now, most AI and machine learning is being developed for streaming environments that increasingly are edge oriented. So Pentaho, Hitachi Vantara, for 8.0 have put in the right incremental features for the market that lies ahead. So in many ways I think that was actually a well thought out release for this particular event. >> Great. Okay, some of the highlights here. We had a lot of different industries, gaming, we had experts on autonomous vehicles, we had the NASDAQ guys on, that was a very interesting segment, the German police interview you did, the chief data officer of community colleges in Indiana. So, a lot of diversity, which underscores the platformness of Pentaho. It's not some industry specific system. It is a horizontal capabilities platform. Final thoughts on the show, some interesting things that you saw, things you learned? >> Yeah, on the show itself, they did a really good job. 
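The streaming emphasis in 8.0 noted above — analytics over data in motion rather than data at rest — comes down to continuous windowed computation. A minimal tumbling-window count in plain Python, standing in for what an engine like Spark Streaming or Kafka Streams does at scale (the event format here is a made-up example):

```python
# Tumbling-window count over an event stream: the core aggregation
# pattern of streaming engines, sketched here in plain Python.
from collections import defaultdict

def tumbling_window_counts(events, window_secs=60):
    """Group (timestamp, key) events into fixed windows and count them."""
    counts = defaultdict(int)
    for ts, key in events:
        window_start = ts - (ts % window_secs)
        counts[(window_start, key)] += 1
    return dict(counts)

events = [(3, "sensor_a"), (45, "sensor_a"), (70, "sensor_b")]
print(tumbling_window_counts(events))
# {(0, 'sensor_a'): 2, (60, 'sensor_b'): 1}
```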
Hitachi Vantara, of course it's a new brand, but it's an old company, and it's even an old established set of product teams that have come together in a hurry essentially, though it's really been two years since the acquisition. They did a really good job of presenting a unified go to market message. That's a good start. They've done a good job of the fact that they had these two shows in a rapid sequence, Hitachi Next, which was IoT and Lumada, but it was Hitachi Vantara, and now this one where it's all data analytics. The fact that here in the peak of fall event season they had these two shows really highlighting their innovations and their romance for those two cores of their portfolio, and have done a good job of positioning themselves in each case, that shows that the teams are orchestrating well in terms of at least go to market presenting their value prop. I think in terms of the actual, we've had a lot of great customer and partner interviews on this show. And I think, you mentioned gaming first, I wasn't actually on the gaming related CUBE interview, but gaming is a hot, of course it's a hot, hot market for AI increasingly. A lot of AI that gets developed now for lots of applications involves simulations of whatever scenario you're building, including like autonomous vehicles. So gaming is in many ways a set of practices that are well established and mature that are becoming fundamental to development of all AI, because you're developing synthetic data based on simulation environments. The fact that Hitachi Vantara has strong presence as a data provider in the gaming market I think in many ways indicates that they've got ... It's a crowded marketplace. They have much larger competitors and deeper pocketed, but I think the fact is they've got all the piece parts needed to be a roaring success in this new era, and they've got strong and very loyal customers I'm discovering, not discovering, I've known this all along. 
But, since I've rejoined the analysts' space it's been revalidated that Pentaho how strong in blue chip they are. Now that they're a new brand in a new era, they're turning themselves around fairly well. I don't think that they'll be isolated by ... Clearly, I mean, with AI ... AI right now belongs to AWS and Microsoft and Google and IBM to some degree. We have to recognize that the Hitachi Vantaras of the world right now are still a second tier in that arena. They probably have to hitch their wagon to at least one of those core cloud providers as a core partner going forward to really prevail. >> Dave: Which they can do. >> Yeah, they can do. >> Alright. Jim, thanks very much for closing with me. Thanks to you all for watching. theCUBE puts out a lot of content. You can go to SiliconAngle.com to see all the news. theCUBE.net is where we host all these videos. Wikibon.com is our research site, so check that out, as well. We've got CrowdChats going on, CrowdChat.net. It's just unbelievable. >> Unbelievable. >> Rush of content. We're all about the data, we're all about sharing, so check those sites out. Thanks very much to the crew here. Great job. And next week a lot going on. We're in New York City. We've got some stuff going on there. Want to thank our sponsor, without whom this show, this CUBE show, would not be possible, Hitachi Vantara slash Pentaho. >> Thank you to sunny Orlando. It's great and wonderful. >> This has been theCUBE at PentahoWorld 2017. We'll see you next time. Thanks for watching. (techno music)

Published Date : Oct 27 2017

Day One Wrap | PentahoWorld 2017


 

>> Announcer: Live from Orlando, Florida. It's TheCUBE covering PentahoWorld 2017. Brought to you by Hitachi Vantara. >> Welcome back to TheCUBE's live coverage of PentahoWorld, brought to you by Hitachi Vantara. We are wrapping up day one. I'm your host Rebecca Knight, along with my cohosts today, James Kobielus and Dave Vellante. Guys, day one is done, what have we learned? What's been the most exciting thing that you've seen at this conference? >> The most exciting thing is that clearly Hitachi Vantara, of which, of course, Pentaho is a centerpiece, is very much building on their strong background and legacy in open analytics, and pushing towards open analytics in the Internet of things, their portfolio, the whole edge-to-outcome theme, with Brian Householder doing a sensational keynote this morning laying out their strategic directions. Now, Dave had a great conversation with him on TheCUBE earlier, but I was very impressed with the fact that they've got a dynamic leader and a dynamic strategy, and just as important, Hitachi, the parent company, has clearly put together three product units that make sense. You've got strong data integration, you've got a strong industrial IOT focus, and you've got a really strong predictive and machine learning capability with Pentaho driving the entire pipeline towards the edge. Now that to me shows that they've got all the basic strategic components necessary to seize the future, further possibilities. Now, they brought a lot of really good customers on, including our latest one from IMS, Hillove, to discuss exactly what they're doing in that area. So I was impressed with the amount of solid substance of them seizing the opportunity. >> Well, so I go back two years, when TheCUBE first did PentahoWorld 2015, and the story then was pretty strong.
You had a company in big data, they seemingly were successful, they had a lot of good customer references, they achieved escape velocity, and had a nice exit under Quentin Gallivan, who was the CEO at the time, and the team. And they had a really, really good story, I thought. But I was like, okay, now what? We heard about how, conceptually, we're going to bring the industrial internet and analytics together, and then it kind of got quiet for two years. And now you're starting to see the strategy take shape in typical Hitachi form. They tend not to just rush into big changes and transformations like this; they've been around for a long time, a very thoughtful company. I kind of look at Hitachi Limited, in a way, as an IBM-like company of Japan, even though they do industrial equipment, and IBM's obviously in a somewhat different business, but they're very thoughtful. And so I like the story. The problem I see is not enough people know about the story. Brian was very transparent this morning: how many people do business with Hitachi? Very few. And so I want to see the ecosystem grow. The ecosystem here is Hitachi and a couple of big data players. I don't see any reason why they can't explode this event and the ecosystem around Hitachi Vantara, to fulfill its vision. I think that that's a key aspect of what they have to do. >> I want to see-- >> What will be the tipping point? Just to get, as you said, I mean, it's the brand awareness, and every customer we had on the show really said, when he said that, my eyes lit up and I thought, oh wow, we could actually be doing more stuff with Hitachi, there's more here. >> I want to see a strong developer focus, >> Yeah. >> Going forward, that focuses on AI and deep learning at the edge. I'm not hearing a lot of that here at PentahoWorld right now. So that to me is a strategic gap in what they're offering.
When everybody across the IT and data world and so forth is going real deep on frameworks like TensorFlow, for building ever more sophisticated, data-driven algorithms with the full training pipeline and deployment and all that, I'm not hearing a lot of that from the Pentaho product group or from the Hitachi Vantara group here at this event. So next year at this event I would like to hear more of what they're doing in that area. For them to really succeed, they're going to have to have a solid strategy to migrate their open stack up there, to include, like I said, a bit of TensorFlow, MXNet, or some of the other deep learning toolkits that are becoming essentially de facto standards with developers. >> Yeah, so I mean I think the vision's right. Many of the pieces are in place, and the pieces that aren't there, I'm actually not that worried about, because Hitachi has the resources to go get them: either build them organically, which it has proven it can do over time, or bring them in by acquisition. Hitachi is a decent acquirer of companies. Its content platform came in on an acquisition; I've seen them do some hardware acquisitions, some have worked, some haven't. But there's a lot of interesting software players out there and I think there's some values, frankly. With the tons of money poured into this open source world, and how hard it is to make money in open source, I think companies like Hitachi could pick off some M&A and find some value. Personally, if the number's right at a half a billion dollars, I think that was pretty good value for Hitachi. You see all these multi-billion-dollar acquisitions going left and right. And so the other thing is the fact that Hitachi, under the leadership of Brian Householder and others, was able to shift its model from 80% hardware; now it's 50/50 software and services. I'd like to dig into that a little bit.
They're a public company but you can't really peel the onion on the Hitachi Vantara side, so it kind of is what they say it is. I would imagine that's a lot of infrastructure software, kind of like how EMC's a software company. >> James: Right. >> But nonetheless, they're moving towards a subscription model, they're committed to that, and I think the other thing is the customers. We come to a lot of shows where they struggle to get customers on with substantive stories, and here virtually every customer we talked to today was like, here's how I'm using Pentaho, here's how it's affecting us. Not super sexy stories yet, I mean that's where the IOT and the edge piece come in, but for fundamental plumbing around big data, Pentaho seems like a pretty important piece of it. >> Their fundamental-- >> Their fundamental plumbing that's really saving them a lot of money too, and delivering a big ROI. >> They're fairly blue-chip as a solution provider, with a full core data portfolio in Pentaho. I think of them in many ways as sort of like SAP: not a flashy vendor, but very much a solid blue chip in their core markets. >> Right. >> I'm just naming another vendor that I don't see with a strong AI focus yet. >> Yeah. >> Pentaho, nothing to sneeze at when you have one customer after another, like we've had here, rolling out some significant work they've been doing with Pentaho for quite a while. Not to sneeze at the value they're delivering, but they have to rise to the next level of value before long to avoid being left in the dust. >> You've got this data, and obviously they're going to be capturing more and more data with the devices. >> James: Yeah.
>> And the relationship with Hitachi proper, the elevator makers, is still a little fuzzy to me, I'm trying to understand how that all shakes out, but my question for you, Jim, is: okay, so let's assume for a second they're going to have this infrastructure in place, because they are industrial internet, and they've got the analytics platform. Maybe there's some holes that they can fill in, one being AI and some of the deep learning stuff; can't they get that somewhere? I mean there's so much action going on-- >> Yes. >> In the AI world, can't they bring that in and learn how to apply it over time? >> Of course they can. First of all they can acquire, and tap their own internal expertise. They've got, like Mark Hall for example on the panel, they've obviously got a deep bench of data scientists like him who can take it to that next level, that's important. I think another thing that Hitachi Vantara needs to do to take it to the next level is build a strong robotics portfolio. If you're really talking about the industrial internet of things, it's robotics with AI inside. I think they're definitely a company that could go there fairly quickly, with a wide range of partners they can bring in or acquire, to get fairly significant in terms of not just robotics in general, but robotics for a broad range of use cases where the AI is not so much the supervised learning and stuff that involves training, but things like reinforcement learning. And there's a fair amount of smarts in academe on reinforcement learning for embodied cognition, for robots; that's out there, and that's the untapped space beyond the broad AI portfolio: reinforcement learning.
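The reinforcement learning Jim is distinguishing from supervised training can be boiled down to a toy example. The sketch below is purely illustrative (a hypothetical Python snippet, not any vendor's code): tabular Q-learning on a five-state corridor, where the agent is rewarded only for reaching the rightmost state and learns a policy from that signal alone rather than from labeled examples.

```python
import random

# Toy tabular Q-learning on a 5-state corridor (entirely illustrative).
# States 0..4; reaching state 4 earns the only reward.
N_STATES = 5
ACTIONS = [-1, 1]  # step left or step right
q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

alpha, gamma, eps = 0.5, 0.9, 0.2  # learning rate, discount, exploration
random.seed(0)

for episode in range(200):
    s = 0
    while s != N_STATES - 1:
        # epsilon-greedy: mostly exploit the best known action,
        # sometimes explore a random one
        if random.random() < eps:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda act: q[(s, act)])
        s2 = min(max(s + a, 0), N_STATES - 1)
        r = 1.0 if s2 == N_STATES - 1 else 0.0
        # Q-learning update: move Q(s,a) toward reward + discounted best
        best_next = max(q[(s2, b)] for b in ACTIONS)
        q[(s, a)] += alpha * (r + gamma * best_next - q[(s, a)])
        s = s2

# After training, the greedy policy steps right in every state.
policy = [max(ACTIONS, key=lambda act: q[(s, act)]) for s in range(N_STATES - 1)]
print(policy)  # [1, 1, 1, 1]
```

The point of the toy: no labels are ever supplied, only a sparse reward, which is the distinction drawn above between reinforcement learning and supervised training.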
If somebody's going to innovate and differentiate themselves in terms of the enterprise, in terms of leveraging robotics in a variety of applications, it's going to be somebody with a really strong grounding in reinforcement learning, and productizing that and baking that into an actual solution portfolio. I don't see yet the Googles and the IBMs and the Microsofts going there, and so if these guys want to stand out, that's one area they might explore. >> Yeah, and to pick up on that, I think this notion of robotic process automation, that market's going to explode. We were at a conference this week in Boston, the data rowdy of Boston, the chief data officer conference at the Park Plaza; 20 to 25% of the CDOs in the audience had some kind of RPA, robotic process automation, initiative going on, which I thought was astoundingly high. And so it would seem to me that Hitachi's going to be in a good position to capture all that data. The other thing that Brian stressed, which a lot of companies without a cloud will stress, is that it's your data, you own the data; we're not trying to resell that data, monetize that data, repackage that data. I pushed him a little bit on, well, what about that data training models, and where do those models go? And he says, look, we are not in the business of taking models and, you know, like a big consultancy, bringing them over to other competitors. Now Hitachi does have a consultancy, but it's sort of focused. But as Brian said in his keynote, you have to listen to what people say and then watch them to see how they act. >> Rebecca: Do they walk the walk? >> How they respond. >> Right. >> And so you have to make your own decision, but I do think that's going to be a very interesting field to watch, because Hitachi's going to have so much data in their devices.
Of course they're going to mine that data for things like predictive analytics; those devices are going to be in factories, they're going to be in ecosystems, and there's going to be a battle for who owns the data, and it's going to be really interesting to see how that shakes out. >> So I want to ask you both, as you've both said, we've had a lot of great customer stories here on TheCUBE today. We had a woman who does autonomous vehicles, we had a gamer from Finland, we had a benefit scientist out of Massachusetts. Who were your favorite customer stories, and what excited you most about their stories? >> James: Hmmm. >> Well, I know you like the car woman. >> Well, yeah, the car woman, >> The car woman. >> Ella Hillel. >> Ella Hillel, yes. >> The PhD. I found many things fascinating there. I was on a panel with Ella, and she was on TheCUBE as well. What I found interesting: I was expecting her to go to town on all things autonomous driving, self-driving vehicles, and so forth, but she actually talked about the augmentation of the driver and passenger experience through analytics: dashboards that help not only drivers but insurance companies and fleet managers do behavioral modification, to help them modify behavior to get the most out of their vehicular experience, like reducing wear and tear on tires by taking better routes, or revising them. I thought that's kind of interesting: build more of the recommendation-engine capability into the overall driving experience. That depends on an infrastructure of predictive analytics and big data, but also metered data coming from the vehicle and so forth. I found that really interesting because they're clearly doing work in that area, and that's an area where you don't need levels one through five of self-driving vehicles to get that.
You can get that at any level of that whole model, just by bringing those analytics somehow, in an organic way, hopefully safely, into your current driving experience, maybe through a heads-up display that's integrated through your GPS or whatever it might be. I found that interesting because that's something you could roll out universally, and it can actually make a huge difference in, A: safety, B: people's sort of pleasure with the driving experience, Fahrvergnügen, that's a Volkswagen term, and then also, C: how people make the best use of their own vehicular assets in an era where people still mostly own their own car. >> Well, for me, if there's gambling involved-- >> Rebecca: You're there. >> It was the gaming, and not only because of the gambling, and we didn't find out how to beat the house, Leonard, maybe next time, but it was confirmation of the three-tier data model from edge-- >> James: Yes. >> To gateway to cloud, and that the cloud is two vectors: the on-premise and the off-premise cloud. And the fact that as a gaming company who designs their own slot machines, it's an edge device, and they're basically instrumenting that edge device for real-time interactions. He said that most of the data will go back; I'm not sure. Maybe in that situation it might, maybe all the data will go back, like weather data, it all comes back. But generally speaking, I think there's going to be a lot of analog data at the edge that's going to be digitized that maybe you don't have to save and persist. But anyway, confirmation of that three-tiered data model I think is important, because I think that is how Brian talked about it. We all know the pendulum is swinging: it swung away from mainframe to decentralized, back to the centralized data center, and now it's swinging again to a much more distributed sort of data architecture. So it was good to hear confirmation of that, and I think, again, it's really early innings in terms of how that all shakes out.
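The three-tier data model confirmed here (edge, then gateway, then cloud) can be sketched in a few lines of Python. This is a hypothetical illustration only; the device names, scores, and thresholds are invented, and this is not any vendor's API. Edge devices score readings locally and forward only what matters, a gateway aggregates per-device events, and the cloud tier persists summaries.

```python
# Hypothetical sketch of the edge -> gateway -> cloud data model
# discussed above; all names and thresholds are illustrative only.

def edge_filter(readings, threshold=0.8):
    """Edge tier: score raw (often analog) readings locally and keep
    only the anomalous ones; most raw data never leaves the device."""
    return [r for r in readings if r["score"] >= threshold]

def gateway_aggregate(per_device):
    """Gateway tier: aggregate what the edge devices forwarded."""
    return {
        "devices": len(per_device),
        "events": sum(len(evts) for evts in per_device.values()),
    }

def cloud_ingest(summary, store):
    """Cloud tier (on-prem or public): persist summaries for
    centralized analytics and model training."""
    store.append(summary)
    return store

# One pass through the pipeline with two simulated slot machines.
raw = {
    "slot-1": [{"score": 0.2}, {"score": 0.9}],
    "slot-2": [{"score": 0.95}, {"score": 0.1}],
}
forwarded = {dev: edge_filter(r) for dev, r in raw.items()}
store = cloud_ingest(gateway_aggregate(forwarded), [])
print(store)  # [{'devices': 2, 'events': 2}]
```

The design point mirrors the conversation: only one high-scoring event per device survives the edge filter, so the cloud tier stores a small summary instead of every raw reading.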
>> Great, and we'll know more tomorrow at PentahoWorld day two, and I look forward to being up here again with both of you tomorrow. >> Likewise. >> Great, this has been TheCUBE's live coverage of PentahoWorld, brought to you by Hitachi Vantara. I'm Rebecca Knight for Jim Kobielus and Dave Vellante, we'll see you back here tomorrow.

Published Date : Oct 27 2017



Day One Kickoff | PentahoWorld 2017


 

>> Narrator: Live from Orlando, Florida, it's theCUBE. Covering PentahoWorld 2017. Brought to you by Hitachi Vantara. >> We are kicking off day one of PentahoWorld. Brought to you, of course, by Hitachi Vantara. I'm your host, Rebecca Knight, along with my co-hosts. We have Dave Vellante and James Kobielus. Guys, I'm thrilled to be here in Orlando, Florida, kicking off PentahoWorld with theCUBE. >> Hey Rebecca, twice in one week. >> I know, this is very exciting, very exciting. So we were just listening to the keynotes. We heard a lot about the big three, the power of the big three, which is internet of things, predictive analytics, big data. So the question for you both is where is Hitachi Vantara in this marketplace? And are they doing what they need to do to win? >> Well, the first big question everyone is asking is what the heck is Hitachi Vantara? (laughing) What is that? >> Maybe we should have started there. >> We joke, some people say it sounds like an SUV, Japanese company, blah blah blah. When we talked to Brian-- >> Jim: A well engineered SUV. >> So Brian Householder told us, well, you know, it really is about vantage and vantage points. And when you listen to their angles on insights and data, anywhere and however you want it. So they're trying to give their customers an advantage and a vantage point on data and insights. So that's kind of interesting and cool branding. The second big point, I think, is that Hitachi has undergone a massive transformation itself. Certainly Hitachi America, which is really not a brand they use anymore, but Hitachi Data Systems. Brian Householder talked in his keynote about how, when he came in 14 years ago, Hitachi was 80 percent hardware, and infrastructure, and storage. And they've transformed that. They were about 50/50 last year, in terms of infrastructure versus software and services. But what they've done, in my view, is taken now the next step.
I think Hitachi has said, alright, listen, storage is going to the cloud, Dell and EMC are knocking each other's heads off, China is coming into play. Do we really want to try and dominate that business? Rather, why don't we play from our strengths? Which is devices, internet of things, the industrial internet. So they buy Pentaho two years ago, and we're going to talk more about that, bring in an analytics platform, and sort of marry IT and OT, information technology and operational technology, together to go attack what is a trillion dollar marketplace. >> That's it, so Pentaho was a very strategic acquisition. For Hitachi, of course, Hitachi Data Systems plus Hitachi Insights plus Pentaho equals Hitachi Vantara. Pentaho was one of the pioneering vendors, more than a decade ago, in the whole open source analytics arena. If you cast your mind back to the middle of the 2000s, open source was starting to come into its own. Of course, we already had Linux and so forth, but in terms of the data world, we're talking about the pre-Hadoop era, the pre-Spark era. We're talking about the pre-TensorFlow era. Pentaho, I should say at that time, which is, by the way, now a product group within Hitachi Vantara, not a stand-alone company, established itself as the spearhead for open-source predictive analytics and data mining. They made something called Weka, which is an open-source data mining toolkit that was actually developed initially in New Zealand. Their offering in many ways made them a core player in terms of analytics as a service and so forth, but they very much established themselves, Pentaho, as an up-and-coming solution provider taking a more or less by-the-book open source approach to delivering solutions to market. But they were entering a market that was already fairly mature in terms of data mining. Because you are talking about the mid-2000s.
You already had SAS, and SPSS, and some of the others that had been in that space and done quite well for a long time. And so cut ahead to the present day. Pentaho had evolved to incorporate some fairly robust data integration, data transformation, ETL capabilities into their portfolio. They had become a big data player in their own right, with a strong focus on embedded analytics, as the keynoters indicated this morning. There came a certain point in this decade where it became clear that they couldn't go any further in terms of differentiating themselves, in a space dominated by Hadoop and Spark, and AI things like TensorFlow, unless they were part of a more diversified solution provider that offered, especially, and I think this was the critical thing, the edge orientation of the industrial internet of things. Which is really where many of the opportunities are now for a variety of new markets that are opening up, including autonomous vehicles, which was the focus here all-- >> Let's clarify some things a little bit. So Pentaho actually started before the whole Hadoop movement. >> Yeah, yeah. >> That's kind of interesting. You know, they were a young company when Hadoop just started to take off. And they said, alright, we can adopt these techniques and processes as well. So they weren't true legacy, right? >> Jim: No. >> So they were able to ride that sort of modern wave. But essentially they're in the business of data, I call it data management. And maybe that's not the right term. They do ingest, they're doing ETL, transformation anyway. They're embedding, they've got analytics, they're embedding analytics. Like you said, they're building on top of Weka. >> James: In the first flush of BI as a hot topic in the market in the mid-2000s, they became a fairly substantial BI player. That actually helped them to grow in terms of revenue and customers. >> So they're one of those companies that touches on a lot of different areas. >> Yes.
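The ingest, transformation, and ETL work described above follows a pattern that can be sketched quickly. Here is a minimal, hypothetical extract-transform-load pipeline in Python (the records, field names, and cleansing rules are invented purely for illustration, and this is not Pentaho's actual API):

```python
# Minimal sketch of the ingest -> transform -> load (ETL) pattern
# discussed above; records and rules are made up for illustration.

def extract(rows):
    """Ingest raw records from a source system, dropping empty rows."""
    return [r.strip() for r in rows if r.strip()]

def transform(records):
    """Cleanse and reshape: split CSV-ish rows into typed fields."""
    out = []
    for rec in records:
        name, value = rec.split(",")
        out.append({"name": name, "value": float(value)})
    return out

def load(rows, warehouse):
    """Land the cleansed rows in the target store."""
    warehouse.extend(rows)
    return warehouse

# Run the pipeline end to end on two sensor readings and a blank row.
raw = ["turbine-a,98.6", "  ", "turbine-b,101.2"]
warehouse = load(transform(extract(raw)), [])
print(warehouse)
# [{'name': 'turbine-a', 'value': 98.6}, {'name': 'turbine-b', 'value': 101.2}]
```

Products like the one discussed here wrap this same extract, transform, load sequence in visual designers and connectors, but the underlying data flow is the same three stages.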
>> So who do we sort of compare them to? Obviously, you think of guys like Informatica. >> Yeah, yeah. >> Who do heavy ETL. >> Yes. You mentioned BI before. Like, guys like SAS. What about Tableau? >> Well, in BI there's Tableau, and QlikView and so forth. But there's also very much-- >> Talend. >> Cognos under IBM. And, of course, there's the BusinessObjects portfolio under SAP. >> David: Right. And Talend would be? >> In fact I think Talend is in many ways the closest analog >> Right. >> to Pentaho, in terms of a predominantly open-source, go-to-market approach that involves both the robust data integration and cleansing and so forth on the back end, and also a deep dive of open source analytics on the front end. >> So their differentiation, they sort of claim, is their sort of end-to-end integration. >> Jim: Yeah. >> Which is something we've been talking about at Wikibon for a while. And George is doing some work there, you probably are too. It's an age-old thing in software: do you do best-of-breed or do you do sort of an integrated suite? Now the interesting thing about Pentaho is, they don't own their own cloud. Hitachi Vantara doesn't own their own cloud. So it's an integrated pipeline, but it doesn't include its own database and other tooling. >> Jim: Yeah. >> Right, and so there is an interesting dynamic occurring, that we want to talk to Donna Perlik about obviously, which is how they position relative to roll-your-own, and then how they position, sort of, in the cloud world. >> And we should also ask how they're positioning now in the world of deep learning frameworks. I mean they don't provide, near as I know, their own deep learning frameworks to compete with the likes of TensorFlow, or MXNet, or CNTK, or so forth. So where are they going in that regard? I'd like to know.
I mean there are some others that are big players in this space, like IBM, who don't offer their own deep learning framework, but support more than one of the existing frameworks in a portfolio that includes much of the other componentry. So in other words, what I'm saying is you don't need to have your own deep learning framework, or even your own open-source deep learning code base, to compete in this new marketplace. And perhaps Pentaho, or Hitachi Vantara, roadmapping, maybe they'll take an IBM-like approach, where they'll bundle support, or incorporate support, for two or more of these third-party tools, or open source code bases, into their solution. Weka is not theirs either. It's open source. I mean, Weka is an open source tool that they've supported from the get-go. And they've done very well by it. >> It's just kind of like early-day machine learning. >> David: Yeah. >> Okay, so we've heard about Hitachi's transformation internally. And then their messaging today was, of course-- >> Exactly, that's where I really wanted to go next. We've been talking about it from the product and the technology standpoint, but one of the things we kept hearing about today was this idea of the double bottom line. And this is how Hitachi Vantara is really approaching the marketplace: by really focusing on better business, better outcomes for their customers, and obviously for Hitachi Vantara, too, but also on bettering society. And that's what we're going to see on theCUBE today. We're going to have a lot of guests who will come on and talk about how they're using Pentaho to solve problems in healthcare data, in keeping kids from dropping out of college, to getting computing and other kinds of internet power to underserved areas. I think that's another really important approach that Hitachi Vantara is taking in its model.
The fact that Hitachi Vantara's Pentaho solution, I know, has been on the market for so long, and they have such a wide range of reference customers all over the world, in many verticals. >> Rebecca: That's a great point. >> In most verticals. Willing to go on camera and speak at some length about how they're using it inside their business and so forth. That speaks volumes about a solution provider. Meaning, they do good work. They provide good offerings. These are companies that have invested a lot of money in them, and are willing to vouch for them. That says a lot. >> Rebecca: Right. >> And so the acquisition was in 2015. I don't believe it was a public number; it's Hitachi Limited, so I don't think they had to report it, but the number I heard was about a half a billion. >> Jim: Uh-hm >> Which, for a company with the potential of Pentaho, is actually pretty cheap, believe it or not. You see a lot of unicorns, billion-dollar-plus companies. But the more important thing is it allows Hitachi to further its transformation and really go after this trillion dollar business. Which is going to be really interesting to see how it unfolds. Because while Hitachi has a long-term view, it always takes a long-term view, you still have to make money. It's fuzzy how you make money in IOT these days. Obviously, you can make money selling devices. >> How do you make money in open source, in anything? You know, so yeah. >> But they're sort of open source with a hybrid model, right? >> Yeah. >> And we talked to Brian about this. There's a proprietary component in there so they can make their margin. At Wikibon, we see this three-tier model emerging. A data model where you've got the edge and some analytics, real-time analytics at the edge, and maybe you persist some of that data, but they're low-cost devices. And then there's a sort of aggregation point, or a hub. I think Pentaho today called it a gateway. Maybe it was Brian from Forrester.
A gateway where you're sort of aggregating data, and then ultimately the third tier is the cloud. And that cloud, I think, vectors into two areas: one is on-prem and one is public cloud. What's interesting is Brian from Forrester basically said that puts the nail in the coffin of on-prem analytics and on-prem big data. >> Uh-hm >> I don't buy that. >> I don't buy that either. >> No, I think the cloud is going to go to your data, wherever the data lives. The cloud model of self-service and agile and elastic is going to go to your data.
And more and more the inferrencing, to drive things like face recognition from you Apple phone, is happening on the edge. Most of the data will live there, and most of the analytics will be developed centrally. And then trained centrally, and pushed to those edge devices. That's the way it's working. >> Well, it is going to be an exciting conference. I can't wait to hear more from all of our guests, and both of you, Dave Vellante and Jim Kobielus. I'm Rebecca Knight, we'll have more from theCUBE's live coverage of Pentaho World, brought to you by Hitachi Vantara just after this.

Published Date : Oct 26 2017



Wikibon Conversation with John Furrier and George Gilbert


 

(upbeat electronic music) >> Hello, everyone. Welcome to theCUBE Studios in Palo Alto, California. I'm John Furrier, the co-host of theCUBE and co-founder of SiliconANGLE Media Inc. I'm here with George Gilbert for a Wikibon conversation on the state of big data. George Gilbert is the analyst at Wikibon covering big data. George, great to see you. Looking good. (laughing) >> Good to see you, John. >> So George, you're obviously covering big data. Everyone knows you. You always ask the tough questions, you're always drilling down, going under the hood, and really inspecting all the trends, and also looking at the technology. What are you working on these days as the big data analyst? What's the hot thing that you're covering? >> OK, so, what's really interesting is we've got this emerging class of applications. The name that we've used so far is modern operational analytic applications. Operational in the sense that they help drive business operations, but analytical in the sense that the analytics either inform or drive transactions, or anticipate and inform interactions with people. That's the core of this class of apps. And then there are some sort of big challenges that customers are having in trying to build, and deploy, and operate these things. That's what I want to go through. >> George, you know, this is a great piece. I can't wait to (mumbling) some of these questions and ask you some pointed questions. But I would agree with you that, to me, the number one thing I see customers either fumbling with or accelerating value with is how to operationalize some of the data in a way that they've never done before. So you start to see disciplines come together. You're starting to see people with a notion of digital business being something that's not a department, it's not a marketing department. Data is everywhere, it's horizontally scalable, and the smart executives are really looking at new operational tactics to do that.
With that, let me kick off the first question to you. People are trying to balance the cloud, on-premises, and the edge, OK? And that's classic, you're seeing that now. I've got a data center, I have to go to the cloud, a hybrid cloud. And now the edge of the network. We were just talking about blockchain today, there's this huge problem. They've got to balance that versus leveraging specialized services. How do you respond to that? What is your reaction? What is your presentation? >> OK, so let's turn it into something really concrete that everyone can relate to, and then I'll generalize it. The concrete version is, for a number of years, everyone associated Hadoop with big data. And Hadoop, you tried to stand up on a cluster on your own premises, for the most part. There was EMR, but sort of the big company activity, even including the big tech companies, was to stand up a Hadoop cluster as a pilot and start building a data lake. Then see what you could do with sort of huge amounts of data that you couldn't normally collect and analyze. The operational challenges of standing up that sort of cluster were rather overwhelming, and I'll explain that later, so sort of park that thought. Because of that complexity, more and more customers, all but the most sophisticated, are saying we need a cloud strategy for that. But once you start taking Hadoop into the cloud, the components of this big data analytic system, you have tons more alternatives. So whereas in Cloudera's version of Hadoop you had Impala as your MPP SQL database, on Amazon you've got Amazon Redshift, you've got Snowflake, you've got dozens of MPP SQL databases. And so the whole playing field shifts. And not only that, Amazon has instrumented, in that particular case, their application to be more of a managed service, so there's a whole lot less for admins to do.
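The "tons more alternatives" George describes compound quickly. A toy enumeration (the step names and option lists below are illustrative stand-ins, not a product catalog) shows how a handful of per-step choices multiplies into dozens of candidate stacks an architect has to evaluate for compatibility:

```python
from itertools import product

# Illustrative options only -- the point is the combinatorics, not the catalog.
pipeline_options = {
    "ingest":  ["Kafka", "Kinesis"],
    "process": ["Spark on HDFS", "Spark on S3", "Spark on Azure Blob"],
    "analyze": ["Impala", "Redshift", "Snowflake"],
    "serve":   ["HBase", "DynamoDB"],
}

def candidate_stacks(options):
    """Enumerate every end-to-end combination an architect could assemble."""
    steps = list(options)
    return [dict(zip(steps, combo)) for combo in product(*options.values())]

stacks = candidate_stacks(pipeline_options)
# 2 * 3 * 3 * 2 = 36 distinct pipelines, each with its own trade-offs.
```

Two options per step across four steps would already be sixteen stacks; realistic menus per cloud make the evaluation burden far worse, which is exactly the complexity George charts next.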
And you take that on, sort of, if you look at the slides, you take every step in that pipeline. And when you put it on a different cloud, it's got different competitors. And even if you take the same step in a pipeline, let's say Spark on HDFS to do your ETL, and your analysis, and your shaping of data, and even some of the machine learning, you put that on Azure and on Amazon, it's actually on a different storage foundation. So even if you're using the same component, it's different. There's a lot of complexity and a lot of trade-offs that you've got to make. >> Is that a problem for customers? >> Yes, because all of a sudden, they have to evaluate what those trade-offs are. They have to evaluate the trade-off between specialization. Do I use the best-of-breed thing on one platform? And if I do, it's not compatible with what I might be running on-prem. >> That'll slow a lot of things down. I can tell you right now, people want to have the same code base in all environments, and then just have the same seamless operational model. OK, that's a great point, George. Thanks for sharing that. The second point here is harmonizing and simplifying management across hybrid clouds. Again, back to your point. You set that up beautifully. Great example: open source innovation hits a roadblock, and the roadblock is incompatible components in multiple clouds. That's a problem. It's a management nightmare. How does harmonization across hybrid clouds work? >> You couldn't have asked it better. Let me put it up in terms of an X-Y chart where on the x-axis, you have the components of an analytic pipeline: ingest, process, analyze, predict, serve. But then on the y-axis, this is for an admin, not a developer. These are just some of the tasks they have to worry about: data governance, performance monitoring, scheduling and orchestration, availability and recovery, that whole list.
Now, if you have a different product for each step in that pipeline, and each product has a different way of handling all those admin tasks, you're basically taking all the unique activities on the y-axis, multiplying them by all the unique products on the x-axis, and you have overwhelming complexity, even if these are managed services in the cloud. Here now you've got several trade-offs. Do I use the specialized products that you would call best-of-breed? Do I try and do end-to-end integration so I get simplification across the pipeline? Or do I use products that I had on-prem, like you were saying, so that I have seamless compatibility? Or do I use the cloud vendor's? That's a tough trade-off. There's another similar one for developers. Again, on the y-axis, all the things that a developer would have to deal with, not all of them, just a sample: the data model and the data itself, how to address it, the programming model, the persistence. So on that y-axis, you multiply all those different things you have to master for each product. And then on the x-axis, all the different products in the pipeline. And you have that same trade-off, again. >> Complexity is off the charts. >> Right. And you can trade end-to-end integration to simplify the complexity, but we don't really have products that are fully fleshed out and mature that stretch from one end of the pipeline to the other, so that's a challenge. Alright. Let's talk about another way of looking at management. This was looking at the administrators and the developers. Now, we're getting better and better software for monitoring performance and operations, and trying to diagnose root cause when something goes wrong and then remediate it. There are two real approaches. One is you go really deep, but on a narrow part of your application and infrastructure landscape. And that narrow part might be, you know, your analytic pipeline, your big data.
The broad approach is to get end-to-end visibility across the edge with your IoT devices, across on-prem, perhaps even across multiple clouds. That's the breadth approach, end-to-end visibility. Now, there's a trade-off here too, as in all technology choices. When you go deep, you have bounded visibility, but that bounded visibility allows you to understand exactly what is in that set of services, how they fit together, how they work. Because the vendor, knowing that they're only giving you management of your big data pipeline, can train their machine learning models so that whenever something goes wrong, they know exactly what caused it, and they can filter out all the false positives, the scattered errors that can confuse administrators. Whereas if you want breadth, you want to see your entire landscape end to end, so that you can do capacity planning, and see that if there was an error way upstream, something might be triggered way downstream, or a bunch of things downstream. So the best way to understand this is: how much knowledge do you have of how all the pieces work together, and how much knowledge do you have of how all the software pieces fit together. >> This is actually an interesting point. So if I kind of connect the dots for you here: the bounded root-cause analysis is where we see a lot of machine learning, that's where the automation is. >> George: Yeah. >> The unbounded, the breadth, that's where the data volume is. But they can work together, that's what you're saying. >> Yes. And actually, I hadn't even got to that, so thanks for teeing it up. >> John: Did I jump ahead on that one? (laughing) >> No, no, you teed it up. (laughing) Because ultimately-- >> Well, a lot of people want to know what's going to be automated away. All the undifferentiated labor and scale can be automated. >> Well, when you talk about them working together.
So for the depth-first approach, there's a small company called Unravel Data that has modeled eight million big data jobs or workloads from high-tech companies, so they know how all that fits together, and they can tell you when something goes wrong exactly what went wrong and how to remediate it. Take something like Rocana or Splunk; they look end to end. The interesting thing that you brought up is that at some point, that end-to-end product is going to be like a data warehouse, and the depth products are going to sit on top of it. So you'll have all the contextual data of your end-to-end landscape, but you'll have the deep knowledge of how things work and what goes wrong sitting on it. >> So just before we jump to the machine learning question, which I want to ask you, what you're saying is the industry is evolving to almost look like a data warehouse model, but in a completely different way. >> Yeah. Think of it as, another cue. (laughing) >> John: That's what I do, George. I help you out with the cues. (laughing) No, but I mean the data warehouse, everyone knows what that was. A huge industry, created a lot of value, but then the world got rocked by unstructured data. And then their bounded, if you will, view got democratized. So creative destruction happened, which is another word for new entrants came in and incumbents got rattled. But now it's kind of going back to what looks like a data warehouse, but it's completely distributed around. >> Yes. And I was going to do one of my movie references, but-- >> No, don't do it. Save us the judge. >> If you look at this starting in the upper right, that's the data lake, where you're collecting all the data, and it's for search, it's exploratory. As you get more structure, you get to the descriptive place where you can build dashboards to monitor what's going on. And when you get really deep, that's when you have the machine learning.
>> Well, the machine learning is hitting the low-hanging fruit, and that's where I want to get to next to move it along. Sourcing machine learning capability, let's discuss that. >> OK, alright. Just to set context before we get there, notice that when you do end-to-end visibility, you're really seeing across a broad landscape. And when I'm showing my public cloud big data, that would be depth-first just for that component. But for breadth-first, you could do, like, a Rocana or a Splunk that then sees across everything. The point I wanted to make was, when you said we're reverting back to data warehouses and revisiting that dream again, the management applications started out saying, we know how to look inside machine data and tell you what's going on with your landscape. It turns out that machine data and business operations data, your application data, are really becoming one and the same. So what used to be a transaction, there was one transaction. And that, when you summarized them, went into the data warehouse. Then with systems of engagement, you had about 100 interaction events that you tracked, or sort of stored, for every business transaction. And then when we went out to the big data world, it's so resource-intensive that we actually had 1,000 to 10,000 infrastructure events for every business transaction. So that's why the data volumes have grown so much, and why we had to go back first to the data lake, and then curate it into the warehouse. >> Classic innovation story, great. Machine learning. Sourcing machine learning capabilities, 'cause that's where the rubber starts hitting the road. You're starting to see clear skies when it comes to where machine learning is starting to fit in. Sourcing machine learning capabilities. >> You know, even though we sort of didn't really rehearse this, you're helping cue me up perfectly.
Let me make the assertion that with machine learning, we have the same shortage of really trained data scientists that we had of administrators when we were trying to stand up Hadoop clusters and do big data analytics. We did not have enough administrators because these were open source components built from essentially different projects, and putting them all together required a huge amount of skill. Data science requires, really, knowledge of algorithms where even really sophisticated programmers will tell you, "Jeez, now I need a PhD to really understand how this stuff works." So the shortage, that means we're not going to get a lot of hand-built machine learning applications for a while. >> John: There are a lot of libraries out there right now; you see TensorFlow from Google. Big traction with that. >> George: But for PhDs, for PhDs. My contention is-- >> John: Well, developers too, you could argue developers, but I'm just putting it out there. >> George: I will get to that, actually. A slide just on that. Let me do this one first, because my contention is the first big, widespread application of machine learning is going to be the depth-first management, because it comes with a built-in model of how all the big data workloads, services, and infrastructure fit together and work together. And if you look at how the machine learning model operates, when it knows something goes wrong, let's say an analytic job takes 17 hours and then just falls over and crashes, the model can actually look at the data layout and say we have way too much on one node, and it can change the settings and change the layout of the data, because it knows how all the stuff works. The point about this is the vendor, in this particular example Unravel Data, built into their model an understanding of how to keep a big data workload running, as opposed to telling the customer, "You have to program it."
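The depth-first remediation George attributes to tools like Unravel Data — a model that knows the workload, notices way too much data on one node, and changes the layout — can be sketched in miniature. The skew rule and the rebalancing below are simplified stand-ins for what a real product's trained models would do, not Unravel's actual logic:

```python
def diagnose_skew(node_sizes, tolerance=2.0):
    """Flag nodes holding more than `tolerance` times the mean data volume."""
    mean = sum(node_sizes.values()) / len(node_sizes)
    return [node for node, size in node_sizes.items() if size > tolerance * mean]

def rebalance(node_sizes):
    """Naive remediation: spread the total volume evenly across the cluster."""
    even_share = sum(node_sizes.values()) / len(node_sizes)
    return {node: even_share for node in node_sizes}

# One node holds most of the data -- the shape of a job that runs 17 hours
# and then falls over.
layout = {"node1": 900, "node2": 50, "node3": 40, "node4": 10}
hot_spots = diagnose_skew(layout)              # the model pinpoints node1
fixed = rebalance(layout) if hot_spots else layout
```

The value being sold is not the remediation code, which is trivial here, but the encoded knowledge of which symptom maps to which fix — that is what a bounded, depth-first model provides.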
So that fits into the question you were just asking, which is where do you get this talent. When you were talking about TensorFlow, and Caffe, and Torch, and MXNet, those are all like assembly language. Yes, those are the most powerful places you could go to program machine learning, but the number of people who can use them is inversely proportional to their power. >> John: Yeah, those are really unique specialty people. High end, you know, the top guys. >> George: Lab coats, rocket scientists. >> John: Well yeah, just high-end tier-one coders, tier-one brains coding away, AI gurus. This is not your working developer. >> George: But if you go up two levels. So go up one level: Amazon machine learning, Spark machine learning. Go up another level, and I'm using Amazon as an example here: Amazon has a vision service called Rekognition. They have a speech generation service, natural language. Those are developer-ready. And when I say developer-ready, I mean the developer just uses an API, you know, passes in the data, and the result comes out. He doesn't have to know how the model works. >> John: It's kind of like what DevOps was for cloud at the end of the day. This slide is completely accurate in my opinion. And we're at the early days, and you're starting to see the platforms develop. It's the classic abstraction layer. Whoever can abstract away the complexity as AI and machine learning grows is going to be the winning platform, no doubt about it. Amazon is showing some good moves there. >> George: And you know how they abstracted it away? In traditional programming, it was just building higher and higher APIs, more accessible. In machine learning, you can't do that. You have to actually train the models, which means you need data. So if you look at the big cloud vendors right now, Google, Microsoft, Amazon, and IBM. Most of them, the first three, have a lot of data from their B2C businesses. So you know, people talking to Echo, people talking to Google Assistant or Siri.
That's where they get enough of their speech data. >> John: So data equals power? >> George: Yes. >> By having data, you have the ingredients. And the more data that you have, the more data that you know about, the more data that has information around it, the more effective you can be at training machine learning algorithms. >> Yes. >> And the benefit comes back to the people who have the data. >> Yes. And so even though your capabilities get narrower, 'cause you could do anything on TensorFlow. >> John: Well, that's why Facebook is getting killed right now, just to change tangents. They have all this data and people are very unhappy; they just revealed that the Russians were targeting anti-Semitic advertising, and they enabled that. So it's hard to be a data platform and still provide user utility. This is what's going on. Whoever has the data has the power. It was a Frankenstein moment for Facebook. So there's that out there for everyone. How do companies do the right thing? >> And there's also the issue of customer intellectual property protection. As consumers, we're like, you can take our voice, you can take all our speech to Siri or to Echo or whatever and get better at recognizing speech, because we've given up control of that, 'cause we want those services for free. >> Whoever can shift the data value to the users. >> George: To the developers. >> Or to the developers, or communities, better said, will win. >> OK. >> In my opinion, that's my opinion. >> For the most part, Amazon, Microsoft, and Google have similar data assets, so far. IBM has something different, which is they work closely with their industry customers and they build progressively. They're working with Mercedes, they're working with BMW. They'll work on the connected car, you know, the autonomous car, and they build out those models slowly.
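George's layering argument from a moment ago — toolkits like TensorFlow as "assembly language" versus developer-ready services a level or two up — can be illustrated without any vendor API. Both functions below are hypothetical: the low-level tier fits a line with an explicit gradient loop, while the "developer-ready" tier hides exactly the same machinery behind one call the developer never has to understand.

```python
def fit_slope_low_level(xs, ys, lr=0.01, steps=2000):
    """'Assembly language' tier: hand-rolled gradient descent on y = w * x."""
    w = 0.0
    for _ in range(steps):
        # Mean-squared-error gradient with respect to w.
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
        w -= lr * grad
    return w

def fit_slope_developer_ready(xs, ys):
    """'Developer-ready' tier: one call, no knowledge of the model required."""
    return fit_slope_low_level(xs, ys)  # the complexity lives below the API

xs, ys = [1, 2, 3, 4], [2, 4, 6, 8]    # toy data drawn from y = 2x
w = fit_slope_developer_ready(xs, ys)   # converges to roughly 2.0
```

The population of developers who can write the first function is tiny; the population who can call the second is everyone, which is the inverse-proportionality point made above.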
>> So George, this slide is really, really interesting, and I think this should be a roadmap for all customers to look at to try to peg where they are in the machine learning journey. But then the question comes in. They do the blocking and tackling, they have the foundational low-level stuff done, they're building the models, they understand the mission, they have the right organizational mindset and personnel. Now, they want to orchestrate it and implement it into action. That's the final question. How do you orchestrate the distributed machine learning feedback and the data coherency? How do you get this thing scaling? How do the machines and the training work so that you have the breadth, and then you can bring the machine learning up the curve into the dashboard? >> OK. We've saved the best for last. It's not easy. When I show the chevrons, that's the analytic data pipeline. And imagine, in the serve-and-predict step at the very end, an IoT app, a very sophisticated one, which would be an autonomous car. And it doesn't actually have to be an autonomous one; you could just be collecting a lot of information off the car to do a better job insuring it, as the insurance company. But the key then is you're collecting data on a fleet of cars, right? You're collecting data off each one, but you're also collecting it for the fleet. And in the cloud is where you keep improving your model of how the car works. You run simulations to figure out not just how to design better ones in the future, but how to tune and optimize the ones that are on the road now. That's number three. And then in step four, you push that feedback back out to the cars on the road. And you have to manage, and this is tricky, you have to make sure that the models that you trained in step three are coherent, or the same, when you take the fleet data out and then you put the model for a particular instance of a car back out on the highway.
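A minimal sketch of the coherence check in step four, assuming models are compared by a content fingerprint. Real fleet management does far more (staged rollout, telemetry, rollback), but the core invariant — a car only runs a model whose hash matches what central training published — looks roughly like this:

```python
import hashlib

def fingerprint(model_bytes):
    """Content hash of a trained model artifact."""
    return hashlib.sha256(model_bytes).hexdigest()

def push_to_vehicle(vehicle_id, model_bytes, published_hash):
    """Deploy only if the artifact matches what central training published."""
    if fingerprint(model_bytes) != published_hash:
        raise ValueError(f"stale or corrupted model for {vehicle_id}")
    return {"vehicle": vehicle_id, "model": published_hash, "status": "deployed"}

fleet_model = b"weights-v42"            # stand-in for a trained model artifact
published = fingerprint(fleet_model)    # recorded by the cloud-side trainer
receipt = push_to_vehicle("car-017", fleet_model, published)
```

The design choice being illustrated is that coherency is enforced at deploy time: a car that presents a mismatched artifact is refused rather than silently running an older model than the one the fleet was trained against.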
George, this is a great example, and I think this slide really represents the modern analytical and operational role in digital business. You need look no further than Tesla, this is essentially Tesla, and now all cars, as a great example, 'cause it's complex, it's an internet (mumbling) device, it's on the edge of the network, it's mobility, it's using 5G. It encapsulates everything that you are presenting, so I think this example is a great one of the modern operational analytic applications that support digital business. Thanks for joining this Wikibon conversation. >> Thank you, John. >> George Gilbert, the analyst at Wikibon covering big data and the modern operational analytical systems supporting digital business. It's data driven. The people with the data can train the machines that have the power. That's the mandate, that's the action item. I'm John Furrier with George Gilbert. Thanks for watching. (upbeat electronic music)

Published Date : Sep 23 2017



Wrap Up | IBM Fast Track Your Data 2017


 

>> Narrator: Live from Munich, Germany, it's theCUBE, covering IBM Fast Track Your Data. Brought to you by IBM. >> We're back. This is Dave Vellante with Jim Kobielus, and this is theCUBE, the leader in live tech coverage. We go out to the events and extract the signal from the noise. We are here covering a special presentation, IBM's Fast Track Your Data, and we're in Munich, Germany. It's been a day-long session. We started this morning with a panel discussion with five senior-level data scientists that Jim and I hosted. Then we did CUBE interviews in the morning. We cut away to the main tent. Kate Silverton did a very choreographed, scripted, but very well done main keynote set of presentations. IBM made a couple of announcements today, and then we finished up theCUBE interviews. Jim and I are here to wrap. We're actually running on IBMgo.com. We're running live. Hilary Mason talking about what she's doing in data science, and also we've got a session on GDPR. You've got to log in to see those sessions. So go to IBMgo.com and you'll find those. Hit the schedule and go to the Hilary Mason and GDPR channels, and check those out. But we're going to wrap now. Jim, two main announcements today. I hesitate to call them big announcements. I mean, they were, you know, just kind of ... I think the word you used last night was perfunctory. You know, I mean, they're okay, but they're not game-changing. So what did you mean? >> Well, first of all, though IBM is not calling this a signature event, it's essentially a signature event. They do these every June or so. You know, in the past several years, the signature events have had, like, a one-track theme, whether it be IBM announcing they're investing deeply in Spark, or IBM announcing that they're focusing on R as the core language for data science development.
This year at this event in Munich, it's really a three-track event in terms of the broad themes, and I mean they're all important tracks, but none of them is game-changing. Perhaps IBM doesn't intend them to be, it seems. One of which is obviously Europe. We're holding this in Munich, and a couple of things are of importance to European customers, first and foremost GDPR. The deadline next year, in terms of compliance, is approaching. So sound the alarm, as it were. And IBM has rolled out compliance and governance tools you can download and go with: the information catalog, the governance catalog, and so forth. They're now announcing the consortium with Hortonworks to build governance on top of Apache Atlas, but also IBM is announcing that they've opened up a DSX center in England and a machine learning hub here in Germany, to help their European clients, in those countries especially, get deeper into data science and machine learning, in terms of developing those applications. That's important for the audience, the regional audience here. The second track, which is also important, and I alluded to it, is governance. In all of its manifestations, you need a master catalog of all the assets for building, maintaining, and controlling your data applications and your data science applications. The catalog, the consortium, the various offerings IBM has announced and discussed in great detail. They've brought in customers and partners like Northern Trust to talk about the importance of governance, not just as a compliance mandate, but also as a potential strategy for monetizing your data. That's important. Number three is what I call cloud-native data applications, and how the state of the art in developing data applications is moving towards containerized and orchestrated environments that involve things like Docker and Kubernetes. The IBM DB2 developer community edition has been in the market for a few years. The latest version they announced today includes Kubernetes support.
It includes support for JSON, so it's geared towards a new generation of cloud and data apps. What I'm getting at: those three core themes are Europe, governance, and cloud-native data application development. Each of them is individually important, but none of them is a game changer. And one last thing. Data science and machine learning is one of the overarching envelope themes of this event. They've had Hilary Mason. A lot of discussion there. My sense is, I was a little bit disappointed, because there weren't any significant new announcements related to IBM evolving their machine learning portfolio into deep learning or artificial intelligence, in an environment where their direct competitors, like Microsoft and Google and Amazon, are making a huge push in AI in terms of their investments. There was a bit of a discussion, and Rob Thomas got to it this morning, about DSX working with PowerAI, the IBM platform. I would like to hear more going forward about IBM investments in these areas. So I thought it was an interesting bunch of announcements. I'll backtrack on perfunctory. I'll just say it was good that they had this for a lot of reasons, but like I said, none of these individual announcements really changes the game. In fact, like I said, I think I'm waiting for the fall to see where IBM goes, in terms of doing something that's actually differentiating and innovative. >> Well, I think that the event itself is great. You've got a bunch of partners here, a bunch of customers. I mean, it's active. IBM knows how to throw a party. They always have. >> And the sessions are really individually awesome, in terms of what you learn. >> The content is very good, I would agree. The two announcements were, sort of, you know, DB2, sort of what I call the community edition. Simpler, easier to download. Even Dave can download DB2. I really don't want to download DB2, but I could, and play with it, I guess.
You know I'm not a database guy, but those of you out there that are, go check it out. And the other one was the sort of unified data governance. They tried to tie it in. I think they actually did a really good job of tying it into GDPR. We're going to hear over the next, you know, 11 months, just a ton of GDPR readiness fear, uncertainty and doubt, from the vendor community, kind of like we heard with Y2K. We'll see what kind of impact GDPR has. I mean it looks like it's the real deal, Jim. I mean it looks like, you know, this 4% of turnover penalty. The penalties are much more onerous than any other sort of, you know, regulation that we've seen in the past, where you could just sort of fluff it off. Say yeah, just pay the fine. I think you're going to see a lot of, well, pay the lawyers to delay this thing and battle it. >> And one of our people in theCUBE that we interviewed said it exactly right. It's like GDPR is the inverse of Y2K. In Y2K everybody was freaking out. It was actually nothing when it came down to it. Whereas nobody on the street is really buzzing. I mean the average person is not buzzing about GDPR, but it's hugely important. And like you said, I mean some serious penalties may be in the works for companies that are not complying, companies not just in Europe, but all around the world who do business with European customers. >> Right okay, so now bring it back to sort of machine learning, deep learning. You basically said to Rob Thomas, I see machine learning here. I don't see a lot of the deep learning stuff quite yet. He said stay tuned. You know you were talking about TensorFlow and things like that. >> Yeah they supported that ... >> Explain. >> So Rob indicated that IBM very much, like with PowerAI and DSX, provides an open framework or toolkit for plugging in your, the developer's, preferred machine learning or deep learning toolkit of an open source nature. 
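(A quick aside on the numbers: the 4% Dave cites is the upper tier of GDPR administrative fines, which is capped at the greater of 20 million euros or 4% of worldwide annual turnover. A back-of-the-envelope sketch, with made-up turnover figures:)

```python
def gdpr_max_fine(annual_turnover_eur: float) -> float:
    """Upper-tier GDPR cap: the greater of EUR 20M or 4% of
    worldwide annual turnover (the figures below are illustrative)."""
    return max(20_000_000.0, 0.04 * annual_turnover_eur)

# A 10 billion euro company faces a cap of 400 million euros, not 20 million.
print(gdpr_max_fine(10_000_000_000))  # 400000000.0
# A 100 million euro company still faces the 20 million euro floor.
print(gdpr_max_fine(100_000_000))  # 20000000.0
```

That floor is why, unlike earlier regulations, this one is hard to fluff off by just paying the fine.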
And there's a growing range of open source deep learning toolkits beyond, you know, TensorFlow, including Theano and MXNet and so forth, that IBM is supporting within the overall DSX framework, but also within the PowerAI framework. In other words they've got those capabilities. They're sort of burying that message under a bushel basket, at least in terms of this event. Also one of the things that ... I said this to Mena Scoyal. Watson Data Platform, which they launched last fall, very important product. Very important platform for collaboration among data science professionals, in terms of the machine learning development pipeline. I wish there was more about the Watson Data Platform here, about where they're taking it, what the customers are doing with it. Like I said a couple of times, I see Watson Data Platform as very much a DevOps tool for the new generation of developers that are building machine learning models directly into their applications. I'd like to see IBM, going forward, turn Watson Data Platform into a true DevOps platform, in terms of continuous integration of machine learning and deep learning and other statistical models. Continuous training, continuous deployment, iteration. I believe that's where they're going, or probably where they will be going. I'd like to see more. I'm expecting more along those lines going forward. What I just described, DevOps for data science, is a big theme that we're focusing on at Wikibon, in terms of where the industry is going. >> Yeah, yeah. And I want to come back to that again, and get an update on what you're doing within your team, and talk about the research. Before we do that, I mean one of the things we talked about on theCUBE, in the early days of Hadoop, is that the guys who are going to make the money in this big data business are the practitioners. You're not going to see, you know, these multi-hundred billion dollar valuations come out of the Hadoop world. And so far that prediction has held up well. 
It's the Airbnbs and the Ubers and the Spotifys and the Facebooks and the Googles, the practitioners who are applying big data, that are crushing it and making all the money. You see Amazon now buying Whole Foods. That in our view is a data play, but who's winning here, in either the vendor or the practitioner community? >> Who's winning are the startups with a hot new idea that's disrupting some industry, or set of industries, with machine learning, deep learning, big data, etc. For example everybody's waiting, with bated breath, for, you know, self-driving vehicles. And as that ecosystem develops, somebody's going to clean up. And one or more companies, companies we've probably never heard of, leveraging everything we're describing here today, data science and containerized distributed applications that involve, you know, deep learning for image analysis and sensor analysis and so forth. Putting it all together in some new fabric that changes the way we live on this planet. But as you said, the platforms themselves, whether they be Hadoop or Spark or TensorFlow, whatever, they're open source. You know and the fact is, by its very nature, with open source based solutions, the profit margins on selling those inexorably migrate to zero. So you're not going to make any money as a tool vendor, or a platform vendor. You've got to make money ... If you're going to make money, you make money, for example, from providing an ecosystem within which innovation can happen. >> Okay we have a few minutes left. Let's talk about the research that you're working on. What's exciting you these days? >> Right, right. So I think a lot of people know I've been around the analyst space for a long, long time. I've joined the SiliconANGLE Wikibon team just recently. I used to work for a very large solution provider, and what I do here for Wikibon is I focus on data science as the core of next generation application development. 
When I say next-generation application development, it's the development of AI, deep learning, machine learning, and the deployment of those data-driven statistical assets into all manner of applications. And you look at the hot stuff, like chatbots for example, transforming the experience in e-commerce on mobile devices. Siri and Alexa and so forth. Hugely important. So what we're doing is we're focusing on AI and everything around it. We're focusing on containerization and the building of AI micro-services, and the ecosystem of the pipelines and the tools that allow you to do that. DevOps for data science, distributed training, federated training of statistical models, and so forth. We are also very much focusing on the whole distributed containerized ecosystem, Docker, Kubernetes and so forth, and where that's going, in terms of changing the state of the art in application development. Focusing on the API economy. All of those things that you need to wrap around the payload of AI to deliver it into every ... >> So you're focused on that intersection between AI and the related topics and the developer. Who is winning in that developer community? Obviously Amazon's winning. You got Microsoft doing a good job there. Google, Apple, who else? I mean how's IBM doing for example? Maybe name some names. Who impresses you in the developer community? But specifically let's start with IBM. How is IBM doing in that space? >> IBM's doing really well. IBM has been, for quite a while, very good about engaging with the new generation of developers, using Spark and R and Hadoop and so forth to build applications rapidly and deploy them rapidly into all manner of applications. So IBM has very much reached out to, in the last several years, the Millennials for whom all of these new tools have been their core repertoire from the very start. And I think in many ways, like today, the DB2 Developer Community Edition is very much geared to that market. 
Saying, you know, to the cloud native application developer, take a second look at DB2. There's a lot in DB2 that you might bring into your next application development initiative, alongside your Spark toolkit and so forth. So IBM has startup envy. They're a big old company. Been around more than a hundred years. And they're trying to, very much, bootstrap and restart their brand in this new context, in the 21st century. I think they're making a good effort at doing it. In terms of community engagement, they have a really good community engagement program, all around the world, in terms of hackathons and developer days, you know, meetups here and there. And they get lots of turnout and very loyal customers, and IBM's got the broadest portfolio. >> So you still bleed a little bit of blue. So I got to squeeze it out of you now here. So let me push a little bit on what you're saying. So DB2 is the emphasis here, trying to position DB2 as appealing for developers, but why not some of the other, you know, acquisitions that they've made? I mean you don't hear that much about Cloudant, dashDB, and things of that nature. You would think that those would be more appealing to some of the developer communities than DB2. Or am I mistaken? Is it IBM sort of going after the core, trying to evolve that core, you know, constituency? >> No, they've done a lot of strategic acquisitions like Cloudant, and they've acquired graph databases and brought them into their platform. IBM has every type of database or file system that you might need for web or social or Internet of Things. And so for all of the development challenges, IBM has got a really high-quality, fit-the-purpose, best-of-breed underlying data platform for it. They've got huge amounts of developers energized all around the world working on this platform. DB2, in the last several years they've taken all of their platforms, their legacy ... That's the wrong word. 
All their existing mature platforms, like DB2, and brought them into the IBM Cloud. >> I think legacy is the right word. >> Yeah, yeah. >> These things have been around for 30 years. >> And they're not going away because they're field-proven and ... >> They are evolving. >> And customers have implemented them everywhere. And they're evolving. If you look at how IBM has evolved DB2 in the last several years ... For example they responded to the challenge from SAP HANA. They brought BLU Acceleration technology, in-memory technology, into DB2 to make it screamingly fast and so forth. IBM has done a really good job of turning around these product groups and the product architecture, making them cloud first, and then reaching out to a new generation of cloud application developers. Like I said today, things like the DB2 Developer Community Edition are just the next chapter in this ongoing saga of IBM turning itself around. Like I said, each of the individual announcements today is like, okay, that's interesting. I'm glad to see IBM showing progress. None of them is individually disruptive. I think last week though, I think Hortonworks was disruptive, in the sense that IBM recognized that BigInsights didn't really have a lot of traction in the Hadoop space, not as much as they would have wished. Hortonworks very much does, and IBM has cast its lot to work with HDP. But HDP and Hortonworks recognize they haven't achieved any traction with data scientists, therefore DSX makes sense as part of the Hortonworks portfolio. Likewise Big SQL makes perfect sense as the SQL front end to HDP. I think the teaming of IBM and Hortonworks is propitious of further things that they'll be doing in the future, not just governance, but really putting together a broader cloud portfolio for the next generation of data scientists doing work in the cloud. >> Do you think Hortonworks is a legitimate acquisition target for IBM? >> Of course they are. >> Why would IBM ... 
You know, educate us. Why would IBM want to acquire Hortonworks? What does that give IBM? Open source mojo, obviously. >> Yeah, mojo. >> What else? >> Strong loyalty in the Hadoop market with developers. >> The developer angle. It would supercharge the developer angle, and maybe make it more relevant outside of some of those legacy systems. Is that it? >> Yeah, but also remember that Hortonworks came from Yahoo, the team that developed much of what became Hadoop. They've got an excellent team. A strategic team. So in many ways, you can look at Hortonworks as one part acqui-hire, if they ever do that, and one part really substantial and growing solution portfolio that in many ways is complementary to IBM. Hortonworks is really deep on the governance of Hadoop. IBM has gone there, but I think Hortonworks is even deeper, in terms of their laser focus. >> Ecosystem expansion, and it actually really wouldn't be that expensive of an acquisition. I mean it's, you know, north of ... Maybe a billion dollars might get it done. >> Yeah. >> You know, so would you pay a billion dollars for Hortonworks? >> Not out of my own pocket. >> No, I mean if you're IBM. You think that would deliver that kind of value? I mean you know how IBM thinks about acquisitions. They're good at acquisitions. They look at the IRR. They have their formula. They blue-wash the companies and they generally do very well with acquisitions. Do you think Hortonworks would fit that monetization profile? >> I wouldn't say that Hortonworks, in terms of monetization potential, would match, say, what IBM has achieved by acquiring Netezza. >> Cognos. >> Or SPSS. I mean SPSS has been an extraordinarily successful ... >> Well the day IBM acquired SPSS they tripled the license fees. As a customer I know, ouch, it worked. It was incredibly successful. >> Well, yeah. Cognos was. Netezza was. And SPSS. 
Those three acquisitions in the last ten years have been extraordinarily pivotal and successful for IBM in building what they now have, which is really the most comprehensive portfolio of fit-to-purpose data platforms. So in other words all those acquisitions prepared IBM to duke it out now with their primary competitors in this new field, which are Microsoft, who's newly resurgent, and Amazon Web Services. In other words, the two Seattle vendors. Seattle has come on strong, in a way that Seattle now, in big data in the cloud, is almost eclipsing Silicon Valley in terms of, you know ... It's like the locus of innovation, and really of customer adoption, in the cloud space. >> Quite amazing. Well Google's still hanging in there. >> Oh yeah. >> Alright, Jim. Really a pleasure working with you today. Thanks so much. Really appreciate it. >> Thanks for bringing me on your team. >> And Munich crew, you guys did a great job. Really well done. Chuck, Alex, Patrick wherever he is, and our great makeup lady. Thanks a lot. Everybody back home. We're out. This is Fast Track Your Data. Go to IBMgo.com for all the replays. Youtube.com/SiliconANGLE for all the shows. TheCUBE.net is where we tell you where theCUBE's going to be. Go to wikibon.com for all the research. Thanks for watching everybody. This is Dave Vellante with Jim Kobielus. We're out.

Published Date : Jun 25 2017


Reynold Xin, Databricks - #Spark Summit - #theCUBE


 

>> Narrator: Live from San Francisco, it's theCUBE, covering Spark Summit 2017. Brought to you by Databricks. >> Welcome back, we're here at theCUBE at Spark Summit 2017. I'm David Goad, here with George Gilbert. George. >> Good to be here. >> Thanks for hanging with us. Well here's the other man of the hour here. We just talked with Ali, the CEO at Databricks, and now we have the Chief Architect and co-founder at Databricks, Reynold Xin. Reynold, how are you? >> I'm good. How are you doing? >> David: Awesome. Enjoying yourself here at the show? >> Absolutely, it's fantastic. It's the largest Summit. There are a lot of interesting things, a lot of interesting people who I meet. >> Well I know you're a really humble guy, but I had to ask Ali what I should ask Reynold when he gets up here. Reynold is one of the biggest contributors to Spark. And you've been with us for a long time, right? >> Yes, I've been contributing to Spark for about five or six years, and that's probably the most commits to the project, and lately I'm working more with other people to help design the roadmap for both Spark and Databricks. >> Well let's get started talking about some of the new developments that maybe our audience at theCUBE hasn't heard here in the keynote this morning. What are some of the most exciting new developments? >> So, I think in general if we look at Spark, there are three directions I would say we're doubling down on. The first direction is deep learning. Deep learning is extremely hot and it's very capable, but as we alluded to earlier in a blog post, deep learning has reached sort of a point where it shows tremendous potential, but the tools are very difficult to use. And we are hoping to democratize deep learning and do what Spark did to big data, to deep learning, with this new library called Deep Learning Pipelines. 
What it does is it integrates different deep learning libraries directly in Spark, and can actually expose models in SQL. So even the business analysts are capable of leveraging that. So, that's one area, deep learning. The second area is streaming. Streaming, again, I think a lot of customers have aspirations to actually shorten the latency and increase the throughput in streaming. So, the structured streaming effort is going to be generally available, and last month alone on the Databricks platform, I think our customers processed three trillion records using structured streaming. And we also have a new effort to actually push down the latency all the way to the millisecond range. So you can really do blazingly fast streaming analytics. And last but not least is the SQL data warehousing area. Data warehousing I think is a very mature area from a traditional point of view, but from a big data one it's still pretty new, and there's a lot of use cases popping up there. And in Spark, with approaches like the CBO, and also improvements here in the Databricks Runtime with DBIO, we're actually substantially improving the performance and the capabilities of data warehousing features. >> We're going to dig in to some of those technologies here in just a second with George. But have you heard anything here so far from anyone that's changed your mind maybe about what to focus on next? >> So, one thing I've heard from a few customers is actually visibility and debugability of big data jobs. Many of them are fairly technical engineers, and some of them are less sophisticated engineers, and they have written jobs, and sometimes the job runs slow. And so the performance engineer in me would think: how do I make the job run fast? A different way to actually solve that problem is how can we expose the right information so the customers can actually understand and figure it out themselves. 
This is why my job is slow, and this is how I can tweak it to make it faster. Rather than giving people the fish, you actually give them the tools to fish. >> If you can call that bugability. >> Reynold: Yeah, debugability. >> Debugability. >> Reynold: And visibility, yeah. >> Alright, awesome. George. >> So, let's go back and unpack some of those kind of juicy areas that you identified. On deep learning, you were able to distribute, if I understand things right, the predictions. You could put models out on a cluster, but the really hard part, the compute intensive stuff, was training across a cluster. And so Deeplearning4j and I think Intel's BigDL, they were written for Spark to do that. But with all the excitement over some of the new frameworks, are they now at the point where they are as good citizens on Spark as they are in their native environments? >> Yeah so, this is a very interesting question. Obviously a lot of other frameworks are becoming more and more popular, such as TensorFlow, MXNet, Theano, Keras and others. What the Deep Learning Pipelines library does is actually expose all these single node deep learning tools, highly optimized for say even GPUs or CPUs, to be available as an estimator, like a module in the machine learning pipeline library in Spark. So now users can actually leverage Spark's capability to, for example, do hyperparameter tuning. So, when you're building a machine learning model, it's fairly rare that you just run something once and you're good with it. Usually you have to fiddle with a lot of the parameters. For example, you might run over a hundred experiments to actually figure out what is the best model I can get. This is where Spark really shines. When you combine Spark with some deep learning library, be it BigDL or be it MXNet, be it TensorFlow, you could be using Spark to distribute that training and then do cross validation on it. So you can actually find the best model very quickly. 
And Spark takes care of all the job scheduling, all the fault tolerance properties, and how you read data in from different data sources. >> And without my dropping too much into the weeds, there was a version of that where Spark wouldn't take care of all the communications. It would maybe distribute the models and then do some of the averaging of what was done out on the cluster. Are you saying that all of that now can be managed by Spark? >> In that library, Spark will be able to actually take care of picking the best model out of it. And there are different ways you can design how you define the best. The best could be some average of different models. The best could be just picking one out of this. The best could be maybe a tree of models that you classify on. >> George: And that's a hyperparameter configuration choice? >> So that is actually built-in functionality in Spark's machine learning pipeline. And now what we're doing is you can actually plug all those deep learning libraries directly into that, as part of the pipeline, to be used. Another, maybe just to add ... >> Yeah, yeah. >> Another really cool functionality of the Deep Learning Pipelines is transfer learning. So as you said, deep learning takes a very long time, it's very computationally demanding. And it takes a lot of resources, expertise to train. But with transfer learning, what we allow the customers to do is they can take an existing deep learning model that was trained in a different domain, and then we'd retrain it on a very small amount of data very quickly, and they can adapt it to a different domain. That's sort of how the demo on the James Bond car worked. So there is a general image classifier, and we retrained it on probably just a few thousand images. And now we can actually detect whether a car is James Bond's car or not. >> Oh, and the implications there are huge, which is you don't have to have huge training data sets for modifying a model of a similar situation. 
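(The hyperparameter search Reynold describes, try many parameter combinations, score each candidate, keep the best, can be sketched in plain Python. This is an illustration of the search logic only, not Databricks code; the parameter names and the toy scoring function are invented, and Spark's contribution in practice is evaluating the combinations in parallel across a cluster:)

```python
import itertools

def train_and_score(params):
    """Stand-in for fitting a model and scoring it on held-out data.
    The 'training' here is just a toy objective so the search logic
    is visible: pretend the sweet spot is lr=0.1, reg=0.01."""
    lr, reg = params["lr"], params["reg"]
    return -((lr - 0.1) ** 2 + (reg - 0.01) ** 2)

def grid_search(param_grid):
    """Try every combination and keep the best-scoring one. A cluster
    scheduler would farm these trials out instead of looping."""
    best_params, best_score = None, float("-inf")
    keys = sorted(param_grid)
    for values in itertools.product(*(param_grid[k] for k in keys)):
        params = dict(zip(keys, values))
        score = train_and_score(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

grid = {"lr": [0.01, 0.1, 1.0], "reg": [0.0, 0.01, 0.1]}
best, score = grid_search(grid)
print(best)  # {'lr': 0.1, 'reg': 0.01}
```

The "over a hundred experiments" Reynold mentions is exactly this loop with a bigger grid, which is why distributing the trials pays off.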
I want to, in the time we have, there's always been this debate about whether Spark should manage state, whether it's a database, a key value store. Tell us how the thinking about that has evolved, and then how the integration interfaces for achieving that have evolved. >> One of the, I would say, advantages of Spark is that it's unbiased and works with a variety of storage systems, be it Cassandra, be it HBase, be it HDFS, be it S3. There is a metadata management functionality in Spark, which is the catalog of tables that customers can define. But the actual storage sits somewhere else. And I don't think that will change in the near future, because we do see that the storage systems have matured significantly in the last few years, and I just wrote a blog post last week about the advantage of S3 over HDFS, for example. The storage price is being driven down by almost a factor of 10X when you go to the cloud. I just don't think it makes sense at this point to be building storage systems for analytics. That said, I think there's a lot of building on top of existing storage systems. There are actually a lot of opportunities for optimization in how you can leverage the specific properties of the underlying storage system to get maximum performance. For example, how are you doing intelligent caching, how do you start thinking about building indexes against the data that's stored, for scan workloads. >> With Tungsten, you take advantage of the latest hardware, and where we get more memory intensive systems, and now that the Catalyst Optimizer has a cost based optimizer, or will, and large memory. Can you change how you go about knowing what data you're managing in the underlying system and therefore achieve a tremendous acceleration in performance? 
>> This is actually one area we invested in with the DBIO module as part of Databricks Runtime, and what DBIO does, a lot of this is still in progress, but for example, we're adding some form of indexing capability to the system so we can quickly skip and prune out all the irrelevant data when the user is doing simple point look-ups. Or if the user is doing a scan heavy workload with some predicates. That actually has to do with how we think about the underlying data structure. The storage system is still the same storage system, like S3, but we're adding indexing functionalities on top of it as part of DBIO. >> And so what would be the application profiles? Is it just for the analytic queries, or can you do the point look-ups and updates in that sort of scenario too? >> So it's interesting you're talking about updates. Updates are another thing that we've got a lot of feature requests on. We're actively thinking about how we will support update workloads. Now, that said, I just want to emphasize, for both use cases of doing point look-ups and updates, we're still talking about the context of an analytic environment. So we would be talking about, for example, maybe bulk updates or low throughput updates, rather than doing transactional updates in which every time you swipe a credit card, some record gets updated. That probably belongs more on transactional databases like Oracle, or MySQL even. >> What about when you think about people who are going to run, they started out with Spark on prem, they realize they're going to put much more of their resources in the cloud, but with IIOT, industrial IOT type applications, they're going to have Spark maybe in a gateway server on the edge? What do you think that configuration looks like? >> Really interesting, it's kind of two questions maybe. The first is the hybrid on prem, cloud solution. Again, one of the nice advantages of Spark is the decoupling of storage and compute. 
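(The "skip and prune" indexing Reynold mentions above can be illustrated with per-block min/max statistics: keep a tiny summary for each block of rows and only scan blocks whose range could satisfy the predicate. This is a generic sketch of the technique, not DBIO's actual implementation:)

```python
def build_block_stats(rows, block_size):
    """Split rows into blocks and record each block's (min, max) for the key."""
    blocks = [rows[i:i + block_size] for i in range(0, len(rows), block_size)]
    return [(min(b), max(b), b) for b in blocks]

def lookup(stats, key):
    """Scan only blocks whose [min, max] range could contain the key;
    all other blocks are pruned without being read."""
    scanned = 0
    hits = []
    for lo, hi, block in stats:
        if lo <= key <= hi:
            scanned += 1
            hits.extend(v for v in block if v == key)
    return hits, scanned

# Sorted data gives tight, non-overlapping ranges, so most blocks are pruned:
# a point look-up over 1000 rows touches just 1 of 10 blocks.
rows = list(range(1000))
stats = build_block_stats(rows, block_size=100)
hits, scanned = lookup(stats, 437)
print(hits, scanned)  # [437] 1
```

The same pruning helps the scan-heavy-with-predicates case Reynold mentions: the fewer blocks whose range overlaps the predicate, the less data is read from S3.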
So when you want to move, for example, workloads from on prem to the cloud, the one you care the most about is probably actually the data, 'cause the compute, it doesn't really matter that much where you run it, but data's the one that's hard to move. We do have customers that are leveraging Databricks in the cloud but actually reading data directly from on prem, relying on the caching solution we have to minimize the data transfer over time. And that's one route I would say is pretty popular. Another one is, with Amazon you can literally just use something like a Snowball. You give them a hard drive, or with the trucks, the trucks will ship your data directly and put it in S3. With IOT, a common pattern we see is a lot of the edge devices would actually be pushing the data directly into some firehose like Kinesis or Kafka, or, I'm sure, Google and Microsoft both have their own variants of that. And then you use Spark to directly subscribe to those topics and process them in real time with structured streaming. >> And so would Spark be down, let's say, at the site level, if it's not on the device itself? >> It's an interesting thought, and maybe one thing we should actually consider more in the future is how we push Spark to the edges. Right now it's more of a centralized model, in which the devices push data into Spark, which is centralized somewhere. I've seen, for example, I don't remember the exact use case, but it has to do with some scientific experiment in the North Pole. And of course there you don't have a great uplink for transferring all the data back to some national lab, and rather they would do smart parsing there and then ship the aggregated result back. There's another one, but it's less common. >> Alright, well, just one minute now before the break, so I'm going to give you a chance to address the Spark community. What's the next big technical challenge you hope people will work on for the benefit of everybody? 
In general Spark came along with two focuses. One is performance, the other one's ease of use. And I still think big data tools are too difficult to use. Deep learning tools, even harder. The barrier to entry is very high for all of these tools. I would say we might have already addressed performance to a degree that I think it's actually pretty usable. The systems are fast enough. Now, we should work on actually making (mumbles) even easier to use. That's what we also focus a lot on at Databricks here. >> David: Democratizing access, right? >> Absolutely. >> Alright, well Reynold, I wish we could talk to you all day. This is great. We are out of time now. Want to appreciate you coming by theCUBE and sharing your insights, and good luck with the rest of the show. >> Thank you very much, David and George. >> Thank you all for watching, here we are at theCUBE at Spark Summit 2017. Stay tuned, lots of other great guests coming up today. We'll see you in a few minutes.
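(As a closing illustration: the subscribe-and-process pattern Reynold described, edge devices pushing events into a log like Kafka or Kinesis while a streaming engine consumes and aggregates them, reduces to a micro-batch loop with state kept between triggers. A toy single-process sketch, with a plain list standing in for the event log and invented device names:)

```python
from collections import defaultdict

def micro_batches(events, batch_size):
    """Yield fixed-size micro-batches, the way a streaming engine polls
    a topic and processes whatever arrived since the last trigger."""
    for i in range(0, len(events), batch_size):
        yield events[i:i + batch_size]

def run_counts(events, batch_size):
    """Maintain a running count per device across micro-batches,
    analogous to a streaming groupBy().count() whose state persists
    between triggers."""
    state = defaultdict(int)
    for batch in micro_batches(events, batch_size):
        for device_id, _reading in batch:
            state[device_id] += 1
    return dict(state)

events = [("sensor-a", 21.0), ("sensor-b", 19.5), ("sensor-a", 21.3),
          ("sensor-a", 22.1), ("sensor-b", 19.9)]
print(run_counts(events, batch_size=2))  # {'sensor-a': 3, 'sensor-b': 2}
```

A real deployment replaces the list with a durable topic and the loop with the engine's trigger schedule; the millisecond-latency work Reynold mentioned is about shrinking the gap between those triggers.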

Published Date : Jun 7 2017


Day One Wrap - #SparkSummit - #theCUBE


 

>> Announcer: Live from San Francisco, it's the CUBE covering Spark Summit 2017, brought to you by Databricks. (energetic music plays) >> And what an exciting day we've had here at the CUBE. We've been at Spark Summit 2017, talking to partners, to customers, to founders, technologists, data scientists. It's been a load of information, right? >> Yeah, an overload of information. >> Well, George, you've been here in the studio with me talking with a lot of the guests. I'm going to ask you to maybe recap some of the top things you've heard today for our guests. >> Okay so, well, Databricks laid down, sort of, three themes that they wanted folks to take away: deep learning, Structured Streaming, and serverless. Now, deep learning is not entirely new to Spark. But they've dramatically improved their support for it, I think, going beyond the frameworks that were written specifically for Spark, like Deeplearning4j and BigDL by Intel. And now TensorFlow, which is the open-source framework from Google, has gotten much better support. Structured Streaming, it was not clear how much more news we were going to get, because it's been talked about for 18 months. And they really, really surprised a lot of people, including me, where they took, essentially, the processing time for an event or a small batch of events down to 1 millisecond, whereas before it was in the hundreds if not higher. And that changes the type of apps you can build. And also, the Databricks guys had coined the term continuous apps, which means they operate on a never-ending stream of data, which is different from what we've had in the past, where it's batch or, with a user interface, request-response. So they definitely turned up the volume on what they can do with continuous apps. And serverless, they'll talk about more tomorrow. And Jim, I think, is going to weigh in. But it, basically, greatly simplifies the ability to run this infrastructure, because you don't think of it as a cluster of resources.
You just know that it's sort of out there, and you ask requests of it, and it figures out how to fulfill it. I will say, the other big surprise for me was when we had Matei, who's the creator of Spark and the chief technologist at Databricks, come on the show, and we asked him about how Spark was going to deal with, essentially, more advanced storage of data, so that you could update things, so that you could get queries back, so that you could do analytics, and not just of stuff that's stored in Spark but stuff that Spark stores essentially below it. And he said, "You know, Databricks, you can expect to see come out with or partner with a database to do these advanced scenarios." And I got the distinct impression, after listening to the tape again, that he was talking about, for Apache Spark, which is separate from Databricks, that they would do some sort of key-value store. So in other words, when you look at competitors or quasi-competitors like Confluent with Kafka or data Artisans with Flink, they're not perfect competitors. They overlap some. Now Spark is pushing its way more into overlapping with some of those solutions. >> Alright. Well, Jim Kobielus. And thank you for that, George. You've been mingling with the masses today. (laughs) And you've been here all day as well.
I feel the deep-learning side, the announcement in terms of the deep-learning pipeline API, is very, very important. Now, as George indicated, Spark has been used in a fair number of deep-learning development environments, but not as a modeling tool so much as a training tool, a tool for in-memory distributed training of deep-learning models that were developed in TensorFlow, in Caffe, and other frameworks. Now this announcement is essentially bringing support for deep learning directly into the Spark modeling pipeline, the machine-learning modeling pipeline, being able to call out to deep learning, you know, TensorFlow and so forth, from within MLlib. That's very important. That means that Spark developers, of which there are many, far more than there are TensorFlow developers, will now have an easy path to bring more deep learning into their projects. That's critically important to democratize deep learning. I hope, and from what I've seen what Databricks has indicated, that they have support currently in the API reaching out to both TensorFlow and Keras, and that they have plans to bring in API support for access to other leading DL toolkits such as Caffe, Caffe 2, which is Facebook-developed, such as MXNet, which is Amazon-developed, and so forth. That's very encouraging. Structured Streaming is very important in terms of what they announced, which is an API to enable access to faster, or higher-throughput, Structured Streaming in their cloud environment. And they also announced that they have gone beyond, in terms of the code that they've built, the micro-batch architecture of Structured Streaming, to enable it to evolve into a more true streaming environment, to be able to contend credibly with the likes of Flink. 'Cause I think that the Spark community has, sort of, had their back against the wall with Structured Streaming, in that they couldn't fully provide a true sub-millisecond end-to-end latency environment heretofore.
But it sounds like with this R&D that Databricks is addressing that, and that's critically important for the Spark community to continue to evolve in terms of continuous computation. And then the serverless-apps announcement is also very important, 'cause I see it as really being, it's a fully-managed, multi-tenant Spark development environment, an enabler for continuous build, deploy, and test DevOps within a Spark machine-learning and now deep-learning context. The Spark community as it evolves and matures needs robust DevOps tools to productionize these machine-learning and deep-learning models. Because really, in many ways, many customers, many developers are now using, or developing, Spark applications that are real 24-by-7 enterprise application artifacts that need a robust DevOps environment. And I think that Databricks has indicated they know where this market needs to go and they're pushing it with R&D. And I'm encouraged by all those signs. >> So, great. Well thank you, Jim. I hope both you gentlemen are looking forward to tomorrow. I certainly am. >> Oh yeah. >> And to you out there, tune in again around 10:00 a.m. Pacific Time. We're going to be broadcasting live here from Spark Summit 2017. I'm David Goad with Jim and George, saying goodbye for now. And we'll see you in the morning. (sparse percussion music playing) (wind humming and waves crashing).

Published Date : Jun 7 2017


Fireside Chat with Andy Jassy, AWS CEO, at the AWS Summit SF 2017


 

>> Announcer: Please welcome Vice President of Worldwide Marketing, Amazon Web Services, Ariel Kelman. (applause) (techno music) >> Good afternoon, everyone. Thank you for coming. I hope you guys are having a great day here. It is my pleasure to introduce, to come up on stage here, the CEO of Amazon Web Services, Andy Jassy. (applause) (techno music) >> Okay. Let's get started. I have a bunch of questions here for you, Andy. >> Just like one of our meetings, Ariel. >> Just like one of our meetings. So, I thought I'd start with a little bit of a state of the state on AWS. Can you give us your quick take? >> Yeah, well, first of all, thank you, everyone, for being here. We really appreciate it. We know how busy you guys are. So, hope you're having a good day. You know, the business is growing really quickly. In the last financials we released, in Q4 of '16, AWS is a 14 billion dollar revenue run rate business, growing 47% year over year. We have millions of active customers, and we consider an active customer as a non-Amazon entity that's used the platform in the last 30 days. And it's really a very broad, diverse customer set, in every imaginable size of customer and every imaginable vertical business segment. And I won't repeat all the customers that I know Werner went through earlier in the keynote, but here are just some of the more recent ones that you've seen. You know, Enel is moving their digital and their connected devices, meters, real estate to AWS. McDonald's is reinventing their digital platform on top of AWS. FINRA is moving all in to AWS, yeah. You saw at Reinvent, Workday announced AWS was its preferred cloud provider, and that it would start building on top of AWS further. Today, in press releases, you saw both Dunkin' Donuts and HERE, the geospatial map company, announce they'd chosen AWS as their provider.
You know, and then I think if you look at our business, we have a really large non-US, or global, customer base and business that continues to expand very dramatically. And we're also aggressively increasing the number of geographic regions in which we have infrastructure. So last year in 2016, on top of the broad footprint we had, we added Korea, India, and Canada, and the UK. We've announced that we have regions coming, another one in China, in Ningxia, as well as in France, as well as in Sweden. So we're not close to being done expanding geographically. And then of course, we continue to iterate and innovate really quickly on behalf of all of you, of our customers. I mean, just last year alone, we launched what we considered over 1,000 significant services and features. So on average, our customers wake up every day and have three new capabilities they can choose to use or not use, but at their disposal. You've seen it already this year, if you look at Chime, which is our new unified communications service. It makes meetings much easier to conduct, be productive with. You saw Connect, which is our new global call center routing service. If you look even today, you look at Redshift Spectrum, which makes it easy to query all your data, not just locally on disk in your data warehouse but across all of S3, or DAX, which puts a cache in front of DynamoDB, with the same interface, or all the new features in our machine learning services. We're not close to being done delivering and iterating on your behalf. And I think if you look at that collection of things, it's part of why, as Gartner looks out at the infrastructure space, they estimate that AWS is several times the size business of the next 14 providers combined. It's a pretty significant market segment leadership position. >> You talked a lot about adoption in there, a lot of customers moving to AWS, migrating large numbers of workloads, some going all in on AWS.
And with that as kind of a backdrop, do you still see a role for hybrid as being something that's important for customers? >> Yeah, it's funny. The quick answer is yes. I think the, you know, if you think about a few years ago, a lot of the rage was this debate about private cloud versus what people call public cloud. And we don't really see that debate very often anymore. I think relatively few companies have had success with private clouds, and most are pretty substantially moving in the direction of building on top of clouds like AWS. But, while you increasingly see more and more companies every month announcing that they're going all in to the cloud, we will see most enterprises operate in some form of hybrid mode for the next number of years. And I think in the early days of AWS and the cloud, I think people got confused about this, where they thought that they had to make this binary decision to either be all in on the public cloud and AWS or not at all. And of course that's not the case. It's not a binary decision. And what we know many of our enterprise customers want is they want to be able to run the data centers that they're not ready to retire yet as seamlessly as they can alongside of AWS. And it's why we've built a lot of the capabilities we've built the last several years. These are things like VPC, which is our virtual private cloud, which allows you to cordon off a portion of our network, deploy resources into it and connect to it through VPN, or Direct Connect, which is a private connection between your data centers and our regions, or our storage gateway, which is a virtual storage appliance, or Identity Federation, or a whole bunch of capabilities like that.
But what we've seen, even though the vast majority of the big hybrid implementations today are built on top of AWS, as more and more of the mainstream enterprises are now at the point where they're really building substantial cloud adoption plans, they've come back to us and they've said, well, you know, actually you guys have made us make kind of a binary decision. And that's because the vast majority of the world is virtualized on top of VMware. And because VMware and AWS, prior to a few months ago, had really done nothing to try and make it easy to use the VMware tools that people have been using for many years seamlessly with AWS, customers were having to make a binary choice. Either they stick with the VMware tools they've used for a while but have a really tough time integrating with AWS, or they move to AWS and they have to leave behind the VMware tools they've been using. And it really was the impetus for VMware and AWS to have a number of deep conversations about it, which led to the announcement we made late last fall of VMware and AWS, which is going to allow customers who have been using the VMware tools to manage their infrastructure for a long time to seamlessly be able to run those on top of AWS. And they get to do so as they move workloads back and forth and they evolve their hybrid implementation without having to buy any new hardware, which is a big deal for companies. Very few companies are looking to find ways to buy more hardware these days. And customers have been very excited about this prospect. We've announced that it's going to be ready in the middle of this year. You see companies like Amadeus and Merck and Western Digital and the state of Louisiana, and a number of others; we have a very large private beta and preview happening right now. And people are pretty excited about that prospect. So we will allow customers to run in the mode that they want to run, and I think you'll see a huge transition over the next five to 10 years.
>> So in addition to hybrid, another question we get a lot from enterprises is around the concept of lock-in, and how they should think about their relationship with the vendor, and how they should think about whether to spread the workloads across multiple infrastructure providers. How do you think about that? >> Well, it's a question we get a lot. And Oracle has sure made people care about that issue. You know, I think people are very sensitive about being locked in, given the experience that they've had over the last 10 to 15 years. And I think the reality is, when you look at the cloud, it really is nothing like being locked into something like Oracle. The APIs look pretty similar between the various providers. We built on open standards, like Linux and MySQL and Postgres. All the migration tools that we build allow you to migrate in or out of AWS. It's up to customers based on how they want to run their workload. So it is much easier to move away from something like the cloud than it is from some of the old software services that have created some of this phobia. But I think when you look at most CIOs, enterprise CIOs particularly, as they think about moving to the cloud, many of them started off thinking that they, you know, very well might split their workloads across multiple cloud providers. And I think when push comes to shove, very few decide to do so. Most predominantly pick an infrastructure provider to run their workloads. And the reason that they don't split it across, you know, pretty evenly across clouds is a few reasons. Number one, if you do so, you have to standardize on the lowest common denominator. And these platforms are in radically different stages at this point. And if you look at something like AWS, it has a lot more functionality than anybody else by a large margin. And we're also iterating more quickly than you'll find from the other providers.
And most folks don't want to tie the hands of their developers behind their backs in the name of having the ability to split it across multiple clouds, 'cause they actually are, in most of their spaces, competitive, and they have a lot of ideas that they want to actually build and invent on behalf of their customers. So, you know, they don't want to actually limit their functionality. It turns out the second reason is that they don't want to force their development teams to have to learn multiple platforms. And most development teams, if any of you have managed multiple stacks across different technologies, and many of us have had that experience, it's a pain in the butt. And trying to make a shift from what you've been doing for the last 30 years on premises to the cloud is hard enough. But then forcing teams to have to get good at running across two or three platforms is something most teams don't relish, and it's wasteful of people's time, it's wasteful of natural resources. That's the second thing. And then the third reason is that you effectively diminish your buying power, because all of these cloud providers have volume discounts, and then you're splitting what you buy across multiple providers, which gives you a lower amount you buy from everybody at a worse price. So when most CIOs and enterprises look at this carefully, they don't actually end up splitting it relatively evenly. They predominantly pick a cloud provider. Some will just pick one. Others will pick one and then do a little bit with a second, just so they know they can run with a second provider, in case that relationship with the one they choose to predominantly run with goes sideways in some fashion. But when you really look at it, CIOs are not making that decision to split it up relatively evenly, because it makes their development teams much less capable and much less agile.
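The buying-power point above is easy to quantify with a toy tiered-discount model. The tiers and percentages here are invented purely for illustration; they are not actual AWS (or any provider's) pricing.

```python
def discounted_cost(annual_spend):
    """Toy volume-discount schedule: larger commitments earn larger discounts.
    The tiers and rates are hypothetical, for illustration only."""
    if annual_spend >= 1_000_000:
        rate = 0.20
    elif annual_spend >= 500_000:
        rate = 0.10
    else:
        rate = 0.05
    return annual_spend * (1 - rate)

# Same total spend, two purchasing strategies.
consolidated = discounted_cost(1_000_000)   # all spend with one provider
split = 2 * discounted_cost(500_000)        # spend split evenly across two
print(consolidated, split)  # 800000.0 900000.0
```

Splitting the same spend evenly lands each provider in a lower discount tier, so the total bill comes out higher, which is the third reason given in the answer.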
>> Okay, let's shift gears a little bit, talk about a subject that's on the minds of not just enterprises but startups and government organizations and pretty much every organization we talk to. And that's AI and machine learning. At Reinvent, we introduced our Amazon AI services, and just this morning Werner announced the general availability of Amazon Lex. So where are we overall on machine learning? >> Well, it's a hugely exciting opportunity for customers, and I think, we believe, it's exciting for us as well. And it's still in the relatively early stages, if you look at how people are using it, but it's something that we passionately believe is going to make a huge difference in the world and a huge difference with customers, and that we're investing a pretty gigantic amount of resource and capability in for our customers. And I think the way that we think about, at a high level, the machine learning and deep learning spaces is, you know, there's kind of three macro layers of the stack. I think at that bottom layer, it's generally for the expert machine learning practitioners, of which there are relatively few in the world. It's a scarce resource relative to what I think will be the case in five, 10 years from now. And these are folks who are comfortable working with deep learning engines, know how to build models, know how to tune those models, know how to do inference, know how to get that data from the models into production apps. And for that group of people, if you look at the vast majority of machine learning and deep learning that's being done in the cloud today, it's being done on top of AWS, our P2 instances, which are optimized for deep learning, and our deep learning AMIs, which package, effectively, the deep learning engines and libraries inside those AMIs.
And you see companies like Netflix, Nvidia, and Pinterest and Stanford and a whole bunch of others that are doing significant amounts of machine learning on top of those optimized instances for machine learning and the deep learning AMIs. And I think that you can expect, over time, that we'll continue to build additional capabilities and tools for those expert practitioners. I think we will support, and do support, every single one of the deep learning engines on top of AWS, and we have a significant amount of those workloads with all those engines running on top of AWS today. We also are making, I would say, a disproportionate investment of our own resources in the MXNet community, just because if you look at running deep learning models, once you get beyond a few GPUs, it's pretty difficult to have those scale as you get into the hundreds of GPUs. And most of the deep learning engines don't scale very well horizontally. And so what we've found through a lot of extensive testing, 'cause remember, Amazon has thousands of deep learning experts inside the company that have built very sophisticated deep learning capabilities, like the ones you see in Alexa, we have found that MXNet scales the best, and almost linearly, as we continue to add nodes, as we continue to horizontally scale. So we have a lot of investment at that bottom layer of the stack. Now, if you think about most companies with developers, it's still largely inaccessible to them to do the type of machine learning and deep learning that they'd really like to do. And that's because the tools, I think, are still too primitive. And there's a number of services out there, we built one ourselves in Amazon Machine Learning that we have a lot of customers use, and yet I would argue that all of those services, including our own, are still more difficult than they should be for everyday developers to be able to build machine learning and access machine learning and deep learning.
And if you look at the history of what AWS has done, in every part of our business, a lot of what's driven us is trying to democratize technologies that were really only available and accessible before to a select, small number of companies. And so we're doing a lot of work at what I would call that middle layer of the stack to get rid of a lot of the muck associated with, you know, building the models, tuning the models, doing the inference, figuring out how to get the data into production apps, a lot of those capabilities at that middle layer that we think are really essential to allow deep learning and machine learning to reach its full potential. And then at the top layer of the stack, we think of those as solutions. And those are things like, pass me an image and I'll tell you what that image is, or show me this face, does it match faces in this group of faces, or pass me a string of text and I'll give you an MP3 file, or give me some words and what your intent is and then I'll be able to return answers that allow people to build conversational apps, like the Lex technology. And we have a whole bunch of other services coming in that area, on top of Lex and Polly and Rekognition, and you can imagine some of those that we've had to use in Amazon over the years that we'll continue to make available for you, our customers. So very significant level of investment at all three layers of that stack. We think it's relatively early days in the space but have a lot of passion and excitement for that. >> Okay, now for ML and AI, we're seeing customers wanting to load in tons of data, both to train the models and to actually process data once they've built their models. And then outside of ML and AI, we're seeing just as much demand to move in data for analytics and traditional workloads. So as people are looking to move more and more data to the cloud, how are we thinking about making it easier to get data in? >> It's a great question.
And I think it's actually an often overlooked question because a lot of what gets attention with customers is all the really interesting services that allow you to do everything from compute and storage and database and messaging and analytics and machine learning and AI. But at the end of the day, if you have a significant amount of data already somewhere else, you have to get it into the cloud to be able to take advantage of all these capabilities that you don't have on premises. And so we have spent a disproportionate amount of focus over the last few years trying to build capabilities for our customers to make this easier. And we have a set of capabilities that really is not close to matched anywhere else, in part because we have so many customers who are asking for help in this area that it's, you know, that's really what drives what we build. So of course, you could use the good old-fashioned wire to send data over the internet. Increasingly, we find customers that are trying to move large amounts of data into S3, is using our S3 transfer acceleration service, which basically uses our points of presence, or POPs, all over the world to expedite delivery into S3. You know, a few years ago, we were talking to a number of companies that were looking to make big shifts to the cloud, and they said, well, I need to move lots of data that just isn't viable for me to move it over the wire, given the connection we can assign to it. It's why we built Snowball. And so we launched Snowball a couple years ago, which is really, it's a 50 terabyte appliance that is encrypted, the data's encrypted three different ways, and you ingest the data from your data center into Snowball, it has a Kindle connected to it, it allows you to, you know, that makes sure that you send it to the right place, and you can also track the progress of your high-speed ingestion into our data centers. 
And when we first launched Snowball, we launched it at Reinvent a couple years ago, I could not believe that we were going to order as many Snowballs to start with as the team wanted to order. And in fact, I reproached the team and I said, this is way too much, why don't we first see if people actually use any of these Snowballs. And so the team thankfully didn't listen very carefully to that, and they really only pared back a little bit. And then it turned out that we, almost from the get-go, had ordered 10X too few. And so this has been something that people have used in a very broad, pervasive way all over the world. And last year, at the beginning of the year, as we were asking people what else they would like us to build in Snowball, customers told us a few things that were pretty interesting to us. First, one that wasn't that surprising was they said, well, it would be great if they were bigger, you know, if instead of 50 terabytes it was more data I could store on each device. Then they said, you know, one of the problems is when I load the data onto a Snowball and send it to you, I have to still keep my local copy on premises until it's ingested, cause I can't risk losing that data. So they said it would be great if you could find a way to provide clustering, so that I don't have to keep that copy on premises. That was pretty interesting. And then they said, you know, there's some of that data that I'd actually like to be loading synchronously to S3, and then, or some things back from S3 to that data that I may want to compare against. That was interesting, having that endpoint. And then they said, well, we'd really love it if there was some compute on those Snowballs so I can do analytics on some relatively short-term signals that I want to take action on right away. Those were really the pieces of feedback that informed Snowball Edge, which is the next version of Snowball that we launched, announced at Reinvent this past November. 
So it's a hundred-terabyte appliance, still with the same level of encryption, and it has clustering so that you don't have to keep that copy of the data local. It allows you to have an endpoint to S3 to synchronously load data back and forth, and it has compute inside of it. And so it allows customers to use these on premises. I'll give you a good example. GE is using these for their wind turbines. And they collect all kinds of data from those turbines, but there are certain short-term signals they want to do analytics on in as close to real time as they can, and take action on. And so they use that compute to do the analytics, and then when they fill up that Snowball Edge, they detach it and send it back to AWS to do broad-scale analytics in the cloud, and just start using an additional Snowball Edge to capture that short-term data and be able to do those analytics. So Snowball Edge, you know, we just launched it a couple months ago, and again, I'm amazed at the type of response, how many customers are starting to deploy those all over the place. I think if you have exabytes of data that you need to move, it's not so easy. An exabyte of data, if you wanted to move it from on premises to AWS, would require 10,000 Snowball Edges. Those customers don't want to manage a fleet of 10,000 Snowball Edges if they don't have to. And so we tried to figure out how to solve that problem, and it's why we launched Snowmobile back at Reinvent in November, which is effectively a hundred-petabyte container on a 45-foot trailer that we will take a truck and bring out to your facility. It comes with its own power and its own network fiber that we plug in to your data center. And if you want to move an exabyte of data over a 10 gigabit per second connection, it would take you 26 years. But using 10 Snowmobiles, it would take you six months. So it's a really different level of scale.
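A quick back-of-the-envelope check of the transfer figures quoted above. This assumes decimal units (1 EB = 10^18 bytes) and a fully saturated link with no protocol overhead, which is why it lands a touch under the "26 years" cited in the talk:

```python
# Back-of-the-envelope check of the exabyte-scale transfer figures.
# Assumes decimal units (1 EB = 10**18 bytes) and a perfectly saturated link.

EXABYTE_BYTES = 10**18
LINK_BPS = 10 * 10**9              # 10 gigabits per second
SECONDS_PER_YEAR = 365 * 24 * 3600

years_over_wire = EXABYTE_BYTES * 8 / LINK_BPS / SECONDS_PER_YEAR
print(f"1 EB over a 10 Gbps link: ~{years_over_wire:.0f} years")

# Moving the same exabyte physically instead:
snowball_edges = EXABYTE_BYTES // (100 * 10**12)   # 100 TB per Snowball Edge
snowmobiles = EXABYTE_BYTES // (100 * 10**15)      # 100 PB per Snowmobile
print(f"{snowball_edges:,} Snowball Edges or {snowmobiles} Snowmobiles")
```

The arithmetic reproduces the numbers in the talk: about a quarter century over the wire, versus 10,000 Snowball Edges or 10 Snowmobiles for the same exabyte.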
And you'd be surprised how many companies have exabytes of data at this point that they want to move to the cloud to get all those analytics and machine learning capabilities running on top of them. Then for streaming data, as we have more and more companies doing real-time analytics of streaming data, we have Kinesis, where we built something called Kinesis Firehose that makes it really simple to stream all your real-time data. We have a storage gateway for companies that want to keep certain data hot, locally, and then asynchronously load the rest of their data to AWS to be able to use in different formats, should they need it as backup or should they choose to make a transition. So it's a very broad set of storage capabilities. And then of course, if you've moved a lot of data into the cloud or into anything, you realize that one of the hardest parts, which people often leave to the end, is ETL. And so we have announced an ETL service called Glue, which we announced at Reinvent, which is going to make it much easier to find your data, map your data to different locations, and do ETL, which of course is hugely important as you're moving large amounts of data. >> So we've talked a lot about moving things to the cloud, moving applications, moving data. But let's shift gears a little bit and talk about something not on the cloud, connected devices. >> Yeah. >> Where do they fit in and how do you think about edge? >> Well, you know, I've been working on AWS since the start of AWS, and we've been in the market for a little over 11 years at this point. And we have encountered, as I'm sure all of you have, many buzzwords. And of all the buzzwords that everybody has talked about, I think I can make a pretty strong argument that the one that has delivered fastest on its promise has been IOT and connected devices. It's just amazing to me how much is happening at the edge today and how fast that's changing with device manufacturers.
And I think that if you look out 10 years from now, when you talk about hybrid, I think for most companies, the majority of the on-premises piece of hybrid will not be servers, it will be connected devices. There are going to be billions of devices all over the place, in your home, in your office, in factories, in oil fields, in agricultural fields, on ships, in cars, in planes, everywhere. You're going to have these assets that sit at the edge that companies are going to want to be able to collect data on, do analytics on, and then take action. And if you think about it, most of these devices, by their very nature, have relatively little CPU and relatively little disk, which makes the cloud disproportionately important for supplementing them. It's why you see most of the big, successful IOT applications today using AWS to supplement them. Illumina has hooked up their genome sequencing to AWS to do analytics, or you can look at Major League Baseball Statcast, an IOT application built on top of AWS, or John Deere, which has over 200,000 telematically enabled tractors that are collecting real-time planting conditions and information that they're doing analytics on and sending back to farmers so they can figure out where and how to optimally plant. Tata Motors manages their truck fleet this way. Philips has their smart lighting project. I mean, there are innumerable examples of these IOT applications built on top of AWS where the cloud is supplementing the device's capability. But when you think about these becoming more mission-critical applications for companies, there are going to be certain functions and certain conditions under which they're not going to want to connect back to the cloud. They're not going to want to take the time for that round trip. They're not going to have connectivity in some cases to be able to make a round trip to the cloud.
And what customers really want is the same capabilities they have on AWS, with AWS IOT, but on the devices themselves. And if you've ever tried to develop on these embedded devices, it's not for mere mortals. It's pretty delicate and it's pretty scary and there's a lot of archaic protocols associated with it, and it's pretty tough to do it all without taking down your application. And so what we did was we built something called Greengrass, and we announced it at Reinvent. And Greengrass is really like a software module that you can effectively have inside your device. It's got Lambda inside of it, and it allows customers to write lambda functions, some of which they want to run in the cloud, some of which they want to run on the device itself through Greengrass. So they have a common programming model to build those functions, to take the signals they see and take the actions they want to take against them, which is really going to help, I think, across all these IOT devices, to be much more flexible and allow the devices and the analytics and the actions you take to be much smarter and more intelligent. It's also why we built Snowball Edge. Snowball Edge, if you think about it, is really a purpose-built Greengrass device. We have Greengrass inside of the Snowball Edge, and you know, the GE wind turbine example is a good example of that. And so, to us, I think it's the future of what the on-premises piece of hybrid's going to be. I think there are going to be billions of devices all over the place and people are going to want to interact with them with a common programming model like they use in AWS and the cloud, and we're continuing to invest very significantly to make that easier and easier for companies. >> We've talked about several future directions. We talked about AI, machine learning, the edge.
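The Lambda-style programming model described above can be sketched in miniature. The handler below is hypothetical (the event shape, field names, and threshold are invented for illustration, not taken from Greengrass documentation), but it shows the idea: the same function signature you would deploy to Lambda in the cloud, run locally against short-term device signals so action can be taken without a cloud round trip.

```python
# Hypothetical Greengrass-style Lambda handler. The "vibration" field and
# the threshold below are invented for illustration only.

VIBRATION_LIMIT = 0.8  # normalized vibration level that triggers local action


def handler(event, context=None):
    """Inspect a local turbine reading and decide, on-device, what to do."""
    vibration = event.get("vibration", 0.0)
    if vibration > VIBRATION_LIMIT:
        # Short-term signal: act locally, no cloud round trip required.
        return {"action": "feather_blades", "vibration": vibration}
    # Routine reading: queue it for later bulk analytics in the cloud.
    return {"action": "buffer_for_upload", "vibration": vibration}


# Simulated local invocations, standing in for the Greengrass core:
print(handler({"vibration": 0.95}))
print(handler({"vibration": 0.20}))
```

The point of the common programming model is that this same handler could run in the cloud or on the device; only the deployment target changes, not the code.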
What are some of the other areas of investment that this group should care about? >> Well there's a lot. (laughs) That's not a suit question, Ariel. But there's a lot. I think I'll name a few. I think first of all, as I alluded to earlier, we are not close to being done expanding geographically. I think virtually every tier-one country will have an AWS region over time. I think many of the emerging countries will as well. I think the database space is an area that is radically changing. It's happening at a faster pace than I think people sometimes realize. And I think it's good news for all of you. I think the database space over the last few decades has been a lonely place for customers. I think that they have felt particularly locked into companies that are expensive and proprietary and have high degrees of lock-in and aren't so customer-friendly. And I think customers are sick of it. And we have a relational database service that we launched many years ago that has many flavors you can run. You can run MySQL, you can run Postgres, you can run MariaDB, you can run SQL Server, you can run Oracle. And what a lot of our customers kept saying to us was, could you please figure out a way to have a database capability that has the performance characteristics of the commercial-grade databases but the customer-friendliness and pricing model of the more open engines like MySQL and Postgres and MariaDB. You can do that on your own, and we do a lot of it at Amazon, but it's hard, I mean, it takes a lot of work and a lot of tuning. And our customers really wanted us to solve that problem for them. And it's why we spent several years building Aurora, which is our own database engine that we built, but that's fully compatible with MySQL and with Postgres. It's at least as fault tolerant and durable and performant as the commercial-grade databases, but it's a tenth of the cost of those.
And it's also nice because if it turns out that you use Aurora and you decide for whatever reason you don't want to use Aurora anymore, because it's fully compatible with MySQL and Postgres, you just dump it to the community versions of those, and off you go. So there's really hardly any transition there. So that is the fastest-growing service in the history of AWS. I'm amazed at how quickly it's grown. I think you may have heard earlier, we've had 23,000 database migrations just in the last year or so. There's a lot of pent-up demand to have database freedom. And we're here to help you have it. You know, I think on the analytics side, it's just never been easier and less expensive to collect, store, analyze, and share data than it is today. Part of that has to do with the economics of the cloud. But a lot of it has to do with the really broad analytics capability that we provide you. And it's a much broader capability than you'll find elsewhere. And you know, you can manage Hadoop and Spark and Presto and Hive and Pig and Yarn on top of AWS, or we have a managed Elasticsearch service, and you know, of course we have a very high scale, very high performing data warehouse in Redshift, which just got even more performant with Spectrum, which can now query across all of your S3 data, and of course you have Athena, where you can query S3 directly. We have a service that allows you to do real-time analytics of streaming data in Kinesis. We have a business intelligence service in QuickSight. We have a number of machine learning capabilities I talked about earlier. It's a very broad array. And what we find is that it's a new day in analytics for companies. A lot of the data that companies felt like they had to throw away before, either because it was too expensive to hold or they didn't really have the tools accessible to them to get the learning from that data, it's a totally different day today.
And so we have a pretty big investment in that space, I mentioned Glue earlier to do ETL on all that data. We have a lot more coming in that space. I think compute is super interesting. You know, I think we will find that companies will use full instances for many, many years, and we have, you know, more than double the number of instances as you'll find elsewhere, in every imaginable shape and size. But I would also say that the trend we see is that more and more companies are using smaller units of compute, and it's why you see containers becoming so popular. We have a really big business in ECS. And we will continue to build out the capability there. We have companies running virtually every type of container and orchestration and management service on top of AWS at this point. And then of course, a couple years ago, we pioneered the event-driven serverless capability in compute that we call Lambda, which I'm, again, blown away by how many customers are using for everything, in every way. So I think the basic unit of compute is continuing to get smaller. I think that's really good for customers. I think the ability to be serverless is a very exciting proposition, and we're continuing to fulfill that vision we laid out a couple years ago. And then, probably, the last thing I'd point out right now is, I think it's really interesting to see how the basic procurement of software is changing, in significant part driven by what we've been doing with our Marketplace. If you think about it, in the old world, if you were a company that was buying software, you'd have to go find a bunch of the companies that you should consider, you'd have to have a lot of conversations, you'd have to talk to a lot of salespeople.
Those companies, by the way, have to have a big sales team and an expensive marketing budget to go find those companies and then go sell those companies, and then both companies engage in this long tap-dance around doing an agreement and the legal terms and the legal teams, and it's just, the process is very arduous. Then after you buy it, you have to figure out how you're actually going to package it, how you're going to deploy it to infrastructure and get it done, and it's just, I think in general, both consumers of software and sellers of software really don't like the process that's existed over the last few decades. And then you look at AWS Marketplace, and we have 3,500 product listings in there from 1,200 technology providers. If you look at the number of hours of that software that's been running on EC2 just in the last month alone, it's several hundred million EC2 hours of that software being run on top of our Marketplace. And it's just completely changing how software is bought and procured. I think that if you talk to a lot of the big sellers of software, like Splunk or Trend Micro, there's a whole number of them, they'll tell you it totally changes their ability to sell. You know, one of the things that really helped AWS in the early days, and still continues to help us, is that we have a self-service model where we don't actually have to have a lot of people talk to every customer to get started. I think if you're a seller of software, that's very appealing, to allow people to find your software and be able to buy it. And if you're a consumer, to be able to buy it quickly, again, without the hassle of all those conversations and the overhead associated with that, very appealing. And I think it's why the Marketplace has just exploded and taken off like it has. It's also really good, by the way, for systems integrators, who are often packaging things on top of that software for their clients.
This makes it much easier to build kind of smaller catalogs of software products for their customers. And I think when you layer on top of that the capabilities we've announced to make it easier for SaaS providers to meter and to do billing and to do identity, it's just a very different world. And so I think that also is very exciting, both for companies and customers as well as software providers. >> We certainly touched on a lot here. And we have a lot going on, and you know, while we have customers asking us a lot about how they can use all these new services and new features, we also tend to get a lot of questions from customers on how we innovate so quickly, and how they can think about applying some of those lessons learned to their own businesses. >> So you're asking how we're able to innovate quickly? >> Mmm hmm. >> I think there's a few things that have helped us, and it's different for every company. But some of these might be helpful. I'll point to a few. I think the first thing is, I think we disproportionately index on hiring builders. And we think of builders as people who are inventors, people who look at different customer experiences really critically, are honest about what's flawed about them, and then seek to reinvent them. And then people who understand that launch is the starting line and not the finish line. There's very little that any of us ever built that's a home run right out of the gate. And so most things that succeed take a lot of listening to customers and a lot of experimentation and a lot of iterating before you get to an equation that really works. So the first thing is who we hire. I think the second thing is how we organize. And we have, at Amazon, long tried to organize into as small and separable and autonomous teams as we can, teams that have all the resources to own their own destiny. And so for instance, the technologists and the product managers are part of the same team.
And a lot of that is because we don't want the finger pointing that goes back and forth between teams; if they're on the same team, they focus all their energy on owning it together and understanding what customers need from them, spending a disproportionate amount of time with customers, and then they get to own their own roadmaps. One of the reasons we don't publish a 12 to 18 month roadmap is we want those teams to have the freedom, in talking to customers and listening to what you tell us matters, to re-prioritize if there are certain things that we assumed mattered more than it turns out they do. So, you know, I think the way that we organize is the second piece. I think a third piece is that all of our teams get to use the same AWS building blocks that all of you get to use, which allows them to move much more quickly. And I think one of the least told stories about Amazon over the last five years, in part because people have gotten interested in AWS, is that people have missed how fast our consumer business at Amazon has iterated. Look at the amount of invention in Amazon's consumer business. And they'll tell you that a big piece of that is their ability to use the AWS building blocks like they do. I think a fourth thing is that at many big companies, as they get larger, what starts to happen is what people call the institutional no, which is that leaders walk into meetings on new ideas looking to find ways to say no, and not because they're ill intended but just because they get more conservative, or they have a lot on their plate, or things are managed very centrally, so it's hard to imagine adding more to what they're already doing. At Amazon, it's really the opposite, in part because of the way we're organized in such a decoupled, decentralized fashion, and in part because it's just part of our DNA. When the leaders walk into a meeting, they are looking for ways to say yes. And we don't say yes to everything, we have a lot of proposals.
But we say yes to a lot more than I think virtually any other company on the planet. And when we're having conversations with builders who are proposing new ideas, we're in a mode where we're trying to problem-solve with them to get to yes, which I think is really different. And then I think the last thing is that we have mechanisms inside the company that allow us to make fast decisions. And if you want a little bit more detail, you should read our founder and CEO Jeff Bezos's shareholder letter, which was just released. He talks about the fast decision-making that happens inside the company. It's really true. We make fast decisions and we're willing to fail. And you know, we sometimes talk about how we're working on several of our next biggest failures, and we hope that most of the things we're doing aren't going to fail, but we know, if you're going to push the envelope and if you're going to experiment at the rate that we're trying to experiment, to find more pillars that allow us to do more for customers and allow us to be more relevant, you are going to fail sometimes. And you have to accept that, and you have to have a way of evaluating people that recognizes the inputs, meaning the things that they actually delivered, as opposed to the outputs, cause on new ventures, you don't know what the outputs are going to be, you don't know how consumers or customers are going to respond to the new thing you're trying to build. So you have to be able to reward employees on the inputs, and you have to have a way for them to continue to progress and grow in their career even if they work on something that didn't work. And you have to have a way of thinking about, when things don't work, how do I take the technology that I built as part of that, which really actually does work even though I didn't get the form factor right, and use it for other things.
And I think that when you think about a culture like Amazon's, that disproportionately hires builders, organizes into these separable, autonomous teams, allows them to use building blocks to move fast, and has a leadership team that's looking to say yes to ideas and is willing to fail, you end up finding that not only do you do more inventing, but you get people at every level of the organization spending their free cycles thinking about new ideas, because it actually pays to think of new ideas, cause you get a shot to try them. And so that has really helped us, and I think most of our customers who have made significant shifts to AWS and the cloud would argue that that's one of the big transformational things they've seen in their companies as well. >> Okay. I want to go a little bit deeper on the subject of culture. What are some of the things that are most unique about the AWS culture that companies should know about when they're looking to partner with us? >> Well, I think if you're making a decision on a predominant infrastructure provider, it's really important that you decide that the culture of the company you're going to partner with is a fit for yours. And you know, it's a super important decision that you don't want to have to redo multiple times, cause it's wasted effort. And I think that, look, I've been at Amazon for almost 20 years at this point, so I have obviously drunk the Kool-Aid. But there are a few things that I think are truly unique about Amazon's culture. I'll talk about three of them. The first is I think that we are unusually customer-oriented. And I think a lot of companies talk about being customer-oriented, but few actually are. I think most of the big technology companies truthfully are competitor-focused. They kind of look at what competitors are doing and then they try to one-up one another. You have one or two of them that I would say are product-focused, where they say, hey, it's great, you Mr. and Mrs.
Customer have ideas on a product, but leave that to the experts, and you know, you'll like the products we're going to build. And those strategies can be good ones and successful ones, they're just not ours. We are driven by what customers tell us matters to them. We don't build technology for technology's sake, we don't become, you know, smitten by any one technology. We're trying to solve real problems for our customers. 90% of what we build is driven by what you tell us matters. And the other 10% is listening to you, and even if you can't articulate exactly what you want, trying to read between the lines and invent on your behalf. So that's the first thing. The second thing is that we are pioneers. We really like to invent, as I was talking about earlier. And I think most big technology companies at this point have either lost their will or their DNA to invent. Most of them acquire it or fast follow. And again, that can be a successful strategy. It's just not ours. I think in this day and age, where we're going through as big a shift as we are in the cloud, which is the biggest technology shift in our lifetime, as dynamic as it is, being able to partner with a company that has the most functionality, is iterating the fastest, has the most customers, has the largest ecosystem of partners, has SIs and ISVs, and that has had a vision for how all these pieces fit together from the start, instead of trying to patch them together in a following act, gives you a big advantage. I think that the third thing is that we're unusually long-term oriented. And I think that you won't ever see us show up at your door the last day of a quarter or the last day of a year, trying to harass you into doing some kind of deal with us, not to be heard from again for a couple years when we either audit you or try to re-up you for a deal. That's just not the way that we will ever operate. We are trying to build a business, a set of relationships, that will outlast all of us here.
And I think something that always ties it together well is this Trusted Advisor capability that we have inside our support function, where, you know, we look at dozens of programmatic ways that our customers are using the platform and reach out to you if you're doing something we think is suboptimal. And one of the things we do is, if you're not fully utilizing resources, or are hardly using them, or not using them at all, we'll reach out and say, hey, you should stop paying for this. And over the last couple of years, we've sent out a couple million of these notifications that have led to actual annualized savings for customers of 350 million dollars. So I ask you, how many of your technology partners reach out to you and say stop spending money with us? To the tune of 350 million dollars of lost revenue per year? Not too many. And I think when we first started doing it, people thought it was gimmicky, but if you understand what I just talked about with regard to our culture, it makes perfect sense. We don't want to make money from customers unless you're getting value. We want to reinvent an experience that we think has been broken for the prior few decades. And then we're trying to build a relationship with you that outlasts all of us, and we think the best way to do that is to provide value and do right by customers over a long period of time. >> Okay, to keep going on the culture subject, what about some of the quirky things about Amazon's culture that people might find interesting or useful? >> Well there are a lot of quirky parts to our culture. And I think, you know, lots of companies who have strong cultures will argue they have quirky pieces, but there's a few I might point to. You know, I think the first would be that for the first several years I was with the company, I guess the first six years or so, like most companies, all the information that was presented was via PowerPoint.
And we would find that it was a very inefficient way to consume information. You know, you were often swayed by the charisma of the presenter; sometimes you would overweight what the presenter said based on whether they were a good presenter, and vice versa. You would very rarely have a deep conversation, cause you have no room on PowerPoint slides to have any depth. You would interrupt the presenter constantly with questions that they hadn't really thought through, cause they didn't think they were going to have to present at that level of depth. You constantly have the, you know, you'd ask the question, oh, I'm going to get to that in five slides, you want to do that now or you want to do that in five slides, you know, it was just maddening. And we would often find that most of the meetings required multiple meetings. And so we made a decision as a company to effectively ban PowerPoint as a communication vehicle inside the company. Really the only time I do PowerPoint is at Reinvent. And maybe that shows. And what we found is that it's a much more substantive and effective and time-efficient way to have conversations, because there is no way to fake depth in a six-page narrative. So what we went to from PowerPoint was the six-page narrative. You can have as much as you want in the appendices, but you have to assume nobody will read the appendices. Everything you have to communicate has to be done in six pages. You can't fake depth in a six-page narrative. And so what we do is we all get to the room, we spend 20 minutes or so reading the document so it's fresh in everybody's head, and then we start the conversation in a radically different spot than when you're hearing a presentation one kind of shallow slide at a time. We all start the conversation with a fair bit of depth on the topic, and we can really hone in on the three or four issues that typically matter in each of these conversations.
So we get to the heart of the matter and we can have one meeting on the topic instead of three or four. So that has been really, I mean, it's unusual and it takes some getting used to, but it is a much more effective way to pay attention to the detail and have a substantive conversation. You know, I think a second thing, if you look at our working backwards process: we don't write a lot of code for any of our services until we write and refine and decide we have a crisp press release and frequently asked questions, or FAQ, for that product. And in the press release, what we're trying to do is make sure that we're building a product that has benefits that will really matter. How many times have we all gotten to the end of building a product and, by the time we get there, we kind of think about what we're launching and think, this is not that interesting, like, people are not going to find this that compelling? And it's because you just haven't thought through and argued and debated and made sure that you drew the line in the right spot on a set of benefits that will really matter to customers. So that's why we use the press release. The FAQ is to really have the arguments up front about how you're building the product. So what technology are you using? What's the architecture? What's the customer experience? What does the UI look like? What are the pricing dimensions? Are you going to charge for it or not? All of those decisions, what are people going to be most excited about, what are people going to be most disappointed by? All those conversations, if you have them up front, even if it takes you a few times to go through it, you can just let the teams build, and you don't have to check in with them except on the dates. And so we find that if we take the time up front, we not only get the products right more often, but the teams also deliver much more quickly and with much less churn.
And then the third thing I'd say that's kind of quirky is that it is an unusually truth-seeking culture at Amazon. We have a leadership principle that we call have backbone, disagree, and commit. And what it means is that we really expect people to speak up if they believe that we're headed down a path that's wrong for customers, no matter who is advancing it or at what level in the company; everybody is empowered and expected to speak up. And then once we have the debate, we all have to pull the same way, even if it's a different way than you were advocating. And you always hear the old adage where two people look at a ceiling and one person says it's 14 feet and the other says it's 10 feet, and they say, okay, let's compromise, it's 12 feet. And of course, it's not 12 feet, there is an answer. And not everything we consider has a black-and-white answer, but most things have an answer that really is more right if you actually assess it and debate it. And so we have an environment that really empowers people to challenge one another, and I think it's part of why we end up getting to better answers, cause we have that level of openness and rigor. >> Okay, well Andy, we have time for one more question. >> Okay. >> So other than some of the things you've talked about, like customer focus, innovation, and long-term orientation, what is the single most important lesson that you've learned that is really relevant to this audience and this time we're living in? >> There's a lot. But I'll pick one. I would say I'll tell a short story that I think captures it. In the early days at Amazon, our sole business was what we called an owned inventory retail business, which meant we bought the inventory from distributors or publishers or manufacturers, stored it in our own fulfillment centers, and shipped it to customers. And around 1999 or 2000, this third party seller model started becoming very popular.
You know, these were companies like Half.com and eBay and folks like that. And we had a really animated debate inside the company about whether we should allow third party sellers to sell on the Amazon site. And the concerns internally were, first of all, we just had this fundamental belief that other sellers weren't going to care as much about the customer experience as we did cause it was such a central part of everything we did DNA-wise. And then also we had this entire business and all this machinery that was built around owned inventory business, with all these relationships with publishers and distributors and manufacturers, who we didn't think would necessarily like third party sellers selling right alongside us having bought their products. And so we really debated this, and we ultimately decided that we were going to allow third party sellers to sell in our marketplace. And we made that decision in part because it was better for customers, it allowed them to have lower prices, so more price variety and better selection. But also in significant part because we realized you can't fight gravity. If something is going to happen, whether you want it to happen or not, it is going to happen. And you are much better off cannibalizing yourself or being ahead of whatever direction the world is headed than you are at howling at the wind or wishing it away or trying to put up blockers and find a way to delay moving to the model that is really most successful and has the most amount of benefits for the customers in question. And that turned out to be a really important lesson for Amazon as a company and for me, personally, as well. You know, in the early days of doing Marketplace, we had all kinds of folks, even after we made the decision, that despite the have backbone, disagree and commit weren't really sure that they believed that it was going to be a successful decision. 
And it took several months, but thankfully we really were vigilant about it, and today roughly half of the units we sell in our retail business are third party seller units. Been really good for our customers. And really good for our business as well. And I think the same thing is really applicable to the space we're talking about today, to the cloud, as you think about this gigantic shift that's going on right now, moving to the cloud, which is, you know, I think in the early days of the cloud, the first, I'll call it six, seven, eight years, I think collectively we consumed so much energy with all these arguments about are people going to move to the cloud, what are they going to move to the cloud, will they move mission-critical applications to the cloud, will the enterprise adopt it, will public sector adopt it, what about private cloud, you know, we just consumed a huge amount of energy and it was, you can see both in the results in what's happening in businesses like ours, it was a form of fighting gravity. And today we don't really have if conversations anymore with our customers. They're all when and how and what order conversations. And I would say that this is going to be a much better world for all of us, because we will be able to build in a much more cost effective fashion, we will be able to build much more quickly, we'll be able to take our scarce resource of engineers and not spend their time on the undifferentiated heavy lifting of infrastructure and instead on what truly differentiates your business. And you'll have a global presence, so that you have lower latency and a better end user customer experience being deployed with your applications and infrastructure all over the world. And you'll be able to meet the data sovereignty requirements of various locales. 
So I think it's a great world that we're entering right now, I think we're at a time where there's a lot less confusion about where the world is headed, and I think it's an unprecedented opportunity for you to reinvent your businesses, reinvent your applications, and build capabilities for your customers and for your business that weren't easily possible before. And I hope you take advantage of it, and we'll be right here every step of the way to help you. Thank you very much. I appreciate it. (applause) >> Thank you, Andy. And thank you, everyone. I appreciate your time today. >> Thank you. (applause) (upbeat music)

Published Date : May 3 2017



Lowell Anderson, AWS - AWS Summit SF 2017 - #AWSSummit - #theCUBE


 

>> Narrator: Live from San Francisco, it's The Cube! Covering AWS Summit 2017, brought to you by Amazon Web Services. (upbeat music) >> Hi, welcome back to The Cube. We are live in San Francisco at the AWS Summit at Moscone Center. Really excited to be here. A tremendous amount of buzz going on. I'm Lisa Martin with my cohost George Gilbert and we're very excited to have Lowell Anderson, product marketing guru at AWS. Welcome back, Cube alumni! >> Lowell: It's great to be here, Lisa, thank you. >> Great to have you here as well. The keynote this morning was so energetic with Werner and Nextdoor is going to be on the program in a little bit. Over a thousand product launches last year. Not only are there superpowers now that AWS, I like that. You don't have a T-shirt, but maybe next time. But I think the word that I heard most today so far is customer. And I think that it's such a, and as AWS really talks about, it's a really differentiated way of thinking, of doing business. I'd love to understand what the products that were announced today. Walk us through some of the key highlights there. Customer logos were everywhere. So talk to us about how customers are influencing the development of the new services and products coming from AWS. >> Yeah, well, you know, for us, customers are always core to what drives our innovation. It's how we start, we start with what our customers want, and we work backwards from that to try to deliver a lot of the new features and services that we talked about today. And Werner covered a huge breadth of things, but they really fall into maybe four or five categories. He started talking about, directly for developers, talking about what we're doing with a product called CodeStar, which is designed to really help developers build and deploy software applications in the Cloud. 
He also then went and talked about our new marketplace, SaaS Contracts' capability, which makes it super easy for customers to sign up and purchase SaaS applications using multi-year contracts on AWS, but it also makes it easier for ISVs to make their offerings available for our customers. So again, really trying to make that easy for customers. We talked a lot about what we're doing in artificial intelligence, with the general availability of Amazon Lex today, and then a really entertaining video with Polly, where we saw that avatar speaking and the new whispering capability, so adding a lot more value to our suite of artificial intelligence services. Some exciting stuff in analytics, where we talked about Redshift Spectrum, which is the new search capability on Amazon Redshift that allows customers to search not just the data in their Redshift database, but also search all the unstructured data they have in S3. And then some really exciting announcements here on the database space with DynamoDB DAX, which is an accelerator for DynamoDB. And we also talked about the availability of a new version of Aurora for Postgres. So a lot of new capabilities, both in databases, big data, analytics, machine learning and artificial intelligence, and our offerings for SaaS Contracts as well. >> And that was all before lunch. (laughs) >> Lowell: Yeah, a lot of stuff. >> Lowell, following up on, in order of, let's say the comments on AI and the announcements made there. Microsoft, Google, Amazon all have gone beyond frameworks and tools to fully trained services that a normal developer can get their hands around. But in the areas of conversational user interface, natural language understanding, image recognition. Why do you think that those three vendors, the three vendors have been able to make such progress in those areas, to make that capability accessible, and there's so many other areas where we're still down in the weeds? 
>> I think there's, we sort of see it in, sort of focusing in maybe three different areas that are really targeted at what our customers are asking for. We have some very sophisticated customers who really want to build their own deep learning and machine learning applications, and they want services like MXNet, which is a highly scalable deep learning framework, that they can do and build these deep learning models. So there's a very sophisticated, targeted customer who wants that. But we also have customers that want to build and train and create prediction algorithms, and they use Amazon Machine Learning, which is a managed service which allows them to look at their past transactional data and build prediction models from it. And then the third piece is kind of what you mentioned, which is services that are really designed for the average developer, so they can really easily add capabilities like chatbots and speech and visual recognition to their applications with a simple API interface. And I think what you touched on is, why did we focus here, Well I think, as Andy also talked about today, that it's really early days in this space. And we're going to see a really, really strong amount of innovation here. And Amazon, which has been doing this for many, many years, and thousands of developers focused on this in our retail side, we're really working hard to bring that technology out, so that our customers can use it. And Lex, which is based on Alexa, which we're all familiar with from using the Echo. Bringing that out and making that type of capability available for average developers to use is a piece of that. So I think you're just going to continue to see that and over the course of the next year you're going to see continued new services coming from us on machine learning and artificial intelligence, across all those three spectrums. 
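The middle tier Lowell describes, building prediction models from past transactional data, can be illustrated with a toy one-variable least-squares fit in pure Python. The data points and the linear model are invented for illustration and are not Amazon Machine Learning's actual API; a real service would of course use far richer models.

```python
# Toy stand-in for "train a prediction model on past transactional data":
# a one-variable least-squares fit, computed in closed form.

def fit_line(xs, ys):
    # Returns (slope, intercept) minimizing squared error.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

def predict(model, x):
    slope, intercept = model
    return slope * x + intercept

# Invented "past transactions": (items in cart, dollars spent).
history_x = [1, 2, 3, 4, 5]
history_y = [12.0, 21.0, 33.0, 41.0, 52.0]
model = fit_line(history_x, history_y)
```

With these made-up points the fit comes out to a slope of 10 and an intercept of 1.8, so a six-item cart is predicted to spend about $61.80.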
>> So let me jump to another subject which is really a hot button for our customers, both on the vendor side and the enterprise side, which is the hybrid cloud, I don't know whether we should call it migration or journey or endpoint. But let's take a couple scenarios. Let's say you're a Hadoop customer, and you've got Cloudera on-prem, you're a big bank, you've put an instance of it on Amazon and on Azure so that you can move your data around and you're relatively free. >> Lowell: Sure. >> Now the big use case has been data warehouse offload. So all of a sudden you have two really great data warehouses that are best in class on Amazon. With Redshift, with now the significant expansion of it, and Snowflake, and then you have Teradata, which now can take their on-prem capabilities and put them on the Cloud. How does the customer weigh the cost/benefit of lowest common denominator versus-- >> Yeah, yeah, sure. I think for us and for our customers it's not a one-size-fits-all. Every customer approaches this differently, and so what our focus has been on is to give them the range of choice. So if you want to use Cloudera, you can deploy it on EC2 and you can manage that yourself, and that's going to work great for you. But if you want a more managed service where maybe you don't want to have to manage the scalability of that Cloudera deployment, maybe you want to use EMR and deploy your Hadoop applications on EMR which manages that scalability for you. And so you make those tradeoffs and each of our customers makes those tradeoffs for different reasons and in different ways and at different times. And so our focus has always been to really try to give them that flexibility, to give them services where they can make the choice themselves about which direction they want to go for their individual applications, and maybe mix it up and try different ways of running these types of applications. 
And so we have a full range of those types of, from the ability to deploy those directly onto EC2 and manage it themselves, all the way to fully managed services that we maintain all the scalability and management and monitoring ourselves. >> One of the interesting things that Andy Jassy said in his fireside chat just in the last hour or so about HyperCloud was that most enterprises are going to operate in HyperCloud for the next several years, and there are those customers that are going to have to, or want to have their own data centers for whatever type of application. But something also that he brought up in that context, and I know you know a lot about this, George, is VMware. So when I was looking at the announcement that was made in the last six months or so about VMware, vSphere-based cloud services, VMware has just sold off their vCloud Air, kind of competing product, wondering with the VMware Cloud on Amazon, how does that... what are really the primary drivers there? Is that sort of a way to take those VMware customers eventually towards hybrid cloud, or is that an opportunity to maybe compete with some of the other guys who might have more traction in the legacy application migration space? >> I think for us, it's again, it comes back to our customers saying, some of our workloads that maybe for a long period of time have been deployed on VMware and we've been using VMware ESX for many, many years on-premise, and we have these applications that have been deployed for many years there, and they're highly integrated, they use specific features of VMware, and maybe we also like using VMware's management tools, we like using vCloud to manage all of these different instances of our VMware virtualization platform, but we just want to run it in the Cloud, because we want that scalability. When you deploy that stuff on-premise, you're still kind of locked in. Every time you want to expand, you've got to go out and you've got to buy more hardware. 
You really don't have the agility to expand that business, both as it grows, or as it declines. So you're paying for that hardware to power it and run it no matter what. And so they're telling us we'd like to get some of this up into the Cloud, but we don't want necessarily to have to, we've built these apps, we're comfortable with how they're running them, but we want to run them up in the Cloud and we want to do it with low risk. And that's what this VMware relationship is about, is letting those enterprises that have spent years building and maintaining and using VMware and their various management tools, to do that up in the Cloud. That's really what it's about. >> So let's switch gears to another topic that Andy talked about, since all his topics were topical. Edge computing and IIoT. That's another big shift that's coming along and changing the architecture so we have more computing at the edge again, and huge amounts of data. Obviously there's many scenarios, but how do you think customers will basically think through this, or how should they think through how much analytics and capability is at the edge, that issue of should it look like what is in the Cloud? Or should it be really tight and light and embedded? >> I think we're seeing just an increasing range. And also a really interesting mix, where you have some very intelligent devices, your laptop and so on, that is connected to the Cloud and it has a pretty significant amount of processing power, and so there can be applications that run on that machine that are very sophisticated. But if we're going to start to expand that universe of edge devices out to simple sensors for pipelines, and simple ways to monitor the thermostat in your home, and simple ways to measure and monitor and track all sorts of, you know, automobiles and so on, that there's going to be a range of different on-premise or edge types of compute, that we need to support in the Cloud. 
And so I think what Andy's saying is that we want to build the Cloud to be the system that can act as the, has the analytics power to ingest data from these maybe tens of millions of different devices, which will have a range of different compute power, and support those applications on a case by case basis. >> We've got to wrap things up here, and I know this conversation could continue for many hours. I think what we've heard here today is a tremendous amount of innovation, and I made the joke, all announced before lunch, but really it was. We're seeing the flexibility, we're seeing the customers really drive the innovation. Also the fact that AWS starting in the startup space with the developers, that's still a very key target market for you, even as things go up to the enterprise. So continued best luck with everything going forward. We're excited to be at re:Invent in just, what, five or six months from now, and with many, many more thousands of people and hearing the great things that continue to come from the leader in public cloud. >> Lowell: All right. Thank you, Lisa. >> Thanks for joining us, Lowell, we appreciate it. Next time I want the superpower T-shirt. (laughs) >> (laughs) Okay, I'll take you up on that. >> All right. I'm Lisa Martin for my cohost George Gilbert. Thanks so much for watching, stick around. We are live at the AWS Summit in San Francisco, and we will be right back. (upbeat music)

Published Date : Apr 19 2017



Yuanhao Sun, Transwarp Technology - BigData SV 2017 - #BigDataSV - #theCUBE


 

>> Announcer: Live from San Jose, California, it's theCUBE, covering Big Data Silicon Valley 2017. (upbeat percussion music) >> Okay, welcome back everyone. Live here in Silicon Valley, San Jose, is the Big Data SV, Big Data Silicon Valley in conjunction with Strata Hadoop, this is theCUBE's exclusive coverage. Over the next two days, we've got wall-to-wall interviews with thought leaders, experts breaking down the future of big data, future of analytics, future of the cloud. I'm John Furrier with my co-host George Gilbert with Wikibon. Our next guest is Yuanhao Sun, who's the co-founder and CTO of Transwarp Technologies. Welcome to theCUBE. You were on, during the, 166 days ago, I noticed, on theCUBE, previously. But now you've got some news. So let's get the news out of the way. What are you guys announcing here, this week? >> Yes, so we are announcing 5.0, the latest version of Transwarp Hub. So in this version, we would call it probably a revolutionary product, because the first one is we embedded Kubernetes in our product, so we will allow people to isolate different kinds of workloads, using Docker and containers, and we also provide a scheduler to better support mixed workloads. And the second is, we are building a set of tools that allow people to build their warehouse. And then migrate from existing or traditional data warehouses to Hadoop. And we are also providing people the capability to build a data mart, actually. It allows you to interactively query data. So we build a column store in memory and on SSD. And we completely rewrote the whole SQL engine. It is a very tiny SQL engine that allows people to query data very quickly. And so today that tiny SQL engine is like about five to ten times faster than Spark 2.0. And we also allow people to build cubes on top of Hadoop. And then, once the cube is built, the SQL performance, like the TPC-H performance, is about 100 times faster than an existing database, or existing Spark 2.0. So it's super-fast. 
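The cube idea described above can be sketched in a few lines of Python: aggregate once, then answer repeated group-by queries with a dictionary lookup instead of a full scan. The fact rows and dimension names here are invented, and Transwarp's actual cube engine is of course far more sophisticated; this only illustrates why a pre-built cube answers aggregates so much faster.

```python
from collections import defaultdict

# Invented fact rows: (region, month, amount).
FACTS = [
    ("east", "2017-01", 120.0),
    ("east", "2017-01", 80.0),
    ("west", "2017-01", 50.0),
    ("east", "2017-02", 200.0),
]

def build_cube(rows):
    # One pass over the facts, computed once at load time.
    cube = defaultdict(float)
    for region, month, amount in rows:
        cube[(region, month)] += amount
    return dict(cube)

def scan_total(rows, region, month):
    # The slow path a query takes without a cube: scan every fact row.
    return sum(a for r, m, a in rows if r == region and m == month)

CUBE = build_cube(FACTS)
```

After the build, each aggregate query is an O(1) lookup in `CUBE` rather than an O(n) scan, which is the shape of the speedup being claimed.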
And in, actually we found a Paralect customer, so they replaced their data warehouse with our software, to build a data mart. And we already migrated, say, 100 reports from their old system to our product. So the promise is very good. And the third thing is we are providing tools for people to build machine learning pipelines, and we are leveraging TensorFlow, MXNet, and also Spark, for people to visualize the pipeline and to build the data mining workflows. So this is kind of like data science tools, it's very easy for people to use. >> John: Okay, so take a minute to explain, 'cus that was great, you got the performance there, that's the news out of the way. Take a minute to explain Transwarp, your value proposition, and when people engage you as a customer. >> Yuanhao: Yeah so, people choose our product and the major reason is our compatibility to Oracle, DB2, and Teradata SQL syntax, because you know, they have built a lot of applications onto those databases, so when they migrate to Hadoop, they don't want to rewrite the whole program, so our compatibility, SQL compatibility, is a big advantage to them, so this is the first one. And we also support full ACID and distributed transactions onto Hadoop. So that a lot of applications can be migrated to our product, with few modifications or without any changes. So this is our first advantage. The second is because we are providing an event-based streaming engine, that is actually derived from Spark. So we apply this technology to IoT applications. You know, in IoT, pretty soon they need a very low latency but they also need very complicated models on top of streams. So that's why we are providing full SQL support and machine learning support on top of streaming events. And we are also using event-driven technology to reduce the latency to five to ten milliseconds. So this is the second reason people choose our product. And then today we are announcing 5.0, and I think people will find more reason to choose our product. 
>> So you have the compatibility SQL, you have the tooling, and now you have the performance. So kind of the triple threat there. So what's the customer saying, when you go out and talk with your customers, what's the view of the current landscape for customers? What are they solving right now, what are the key challenges and pain points that customers have today? >> We have customers in more than 12 vertical segments, and in different verticals they have different pain points, actually so. Take one example: in financial services, the main pain point for them is to migrate existing legacy applications to Hadoop, you know they have accumulated a lot of data, and the performance is very bad using legacy database, so they need high performance Hadoop and Spark to speed up the performance, like reports. But in another vertical, like in logistic and transportation and IOT, the pain point is to find a very low latency streaming engine. At the same time, they need very complicated programming model to write their applications. And that example, like in public sector, they actually need very complicated and large scale search engine. They need to build analytical capability on top of search engine. They can search the results and analyze the result in the same time. >> George: Yuanhao, as always, whenever we get to interview you on theCube, you toss out these gems, sort of like you know diamonds, like big rocks that under millions of years, and incredible pressure, have been squeezed down into these incredibly valuable, kind of, you know, valuable, sort of minerals with lots of goodness in them, so I need you to unpack that diamond back into something that we can make sense out of, or I should say, that's more accessible. You've done something that none of the Hadoop Distro guys have managed to do, which is to build databases that are not just decision support, but can handle OLTP, can handle operational applications. 
You've done the streaming, you've done what even Databricks can't do without even trying any of the other stuff, which is getting the streaming down to an event at a time. Let's step back from all these amazing things, and tell us what was the secret sauce that let you build a platform this advanced? >> So actually, we are driven by our customers, and we do see the trends, people are looking for better solutions, you know, there is a lot of pain to set up a Hadoop cluster to use the Hadoop technology. So that's why we found it's very meaningful and also very necessary for us to build a SQL database on top of Hadoop. Quite a lot of customers on the FS side, they ask us to provide ACID so that transactions can be put on top of Hadoop, because they have to guarantee the consistency of their data. Otherwise they cannot use the technology. >> At the risk of interrupting, maybe you can tell us why others have built the analytic databases on top of Hadoop, to give the familiar SQL access, and obviously have a desire also to have transactions next to it, so you can inform a transaction decision with the analytics. One of the questions is, how did you combine the two capabilities? I mean it only took Oracle like 40 years. >> Right, so. Actually our transaction capability is only for analytics, you know, so this OLTP capability is not for short term transactional applications, it's for data warehouse kind of workloads. >> George: Okay, so when you're ingesting. >> Yes, when you're ingesting, when you modify your data in batch, you have to guarantee the consistency. So that's the OLTP capability. But we are also building another distributed storage and distributed database, and that is providing OLTP capability. That means you can do concurrent transactions on that database, but we are still developing that software right now. Today our product provides the distributed transaction capability for people to actually build their warehouse. 
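The batch-load consistency requirement Yuanhao describes can be sketched with `sqlite3` standing in for the warehouse: either every row of a changed batch lands, or none do. The table, rows, and failure condition here are invented; this is only the shape of the guarantee, not Transwarp's implementation.

```python
import sqlite3

def load_batch(conn, rows):
    # One transaction per batch: the `with conn` block commits on
    # success and rolls back if anything inside it raises.
    try:
        with conn:
            for cust_id, name in rows:
                if cust_id is None:
                    raise ValueError("bad row in batch")
                conn.execute(
                    "INSERT OR REPLACE INTO customers(id, name) VALUES (?, ?)",
                    (cust_id, name),
                )
        return True
    except ValueError:
        return False

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers(id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO customers VALUES (1, 'Acme')")
```

A batch containing a bad row leaves the table exactly as it was, even if earlier rows in the same batch had already been applied, which is the consistency property a transactional warehouse load needs.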
You know, quite a lot of people believe a data warehouse does not need transaction capability, but we found a lot of people modify their data in the data warehouse, you know, they are loading their data continuously to the data warehouse, like the CRM tables, customer information, they can be changed over time. So every day people need to update or change the data, that's why we have to provide transaction capability in the data warehouse. >> George: Okay, and then so then well tell us also, 'cus the streaming problem is, you know, we're told that roughly two thirds of Spark deployments use streaming as a workload. And the biggest knock on Spark is that it can't process one event at a time, you got to do a little batch. Tell us some of the use cases that can take advantage of doing one event at a time, and how you solved that problem? >> Yuanhao: Yeah so the first use case we encountered is the anti-fraud, or fraud detection application in FSI, so whenever you swipe your credit card, the bank needs to tell you if the transaction is a fraud or not in a few milliseconds. But if you are using Spark streaming, it will usually take 500 milliseconds, so the latency is too high for such kind of application. And that's why we have to provide event-at-a-time, meaning event-driven, processing to detect the fraud, so that we can interrupt the transaction in a few milliseconds, so that's one kind of application. The other can come from IoT applications, so we already put our streaming framework in a large manufacturing factory. So they have to detect the malfunction of their equipment in a very short time, otherwise it may explode. So if you are using Spark streaming, probably when you submit your application, it will take you hundreds of milliseconds, and when you finish your detection, it usually takes a few seconds, so that will be too long for such kind of application. And that's why we need a low latency streaming engine, but you can see it is okay to use Storm or Flink, right? 
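The latency gap between micro-batching and event-at-a-time processing discussed above can be made concrete with a toy model. "Latency" here is counted in arrival ticks rather than wall-clock time, and the batch size and stream length are invented; real engines like Spark Streaming, Storm, or Flink are far more involved than this sketch.

```python
def per_event_latencies(n_events):
    # Event-driven: each event is handled the moment it arrives,
    # so it spends zero ticks waiting.
    return [0] * n_events

def micro_batch_latencies(n_events, batch_size):
    # Micro-batch: an event waits until its batch fills up; the whole
    # batch is processed at the arrival tick of the batch's last event.
    # (Assumes n_events is a multiple of batch_size for simplicity.)
    latencies = []
    for i in range(n_events):
        batch_end = ((i // batch_size) + 1) * batch_size - 1
        latencies.append(batch_end - i)
    return latencies
```

With a batch size of 5, the first event in every batch waits 4 ticks while the last waits 0; per-event processing never waits at all, which is the qualitative difference between a 500-millisecond micro-batch path and a few-millisecond event-driven path.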
The problem, we found, is that they need a very complicated programming model: they are going to solve equations on the streaming events, they need to do FFT transformations, and they also ask to run linear regression or neural networks on top of the events. That's why we have to provide a SQL interface, and we are also embedding CEP capability into our streaming engine, so that you can use patterns to match the events and send alerts. >> George: So, SQL to get a set of events, and maybe join some in the complex event processing, CEP, to say, does this fit a pattern I'm looking for? >> Yuanhao: Yes. >> Okay, and so, then with the lightweight OLTP, that and any other new projects you're looking at, tell us perhaps the new use cases they'd be appropriate for.
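The pattern-matching side of CEP can be illustrated with a toy matcher. This is not Transwarp's SQL/CEP engine, just a sketch of the idea: a rule fires when a sequence of events completes a pattern, here a tiny "probe" charge followed by a large charge on the same card.

```python
# Toy CEP-style pattern matcher (illustrative only): alert when a tiny
# probe charge is followed by a large charge on the same card -- the kind
# of two-step sequence a CEP rule would express as a pattern.
from collections import defaultdict

pending = defaultdict(bool)   # card_id -> saw a small probe charge

def match(card_id, amount):
    if amount < 1:                      # step 1 of the pattern: tiny charge
        pending[card_id] = True
        return None
    if pending[card_id] and amount > 500:
        pending[card_id] = False        # pattern consumed
        return f"ALERT {card_id}"       # step 2 completed: fire the alert
    return None

print(match("c1", 0.50))   # None: pattern only half matched
print(match("c1", 900))    # ALERT c1
```

A real engine would let you state the same rule declaratively in SQL plus a pattern clause instead of hand-writing the state machine.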
>> Just correct me if I'm wrong, I know we're running out of time, but I thought Oracle only scales out when you're doing decision support work, not OLTP; that it can maybe stretch to ten nodes or something like that. Am I mistaken? >> Yuanhao: They can scale to 16 or even 32 nodes. >> George: For transactional work? >> For transactional work, but that's the theoretical limit. Systems like Google F1 and Google Spanner can scale to hundreds of nodes. But the latency is higher than Oracle, because you have to use a distributed protocol to communicate with multiple nodes, so the latency is higher. >> On Google? >> Yes. >> On Google. The latency is higher on Google? >> Because it has to go all the way to Europe and back. >> Oracle or Google latency, you said? >> Google, because if you are using a two-phase commit protocol, you have to broadcast your request to multiple nodes and then wait for the feedback, so you have a much higher latency, but it's necessary to maintain consistency. So in a distributed OLTP database, the latency is usually higher, but the concurrency is also much higher, and the scalability is much better. >> George: So that's a problem where you've stretched beyond what Oracle's done. >> Yuanhao: Yes. Customers can tolerate the higher latency, but they need to scale to millions of transactions per second, and that's why we have to build a distributed database. >> George: Okay, for this reason we're going to have to have you back for like maybe five or ten consecutive segments, you know, maybe starting tomorrow. >> We're going to have to get you back for sure. Final question for you: what are you excited about, from a technology standpoint, in the landscape? As you look at open source, you're working with Spark, you mentioned Kubernetes, you have microservices, all the cloud.
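The two-phase-commit latency point can be made concrete with a back-of-the-envelope sketch (the round-trip numbers are made up): the coordinator pays a PREPARE round trip to every participant, then a COMMIT round trip, so each transaction waits on two network rounds gated by the slowest node.

```python
# Sketch of why distributed commit raises latency: two-phase commit
# waits for every participant twice, so the slowest node sets the pace.
# A single-node commit pays neither network round.
def two_phase_commit(rtts_ms):
    prepare = max(rtts_ms)   # wait for every participant's vote
    commit = max(rtts_ms)    # wait for every participant's ack
    return prepare + commit

print(two_phase_commit([5, 7, 40]))   # 80: dominated by the slow node
```

This is the trade Yuanhao describes: higher per-transaction latency, bought in exchange for much higher aggregate concurrency across hundreds of nodes.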
What are you most excited about right now in terms of new technology that's going to help simplify and scale, with low latency, the databases, the software? Because you've got IoT, you've got autonomous vehicles, you have all this data. What are you excited about? >> So actually, this technology already solves these problems, but I think the most exciting thing is that we see two trends. The first trend is that it's very exciting to see more computation frameworks coming out, like the AI frameworks: TensorFlow, MXNet, Torch, and tons of other machine learning frameworks. They are solving different kinds of problems, like facial recognition from video and images, or human-computer interaction using voice and audio. So it's very exciting, I think. And we also find it very exciting to combine these technologies together, and that's why we are using containers. We didn't use YARN, because it cannot support TensorFlow or other frameworks, but if you are using containers and you have a good scheduler, you can schedule any kind of computation framework. So we find it very interesting to have these new frameworks, and to combine them to solve different kinds of problems. >> John: Thanks so much for coming on theCUBE. It's an operating system world we're living in now; it's a great time to be a technologist. Certainly the opportunities are out there, and we're breaking it down here inside theCUBE, live in Silicon Valley, with the best tech executives, thought leaders, and experts. I'm John Furrier with George Gilbert. We'll be right back with more after this short break. (upbeat percussive music)

Published Date : Mar 14 2017

Rob Thomas, IBM | IBM Machine Learning Launch


 

>> Narrator: Live from New York, it's theCUBE. Covering the IBM Machine Learning Launch Event. Brought to you by IBM. Now, here are your hosts, Dave Vellante and Stu Miniman. >> Welcome back to New York City, everybody this is theCUBE, we're here at the IBM Machine Learning Launch Event, Rob Thomas is here, he's the general manager of the IBM analytics group. Rob, good to see you again. >> Dave, great to see you, thanks for being here. >> Yeah it's our pleasure. So two years ago, IBM announced the Z platform, and the big theme was bringing analytics and transactions together. You guys are sort of extending that today, bringing machine learning. So the news just hit three minutes ago. >> Rob: Yep. >> Take us through what you announced. >> This is a big day for us. The announcement is we are going to bring machine learning to private Clouds, and my observation is this, you look at the world today, over 90% of the data in the world cannot be googled. Why is that? It's because it's behind corporate firewalls. And as we've worked with clients over the last few years, sometimes they don't want to move their most sensitive data to the public Cloud yet, and so what we've done is we've taken the machine learning from IBM Watson, we've extracted that, and we're enabling that on private Clouds, and we're telling clients you can get the power of machine learning across any type of data, whether it's data in a warehouse, a database, unstructured content, email, you name it we're bringing machine learning everywhere. To your point, we were thinking about, so where do we start? And we said, well, what is the world's most valuable data? It's the data on the mainframe. It's the transactional data that runs the retailers of the world, the banks of the world, insurance companies, airlines of the world, and so we said we're going to start there because we can show clients how they can use machine learning to unlock value in their most valuable data. 
>> And when you say private Cloud, of course, we're talking about the original private Cloud, >> Rob: Yeah. >> Which is the mainframe, right? >> Rob: Exactly. >> And I presume that you'll extend that to other platforms over time, is that right? >> Yeah, I mean, we're going to think about every place that data is managed behind a firewall; we want to enable machine learning as an ingredient. This is the first step, and we're going to be delivering every quarter starting next quarter, bringing it to other platforms and other repositories, because once clients get a taste of the idea of automating analytics with machine learning, what we call continuous intelligence, it changes the way they do analytics. And so demand will be off the charts here. >> So it's essentially Watson ML extracted and placed on Z, is that right? And describe how people are going to be using this and who's going to be using it. >> Sure, so Watson on the Cloud today is IBM's Cloud platform for artificial intelligence, cognitive computing, augmented intelligence. A component of that is machine learning. So we're bringing that out as IBM Machine Learning, which will run today on the mainframe and then, in the future, on other platforms. Now let's talk about what it does. It's single-place, unified model management, so you can manage all your models from one place. And we've got really interesting technology that we pulled out of IBM Research, called CADS, which stands for Cognitive Assistant for Data Scientists. The idea behind CADS is, you don't have to know which algorithm to choose; we're going to choose the algorithm for you. You build your model, and we'll decide, based on all the algorithms available, whether open source, built by you, or provided by IBM, what's the best way to run it. Our focus here is the productivity of data science and data scientists.
No company has as many data scientists as it wants, so we've got to make the ones they do have vastly more productive, and with technology like CADS, we're helping them do their job more efficiently and better. >> Yeah, CADS, we've talked about this in theCUBE before; it's like an algorithm to choose an algorithm and make the best fit. >> Rob: Yeah. >> Okay. And you guys addressed some of the collaboration issues at your Watson Data Platform announcement last October, so talk about the personas who are asking you to give them access to mainframe data, and tooling that actually resides on this private Cloud. >> It's definitely a data science persona, but we see, I'd say, an emerging market where it's more the business analyst type who is saying, I'd really like to get at that data, but I haven't been able to do that easily in the past. So we're giving them a single pane of glass, if you will, with some light data science experience, where they can manage their models, using CADS to actually make it more productive. And then we have something called a feedback loop built into it: you build a model running on Z, and as you get new data in, and these are the largest transactional systems in the world, so there's data coming in every second, that model is constantly updating. The model is learning from the data that's coming in, and it's becoming smarter. That's the whole idea behind machine learning in the first place, and that's what we've been able to enable here. Now, you and I have talked through the years, Dave, about IBM's investment in Spark. This is one of the first, I would say, world-class applications of Spark. We announced Spark on the mainframe last year; what we're bringing with IBM Machine Learning leverages Spark as an execution engine on the mainframe. So I see this as Spark finally coming into the mainstream, when you talk about Spark accessing the world's greatest transactional data.
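The "algorithm to choose an algorithm" idea can be sketched in miniature. CADS itself is IBM Research technology; this is a deliberately trivial stand-in: score each candidate model on known data and keep the best performer, so the user never has to pick the algorithm.

```python
# Toy stand-in for automated algorithm selection: three trivial "models"
# predict the next value in a series; we keep whichever lands closest
# to the known answer, with no human choosing among them.
candidates = {
    "mean":   lambda xs: sum(xs) / len(xs),
    "median": lambda xs: sorted(xs)[len(xs) // 2],
    "last":   lambda xs: xs[-1],
}

def select(history, truth):
    """Return the candidate whose prediction lands closest to the truth."""
    return min(candidates, key=lambda name: abs(candidates[name](history) - truth))

# With an outlier in the data, the robust candidate wins automatically.
print(select([10, 10, 11, 10, 42], 10))   # median
```

A production assistant would search a far richer space (algorithms plus hyperparameters, scored by cross-validation), but the selection loop is the same shape.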
>> Rob, I wonder if you can help our audience kind of squint through a compare and contrast: public Cloud versus what you're offering today. Because one thing about the public Cloud is adding new services; machine learning seemed like one of those areas, like IBM had done with a machine learning platform. Streaming, absolutely; you hear about mobile streaming applications that absolutely happened in the public Cloud. Is cost similar in private Cloud? Can I get all the services? How will IBM and your customer base keep up with that pace of innovation that we've seen from IBM and others in the public Cloud, on-prem? >> Yeah, so, look, my view is it's not an either/or. Because when you look at this valuable data, clients want to do some of it in the public Cloud, and they want to keep a lot of it in the systems that they built on-prem. So our job is, how do we actually bridge that gap? So I see machine learning, like we've talked about, becoming much more of a hybrid capability over time, because the data they want to move to the Cloud, they should do that; the economics are great. And doing it on private Cloud, actually the economics are tremendous as well. And so we're delivering an elastic infrastructure on private Cloud as well that can scale like the public Cloud. So to me it's not either/or, it's about what everybody wants as Cloud features. They want the elasticity, they want a creatable interface, they want the economics of Cloud, and our job is to deliver that in both places, whether it's on the public Cloud, which we're doing, or on the private Cloud. >> Yeah, one of the thought exercises I've gone through is, if you follow the data and follow the applications, it's going to show you where customers are going to do things.
If you look at IoT, if you look at healthcare, there are lots of uses where it's going to be on-prem or at the edge. I got to interview Walmart a couple of years ago at the IBM Edge show, and they leverage Z globally for their sales and enablement, and obviously they're not going to use AWS as their platform. What are the trends, what do you hear from your customers? How much of the data, and are there reasons why it needs to stay at the edge? It's not just compliance and governance; it's because that's where the data is, and I think you were saying there's just so much data on the Z series itself compared to other environments.
>> Yeah, so, one is you could decide on one place that you want to land your data and have it be resident, so you could do that. We have scenarios where clients are using Data Science Experience on the Cloud, but they're actually leaving the data behind their firewalls. So we don't require them to move the data; our model is one of flexibility in terms of how they want to manage their data assets, which I think is unique in terms of IBM's approach. Others in the market say, if you want to use our tools, you have to move your data to our Cloud; some of them even say, as you click through the terms, now we own your data, now we own your insights. That's not our approach. Our view is it's your data: if you want to run the applications in the Cloud and leave the data where it is, that's fine. If you want to move both to the Cloud, that's fine. If you want to leave both on private Cloud, that's fine. We have capabilities like Big SQL, where we can actually federate data across public and private Clouds, so we're trying to provide choice and flexibility when it comes to this. >> And, Rob, in the context of this announcement, that example you gave would be done through APIs that allow me access to that Cloud data, is that right? >> Yeah, exactly, yes. >> Dave: Okay. >> So last year we announced something called Data Connect, which is basically, think of it as a bus between private and public Cloud. You can leverage Data Connect to seamlessly and easily move data. It's very high-speed; it uses our Aspera technology under the covers. >> Dave: A recent acquisition. >> Rob, IBM's been very active in open source engagement, in trying to help the industry sort out some of the challenges out there. Where do you see the state of the machine learning frameworks? Google of course has TensorFlow, we've seen Amazon pushing MXNet. Is IBM supporting all of them, or are there certain horses that you have strong feelings for?
What are your customers telling you? >> I believe in openness and choice. So with IBM machine learning you can choose your language, you can use Scala, you can use Java, you can use Python, more to come. You can choose your framework. We're starting with Spark ML because that's where we have our competency and that's where we see a lot of client desire. But I'm open to clients using other frameworks over time as well, so we'll start to bring that in. I think the IT industry always wants to kind of put people into a box. This is the model you should use. That's not our approach. Our approach is, you can use the language, you can use the framework that you want, and through things like IBM machine learning, we give you the ability to tap this data that is your most valuable data. >> Yeah, the box today has just become this mosaic and you have to provide access to all the pieces of that mosaic. One of the things that practitioners tell us is they struggle sometimes, and I wonder if you could weigh in on this, to invest either in improving the model or capturing more data and they have limited budget, and they said, okay. And I've had people tell me, no, you're way better off getting more data in, I've had people say, no no, now with machine learning we can advance the models. What are you seeing there, what are you advising customers in that regard? >> So, computes become relatively cheap, which is good. Data acquisitions become relatively cheap. So my view is, go full speed ahead on both of those. The value comes from the right algorithms and the right models. That's where the value is. And so I encourage clients, even think about maybe you separate your teams. And you have one that's focused on data acquisition and how you do that, and another team that's focused on model development, algorithm development. Because otherwise, if you give somebody both jobs, they both get done halfway, typically. 
And the value is from the right models, the right algorithms, so that's where we stress the focus. >> And models to date have been okay, but there's a lot of room for improvement. The two examples I like to use are ad retargeting, which, as we all know as consumers, is not great. You buy something and then you get targeted for another week. And then fraud detection, which has actually been quite good for the last ten years, but there's still a lot of false positives. Where do you see IBM Machine Learning taking those practical use cases in terms of improving the models? >> Yeah, so why are there false positives? The issue typically comes down to the quality of data and the amount of data that you have; that's why. Let me give an example. One of the clients that's going to be talking at our event this afternoon is Argus, who's focused on the healthcare space. >> Dave: Yeah, we're going to have him on here as well. >> Excellent. So Argus basically collects data across payers; they're focused on healthcare, payers, providers, pharmacy benefit managers, and their whole mission is, how do we cost-effectively serve different scenarios or different diseases, in this case diabetes, and how do we make sure we're getting the right care at the right time? So they've got all that data on the mainframe, and they're constantly getting new data in: it could be about blood sugar levels, it could be about glucose, it could be about changes in blood pressure. Their models will get smarter over time because they built them with IBM Machine Learning, so that what's cost-effective today may not be the most effective or cost-effective solution tomorrow. But we're giving them that continuous intelligence as data comes in. That is the value of machine learning.
I think sometimes people miss that point; they think it's just about making the data scientists' job easier. That productivity is part of it, but it's really about the veracity of the data and the fact that you're constantly updating your models. >> And the patient outcome there, I read through some of the notes earlier, is that if I can essentially opt in to allow the system to adjudicate the medication or the claim, I can get that instantaneously, or in near real time, as opposed to having to wait through weeks of phone calls and haggling. Is that right, did I get that right?
We talked earlier about this sort of batch, interactive, and now you have this continuous sort of workload. How does streaming fit? >> So we use streaming in a few ways. One is very high-speed data ingest; it's a good way to get data into the Cloud. We also can do analytics on the fly. A lot of our use cases around streaming are where we actually build analytical models into the streaming engine, so that you're doing analytics on the fly. So I view it as a different side of the same coin. It depends on your use case: if you need sub-millisecond response times and you constantly have data coming in, you need something like a streaming engine to do that. >> And it's actually consolidating that data pipeline, which is big in terms of simplifying the complexity, this mosaic of Hadoop, for example, and that's a big value proposition of Spark. Alright, we'll give you the last word, you've got an audience outside waiting, big announcement today; final thoughts.
>> Yeah, and we saw the same thing with the steam engine, it was decades before it actually was perfected, and now the timeframe in our industry is compressed to years, sometimes months. >> Rob: Exactly. >> Alright, Rob, thanks very much for coming on theCUBE. Good luck with the announcement today. >> Thank you. >> Good to see you again. >> Thank you guys. >> Alright, keep it right there, everybody. We'll be right back with our next guest, we're live from the Waldorf Astoria, the IBM Machine Learning Launch Event. Be right back. [electronic music]

Published Date : Feb 15 2017


Kickoff - IBM Machine Learning Launch - #IBMML - #theCUBE


 

>> Narrator: Live from New York, it's The Cube covering the IBM Machine Learning Launch Event brought to you by IBM. Here are your hosts, Dave Vellante and Stu Miniman. >> Good morning everybody, welcome to the Waldorf Astoria. Stu Miniman and I are here in New York City, the Big Apple, for IBM's Machine Learning Event #IBMML. We're fresh off Spark Summit, Stu, where we had The Cube, this by the way is The Cube, the worldwide leader in live tech coverage. We were at Spark Summit last week, George Gilbert and I, watching the evolution of so-called big data. Let me frame, Stu, where we're at and bring you into the conversation. The early days of big data were all about offloading the data warehouse and reducing the cost of the data warehouse. I often joke that the ROI of big data is reduction on investment, right? There's these big, expensive data warehouses. It was quite successful in that regard. What then happened is we started to throw all this data into the data warehouse. People would joke it became a data swamp, and you had a lot of tooling to try to clean the data warehouse and a lot of transforming and loading and the ETL vendors started to participate there in a bigger way. Then you saw the extension of these data pipelines to try to more with that data. The Cloud guys have now entered in a big way. We're now entering the Cognitive Era, as IBM likes to refer to it. Others talk about AI and machine learning and deep learning, and that's really the big topic here today. What we can tell you, that the news goes out at 9:00am this morning, and it was well known that IBM's bringing machine learning to its mainframe, z mainframe. Two years ago, Stu, IBM announced the z13, which was really designed to bring analytic and transaction processing together on a single platform. Clearly IBM is extending the useful life of the mainframe by bringing things like Spark, certainly what it did with Linux and now machine learning into z. 
I want to talk about Cloud, the importance of Cloud, and how that has really taken over the world of big data. Virtually every customer you talk to now is doing work on the Cloud. It's interesting to see now IBM unlocking its transaction base, its mission-critical data, to this machine learning world. What are you seeing around Cloud and big data? >> We've been digging into this big data space since before it was called big data. One of the early things that really got me interested and excited about it is, from the infrastructure standpoint, storage has always been one of those costs that we had to have, and with the massive amounts of data, the digital explosion we talked about, keeping all that information, or managing all that information, was a huge challenge. Big data was really that bit flip. How do we take all that information and make it an opportunity? How do we get new revenue streams? Dave, IBM has been at the center of this and looking at the higher-level pieces of not just storing data, but leveraging it. Obviously huge in analytics, lots of focus on everything from Hadoop and Spark and newer technologies, but digging in to how they can leverage up the stack, which is where IBM has done a lot of acquisitions in that space, and leveraging that and wants to make sure that they have a strong position both in Cloud, which was renamed: SoftLayer is now IBM Bluemix, with a lot of services including a machine learning service that leverages the Watson technology. And of course on-prem they've got the z and the Power solutions that you and I have covered for many years at the IBM Edge show. >> Machine learning obviously heavily leverages models. We've seen in the early days of the data, the data scientists would build models and machine learning allows those models to be perfected over time. So there's this continuous process. 
We're familiar with the world of batch, and then the minicomputer brought in the world of interactive, so we're familiar with those types of workloads. Now we're talking about a new emergent workload, which is continuous. Continuous apps where you're streaming data in, what Spark is all about. The models that data scientists are building can constantly be improved. The key is automation, right? Being able to automate that whole process, and being able to collaborate between the data scientist, the data quality engineers, even the application developers, is something that IBM really tried to address in its last big announcement in this area, which was in October of last year: the Watson Data Platform, what they called at the time DataWorks. So really trying to bring together those different personas in a way that they can collaborate together and improve models on a continuous basis. The use cases that you often hear in big data, and certainly initially in machine learning, are things like fraud detection. Obviously ad serving has been a big data application for quite some time. In financial services, identifying good targets, identifying risk. What I'm seeing, Stu, is that the phase that we're in now of this so-called big data and analytics world, and now bringing in machine learning and deep learning, is to really improve on some of those use cases. For example, fraud's gotten much, much better. Ten years ago, let's say, it took many, many months, if you ever detected fraud. Now you get it in seconds, or sometimes minutes, but you also get a lot of false positives. Oops, sorry, the transaction didn't go through. Did you do this transaction? Yes, I did. Oh, sorry, you're going to have to redo it because it didn't go through. It's very frustrating for a lot of users. That will get better and better and better. We've all experienced retargeting from ads, and we know how crappy they are. That will continue to get better. 
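[Editor's note] The fraud tradeoff Dave describes comes down to a decision threshold on a model's fraud probability: block more aggressively and you catch more fraud but decline more legitimate transactions. A minimal sketch on synthetic data (not any vendor's model) makes the tradeoff concrete:

```python
# Sketch of the fraud-scoring tradeoff: a classifier emits a fraud
# probability, and the decision threshold controls how many legitimate
# transactions get blocked (false positives). Synthetic data only.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 5000
# Two made-up transaction features; roughly 7% of rows are "fraud".
X = rng.normal(size=(n, 2))
y = (X[:, 0] + X[:, 1] + rng.normal(scale=1.5, size=n) > 3).astype(int)

model = LogisticRegression().fit(X, y)
probs = model.predict_proba(X)[:, 1]  # fraud probability per transaction

for threshold in (0.5, 0.9):
    flagged = probs >= threshold
    caught = int(np.sum(flagged & (y == 1)))           # true fraud blocked
    false_positives = int(np.sum(flagged & (y == 0)))  # good txns blocked
    print(f"threshold={threshold}: caught={caught}, "
          f"false_positives={false_positives}")
```

Lowering the threshold catches more fraud but annoys more legitimate customers; the "oops, redo your transaction" experience in the conversation is exactly the false-positive column.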
The big question that people have, and it goes back to Jeff Hammerbacher: the best minds of my generation are trying to get people to click on ads. When will we see big data really start to affect our lives in different ways, like patient outcomes? We're going to hear some of that today from folks in health care and pharma. Again, these are the things that people are waiting for. The other piece is, of course, IT. What are you seeing, in terms of IT, in the whole data flow? >> Yes, a big question we have, Dave, is where's the data? And therefore, where does it make sense to be able to do that processing? In big data we talked about, you've got massive amounts of data, can we move the processing to that data? With IoT, the other day your CTO talked about how there's going to be massive amounts of data at the edge, and I don't have the time or the bandwidth or the need necessarily to pull that back to some kind of central repository. I want to be able to work on it there. Therefore there's going to be a lot of data worked at the edge. Peter Levine did a whole video talking about how, "Oh, Public Cloud is dead, it's all going to the edge." A little bit of a hyperbolic statement; we understand that there's plenty of use cases for both Public Cloud and for the edge. In fact we see Google pushing big on machine learning with TensorFlow; it's one of those machine learning frameworks out there that we expect a lot of people to be working on. Amazon is putting effort into the MXNet framework, which is once again an open-source effort. One of the things I'm looking at in this space, and I think IBM can provide some leadership here, is what frameworks are going to become popular across multiple scenarios? How many winners can there be for these frameworks? We already have multiple programming languages, multiple Clouds. How much of it is just API compatibility? 
How much work is there, and where are the repositories of data going to be, and where does it make sense to do that predictive analytics, that advanced processing? >> You bring up a good point. Last year, last October, at BigData NYC, we had a special segment with a data scientist panel. It was great. We had some rockstar data scientists on there, like Dez Blanchfield and Joe Caserta, and a number of others. They echoed what you always hear when you talk to data scientists: "We spend 80% of our time messing with the data, trying to clean the data, figuring out the data quality, and precious little time on the models, improving the models, and actually getting outcomes from those models." So things like Spark have simplified that whole process and unified a lot of the tooling around so-called big data. We're seeing Spark adoption increase. George Gilbert, in our part one and part two of the big data forecast from Wikibon last week, showed that we're still not on the steep part of the S-curve in terms of Spark adoption. Generically, we're talking about streaming as well included in that forecast, but it's forecasting that increasingly those applications are going to become more and more important. It brings you back to what IBM's trying to do: bring machine learning into this critical transaction data. Again, to me, it's an extension of the vision that they put forth two years ago, bringing analytic and transaction data together, actually processing within that Private Cloud complex, which is what essentially this mainframe is; it's the original Private Cloud, right? You were saying off-camera, it's the original converged infrastructure. It's the original Private Cloud. >> The mainframe's still here, lots of Linux on it. We've covered it for many years; you want your cool Linux, Docker, containerized machine learning stuff, you can do that on the z-series. 
>> You want Python and Spark and R and Java, and all the popular programming languages. It makes sense. It's not like a huge growth platform, it's kind of flat, down, up in the product cycle, but it's alive and well, and a lot of companies run their businesses obviously on the z. We're going to be unpacking that all day. Some of the questions we have are: what about Cloud? Where does it fit? What about Hybrid Cloud? What are the specifics of this announcement? Where does it fit? Will it be extended? Where does it come from? How does it relate to other products within the IBM portfolio? And very importantly, how are customers going to be applying these capabilities to create business value? That's something that we'll be looking at with a number of the folks on today. >> Dave, another thing, it reminds me of two years ago you and I did an event with the MIT Sloan school on The Second Machine Age with Andy McAfee and Erik Brynjolfsson, talking about, as machines can help with some of these analytics, some of this advanced technology, what happens to the people? Talk about health care: it's doctors plus machines most of the time. As these two professors say, it's racing with the machines. What is the impact on people? What's the impact on jobs? And productivity going forward; really interesting hot space. They talk about everything from autonomous vehicles, advanced health care and the like. This is right at the core of where the next generation of the economy and jobs are going to go. >> It's a great point, and no doubt that's going to come up today, and some of our segments will explore that. Keep it right there, everybody. We'll be here all day covering this announcement, talking to practitioners, talking to IBM executives and thought leaders, and sharing some of the major trends that are going on in machine learning, the specifics of this announcement. Keep it right there, everybody. This is The Cube. We're live from the Waldorf Astoria. We'll be right back.

Published Date : Feb 15 2017

Mike Gualtieri, Forrester Research - Spark Summit East 2017 - #sparksummit - #theCUBE


 

>> Narrator: Live from Boston, Massachusetts, this is the Cube, covering Spark Summit East 2017, brought to you by Databricks. Now, here are your hosts, Dave Vellante and George Gilbert. >> Welcome back to Boston, everybody, where the town is still euphoric. Mike Gualtieri is here, he's the principal analyst at Forrester Research, attended the parade yesterday. How great was that, Mike? >> Yes. Yes. It was awesome. >> Nothing like we've ever seen before. All right, the first question is what was the bigger shocking surprise, upset, greatest win, was it the Red Sox over the Yankees or was it the Superbowl this weekend? >> That's the question, I think it's the Superbowl. >> Yeah, who knows, right? Who knows. It was a lot of fun. So how was the parade yesterday? >> It was magnificent. I mean, it was freezing. No one cared. I mean--but it was, yeah, it was great. Great to see that team in person. >> That's good, wish we could talk, We can, but we'll get into it. So, we're here at Spark Summit, and, you know, the show's getting bigger, you're seeing more sponsors, still heavily a technical audience, but what's your take these days? We were talking off-camera about the whole big data thing. It used to be the hottest thing in the world, and now nobody wants to have big data in their title. What's Forrester's take on that? >> I mean, I think big data-- I think it's just become mainstream, so we're just back to data. You know, because all data is potentially big. So, I don't think it's-- it's not the thing anymore. I mean, what do you do with big data? You analyze it, right? And part of what this whole Spark Summit is about-- look at all the sessions. Data science, machine learning, streaming analytics, so it's all about sort of using that data now, so big data is still important, but the value of big data comes from all this advanced analytics. >> Yeah, and we talked earlier, I mean, a lot of the value of, you know, Hadoop was cutting costs. 
You know, you've mentioned commodity components and a reduction in the denominator, and breaking the need for some kind of big storage container. OK, so that-- we got there. Now, shifting to new sources of value, what are you spending your time on these days in terms of research? >> Artificial intelligence, machine learning, so those are really forms of advanced analytics, so that's been-- that's been very hot. We did a survey last year, an AI survey, and we asked a large group of people, we said, oh, you know, what are you doing with AI? 58% said they're researching it. 19% said they're training a model. Right, so that's interesting. 58% are researching it, and far fewer are actually doing something with it. Now, the reality is, if you phrase that a little bit differently, and you said, oh, what are you doing with machine learning? Many more would say yes, we're doing machine learning. So it begs the question, what do enterprises think of AI? And what do they think it is? So, a lot of my inquiries are spent helping enterprises understand what AI is, what they should focus on, and the other part of it is what are the technologies used for AI, and deep learning is the hottest. >> So, you wrote a piece late last year, what's possible today in AI. What's possible today in AI? >> Well, you know, before understanding what's possible, it's important to understand what's not possible, right? And so we sort of characterize it as there's pure AI, and there's pragmatic AI. So it's real simple. Pure AI is the sci-fi stuff, we've all seen it, Ex Machina, Star Wars, whatever, right? That's not what we're talking about. That's not what enterprises can do today. We're talking about pragmatic AI, and pragmatic AI is about building predictive models. It's about conversational APIs, to interact in a natural way with humans, it's about image analysis, which is something very hot because of deep learning. 
So, AI is really about the building blocks that companies have been using, but then using them in combination to create even more intelligent solutions. And they have more options on the market, both from open source, both from cloud services that-- from Google, Microsoft, IBM, and now Amazon, at their re-- Were you guys at their reinvent conference? >> I wasn't, personally, but we were certainly there. >> Yeah, they announced the Amazon AI, which is a set of three services that developers can use without knowing anything about AI or being a data scientist. But, I mean, I think the way to think about AI is that it is data science. It requires the expertise of a data scientist to do AI. >> Following up on that comment, which was really interesting, is we try and-- whereas vendors try and democratize access to machine learning and AI, and I say that with two terms because usually the machine learning is the stuff that's sort of widely accessible and AI is a little further out, but there's a spectrum when you can just access an API, which is like a pre-trained model-- >> Pre-trained model, yep. >> It's developer-accessible, you don't need to be a data scientist, and then at the other end, you know, you need to pick your algorithms, you need to pick your features, you need to find the right data, so how do you see that horizon moving over time? >> Yeah, no, I-- So, these machine learning services, as you say, they're pre-trained models, totally accessible by anyone, anyone who can call an API or a restful service can access these. But their scope is limited, right? So, if, for example, you take the image API, you know, the imaging API that you can get from Google or now Amazon, you can drop an image in there and it will say, oh, there's a wine bottle on a picnic table on the beach. Right? It can identify that. So that's pretty cool, there might be a lot of use cases for that, but think of an enterprise use case. No. You can't do it, and let me give you this example. 
Say you're an insurance company, and you have a picture of a steel roof that's caved in. If you give that to one of these APIs, it might say steel roof, it may say damage, but what it's not going to do is estimate the damage; it's not going to be able to create a bill of materials on how to repair it, because Google hasn't trained it at that level. OK, so, enterprises are going to have to do this themselves, or an ISV is going to have to do it, because think about it: you've got 10 years' worth of all these pictures taken of damage. And with all of those pictures, you've got tons of write-ups from an adjuster. Whoa, if you could shove that into a deep learning algorithm, you could potentially have consumers, or someone untrained, take pictures, and have this thing say here's what the estimated damage is, this is the situation. >> And I've read about insurance use cases like that, where the customer could, after they sort of have a crack-up, take pictures all around the car, and then the insurance company could provide an estimate, tell them where the nearest repair shops are-- >> Yeah, but right now it's like the early days of e-commerce, where you could send an order in and then it would fax it and they'd type it in. So, I think, yes, insurance coverage is taking those pictures, and the question is can we automate it, and-- >> Well, let me actually iterate on that question, which is: who can build a more end-to-end solution, assuming, you know, there's a lot of heavy lifting that's got to go on for each enterprise trying to build a use case like that? Is it internal development, and only at big companies that have a few of these data science gurus? Would it be like an IBM Global Services or an Accenture, or would it be like a vertical ISV where it's semi-custom, semi-packaged? 
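[Editor's note] The consumer side of these pre-trained image APIs is a simple call-and-filter shape; the gap Mike describes is the missing domain training, not the plumbing. A sketch of that shape, using a hand-made response in the form Amazon Rekognition's detect_labels returns (the label names and confidence values below are invented for illustration):

```python
# Sketch of calling a pre-trained image-labeling API and filtering by
# confidence. The response dict is a hand-made stand-in shaped like
# Amazon Rekognition's detect_labels output, not real API output.

def confident_labels(response, min_confidence=80.0):
    """Keep only labels the service scored above min_confidence."""
    return [label["Name"] for label in response["Labels"]
            if label["Confidence"] >= min_confidence]

# A live call would look roughly like:
#   boto3.client("rekognition").detect_labels(Image={"Bytes": image_bytes})
sample_response = {
    "Labels": [
        {"Name": "Wine Bottle", "Confidence": 97.1},
        {"Name": "Picnic Table", "Confidence": 91.4},
        {"Name": "Beach", "Confidence": 88.0},
        {"Name": "Steel Roof", "Confidence": 42.5},  # too weak to trust
    ]
}

print(confident_labels(sample_response))
# prints ['Wine Bottle', 'Picnic Table', 'Beach']
```

This gets you "wine bottle on a picnic table on the beach"; it will never get you a damage estimate or a bill of materials, which is exactly the enterprise gap discussed above.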
>> I think it's both, but I also think it's two or three people walking around this conference, right, understanding Spark, maybe understanding how to use TensorFlow in conjunction with Spark that will start to come up with these ideas as well. So I think-- I think we'll see all of those solutions. Certainly, like IBM with their cognitive computing-- oh, and by the way, so we think that cognitive computing equals pragmatic AI, right, because it has similar characteristics. So, we're already seeing the big ISVs and the big application developers, SAP, Oracle, creating AI-infused applications or modules, but yeah, we're going to see small ISVs do it. There's one in Austin, Texas, called InteractiveTel. It's like 10 people. What they do is they use the Google-- so they sell to large car dealerships, like Ernie Boch. And they record every conversation, phone conversation with customers. They use the Google pre-trained model to convert the speech to text, and then they use their own machine learning to analyze that text to find out if there's a customer service problem or if there's a selling opportunity, and then they alert managers or other people in the organization. So, small company, very narrowly focused on something like car buying. >> So, I wonder if we could come back to something you said about pragmatic AI. We love to have someone like you on the Cube, because we like to talk about the horses on the track. So, if Watson is pragmatic AI, and we all-- well, I think you saw the 60 Minutes show, I don't know, whenever it was, three or four months ago, and IBM Watson got all the love. They barely mentioned Amazon and Google and Facebook, and Microsoft didn't get any mention. So, and there seems to be sentiment that, OK, all the real action is in Silicon Valley. But you've got IBM doing pragmatic AI. Do those two worlds come together in your view? How does that whole market shake up? >> I don't think they come together in the way I think you're suggesting. 
I think what Google, Microsoft, Facebook, what they're doing is they're churning out fundamental technology, like one of the most popular deep learning frameworks, TensorFlow, is a Google thing that they open sourced. And as I pointed out, those image APIs, that Amazon has, that's not going to work for insurance, that's not going to work for radiology. So, I don't think they're in-- >> George Gilbert: Facebook's going to apply it differently-- >> Yeah, I think what they're trying to do is they're trying to apply it to the millions of consumers that use their platforms, and then I think they throw off some of the technology for the rest of the world to use, fundamentally. >> And then the rest of the world has to apply those. >> Yeah, but I don't think they're in the business of building insurance solutions or building logistical solutions. >> Right. >> But you said something that was really, really potentially intriguing, which was you could take the horizontal Google speech to text API, and then-- >> Mike Gualtieri: And recombine it. >> --put your own model on top of that. And that's, techies call that like ensemble modeling, but essentially you're taking, almost like an OS level service, and you're putting in a more vertical application on top of it, to relate it to our old ways of looking at software, and that's interesting. >> Yeah, because what we're talking about right now, but this conversation is now about applications. Right, we're talking about applications, which need lots of different services recombined, whereas mostly the data science conversation has been narrowly about building one customer lifetime value model or one churn model. Now the conversation, when we talk about AI, is becoming about combining many different services and many different models. >> Dave Vellante: And the platform for building applications is really-- >> Yeah, yeah. 
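[Editor's note] The recombination being described, a pre-trained speech-to-text service feeding a small custom text model, is the InteractiveTel pattern from a moment ago. A sketch under stated assumptions: the transcribe function is a stub standing in for a real speech API call, and the tiny classifier and training sentences are illustrative inventions, not their actual system.

```python
# Two-stage sketch of the "pre-trained service + custom model" pattern:
# stage 1 converts call audio to text (stubbed here), stage 2 is a small
# in-house classifier that flags selling opportunities in the text.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

def transcribe(audio_bytes):
    # Stub for the pre-trained speech-to-text service; a real system
    # would send the audio to an API and get a transcript back.
    return "I want to buy the new truck, what is the price"

# Tiny hand-labeled transcripts: 1 = selling opportunity, 0 = service issue.
texts = [
    "interested in the price of the new truck",
    "want to buy a car this week",
    "can you quote a lease price for me",
    "my car is broken again",
    "filing a complaint about the last repair",
    "the service department never called back",
]
labels = [1, 1, 1, 0, 0, 0]

clf = make_pipeline(CountVectorizer(), MultinomialNB()).fit(texts, labels)
call_text = transcribe(b"...")        # stage 1: speech -> text
alert = clf.predict([call_text])[0]   # stage 2: text -> opportunity flag
print(alert)
```

The design point matches the conversation: the generic, hard-to-build piece (speech recognition) is bought as a service, while the narrow, valuable piece (what counts as a selling opportunity at a car dealership) is trained in-house.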
>> And that platform, the richest platform, or the platform that is, that is most attractive has the most building blocks to work with, or the broadest ones? >> The best ones, I would say, right now. The reason why I say it that way is because this technology is still moving very rapidly. So for an image analysis, deep learning, very good for image, nothing's better than deep learning for image analysis. But if you're doing business process models or like churn models, well, deep learning hasn't played out there yet. So, right now I think there's some fragmentation. There's so much innovation. Ultimately it may come together. What we're seeing is, many of these companies are saying, OK, look, we're going to bring in the open source. It's pretty difficult to create a deep learning library. And so, you know, a lot of the vendors in the machine learning space, instead of creating their own, they're just bringing in MXNet or TensorFlow. >> I might be thinking of something from a different angle, which is not what underlying implementation they're using, whether it's deep learning or whether it's just random forest, or whatever the terminology is, you know, the traditional statistical stuff. The idea, though, is you want a platform-- like way, way back, Windows, with the Win32 API had essentially more widgets for helping you build graphical applications than any other platform >> Mike Gualtieri: Yeah, I see where you're going. >> And I guess I'm thinking it doesn't matter what the underlying implementation is, but how many widgets can you string together? >> I'm totally with you there, yeah. And so I think what you're saying is look, a platform that has the most capabilities, but abstracts, the implementations, and can, you know, can be somewhat pluggable-- right, good, to keep up with the innovation, yeah. And there's a lot of new companies out there, too, that are tackling this. 
One of them's called Bonsai AI, you know, small startup, they're trying to abstract deep learning, because deep learning right now, like TensorFlow and MXNet, that's a little bit of a challenge to learn, so they're abstracting it. But so are a lot of the-- so is SAS, IBM, et cetera. >> So, Mike, we're out of time, but I want to talk about your talk tomorrow. So, AI meets Spark, give us a little preview. >> AI meets Spark. Basically, the prerequisite to AI is a very sophisticated and fast data pipeline, because just because we're talking about AI doesn't mean we don't need data to build these models. So, I think Spark gives you the best of both worlds, right? It's designed for these sort of complex data pipelines that you need to prep data, but now, with MLlib for more traditional machine learning, and now with their announcement of TensorFrames, which is going to be an interface for TensorFlow, now you've got deep learning, too. And you've got it in a cluster architecture, so it can scale. So, pretty cool. >> All right, Mike, thanks very much for coming on the Cube. You know, way to go Pats, awesome. Really a pleasure having you back. >> Thanks. >> All right, keep right there, buddy. We'll be back with our next guest right after this short break. This is the Cube. (peppy music)
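[Editor's note] The "AI meets Spark" pitch, a data-prep pipeline chained to a machine learning stage, has a concrete shape: Spark MLlib expresses it as a Pipeline of Transformers plus an Estimator. The stand-in below uses scikit-learn's analogous Pipeline so the sketch runs without a cluster; the data is synthetic and the stages are illustrative.

```python
# Minimal stand-in for the prep-then-train pipeline described above.
# Spark MLlib chains Transformers and an Estimator the same way; this
# uses scikit-learn's analogous Pipeline so it runs locally.
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(42)
# Messy raw features on wildly different scales, as real pipelines see.
X = rng.normal(loc=[0.0, 0.0], scale=[1.0, 100.0], size=(200, 2))
y = (X[:, 0] > 0).astype(int)

pipeline = Pipeline([
    ("prep", StandardScaler()),        # data-prep stage
    ("model", LogisticRegression()),   # ML stage (MLlib's role in Spark)
])
pipeline.fit(X, y)
print(round(pipeline.score(X, y), 2))
```

The same fit/transform chaining is what lets Spark scale the prep and the model together, and TensorFrames (as announced at the time) was meant to slot TensorFlow in as another stage of that pipeline.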

Published Date : Feb 8 2017
