Roland Acra, Cisco | Cisco Live EU 2019


 

>> Live from Barcelona, Spain, it's theCUBE, covering Cisco Live Europe, brought to you by Cisco and its ecosystem partners.

>> Welcome back to theCUBE's live coverage here in Barcelona, Spain, for Cisco Live Europe 2019. I'm John Furrier, your host of theCUBE, with Dave Vellante as well as Stu Miniman, who's been doing interviews with us all week. Our next guest is Roland Acra, Senior Vice President and General Manager of the Data Center Group. He's in charge of that core data center business, now at the center of cloud and the edge. Roland, great to see you, thanks for coming on.

>> Thank you, thank you for having me.

>> So a lot of announcements, and a lot of the big guns are out there for Cisco: you've got the data center, you've got the networking group, you've got IoT, and then Cloud Center Suite was part of the big announcement. Your team had a big piece of the keynote yesterday and continues to make waves. Give us a quick update on the news and the key points. What were the announcements?

>> Yeah, the two big announcements for my group were ACI Anywhere and HyperFlex Anywhere, and we captured them under a common moniker of "There's Nothing Centered About the Data Center Anymore," because both of these speak to things going outside the data center. ACI Anywhere is the integration of ACI, our software-defined networking solution, into two of the most prominent public cloud providers out there, Amazon and Azure. And for HyperFlex Anywhere, the exciting news is the expansion of HyperFlex, which is our hyperconverged solution, also outside the data center, to the edge of the enterprise, specifically branch offices and remote locations.

>> And the other thing that came out of our conversation here on theCUBE, and also in the keynote, is that the center of the value is the data center, as you guys pointed out with the slides, big circle in the middle, ACI Anywhere, HyperFlex Anywhere. But the network and the data and the security foundation have been a critical part of this new growth.

>> Yes.

>> Take a minute to explain the journey of ACI. How did it start, and where are we? It's been a progression for you guys, certainly inside the enterprise, but now it's extended. What's the journey? Take us through that.

>> When ACI came into the market, five years ago now, we have a five-year anniversary, ACI brought a software-defined networking solution into the market. It brought an automated network fabric capability, which said you can no longer screw yourself up by having incoherence between one part of the network and another; it's all managed coherently as one thing. And it brought, to your point about security, what's called segmentation of applications. Today, applications have data, they have databases, they have different sensitive pieces, and it's important to be able to tell the network not only to get the traffic from one place to the other, but to selectively carry the traffic that I tell you belongs there, and not to carry the traffic that has no business getting there. That's known as segmentation, which is a security concern, particularly when you have sensitive data like consumer data or things that have regulatory requirements around them. ACI brought that to the market; that was the value proposition of ACI. We then worked on expanding ACI in the direction of scale. Customers have two or more data centers for disaster recovery, for resiliency; we made that possible. We got to bigger and bigger footprints. Then we took ACI to the edge of the enterprise.
What if somebody wanted to put some computing capability in a store, or in a logistics center? ACI was expanded for that. Step N minus one was taking ACI to bare-metal clouds. Customers now also want to deploy things in co-locations or bare-metal clouds. We decoupled the ACI software from the Cisco switches, which are the ACI hardware, and ACI became completely virtualized, still able to do everything it does in hardware on-premise, but in software instead, in somebody else's facility. And yesterday we announced the full culmination of this: what if you don't want the ACI soft switching or hard switching? Can you use the native switching of a public cloud, like Azure or AWS, and tell their APIs, "please let those packets go from A to B because they're part of the whitelisted paths, and don't let packets from C to D go because they're part of the blacklisted paths"? And that was the full integration with these clouds--

>> Can you abstract that complexity?

>> Completely, completely. One orchestrator, which is the Multi-Site Orchestrator, the same one people have used on-premise and developed their policies around. They have invested a lot of sweat equity in that controller; it's also where they put their compliance verification and audit and assurance. And they use that same thing even when something goes to Azure or to AWS.

>> So you mentioned the progression. So it's now your full progression, from core to the cloud, including edge--

>> Going through edge.

>> What have been some of the results? You mentioned segmentation's one of 'em, I get that. How has ACI been used? What are some highlights that show the value? Because people start looking at ACI, saying, hmm, I like this, I like scale, I have a scale challenge with the new cloud world and edge, and complexity's abstracted away with software. Okay, check, so far, so good. Where has the success of ACI been, and how do you see that unfolding specifically in the cloud?

>> Yeah, the biggest value our customers have gotten, cloud or no cloud, has been that with ACI, they've been able to shorten the time for change and therefore increase the speed of change of their network, because now the network needs to operate at the speed of the applications. Applications reconfigure themselves sometimes on an hourly or daily basis, and it used to be that to change something in the network, you sent a ticket to somebody who took weeks to reconfigure things. Now that software-defined capability means the network reconfigures itself, people can change generations of compute on the fly, and the network stays in lockstep with that. The agility and speed have been great. The other value has been automation, which means people can run a bigger and bigger network with a small number of people. You don't have to scale your people with the number of switches you have, because programming and automation come to the rescue.

>> Well I'll tell you, people who are watching right now can look behind Roland and see that it's a packed house. We're in the DevNet Zone, which has been the massively growing organization within Cisco. The community's been growing very fast, people are developing on top of the networks, and these are network folks, and there's new talent coming in as well. So the skills gap is shortening, and you're getting a different makeup for a Cisco user. Your customers are changing and growing: the existing base plus new people.
Talk about that dynamic and how it impacts this intent-based networking, this notion of policy, of software-defined.

>> Yes, it's what many people have been calling infrastructure as code, which is that you go from scripting to actually coding and composing very sophisticated automation and change-management capabilities for an automatable system, which is what ACI is. It's made for people drawing on the strengths they had in the application domain or the server domain and bringing those into the network. And that's a new and exciting thing: it brought the network within the purview of coders, people who know how to do Python and the Go language, things which are modern and exciting for the younger generation. It's also made for bringing in analytical capabilities. A lot of what those young coders are used to is lots of logs, lots of visibility, lots of analytics running on top, because they've done that on web servers and on applications that run in the cloud. And we now offer the network, which is very rich in data. If you think about it, we see every packet, we see every flow, we see every pattern of how the traffic is changing, and that becomes a data set that is subject to programming, because from there you can extract anomaly detection, you can extract security signatures of malware, you can extract predictions of where the traffic is going to be in six months. There's a lot of exciting potential in the telemetry and the visibility that we bring into that framework.

>> And as you point out, devs love that. I mean Cisco, we've talked about this, is one of the few large established companies that has, in our view, figured out developers, right? There are a lot of examples of companies that haven't and continue to struggle; we've just witnessed the dev crowd here. I want to ask you about ACI and how it's different from, for example, VMware NSX. What's the differentiation there?

>> The biggest differentiation is that ACI is one system through which you manage the entire network: the overlay, which is the virtual view of the network that the applications care about, as well as the underlay, which is the actual delivery system that gets the packets from A to B with quality of service and so forth. So that's the first thing; it actually does a lot more, it has much more scope than NSX does. The other thing that's very unique about ACI is that we have integrated it with every hypervisor on the planet, every container-management framework on the planet, and every bare-metal system on the planet, which means any workload, something sitting on a mainframe, something sitting on a Sun Oracle server, something on OpenStack, on OpenShift, on VMware or on Hyper-V, and now on the EC2 APIs of AWS or on Azure, all of those are integrated with ACI. We're not wedded to one hypervisor. And the cloud implementation we announced yesterday is a true integrated cloud capability. It's not a "bring your own license and go put it on bare metal at AWS," which has been VMware's cloud strategy: to team up with AWS and let customers bring their software licenses onto AWS bare metal. That's not EC2, and of course that's not Azure, and that's not the other clouds we're going to be doing. So the openness is to being multi-cloud and on-premise, which means every hypervisor, every container framework, and bare metal, with one system.
We're extending that into the cloud to give customers choice and openness. That's really a very fundamental philosophy in networks.

>> So, much wider scope. That's kind of always been Cisco's philosophy in partnership. When you think about HyperFlex, going back 10 years when you guys sort of created that with partners, and then multiple partners now, maybe talk about that journey a little bit.

>> HyperFlex?

>> Yes.

>> Yeah, 'cause hyperconvergence is another very exciting and fast-growing trend in our industry. And really, HyperFlex, hyperconverged infrastructure, started off as the notion of putting a mini-cloud in a box on-premise for application developers to rapidly deploy their applications, as if it were in the cloud. So speed and simplicity were really at a premium, and that's really what defines hyperconvergence. And we've done a tremendous amount of work at Cisco on that speed and simplicity, because we've integrated network, compute, storage, and a cloud management system called Intersight to give customers that whole capability. We then hardened it. We took it from being able to do VDI kinds of workloads, rather benign workloads, to mission-critical workloads. So databases are now running on HyperFlex, ERP systems are running on HyperFlex; the real crown jewels of the enterprise are now running on HyperFlex. Then we made it multi-cloud. We opened it to all hypervisors and to all container frameworks. We announced OpenShift yesterday, we have already done Hyper-V, and we had done OpenStack and ESX, so again, the same spirit of openness. And yesterday's announcement was: what if I want to take hyperconvergence outside of the data center, into hundreds or thousands of remote locations? Think of a retailer. In a retail environment, some of the most interesting data is born outside the data center; it's born in a store. It's data that follows the customer who's interested in a plasma TV, and that data has a perishable lifetime. You act on it on location and on time, or you lose the value. If you send it off and take two hours to do a machine-learning job on it and come back, the customer's already back home watching a movie. The window of opportunity for the data is often right there and then, and that's why our customers are taking their computing environment out to where the data is, to act on it fast and on location.

>> It sounds easy, but I want to get your thoughts on this, because this is a critical data challenge. If data's stored in classic old ways, data warehouses and fenced-off areas somewhere out on the internet, you're not going to have the latency to get at that data in real time. We're talking about real-time data that's addressable as part of the application value. So this is a new notion that's emerged with DevOps and infrastructure as code.

>> That's right.

>> That's hard. How do you guys see that progressing? How should customers prepare to have that data centered properly for app addressability, discovery, whatever the contextual uses of the data are, time-series data or whatever data it is? This is a critical thing.

>> It's a critical thing, and there's no one answer, because depending on what the data is, sometimes you only see the value when you concentrate and consolidate it, because the patterns emerge from rolling up a thousand stores' worth of data and seeing that people who buy this toothbrush tend to buy that toothpaste.
There may be that value, where you want to concentrate the data. But there are also many cases where acting on the data in the moment and on location, quickly, without referring to the other thousand stores, extracts 90% of the value of that data. That's why you want to do forward-deployed computing on that data.

>> So this highlights network programmability. This means the application's driving the queries, or the network, for that data, if it's available... So there are two things: network programmability from the app, and availability of the data.

>> Yeah, and the ability of the entire infrastructure, network, compute, and storage, and hyperconvergence is the automation of all three, to deliver its value equally in remote locations or in a cloud, as it would in a data center. Because the application is going to want to go where the value is, and if the infrastructure can't follow it there, you get a degraded ability to take advantage of the opportunity.

>> Right, real-time decisions happen at the edge, but then, as you describe, you've got to bring certain data back to the cloud, do the modeling there, and then push the models back down. So you're going to have--

>> And you're going to have decision making distributed.

>> And you've got to have low latency to be able to enable that.

>> Yeah, and the same goes for other considerations. For example, why is it important to allow people to put data both on their premises and in the cloud? For disaster recovery, for data replication, for resiliency. Sometimes for governance reasons: GDPR in Europe says the data of European citizens that's personally identifying has to stay in Europe. Somebody may not have a data center in Europe. Could they take advantage of a co-location facility or somebody else's cloud?

>> This is the theme we're seeing at this show this year, and certainly at the center of the news: complexity is increasing, 'cause it's just evolution, more devices are connected, diverse environments, scale for cloud and connectivity, but software is driving it. So I've got to ask you the question. Go back to the old days, you know, the 1990s: multi-vendor was a big word. Now multi-cloud feels the same way. This is the openness thing. How would you describe the multi-cloud strategy for Cisco in the context of this notion of being open?

>> It is really the new dimension of openness, right? We've been open in the past to multiple forms of physical networks. Customers use wireless or fiber or copper or what have you; we need to give them an IP network that operates equally well over all media. That was one dimension of openness. Another dimension of openness was: does a product from vendor A work with a product from vendor B? My router, your router; my switch, your firewall. Those are other dimensions. Then hardware and software coupling: can I buy the hardware from Peter and the software from Mary, and will they work well together? The new dimension of openness is: can a customer avail themselves of any form of cloud, either because they like the tooling and their developers are more efficient on a given cloud, or because of the other guy's pricing, or because the third guy has a point of presence in Tokyo, which this one doesn't?
All of those are business choices, and if our technology lets customers take advantage of them with no technical restriction, they will, because now they can shop on the merits of what they want to do, and not on "oh well, sorry, if you want to go to Azure, I can't help you, but if you're willing to settle for your own premises or for Amazon, then I have a story for you." So that's--

>> Roland, you're leading the team on the core crown jewels for Cisco, and the rising tide's floating all boats here within the company. What's your plan for the year? What are your goals? You'll be out there pounding the pavement with customers. What's your objective? What do you hope to accomplish this year, in 2019?

>> Well, 2019 is the year of many things for us; it's a very exciting year. On the physical infrastructure side, it's the year we're taking our switches to 400 gigabits per second. We have our new silicon capability and our new optics, so we're going to be able to scale for the cloud providers who are heading to the next frontier of speed and density and scale. So performance will always, always be there, and when we're done with 400, we're already going to be asked about 800. So that's an exciting new generation of switches. ACI Anywhere getting deployed now and adopted across multiple clouds is another exciting thing. HyperFlex Anywhere: we're really looking forward to the potential in financial services, in logistics, in retail, where there's a lot of data deployed at the edge. And then, security is a never-finished journey, right? We keep giving our customers everything we can in the way of security, because there, there's an active actor who's trying to make you fail. It's not like you're only fighting physics to get to 400 gigabit and then you win; there you have a guy who's trying to foil your schemes while you're trying to foil his.

>> Constant attacks on the network. You guys have seen this movie before, so you know how critical it is. Roland, thanks so much for spending the time, and congratulations on ACI Anywhere, HyperFlex Anywhere, and intent-based networking at the core. It's theCUBE, bringing you all the data; we have an intent here to bring you the best content, from Cisco Live in Barcelona. I'm John Furrier, with Dave Vellante. Stay with us for live coverage, day two of three days of coverage, here in the DevNet Zone, packed with developers learning new skills. We'll be back with more after this short break.
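
Acra's description of fabric telemetry as a programmable data set, every packet, flow, and traffic pattern exposed to code, lends itself to a short sketch. The Python below is a hypothetical illustration, not Cisco code: it runs a rolling z-score over synthetic per-flow byte counts, the simplest form of the anomaly detection he mentions. The flow values, window size, and threshold are all made-up stand-ins for real exported telemetry.

```python
# Hypothetical sketch: flagging anomalous flows in exported fabric telemetry.
# All data here is synthetic; a real pipeline would consume streaming flow
# records from the fabric's telemetry exporter instead.
import numpy as np
import pandas as pd

rng = np.random.default_rng(7)

# Synthetic per-minute byte counts for one source/destination pair.
byte_counts = rng.normal(loc=50_000, scale=5_000, size=600)
byte_counts[550:560] = 400_000  # injected exfiltration-like burst

flows = pd.DataFrame({"bytes": byte_counts})

# Rolling z-score: how far is each sample from its recent history?
window = 60
mean = flows["bytes"].rolling(window).mean()
std = flows["bytes"].rolling(window).std()
flows["zscore"] = (flows["bytes"] - mean) / std

# Flag samples far above the rolling mean.
anomalies = flows[flows["zscore"] > 4]
print(f"{len(anomalies)} anomalous samples, first at minute {anomalies.index.min()}")
```

In a production fabric, the detection would feed a policy action, for example quarantining an endpoint group through the controller, rather than a print statement; but the shape of the loop, telemetry in, decision out, is what makes the network "subject to programming."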

Published Date : Jan 30 2019


Sreesha Rao, Niagara Bottling & Seth Dobrin, IBM | Change The Game: Winning With AI 2018


 

>> Live from Times Square in New York City, it's theCUBE, covering IBM's Change the Game: Winning with AI. Brought to you by IBM.

>> Welcome back to the Big Apple, everybody. I'm Dave Vellante, and you're watching theCUBE, the leader in live tech coverage, and we're here covering a special presentation of IBM's Change the Game: Winning with AI. IBM's got an analyst event going on here at the Westin today in the theater district; they've got 50-60 analysts here. They've got a partner summit going on, and then tonight, at Terminal 5 off the West Side Highway, they've got a customer event, a lot of customers there. We talked earlier today about the hard news. Seth Dobrin is here. He's the Chief Data Officer of IBM Analytics, and he's joined by Sreesha Rao, who is the Senior Manager of IT Applications at California-based Niagara Bottling. Gentlemen, welcome to theCUBE. Thanks so much for coming on.

>> Thank you, Dave.

>> Well, thanks Dave for having us.

>> Yes, always a pleasure, Seth. We've known each other for a while now. I think we met in a snowstorm in Boston; that sparked something a couple years ago.

>> Yep. When we were both trapped there.

>> Yep, and at that time, we spent a lot of time talking about your internal role as the Chief Data Officer, working closely with Inderpal Bhandari, and what you guys are doing inside of IBM. I want to talk a little bit more about your other half, which is working with clients and the Data Science Elite Team, and we'll get into what you're doing with Niagara Bottling, but let's start there. In terms of that side of your role, give us the update.

>> Yeah, like you said, we've spent a lot of time talking about how IBM is implementing the CDO role. While we were doing that internally, I spent quite a bit of time flying around the world, talking to our clients over the last 18 months since I joined IBM, and we found a consistent theme with all the clients: they needed help learning how to implement data science, AI, machine learning, whatever you want to call it, in their enterprise. There's a fundamental difference between doing these things at a university or as part of a Kaggle competition and doing them in an enterprise, so we felt really strongly that it was important for the future of IBM that all of our clients become successful at it, because what we don't want is, in two years, for them to go, "Oh my God, this whole data science thing was a scam. We haven't made any money from it." And it's not because the data science thing is a scam; it's because the way they're doing it is not conducive to business. So we set up this team we call the Data Science Elite Team, and what this team does is sit with clients around a specific use case for 30, 60, 90 days; it's really about three or four sprints, depending on the material, the client, and how long it takes, and we help them learn, through this use case, how to use Python, R, and Scala in our platform, obviously, because we're here to make money too, to implement these projects in their enterprise. Now, because it's written completely in open source, if they're not happy with what the product looks like, they can take their toys and go home afterwards. It's on us to prove the value as part of this. But there's a key point here: my team is not measured on sales. They're measured on adoption of AI in the enterprise, and that creates a different behavior for them. They're really about "make the enterprise successful," not "sell this software."

>> Yeah, compensation drives behavior.
>> Yeah, yeah.

>> So, at this point, I ask, "Well, do you have any examples?" So Sreesha, let's turn to you. (laughing softly) Niagara Bottling--

>> As a matter of fact, Dave, we do. (laughing)

>> Yeah, so you're not a bank with a trillion dollars in assets under management. Tell us about Niagara Bottling and your role.

>> Well, Niagara Bottling is the biggest private-label bottled-water manufacturing company in the U.S. We make bottled water for Costcos, Walmarts, major national grocery retailers. These are the customers we service, and as with all large customers, they're demanding, and we provide bottled water at relatively low cost and high quality.

>> Yeah, so I used to have a CIO consultancy. We worked with every CIO up and down the East Coast, and I really got into a lot of organizations. I always observed that it was the heads of application that drove AI, because they were the glue between the business and IT, and that's really where you sit in the organization, right?

>> Yes. My role is to support the business and business analytics, as well as some of the distribution technologies and planning technologies at Niagara Bottling.

>> So take us through the project, if you will. What were the drivers? What were the outcomes you envisioned? And we can kind of go through the case study.

>> So the current project where we leveraged IBM's help was a stretch-wrapper project. We produce, obviously, cases of bottled water. These are stacked into pallets and then shrink wrapped, or stretch wrapped, with a stretch wrapper, and this project is to save money by optimizing the amount of stretch wrap that goes around a pallet. We need to maintain the structural stability of the pallet while it's transported from the manufacturing location to our customer's location, where it's unwrapped and the cases are used.

>> And over breakfast we were talking. You guys produce 2,833 bottles of water per second.

>> Wow. (everyone laughs)

>> It's enormous. The manufacturing line is a high-speed manufacturing line, and we have a lights-out policy where everything runs in an automated fashion, with raw materials coming in from one end and the finished goods, pallets of water, going out the other. It's called pellets to pallets: pellets of plastic coming in through one end and pallets of water going out through the other.

>> Are you sitting on top of an aquifer? Or are you guys using some other techniques?

>> Yes, in fact, we bore wells and extract water from the aquifer.

>> Okay, so the goal was to minimize the amount of material that you used but maintain its stability? Is that right?

>> Yes, during transportation, yes. If we use too much plastic, we're wasting material, and cost goes up. We produce almost 16 million pallets of water every single year, so that's a lot of shrink wrap that goes around those. What we can save, maybe 15-20% of shrink-wrap costs, will amount to quite a bit.

>> So, how does machine learning fit into all of this?

>> So, machine learning is a way to understand what kind of profile we're running, if we can measure what is happening as we wrap the pallets: whether we are wrapping it too tight or stretching it too much, which results in either a conservative way of wrapping the pallets or an aggressive way of wrapping the pallets.

>> I.e., too much material, right?
>> Too much material is conservative, and aggressive is too little material, so we can achieve some savings if we alternate between the profiles.

>> So, too little material means you lose product, right?

>> Yes, and there's a risk of breakage. Essentially, while the pallet is being wrapped, if you are stretching it too much there's a breakage, and then it interrupts production, so we want to try and avoid that. We want continuous production; at the same time, we want the pallet to be stable while saving material costs.

>> Okay, so you're trying to find that ideal balance. And how much variability is in there? Is it a function of distance and how many touches it has? Maybe you can share that.

>> Yes, so each pallet takes about 16-18 wraps of the stretch wrapper going around it, and that's how much material is laid out: about 250 grams of plastic. So we're trying to optimize the gram weight, which is the amount of plastic that goes around each pallet.

>> So it's about predicting how much plastic is enough without having breakage and disrupting your line. So they had labeled data that said, "if we stretch it this much, it breaks; if we don't stretch it this much, it doesn't break," but then it was about predicting what's good enough, avoiding both of those extremes, right?

>> Yes.

>> So it's a truly predictive and iterative model that we've built with them.

>> And you're obviously injecting data in terms of the trip to the store as well, right? You're taking that into consideration in the model, right?

>> Yeah, that's mainly to make sure that the pallets are stable during transportation.

>> Right.

>> And that has already determined how much containment force is required when you stretch and wrap each pallet. So that's one of the variables that is measured, but the inputs and outputs are-- the input is the amount of material being used, in terms of gram weight. We are trying to minimize that. So that's what the whole machine learning exercise was.

>> And the data comes from where? Is it observation? Maybe instrumented?

>> Yeah, the instruments. Our stretch-wrapper machines have an Ignition platform, which is a SCADA platform that allows us to measure all of these variables. We're able to get machine-variable information from those machines and then, hopefully, one day automate that process, so there's a feedback loop that says, "On this profile, we've not had any breaks. We can continue," or, if there have been frequent breaks on a certain profile or machine setting, we can change that dynamically as the product is moving through the manufacturing process.

>> Yeah, so think of it as kind of a traditional manufacturing production-line optimization and prediction problem, right? It's minimizing waste while maximizing the output and throughput of the production line. When you optimize a production line, the first step is to predict what's going to go wrong, and the next step is to add prescriptive optimization to say, "Using the constraints that the predictive models give us, how do we maximize the output of the production line?" This is not a unique situation.
It's a unique material that we haven't really worked with, but they had some really good data on this material and how it behaves, and that's key. As you know, Dave, and as probably most of the people watching this know, labeled data is the hardest part of doing machine learning, along with building the features from that labeled data, and they had some great data for us to start with.

>> Okay, so you're collecting data at the edge, essentially, then using that to feed the models, which are running, I don't know, where? Your data center? Your cloud?

>> Yeah, in our data center, there's an instance of DSX Local.

>> Okay.

>> That we stood up. Most of the data is running through that. We build the models there, and then our goal is to deploy to the edge, where we can complete the loop in terms of the feedback that happens.

>> And iterate. (Sreesha nods)

>> And DSX Local is Data Science Experience Local?

>> Yes.

>> Slash Watson Studio, so they're the same thing.

>> Okay, now, what role did IBM and the Data Science Elite Team play? Take us through that.

>> So, as we discussed earlier, adopting data science is not that easy. It requires subject-matter expertise. It requires understanding of data science itself, the tools and techniques, and IBM brought that as part of the Data Science Elite Team. They brought both the tools and the expertise so that we could get on that journey towards AI.

>> And it's not a "do the work for them." It's a "teach them to fish," so my team sat side by side with the Niagara Bottling team, and we walked them through the process. It's not a consulting engagement in the traditional sense; it's "how do we help them learn how to do it?" So it's side by side with their team. Our team sat there and walked them through it.

>> For how many weeks?

>> We've had about two sprints already, and we're entering the third sprint. It's been about 30-45 days between sprints.

>> And you have your own data science team.

>> Yes. Our team is coming up to speed through this project. They've been trained, but they needed help from people who have done this, been there, and handled some of the challenges of modeling and data science.

>> So it accelerates that time to---

>> Value.

>> Outcome, and value, and there's a knowledge-transfer component--

>> Yes, absolutely.

>> It's occurring now, and I guess it's ongoing, right?

>> Yes. The engagement is unique in the sense that IBM's team came to our factory and understood what the stretch-wrap process looks like, so they had an understanding of the physical process and how it's modeled with the help of the variables, and they understand the data science modeling piece as well. Once they know both sides of the equation, they can put the physical problem and the digital equivalent together, and then correlate why things are happening with the appropriate data that supports the behavior.

>> Yeah, and then the constraints are one use case and up to 90 days, and there's no charge for those two. Like I said, it's paramount that our clients like Niagara know how to do this successfully in their enterprise.

>> It's a freebie?

>> No, it's no charge. Free makes it sound too cheap. (everybody laughs)

>> But it's part of, obviously, a broader arrangement with buying hardware and software, or whatever it is.
>> Yeah, it's a strategy for us to help make sure our clients are successful, and we want to minimize the activation energy to do that, so there's no charge. The only requirements from the client are that it's a real use case, that they at least match the resources I put on the ground, and that they sit with us, do things like this, and act as a reference, talking about the team and our offerings and their experiences.

>> So you've got to have skin in the game, obviously, be an IBM customer. There's got to be some commitment to some kind of business relationship. How big was the collective team for each, if you will?

>> So IBM had 2-3 data scientists. (Dave takes notes) Niagara matched that with 2-3 analysts. There were some working with the machines, who were familiar with the machines, and others who were more familiar with the data acquisition and data modeling.

>> So each of these engagements costs us about $250,000 all in, so it's quite an investment we're making in our clients.

>> I bet. I mean, 2-3 weeks over many, many weeks of super-geek time. So you're bringing in hardcore data scientists, math wizzes, stats wizzes, data hackers, developers--

>> Data viz people, yeah, the whole stack.

>> And the level of skills that Niagara has?

>> We've got actual employees who are responsible for production, our manufacturing analysts, who help aid in troubleshooting problems. If there are breakages, they go analyze why that's happening. Now they have data to tell them what to do about it, and that's the whole journey we are on: trying to quantify with the help of data, and connect our systems with data, systems and models that help us analyze what happened, why it happened, and what to do before it happens.

>> Your team must love this, because they're sort of elevating their skills. They're working with rock-star data scientists.

>> Yes.

>> And we've talked about this before. A point that was made here is that it's really important in these projects to have people acting as product owners, if you will, subject-matter experts that are on the front line, that do this every day, not just for the subject-matter expertise. I'm sure there are executives that understand it, but when you're done with the model, bringing it to the floor and talking to their peers about it, there's no better way to drive this cultural change of adopting these things than having one of your peers that you respect talk about it, instead of some guy or lady sitting up in the ivory tower saying "thou shalt."

>> Now, you don't know the outcome yet. It's still early days, but you've got a model built that you've got confidence in, and then you can iterate that model. What's your expectation for the outcome?

>> We're hoping the preliminary results help us get up the learning curve of data science and how to leverage data to make decisions. That's our idea. There are obviously optimal settings that we can use, but it's going to be a trial-and-error process. And through that, as we collect data, we can understand which settings are optimal and what we should be using in each of the plants. And if the plants decide they have a subjective preference for one profile versus another, with the data we are capturing we can measure when they deviated from what we specified. We have a lot of learning coming from the approach we're taking. You can't control things if you don't measure them first.

>> Well, your objectives are to transcend this one project and do the same thing across--
>> And to do the same thing across, yes.

>> Essentially pay for it with a quick return. That's the way to do things these days, right?

>> Yes.

>> You've got more narrow, small projects that'll give you a quick hit, and then you leverage that expertise across the organization to drive more value.

>> Yes.

>> Love it. What a great story, guys. Thanks so much for coming to theCUBE and sharing.

>> Thank you.

>> Congratulations. You must be really excited.

>> No, it's a fun project. I appreciate it.

>> Thanks for having us, Dave. I appreciate it.

>> Pleasure, Seth. Always great talking to you. And keep it right there, everybody. You're watching theCUBE. We're live from New York City, here at the Westin Hotel. #cubenyc. Check out ibm.com/winwithai for Change the Game: Winning with AI tonight. We'll be right back after a short break. (minimal upbeat music)
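
The modeling loop Seth and Sreesha describe, labeled break/no-break history in, a minimal safe gram weight out, can be sketched in a few lines. The following Python is a hypothetical illustration with synthetic data and invented numbers; it is not Niagara's actual model, features, or thresholds, and a real version would add features such as film tension, wrap count, and transport distance.

```python
# Hypothetical sketch of the stretch-wrap problem: learn breakage
# probability as a function of film gram weight, then choose the lightest
# wrap whose predicted risk stays under a threshold. Synthetic data only.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(42)

# Synthetic labeled history: grams of film per pallet, and whether the
# pallet broke during wrapping or transport (1 = break).
gram_weight = rng.uniform(180, 280, size=2_000)
p_break = 1 / (1 + np.exp((gram_weight - 210) / 8))  # lighter wraps break more
broke = rng.binomial(1, p_break)

model = LogisticRegression(max_iter=1000)
model.fit(gram_weight.reshape(-1, 1), broke)

# Scan candidate settings; keep the lightest under 1% predicted risk.
candidates = np.arange(180, 281).reshape(-1, 1)
risk = model.predict_proba(candidates)[:, 1]
acceptable = candidates[risk < 0.01]
print(f"lightest acceptable wrap: {acceptable.min():.0f} g"
      if acceptable.size else "no setting meets the risk threshold")
```

As Seth notes, prediction is only the first step; a prescriptive layer would then optimize the wrap profile per plant and per route against these predicted-risk constraints.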

Published Date : Sep 13 2018


Ram Venkatesh, Hortonworks & Sudhir Hasbe, Google | DataWorks Summit 2018


 

>> Live from San Jose, in the heart of Silicon Valley, it's theCUBE, covering DataWorks Summit 2018. Brought to you by Hortonworks.

>> We are wrapping up day one of coverage of DataWorks here in San Jose, California, on theCUBE. I'm your host, Rebecca Knight, along with my co-host, James Kobielus. We have two guests for this last segment of the day: Sudhir Hasbe, who is the director of product management at Google, and Ram Venkatesh, who is VP of Engineering at Hortonworks. Ram, Sudhir, thanks so much for coming on the show.

>> Thank you very much.

>> Thank you.

>> So, I want to start out by asking you about a joint announcement made earlier this morning about some Hortonworks technology deployed onto Google Cloud. Tell our viewers more.

>> Sure, so basically what we announced was support for the Hortonworks Data Platform and Hortonworks DataFlow, HDP and HDF, running on top of the Google Cloud Platform. This includes deep integration with Google's cloud storage connector layer, and it's a certified distribution of HDP to run on the Google Cloud Platform.

>> I think the key thing is that a lot of our customers have been telling us they like the familiar environment of the Hortonworks distribution they've been using on-premises, and as they look at moving to a cloud like GCP, Google Cloud, they want that same familiar environment. They want the choice to deploy on-premises or on Google Cloud, but with the familiarity of what they've already been using with Hortonworks products. So this announcement helps customers pick and choose: run the Hortonworks distribution on-premises, run it in the cloud, or build a hybrid solution where the data can reside on-premises and move to the cloud, in a common hybrid architecture. That's what this does.

>> So, HDP customers can store data in Google Cloud. They can execute ephemeral workloads, analytic workloads, machine learning in Google Cloud. And there's some tie-in between Hortonworks's real-time, low-latency streaming capabilities from HDF and Google Cloud. So, could you describe, at a fairly detailed level, the degree of technical integration between your two offerings here?

>> You want to take that?

>> Sure, I'll handle that. Essentially, deep in the heart of HDP there's the HDFS layer, which includes the Hadoop Compatible File System, a pluggable file-system layer. What Google has done is provide an implementation of this API for Google Cloud Storage: this is the GCS connector. We've taken the connector and continued to refine it to work with our workloads, and now Hortonworks is bundling and packaging this connector, making it available as part of HDP.

>> So bilateral data movement between them? Bilateral workload movement?

>> No, think of this as being very efficient when our workloads are running on top of GCP. When they need to get at data, they can get at data in Google Cloud Storage buckets in a very, very efficient manner. Since we have fairly deep expertise in workloads like Apache Hive and Apache Spark, we've done work in these workloads to make sure they can run efficiently not just on HDFS, but also on the cloud storage connector. This is a critical part of making sure the architecture is actually optimized for the cloud.
So, at scale, as our customers move their workloads from on-premise to the cloud, it's not just functional parity; they also need the operational and cost efficiency they're looking for as they move to the cloud. To do that, we need to enable this fundamental disaggregated-storage pattern. See, on-prem, the big win with Hadoop was that we could bring the processing to where the data was. In the cloud, we need to make sure we work well when storage and compute are disaggregated and scaled elastically, independent of each other. This is a fairly fundamental architectural change, and we want to make sure we enable it in a first-class manner.

>> I think that's a key point, right. What cloud allows you to do is scale storage and compute independently. With data stored in Google Cloud Storage, you can scale the storage horizontally and just leverage it as your storage layer, and the compute can scale independently by itself. What this allows customers of HDP and HDF to do is store the data on GCP, on Cloud Storage, and then use the compute side of it, at scale, with HDP and HDF.

>> So, if you'll indulge me to name another Hortonworks partner for just a hypothetical: let's say one of your customers is using IBM Data Science Experience to do TensorFlow modeling and training. Can they then, inside of HDP on GCP, use the compute infrastructure inside of GCP to do the actual modeling, which is more compute-intensive, and the separate, decoupled storage infrastructure to do the training, which is more storage-intensive? Is that a capability that would be available to your customers, with this integration with Google?

>> Yeah, so where we are going with this is, we are saying IBM DSX and other solutions built on top of HDP can transparently take advantage of the fact that they have HDP compute infrastructure to run against. You can run your machine-learning training jobs, you can run your scoring jobs, and you can have the same unmodified DSX experience whether you're running against an on-premise HDP environment or an in-cloud HDP environment. That's the benefit for partners and partner solutions. From a customer standpoint, the big value prop here is that customers are used to securing and governing their data on-prem in their particular way with HDP, with Apache Ranger, Atlas, and so forth. When they move to the cloud, we want this experience to be seamless from a management standpoint. From a data-management standpoint, we want all of their learning from a security and governance perspective to apply when they are running in Google Cloud as well. We've had this capability on Azure and on AWS, so with this partnership, we are announcing the same type of deep integration with GCP as well.

>> So Hortonworks is that one pane of glass across all your product partners, for all manner of jobs. Go ahead, Rebecca.

>> Well, I just wanted to ask about, we've talked about the reason, the impetus for this. For the customer, it's more familiar, and it offers a seamless experience. But can you delve a little bit into the business problems you're solving for customers here?

>> A lot of times, our customers are at various points in their cloud journey. For some of them, it's very simple: there's a broom coming by, the data center is going away in 12 months, and I need to be in the cloud.
So, this is where there is a wholesale movement of infrastructure from on-premise to the cloud. Others are exploring individual business use cases. For example, one of our large customers, a travel partner, is exploring a new pricing model, and they want to roll out this pricing model in the cloud. They have on-premise infrastructure, and they know they'll have that for a while; they're spinning up new use cases in the cloud, typically for reasons of agility. Typically, many of our customers operate large, multi-tenant clusters on-prem. That's nice for very scalable compute for running large jobs. But if you want to run, for example, a new version of Spark, you have to upgrade the entire cluster before you can do that. Whereas in this model, they can bring up a new workload with just the specific versions and dependencies it needs, independent of all of their other infrastructure. So this gives them agility, where they can move as fast as...

>> Through the containerization of the Spark jobs, or whatever.

>> Correct. So containerization, as well as even spinning up an entire new environment. Because in the cloud, given that you have access to elastic compute resources, they can come and go, your workloads are much more independent of the underlying cluster than they are on-premise. And this is where the core business benefits around agility, speed of deployment, and things like that come into play.

>> And also, look at the total cost of ownership. Take an example where customers are collecting all this information through the month, and at month end you want to do a closing of the books. That's a great example where you want ephemeral workloads: do it once a month, finish the books, and close the books. That's a great scenario for cloud, where you don't have to create an on-premises infrastructure and keep it ready. So that's one example, where in the new partnership you can collect all the data on-premises throughout the month, but then move it and leverage cloud to scale, run this workload, and finish the books. The second example I can give is around e-commerce: a lot of customers run their e-commerce platforms on-premises. They can still collect all these events through HDF, which may be running on-premises with Kafka, and then, in the cloud, in GCP, you can deploy HDP and HDF and use the HDF there for real-time stream processing. So, collect all these clickstream events and use them to make decisions like: hey, which products are selling better? Should we go ahead and promote them? How many people are looking at that product? How many people have bought it? That kind of aggregation, in real time, at scale, you can now do in-cloud, and you can build these hybrid architectures. It enables scenarios where, in the past, to do that kind of thing you would have to procure hardware and deploy hardware, all of which goes away. In-cloud, you can do it much more flexibly and just use whatever capacity you have.

>> Well, you know, ephemeral workloads are at the heart of what many enterprise data scientists do. Real-world experiments, ad hoc experiments with certain datasets.
You build a TensorFlow model, or maybe a model in Caffe or whatever, and you deploy it out to a cluster, and so the life of a data scientist is often a stream of new tasks that are all ephemeral in their own right, but are part of an ongoing experimentation program where they're building and testing assets that may or may not be deployed in production applications. So I can see a clear need for that capability of this announcement in lots of working data-science shops in the business world.

>> Absolutely.

>> And I think, coming down to it, if you really look at the partnership, there are two or three key areas where it's going to have a huge advantage for our customers. One is analytics at scale at a lower cost: reducing total cost of ownership while running analytics at scale. That's one of the big things. Then, as I said, the hybrid scenarios: most enterprise customers have huge deployments of infrastructure on-premises, and that's not going to go away. Over time, leveraging cloud is a priority for a lot of customers, but they will be in these hybrid scenarios, and what this partnership allows them to do is have scenarios that span the cloud and the on-premises infrastructure they're building, and get business value out of all of it. And then, finally, we at Google believe the world will become more and more real-time over time. We're already seeing a lot of these real-time scenarios, with IoT events coming in and people making real-time decisions, and this is only going to grow. This partnership also provides the whole streaming-analytics capability in-cloud, at scale, so customers can build hybrid as well as real-time streaming scenarios with this package.

>> Well, it's clear from Google what the Hortonworks partnership gives you in this competitive space, the multi-cloud space: it gives you the ability to support hybrid cloud scenarios. You're one of the premier public cloud providers, as we all know, and clearly, now that you have the Hortonworks partnership, you have the ability to support those kinds of highly hybridized deployments for your customers, many of whom I'm sure have those requirements.

>> That's perfect, exactly right.

>> Well, a great note to end on. Thank you so much for coming on theCUBE. Sudhir, Ram, thank you so much.

>> Thank you, thanks a lot.

>> Thank you.

>> I'm Rebecca Knight, for James Kobielus. We will have more tomorrow from DataWorks. We will see you tomorrow. This is theCUBE signing off.

>> From sunny San Jose.

>> That's right.
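
The disaggregated pattern Ram and Sudhir outline, compute on an ephemeral HDP cluster with data staying in Google Cloud Storage, comes down to pointing Hadoop-compatible workloads at gs:// paths through the GCS connector. Below is a hypothetical PySpark sketch: the bucket, key-file path, and table layout are invented, and the fs.gs.* and google.cloud.auth.* property names come from the open-source GCS connector, so verify them against the connector documentation for your release.

```python
# Hypothetical sketch: a Spark job on HDP reading data that lives in a
# Cloud Storage bucket via the GCS connector's Hadoop-compatible API.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("hdp-on-gcp-sketch")
    # Register the connector so Spark can resolve gs:// paths.
    .config("spark.hadoop.fs.gs.impl",
            "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem")
    .config("spark.hadoop.fs.AbstractFileSystem.gs.impl",
            "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS")
    # Authenticate with a service-account key instead of cluster-local creds.
    .config("spark.hadoop.google.cloud.auth.service.account.enable", "true")
    .config("spark.hadoop.google.cloud.auth.service.account.json.keyfile",
            "/etc/security/keys/gcs-sa.json")
    .getOrCreate()
)

# Compute runs on the (possibly ephemeral) cluster; the data never leaves
# the bucket, which is the disaggregated storage-and-compute pattern above.
events = spark.read.parquet("gs://example-clickstream/events/2018/06/")
top = events.groupBy("product_id").count().orderBy("count", ascending=False)
top.show(10)
```

The same gs:// paths are visible to Hive through the cluster's Hadoop configuration, which is what makes the month-end "collect on-prem all month, burst to cloud to close the books" pattern described above practical without first copying data into HDFS.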

Published Date : Jun 20 2018

SUMMARY :

Sudhir Hasbe of Google and Ram Venkatesh of Hortonworks discuss the Google Cloud-Hortonworks partnership: running HDP and HDF on Google Cloud Platform, hybrid on-premises and cloud architectures, ephemeral workloads such as month-end closing of the books, and real-time clickstream analytics with Kafka and HDF.

SENTIMENT ANALYSIS :

ENTITIES

Entity | Category | Confidence
James Kobielus | PERSON | 0.99+
Rebecca Knight | PERSON | 0.99+
Rebecca | PERSON | 0.99+
two | QUANTITY | 0.99+
Sudhir | PERSON | 0.99+
Ram Venkatesh | PERSON | 0.99+
San Jose | LOCATION | 0.99+
HortonWorks | ORGANIZATION | 0.99+
Sudhir Hasbe | PERSON | 0.99+
Google | ORGANIZATION | 0.99+
Hortonworks | ORGANIZATION | 0.99+
Silicon Valley | LOCATION | 0.99+
two guests | QUANTITY | 0.99+
San Jose, California | LOCATION | 0.99+
DataWorks | ORGANIZATION | 0.99+
tomorrow | DATE | 0.99+
Ram | PERSON | 0.99+
AWS | ORGANIZATION | 0.99+
one example | QUANTITY | 0.99+
one | QUANTITY | 0.99+
two offerings | QUANTITY | 0.98+
12 months | QUANTITY | 0.98+
One | QUANTITY | 0.98+
Day One | QUANTITY | 0.98+
DataWorks Summit 2018 | EVENT | 0.97+
IBM | ORGANIZATION | 0.97+
second example | QUANTITY | 0.97+
Google Cloud Platform | TITLE | 0.96+
Atlas | ORGANIZATION | 0.96+
Google Cloud | TITLE | 0.94+
Apache Ranger | ORGANIZATION | 0.92+
three key areas | QUANTITY | 0.92+
Hadoop | TITLE | 0.91+
Kafka | TITLE | 0.9+
theCUBE | ORGANIZATION | 0.88+
earlier this morning | DATE | 0.87+
Apache Hive | ORGANIZATION | 0.86+
GCP | TITLE | 0.86+
one pane | QUANTITY | 0.86+
IBM Data Science | ORGANIZATION | 0.84+
Azure | TITLE | 0.82+
Spark | TITLE | 0.81+
first | QUANTITY | 0.79+
HDF | ORGANIZATION | 0.74+
once in a month | QUANTITY | 0.73+
HDP | ORGANIZATION | 0.7+
TensorFlow | OTHER | 0.69+
Hortonworks DataPlatform | ORGANIZATION | 0.67+
Apache Spark | ORGANIZATION | 0.61+
GCS | OTHER | 0.57+
HDP | TITLE | 0.5+
DSX | TITLE | 0.49+
Cloud Storage | TITLE | 0.47+

Pandit Prasad, IBM | DataWorks Summit 2018


 

>> From San Jose, in the heart of Silicon Valley, it's theCUBE. Covering DataWorks Summit 2018. Brought to you by Hortonworks. (upbeat music) >> Welcome back to theCUBE's live coverage of DataWorks here in sunny San Jose, California. I'm your host Rebecca Knight, along with my co-host James Kobielus. We're joined by Pandit Prasad. He leads analytics projects, strategy, and management at IBM Analytics. Thanks so much for coming on the show. >> Thanks Rebecca, glad to be here. >> So, why don't you just start out by telling our viewers a little bit about what you do, in terms of the relationship with Hortonworks and the other parts of your job. >> Sure. As you said, I am in Offering Management, which is also known as Product Management, for IBM; I manage the big data portfolio from an IBM perspective. I was also working with Hortonworks on developing this relationship, nurturing that relationship, so it's been a year since the Hortonworks partnership. We announced this partnership exactly last year at the same conference. And now it's been a year, so this year has been a journey of aligning the two portfolios together. Right, so Hortonworks had HDP and HDF. IBM also had similar products, so we have, for example, Big SQL; Hortonworks has Hive, so how do Hive and Big SQL align together? IBM has Data Science Experience; where does that come into the picture on top of HDP? Before this partnership, if you looked into the market, it had been: you sell Hadoop, you sell a SQL engine, you sell data science. So what this year has given us is more of a solution sell. Now, with this partnership, we go to the customers and say: here is an end-to-end experience for you. You start with Hadoop, you put more analytics on top of it, you then bring Big SQL for complex queries and federation and visualization stories, and then finally you put data science on top of it. So it gives you a complete end-to-end solution, the end-to-end experience for getting the value out of the data. >> Now, IBM a few years back released the Watson Data Platform for team data science, with DSX, Data Science Experience, as one of the tools for data scientists. Is Watson Data Platform still the core, I call it dev ops for data science and maybe that's the wrong term, that IBM provides to market, or is there sort of a broader dev ops framework within which IBM goes to market with these tools? >> Sure. Watson Data Platform one year ago was more of a cloud platform, and it had many components to it, and now we are getting a lot of those components on to the (mumbles), and Data Science Experience is one part of it, so Data Science Experience... >> So Watson Analytics as well, for subject matter experts and so forth. >> Yes. And again, Watson has a whole suite of business-focused offerings; Data Science Experience is more of a particular aspect of the focus, specifically on the data science, and that's now available on-prem, and now we are building this on-prem stack, so we have HDP, HDF, Big SQL, Data Science Experience, and we are working towards adding more and more to that portfolio. >> Well, you have a broader reference architecture and a stack of solutions, AI on Power and so forth, for more of the deep learning development. In your relationship with Hortonworks, are they reselling more of those tools into their customer base to supplement and extend what they already resell, DSX, or is that outside of the scope of the relationship?
>> No, it is all part of the relationship. These three have been the core of what we announced last year, and then there are other solutions. We have the whole governance solution, right? So again, it goes back to the partnership: HDP brings with it Atlas; IBM has a whole governance portfolio, including the governance catalog. How do you expand the story from being a Hadoop-centric story to an enterprise data lake story? And now we are taking that to the cloud; that's what Truata is all about. Rob Thomas came out with a blog yesterday morning talking about Truata. If you look at it, it is nothing but a governed data lake hosted offering, if you want to simplify it. That's one way to look at it, and it caters to the GDPR requirements as well. >> For the IBM Hortonworks partnership, what is the lead solution for GDPR compliance? Is it Hortonworks Data Steward Studio, or is it any number of solutions that IBM already has for data governance and curation, or is it a combination of all of that, in terms of what you, as partners, propose to customers for soup-to-nuts GDPR compliance? Give me a sense for... >> It is a combination of all of those, so it has HDP, it has HDF, it has Big SQL, it has Data Science Experience, it has the IBM governance catalog, it has IBM data quality, and it has a bunch of security products, like Guardium, and it has some new IBM proprietary components that are very specific towards data (cough drowns out speaker) and how you deal with the personal data and sensitive personal data as classified by GDPR. I'm supposed to query some high-level information, but I'm not allowed to query deep into the personal information, so how do you block those queries, how do you understand them? These are not necessarily part of Data Steward Studio. These are some of the proprietary components that are thrown into the mix by IBM. >> One of the requirements that is not often talked about under GDPR, Ricky of Hortonworks got into it a little bit in his presentation, is the notion that if you are using an EU citizen's PII to drive algorithmic outcomes, they have the right to full transparency into the algorithmic decision paths that were taken. I remember IBM had a tool under the Watson brand that wraps up a narrative of that sort. Is that something that IBM still offers? It was called Watson Curator a few years back. Because I'm getting a sense right now that Hortonworks doesn't have a specific solution, not to say that they may not be working on it, that addresses that side of GDPR. Do you know what I'm referring to there? >> I'm not aware of something from the Hortonworks side beyond the Data Steward Studio, which offers basically identification of what some of the... >> Data lineage, as opposed to model lineage. It's a subtle distinction. >> It can identify some of the personal information and maybe provide a way to tag it and, hence, mask it, but the Truata offering is the one that is bringing some new research assets. After the GDPR guidelines became clear, they got into the full details of how do we cater to those requirements. These are relatively new proprietary components; they are not even being productized yet, and that's why I am calling them proprietary components that are going into this hosting service. >> IBM's got a big portfolio, so I'll understand if you guys are still working out the positioning. Rebecca, go ahead. >> I just wanted to ask you about this new era of GDPR.
The last Hortonworks conference was sort of before it came into effect, and now we're in this new era. How would you say companies are reacting? Are they in the right space for it, or are they really still trying to understand the ripple effects and how it's all going to play out? How would you describe your interactions with companies in terms of how they're dealing with these new requirements? >> They are still trying to understand the requirements and interpret the requirements, coming to terms with what they really mean. For example, I met with a customer, and they are a multi-national company. They have data centers across different geos, and they asked me: I have somebody from Asia trying to query the data, so the query should go to Europe, but the query processing should not happen in Asia; the query processing should all happen in Europe, and only the output of the query should be sent back to Asia. You wouldn't have thought in these terms before the GDPR era. >> Right, exceedingly complicated. >> Decoupling storage from processing enables those kinds of fairly complex scenarios for compliance purposes. >> It's not just about the access to data; now you are getting into where the processing happens and where the results are getting displayed, so we are getting... >> Severe penalties for not doing that, so your customers need to keep up. There was an announcement at this show, at DataWorks 2018, of an IBM Hortonworks solution: IBM Hosted Analytics with Hortonworks. I wonder if you could speak a little bit about that, Pandit, in terms of what's provided. It's a subscription service? Could you tell us what subset of IBM's analytics portfolio is hosted for Hortonworks customers? >> Sure. As you said, it is a hosted offering. Initially we are starting off with a base offering with three products: it will have HDP, IBM Db2 Big SQL, and DSX, Data Science Experience. Those are the three solutions. Again, as I said, it is hosted on IBM Cloud, so customers have a choice of different configurations, whether it be VMs or bare metal. I should say this is probably the only offering, as of today, that offers a bare metal configuration in the cloud. >> It's geared to data scientists and developers who will build machine-learning models and train them in IBM Cloud, on a hosted HDP in IBM Cloud. Is that correct? >> Yeah, I would rephrase that a little bit. There are several different offerings on the cloud today, and we can think about them, as you said, for ad-hoc or ephemeral workloads, also geared towards low cost. Think about this offering as taking your on-prem data center experience directly onto the cloud. It is geared towards very high performance. The hardware and the software are all configured and optimized for providing high performance, not necessarily for ad-hoc or ephemeral workloads. It is capable of handling massive, sticky workloads; it's not meant for "I turn on this massive computing power for a couple of hours and then switch it off," but rather "I'm going to run these massive workloads as if they were located in my data center." That's number one. It comes with the complete set of HDP. If you think about it, currently in the cloud you have Hive and HBase, the SQL engines and the storage are separate, security is optional, governance is optional. This comes with the whole enchilada. It has security and governance all baked in.
It provides the option to use Big SQL, because once you get on Hadoop, the next experience is: I want to run complex workloads, I want to run federated queries across Hadoop as well as other data stores. How do I handle those? And then it comes with Data Science Experience, also configured for best performance and integrated together. As a part of this partnership, I mentioned earlier that we have progressed towards providing this story of an end-to-end solution. The next step of that is: yes, I can say that it's an end-to-end solution, but do the products look and feel as if they are one solution? That's what we are getting into, and I have featured some of those integrations. For example, Big SQL, an IBM product: we have been working on integrating it very closely with HDP. It can be deployed through Ambari, and it is integrated with Atlas and Ranger for security. We are improving the integrations with Atlas for governance. >> Say you're building a Spark machine learning model inside DSX on HDP, within IH (mumbles), IBM hosting with Hortonworks, on HDP 3.0; can you then containerize that machine learning Spark model and deploy it into an edge scenario? >> Sure. First was Big SQL; the next one is DSX. DSX is integrated with HDP as well. We could run DSX workloads on HDP before, but what we have done now is this: if I want to run a DSX workload, say a Python workload, I need to have the Python libraries on all the nodes I want to deploy on. Suppose you are running a big cluster, a 500-node cluster: I need to have the Python libraries on all 500 nodes, and I need to maintain the versioning of them. If I upgrade the versions, then I need to go and upgrade and make sure all of them are perfectly aligned. >> In this first version, will you be able to build a Spark model and a TensorFlow model and containerize them and deploy them? >> Yes. >> Across a multi-cloud, and orchestrate them with Kubernetes to do all that meshing? Is that a capability now, or planned for the future within this portfolio? >> Yeah, we have that capability demonstrated at the pedestal today, so that is a new integration. We can run what we call a virtual Python environment. DSX can containerize it and run it against the data that resides in the HDP cluster. Now we are making use of both the data in the cluster, as well as the infrastructure of the cluster itself, for running the workloads. >> In terms of the layered stack, is it also incorporating the IBM distributed deep learning technology that you've recently announced? Which I think is highly differentiated, because deep learning is increasingly becoming a set of capabilities that run across a distributed mesh, playing together as if they're one unified application. Is that a capability now in this solution, or will it be in the near future? DDL, distributed deep learning? >> No, we have not yet. >> I know that's on the AI Power platform currently, gotcha. >> It's what we'll be talking about at next year's conference. >> That's definitely on the roadmap. We are starting with the base configurations of bare metal and VMs; the next one, depending on how the customers react to it: we're definitely thinking about bare metal with GPUs, optimized for TensorFlow workloads. >> Exciting. We'll stay tuned in the coming months and years; I'm sure you guys will have that. >> Pandit, thank you so much for coming on theCUBE. We appreciate it. I'm Rebecca Knight, for James Kobielus. We will have more from theCUBE's live coverage of DataWorks just after this.
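As a concrete illustration of the Spark machine learning workflow discussed in this exchange, here is a minimal sketch: train a model against a table that already lives in the cluster, then persist it so it can be reloaded for scoring or packaged for deployment. The table name, feature columns, and output path are illustrative assumptions, not details of the announced offering.

from pyspark.ml import Pipeline
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.feature import VectorAssembler
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("churn-model").getOrCreate()

# Assumed: a Hive table of labeled customer records available in the cluster.
df = spark.table("analytics.customer_churn")

# Assemble raw columns into the single feature vector Spark ML expects.
assembler = VectorAssembler(
    inputCols=["tenure_months", "monthly_spend", "support_calls"],
    outputCol="features")
lr = LogisticRegression(labelCol="churned", featuresCol="features")

model = Pipeline(stages=[assembler, lr]).fit(df)

# Persisting to HDFS lets the model be reloaded for scoring without retraining.
model.write().overwrite().save("hdfs:///models/churn/v1")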

Published Date : Jun 19 2018

SUMMARY :

Pandit Prasad of IBM Analytics discusses the first year of the IBM-Hortonworks partnership: aligning Big SQL and Data Science Experience with HDP and HDF, governance and GDPR compliance including the Truata hosted data lake, and the newly announced IBM Hosted Analytics with Hortonworks on IBM Cloud.

SENTIMENT ANALYSIS :

ENTITIES

Entity | Category | Confidence
Rebecca | PERSON | 0.99+
James Kobielus | PERSON | 0.99+
Rebecca Knight | PERSON | 0.99+
Europe | LOCATION | 0.99+
IBM | ORGANIZATION | 0.99+
Asia | LOCATION | 0.99+
Rob Thomas | PERSON | 0.99+
San Jose | LOCATION | 0.99+
Silicon Valley | LOCATION | 0.99+
Pandit | PERSON | 0.99+
last year | DATE | 0.99+
Python | TITLE | 0.99+
yesterday morning | DATE | 0.99+
Hortonworks | ORGANIZATION | 0.99+
three solutions | QUANTITY | 0.99+
Ricky | PERSON | 0.99+
Northsys | ORGANIZATION | 0.99+
Hadoop | TITLE | 0.99+
Pandit Prasad | PERSON | 0.99+
GDPR | TITLE | 0.99+
IBM Analytics | ORGANIZATION | 0.99+
first version | QUANTITY | 0.99+
both | QUANTITY | 0.99+
one year ago | DATE | 0.98+
Hortonwork | ORGANIZATION | 0.98+
three | QUANTITY | 0.98+
today | DATE | 0.98+
DSX | TITLE | 0.98+
Formworks | ORGANIZATION | 0.98+
this year | DATE | 0.98+
Atlas | ORGANIZATION | 0.98+
first | QUANTITY | 0.98+
Granger | ORGANIZATION | 0.97+
Gaurdium | ORGANIZATION | 0.97+
one | QUANTITY | 0.97+
Data Steward Studio | ORGANIZATION | 0.97+
two portfolios | QUANTITY | 0.97+
Truata | ORGANIZATION | 0.96+
DataWorks Summit 2018 | EVENT | 0.96+
one solution | QUANTITY | 0.96+
one way | QUANTITY | 0.95+
next year | DATE | 0.94+
500 nodes | QUANTITY | 0.94+
NTN | ORGANIZATION | 0.93+
Watson | TITLE | 0.93+
Hortonworks | PERSON | 0.93+

GDPR on theCUBE, Highlight Reel #3 | GDPR Day


 

(bouncy, melodic music) - The world's kind of revolting against these mega-siloed platforms. - That's the risk of having such centralized control over technology. If you remember in the old days, when Microsoft dominance was rising, all you had to do was target Windows as a virus platform, and you were able to impact thousands of businesses, even in the early Internet days, within hours. And the same thing is happening right now with the weaponization of these social media platforms, and Google's search engine technology and so forth; it's the same side effect now. The centralization, that control, is the problem. One of the reasons I love the Blockstack technology, and Blockchain in general, is the ability to decentralize these things right now. And the most passionate thing I care about nowadays is being driven out of Europe, where they have a lot more maturity in terms of handling these nuisance-- - You mean the change being driven out of Europe. - Their loss, - The loss, okay. - being driven out of Europe and-- - Be specific, we'd like an example. - The major deadline that's coming up on May 25th of 2018 is GDPR, the General Data Protection Regulation, where European citizens now, and any company, American or otherwise, catering to European citizens, has to respond to things like the Right To Be Forgotten request. You've got 24 hours as a global corporation with European operations to respond to an EU citizen's Right To Be Forgotten request, where all the personally identifiable information, the PII, has to be removed and audit-trailed, proving it's been removed; it has to be gone from two, three hundred internal systems within 24 hours. And this has teeth, by the way. It's not like the 2.7 billion dollar fine that Google just flipped away casually. This carries up to 4% of your global profits per incident where you don't meet that requirement. - And so what we're seeing in the case of GDPR is that it's an accelerant to adopt Cloud, because we actually isolate the data down into regions, and the way we've architected our platform from day one has always been a true multi-tenant SaaS technology platform. And so there's not that worry about data resiliency and where it resides, and how you get access to it, because we've built all that up. And so, when we go through all of our own attestations, whether it's SOC Type One, Type Two, GDPR as an initiative, what we're doin' for HIPAA, what we're doin' for a plethora of other things, usually the CSO says, "Oh, I get it, you're way more secure, now help me," because I don't want the folks in development or operations to run amok, so to speak; I want to be an enabler, not Doctor No. - I'm a developer, I search for data, I'm just searching for data. - That's right. - What controls are available for making sure that I don't run afoul of GDPR? - So absolutely. So we have phenomenal security capabilities that are built into our product, both from an identification point of view, giving rights and privileges, as well as protecting that data from any third-party access. All of this information is going to be compliant with these regulations, beyond GDPR. There are enormous regulations around data that require us to keep our security levels as high as they can go. In fact, we would argue that AWS itself is now typically more secure, more secure, - [Mike] They've done the work. - than your classic data center. - [Mike] Yeah, they've done the work. - Explicable AI, or explicable machine learning. - Yeah, that's a hot focus, - Indeed.
- or concern of enterprises everywhere, especially in a world where governance and tracking and lineage, - Precisely. - GDPR and so forth, are so hot. - Yes, you have mentioned all the right things. Now, given those two things, there's no ML without data, and ML is not easy; why does the partnership between Hortonworks and IBM make sense? Well, you're looking at the number one, industry-leading big data platform, Hortonworks. Then you look at DSX Local, which I'm proud to say I've been there since the first line of code, and I'm feeling very passionate about the product, the merge between the two. The ability to integrate them tightly together gives your data scientists secure access to data, the ability to leverage the Spark that runs inside of a Hortonworks cluster, the ability to actually work in a platform like DSX that doesn't limit you to just one kind of technology but allows you to work within multiple technologies, the ability to actually work on, not only Spark-- - You say technologies here, are you referring to frameworks like TensorFlow, and-- - [Piotr] Precisely. - Okay, okay. - Very good, now, that part I'm gonna get into very shortly. So please don't steal my thunder. - So GDPR you see as a big opportunity for Cloud providers, like Azure. Or do they bring something to the table, right? - Yeah, they bring different things to the table. You have elements of data where you need the on-premise solution, you need to have control, and you need to have that restriction about where that data sits. And some of the talks here that are going on at the moment are about understanding, again, how critical and how risky is that data? What is it you're keepin' and how high does it come up in business value? So if that's gonna be on your on-prem solution, there may be other data that can get pushed out into the Cloud, but, I would say, Azure, the AWS suites, and Google are really pushing down on that security, what you can do, how you protect it, how you can protect that data, and you've got the capabilities of things like LRS or GRS, having that global reach or those local repositories for the object storage. So you can start to control by policies. You can write into this country, but you're not allowed to go to this country, and you're not allowed to go to that one, and Cloud does give you that to a certain extent, but also then, you have to step back into, maybe, the sorts of things that-- - So does that make Cloud Orchestrator more valuable, or has it still got more work to do? Because from what Adam was saying, the point and click is a great way to provision, right?
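The Right to Be Forgotten obligation described in this reel amounts to fanning an erasure request out to every internal system that holds a subject's PII and keeping an audit trail proving each deletion happened. Here is a toy sketch of that workflow; the store registry and delete functions are hypothetical stand-ins for the two to three hundred real systems a global corporation would wire in.

import json
from datetime import datetime, timezone

def erase_subject(subject_id, stores, audit_path="erasure_audit.log"):
    """Delete one data subject from each registered store, with an audit trail."""
    with open(audit_path, "a") as audit:
        for name, delete_fn in stores.items():
            removed = delete_fn(subject_id)  # each store supplies its own deleter
            audit.write(json.dumps({
                "subject": subject_id,
                "store": name,
                "records_removed": removed,
                "completed_at": datetime.now(timezone.utc).isoformat(),
            }) + "\n")

# Example wiring with an in-memory stand-in for a real CRM system.
crm = {"u123": {"name": "Jane"}, "u456": {"name": "Ken"}}
stores = {"crm": lambda sid: 1 if crm.pop(sid, None) else 0}
erase_subject("u123", stores)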

Published Date : May 25 2018

SUMMARY :

Highlights from theCUBE's GDPR coverage: the risks of centralized platforms, the May 25th, 2018 deadline and the Right to Be Forgotten, GDPR as an accelerant for cloud adoption, and the security and data-residency controls offered by the major cloud providers.

SENTIMENT ANALYSIS :

ENTITIES

Entity | Category | Confidence
Hortonworks | ORGANIZATION | 0.99+
IBM | ORGANIZATION | 0.99+
Adam | PERSON | 0.99+
two | QUANTITY | 0.99+
AWS | ORGANIZATION | 0.99+
Europe | LOCATION | 0.99+
May 25th of 2018 | DATE | 0.99+
GDPR | TITLE | 0.99+
Microsoft | ORGANIZATION | 0.99+
Google | ORGANIZATION | 0.99+
General Data Protection Regulation | TITLE | 0.99+
24 hours | QUANTITY | 0.99+
two things | QUANTITY | 0.99+
Mike | PERSON | 0.99+
first line | QUANTITY | 0.99+
HIPAA | TITLE | 0.98+
Cloud | TITLE | 0.98+
One | QUANTITY | 0.98+
both | QUANTITY | 0.98+
thousands of businesses | QUANTITY | 0.96+
Windows | TITLE | 0.96+
Piotr | PERSON | 0.95+
up to 4% | QUANTITY | 0.95+
TensorFlow | TITLE | 0.94+
one kind | QUANTITY | 0.93+
European | OTHER | 0.93+
2.7 billion dollar | QUANTITY | 0.91+
Azure | ORGANIZATION | 0.89+
AWS Suites | ORGANIZATION | 0.89+
Spark | TITLE | 0.88+
three hundred internal systems | QUANTITY | 0.86+
EU | LOCATION | 0.85+
Hortonworks Glassdoor | ORGANIZATION | 0.84+
NML | ORGANIZATION | 0.83+
GDPR Day | EVENT | 0.78+
day one | QUANTITY | 0.75+
American | OTHER | 0.74+
CSO | ORGANIZATION | 0.72+
LSR | TITLE | 0.7+
Right To Be Forgotten | OTHER | 0.68+
GSR | TITLE | 0.62+
Type Two | OTHER | 0.62+
To | OTHER | 0.6+
DSX | ORGANIZATION | 0.59+
One | OTHER | 0.59+
Highlight Reel | ORGANIZATION | 0.56+
#3 | QUANTITY | 0.54+
one | QUANTITY | 0.5+
SOC Type | TITLE | 0.49+
DSX Local | TITLE | 0.44+

Piotr Mierzejewski, IBM | Dataworks Summit EU 2018


 

>> Announcer: From Berlin, Germany, it's theCUBE covering Dataworks Summit Europe 2018, brought to you by Hortonworks. (upbeat music) >> Well hello, I'm James Kobielus and welcome to theCUBE. We are here at Dataworks Summit 2018, in Berlin, Germany. It's a great event; Hortonworks is the host, and they made some great announcements. They've had partners doing the keynotes and the sessions, breakouts, and IBM is one of their big partners. Speaking of IBM, from IBM we have a program manager, Piotr, I'll get this right, Piotr Mierzejewski. Your focus is on data science, machine learning, and Data Science Experience, which is one of the IBM products for working data scientists to build and to train models in team data science, enterprise, operational environments. So Piotr, welcome to theCUBE. I don't think we've had you before. >> Thank you. >> You're a program manager. I'd like you to discuss what you do for IBM, I'd like you to discuss Data Science Experience. I know that Hortonworks is a reseller of Data Science Experience, so I'd like you to discuss the partnership going forward and how you and Hortonworks are serving your customers, data scientists and others in those teams who are building and training and deploying machine learning and deep learning, AI, into operational applications. So Piotr, I give it to you now. >> Thank you. Thank you for inviting me here, very excited. This is a very loaded question, and before I get to why the partnership makes sense, I would like to begin with two things. First, there is no machine learning without data. And second, machine learning is not easy. Especially, especially-- >> James: I never said it was! (Piotr laughs) >> Well, there is this kind of perception, like you can have a data scientist working on their Mac, working on some machine learning algorithms, and they can create a recommendation engine in, let's say, two or three days' time. This is because of the explosion of open source in that space. You have thousands of libraries, from Python, from R, from Scala; you have access to Spark. All these various open-source offerings are enabling data scientists to actually do this wonderful work. However, when you start talking about bringing machine learning to the enterprise, this is not an easy thing to do. You have to think about governance, resiliency, data access, and the actual model deployments, which are not trivial when you have to expose this in a uniform fashion to various business units. Now, all this has to work in private cloud and public cloud environments, on a variety of hardware and a variety of different operating systems. Now that is not trivial. (laughs) Now, when a data scientist is going to deploy a model, he needs to be able to explain how the model was created. He has to be able to explain what data was used. He needs to ensure-- >> Explicable AI, or explicable machine learning, yeah, that's a hot focus of concern for enterprises everywhere, especially in a world where governance and tracking and lineage, GDPR and so forth, are so hot. >> Yes, you've mentioned all the right things. Now, given those two things, there's no ML without data, and ML is not easy, why does the partnership between Hortonworks and IBM make sense? Well, you're looking at the number one, industry-leading big data platform from Hortonworks.
Then you look at DSX Local, which, I'm proud to say, I've been there since the first line of code, and I'm feeling very passionate about the product. It is the merger between the two: the ability to integrate them tightly together gives your data scientists secure access to data, the ability to leverage the Spark that runs inside a Hortonworks cluster, the ability to work in a platform like DSX that doesn't limit you to just one kind of technology but allows you to work with multiple technologies, the ability to actually work on not only-- >> When you say technologies here, you're referring to frameworks like TensorFlow, and-- >> Precisely. Very good, now that part I'm going to get into very shortly, (laughs) so please don't steal my thunder. >> James: Okay. >> Now, what I was saying is that not only are DSX and Hortonworks integrated to the point that you can manage your Hadoop clusters, your Hadoop environments, within DSX; you can also work on your Python models and your analytics within DSX and then push them remotely to be executed where your data is. Now, why is this important? If you work with data that's megabytes or gigabytes, maybe you can pull it in, but truly, when you move to the terabytes and the petabytes of data, what happens is that you have to push the analytics to where your data resides, and leverage, for example, YARN, a resource manager, to distribute your workloads and train your models on your HDP cluster. That's one of the huge value propositions. And mind you, this is all done in a secure fashion, with the ability to install DSX on the edge nodes of the HDP clusters. >> James: Hmm... >> As of HDP 2.6.4, DSX has been certified to work with HDP. Now, we embarked on this partnership about 10 months ago. It often happens that there are announcements, but not much materializes after such an announcement. This is not true in the case of DSX and HDP. Just recently we have had the release of DSX 1.2, which I'm super excited about. Now, let's talk about those open-source toolings and the various platforms. You don't want to force your data scientists to work with just one environment. Some of them might prefer to work on Spark; some of them like their RStudio, they're statisticians, they like R; others like Python, with Zeppelin or, say, Jupyter notebooks. Now, how about TensorFlow? What are you going to do when you have to run deep learning workloads, when you want to use neural nets? Well, DSX does support the ability to bring in GPU nodes and do the TensorFlow training. As a sidecar approach, you can append the node, scale the platform horizontally and vertically, train your deep learning workloads, and then remove the sidecar. So you can add it to the cluster and remove it at will. Now, DSX not only satisfies the needs of your programmer data scientists, who code in Python and Scala or R, but also allows your business analysts to work and create models in a visual fashion.
As of DSX 1.2, we have embedded, integrated, an SPSS modeler, redesigned and rebranded. This is an amazing technology from IBM that's been around for a while, very well established, but now, with the new interface, embedded inside the DSX platform, it allows your business analysts to train and create models in a visual fashion and, what is beautiful-- >> Business analysts, not traditional data scientists. >> Not traditional data scientists. >> That sounds equivalent to how IBM, a few years back, was able to bring more of a visual experience to SPSS proper to enable the business analysts of the world to build and do data mining and so forth with structured data. Go ahead, I don't want to steal your thunder here. >> No, no, precisely. (laughs) >> But I see it's the same phenomenon: you bring the same capability to greatly expand the range of data professionals who can do, in this case, machine learning, hopefully as well as professional, dedicated data scientists. >> Certainly. Now, what we have to also understand is that data science is actually a team sport. It involves various stakeholders from the organization, from the executive who gives you the business use case to your data engineers who understand where your data is and can grant the access-- >> James: They manage the Hadoop clusters, many of them, yeah. >> Precisely. So they manage the Hadoop clusters, they manage your relational databases, because we have to realize that not all the data is in the data lakes yet; you have legacy systems, which DSX allows you to connect to and integrate to get data from. It also allows you to consume data from streaming sources, so if you have a Kafka message hub and are streaming data from your applications or IoT devices, you can integrate all those various data sources and federate them within DSX to use for training models. Now, this is all around predictive analytics. But what if I told you that right now, with DSX, you can do prescriptive analytics as well? With 1.2, again, I keep coming back to DSX 1.2: with the most recent release we have added decision optimization, an industry-leading solution from IBM-- >> Prescriptive analytics, gotcha-- >> Yes, for prescriptive analysis. So now if you have warehouses, or you have a fleet of trucks, or you want to optimize the flow in, let's say, a utility company, whether it be for power or, let's say, for water, you can create and train prescriptive models within DSX and deploy them in the same fashion as you deploy and manage your SPSS streams as well as the machine learning models from Spark, from Python, with XGBoost, TensorFlow, Keras, all those various aspects. >> James: Mmmhmm. >> Now, what's going to get really exciting in the next two months: DSX will bring in natural language processing and text analysis and sentiment analysis via Watson Explorer. So Watson Explorer, it's another offering from IBM... >> James: It's called, what is the name of it? >> Watson Explorer. >> Oh, Watson Explorer, yes. >> Watson Explorer, yes. >> So now you're going to have this collaborative message platform, extendable! An extendable, collaborative platform that can install and run in your data centers without the need to access the internet. That's actually critical. Yes, we can deploy on AWS. Yes, we can deploy on Azure.
On Google Cloud, definitely; we can deploy on SoftLayer and we're very good at that. However, in the majority of cases we find that the customers have challenges bringing the data out to the cloud environments. Hence, with DSX, we designed it to deploy and run and scale everywhere. Now, how have we done it? We've embraced open source. This was a huge shift within IBM, to realize that yes, we do have 350,000 employees, yes, we could develop container technologies, but why? Why not embrace what are actually industry standards, Docker and its equivalents, as they became industry standards? Bring in RStudio, Jupyter, the Zeppelin notebooks; bring in the ability for a data scientist to choose the environments they want to work with, extend them, and make deployments of web services, applications, and models. And those are actually full releases: I'm not only talking about the model, I'm talking about the scripts that go with it, the ability to pull the data in and allow the models to be retrained, evaluated, and redeployed without taking them down. Now that's what is the true differentiator when it comes to DSX, and it's all done in either your public or private cloud environments. >> So that's coming in the next version of DSX? >> Outside of DSX-- >> James: We're almost out of time, so-- >> Oh, I'm so sorry! >> No, no, no. It's my job as the host to let you know that. >> Of course. (laughs) >> So if you could summarize where DSX is going in 30 seconds or less as a product, the next version is, what is it? >> It's going to be the 1.2.1. >> James: Okay. >> 1.2.1, and we're expecting to release it at the end of June. What's going to be unique in the 1.2.1 is infusing the text and sentiment analysis, so natural language processing, with predictive and prescriptive analysis, for both developers and your business analysts. >> James: Yes. >> So essentially a platform not only for your data scientists but for pretty much every single persona inside the organization. >> Including your marketing professionals who are baking sentiment analysis into what they do. Thank you very much. This has been Piotr Mierzejewski of IBM. He's a Program Manager for DSX and for ML, AI, and data science solutions, and of course a strong partnership is with Hortonworks. We're here at Dataworks Summit in Berlin. We've had two excellent days of conversations with industry experts, including Piotr. We want to thank everyone; we want to thank the host of this event, Hortonworks, for having us here. We want to thank all of our guests, all these experts, for sharing their time out of their busy schedules. We want to thank everybody at this event for all the fascinating conversations; the breakouts have been great, the whole buzz here is exciting. GDPR's coming down and everybody's gearing up and getting ready for that, but everybody's also focused on innovative and disruptive uses of AI and machine learning in business, and using tools like DSX. I'm James Kobielus for the entire CUBE team, SiliconANGLE Media, wishing you all, wherever you are, whenever you watch this, have a good day and thank you for watching theCUBE. (upbeat music)
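To picture what the decision optimization capability Piotr describes actually computes, here is a toy prescriptive model in the spirit of his fleet-of-trucks example: a linear program choosing how many truckloads to send along two routes to minimize cost while meeting demand. The costs, capacity, and demand figures are invented for illustration, and scipy's generic solver stands in for the optimization engine embedded in DSX.

from scipy.optimize import linprog

# Minimize 120*x1 + 95*x2: the per-truckload cost on routes 1 and 2.
cost = [120, 95]

# Demand: x1 + x2 >= 40 loads, written as -x1 - x2 <= -40 for linprog.
# Capacity: route 2 can carry at most 25 loads.
A_ub = [[-1, -1],
        [0, 1]]
b_ub = [-40, 25]

result = linprog(cost, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None), (0, None)])
print(result.x)    # optimal loads per route: [15. 25.]
print(result.fun)  # minimized total cost: 4175.0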

Published Date : Apr 19 2018

SUMMARY :

Piotr Mierzejewski of IBM discusses DSX 1.2 and the Hortonworks partnership: giving data scientists secure access to HDP data, pushing analytics to the cluster via YARN, GPU sidecar nodes for TensorFlow, the embedded SPSS modeler and decision optimization, and the text and sentiment analytics coming in DSX 1.2.1.

SENTIMENT ANALYSIS :

ENTITIES

Entity | Category | Confidence
Piotr Mierzejewski | PERSON | 0.99+
James Kobielus | PERSON | 0.99+
James | PERSON | 0.99+
IBM | ORGANIZATION | 0.99+
Piotr | PERSON | 0.99+
Hortonworks | ORGANIZATION | 0.99+
30 seconds | QUANTITY | 0.99+
Berlin | LOCATION | 0.99+
IWS | ORGANIZATION | 0.99+
Python | TITLE | 0.99+
Spark | TITLE | 0.99+
two | QUANTITY | 0.99+
First | QUANTITY | 0.99+
Scala | TITLE | 0.99+
Berlin, Germany | LOCATION | 0.99+
350,000 employees | QUANTITY | 0.99+
DSX | ORGANIZATION | 0.99+
Mac | COMMERCIAL_ITEM | 0.99+
two things | QUANTITY | 0.99+
RStudio | TITLE | 0.99+
DSX | TITLE | 0.99+
DSX 1.2 | TITLE | 0.98+
both developers | QUANTITY | 0.98+
second | QUANTITY | 0.98+
GDPR | TITLE | 0.98+
Watson Explorer | TITLE | 0.98+
Dataworks Summit 2018 | EVENT | 0.98+
first line | QUANTITY | 0.98+
Dataworks Summit Europe 2018 | EVENT | 0.98+
SiliconANGLE Media | ORGANIZATION | 0.97+
end of June | DATE | 0.97+
TensorFlow | TITLE | 0.97+
thousands of libraries | QUANTITY | 0.96+
R | TITLE | 0.96+
Jupyter | ORGANIZATION | 0.96+
1.2.1 | OTHER | 0.96+
two excellent days | QUANTITY | 0.95+
Dataworks Summit | EVENT | 0.94+
Dataworks Summit EU 2018 | EVENT | 0.94+
SPSS | TITLE | 0.94+
one | QUANTITY | 0.94+
Azure | TITLE | 0.92+
one kind | QUANTITY | 0.92+
theCUBE | ORGANIZATION | 0.92+
HDP | ORGANIZATION | 0.91+

John Kreisa, Hortonworks | Dataworks Summit EU 2018


 

>> Narrator: From Berlin, Germany, it's theCUBE. Covering Dataworks Summit Europe 2018. Brought to you by Hortonworks. >> Hello, welcome to theCUBE. We're here at Dataworks Summit 2018 in Berlin, Germany. I'm James Kobielus. I'm the lead analyst for Big Data Analytics, within the Wikibon team of SiliconAngle Media. Our guest is John Kreisa. He's the VP for Marketing at Hortonworks, of course, the host company of Dataworks Summit. John, it's great to have you. >> Thank you Jim, it's great to be here. >> We go way back, so you know it's always great to reconnect with you guys at Hortonworks. You guys are on a roll; it's been seven years, I think, since you were founded. I remember the founding of Hortonworks. I remember when it splashed in the Wall Street Journal. It was like, oh wow, this big data thing, this Hadoop thing, it's actually a market, it's a segment, and you guys have built it, you know, you and your competitors, your partners, your ecosystem, and it continues to grow. You went IPO a few years ago. Your latest numbers are pretty good. You're continuing to grow in revenues, in customer acquisitions; your deal sizes are growing. So Hortonworks remains on a roll. So, I'd like you to talk right now, John, and give us a sense of where Hortonworks is at in terms of engaging with the marketplace, in terms of trends that you're seeing, in terms of how you're addressing them. But talk about, first of all, the Dataworks Summit. How many attendees do you have, from how many countries? Just give us sort of the layout of this show. >> I don't have all of the final counts yet. >> This is year six of the show? >> This is year six in Europe, absolutely, thank you. So it's great; we've moved it around different locations. Great venue, great host city here in Berlin. Super excited about it. I know we have representatives from more than 51 countries. If you think about that, we're drawing from a really broad set of countries, well beyond just Europe, as you know, because you've interviewed some of the folks from beyond Europe. We've had them from South America, the U.S., Africa, and Asia as well, so really a broad swath of the open-source and big data community, which is great. The final attendance is going to be in the 1,250 to 1,300 range, so a great sized conference. The energy level's been really great; the sessions have been, you know, oversubscribed, standing room only in many of the popular sessions. So the community's strong. I think that's the thing that we really see here and that we're really continuing to invest in. It's something that Hortonworks was founded around. You referenced the founding, and driving the community forward and investing is something that has been part of our mantra since we started, and it remains that way today. >> Right. So, first of all, what is Hortonworks? Now, how does Hortonworks position itself? Clearly Hadoop is your foundation, but you, just like Cloudera and MapR, have all continued to evolve to address a broader range of use cases with a deeper stack of technology, with fairly extensive partner ecosystems. So what kind of a beast is Hortonworks? It's an elephant, but what kind of an elephant is it? >> We're an elephant, or riding on the elephant, I'd say. So we're a global data management company. That's what we're helping organizations do: really the end-to-end lifecycle of their data, helping them manage it regardless of where it is, whether it's on-premise or in the cloud, really through hybrid data architectures.
That's really how we've seen the market evolve: we started off, in terms of our strategy, with the platform based on Hadoop, as you said, to store, process, and analyze data at scale, the kind of fundamental use case for Hadoop. Then as the company emerged, as the market continued to evolve, we saw the opportunity in capturing data from the edge. As IoT and edge use cases emerged, it made sense for us to add to the platform and create the Hortonworks DataFlow. >> James: Apache NiFi. >> Apache NiFi, exactly, HDF underneath, with associated additional open-source projects in there, Kafka and some streaming and things like that. So that was: now move data, capture data in motion, move it back and put it into the platform for those large data applications that organizations are building on the core platform. It's also the next evolution; we're seeing great attach rates with that, and really strong interest in Apache NiFi. You know, the meetup here for NiFi was oversubscribed, so really, really strong interest in that. And then the market continued to evolve with cloud and cloud architectures, customers wanting to deploy in the cloud. You know, you saw we had that poll yesterday in the general session about cloud, with really interesting results, but we saw that companies really wanted to deploy in a hybrid way. Some of them wanted to move specific workloads to the cloud. >> Multi-cloud, public, private. >> Exactly right, and multi-data center. >> The majority of your customer deployments are on prem. >> They are. >> Rob Bearden, your CEO, I think he said in a recent article on SiliconAngle that two-thirds of your deployments are on prem. Is that percentage going down over time? Are more of your customers shifting toward a public cloud orientation? Does Hortonworks worry about that? You've got partnerships, clearly, with the likes of IBM, AWS, and Microsoft Azure and so forth, so do you see that as an opportunity, or as a worrisome trend? >> No, we see it very much as an opportunity. And that's because we do have customers who are wanting to put more workloads and run things in the cloud; however, there's still almost always a component that's going to be on premise. And that creates a challenge for organizations: how do they manage the security and governance, and really the overall operations, of those deployments as they're in the cloud and on premise, and, to your point, multi-cloud. And so you get some complexity in there around that deployment, particularly with the regulations; we talked about GDPR earlier today. >> Oh, by the way, the Data Steward Studio demo today was really, really good. It showed that, first of all, you cover the entire range of core requirements for compliance. So that was actually the primary announcement at this show; Scott Gnau announced that. You demoed it today. I think you guys are off to a good start, yeah. >> We've gotten really, and thank you for that, we've gotten really good feedback on our DataPlane Services strategy, right; it provides that single pane of glass. >> I should say to our viewers that Data Steward Studio is the second of the services under DataPlane, the Hortonworks DataPlane Services portfolio. >> That's right, that's exactly right. >> Go ahead, keep going. >> So, you know, we see that as an opportunity.
We think we're very strongly positioned in the market, being the first to bring that kind of solution to the customers and our large customers that we've been talking about and who have been starting to use DataPlane have been very, very positive. I mean they see it as something that is going to help them really kind of maintain control over these deployments as they start to spread around, as they grow their uses of the thing. >> And it's built to operate across the multi-cloud, I know this as well in terms of executing the consent or withdrawal of consent that the data subject makes through what is essentially a consent portal. >> That's right, that's right. >> That was actually a very compelling demonstration in that regard. >> It was good, and they worked very hard on it. And I was speaking to an analyst yesterday, and they were saying that they're seeing an increasing number of the customers, enterprises, wanting to have a multi-cloud strategy. They don't want to get locked into any one public cloud vendor, so, what they want is somebody who can help them maintain that common security and governance across their different deployments, and they see DataPlane Services is the way that's going to help them do that. >> So John, how is Hortonworks, what's your road map, how do you see the company in your go to market evolving over the coming years in terms of geographies, in terms of your focuses? Focus, in terms of the use-cases and workloads that the Hortonworks portfolio addresses. How is that shifting? You mentioned the Edge. AI, machine learning, deep learning. You are a reseller of IBM Data Science Experience. >> DSX, that's right. >> So, let's just focus on that. Do you see more customers turning to Hortonworks and IBM for a complete end-to-end pipeline for the ingest, for the preparation, modeling, training and so forth? And deployment of operationalized AI? Is that something you see going forward as an evolution path for your capabilities? >> I'd say yes, long-term, or even in the short-term. So, they have to get their data house in order, if you will, before they get to some of those other things, so we're still, Hortonworks strategy has always been focused on the platform aspect, right? The data-at-rest platform, data-in-motion platform, and now a platform for managing common security and governance across those different deployments. Building on that is the data science, machine learning, and AI opportunity, but our strategy there, as opposed to trying to trying to do it ourselves, is to partner, so we've got the strong partnership with IBM, resell their DSX product. And also other partnerships around to deliver those other capabilities, like machine learning and AI, from our partner ecosystem, which you referenced. We have over 2,300 partners, so a very, very strong ecosystem. And so, we're going to stick to our strategy of the platforms enabling that, which will subsequently enable data science, machine learning, and AI on top. And then, if you want me to talk about our strategy in terms of growth, so we already operate globally. We've got offices in I think 19 different countries. So we're really covering the globe in terms of the demand for Hortonworks products and beginning implements. >> Where's the fastest growing market in terms of regions for Hortonworks? >> Yeah, I mean, international generally is our fastest growing region, faster than the U.S. 
But we're seeing very strong growth in APAC, actually, so India, Asian countries, Singapore, and then up and through to Japan. There's a lot of growth out in the Asian region. And, you know, they're sort of moving directly to digital transformation projects at really large scale. Big banks, telcos, from a workload standpoint I'd say the patterns are very similar to what we've seen. I've been at Hortonworks for six and a half years, as it turns out, and the patterns we saw initially in terms of adoption in the U.S. became the patterns we saw in terms of adoption in Europe and now those patterns of adoption are the same in Asia. So, once a company realizes they need to either drive out operational costs or build new data applications, the patterns tend to be the same whether it's retail, financial services, telco, manufacturing. You can sort of replicate those as they move forward. >> So going forward, how is Hortonworks evolving as a company in terms of, for example with GDPR, Data Steward, data governance as a strong focus going forward, are you shifting your model in terms of your target customer away from the data engineers, the Hadoop cluster managers who are still very much the center of it, towards more data governance, towards more business analyst level of focus. Do you see Hortonworks shifting in that direction in terms of your focus, go to market, your message and everything? >> I would say it's not a shifting as much as an expansion, so we definitely are continuing to invest in the core platform, in Hadoop, and you would have heard of some of the changes that are coming in the core Hadoop 3.0 and 3.1 platform here. Alan and others can talk about those details, and in Apache NiFi. But, to your point, as we bring and have brought Data Steward Studio and DataPlane Services online, that allows us to address a different user within the organization, so it's really an expansion. We're not de-investing in any other things. It's really here's another way in a natural evolution of the way that we're helping organizations solve data problems. >> That's great, well thank you. This has been John Kreisa, he's the VP for marketing at Hortonworks. I'm James Kobielus of Wikibon SiliconAngle Media here at Dataworks Summit 2018 in Berlin. And it's been great, John, and thank you very much for coming on theCUBE. >> Great, thanks for your time. (techno music)

Published Date : Apr 19 2018

SUMMARY :

John Kreisa, VP of Marketing at Hortonworks, discusses Dataworks Summit Berlin, Hortonworks' evolution from the Hadoop platform to global data management with HDF and DataPlane Services, hybrid and multi-cloud deployments, GDPR and Data Steward Studio, and international growth.

SENTIMENT ANALYSIS :

ENTITIES

Entity | Category | Confidence
Alan | PERSON | 0.99+
James Kobielus | PERSON | 0.99+
Jim | PERSON | 0.99+
Rob Bearden | PERSON | 0.99+
IBM | ORGANIZATION | 0.99+
John Kreisa | PERSON | 0.99+
Europe | LOCATION | 0.99+
John | PERSON | 0.99+
Asia | LOCATION | 0.99+
AWS | ORGANIZATION | 0.99+
Hortonworks | ORGANIZATION | 0.99+
Berlin | LOCATION | 0.99+
yesterday | DATE | 0.99+
Africa | LOCATION | 0.99+
South America | LOCATION | 0.99+
SiliconAngle Media | ORGANIZATION | 0.99+
U.S. | LOCATION | 0.99+
1,250 | QUANTITY | 0.99+
Scott Gnau | PERSON | 0.99+
1,300 | QUANTITY | 0.99+
Berlin, Germany | LOCATION | 0.99+
seven years | QUANTITY | 0.99+
six and a half years | QUANTITY | 0.99+
Japan | LOCATION | 0.99+
Hadoop | TITLE | 0.99+
Asian | LOCATION | 0.99+
second | QUANTITY | 0.98+
over 2,300 partners | QUANTITY | 0.98+
today | DATE | 0.98+
two-thirds | QUANTITY | 0.98+
19 different countries | QUANTITY | 0.98+
Dataworks Summit | EVENT | 0.98+
more than 51 countries | QUANTITY | 0.98+
Hadoop 3.0 | TITLE | 0.98+
first | QUANTITY | 0.98+
James | PERSON | 0.98+
Data Steward Studio | ORGANIZATION | 0.98+
Dataworks Summit EU 2018 | EVENT | 0.98+
Dataworks Summit 2018 | EVENT | 0.97+
Cloudera | ORGANIZATION | 0.97+
MapR | ORGANIZATION | 0.96+
GDPR | TITLE | 0.96+
DataPlane Services | ORGANIZATION | 0.96+
Singapore | LOCATION | 0.96+
year six | QUANTITY | 0.95+
2018 | EVENT | 0.95+
Wikibon SiliconAngle Media | ORGANIZATION | 0.94+
India | LOCATION | 0.94+
Hadoop | ORGANIZATION | 0.94+
APAC | ORGANIZATION | 0.93+
Big Data Analytics | ORGANIZATION | 0.93+
3.1 | TITLE | 0.93+
Wall Street Journal | TITLE | 0.93+
one | QUANTITY | 0.93+
Apache | ORGANIZATION | 0.92+
Wikibon | ORGANIZATION | 0.92+
NiFi | TITLE | 0.92+

Abhas Ricky, Hortonworks | Dataworks Summit 2018


 

>> Announcer: From Berlin, Germany, it's the CUBE covering Dataworks Summit Europe 2018. Brought to you by Hortonworks. >> Welcome to the CUBE, we're here at Dataworks Summit 2018 in Berlin. I'm James Kobielus. I am the lead analyst for big data analytics on the Wikibon team of SiliconANGLE Media. On the CUBE, we extract the signal from the noise, and here at Dataworks Summit, the signal is big data analytics, and increasingly the imperative for many enterprises is compliance with GDPR; the General Data Protection Regulation comes into force in five weeks, on May 25th. There's more going on too, so what I'm going to be doing today for the next 20 minutes or so: from Hortonworks I have Abhas Ricky, who is the director of strategy and innovation. He helps customers, and he'll explain what he does, but at a high level, he helps customers to identify the value of investments in big data, analytics, big data platforms in their business. And Abhas, how do you justify the value of compliance with GDPR? I guess the value would be to avoid penalties for noncompliance, right? Can you do it as an upside as well? Is there an upside in terms of, if you make an investment, and you probably will need to make an investment to comply, can you turn this around as a strategic asset, possibly? >> Yeah, so I'll take a step back first. >> James: Like a big data catalog and so forth. >> Yeah, so if you look at the value part which you said, it's interesting that you mentioned it. There's a study which was done by McKinsey which said that only 15% of executives can understand what the value of a digital initiative is, let alone a big data initiative. >> James: Yeah. >> Similarly, Gartner says that if you look at the various reports and if you look at various issues, the fundamental thing which executives struggle with is identifying the value which they will get. So that is where I pitch in. That is where I come in, from a value perspective. Now if you look at GDPR specifically, one of the things that we believe, and I've done multiple blogs and webinars around that, is that GDPR should be treated as a business opportunity, because of the fact that... >> James: A business opportunity? >> A business opportunity. It shouldn't necessarily be seen as a compliance burden on costs or your balance sheets, because of the fact that it is the one single opportunity which allows you to clean up your data supply chain. It allows you to look at your data assets with a holistic view, and if you create a transparent data supply chain, your IT systems talk to each other. So some of the provisions, as you know, in addition to right to consent, right to portability, etc.: there is also privacy by design, which says that you have to be proactive in defining your IT systems and architecture. It's not necessarily reactive. But guess what? If you're able to do that, you will see the benefits in other use cases like single view of customer or fraud or anti-money laundering, because at the end of the day, all GDPR is asking you is: where do you store your data, what's the lineage, what's the provenance? Can you identify what the personally identifiable information is for any particular customer? And can you use that to your effect as you go forward? So it's a great opportunity, because to be able to comply with the provisions, you've got to take steps before that, which is essentially streamlining your data operations, which obviously will have a domino effect on the efficiency of other use cases. So I believe it's a business opportunity.
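That audit question, where the data is stored, what its lineage and provenance are, and which fields are personally identifiable for which customer, maps naturally onto a data catalog lookup. Here is a minimal sketch of the idea in Python; it is illustrative only, not Hortonworks or DataPlane code, and every dataset, field, and function name in it is hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical records of the kind a data catalog keeps per dataset.
@dataclass
class DatasetEntry:
    name: str
    location: str                 # where the data is stored
    lineage: list                 # upstream datasets it was derived from
    pii_fields: set = field(default_factory=set)

catalog = [
    DatasetEntry("raw_orders", "hdfs://prod/raw/orders",
                 lineage=[], pii_fields={"email", "postcode"}),
    DatasetEntry("orders_agg", "hdfs://prod/marts/orders_agg",
                 lineage=["raw_orders"], pii_fields=set()),
]

def pii_footprint(catalog, dataset_name, seen=None):
    """Walk the lineage upstream and collect every PII field that could
    have flowed into `dataset_name` - the kind of answer a GDPR
    subject-access or erasure request needs."""
    seen = seen or set()
    entry = next(e for e in catalog if e.name == dataset_name)
    fields = set(entry.pii_fields)
    for parent in entry.lineage:
        if parent not in seen:
            seen.add(parent)
            fields |= pii_footprint(catalog, parent, seen)
    return fields

print(pii_footprint(catalog, "orders_agg"))  # {'email', 'postcode'}
```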
>> Right, now part of that opportunity is getting your arms around what data you have. Where GDPR is concerned, the customer has a right to withhold consent from you, the enterprise that holds that data, to use that personal data of theirs, which they own, for various and sundry reasons. Many enterprises, and many of Hortonworks' customers, are using their big data for things like AI and machine learning. Won't this compliance with GDPR limit their ability to seize the opportunity to build deep learning and so forth? What are customers saying about that? Is that going to be kind of a downer or a chilling effect on their investments in AI and so forth? >> So there's two elements around it. The first thing which you said, there are customers, there's machine learning and AI, yes, there are. But broadly speaking, before you're able to do machine learning and AI, you need to get your data sets onto a particular platform in a particular fashion, clean data, otherwise you can't do AI or machine learning on top of it. >> James: Right. >> So the reason why I say it's an opportunity is because you're being forced by compliance to get that data from every other place onto this platform. So obviously those capabilities will get enhanced. Having said that, I do agree, if I'm an organization which does targeting and retargeting of customers based on multiple segmentations, and one of the things is online advertisements, in that case, yes, your ability might get affected, but I don't think you'll get prohibited. And that affected time span will be only small, because you just adapt. So the good thing about machine learning and AI is that you don't create rules, you don't create manual rules. They pick up the rules based on the patterns and how the data sets have been performing. So obviously once you have created those structures in place, initially, yes, you'll have to make an investment to alter your programs of work. However, going forward, it will be even better. Because guess what? You just cleaned your entire data supply chain. So that's how I would see that. Yes, a lot of companies, in ecommerce you do targeting and retargeting based on the customer DNA, based on their shopping profiles, based on their shopping habits, and then based off that, you give them the next best offer or whatever. So, yes, that might get affected initially, but that's not because GDPR is there or not. That's just because you're changing your programs of work. You're changing the fundamental way by which you're sourcing the data, where it is coming from, and which data you can use. But once you have tags against each of those attributes, once you have access controls, once you know exactly which customer attributes you can touch and which you cannot for which purposes, whether you have consent or not, your life's even better. The AI tools or the machine learning algorithms will learn from themselves. >> Right, so essentially, once you have a tight ship in terms of managing your data in line with the GDPR strictures and so forth, it sounds like what you're saying is that it gives you as an enterprise the confidence and assurance that if you want to use that data and need to use that data, you know you've got the processes in place to gain the necessary consents from customers. So there won't be any nasty surprises later on of customers complaining, because you've got legal procedures for getting the consent, and that's great.
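The "tags, access controls, and per-attribute consent" discipline Abhas describes can be sketched in a few lines. The code below is a toy illustration under those assumptions, not a Hortonworks API; the attribute names, tags, and purposes are all invented for the example.

```python
from dataclasses import dataclass

# Hypothetical per-attribute metadata of the kind a steward tool keeps:
# classification tags plus the purposes the data subject consented to.
@dataclass
class CustomerAttribute:
    name: str
    value: object
    tags: frozenset
    consented_purposes: frozenset

def select_for_purpose(attributes, purpose):
    """Keep an attribute only if it is non-PII, or the data subject
    has recorded consent for this specific processing purpose."""
    return {
        a.name: a.value
        for a in attributes
        if "pii" not in a.tags or purpose in a.consented_purposes
    }

attrs = [
    CustomerAttribute("postcode", "10115", frozenset({"pii"}),
                      frozenset({"fraud-detection"})),
    CustomerAttribute("basket_size", 42.0, frozenset(), frozenset()),
]
print(select_for_purpose(attrs, "marketing"))        # {'basket_size': 42.0}
print(select_for_purpose(attrs, "fraud-detection"))  # both attributes
```

A pipeline that filters its inputs this way can feed models only what consent allows, which is the sense in which compliance and machine learning coexist in the exchange above.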
You know, one of the things, Abhas, we're hearing right now in terms of compliance requirements that are coming along, maybe not a part of GDPR directly yet, but related to it, is the whole notion of algorithmic transparency. As you build machine learning models and these machine learning models are driven into working applications, being able to transparently identify whether those models took, let's say, an autonomous action based on particular data and particular variables, and then there are some nasty consequences, like crashing an autonomous vehicle: the ability, they call it explainable AI, to roll that back and determine who's liable for that event. Does Hortonworks have any capability within your portfolio to enable more transparency into the algorithmic underpinnings of a given decision? Is that something that you enable in your solutions, or that your partner IBM enables through DSX and so forth? Give us a sense whether that's a capability currently that you guys offer, and whether, in terms of your understanding, customers are asking for that yet, or is that too futuristic? >> So I would say that it's a two-part question. >> James: Yeah. >> The first one: yes, there are multiple regulations coming in; in the financial markets there's MiFID, there's BCBS, etc., and organizations have to comply. You've got IFRS, which applies to brokers, insurance, etc., etc. So, yes, a lot of organizations across industries are getting affected by compliance use cases. Where does Hortonworks come into the picture? To be able to be compliant from a data standpoint, A, you need to be able to identify which of those data sources you need to implement a particular use case. B, you need to get them to a certain point whereby you can do analytics on them. And then there's the whole storage and processing and all of that. But also, which you might have heard at the keynote today, from a cloud perspective it's starting to get more and more complex, because everyone's moving to the cloud, which means, if you look at any large multi-national organization, most of them have a hybrid cloud structure, because they work with two or three cloud vendors, which makes the process even more complex, because now you have multiple clusters, you have on premise, and you have multiple different IT systems which need to talk to each other. Which is where the Hortonworks DataPlane Services come into the picture, because it gives you a unified view of your global data assets. >> James: Yes. >> Think of it like a single pane of glass whereby you can do security and governance across all data assets. So from those angles, yes, we definitely enable those use cases which will help with compliance. >> Making the case to the customer for a big data catalog along the lines of what you guys offer: in making the case, there's a lot of upfront data architectural work that needs to be done to get all your data assets into shape within the context of the catalog. How do they justify making that expense, in terms of hiring the people, the data architects and so forth, needed to put it all in shape? I mean, how long does it take before you can really stand up a working data catalog in most companies? >> So again, you've asked two questions. First of all, how do they justify it? Which is where we say that the platform is a means to an end. It's enabling you to deliver use cases. So I look at it in terms of five key value drivers. Either it's a risk reduction, or it's a cost reduction, or it's a cost avoidance.
>> James: Okay. >> Or it's a revenue optimization, or it's time to market. Against each one of these value drivers, or multiple of them, or a combination of them, each of the use cases that you're delivering on the platform will lead you to benefits around that. My job, obviously, is to work with the customers and executives to understand what that will be, to quantify the potential impact, which will then form the basis and give my customer champions enough ammunition so that they can go back and justify those investments. >> James: Abhas, we're going to have to cut it short, but I'm going to let you finish your point here; we have to end this segment, so go ahead. >> That's fine. >> Okay, well, anyway, we've had Abhas Ricky, who is the director of strategy and innovation at Hortonworks. We're here at Dataworks Summit Berlin. And thank you very much. Sorry to cut it short, but we have to move to the next guest. >> No worries, pleasure, thank you very much. >> Take care, have a good one. >> Thanks a lot, yes. (upbeat music)
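The five value drivers Abhas lists, risk reduction, cost reduction, cost avoidance, revenue optimization, and time to market, roll up into a business case by simple arithmetic. The sketch below shows the shape of that calculation with invented numbers; nothing here is a real Hortonworks figure or tool.

```python
# Hypothetical annual benefit estimates (in EUR) for one use case,
# bucketed by the five value drivers named in the interview.
use_case = {
    "name": "single view of customer",
    "benefits": {
        "risk_reduction":       120_000,
        "cost_reduction":       250_000,
        "cost_avoidance":        80_000,
        "revenue_optimization": 400_000,
        "time_to_market":       150_000,
    },
}
platform_cost = 600_000  # illustrative annual platform + people cost

total_benefit = sum(use_case["benefits"].values())
roi = (total_benefit - platform_cost) / platform_cost
print(f"{use_case['name']}: benefit EUR {total_benefit:,}, "
      f"ROI {roi:.0%}")  # benefit EUR 1,000,000, ROI 67%
```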

Published Date : Apr 18 2018


Nutanix .NEXT Morning Keynote Day1


 

Section 1 of 13 [00:00:00 - 00:10:04] (NOTE: speaker names may be different in each section) Speaker 1: Ladies and gentlemen, our program will begin momentarily. Thank you. (singing) This presentation and the accompanying oral commentary may include forward-looking statements that are subject to risks, uncertainties, and other factors beyond our control. Our actual results, performance or achievements may differ materially and adversely from those anticipated or implied by such statements because of various risk factors, including those detailed in our annual report on form 10-K for the fiscal year ended July 31, 2017, filed with the SEC. Any future product or roadmap information presented is intended to outline general product direction and is not a commitment to deliver any functionality, and should not be used when making any purchasing decision. (singing) Ladies and gentlemen, please welcome Vice President Corporate Marketing Nutanix, Julie O'Brien. Julie O'Brien: All right. How about those Nutanix .NEXT dancers, were they amazing or what? Did you see how I blended right in, you didn't even notice I was there. [French 00:07:23] to .NEXT 2017 Europe. We're so glad that you could make it today. We have such a great agenda for you. First off, do not miss tomorrow morning. We're going to share the outtakes video of the handclap video you just saw. Where are the customers, the partners, the Nutanix employees who starred in our handclap video? Please stand up, take a bow. You are not going to want to miss tomorrow morning, let me tell you. That is going to be truly entertaining, just like the next two days we have in store for you: a content-rich, highly interactive set of sessions throughout our agenda. Wow! Look around, it is amazing to see how many cloud builders we have with us today. Side by side, you're more than 2,200 people who have traveled from all corners of the globe to be here. That's double the attendance from last year at our first .NEXT Conference in Europe. Now perhaps some of you are here to learn the basics of hyperconverged infrastructure. Others of you might be here to build your enterprise cloud strategy. And maybe some of you are here to just network with the best and brightest in the industry, in this beautiful French Riviera setting. Well wherever you are in your journey, you'll find customers just like you throughout all our sessions here over the next two days. From Sligro to Schroders to Societe Generale, you'll hear from cloud builders sharing their best practices and their lessons learned, and how they're going all in with Nutanix for all of their workloads and applications, whether it's SAP or Splunk, Microsoft Exchange, unified communications, Cloud Foundry or Oracle. You'll also hear how customers just like you are saving millions of euros by moving from legacy hypervisors to Nutanix AHV. And you'll have a chance to pose some of your most challenging technical questions to the Nutanix experts that we have on hand: our Nutanix Technology Champions, our NPXs, our NPSs. Where are all the people out there with an N in front of their certification and an X, an R, an S, an E or a C at the end? Can you wave hello? You might be surprised to know that in Europe and the Middle East alone, we have more than 2,600 certified Nutanix experts. Those are customers, partners, and also employees.
I'd also like to say thank you to our growing ecosystem of partners and sponsors who are here with us over the next two days. The companies that you meet here are the ones who are committed to driving innovation in the enterprise cloud. Over the next few days you can look forward to hearing from them and seeing some fantastic technology integration that you can take home to your data center come Monday morning. Together with our partners and you, our customers, Nutanix has had such an exciting year since we gathered this time last year. We were named a leader in the Gartner Magic Quadrant for integrated systems two years in a row. Just recently Gartner named us the revenue market share leader in their recent market analysis report on hyper-converged systems. We now enjoy more than 35% revenue share. Thanks to you, our customers, we received a net promoter score of more than 90 points. Not one, not two, not three, but four years in a row. A feat that, I'm sure you'll agree, is not so easy to accomplish, so thank you for your trust and your partnership with us. We went public on NASDAQ last September. We've grown to more than 2,800 employees, more than 7,000 customers in 125 countries, and in our Q4 results we added more than 250 customers in EMEA alone. That's about a third of all of our new customer additions. Today, we're at a pivotal point in our journey. We're just barely scratching the surface of something big, and Goldman Sachs thinks so too. What you'll hear from us over the next two days is this: Nutanix is on its way to becoming an iconic enterprise software company, by helping you transform your data center and your business with enterprise cloud software that gives you the power of freedom of choice and flexibility in the hardware, the hypervisor and the cloud. The power of one click, one OS, any cloud. And now, to tell you more about the digital transformation that's possible in your business and your industry, and to share a little bit around the disruption that Nutanix has undergone and how we've continued to reinvent ourselves, and maybe, if we're lucky, share a few handclap dance moves, please welcome to the stage Nutanix Founder, CEO and Chairman, Dheeraj Pandey. Ready? Alright, take it away [inaudible 00:13:06]. >> Dheeraj P: Thank you. Thank you, Julie, and thank you everyone. It looks like people are still trickling in. Welcome to Acropolis. I just hope that we can move your applications to Acropolis faster than we've been able to move people into this room, actually. (laughs) But thank you, ladies and gentlemen. Thank you to our customers, to our partners, to our employees, to our sponsors, to our board members, to our performers, to everybody, for your precious time. 'Cause that's the most precious thing you actually have, is time. I want to spend a little bit of time today, not a whole lot of time, but a little bit of time, talking about the why of Nutanix. Why do we exist? Why have we survived? Why will we continue to survive and thrive? And it's simpler than an MQ or a category name; the word hyper-convergence, I think those are all complicated. I was just thinking about what it is that we need to talk about today that really makes it relevant, that makes you take back something from this conference: that Nutanix is an obvious innovation. It's very obvious; what we do is not very complicated.
Because the more things change, the more they remain the same. So can we draw some parallels from life, from what's going on around us in our own personal lives, that make this whole thing very natural, as opposed to "Oh, it's hyper-converged, it's a category, it's analysts and pundits and media"? I actually think it's something new. It's not that different, so I want to start with some of that today. And if you look at our personal lives, everything that we had has been digitized. If anything, a lot of these gadgets became apps; they got digitized into a phone itself, you know. What's Nutanix? What have we done in the last seven, eight years? We digitized a lot of hardware. We made everything that used to be single-purpose hardware look like pure software. We digitized storage, we digitized the systems manager role, an operations manager role. We are digitizing scripting: people don't need to write scripts anymore when they automate, because we can visually design automation with Calm. And we're also trying to make the case that the cloud itself is not just a physical destination. That it can be digitized, and must be digitized, as well. So we learn that from our personal lives too, but it goes on. Look at music. There used to be tons of things; if you used to go to [inaudible 00:15:55] Records, I'm sure there were European versions of [inaudible 00:15:57] Records as well, the physical things around us that then got digitized as well. And it goes on and on. We look at entertainment, it's very similar. The idea that you'd go to a movie hall, the idea that you'd buy these tickets, the idea that we'd have these DVD players and DVDs: they all got digitized. Or as [inaudible 00:16:20] want to call it, virtualized, actually. That is basically happening in pretty much new things that we never thought would look this different. One of the most exciting things happening around us is the car industry. It's getting digitized faster than we know, and in many ways that we'd not even imagined 10 years ago. The driver will get digitized. Autonomous cars. The engine is definitely gone; it's a different kind of an engine. In fact, we'll re-skill a lot of automotive engineers who used to work on mechanical things to look at real chemical things like battery technologies and so on. A lot of those things that used to be physical are now in software in the car itself. Media itself got digitized. Think about a physical newspaper, or physical ads in newspapers. Now we talk about virtual ads, the digital ads; they're all over websites and so on, it's all a digital experience now. Education is no different, you know; we look back at the kind of things we used to do physically with physical things. They're now all digital. The experience has become that digital. And I can go on and on. You look at retail, you look at healthcare, look at a lot of these industries: they all are at the cusp of a digital disruption. And in fact, if you look at the data, everybody wants it. We all want a digital transformation for industries, for companies around us. In fact, the whole idea of a cloud is a highly digitized data center, basically. It's not just about digitizing servers and storage and networks and security, it's about virtualizing, digitizing the entire data center itself. That's what cloud is all about. So we all know that it's a very natural phenomenon, because it's happening around us, and that's the obviousness of Nutanix, actually. Why is it actually a good thing?
Because obviously, anything that we digitize and work with in the digital world brings 10X more productivity and decision-making efficiencies as well. And there are challenges, obviously there are challenges, but before I talk about the challenges of digitization, think about why things are moving this fast. Why are things becoming digitally disrupted quicker than we ever imagined? There are some reasons for it. One of the big reasons is obviously, we all know about Moore's Law. The fact that a lot of hardware's been commoditized, and we have really miniaturized hardware. Nutanix today runs on a palm-sized server. Obviously it runs on the other end of the spectrum with high-end IBM Power systems, but it also runs on palm-sized servers. Moore's Law has made a tremendous difference in the way we actually think about consuming software itself. Of course, the internet is also a big part of this. The fact that there's a bandwidth glut, there's Trans-Pacific cables and Trans-Atlantic cables and so on, has really connected us a lot faster than we ever imagined, actually, and a lot of this was also the telecom revolution of the '90s, where we really produced a ton of glut for the internet itself. There's obviously a more subtle reason as well, because software development is democratizing. There's consumer-grade programming languages that we never imagined 10, 15, 20 years ago, that's making it so much faster to write-
They believed in scale, they believed in low latency, they believed in being able to crunch information, at 10x, 100x, bigger scale than anyone else before. Even in our geopolitical lives, look at why is China succeeding? Because they've made infrastructure seamless. They've basically said look, governance is about making infrastructure seamless and invisible, and then let the businesses flourish. So for all you CIOs out there who actually believe in governance, you have to think about what's my first role? What's my primary responsibility? It's to provide such a seamless infrastructure, that lines of business can flourish with their applications, with their developers that can write code 10x faster than ever before. And a lot of these tenets of infrastructure, the fact of the matter is you need to have this always-on philosophy. The fact that it's breach-safe culture. Or the fact that operating systems are hardware agnostic. A lot of these tenets basically embody what Nutanix really stands for. And that's the core of what we really have achieved in the last eight years and want to achieve in the coming five to ten years as well. There's a nuance, and obviously we talk about digital, we talk about cloud, we talk about everything actually going to the cloud and so on. What are the things that could slow us down? What are the things that challenge us today? Which is the reason for Nutanix? Again, I go back to this very important point that the reason why we think enterprise cloud is a nuanced term, because the word "cloud" itself doesn't solve for a lot of the problems. The public cloud itself doesn't solve for a lot of the problems. One of the big ones, and obviously we face it here in Europe as well, is laws of the land. We have bureaucracy, which we need to deal with and respect; we have data sovereignty and computing sovereignty needs that we need to actually fulfill as well, while we think about going at breakneck speed in terms of disrupting our competitors and so on. So there's laws of the land, there's laws of physics. This is probably one of the big ones for what the architecture of cloud will look like itself, over the coming five to ten years. Our take is that cloud will need to be more dispersed than they have ever imagined, because computing has to be local to business operations. Computing has to be in hospitals and factories and shop floors and power plants and on and on and on... That's where you really can have operations and computing really co-exist together, cause speed is important there as well. Data locality is one of our favorite things; the fact that computing and data have to be local, at least the most relevant data has to be local as well. And the fact that electrons travel way faster when it's actually local, versus when you have to have them go over a Wide Area Network itself; it's one of the big reasons why we think that the cloud will actually be more nuanced than just some large data centers. You need to disperse them, you need to actually think about software (cloud is about software). Whether data plane itself could be dispersed and even miniaturized in small factories and shop floors and hospitals. But the control plane of the cloud is centralized. And that's the way you can have the best of both worlds; the control plane is centralized. You think as if you're managing one massive data center, but it's not because you're really managing hundreds or thousands of these sites. 
Especially if you think about edge-based computing and IoT where you really have your tentacles in tens of thousands of smaller devices and so on. We've talked about laws of the land, which is going to really make this digital transformation nuanced; laws of physics; and the third one, which is really laws of entropy. These are hackers that do this for adrenaline. These are parochial rogue states. These are parochial geo-politicians, you know, good thing I actually left the torture sign there, because apparently for our creative designer, geo-politics is equal to torture as well. So imagine one bad tweet can actually result in big changes to the way we actually live in this world today. And it's important. Geo-politics itself is digitized to a point where you don't need a ton of media people to go and talk about your principles and what you stand for and what you strategy for, for running a country itself is, and so on. And these are all human reasons, political reasons, bureaucratic reasons, compliance and regulations reasons, that, and of course, laws of physics is yet another one. So laws of physics, laws of the land, and laws of entropy really make us take a step back and say, "What does cloud really mean, then?" Cause obviously we want to digitize everything, and it all should appear like it's invisible, but then you have to nuance it for the Global 5000, the Global 10000. There's lots of companies out there that need to really think about GDPR and Brexit and a lot of the things that you all deal with on an everyday basis, actually. And that's what Nutanix is all about. Balancing what we think is all about technology and balancing that with things that are more real and practical. To deal with, grapple with these laws of the land and laws of physics and laws of entropy. And that's where we believe we need to go and balance the private and the public. That's the architecture, that's the why of Nutanix. To be able to really think about frictionless control. You want things to be frictionless, but you also realize that you are a responsible citizen of this continent, of your countries, and you need to actually do governance of things around you, which is computing governance, and data governance, and so on. So this idea of melding the public and the private is really about melding control and frictionless together. I know these are paradoxical things to talk about like how do you really have frictionless control, but that's the life you all lead, and as leaders we have to think about this series of paradoxes itself. And that's what Nutanix strategy, the roadmap, the definition of enterprise cloud is really thinking about frictionless control. And in fact, if anything, it's one of the things is also very interesting; think about what's disrupting Nutanix as a company? We will be getting disrupted along the way as well. It's this idea of true invisibility, the public cloud itself. I'd like to actually bring on board somebody who I have a ton of respect for, this leader of a massive company; which itself is undergoing disruption. Which is helping a lot of its customers undergo disruption as well, and which is thinking about how the life of a business analyst is getting digitized. And what about the laws of the land, the laws of physics, and laws of entropy, and so on. And we're learning a lot from this partner, massively giant company, called IBM. So without further ado, Bob Picciano. >> Bob Picciano: Thanks, >> Speaker 1: Thank you so much, Bob, for being here. 
I really appreciate your presence here. >> Bob Picciano: My pleasure! >> Speaker 1: And for those of you who actually don't know Bob, Bob is a Senior VP and General Manager at IBM, and is all things cognitive. Obviously, I learn a lot from a lot of leaders that have spent decades really looking at digital disruption. >> Bob: Did you just call me old? >> Speaker 1: No. (laughing) I want to talk about experience and about the meaning of history, because I love history, actually, you know, and I don't want to make you look old, actually; you're too young right now. When we talk about digital disruption, we look at ourselves and say, "Look, we are not extremely invisible; we are invisible, but we have not made something as invisible as the public clouds themselves." But what does digital disruption mean for IBM itself? Now, obviously a lot of hardware is being digitized into software and cloud services. >> Bob: Yep. >> Speaker 1: What does it mean for IBM itself? >> Bob: Yeah, if you allow me to take a step back for a moment, I think there is some good foundational understanding that'll come from a particular point of view. And you talked about it, with the number of these dimensions that are affecting the way businesses need to consider their competitiveness, how they offer their capabilities into the marketplace. And as you reflected upon IBM, you know, we've had decades of involvement in information technology. And there's a big disruption going on in the information technology space. But it's what I call an accretive disruption. It's a disruption that can add value. If you were to take a step back and look at that digital trajectory at IBM, you'd see our involvement with information technology in a space where it was all oriented around adding value and capability to how organizations managed at-scale processes. Thinking about the way they were going to represent their businesses in a digital form. We came to call them applications. But it was how do you open an account, how do you process a claim, how do you transfer money, how do you hire an employee? All the policies of a company, the way the people used to do it mechanically, became digital representations. And that foundation of the digital business process is something that IBM helped define. We invented the role of the CIO to help really sponsor and usher in this notion that businesses could re-represent themselves in a digital way, and that allowed them to scale predictably with the qualities of their brand, from local operations, to regional operations, to international operations, and show up the same way. And that added a lot of value to business for many decades. And we thrived. Many companies, SAP, all thrived during that span. But now we're in a new space, where the value of information technology is hitting a new inflection point. Which is not about how you scale process, but how you scale insight, and how you scale wisdom, and how you scale knowledge and learning from those operational systems and the data that's in those operational systems. >> Speaker 1: How's it different from 1993? We're talking about disruption. There was a time when IBM reinvented itself, 20-25 years ago. >> Bob: Right. >> Speaker 1: And you said it's bigger than 25 years ago. Tell us more. >> Bob: You know, it gets down.
Everything we know about that process space, right down to the very foundation, the very architecture of the CPU itself and the computer architecture, the von Neumann architecture, was all optimized for those relatively static, scaled business processes. When you move into the notion where you're going to scale insight, scale knowledge, you enter the era that we call the cognitive era, or the era of intelligence. The algorithms are very different. You know, the data semantically doesn't integrate well across those traditional process-based pools of information. So, new capabilities like deep learning, machine learning, the whole field of artificial intelligence, allow us to reach into that data. Much of it unstructured, much of it dark, because it hasn't been indexed and brought into the space where it is directly affecting decision-making processes in a business. And you have to be able to apply that capability to those business processes. You have to rethink the computer, the circuitry itself. You have to think about how the infrastructure is designed and organized, the network that is required to do that; the experience of the applications, as you talked about, has to be very natural, very engaging. So IBM does all of those things. So as a function of the transformation that we're on now, we've had to reach back, all the way back, from rethinking the CPU and what we dedicate our time and attention to, to our services organization, which is over 130,000 people on the consulting side helping organizations add digital intelligence to this notion of a digital business. Because the two things are really a confluence of what will make this vision successful. >> Speaker 1: It looks like massive amounts of change for half a million people who work with the company. >> Bob: That's right. >> Speaker 1: I'm sure there are a lot of large customers out here who will also read into this and say, "If IBM feels disrupted..." >> Bob: Uh hm. >> Speaker 1: How can we actually stay not vulnerable? Actually, there are massive amounts of change around their own competitive landscape as well. >> Bob: Look, I think every company should feel vulnerable, right? If you're in this age, this cognitive era, the age of digital intelligence, and you're not making a move into being able to exploit the capabilities of cognition in the business process, you are vulnerable. If you're at that intersection, and your competitor is passing through it, and you're not taking action to be able to deploy cognitive infrastructure in conjunction with the business processes, you're going to have a hard time keeping up, because it's about using the machines to do the training to augment the intelligence of our employees, of our professionals. Whether that's a lawyer, or a doctor, an educator, or whether that's somebody in a business function who's trying to make a critical business decision about risk or about opportunity. >> Speaker 1: Interesting, very interesting. You used the word cognitive infrastructure. >> Bob: Uh hm. >> Speaker 1: There's obviously compute infrastructure, data infrastructure, storage infrastructure, network infrastructure, security infrastructure, and the core of cognition has to be infrastructure as well. >> Bob: Right. >> Speaker 1: Which is one of the two things that the two companies are working together on. Tell us more about the collaboration that we are actually doing.
>> Bob: We are so excited about our opportunity to add value in this space, so we do think very differently about the cognitive infrastructure that's required for this next generation of computing. You know, I mentioned the original CPU was built for very deterministic, very finite operations; large-precision floating point capabilities to be able to accurately calculate the exact balance, the exact amount of a transfer. When you're working in the field of AI and cognition, you actually want variable precision. Right? The data is very sparse, as opposed to the way that deterministic or stochastic operations work, which is very dense or very structured. So the algorithms are redefining the processes that the circuitry actually has to run. About five years ago, we dedicated a huge effort to rethink everything about the chip, and what we made to facilitate an orchestra of participation to solve that problem. We all know the GPU has a great benefit for deep learning. But the GPU in many cases, in many architectures, specifically Intel architectures, is dramatically confined by a very small amount of IO bandwidth that Intel allows to go on and off the chip. At IBM, we looked at all of the roughly 686 square millimeters of our chip and said, how do we use that area to open up that IO bandwidth? So the innovation of a GPU or an FPGA could really be utilized to its maximum extent. And we could be an orchestrator of all of the diverse compute that's going to be necessary for AI to really compel these new capabilities. >> Speaker 1: It's interesting that you mentioned the fact that, you know, Power chips have been redefined for the cognitive era. >> Bob: Right, for Linux, for the cognitive era. >> Speaker 1: Exactly, and now the question is how do you make it simple to use as well? How do you bring simplicity, which is where... >> Bob: That's why we're so thrilled with our partnership. Because you talked about the why of Nutanix. And it really is about that empowerment. Doing what's natural. You talked about the benefits of Calm, and being able to really create that liberation of an information technology professional, whether it's in operations or in development. Having the freedom of action to make good decisions about defining the infrastructure and deploying that infrastructure, and not having to second-guess the physical limitations of what they're going to have to be dealing with. >> Speaker 1: That's why I feel really excited about the fact that you have the power of software to really meld the two platforms together. The Intel platform and the Power platform come together. And we have some interesting use cases that our CIO Randy Phiffer is also really exploring, like how can a Power platform serve as a storage platform for our Intel platform. >> Bob: Sure. >> Speaker 1: It can serve files and blocks and things like that. >> Bob: Any data-intensive application. We have seen massive growth in our Linux business; now, for our business, Linux is 20% of the revenue of our Power systems. You know, we started enabling native Linux distributions, little-endian ones, on top of the Power capabilities just a few years ago, and it's rocketed. And the reason for that is, for any data-intensive application, like a database, a NoSQL database or a structured database, Hadoop in the unstructured space, they typically run about three to four times better price performance on top of Linux on Power than they will on top of an Intel alternative. >> Speaker 1: Fascinating.
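A note on the arithmetic behind a claim like "three to four times better price performance": it is throughput per unit of cost, compared across platforms. The figures below are invented purely to show the formula; they are not IBM or Nutanix benchmark numbers.

```python
# Hypothetical throughput and cost figures, invented for illustration.
systems = {
    "platform A": {"tx_per_sec": 90_000, "cost_eur": 120_000},
    "platform B": {"tx_per_sec": 60_000, "cost_eur": 240_000},
}

# Price performance = useful work per unit of money.
pp = {name: s["tx_per_sec"] / s["cost_eur"] for name, s in systems.items()}
ratio = pp["platform A"] / pp["platform B"]
print(f"platform A delivers {ratio:.1f}x the price performance")  # 3.0x
```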
>> Bob: So all of these applications that we're talking about either create or consume a lot of data, have to manage a lot of flexibility in that space, and Power is a tremendous architecture for that. And you mentioned also the cohabitation, if you will, between Intel and Power. What we want is that optionality: for you to utilize those benefits of the 3X better price performance where they apply, and utilize the commodity base where it applies. So you get the cost benefits in that space, and the depth and capability in the space for Power. >> Speaker 1: Your tongue-in-cheek remark about commodity Intel is not lost on people, actually. But tell us about...
I think one of the requests that I have is help us help you navigate the digital disruption that's upon you and your competitive landscape that's around you that's really creating that disruption. Thank you again for being here, and welcome again to Acropolis. >> Speaker 3: Ladies and gentlemen, please welcome Chief Product and Development Officer, Nutanix Sunil Potti. >> Sunil Potti: Okay, so I'm going to just jump right in because I know a bunch of you guys are here to see the product as well. We are a lot of demos lined up for you guys, and we'll try to mix in the slides, and the demos as well. Here's just an example of the things I always bring up in these conferences to look around, and say in the last few months, are we making progress in simplifying infrastructure? You guys have heard this again and again, this has been our mantra from the beginning, that the hotter things get, the more differentiated a company like Nutanix can be if we can make things simple, or keep things simple. Even though I like this a lot, we found something a little bit more interesting, I thought, by our European marketing team. If you guys need these tea bags, which you will need pretty soon. It's a new tagline for the company, not really. I thought it was apropos. But before I get into the product and the demos, to give you an idea. Every time I go to an event you find ways to memorialize the event. You meet people, you build relationships, you see something new. Last night, nothing to do with the product, I sat beside someone. It was a customer event. I had no idea who I was sitting beside. He was a speaker. How many of you guys know him, by the way? Sir Ranulph Fiennes. Few hands. Good for you. I had no idea who I was sitting beside. I said, "Oh, somebody called Sir. I should be respectful." It's kind of hard for me to be respectful, but I tried. He says, "No, I didn't do anything in the sense. My grandfather was knighted about 100 years ago because he was the governor of Antigua. And when he dies, his son becomes." And apparently Sir Ranulph's dad also died in the war, and so that's how he is a sir. But then I started looking it up because he's obviously getting ready to present. And the background for him is, in my opinion, even though the term goes he's the World's Greatest Living Explorer. I would have actually called it the World's Number One Stag, and I'll tell you why. Really, you should go look it up. So this guy, at the age of 21, gets admitted to Special Forces. If you're from the UK, this is as good as it gets, SAS. Six, seven years into it, he rebels, helps out his local partner because he doesn't like a movie who's building a dam inside this pretty village. And he goes and blows up a dam, and he's thrown out of that Special Forces. Obviously he's in demolitions. Goes all the way. This is the '60's, by the way. Remember he's 74 right now. The '60's he goes to Oman, all by himself, as the only guy, only white guy there. And then around the '70's, he starts truly exploring, truly exploring. And this is where he becomes really, really famous. You have to go see this in real life, when he sees these videos to really appreciate the impact of this guy. All by himself, he's gone across the world. He's actually gone across Antarctica. Now he tells me that Antarctica is the size of China and India put together, and he was prepared for -50 to 60 degrees, and obviously he got -130 degrees. Again, you have to see the videos, see his frostbite. Two of his fingers are cut off, by the way. 
He hacksawed them himself. True story. And then as he, obviously, aged, his body couldn't keep up with him, but his will kept up with him. So after a recent heart attack, he actually ran seven marathons. But most importantly, he was telling me this story, at 65 he wanted to do something different because his body was letting him down. He said, "Let me do something easy." So he climbed Mount Everest. My point being, what is this related to Nutanix? Is that if Nutanix is a company, without technology, allows to spend more time on life, then we've accomplished a piece of our vision. So keep that in mind. Keep that in mind. Now comes the boring part, which is the product. The why, what, how of Nutanix. Neeris talked about this. We have two acts in this company. Invisible Infrastructure was what we started off. You heard us talk about it. How did we do it? Using one-click technologies by converging infrastructure, computer storage, virtualization, et cetera, et cetera. What we are now about is about changing the game. Saying that just like we'd applicated what powers Google and Amazon inside the data center, could we now make them all invisible? Whether it be inside or outside, could we now make clouds invisible? Clouds could be made invisible by a new level of convergence, not about computer storage, but converging public and private, converging CAPEX and OPEX, converging consumption models. And there, beyond our core products, Acropolis and Prism, are these new products. As you know, we have this core thesis, right? The core thesis says what? Predictable workloads will stay inside the data center, elastic workloads will go outside, as long as the experience on both sides is the same. So if you can genuinely have a cloud-like experience delivered inside a data center, then that's the right a- >> Speaker 1: Genuinely have a cloud like experience developed inside the data center. And that's the right answer of predictable workloads. Absolutely the answer of elastic workloads, doesn't matter whether security or compliance. Eventually a public cloud will have a data center right beside your region, whether through local partner or a top three cloud partner. And you should use it as your public cloud of choice. And so, our goal is to ensure that those two worlds are converged. And that's what Calm does, and we'll talk about that. But at the same time, what we found in late 2015, we had a bunch of customers come to us and said "Look, I love this, I love the fact that you're going to converge public and private and all that good stuff. But I have these environments and these apps that I want to be delivered as a service but I want the same operational tooling. I don't want to have two different environments but I don't want to manage my data centers. Especially my secondary data centers, DR data centers." And that's why we created Xi, right? And you'll hear a lot more about this, obviously it's going to start off in the U.S but very rapidly launch in Europe, APJ globally in the next 9-12 months. And so we'll spend some quality time on those products as well today. So, from the journey that we're at, we're starting with the score cloud that essentially says "Look, your public and private needs to be the same" We call that the first instantiation of your cloud architectures and we're essentially as a company, want to build this enterprise cloud operating system as a fabric across public and private. But that's just the starting point. 
The starting point evolves to the score architecture that we believe that the cloud is being dispersed. Just like you have a public and a private cloud in the core data centers and so forth, you'll need a similar experience inside your remote office branch office, inside your DR data centers, inside your branches, and it won't stop there. It'll go all the way to the edge. All we're already seeing this right? Not just in the army where your forward operating bases in Afghanistan having a three note cluster sitting inside a tent. But we're seeing this in a variety of enterprise scenarios. And here's an example. So, here's a customer, global oil and gas company, has couple of primary data centers running Nutanix, uses GCP as a core public cloud platform, has a whole bunch of remote offices, but it also has this interesting new edge locations in the form of these small, medium, large size rigs. And today, they're in the process of building a next generation cloud architecture that's completely dispersed. They're using one node, coming out on version 5.5 with Nutanix. They're going to use two nodes, they're going to throw us three nods, multicultural architectures. Day one, they're going to centrally manage it using Prism, with one click upgrades, right? And then on top of that, they're also now provisioning using Calm, purpose built apps for the various locations. So, for example, there will be a re control app at the edge, there's an exploration data lag in Google and so forth. My point being that increasingly this architecture that we're talking about is happening in real time. It's no longer just an existing cellular civilization data center that's being replatformed to look like a private cloud and so forth, or a hybrid cloud. But the fact that you're going into this multi cloud era is getting excel bated, the more someone consumes AWL's GCP or any public cloud, the more they're excel bating their internal transformation to this multi cloud architecture. And so that's what we're going to talk about today, is this construct of ONE OS and ONE Click, and when you think about it, every company has a standard stack. So, this is the only slide you're going to see from me today that's a stack, okay? And if you look at the new release coming out, version 5.5, it's coming out imminently, easiest way to say it is that it's got a ton of functionality. We've jammed as much as we can onto one slide and then build a product basically, okay? But I would encourage you guys to check out the release, it's coming out shortly. And we can go into each and every feature here, we'd be spending a lot of time but the way that we look at building Nutanix products as many of you know, it is not feature at a time. It's experience at a time. And so, when you really look at Nutanix using a lateral view, and that's how we approach problems with our customers and partners. We think about it as a life cycle, all the way from learning to using, operating, and then getting support and experiences. And today, we're going to go through each of these stages with you. And who better to talk about it than our local version of an architect, Steven Poitras please come up on stage. I don't know where you are, Steven come on up. You tucked your shirt in? >> Speaker 2: Just for you guys today. >> Speaker 1: Okay. Alright. He's sort of putting on his weight. I know you used a couple of tight buckles there. But, okay so Steven so I know we're looking for the demo here. 
So, what we're going to do first, and most of you guys know this, is CE. We've been quite successful with CE; it's been a great product. How many of you guys like CE? Come on. Alright. I know a bunch of you had a hard time downloading it yesterday, apparently. But it's been a great way for us, not just to get you guys to experience it, with more than 25,000 downloads and so forth, but it's also been a great way for us to seed new features and so forth. So, keep an eye on CE, because in the next 12 months we're going to, if anything, expand the way we actually use it as a way to get new features out. Now, one thing beyond CE that we did, and it took us about 12 months to get it out: while people were using CE to learn a lot, a lot of customers were actually getting into full-blown competitive evals, right? Especially with HCI being so popular and so forth. So, we came up with our own testing product, called X-Ray. >> Steven: Yup. >> Sunil: What does X-Ray do, before we show it? >> Steven: Yeah, absolutely. So, if we think back in the day, we were really the only HCI platform out there on the market. Now there are a few others. So, to basically enable the customer to objectively test these, we came out with X-Ray. And rather than talking about the slide, let's go ahead and take a look. Okay, I think it's ready. Perfect. So, here's our X-Ray user interface. Essentially what you do is specify your targets. In this case we have a Nutanix 8150 as well as some of our competitors' products, which we've actually tested. Now, on the left-hand side here we see a series of tests. What we do is go through and specify certain workloads, like OLTP workloads or database colocation, and while we do that we actually inject certain test cases or scenarios. These can be snapshots or component failures. Now, one of the key things is having the ability to test these against each other. So, what we see here is we're actually taking an OLTP workload where we're running two virtual machines, and then we can see the IOPS the OLTP VMs are actually delivering here on the left-hand side. Now, as we go through this test, we perform a series of snapshots, which are identified by these red lines here. As you can see, the Nutanix platform, shown by this blue line, stays pretty consistent as we go through the test. However, our competitor's product actually degrades in performance over time as these snapshots are taken. >> Sunil: Gotcha. And some of these tests, by the way, are not just about failure or benchmarking, right? It's a variety of tests that mimic real-life production workloads. So, every couple of months we actually look at our production workloads out there, distill those into test cases, and put them into X-Ray. So, X-Ray is one of those products that was only recently announced to the public, but it's already gotten a lot of uptake. I would strongly encourage you to try it, even if you're an existing Nutanix customer. It's a great way to keep us honest, it's a great way for you to actually expand your usage of Nutanix by putting a lot of these real-life tests into practice, and as and when you look at new alternatives as well, there'll be certain situations where we don't do as well, and that's a great way to give us feedback.
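For intuition about what a test like this measures, here is a minimal, self-contained sketch of the X-Ray idea: drive a steady workload, inject snapshot events, and compare throughput before and after. The numbers and the degradation model are illustrative assumptions, not X-Ray's actual implementation.

```python
import random

def oltp_tick(factor):
    """One second of a simulated OLTP workload; returns IOPS for that tick."""
    return 20000 * factor * random.uniform(0.97, 1.03)

def run_scenario(duration=300, snapshot_every=60, snapshot_hurts=False):
    """Drive the workload and 'take a snapshot' periodically."""
    iops, factor = [], 1.0
    for t in range(1, duration + 1):
        if t % snapshot_every == 0 and snapshot_hurts:
            factor *= 0.90  # a data path that pays for snapshots degrades each time
        iops.append(oltp_tick(factor))
    return iops

def retention_ratio(iops, window=30):
    """Final throughput relative to initial; 1.0 means no degradation."""
    return sum(iops[-window:]) / sum(iops[:window])

if __name__ == "__main__":
    for name, hurts in [("platform-a", False), ("platform-b", True)]:
        print(f"{name}: throughput retained = "
              f"{retention_ratio(run_scenario(snapshot_hurts=hurts)):.2f}")
```

The point of the red-line comparison in the demo is exactly this ratio: a flat line means snapshots are nearly free on that platform; a stair-step down means each snapshot costs the data path something permanent.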
Sunil: And so, X-Ray is there. The other one, which is more recent by the way, addresses the fact that most of you have spent many days if not weeks, after you've chosen Nutanix, moving non-Nutanix workloads, i.e. VMware on three-tier architectures, over to Nutanix. To help with that, we took a hard look and came out with a new product called Xtract. >> Steven: Yeah. So essentially, if we think about what Nutanix has done for the data center, it really enables that iPhone-like experience, really bringing simplicity and intuitiveness to the data center. Now, what we wanted to do is provide that same experience for migrating existing workloads to us. So, with Xtract, essentially what we've done is: we scan your existing environment, we create a design spec, and we handle the migration process as well as the cutover. Now, let's go ahead and take a look at our Xtract user interface here. What we can see is that we have a source environment; in this case, a vCenter environment. This can be any vCenter, whether it's traditional three-tier or hyperconverged. We also see our Nutanix target environments. Essentially, these are our AHV target clusters where we're going to be migrating the data and performing the cutover. Sunil: Gotcha. Steven: The first thing we do here is go ahead and create a new migration plan. Here, I'm just going to name this DB Wave 2. I'll click okay. What I'm doing here is selecting my target Nutanix cluster, as well as my target Nutanix container. Once I've done that, I'll click next. Now in this case, we actually like to do it big: we're going to migrate some production virtual machines over to this target environment. Here, I'm going to select a few Windows instances, which are in our database cluster. I'll click next. At this point, essentially what's occurring is that it's going through and looking at these virtual machines, as well as at the target environment. It looks at the resources to ensure that we actually have ample capacity to facilitate the workload. The next thing we'll do is go ahead and type in our credentials here. These are actually going to be used for logging into the virtual machines, to do any device driver installation as well as to grab any static IP configuration. We'll specify our network mapping, and then from there we'll click next. Then we'll actually save and start. This will go through and create the migration plan, and it'll do some analysis on these virtual machines to ensure that we can actually log in before we start migrating data. Here we have a migration which is in progress. We can see we have a few virtual machines, obviously some Linux, some Windows here. We've cut over a few. To actually cut over these VMs, we go ahead and select the VMs- Sunil: This is the actual task of doing the final stage of the cutover. Steven: Yeah, exactly. That's one of the nice things: essentially, we can migrate the data whenever we want. We actually hook into the VADP APIs to do this. Then every 10 minutes, we send over a delta to sync the data. Sunil: Gotcha, gotcha. That's how one-click migration is now possible. If you guys haven't used this, it's been out in the wild just for a month or so. It's probably been one of our best-selling, because it's free, features of the recent product release.
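The migrate-then-delta-sync pattern Steven describes can be sketched in a few lines. This is a toy model of the workflow, not Xtract's code: full_copy and changed_blocks_since are hypothetical stand-ins for what the product does through the VADP APIs.

```python
import time

def full_copy(vm):
    """Stand-in for the initial bulk copy of a VM's disks."""
    print(f"[{vm}] initial full copy complete")

def changed_blocks_since(vm, marker):
    """Stand-in for a changed-block query; the product does this via VADP."""
    return ["blk-017", "blk-042"]  # illustrative only

def apply_delta(vm, blocks):
    print(f"[{vm}] applied delta of {len(blocks)} changed blocks")

def migrate(vm, rounds=3, interval=0.1):
    """Copy once, then ship periodic deltas; cutover applies only the last,
    small delta, so downtime is minutes instead of what a cold copy takes."""
    full_copy(vm)
    marker = time.time()
    for _ in range(rounds):
        time.sleep(interval)  # ~600s in the product's every-10-minutes cycle
        apply_delta(vm, changed_blocks_since(vm, marker))
        marker = time.time()
    print(f"[{vm}] cutover complete")

if __name__ == "__main__":
    migrate("db-wave-2-vm1")
```

The design point is that the expensive bulk transfer happens while the source VM keeps running; only the final, small delta sits inside the outage window.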
Sunil: I've had customers come to me and say, "Look, there are situations where it's taken us weeks to move data." That is now minutes, from the operator's perspective. Forget the director or the VP; it's the line architect and operator who really love these tools, and that's essentially the core of Nutanix. One of our core things is to make sure that if we can keep the engineer and the architect truly happy, then everything else will be fine for us, right? So that's Xtract. Then we have a lot of other things, right? We've done the usual things; there's a ton of functionality across day zero, day one, day two kinds of capabilities. Why don't we start with something around Prism Central, now that we can do one-click PC installs? We can do PC scale-outs, we can go from managing thousands of VMs to tens of thousands of VMs, while doing all the one-click operations, right? Steven: Yep. Sunil: Why don't we take a quick look at what's new in Prism Central? Steven: Yep, absolutely. Here, we can see our Prism Element interface. As you mentioned, one of the key things we added here was the ability to deploy Prism Central very simply, just with a few clicks. We'll actually go through a distributed PC scale-out deployment here. Since this is a new instance, we're going to select our 5.5 version. In this case, we're going to deploy a scale-out Prism Central cluster. Obviously, availability and uptime are very critical for us, as we're all about distributed systems, so in this case we're going to deploy a scale-out PC cluster. Here we'll select our number of PC virtual machines. Based upon the number of VMs, we can actually select the size of VM that we'd deploy. If we want to deploy support for 25,000 VMs, we can do that as well. Sunil: Basically, a thousand to tens of thousands of VMs are possible now. Steven: Yep. That's the nice thing: you can start small, and then scale out as necessary. We'll select our PC network and go ahead and input our IP address. Now, we'll deploy. Here we can see it has actually kicked off the deployment, so it'll go provision these virtual machines and apply the configuration. In a few minutes, we'll be up and running. Sunil: Right. While Steven's doing that: one of the things that we've obviously invested a ton in is making VM operations invisible. Now with Calm, what we've done is up-level that abstraction to applications. At the end of the day, more and more, when you go to AWS, when you go to GCP, when you go to [inaudible 01:04:56], right, the level of abstraction is now at an app level; it's CloudFormation and so forth. Essentially, what Calm is able to do is give you this marketplace that you can go into and self-service [inaudible 01:05:05], to create this internal, cloud-like environment for your end users, whether they be business owners or technology users, to self-serve themselves. The process is pretty straightforward. You, as an operator, or an architect, or [inaudible 01:05:16], create these blueprints. Consumers within the enterprise, whether they be self-service users or end business users, are able to consume them from a simple marketplace and deploy them on either a private cloud using Nutanix, or public clouds using any of the public choices. Then, in a single pane of glass, as operators, you're doing converged operations at an application-centric level [inaudible 01:05:41] across any of these clouds. It's this combination of producer, consumer, and operator in a curated sense.
Much like an iPhone with an app store. That's the core construct we're trying to get to with Calm: to up-level the abstraction interface across multiple clouds. Maybe we'll do a quick demo of this, and then get into the rest of the stuff, right? Steven: Sure. Let's check it out. Here we have our Prism Central user interface. We can see we have two Nutanix clusters, our cloudy04 as well as our Power8 cluster. One of the key things we've added here is this apps tab. Clicking on this apps tab, we can see that we have a few [inaudible 01:06:19] solutions; we have a TensorFlow solution, a [inaudible 01:06:22], et cetera. The nice thing about this is that it's essentially a marketplace where vendors as well as developers can produce these blueprints for consumption by the public. Now, let's actually go ahead and deploy one of these blueprints. Here we have an HR employee engagement app. We can see we have three different tiers of services as part of this. Sunil: You need a lot of engagement with HR, you know that. Okay, keep going. Steven: The next thing we'll do here is go and click on it. Based upon this, we'll specify our blueprint name, HR app. The nice thing when I'm deploying is I can actually put in back doors. We'll click clone. Now what we can see here is our blueprint editor. As a developer, I could actually go make modifications, or even as an end user, given the simple, intuitive user interface. Sunil: This is the consumer side right here, but it's also the [inaudible 01:07:11]. Steven: Yep, absolutely. Yeah, if I wanted to make any modifications, I could select a tier, I could scale out the number of instances, I could modify the packages. Then to actually deploy, all I do is click launch, specify HR app, and click create. Sunil: Awesome. Again, this is coming in 5.5. There's one other feature, by the way, coming in 5.5 around Calm, and Prism Pro, and everything else, that seems to be a much-awaited feature for us. What was that? Steven: Yeah. Obviously, when we think about multi-tenant and multi-cloud, role-based access control is a very critical piece of that. Within an organization, we're going to have multiple business groups, multiple units; RBAC is a very critical piece. Now, if we go over here to our projects, we can see that in this scenario we just have a single project. What we've added is the ability to specify certain roles. In this case, we're going to add our good friend John Doe. We can add him, as a user or a group, and then we specify the role. We can give a developer the ability to edit and create these blueprints, or a consumer the ability to actually provision based upon them. Sunil: Gotcha. Basically, in 5.5 you'll have role-based access control burned into Prism and Calm, and I believe it'll support custom roles shortly after. Steven: Yep, okay. Sunil: Good stuff, good stuff. I think this is where the Nutanix guys are supposed to clap, by the way, so that the rest of the guys can clap. Steven: Thank you, thank you. Sunil: Okay. What do we have next? We have day-one stuff; obviously there's a ton of stuff coming in the core data path capabilities that most of you guys use. One of the most popular things is synchronous replication, especially in Europe; everybody wants to do Metro for whatever reason. But we've got something new, something even more enhanced than Metro, right? Steven: Yep. Sunil: Do you want to talk a little bit about it? Steven: Yeah, let's talk about it.
If we think about what we had previously, we started out with asynchronous replication; that essentially gives you your higher RPOs. Then we moved to Metro clustering, which is RPO zero. Those are the two ends of the gamut. What we've done now is introduce near-synchronous replication, which really gives you the best of both worlds: very, very low RPOs, but zero impact on mainline performance. Sunil: That's it. Let's show something. Steven: Yeah, yeah. Let's do it. Here, we're back at our Prism Element interface. We'll go over here. At this point, we've provisioned our HR app; the next thing we need to do is protect that data. Let's go here to protection domains. We'll create a new PD for our HR app. Sunil: You clearly love HR. Steven: Spent a lot of time there. Sunil: Yeah, yeah, yeah. Steven: Here, you can see we have our production LAMP DB VM. We'll go ahead and protect that entity; we can see it's protected. The next thing we'll do is create a schedule. Now, what would you say would be a good schedule we should shoot for? Sunil: I don't know, 15 minutes? Steven: 15 minutes is not bad, but I think the people here deserve much better than that, so I say let's shoot for ... what about 15 seconds? Sunil: Yeah. They definitely need a bathroom break, so let's do 15 seconds. Steven: Alright, let's do 15 seconds. Sunil: Okay, sounds good. Steven: Okay. Then we'll select our retention policy and the remote cluster to replicate to, which in this case is wedge. And we'll go ahead and create the schedule here. Now at this point we can see our protection domain. Let's go ahead and look at our entities. We can see our database virtual machine, our 15-second schedule, our local snapshots, and we'll start seeing our remote snapshots as well. Essentially what occurs is that we take two very quick snapshots to seed the initial data, and then, based upon that, we start taking our continuous 15-second snaps. Sunil: 15-second snaps, and obviously near-sync has less of an impact than synchronous, right? From an architectural perspective. Steven: Yeah, and that's the nice thing: essentially, within the cluster it's truly synchronous, but externally it's just a lagged async. Sunil: Gotcha. So there you see some 15-second snapshots. So near-sync is also built into 5.5; it's a long-awaited feature.
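A rough sketch of the schedule just configured: a snapshot every 15 seconds, with three copies retained locally and three on the remote cluster, which is what bounds the worst-case data loss. The deque-based rolling retention here is an illustrative assumption, not the actual implementation.

```python
from collections import deque

RPO_SECONDS = 15   # the schedule picked in the demo
RETAIN_LOCAL = 3   # snapshots kept on the local cluster
RETAIN_REMOTE = 3  # snapshots kept on the remote cluster ("wedge")

# deques with maxlen model simple rolling retention: old snaps age out.
local = deque(maxlen=RETAIN_LOCAL)
remote = deque(maxlen=RETAIN_REMOTE)

def take_snapshot(tick):
    snap = f"snap-{tick * RPO_SECONDS:04d}s"
    local.append(snap)   # lightweight local snapshot
    remote.append(snap)  # shipped to the remote side as a lagged async copy
    return snap

for tick in range(1, 7):
    take_snapshot(tick)

print("local: ", list(local))   # only the newest three remain
print("remote:", list(remote))
print(f"worst-case data loss on disaster is bounded by {RPO_SECONDS}s")
```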
Sunil: So then we expand into the rest of the capabilities, I would say, operations. A lot of you guys obviously have started using Prism Pro. Okay, okay, you can clap. You can clap. It's okay. It was a lot of work, by the way, by the core data path team; it took a lot of time. So, Prism Pro ... I don't know if you guys know this: Prism Central has now gone from zero percent to more than 50 percent attach on the install base within 18 months, and normally that's a sign of true usage and true value being delivered. Many things are new in 5.5 in Prism Pro, starting with the fact that you can do data [inaudible 01:11:49] baselining and alerting, so that you're not capturing a ton of false positives and tons of alerts. We go beyond that, because we have this core machine-learning technology underneath; we call it X-Fit. And what we've done is use that as a foundation for pretty much all kinds of operations benefits, such as auto-RCA, where you're able to map a particular [inaudible 01:12:12] back to what's actually causing it, whether it's the network, the compute, and so forth. And the last thing we've also done in 5.5, which is quite differentiating, is that you can now have a lot of these one-click recommendations and remediations, such as right-sizing, and the fact that you can actually move around [inaudible 01:12:28] VMs, constrained VMs, and so forth. So, we've packed a lot of functionality into Prism Pro; why don't we spend a couple of minutes quickly giving a sneak peek into a few of those things. Steven: Yep, definitely. So here we're back at our Prism Central interface, and one of the things we've added here, if we take a look at one of our clusters, is this new anomalies section. So, let's go ahead and select that and hop into it. Now let's click on one of these anomaly events. Essentially, what the system does is monitor all the entities and everything running within the system, and based upon that, it can determine the band of values we expect for these metrics. So in this scenario, we can see we have a CPU usage anomaly event. Normally, we expect this to be right around 86 to 100 percent utilization, but at this point we can see it has drastically dropped from 99 percent to near zero. So this might be a point where, as an administrator, I want to go check out this virtual machine and ensure that certain services and applications are still up and running. Sunil: Gotcha, and then it also changes the baseline based on- Steven: Yep. Yeah, essentially we apply machine-learning techniques to this, so the system will dynamically adjust the baseline as the values change. Sunil: Gotcha. What else? Steven: Yep. So the other thing we mentioned here was capacity planning. If we go over here, we can take a look at our runway. In this scenario we have about 30 days' worth of runway, which is most constrained by memory. Now, obviously, more nodes is all good for everyone, but we also want to ensure that you get the maximum value out of your investment. So here we can actually see a few recommendations. We have 11 overprovisioned virtual machines; these are essentially VMs which have more resources than necessary. As well as 19 inactive ones; these are essentially dead VMs that haven't been powered on and aren't utilized. We can also see we have six constrained, as well as one bully. Constrained VMs are essentially VMs which are requesting more resources than they actually have access to; this could be running at 100 percent CPU utilization, or 100 percent memory or storage utilization. So we could actually go in and modify these. Sunil: Gotcha. So these are all part of the auto-remediation capabilities that are now possible? Steven: Yeah.
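For intuition, dynamic baselining of the kind just demoed can be approximated with a band around recent behavior: the system flags values that escape the band and re-learns as new data arrives. This toy sketch uses a simple mean-plus-k-sigma band, which is an assumption on our part; the product's actual models are not described here.

```python
import statistics

def anomaly_band(history, k=3.0):
    """Expected band for a metric, learned from its recent history.
    As the history window slides forward, the band re-learns automatically."""
    mean = statistics.fmean(history)
    spread = statistics.pstdev(history)
    return mean - k * spread, mean + k * spread

def is_anomalous(value, history):
    low, high = anomaly_band(history)
    return not (low <= value <= high)

# CPU has hovered near 86-100 percent; a sudden drop toward zero escapes the band.
cpu_history = [92, 95, 88, 97, 99, 93, 90, 96]
print(anomaly_band(cpu_history))       # roughly (83, 104)
print(is_anomalous(1, cpu_history))    # True: go check the services on that VM
print(is_anomalous(94, cpu_history))   # False: business as usual
```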
Sunil: What else? Do you want to take reporting? Steven: Yeah. I know reporting is a very big thing; if we think about it, we can't rely on an administrator constantly going into Prism. We need to provide some mechanism for them to get emailed reports. So what we've done is autogenerate reports which can be sent via email. We'll go ahead and add one of these sample reports, which was created today. And here we can actually get specific, detailed information about our cluster without having to go into Prism to get it. Sunil: And you can customize these reports and all? Steven: Yep. Yeah, if we hop over here and click on our new report, we can actually see a list of views we can add to these reports, and we can mix and match and customize as needed. Sunil: Yeah, so that's the operational side. Now we also have new services, like AFS, which has been quite popular with many of you folks. We've had hundreds of customers already live on it with SMB functionality. Do you want to show a couple of things that are new in 5.5? Steven: Yeah. Yep, definitely. So ... let's wait for my screen here. One of the key things is, if we looked at that runway tab, what we saw is that we had over a year's worth of storage capacity. What we also saw is that customers had a requirement for filers, and they had some excess storage, so why not build a software filer natively into the cluster? And that's essentially what we've done with AFS. So here we can see our AFS cluster, and one of the key things is the ability to scale. This particular cluster has around 3.1, or 3.16, billion files running on it, as well as around 3,000 active concurrent sessions. Sunil: So basically thousands of concurrent sessions with billions of files? Steven: Yeah, and the nice thing is that this is actually only a four-node Nutanix cluster, so as the cluster scales, these numbers will scale linearly as a function of those nodes. Sunil: Gotcha, gotcha. There's got to be one more bullet here on this slide, so what's it about? Steven: Yeah, obviously the initial use case was realistically home folders as well as user profiles. That was a good start, but it wasn't the only thing. So what we've done is also introduce, in an important upcoming release, NFS. So now you can use NFS to also interface with our [crosstalk 01:16:44]. Sunil: NFS coming soon with AFS, by the way; it's a big deal. Big deal. So one last thing, obviously, as you operationalize all this: we've talked about a lot of features and functions, but one of the cool things that's always been seminal to this company is that we aim for a really good customer service and support experience. A lot of it is around the product, the people, the support guys, and so forth. So, fundamental to the product, we have found ways, using Pulse, to instrument everything. With Pulse HD, which has been available for a little while now, we have fine-grained [inaudible 01:17:20] around everything that's being done. So if you turn on this functionality, there's a ton of context now available to support you when you make a phone call, or send an email, and so forth. What we've now done is taken that and externalized it for your own consumption, so that you don't necessarily have to call support. You can log in and look at your entire profile: your own alerts, your own advisories, your own recommendations. You can look at collective intelligence, which is coming soon, which says: look, here are 50 other customers just like you, these are the kinds of customers that are running workloads like yours, and what are their configuration profiles?
Through this centralized customer insights portal you're going to get a lot more insight, not just into your own operations, but also into how everybody else is using it. So let's take a quick look at that upcoming functionality. Steven: Yep, absolutely. So this is our customer 360 portal. As [inaudible 01:18:18] mentioned, as a customer I can log in here and get a high-level overview of my existing environment, my cases, the status of those cases, as well as any relevant announcements. So here, based upon my cluster version, if there are any updates available, I can see that immediately. And then one of the other things we've added here is this insights page. Essentially, this is information that support would previously leverage to proactively look out for the cluster, but now we've exposed it to you as the customer. Clicking on this insights tab, we can see an overview of our environment; in this case we have three Nutanix clusters and right around 550 virtual machines, and over here, what's critical is we can actually see our cases. One of the nice things about this is that these are all autogenerated by the cluster itself, so no human interaction, no manual intervention was required to create these alerts. The cluster itself will facilitate that, send it over to support, and then support can get back to you automatically. Sunil: Okay, so look for customer insights coming soon. And obviously that's the full life cycle. One cool thing, though, that's always been unique to Nutanix is the fact that we've had [inaudible 01:19:28] security built in from day one. And there's a [inaudible 01:19:31] chunk of functionality coming in 5.5 just around this, because every release we try to insert more and more security capabilities, and the first one is around data. What are we doing? Steven: Yeah, absolutely. So previously we had support for data-at-rest encryption, but it did have the requirement of leveraging self-encrypting drives. Those can be very expensive, so what we've done, typical of our fashion, is build this in natively via software. So here within Prism Element, I can go to data-at-rest encryption, and then I can edit this configuration here. From here I can add my CSRs, I can specify a KMS server, and I can leverage native software-based encryption without the requirement of SEDs.
Sunil: Awesome. So data-at-rest encryption [inaudible 01:20:15] coming soon in 5.5. Now, data security is only one element; the other element is obviously around network security. We've always had this request about what we are doing about networking, and our philosophy has always been simple and clear, right? It is that the problem in networking is not the data plane. The problem in networking is the control plane. As in, if packet loss happens at the top-of-rack switch, what do we do? If there's a misconfigured port, what do we do? So we've invested a lot in a full-blown new network visualization capability, all new in 5.5, which we'll show you a preview of. And then, once you can visualize, you can take action: using our network APIs, now in 5.5, you can auto-provision VLANs on the switch, you can update VIPs on your load-balancing pools, and you can obviously update rules on your firewall. And then we've taken that to the next level, beyond all that. Just look at AWS right now: what do you do? You take 100 VMs, you put them in an AWS security group, boom. That's how you get micro-segmentation. You don't need to buy expensive products, you don't need to virtualize your network to get micro-segmentation. That's what we're doing with 5.5: built-in, one-click micro-segmentation. It's part of the core product, so why don't we just quickly show that. Okay? Steven: Yeah, let's take a look. So if we think about where we've been so far: we've done the comparison test, we've done a migration over to Nutanix, we've deployed our new HR app and protected its data; now we need to protect the network. So one of the things you'll see that's new here is these security policies. What we'll do is go ahead and create a new security policy, and we'll just say this is HR security policy. We'll specify the application type, which in this case is HR. Sunil: HR, of course. Steven: Yep, and we can see our app instance is automatically populated; based upon the number of running instances of that blueprint, that populates the drop-down. Now we'll go ahead and click next here, and what we can see in the middle are essentially the three tiers that compose that app blueprint. Now, one of the important things is figuring out what's trying to communicate with this app within my existing environment. So if I take a look over here on the left-hand side, I can see a few things. I can see an HAProxy load balancer trying to communicate with my app here; that's all good, I want to allow that. I can see some sort of monitoring service trying to communicate with all three of the tiers; that's good as well. Now, the last thing I can see here is this IP address, which is trying to access my database. That's not by design and that's not supposed to happen, so what we'll do is take a look and see what it's doing. Hopping over to this database virtual machine, or rather the hack VM, what we can see is that it's trying to perform a brute-force login attempt against my MySQL database. This is not good. We can see it can obviously connect on the socket; however, it hasn't guessed the right password. In order to lock that down, we'll go back to our policies here, and we're going to click deny. Once we've done that, we'll click next, and now we'll click Apply Now. Now we can see our newly created security policy, and if we hop back over to this VM, we can see it's actually timing out. What this means is that it's not able to communicate with that database virtual machine, due to micro-segmentation actively blocking the request. Sunil: Gotcha. And when you go back to the Prism side, essentially what we're saying now is that it's as simple as that to set up micro-segmentation inside your existing clusters. So that's one-click micro-segmentation, right? Good stuff.
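Conceptually, the policy Steven built is an allow-list keyed on app categories rather than IP addresses, with everything else denied. A minimal sketch, with made-up category names:

```python
# Flows are keyed on app categories, not IPs, so the policy follows the app.
ALLOWED_FLOWS = {
    ("haproxy-lb", "hr-web"),
    ("hr-web", "hr-app"),
    ("hr-app", "hr-db"),
    ("monitoring", "hr-web"),
    ("monitoring", "hr-app"),
    ("monitoring", "hr-db"),
}

def evaluate(src, dst):
    """Default-deny: any flow not explicitly allowed is dropped."""
    return "ALLOW" if (src, dst) in ALLOWED_FLOWS else "DENY"

print(evaluate("haproxy-lb", "hr-web"))   # ALLOW
print(evaluate("unknown-host", "hr-db"))  # DENY: the brute-force box just times out
```

This is why the hack VM's connection "times out" rather than being refused: packets toward the database are silently dropped once the policy applies.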
Sunil: One other thing before we let Steve walk off the stage and go to the bathroom: you guys know Steve, you know he spends a lot of time in the gym, you do, right? He and I share cubes right beside each other, by the way; if you ever come to the San Jose Nutanix corporate headquarters, you're always welcome. Come to the fourth floor and you'll see Steve and Sunil beside each other. Most of the time I'm not in my cube; most of the time he's in the gym. If you go to his cube, you'll see all kinds of stuff. Okay. It's true, it's true. But the reason I brought this up is that Steve recently became a father, his first kid. Oh, by the way, this, clicker, this is how his cube looks. But he left his wife and his newborn kid to come over here to show us a demo, so give him a round of applause. Thank you, sir. Steven: Cool, thanks, Sunil. That was fun. Sunil: Thank you. Okay, so lots of good stuff. Please try out 5.5 and give us feedback, as you always do. A lot of sessions, a lot of details; have fun, hopefully, for the rest of the day. To talk about how they're using Nutanix, here's one of our favorite customers and partners. He normally comes with sunglasses; I've asked him, since I have to be the best-looking guy on stage in my keynotes, to reduce his charm a little bit. Please come on up, Alessandro. Thank you. Alessandro: I'm delighted to be here, thank you so much. Sunil: Maybe we can stand here. Tell us a little bit about Leonardo. Alessandro: About Leonardo: Leonardo is a key player in aerospace, defense, and security systems. Helicopters, aircraft, the fancy systems, the fancy electronics, weapons unfortunately, but it's also a global player in the high-technology field. The security and information systems division, the division I belong to, has 3,000 people located in Italy and the UK, and in several other countries in Europe and the U.S., with $1 billion of revenue. It has long and deep experience in information technology, communications, automation, and logical and physical security, so we have quite a lot of experience to build on. I'm in charge of the security infrastructure business side, which is devoted to designing, delivering, and managing secure infrastructure services and secure-by-design solutions and platforms. Sunil: Gotcha. Alessandro: That is it. Sunil: Gotcha. Some of your focus in recent times has obviously been on delivering secure cloud services. Alessandro: Yeah, obviously. Sunil: Versus traditional infrastructure, right? How did Nutanix help you with some of that? Alessandro: I can tell you something about our recent experience with that. At the end of two thousand ... well, not so recent. Sunil: Yeah, yeah. Alessandro: At the end of 2014, we realized and understood that we had to move a step forward, a big step and a fast step, otherwise we would drown. At that time, our newly appointed CEO confirmed that IT would be a core business for Leonardo and had to be developed and grown. So we decided to start our digital transformation journey, and to do it in a structured and organized way, having our targets clear in mind. We launched two programs, one analysis program and one deployment program, that were essentially transformation programs. We had to renew ourselves in terms of service models, in terms of organization, in terms of skills to invest in, and in terms of technologies to adopt. We were sitting on a stratification of technologies adopted by companies merged in the years before, and we had to move forward and rationalize all these things. So we spent a lot of time analyzing, comparing technologies, and evaluating what would fit us. We had two main targets. The first: to consolidate and centralize the huge amount of services and infrastructure that were spread over 52 data centers in Italy, for Leonardo itself. The second: to update our service catalog with a bunch of cloud services. So we decided to update our data centers.
One of the building blocks of our new data center architecture was Nutanix. We evaluated a lot, we spent a lot of time in analysis, so it wasn't a bet, but you were quite the pioneers at that time. Sunil: Yeah, you took a lot of risk, right, as an Italian company- Alessandro: At the time, my colleagues used to say, "Hey, Alessandro, think it over; remember that no CIO has ever been fired for having chosen IBM." I apologize, Bob, but at that time Nutanix didn't run on [inaudible 01:29:27]. We still have a good bunch of [inaudible 01:29:31] in our data center, so that will be the chance to ... Audience Member: [inaudible 01:29:37] Alessandro: So much you must [inaudible 01:29:37] what you announced. Sunil: So you took a risk and you got into it. Alessandro: Yes, we got into it, and we are very satisfied with the results we have achieved. Sunil: Gotcha. Alessandro: Most of the targets we expected to fulfill have been met, and so we are satisfied, but that doesn't mean that we won't go on asking you for a big discount ... Sunil: Sure, sure, sure, sure. Alessandro: On the price list. Sunil: Sure, sure. So what's next? I know there's some interesting stuff that you're thinking of. Alessandro: Next, we have to move forward, obviously. The name Leonardo is inspired by Leonardo da Vinci, a guy who, in terms of innovation and technological innovation, had some good ideas. And so I think that Leonardo, with Nutanix, can go on pursuing an innovation target and a really mutual ... Sunil: Partnership. Alessandro: Useful partnership, yes. We surely want to investigate the micro-segmentation technologies you showed a minute ago, because we are looking at that, particularly from the economic point of view ... Sunil: Yeah, the costs and expenses. Alessandro: And we have to provide an alternative to the technology we are using. We want to use AHV more intensively, again as an alternative to the solution we are using; we are selecting a couple of services, a couple of quite big projects, to build using AHV. Talking of Calm, we are very eager to understand the announcements that you are going to show all of us, because the solution we are currently using is quite [crosstalk 01:31:30] Sunil: Complicated. Alessandro: Complicated, yes. To move one step of automation, to elaborate and implement [inaudible 01:31:36], you spend 500 hours on manual activities; that's nonsense, so ... Sunil: Manual automation. Alessandro: (laughs) Yes. And in the end, we are very interested also in the Prism features, mostly the new features that you ... Sunil: Talked about. Alessandro: ... showed yesterday in the preview, because every bit of benefit that we receive from the solution in the operations field means a plus to our customers and a distinctive plus for us, so we are very interested in that ... Sunil: Gotcha, gotcha. Thanks for taking the risk, thanks for being a customer and a partner. Alessandro: It has been a pleasure. Sunil: Appreciate it. Alessandro: Bless you, bless you. Sunil: Thank you. So, you know, obviously one OS, one click was one of our core things, and as you can see, the tagline doesn't stop there; it also says "any cloud."
So, that's what the rest of the presentation is about: what are we doing to now fulfill that mission of one OS, one cloud, one click, with one support experience, across any cloud, right? And there, you know, we talked about Calm. Calm is not just an operational experience for your private cloud; as you can see, it's a one-click experience where you can actually up-level your apps, set up blueprints, put SLAs and policies on them, and push them down to your AWS, GCP, and all your [inaudible 01:33:00] environments. And while on day one you can do one-click provisioning, for day two and beyond you will see newer and newer capabilities, such as one-click migration and mobility, seeping into the product. Because that's the end game for Calm: to be your cloud autonomy platform, right? So you can choose the right cloud for the right workload. And to talk about how they're building a multi-cloud architecture using Nutanix in partnership, it's a great pleasure to introduce my other good Italian friend, Daniele, from Telecom Italia Sparkle. Come up on stage please. How are you, sir? Daniele: Not too bad, thank you. Sunil: You want an espresso, a cappuccino? Daniele: No, no, later. Sunil: You all good? Okay, tell us a little about Sparkle. Daniele: Yeah, Sparkle is a fully owned subsidiary of the Telecom Italia group, spun off in 2003 with the mission to develop the wholesale and the multinational corporate and enterprise business abroad. A huge network, as you can see: hundreds of thousands of kilometers of fiber optics spread between southeast Asia, Europe, and the U.S. Most of it is proprietary; part of it is realized on submarine cables, part of them proprietary, part of them bilateral, part of them [inaudible 01:34:21] with other operators. There are 37 countries in the world in which we have offices, 700 employees, a lean and clean company ... Sunil: Wow, just 700 employees for all of this. Daniele: Yep, 1.4 billion in revenues per year, more or less. Sunil: Wow. Are you a public company? Daniele: No, fully owned by TIM so far. Sunil: So, what is your experience with Nutanix so far? Daniele: Well, in a way, it's similar to what Alessandro was describing. We have to operate such a huge network, as you saw before, and keep bringing in revenues from the wholesale market, while trying to turn toward the enterprise in a serious way. A couple of years ago the management team realized that we had to go through a serious transformation, not just technological, but in terms of the way we build services for our customers, in terms of how we let our customers feel the Sparkle experience. So, we are moving towards cloud, but we are moving towards cloud with connectivity attached to it, because that's in our core as a provider of telecom services. The paradigm driving things today is on-demand, dynamic, and in order to get these things we need to move to software. Most of the network must become invisible, as is the Nutanix way. So, instead of creating patchworks on top of our existing systems, infrastructure, OSS, BSS, and network systems, we decided to build a new data center from scratch. And the paradigm behind this new data center, the mantra, was: everything is software-defined, everything must be easy to manage, performance and capacity planning, everything must be predictable, and everything must be manageable by a few people.
Nutanix is at the moment the baseline of this data center for what concerns, let's say, all the new networking tools, meaning the SDN controllers that take care of automation and programmability of the network, the lifecycle service orchestrator, the network orchestrator, and the cloud automation and brokerage platform. And everything at the moment runs on AHV, because we are pushing our vendors to certify their applications on AHV. The only stack that is not AHV-based at the moment is one specific cloud platform, because there we were really looking for the multi-[inaudible 01:37:05] things that you are announcing today. So, we hope to do that migration as soon as possible. Sunil: Gotcha, gotcha. And then, looking forward, you're going to build out some more data center space and expose these services Daniele: Yeah. Sunil: for the customers, as well as your internal [crosstalk 01:37:21] Daniele: Yeah, basically, yes. For sure we are going to consolidate, to invest more in the data centers in the markets where we are the leader. In Italy, Turkey, and Greece we have big data centers for [inaudible 01:37:33] and cloud. But we believe that the cloud, with all the issues discussed this morning by Dheeraj, such as locality and customer proximity ... we think, as a global player having more than 120 PoPs all over the world, which become more than 1,000 through partnerships, that a PoP can easily be transformed into a data center, so that we can push the customer experience of what we develop in our main data centers closer to the customers. That way we can combine traditional infrastructure-as-a-service with the new connectivity services, every single [inaudible 01:38:18], possibly everything running. Sunil: I mean, it makes sense. I think, essentially, to summarize in some ways, it's an example of an edge cloud, where you're pushing a micro-cloud closer to the customer's edge. Daniele: Absolutely. Sunil: Great stuff, man. Thank you so much, thank you so much. Daniele: A pleasure, a pleasure. Thank you. Sunil: So, a couple of other things before we get to the next demo. In addition to Calm for multi-cloud management, we have Xi, which we talked about, for extended enterprise capabilities, and here's something to help you guys quickly understand why we have done this. Put very simply: if you think about your enterprise data center, clearly you have a bunch of apps there, and a bunch of public clouds. In the current paradigm, you deploy traditional apps, we call them mode-one apps, SAP, Exchange and so forth, on your enterprise. Then you have next-generation apps, whether it be [inaudible 01:39:11], whether it be Hadoop or whatever you want to call them; let's call them mode-two apps, right? And when you look at these two types of apps, most enterprises have a combination of mode-one and mode-two apps, while most public clouds are primarily focused, initially, these days, on mode-two apps, right? And when people talk about app mobility, when people talk about cloud migration, they talk about lift and shift, forklift [inaudible 01:39:41]. And that's a hard problem. I mean, it's happening, but it's a hard problem, and it ends up not being just a one-time thing. Once you've forklifted, once you've moved, you have different tooling, a different operational support experience, different stacks.
Sunil: What if, for some of the applications that matter to you, your core enterprise apps, you could retain the same tooling, the same operational experience, and so forth? That is what we aim to do with Xi. It is truly making hybrid invisible, which is the next act for this company. It'll take us a few years to really fulfill the vision here, but the idea is that you shouldn't think about the public cloud as a different silo. You should think of it as an extension of your enterprise data centers. And for services such as DR, whether it be dev/test, whether it be backup, and so forth, you can use the same tooling, the same experience, and get a public cloud-like capability without lift and shift, right? So making this lift and shift invisible by, sort of, homogenizing the data plane, the network plane, and the control plane is what we really want to do with Xi. Okay? And we'll show you some more details here. But the simplest way to understand this is to think of it as the iPhone, right? Dheeraj has mentioned this a little bit. This is how we built this experience: view iOS as the core IP, and we wrap it up in a great package called the iPhone. But then, a few years into the iPhone era, came iTunes and iCloud. Those aren't separate apps, per se; they're fused into iOS. And similarly, think about Xi that way. The more you move VMs into a Nutanix environment, the more stuff like DR comes burnt into the fabric. And to give us a sneak peek into a bunch of the Calm and Xi capabilities, let me bring back Binny, who's always a popular guy on stage. Come on up, Binny. I'd be surprised if Binny untucked his shirt. He's always tucking in his shirt. Binny Gill: Okay, yeah. Let's go. Sunil: So the first thing is Calm, and showing how we can actually deploy apps, not just across private and public clouds, but across multiple public clouds as well. Right? Binny Gill: Yeah. Basically, you know, Calm is about simplifying away the disparity between the various public clouds out there. So it's very important for us to be able to take one application blueprint and quickly deploy it in whatever cloud you choose, without having to understand how one cloud is different from another. Sunil: Yeah, that's the goal. Binny Gill: So here, as you can see, I have the marketplace. And by the way, this marketplace has had great partner community interest, and every sort of app shows up here. Let me take a sample app here, Hadoop, and click launch. And now, where do you want me to deploy? Sunil: Let's start with GCP. Binny Gill: GCP, okay. So I click on GCP, and let me give it a name: Hadoop, GCP, say, 30. Right. Create. So this is one-click deployment of anything from our marketplace onto a cloud of your choice. Right now, what the system is doing is taking the intentful description of what the application should look like, not just at the infrastructure level but also within the virtual machines, and creating the set of workflows that it needs to go deploy. So as you can see, while we were talking, it's loading the application, making sure that the provisioning workflows are all set up. Sunil: And so this is actually, in real time, abstracting out some of the GCP requirements. It's actually talking to GCP, setting up the constructs, so that we can push it up onto GCP properly. Binny Gill: Right. So it takes a couple of minutes. It'll provision. Let me go back and show you.
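While that provisions: the "one blueprint, any cloud" idea boils down to keeping the blueprint cloud-neutral and pushing each provider's differences into an adapter. A toy sketch, with hypothetical adapter functions that are not Calm's real interfaces:

```python
# The blueprint stays cloud-neutral; each adapter owns one provider's "-isms".
blueprint = {
    "name": "hadoop",
    "services": [
        {"name": "hadoop-master", "count": 1},
        {"name": "hadoop-slave", "count": 5},
    ],
}

def deploy_gcp(bp):
    return f"gcp: provisioned '{bp['name']}' ({len(bp['services'])} services)"

def deploy_aws(bp):
    return f"aws: provisioned '{bp['name']}' ({len(bp['services'])} services)"

ADAPTERS = {"GCP": deploy_gcp, "AWS": deploy_aws}

def launch(bp, cloud):
    """One-click launch: the same call regardless of the target cloud."""
    return ADAPTERS[cloud](bp)

print(launch(blueprint, "GCP"))
print(launch(blueprint, "AWS"))
```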
Binny Gill: Say you want to deploy on AWS. So: Hadoop, AWS. And that's it. So again, the same workflow. Sunil: Same process, I see. Binny Gill: It's going to now deploy in AWS. Sunil: See, one of the key things is that we've abstracted all the isms of each of these clouds into this logical substrate. Binny Gill: Yep. Sunil: That you can now piggyback off of. Binny Gill: Absolutely. And it makes it extremely simple for the average consumer. And you know, we'll add more cloud support here over time. Sunil: Sounds good. Binny Gill: Now let me go back and show you an app that I already deployed, 13 days ago. It's on GCP. And essentially what I want to show you is the view of the application. Firstly, it shows you the cost summary: hourly, daily, and what the cost is going to look like. The other part is how you manage it. So, you know, one-click ways of upgrading, scaling out, starting, deleting, and so on. Sunil: So common actions, but independent of the type of cloud. Binny Gill: Independent. And also, you can add to these actions over time, right? Then services: it's running two services, Hadoop slave and Hadoop master. The Hadoop slave runs five instances right now. And auditing: it shows you the important actions you've taken on this app. Not just, for example, on the IaaS front, how the VMs were created, but also, if you scroll down, how the application was deployed and brought up; you know, the slaves have to discover each other, and so on. Sunil: Yeah, got you. So fine-grained visibility into whatever you were doing with clouds, because that's been one of the complaints in general: that the cloud abstractions have been pretty high-level. Binny Gill: Yeah. Sunil: Yeah. Binny Gill: Yeah. So that's how we make the differences between the public clouds all go away for the end users of ... Sunil: Got you. So why don't we now give folks ... Now, a lot of this stuff is coming in 5.5, so you'll see it pretty soon. You'll get your hands around it with AWS and GCP support and so forth. What we wanted to show you next is an emerging alpha version that is being baked; this is real production code for Xi. And why don't we just jump right into it, because we're running short on time. Binny Gill: Yep. Sunil: Give folks a flavor of what the production-level code is already being built around. Binny Gill: Right. So the idea of the design is to make sure that the public cloud is no longer any different from your private cloud. It's a true, seamless extension of your private cloud. Here I have my test environment. As you can see, I'm running the HR app. It has the DB tier and the Web tier, alright? The DB tier is running an Oracle DB; employee payroll is the Web tier. And if you look at the availability zones that I have, this is my data center. Now I want to protect this application from disaster, right? What do I do? I need another data center. Sunil: Sure. Binny Gill: Right? With Xi, what we are doing is ... you go here and click on Xi Cloud Services. Sunil: And essentially, as the slide says, you are adding AZs with one click. Binny Gill: Yep, so this is what I'm going to do. Essentially, you log in using your existing my.nutanix.com credentials. So here I'm going to use my guest credentials and log in. Now, while I'm logging in, what's happening is that we are creating a seamless network between the two sides, and then making the Xi cloud availability zone appear as if it were my own. Right? Sunil: Gotcha.
Binny Gill: So in a couple of seconds, what you'll notice in this list is that I no longer have just one availability zone; another one appears. Sunil: So you have essentially, in real time now, paired your one data center with another availability zone. Binny Gill: Yep. Sunil: Cool. Okay. Let's see what else we can do. Binny Gill: So now, think about DR setup. Now that I'm armed with another data center, let's do DR. DR setup is going to be extremely simple. Sunil: Okay, but that's also because it is the same stack on both sides, right? Binny Gill: It's the same stack on both sides. We have a secure network plane connecting the two sides, and on top of that secure network plane, data can flow back and forth. So now applications can go back and forth, securely. Sunil: Gotcha, okay. Let's look at one-click DR. Binny Gill: So for one-click DR setup, there are a couple of things we need to know. One is a protection rule: this is the RPO, what does it apply to, right, and the direction of the replication. The other one is recovery plans: in case disaster happens, how do I bring up my machines and applications, in what order, and so on. So let me first show you a protection rule, right? Here's the protection rule; I'll create one right now. Let me call it Platinum. Alright, and the source is my own data center. Destination: you know, Xi appears now. Recovery point objective: say one hour, so these snapshots go to the public cloud every hour. I want to retain three on the public side, three locally. And now I select the entities that I want to protect. Instead of picking VMs by name, what I can do is say app type: employee payroll, and app type: Oracle database. That covers both categories of the application tiers that I have. And save. Sunil: So one of the things here, by the way, I don't know if you guys have noticed this: more and more of Nutanix's constructs are being elevated to become app-centric, versus VM-centric. And essentially, what that allows one to do is create that as the new service-level API and abstraction, so that under the covers, over a period of time, it may be VMs today, maybe containers tomorrow, or functions the day after. Binny Gill: Yep. What I just did was all that needs to be done to set up replication from your own data center to Xi. So we started with no second data center and went to actual replication happening. Sunil: Gotcha. Binny Gill: Okay? Sunil: Now, you want to set up some recovery plans? Binny Gill: Yeah, so now let's set up a recovery plan. Recovery plans are going to be extremely simple. You select a bunch of VMs or apps, and then you can say what scripts you want to run and the order in which you want to boot things. And, you know, you can test these things with one click, monthly or weekly, and so on. Sunil: Gotcha. And that sets up the IPs as well as subnets and everything. Binny Gill: So you have the option: you can maintain the same IPs on-prem as they move to Xi, or you can make them- Sunil: Remember, you can maintain your own IPs when you actually use the Xi service. There were a lot of things done to accommodate that capability. Binny Gill: Yeah. Sunil: So let's take a look at some of- Binny Gill: You know, the same things you'd have with a VPC, for example, you now have on Xi.
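The protection rule Binny created can be pictured as a small piece of data: an RPO, retention counts on each side, and the set of app categories it applies to. A sketch, with illustrative names:

```python
from dataclasses import dataclass

@dataclass
class ProtectionRule:
    name: str
    rpo_minutes: int       # how often snapshots are taken and replicated
    retain_local: int      # copies kept on-prem
    retain_remote: int     # copies kept in Xi
    categories: frozenset  # app types this rule applies to, not VM names

platinum = ProtectionRule("Platinum", 60, 3, 3,
                          frozenset({"employee-payroll", "oracle-database"}))

vms = [
    {"name": "hr-web-1", "category": "employee-payroll"},
    {"name": "hr-db-1", "category": "oracle-database"},
    {"name": "scratch-vm", "category": "test"},
]

# Matching on category means VMs created later pick up protection automatically.
protected = [vm["name"] for vm in vms if vm["category"] in platinum.categories]
print(protected)  # ['hr-web-1', 'hr-db-1']
```

This is the payoff of the app-centric constructs Sunil mentions: the rule never has to be edited when a fourth VM joins the payroll tier, as the failback later in the demo shows.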
Binny Gill: So, let's create a recovery plan. For a recovery plan, you select the destination: where does the recovery happen? Now, after that, you have to think of the runbook that you want to run when disaster happens, right? So you're preparing for that; let me call it "HR App Recovery." The next thing is the first stage. For the first stage, let me add some entities by category. I want to bring up my database first, right? Let's click on the database, and that's it. Sunil: So essentially, you're building the script now. Binny Gill: Building the script- Sunil: ... on the [inaudible 01:50:30] Binny Gill: ... but in a visual way that's simple for folks to understand. You can add a custom script, add a delay, and so on. Let me add another stage, and this stage is about bringing up the web tier after the database is up. Sunil: So basically, bring up the database first, then bring up the web tier, et cetera, et cetera, right? Binny Gill: That's it. I've created a recovery plan. I mean, usually this is complicated stuff, but we've made it extremely simple. Now, if you click on "Recovery Points," these are snapshots, snapshots of your applications. As you can see, the system has already taken three snapshots in response to the protection rule that we created just a couple of minutes ago, and these are now being seeded to Xi data centers. Of course, this seeding takes time, so what I have is a setup that's already in place; that's the production environment. I'll cut over to that. This is my production environment. Click "Explore," and now you see the same application running in production, and I have a few other VMs that are not protected. Let's go to "Recovery Points." It has been running for some time, so these recovery points are there, and they have been replicated to Xi. Sunil: So let's do the failover, then. Binny Gill: Yeah. To fail over, you'll have to go to Xi, so let me log in to Xi. This time I'll use my production account for logging into Xi. I'm logging in. The first thing that you'll see in Xi is a dashboard that gives you a quick summary of what your DR testing has been so far, whether there are any issues with your replication, and, most importantly, the monthly charges. So right now I've spent, on my own credit card, close to 1,000 bucks. You'll have to refund it quickly. Sunil: It depends. If the- Binny Gill: If this works- Sunil: If the demo works. Binny Gill: Yeah, if it works, okay. As you see, there are no VMs here right now. If I go to the recovery points, they are there. I can click on the recovery plan that I created, and let's see how hard it's going to be. I click "Failover." It shows three entities that, based on the snapshots, it knows it can recover from source to destination, which is Xi. And it's one click for the failover. Now we'll see what happens. Sunil: So this is essentially failing over my production now. Binny Gill: Failing over your production now. [crosstalk 01:52:53] If you click on "HR App Recovery," here you see it has now started the recovery plan. The simple recovery plan that we created actually gets converted into a series of tasks that the system has to do: each VM has to be hydrated, powered on in the right order, and so on and so forth. You don't have to worry about any of that; you can just keep an eye on it. But in the meantime, let's talk about something else.
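The recovery plan itself is essentially an ordered runbook: stages run in sequence, with optional delays and scripts between them. A minimal sketch of that execution model, reusing the same illustrative category names as above:

```python
import time

# A recovery plan is an ordered runbook: stages run sequentially, and
# entities inside a stage can power on together.
RECOVERY_PLAN = [
    {"category": "oracle-database", "delay_after_s": 1},   # DB tier first
    {"category": "employee-payroll", "delay_after_s": 0},  # then the web tier
]

def power_on(category):
    print(f"powering on all VMs in category '{category}'")

def execute(plan):
    for stage in plan:
        power_on(stage["category"])
        time.sleep(stage["delay_after_s"])  # let services settle before moving on
    print("recovery plan complete")

execute(RECOVERY_PLAN)
```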
We are doing a failover, but after you fail over, you run in Xi as if it was your own setup and environment. Maybe I want to create a new VM. I create a VM, and maybe I want to extend my HR app's web tier. Let me name it "HR_Web_3." It's going to boot from that disk. Production network — I want to run it on the production network; we have production and test categories. This one, I want to give the employee payroll category. Now it picks up the same policies as its peers. Here, I'm going to create the VM. As you can see, I can already see some VMs coming up. There you go. So three VMs from on-prem are now being failed over here, while the fourth VM that I created is already being powered on. Speaker 2: So this is basically real-time, one-click failover, while you're using Xi for your [inaudible 01:54:13] operations as well. Speaker 1: Exactly. Speaker 2: Wow. Okay. Good stuff. What about- Speaker 1: Let me add something here. The other cloud vendors will ask you to make your apps ready for their clouds. What we tell our engineers is: make our cloud ready for your apps. So as you can see, this failover is working. Speaker 2: So what about failback? Speaker 1: All of them are up, and you can see the protection rule "Platinum" has been applied to all four. Now let's look at the recovery points — "HR_Web_3," right here, is already there. Now assume the on-prem side has come back up. Let's go back to on-prem- Speaker 2: So the scenario now, while Binny's bringing this up, is that on-prem has come back up and we're going to do live migration back, as in a failback scenario between the data centers. Speaker 1: And how hard is it going to be? "HR App Recovery" — the same "HR App Recovery" — I click failover, and the system is smart enough to understand that the direction is reversed. It's also smart enough to figure out, "Hey, there are now four VMs instead of three." Xi to on-prem, one-click failover again. Speaker 2: And it's rerunning, obviously, the same runbook, but in- Speaker 1: The same runbook, but the details are different. That's hidden from the customer, though. Let me go to the VMs view and do something interesting here: I'll group them by availability zone. Here you go. As you can see, this is a hybrid cloud view — the same management plane for both sides, public and private. There are two availability zones; the Xi availability zone is in the cloud- Speaker 2: So essentially you're moving from the top- Speaker 1: Yeah, top- Speaker 2: ... to the bottom. Speaker 1: ... to the bottom. Speaker 2: That's happening in the background. While this is happening, let me take the time to go and look at billing in Xi. Speaker 1: Sure, some of the common operations that you can now see in a hybrid view. Speaker 2: So you go to "Billing" here, and first let me look at my account. The account is a simple page: I have set up Active Directory, and you can add your own XML file and upload it. You can also add multi-factor authentication — all those things are simple. On the billing side, you can see more details about how I racked up $966. Here's my credit card, with a detailed description of where the cost is coming from. I can also download previous bills. Speaker 1: It's actually Nutanix as a service, essentially, right? Speaker 2: Yep. Speaker 1: As a subscription service. Speaker 2: Now, going back to on-prem, as you can see, while we were talking, two VMs have already come back on-prem. They are powered off right now. The other two are on the wire. Oh, there they are. Speaker 1: Wow. Speaker 2: So now four VMs are there.
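The "smart enough to reverse the direction" behavior can be pictured as diffing the two sides' inventories and recovery points, then swapping source and destination. A toy sketch follows; the names and the data model are invented for illustration.

```python
from dataclasses import dataclass

@dataclass
class Site:
    name: str
    vms: set              # VMs currently running at this site
    recovery_points: set  # VMs with replicated recovery points here

def plan_failover(source: Site, dest: Site) -> dict:
    """Reverse-aware failover: recover every VM that has a recovery point
    at the destination, including VMs created after the first failover
    (e.g. HR_Web_3), not just the original three."""
    recoverable = source.vms & dest.recovery_points
    return {"direction": f"{source.name} -> {dest.name}",
            "vms": sorted(recoverable)}

xi = Site("Xi", {"HR_DB_1", "HR_Web_1", "HR_Web_2", "HR_Web_3"},
          {"HR_DB_1", "HR_Web_1", "HR_Web_2", "HR_Web_3"})
onprem = Site("on-prem", set(),
              {"HR_DB_1", "HR_Web_1", "HR_Web_2", "HR_Web_3"})

print(plan_failover(xi, onprem))  # four VMs now, direction Xi -> on-prem
```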
Speaker 1: Okay. Perfect. Sometimes it works, sometimes it doesn't work, but it's good. Speaker 2: It always works. Speaker 1: Always works. All right. Speaker 2: As you can see, the Platinum protection rule is now already applied to them, and it has reversed the direction of [inaudible 01:57:12]- Speaker 1: Remember, we showed one-click DR, failover, failback, built into the product when Xi ships, to any Nutanix fabric. You can start with ESX on-premises, and when you fail over to Xi you can land on AHV — things that are going to take the same paradigm of one-click operations into this hybrid view. Speaker 2: Let's stop doing lift and shift. The era has come for click and shift. Speaker 1: Binny's now been promoted to Chief Marketing Officer too, by the way. Right? So, one more thing. Speaker 2: Okay. Speaker 1: You know we don't end any conference without a couple of things that are new. The first one is something that we should have done, I guess, a couple of years ago. Speaker 2: It depends how you look at it. Essentially, if you look at the cloud vendors, one of the key things they have done is build services as building blocks for the apps that run on top of them. What we have done at Nutanix is build core services like block services, file services, and now, with Calm, a marketplace. Now if you look at [inaudible 01:58:14] applications, one of the core building pieces is the object store. I'm happy to announce that we have an object store service coming up. Again, in true Nutanix fashion, it's going to be elastic. Speaker 1: Let's- Speaker 2: Let me show you. Speaker 1: Yeah, let's show it. By the way, it's an object store service that's not just for your primary, but for your secondary as well. And it's obviously not just for on-prem; it's hybrid. So this is being built as a next-gen object service, as an extension of the core fabric, but accommodating a bunch of these new paradigms. Speaker 2: Here is the object browser. I've created a bunch of buckets here. Again, object stores can be used in various ways: as a primary object store, or for secondary use cases. I'll show you both: a Hadoop use case where Hadoop is using this as a primary store, and a backup use case. Let's just jump right in. This is a Hadoop bucket. As you can see, there's a temp directory; there's nothing interesting there. Let me go to my Hadoop VM. There it is. And let me run a Hadoop job. This Hadoop job essentially is going to create a bunch of files, write them out, and after that run MapReduce on top. Let's wait for the job to start. It's running now. If we go back to the object store and refresh the page, now you see it's writing — under the benchmarks directory, there's a bunch of files that it will write here over time. This is going to take time, so let's not wait for it, but essentially it is showing that Hadoop, which uses the AWS S3-compatible API, can run with our object store, because our object store exposes AWS S3-compatible APIs. The other use case is the HYCU backup. As you can see, that's backup software that can back up to AWS S3, and if you point it to Nutanix objects, it can back up there as well. There are a bunch of backup files in there.
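The reason the Hadoop job and the HYCU backup both work unmodified is that the object store speaks the S3 wire protocol, so any S3 client can be pointed at it by overriding its endpoint. Here is a minimal sketch with boto3; the endpoint URL, credentials, and bucket name are placeholders, not values from the demo.

```python
import boto3

# Any S3-compatible object store can be targeted by overriding endpoint_url.
# All values below are placeholders for illustration.
s3 = boto3.client(
    "s3",
    endpoint_url="https://objects.example.com",
    aws_access_key_id="ACCESS_KEY",
    aws_secret_access_key="SECRET_KEY",
)

s3.create_bucket(Bucket="hadoop-bucket")
s3.put_object(Bucket="hadoop-bucket", Key="benchmarks/part-0001",
              Body=b"benchmark output")

# List what the Hadoop job (or a backup tool) has written so far.
for obj in s3.list_objects_v2(Bucket="hadoop-bucket").get("Contents", []):
    print(obj["Key"], obj["Size"])
```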
Now, with object stores it's very important for us to be able to see what's going on and make sure there's no object sprawl, because once it's easy to write objects, you just accumulate a lot of them. So what we wanted to do, in true Nutanix style, is give you a quick overview of what's happening with your object store. Here, as you can see, you can look at the buckets, where the load is, the bucket sizes, where the data is, and also what kind of data is there. And this is a dashboard that you can customize for yourself as well, right? So that's the object store. Then we go back here, and I have one more thing for you as well. Speaker 2: Okay. Sounds good. I already clicked through a slide, by the way, by mistake, but keep going. Vineet: That's okay. That's okay. It is actually a quiz, so it's good for people- Speaker 2: Okay. Sounds good. Vineet: It's good for people to have some clues. So the quiz is: how big is my SAP HANA VM, right? I have to show it to you before you can answer, so I don't leak the question. Okay. So here it is. The SAP HANA VM here: vCPUs, 96. Pretty beefy. Memory, 1.5 terabytes. The question to all of you is: what's different in this screen? Speaker 2: Who's a real Prism user here, by the way? Come on, there's got to be at least a few. Those guys. Let's see if they'll notice something. Vineet: What's different here? Speaker 3: There's zero CVMs. Vineet: Zero CVMs. Speaker 2: That's right. Yeah. Yeah, go ahead. Vineet: So, essentially, in the Nutanix fabric, every server has to run a controller virtual machine, right? That's where the storage comes from. I am happy to announce the Acropolis Compute Cloud, where you will be able to run AHV on servers that are storage-less, and add them to your existing cluster. So it's a compute cloud that can now be managed from Prism Central, and that way you can preserve the investments in your existing server farms and add them to the Nutanix fabric. Speaker 2: Gotcha. So, essentially... I mean, essentially, imagine you now have the equivalent of S3 and EC2 for the enterprise, on-premises — the equivalent of the compute and storage services you have on GCP and AWS and so forth, right? So the full flexibility for any kind of workload is now available on the same Nutanix fabric. Thanks a lot, Vineet. Before we wrap up, I'd like to bring this home. We've announced a pretty strategic partnership with someone that has always inspired us, for many years. In fact, one would argue that the genesis of Nutanix was actually inspired by Google. To talk more about what we're doing here — we've spent a lot of time in the last few months to really get into the product capabilities, and you're going to see some of the upcoming capabilities in the 5.5 release time frame — and to talk about some of the long-term synergies, let me invite Bill onstage. C'mon up, Bill. Tell us a little bit about Google's view of the cloud. Bill: First of all, I want to compliment the demo people on what you did — phenomenal work, making very complex things look really simple. I actually started several years ago as a product manager in high availability and disaster recovery, and I remember, as a product manager, my engineers coming to me and saying, "We have a shortage of engineers, and we want you to write the failover routines for the SAP instance that we're supporting."
And so: here's the Perl handbook — you know, I hadn't written in Perl yet — go and do all that work, including all the network setup. So all of that work... that's amazing, what you are doing right there, and I think that's the spirit of the partnership that we have. From a Google perspective, obviously, what we believe is that it's time now to harness the power of scale, security, and these innovations that are coming out. At Google we've spent a lot of time trying to solve these really large problems at scale, and a lot of that technology has been inserted into the industry. Things like MapReduce, things like TensorFlow algorithms for AI, and things like Kubernetes and Docker were first invented at Google to solve problems, because we had to, to be able to support the business we have. You think about search, alright? When you type search terms into the search box, you see a white screen; what I see is all the data-center work that's happening behind that, and the MapReduce work needed to give you a search result back in seconds. Think about that work, think about that process: taking and parsing those search terms, dividing that over thousands of [inaudible 02:05:01], being able to then search segments of the index of the internet, and being able to intelligently reduce that to get you an answer within seconds — prioritized and sorted. How many of you, out there, have to go to page two and page three to get the results you want today? You don't, because of the power of that technology. We think it's time to bring that to the consumer of the data center and enterprise space, and that's what we're doing at Google. Speaker 2: Gotcha. So I know we've done a lot of things over the last year of collaboration. Why don't we spend a few minutes talking through a couple of things that we've started on, starting with [inaudible 02:05:36] going into Calm, and then we'll talk a little bit about Xi. Bill: I think one of the advantages here, as we start to move up the stack and virtualize things, to your point, is that virtual machines and the work required around them still take a fair amount of effort — of which you're doing a lot to reduce, making it a lot simpler and more seamless across both on-prem and the cloud. The next step in the journey is to really leverage the power of containers: lightweight objects that allow you to surface functionality without being dependent on the operating system or the VM to do that work. And then having the orchestration layer to run that in the context of cloud and on-prem. We've been very successful in building out the Kubernetes and Docker infrastructure for everyone to use. The challenge that you're solving is how we actually bridge the gap — how we make that work seamlessly between the on-premises world and the cloud — and that's where our partnership, I think, is so valuable. It's because you're bringing the secret sauce to make that happen. Speaker 2: Gotcha, gotcha. One last thing. We talked about Xi, and the two companies are working really closely, where essentially the Nutanix fabric can seamlessly extend onto Google's infrastructure worldwide. Xi, as a service, could be delivered natively with GCP, leading to some additional benefits, right? Bill: Absolutely. I think, first and foremost, the infrastructure we're building at scale opens up all sorts of possibilities. I'll just use, maybe, two examples. The first one is network.
If you think about building out a global network, there's a lot of effort to do that. Google is doing that as a byproduct of serving our consumers. So, if you think about YouTube — there's approximately a billion hours of YouTube watched every single day. If you think about search, we have approximately two trillion searches done in a year. And if you think about the number of containers we run in a given week, it's about two billion containers per week. So the advantage of being able to move these workloads through Xi, in a disaster recovery scenario first, is that you get to take advantage of that scale. Secondly, because of the network we've built out — we had to push the network out to the edge for every single one of our consumers using YouTube and search and Google Play and all those services (by the way, we have over eight services today that each have more than a billion users) — you get to take advantage of that network capacity and capability just by moving to the cloud. And then the last piece, which is a real advantage, we believe, is that it's not just about the workloads you're moving, but about getting access to the new services that cloud providers, like Google, provide. For example, are you taking advantage of the next-generation Hadoop, which is our BigQuery capability? Are you taking advantage of the artificial intelligence derivative APIs that we have — the video API, the image API, the speech-to-text API, mapping technology? All those additional capabilities are now exposed to you with the availability of Google Cloud, and you can leverage them directly from systems that are failing over and systems that are running in our combined environment. Speaker 2: A true converged fabric across public and private. Bill: Absolutely. Speaker 2: Great stuff, Bill. Thank you, sir. Bill: Thank you, appreciate it. Speaker 2: Good to have you. So, the last few slides. You know, we've talked about, obviously, One OS, One Click, Any Cloud. At the end of the day, it's pretty obvious that we're evolving from a form-factor perspective, where it's not just one OS across multiple platforms; the product is also being delivered in new ways — from being consumed as an appliance, to a software form factor, to a subscription form factor. What you saw today, obviously, is that we're still continuing — the velocity has not slowed down. In fact, in some cases it's accelerated. If you ask my quality guys, if you ask some of our customers, we're coming out fast and furious with a lot of these capabilities. And some of this reflects directly, not just in features, but also in performance — just like a public cloud, where our performance curve keeps going up while our price-performance curve becomes more attractive over time. And balancing this with quality is what differentiates great companies from good companies, right? So when you look at the number of nodes we've been shipping, it's around ten times more nodes than where we were a few years ago. But if you look at the number of customer-found defects as a percentage of nodes shipped, it has not only stabilized, it has actually been coming down. And that's directly reflected in the NPS score — the one most of you guys love. How many of you love your customer support engineers? Give them a round of applause. Great support. So this balance of velocity plus quality is what differentiates a company.
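Returning for a moment to Bill's search walkthrough — parse the terms, fan the work out over thousands of machines, then intelligently reduce the partial results — that is the MapReduce pattern in miniature. Here is a toy single-process illustration of the same map/shuffle/reduce shape; real deployments distribute each phase across a cluster.

```python
from collections import defaultdict

documents = [
    "nutanix xi disaster recovery",
    "google cloud disaster recovery at scale",
    "xi one click recovery",
]

# Map: emit (term, doc_id) pairs, like parsing query/index terms per shard.
def map_phase(docs):
    return [(term, doc_id)
            for doc_id, doc in enumerate(docs)
            for term in doc.split()]

# Shuffle + reduce: group by term into an inverted index -- the structure
# a search engine consults to return ranked results in seconds.
def reduce_phase(pairs):
    index = defaultdict(set)
    for term, doc_id in pairs:
        index[term].add(doc_id)
    return dict(index)

index = reduce_phase(map_phase(documents))
print(sorted(index["recovery"]))  # -> [0, 1, 2]
```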
And, before we call it a wrap, I just want to leave you with one thing. You know, obviously, we've talked a lot about technology, innovation, inspiration, and so forth. But, as I mentioned from last night's discussion with Sir Ranulph, let's think about a few things tonight. Don't take technology too seriously. I'll give you a simple story that he shared with me that puts things into perspective. The year was 1971. He had come back from Oman, from his service, and was figuring out what to do. This was before he became a world-class explorer. In 1971, he had a job interview: he came down from Scotland and applied for a role in a movie. And he failed that job interview. He had been selected from thousands of applicants and made it down to a short list — he was, and that's a hint, a good-looking guy — and he lost out on that role. And the reason I tell this story is that if he had gotten that job, first of all I wouldn't have met him, but most importantly the world wouldn't have had an explorer like him. The guy he lost out to was Roger Moore, and the role was James Bond. And so, when you go out tonight and enjoy yourselves with your friends, [inaudible 02:12:06] or otherwise, try to take life a little less seriously, once in a while or more than once in a while. Have fun, guys. Thank you. Speaker 5: Ladies and gentlemen, please make your way to the coffee break; your breakout sessions will begin shortly. Don't forget about the women's lunch today — everyone is welcome. Please join us. You can find the details in the mobile app. Please share your feedback on all sessions in the mobile app; there will be prizes. We will see you back here at 5:30; doors will open at 5, after your last breakout session. Breakout sessions will start sharply at 11:10. Thank you and have a great day.

Published Date : Nov 9 2017

Vikram Murali, IBM | IBM Data Science For All


 

>> Narrator: Live from New York City, it's theCUBE, covering IBM Data Science For All. Brought to you by IBM. >> Welcome back to New York, here on theCUBE. Along with Dave Vellante, I'm John Walls. We're at Data Science For All, IBM's two-day event, and we'll be here all day long, wrapping up again with that panel discussion from four to five Eastern Time, so be sure to stick around all day here on theCUBE. Joining us now is Vikram Murali, who is a program director at IBM. Vikram, thanks for joining us here on theCUBE. Good to see you. >> Good to see you too. Thanks for having me. >> You bet. So, among your primary responsibilities: the Data Science Experience. First off, if you would, share with our viewers a little bit about that — you know, the primary mission. You've had two fairly significant announcements, updates if you will, over the past month or so, so share some information about that too, if you would. >> Sure. So my team builds the Data Science Experience, and our goal is to enable data scientists to gain insights into data using data science techniques, machine learning, and especially the latest and greatest open source, and to be able to collaborate with fellow data scientists, data engineers, and business analysts. And it's all about freedom — giving data scientists the freedom to pick the tool of their choice, and to program and code in the language of their choice. So that's the mission of Data Science Experience, from when we started this. The two releases that you mentioned, which we had in the last 45 days: there was one in September and then there was one on October 30th. Both of these releases are very significant, in the machine learning space especially. We now support the Scikit-Learn, XGBoost, and TensorFlow libraries in Data Science Experience. We have deep integration with Hortonworks Data Platform, which is a hallmark of our partnership with Hortonworks — something that we announced back in the summer — and this last release of Data Science Experience, two days back, can specifically do authentication via Knox with Hadoop. So now our Hadoop customers, our Hortonworks Data Platform customers, can leverage all the goodies that we have in Data Science Experience. It's more deeply integrated with our Hadoop-based environments. >> A lot of people ask me, "Okay, when IBM announces a product like Data Science Experience... You know, IBM has a lot of products in its portfolio. Are they just sort of cobbling things together — you know, taking older products and putting a skin on them? Or are they developing them from scratch?" How can you help us understand that? >> That's a great question, and I hear that a lot from our customers as well. Data Science Experience started off with a design-first methodology. And what I mean by that is we are using IBM Design to lead the charge here, along with product and development. And we are actually talking to customers, to data scientists, to data engineers, to enterprises, and we are trying to find out what problems they have in data science today and how we can best address them. So it's not about taking older products and just re-skinning them. Data Science Experience, for example, started off as a brand new product: a completely clean slate, with completely new code. Now, IBM has done data science and machine learning for a very long time. We have a lot of assets, like SPSS Modeler and Stats, and decision optimization.
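As a concrete illustration of the open-source framework support mentioned above, here is the kind of minimal Scikit-Learn snippet a notebook user might run in an environment like DSX. The dataset is synthetic and the model choice is illustrative — it is not something described in the interview.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a customer dataset.
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Train a gradient-boosted classifier and check held-out accuracy.
model = GradientBoostingClassifier().fit(X_train, y_train)
print("accuracy:", accuracy_score(y_test, model.predict(X_test)))
```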
And we are re-investing in those products, and we are investing and doing product research in such a way — not to make the old fit with the new, but in a way where it fits into this realm of collaboration. How can data scientists leverage our existing products alongside open source, and how can we enable collaboration? So it's not just re-skinning; it's building from the ground up. >> So this is really important, because you say architecturally it's built from the ground up. Because, you know, given enough time and enough money and smart people, you can make anything work. So the reason why this is important is, you mentioned, for instance, TensorFlow. You know that down the road there's going to be some other tooling, some other open-source project that's going to take hold, and your customers are going to say, "I want that." You've got to then integrate that, or you have to choose whether or not to. If it's a super heavy lift, you might not be able to do it, or do it in time to hit the market — unless you architected your system to be able to accommodate that. Future-proof is the term everybody uses. So what have you done? How have you done that? I'm sure APIs are involved, but maybe you could add some color. >> Sure. So our Data Science Experience and machine learning are a microservices-based architecture. We are completely Dockerized, and we use Kubernetes under the covers for container orchestration. All of these are tools that are used in the Valley, across different companies, and in products across IBM as well. So for some of those legacy products that you mentioned, we are actually using these newer methodologies to re-architect them — we are Dockerizing them — and the microservices architecture helps us address the issues we have today, as well as stay open to newer methodologies and frameworks that may not exist yet. With the microservices architecture — TensorFlow, for example, which you brought up — we can just spin up a Docker container just for TensorFlow and attach it to our existing Data Science Experience, and it just works. Same thing with other frameworks like XGBoost, and Keras, and Scikit-Learn — frameworks and libraries that have come up in open source within the last, I would say, one to three years. Previously, integrating them into our product would have been a nightmare. We would have had to re-architect our product every time something came along, but with the microservices architecture it is very easy for us to keep up. >> We were just talking to Daniel Hernandez a little bit about the Hortonworks relationship at a high level. One of the things that I've... I mean, I've been following Hortonworks since day one, when Yahoo kind of spun them out, and I know those guys pretty well. And when they do partnerships, they always make a big deal out of deep engineering integration. They're very proud of that, so I want to test that a little bit. Can you share with our audience the kinds of integrations you've done? What you've brought to the table, and what Hortonworks brought to the table? >> Yes. So Data Science Experience today can work side by side with Hortonworks Data Platform, HDP. And we could have actually made that work about two, three months back, but as part of our partnership that was announced back in June, we set up joint engineering teams. We have multiple touch points every day.
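As an aside on the container point above: the "spin up a container for a new framework" workflow can be pictured with the Docker SDK for Python. This is a minimal sketch — the image tag and command are examples, and attaching such a container into a product like DSX would of course involve far more than this.

```python
import docker

# Connect to the local Docker daemon.
client = docker.from_env()

# Spin up a TensorFlow container on demand -- the way a microservices
# platform can add a new framework without re-architecting the product.
container = client.containers.run(
    "tensorflow/tensorflow:latest",
    command=["python", "-c", "import tensorflow as tf; print(tf.__version__)"],
    detach=True,
)
container.wait()                  # block until the one-off job finishes
print(container.logs().decode())  # e.g. the TensorFlow version
container.remove()
```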
We call it co-development. They have put resources in, we have put resources in, and today, especially with the release that came out on October 30th, Data Science Experience can authenticate using secure Knox. As I previously mentioned, that was a direct example of our partnership with Hortonworks. So that is phase one. Phase two and phase three are going to be deeper integration: we are planning on making Data Science Experience an Ambari management pack. So for a Hortonworks customer who has HDP already installed, you don't have to install DSX separately — it's going to be a management pack; you just spin it up. And the third phase is going to be... we're going to be using YARN for resource management. YARN is very good at resource management, and for infrastructure as a service for data scientists, we can actually delegate that work to YARN. So Hortonworks is putting resources into YARN — doubling down, actually — and they are making changes to YARN where it will act as the resource manager not only for Hadoop and Spark workloads, but also for Data Science Experience workloads. So that is the level of deep engineering that we are engaged in with Hortonworks. >> YARN stands for Yet Another Resource Negotiator. There you go for... >> John: Thank you. >> The trivia of the day. (laughing) Okay, so... But of course, Hortonworks is big on committers, and obviously a big committer to YARN — we probably wouldn't have YARN without Hortonworks. So you mentioned that's kind of what they're bringing to the table, and you guys are primarily focused on the integration, as well as some other IBM IP? >> That is true, as well as the Knox piece that I mentioned. We have a Knox committer — we have multiple Knox committers on our side — and that helps us as well. Knox is part of the HDP package, and we need that knowledge on our side to work with Hortonworks developers, to make sure that we are contributing and making inroads into Data Science Experience. That way the integration becomes a lot easier. And from an IBM IP perspective: Data Science Experience already comes with a lot of open-source packages and libraries, but IBM Research has worked on a lot of these libraries too. I'll give you a few examples: Brunel and PixieDust are libraries our developers love. These are visualization libraries that were actually cooked up by IBM Research and then open sourced. And they come prepackaged in Data Science Experience, so there is IBM IP involved, and there are a lot of machine learning algorithms that we put in there. So that comes right out of the package. >> And you guys, the development teams, are really both in the Valley? Is that right? Or are you distributed around the world? >> Yeah, so we are. The Data Science Experience development team is in North America, between the Valley and Toronto. The Hortonworks team is situated about eight miles from where we are in the Valley, so there's a lot of synergy. We work very closely with them, and that's what you see in the product. >> I mean, what impact does that have? You hear today, "Oh, yeah, we're a virtual organization. We have people all over the world: Eastern Europe, Brazil." How much of an impact is it to have people so physically proximate? >> I think it has a major impact. I mean, IBM is a global organization, so we do have teams around the world, and we work very well. With the advent of IP telephony, and screen shares, and so on, yes, we work.
But it really helps being in the same timezone, especially working with a partner just eight or ten miles away. We have a lot of interaction with them, and that really helps. >> Dave: Yeah. Body language? >> Yeah. >> Yeah. You talked about problems, you talked about issues — you know, customers. What are they now? Before it was like, "First off, I want to get more data." Now they've got more data. Is it figuring out what to do with it? Finding it? Having it available? Having it accessible? Making sense of it? I mean, what's the barrier right now? >> The barrier, I think, for data scientists... the number one barrier continues to be data. There's a lot of data out there, a lot of data being generated, and the data is dirty. It's not clean. So the number one problem data scientists have is: how do I get to clean data, and how do I access data? There are so many data repositories, data lakes, and data swamps out there. Data scientists don't want to be in the business of figuring out how to access data. They want instant access to data, and-- >> Well, if you would, let me interrupt you. >> Yeah? >> You say it's dirty. Give me an example. >> So it's not structured data. If you look at all the social media feeds being generated, the amount of data being generated, it's all unstructured data. So we need to clean up the data, because the algorithms need structured data, or data in a particular format, and data scientists don't want to spend too much time cleaning up that data. And access to data, as I mentioned — that's where Data Science Experience comes in. Out of the box we have so many connectors available; it's very easy for customers to bring in their own connectors as well, and you have instant access to data. And as part of our partnership with Hortonworks, you don't have to bring data into Data Science Experience. The data is becoming so big that you want to leave it where it is and instead push the analytics down to where the data is. And you can do that: we can connect to remote Spark, and we can push analytics down through remote Spark. All of that is possible today with Data Science Experience. The second thing I hear from data scientists is about all the open-source libraries — every day there's a new one. It's a boon and a bane. The open-source community is very vibrant, and there are a lot of data science competitions, machine learning competitions, that are helping move this community forward, and that's a good thing. The bad thing is that data scientists like to work in silos on their laptops. From an enterprise perspective, how do you take that and move it — scale it to an enterprise level? That's where Data Science Experience comes in, because now we provide all the tools — the tools of your choice, open source or proprietary. You have them in here, and you can easily collaborate. You can do all the work you need with open-source packages and libraries, bring your own, and collaborate with other data scientists in the enterprise. >> So, you're talking about dirty data. I mean, with Hadoop and schema-on-read, right? We kind of knew this problem was coming. So technology sort of got us into this problem. Can technology help us get out of it? I mean, from an architectural standpoint. When you think about dirty data, can you architect things in to help? >> Yes.
So, if you look at the machine learning pipeline, the pipeline starts with ingesting data and then cleansing or cleaning that data. Then you go into creating a model, training it, picking a classifier, and so on. So we have tools built into Data Science Experience — and more are coming down our roadmap — that will help data scientists do that themselves. I mean, they don't have to be really in-depth coders or developers to do it. Python is very powerful: you can do a lot of data wrangling in Python itself, so we are enabling data scientists to do that within the platform, within Data Science Experience. >> If I look at sort of the demographics of the development teams — we were talking about Hortonworks and you guys collaborating — what are they like? I mean, people picture IBM as this 100-plus-year-old company. What's the persona of the developers on your team? >> The persona? I would say we have a very young, agile development team, and by that I mean... we've had six releases this year in Data Science Experience, just for the on-premises side of the product, and the cloud side of the product delivers continuously. We have releases coming out faster than we can count. And it's not about re-architecting it every time; it's about adding features — giving features our customers are asking for, and not making them wait three months, six months, a year. So our releases are becoming a lot more frequent, and customers are loving it. And that is, in part, because of the team. The team is able to evolve, we are very agile, and we have an awesome team. That's all. It's an amazing team. >> But six releases in... >> Yes. We had the initial release in April, and since then we've had about five revisions of the release, where we add a lot more features to our existing releases — a lot more packages, libraries, functionality, and so on. >> So you know what monster you're creating now, don't you? I mean, you know? (laughing) >> I know, we are setting expectations. >> You still have two months left in 2017. >> We do. >> These are not mainframe release cycles. >> They are not, and that's the advantage of the microservices architecture. I mean, when a customer upgrades, right, they don't have to bring the entire system down to upgrade. You can target one particular part, one particular microservice. You componentize it and just upgrade that particular microservice. It's become very simple, so... >> Well, some of those microservices aren't so micro. >> Vikram: Yeah, so it's a balance. >> You're growing, but yeah. >> It's a balance you have to keep — making sure that you componentize it in such a way that when you're doing an upgrade, it affects just one small piece, and you don't have to take everything down. >> Dave: Right. >> But, yeah, I agree with you. >> Well, it's been a busy year for you, to say the least, and I'm sure 2017-2018 is not going to slow down. So, continued success. >> Vikram: Thank you. >> We wish you well with that. Vikram, thanks for being with us here on theCUBE. >> Thank you. Thanks for having me. >> You bet. >> Back with Data Science For All, here in New York City, IBM. Coming up here on theCUBE right after this. >> Cameraman: You guys are clear. >> John: All right. That was great.
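The ingest → cleanse → model → train pipeline Vikram describes maps naturally onto Scikit-Learn's `Pipeline` abstraction. Here is a minimal sketch with synthetic "dirty" data; the imputation and model choices are illustrative, not details from the interview.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data with missing values standing in for "dirty" input.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
X[rng.random(X.shape) < 0.1] = np.nan          # punch 10% holes in it
y = (rng.random(200) > 0.5).astype(int)

# Cleanse -> normalize -> classify, captured as one reproducible object.
pipeline = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
    ("classify", LogisticRegression()),
])
pipeline.fit(X, y)
print("training accuracy:", pipeline.score(X, y))
```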

Published Date : Nov 1 2017


Arun Murthy, Hortonworks | BigData NYC 2017


 

>> Coming back when we were a DOS spreadsheet company. I did a short stint at Microsoft and then joined Frank Quattrone when he spun out of Morgan Stanley to create what would become the number three tech investment (upbeat music) >> Host: Live from midtown Manhattan, it's theCUBE, covering BigData New York City 2017. Brought to you by SiliconANGLE Media and its ecosystem sponsors. (upbeat electronic music) >> Welcome back, everyone. We're here, live, on day two of our three days of coverage of BigData NYC. This is our event that we put on every year. It's our fifth year doing BigData NYC, in conjunction with Hadoop World, which evolved into Strata Conference, which evolved into Strata Hadoop, now called Strata Data. Probably next year it will be called Strata AI, but we're still theCUBE — we'll always be theCUBE — and this is our BigData NYC, our eighth year covering the BigData world since Hadoop World. And then as Hortonworks came on, we started covering Hortonworks' data summit. >> Arun: DataWorks Summit. >> DataWorks Summit. Arun Murthy, my next guest, Co-Founder and Chief Product Officer of Hortonworks. Great to see you, looking good. >> Likewise, thank you. Thanks for having me. >> Boy, what a journey. Hadoop, years ago. >> 12 years now. >> I still remember, you guys came out of Yahoo, you put Hortonworks together, and since then you've gone public — first to go public, and then Cloudera just went public. So, the Hadoop world is pretty much out there; everyone knows where it's at, it's got a nice use case, but the whole world has moved around it. You guys were really the first of the Hadoop players, before even Cloudera, on this notion of data in flight — or, as I call it, real-time data, but I think you guys call it data-in-motion. Batch, we all know what Batch does; there's a lot you can do with Batch, you can optimize it, it's not going anywhere, it's going to grow. Real-time data-in-motion is a huge deal. Give us the update. >> Absolutely. You know, we've obviously been in this space — personally, I've been in this for about 12 years now — so we've had a lot of time to think about it. >> Host: Since you were 12? >> Yeah. (laughs) Almost. Probably look like it. So, back in 2014 and '15, when we sort of went public and we started looking around, the thesis always was: yes, Hadoop is important, we're going to help you manage lots and lots of data, but a lot of the stuff we've done since the beginning, starting with YARN and so on, was really to enable the use cases beyond the traditional transactions and analytics. And Rob, our CEO — his vision has always been that we've got to get into a pre-transactional world, if you will, rather than the post-transactional analytics and BI and so on. So that's where it started. And increasingly, the obvious next step was to say: look, enterprises want to be able to get insights from data, but they also, increasingly, want to deal with it in real time. You know, while you're in your shopping cart, they want to make sure you don't abandon your shopping cart. If you were at a retailer and you're in an aisle and you're about to walk away from a dress, you want to be able to do something about it. So this notion of real time is really important, because it helps the enterprise connect with the customer at the point of action, if you will, and provide value right away, rather than trying to do it post-transaction. So, it's been a really important journey.
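To ground the "point of action" example: reacting to a cart as the events stream in, rather than in a nightly batch, is the basic stream-processing pattern. Here is a minimal hedged sketch with kafka-python — the topic name, broker address, event shape, and abandonment rule are all invented for illustration; a Hortonworks stack would typically wire this up through NiFi, Kafka, and a streaming engine rather than hand-rolled Python.

```python
import json
from kafka import KafkaConsumer

# Hypothetical topic of shopping-cart events; broker address is a placeholder.
consumer = KafkaConsumer(
    "cart-events",
    bootstrap_servers="broker.example.com:9092",
    value_deserializer=lambda raw: json.loads(raw.decode("utf-8")),
)

# React at the point of action: flag a likely abandonment as it happens,
# instead of discovering it in tomorrow's batch report.
for event in consumer:
    cart = event.value
    if cart.get("action") == "idle" and cart.get("idle_seconds", 0) > 300:
        print(f"offer a coupon to customer {cart['customer_id']}")
```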
We went and bought this company called Onyara, which is a bunch of geeks like us who started off with the government and built this Apache NiFi thing — huge community, and it's just, like, taking off at this point. It's been a fantastic thing to join hands with that team and keep pushing on the whole streaming-data side. >> There's a real... I don't mean to go on a tangent, but I do, since you brought up community; I wanted to bring this up. It's been the theme here this week. It's more and more obvious that the community role is becoming central, beyond open source. We all know open source — standing on the shoulders of those before us, you know. And the Linux Foundation is showing code numbers going from $64 million to billions in the next five, ten years — exponential growth of new code coming in. So open source certainly blew me away. But now community is translating to other things. You start to see blockchain, very community-based — that's a whole new currency market that's changing the financial landscape, ICOs and what-not; that's just one data point. Businesses, marketing communities — you're starting to see data as a fundamental thing around communities. And certainly it's going to change the vendor landscape. So you guys, like Cloudera and others, have always been community-driven. >> Yeah, our philosophy has been simple: more eyes and more hands are better than fewer. And it's been one of the cornerstones of our founding thesis, if you will. And you saw how that's gone over the course of the six years we've been around. Super excited to have someone like IBM join hands; it happened at DataWorks Summit in San Jose. That announcement, again, is a reflection of the fact that we've been very, very community-driven and very, very ecosystem-driven. >> Communities are fundamentally built on trust and partnering. >> Arun: Exactly. >> Coding is pretty obvious: you code with your friends. You code with people who are good, and they become your friends. There's an honor system among you. You're starting to see that in the corporate deals. So explain the dynamic there and some of the successes that you guys have had on the product side, where one plus one equals more than two — one plus one equals five or three. >> You know, IBM has been a great example. They've decided to focus on their strengths, which are around Watson and machine learning, and for us to focus on our strengths around data management, infrastructure, cloud and so on. So this combination of DSX, which is their data science experience, along with Hortonworks is really powerful. We are seeing that over and over again. Just yesterday we announced the whole Dataplane thing; we were super excited about it. And now to get IBM to say they'll bring in their technologies and their IP — big data, whether it's BigQuality or BigInsights or Big SQL — the response has been phenomenal. >> Well, the Dataplane announcement... finally. People who know me know that I hate the term data lake; I've always said it's really a data ocean. So I get redemption, because now with the data lakes — it's admitting it's a horrible name — but just saying: stitching together the data lakes, which is essentially a data ocean. Data lakes are out there, and you can form these data lakes, or data sets, batch, whatever, but connecting them and integrating them is a huge issue, especially with security. >> And a lot of it is also just pragmatism. We started off with this notion of a data lake to say: hey, you've got too many silos inside the enterprise in one data center; you want to put them together.
But then increasingly, as Hadoop has become more and more mainstream — I can't remember the last time I had to explain what Hadoop is to somebody — a couple of things have happened. One is the streaming data we talked about. We see it all the time, especially with HDF. We have customers streaming data from autonomous cars; you have customers streaming from security cameras. You can put a small MiNiFi agent in a security camera or a smartphone and stream it all the way back. Then you get into physics — you're up against the laws of physics. If you have a security camera in Japan, why would you want to move all that data to California to process it? You'd rather do it right there, right? So this notion of a regional data center becomes really important. >> And that speaks to the edge as well. >> Exactly, right. So you want to have something in Japan that collects data from all the security cameras in Tokyo; you do the analysis there and push what you want back here, right? So that's physics. The other thing we are increasingly seeing is that with data sovereignty rules, especially things like GDPR, there are now regulatory reasons why data has to stay in different regions. Customer data from Germany cannot move to France, or vice versa, right? >> Data governance is a huge issue, and this is the problem I have with data governance — I am really looking for a solution, so if you can illuminate this, it would be great. So there is going to be another Equifax out there. >> Arun: Oh, for sure. >> And the problem is: is that going to force some regulation change? What we see — and I see it personally — is that you can almost count on something else happening that'll force some policy regulation or governance. You don't want to screw up your data. You also don't want to rewrite your applications or rewrite your machine learning algorithms. So there's a lot of waste potential in not structuring the data properly. Can you comment on the preferred path? >> Absolutely, and that's why we've been working on things like Dataplane for almost a couple of years now. The idea is that you have to have data, and policies which make sense, given a context. And the context is going to change by application, by usage, by compliance, by law. So, now, to manage 20, 30, 50, a hundred data lakes — not to say lakes, data ponds- >> Host: Any data. >> Any data pool, stream, river, ocean, whatever. (laughs) >> Jacuzzis. Data jacuzzis, right. So what you want is a holistic fabric — I like the term, you know, Forrester uses; they call it the fabric. >> Host: Data fabric. >> Data fabric, right? You want a fabric over these so you can actually control and maintain governance and security centrally, but apply it with context. Last but not least, you want to do this whether it's on-prem or in the cloud, or multi-cloud. So we've been working with a bank. They were based in Germany, but for GDPR they had to stand up something in France. They had French customers, and for a bunch of new regulatory reasons, they had to stand something up in France. So they brought their own data center, and then they had their cloud provider, right, who I won't name. And they were great; things were working well. Now they want to expand a similar offering to customers in Asia. It turns out their favorite cloud vendor was not available in Asia, or not available in a time frame that made sense for the offering.
So they had to go with cloud vendor two. So now, although each of the vendors will do their job in terms of giving you the security and governance and so on, the fact that you have to manage it three ways — one for on-prem, one for cloud vendor A, one for cloud vendor B — was really hard, too hard for them. Hence this notion of a fabric across these things, which is Dataplane. And that, by the way, is based on all the open-source technologies we love, like Atlas and Ranger. By the way, that is also what IBM is betting on, and what the entire ecosystem is — it seems like a no-brainer at this point. That was the reason we foresaw the need for something like Dataplane, and obviously we couldn't be more excited to have something like that in the market today as a net new service that people can use. >> You get the catalogs, security controls, data integration. >> Arun: Exactly. >> Then you get the cloud — whatever, pick your cloud scenario, you can do that. Killer architecture; I liked it a lot. I guess the question I have for you personally is: what's driving the product decisions at Hortonworks? And the second part of that question is: how does that change your ecosystem engagement? Because you guys have been very friendly in a partnering sense and also very good with the ecosystem. How are you deciding the product strategies? Does it bubble up from the community? Is there an ivory tower — let's go take that hill? >> It's both, because what typically happens is... obviously we've been in the community now for a long time. Working publicly now with well over 1,000 customers not only puts a lot of responsibility on our shoulders, but it's also very nice, because it gives us a vantage point which is unique. That's number one. The second one is being in the community: we also see that people are starting to solve the problems themselves, so it's another source of telemetry for us. So you have, on one hand, the enterprise side — we see what the enterprises are facing, which is kind of where Dataplane came in — but we also see in the community where people are starting to ask us about, hey, can you do multi-cluster Atlas? Or multi-cluster Ranger? Put two and two together, and you see there is a real need. >> So you get some consensus. >> You get some consensus, and you also see it on the enterprise side. Last but not least is when we went to friends like IBM and said, hey, we're doing this; this is where we can position it, right. So we can actually bring in IGC, you can bring in BigQuality, and bring all these types- >> Host: So things had clicked with IBM? >> Exactly. >> Rob Thomas was thinking the same thing. Bring in the Power systems and the horsepower. >> Exactly, yep. We announced something, for example — we have been working with the Power guys and NVIDIA for deep learning, right. That sort of stuff is what clicks if you're in the community long enough, if you have the vantage point of the enterprise long enough — it feels like the two of them click. And that's, frankly, my job. >> Great. And you've obviously got the landscape. The waves are coming in. So I've got to ask you: the big waves are coming in, and you're seeing people starting to get hip to a couple of key things that they've got to get their hands on. They need to have the big surfboards, metaphorically speaking. They've got to have some good products, with a big emphasis on real value. Don't give me any hype, don't give me a head fake. You know — okay, AI-washing — people can see right through that. Alright, that's clear. But AI's great.
We all cheer for AI, but the reality is, everyone knows most of that's pretty much b.s., except that core machine learning is on the front edge of innovation. So that's cool, but value. [Laughs] Hey, I've got to integrate and operationalize my data, so that's the big wave that's coming. Comment on the community piece, because enterprises now are realizing, as open source becomes the dominant source of value for them, they are now really going to the next level. It used to be only the emerging enterprises that knew open source; their guys would volunteer but might not go deep in the community. Now more people in the enterprises are in open source communities, they are recruiting from open source communities, and that's impacting their business. What's your advice for someone who's been in the community of open source? Lessons you've learned, what is the best practice, from your standpoint on philosophy: how to build into the community, how to build a community model. >> Yeah, I mean, at the end of the day, my best advice is to say, look, the community is defined by the people who contribute. You get advice if you contribute. That's the fundamental truth. Which means you have to get your legal policies and so on to a point that you can actually start to let your employees contribute. That kicks off a flywheel, where you can then actually go recruit the best talent, because the best talent wants to stand out. GitHub is a resume now. It is not a Word doc. If you don't allow them to build that resume they're not going to come by, and it's just a fundamental truth. >> It's self governing, it's reality. >> It's reality, exactly. Right, and we see that over and over again. It's taken time, but as with these things, the flywheel has turned enough. >> A whole new generation's coming online. If you look at the young kids coming in now, it is an amazing environment. You've got TensorFlow, all this cool stuff happening. It's just amazing. >> You know, 20 years ago that wouldn't happen, because the Googles of the world wouldn't open source it. Now increasingly, >> The secret's out, open source works. >> Yeah, (laughs) shh. >> Tell everybody. You know they know already, but this is changing some of how H.R. works and how people collaborate, >> And the policies around it. The legal policies around contribution, so, >> Arun, great to see you. Congratulations. It's been fun to watch the Hortonworks journey. I want to appreciate you and Rob Bearden for supporting theCUBE here in BigData NYC. If it wasn't for Hortonworks and Rob Bearden and your support, theCUBE would not be part of Strata Data, which we are not allowed to broadcast into, for the record. O'Reilly Media does not allow theCUBE or our analysts inside their venue. They've excluded us, and that's a bummer for them. They're a closed organization. But I want to thank Hortonworks and you guys for supporting us. >> Arun: Likewise. >> We really appreciate it. >> Arun: Thanks for having me back. >> Thanks, and a shout out to Rob Bearden. Good luck as CPO, it's a fun job, you know, no pressure. Or a lot of pressure. A whole lot. >> Arun: Alright, thanks. >> More CUBE coverage after this short break. (upbeat electronic music)

Published Date : Sep 28 2017

Wrap Up | IBM Fast Track Your Data 2017


 

>> Narrator: Live from Munich, Germany, it's theCUBE, covering IBM, Fast Track Your Data. Brought to you by IBM. >> We're back. This is Dave Vellante with Jim Kobielus, and this is theCUBE, the leader in live tech coverage. We go out to the events. We extract the signal from the noise. We are here covering a special presentation, IBM's Fast Track Your Data, and we're in Munich, Germany. It's been a day-long session. We started this morning with a panel discussion with five senior level data scientists that Jim and I hosted. Then we did CUBE interviews in the morning. We cut away to the main tent. Kate Silverton did a very choreographed, scripted, but very well done main keynote set of presentations. IBM made a couple of announcements today, and then we finished up theCUBE interviews. Jim and I are here to wrap. We're actually running on IBMgo.com. We're running live. Hilary Mason talking about what she's doing in data science, and also we've got a session on GDPR. You've got to log in to see those sessions. So go to IBMgo.com, and you'll find those. Hit the schedule and go to the Hilary Mason and GDPR channels, and check that out, but we're going to wrap now. Jim, two main announcements today. I hesitate to call them big announcements. I mean they were, you know, just kind of ... I think the word you used last night was perfunctory. You know, I mean they're okay, but they're not game changing. So what did you mean? >> Well first of all, when you look at ... Though IBM is not calling this a signature event, it's essentially a signature event. They do these every June or so. You know, in the past several years, the signature events have had like a one-track theme, whether it be IBM announcing they're investing deeply in Spark, or IBM announcing that they're focusing on investing in R as the core language for data science development. This year at this event in Munich, it's really a three-track event, in terms of the broad themes, and I mean they're all important tracks, but none of them is like game-changing. Perhaps IBM doesn't intend them to be, it seems like. One of which is obviously Europe. We're holding this in Munich. And a couple of things of importance to European customers, first and foremost GDPR. The deadline next year, in terms of compliance, is approaching. So sound the alarm, as it were. And IBM has rolled out compliance and governance tools: the information catalog, the governance catalog, and so forth. Now announcing the consortium with Hortonworks to build governance on top of Apache Atlas, but also IBM announcing that they've opened up a DSX center in England and a machine-learning hub here in Germany, to help their European clients, in those countries especially, to get deeper down into data science and machine learning, in terms of developing those applications. That's important for the audience, the regional audience here. The second track, which is also important, and I alluded to it, is governance. In all of its manifestations, you need a master catalog of all the assets for building and maintaining and controlling your data applications and your data science applications. The catalog, the consortium, the various offerings IBM has announced are discussed in great detail. They've brought in customers and partners like Northern Trust to talk about the importance of governance, not just as a compliance mandate, but also as a potential strategy for monetizing your data. That's important.
Number three is what I call cloud-native data applications, and how the state of the art in developing data applications is moving towards containerized and orchestrated environments that involve things like Docker and Kubernetes. The IBM DB2 developer community edition has been in the market for a few years. The latest version they announced today includes Kubernetes support. It includes support for JSON. So it's geared towards a new generation of cloud and data apps. What I'm getting at ... Those three core themes are Europe, governance, and cloud-native data application development. Each of them is individually important, but none of them is a game changer. And one last thing. Data science and machine learning is one of the overarching envelope themes of this event. They've had Hilary Mason. A lot of discussion there. My sense, I was a little bit disappointed, because there weren't any significant new announcements related to IBM evolving their machine learning portfolio into deep learning or artificial intelligence, in an environment where their direct competitors like Microsoft and Google and Amazon are making a huge push in AI, in terms of their investments. There was a bit of a discussion, and Rob Thomas got to it this morning, about DSX working with PowerAI, the IBM platform. I would like to hear more going forward about IBM investments in these areas. So I thought it was an interesting bunch of announcements. I'll backtrack on perfunctory. I'll just say it was good that they had this for a lot of reasons, but like I said, none of these individual announcements is really changing the game. In fact, like I said, I think I'm waiting for the fall, to see where IBM goes in terms of doing something that's actually differentiating and innovative. >> Well I think that the event itself is great. You've got a bunch of partners here, a bunch of customers. I mean it's active. IBM knows how to throw a party. They always have. >> And the sessions are really individually awesome, in terms of what you learn. >> The content is very good. I would agree. The two announcements were, sort of, you know, DB2, sort of what I call the community edition. Simpler, easier to download. Even Dave can download DB2. I really don't want to download DB2, but I could, and play with it I guess. You know I'm not a database guy, but those of you out there that are, go check it out. And the other one was the sort of unified data governance. They tried to tie it in. I think they actually did a really good job of tying it into GDPR. We're going to hear over the next, you know, 11 months, just a ton of GDPR readiness fear, uncertainty and doubt from the vendor community, kind of like we heard with Y2K. We'll see what kind of impact GDPR has. I mean it looks like it's the real deal, Jim. I mean it looks like, you know, this 4% of turnover penalty. The penalties are much more onerous than any other sort of, you know, regulation that we've seen in the past, where you could just sort of fluff it off. Say yeah, just pay the fine. I think you're going to see a lot of, well, pay the lawyers to delay this thing and battle it.
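To make the DB2 developer community edition point a bit more concrete, here is a hedged sketch of connecting from Python with IBM's ibm_db driver and reading a JSON document using DB2's SQL/JSON functions. The connection string, the EVENTS table, and its DOC column are illustrative assumptions, and the exact JSON function set varies by DB2 version, so treat this as a sketch rather than a verified recipe.

```python
import ibm_db  # IBM's Python driver for DB2

# Placeholder connection details for a local developer-edition instance.
conn = ibm_db.connect(
    "DATABASE=testdb;HOSTNAME=localhost;PORT=50000;"
    "PROTOCOL=TCPIP;UID=db2inst1;PWD=secret;", "", "")

# Assumes a table EVENTS(DOC) whose DOC column holds JSON text documents.
# JSON_VALUE extracts a scalar by JSON path in recent DB2 releases.
stmt = ibm_db.exec_immediate(conn, """
    SELECT JSON_VALUE(DOC, '$.device' RETURNING VARCHAR(64)) AS DEVICE
    FROM EVENTS
    FETCH FIRST 5 ROWS ONLY
""")

row = ibm_db.fetch_assoc(stmt)
while row:
    print(row["DEVICE"])
    row = ibm_db.fetch_assoc(stmt)

ibm_db.close(conn)
```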
And like you said, I mean some serious penalties may be in the works for companies that are not complying, companies not just in Europe, but all around the world who do business with European customers. >> Right, okay, so now bring it back to, sort of, machine learning, deep learning. You basically said to Rob Thomas, I see machine learning here. I don't see a lot of the deep learning stuff quite yet. He said stay tuned. You know, you were talking about TensorFlow and things like that. >> Yeah, they supported that ... >> Explain. >> So Rob indicated that IBM very much, like with PowerAI and DSX, provides an open framework or toolkit for plugging in your, the developer's, preferred machine learning or deep learning toolkit of an open source nature. And there's a growing range of open source deep learning toolkits beyond, you know, TensorFlow, including Theano and MXNet and so forth, that IBM is supporting within the overall DSX framework, but also within the PowerAI framework. In other words, they've got those capabilities. They're sort of burying that message under a bushel basket, at least in terms of this event. Also, one of the things that ... I said this to Mena Scoyal. Watson Data Platform, which they launched last fall, is a very important product. Very important platform for collaboration among data science professionals, in terms of the machine learning development pipeline. I wish there was more about the Watson Data Platform here, about where they're taking it, what the customers are doing with it. Like I said a couple of times, I see Watson Data Platform as very much a DevOps tool for the new generation of developers that are building machine learning models directly into their applications. I'd like to see IBM, going forward, turn Watson Data Platform into a true DevOps platform, in terms of continuous integration of machine learning, deep learning, and other statistical models. Continuous training, continuous deployment, iteration. I believe that's where they're going, or where they should be going. I'd like to see more. I'm expecting more along those lines going forward. What I just described about DevOps for data science is a big theme that we're focusing on at Wikibon, in terms of where the industry is going. >> Yeah, yeah. And I want to come back to that again, and get an update on what you're doing within your team, and talk about the research. Before we do that, I mean, one of the things we talked about on theCUBE, in the early days of Hadoop, is that the guys who are going to make the money in this big data business are the practitioners. You're not going to see, you know, these multi-hundred billion dollar valuations come out of the Hadoop world. And so far that prediction has held up well. It's the Airbnbs and the Ubers and the Spotifys and the Facebooks and the Googles, the practitioners who are applying big data, that are crushing it and making all the money. You see Amazon now buying Whole Foods. That in our view is a data play. But who's winning here, in either the vendor or the practitioner community? >> Who's winning are the startups with a hot new idea that's disrupting some industry, or set of industries, with machine learning, deep learning, big data, etc. For example, everybody's waiting with bated breath for, you know, self-driving vehicles. And as that ecosystem develops, somebody's going to clean up.
And one or more companies, companies we've probably never heard of, leveraging everything we're describing here today: data science and containerized, distributed applications that involve, you know, deep learning for image analysis and sensor analysis and so forth. Putting it all together in some new fabric that changes the way we live on this planet. But as you said, the platforms themselves, whether they be Hadoop or Spark or TensorFlow, whatever, they're open source. You know, and the fact is, by its very nature, open source based solutions, in terms of profit margins on selling those, inexorably migrate to zero. So you're not going to make any money as a tool vendor, or a platform vendor. You've got to make money ... If you're going to make money, you make money, for example, from providing an ecosystem within which innovation can happen. >> Okay, we have a few minutes left. Let's talk about the research that you're working on. What's exciting you these days? >> Right, right. So I think a lot of people know I've been around the analyst space for a long, long time. I've joined the SiliconANGLE Wikibon team just recently. I used to work for a very large solution provider, and what I do here for Wikibon is I focus on data science as the core of next-generation application development. When I say next-generation application development, it's the development of AI, deep learning, machine learning, and the deployment of those data-driven statistical assets into all manner of applications. And you look at the hot stuff, like chatbots, for example. Transforming the experience in e-commerce on mobile devices. Siri and Alexa and so forth. Hugely important. So what we're doing is we're focusing on AI and everything. We're focusing on containerization and the building of AI microservices, and the ecosystem of the pipelines and the tools that allow you to do that. DevOps for data science, distributed training, federated training of statistical models, and so forth. We are also very much focusing on the whole distributed containerized ecosystem, Docker, Kubernetes and so forth. Where that's going, in terms of changing the state of the art in application development. Focusing on the API economy. All of those things that you need to wrap around the payload of AI to deliver it into every ... >> So you're focused on that intersection between AI and the related topics and the developer. Who is winning in that developer community? Obviously Amazon's winning. You've got Microsoft doing a good job there. Google, Apple, who else? I mean, how's IBM doing, for example? Maybe name some names. Who impresses you in the developer community? But specifically, let's start with IBM. How is IBM doing in that space? >> IBM's doing really well. IBM has, for quite a while, been very good about engaging with the new generation of developers, using Spark and R and Hadoop and so forth to build applications rapidly and deploy them rapidly into all manner of applications. So IBM has very much reached out to, in the last several years, the Millennials, for whom all of these new tools have been their core repertoire from the very start. And I think in many ways, like today, the DB2 developer community edition is very much geared to that market. Saying, you know, to the cloud-native application developer, take a second look at DB2. There's a lot in DB2 that you might bring into your next application development initiative, alongside your Spark toolkit and so forth. So IBM has startup envy.
They're a big old company. Been around more than a hundred years. And they're trying, very much, to bootstrap and restart their brand in this new context, in the 21st century. I think they're making a good effort at doing it. In terms of community engagement, they have a really good community engagement program, all around the world, in terms of hackathons and developer days, you know, meetups here and there. And they get lots of turnout, and very loyal customers, and IBM's got the broadest portfolio. >> So you still bleed a little bit of blue. So I've got to squeeze it out of you now here. So let me push a little bit on what you're saying. So DB2 is the emphasis here, trying to position DB2 as appealing for developers, but why not some of the other, you know, acquisitions that they've made? I mean, you don't hear that much about Cloudant, dashDB, and things of that nature. You would think that those would be more appealing to some of the developer communities than DB2. Or am I mistaken? Is IBM sort of going after the core, trying to evolve that core, you know, constituency? >> No, they've done a lot of strategic acquisitions, like Cloudant, and they've acquired graph databases and brought them into their platform. IBM has every type of database or file system that you might need for web or social or Internet of Things. And so, with all of the development challenges, IBM has got a really high-quality, fit-to-purpose, best-of-breed underlying data platform for it. They've got huge numbers of developers energized all around the world working on this platform. DB2, in the last several years they've taken all of their platforms, their legacy ... That's the wrong word. All their existing mature platforms, like DB2, and brought them into the IBM Cloud. >> I think legacy is the right word. >> Yeah, yeah. >> These things have been around for 30 years. >> And they're not going away because they're field-proven and ... >> They are evolving. >> And customers have implemented them everywhere. And they're evolving. If you look at how IBM has evolved DB2 in the last several years ... For example, they responded to the challenge from SAP HANA. They brought BLU Acceleration, in-memory technology, into DB2 to make it screamingly fast and so forth. IBM has done a really good job of turning around these product groups, and the product architecture is now cloud first. And then reaching out to a new generation of cloud application developers. Like I said today, things like the DB2 developer community edition are just the next chapter in this ongoing saga of IBM turning itself around. Like I said, each of the individual announcements today is like, okay, that's interesting. I'm glad to see IBM showing progress. None of them is individually disruptive. I think last week, though, Hortonworks was disruptive, in the sense that IBM recognized that BigInsights didn't really have a lot of traction in the Hadoop space, not as much as they would have wished. Hortonworks very much does, and IBM has cast its lot to work with HDP. But HDP and Hortonworks recognize they haven't achieved much traction with data scientists, therefore DSX makes sense as part of the Hortonworks portfolio. Likewise, Big SQL makes perfect sense as the SQL front end to HDP.
I think the teaming of IBM and Hortonworks is propitious of further things that they'll be doing in the future, not just governance, but really putting together a broader cloud portfolio for the next generation of data scientists doing work in the cloud. >> Do you think Hortonworks is a legitimate acquisition target for IBM? >> Of course they are. >> Why would IBM ... You know, educate us. Why would IBM want to acquire Hortonworks? What does that give IBM? Open source mojo, obviously. >> Yeah, mojo. >> What else? >> Strong loyalty in the Hadoop market with developers. >> The developer angle, it would supercharge the developer angle, and maybe make it more relevant outside of some of those legacy systems. Is that it? >> Yeah, but also remember that Hortonworks came from Yahoo, the team that developed much of what became Hadoop. They've got an excellent team. A strategic team. So in many ways, you can look at Hortonworks as one part acqui-hire, if they ever do that, and one part really substantial and growing solution portfolio that in many ways is complementary to IBM. Hortonworks is really deep on the governance of Hadoop. IBM has gone there, but I think Hortonworks is even deeper, in terms of their laser focus. >> Ecosystem expansion, and it actually really wouldn't be that expensive of an acquisition. I mean, it's, you know, north of ... Maybe a billion dollars might get it done. >> Yeah. >> You know, so would you pay a billion dollars for Hortonworks? >> Not out of my own pocket. >> No, I mean if you're IBM. You think that would deliver that kind of value? I mean, you know how IBM thinks about acquisitions. They're good at acquisitions. They look at the IRR. They have their formula. They blue-wash the companies, and they generally do very well with acquisitions. Do you think Hortonworks would fit that monetization profile? >> I wouldn't say that Hortonworks, in terms of monetization potential, would match, say, what IBM has achieved by acquiring Netezza. >> Cognos. >> Or SPSS. I mean, SPSS has been an extraordinarily successful ... >> Well, the day IBM acquired SPSS they tripled the license fees. As a customer I know, ouch, it worked. It was incredibly successful. >> Well, yeah. Cognos was. Netezza was. And SPSS. Those three acquisitions in the last ten years have been extraordinarily pivotal and successful for IBM to build what they now have, which is really the most comprehensive portfolio of fit-to-purpose data platforms. So in other words, all those acquisitions prepared IBM to duke it out now with their primary competitors in this new field, which are Microsoft, who's newly resurgent, and Amazon Web Services. In other words, the two Seattle vendors. Seattle has come on strong, to the point where, in big data in the cloud, Seattle is almost eclipsing Silicon Valley as the locus of innovation and, really, of customer adoption in the cloud space. >> Quite amazing. Well, Google's still hanging in there. >> Oh yeah. >> Alright, Jim. Really a pleasure working with you today. Thanks so much. Really appreciate it. >> Thanks for bringing me on your team. >> And Munich crew, you guys did a great job. Really well done. Chuck, Alex, Patrick, wherever he is, and our great makeup lady. Thanks a lot. Everybody back home. We're out. This is Fast Track Your Data. Go to IBMgo.com for all the replays. Youtube.com/SiliconANGLE for all the shows. TheCUBE.net is where we tell you where theCUBE's going to be. Go to wikibon.com for all the research.
Thanks for watching everybody. This is Dave Vellante with Jim Kobielus. We're out.

Published Date : Jun 25 2017

Panel Discussion | IBM Fast Track Your Data 2017


 

>> Narrator: Live, from Munich, Germany, it's the CUBE. Covering IBM, Fast Track Your Data. Brought to you by IBM. >> Welcome to Munich everybody. This is a special presentation of the CUBE, Fast Track Your Data, brought to you by IBM. My name is Dave Vellante. And I'm here with my cohost, Jim Kobielus. Jim, good to see you. Really good to see you in Munich. >> Jim: I'm glad I made it. >> Thanks for being here. So last year Jim and I hosted a panel in New York City on the CUBE. And it was quite an experience. We had, I think it was nine or 10 data scientists, and we felt like that was a lot of people to organize and talk about data science. Well today, we're going to do a repeat of that, with a little bit of a twist on topics. And we've got five data scientists. We're here live, in Munich. And we're going to kick off the Fast Track Your Data event with this data science panel. So I'm going to now introduce some of the panelists, or all of the panelists. Then we'll get into the discussions. I'm going to start with Lillian Pierson. Lillian, thanks very much for being on the panel. You are in data science. You focus on training executives, students, and you're really a coach, but with a lot of data science expertise, based in Thailand, so welcome. >> Thank you, thank you so much for having me. >> Dave: You're very welcome. And so, I want to start with, sort of, when you focus on training people in data science, where do you start? >> Well, it depends on the course that I'm teaching. But I try and start at the beginning, so for my Big Data course, I actually start back at the fundamental concepts and definitions they would even need to understand in order to understand the basics of what Big Data is, data engineering. So, terms like data governance. Going into the vocabulary that makes up the very introduction of the course, so that later on the students can really grasp the concepts I present to them. You know, I'm teaching a deep learning course as well, so in that case I start with a lot more advanced concepts. So it just really depends on the level of the course. >> Great, and we're going to come back to this topic of women in tech. But you know, we looked at some CUBE data the other day. Women comprise about 17% of the technology industry. And so we're a little bit over that on our data science panel; we're about 20% today. So we'll come back to that topic. But I don't know if there's anything you would add? >> I'm really passionate about women in tech, and women who code in particular. And I'm connected with a lot of female programmers through Instagram. And we're supporting each other. So I'd love to take any questions you have on what we're doing in that space. At least as far as what's happening across the Instagram platform. >> Great, we'll circle back to that. All right, let me introduce Chris Penn. Chris, Boston based, all right, SMI. Chris is a marketing expert, really trying to help people understand how to turn data into value from a marketing perspective. It's a very important topic, not only because we get people to buy stuff, but also understanding some of the risks associated with things like GDPR, which is coming up. So Chris, tell us a little bit about your background and your practice. >> So I actually started in IT and worked at a startup. And that's where I made the transition to marketing. Because marketing has much better parties. But what's really interesting about the way data science is infiltrating marketing is the technology came in first.
You know, everything went digital. And now we're at a point where there's so much data. And most marketers, they kind of got into marketing as sort of the arts and crafts field. And they're realizing now they need a very strong mathematical, statistical background. So one of the things, Adam, the reason why we're here, and IBM is helping out tremendously, is making a lot of the data more accessible to people who do not have a data science background and probably never will. >> Great, okay thank you. I'm going to introduce Ronald Van Loon. Ronald, your practice is really all about helping people extract value out of data, driving competitive advantage, business advantage, or organizational excellence. Tell us a little bit about yourself, your background, and your practice. >> Basically, I have three different backgrounds. On one hand, I'm a director at a data consultancy firm called Adversitement, where we help companies to become data driven. Mainly large companies. I'm an advisory board member at Simply Learn, which is an e-learning platform, especially also for big data analytics. And on the other hand I'm a blogger and I host a series of webinars. >> Okay, great. Now Dez, Dez Blanchfield, I met you on Twitter, you know, probably a couple of years ago. We first really started to collaborate last year. We've spent a fair amount of time together. You are a data scientist, but you're also a jack of all trades. You've got a technology background. You sit on a number of boards. You work very actively with public policy. So tell us a little bit more about what you're doing these days, a little bit more about your background. >> Sure, I think my primary challenge these days is communication. Trying to join the dots between my technical background and deeply technical pedigree, to just plain English, everyday language, and business speak. So bridging that technical world with what's happening in the boardroom. Toe to toe with the geeks, then plain English with execs and boards. And just hand-holding them and stewarding them through the journey of the challenges they're facing. Whether it's the enormous rate of change and the pace of change, which is almost exhausting and causing them to sprint. But not just sprint in one race, but in multiple lanes at the same time. As well as some of the really big things that are coming up, that we've seen, like GDPR. So it's that communication challenge and just hand-holding people through that journey, and that mix of technical and commercial experience. >> Great, thank you. And finally Joe Caserta. Founder and president of Caserta Concepts. Joe, you're a practitioner. You're in the front lines, helping organizations, similar to Ronald, extracting value from data and translating that into competitive advantage. Tell us a little bit about what you're doing these days in Caserta Concepts. >> Thanks Dave, thanks for having me. Yeah, so Caserta's been around. I've been doing this for 30 years now. And the natural progression has been just getting more from application development, to data warehousing, to big data analytics, to data science. Very, very organically; that's just because it's where businesses have needed the help the most, over the years. And right now, the big focus is governance. At least in my world. Trying to govern when you have a bunch of disparate data coming from a bunch of systems that you have no control over, right? Like social media, and third party data systems. Bringing it in, how do you organize it? How do you ingest it? How do you govern it?
How do you keep it safe? And also helping to define ownership of the data within an organization, within an enterprise. That's also a very hot topic. Which ties back into GDPR. >> Great, okay, so we're going to be unpacking a lot of topics associated with the expertise that these individuals have. I'm going to bring Jim Kobielus into the conversation. Jim, the newest Wikibon analyst. And newest member of the SiliconANGLE Media Team. Jim, get us started off. >> Yeah, so we're at an event, at an IBM event, where machine learning and data science are at the heart of it. There are really three core themes here: machine learning and data science on the one hand, unified governance on the other, and hybrid data management. I want to circle back or focus on machine learning. Machine learning is the coin of the realm right now in all things data. Machine learning is the heart of AI. Everybody is out hiring data scientists to do machine learning. I want to get a sense from our panel, who are experts in this area, what are the chief innovations and trends right now in machine learning. Not deep learning, the core of machine learning. What's super hot? What, in terms of new techniques, new technologies, new ways of organizing teams to build and to train machine learning models? I'd like to open it up. Let's just start with Lillian. What are your thoughts about trends in machine learning? What's really hot? >> It's funny that you excluded deep learning from the response for this, because I think the hottest space in machine learning is deep learning. And deep learning is machine learning. I see a lot of collaborative platforms coming out, where people, data scientists, are able to work together with other sorts of data professionals to reduce redundancies in workflows and create more efficient data science systems. >> Is there much uptake of these crowdsourcing environments for training machine learning models, like CrowdFlower, or Amazon Mechanical Turk, or Mighty AI? Is that a huge trend in terms of the workflow of data science or machine learning, a lot of that? >> I don't see that crowdsourcing is, like, okay, maybe I've been out of the crowdsourcing space for a while. But I was working with Standby Task Force back in 2013, and we were doing a lot of crowdsourcing. And I haven't seen that it's been increasing in the industry, but I could be wrong. I mean, if you're building automation models, a lot of the work that's being crowdsourced could actually be automated if someone took the time to just build the scripts and build the models. And so I don't imagine that's going to be a trend that's increasing. >> Well, automating the machine learning pipeline is fairly hot, in terms of what I'm seeing in more and more research. Google's doing a fair amount of automated machine learning. To the panel, what do you think about automation, in terms of the core modeling tasks involved in machine learning? Is that coming along? Are data scientists in danger of automating themselves out of a job? >> I don't think there's a risk of data scientists being put out of a job. Let's just put that out there. I do think we need to get a bit clearer about this meme of the mythical unicorn. But to your point about machine learning, I think what you'll see is, we saw the cloud become baked into products, just as a given. I think machine learning has already crossed this threshold. We just haven't necessarily noticed or caught up.
And if we look at, we're at an IBM event, so let's just do a call out for them. The Data Science Experience platform, for example. Machine learning's built into a whole range of things around algorithms and data classification. And there's an assisted, guided model for how you get to certain steps, where you don't actually have to understand how machine learning works. You don't have to understand how the algorithms work. It shows you the different options you've got and you can choose them. So you might choose regression. And it'll give you different options on how to do that. So I think we've already crossed the threshold of baking in machine learning and baking in the data science tools. And we've seen that with Cloud and other technologies where, you know, Office 365: you can't get a non-Cloud Office 365 account, right? I think that's already happened in machine learning. What we're seeing, though, is organizations even as large as the Googles still in catch-up mode, in my view, on some of the shift that's taken place. So we've seen them write little games and apps where people do doodles and then it runs through the ML library and says, "Well that's a cow, or a unicorn, or a duck." And you get awards, and gold coins, and whatnot. But you know, as far back as 12 years ago I was working on a project where we had full-size airplanes acting as drones. And we mapped with 2-D and 3-D imagery: 2-D high-res imagery, and LiDAR for 3-D point clouds. We were finding poles and wires for utility companies, using ML before it even became a trend, and baking it right into the tools, right on our web page, where users just clicked and pointed. >> To counter Lillian's point, it's not crowdsourcing but crowd sharing that's really powering a lot of the rapid leaps forward. If you look at, you know, DSX from IBM, or you look at Node-RED, there's a huge number of free workflows where someone has probably already done the thing that you are trying to do. Go out and find it in the libraries; through Jupyter and R notebooks, there's an ability-- >> Chris, can you define, before you go-- >> Chris: Sure. >> This is great, crowdsourcing versus crowd sharing. What's the distinction? >> Well, so crowdsourcing, in the context of the question you asked, is getting people to do stuff for me. It's like asking people to mine classifieds. Whereas with crowd sharing, someone has done the thing already, it already exists. You're not purpose-building, saying, "Jim, help me build this thing." It's like, "Oh Jim, you already "built this thing, cool. "So can I fork it and make my own from it?" >> Okay, I see what you mean, keep going. >> And then, again, going back to earlier, in terms of the advancements, really deep learning. It probably is a good idea to just sort of define these things. Machine learning is how machines do things without being explicitly programmed to do them. Deep learning's like if you can imagine a stack of pancakes, right? Each pancake is a type of machine learning algorithm. And your data is the syrup. You pour the data on it. It goes from layer, to layer, to layer, to layer, and what you end up with at the end is breakfast. That's the easiest analogy for what deep learning is. Now imagine a stack of pancakes, 500 or 1,000 high; that's where deep learning's going now. >> Sure, multi-layered machine learning models, essentially, that have the ability to do higher levels of abstraction. Like image analysis, Lillian?
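Chris's pancake-stack analogy maps almost literally onto how deep learning frameworks express models: each "pancake" is a layer, and the data is poured through the stack. A minimal sketch using Keras (bundled with TensorFlow), where the layer sizes and the random dataset are arbitrary illustrative choices:

```python
import numpy as np
from tensorflow import keras

# Each Dense layer is one "pancake" in the stack; data flows top to bottom.
model = keras.Sequential([
    keras.layers.Dense(64, activation="relu", input_shape=(20,)),
    keras.layers.Dense(32, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),  # "breakfast": a prediction
])
model.compile(optimizer="adam", loss="binary_crossentropy")

# Pour some (random, illustrative) syrup through the stack.
X = np.random.rand(256, 20)
y = np.random.randint(0, 2, size=(256, 1))
model.fit(X, y, epochs=3, verbose=0)

print(model.predict(X[:3]))
```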
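And on Jim's earlier automation question, a quick sketch of what "automating the core modeling tasks" often looks like in practice: an exhaustive hyperparameter search that trains and cross-validates many candidate models without a human in the loop. This uses scikit-learn and a bundled toy dataset purely as an illustration of the idea, not any panelist's specific tooling.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = load_iris(return_X_y=True)

# The "automated" part: the grid search trains and cross-validates
# every parameter combination and picks the best one itself.
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={
        "n_estimators": [50, 100, 200],
        "max_depth": [2, 4, None],
    },
    cv=5,
)
search.fit(X, y)

print("best params:", search.best_params_)
print("cross-validated accuracy:", round(search.best_score_, 3))
```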
>> I had a comment to add about automation and data science. Because there are a lot of tools, or applications, that are able to use data science algorithms and output results. But the reason that data scientists aren't at risk of losing their jobs is because, just because you can get the result, you also have to be able to interpret it. Which means you have to understand it. And that involves deep math and statistical understanding, plus domain expertise. So, okay, great, you took out the coding element, but that doesn't mean you can codify a person's ability to understand and apply that insight. >> Dave: Joe, you have something to add? >> I could just add that I see the trend. Really, the reason we're talking about it today is machine learning is not necessarily new, like Dez was saying. But what's different is the accessibility of it now. It's just so easily accessible. All of the tools that are coming out, for data, have machine learning built into them. So the machine learning algorithms, which used to be a black art, you know, years ago, are now very easily accessible. They're part of everyone's toolbox. And the other reason that we're talking about it more is that data science is starting to become a core curriculum in higher education. Which is something that's new, right? That didn't exist 10 years ago. But over the past five years, I'd say, you know, it's become more and more easily accessible in education. So now people understand it. And now we have it accessible in our toolsets. So now we can apply it. And I think those two things coming together are really making it part of the standard way of doing analytics. And I guess the last part is, once we can train the machines to start doing the analytics, right? Getting smarter as they ingest more data. Then we can actually take that and embed it in our applications. You still need data scientists to create that part. But once we have standalone appliances that are intelligent, that's when we're going to start seeing machine learning and artificial intelligence really start to take off even more. >> Dave: So I'd like to switch gears a little bit and bring Ronald on. >> Okay, yes. >> Here you go, there. >> Ronald, the bromide in this sort of big data world we live in is that data is the new oil. You've got to be a data-driven company, and many other cliches. But when you talk to organizations and you start to peel the onion, you find that most companies really don't have a good way to connect data with business impact and business value. What are you seeing with your clients, and just generally in the community, with how companies are doing that? How should they do that? I mean, is that something that is a viable approach? You don't see accountants, for example, quantifying the value of data on a balance sheet. There are no standards for doing that. And so it's sort of this fuzzy concept. How are, and how should, organizations take advantage of data and turn it into value? >> So, I think in general, if you look at how companies look at data, they have departments, and within the departments they have tools specific to that department. And what you see is that there's no central, let's say, data collection. There's no central management of governance. There's no central management of quality. There's no central management of security. Each department manages its data on its own. So if you then ask, on one hand, "Okay, how should they do it?"
It's basically: go back to the drawing board and say, "Okay, how should we do it?" We should collect the data centrally. And we should take care of central governance. We should take care of central data quality. We should take care of centrally managing this data. And look from a company perspective, not from a department perspective, at what the value of data is. So, look at it from the perspective of your whole company. And this means that it has to be brought up to the C level, where most still fail to understand what it really means, and what the impact can be for that company. >> It's a hard problem. Because data by its very nature is now so decentralized. But Chris you have a-- >> The thing I want to add to that is, think about it in terms of valuing data. Look at what a data breach would cost you. Like, what is the expense of having your data compromised, if you don't have governance, if you don't have policy in place? Look at the major breaches of the last couple years, and how many billions of dollars those companies lost in market value, and trust, and all that stuff. That's one way you can value data very easily: "What will it cost us if we mess this up?" >> So a lot of CEOs will hear that and say, "Okay, I get it. "I have to spend to protect myself, "but I'd like to make a little money off of this data thing. "How do I do that?" >> Well, I like to think of it, you know, I think data's definitely an asset within an organization. And it's becoming more and more of an asset as the years go by. But data is still a raw material. And that's the way I think about it. In order to actually get the value, just like if you're creating any product, you start with raw materials and then you refine them. And then it becomes a product. For data, data is a raw material. You need to refine it. And then the insight is the product. And that's really where the value is. And the insight is absolutely something you can monetize. >> So data is abundant, insights are scarce. >> Well, you know, actually you could say that the intermediate between the insights and the data is the models themselves. The statistical, predictive, machine learning models, which are a crystallization of insights that have been gained by people called data scientists. What are your thoughts on that? Are statistical, predictive, machine learning models something, an asset, that companies, organizations, should manage governance of on a centralized basis or not? >> Well, the models are essentially the refinery system, right? So as you're refining your data, you need to have process around how exactly you do that. Just like refining anything else. It needs to be controlled and it needs to be governed. And I think that data is no different from that. And I think that it's very undisciplined right now, in the market or in the industry. And I think maturing that discipline around data science is something that's going to be a very high focus this year and next. >> You were mentioning, "How do you make money from data?" Because there's all this risk associated with security breaches. But at the risk of sounding simplistic, you can generate revenue from system optimization, or from developing products and services. Using data to develop products and services that better meet the demands and requirements of your markets, so that you can sell more. So either you are using data to earn more money, or you're using data to optimize your system so you have less cost.
And that's a simple answer for how you're going to be making money from the data. But yes, there is always the counter to that, which is the security risks. >> Well, and my question really relates to, you know, when you think of talking to C-level executives, they kind of think about running the business, growing the business, and transforming the business. And a lot of times they can't fund these transformations. And so I would agree, there are many, many opportunities to monetize data, cut costs, increase revenue. But organizations seem to struggle to either make a business case or actually implement that transformation. >> Dave, I'd love to have a crack at that. I think this conversation epitomizes the type of things that are happening in boardrooms and C-suites already. So we've really quickly dived into the detail of data. And the detail of machine learning. And the detail of data science, without actually stopping and taking a breath and saying, "Well, we've "got lots of it, but what have we got? "Where is it? "What's the value of it? "Is there any value in it at all?" And, "How much time and money should we invest in it?" For example, we talk about it being a resource. I look at data as a utility. When I turn the tap on to get a drink of water, it's there as a utility. I count on it being there, but I don't always sample the quality of the water, and I probably should. It could have Giardia in it, right? But what's interesting is I trust the water at home, in Sydney, because we have a fairly good experience with good quality water. If I were to go to some other nation, I probably wouldn't trust that water. And I think, when you think about what's happening in organizations, it's almost the same as what we're seeing here today. We're having a lot of fun, diving into the detail. But what we've forgotten to do is ask the question, "Well, why is data even important? "What's the relevance to the business? "Why are we in business? "What are we doing as an organization? "And where does data fit into that?" As opposed to becoming so fixated on data because it's a media-hyped topic. I think once you can wind that back a bit and say, "Well, we have lots of data, "but is it good data? "Is it quality data? "Where's it coming from? "Is it ours? "Are we allowed to have it? "What treatment are we allowed to give that data?" As you said, "Are we controlling it? "And where are we controlling it? "Who owns it?" There are so many questions to be asked. But the first question I like to ask people, in plain English, is, "Well, is there any value "in data in the first place? "What decisions are you making that data can help drive? "What things are in your organization's "KPIs and milestones you're trying to meet "that data might support?" So then, instead of becoming fixated with data as a thing in itself, it becomes part of your DNA. Does that make sense? >> Think about what money means. The economists' rhyme: "Money's a matter of functions four, a medium, a measure, a standard, a store." So it's a medium of exchange. A measure of value. A standard for deferred payment. And a way to store value. Data, good clean data, well governed, fits all four of those. So if you're trying to figure out, "How do we make money out of stuff?" figure out how money works, and then figure out how you map data to it.
And defined use case, basically, start with a team on one hand, marketing people, sales people, operational people, and also the whole data science team. So start with this case. It's like, defining, basically a movie. If you want to create the movie, You know where you're going to. You know what you want to achieve to create the customer experience. And this is basically the same with a business case. Where you define, "This is the case. "And this is how we're going to derive value, "start with it and deliver value within a month." And after the month, you check, "Okay, where are we and how can we move forward? "And what's the value that we've brought?" >> Now I as well, start with business case. I've done thousands of business cases in my life, with organizations. And unless that organization was kind of a data broker, the business case rarely has a discreet component around data. Is that changing, in your experience? >> Yes, so we guide companies into be data driven. So initially, indeed, they don't like to use the data. They don't like to use the analysis. So that's why, how we help. And is it changing? Yes, they understand that they need to change. But changing people is not always easy. So, you see, it's hard if you're not involved and you're not guiding it, they fall back in doing the daily tasks. So it's changing, but it's a hard change. >> Well and that's where this common parlance comes in. And Lillian, you, sort of, this is what you do for a living, is helping people understand these things, as you've been sort of evangelizing that common parlance. But do you have anything to add? >> I wanted to add that for organizational implementations, another key component to success is to start small. Start in one small line of business. And then when you've mastered that area and made it successful, then try and deploy it in more areas of the business. And as far as initializing big data implementation, that's generally how to do it successfully. >> There's the whole issue of putting a value on data as a discreet asset. Then there's the issue, how do you put a value on a data lake? Because a data lake, is essentially an asset you build on spec. It's an exploratory archive, essentially, of all kinds of data that might yield some insights, but you have to have a team of data scientists doing exploration and modeling. But it's all on spec. How do you put a value on a data lake? And at what point does the data lake itself become a burden? Because you got to store that data and manage it. At what point do you drain that lake? At what point, do the costs of maintaining that lake outweigh the opportunity costs of not holding onto it? >> So each Hadoop note is approximately $20,000 per year cost for storage. So I think that there needs to be a test and a diagnostic, before even inputting, ingesting the data and storing it. "Is this actually going to be useful? "What value do we plan to create from this?" Because really, you can't store all the data. And it's a lot cheaper to store data in Hadoop then it was in traditional systems but it's definitely not free. So people need to be applying this test before even ingesting the data. Why do we need this? What business value? >> I think the question we need to also ask around this is, "Why are we building data lakes "in the first place? "So what's the function it's going to perform for you?" There's been a huge drive to this idea. "We need a data lake. "We need to put it all somewhere." But invariably they become data swamps. 
And we only half jokingly call them swamps, because I've seen 90-day projects turn from a great idea into a really bad nightmare. And as Lillian said, it is cheaper in some ways to put it into an HDFS platform, in a technical sense. But when we look at all the fully burdened components, it's actually more expensive to find Hadoop specialists and Spark specialists to maintain that cluster. And invariably I'm finding that big data, quote unquote, is not actually so much lots of data, it's complex data. And as Lillian said, you don't always need to store it all. So I think if we go back to the question of, "What's the function of a data lake in the first place? Why are we building one?" and then start to build some fully burdened cost components around that, we'll quickly find that we don't actually need a data lake, per se. We just need an interim data store. So we might take last year's data and tokenize it, and analyze it, and do some analytics on it, and just keep the metadata. So I think there is this rush, for a whole range of reasons, particularly vendor driven, to build data lakes because we think they're a necessity, when in reality they may just be an interim requirement and we don't need to keep them for the long term. >> I'm going to attempt to put the last few questions all together. And I think they all belong together, because one of the reasons why there's such hesitation about progress within the data world is because there's just so much accumulated tech debt already. Where there's a new idea, we go out and we build it. And six months, three years, it really depends on how big the idea is, millions of dollars is spent. And then by the time things are built, the idea is pretty much obsolete, no one really cares anymore. And I think what's exciting now is that the speed to value is just so much faster than it's ever been before. And I think, you know, what makes that possible is this concept of, I don't think of a data lake as a thing. I think of a data lake as an ecosystem. And that ecosystem has evolved so much more, probably in the last three years than it has in the past 30 years. And it's exciting times, because now once we have this ecosystem in place, if we have a new idea, we can actually do it in minutes, not years. And that's really the exciting part. And I think, you know, data lake versus data swamp comes back to just traditional data architecture. And if you architect your data lake right, you're going to have something that's substantial, that you're going to be able to harness and grow. If you don't do it right, if you just throw data, if you buy a Hadoop cluster or a Cloud platform and just throw your data out there and say, "We have a lake now," yeah, you're going to create a mess. And I think it's about taking the time to really understand, you know, the new paradigm of data architecture and modern data engineering, and actually doing it in a very disciplined way. If you think about it, what we're doing is we're building laboratories. And if you have a shabby, poorly built laboratory, the best scientist in the world isn't going to be able to prove his theories. So if you have a well built laboratory and a clean room, then, you know, a scientist can get what he needs done very, very efficiently. And that's the goal, I think, of data management today. >> I'd like to just quickly add that I totally agree with the challenge between on-premise and Cloud. And I think one of the strong themes of today is going to be the hybrid data management challenge.
And I think organizations, some organizations, have rushed to adopt Cloud, thinking it's a really good place to dump the data and someone else has to manage the problem. And then they've ended up with a very expensive death by 1,000 cuts in some senses. And then others have been very reluctant, and as a result have not gotten access to rapidly moving and disruptive technology. So I think there's a really big challenge to get a basic conversation going around what the value of adopting Cloud technology is, versus what the risks are, and when's the right time to move. For example, should we cloud-burst for workloads? Do we move whole data sets in there? You know, moving half a petabyte of data into a Cloud platform and back is a non-trivial exercise. But moving a terabyte isn't actually that big a deal anymore. So, you know, should we keep stuff behind the firewalls, where 80% of the data supposedly is, and I'd be interested in seeing that borne out this week, and just push out to Cloud tools, machine learning, data science tools, whatever they might be, cognitive analytics, et cetera, and keep the bulk of the data on premise? Or should we just move whole spools into the Cloud? There is no one size fits all. There's no silver bullet. Every organization has its own quirks and its own nuances it needs to think through and make a decision about itself. >> Very often, Dez, organizations have zonal architectures, so you'll have a data lake that consists of a NoSQL platform that might be used for, say, mobile applications, a Hadoop platform that might be used for unstructured data refinement, so forth, a streaming platform, so forth and so on. And then you'll have machine learning models that are built and optimized for those different platforms. So, you know, think of it in terms, then, of your data lake as a set of zones that-- >> It gets even more complex just playing on that theme, when you think about what Cisco started, called Fog Computing. I don't really like that term. But edge analytics, or computing at the edge. We've seen, with the internet coming along, that we couldn't deliver everything with a central data center, so we started creating this concept of content delivery networks, right? I think the same thing, I know the same thing, has happened in data analysis and data processing, where we've been pulling social media out of the Cloud, per se, and bringing it back to a central source, and doing analytics on it. But think of something like, say for example, when the 787 Dreamliner from Boeing came out: this airplane created half a terabyte of data per flight. Now let's just do some quick, back-of-the-envelope math. There are 87,400 flights a day, just in the domestic airspace in the USA alone. Now 87,400 by half a terabyte, that's 43.7 petabytes a day. You physically can't copy that from, quote unquote, the Cloud, if you'll pardon the pun, back to the data center. So now we've got the challenge: a lot of our Enterprise data's behind a firewall, supposedly 80% of it, but what's out at the edge of the network? Where's the value in that data? So there are zonal challenges. Now what do I do with my Enterprise data versus the open data, the mobile data, the machine data? >> Yeah, we've seen some recent data from IDC that says, "About 43% of the data is going to stay at the edge." We think that that's way understated, just given the examples. We think it's closer to 90% that is going to stay at the edge. >> Just on the airplane topic, right? So Airbus wasn't going to be outdone.
Boeing put 4,000 sensors or something in their 787 Dreamliner six years ago. Airbus just announced the A350-1000, with 10,000 sensors in it. Do the same math. Now, the FAA in the US said that all aircraft and all carriers have to be, by early next year, I think it's like March or April next year, at the same level of capability for data collection and so forth. It's kind of like a mini GDPR for airlines. So with the A350-1000 with 10,000 sensors, that becomes 2.5 terabytes per flight. If you do the math, it's 220 petabytes of data just in one day's traffic, domestically in the US. Now, it's just so mind boggling that we're going to have to completely turn our thinking on its head: what do we do behind the firewall? What do we do in the Cloud, versus what we might have to do in the airplane? I mean, think about edge analytics in the airplane processing data, as you said, Jim, streaming analytics in flight. >> Yeah, that's a big topic within Wikibon, within the team. David Floyer, my other colleagues, and I have been talking about the whole notion of edge architecture. Not only will most of the data be persisted at the edge, most of the deep learning models, like TensorFlow models, will be executed at the edge. To some degree, the training of those models will happen in the Cloud, but much of that will be pushed in a federated fashion to the edge, or at least that's what I'm predicting. We're already seeing some industry moves in that direction, in terms of architectures. Google has a federated training project, or initiative. >> Chris: Look at TensorFlow Lite. >> Which is really fascinating, for it's geared to IoT. I'm sorry, go ahead. >> Look at TensorFlow Lite. I mean, the announcement of every Android device having ML capabilities is Google's essential acknowledgment that "we can't do it all." So we need to, essentially, sort of like SETI@home, use everyone's smartphone and set-top TV box just to help with the processing. >> Now, this sort of leads to the IoT discussion, but I want to underscore the operating model. As you were saying, you can't just lift and shift to the Cloud. CEOs aren't going to get the billion dollar hit by just doing that. So you've got to change the operating model. And that leads to this discussion of IoT, and an entirely new operating model. >> Well, there are companies like Sisense who have worked with Intel, and they've taken this concept, they've taken the business logic, and they're not just putting it in the chip, but actually putting it in memory, in the chip. So as data's going through the chip, it's not just being processed, it's actually being baked into memory, the level one, two, and three cache. Now this is a game changer. Because as Chris was saying, even if we were to get the data back to a central location, there's the compute load. I saw a really interesting thing from, I think it was Google, the other day; one of the guys was doing a talk, and he spoke about what it meant to add cognitive and voice processing into just the Android platform. And the number they used was something like double the amount of compute they had, just to add voice, for free, to the Android platform. Now, even for Google, that's a nontrivial exercise. So as Chris was saying, I think we have to, again, flip it on its head and say, "How much can we put at the edge of the network?" Because think about these phones. I mean, even your fridge and microwave, right?
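Dez's back-of-the-envelope math is easy to check. A minimal Python sketch, using the flight count and per-flight volumes quoted above; all figures are the panel's, not independently verified:

```python
# Back-of-the-envelope check of the flight-data figures quoted above.

FLIGHTS_PER_DAY = 87_400  # daily US domestic flights, as quoted

def daily_volume_pb(tb_per_flight: float) -> float:
    """Total daily data volume in petabytes for a given per-flight volume."""
    return FLIGHTS_PER_DAY * tb_per_flight / 1_000  # TB -> PB

print(daily_volume_pb(0.5))  # 787 Dreamliner figure: ~43.7 PB/day
print(daily_volume_pb(2.5))  # newer Airbus figure: ~218.5 PB/day, i.e. ~220 PB
```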
To that point about the phones: we put a man on the moon with something that these days we make for $89 at home, on the Raspberry Pi, right? And even that is 1,000 times more powerful. When we start looking at what's going into the chips, we've seen people build new, not even GPUs, but deep learning and stream analytics capable chips, like Google, for example. That's going to make its way into consumer products. So the compute capacity in phones is going to, I think, transmogrify in some ways, because there is some magic in there, to the point where, as Chris was saying, we're going to have the smarts in our phone, and a lot of that workload is going to move closer to us. And only the metadata that we need to move is going to go centrally. >> Well, here's the thing. The edge isn't the technology. The edge is actually the people. Look at, for example, the MIT language Scratch. This is a kids' programming language. It's drag and drop. You know, kids can assemble really fun animations and make little movies. We're training them to build for IoT. Because if you look at a system like Node-RED, it's an IBM interface that is drag and drop, for IoT workflows. And you can push that to a device. Scratch has a converter for doing those. So the edge is those thousands and millions of kids who are learning how to code, learning how to think architecturally and algorithmically. What they're going to create is beyond what any of us can possibly imagine. >> I'd like to add one other thing as well. I think there's a topic we've got to start tabling, and that is what I refer to as the gravity of data. So think about how planets are formed, right? Particles of dust accrete. They form into planets. Planets develop gravity. And the reason we're not flying into space right now is that there's gravitational force. Even though it's one of the weakest forces, it keeps us on our feet. Oftentimes in organizations, I ask them to start thinking about, "Where is the center of your universe with regard to the gravity of data?" Because if you can follow the center of your universe and the gravity of your data, you can often, as Chris is saying, find where the business logic needs to be. And it could be that you've got to think about a storage problem. You can think about a compute problem. You can think about a streaming analytics problem. But if you can find where the center of your universe and the center of gravity for your data is, often you can get a really good insight into where the workloads are going to be and where the smarts are going to be, whether it's small, medium, or large. >> So this brings up the topic of data governance. One of the themes here at Fast Track Your Data is GDPR: what it is, and what it means. It's one of the reasons, I think, IBM selected Europe generally, and Munich specifically. So let's talk about GDPR. We had a really interesting discussion last night, so let's kind of recreate some of that. I'd like somebody on the panel to start with: what is GDPR, and why does it matter? Ronald? >> Yeah, maybe I can start, maybe a little bit more in general about unified governance. So if I talk to companies and I need to explain to them what governance is, I basically compare it with a crime scene. So in a crime scene, if something happens, they start with securing all the evidence. So they start sealing the environment, and take care that all the evidence is collected. And on the other hand, you see that they need to protect this evidence.
There are all kinds of policies, all kinds of procedures, all kinds of rules that need to be followed, to take care that the whole body of evidence is secured well. And once you start investigating, you have the crime scene investigators, you have the research lab, you have all different kinds of people, and they need to have consent before they can use all this evidence. And the whole reason why they're doing this is, on the one hand, to catch the villain, the crook, and on the other hand, once he's caught, to convict him. And we do this to have trust in the materials, or trust in, basically, the analytics, and, on the other hand, so the public has trust in everything that's happened with the data. So if you look at a company, where data is basically the evidence, this is the value of your data. It's similar to the evidence within a crime scene. But most companies don't treat it like this. So if we then look at GDPR, GDPR basically shifts the power and the ownership of the data from the company to the person that created it, which is often, let's say, the consumer. And there's a lot of paradox in this, because all the companies say, "We need to have this customer data, because we need to improve the customer experience." So let's make it concrete and say it's the 1st of June 2018, so GDPR is active and enforced. I use iTunes, so I go to iTunes and say, "Okay, Apple, please give me access to my data. I want to see what kind of personal information you have stored for me." On the other hand, I want to have the right to rectify all this data: I want to be able to change it and give them a different level of how they can use my data. So I ask this of iTunes, and then I say to them, "Okay, I basically don't like you anymore. I want to go to Spotify. So please transfer all my personal data to Spotify." That's possible once it's June 2018. Then I go back to iTunes and say, "Okay, I don't like it anymore. I withdraw my consent, and I want you to remove all my personal data for everything that you use." And I go to Spotify and I give them, let's say, consent for using my data. So this is a shift where you can, as a person, be the owner of the data. And this has a lot of consequences, of course, for organizations, and for how to manage this. So it's quite simple for the consumer, they get the power, and it's maturing the whole legal system. But it's a big consequence, of course, for organizations. >> This is going to be a nightmare for marketers. But fill in some of the gaps there. >> Let's go back. So GDPR, the General Data Protection Regulation, was passed by the EU in May of 2016. It is, as Ronald was saying, four basic things: the right to privacy, the right to be forgotten, privacy built into systems by default, and the right to data transfer. >> Joe: It takes effect next year. >> It is already in effect. GDPR took effect in May of 2016. The enforcement penalties take effect the 25th of May 2018. Now, there are two things on the penalty side that are important for everyone to know. Number one, GDPR is extraterritorial, which means that an EU citizen, anywhere on the planet, has GDPR go with them. So say you're a pizza shop in Nebraska, and an EU citizen walks in, orders a pizza, gives you her credit card and stuff like that. If you, for some reason, store that data, GDPR now applies to you, Mr. Pizza Shop, whether or not you do business in the EU.
Because an EU citizen's data is with you. Two, the penalties are much stiffer than they ever have been. In the old days, companies could simply write off penalties, saying, "That's the cost of doing business." With GDPR, the penalties are up to 4% of your annual revenue or 20 million Euros, whichever is greater. And there may be criminal sanctions, charges, against key company executives. So there's a lot of questions about how this is going to be implemented. But one of the first impacts you'll see from a marketing perspective is on all the advertising we do targeting people by their age, by their personally identifiable information, by their demographics. Between now and May 25th, 2018, a good chunk of that may have to go away, because there's no way for you to say, "Well, this person's an EU citizen, this person's not." People give false information all the time online, so how do you differentiate it? Every company, regardless of whether they're in the EU or not, will have to adapt to it, or deal with the penalties. >> So Lillian, as a consumer this is designed to protect you. But you had a very negative perception of this regulation. >> I've looked over the GDPR, and to me it actually looks like a socialist agenda. It looks like (panel laughs) no, it looks like a full assault on free enterprise and capitalism. And on its face, from a legal perspective, it's completely and wholly unenforceable, because they're assigning jurisdictional rights to the citizen. But what are they going to do? They're going to go to Nebraska and call in the guy from the pizza shop? And call him into what court, the EU court? It's unenforceable from a legal perspective. And if you write a law that's unenforceable, you know, it's got to be enforceable in every element. It can't be just, "Oh, we're only going to enforce it for Facebook and for Google, but it's not enforceable for others." It needs to be written so that it's a complete and actionable law. And it's not written in that way. And from a technological perspective it's not implementable. I think you said something like 652 EU regulators or political people voted for this and 10 voted against it. But what do they know about actually implementing it? Is it possible? There's all sorts of regulations out there that aren't possible to implement. I come from an environmental engineering background, and it's absolutely ridiculous, because these agencies will pass laws that aren't actually possible to implement in practice. The cost would be too great, and it's not even needed. So I don't know, I just saw this and I thought, what they're essentially trying to do is regulate what the rest of the world does on the internet. If they want to, they can build their own internet like China has and police it the way that they want to. But Ronald here made an analogy between data, and free enterprise, and a crime scene. Now to me, that's absolutely ridiculous. What do data and someone signing up for an email list have to do with a crime scene? If the EU wants to make it that way, they can police their own internet. But they can't go across the world. They can't go to Singapore and tell Singapore, or go to the pizza shop in Nebraska, and tell them how to run their business. >> You know, EU overreach in the post-Brexit era: what you're saying has a lot of validity. How far can the tentacles of the EU reach into other sovereign nations? >> What court are they going to call them into? >> Yeah. >> I'd like to weigh in on this.
There are lots of unknowns, right? So I'd like us to focus on the things we do know. We've already dealt with similar situations before. In Australia, we introduced a goods and services tax. Completely foreign concept. Everything you bought had 10% on it. No one knew how to deal with this. It was a completely new practice in accounting. There was a whole bunch of new software that had to be written. MYOB had to have new capability, but we coped. Decades later, no one has actually gone to jail for not complying with GST. So what it was, was a framework for how to shift from non-sales-tax-related revenue collection to sales-tax-related revenue collection. I agree that there are some egregious things built into this. I don't disagree with that at all. But I think if I put my slightly broader view-of-the-world hat on, we have well and truly gone past the point, in my mind, where data was respected and data was treated in a sensible way. I mean, I get emails from companies I've never done business with. And when I follow it up, it's because I did business with a credit card company that gave my details to a service provider, which thought that, when I bought a holiday to come to Europe, I might want travel insurance. Now, some might say there's value in that, and others say there's not; there's the debate. But let's just focus on what we're talking about. We're talking about a framework for governance of the treatment of data. If we remove all the emotive components, what we are talking about is a series of guidelines, backed by laws, that say, "We would like you to do this," in an ideal world. But I don't think anyone's going to go to jail on day one. They may go to jail on day 180, if they continue to do nothing about it. So they're asking you to sort of sit up and pay attention, do something about it. There's a whole bunch of relief around how you approach it. The big thing for me is there's no get-out-of-jail card, right? There is no get-out-of-jail card for not complying. But there's plenty of support. I mean, we're going to have ambulance chasers everywhere. We're going to have class actions. We're going to have individual suits. The greatest thing to do right now is get into GDPR law. You think data scientists are unicorns? >> What kind of life is that, if there's ambulance chasers everywhere? You want to live like that? >> Well, I think we've seen ad blocking. I use ad blocking as an example, right? A lot of organizations with advertising broke the internet by just throwing too much content on pages, to the point where they're just unusable. And so we had this response with ad blocking. I think in many ways GDPR is a regional response to a situation, and I don't think it's the exact right answer, but it's the next evolutionary step. We'll see things evolve over time. >> It's funny you mention it, because in the United States one of the things that has happened is that, with the change in political administrations, the regulations on what companies can do with your data have actually been relaxed, to the point where, for example, your internet service provider can resell your browsing history, with or without your consent. Or your consent's probably buried in there, on page 47. And so GDPR is kind of a response saying, "You know what? You guys over there across the Atlantic are kind of doing some fairly irresponsible things with what you allow companies to do."
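The mechanics Ronald and Chris have described, four data-subject rights plus a penalty cap of 4% of annual revenue or 20 million Euros, can be sketched in a few lines of Python. This is an illustration of the panel's description, not legal advice, and every name in it is hypothetical:

```python
# Toy sketch of the GDPR mechanics discussed above: the penalty cap
# and the four data-subject rights. All names here are invented.

def max_fine(annual_revenue_eur: float) -> float:
    """Penalty cap as quoted: 4% of annual revenue or 20M EUR, whichever is greater."""
    return max(0.04 * annual_revenue_eur, 20_000_000)

class CustomerRecord:
    def __init__(self, data: dict, consent: bool = False):
        self.data = dict(data)
        self.consent = consent

    def access(self) -> dict:              # right to access one's data
        return dict(self.data)

    def rectify(self, field: str, value):  # right to rectification
        self.data[field] = value

    def export(self) -> dict:              # right to data portability
        return {"format": "portable", "payload": dict(self.data)}

    def erase(self):                       # right to be forgotten
        self.data.clear()
        self.consent = False

record = CustomerRecord({"email": "jane@example.com"}, consent=True)
record.rectify("email", "jane.doe@example.com")
portable = record.export()      # hand this to the next provider
record.erase()                  # consent withdrawn, data removed
print(max_fine(1_000_000_000))  # 40,000,000.0 for a company with 1B EUR revenue
```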
Now, to Lillian's point, no one's probably going to go after the pizza shop in Nebraska, because they don't do business in the EU. They don't have an EU presence. And it's unlikely that an EU regulator's going to get on a plane from Brussels and fly to Topeka, or Omaha, sorry, and say, "Come on, Joe, let's get the pizza shop in order here." But companies, particularly Cloud companies, that have offices and operations within the EU have to sit up and pay attention. So if you have any kind of EU operations, or any kind of fiscal presence in the EU, you need to get on board. >> But to Lillian's point, it becomes a boondoggle for lawyers in the EU who want to go after deep-pocketed companies like Facebook and Google. >> What's the value in that? It seems like regulators are just trying to create work for themselves. >> What about the things that, say, advertisers can do, not so much with the data that they have, but with the data that they don't have? In other words, they have people called data scientists who build models that can do inference on sparse data, and do amazing things in terms of personalization. What do you do about all those gray areas, where you've got machine learning models and so forth? >> But it applies-- >> It applies to personally identifiable information. But if you have a talented enough data scientist, you don't need the PII, or even the inferred characteristics. If a certain type of behavior happens on your website, for example, and this path of 17 pages almost always leads to a conversion, it doesn't matter who you are or where you're coming from. If you're a good enough data scientist, you can build a model that will track that. >> Like, you know, Target inferring that some young woman was pregnant. And they inferred correctly, even though that was never divulged. I mean, there's all those gray areas; how can you stop that slippery slope? >> Well, I'm going to weigh in really quickly. A really interesting experiment for people to do: when people get very emotional about it, I say to them, "Go to Google.com, view source, put it in seven-point Courier font in Word, and count how many pages it is." Can you guess how many pages? It's 52 pages of seven-point Courier font, HTML to render one logo, and a search field, and a click button. Now, why do we need 52 pages of HTML source code and JavaScript just to take a search query? Think about what's being done in that. It's effectively a mini operating system, to figure out who you are, and what you're doing, and where you've been. Now, is that a good or bad thing? I don't know, I'm not going to make a judgment call. But what I'm saying is we need to stop and take a deep breath and say, "Does anybody need a 52-page home page to take a search query?" Because that's just the tip of the iceberg. >> To that point, I like the results that Google gives me. That's why I use Google and not Bing: because I get better search results. So, yeah, I don't mind if you mine my personal data and give me Facebook ads. I saw in your article that GDPR is going to take out targeted advertising, and the only ads in the entire world that I like are Facebook ads, because I actually see products I'm interested in, and I'm happy to learn about them. I think, "Oh, I want to research that. I want to see this new line of products, and what are their competitors?" And I like the targeted advertising. I like the targeted search results, because they give me more of the information that I'm actually interested in.
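Chris's point about a conversion model that needs no PII at all can be sketched quickly. A toy example in Python, assuming scikit-learn is available; the page names, sessions, and labels are invented for illustration:

```python
# Minimal sketch of a conversion model built purely on behavioral
# paths, with no personally identifiable information at all.

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Each sample is one visitor's page path; the label is converted or not.
paths = [
    "home pricing signup checkout",
    "home blog blog home",
    "home pricing docs signup checkout",
    "home blog pricing home",
]
converted = [1, 0, 1, 0]

model = make_pipeline(
    CountVectorizer(ngram_range=(1, 2)),  # single pages and page-to-page hops
    LogisticRegression(),
)
model.fit(paths, converted)

# Score a new session: nothing about *who* the visitor is, only which
# sequence of pages the session touched.
print(model.predict_proba(["home pricing signup"])[0][1])
```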
>> And that's exactly what it's about. You can still decide, yourself, if you want to have this targeted advertising. If not, then you don't give consent. If you like it, you give consent. So if a company gives you value, you give consent back. So it's not that it's restricting everything; it's about consent. And I think it's similar, the same type of response, to what happened when we had Mad Cow Disease here in Europe, where you had the whole food chain that needed to be tracked. And everybody said, "No, it's not required." But now it's implemented, and everybody in Europe does it. So it's the same; that's probably what's going to happen over here as well. >> So what does GDPR mean for data scientists? >> I think GDPR is needed. I think one of the things that may be slowing data science down is fear. People are afraid to share their data, because they don't know what's going to be done with it. If there are some guidelines around it that should be enforced, then I think, as has been said, as long as a company can prove that it's doing due diligence to protect your data, no one is going to go to jail. I think when, you know, we referenced a crime scene, if there's a heinous crime being committed, all right, then it's going to become obvious, and then you do go directly to jail. But I think having guidelines and even laws around privacy and protection of data is not necessarily a bad thing. You can do a lot of really meaningful data science without understanding that it's Joe Caserta. All of the demographics about me, all of the characteristics about me as a human being, I think, are still on the table. All that they're saying is that you can't go after Joe, himself, directly. And I think that's okay. You know, there's still a lot of things. We could still cure diseases without knowing that I'm Joe Caserta, right? As long as you know everything else about me. And I think that's really the core of what we're trying to do. We're trying to protect the individual and the individual's data about themselves. But as far as how it affects data science: you know, a lot of our clients are afraid to implement things because they don't exactly understand what the guideline is, and they don't want to go to jail. So they wind up doing nothing. So now that we have something in writing that, at least, is something we can work towards, I think it's a good thing. >> In many ways, organizations are suffering from the deer-in-the-headlights problem. They don't understand it, and so they just end up frozen in the headlights. But I just want to go back one step, if I could. We can get really excited about what it is and is not, but for me, the most critical thing to remember is that data breaches are happening. There are over 1,400 data breaches, on average, per day. And most of them are not trivial. And we saw half a billion records from Yahoo, and then one point one billion, and then one point five billion. I mean, think about what that actually means. There were 47,500 MongoDB databases breached in an 18-hour window, after an automated upgrade. And they were airlines, they were banks, they were police stations, they were hospitals. So when I think about frameworks like GDPR, I'm less worried about whether I'm going to see ads and be sold stuff. I'm more worried about, and I'll give you one example: my 12-year-old son has an account at a platform called Edmodo.
Now, I'm not going to pick on that brand for any reason, but it's a current issue. Something like, I think it was, 19 million children in the world had their username, password, email address, home address, and all their social interaction on this Facebook-for-kids platform called Edmodo breached in one night. Now, I got my hands on a copy, and everything about my son is there. Now, I have a major issue with that, because I can't do anything to undo that, nothing. The fact that I was able to get a copy, within hours, on a dark website, for free. The fact that his first name, last name, email, mobile phone number, all these personal messages from friends, are out there. Nobody has the right to allow that data about my son to be breached. Or your children, or our children. For me, GDPR is a framework for us to try and behave better about really big issues. Whether it's a socialist issue, whether someone's got an issue with advertising, I'm actually not interested in that at all. What I'm interested in is that companies need to behave much better about the treatment of data when it's the type of data that's being breached. And I get really emotional when it's my son, or someone else's child. Because I don't care if my bank account gets hacked; they hedge that. They underwrite and insure themselves, and the money arrives back in my bank. But when it's my wife, who donated blood, and a blood donor website got breached and her details got lost, even things like sexual preferences, that they ask questions on, are out there. My 12-year-old son is out there. Nobody has the right to allow that to happen. For me, GDPR is the framework for us to focus on that. >> Dave: Lillian, is there a comment you have? >> Yeah, I think that security concerns are 100% real and definitely a serious issue. Security needs to be addressed. And I think a lot of the stuff that's happening is because we need better security personnel. I think we need better people working in the security area, where they're actually looking and securing. Because I don't think you can regulate it. I wanted to take the microphone back when you were talking about taking someone to jail. Okay, I have a background in law. And if you look at this, you guys are calling it a framework, but it's not a framework. What they're trying to do is take 4% of your business revenues per infraction. They want to say, "If a person signs up on your email list and you didn't, like, necessarily give whatever disclaimer that the EU said you need to give, then per infraction we're going to take 4% of your business revenue." That's a law that they're trying to put into place. And you guys are talking about taking people to jail. What jail? The EU is not a country. What jurisdiction do they have? Like, you're going to take pizza man Joe and put him in the EU jail? Is there an EU jail? Are you going to take them to a UN jail? I mean, just on its face it doesn't hold up to legal tests. I don't understand how they could enforce this. >> I'd like to just answer the question on-- >> Security is a serious issue. I would be extremely upset if I were you. >> I personally know people who work for companies who've had data breaches, and I respect them all. They're really smart people. They've got 25-plus years in security, and they are shocked that they've allowed a breach to take place. What they've invariably all agreed on is that a whole range of drivers have caused them to get to a bad practice. So take, for example, the donate-blood website.
The young person who was the sys admin, with all the right skills and all the right experience, just made a basic mistake. They took a DB dump of a MySQL database before they upgraded their WordPress website for the business, and they happened to leave it in a folder that was indexable by Google. And somebody had written a regular expression to search in Google to find SQL backups. Now, this person, I personally respect them. I think they're an amazing practitioner. They just made a mistake. So what does that bring us back to? It brings us back to the point that we need a safety net, or a framework, or whatever you want to call it, where organizations have checks and balances no matter what they do, whether it's an upgrade, a backup, a modification, you know. And they all think they do, but invariably, as we've seen from the hundreds of thousands of breaches, they don't. Now, on the point of law, we could debate that all day. I mean, the EU does have a remit. If I were caught speeding in Germany, as an Australian, I would be thrown into a German jail. If I got caught as an organization in France breaching GDPR, I would be held accountable to the law in that region, by the organization pursuing me. So I think it's a bit of a misnomer saying I can't go to an EU jail. I don't disagree with you totally, but I think it's regional: if I get a speeding fine for driving fast in the EU, it's in the country, in the region, where I'm caught. And I think GDPR is going to be enforced in that same way. >> All right, folks, unfortunately the 60 minutes flew right by, and it does when you have great guests like yourselves. So thank you very much for joining this panel today. And we have an action-packed day here, so we're going to cut over. The CUBE is going to have its interview format starting in about half an hour, and then we cut over to the main tent. Who's on the main tent? Dez, you're doing a main stage presentation today, Data Science is a Team Sport. Hillary Mason has a breakout session. We also have a breakout session on GDPR and what it means for you. Are you ready for GDPR? Check out ibmgo.com. It's all free content, it's all open. You do have to sign in to see the Hillary Mason and the GDPR sessions. And we'll be back in about half an hour with the CUBE. We'll be running replays all day on SiliconAngle.tv and also ibmgo.com. So thanks for watching, everybody. Keep it right there; we'll be back in about half an hour with the CUBE interviews. We're live from Munich, Germany, at Fast Track Your Data. This is Dave Vellante with Jim Kobielus, we'll see you shortly. (electronic music)
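The breach Dez describes, a database dump left in a web-indexable folder, is the kind of mistake a simple automated check can catch. A hedged sketch of such a safety net in Python; the paths and extensions are assumptions for illustration:

```python
# Sketch of a pre- and post-upgrade safety net: scan the public web
# root for database dumps that a crawler could index and a search
# engine could surface.

import os

RISKY_EXTENSIONS = (".sql", ".sql.gz", ".dump", ".bak")

def exposed_backups(web_root: str) -> list:
    """Return any backup-like files sitting under a public web root."""
    hits = []
    for dirpath, _dirs, files in os.walk(web_root):
        for name in files:
            if name.lower().endswith(RISKY_EXTENSIONS):
                hits.append(os.path.join(dirpath, name))
    return hits

# Wired into a deployment checklist, this fails the upgrade if it
# would leave a dump behind. "/var/www/html" is a typical default.
if exposed_backups("/var/www/html"):
    raise SystemExit("Refusing to continue: database dump in web root")
```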

Published Date : Jun 24 2017

Manish Goyal, IBM - IBM Fast Track Your Data 2017



>> Announcer: Live from Munich, Germany, it's theCUBE, covering IBM Fast Track Your Data, brought to you by IBM. >> We're back in Munich, Germany. This is Fast Track Your Data, and this is theCUBE, the leader in live tech coverage. We go out to the events and extract the signal from the noise. My name is Dave Vellante, and I'm here with my co-host Jim Kobielus. We just came off of the main stage. IBM had a very choreographed, really beautiful production; Kate Silverton was there, of BBC fame, talking to various folks within the IBM community: IBM executives, practitioners. Quite a main stage production, Jim. IBM always knows how to do it right. Manish Goyal here, he's the Director of Product Management for the Watson Data Platform, something we covered on theCUBE extensively, that announcement last year in New York City. Manish, welcome to theCUBE. >> Thank you for having me. >> Dave: So this really was your signature moment, back last fall at Strata in New York City. We covered that: big announcement, lots of customers there. You guys demonstrated sort of the next generation of the platform you were announcing. >> Manish: That's right. >> So take us, bring us up to date. How's it going, where are we at, and what are you guys doing here? >> So, again, thank you for having me. >> Dave: You're welcome. >> Let me take a minute to just let all the viewers know what the Watson Data Platform is all about. So the Watson Data Platform is our cloud analytics platform, and it's really three things. It's a set of composable data services for ingest, analysis, and processing. It's a set of tailor-made experiences for the different personas, whether you are a data engineer, a business analyst, a data scientist, or the steward. And connecting both of these is a data fabric, which is really the secret sauce. And think of this as being the governance layer that ensures that everything that is being done by any of these personas is working on trusted data, and that the insights that are being generated can be trusted by the risk folks, the business folks, as they put the analytics into production. >> Dave: So just to review for our audience, there are a number of components to the Watson Data Platform. >> That's right, yep. >> Dave: There's the governance components you mentioned, there's the visualization, there's analytics. Now, many people criticized the Watson Data Platform; they said, oh, it's just IBM putting a bunch of disparate products together, some acquisitions, and then wrapping some services around it. When we talked to you guys in October, you said no, no, that's not the case. But can you affirm that? >> That is exactly right, that is not the case. It's not just us putting stuff together, calling it a new name, and thinking, oh, that's the platform, just a set of disparate services. That is absolutely not the case, and that's why I was emphasizing this common data fabric, right. Let me sort of dive a little bit deeper into it. >> Sure, great. >> Manish: So the biggest problem that customers and data users in general complain about is that it's extremely hard to find data, right. The tools that they're working with are all siloed. So even if, you know, you and I are working on our analytics projects, it's very hard for me to share with you what I'm working on, the environment that I am running on, et cetera. And the third piece, a real issue, is: is the data that I'm working with trusted?
Like, can I actually believe that this is the best data that I can use, so that when I create my machine learning models and put them into my production environment, the risk guys are going to be fine with it, I'm going to be fine with it, and I see the results that I'm expecting? And so this data fabric is addressing these issues. One, it's addressing them first and foremost with a data catalog, a governance layer, so that it's very clear that, irrespective of whether you're a data engineer, business analyst, data scientist, or the data steward from the CDO's office, you're all working off the same version of the truth, right. >> Jim: Manish, is that something like a DevOps platform, is it like DevOps for data science or for machine learning development, or is it... How would you describe... Does that make sense? The automated release pipeline that's-- >> Manish: In a way, yes. >> With the governance baked in? >> Yes, in a way, that's one way to describe it. So that's one aspect, right? Making sure that you're working with trusted data, making it very easy to find the data; that's sort of the governance aspect. The second piece that really makes this a platform is that you're working off the same notion of a workspace; we call it a project. So you may start out as a data engineer, being asked to take all these different data sources that are coming in and create and publish a data set that can be consumed for dashboarding, for data analysis, whatever. And you're working on that in a project. Now, if you have a data science team that needs to be working on the same thing, you can just invite them to the same project, so they're working on the same thing, and similarly for a business analyst, et cetera. And when we talk about governance, it's not just about data sets; it's all analytical products. So the models that you're creating are being put back into the catalog and governed. Data flows-- >> It's model governance. >> Jim: Model governance, it's model governance? >> Exactly. >> And data governance. >> Manish: So it's a huge problem that customers have. I was just talking to a large insurance company yesterday, and their question was, "What are you doing to make sure that I don't have to spend the enormous amount of time that I currently have to with the risk group before I can put a model into production?" Because they want complete lineage all the way back, saying, "Okay, you created this model, you're going to put it into production, whether it's for credit card insurance, whatever the product is that you're selling. How do you make sure that there's no bias in the model that is created? Can you show me the data set on which you trained it? And then when you retrained it, can you show me that data set?" So in case they're audited, there's a complete way to go back from the production model all the way to the data set that was created, and which goes even further back to all the different data sources: where it was cleansed, et cetera, the ETL, where it was published, and then picked up by the data science team. So all of these things are put together with this data fabric, governance being a huge, huge portion of that, going across everything that we're doing; giving these tailor-made experiences for the different business personas, oh sorry, the data personas; and just making it extremely simple to generate insights that can be trusted.
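The audit trail Manish describes, from a production model back through its training set to the raw sources, amounts to a lineage record attached to every model version. A minimal sketch of such a record in Python; the schema is hypothetical, not the Watson Data Platform's actual catalog API:

```python
# Hypothetical lineage record: each model version points back to the
# exact training data and upstream sources, so an auditor can walk
# from production all the way back to raw data.

import hashlib
import json
import time

def fingerprint(payload: bytes) -> str:
    """Content hash that pins down exactly which training set was used."""
    return hashlib.sha256(payload).hexdigest()

def lineage_record(model_name, model_version, training_data, sources):
    return {
        "model": model_name,
        "version": model_version,
        "trained_at": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
        "training_data_sha256": fingerprint(training_data),
        "upstream_sources": sources,  # e.g. raw feeds and ETL jobs
    }

record = lineage_record(
    "credit-risk", "1.3",
    training_data=b"...serialized training set...",
    sources=["raw/transactions", "etl/cleanse-job-42"],
)
print(json.dumps(record, indent=2))  # this is what goes into the governance catalog
```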
So that is what we are trying to do with the Watson Data Platform. Since last fall, when we announced it, we have had a huge update to our data science experience, and you heard a lot about that in the presentation this morning, as well as to all of our other cloud data services and the governance portfolio. >> Dave: And that data science experience is embedded, fundamental to the platform. >> It is, it is. >> Dave: You know, I want to ask you about that. Because I don't know if you remember, Jim and Manish, several years ago Pivotal announced this thing called Chorus. It was a collaboration platform, and it really went nowhere. Now, part of the reason it went nowhere was because it was early days, but also there wasn't the analytics solution underneath it. But a lot of people questioned, "Well, do we really need to collaborate across those personas?" Again, maybe they were immature at the time. So convince me that there's a need for that, and that this is actually getting used in the world. >> You've probably seen the Venn diagram for a data scientist, right? With all the different skills that they need, they are a unicorn, and there are no unicorns. It's extremely hard for our customers; in fact, just finding really good data scientists is extremely hard. There's a very limited supply of that talent. So that's one thing, right: you can't find enough of these folks to scale out the level of analytics that is needed if you want to use data for a competitive advantage. So that's one aspect, talent being a huge issue. The second aspect is that you really do need specialized skills in data engineering. You don't want your PhD data scientists spending 60% of their time finding and cleansing data. You have folks who really do that well, and you want to enable them to work closely with the data science team. And you really do need business analysts, who are the key to understanding the business problem that needs to be solved, because that's where you always want to start any analytics project: what is it that you're trying to improve, or reduce cost on, or whatever the problem is that you're addressing? And so you really do need a team; it is a team sport. You can't do it without one. Now, if it is a team sport, how are these folks going to collaborate, right? And that is why, in all of our interactions with our customers and their data science teams, they absolutely love the collaboration features that we have put in. We have put in a lot of effort on the data science experience, and the same collaboration features are actually going to extend across the portfolio of these experiences on the data platform. >> And the whole notion of personas is so fundamental to the Watson Data Platform. And I'm wondering, is IBM evolving the range and variety of personas for which you're providing these experiences? And what I mean by that is, for example, we see more and more data science application development projects focusing on chat bots. That involves human conversation; you possibly need another persona, a computational linguist. Or cognitive IoT, like Watson IoT: that's sensors, that's hardware devices, maybe hardware engineers, hardware engineering experiences. What I'm getting at is that data-science-centric projects are increasingly moving from the totally virtual world to being very much embedded in the physical world, and in the world of human-guided, machine-learning-guided conversation.
What are your thoughts about evolving the personas mix? >> So application developers, the persona I actually missed when I was talking about this before, are absolutely central, because almost anything that the data science team is doing is going to, at the end of the day, create models, and the hope is that they're going to be put into a production system. And that job typically is the role of an application developer. Now, Jim, you mentioned that there's a lot of emphasis these days on conversational chat bots. And again, at the end of the day, with data science projects you are in many ways trying to improve the experience that you're giving your customers, or personalizing the experience that you're giving your customers, the celebrity experience that Rob talked about this morning. And there are other personas involved in that sense. So to get a chat bot right, I mean, there is data that you can obviously harvest and use to create the flow and intelligence in a chat bot. But there are elements where you do need a subject matter expert to curate that, to make sure that it doesn't seem robotic, that it does feel genuine. And so there is a role for a subject matter expert, which we sort of group with the business analyst role, or persona. But yes, all of these roles play an important part in putting together the entire package so it just feels seamless. And that's why I come back to saying that it is a team sport, and if you do not enable the teams to work closely together and enhance their productivity, you can't go after all the data that's being generated and all the opportunity that data is presenting. And the prize is to gain a competitive advantage. >> Dave: One of the things, Manish, you demonstrated last fall was a sort of recommendation engine, very personalized. And it was quite a nice demo, and it wasn't a fake demo; from what I understood, it was real data. Can you share with us, in the time we have remaining, just some of your favorite examples of how people are applying the Watson Data Platform and affecting business? >> Manish: Sure, yeah, I'll tell you a couple of examples. So I was actually in London earlier this week, meeting with a customer, and they are using DSX, our data science experience, with a couple of utility companies. One is a water utility company, and the problem that they're trying to solve is that they're supplying water in a hilly area, and they want to optimize the power that they use to run the pumps that pump the water, because it can be very expensive if the pumps are running all the time, et cetera. And so they're using data science experience to optimize when, and how, and how long the pumps need to run, so that the customers are happy with the level of water supply that they're getting and the pressure that they're getting it at, while the utility company is optimizing the expense of actually powering these things. So that's just a recent example that comes to mind. There are others. There's a huge logistics and transportation company that's using data science experience to optimize the refrigeration of the storage units that go all across the globe transporting food and other articles like that: how they can optimize the temperature of the goods that they're transporting, again, to make sure that there's absolutely the minimum amount of wastage in the transportation process.
But at the same time they're optimizing the cost that they incur, because all of that sort of shows up in the end product that you and I buy from retailers. >> Dave: And is there instrumentation in the field involved in that? Is that kind of a semi-IoT example? >> Absolutely, right. So in this case, actually in both of these cases: in one case there are smart meters that are throwing out data every 15 minutes; in the other example, the logistics one, it is data that is coming in almost as a stream. So in one case you can use batch processing, even though it's coming in at 15-minute intervals, to predict what you want to do. In the other case it's streaming data, which you want to analyze as it streams. >> Excellent. Alright, well, exciting times here for you and your group. >> Absolutely. >> Dave: Congratulations on getting the product out and getting it adopted. >> Thank you. >> Glad to see that. And thanks for coming on theCUBE. >> Manish: Thank you. Thanks for having me. >> Alright! >> Dave: Keep it right there, everybody. Jim and I will be back. We're live from Munich, Germany, unscripted, bringing theCUBE to you, bringing Fast Track Your Data. We'll be right back. (techno music)
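The batch-versus-streaming distinction Manish draws, 15-minute meter readings that can be processed in batches versus a logistics feed that has to be analyzed as it arrives, can be illustrated with a small sliding-window sketch in Python; the readings and threshold are invented:

```python
# Sketch of analyzing a stream as it arrives: a rolling mean over the
# last few readings, with an alert when it drifts out of range.

from collections import deque

def rolling_mean(stream, window=4):
    """Yield the mean of the last `window` readings as each one arrives."""
    buf = deque(maxlen=window)
    for reading in stream:
        buf.append(reading)
        yield sum(buf) / len(buf)

# Hypothetical refrigerated-container temperatures arriving as a stream:
temps = [4.0, 4.2, 4.1, 6.8, 7.5, 7.9]
for t, avg in zip(temps, rolling_mean(temps)):
    if avg > 5.0:
        print(f"alert: rolling average {avg:.1f}C after reading {t}C")
```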

Published Date : Jun 24 2017


SENTIMENT ANALYSIS :

ENTITIES

Entity | Category | Confidence
Jim Kobielus | PERSON | 0.99+
Jim | PERSON | 0.99+
Kate Silverton | PERSON | 0.99+
Dave Vellante | PERSON | 0.99+
Dave | PERSON | 0.99+
London | LOCATION | 0.99+
IBM | ORGANIZATION | 0.99+
Munich | LOCATION | 0.99+
Rob | PERSON | 0.99+
Manish | PERSON | 0.99+
October | DATE | 0.99+
third piece | QUANTITY | 0.99+
both | QUANTITY | 0.99+
yesterday | DATE | 0.99+
New York City | LOCATION | 0.99+
60% | QUANTITY | 0.99+
Manish Goyal | PERSON | 0.99+
last year | DATE | 0.99+
last fall | DATE | 0.99+
15 minute | QUANTITY | 0.99+
second piece | QUANTITY | 0.99+
Pivotal | ORGANIZATION | 0.99+
one aspect | QUANTITY | 0.99+
second aspect | QUANTITY | 0.99+
one case | QUANTITY | 0.99+
Munich, Germany | LOCATION | 0.98+
2017 | DATE | 0.98+
earlier this week | DATE | 0.98+
several years ago | DATE | 0.98+
BBC Fame | ORGANIZATION | 0.96+
DevOps | TITLE | 0.96+
one thing | QUANTITY | 0.96+
three things | QUANTITY | 0.96+
first | QUANTITY | 0.96+
One | QUANTITY | 0.94+
this morning | DATE | 0.93+
few years ago | DATE | 0.93+
Strata | ORGANIZATION | 0.92+
Germany | LOCATION | 0.9+
one way | QUANTITY | 0.9+
Watson Data Platform | ORGANIZATION | 0.83+
Watson Data Platform | TITLE | 0.81+
DSX | ORGANIZATION | 0.8+
every 15 minutes | QUANTITY | 0.76+
Chorus | ORGANIZATION | 0.75+
theCUBE | ORGANIZATION | 0.73+
Data Platform | TITLE | 0.68+
CDO | ORGANIZATION | 0.62+
Watson | TITLE | 0.58+
couple | QUANTITY | 0.55+
Watson | ORGANIZATION | 0.49+

Arun Murthy, Hortonworks | DataWorks Summit 2017


 

>> Announcer: Live from San Jose, in the heart of Silicon Valley, it's theCUBE covering DataWorks Summit 2017. Brought to you by Hortonworks. >> Good morning, welcome to theCUBE. We are live at day 2 of the DataWorks Summit, and have had a great day so far, yesterday and today. I'm Lisa Martin with my co-host George Gilbert. George and I are very excited to be joined by a multi-time CUBE alum, the co-founder and VP of Engineering at Hortonworks, Arun Murthy. Hey, Arun. >> Thanks for having me, it's good to be back. >> Great to have you back. So yesterday, great energy at the event. You could see and hear behind us, great energy this morning. One of the things that was really interesting yesterday, besides the IBM announcement, and we'll dig into that, was that we had your CEO on, as well as Rob Thomas from IBM, and Rob said, you know, one of the interesting things over the last five years was that there have been only 10 companies that have beat the S&P 500, have outperformed, in each of the last five years, and those companies have made big bets on data science and machine learning. And as we heard yesterday, these four mega-trends: IoT, cloud, streaming analytics, and now the fourth big leg, data science. Talk to us about what Hortonworks is doing. You've been here from the beginning, as a co-founder as I've mentioned, you've been with Hadoop since it was a little baby. How is Hortonworks evolving to become one of those big users making big bets on helping your customers, and yourselves, leverage machine learning to really drive the business forward? >> Absolutely, a great question. So, you know, if you look at some of the history of Hadoop, it started off with this notion of a data lake, and here I'm talking about the enterprise side of Hadoop, right? I've been working on Hadoop for about 12 years now; the last six of it has been as a vendor selling Hadoop to enterprises. They started off with this notion of a data lake, and as people have adopted that vision of a data lake—you know, you bring all the data in, and now you're starting to get governance and security, and all of that. Obviously, one of the best ways to get value out of the data is the notion of, you know, can you sort of predict what is going to happen in your world, with your customers, and, you know, whatever it is with the data that you already have. So that notion—you know, Rob, our CEO, talks about how we're trying to move from a post-transactional world to a pre-transactional world, and doing the analytics and data science is how we get there. We could talk about—and there are so many applications of it—something like, you know, the demo we did last year of how we're working with a freight company, where we're starting to show them, you know, predictions of which drivers and which routes are going to have issues as they're trying to move, alright? Four years ago we did the same demo, and we would show that this driver had an issue on this route, but now we can actually predict it and let you know to take preventive measures up front. Similarly internally, you know, you can take things from machine learning and log analytics, and so on. We have an internal problem, you know, where we have to test two different versions of HDP itself, and as you can imagine, it's a really, really hard problem.
We have to support 10 operating systems, seven databases—like, if you multiply that matrix, it's, you know, tens of thousands of options. So, if you do all that testing, we now use machine learning internally to look through the logs and kind of predict where the failures were, and help our own software engineers understand where the problems were, right? An extension of that has been, you know, the work we've done in Smartsense, which is a service we offer our enterprise customers. We collect logs from their Hadoop clusters, and then we can actually help them understand where they can either tune their applications or even tune their hardware, right? We have this example I really like where, at a really large enterprise Financial Services client, they had literally, you know, hundreds and thousands of machines on HDP, and using Smartsense we actually found that there were 25 machines which had bad NIC configuration, and we proved to them that by fixing those, we got 30% throughput back on their cluster. At that scale, it's a lot of money, it's a lot of CapEx, it's a lot of OpEx. So, as a company, we apply this to ourselves as much as we try to help our customers adopt it—does that make sense? >> Yeah, let's drill down on that even a little more, because it's pretty easy to understand what's the standard telemetry you would want out of hardware, but as you sort of move up the stack the metrics, I guess, become more custom. So how do you learn, not just from one customer, but from many customers, especially when you can't standardize what you're supposed to pull out of them? >> Yeah, so we're really big believers in, sort of, dogfooding our own stuff, right? So, we talk about the notion of a data lake; we actually run a Smartsense data lake where we get data across, you know, the hundreds of our customers, and we can actually do predictive machine learning on that data in our own data lake. Right? And to your point about how we go up the stack, this is kind of where we feel like we have a natural advantage, because we work on all the layers, whether it's the SQL engine, or the storage engine, or, you know, above and beyond the hardware. So, as we build these models, we understand that we need more, or different, telemetry, right? And we put that back into the product, so the next version of HDP will have the metrics that we wanted. And now we've been doing this for a couple of years, which means we've done three, four, five turns of the crank; obviously something we always get better at, but I feel like, compared to where we were a couple of years ago when Smartsense first came out, it's actually matured quite a lot, from that perspective. >> So, there's a couple different paths you can add to this, which is customers might want, as part of their big data workloads, some non-Hortonworks, you know, services or software when it's on-prem, and then can you also extend this management to the cloud if they want a hybrid setup where, in the not too distant future, the cloud vendor will also be a provider for this type of management. >> So absolutely, in fact it's true today where, you know—Microsoft's a great partner of ours. We work with them to enable Smartsense on HDI, which means we can actually get the same telemetry back, whether you're running the data on an on-prem HDP or you're running this on HDI.
Similarly, we shipped a version of our cloud product, Hortonworks Data Cloud, on Amazon, and again Smartsense is plugged in there, so whether you're on Amazon, or on Microsoft, or on-prem, we get the same telemetry, we get the same data back. If you're a customer using many of these products, we can actually give you that telemetry back. Similarly, if you guys probably know this, you were probably there at the analyst event when we announced the Flex Support subscription, which means that now we can actually take the support subscription you get from Hortonworks, and you can use it on-prem or on the cloud. >> So in terms of transforming, HDP for example, just want to make sure I'm understanding this: you're pulling in data from customers to help evolve the product, and that data can be on-prem, it can be in Microsoft Azure, it can be in AWS? >> Exactly. HDP can be running in any of these; we will actually pull all of them into our data lake, and we do the analytics and then present it back to the customers. So, in our support subscription, the way this works is we do the analytics in our lake, and it pushes it back, in fact, to our support team tickets, and our sales force, and all the support mechanisms. And they get a set of recommendations saying, hey, we know these are the workloads you're running, we see these are the opportunities for you to do better, whether it's tuning the hardware, tuning an application, tuning the software. We sort of send the recommendations back, and the customer can go and say, oh, that makes sense, accept that, and we'll, you know, apply the recommendation for them automatically. Or they can say, maybe I don't want to change my kernel parameters, let's have a conversation. And if the customer, you know, is going through with that, then they can go and change it on their own. We do that sort of back and forth with the customer. >> One thing that just pops into my mind is, we talked a lot yesterday about data governance—are there particular, and also yesterday on stage were >> Arun: With IBM >> Yes exactly. When we think of, you know, really data-intensive industries—retail, financial services, insurance, healthcare, manufacturing—are there particular industries where you're really leveraging this kind of bi-directional approach, because there are no governance restrictions, or maybe I shouldn't say none, but. Give us a sense of which particular industries are really helping to fuel the evolution of the Hortonworks data lake. >> So, I think healthcare is a great example. You know, when we started off this open-source project around Atlas, you know, a couple of years ago, we got a lot of traction in the healthcare and insurance industries. You know, folks like Aetna were actually founding members of that, you know, sort of consortium doing this, right? And we're starting to see them get a lot of leverage out of all of this. Similarly now as we go into, you know, Europe and expand there, things like GDPR are really, really pertinent, right? And, you guys know GDPR is a really big deal. Like, if you're not compliant by, I think it's March of next year, you pay a portion of your revenue as fines. That's, you know, big money for everybody.
So, I think that's why we're really excited about the partnership with IBM, because we feel like the two of us can help a lot of customers, especially in countries that are significantly more highly regulated than the United States, to actually leverage our, sort of, joint portfolio of products. And IBM's been a great contributor to Atlas; they've adopted it wholesale, as you saw, you know, in the announcements yesterday. >> So, you're doing a keynote tomorrow, so give us maybe the top three things—you're giving the keynote on Data Lake 3.0—walk us through the evolution. Data Lakes 1.0, 2.0, 3.0: where are you now, and what can folks expect to hear and see in your keynote? >> Absolutely. So as we've continued to work with customers, and we see the maturity model of customers—you know, initially people stand up a data lake, and then they want, you know, sort of basic security, what it covers, and so on. Now they want governance, and as we go through that journey, clearly our customers are pushing us to help them get more value from the data. It's not just about putting up the data lake, and obviously managing data with governance; it's also about, can you help us, you know, do machine learning, can you help us build other apps, and so on. So, as we look at this, there's a fundamental evolution that, you know, the Hadoop ecosystem had to go through: with the advance of technologies like, you know, Docker, it's really important first to help the customers bring in more than just the workloads which are sort of native to Hadoop. You know, Hadoop started off with MapReduce, obviously Spark's done great, and now we're starting to see technologies like Flink coming, but increasingly, you know, we want to do data science. To mass-market data science—obviously, you know, people want to use Spark, but the mass market is still Python, and R, and so on, right? >> Lisa: Non-native, okay. >> Non-native. Which are not really built for this—you know, these predate Hadoop by a long way, right. So now as we bring these applications in, having technology like Docker is really important, because now we can actually containerize these apps. It's not just about running Spark, you know, running Spark with R, or running Spark with Python, which you can do today. The problem is, in a true multi-tenant governed system, you want not just R, but you want a specific set of libraries for R, right. And the libraries, you know, George wants might be completely different than what I want. And, you know, you can't do a multi-tenant system where you install both of them simultaneously. So Docker is a really elegant solution to problems like those. So now we can actually bring those technologies into a Docker container, so George's Docker containers will not, you know, conflict with mine. And you can actually be off to the races, you know, doing data science. Which is really key for technologies like DSX, right? Because with DSX, if you see, obviously DSX supports Spark with technologies like, you know, Zeppelin, which is a front-end, but they also have Jupyter, which is going to serve the mass-market users of Python and R, right? So we want to make sure there's no friction, whether it's, sort of, the guys using Spark or the guys using R, and equally importantly DSX, you know, on the near-term roadmap will also support things like, you know, the classic IBM portfolio, SPSS and so on.
So bringing all of those things in together, making sure they run with the data in the data lake, and also the compute in the data lake, is really big for us. >> Wow, so it sounds like your keynote's going to be very educational for the folks that are attending tomorrow, so last question for you. One of the themes that occurred in the keynote this morning was sharing a fun fact about the speakers. What's a fun fact about Arun Murthy? >> Great question. I guess, you know, people have been looking for folks with, you know, 10 years of experience on Hadoop. I'm here finally, right? There's not a lot of people, but, you know, it's fun to be one of those people who've worked on this for about 10 years. Obviously, I look forward to working on this for another 10 or 15 more, but it's been an amazing journey. >> Excellent. Well, we thank you again for sharing time with us on theCUBE. You've been watching theCUBE live on day 2 of the DataWorks Summit, hashtag DWS17, with my co-host George Gilbert. I am Lisa Martin. Stick around, we've got great content coming your way.
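Murthy's Smartsense example—mining cluster logs to predict failures—follows a standard text-classification pattern. Below is a minimal sketch in Python with scikit-learn; the log lines, labels, and the choice of TF-IDF plus logistic regression are assumptions for illustration, and say nothing about how Smartsense itself is built.

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    # Toy training set: log lines labeled 1 if they preceded a failure.
    logs = [
        "NIC eth0 link flapping, retransmit rate high",
        "checksum mismatch on block blk_1073741825",
        "heartbeat received from datanode dn-14",
        "block report completed in 120 ms",
        "GC pause 9000 ms on namenode",
        "rolling upgrade step finished successfully",
    ]
    failed = [1, 1, 0, 0, 1, 0]

    # TF-IDF turns each line into a sparse vector; the classifier then learns
    # which tokens correlate with failures.
    model = make_pipeline(TfidfVectorizer(), LogisticRegression())
    model.fit(logs, failed)

    new_lines = ["NIC eth1 link flapping again", "heartbeat received from dn-2"]
    print(model.predict(new_lines))        # e.g. [1 0]
    print(model.predict_proba(new_lines))  # per-line failure probabilities

The same loop closes the feedback cycle he describes: predictions flag suspect machines, engineers confirm or correct them, and the corrected labels retrain the next version of the model.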

Published Date : Jun 14 2017


SENTIMENT ANALYSIS :

ENTITIES

Entity | Category | Confidence
George Gilbert | PERSON | 0.99+
Lisa Martin | PERSON | 0.99+
IBM | ORGANIZATION | 0.99+
Rob | PERSON | 0.99+
Hortonworks | ORGANIZATION | 0.99+
Rob Thomas | PERSON | 0.99+
George | PERSON | 0.99+
Lisa | PERSON | 0.99+
30% | QUANTITY | 0.99+
San Jose | LOCATION | 0.99+
Microsoft | ORGANIZATION | 0.99+
Amazon | ORGANIZATION | 0.99+
25 machines | QUANTITY | 0.99+
10 operating systems | QUANTITY | 0.99+
hundreds | QUANTITY | 0.99+
Arun Murthy | PERSON | 0.99+
Silicon Valley | LOCATION | 0.99+
two | QUANTITY | 0.99+
Aetna | ORGANIZATION | 0.99+
10 years | QUANTITY | 0.99+
Arun | PERSON | 0.99+
today | DATE | 0.99+
Spark | TITLE | 0.99+
yesterday | DATE | 0.99+
AWS | ORGANIZATION | 0.99+
both | QUANTITY | 0.99+
Python | TITLE | 0.99+
last year | DATE | 0.99+
Four years ago | DATE | 0.99+
15 | QUANTITY | 0.99+
tomorrow | DATE | 0.99+
CUBE | ORGANIZATION | 0.99+
three | QUANTITY | 0.99+
DataWorks Summit | EVENT | 0.99+
seven databases | QUANTITY | 0.98+
four | QUANTITY | 0.98+
DataWorks Summit 2017 | EVENT | 0.98+
United States | LOCATION | 0.98+
Dataworks Summit | EVENT | 0.98+
10 | QUANTITY | 0.98+
Europe | LOCATION | 0.97+
10 companies | QUANTITY | 0.97+
One | QUANTITY | 0.97+
one customer | QUANTITY | 0.97+
thousands of machines | QUANTITY | 0.97+
about 10 years | QUANTITY | 0.96+
GDPR | TITLE | 0.96+
Docker | TITLE | 0.96+
Smartsense | ORGANIZATION | 0.96+
about 12 years | QUANTITY | 0.95+
this morning | DATE | 0.95+
each | QUANTITY | 0.95+
two different versions | QUANTITY | 0.95+
five turns | QUANTITY | 0.94+
R | TITLE | 0.93+
four meta-trains | QUANTITY | 0.92+
day 2 | QUANTITY | 0.92+
Data Lakes 1.0 | COMMERCIAL_ITEM | 0.92+
Flink | ORGANIZATION | 0.91+
first | QUANTITY | 0.91+
HDP | ORGANIZATION | 0.91+

Rob Bearden, Hortonworks & Rob Thomas, IBM Analytics - #DataWorks - #theCUBE


 

>> Announcer: Live from San Jose, in the heart of Silicon Valley, it's theCUBE, covering DataWorks Summit 2017, brought to you by Hortonworks. >> Hi, welcome to theCUBE. We are live in San Jose, in the heart of Silicon Valley at the DataWorks Summit, day one. I'm Lisa Martin, with my co-host, George Gilbert. And we're very excited to be talking to two Robs. With Rob squared on the program this morning: Rob Bearden, the CEO of Hortonworks. Welcome, Rob. >> Thank you for having us. >> And Rob Thomas, the VP—GM, rather—of IBM Analytics. So, guys, we just came from this really exciting, high-energy keynote. The laser show was fantastic, but one of the great things, Rob, that you kicked off with was really showing the journey that Hortonworks has been on, and in a really pretty short period of time. Tremendous inertia, and you talked about the four mega-trends that are really driving enterprises to modernize their data architecture: cloud, IoT, streaming data, and the fourth, the next leg of this, data science. Data science, you said, will be the transformational next leg in the journey. Tell our viewers a little bit more about that. What does that mean for Hortonworks and your partnership with IBM? >> Well, what I think IBM and Hortonworks now have the ability to do is to bring all the data together across a connected data platform. The data in motion and the data at rest are now in one common platform, irrespective of the deployment architecture, whether it's on-prem across multiple data centers or deployed in the cloud. And now that we have that large volume of data and access to it, we can begin to drive the analytics end to end as that data moves through each phase of its life cycle. And what really happens now is, now that we have visibility and access to the entire life cycle of the data, we can put a data science framework over that to really understand and learn those patterns—what's the data telling us, what's the pattern behind that. And we can bring simplification to the data science and turn data science actually into a team sport. Allow them to collaborate, allow them to have access to it. And sort of take the black magic out of doing data science with the framework of the tool and the power of DSX on top of the connected data platform. Now we can rapidly advance the insights on the data, and what that really does is drive value really quickly back to the customer. And then we can begin to bring smart applications, via the data science, back into the enterprise. So we can now do things like the connected car in real time, and have the connected car learn as it's moving, through all the patterns. We can now, from a retail standpoint, really get smart and accurate about inventory placement and inventory management. From an industrial standpoint, we know in real time, down to the component, what's happening with the machine, and any failures that may happen, and be able to eliminate downtime. Agriculture, same kind of thing. Healthcare, every industry—financial services, fraud detection, money laundering—the advances that we have, it's all going to be attributable to how machine learning is applied, and the DSX platform is the best platform in the world to do that with. >> And one of the things that I thought was really interesting was that, as we saw enterprises start to embrace Hadoop and big data and say, okay, now this needs to co-exist and inter-operate with our traditional applications, our traditional technologies.
Now you're saying, and seeing, that data science is going to be a strategic business differentiator. You mentioned a number of industries, and there were several of them on stage today. Give us maybe one of your favorite examples of one of your customers leveraging data science and driving a pretty significant advantage for their business. >> Sure. Yeah, well, to step back a little bit, just a little context: only ten companies have outperformed the S&P 500 in each of the last five years. We started looking at what they're doing. Those are companies that have decided data science and machine learning is critical. They've made a big bet on it, and every company needs to be doing that. So a big part of our message today was, kind of, I'd say, open the eyes of everybody to say there is something happening in the market right now. And it can make a huge difference in how you're applying data analytics to improve your business. We announced our first focus on this back in February, and one of our clients that spoke at that event is a company called Argus Healthcare. And Argus has massive amounts of data sitting on a mainframe, and they were looking for how we can unleash that to do better care of patients, better care for our hospital networks, and they did that with data they had in their mainframe. So they brought Data Science Experience and machine learning to their mainframe; that's what they talked about. What Rob and I have announced today is there's another great trove of data in every organization, which is the data inside Hadoop. HDP, the leading distribution for that, is a great place to start. So the use case that I just shared, which is on the mainframe, that's going to apply anywhere there's large amounts of data. And right now there's not a great answer for data science on Hadoop, until today, where Data Science Experience plus HDP brings, I'd say, an elegant approach to it. It makes it a team sport. You can collaborate, you can interact, you can get education right in the platform. So we have the opportunity to create a next generation of data scientists working with data in HDP. That's why we're excited. >> Let me follow up with a question on your intro—in terms of, sort of, the Data Science Experience as this next major building block to extract, or to build on, the value from the data lake: the two companies have different sort of routes to market, especially at IBM, where the industry solutions and global business services teams can actually build semi-custom solutions around this platform, both the data and the Data Science Experience. With Hortonworks, what's your go-to-market motion going to look like, and what are the offerings going to look like to the customer? >> There will be several. You just described a great example: with IBM professional services, they have the ability to take those industry templates and these data science models and instantly bring those to the data. And so as part of our joint go-to-market motion, we'll now be able to partner and bring those templates, bring those models, not only to our customer base, but also, as part of the new sales go-to-market motion, into the white space, in new customer opportunities. And the whole point is, now we can use the enterprise data platforms to bring the data under management in a mission-critical way, and then bring value to it through these kinds of use cases and templates that drive the smart applications into quick time to value.
And just increase that time to value for the customers. >> So, how would you look at the mix changing over time in terms of data scientists working with the data to experiment on model development, and the two hard parts that you talked about, data prep and operationalization? So in other words: custom models, the issue of deploying a model 11 months later because there's no real packaged process for that; and then packaged enterprise apps that are going to bake these models in as part of their functionality, the way Salesforce is starting to do and Workday is starting to do. How does that change over time? >> It'll be a layering effect. So today, we now have the ability, through the connected data platforms, to bring all the data under management in a mission-critical manner, from point of origination through the entire stream till it comes to rest. Now, with the data science through DSX, we can then have that data science framework where—the analogy I would use is, instead of it being a black science of how you do data access, and go through and build the models, and determine what the algorithms are, and how that yields a result—the analogy is you don't have to be a mechanic to drive a car anymore. The common person can drive a car. So now we really open it up to the community of business analysts, who can now participate in and enable data science through collaboration, and then we can take those models and build the smart apps, and evolve the smart apps very rapidly. And we can accelerate that process now through the partnership with IBM, bringing the core domain value drivers that they've already built and dropping those into the DSX environment, and so I think we can accelerate time to value much faster and more efficiently than we've ever been able to do before. >> You mentioned teamwork a number of times, and I'm curious—you also talked about the business analyst—what's the governance like to facilitate business analysts and different lines of business that have particular access? And what is that team composed of? >> Yeah, well, so let's look at what's happening in the big enterprises in the world right now. There are two major things going on. One is everybody's recognizing this is a multi-cloud world. There are multiple public cloud options, and most clients are building a private cloud. They need a way to manage data as a strategic asset across all those multiple cloud environments. The second piece is, we are moving towards what I would call the next-generation data fabric, which is your warehousing capabilities and your database capabilities, married with Hadoop, married with other open source data repositories, and doing that in a seamless fashion. So you need a governance strategy for all of that. And the way I describe governance, simple analogy: we do for data what libraries do for books. Libraries create a catalog of books; they know they have different copies of books, some they archive, but they can access all of the intelligence in the library. That's what we do for data. So when we talk about governance and working together, we're both big supporters of the Atlas project, and that will continue, but the other piece, kind of this point around the enterprise data fabric, is what we're doing with Big SQL. Big SQL is the only 100% ANSI-SQL-compliant SQL engine for data across Hadoop and other repositories.
So we'll be working closely together to help enterprises evolve, in a multi-cloud world, to this enterprise data fabric, and Big SQL's a big capability for that. >> And an immediate example of that is in our EDW optimization suite that we have today: we'll be including Big SQL as the platform to handle the complex-query segment of that. That will go to market almost immediately. >> Follow-up question on the governance: to what extent is it end-to-end governance, meaning from the point of origin through the last mile—you know, where the last mile might be some specialized analytic engine—versus having all the data management capabilities in that fabric? You mentioned operational and analytic, so, like, are customers going to be looking for a provider who can give them sort of end-to-end capabilities on both the governance side and on all the data management capabilities? Is that sort of a critical decision? >> I believe so. I think there are really two use cases for governance. It's either insights or it's compliance. And if your focus is on compliance—something like GDPR, as an example—that's really about the life cycle of data, from when it starts to when it can be disposed of. So for the compliance use case, absolutely. When I say insights as a governance use case, that's really about self-service. The ideal world is you can make your data available to anybody in your organization, knowing that they have the right permissions, that they can access it, and that they can do it in a protected way, and most companies don't have that advantage today. Part of the idea around data science on HDP is, if you've got the right governance framework in place, suddenly you can enable self-service, which means any data scientist or any business analyst can go find and access the data they need. So it's a really key part of delivering on data science, this governance piece. Now, I just talk to clients to understand where they're going: is this about compliance, or is this about insights? Because there's probably a different starting point, but the end game is similar.
>> On the value front, you know, we talk about, and Hortonworks talks about, the fact that this technology can really help a business unlock transformational value across their organization, across lines of business. This conversation, we just talked about a couple of the customer segments, is this a conversation that you're having at the C-suite initially? Where are the business leaders in terms of understanding? We know there's more value here, we probably can open up new business opportunities or are you talking more the data science level? >> Look, it's at different levels. So, data science, machined learning, that is a C-suite topic. A lot of times I'm not sure the audience knows what they're asking for, but they know it's important and they know they need to be doing something. When you go to things like a data architecture, the C-suite discussion there is, I just want to become more productive in how I'm deploying and using technology because my IT budget's probably not going up, if anything it may be going down, so I've got to become a lot more productive and efficient to do that. So it depends on who you're talking to, there's different levels of dialogue. But there's no question in my mind, I've seen, you know, just look at major press Financial Times, Wallstreet Journal last year. CEOs are talking about AI, machine learning, using data as a competitive weapon. It is happening and it's happening right now. What we're doing together, saying how do we make data simple and accessible? How do we make getting there really easy? Because right now it's pretty hard. But we think with the combination of what we're bringing, we make it pretty darn easy. >> So one quick question following up on that, and then I think we're getting close to the end. Which is when the data lakes started out, it was sort of, it seemed like, for many customers a mandate from on high, we need a big data strategy, and that translated into standing up a Hadoop cluster, and that resulted in people realizing that there's a lot to manage there. It sounds like, right now people know machine learning is hot so they need to get data science tools in place, but is there a business capability sort of like the ETL offload was for the initial Hadoop use cases, where you would go to a customer and recommend do this, bite this off as something concrete? >> I'll start and then Rob can comment. Look, the issue's not Hadoop, a lot of clients have started with it. The reason there hasn't been, in some cases, the outcomes they wanted is because just putting data into Hadoop doesn't drive an outcome. What drives an outcome is what do you do with it. How do you change your business process, how do you change what the company's doing with the data, and that's what this is about, it's kind of that next step in the evolution of Hadoop. And that's starting to happen now. It's not happening everywhere, but we think this will start to propel that discussion. Any thoughts you had, Rob? >> Spot on. Data lake was about releasing the constraints of all the silos and being able to bring those together and aggregate that data. And it was the first basis for being able to have a 360 degree or wholistic centralized insight about something and, or pattern, but what then data science does is it actually accelerates those patterns and those lessons learned and the ability to have a much more detailed and higher velocity insight that you can react to much faster, and actually accelerate the business models around this aggregate. 
So it's a foundational approach with Hadoop. And it's then, as I mentioned in the keynote, the data science platforms, machine learning, and AI actually is what is the thing that transformationally opens up and accelerates those insights, so then new models and patterns and applications get built to accelerate value. >> Well, speaking of transformation, thank you both so much for taking time to share your transformation and the big news and the announcements with Hortonworks and IBM this morning. Thank you Rob Bearden, CEO of Hortonworks, Rob Thomas, General Manager of IBM Analytics. I'm Lisa Martin with my co-host, George Gilbert. Stick around. We are live from day one at DataWorks Summit in the heart of Silicon Valley. We'll be right back. (tech music)

Published Date : Jun 13 2017


SENTIMENT ANALYSIS :

ENTITIES

Entity | Category | Confidence
Lisa Martin | PERSON | 0.99+
George Gilbert | PERSON | 0.99+
IBM | ORGANIZATION | 0.99+
Rob Bearden | PERSON | 0.99+
San Jose | LOCATION | 0.99+
Hortonworks | ORGANIZATION | 0.99+
Rob | PERSON | 0.99+
Argus | ORGANIZATION | 0.99+
90% | QUANTITY | 0.99+
Rob Thomas | PERSON | 0.99+
Silicon Valley | LOCATION | 0.99+
IBM Analytics | ORGANIZATION | 0.99+
Tyler | PERSON | 0.99+
February | DATE | 0.99+
two companies | QUANTITY | 0.99+
second piece | QUANTITY | 0.99+
Argus Healthcare | ORGANIZATION | 0.99+
last year | DATE | 0.99+
360 degree | QUANTITY | 0.99+
GDPR | TITLE | 0.99+
one | QUANTITY | 0.99+
Hadoop | TITLE | 0.99+
One | QUANTITY | 0.99+
both | QUANTITY | 0.99+
DataWorks Summit | EVENT | 0.99+
ten companies | QUANTITY | 0.99+
two | QUANTITY | 0.99+
fourth | QUANTITY | 0.99+
today | DATE | 0.99+
two hard parts | QUANTITY | 0.98+
DataWorks Summit 2017 | EVENT | 0.98+
11 months later | DATE | 0.98+
each | QUANTITY | 0.98+
two use cases | QUANTITY | 0.97+
100% | QUANTITY | 0.97+
one quick question | QUANTITY | 0.97+
Segano | ORGANIZATION | 0.97+
SQL | TITLE | 0.96+
four mega-trends | QUANTITY | 0.96+
Big SQL | TITLE | 0.96+
first basis | QUANTITY | 0.94+
one common platform | QUANTITY | 0.94+
two major things | QUANTITY | 0.92+
Robs | PERSON | 0.92+
Wallstreet Journal | ORGANIZATION | 0.92+
Financial Times | ORGANIZATION | 0.92+

Jean Francois Puget, IBM | IBM Machine Learning Launch 2017


 

>> Announcer: Live from New York, it's theCUBE, covering the IBM machine learning launch event. Brought to you by IBM. Now, here are your hosts, Dave Vellante and Stu Miniman. >> Alright, we're back. Jean Francois Puget is here, he's a distinguished engineer for machine learning and optimization at IBM Analytics, and a CUBE alum. Good to see you again. >> Yes. >> Thanks very much for coming on, big day for you guys. >> Jean Francois: Indeed. >> It's like giving birth every time you guys launch one of these products. We saw you a little bit in the analyst meeting, pretty well attended. Give us the highlights from your standpoint. What are the key things that we should be focused on in this announcement? >> For most people, machine learning equals machine learning algorithms. Algorithms—when you look at newspapers or blogs, social media, it's all about algorithms. Our view is that, sure, you need algorithms for machine learning, but you need steps before you run algorithms, and after. Before, you need to get data and transform it, to make it usable for machine learning. And then you run algorithms. These produce models, and then you need to move your models into a production environment. For instance, you use an algorithm to learn from past credit card transaction fraud. You can learn models, patterns, that correspond to fraud. Then you want to use those models, those patterns, in your payment system. And moving from where you run the algorithm to the operational system is a nightmare today, so our value is to automate what you do before you run algorithms, and then what you do after. That's our differentiator. >> Some folks on theCUBE have said, years ago actually, "You know what, algorithms are plentiful." I remember my friend Avi Mehta made the statement, "Algorithms are free. It's what you do with them that matters." >> Exactly. I believe open source has won for machine learning algorithms. The future is with open source, clearly. But it solves only a part of the problem you're facing if you want to put machine learning into action. So, exactly what you said: what you do with the results of the algorithm is key. And open source people don't care much about it, for good reasons. They are focusing on producing the best algorithms. We are focusing on creating value for our customers. It's different. >> You mentioned open source a couple of times—in terms of customer choice, what's your philosophy with regard to the various tooling and platforms for open source? How do you go about selecting which to support? >> Machine learning is fascinating. It's overhyped, maybe, but it's also moving very quickly. Every year there is new cool stuff. Five years ago, nobody spoke about deep learning. Now it's everywhere. Who knows what will happen next year? Our take is to support open source, to support the top open source packages. We don't know which one will win in the future. We don't even know if one will be enough for all needs. We believe one size does not fit all, so our take is to support a curated list of major open source packages. We start with Spark ML for many reasons, but we won't stop at Spark ML. >> Okay, I wonder if we can talk use cases. Two of my favorite—well, let's just start with fraud. Fraud detection has become much, much better over the past certainly 10 years, but it's still not perfect. I don't know if perfection is achievable, but there are a lot of false positives. How will machine learning affect that?
Can we expect as consumers even better fraud detection in more real time? >> If we think of the full life cycle going from data to value, we will provide a better answer. We still use a machine learning algorithm to create models, but a model does not tell you what to do. It will tell you, okay, this credit card transaction coming in has a high probability of being fraud. Or this one has a lower, uh, probability. But then it's up to the designer of the overall application to make decisions. So what we recommend is to use machine learning for prediction, but not only that, and then use, maybe, (murmuring). For instance, if your machine learning model tells you this is a fraud with a high probability, say 90%, and this is a customer you know very well—a 10-year customer you know very well—then you can be confident that it's fraud. Then the next prediction tells you this is a 70% probability, but it's been a customer for only one week. In a week, we don't get to know the customer, so the confidence we can put in the machine learning should be low, and there you will not reject the transaction immediately. Maybe you don't approve it automatically; maybe you send a one-time passcode, or you route it to a separate verification system, but you don't reject it outright. Really, the idea is to use machine learning predictions as yet another input for making decisions. You're making decisions informed by what you could learn from your past. But it's not replacing human decision-making. Our approach at IBM—you don't see IBM speak much about artificial intelligence in general—is that we don't believe we're here to replace humans. We're here to assist humans, so we say augmented intelligence, or assistance. That's the role we see for machine learning. It will give you additional data so that you make better decisions. >> It's not the concept that you object to, it's the term artificial intelligence. It's really machine intelligence, it's not fake. >> I started my career as a PhD in artificial intelligence—I won't say when, but long enough ago. At that time, there were already promises that we'd have Terminator in the next decade, and this and that. And the same happened in the '60s, or just after the '60s. And then there was an AI winter, and we have a risk here of another AI winter, because some people are raising red flags that are not substantiated, I believe. I don't think the technology is here that we can replace human decision-making altogether any time soon, but we can help. We can certainly make some professions more efficient, more productive, with machine learning. >> Having said that, there are a lot of cognitive functions that are getting replaced, maybe not by so-called artificial intelligence, but certainly by machines and automation. >> Yes, so we're automating a number of things, and maybe we won't need to have people do quality checks; we'll just have an automated vision system detect defects. Sure, we're automating more and more, but this is not new, it has been going on for centuries. >> Well, the list evolves. So, what can humans do that machines can't, and how would you expect that to change? >> We're moving away from IBM machine learning, but it is interesting. You know, each time there is a capability that a machine can automate, we basically redefine intelligence to exclude it, so you know. That's what I foresee. >> Yeah, well, robots a while ago, Stu, couldn't climb stairs, and now, look at that. >> Do we feel threatened because a robot can climb a stair faster than us?
Not necessarily. >> No, it doesn't bother us, right. Okay, question? >> Yeah, so I guess, bringing it back down to the solution that we're talking about today: if I'm now doing the analytics, the machine learning, on the mainframe, how do we make sure that we don't overrun and blow out all our MIPS? >> So we are not using the mainframe's base compute system. We recommend using zIIPs, so additional cores, to not overload—it's a very important point. We claim, okay, if you do everything on the mainframe, you can learn from operational data. You don't want to disturb it, and "you don't want to disturb" takes a lot of different meanings. One that you just said: you don't want to slow down your operational processing, because you're going to hurt your business. But you also want to be careful. Say we have a payment system where there is a machine learning model predicting fraud probability as part of the system. You don't want a young, bright data scientist to decide that he has a great idea, a great model, and to push his model into production without asking anyone. So you want to control that. That's why we insist we are providing governance that includes a lot of things, like keeping track of how models were created and from which data sets—so, lineage. We also want to have access control and not allow just anyone to deploy a new model, because we make it easy to deploy. So we want role-based access, and only someone with—well, it depends on the customer, but not everybody can update the production system, and we want to support that. And that's something that differentiates us from open source. Open source developers, they don't care about governance. It's not their problem, but it is our customers' problem, so this solution will come with all the governance and integrity constraints you can expect from us. >> Can you speak to—the first solution's going to be on z/OS—what does the roadmap look like, and what are some of the challenges of rolling this out to other private cloud solutions? >> We are going to ship, this quarter, IBM Machine Learning for Z. It starts with Spark ML as the base open source. This is interesting, but it's not all there is for machine learning. So that's how we start. We're going to add more in the future. Last week we announced we will ship Anaconda, which is a major distribution for the Python ecosystem, and it includes a number of machine learning open source packages. We announced it for next quarter. >> I believe in the press release it said down the road things like TensorFlow are coming, H2O. >> But Anaconda we announced for next quarter, so we will leverage it when it's out. Then indeed, we have a roadmap to include major open source, and the major open source packages are the ones from Anaconda (murmuring), mostly. Key deep learning, so TensorFlow, and probably one or two additional—we're still discussing. One that I'm very keen on, it's called XGBoost, in one word. People don't speak about it in newspapers, but this is what wins all Kaggle competitions. Kaggle is a machine learning competition site. When I say all—all that are not image-recognition competitions. >> Dave: And that was ex-- >> XGBoost, X-G-B-O-O-S-T. >> Dave: XGBoost, okay. >> XGBoost, and it's-- >> Dave: X-ray gamma, right? >> It's really a package. When I say we don't know which package will win: XGBoost was introduced a year ago, or maybe a bit more, but not so long ago, and now, if you have structured data, it is the best choice today.
It's really fast-moving. So, we will support major deep learning packages and major classical machine learning packages, like the ones from Anaconda, or XGBoost. The other thing is we start with Z. We announced in the analyst session that we will have a Power version and a private cloud version, meaning x86, as well. I can't tell you when because it's not firm, but it will come. >> And in public cloud as well, I guess—you've got components in the public cloud today, like the Watson Data Platform, that you've extracted and put here. >> We have extracted part of the Data Science Experience, so we've extracted notebooks and a graphical tool called ModelBuilder from DSX as part of IBM Machine Learning now, and we're going to add more of DSX as we go. But the goal is to really share code and function across private cloud and public cloud. As Rob Thomas defined it, we want private cloud to offer all the features and functionality of public cloud, except that it runs inside a firewall. We are really developing machine learning and Watson machine learning on a common code base. It's an internal open source project. We share code, and then we ship on different platforms. >> I mean, you haven't, just now, used the word hybrid. Every now and then IBM does, but do you see that so-called hybrid use case as viable, or do you see it more that some workloads should run on prem, some should run in the cloud, and maybe they'll never come together? >> Machine learning, you basically have two phases: one is training and the other is scoring. I see people moving training to cloud quite easily, unless there is some regulation about data privacy. Training is a good fit for cloud because usually you need a large computing system, but only for a limited time, so elasticity's great. But then deployment: if you want to score a transaction in CICS, it has to run beside CICS, not in the cloud. If you want to score data on an IoT gateway, you want to score on the gateway, not in a data center. I would say that may not be what people think of first, but what will really drive the split between public cloud, private, and on prem is where you want to apply your machine learning models—where you want to score. For instance, smart watches: they are essentially fitness measurement systems. You want to score your health data on the watch, not on the internet somewhere. >> Right, and in that CICS example that you gave, you'd essentially be bringing the model to the CICS data, is that right? >> Yes, that's what we do. That's the value of machine learning for Z: if you want to score transactions happening on Z, you need to be running on Z. So it's clear, mainframe people, they don't want to hear about public cloud, so they will be the last ones moving. They have their reasons, but they like the mainframe because it is really, really secure and private. >> Dave: Public cloud's a dirty word. >> Yes, yes, for Z users. At least that's what I was told, and I could check with many people. But we know that in general the move is toward public cloud, so we want to help people, depending on where they are in their journey to the cloud. >> You've got one of those, too. Jean Francois, thanks very much for coming on theCUBE, it was really a pleasure having you back. >> Thank you. >> You're welcome. Alright, keep it right there, everybody. We'll be back with our next guest. This is theCUBE, we're live from the Waldorf Astoria. IBM's machine learning announcement, be right back. (electronic keyboard music)
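Puget's core point—a model's fraud probability is one input to a decision, weighed against business context like customer tenure—can be made concrete in a few lines. This Python sketch is a toy: the thresholds, the tenure rule, and the action names are invented for illustration and are not IBM's scoring logic.

    def decide(fraud_prob: float, tenure_days: int) -> str:
        """Combine a model's fraud score with a simple business rule on tenure."""
        if fraud_prob >= 0.90 and tenure_days >= 3650:
            return "reject"    # high score on a 10-year customer: confident it's fraud
        if fraud_prob >= 0.70 and tenure_days < 30:
            return "step-up"   # new customer, low confidence: send a one-time passcode
        if fraud_prob >= 0.70:
            return "review"    # route to a separate verification system
        return "approve"

    print(decide(0.92, tenure_days=3650))  # reject
    print(decide(0.70, tenure_days=7))     # step-up
    print(decide(0.10, tenure_days=7))     # approve

The deployment question he raises is orthogonal: wherever the transaction happens—beside CICS, on a gateway, on a watch—a decision function like this has to run next to the scoring model, which is why the model moves to the data rather than the reverse.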

Published Date : Feb 15 2017


SENTIMENT ANALYSIS :

ENTITIES

Entity | Category | Confidence
Dave | PERSON | 0.99+
Dave Vellante | PERSON | 0.99+
Jean Francois | PERSON | 0.99+
IBM | ORGANIZATION | 0.99+
10-year | QUANTITY | 0.99+
Stu Miniman | PERSON | 0.99+
Avi Mehta | PERSON | 0.99+
New York | LOCATION | 0.99+
Anaconda | ORGANIZATION | 0.99+
70% | QUANTITY | 0.99+
Jean Francois Puget | PERSON | 0.99+
next year | DATE | 0.99+
Two | QUANTITY | 0.99+
Last week | DATE | 0.99+
next quarter | DATE | 0.99+
90% | QUANTITY | 0.99+
Rob Thomas | PERSON | 0.99+
one-time | QUANTITY | 0.99+
today | DATE | 0.99+
Five years ago | DATE | 0.99+
one word | QUANTITY | 0.99+
CICS | ORGANIZATION | 0.99+
Python | TITLE | 0.99+
a year ago | DATE | 0.99+
one | QUANTITY | 0.99+
two | QUANTITY | 0.99+
next decade | DATE | 0.98+
one week | QUANTITY | 0.98+
first solution | QUANTITY | 0.98+
XGBoost | TITLE | 0.98+
a week | QUANTITY | 0.97+
Spark ML | TITLE | 0.97+
'60s | DATE | 0.97+
ModelBuilder | TITLE | 0.96+
one size | QUANTITY | 0.96+
One | QUANTITY | 0.95+
first | QUANTITY | 0.94+
Watson Data Platform | TITLE | 0.93+
each time | QUANTITY | 0.93+
Kaggle | ORGANIZATION | 0.92+
Stu | PERSON | 0.91+
this quarter | DATE | 0.91+
DSX | TITLE | 0.89+
XGBoost | ORGANIZATION | 0.89+
Waldorf Astoria | ORGANIZATION | 0.86+
Spark ML. | TITLE | 0.85+
z/OS | TITLE | 0.82+
years | DATE | 0.8+
centuries | QUANTITY | 0.75+
10 years | QUANTITY | 0.75+
DSX | ORGANIZATION | 0.72+
Terminator | TITLE | 0.64+
XTC69X | TITLE | 0.63+
IBM Machine Learning Launch 2017 | EVENT | 0.63+
couple times | QUANTITY | 0.57+
machine learning | EVENT | 0.56+
X | TITLE | 0.56+
Watson | TITLE | 0.55+
these products | QUANTITY | 0.53+
-G-B | COMMERCIAL_ITEM | 0.53+
H20 | ORGANIZATION | 0.52+
TensorFlow | ORGANIZATION | 0.5+
theCUBE | ORGANIZATION | 0.49+
CUBE | ORGANIZATION | 0.37+