Tripp Partain, HPE and Anthony Rokis, GE Digital - HPE Discover 2017

>> Narrator: Live from Las Vegas, it's theCUBE, covering HPE Discover 2017, brought to you by Hewlett Packard Enterprise.

>> Welcome back, everyone. We are here live in Las Vegas for HPE Discover 2017, exclusive SiliconANGLE theCUBE coverage, our seventh year. I'm John Furrier with my co-host Dave Vellante. Our next guests are Tripp Partain, HPE CTO for GE (General Electric), and Anthony Rokis, VP of Software Engineering at Predix with GE Digital. Guys, welcome back to theCUBE. Good to see you.

>> Thanks guys.

>> Thanks for coming on. Obviously, GE has really been on the front end of IOT. You guys have been doing extremely well and changing over, bringing digital to analog, kind of connecting those worlds. What's your take on this intelligent Edge? You got to love the messaging. You got to love the messaging with HP.

>> God, it's great. I think this is really starting to take off. If you look at our positioning, we really are going after the Edge, right. And with Predix being our forefront in the Predix system, we really believe in the opportunity here. I think, as you heard Meg speak yesterday, the engagement between GE Digital and HPE is getting stronger, and we're finding more and more synergies over time. And both our strategy and their strategy are really starting to line up very nicely, both Edge and computing in general.

>> I had a chance a couple years ago to host a panel with your CEO Jeff Immelt, United Airlines, and a hospital in Chicago, and at that time it was really hardcore, tangible dollars on the line. I mean, we're talking highly instrumented devices and machinery that you guys are in, and there were some significant dollars involved. Just getting the data is a very low-hanging fruit, but big numbers. This is now going mainstream, where everyone's kind of having this awakening moment, Tripp, where it's kind of like, "Hey, we're just going mainstream." So what's next for you guys, as the world starts getting up to speed on IOT, what's next for GE?
What are you guys doing now to go onto the next level? What's that next tier of digital IOT for you guys?

>> Yeah, honestly in my view and GE's view, if you look at what we've done in the past, it's really the foundations getting in place. It's sensor-enabled devices, getting assets. The sensor is more progressive, and that's kind of been the first sort of step, right. Then we get into how do we collect that data? Where I think GE is headed now is the smart analytics. It's the outcomes that are going to drive those big dollars in productivity. It's really getting into the digital industrial revolution area. To date, it's been a lot of the foundation getting in place, and I think that's where you're going to see tremendous growth over time. When you unleash data scientists on a wealth of information, the outcomes and the productivity the world and the economy are going to see are going to be great.

>> I love that quote, with Jeff Immelt. We refer to it all the time. "I went to bed an industrial giant, and woke up, you know, a software company." And so it clearly underscores the transformation. We were talking off-camera about the study that we did many, many years ago. I mean, the numbers are staggering. It's in the trillions. But one of the things that we found was this notion of, and we talk about it all the time and I'd love to get your take on it, the IT and OT. They're not talking to each other. Typically, they're not birds of a feather. What are you seeing, Tripp, in your experience with customers in terms of those organizational, let's start with the IT side and we can talk about the OT side.

>> Yeah, and as we've had our partnership with GE continue to develop, the one thing we've found is we have a lot of similar customers. And these same customers are extremely large customers, but what's interesting is we don't talk to any of the same people.
Right, on the HPE side we tended to talk to the IT teams and data center, and GE would be out on the factory floor or out in the field and more industrial. But in order to really fulfill the IOT promise, the two groups are really having to come together, and I think it's taking time for the messaging to really sort out there. One of the things we're doing, taking advantage of our partnership to help solve the problem, is when we have IT teams come to visit HPE, we bring along GE operational experts to actually talk about the business side of the outcome, so it's not just an IT conversation. We're really intentionally crossing those paths and leveraging our partner in GE to bring that capability to us so we can have a holistic conversation with the customer.

>> So who's in charge here, who's driving the bus? Is it the OT guys, or the IT guys, or somebody above?

>> It's both, there's two drivers.

>> Uh oh.

>> Two hands, four hands on the wheel almost. If you look at the OT side, there's a lot of challenges we're facing where HPE and the IT community are coming to help. For instance, data sovereignty, right. One of the challenges we have is a lot of our companies, our customers, want data sovereignty, and this is where IT has solved that problem for us. On the OT side, we need to figure out how do we store, maintain, analyze that data within a country. And again, that's why we're bringing the IT companies with us and partner to help us.

>> So when a plane flies from Spain, crosses France, Germany, and ends up in Ireland, where is the data? (laughs)

>> Very good question.

>> Well it infringes the data, because there's sort of a data love triangle going on. You've obviously got devices installed, HPE brings equipment, and the customer. So, talk about the conversations that you're having with your customers. I mean, who owns the data? The factory says, "Hey, wait a minute, it's a system. That's my data."
GE obviously has to do predictive maintenance, and same with HPE. There's all this data flowing. What about data, I don't know, ownership or IP, what are those conversations like?

>> Yeah, I can say certainly from the GE side it's always been our stance that the customer owns their data, right. We are running a multi-tenancy environment and a platform. And they own that data. How that data is stored, we can help facilitate, right. We offer Cloud store and a couple other technologies that allow that. But at the end of the day, in a multi-tenant environment, the customer owns that data. And we will facilitate with HPE where that data needs to reside based on the customer's need.

>> So you're not trying in any way to monetize that data? I mean, I'm astounded, why not?

>> I think the monetization really comes in with how you empower the customer to get the value out of the data. In a former life, I worked through the data monetization world, and there is a certain amount of value in the data itself. There's also value in helping the customer determine what their data can offer to them, and the business cases that we're able to jointly present to the customer and the value that that generates still allow us to monetize the process by which we help enable the customer to really bring these data assets together. Really understand areas where they may have seen silos of the data before, but they weren't looking holistically at it, and being able to, in a very timely fashion, correlate between them and then actually see a different answer to a problem, where yes, this meter may be reading 80 and it should be 60, but if I throttle it to 70 and I get 10% more output, it's worth running at 70 because of the benefit on the revenue. So you actually can make trade-offs across certain areas where you weren't able to do that.

>> But Predix is informing models, is it not, I mean.

>> Yeah, I mean at the end of the day, we're taking that data and, for the customer, creating an outcome.
Right, the analytic, the information that we can derive out of it to make a more productive or a more efficient outcome of running an operation, that's where we get the monetization from.

>> If data's the new oil then you need to refine it, was your point about the monetization question. That's interesting because we see the same thing, where if you make the data freely available or you treat it as an asset to the customer, it's how it's monetized in its effect. Or there's the tactical, let's monetize our data. So depending on how you look at it, there's different approaches, right? I mean this is kind of the key thing.

>> Right, and even though this is not the way now, if you follow the history of how other industries have dealt with the data. So I came out of credit services long ago, and it's very common now in the credit services industry for data to be monetized and leveraged, like for credit reports and for that whole banking financial process to take place, but it didn't start that way. So my guess is, as we continue to show value to the customer of their own data, they then start to think, "Wow, but if I could do comparables between my data and industry data that would help me even more." I expect that the customers that today are worried about who owns the data will eventually start asking players like HPE and GE Digital to help them solve that problem. And they'll evolve to that sort of data monetization like a lot of the other industries have.

>> A whole new digital just creates a whole new way to look at things, it's not a linear supply chain anymore, whether relative to data or whatnot, so super cool. Final question for you guys, and I appreciate you coming on theCUBE and sharing your insights. What's next for the partnership with HPE and GE Digital? Obviously, the digital transformation's in full swing, impacting business transformation, impacting the DevOps aspect of Cloud.
All this cool stuff's happening, true private Cloud's on fire, hybrid's the doorway to Multi Cloud. A lot of cool stuff happening, what's next for you guys?

>> Yeah, I think from our side we're really excited about the partnership on the Edge, right. When we start looking at the computing requirements and needs at the Edge, close to the asset, low latency, that's where HPE and GE are really going to start to partner very heavily, and you're going to see a lot more engagement at that level. So I think the Edge is going to be our focal point.

>> Oh absolutely, and I think the uniqueness we bring to the market with our Edgeline converged systems is we're able to do things at the Edge, leveraging GE Predix and then also bringing in other third party partners in conjunction, and now you have enough compute power in the right form factor that can all sit and reside at the Edge, process at the Edge and solve the problems there locally. Doesn't take away from the Cloud aspect, doesn't take away from being able to have a macro view across multiple scenarios. But if I'm on an oil rig in the middle of the North Sea, you know it's going to be very important for me to have everything I need in the right form factor at the lowest power utilization possible and still solve my problems.

>> And can process all the data right there. Guys, we are pushing it to the Edge here. theCUBE goes out to the events, that's the Edge of the action. We'll bring you all the great videos. Thanks for coming on, this is theCUBE live coverage from the Edge at HPE Discover 2017. I'm John Furrier, Dave Vellante. Be right back with more, stay with us. (digital music)

Published Date : Jun 7 2017

Wikibon Research Meeting

>> Dave: The cloud. There you go. I presume that worked.

>> David: Hi there.

>> Dave: Hi David. We had agreed, Peter and I had talked, and we said let's just pick three topics, allocate enough time. Maybe a half hour each, and then maybe a little bit longer if we have the time. Then try and structure it so we can gather some opinions on what it all means. Ultimately the goal is to have an outcome with some research that hits the network. The three topics today: Jim Kobielus is going to present on agile and data science, David Floyer on NVMe over Fabrics, and of course keying off of the Micron news announcement. I think Nick is, is that Nick who just joined? He can contribute to that as well. Then George Gilbert has this concept of digital twin. We'll start with Jim. I guess what I'd suggest is maybe present this in the context of, present a premise or some kind of thesis that you have and maybe the key issues that you see, and then kind of guide the conversation and we'll all chime in.

>> Jim: Sure, sure.

>> Dave: Take it away, Jim.

>> Agile development and team data science. Agile methodology obviously is well-established as a paradigm and as a set of practices in various schools in software development in general. Agile is practiced in data science in terms of development, the pipelines. The overall premise for my piece, first of all, starts off with a core definition of what agile is as a methodology. Self-organizing, cross-functional teams. They sprint toward results in steps that are fast, iterative, incremental, adaptive and so forth. Specifically, the premise here is that agile has already come to data science and is coming even more deeply into the core practice of data science, where data science is done in a team environment.
It's not just unicorns that are producing work on their own, but more to the point, it's teams of specialists that come together, increasingly in co-located environments or in co-located settings, to produce (banging) weekly checkpoints and so forth. That's the basic premise that I've laid out for the piece. The themes. First of all, let me break it out. In terms of how I'm approaching agile in this context, I'm looking at the basic principles of agile. It's really practices that are minimal, modular, incremental, iterative, adaptive, and co-locational. I've laid out how all that maps into how data science is done in the real world right now in terms of tight teams working in an iterative fashion. A couple of issues that I see as regards the adoption and sort of the ramifications of agile in a data science context. One of which is co-location. What we have increasingly are data science teams that are virtual and distributed, where a lot of the functions are handled by statistical modelers and data engineers and subject matter experts and visualization specialists that are working remotely from each other and are using collaborative tools like the tools from the company that I just left. How can agile, the co-location work primer for agile stand up in a world with more of the development team learning deeper and so forth is being done on a scrutiny basis and needs to be by teams of specialists that may be in different cities or different time zones, operating around the clock, to produce brilliant results? Another one of which is that agile seems to be predicated on the notion that you improvise the process as you go, trial and error, which seems to fly in the face of documentation or tidy documentation. Without tidy documentation about how you actually arrived at your results, how can those results be easily reproduced by independent researchers, independent data scientists?
If you don't have well-defined processes for achieving results in a certain data science initiative, it can't be reproduced, which means the results are not terribly scientific. By definition it's not science if you can't reproduce it by independent teams. To the extent that it's all loosey-goosey and improvised and undocumented, it's not reproducible. If it's not reproducible, to what extent should you put credence in the results of a given data science initiative if it's not been documented? Agile seems to fly in the face of reproducibility of data science results. Those are sort of my core themes or core issues that I'm pondering, or will be.

>> Dave: Jim, just a couple questions. You had mentioned, you rattled off a bunch of parameters. You went really fast. One of them was co-location. Can you just review those again? What were they?

>> Sure. They are minimal. The minimum viable product is the basis for agile, meaning a team puts together not a complete monolithic set, but an initial deliverable that can stand alone, provide some value to your stakeholders or users, and then you iteratively build upon that in what I call minimum viable product going forward to pull out more complex applications as needed. The minimum viable product is at the heart of agile, the way it's often looked at. The big question is, what is the minimum viable product in a data science initiative? One way you might approach that is saying that what you're doing, say you're building a predictive model. You're predicting a single scenario, for example whether one specific class of customers might accept one specific class of offers under the constraining circumstances. That's an example of a minimum outcome to be achieved from a data science deliverable. A minimum product that addresses that requirement might be pulling the data from a single source.
We'll need a very simplified feature set of predictive variables, like maybe two or three at the most, to predict customer behavior, and use one very well understood algorithm like linear regression, and do it with just a few lines of programming code in Python or R or whatever, and build some very crisp, simple rules. That's the notion in a data science context of a minimum viable product. That's the foundation of agile. Then there's the notion of modular, which I've implied with minimal viable product. The initial product is the foundation upon which you build modular add-ons. The add-ons might be building out more complex algorithms based on more data sets, using more predictive variables, throwing other algorithms into the initiative like logistic regression or decision trees to do more fine-grained customer segmentation. What I'm giving you is a sense for the modular add-ons and builds onto the initial product that generally get added incrementally in the course of a data science initiative. Then there's this, and I've already used the word, incremental, where each new module that gets built up or each new feature or tweak on the core model gets added onto the initial deliverable in a way that's incremental. Ideally it should all compose ultimately the sum of the useful set of capabilities that deliver a wider range of value. For example, in a data science initiative where it's customer data, you're doing predictive analysis to identify whether customers are likely to accept a given offer. One way to add on incrementally to that core functionality is to embed that capability, for example, in a target marketing application like an outbound marketing application that uses those predictive variables to drive responses inline to, say, an e-commerce front end. Then there's the notion of iterative, and iterative really comes down to checkpoints.
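(Editor's sketch: the minimum viable product Jim describes, one data source, two predictive variables, one well-understood algorithm and a crisp rule, might look roughly like the Python below. The customer fields, the synthetic scores, and the 0.5 cutoff are hypothetical illustrations rather than anything from the discussion, and the regression is a plain least-squares fit written out by hand so the example has no dependencies.)

```python
# Minimum-viable-product sketch: one data source, two features, linear regression.
# All names and numbers are hypothetical illustrations, not from the discussion.

def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix so row operations apply to both sides at once.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back-substitution on the upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        known = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - known) / m[i][i]
    return x


def fit_linear_regression(rows, targets):
    """Ordinary least squares via the normal equations (X^T X) w = X^T y."""
    design = [[1.0] + list(r) for r in rows]  # leading 1.0 is the intercept term
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * t for d, t in zip(design, targets)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict(weights, row):
    """Intercept plus the weighted sum of the feature values."""
    return weights[0] + sum(w * v for w, v in zip(weights[1:], row))


# Hypothetical single-source data: (past_purchases, days_since_last_visit).
customers = [(1, 30), (2, 10), (3, 5), (4, 20), (5, 2), (6, 15)]
# Synthetic, noise-free offer-acceptance scores used only to exercise the fit.
scores = [0.5 + 0.1 * p - 0.01 * d for p, d in customers]

weights = fit_linear_regression(customers, scores)
# A "crisp, simple rule" layered on the model: make the offer above a cutoff.
accept_offer = predict(weights, (4, 3)) > 0.5
```

(A modular add-on in the sense described above would swap `fit_linear_regression` for another algorithm, or add a third feature, while leaving the rule on top unchanged.)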
Regular reviews, stand-ups and checkpoints where the team comes together to review the work in a context of data science. Data science by its very nature is exploratory. It's visualization, it's model building and testing and training. It's iterative scoring and testing and refinement of the underlying model. Maybe on a daily basis, maybe on a weekly basis, maybe ad hoc, but iteration goes on all the time in data science initiatives. Adaptive. Adaptive is all about responding to circumstances. Trial and error. What works, what doesn't work at the level of the clinical approach. It's also in terms of, do we have the right people on this team to deliver on the end results? A data science team might determine midway through that, well, we're trying to build a marketing application, but we don't have the right marketing expertise in our team, maybe we need to tap Joe over there who seems to know a little bit about this particular application we're trying to build and this particular scenario, these particular customers, we're trying to get a good profile of how to reach them. You might adapt by adding, like I said, new data sources, adding on new algorithms, totally changing your approach for feature engineering as you go along. In addition to supervised learning from ground truth, you might add some unsupervised learning algorithms to be able to find patterns in, say, unstructured data sets as you bring those into the picture. What I'm getting at is there's a lot, 10 zillion variables that, for a data science team, you have to add into your overall research plan going forward based on what you're trying to derive from data science, which is insights. They're actionable and ideally repeatable, so that you can embed them in applications. It's just a matter of figuring out what actually helps you, what set of variables and team members and data helps you to achieve the goals of your project. Finally, co-locational.
It's all about the core team needing to be usually in the same physical location, according to the book, how people normally think of agile. The company that I just left is basically doing a massive social engineering exercise, ongoing, about making their marketing and R&D teams a little more agile by co-locating them in different cities like San Francisco and Austin and so forth. The whole notion is that people will collaborate far better if they're not virtual. That's highly controversial, but nonetheless, that's the foundation of agile as it's normally considered. One of my questions, really an open question, is what hard core. You might have a sprawling team that's doing data science, doing various aspects, but what solid core of that team needs to be physically co-located all or most of the time? Is it the statistical modeler and a data engineer alone? The one who stands up the Hadoop cluster and the person who actually does the building and testing of the model? Do the visualization specialists need to be co-located as well? Are other specialties like subject matter experts who have the knowledge in marketing, whatever it is, do they also need to be in the physical location day in, day out, week in and week out to achieve results on these projects? Anyway, so there you go. That's how I sort of appealed the argument of (mumbling).

>> Dave: Okay. I got minimal, modular, incremental, iterative, adaptive, co-locational. What was six again? I'm sorry.

>> Jim: Co-locational.

>> Dave: What was the one before that?

>> Jim: I'm sorry.

>> Dave: Adaptive.

>> Minimal, modular, incremental, iterative, adaptive, and co-locational.

>> Dave: Okay, there were only six. Sorry, I thought it was seven. Good. A couple of questions, then we can get the discussion going here. Of course, you're talking specifically in the context of data science, but some of the questions that I've seen around agile generally are, it's not for everybody, when and where should it be used?
Waterfall still makes sense sometimes. Some of the criticisms I've read, heard, seen, and sometimes experienced with agile are sort of quality issues, I'll call it lack of accountability. I don't know if that's the right terminology. We're going for speed, so as long as we're fast, we checked that box, quality can be sacrificed. Thoughts on that. Where does it fit, and again, understanding specifically you're talking about data science. Does it always fit in data science because it's so new and hip and cool, or like traditional programming environments, is it horses for courses?

>> David: Can I add to that, Dave? It's a great, fundamental question. It seems to me there's two really important aspects of artificial intelligence. The first is the research part of it, which is developing the algorithms, developing the potential data sources that might or might not matter. Then the second is taking that and putting it into production. That is that somewhere along the line, it's saving money, time, etc., and it's integrated with the rest of the organization. The first piece, it seems to me, is like most research projects: the ROI is difficult to predict in a new sort of way. The second piece of actually implementing it is where you're going to make money. Is agile, if you can integrate that with your systems of record, for example, and get automation of many of the aspects that you've researched, is agile the right way of doing it at that stage? How would you bridge the gap between the initial development and then the final instantiation?

>> That's an important concern, David. DevOps, that's a closely related issue but it's not exactly the same scope. As data science and machine learning, let's just net it out. As machine learning and deep learning get embedded in applications, in operations I should say, like in your e-commerce site or whatever it might be, then data science itself becomes an operational function.
The people who continue to iterate those models inline in the operational applications. Really, where it comes down to an operational function, everything that these people do needs to be documented and version controlled and so forth. These people meaning data science professionals. You need documentation. You need accountability. The development of these assets, machine learning and so forth, needs to be compliant. When you look at compliance, algorithmic accountability comes into it, where lawyers will, like e-discovery. They'll subpoena, theoretically, all your algorithms and data and say explain how you arrived at this particular recommendation that you made to grant somebody or not grant somebody a loan or whatever it might be. The transparency of the entire development process is absolutely essential to the data science process downstream and when it's a production application. In many ways, agile by saying, speed's the most important thing. Screw documentation, you can sort of figure that out and that's not as important, that whole ethos, it goes by the wayside. Agile cannot, should not skimp on documentation. Documentation is even more important as data science becomes an operational function. That's one of my concerns.

>> David: I think it seems to me that with the whole rapid idea development, it is difficult to get a combination of that and operational, boring testing, regression testing, etc. The two worlds are very different. The interface between the two is difficult.

>> Everybody does their e-commerce tweaks through A/B testing of different layouts and so forth. A/B testing is fundamentally data science, and so it's an ongoing thing. (static) ... On A/B testing in terms of tweaking. All these channels and all the service flow, systems of engagement and so forth. All this stuff has to be documented, so agile sort of, in many ways, flies in the face of that or potentially compromises the visibility of (garbled) access.

>> David: Right.
If you're thinking about IOT for example, you've got very expensive machines out there in the field where you're trying to optimize throughput and trying to minimize machines breaking, etc. At the Micron event, it was interesting that in Micron's use of different methodologies of putting systems together, they were focusing on the data analysis, etc., to drive greater efficiency through their manufacturing process. Having said that, they need really, really tested algorithms, etc. to make sure there isn't a major (mumbling) or loss of huge amounts of potential revenue if something goes wrong. I'm just interested in how you would create the final product that has to go into production in a very high value chain like an IOT.

>> When you're running, say AI from learning algorithms all the way down to the end points, it gets even trickier than simply documenting the data and feature sets and the algorithms and so forth that were used to build up these models. It also comes down to having to document the entire life cycle in terms of how these algorithms were trained to make the predictors of whatever it is you're trying to do at the edge with a particular algorithm. The whole notion of how are all of these edge-point applications being trained, with what data, at what interval? Are they being retrained on a daily basis, hourly basis, moment by moment basis? All of those are critical concerns to know whether they're making the best automated decisions or actions possible in all scenarios. That's like a black box in terms of the sheer complexity of what needs to be logged to figure out whether the application is doing its job as best as possible. You need a massive log, you need a massive event log from end to end of the IOT to do that right and to provide that visibility ongoing into the performance of these AI-driven edge devices. I don't know anybody who's providing the tool to do it.
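(Editor's sketch: a very small illustration of the kind of training event log Jim describes, recording which edge device was trained, with what data, and at what interval. The record fields, device names and dates are hypothetical; this is not a description of any existing product or of a tool anyone on the call mentioned.)

```python
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone


# Hypothetical record of one training run for a model deployed on an edge device.
@dataclass
class TrainingEvent:
    device_id: str
    model_version: str
    data_source: str
    retrain_interval: str  # e.g. "daily" or "hourly"
    trained_at: str        # ISO-8601 UTC timestamp


def record_event(log, device_id, model_version, data_source, retrain_interval, now=None):
    """Append one training event to an in-memory log and return it as a JSON line."""
    now = now or datetime.now(timezone.utc)
    event = TrainingEvent(device_id, model_version, data_source,
                          retrain_interval, now.isoformat())
    log.append(event)
    return json.dumps(asdict(event), sort_keys=True)


def latest_training(log, device_id):
    """Return the most recent training event for a device, or None if never trained."""
    events = [e for e in log if e.device_id == device_id]
    # ISO-8601 strings in the same timezone sort chronologically.
    return max(events, key=lambda e: e.trained_at) if events else None


log = []
record_event(log, "pump-17", "v1", "vibration-sensor", "daily",
             now=datetime(2017, 6, 1, tzinfo=timezone.utc))
record_event(log, "pump-17", "v2", "vibration-sensor", "hourly",
             now=datetime(2017, 6, 2, tzinfo=timezone.utc))
current = latest_training(log, "pump-17")
```

(A real end-to-end log would also need to capture the feature sets and the decisions each device made; the point here is only that each retraining leaves an auditable record.)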
>> David: If I think about how it's done at the moment, it's obviously far too slow at the moment. At the same time, you've got to have some testing and things like that. It seems to me that you've got a research model on one side and then you need to create a working model from that, which is your production model. That's the one that goes through the testing and everything of that sort. It seems to me that the interface would be that transition from the research model to the working model, that would be critical here, and the working model is obviously a subset and it's going to be optimized for performance, etc. in real time, as opposed to the development model, which can be a lot to do and take half a week to manage if necessary. It seems to me that you've got a different set of business pressures on the working model and a different set of skills as well. I think having one team here doesn't sound right to me. You've got to have a DevOps team who are going to take the working model from the developers and then make sure that it's sound and safe. Especially in a high value IOT area, the level of iteration is not going to be nearly as high as in a lower cost marketing type application. Does that sound sensible?

>> That sounds sensible. In fact in DevOps, the DevOps team would definitely be the ones that handle the continuous training and retraining of the working models on an ongoing basis. That's a core observation.

>> David: Is that the right way of doing it, Jim? It seems to me that the research people would be continuing to adapt from data from a lot of different places, whereas the operational model would be at a specific location with a specific IOT and they wouldn't have necessarily all the data there to do that. I'm not quite sure whether -

>> Dave: Hey guys? Hey guys, hey guys? Can I jump in here?
Interesting discussion, but highly nuanced and I'm struggling to figure out how this turns into a piece or sort of debating some certain specifics that are very kind of weedy. I wonder if we could just reset for a second and come back to sort of what I was trying to get to before which is really the business impact. Should this be applied broadly? Should this be applied specifically? What does it mean if I'm a practitioner? What should I take away from, Jim your premise and your sort of fixed parameters? Should I be implementing this? Why? Where? What's the value to my organization - the value I guess is obvious, but does it fit everywhere? Should it be across the board? Can you address that? >> Neil: Can I jump in here for a second? >> Dave: Please, that would be great. Is that Neil? >> Neil: Neil. I've never been a data scientist, but I was an actuary a long time ago. When the chief actuary came to me and said we need to develop a liability insurance coverage for floating oil rigs in the North Sea, I'm serious, it took a couple of months of research and modeling and so forth. If I had to go to all of those meetings and stand ups in an agile development environment, I probably would have gone postal on the place. I think that there's some confusion about what data science is. It's not a vector. It's not like a Dev Ops situation where you start with something and you go (mumbling). When a data scientist or whatever you want to call them comes up with a model, that model has to be constantly revisited until it's put out of business. It's refined, it's evaluated. It doesn't have an end point like that. The other thing is that a data scientist is typically going to be running multiple projects simultaneously so how in the world are you going to agilize that? 
I think if you look at the data science group, they're probably, I think Nick said this, there are probably groups in there that are doing fewer Dev Ops, software engineering and so forth and you can apply agile techniques to them. The whole data science thing is too squishy for that, in my opinion. >> Jim: Squishy? What do you mean by squishy, Neil? >> Neil: It's not one thing. I think if you try to represent data science as here's a project, we gather data, we work on a model, we test it, and then we put it into production, it doesn't end there. It never ends. It's constantly being revised. >> Yeah, of course. It's akin to application maintenance. The application meaning the model, the algorithm to be fit for purpose has to continually be evaluated, possibly tweaked, always retrained to determine its predictive fit for whatever task it's been assigned. You don't build it once and assume its strong predictive fit forever and ever. You can never assume that. >> Neil: James and I called that adaptive control mechanisms. You put a model out there and you monitor the return you're getting. You talk about AB testing, that's one method of doing it. I think that a data scientist, somebody who really is keyed into the machine learning and all that jazz. I just don't see them as being project oriented. I'll tell you one other thing, I have a son who's a software engineer and he said something to me the other day. He said, "Agile? Agile's dead." I haven't had a chance to find out what he meant by that. I'll get back to you. >> Oh, okay. If you look at - Go ahead. >> Dave: I'm sorry, Neil. Just to clarify, he said agile's dead? Was that what he said? >> Neil: I didn't say it, my son said it. >> Dave: Yeah, yeah, yeah right. >> Neil: No idea what he was talking about. >> Dave: Go ahead, Jim. Sorry. >> If you look at waterfall development in general, for larger projects it's absolutely essential to get requirements nailed down and the functional specifications and all that. 
Where you have some very extensive projects and many moving parts, obviously you need a master plan that it all fits into and waterfall, those checkpoints and so forth, those controls that are built into that methodology are critically important. Within the context of a broad project, some of the assets being built up might be machine learning models and analytics models and so forth so in the context of our broader waterfall oriented software development initiative, you might need to have multiple data science projects spun off within the sub-projects. Each of those would fit into, by itself might be indicated sort of like an exploration task where you have a team doing data visualization, exploration in more of an open-ended fashion because while they're trying to figure out the right set of predictors and the right set of data to be able to build out the right model to deliver the right result. What I'm getting at is that agile approaches might be embedded into broader waterfall oriented development initiatives, agile data science approaches. Fundamentally, data science began and still is predominantly very smart people, PhDs in statistics and math, doing open-ended exploration of complex data looking for non-obvious patterns that you wouldn't be able to find otherwise. Sort of a fishing expedition, a high priced fishing expedition. Kind of a mode of operation as how data science often is conducted in the real world. Looking for that eureka moment when the correlations just jump out at you. There's a lot of that that goes on. A lot of that is very important data science, it's more akin to pure science. What I'm getting at is there might be some role for more structure in waterfall development approaches in projects that have a data science, core data science capability to them. Those are my thoughts. >> Dave: Okay, we probably should move on to the next topic here, but just in closing can we get people to chime in on sort of the bottom line here? 
If you're writing to an audience of data scientists or data scientist wannabes, what's the one piece of advice or a couple of pieces of advice that you would give them? >> First of all, data science is a developer competency. The modern developers are, many of them need to be data scientists or have a strong grounding and understanding of data science, because much of that machine learning and all that is increasingly the core of what software developers are building so you can't not understand data science if you're a modern software developer. You can't understand data science as it (garbled) if you don't understand the need for agile iterative steps within the, because they're looking for the needle in the haystack quite often. The right combination of predictive variables and the right combination of algorithms and the right training regimen in order to get it all fit. It's a new world competency that needs to be mastered if you're a software development professional. >> Dave: Okay, anybody else want to chime in on the bottom line there? >> David: Just my two penny worth is that the key aspect of all the data scientists is to come up with the algorithms and then implement them in a way that is robust and is part of the system as a whole. The return on investment on the data science piece as an insight isn't worth anything until it's actually implemented and put into production of some sort. It seems that second stage of creating the working model is what is the output of your data scientists. >> Yeah, it's the repeatable deployable asset that incorporates the crux of data science which is algorithms that are data driven, statistical algorithms that are data driven. >> Dave: Okay. If there's nothing else, let's close this agenda item out. Is Nick on? Did Nick join us today? Nick, you there? >> Nick: Yeah. >> Dave: Sounds like you're on. Tough to hear you. >> Nick: How's that? >> Dave: Better, but still not great. Okay, we can at least hear you now. 
David, you wanted to present on NVMe over fabric pivoting off the Micron news. What is NVMe over fabric and who gives a fuck? (laughing) >> David: This is Micron, we talked about it last week. This is the Micron announcement. What they announced is NVMe over fabric which, last time we talked about is the ability to create a whole number of nodes. They've tested 250, the architecture will take them to 1,000. 1,000 processors or 1,000 nodes, and be able to access the data on any single node at roughly the same speed. They are quoting 200 microseconds. It's 195 if it's local and it's 200 if it's remote. That is a very, very interesting architecture which is like nothing else that's been announced. >> Participant: David, can I ask a quick question? >> David: Sure. >> Participant: This latency and the node count sounds astonishing. Is Intel not replicating this or challenging in scope with their 3D XPoint? >> David: 3D XPoint, Intel would love to sell that as a key component of this. The 3D XPoint as a storage device is very, very, very expensive. You can replicate most of the function of 3D XPoint at a much lower price point by using a combination of DRAM and protected DRAM and Flash. At the moment, 3D XPoint is a nice to have and there'll be circumstances where they will use it, but at the meeting yesterday, I don't think they, they might have brought it up once. They didn't emphasize it (mumbles) at all as being part of it. >> Participant: To be clear, this means rather than buying Intel servers rounded out with lots of 3D XPoint, you buy Intel servers just with the CPU and then all the Micron niceness for their NVMe and their Interconnect? >> David: Correct. They are still Intel servers. The ones they were displaying yesterday were HP ones, they also used SuperMicro. They want certain characteristics of the chip set that are used, but those are just standard pieces. 
The other parts of the architecture are the Mellanox, the 100 gigabit converged ethernet and using RoCE, which is RDMA over Converged Ethernet. That is the secret sauce which allows you and Mellanox themselves, their cards have a lot of offload of a lot of functionality. That's the secret sauce which allows you to go from any point to any point in 5 microseconds. Then create a transfer and other things. Files are on top of that. >> Participant: David, Another quick question. The latency is incredibly short. >> David: Yep. >> Participant: What happens if, as say an MPP SQL database with 1,000 nodes, what if they have to shuffle a lot of data? What's the throughput? Is it limited by that 100 gig or is that so insanely large that it doesn't matter? >> David: The key is this, that it allows you to move the processing to wherever the data is very, very easily. The principle that will evolve from this architecture is that you know where the data is so don't move the data around, that'll block things up. Move the processing to that particular node or some adjacent node and do the processing as close as possible. That, as an architecture, is a long term goal. Obviously in the short term, you've got to take things as they are. Clearly, a different type of architecture for databases will need to eventually evolve out of this. At the moment, what they're focusing on is big problems which need low latency solutions and using databases as they are and the whole end to end use stack which is a much faster way of doing it. Then over time, they'll adapt new databases, new architectures to really take advantage of it. What they're offering is a POC at the moment. It's in Beta. They had their customers talking about it and they were very complimentary in general about it. They hope to get it into full production this year. There's going to be a host of other people that are doing this. 
I was trying to bottom line this in terms of really what the link is with digital enablement. For me, true digital enablement is enabling any relevant data to be available for processing at the point of business engagement in real time or near real time. That's the definition that this architecture enables. It's a, in my view a potential game changer in that this is an architecture which will allow any data to be available for processing. You don't have to move the data around, you move the processing to that data. >> Is Micron the first to market with this capability, David? NV over Me? NVMe. >> David: Over fabric? Yes. >> Jim: Okay. >> David: Having said that, there are a lot of start ups which have got a significant amount of money and who are coming to market with their own versions. You would expect Dell, HP to be following suit. >> Dave: David? Sorry. Finish your thought and then I have another quick question. >> David: No, no. >> Dave: The principle, and you've helped me understand this many times, going all the way back to Hadoop, bring the application to the data, but when you're using conventional relational databases and you've had it all normalized, you've got to join stuff that might not be co-located. >> David: Yep. That's the whole point about the five microseconds. Now the impact of non co-location, if you have to join stuff or whatever it is, is much, much lower. It's so you can do the logical join, whatever it is, very quickly and very easily across that whole fabric. In terms of processing against that data, then you would choose to move the application to that node because it's much less data to move, that's an optimization of the architecture as opposed to a fundamental design point. You can then optimize about where you run the thing. 
This is an ideal architecture for where I personally see things going which is traditional systems of record which need to be exactly as they've ever been and then alongside it, the artificial intelligence, the systems of understanding, data warehouses, etc. Having that data available in the same space so that you can combine those two elements in real time or in near real time. The advantage of that in terms of business value, digital enablement, and business value is the biggest thing of all. That's a 50% improvement in overall productivity of a company, that's the thing that will drive, in my view, 99% of the business value. >> Dave: Going back just to the join thing, 100 gigs with five microseconds, that's really, really fast, but if you've got petabytes of data on these thousand nodes and you have to do a join, you still got to go through that 100 gig pipe of stuff that's not co-located. >> David: Absolutely. The way you would design that is as you would design any query. You've got a process you would need, a process in front of that which is query optimization to be able to farm all of the independent jobs needed to do in each of the nodes and take the output of that and bring that together. Both the concepts are already there. >> Dave: Like a map. >> David: Yes. That's right. All of the data science is there. You're starting from an architecture which is fundamentally different from the traditional let's get it out architectures that have existed, by removing that huge overhead of going from one to another. >> Dave: Oh, because this goes, it's like a mesh not a ring? >> David: Yes, yes. >> Dave: It's like the high performance compute of this MPI type architecture? >> David: Absolutely. NVMe, by definition is a point to point architecture. RoCE, underneath it is a point to point architecture. Everything is point to point. Yes. >> Dave: Oh, got it. That really does call for a redesign. >> David: Yes, you can take it in steps. 
It'll work as it is and then over time you'll optimize it to take advantage of it more. Does that definition of (mumbling) make sense to you guys? The one I quoted to you? Enabling any relevant data to be available for processing at the point of business engagement, in real time or near real time? That's where you're trying to get to and this is a very powerful enabler of that design. >> Nick: You're emphasizing the network topology, while I kind of thought the heart of the argument was performance. >> David: Could you repeat that? It's very - >> Dave: Let me repeat. Nick's a little light, but I could hear him fine. You're emphasizing the network topology, but Nick's saying his takeaway was the whole idea was the thrust was performance. >> Nick: Correct. >> David: Absolutely. Absolutely. The result of that network topology is a many times improvement in performance of the systems as a whole that you couldn't achieve in any previous architecture. I totally agree. That's what it's about is enabling low latency applications with much, much more data available by being able to break things up in parallel and delivering multiple streams to an end result. Yes. >> Participant: David, let me just ask, if I can play out how databases are designed now, how they can take advantage of it unmodified, but how things could be very, very different once they do take advantage of it which is that today, if you're doing transaction processing, you're pretty much bottle necked on a single node that sort of maintains the fresh cache of shared data and that cache, even if it's in memory, it's associated with shared storage. What you're talking about means because you've got memory speed access to that cache from anywhere, it no longer is tied to a node. That's what allows you to scale out to 1,000 nodes even for transaction processing. That's something we've never really been able to do. 
Then the fact that you have a large memory space means that you no longer optimize for mapping back and forth from disk and disk structures, but you have everything in a memory native structure and you don't go through this thin straw for IO to storage, you go through memory speed IO. That's a big, big - >> David: That's the end point. I agree. That's not here quite yet. It's still IO, so the IO has been improved dramatically, the protocol within the NVMe and the over fabric part of it. The elapsed time has been improved, but it's not yet the same as, for example, the HPE initiative. That's saying you change your architecture, you change your way of processing just in the memory. Everything is assumed to be memory. We're not there yet. 200 microseconds is still a lot, lot slower than the process that - one impact of this architecture is that the amount of data that you can pass through it is enormously higher and therefore, the memory sizes themselves within each node will need to be much, much bigger. There is a real opportunity for architectures which minimize the impact, which hold data coherently across multiple nodes and where there's minimal impact of, no tapping on the shoulder for every byte transferred so you can move large amounts of data into memory and then tell people that it's there and allow it to be shared, for example between the different cores and the GPUs and FPGAs that will be in these processes. There's more to come in terms of the architecture in the future. This is a step along the way, it's not the whole journey. >> Participant: Dave, another question. You just referenced 200 milliseconds or microseconds? >> David: Did I say milliseconds? I meant microseconds. >> Participant: You might have, I might have misheard. Relate that to the five microsecond thing again. >> David: If you have data directly attached to your processor, the access time is 195 microseconds. 
If you need to go to a remote, anywhere else in the thousand nodes, your access time is 200 microseconds. In other words, the additional overhead of that data is five microseconds. >> Participant: That's incredible. >> David: Yes, yes. That is absolutely incredible. That's something that data scientists have been working on for years and years. Okay. That's the reason why you can now do what I talked about which was you can have access from any node to any data within that large amount of nodes. You can have petabytes of data there and you can have access from any single node to any of that data. That, in terms of data enablement, digital enablement, is absolutely amazing. In other words, you don't have to pre put the data that's local in one application in one place. You're allowing an enormous flexibility in how you design systems. That coming back to artificial intelligence, etc. allows you a much, much larger amount of data that you can call on for improving applications. >> Participant: You can explore and train models, huge models, really quickly? >> David: Yes, yes. >> Participant: Apparently that process works better when you have an MPI like mesh than a ring. >> David: If you compare this architecture to the DSSD architecture which was the first entrant into this that EMC bought for a billion dollars, then that one stopped at 40 nodes. Its architecture was very, very proprietary all the way through. This one takes you to 1,000 nodes with much, much lower cost. They believe that the cost of the equivalent DSSD system will be between 10 and 20% of that cost. >> Dave: Can I ask a question about, you mentioned query optimizer. Who develops the query optimizer for the system? >> David: Nobody does yet. >> Jim: The DBMS vendor would have to re-write theirs with a whole different pensive cost. >> Dave: So we would have an optimizer database system? >> David: Who's asking a question, I'm sorry. I don't recognize the voice. >> Dave: That was Neil. 
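For readers following the numbers, the arithmetic behind the quoted latencies can be sketched as below. The 195 and 200 microsecond figures are the ones David quotes in the discussion; the function names are illustrative assumptions, and nothing here is a measurement.

```python
# Figures as quoted in the discussion (not independently measured).
LOCAL_ACCESS_US = 195.0    # data directly attached to the processor
REMOTE_ACCESS_US = 200.0   # data on any other node in the fabric


def fabric_overhead_us(local_us: float, remote_us: float) -> float:
    """Extra time added by going across the fabric instead of staying local."""
    return remote_us - local_us


def relative_overhead(local_us: float, remote_us: float) -> float:
    """The same overhead expressed as a fraction of the local access time."""
    return (remote_us - local_us) / local_us


overhead = fabric_overhead_us(LOCAL_ACCESS_US, REMOTE_ACCESS_US)
ratio = relative_overhead(LOCAL_ACCESS_US, REMOTE_ACCESS_US)
print(f"{overhead:.0f} us extra, {ratio:.1%} slower than local")
# prints "5 us extra, 2.6% slower than local"
```

On those quoted figures, remote data costs roughly 2.6% more than local data, which is why the discussion treats where the data sits as nearly irrelevant to access time.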
Hold on one second, David. Hold on one second. Go ahead Nick. You talk about translation. >> Nick: ... On a network. It's SAN. It happens to be very low latency and very high throughput, but it's just a storage sub-system. >> David: Yep. Yep. It's a storage sub-system. It's called a server SAN. That's what we've been talking about for a long time is you need the same characteristics which is that you can get at all the data, but you need to be able to get at it in compute time as opposed to taking a stroll down the road time. >> Dave: Architecturally it's a SAN without an array controller? >> David: Exactly. Yeah, the array controller is software from a company called Xcellate, what was the name of it? I can't remember now. Say it again. >> Nick: Excelero or Exceleron? >> David: Excelero. That's the company that has produced the software for the data services, etc. >> Dave: Let's, as we sort of wind down this segment, let's talk about the business impact again. We're talking about different ways potentially to develop applications. There's an ecosystem requirement here it sounds like, from the ISVs to support this and other developers. It's the final, portends the elimination of the last electromechanical device in computing which has implications for a lot of things. Performance value, application development, application capability. Maybe you could talk about that a little bit again thinking in terms of how practitioners should look at this. What are the actions that they should be taking and what kinds of plans should they be making in their strategies? >> David: I thought Neil's comment last week was very perceptive which is, you wouldn't start with people like me who have been imbued with the 100 database call limits for umpteen years. You'd start with people, millennials, or sub-millennials or whatever you want to call them, who can take a completely fresh view of how you would exploit this type of architecture. 
Fundamentally you will be able to get through 10 or 100 times more data in real time than you can with today's systems. There's two parts of that data as I said before. The traditional systems of record that need to be updated, and then a whole host of applications that will allow you to do processes which are either not possible, or very slow today. To give one simple example, if you want to do real time changing of pricing based on availability of your supply chain, based on what you've got in stock, based on the delivery capabilities, that's a very, very complex problem. The optimization of all these different things and there are many others that you could include in that. This will give you the ability to automate that process and optimize that process in real time as part of the systems of record and update everything together. That, in terms of business value is extracting a huge number of people who previously would be involved in that chain, reducing their involvement significantly and making the company itself far more agile, far more responsive to change in the marketplace. That's just one example, you can think of hundreds for every marketplace where the application now becomes the systems of record, augmented by AI and huge amounts more data can improve the productivity of an organization and the agility of an organization in the marketplace. >> This is a godsend for AI. AI, the draw of AI is all this training data. If you could just move that in memory speed to the application in real time, it makes the applications much sharper and more (mumbling). >> David: Absolutely. >> Participant: How long David, would it take for the cloud vendors to not just offer some instances of this, but essentially to retool their infrastructure. (laughing) >> David: This is, to me a disruption and a half. The people who can be first to market in this are the SaaS vendors who can take their applications or new SaaS vendors. ISV. Sorry, say that again, sorry. 
>> Participant: The SaaS vendors who have their own infrastructure? >> David: Yes, but it's not going to be long before the AWSes and Microsofts put this in their tool bag. The SaaS vendors have the greatest capability of making this change in the shortest possible time. To me, that's one area where we're going to see results. Make no mistake about it, this is a big change and at the Micron conference, I can't remember what the guy's name was, he said it takes two Olympics for people to start adopting things for real. I think that's going to be shorter than two Olympics, but it's going to be quite a slow process for pushing this out. It's radically different and a lot of the traditional ways of doing things are going to be affected. My view is that SaaS is going to be the first and then there are going to be individual companies that solve the problems themselves. Large companies, even small companies that put in systems of this sort and then use it to outperform the marketplace in a significant way. Particularly in the finance area and particularly in other data-intensive areas. That's my two pennies worth. Anybody want to add anything else? Any other thoughts? >> Dave: Let's wrap some final thoughts on this one. >> Participant: Big deal for big data. >> David: Like it, like it. >> Participant: It's actually more than that because there used to be a major trade off between big data and fast data. Latency and throughput and this starts to push some of those boundaries out so that you sort of can have both at once. >> Dave: Okay, good. Big deal for big data and fast data. >> David: Yeah, I like it. >> Dave: George, you want to talk about digital twins? I remember when you first sort of introduced this, I was like, "Huh? What's a digital twin? That's an interesting name." I guess, I'm not sure you coined it, but why don't you tell us what digital twin is and why it's relevant. >> George: All right. GE coined it. 
I'm going to, at a high level talk about what it is, why it's important, and a little bit about as much as we can tell, how it's likely to start playing out and a little bit on the differences of the different vendors who are going after it. As far as sort of defining it, I'm cribbing a little bit from a report that's just in the edit process. It's a data representation, this is important, or a model of a product, process, service, customer, supplier. It's not just an industrial device. It can be any entity involved in the business. This is a refinement sort of Peter helped with. The reason it's any entity is because there is, it can represent the structure and behavior, not just of a machine tool or a jet engine, but a business process like sales order process when you see it on a screen and its workflow. That's a digital twin of what used to be a physical process. It applies to both the devices and assets and processes because when you can model them, you can integrate them within a business process and improve that process. Going back to something that's more physical so I can do a more concrete definition, you might take a device like a robotic machine tool and the idea is that the twin captures the structure and the behavior across its lifecycle. As it's designed, as it's built, tested, deployed, operated, and serviced. I don't know if you all know the myth of, in the Greek Gods, one of the Goddesses sprang fully formed from the forehead of Zeus. I forgot who it was. The point of that is digital twin is not going to spring fully formed from any developer's head. Getting to the level of fidelity I just described is a journey and a long one. Maybe a decade or more because it's difficult. You have to integrate a lot of data from different systems and you have to add structure and behavior for stuff that's not captured anywhere and may not be captured anywhere. 
Just for example, CAD data might have design information, manufacturing information might come from there or another system. CRM data might have support information. Maintenance repair and overhaul applications might have information on how it's serviced. Then you also connect the physical version with the digital version with essentially telemetry data that says how it's been operating over time. That sort of helps define its behavior so you can manipulate that and predict things or simulate things that you couldn't do with just the physical version. >> You have to think about combined with say 3D printers, you could create a hot physical back up of some malfunctioning thing in the field because you have the entire design, you have the entire history of its behavior and its current state before it went kablooey. Conceivably, it can be fabricated on the fly and reconstituted as a physical object from the digital twin that was maintained. >> George: Yes, you know what actually that raises a good point which is that the behavior that was represented in the telemetry helps the designer simulate a better version for the next version. Just what you're saying. Then with 3D printing, you can either make a prototype or another instance. Some of the printers are getting sophisticated enough to punch out better versions or parts for better versions. That's a really good point. There's one thing that has to hold all this stuff together which is really kind of difficult, which is challenging technology. IBM calls it a knowledge graph. It's pretty much in anyone's version. They might not call it a knowledge graph. In a graph, instead of a tree where you have a parent and then children and then the children have more children, many things can relate to many things. The reason I point that out is that puts a holistic structure over all these disparate sources of data and behavior. You essentially talk to the graph, sort of like with Arnold, talk to the hand. 
That didn't, I got crickets. (laughing) Let me give you guys the, I put a definitions table in this doc. I had a couple things. Data models. These are some important terms. Data model represents the structure but not the behavior of the digital twin. The API represents the behavior of the digital twin and it should conform to the data model for maximum developer usability. Jim, jump in anywhere where you feel like you want to correct or refine. The object model is a combination of the data model and API. You were going to say something? >> Jim: No, I wasn't. >> George: Okay. The object model ultimately is the digital twin. Another way of looking at it, defining the structure and behavior. This sounds like one of these SAT words, the canonical model. It's a generic version of the digital twin or really the one where you're going to have a representation that doesn't have customer specific extensions. This is important because the way these things are getting built today is mostly custom bespoke and so if you want to be able to reuse work. If someone's building this for you like a system integrator, you want to be able to, or they want to be able to reuse this on the next engagement and you want to be able to take the benefit of what they've learned on the next engagement back to you. There has to be this canonical model that doesn't break every time you essentially add new capabilities. It doesn't break your existing stuff. Knowledge graph again is this thing that holds together all the pieces and makes them look like one coherent whole. I'll get to, I talked briefly about network compatibility and I'll get to level of detail. Let me go back to, I'm sort of doing this from crib notes. We talked about telemetry which is sort of combining the physical and the twin. Again, telemetry's really important because this is like the time series database. It says, this is all the stuff that was going on over time. 
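The three terms defined above (data model for structure, API for behavior, object model as the combination) can be illustrated with a very small sketch. This is only an illustration under assumed names: `MachineToolData`, `MachineToolTwin`, and the vibration telemetry fields are hypothetical and do not come from any vendor's digital twin product.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class MachineToolData:
    """Data model: the structure of the twin, with no behavior."""
    serial_number: str
    model_name: str
    # Telemetry as a simple time series of (timestamp_s, vibration_mm_s) readings.
    telemetry: List[Tuple[float, float]] = field(default_factory=list)


class MachineToolTwin:
    """Object model: the data model plus an API that supplies the behavior."""

    def __init__(self, data: MachineToolData) -> None:
        self.data = data

    def record_vibration(self, timestamp_s: float, vibration_mm_s: float) -> None:
        """API call that appends one telemetry reading to the twin."""
        self.data.telemetry.append((timestamp_s, vibration_mm_s))

    def peak_vibration(self) -> Optional[float]:
        """Highest vibration seen so far, or None if no telemetry has arrived."""
        return max((v for _, v in self.data.telemetry), default=None)


twin = MachineToolTwin(MachineToolData(serial_number="SN-001", model_name="demo-mill"))
twin.record_vibration(0.0, 1.5)
twin.record_vibration(60.0, 4.2)
print(twin.peak_vibration())
```

A canonical model in this picture would be the version of `MachineToolData` and its API with no customer-specific fields, so that extensions can be added without breaking existing users.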
Then you can look at telemetry data that tells you, we got a dirty power spike and after three of those, this machine sort of started vibrating. That's part of how you're looking to learn about its behavior over time. In that process, models get better and better about predicting and enabling you to optimize their behavior and the business process with which it integrates. I'll give some examples of that. Twins, these digital twins can themselves be composed in levels of detail. I think I used the example of a robotic machine tool. Then you might have a bunch of machine tools on an assembly line and then you might have a bunch of assembly lines in a factory. As you start modeling, not just the single instance, but the collections at higher and higher levels of abstraction, or levels of detail, you get a richer and richer way to model the behavior of your business. More and more of your business. Again, it's not just the assets, but it's some of the processes. Let me now talk a little bit about how the continual improvement works. As Jim was talking about, we have data feedback loops in our machine learning models. Once you have a good quality digital twin in place, you get the benefit of increasing returns from the data feedback loops. In other words, if you can get to a better starting point than your competitor and then you get on the increasing returns of the data feedback loops, that is improving the fidelity of the digital twins now faster than your competitor. For one twin, I'll talk about how you want to make the whole ecosystem of twins sort of self-reinforcing. I'll get to that in a sec. There's another point to make about these data feedback loops which is traditional apps, and this came up with Jim and Neil, traditional apps are static. You want upgrades, you get stuff from the vendor. 
With digital twins, they're always learning from the customer's data and that has implications when the partner or vendor who helped build it for a customer takes learnings from the customer and goes to a similar customer for another engagement. I'll talk about the implications from that. This is important because it's half packaged application and half bespoke. The fact that you don't have to take the customer's data, but your model learns from the data. Think of it as, I'm not going to take your coffee beans, your data, but I'm going to run or make coffee from your beans and I'm going to take that to the next engagement with another customer who could be your competitor. In other words, you're extracting all the value from the data and that helps modify the behavior of the model and the next guy gets the benefit of it. Dave, this is the stuff where IBM keeps saying, we don't take your data. You're right, but you're taking the juice you squeezed out of it. That's one of my next reports. >> Dave: It's interesting, George. Their contention is, they uniquely, unlike Amazon and Google, don't swap spit, your spit with their competitors. >> George: That's misleading. To say Amazon and Google, those guys aren't building digital twins. Parametric Technology is. I've got this definitely from a Parametric technical fellow at an AWS event last week, which is they, not only don't use the data, they don't use the structure of the twin either from engagement to engagement. That's a big difference from IBM. I have a quote, Chris O'Connor from IBM Munich saying, "We'll take the data model, but we won't take the data." I'm like, so you take the coffee from the beans even if you don't take the beans? I'm going to be very specific about saying that saying you don't do what Google and Facebook do, what they do, it's misleading. >> Dave: My only caution there is do some more vetting and checking. 
A lot of times what some guy says on a Cube interview, he or she doesn't even know, in my experience. Make sure you validate that. >> George: I'll send it to them for feedback, but it wasn't just him. I got it from the CTO of the IOT division as well. >> Dave: When you were in Munich? >> George: This wasn't on the Cube either. This was by the side of, at the coffee table during our break. >> Dave: I understand and CTOs in theory should know. I can't tell you how many times I've gotten a definitive answer from a pretty senior level person and it turns out it was, either they weren't listening to me or they didn't know or they were just yessing me or whatever. Just be really careful and make sure you do your background checks. >> George: I will. I think the key is leave them room to provide a nuanced answer. It's more about being really, really, really concrete about really specific edge conditions and say do you or don't you. >> Dave: This is a pretty big one. If I'm a CIO, a chief digital officer, a chief data officer, COO, head of IT, head of data science, what should I be doing in this regard? What's the advice? >> George: Okay, can I go through a few more or are we out of time? >> Dave: No, we have time. >> George: Let me do a couple more points. I talked about training a single twin or an instance of a twin and I talked about the acceleration of the learning curve. There's edge analytics, David has educated us with the help of looking at GE Predix. David, you have been talking about this for a long time. You want edge analytics to inform or automate a low latency decision and so this is where you're going to have to run some amount of analytics. Right near the device. Although I got to mention, hopefully this will elicit a chuckle. When you get some vendors telling you what their edge and cloud strategies are. MapR said, we'll have a Hadoop cluster that only needs four or five nodes as our edge device. And we'll need five admins to care and feed it. 
He didn't say the last part, but that obviously isn't going to work. The edge analytics could be things like recalibrating the machine for different tolerance. If it's seeing that it's getting out of the tolerance window or something like that. The cloud, and this is old news for anyone who's been around David, but you're going to have a lot of data, not all of it, but going back to the cloud to train both the instances of each robotic machine tool and the master of that machine tool. The reason is, an instance would be oh I'm operating in a high humidity environment, something like that. Another one would be operating where there's a lot of sand or something that screws up the behavior. Then the master might be something that has behavior that's sort of common to all of them. It's when the training, the training will take place on the instances and the master and will in all likelihood push down versions of each. Next to the physical device process, whatever, you'll have the instance one and a class one and between the two of them, they should give you the optimal view of behavior and the ability to simulate to improve things. It's worth mentioning, again as David found out, not by talking to GE, but by accidentally looking at their documentation, their whole positioning of edge versus cloud is a little bit hand waving and in talking to the guys from ThingWorx which is a division of what used to be called Parametric Technology which is just PTC, it appears that they're negotiating with GE to give them the orchestration and distributed database technology that GE can't build itself. I've heard also from two ISVs, one a major one and one a minor one who are both in the IOT ecosystem one who's part of the GE ecosystem that Predix is a mess. It's analysis paralysis. It's not that they don't have talent, it's just that they're not getting shit done. 
Anyway, the key thing now is when you get all this - >> David: Just from what I learned when I went to the GE event recently, they're aware of their requirement. They've actually already got some sub parts of the predix which they can put in the cloud, but there needs to be more of it and they're aware of that. >> George: As usual, just another reason I need a red phone hotline to David for any and all questions I have. >> David: Flattery will get you everywhere. >> George: All right. One of the key takeaways, not the action item, but the takeaway for a customer is when you get these data feedback loops reinforcing each other, the instances of say the robotic machine tools to the master, then the instance to the assembly line to the factory, when all that is being orchestrated and all the data is continually enhancing the models as well as the manual process of adding contextual information or new levels of structure, this is when you're on increasing returns sort of curve that really contributes to sustaining competitive advantage. Remember, think of how when Google started off on search, it wasn't just their algorithm, but it was collecting data about which links you picked, in which order and how long you were there that helped them reinforce the search rankings. They got so far ahead of everyone else that even if others had those algorithms, they didn't have that data to help refine the rankings. You get this same process going when you essentially have your ecosystem of learning models across the enterprise sort of all orchestrating. This sounds like motherhood and apple pie and there's going to be a lot of challenges to getting there and I haven't gotten all the warts of having gone through, talked to a lot of customers who've gotten the arrows in the back, but that's the theoretical, really cool end point or position where the entire company becomes a learning organization from these feedback loops. 
I want to, now that we're in the edit process on the overall digital twin, I do want to do a follow up on IBM's approach. Hopefully we can do it both as a report and then as a version that's for Silicon Angle because that thing I wrote on Cloudera got the immediate attention of Cloudera and Amazon and hopefully we can both provide client proprietary value add, but also the public impact stuff. That's my high level. >> This is fascinating. If you're the Chief of Data Science for example, in a large industrial company, having the ability to compile digital twins of all your edge devices can be extraordinarily valuable because then you can use that data to do more fine-grained segmentation of the different types of edges based on their behavior and their state under various scenarios. Basically then your team of data scientists can then begin to identify the extent to which they need to write different machine learning models that are tuned to the specific requirements or status or behavior of different end points. What I'm getting at is ultimately, you're going to have 10 zillion different categories of edge devices performing in various scenarios. They're going to be driven by an equal variety of machine learning, deep learning AI and all that. All that has to be built up by your data science team in some coherent architecture where there might be a common canonical template that all devices will, all the algorithms and so forth on those devices are being built from. Each of those algorithms will then be tweaked to the specific digital twins profile of each device is what I'm getting at. 
>> George: That's a great point that I didn't bring up which is folks who remember object oriented programming, not that I ever was able to write a single line of code, but the idea, go into this robotic machine tool, you can inherit a couple of essentially component objects that can also be used in slightly different models, but let's say in this machine tool, there's a model for a spinning device, I forget what it's called. Like a drive shaft. That drive shaft can be in other things as well. Eventually you can compose these twins, even instances of a twin with essentially component models themselves. ThingWorx does this. I don't know if GE does this. I don't think IBM does. The interesting thing about IBM is, their go to market really influences their approach to this which is they have this huge industry solutions group and then obviously the global business services group. These guys are all custom development and domain experts so they'll go into, they're literally working with Airbus and with the goal of building a model of a particular airliner. Right now I think they're doing the de-icing subsystem, I don't even remember on which model. In other words they're helping to create this bespoke thing and so that's what actually gets them into trouble with potentially channel conflict or maybe it's more competitor conflict because Airbus is not going to be happy if they take their learnings and go work with Boeing next. Whereas with PTC and ThingWorx, at least their professional services arm, they treat this much more like the implementation of a packaged software product and all the learnings stay with the customer. >> Very good. >> Dave: I got a question, George. 
In terms of the industrial design and engineering aspect of building products, you mentioned PTC which has been in the CAD business and the engineering business for software for 50 years, and Ansys and folks like that who do the simulation of industrial products or any kind of a product that gets built. Is there a natural starting point for digital twin coming out of that area? That would be the vice president of engineering would be the guy that would be a key target for this kind of thinking. >> George: Great point. This is, I think PTC is closely aligned with Teradata and their attitude is, hey if it's not captured in the CAD tool, then you're just hand waving because you won't have a high fidelity twin. >> Dave: Yeah, it's a logical starting point for any mechanical kind of device. What's a thing built to do and what's it built like? >> George: Yeah, but if it's something that was designed in a CAD tool, yes, but if it's something that was not, then you start having to build it up in a different way. I think, I'm trying to remember, but IBM did not look like they had something that was definitely oriented around CAD. Theirs looked like it was more where the knowledge graph was the core glue that pulled all the structure and behavior together. Again, that was a reflection of their product line which doesn't have a CAD tool and the fact that they're doing these really, really, really bespoke twins. >> Dave: I'm thinking that it strikes me that from the industrial design and engineering area, it's really the individual product is really the focus. That's one part of the map. The dynamic you're pointing at, there's lots of other elements of the map in terms of an operational, a business process. That might be the fleet of wind turbines or the fleet of trucks. How they behave collectively. There's lots of different entry points. 
I'm just trying to grapple with, doesn't the CAD area, the engineering area at least for hard products, have an obvious starting point for users to begin to look at this. The VP of Engineering needs to be on top of this stuff. >> George: That's a great point that I didn't bring up which is, a guy at Microsoft who was their CTO in their IT organization gave me an example which was, you have a pipeline that's 1,000 miles long. It's got 10,000 valves in it, but you're not capturing the CAD design of the valve, you just put a really simple model that measures pressure, temperature, and leakage or something. You string 10,000 of those together into an overall model of the pipeline. That is a low fidelity thing, but that's all they need to start with. Then they can see when they're doing maintenance or when the flow through is higher or what the impact is on each of the different valves or flanges or whatever. It doesn't always have to start with super high fidelity. It depends on what you're optimizing for. >> Dave: It's funny. I had a conversation years ago with a guy, the engineering, MacNeal-Schwendler, if you remember those folks. He was telling us about 30 to 40 years ago when they were doing computational fluid dynamics, they were doing one dimensional computational fluid dynamics if you can imagine that. Then they were able, because of the compute power or whatever, to get the two dimensional computational fluid dynamics and finally they got to three dimensional and they're looking also at four and five dimensional as well. It's serviceable, I guess what I'm saying in that pipeline example, the way that they build that thing or the way that they manage that pipeline is that they did the one dimensional model of a valve is good enough, but over time, maybe a two or three dimensional is going to be better. >> George: That's why I say that this is a journey that's got to take a decade or more. >> Dave: Yeah, definitely. >> Take the example of airplane. 
The old joke is it's six million parts flying in close formation. It's going to be a while before you fit that in one model. >> Dave: Got it. Yes. Right on. When you have that model, that's pretty cool. All right guys, we're about out of time. I need a little time to prep for my next meeting which is in 15 minutes, but final thoughts. Do you guys feel like this was useful in terms of guiding things that you might be able to write about? >> George: Hugely. This is hugely more valuable than anything we've done as a team. >> Jim: This is great, I learned a lot. >> Dave: Good. Thanks you guys. This has been recorded. It's up on the cloud and I'll figure out how to get it to Peter and we'll go from there. Thanks everybody. (closing thank you's)
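
The pipeline example from the discussion (10,000 simple valve models strung together into one overall model) also illustrates the earlier definitions: a data model for structure, an API for behavior, and composition into higher levels of detail. The following Python sketch is illustrative only. The names `Reading`, `ValveTwin`, `PipelineTwin`, and `build_pipeline`, the field names, and the 0.5 leakage threshold are all assumptions made for this example and are not taken from any vendor's product.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Reading:
    """One telemetry sample for a valve (hypothetical fields and units)."""
    pressure_kpa: float
    temperature_c: float
    leakage_lpm: float


@dataclass
class ValveTwin:
    """Low-fidelity twin: fields are the data model, methods are the API."""
    valve_id: int
    history: List[Reading] = field(default_factory=list)  # telemetry over time

    def record(self, reading: Reading) -> None:
        """Append a telemetry sample to this valve's time series."""
        self.history.append(reading)

    def is_leaking(self, threshold_lpm: float = 0.5) -> bool:
        """Assumed rule: the latest sample's leakage exceeds the threshold."""
        return bool(self.history) and self.history[-1].leakage_lpm > threshold_lpm


@dataclass
class PipelineTwin:
    """Next level of detail: a pipeline composed of valve twins."""
    valves: List[ValveTwin]

    def leaking_valves(self) -> List[int]:
        """IDs of valves whose latest telemetry indicates a leak."""
        return [v.valve_id for v in self.valves if v.is_leaking()]


def build_pipeline(n_valves: int) -> PipelineTwin:
    """String n simple valve twins together into one pipeline model."""
    return PipelineTwin([ValveTwin(valve_id=i) for i in range(n_valves)])


pipeline = build_pipeline(10_000)
pipeline.valves[42].record(Reading(800.0, 21.5, 1.2))
print(pipeline.leaking_valves())  # [42]
```

Nothing here captures CAD geometry, which is the point made in the discussion: a low-fidelity twin can be enough to start with, and a richer valve model could later replace `ValveTwin` without changing how the pipeline is composed.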

Published Date : May 9 2017

ENTITIES

Entity	Category	Confidence
David	PERSON	0.99+
Jim	PERSON	0.99+
Chris O'Connor	PERSON	0.99+
George	PERSON	0.99+
Dave	PERSON	0.99+
Airbus	ORGANIZATION	0.99+
Boeing	ORGANIZATION	0.99+
Jim Kobielus	PERSON	0.99+
James	PERSON	0.99+
Amazon	ORGANIZATION	0.99+
IBM	ORGANIZATION	0.99+
Google	ORGANIZATION	0.99+
Neil	PERSON	0.99+
Joe	PERSON	0.99+
Nick	PERSON	0.99+
David Floyer	PERSON	0.99+
George Gilbert	PERSON	0.99+
1,000 miles	QUANTITY	0.99+
10	QUANTITY	0.99+
Peter	PERSON	0.99+
195 microseconds	QUANTITY	0.99+

Carlo Vaiti | DataWorks Summit Europe 2017

>> Announcer: You are CUBE Alumni. Live from Munich, Germany, it's theCUBE. Covering, DataWorks Summit Europe 2017. Brought to you by Hortonworks. >> Hello, everyone, welcome back to live coverage at DataWorks 2017, I'm John Furrier with my cohost, Dave Vellante. Two days of coverage here in Munich, Germany, covering Hortonworks and Yahoo, presenting Hadoop Summit, now called DataWorks 2017. Our next guest is Carlo Vaiti, who's the HPE chief technology strategist, EMEA Digital Solutions, Europe, Middle East, and Africa. Welcome to theCUBE. >> Thank you, John. >> So we were just chatting before we came on, of your historic background at IBM, Oracle, and now HPE, and now back into the saddle there. >> Don't forget Sun Microsystems. >> Sun Microsystems, sorry, Sun, yeah. I mean, great, great run. >> It was a long run. >> You've seen the computer revolution happen. I worked at HP for nine years, from '88 to '97. Again, Dave was a premier analyst during that run of client-server. We've seen the computer revolution happen. Now we're seeing the digital revolution where the iPhone is now 10 years old, Cloud is booming, data's at the center of the value proposition, so a completely new disruptive capability. >> Carlo: Sure, yes. >> So what are you doing as the CTO, chief technologist for HPE, how are you guys bringing this story together? 'Cause there's so much going on at HPE. You got the services split, you got the software split, and HP's focusing on the new style of IT, as Meg Whitman calls it. >> So, yeah. My role in EMEA is actually all about having basically a visionary kind of strategy role for what's going to be HP in the future, in terms of IT. And one of the things that we are looking at is, is specifically to have, we split our strategy in three different aspects, so three transformation areas. The first one which we usually talk is what I call hybrid IT, right, which is basically making services around either On-Premise or on Cloud for our customer base. 
The second one is actually power the Intelligent Edge, so is actually looking after our collaboration and when we acquire Aruba components. And the third one, which is in the middle, and that's why I'm here at the DataWorks Summit, is actually the data-analytics aspects. And we have a couple of solutions in there. One is the enterprise-grade Hadoop, which is part of this. This is actually how we generalize all the figure and the strategy for HP. >> It's interesting, Dave and I were talking yesterday, being in Europe, it's obviously a different sideshow, it's smaller than the DataWorks or Hadoop Summit in North America in San Jose, but there's a ton of Internet of things, IoT or IIoT, 'cause here in Germany, obviously, a lot of industrial nations, but in Europe in general, a lot of smart cities initiatives, a lot of mobility, a ton of Internet of things opportunity, more than in the US. >> Absolutely. >> Can you comment on how you guys are tackling the IoT? Because it's an Intelligent Edge, certainly, but it's also data, it's in your wheelhouse. >> Yes, sure. So I'm actually working, it's a good question, because I'm actually working a couple of projects in Eastern Europe, where it's all about Industrial IoT Analytics, IIoTA. That's the new terminology we use. So what we do is actually, we analyze from a business perspective, what are the business pain points, in an oil and gas company for example. And we understand for example, what kind of things that they need and must have. And what I'm saying here is, one of the aspects for example, is the drilling opportunity. So how much oil you can extract from a specific rig in the middle of the North Sea, for example. This is one of the key question, because the customer want to understand, in the future, how much oil they can extract. The other one is for example, the upstream business. 
So doing on the retail side and having, say, when my customer is stopping in a gas station, I want go in the shop, immediately giving, I dunno, my daughter, a kind of campaign for the Barbie, because they like the Barbie. So IoT, Industrial IoT help us in actually making a much better customer experience, and that's the case of the upstream business, but is also helping us in actually much faster business outcomes. And that's what the customer wants, right? 'Cause, and was talking with your colleague before, I'm talking to the business guy. I'm not talking to the IT anymore in these kind of place, and that's how IoT allow us a chance to change the conversation at the industry level. >> These are first-time conversations too. You're getting at the kinds of business conversations that weren't possible five years ago. >> Carlo: Yes, sure. >> I mean and 10 years ago, they would have seemed fantasy. Now they're reality. >> The role of analytics in my opinion, is becoming extremely key, and I said this morning, for me my best center is that the detail, is the stone foundation of the digital economy. I continue to repeat this terminology, because it's actually where everything is starting from. So what I mean is, let's take a look at the analytic aspect. So if I'm able to analyze the data close to the shop floor, okay, close to the shop manufacturing floor, if I'm able to analyze my data on the rig, in the oil and gas industry, if I'm able to analyze doing preprocessing analytics, with Kafka, Druid, these kind of open-source software, where close to the Intelligent Edge, then my customers going to be happy, because I give them very fast response, and the decision-maker can get to decision in a faster time. Today, it takes a long time to take these type of decision. So that's why we want to move into the power Intelligent Edge. >> So you're saying, data's foundational, but if you get to the Intelligent Edge, it's dynamic. 
So you have a dynamic reactive, realtime time series, or presences of data, but you need the foundational pre-data. >> Perfect. >> Is that kind of what you're getting at? >> Yes, that's the first step. Preprocessing analytics is what we do. In the next generation of, we think is going to be Industrial IoT Analytics, we're going to actually put massive amount of compute close to the shop manufacturing floor. We call internally or actually externally, Converged Plant Infrastructure. And that's the key point, right? >> John: Converged plant? >> Converged Plant Infrastructure, CPI. If you look at in Google, you will find. It's a solution we bring in the market a few months ago. We announce it in December last year. >> Yeah, Antonio's smart. He also had a converged systems as well. One of the first ones. >> Yeah, so that's converge compute at the edge basically. >> Correct, converge compute-- >> Very powerful. >> Very powerful, and we run analytics on the edge. That's the key point. >> Which we love, because that means you don't have to send everything back to the Cloud because it's too expensive, it's going to take too long, it's not going to work. >> Carlo: The bandwidth on the network is much less. >> There's no way that's going to be successful, unless you go to the edge and-- >> It takes time. >> With a cost. >> Now the other thing is, of course, you've got the Aruba asset, to be able to, I always say, joke, connect the windmill. But, Carlo, can we go back to the IoTA example? >> Carlo: Correct, yeah. >> I want to help, help our audience understand, sort of, the new HP, post these spin-merges. So previously you would say, okay, we have Vertica. You still have partnership, or you still own Vertica, but after September 1st-- >> Absolutely, absolutely. It's part of the columnar side-- >> Right, yes, absolutely, but, so. But the new strategy is to be more of a platform for a variety of technology. 
So how for instance would you solve, or did you solve, that problem that you described? What did you actually deliver? >> So again, as I said, we're, especially in the Industrial IoT, we are an ecosystem, okay? So we're one element of the ecosystem solution. For the oil and gas specifically, we're working with other system integrator. We're working with oil and gas industry expertise, like DXC company, right, the company that we just split a few days ago, and we're working with them. They're providing the industry expertise. We are an infrastructure provider around that, and the services around that for the infrastructure element. But for the industry expertise, we try to have a kind of little bit of knowledge, to start the conversation with the customer. But again, my role in the strategy is actually to be an ecosystem digital integrator. That's the new terminology we like to bring in the market, because we really believe that's the way HP role is going to be. And the relevance of HP is totally depending if we are going to be successful in these type of things. >> Okay, now a couple other things you talked about in your keynote. I'm just going to list them, and then we can go wherever we want. There was Data Lake 3.0, Storage Disaggregation, which is kind of interesting, 'cause it's been a problem. Hadoop as a service, Realtime Everywhere, and then Analytics at the Edge, which we kind of just talked about. Let's pick one. Let's start with Data Lake 3.0. What is that? John doesn't like the term data lake. He likes data ocean. >> I like data ocean. >> Is Data Lake 3.0 becoming an ocean? >> It's becoming an ocean. So, Data Lake 3.0 for us is actually following what is going to be the future for HDFS 3.0. So we have three elements. The erasure coding feature, which is coming on HDFS. The second element is around having HDFS data tier, multi-data tier. So we're going to have faster SSD drives. We're going to have big memory nodes. We're going to have GPU nodes. 
And the reason why I say disaggregation is because some of the workload will be only compute, and some of the workload will be only storage, okay? So we're going to bring, and the customer requires this, because it's getting more data, and they need to have for example, YARN applications running on compute nodes, and at the same level, they want to have storage compute block, sorry, storage components, running on the storage model, like HBase for example, like HDFS 3.0 with the multi-tier option. So that's why the data disaggregation, or disaggregation between compute and storage, is the key point. We call this asymmetric, right? Hadoop is becoming asymmetric. That's what it means. >> And the problem you're solving there, is when I add a node to a cluster, I don't have to add compute and storage together, I can disaggregate and choose whatever I need, >> Everyone that we did. >> based on the workload. >> They are all multitenancy kind of workload, and they are independent and they scale out. Of course, it's much more complex, but we have actually proved that this is the way to go, because that's what the customer is demanding. >> So, 3.0 is actually functional. It's erasure coding, you said. There's a data tier. You've got different memory levels. >> And I forgot to mention, the containerization of the application. Having Dockerized the application for example. Using Mesosphere for example, right? So having the containerization of the application is what all of that means, because what we do in Hadoop, we actually build the different clusters, they need to talk to each other, and exchange data in a faster way. And a solution like, a product like SQL Manager, from Hortonworks, is actually helping us to get this connection between the clusters faster and faster. And that's what the customer wants. >> And then Hadoop as a service, is that an on-premise solution, is that a hybrid solution, is it a Cloud solution, all three? >> I can offer all of them. 
Hadoop as a service could be run on-premise, could be run on a public Cloud, could be run on Azure, or could be a mix of them, partially on-premise, and partially on public. >> And what are you seeing with regard to customer adoption of Cloud, and specifically around Hadoop and big data? >> I think the way I see that adoption is all the customers want to start very small. The maturity is actually better from a technology standpoint. If you had asked me the same question maybe a year ago, I would say, it's difficult. Now I think they've got the point. Every large customer, they want to build this big data ocean, data lake, ocean, whatever you want to call it. >> John: Love that. (laughs) >> All right. They want to build this data ocean, and the point I want to make is, they want to start small, but they want to think very high. Very big, right, from their perspective. And the way they approach us is, we have a kind of methodology. We establish the maturity assessment. We do a kind of capability maturity assessment, where we find out if the customer is actually a pioneer, or is actually a very traditional one, so it's very slow-going. Once we determine what stage the customer is at, we propose some specific proof of concept. And in three months usually, we're putting this in place. >> You also talked about realtime everywhere. We in our research, we talk about the, historically, you had batch or interactive, and now you have what we call continuous, or realtime streaming workloads. How prevalent is that? Where do you see it going in the future? >> So I think it is another trend for the future, as I mentioned this morning in my presentation. So and Spark is actually doing the open-source memory engine process, is actually the core of this stuff. We see 60 to 70 times faster analytics, compared to not using Spark. So many customers implemented Spark because of this. 
The requirement is that the customer needs an immediate response time, okay, for a specific decision-making that they have to do, in order to improve their business, in order to improve their life. But this requires a different architecture. >> I have a question, 'cause you, you've lived in the United States, you're obviously global, and spent a lot of time in Europe as well, and a lot of times, people want to discuss the differences between, let's make it specific here, the European continent and North America, and from a sophistication standpoint, same, we can agree on that, but there are still differences. Maybe, greater privacy concerns. The whole thing with the Cloud and the NSA in the United States, created some concerns. What do you see as the differences today between North America and Europe? >> From my perspective, I think we are much more, for example take IoT, Industrial IoT. I think in Europe we are much more advanced. I think in the manufacturing and the automotive space, the connected car kind of things, autonomous driving, this is something that we know already how to manage, how to do it. I mean, Tesla in the US is a good example that what I'm saying is not true, but if I look at for example, large German car manufacturers, they have always implemented these type of things already today. >> Dave: For years, yeah. >> That's the difference, right? I think the second step is about the faster analytic approach. So what I mentioned before. The Power the Intelligent Edge, in my opinion at the moment, is much more advanced in the US compared to Europe. But I think Europe is starting to run back, and going on the same route. Because we believe that putting compute capacity on the edge is what actually the customer wants. But that's the two big differences I see. >> The other two big external factors that we like to look at, are Brexit and Trump. So (laughs) how about Brexit? 
Now that it's starting to sort of actually become, begin the process, how should we think about it? Is it overblown? Is it critical? What's your take? >> Well, I think it's too early to say. UK just split a few days ago, right, officially. It's going to take another 18 months before it's going to be completed. From a commercial standpoint, we don't see any difference so far. We're actually working the same way. For me it's too early to say if there's going to be any implication on that. >> And we don't know about Trump. We don't have to talk about it, but the, but I saw some data recently that European sentiment, business sentiment is trending stronger than the US, which is different than it's been for the last many years. What do you see in terms of just sentiment, business conditions in Europe? Do you see a pick up? >> It's getting better, it is getting better. I mean, if I look at the major countries, the P&L is going positive, 1.5%. So I think from that perspective, we are getting better. Of course we are still suffering from the Chinese, and Japanese market sometimes. Especially in some of the big large deals. The inclusion of the Japanese market, I feel it, and the Chinese market, I feel that. But I think the economy is going to be okay, so it's going to be good. >> Carlo, I want to thank you for coming on and sharing your insight, final question for you. You're new to HPE, okay. We have a lot of history, obviously I was, spent a long part of my career there, early in my career. Dave and I have covered the transformation of HP for many, many years, with theCUBE certainly. What attracted you to HP and what would you say is going on at HP from your standpoint, that people should know about? >> So I think the number one thing is that for us the word is going to be hybrid. It means that some of the services that you can implement, either on-premise or on Cloud, could be done very well by the new Pointnext organization. I'm not part of Pointnext. 
I'm in the EG, Enterprise Group division. But I am a fan of Pointnext because I believe this is the future of our company, it's on the services side, that's where it's going. >> I would just point out, Dave and I, our commentary on the spin merge has been, create these highly cohesive entities, very focused. Antonio now running EG, big fans, of where it's actually an efficient business model. >> Carlo: Absolutely. >> And Chris Hsu is running the Micro Focus, CUBE Alumni. >> Carlo: It's a very efficient model, yes. >> Well, congratulations and thanks for coming on and sharing your insights here in Europe. And certainly it is an IoT world, IIoT. I love the analytics story, foundational services. It's going to be great, open source powering it, and this is theCUBE, opening up our content, and sharing that with you. I'm John Furrier, Dave Vellante. Stay with us for more great coverage, here from Munich after the short break.
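
Editor's note: the interview mentions that erasure coding is arriving with HDFS 3.0 as one element of the "3.0" data platform. The sketch below is not from the speakers; it is a small illustration of why that feature matters for storage cost. It assumes the commonly used Reed-Solomon layout of 6 data blocks plus 3 parity blocks and compares it with classic 3x replication. The function names and the 100 TB figure are hypothetical, chosen only for the example.

```python
def storage_overhead(data_units: int, parity_units: int) -> float:
    """Raw bytes stored per byte of user data for a (data, parity) layout."""
    if data_units <= 0 or parity_units < 0:
        raise ValueError("data_units must be > 0 and parity_units >= 0")
    return (data_units + parity_units) / data_units


def raw_capacity_tb(user_tb: float, data_units: int, parity_units: int) -> float:
    """Raw capacity needed to hold user_tb terabytes of user data."""
    return user_tb * storage_overhead(data_units, parity_units)


# 3x replication modelled as 1 data unit plus 2 extra full copies.
replication_3x = storage_overhead(1, 2)

# Assumed Reed-Solomon layout: 6 data blocks + 3 parity blocks.
rs_6_3 = storage_overhead(6, 3)

if __name__ == "__main__":
    user_tb = 100.0  # hypothetical data set size
    print(f"3x replication: {replication_3x:.1f}x raw, "
          f"{raw_capacity_tb(user_tb, 1, 2):.0f} TB for {user_tb:.0f} TB of data")
    print(f"RS(6,3):        {rs_6_3:.1f}x raw, "
          f"{raw_capacity_tb(user_tb, 6, 3):.0f} TB for {user_tb:.0f} TB of data")
```

Under these assumptions the erasure-coded layout needs half the raw capacity of 3x replication while tolerating the loss of any three blocks in a stripe, at the price of extra compute and network traffic when data has to be reconstructed.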

Published Date : Apr 6 2017


ENTITIES

Entity	Category	Confidence
Dave	PERSON	0.99+
Dave Vellante	PERSON	0.99+
Carlo	PERSON	0.99+
Oracle	ORGANIZATION	0.99+
Europe	LOCATION	0.99+
IBM	ORGANIZATION	0.99+
Germany	LOCATION	0.99+
Trump	PERSON	0.99+
Meg Whitman	PERSON	0.99+
Vertica	ORGANIZATION	0.99+
Pointnext	ORGANIZATION	0.99+
Chris Hsu	PERSON	0.99+
John	PERSON	0.99+
Carlo Vaiti	PERSON	0.99+
John Furrier	PERSON	0.99+
HP	ORGANIZATION	0.99+
Munich	LOCATION	0.99+
HPE	ORGANIZATION	0.99+
Yahoo	ORGANIZATION	0.99+
Sun Microsystems	ORGANIZATION	0.99+
Antonio	PERSON	0.99+
US	LOCATION	0.99+
EG	ORGANIZATION	0.99+
second element	QUANTITY	0.99+
United States	LOCATION	0.99+
second step	QUANTITY	0.99+
Hortonworks	ORGANIZATION	0.99+
December last year	DATE	0.99+
iPhone	COMMERCIAL_ITEM	0.99+
San Jose	LOCATION	0.99+
1.5%	QUANTITY	0.99+
yesterday	DATE	0.99+
North America	LOCATION	0.99+
September 1st	DATE	0.99+
'97	DATE	0.99+
'88	DATE	0.99+
Africa	LOCATION	0.99+
one	QUANTITY	0.99+
Today	DATE	0.99+
three months	QUANTITY	0.99+
Eastern Europe	LOCATION	0.99+
Sun	ORGANIZATION	0.99+
Two days	QUANTITY	0.99+
60	QUANTITY	0.99+
DataWorks 2017	EVENT	0.99+
10 years ago	DATE	0.99+
DXC	ORGANIZATION	0.98+
EMEA Digital Solutions	ORGANIZATION	0.98+
five years ago	DATE	0.98+
a year ago	DATE	0.98+
Tesla	ORGANIZATION	0.98+
