MANUFACTURING Drive Transportation
(upbeat music) >> Welcome to our industry drill-down. This one is for manufacturing. I'm here with Michael Ger, who is the managing director for automotive and manufacturing solutions at Cloudera, and in this first session, we're going to discuss how to drive transportation efficiencies and improve sustainability with data. Connected trucks are fundamental to optimizing fleet performance, costs, and delivering new services to fleet operators, and what's going to happen here is Michael's going to present some data and information, and we're going to come back and have a little conversation about what we just heard. Michael, great to see you. Over to you. >> Oh, thank you, Dave, and I appreciate having this conversation today. Hey, this is actually an area, connected trucks, where we have seen a lot of action here at Cloudera, and I think the reason is important. First of all, you can see that this change is happening very, very quickly. 150% growth is forecast by 2022, and I think the reason we're seeing a lot of action and a lot of growth is that there are a lot of benefits. We're talking about a B2B type of situation here, so this is truck makers providing benefits to fleet operators, and if you look at the top benefits that fleet operators expect, you see this in the graph over here. Almost 80% of them expect improved productivity, things like improved routing, so route efficiencies, improved customer service, and decreased fuel consumption. But to be clear, this isn't technology for technology's sake. These connected trucks are coming onto the marketplace because, hey, they can provide tremendous value to the business, and in this case, we're talking about fleet operators and fleet efficiencies. So, one of the things that's really important is how to enable this. Trucks are becoming connected because, at the end of the day, we want to be able to provide fleet efficiencies through connected truck analytics and machine learning. 
Let me explain a little bit about what we mean by that, because the way this happens is by creating a connected vehicle analytics and machine learning lifecycle, and to do that, you need to do a few different things. You start off, of course, with connected trucks in the field, and you could have many of these trucks, because typically you're dealing at a truck level and at a fleet level. We want to be able to do analytics and machine learning to improve performance. So you start off with these trucks, and the first thing you need to be able to do is connect to those trucks. You have to have an intelligent edge where you can collect that information from the trucks, and by the way, once you've collected this information from the trucks, you want to be able to analyze that data in real time and take real-time actions. Now, what I'm going to show you, the ability to take this real-time action, is actually the result of a machine learning lifecycle. Let me explain what I mean by that. So we have this truck, and we start to collect data from it. At the end of the day, what we'd like to be able to do is pull that data into either your data center or into the cloud, where we can start to do more advanced analytics, and we start with being able to ingest that data into the cloud, into the enterprise data lake. We store that data, and we want to enrich it with other data sources. So, for example, if you're doing truck predictive maintenance, you want to take that sensor data that you've collected from those trucks, and you want to augment it with your dealership service information. Now you have sensor data and the resulting repair orders. You're now equipped to do things like predict when maintenance will occur. You've got all the data sets that you need to be able to do that. So what do you do here? Like I said, you're ingesting it, you're storing it, you're enriching it with data. You're processing that data. 
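The enrichment step just described, aligning truck sensor data with dealership repair orders to build a labeled training set, can be sketched in a few lines. This is a minimal illustration with hypothetical field names and a made-up labeling window, not any specific telematics schema:

```python
from datetime import date, timedelta

# Hypothetical sensor snapshots collected from the trucks (field names
# are illustrative, not from any particular telematics provider).
sensor_readings = [
    {"truck_id": "T1", "day": date(2021, 3, 1), "oil_temp": 92,  "vibration": 0.2},
    {"truck_id": "T1", "day": date(2021, 3, 8), "oil_temp": 118, "vibration": 0.9},
    {"truck_id": "T2", "day": date(2021, 3, 1), "oil_temp": 90,  "vibration": 0.3},
]

# Hypothetical dealership repair orders used to enrich the sensor data.
repair_orders = [
    {"truck_id": "T1", "day": date(2021, 3, 12), "repair": "oil pump"},
]

def label_readings(readings, orders, horizon_days=14):
    """Mark each reading True if a repair order for the same truck
    followed it within `horizon_days` -- the enrichment/alignment step."""
    labeled = []
    for r in readings:
        needed = any(
            o["truck_id"] == r["truck_id"]
            and timedelta(0) <= (o["day"] - r["day"]) <= timedelta(days=horizon_days)
            for o in orders
        )
        labeled.append({**r, "maintenance_within_horizon": needed})
    return labeled

training_set = label_readings(sensor_readings, repair_orders)
for row in training_set:
    print(row["truck_id"], row["day"], row["maintenance_within_horizon"])
```

The output of this join is exactly the kind of data set the talk describes: sensor values paired with whether maintenance actually followed.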
You're aligning, say, the sensor data to that transactional system data from your repair and maintenance systems. You're bringing it together so that you can do two things. First of all, you can do self-service BI on that data. You can do things like fleet analytics, but more importantly, what I was talking to you about before is that you now have the data sets to be able to create machine learning models. So if you have the sensor values and, for example, the resulting dealership repair orders, you can start to correlate which sensor values predicted the need for maintenance, and you can build out those machine learning models, and then, as I mentioned, you can push those machine learning models back out to the edge, which is how you would then take those real-time actions I mentioned earlier. As data then comes through in real time, you're running it against that model, and you can take some real-time actions. This analytics and machine learning lifecycle is exactly what Cloudera enables: the end-to-end ability to ingest data, store it, put a query layer over it, create machine learning models, and then run those machine learning models in real time. Now, that's what we do as a business. Now, I just wanted to give you one example of a customer that we have worked with to provide these types of results, and that's Navistar. Navistar was an early adopter of connected-truck analytics, and they provided these capabilities to their fleet operators. They started off by connecting 475,000 trucks, now up to well over a million, and the point here is that they were centralizing data from their trucks' telematics service providers. 
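The model-building and edge-scoring loop described above can be sketched with a deliberately tiny stand-in: learn a single threshold on one sensor from labeled readings, treat that threshold as the "model" pushed back out to the edge, and score live readings against it. This is a toy illustration of the lifecycle, not Cloudera's actual modeling workflow, and all values are made up:

```python
# Hypothetical labeled readings: sensor value plus whether maintenance followed.
labeled = [
    {"oil_temp": 92,  "needs_maintenance": False},
    {"oil_temp": 95,  "needs_maintenance": False},
    {"oil_temp": 118, "needs_maintenance": True},
    {"oil_temp": 121, "needs_maintenance": True},
]

def fit_threshold(rows, feature):
    """Cloud-side 'training': pick the midpoint between the highest
    healthy value and the lowest failing value of `feature`."""
    healthy = [r[feature] for r in rows if not r["needs_maintenance"]]
    failing = [r[feature] for r in rows if r["needs_maintenance"]]
    return {"feature": feature, "threshold": (max(healthy) + min(failing)) / 2}

# The resulting "model" is just a small dict, which is what makes it
# cheap to push back out to the intelligent edge.
model = fit_threshold(labeled, "oil_temp")

def score(model, reading):
    """Edge-side, real-time scoring against the pushed-down model."""
    return reading[model["feature"]] > model["threshold"]

print(model["threshold"])
print(score(model, {"oil_temp": 110}))  # exceeds threshold: schedule service
```

A real deployment would use a proper classifier and a model-serving runtime, but the shape of the loop (train centrally, push the artifact to the edge, score streaming data there) is the same.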
They're bringing in things like weather data and all those types of things, and what they started to do was build out machine learning models aimed at predictive maintenance, and what's really interesting is that Navistar made tremendous strides in reducing the expense associated with maintenance. So rather than waiting for a truck to break and then fixing it, they would predict when that truck needs service, condition-based monitoring, and service it before it broke down, so that it could be done in a much more cost-effective manner. And if you look at the benefits, they reduced maintenance costs by 3 cents a mile, from the industry average of 15 cents a mile down to 12 cents a mile. So this was a tremendous success for Navistar, and we're seeing this across many of our truck manufacturers. We're working with many of the truck OEMs, and they are all working to achieve very, very similar types of benefits for their customers. So that's a little bit about Navistar. Now, we're going to turn to Q&A. Dave's got some questions for me in a second, but before we do that, if you want to learn more about how we work with connected vehicles and autonomous vehicles, please go to our website, which you see up on the screen. There's the URL, cloudera.com/solutions/manufacturing, and you'll see a whole slew of collateral and information, in much more detail, on how we connect trucks for fleet operators and provide analytics use cases that drive dramatically improved performance. So with that being said, I'm going to turn it over to Dave for questions. >> Thank you, Michael. That's a great example. I love the lifecycle. We can visualize that very well. You've got an edge use case. You're doing real-time inference, really, at the edge, and then you're blending that sensor data with other data sources to enrich your models, and you can push that back to the edge. That's that lifecycle, so I really appreciate that info. 
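As an aside, the Navistar per-mile numbers scale up quickly at fleet level. A back-of-the-envelope calculation (the fleet size and annual mileage below are illustrative assumptions, not figures from the talk):

```python
# Per-mile maintenance costs from the Navistar example.
industry_cost_per_mile = 0.15   # dollars per mile, industry average
achieved_cost_per_mile = 0.12   # dollars per mile, after predictive maintenance
savings_per_mile = industry_cost_per_mile - achieved_cost_per_mile  # ~3 cents

# Hypothetical fleet, for scale only.
trucks = 1_000
miles_per_truck_per_year = 100_000

annual_savings = savings_per_mile * trucks * miles_per_truck_per_year
print(f"${annual_savings:,.0f} saved per year")  # $3,000,000 saved per year
```

Three cents a mile sounds small until it is multiplied across a fleet's annual mileage, which is why fleet operators lean into these use cases.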
Let me ask you: when you think about analytics and machine learning, what are the most common connected vehicle use cases that you see customers really leaning into? >> Yeah, that's a great question, Dave, 'cause everybody always thinks about machine learning, like this is the first thing you go to. Well, actually, it's not. The first thing you really want to be able to do, and many of our customers are doing, is say, look, let's simply connect our trucks or our vehicles or whatever our IoT asset is, and then you can do very simple things like just performance monitoring of the piece of equipment. In the truck industry, that's a lot of performance monitoring of the truck, but also performance monitoring of the driver. So how is the driver performing? Is there a lot of idle time spent? What does route efficiency look like? By connecting the vehicles, you get insights, as I said, into the truck and into the driver, and that's not even machine learning, but that monitoring piece is really, really important. So the first thing that we see is monitoring types of use cases. Then you start to see companies move towards more of, what I call, the machine learning and AI models, where you're using inference on the edge, and there you start to see things like predictive maintenance happening, real-time route optimization, and things like that, and you start to see that evolution to those smarter, more intelligent, dynamic types of decision-making. But let's not minimize the value of good old-fashioned monitoring, which gives you that kind of visibility first, before moving to smarter use cases as you go forward. >> You know, it's interesting. When you talked about the monitoring, I'm envisioning the bumper sticker: how am I driving? The only time somebody ever probably calls is when they get cut off, and many people might think, oh, it's about Big Brother, but it's not. 
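The "good old-fashioned monitoring" step Michael describes, things like driver idle time, needs no machine learning at all. A minimal sketch, using a hypothetical telemetry sample format:

```python
# Hypothetical periodic telemetry samples from one connected truck
# (timestamps in seconds, speed in mph; the format is illustrative).
samples = [
    {"ts": 0,   "speed_mph": 0},
    {"ts": 60,  "speed_mph": 0},
    {"ts": 120, "speed_mph": 45},
    {"ts": 180, "speed_mph": 55},
    {"ts": 240, "speed_mph": 0},
]

def idle_share(samples, idle_threshold_mph=1.0):
    """Fraction of samples where the truck is effectively stationary,
    a simple driver/vehicle performance-monitoring metric."""
    idle = sum(1 for s in samples if s["speed_mph"] < idle_threshold_mph)
    return idle / len(samples)

print(f"idle time: {idle_share(samples):.0%}")  # idle time: 60%
```

Simple counting like this is the "monitoring first" visibility Michael recommends before moving on to inference at the edge.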
I mean, yeah, okay, fine, but it's really about improvement and training and continuous improvement, and then, of course, the route optimization. I mean, that's bottom-line business value. I love those examples. >> Great. >> What are the big hurdles that people should think about when they want to jump into those use cases you just talked about? What are they going to run into, the blind spots they're going to get hit with? >> There are a few different things. So first of all, a lot of times, your IT folks aren't familiar with the more operational IoT types of data. So just connecting to that type of data can be a new skill set. There's very specialized hardware in the car, and things like that, and protocols. That's number one. That's the classic IT-OT conundrum that many of our customers struggle with. But then, more fundamentally, if you look at the way these types of connected truck or IoT solutions started, oftentimes the first generation were very custom built, so they were brittle. They were kind of hardwired. Then, as you moved towards more commercial solutions, you had what I call the silo problem. You had fragmentation: this capability from one vendor, that capability from another vendor. You get the idea. One of the things that we really think needs to be brought to the table is, first of all, an end-to-end data management platform that's integrated and all tested together, with data lineage across the entire stack. But then, also importantly, to be realistic, you have to be able to integrate with industry best practices as well, in terms of solution components in the car, the hardware, and all those types of things. So, just stepping back for a second, I think that there has been fragmentation and complexity in the past, and we're moving towards more standards and more standard types of offerings. 
Our job as a software maker is to make that easier and connect those dots so customers don't have to do it all on their own. >> And you mentioned specialized hardware. One of the things we heard earlier in the main stage was your partnership with Nvidia. We're talking about new types of hardware coming in, and you guys are optimizing for that. We see the IT and the OT worlds blending together, no question, and then there's that end-to-end management piece. This is different, you're right, from IT, where normally everything's controlled and you're in the data center; this is a rethinking of how you manage metadata. So, in the spirit of what we talked about earlier today, other technology partners: are you working with other partners to accelerate these solutions, move them forward faster? >> Yeah, I'm really glad you're asking that, Dave, because we actually embarked on a project called Project Fusion, which really was about integration. When you look at that connected vehicle lifecycle, there are some core vendors out there providing some very important capabilities. So what we did is we joined forces with them to build an end-to-end demonstration and reference architecture to enable the complete data management lifecycle. Now, Cloudera's piece of this was ingesting the data and all the things I talked about: the storing and the machine learning. So we provide that end to end, but we wanted to partner with some key partners, and the partners that we did integrate with were, first, NXP. NXP provides the service-oriented gateways in the cars; that's the hardware in the car. Wind River provides an in-car operating system that's Linux, hardened and tested. We then ran our Apache MiNiFi, which is part of Cloudera DataFlow, in the vehicle, on that operating system, on that hardware. 
We pumped the data over into the cloud, where we did all the data analytics and machine learning and built out these very specialized models, and then, once we built those models, we used a company called Airbiquity. They specialize in automotive over-the-air updates, so they can take those models and push updates back to the vehicle very rapidly. So what we said is, look, there's an established ecosystem, if you will, of leaders in this space, and what we wanted to do is make sure that Cloudera was part and parcel of that ecosystem. And by the way, you mentioned Nvidia as well. We're working closely with Nvidia now, so when we're doing the machine learning, we can leverage some of their hardware to get still further acceleration on the machine learning side of things. So yeah, one of the things I always say about these types of use cases is that it takes a village, and what we've really tried to do is build out an ecosystem that provides that village, so that we can speed up that analytics and machine learning lifecycle just as fast as it can be. >> This is, again, another great example of data-intensive workloads. It's not your grandfather's ERP running on traditional systems. These are really purpose-built, maybe customizable for certain edge use cases. They're low cost, low power. They can't be bloated, and you're right, it does take an ecosystem. You've got to have APIs that connect, and that takes a lot of work and a lot of thought. So that leads me to the technologies underpinning this. We've talked a lot on theCUBE about semiconductor technology, how that's changing, and the advancements we're seeing there. What do you see as some of the key technology areas that are advancing this connected vehicle machine learning? >> You know, it's interesting. I'm seeing it in a few places, just a few notable ones. I think, first of all, we see that the vehicle itself is getting smarter. 
So when you look at that NXP type of gateway that we talked about, that used to be kind of a dumb gateway. All it was really doing was pushing data up and down and providing isolation from the lower-level subsystems, so it was really security and just basic communication. That gateway is now becoming what they call a service-oriented gateway: it's got disk, it's got memory, it's got all this stuff, so now you can run serious compute in the car. So for all of these things like running machine learning inference models, you have a lot more power in the car. At the same time, 5G is making it possible to push data fast enough that low-latency computing is available even in the cloud. So now you've got incredible compute both at the edge, in the vehicle, and in the cloud. And then, in the cloud, you've got partners like Nvidia who are accelerating it still further through better GPU-based compute. So across the whole stack, if you look at that machine learning lifecycle we talked about, Dave, there are improvements in every step along the way. We're starting to see technology optimization pervasive throughout the cycle. >> And then, real quick, because it's not a quick topic, you mentioned security. We've seen a whole new security model emerge. There is no perimeter anymore in a use case like this, is there? >> No, there isn't, and remember, we're the data management platform, and what we have to provide is end-to-end lineage of where that data came from, who can see it, and how it changed. That's something that we have integrated in from the moment that data is ingested, through when it's stored, through when it's processed and people are doing machine learning. We provide that lineage so that security and governance are assured throughout the data and machine learning lifecycle. >> And it's federated, in this example, across the fleet. 
So, all right, Michael, that's all the time we have right now. Thank you so much for that great information. Really appreciate it. >> Dave, thank you, and thanks to the audience for listening in today. >> Yes, thank you for watching. Keep it right there. (upbeat music)
SUMMARY :
and in this first session, and the first you need to be able to do and machine learning, the and then you can do very talked about the monitoring, and complexity in the past. So in the spirit of what we and the partners that we and the advancements we're seeing there. it so that you can push data but you mentioned security. and the thing we have that's all the time we have right now. and thanks for the audience Yes, thank you for watching.
SENTIMENT ANALYSIS :
ENTITIES
Entity | Category | Confidence |
---|---|---|
Michael | PERSON | 0.99+ |
Dave | PERSON | 0.99+ |
Nvidia | ORGANIZATION | 0.99+ |
Michael Ger | PERSON | 0.99+ |
Airbiquity | ORGANIZATION | 0.99+ |
Cloudera | ORGANIZATION | 0.99+ |
Navistar | ORGANIZATION | 0.99+ |
NXP | ORGANIZATION | 0.99+ |
first | QUANTITY | 0.99+ |
475,000 trucks | QUANTITY | 0.99+ |
2022 | DATE | 0.99+ |
150% | QUANTITY | 0.99+ |
Linux | TITLE | 0.99+ |
first generation | QUANTITY | 0.99+ |
3 cents a mile | QUANTITY | 0.99+ |
One | QUANTITY | 0.99+ |
15 cents a mile | QUANTITY | 0.98+ |
first session | QUANTITY | 0.98+ |
today | DATE | 0.98+ |
both | QUANTITY | 0.98+ |
two things | QUANTITY | 0.97+ |
Wind River | ORGANIZATION | 0.97+ |
one example | QUANTITY | 0.97+ |
cloudera.com/solutions/manufacturing | OTHER | 0.96+ |
one | QUANTITY | 0.95+ |
first thing | QUANTITY | 0.95+ |
First | QUANTITY | 0.95+ |
5G | ORGANIZATION | 0.92+ |
one such | QUANTITY | 0.88+ |
12 cents cents a mile | QUANTITY | 0.86+ |
Apache | ORGANIZATION | 0.83+ |
over a million | QUANTITY | 0.83+ |
80% | QUANTITY | 0.81+ |
earlier today | DATE | 0.72+ |
Brother | TITLE | 0.6+ |
second | QUANTITY | 0.59+ |
MiNiFi | COMMERCIAL_ITEM | 0.58+ |
well | QUANTITY | 0.55+ |
Manufacturing - Drive Transportation Efficiency and Sustainability with Big | Cloudera
>> Welcome to our industry drill down. This is for manufacturing. I'm here with Michael Ger, who is the managing director for automotive and manufacturing solutions at Cloudera. And in this first session, we're going to discuss how to drive transportation efficiencies and improve sustainability with data. Connected trucks are fundamental to optimizing fleet performance, costs, and delivering new services to fleet operators. And what's going to happen here is Michael's going to present some data and information, and we're going to come back and have a little conversation about what we just heard. Michael, great to see you! Over to you. >> Oh, thank you, Dave. And I appreciate having this conversation today. Hey, you know, this is actually an area, connected trucks, you know, this is an area that we have seen a lot of action here at Cloudera. And I think the reason is kind of important, right? Because you know, first of all, you can see that, you know, this change is happening very, very quickly, right? 150% growth is forecast by 2022 and the reasons, and I think this is why we're seeing a lot of action and a lot of growth, is that there are a lot of benefits, right? We're talking about a B2B type of situation here. So this is truck made, truck makers providing benefits to fleet operators. And if you look at the, the top fleet operator, the top benefits that fleet operators expect, you see this in, in the, in the graph over here, now almost 80% of them expect improved productivity, things like improved routing, right? So route efficiencies, improved customer service, decrease in fuel consumption, better better technology. This isn't technology for technology's sake, these connected trucks are coming onto the marketplace because, hey, it can provide tremendous value to the business. And in this case, we're talking about fleet operators and fleet efficiencies. 
So, you know, one of the things that's really important to be able to enable us, right, trucks are becoming connected because at the end of the day, we want to be able to provide fleet efficiencies through connected truck analytics and machine learning. Let me explain to you a little bit about what we mean by that, because what, you know, how this happens is by creating a connected vehicle, analytics, machine-learning life cycle, and to do that, you need to do a few different things, right? You start off, of course, with connected trucks in the field. And, you know, you could have many of these trucks because typically you're dealing at a truck level and at a fleet level, right? You want to be able to do analytics and machine learning to improve performance. So you start off with these trucks. And the first thing you need to be able to do is connect to those trucks, right? You have to have an intelligent edge where you can collect that information from the trucks. And by the way, once you collect the, this information from the trucks, you want to be able to analyze that data in real-time and take real-time actions. Now what I'm going to show you, the ability to take this real-time action, is actually the result of your machine-learning lifecycle. Let me explain to you what I mean by that. So we have these trucks, we start to collect data from it, right? At the end of the day what we'd like to be able to do is pull that data into either your data center or into the cloud, where we can start to do more advanced analytics. And we start with being able to ingest that data into the cloud, into that enterprise data lake. We store that data. We want to enrich it with other data sources. So for example, if you're doing truck predictive maintenance, you want to take that sensor data that you've connected, collected from those trucks. And you want to augment that with your dealership, say, service information. 
Now you have, you know, you have sensor data and the resulting repair orders. You're now equipped to do things like predict when maintenance will work, all right. You've got all the data sets that you need to be able to do that. So what do you do? Like I said, you're ingested, you're storing, you're enriching it with data, right? You're processing that data. You're aligning, say, the sensor data to that transactional system data from your, from your your repair maintenance systems; you're, you're bringing it together so that you can do two things. You can do, first of all, you could do self-service BI on that data, right? You can do things like fleet analytics, but more importantly, what I was talking to you about before is you now have the data sets to be able to do create machine learning models. So if you have the sensor values and the need, for example, for, for a dealership repair, or is, you could start to correlate which sensor values predicted the need for maintenance, and you could build out those machine learning models. And then as I mentioned to you, you could push those machine learning models back out to the edge, which is how you would then take those real-time actions I mentioned earlier. As that data that then comes through in real-time, you're running it again against that model. And you can take some real-time actions. This is what we, this is this, this, this analytics and machine learning model, machine learning life cycle is exactly what Cloudera enables. This end-to-end ability to ingest data; store, you know, store it, put a query lay over it, create machine learning models, and then run those machine learning models in real time. Now that's what we, that's what we do as a business. Now one such customer, and I just want to give you one example of a customer that we have worked with to provide these types of results is Navistar. 
And Navistar was kind of an early, early adopter of connected truck analytics, and they provided these capabilities to their fleet operators, right? And they started off by, by, you know, connecting 475,000 trucks to up to well over a million now. And you know, the point here is that they were centralizing data from their telematics service providers, from their trucks' telematics service providers. They're bringing in things like weather data and all those types of things. And what they started to do was to build out machine learning models aimed at predictive maintenance. And what's really interesting is that you see that Navistar made tremendous strides in reducing the need, or the expense associated with maintenance, right? So rather than waiting for a truck to break and then fixing it, they would predict when that truck needs service, condition-based monitoring, and service it before it broke down, so that you can do that in a much more cost-effective manner. And if you see the benefits, right, they reduce maintenance costs 3 cents a mile from the, you know, down from the industry average of 15 cents a mile down to 12 cents cents a mile. So this was a tremendous success for Navistar. And we're seeing this across many of our, you know, truck manufacturers. We're, we're working with many of the truck OEMs, and they are all working to achieve very, very similar types of benefits to their customers. So just a little bit about Navistar. Now, we're going to turn to Q and A. Dave's got some questions for me in a second, but before we do that, if you want to learn more about our, how we work with connected vehicles and autonomous vehicles, please go to our web, to our website. What you see up, up on the screen. There's the URL. It's cloudera.com forward slash solutions, forward slash manufacturing. And you'll see a whole slew of collateral and information in much more detail in terms of how we connect trucks to fleet operators who provide analytics. 
Use cases that drive dramatically improved performance. So with that being said, I'm going to turn it over to Dave for questions. >> Thank you, Michael. That's a great example you've got. I love the life cycle. You can visualize that very well. You've got an edge use case you do in both real time inference, really, at the edge. And then you're blending that sensor data with other data sources to enrich your models. And you can push that back to the edge. That's that life cycle. So really appreciate that, that info. Let me ask you, what are you seeing as the most common connected vehicle when you think about analytics and machine learning, the use cases that you see customers really leaning into? >> Yeah, that's really, that's a great question, Dave, you know, cause, you know, everybody always thinks about machine learning like this is the first thing you go to. Well, actually it's not, right? For the first thing you really want to be able to go down, many of our customers are doing, is look, let's simply connect our trucks or our vehicles or whatever our IOT asset is, and then you can do very simple things like just performance monitoring of the, of the piece of equipment. In the truck industry, a lot of performance monitoring of the truck, but also performance monitoring of the driver. So how is the, how is the driver performing? Is there a lot of idle time spent? You know, what's, what's route efficiency looking like? You know, by connecting the vehicles, right? You get insights, as I said, into the truck and into the driver and that's not machine learning even, right? But, but that, that monitoring piece is really, really important. So the first thing that we see is monitoring types of use cases. Then you start to see companies move towards more of the, what I call the machine learning and AI models, where you're using inference on the edge. 
And then you start to see things like predictive maintenance happening, kind of route real-time, route optimization and things like that. And you start to see that evolution again, to those smarter, more intelligent dynamic types of decision-making. But let's not, let's not minimize the value of good old fashioned monitoring, that's to give you that kind of visibility first, then moving to smarter use cases as you, as you go forward. >> You know, it's interesting, I'm I'm envisioning, when you talked about the monitoring, I'm envisioning, you see the bumper sticker, you know, "How am I driving?" The only time somebody ever probably calls is when they get cut off it's and you know, I mean, people might think, "Oh, it's about big brother," but it's not. I mean, that's yeah okay, fine. But it's really about improvement and training and continuous improvement. And then of course the, the route optimization. I mean, that's, that's bottom line business value. So, so that's, I love those, those examples. >> Great! >> I wonder, I mean, what are the big hurdles that people should think about when they want to jump into those use cases that you just talked about, what are they going to run into? You know, the blind spots they're, they're going to, they're going to to get hit with. >> There's a few different things, right? So first of all, a lot of times your IT folks aren't familiar with the kind of the more operational IOT types of data. So just connecting to that type of data can be a new skill set, right? There's very specialized hardware in the car and things like, like that and protocols. That's number one. That's the classic IT OT kind of conundrum that, you know, many of our customers struggle with. But then, more fundamentally, is, you know, if you look at the way these types of connected truck or IOT solutions started, you know, oftentimes they were, the first generation were very custom built, right? So they were brittle, right? They were kind of hardwired. 
Then as you move towards more commercial solutions, you had what I call the silo problem, right? You had fragmentation in terms of this capability from this vendor, this capability from another vendor. You get the idea. You know, one of the things that we really think that we need that we, that needs to be brought to the table, is, first of all, having an end to end data management platform. It's kind of an integrated, it's all tested together, you have a data lineage across the entire stack. But then also importantly, to be realistic, we have to be able to integrate to industry kind of best practices as well in terms of solution components in the car, the hardware and all those types of things. So I think there's, you know, it's just stepping back for a second, I think that there is, has been fragmentation and complexity in the past. We're moving towards more standards and more standard types of offerings. Our job as a software maker is to make that easier and connect those dots, so customers don't have to do it all on all on their own. >> And you mentioned specialized hardware. One of the things we heard earlier in the main stage was your partnership with Nvidia. We're talking about new types of hardware coming in. You guys are optimizing for that. We see the IT and the OT worlds blending together, no question. And then that end-to-end management piece, you know, this is different from, your right, from IT, normally everything's controlled, you're in the data center. And this is a metadata, you know, rethinking kind of how you manage metadata. So in the spirit of, of what we talked about earlier today, other technology partners, are you working with other partners to sort of accelerate these solutions, move them forward faster? 
>> Yeah, I'm really glad you're asking that, Dave, because we actually embarked on a project called Project Fusion, which was really about integration. When you look at that connected vehicle lifecycle, there are some core vendors out there providing some very important capabilities. So what we did is we joined forces with them to build an end-to-end demonstration and reference architecture to enable the complete data management lifecycle. Now, Cloudera's piece of this was ingesting data and all the things I talked about: the storing and the machine learning, right? So we provide that end to end. But we wanted to partner with some key players, and the partners we integrated with were NXP, which provides the service-oriented gateways in the car, so that's the hardware in the car; and Wind River, which provides an in-car operating system. That's Linux, right? Hardened and tested. We then ran our Apache MiNiFi, which is part of Cloudera DataFlow, in the vehicle, right on that operating system, on that hardware. We pumped the data over into the cloud, where we did all the data analytics and machine learning and built out these very specialized models. And then, once we built those models, we used a company called Airbiquity, which specializes in automotive over-the-air updates, right? They can take those models and push those models back to the vehicle very rapidly. So what we said is, look, there's an established ecosystem, if you will, of leaders in this space, and what we wanted to do is make sure that Cloudera was part and parcel of that ecosystem. And by the way, you mentioned Nvidia as well. We're working closely with Nvidia now, so when we're doing the machine learning, we can leverage some of their hardware to get still further acceleration on the machine learning side of things.
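To make the shape of that loop concrete, here is a rough, hypothetical sketch in Python of the cycle Michael describes: telemetry shaped at the edge (the MiNiFi role), a simple model derived in the cloud, and the refreshed model pushed back to the vehicle (the over-the-air role). Every function and field name here is illustrative; none of this is actual Project Fusion, Cloudera, or Airbiquity code.

```python
from dataclasses import dataclass
from statistics import mean

@dataclass
class Reading:
    vehicle_id: str
    engine_temp_c: float
    vibration_g: float

def collect_at_edge(raw_can_frames):
    """Stand-in for the in-vehicle agent: filter and shape raw telemetry."""
    return [Reading(*f) for f in raw_can_frames if f[1] is not None]

def train_threshold_model(history):
    """Stand-in for cloud-side learning: derive a toy anomaly threshold."""
    temps = [r.engine_temp_c for r in history]
    return {"max_temp": mean(temps) * 1.2}  # flag anything 20% above the mean

def ota_update(vehicle_models, vehicle_id, model):
    """Stand-in for the over-the-air step: push the new model to the vehicle."""
    vehicle_models[vehicle_id] = model

history = collect_at_edge([("truck-1", 88.0, 0.1), ("truck-1", 92.0, 0.2)])
model = train_threshold_model(history)
fleet = {}
ota_update(fleet, "truck-1", model)
print(fleet["truck-1"]["max_temp"])  # 108.0
```

The point of the sketch is the round trip itself: data leaves the vehicle, a model is rebuilt centrally, and the model (not the data) travels back to the edge where inference runs.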
So yeah, one of the things I always say about these types of use cases is that it takes a village, and what we've really tried to do is build out an ecosystem that provides that village, so that we can speed that analytics and machine learning lifecycle along just as fast as it can go. >> This is, again, another great example of data-intensive workloads. It's not your grandfather's ERP running on traditional systems. These are really purpose-built, maybe customizable for certain edge use cases. They're low cost, low power. They can't be bloated. And you're right, it does take an ecosystem. You've got to have APIs that connect, and that takes a lot of work and a lot of thought. So that leads me to the technologies underpinning this. We've talked a lot on theCUBE about semiconductor technology, how that's changing, and the advancements we're seeing there. What do you see as some of the key technology areas advancing this connected-vehicle machine learning? >> You know, it's interesting; I'm seeing it in a few notable places. First of all, we see that the vehicle itself is getting smarter, right? Look at that NXP type of gateway we talked about. That used to be kind of a dumb gateway; really all it was doing was pushing data up and down and providing isolation down to the lower-level subsystems. So it was really security and just basic communication. That gateway is now becoming what they call a service-oriented gateway. It's got disk, it's got memory, it's got all of this, so now you can run serious compute in the car, right? So for all of these things like running machine-learning inference models, you have a lot more power in the car.
At the same time, 5G is making it so that you can push data fast enough to make low-latency computing available, even on the cloud. So now you've got incredible compute both at the edge in the vehicle and on the cloud, right? And on the cloud, you've got partners like Nvidia, who are accelerating it still further through better GPU-based computing. So if you look at that machine learning lifecycle we talked about, Dave, there are improvements at every step along the way. We're starting to see technology optimization pervasive throughout the cycle. >> And then, real quick, it's not a quick topic, but you mentioned security. We've seen a whole new security model emerge. There is no perimeter anymore in a use case like this, is there? >> No, there isn't. And remember, we're the data management platform, and the thing we have to provide is end-to-end lineage of where that data came from, who can see it, and how it changed, right? That's something we have integrated in from the beginning: from when that data is ingested, through when it's stored, through when it's processed and people are doing machine learning, we provide that lineage so that security and governance are assured throughout that data lifecycle. >> And federated across, in this example, across the fleet. All right, Michael, that's all the time we have right now. Thank you so much for that great information. Really appreciate it. >> Dave, thank you. And thanks to the audience for listening in today. >> Yes, thank you for watching. Keep it right there.
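The end-to-end lineage Michael closes on, knowing who touched the data at every stage and being able to prove it after the fact, comes down to an append-only audit trail attached to every read and write. A toy illustration of that idea follows; this is not Apache Atlas's actual API (Atlas models lineage and classification far more richly), just a minimal sketch of the guarantee being described.

```python
from datetime import datetime, timezone

class AuditedStore:
    """Toy record store that logs every access; a stand-in for the kind of
    audit/lineage trail a governance platform maintains across the stack."""
    def __init__(self):
        self._records = {}
        self._audit = []  # append-only: (timestamp, user, record_id, action)

    def put(self, user, record_id, value):
        self._records[record_id] = value
        self._audit.append((datetime.now(timezone.utc), user, record_id, "write"))

    def get(self, user, record_id):
        self._audit.append((datetime.now(timezone.utc), user, record_id, "read"))
        return self._records[record_id]

    def who_touched(self, record_id):
        """The post-breach question: exactly who read or wrote this record?"""
        return [(u, a) for _, u, r, a in self._audit if r == record_id]

store = AuditedStore()
store.put("ingest-svc", "customer-42", {"name": "Ada"})
store.get("analyst-1", "customer-42")
print(store.who_touched("customer-42"))
# [('ingest-svc', 'write'), ('analyst-1', 'read')]
```

Because the trail is append-only and written on every access, the answer to "who saw this record, and when" is always recoverable, which is the property the interview keeps returning to.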
Rob Bearden, Hortonworks | DataWorks Summit 2018
>> Live from San Jose in the heart of Silicon Valley, it's theCUBE covering DataWorks Summit 2018, brought to you by Hortonworks. >> Welcome back to theCUBE's live coverage of DataWorks Summit here in San Jose, California. I'm your host, Rebecca Knight, along with my co-host, James Kobielus. We're joined by Rob Bearden. He is the CEO of Hortonworks. So thanks so much for coming on theCUBE again, Rob. >> Thank you for having us. >> So you just got off of the keynote on the main stage. The big theme is really about modern data architecture. So we're going to have this modern data architecture. What is it all about? How do you think about it? What's your approach? And how do you walk customers through this process? >> Well, there's a lot of moving parts in enabling a modern data architecture. One of the first steps is what we're trying to do is unlock the siloed transactional applications, and to get that data into a central architecture so you can get real time insights around the inclusive dataset. But what we're really trying to accomplish then within that modern data architecture is to bring all types of data whether it be real time streaming data, whether it be sensor data, IoT data, whether it be data that's coming from a connected core across the network, and to be able to bring all that data together in real time, and give the enterprise the ability to be able to take best in class action so that you get a very prescriptive outcome of what you want. 
So if we bring that data under management from point of origination and out on the edge, and then have the platforms that move that through its entire lifecycle, and that's our HDF platform, it gives the customer the ability to, after they capture it at the edge, move it, and then have the ability to process it as an event happens, a condition changes, various conditions come together, have the ability to process and take the exact action that you want to see performed against that, and then bring it to rest, and that's where our HDP platform comes into play where then all that data can be aggregated so you can have a holistic insight, and have real time interactions on that data. But then it then becomes about deploying those datasets and workloads on the tier that's most economically and architecturally pragmatic. So if that's on-prem, we make sure that we are architected for that on-prem deployment or private cloud or even across multiple public clouds simultaneously, and give the enterprise the ability to support each of those native environments. And so we think hybrid cloud architecture is really where the vast majority of our customers today and in the future, are going to want to be able to run and deploy their applications and workloads. And that's where our DataPlane Service Offering gives them the ability to have that hybrid architecture and the architectural latitude to move workloads and datasets across each tier transparently to what storage file format that they did or where that application is, and we provide all the tooling to match the complexity from doing that, and then we ensured that it has one common security framework, one common governance through its entire lifecycle, and one management platform to handle that entire lifecycle data. 
And that's the modern data architecture: the ability to bring all data, all types of data, under management, manage it in real time through its lifecycle until it comes to rest, and deploy it across whatever architecture tier is most appropriate financially and from a performance standpoint, cloud or on-prem. >> Rob, this morning at the keynote here on day one at DataWorks San Jose, you presented this whole architecture in the context of what you call hybrid clouds to enable connected communities, and with HDP, Hortonworks Data Platform 3.0, one of the prime announcements, you brought containerization into the story. Could you connect those dots: containerization, connected communities, and HDP 3.0? >> Well, HDP 3.0 is really the foundation for enabling that hybrid architecture natively. What it's done is separate the storage from the compute, so now we have the ability to deploy those workloads via a container strategy across whichever tier makes the most sense, to move those applications and datasets around, and to leverage each tier in the deployment architectures that are most pragmatic. And then what that lets us do is bring together all of the different data types, whether it be customer data, supply chain data, or product data. So imagine an industrial piece of equipment, an airplane, flying from Atlanta, Georgia to London, and you want to understand how well each component is performing, so that if the plane is going to need service when it gets there, it doesn't miss the turnaround and leave 300 passengers stranded or delayed, right? Now with our Connected platform, we have the ability to take every piece of data generated from every component, see it in real time, and let the airlines act on it in real time. >> Delineate it, essentially. >> And ensure that we know every person that touched and looked at that data through its entire lifecycle, from the ground crew to the pilots to the operations team to the service folks on the ground to the reservation agents. And we can prove, if somehow that data has been breached, exactly at what point it was breached and who did or didn't get to see it, and we can prevent that because of the security models we put in place. >> And that relates to compliance and mandates such as the General Data Protection Regulation, GDPR, in the EU. At DataWorks Berlin a few months ago, Hortonworks announced a new product called Data Steward Studio to enable GDPR compliance. Can you give our listeners who may not have been following the Berlin event a bit of an update on Data Steward Studio, how it relates to the whole set of data lineage requirements you're describing, and then, going forward, what Hortonworks's roadmap is for supporting the full governance lifecycle for the connected community, from data lineage through model governance and so forth? Can you just connect a few dots? That would be helpful. >> Absolutely. What's important, certainly driven by GDPR, is the requirement to prove that you understand who has touched that data and who has not had access to it, and to ensure that you're in compliance with the GDPR regulations, which are significant. Essentially, what they say is that you have to protect the personal data and attributes of the individual. So it's very important to have systems that not just secure the data, but understand who has had access to it at any point in time that you've maintained that individual's data. And it's not just about when you've had a transaction with that individual; it's the rest of the history you've kept, or the multiple datasets you may try to correlate to expand the relationship with that customer. You need to make sure not only that you've secured their data, but that you're protecting and governing who has access to it and when. And as importantly, you have to be able to prove, in the event of a breach, that you had control of that data and who did or did not access it, because if you can't prove that it was secure and that no one accessed it who wasn't supposed to, you can be open to hundreds of thousands or even millions of dollars of fines just because you can't prove it was not accessed. That's what our platforms, you mentioned Data Steward Studio, are part of. DataPlane is one of the capabilities that gives us that ability. The core engine that does it is Atlas, the open-source governance platform we developed through the community, which really drives all the capabilities for governance across each of our products, HDP and HDF, and of course DataPlane and Data Steward Studio take advantage of that in how they move, replicate, and manage data for us. >> One of the things we were talking about before the cameras were rolling is this idea of data-driven business models: how they're disrupting current contenders, with new rivals coming on the scene all the time. Can you talk a little bit about what you're seeing, and what are some of the most exciting, and maybe also some of the most threatening, things you're seeing? >> Sure. The traditional legacy enterprise is very procedurally driven. Think about classic Encore ERP. It has worked very hard to have a very rigid, very structured, procedural order-to-cash cycle that doesn't have a great deal of flexibility.
It takes a product through a design process, you build the product, you sell it to a customer, you service that customer, and then you learn from that transaction different ways to automate or improve efficiencies in the supply chain. But it's very procedural, very linear. In the new world of connected data models, you want to bring transparency, real-time understanding, and connectivity between the enterprise, the customer, the product, and the supply chain, so that you can take real-time, best-practice action. So, for example, you understand how well your product is performing. Is your customer using it correctly? Are they frustrated with it? Are they using it with the patterns and frequency they should be if they're going to expand their use and buy more, and if they're not, how do we engage in that cycle? How do we understand whether they're going through a re-review and another buying cycle for something similar that may not be with us, and for what reason? When we have real-time visibility into our customers' interactions and understand our product's performance through its entire lifecycle, then we can bring real-time efficiency by linking those together with our supply chain and the various relationships we have with our customers. To do that requires the modern data architecture: bringing data under management from the point it originates, whether from the product, the customer interacting with the company, or the customer interacting with our mutual ecosystem partners, and then letting best-practice supply chain techniques make sure we're bringing the highest level of service and support to that entire lifecycle.
And when we bring data under management, manage it through its lifecycle, keep the historical view at rest, and leverage that across every tier, that's when we get this high velocity, deep transparency, and connectivity between each of the constituents in the value chain, and that's what our platforms give them the ability to do. >> Not only your platform; you guys have been in business now for I think seven years or so, and you've shifted, in the minds of many and in your own strategy, from being the premier data-at-rest company, in terms of the Hadoop platform, to being one of the premier data-in-motion companies. Is that really where you're going: to be more of a completely streaming-focused solution provider in a multi-cloud environment? And I hear a lot of Kafka in your story now; it's like, oh yeah, that's right, Hortonworks is big on Kafka. Can you give us just a quick sense of how you're making that shift towards low-latency, real-time streaming of big data, or small data for that matter, with embedded analytics and machine learning? >> So, we have evolved from certainly being the leader in global data platforms, with all the work that we do collaboratively through the community to make Hadoop an enterprise-viable data platform that has the ability to run mission-critical workloads and apps at scale, ensuring it has all the enterprise facilities for security, governance, and management. But you're right, we have expanded our footprint aggressively. We saw the opportunity to create more value for our customers by giving them the ability not to wait until they bring data under management to gain an insight, because in that case they're forced to be reactive, post-event and post-transaction. We want to give them the ability to shift their business model to being interactive: pre-event, pre-condition.
The way to do that, we learned, was to bring the data under management from the point of origination; that's what we use MiNiFi and NiFi for, and then HDF to move it through its lifecycle. And to your point, we have the intellect, we have the insight, and we then have the ability to process toward the best-in-class outcome, based on the variables we know we're trying to solve for, as it's happening. >> And there's the word ACID, which of course is a transactional data paradigm; I hear that all over your story now in streaming. So what you're saying is it's a completely enterprise-grade streaming environment, end to end, for the new era of edge computing. Would that be a fair way of-- >> It's very much so. Our model and strategy has always been to bring the best-in-class engines for what they do well with their particular datasets. A couple of examples: one you brought up, Kafka; another is Spark. They do what they do really well. But what we do is make sure they fit inside an overall data architecture, so they have access to a much broader central dataset that goes from point of origination to point of rest on one central architecture, and then benefit from our security, governance, and operations model managing those engines. So what we're trying to do is eliminate silos for our customers, so they don't have siloed datasets that just perform particular functions. We give them an enterprise modern data architecture, and we manage the things that bring forward the modern data-driven business models, bringing the governance, the security, and the operations management that ensure those workflows go from beginning to end seamlessly. >> Do you, go ahead. >> I was just going to ask about the customer concerns. So here you are, you've now given them this ability to make these real-time changes. What's next?
What's on their minds now, and what do you see as the future of what you want to deliver next? >> First and foremost, we've got to make sure we get this right, and really bring this modern data architecture forward: make sure that we truly have the governance correct, the security models correct, and one pane of glass to manage all of this. And really enable that hybrid data architecture, letting them leverage the cloud tier where it's architecturally and financially pragmatic, and give them the ability to leg into a cloud architecture without the risk of either being locked in or misunderstanding where the lines of demarcation of workloads or datasets are, and not getting the economies or efficiencies they should. We solved that with DataPlane. So we're working very hard with the community, with our ecosystem and strategic partners, to make sure we're enabling the ability to bring each type of data from any source and deploy it across any tier with a common security, governance, and management framework. So then, what's next is, now that we have this high velocity of data through its entire lifecycle on one common set of platforms, we can start enabling the modern applications to function. And we can look back at some of the legacy technologies that are very procedurally based and are dependent on a transaction or an event happening before they can run their logic to get an outcome, because that grounds the customer in post-event activity. We want to make sure we're bringing that kind of functionality, supply chain functionality for example, to the modern data architecture, so that we can do real-time inventory allocation based on the patterns our customers show: how they're using the product, or frustrations they've had, or successes they've had.
And we know through artificial intelligence and machine learning that there's a high probability not only that they will buy, use, or expand their consumption of whatever they have of our product or service, but that it will probably lead to these other things as well if we do those things. >> Predictive logic as opposed to procedural. Yes, AI. >> Very much so. So what's next will be bringing those modern applications on top of this: applications that become very predictive and enabling, versus very procedural and post-transaction. We're a little ways downstream from that. That's looking out. >> That's next year's conference. >> That's probably next year's conference. >> Well, Rob, thank you so much for coming on theCUBE, it's always a pleasure to have you. >> Thank you both for having us, and thanks for being here, and enjoy the summit. >> We're excited. >> Thank you. >> We'll do. >> I'm Rebecca Knight for Jim Kobielus. We will have more from DataWorks Summit just after this. (upbeat music)
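The interactive, pre-event posture Rob describes, acting on each event as it arrives rather than only querying data after it lands at rest, is the essence of the stream-processing layer in the stack he outlines. A minimal hand-rolled sketch of the idea follows; real deployments would use NiFi, Kafka, or Spark rather than this loop, and the truck-telemetry fields are invented for illustration.

```python
def react_in_motion(events, on_alert):
    """Evaluate each event as it arrives (pre-event posture),
    instead of batch-querying only after everything is at rest."""
    at_rest = []
    for event in events:
        if event["fuel_pct"] < 10:
            on_alert(f"{event['truck']}: low fuel, reroute to nearest station")
        at_rest.append(event)  # still land every event for historical analytics
    return at_rest

alerts = []
stored = react_in_motion(
    [{"truck": "T-7", "fuel_pct": 42}, {"truck": "T-9", "fuel_pct": 8}],
    alerts.append,
)
print(alerts)       # ['T-9: low fuel, reroute to nearest station']
print(len(stored))  # 2
```

Note that the two paths are complementary, which mirrors the HDF/HDP split in the interview: the in-motion check fires immediately, while the same events still accumulate at rest for aggregate, historical analysis.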
John Kreisa, Hortonworks | DataWorks Summit 2018
>> Live from San José, in the heart of Silicon Valley, it's theCUBE! Covering DataWorks Summit 2018. Brought to you by Hortonworks. (electro music) >> Welcome back to theCUBE's live coverage of DataWorks here in sunny San José, California. I'm your host, Rebecca Knight, along with my co-host, James Kobielus. We're joined by John Kreisa. He is the VP of marketing here at Hortonworks. Thanks so much for coming on the show. >> Thank you for having me. >> We've enjoyed watching you on the main stage; it's been a lot of fun. >> Thank you, it's been great. There have been great general sessions and some great talks about the technology; we've heard from some customers, some third parties, and most recently from Kevin Slavin from The Shed, which was really amazing. >> So I really want to get into this event. You have 2,100 attendees from 23 different countries, 32 different industries. >> Yep. >> This started as a small, tiny little thing! >> That's right. >> Didn't Yahoo start it in 2008? >> It did, yeah. >> You changed names a few years ago, but it's still the same event, looming larger and larger. >> Yeah! It's been great, and it's gone international, as you said. It's actually the 17th total event that we've done, if you count the ones we've done in Europe and Asia. It's a global community around data, so it's no surprise. The growth has been phenomenal, the energy is great, and the innovations the community and the ecosystem are talking about are really great. It just continues to evolve as an event; it continues to bring new ideas and share those ideas. >> What are you hearing from customers? What are they buzzing about? Every morning on the main stage, you do different polls: "How much are you using machine learning? What portion of your data are you moving to the cloud?" What are you learning? >> So it's interesting, because we've done similar polls at our show in Berlin, and the results are very similar.
We did the cloud poll, and there's a lot of buzz around cloud. What we're hearing is that a lot of companies are thinking about cloud, or are somewhere along their cloud journey; the question is exactly what their overall plans are. There's a lot of talk that maybe cloud will eat everything, but if you look at the poll results, something like 75% of the attendees said they have cloud in their plans, and only about 12% said they're going to move everything to the cloud, so there's a lot of hybrid with cloud. It's about figuring out which workloads to run where, how to think about that strategy in terms of where to deploy the data and the workloads, and what that should look like. That's one of the main things we're hearing and talking a lot about. >> We've been seeing that too. Wikibon, in our recent update to the market forecast, showed that public cloud will dominate increasingly in the coming decade, but hybrid cloud will be a long transition period for many or most enterprises, who are still firmly rooted in on-premises deployments, so forth and so on. Clearly, the bulk of your customers' deployments are on premises. >> They are. >> So you're working from a good starting point, which means you've got what, 1,400 customers? >> That's right, thereabouts. >> Predominantly on premises, but many of them here at this show want to sustain their investment in a vendor that provides them with flexibility as they decide they want to use Google or Microsoft or AWS or IBM for a particular workload, so that their existing investment in Hortonworks doesn't prevent them from moving that data and those workloads. >> That's right. We want to help them do that. A lot of our customers have, I'll call it, a multi-cloud strategy.
They want to be able to work with an Amazon or a Google or any of the other vendors in the space equally well, and have the ability to move workloads around, and that's one of the things we can help them with. >> One of the things you also did yesterday on the main stage was talk about this conference in the greater context of the world and what's going on right now. This is happening against the backdrop of the World Cup, and you said that this is really emblematic of data, because this is a game, a tournament, that generates tons of data. >> A tremendous amount of data. >> It's showing how data can launch new business models and disrupt old ones. Where do you think we're at right now? For someone who's been in this industry for a long time, just lay out the scene. >> I think we're still very much at the beginning. Even though the conference has been around for a while, the technology is emerging and evolving so fast that we're still at the beginning of all the transformations. I've been listening to the customer presentations here, and all of them are at some point along the journey. Many are really just starting. Even some of the polls we ran today showed that they're very much at the beginning of their journey with things like streaming or some of the AI and machine learning technologies. They're at various stages, so I believe we're really at the beginning of the transformation we'll see. >> That reminds me of another detail of your product portfolio and architecture: streaming and edge deployments are also in the future for many of your customers who still primarily do analytics on data at rest. You've made investments in a number of technologies, NiFi for streaming, and there's something called MiNiFi that has been discussed here at this show as an enabler for streaming all the way out to edge devices.
What I'm getting at is that's indicative of what Arun Murthy, one of your co-founders, presented; it was a very good discussion for us analysts and also here at the show. That is one of many investments you're making to prepare for a future in which such workloads will be more predominant in the coming decade. One of the new things I've heard this week, that I'd not heard in terms of emphasis from you guys, is more of an emphasis on data warehousing as an important use case for HDP in your portfolio, specifically with Hive, the Hive 3.0 now in HDP 3.0. >> Yes. >> With the enhancements to Hive to support more real-time and low-latency workloads, but also ACID capabilities. What you guys are doing is consistent with one of your competitors, Cloudera. They're going deeper into data warehousing too, because they recognize they've got to go there, like you do, to be able to absorb more of your customers' workloads. I think it's important that you guys are making that investment. You're not just big data, you're all data and all data applications, potentially, if your customers want to go there and engage you. >> Yes. >> I think that was a significant, subtle emphasis that I as an analyst noticed. >> Thank you. There were so many enhancements in 3.0 brought from the community that it was hard to talk about everything in depth, but you're right. The enhancements to Hive in terms of performance have really enabled it to take on a greater set of workloads and the interactivity that we know our customers want. The advantage is that you have a common data layer in the back end and you can run all these different workloads on it. It might be data warehousing, high-speed query workloads, but you can do it on that same data with Spark and data-science-related workloads. Again, it's that common pooled back end of the data lake, and having the ability to do it with common security and governance.
It's one of the benefits our customers are telling us they really appreciate. >> One of the things we've also heard this morning was talk about data analytics in terms of brand value and, importantly, brand protection. FedEx, exactly. The speaker said, we've all seen these apology commercials. What do you think, is it damage control? What is the customer motivation here? >> Well, a company can have billions of dollars of market cap wiped out by breaches in security, and we've seen it. This is not theoretical; these are actual occurrences that we've seen. Really, they're trying to protect the brand and the business and continue to be viable. They can get knocked back so far that it can take years to recover from the impact. They're looking at the security aspects of it, the governance of their data, regulations like GDPR. These things you've mentioned have real financial impact on the businesses, and I think it's the brand and the actual operations and finances of the businesses that can be impacted negatively. >> When you're thinking about Hortonworks's marketing messages going forward, how do you want to be described now, and then how do you want customers to think of you five or 10 years from now? >> I want them to think of us as a partner to help them with their data journey, on all aspects of their data journey, whether they're collecting data from the edge (you mentioned NiFi and things like that), bringing that data back, processing it in motion, as well as processing it at rest, regardless of where that data lands: on premises, in the cloud, somewhere in between, the hybrid, multi-cloud strategy. We really want to be thought of as their partner in their data journey. That's really what we're doing. >> Even going forward, one of the things you were talking about earlier is the company sort of saying, "we want to be boring. We want to help you do all the stuff-" >> There's a lot of money in boring. >> There's a lot of money, right! Exactly!
As you said, a partner in their data journey. Is it "we'll do anything and everything"? Are you going to do niche stuff? >> That's a good question. Not everything. We are focused on the data layer: the movement of data, the processing and storage, and the analytic applications that can be built on top of the platform. Right now we've stuck to our strategy. It's been very consistent since the beginning of the company: taking these open source technologies, making them enterprise viable, developing an ecosystem around them and fostering a community around them. That's been our strategy since before the company even started. We want to continue to do that and we will continue to do that. There's so much innovation happening in the community that we quickly bring that into the products and make sure it's available in a trusted, enterprise-tested platform. That's really one of the things we see from our customers: over and over again they select us because we bring innovation to them quickly, in a safe and consumable way. >> Before we came on camera, I was telling Rebecca that Hortonworks has done a sensational job of continuing to align your product roadmaps with those of your leading partners: IBM, AWS, Microsoft. In many ways, your primary partners are not them, but the entire open source community: 26 open source projects that Hortonworks has incorporated into your product portfolio and in which you are a primary player and committer. You're a primary ingester of innovation from all the communities in which you operate. >> We do. >> That is your core business model. >> That's right. We both foster the innovation and we help drive the innovation ourselves with our engineers and architects. You're absolutely right, Jim. It's the ability to get that innovation, which is happening so fast in the community, into the product, and companies need to innovate. Things are happening so fast.
Moore's Law was mentioned multiple times on the main stage, you know, and how it's impacting different parts of the organization. It's not just the technology; business models are evolving quickly. We heard a little bit about Trimble, and if you've seen Tim Leonard's talk about what they're doing in terms of logistics: the ability to go all the way out to the farmer and impact what's happening at the farm, tracking things down to the level of a tomato or an egg all the way back, and just understand that. It's evolving business models. It's not just the tech but the evolution of business models. Rob talked about it yesterday. I think those are some of the things that are key. >> Let me stay on that point really quick. The industrial internet, like precision agriculture and everything it relates to, is increasingly relying on visual analysis of parts and eggs and whatever it might be. That is convolutional neural networks, that is A.I.; it has to be trained, and it has to be trained increasingly in the cloud, where the data lives. The data lives in HDP clusters and whatnot. In many ways, no matter where the world goes in terms of industrial IoT, there will be massive clusters of HDFS and object storage driving it, and also embedded A.I. models that have to follow a specific DevOps life cycle. You guys have a strong orientation in your portfolio towards that degree of real-time streaming, as it were, of tasks that go through the entire life cycle: from preparing the data, to modeling, to training, to deploying it out to Google or IBM or wherever else they want to go. So I'm thinking that you guys are in a good position for that as well. >> Yeah. >> I just wanted to ask you finally, what is the takeaway? We're talking about the attendees, talking about the community that you're cultivating here: theme, ideas, innovation, insight. What do you hope an attendee leaves with?
>> I hope that the attendee leaves educated, understanding the technology and the impacts it can have, so that they will go back and change their business and continue to drive their data projects. That's the whole intent; we even changed the format of the conference to create more educational opportunities. For me, a satisfied attendee would be one that learned about the things they came to learn, so that they can go back and achieve the goals they have, whether it's business transformation, technology transformation, or some combination of the two. To me, that's what I hope everyone is taking away, and that they want to come back next year when we're in Washington, D.C. and- >> My stomping ground. >> His hometown. >> Easy trip for you. They'll probably send you out here- (laughs) >> Yeah, that's right. >> Well John, it's always fun talking to you. Thank you so much. >> Thank you very much. >> We will have more from theCUBE's live coverage of DataWorks right after this. I'm Rebecca Knight for James Kobielus. (upbeat electro music)
Arun Murthy, Hortonworks | DataWorks Summit 2018
>> Live from San Jose in the heart of Silicon Valley, it's theCUBE, covering DataWorks Summit 2018, brought to you by Hortonworks. >> Welcome back to theCUBE's live coverage of DataWorks here in San Jose, California. I'm your host, Rebecca Knight, along with my cohost, Jim Kobielus. We're joined by Aaron Murphy, Arun Murthy, sorry. He is the co-founder and chief product officer of Hortonworks. Thank you so much for returning to theCUBE. It's great to have you on. >> Yeah, likewise. It's been a fun time getting back, yeah. >> So you were on the main stage this morning in the keynote, and you were describing the journey, the data journey, that so many customers are on right now, and you were talking about the cloud, saying that the cloud is part of the strategy but it really needs to fit into the overall business strategy. Can you describe a little bit of your approach to that? >> Absolutely, and the way we look at this is we help customers leverage data to actually deliver better capabilities, better services, better experiences to their customers, and that's the business we are in. Now with that, obviously, we look at cloud as a really key part of the overall strategy in terms of how you want to manage data on-prem and in the cloud. We kind of joke that we ourselves live in a world of real-time data. We just live in it and data is everywhere. You might have trucks on the road, you might have drones, you might have sensors, and you have it all over the world. At this point, enterprises understand that they'll manage all the infrastructure, but in a lot of cases it will make a lot more sense to actually lease some of it, and that's the cloud. It's the same way, if you're delivering packages, you don't go buy planes and lay out roads; you go to FedEx and actually let them handle that for you. That's kind of what the cloud is.
So that is why we really fundamentally believe that we have to help customers leverage infrastructure wherever it makes sense pragmatically, both from an architectural standpoint and from a financial standpoint, and that's why we talked about how your cloud strategy is part of your data strategy, which is actually fundamentally part of your business strategy. >> So how are you helping customers to leverage this? What is on their minds and what's your response? >> Yeah, it's really interesting. Like I said, cloud is cloud, and infrastructure management is certainly something that's top of mind for every CIO today. And what we've consistently heard is they need a way to manage all this data and all this infrastructure in a hybrid, multi-tenant, multi-cloud fashion. Because in some geos you might not have your favorite cloud vendor. You know, parts of Asia are a great example: you might have to use one of the Chinese clouds. You go to parts of Europe, especially with things like GDPR and the data residency laws and so on, and you have to be very, very cognizant of where your data gets stored and where your infrastructure is present. And that is why we fundamentally believe it's really important to give enterprises a fabric with which they can manage all of this, and hide the details of all of the underlying infrastructure from them as much as possible. >> And that's DataPlane Services. >> And that's DataPlane Services, exactly. >> The Hortonworks DataPlane Services we launched in October of last year. Actually, I was on theCUBE talking about it back then too. We see a lot of interest, a lot of excitement around it, because now they understand that, again, this doesn't mean that we drive it down to the least common denominator. It is about helping enterprises leverage the key differentiators of each of the cloud vendors' products. For example, Google, with which we announced a partnership, is really strong on AI and ML.
So if you are running TensorFlow and you want to deal with things like Kubernetes, GKE is a great place to do it. And, for example, you can now go to Google Cloud and get TPUs, which work great for TensorFlow. Similarly, a lot of customers run on Amazon for a bunch of the operational stuff, Redshift as an example. So in the world we live in, we want to help the CIO leverage the best pieces of the cloud but then give them a consistent way to manage and govern that data. We were joking on stage that IT has just about learned how to deal with Kerberos and Hadoop, and now we're telling them, "Oh, go figure out IAM on Google," which is also IAM on Amazon, but they are completely different. The only thing that's consistent is the name. So I think we have a unique opportunity, especially with the open source technologies like Atlas, Ranger, Knox and so on, to be able to draw a consistent fabric over this, with consistent security and governance, and help the enterprise leverage the best parts of the cloud to put a best-fit architecture together, which also happens to be a best-of-breed architecture. >> So the fabric is everything you're describing: all the Apache open source projects in which Hortonworks is a primary committer and contributor are able to apply schemas and policies and metadata and so forth across this distributed, heterogeneous fabric of public and private cloud segments within a distributed environment. >> Exactly. >> That's increasingly being containerized in terms of the applications for deployment to edge nodes. Containerization is a big theme in HDP 3.0, which you announced at this show. >> Yeah. >> So, could you give us a quick sense for how that containerization capability plays into more of an edge focus for what your customers are doing? >> Exactly, great point, and again, the core parts of the fabric are obviously the open source projects, but we've also done a lot of net-new innovation with DataPlane, which, by the way, is also open source.
It's a new product and a new platform that you can actually leverage, to lay over the open source ones you're familiar with. And again, like you said, containerization is what's actually driving the fundamentals of this. The details matter: at the scale at which we operate, we're talking about thousands of nodes, terabytes of data, the details really matter, because a 5% improvement at that scale leads to millions of dollars in optimization for capex and opex. So all of that, the details, are being fueled and driven by the community, which is what we delivered in HDP 3. And the key ones, like you said, are containerization, because now we can actually get complete agility in terms of how you deploy the applications. You get isolation not only at the resource management level with containers, but you also get it at the software level, which means if two data scientists want to use a different version of Python or Scala or Spark or whatever it is, they get that consistently and holistically. Now they can actually go from the test-dev cycle into production in a completely consistent manner. So that's why containers are so big, because now we can actually leverage them across the stack, with things like MiNiFi showing up. We can actually-- >> Define MiNiFi before you go further. What is MiNiFi for our listeners? >> Great question. Yeah, so we've always had NiFi-- >> Real-time. >> Real-time data flow management, and NiFi was still sort of within the data center. What MiNiFi does is provide a really, really small layer, a small thin library if you will, that you can throw on a phone, a doorbell, a sensor, and that gives you all the capabilities of NiFi but at the edge. >> Mmm. >> Right? And it's actually not just data flow; what is really cool about NiFi is that it's also command and control. You can do bidirectional command and control, so you can change in real time the flows you want, the processing you do, and so on.
So what we're trying to do with MiNiFi is not just collect data from the edge but also push the processing as much as possible to the edge, because we really do believe a lot more processing is going to happen at the edge, especially with the ASICs and so on coming out. There will be custom hardware that you can deploy and essentially leverage at the edge to actually do this processing. And we believe, you know, we want to do that even at the cost of the data not actually landing at rest, because at the end of the day we're in the insights business, not in the data storage business. >> Well, I want to get back to that. You were talking about innovation and how so much of it is driven by the open source community, and you're a veteran of the big data open source community. How do we maintain that? How does that continue to be the fuel? >> Yeah, and a lot of it starts with just being consistent. From day one, and James was around back then, in 2011 when we started, we've always said, "We're going to be open source," because we fundamentally believed that the community is going to out-innovate any one vendor, regardless of how much money they have in the bank. So we really do believe that's the best way to innovate, mostly because there is a sense of shared ownership of the product. It's not just one vendor throwing some code out there, trying to shove it down the customers' throats. And we've seen this over and over again. Three years ago, a lot of what became the DataPlane stuff, Atlas and Ranger and so on, none of these existed. These actually came from the fruits of collaboration with the community, with actually some very large enterprises being a part of it. So it's a great example of how we continue to drive it, because we fundamentally believe that that's the best way to innovate, and we continue to believe so. >> Right.
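The edge pattern Arun describes earlier in this exchange (push processing out to the device so only the insights travel, and the bulk of the raw data never has to land at rest) can be pictured with a minimal sketch. This is plain Python, not MiNiFi's actual API; the anomaly rule and the 3-sigma threshold are illustrative assumptions, not anything from the interview.

```python
# Illustrative only: plain Python, not MiNiFi's actual API.
# The idea is the one Arun describes: score readings locally and forward
# only the anomalies, so the bulk of the raw data never leaves the device.

def edge_filter(readings, threshold=3.0):
    """Forward readings more than `threshold` standard deviations from
    the running baseline; everything else stays at the edge."""
    n, mean, m2 = 0, 0.0, 0.0
    forwarded = []
    for x in readings:
        if n >= 2:
            std = (m2 / (n - 1)) ** 0.5
            if std > 0 and abs(x - mean) > threshold * std:
                forwarded.append(x)   # anomaly: ship it upstream
                continue              # and keep it out of the baseline
        # Welford's online update: running mean/variance in O(1) memory,
        # which matters on something as small as a sensor or a doorbell
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
    return forwarded

readings = [10.0, 10.2, 9.9, 10.1, 10.0, 55.0, 10.1, 9.8]
print(edge_filter(readings))  # -> [55.0]; the spike is forwarded, the rest stays local
```

The design choice mirrors the "insights business, not data storage business" point: the device keeps constant memory, forwards a handful of bytes, and the raw stream is simply discarded.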
And the community, the Apache community as a whole, has so many different projects. For example, in streaming, there is Kafka, >> Okay. >> and there are others that address a core set of common requirements but in different ways, >> Exactly. >> supporting different approaches: for example, doing streaming with stateful transactions and so forth, or stateless semantics and so forth. It seems to me that Hortonworks is shifting towards being more of a streaming-oriented vendor, away from data at rest. Though, I should say, HDP 3.0 has got great scalability and storage efficiency capabilities baked in. I wonder if you could just break down a little bit what the innovations or enhancements are in HDP 3.0 for those of your core customers, most of whom are managing massive multi-terabyte, multi-petabyte, distributed, federated big data lakes. What's in HDP 3.0 for them? >> Oh, lots. Again, like I said, we obviously spend a lot of time on the streaming side because that's where we see things going; we live in a real-time world. But again, we don't do it at the cost of our core business, which continues to be HDP. And as you can see, the community continues to drive it: we talked about containerization, a massive step up for the Hadoop community. We've also added support for GPUs. Again, think about at-scale machine learning. >> Graphics processing units, >> Graphical-- >> AI, deep learning >> Yeah, it's huge. Deep learning, TensorFlow and so on really, really need custom hardware, GPUs, if you will. So that's coming; that's in HDP 3. We've added a whole bunch of scalability improvements to HDFS. We've added federation, because now you can go over a billion files, a billion objects, in HDFS. We also added capabilities for-- >> But you indicated yesterday when we were talking that very few of your customers need that capacity yet, but you think they will, so-- >> Oh, for sure.
Again, part of this is, as we enable more sources of data in real time, that's the fuel which drives it, and that was always the strategy behind the HDF product. It was about, can we leverage the synergies between the real-time world, feed that into what you do today in your classic enterprise with data at rest, and that is what is driving the necessity for scale. >> Yes. >> Right. We've done that. We've put in a lot of work, again, lowering the total cost of ownership, the TCO, so we added erasure coding. >> What is that exactly? >> Yeah, so erasure coding is a classic storage concept. You know, HDFS has always had three replicas, for redundancy, fault tolerance and recovery. Now, three replicas sound okay because it's cheap disk, right? But when you start to think about our customers running 70, 80, a hundred petabytes of data, those three replicas add up, because you've now gone from 80 petabytes of effective data to actually a quarter of an exabyte in terms of raw storage. So what we can do with erasure coding is, instead of storing the three blocks, we actually store parity. We store the encoding of it, which means we can go down from three to, like, two, one and a half, whatever we want to do. So, if we can get from three blocks to one and a half, especially for your core data, >> Yeah >> the data you're not accessing every day, it results in massive savings in terms of your infrastructure costs. And that's kind of the business we're in: helping customers do better with the data they have, whether it's on-prem or in the cloud. We want to help customers be comfortable getting more data under management, along with security and a lower TCO. The other big piece I'm really excited about in HDP 3 is all the work that's happened in the Hive community for what we call the real-time database. >> Yes. >> As you guys know, you follow the whole SQL-on-Hadoop space.
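The replication arithmetic Arun walks through can be made concrete with a back-of-the-envelope sketch. It assumes the Reed-Solomon layout HDFS 3 ships as its default policy, RS-6-3 (six data blocks plus three parity blocks), which gives exactly the "three blocks to one and a half" overhead he mentions; the function name and petabyte figures are illustrative.

```python
# Back-of-the-envelope version of Arun's storage math: 3x replication
# versus an erasure-coded layout (HDFS 3's default policy is RS-6-3:
# 6 data blocks + 3 parity blocks, a 1.5x overhead).

def raw_storage_pb(effective_pb, scheme):
    """Raw disk needed to hold `effective_pb` petabytes of actual data."""
    overheads = {
        "3x-replication": 3.0,   # three full copies of every block
        "rs-6-3": 9 / 6,         # (6 data + 3 parity) / 6 data -> 1.5x
    }
    return effective_pb * overheads[scheme]

effective = 80  # petabytes of effective data, the scale in the conversation
print(raw_storage_pb(effective, "3x-replication"))  # 240.0 PB, about a quarter of an exabyte
print(raw_storage_pb(effective, "rs-6-3"))          # 120.0 PB, half the raw footprint
```

At that scale, halving the raw footprint is where the "massive savings in infrastructure costs" comes from, at the price of some extra CPU to compute parity on write and to reconstruct blocks on failure, which is why it suits the cold, rarely accessed core data he singles out.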
>> And Hive has changed a lot in the last several years; this is very different from what it was five years ago. >> The only thing that's the same from five years ago is the name. (laughing) So again, the community has done a phenomenal job, really taking what we used to call a SQL engine on HDFS and driving it forward. With Hive 3, which is part of HDP 3, it's now a full-fledged database. It's got full ACID support. In fact, the ACID support is so good that writing ACID tables is at least as fast as writing non-ACID tables now. And you can do that not only on-- >> A transactional database. >> Exactly. Not only can you do it on-prem, you can do it on S3. So you can actually drive the transactions through Hive on S3. We've done a lot of work, you were there yesterday when we were talking about some of the performance work we've done with LLAP and so on, to actually give consistent performance both on-prem and in the cloud, and this took a lot of effort simply because the performance characteristics you get from the storage layer with HDFS versus S3 are significantly different. So now we have been able to bridge those with things like LLAP. We've done a lot of work to enhance the security model around it, governance and security. So now you get things like column-level masking, row-level filtering, all the standard stuff that you would expect and more from an enterprise warehouse. We talked to a lot of our customers; they're doing literally tens of thousands of views because they didn't have the capabilities that exist in Hive now. >> Mmm-hmm. >> And I'm sitting here kind of amazed that for an open source set of tools to have the best security and governance at this point is pretty amazing, coming from where we started off.
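The column-level masking and row-level filtering Arun mentions can be pictured with a toy sketch. This is emphatically not Apache Ranger's policy engine or syntax; the row shape, the consent flag, and the `****` masking token are all illustrative assumptions. What it mirrors is the point he makes next: the protection is a policy applied over existing data, and the application reading the result does not change at all.

```python
# Toy sketch of policy-driven PII protection (not Apache Ranger's actual
# engine or syntax): mask the named columns and filter out rows lacking
# consent, with no change to the application that reads the result.

def apply_policy(rows, masked_columns, consent_column="analytics_consent"):
    out = []
    for row in rows:
        if not row.get(consent_column, False):
            continue  # row-level filter: opted-out users never reach analytics
        # column-level mask: replace protected values, keep everything else
        out.append({k: ("****" if k in masked_columns else v)
                    for k, v in row.items()})
    return out

rows = [
    {"name": "Alice", "ssn": "123-45-6789", "spend": 42, "analytics_consent": True},
    {"name": "Bob",   "ssn": "987-65-4321", "spend": 17, "analytics_consent": False},
]
print(apply_policy(rows, masked_columns={"name", "ssn"}))
# -> [{'name': '****', 'ssn': '****', 'spend': 42, 'analytics_consent': True}]
```

In a real deployment this logic lives in the query engine, driven by centrally administered policies, which is exactly why adding a policy rather than rewriting tens of thousands of applications is such a big deal.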
>> And it's absolutely essential for GDPR compliance, and for compliance with HIPAA and every other mandate and sensitivity that requires you to protect personally identifiable information, so very important. So in many ways Hortonworks has one of the premier big data catalogs for all manner of compliance requirements that your customers are chasing. >> Yeah, and James, you wrote about it in the context of Data Steward Studio, which we introduced. >> Yes. >> You know, things like consent management, having-- >> A consent portal, >> A consent portal, >> in which the customer can indicate the degree to which >> Exactly. >> they require controls over the management of their PII, possibly to be forgotten, and so forth. >> Yeah, the right to be forgotten, and consent even for analytics. Within the context of GDPR, you have to allow the customer to opt out of analytics, of being part of an analytic itself, right? >> Yeah. >> So things like those are now something we enable through the enhanced security models that are done in Ranger. The really cool part of what we've done now with GDPR is that we can get all these capabilities on existing data and existing applications by just adding a security policy, not rewriting. It's a massive, massive deal, and I cannot tell you how excited customers are about it, because they now understand. They were sort of freaking out that "I have to go to 30, 40, 50 thousand enterprise apps and change them to provide consent and the right to be forgotten." The fact that you can do that now by changing a security policy with Ranger is huge for them. >> Arun, thank you so much for coming on theCUBE. It's always so much fun talking to you. >> Likewise. Thank you so much. >> I learned something every time I listen to you. >> Indeed, indeed. I'm Rebecca Knight for James Kobielus; we will have more from theCUBE's live coverage of DataWorks just after this. (Techno music)
Sastry Malladi, FogHorn | Big Data SV 2018
>> Announcer: Live from San Jose, it's theCUBE, presenting Big Data Silicon Valley, brought to you by SiliconANGLE Media and its ecosystem partner. (upbeat electronic music) >> Welcome back to theCUBE. I'm Lisa Martin with George Gilbert. We are live at our event, Big Data SV, in downtown San Jose, down the street from the Strata Data Conference. We're joined by a new guest to theCUBE, Sastry Malladi, the CTO of FogHorn. Sastry, welcome to theCUBE. >> Thank you, thank you, Lisa. >> So FogHorn, cool name, what do you guys do, who are you? Tell us all that good stuff. >> Sure. We are a startup based in Silicon Valley, right here in Mountain View. We started about three years ago, three plus years ago. We provide edge computing intelligence software for edge computing or fog computing; that's how our company name, FogHorn, got started. It's particularly for the industrial IoT sector. All of the industrial guys, whether it's transportation, manufacturing, oil and gas, smart cities, smart buildings, any of those different sectors, they use our software to predict failure conditions in real time, or do condition monitoring, or predictive maintenance, any of those use cases, and successfully save a lot of money. Obviously in the process, you know, we get paid for what we do. >> So Sastry... GE popularized this concept of IIoT and the analytics and, sort of, the new business outcomes you could build on it, like Power by the Hour instead of selling a jet engine. >> Sastry: That's right. >> But there's... Actually, David Floyer did some pioneering research on how we're going to have to do a lot of analytics on the edge for latency and bandwidth. What's the FogHorn secret sauce that others would have difficulty with on the edge analytics? >> Okay, that's a great question. Before I directly answer the question, if you don't mind, I'll actually even describe why it's even important to do that, right?
So with a lot of these industrial customers, if you look at it, because we work with a lot of them, the amount of data that's produced from all of these different machines is terabytes to petabytes of data, and it's real. And it's not just the traditional digital sensors, but there are video, audio, acoustic sensors out there. The amount of data is humongous, right? It's not even practical to send all of that to a Cloud environment and do data processing, for many reasons. One is obviously the connectivity, bandwidth issues, and all of that. But the two most important things are these. One is cyber security: none of these customers actually want to connect these highly expensive machines to the internet. That's one. The second is the lack of real-time decision making. When there is a problem, they want to know before it's too late. We want to notify them of a problem that is occurring so that they have a chance to go fix it and optimize the asset that is in question. Now, existing solutions do not work in this constrained environment. That's why FogHorn had to invent that solution. >> And tell us, actually, just to be specific, how constrained an environment you can operate in. >> We can run in less than 100 to 150 megabytes of memory, on a single-core to dual-core CPU, whether it's an ARM processor or an x86 Intel-based processor, with almost literally no storage, because we're a real-time processing engine. Optionally, you could have some storage if you wanted to store some of the results locally, but that's the kind of environment we're talking about. Now, when I say 100 megabytes of memory, it's like a quarter of a Raspberry Pi, right? And even in that environment we have customers that run dozens of machine learning models, right? And we're not talking -- >> George: Like an ensemble. >> Like an anomaly detection, a regression, a random forest, or a clustering, a whole gamut of those.
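As a rough illustration of why such models can fit in so little memory (this is a generic sketch, not FogHorn's actual engine), a streaming anomaly detector can keep constant-size state, using Welford's online algorithm, no matter how many readings flow through it:

```python
import math

class StreamingAnomalyDetector:
    """Flags readings more than `k` standard deviations from the running mean.

    Keeps O(1) state (count, mean, sum of squared deviations) via Welford's
    online algorithm, so it fits comfortably in a footprint far smaller than
    the ~100 megabytes described above.
    """

    def __init__(self, k=3.0):
        self.k = k
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations

    def update(self, x):
        """Ingest one reading; return True if it is anomalous."""
        if self.n >= 2:
            std = math.sqrt(self.m2 / (self.n - 1))
            anomalous = std > 0 and abs(x - self.mean) > self.k * std
        else:
            anomalous = False  # not enough history yet
        # Welford's online update
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)
        return anomalous
```

A regression, a random forest, or a clustering model would each carry more state than this, but the same idea applies: the edge process holds only the model and running statistics, never the raw history.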
Now, if we get into more deep learning models, like image processing and neural nets and all of that, you obviously need a little bit more memory. But what we have shown is we could still run it: one of our largest smart city and buildings customers, an elevator company, runs it on a Raspberry Pi in millions of elevators, right? Dozens of machine learning algorithms on top of that, right? So that's the kind of size we're talking about. >> Let me just follow up with one question on the other thing you said: besides having to do the low latency locally, you said a lot of customers don't want to connect these brownfield, I guess, operations technology machines to the internet, and physically, I mean, there was physical separation for security. So it's like security, Bill Joy used to say "Security by obscurity." Here it's security by -- >> Physical separation, absolutely. Tell me about it. I was actually coming from, if you don't mind, last week I was in Saudi Arabia. At one of the oil and gas plants where we deployed our software, you have to go through five levels of security even to get there. It's a multibillion dollar plant, refining the gas and all of that. Completely offline, no connectivity to the internet, and we installed, in their existing small box, our software, connected to their live video cameras that are actually measuring the stuff, doing the processing and detecting the specific conditions that we're looking for. >> That's my question, which was if they want to be monitoring. So there's like one low level, really low hardware level, the sensor feeds. But you could actually have a richer feed, which is video and audio, but how much of that, then, are you doing the, sort of, inferencing locally? Or even retraining, and I assume that since it's not the OT device, and it's something that's looking at it, you might be more able to send it back up to the Cloud if you needed to do retraining?
So the way the model works is particularly for image processing because you need, it's a more complex process to train than create a model. You could create a model offline, like in a GPU box, an FPGA box and whatnot. Import and bring the model back into this small little device that's running in the plant, and now the live video data is coming in, the model is inferencing the specific thing. Now there are two ways to update and revise the model: incremental revision of the model, you could do that if you want, or you can send the results to a central location. Not internet, they do have local, in this example for example a PIDB, an OSS PIDB, or some other local service out there, where you have an opportunity to gather the results from each of these different locations and then consolidate and retrain the model, put the model back again. >> Okay, the one part that I didn't follow completely is... If the model is running ultimately on the device, again and perhaps not even on a CPU, but a programmable logic controller. >> It could, even though a programmable controller also typically have some shape of CPU there as well. These days, most of the PLCs, programmable controllers, have either an RM-based processor or an x86-based processor. We can run either one of those too. >> So, okay, assume you've got the model deployed down there, for the, you know, local inferencing. Now, some retraining is going to go on in the Cloud, where you have, you're pulling in the richer perspective from many different devices. How does that model get back out to the device if it doesn't have the connectivity between the device and the Cloud? >> Right, so if there's strictly no connectivity, so what happens is once the model is regenerated or retrained, they put a model in a USB stick, it's a low attack. USB stick, bring it to the PLC device and upload the model. >> George: Oh, so this is sort of how we destroyed the Iranian centrifuges. >> That's exactly right, exactly right. 
But you know, some other environments, even though it's not connectivity to the Cloud environment, per se, but the devices have the ability to connect to the Cloud. Optionally, they say, "Look, I'm the device "that's coming up, do you have an upgraded model for me?" Then it can pull the model. So in some of the environments it's super strict where there are absolutely no way to connect this device, you put it in a USB stick and bring the model back here. Other environments, device can query the Cloud but Cloud cannot connect to the device. This is a very popular model these days because, in other words imagine this, an elevator sitting in a building, somebody from the Cloud cannot reach the elevator, but an elevator can reach the Cloud when it wants to. >> George: Sort of like a jet engine, you don't want the Cloud to reach the jet engine. >> That's exactly right. The jet engine can reach the Cloud it if wants to, when it wants to, but the Cloud cannot reach the jet engine. That's how we can pull the model. >> So Sastry, as a CTO you meet with customers often. You mentioned you were in Saudi Arabia last week. I'd love to understand how you're leveraging and gaging with customers to really help drive the development of FogHorn, in terms of being differentiated in the market. What are those, kind of bi-directional, symbiotic customer relationships like? And how are they helping FogHorn? >> Right, that's actually a great question. We learn a lot from customers because we started a long time ago. We did an initial version of the product. As we begin to talk to the customers, particularly that's part of my job, where I go talk to many of these customers, they give us feedback. Well, my problem is really that I can't even do, I can't even give you connectivity to the Cloud, to upgrade the model. I can't even give you sample data. How do you do that modeling, right? 
And sometimes they say, "You know what, we are not technical people. Help us express the problem, the outcome. Give me tools that help me express that outcome." So we created a bunch of what we call OT tools, operational technology tools. How we distinguish ourselves in this process from the traditional Cloud-based vendors, the traditional data science and data analytics companies, is that they think in terms of computer scientists, computer programmers, and expressions. We think in terms of industrial operators: what can they express, what do they know? They don't really necessarily care when you tell them, "I've got an anomaly detection data science machine learning algorithm." They're going to look at you like, "What are you talking about? I don't understand what you're talking about," right? You need to tell them, "Look, this machine is failing." What are the conditions in which the machine is failing? How do you express that? And then we translate that requirement into the underlying models, the underlying VEL expressions, VEL being our CEP expression language. So we learned a ton from customers: user interface capabilities, latency issues, connectivity issues, different protocols, a number of things. >> So I'm curious with... More of the big data vendors are recognizing data in motion and data coming from devices. And some, like Hortonworks DataFlow, NiFi, has a MiNiFi component written in C++, a really low resource footprint. But I assume that that's really just a transport. It's almost like a collector, in that it doesn't have the analytics built in -- >> That's exactly right. NiFi has the transport, it has the real-time transport capability, for sure. What it does not have is this notion of that CEP concept. How do you combine all of the streams? Everything is time series data for us, right, from the devices, whether it's coming from a device or whether it's coming from another static source out there.
How do you express a pattern, a recognition pattern definition, across these streams? That's where our CEP comes into the picture. A lot of these seemingly similar software capabilities that people talk about don't quite exactly have either the streaming capability, or the CEP capability, or the real-time, or the low footprint. What we have is a combination of all of that. >> And you talked about how everything's time series to you. Is there a need to have sort of an equivalent time series database up in some central location? So that when you subset, when you determine what relevant subset of data to move up to the Cloud, or, you know, an on-prem central location, does it need to be the same database? >> No, it doesn't need to be the same database. It's optional. In fact, we do ship a local time series database at the edge itself. If you have a little bit of local storage, you can down-sample, take the results, and store them locally, and many customers actually do that. Some others, because they have their existing environment, they have some Cloud storage, whether it's Microsoft, it doesn't matter what they use, we have connectors from our software to send these results into their existing environments. >> So, you had also said something interesting about your, sort of, tool set as being optimized for operations technology. This is really important, because back when we had the Net-Heads and the Bell-Heads, you know, it was a cultural clash and they had different technologies. >> Sastry: They sure did, yeah. >> Tell us more about how selling to operations, not just selling, but supporting operations technology, is different from IT technology, and where does that boundary live? >> Right, so in a typical IT environment, right, you start with the boss who is the decision maker, you work with them, they approve the project, and you go and execute that. In an industrial, in an OT environment, it doesn't quite work like that.
Even if the boss says, "Go ahead and go do this project," if the operator on the floor doesn't understand what you're talking about, because that person is in charge of operating that machine, it doesn't quite work like that. So you need to work bottom up as well, convincing them that you are indeed actually solving their pain point. So the way we start, rather than trying to tell them what capabilities we have as a product or what we're trying to do, the first thing we ask is, what is their pain point? "What's your problem? What is the problem you're trying to solve?" Some customers say, "Well, I've got yield issues, a lot of scrap. Help me reduce my scrap. Help me operate my equipment better. Help me predict these failure conditions before it's too late." That's how the problem starts. Then we start asking them, "Okay, what kind of data do you have, what kind of sensors do you have? Typically, do you have information about under what circumstances you have seen failures versus not seen failures out there?" So in the process of that conversation we begin to understand how they might actually use our software, and then we tell them, "Well, here, use our software to predict that." And, sorry, I want 30 more seconds on that. The other thing is that, typically in an IT environment, because I came from that too, I've been in this position for 30 plus years, IT, OT and all of that, we don't right away talk about CEP, or expressions, or analytics; we don't talk about that. We talk about, look, you have this bunch of sensors, we have OT tools here: drag and drop your sensors, express the outcome that you're trying to look for, what is the outcome you're trying to look for, and then we derive behind the scenes what it means. Is it analytics, is it machine learning, is it something else, and what is it? So that's kind of how we approach the problem.
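That translation step, from an operator's "this machine is failing under these conditions" to something executable behind the scenes, can be sketched in miniature. The rule vocabulary below is invented for illustration; it is not FogHorn's OT tooling or VEL:

```python
def compile_rule(sensor, op, limit, sustained=1):
    """Turn an operator-style rule ("temp above 90 for 3 readings in a row")
    into a stateful predicate over a stream of {sensor: value} readings.

    The operator never sees expressions or analytics; they pick a sensor,
    a comparison, and how long the condition must hold.
    """
    ops = {"above": lambda v, t: v > t, "below": lambda v, t: v < t}
    test = ops[op]
    streak = {"n": 0}  # consecutive readings matching the condition

    def check(reading):
        if test(reading[sensor], limit):
            streak["n"] += 1
        else:
            streak["n"] = 0
        return streak["n"] >= sustained

    return check
```

Whether the compiled form underneath is a simple comparison like this, a machine learning model, or something else entirely is exactly the part the speaker says gets derived behind the scenes.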
Of course, sometimes you do, surprisingly, occasionally run into very technical people. With those people we can right away talk about, "Hey, you need these analytics, you need to use machine learning, you need to use expressions," and all of that. That's kind of how we operate. >> One thing, you know, that's becoming clearer is, I think, this widespread recognition that there's data-intensive and low-latency work to be done near the edge. But what goes on in the Cloud is actually closer to simulation and high-performance compute, if you want to optimize a model. So not just train it, but maybe have something that's prescriptive that says, you know, here's the actionable information. As more of your data is video and audio, how do you turn that into something where you can simulate a model that tells you the optimal answer? >> Right, so this is actually a good question. From our experience, there are models that require a lot of data, for example, video and audio. There are some other models that do not require a lot of data for training. I'll give you an example from the customer use cases that we have. There's one customer in a manufacturing domain where they'd been seeing a lot of finished goods failures, there's a lot of scrap, and the problem then was, "Hey, predict the failures, reduce my scrap, save the money," right? Because they'd been seeing a lot of failures every single day, we did not need a lot of data to train and create a model for that. In fact, we just needed one hour's worth of data. We created a model, deployed it, and we have completely eliminated their scrap. There are other kinds of models, like video, where we can't do that training at the edge, so we acquire, for example, some video files or simulated audio files, take them to an offline model, create the model, and see whether it's accurately predicting based on the real-time video coming in or not. So it's a mix of what we're seeing between those two.
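To illustrate how little data a simple model can need (the numbers and the single-feature setup here are invented; the actual customer model is not described in the interview), one hour of labeled readings can be enough to fit even a best-threshold classifier that separates good parts from scrap:

```python
def fit_threshold(samples):
    """Pick the cutoff on a single feature that best separates scrap
    from good parts in a small labeled sample (e.g., one hour of data).

    `samples` is a list of (value, is_scrap) pairs; readings at or above
    the returned cutoff are predicted to be scrap.
    """
    candidates = sorted({v for v, _ in samples})
    best_cut, best_correct = None, -1
    for cut in candidates:
        correct = sum((v >= cut) == is_scrap for v, is_scrap in samples)
        if correct > best_correct:
            best_cut, best_correct = cut, correct
    return best_cut
```

A model this small retrains in milliseconds and deploys as a single number, which is the kind of economics that makes an immediate ROI story possible; the video and audio cases need offline training precisely because no such shortcut exists for them.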
>> Well Sastry, thank you so much for stopping by theCUBE and sharing what it is that you guys at FogHorn are doing, what you're hearing from customers, how you're working together with them to solve some of these pretty significant challenges. >> Absolutely, it's been a pleasure. Hopefully this was helpful, and yeah. >> Definitely, very educational. We want to thank you for watching theCUBE, I'm Lisa Martin with George Gilbert. We are live at our event, Big Data SV in downtown San Jose. Come stop by Forager Tasting Room, hang out with us, learn as much as we are about all the layers of big data digital transformation and the opportunities. Stick around, we will be back after a short break. (upbeat electronic music)
Scott Gnau, Hortonworks Big Data SV 17 #BigDataSV #theCUBE
>> Narrator: Live from San Jose, California, it's theCUBE covering Big Data Silicon Valley 2017. >> Welcome back everyone. We're here live in Silicon Valley. This is theCUBE's coverage of Big Data Silicon Valley, our event in conjunction with O'Reilly Strata Hadoop; of course we have our Big Data NYC event and we have our special popup event in New York and Silicon Valley. This is our Silicon Valley version. I'm John Furrier, with my co-host Jeff Frick, and our next guest is Scott Gnau, CTO of Hortonworks. Great to have you on, good to see you again. >> Scott: Thanks for having me. >> You guys have an event coming up in Munich, so I know that there's a slew of new announcements coming up with Hortonworks in April, next month in Munich for your EU event, and you're going to be holding a little bit of that back, but some interesting news this morning. We had Wei Wang yesterday from Microsoft's Azure HDInsight team. That's flowering nicely, a good bet there, but the question has always been, at least from people in the industry, and we've been questioning you guys on, hey, where's your cloud strategy? Because as a distro you guys have been very successful with your always open approach. Microsoft's guy was basically like, that's why we go with Hortonworks: because of pure open source, committed to that from day one, never wavered. The question is cloud first, AI, machine learning; this is a sweet spot for IoT. You're starting to see the collision between cloud and data, and in the intersection of that is deep learning, IoT, a lot of amazing new stuff going to be really popping out of this. Your thoughts and your cloud strategy. >> Obviously we see cloud as an enabler for these use cases. In many instances the use cases can be ephemeral. They might not be tied immediately to an ROI, so you're going to go to the capital committee and all this kind of stuff, versus let me go prove some value very quickly.
It's one of the key enablers, one of the core ingredients, and when we say cloud first, we really mean it. It's something where the solutions work together. At the same time, cloud becomes important. Our cloud strategy, and I think we've talked about this in many different venues, is really twofold. One is we want to give a common experience to our customers across whatever footprint they choose, whether they roll their own, they do it on prem, or they do it in public cloud, and they have a choice of different public cloud vendors. We want to give them a similar experience, a good experience that is enterprise grade, a platform-level experience, so not a point-solution kind of one function and then get rid of it, but really being able to extend the platform. What I mean by that, of course, is being able to have common security, common governance, common operational management. Being able to have a blueprint of the footprint so that there's compatibility of applications that get written. And those applications can move as they decide to change their mind about where their platform is hosting the data, so our goal really is to give them a great and common experience across all of those footprints, number one. Then number two, to offer a lot of choices across all of those domains as well, whether it be, hey, I want to do infrastructure as a service and I know what I want, on one end of the spectrum, to, I'm not sure exactly what I want, but I want to spin up a data science cluster really quickly. Boom, here's a platform as a service offer that runs and is available, very easy to consume, comes preconfigured, and kind of everywhere in between. >> By the way, yesterday Wei was pointing out 99.99% SLAs on some of the stuff coming out. >> They are amazing, and obviously in the platform as a service space you also get the benefit of other cloud services that can plug in that wouldn't necessarily be something you'd expect to be typical of a core Hadoop platform.
Getting the SLAs, getting the disaster recovery, getting all of the things that cloud providers can provide behind the scenes is some additional upside, obviously, as well in those deployment options. Having that common look and feel, making it easy, making it frictionless, are all of the core components of our strategy, and we saw a lot of success with that coming out of year end last year. We see rapid customer adoption. We see rapid customer success, and frankly, I would say that 99.9% of customers that I talk to are hybrid, where they have a foot on prem and they have a foot in cloud, and they may have a foot in multiple clouds. I think that's indicative of what's going on in the world. Think about the gravity of data. Data movement is expensive. Analytics and multi-core chipsets give us the ability to process and crunch numbers at unprecedented rates, but movement of data is actually kind of hard. There's latency, it can be expensive. A lot of data in the future, IoT data, machine data, is going to be created and live its entire lifecycle in the cloud, so the notion of being able to support hybrid with a common look and feel, I think, very strategically positions us to help our customers be successful when they start actually dealing with data that lives its entire lifecycle outside the four walls of the data center. >> You guys really did a good job, I thought, on having that clean positioning of data at rest, but also you had the data in motion, which I think was ahead of its time; you guys really nailed that, and you also had the IoT edge in mind. We talked, I think, two years ago, and this was really not on everyone's radar, but you guys saw that, so you've made some good bets on the HDInsight, and we talked about that yesterday with Wei on here from Microsoft. So edge analytics and data in motion are very key right now, because that batch and streaming world's coming together and IoT's flooding it with all this kind of data.
We've seen the success in the clouds, where analytics have been super successful, powered by the cloud. I got to ask you, with Microsoft as your preferred cloud provider, what's the current status for customers who have data in motion, specifically IoT too? It's the common question we're getting, not necessarily the Microsoft question, but okay, I've got edge coming in strong-- >> Scott: Mm-hmm >> and I'm going to run certainly hybrid in a multi cloud world, but I want to put the cloud stuff for most of the analytics, and how do I deal with the edge? >> Wow, there's a lot there (laughs) >> John: You got 10 seconds, go! (laughs) You have Microsoft as your premier cloud and you have an Amazon relationship with a marketplace and what not. You've got a great relationship with Microsoft. >> Yeah. I think it boils down to a bigger macro thing, and hopefully I'll peel into some specifics. I think number one, we as an industry kind of shortchange ourselves talking about Hadoop, Hadoop, Hadoop, Hadoop, Hadoop. I think it's bigger than Hadoop, not different from, but certainly more than, right, and this is where we started with the whole connected platforms idea. Traditional Hadoop comes from traditional thinking of data at rest. So I've got some data, I've stored it, and I want to run some analytics and I want to be able to scale it, and all that kind of stuff. Really good stuff, but only part of the issue. The other part of the issue is data that's moving, data that's being created outside of the four walls of the data center, data that's coming from devices. How do I manage and move and handle all of that? Of course there have been different hype cycles on streaming and streaming analytics and data flow and all those things. What we wanted to do is take a very protracted look at the problem set of the future.
We said, look, it's really about the entire lifecycle of data, from inception to the demise of the data, or the data being deleted, which very infrequently happens these days. >> Or cold storage-- >> Cold storage, whatever. You know, it's created at the edge, it moves through, it moves in different places, it's landed, it's analyzed, there are models built. But as models get deployed back out to the edge, that entire problem set is a problem set that I think we, certainly we at Hortonworks, are looking to address with the solutions. That is actually accelerated by the notion of multiple cloud footprints, because when you think about a customer that may have multiple cloud footprints and is trying to tie the data together, it creates a unique opportunity. I think there's a reversal in the way people need to think about the future of compute. Having been around for a little bit of time, it's always been, let me bring all the data together to the applications, have the applications run, and then I'll send answers back. That is impossible in this new world order, whether it be the cloud or the fog or any of the things in between or the data center. Data are going to be distributed, and data movement will become the expensive thing, so it will be very important to be able to have applications that are deployable across a grid, with applications moving to the data instead of data moving to the application. Or at least to have a choice and be able to be selective, so that I believe that ultimately, scalability five years from now, ten years from now, is not going to be about how many exabytes I have in my cloud instance; that will be part of it, but it will be about how many edge devices I can have computing and analyzing simultaneously and coordinating this information with each other to optimize customer experience, to optimize the way an autonomous car drives, or anywhere in between.
You mentioned the cost of moving data will be the issue. >> Scott: Yeah. >> So that's going to change the architecture of the edge. What are you seeing with customers? Because we're seeing a lot of people taking a long-term view like you were talking about and looking at the architectures. There's some pressure, but no real gun to the head yet; there's certainly pressure to do architectural thinking around the edge and some of the things you mentioned. Patterns, things you can share, anecdotal stories, customer references? >> You know, the common thing is that customers go, "Yep, that's going to be interesting. It's not hitting me right now, but I know it's going to be important. How can I ease into it, and how can I prove this is going to work, and all that?" We've certainly seen a lot of interest in that. What's interesting is that we're able to apply some of that futuristic IoT technology in Hortonworks DataFlow, which includes NiFi and MiNiFi out at the edge, to traditional problems. Like: let me get the data from the branches into the central office, have that roundtrip communication to a banker who's talking to a customer and has the benefit of all the analytics at home, and guarantee that roundtrip of data and analytics. Things that were hard before can be solved very easily and efficiently with this technology, which is then also extensible even further out to the edge. In many instances, I've been surprised by customer adoption, with customers saying, "Yeah, I get that, but gee, this helps me solve a problem that I've had for the last 20 years, and it's very easy, and it sets me up on the right architectural course. When I start to add in those edge devices, I know exactly how I'm going to go do it." It's actually been a really good conversation that's very pragmatic, with immediate ROI, but again positioning people for the future that they know is coming.
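The "guaranteed roundtrip" property described here boils down to at-least-once delivery: an edge agent buffers each event and drops it only after the central side acknowledges receipt. The sketch below is a conceptual stand-in, not NiFi or MiNiFi itself (a real deployment would use a durable on-disk buffer and NiFi's site-to-site protocol); `EdgeAgent` and `flaky_send` are invented names for illustration.

```python
# Illustrative sketch of guaranteed (at-least-once) edge-to-center delivery.
from collections import deque

class EdgeAgent:
    def __init__(self, send):
        self.send = send        # callable: returns True once the center acks
        self.pending = deque()  # a durable, on-disk buffer in a real system

    def collect(self, event):
        self.pending.append(event)

    def flush(self):
        """Deliver in order; unacked events stay buffered for the next flush."""
        delivered = 0
        while self.pending:
            if not self.send(self.pending[0]):
                break           # central office unreachable; keep buffering
            self.pending.popleft()
            delivered += 1
        return delivered

# Simulate a flaky branch-office link: the first two attempts fail,
# then the link comes up and stays up.
attempts = {"n": 0}
def flaky_send(event):
    attempts["n"] += 1
    return attempts["n"] > 2

agent = EdgeAgent(flaky_send)
for reading in ["reading-1", "reading-2", "reading-3"]:
    agent.collect(reading)

first = agent.flush()  # link down: nothing delivered, but nothing lost either
```

The key property is that an outage delays delivery without losing data, which is what lets the branch-to-central-office roundtrip be guaranteed.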
Doing that, by the way, we're also able to prove out the security. Security is a big issue that everyone's talking about: cybersecurity and everything. That's typically security around my data center, where I've got this huge fence around it and it's very controlled. Edge devices are now outside that fence, so security and privacy and provenance become really, really interesting in that world. It's been gratifying to be able to go prove that technology today and, again, put people on the architectural course that positions them to go out further to the edge as their business demands it. >> That's such great validation, when they come back to you with a different solution based on what you just proposed. >> Scott: Yep. >> That means they really start to understand, they really start to see-- >> Scott: Yep. >> how it can provide value to them. >> Absolutely, absolutely. That is all happening, and again, like I said, I think the notion of the bigger problem set, where it's not just storing data and analyzing data, but how do I have portable applications, applications that move further and further out to the edge, is going to be the differentiation for the future successful deployments out there. Because those deployments, and the folks able to adopt that kind of technology, will have a time-to-market advantage. They'll have a latency advantage in terms of interaction with a customer, not waiting for that roundtrip, really being able to push out customized, tailored interactions, whether it be driving your car and stopping on time, which is kind of important, or getting a coupon when you're walking past a store, and anywhere in between. >> It's good, you guys have certainly been well positioned for being flexible; being open source has been a great advantage. I've got to ask you the final question for the folks watching, one I'm sure you answer for investors and customers.
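One simple way to reason about provenance for data created outside the fence is a hash chain: each event commits to the one before it, so tampering anywhere upstream is detectable downstream. This is a generic, hedged sketch of that idea (NiFi's actual provenance repository works differently, recording lineage events per flowfile); `chain` and `verify` are invented helpers for the example.

```python
# Illustrative sketch: hash-chained provenance for events created at the edge.
import hashlib
import json

def chain(events):
    """Attach a provenance hash to each event, linking it to its predecessor."""
    prev, out = "genesis", []
    for event in events:
        payload = prev + json.dumps(event, sort_keys=True)
        digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
        out.append({"event": event, "prov": digest})
        prev = digest
    return out

def verify(chained):
    """Recompute every link; any modified event breaks the chain."""
    prev = "genesis"
    for record in chained:
        payload = prev + json.dumps(record["event"], sort_keys=True)
        expected = hashlib.sha256(payload.encode("utf-8")).hexdigest()
        if record["prov"] != expected:
            return False
        prev = expected
    return True

records = chain([
    {"sensor": "door-7", "open": True},
    {"sensor": "door-7", "open": False},
])
ok = verify(records)                    # intact chain verifies

records[0]["event"]["open"] = False     # tamper with an edge event...
tampered_ok = verify(records)           # ...and verification fails
```

The point of the sketch is that provenance outside the data center can't rely on a perimeter; it has to travel with the data itself.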
A lot's changed in the past five years, and a lot's happening right now. You just illustrated it: the scenario with the edge is very robust, dynamic, changing, and yet a value opportunity for businesses. What's the biggest thing changing right now in the Hortonworks view of the world that's notable, that you think is worth highlighting to the people watching, your customers, investors, or people in the industry? >> I think you brought up a good point: the whole notion of open, and the whole groundswell around open source, open community development as a new paradigm for delivering software. I talked a little bit about a new paradigm of the gravity of data and sensors and this new problem set that we've got to go solve; that's one piece of this storm. The other piece of the storm is the adoption and the wave of open, open community collaboration of developers, versus integrated silo stacks of software. That's manifesting itself in two places, and obviously I think we're an example of helping to create that. Open collaboration means quicker time to market and more innovation, accelerated innovation, in an increasingly complex world. That's one requirement, slash, advantage of being in the open world. I think the other thing that's happening is the generation of workforce. When I think about when I got my first job, I typed a resume with a typewriter. I'm dating myself. >> White out. >> Scott: Yeah, with white out. (laughter) >> I wasn't a good typist. >> A resume today is basically a name and a GitHub address. Here's my body of work, and it's out there for everybody to see, and that's the mentality-- >> And they have their cute videos up there as well, of course. >> Scott: Well yeah, I'm sure. (laughter) >> So it's kind of like that shift: this is now the new paradigm for software delivery. >> This is important. You've got theCUBE interview, but I mean, you're seeing it-- >> Is that the open source? >> In the entertainment.
No, we're seeing people put huge interviews on their LinkedIn, so there's this notion of collaboration in the software engineering mindset. You go back to when we grew up in software engineering; then it went to open source, and now GitHub is essentially a social network for your body of work. You're starting to see the software development and open source concepts applied to data engineering and data science, which is still early days. Media creation, whatnot. So I think that's a really key point, and the data science tools are still in their infancy. >> I think open, and by the way, I'm not here to suggest that everything will be open, but I think a majority, and-- >> Collaborative. >> The majority of the problems that we're solving will be collaborative; it will be ecosystem-driven, and where there's an extremely large market, open will be the most efficient way to address it. And certainly no one's arguing that data, and big data, is not a large market. >> Yep. You guys are all in on the cloud now, you've got the Microsoft relationship; any other updates you think are worth sharing with folks? >> You've got to come back and see us in Munich, then. >> Alright. We'll be there; theCUBE will be there in Munich in April, covering the Hortonworks conference, now called DataWorks. This is theCUBE here with Scott Gnau, the CTO of Hortonworks. Breaking it down, I'm John Furrier with Jeff Frick. More coverage from Big Data SV, in conjunction with Strata Hadoop, after the short break. (upbeat music)