
Search Results for Hortonworks DataFlow:

Day Two Kickoff | DataWorks Summit 2018


 

>> Live from San Jose, in the heart of Silicon Valley, it's theCube. Covering DataWorks Summit 2018. Brought to you by Hortonworks. >> Welcome back to day two of theCube's live coverage of DataWorks here in San Jose, California. I'm your host, Rebecca Knight along with my co-host James Kobielus. James, it's great to be here with you in the hosting seat again. >> Day two, yes. >> Exactly. So here we are, this conference, 2,100 attendees from 32 countries, 23 industries. It's a relatively big show. They do three of them during the year. One of the things that I really-- >> It's a well-established show too. I think this is like the 11th year since Yahoo started up the first Hadoop summit in 2008. >> Right, right. >> So it's an established event, yeah go. >> Exactly, exactly. But I really want to talk about Hortonworks the company. This is something that you had brought up in an analyst report before the show started and that was talking about Hortonworks' cash flow positivity for the first time. >> Which is good. >> Which is good, which is a positive sign and yet what are the prospects for this company's financial health? We're still not seeing really clear signs of robust financial growth. >> I think the signs are good for the simple reason they're making significant investments now to prepare for the future that's almost inevitable. And the future that's almost inevitable, and when I say the future, the 2020s, the decade that's coming. Most of their customers will shift more of their workloads, maybe not entirely yet, to public cloud environments for everything they're doing, AI, machine learning, deep learning. And clearly the beneficiaries of that trend will be the public cloud providers, all of whom are Hortonworks' partners and established partners, AWS, Microsoft with Azure, Google with, you know, Google Cloud Platform, IBM with IBM Cloud. Hortonworks, and this is... 
You know, their partnerships with these cloud providers go back several years so it's not a new initiative for them. They've seen the writing on the wall practically from the start of Hortonworks' founding in 2011 and they now need to go deeper towards making their solution portfolio capable of being deployable on-prem, in cloud, public clouds, and in various and sundry funky combinations called hybrid multi-clouds. Okay, so, they've been making those investments in those partnerships and in public cloud enabling the Hortonworks Data Platform. Here at this show, DataWorks 2018 here in San Jose, they've released the latest major version, HDP 3.0 of their core platform with a lot of significant enhancements related to things that their customers are increasingly doing-- >> Well I want to ask you about those enhancements. >> But also they have partnership announcements, the deep ones of integration and, you know, lift and shift of the Hortonworks portfolio of HDP with Hortonworks DataFlow and DataPlane Services, so that those solutions can operate transparently on those public cloud environments as the customers, as and when the customers choose to shift their workloads. 'Cause Hortonworks really... You know, like Scott Gnau yesterday, I mean just laid it on the line, they know that the more of the public cloud workloads will predominate now in this space. They're just making these speculative investments that they absolutely have to now to prepare the way. So I think this cost that they're incurring now to prepare their entire portfolio for that inevitable future is the right thing to do and that's probably why they still have not attained massive rock and rollin' positive cash flow yet but I think that they're preparing the way for them to do so in the coming decade. >> So their financial future is looking brighter and they're doing the right things. >> Yeah, yes. >> So now let's talk tech. And this is really where you want to be, Jim, I know you. 
>> Oh I get sleep now and I don't think about tech constantly. >> So as you've said, they're really doing a lot of emphasis now on their public cloud partnerships. >> Yes. >> But they've also launched several new products and upgrades to existing products, what are you seeing that excites you and that you think really will be potential game changers? >> You know, this is geeky but this is important 'cause it's at the very heart of Hortonworks Data Platform 3.0, containerization of more... When you're a data scientist, and you're building a machine learning model using data that's maintained, and is persisted, and processed within Hortonworks Data Platform or any other big data platform, you want the ability increasingly for developing machine learning, deep learning, AI in general, to take that application you might build while you're using TensorFlow models, that you build on HDP, they will containerize it in Docker and, you know, orchestrate it all through Kubernetes and all that wonderful stuff, and deploy it out, those AI, out to increasingly edge computing, mobile computing, embedded computing environments where, you know, the real venture capital mania's happening, things like autonomous vehicles, and you know, drones, and you name it. So the fact is that Hortonworks has made that in many ways the premier new feature of HDP 3.0 announced here this week at the show. That very much harmonizes with what their partners, where their partners are going with containerization of AI. IBM, one of their premier partners, very recently, like last month, I think it was, announced the latest version of IBM, what do they call it, IBM Cloud Private, which has embedded as a core feature containerization within that environment which is a prem-based environment of AI and so forth. The fact that Hortonworks continues to maintain close alignment with the capabilities that its public cloud partners are building to their respective portfolios is important. 
But also Hortonworks with its, they call it, you know, a single pane of glass, the DataPlane Services for metadata and monitoring and governance and compliance across this sprawling hybrid multi-cloud, these scenarios. The fact that they're continuing to make, in fact, really focusing on deep investments in that portfolio, so that when an IBM introduces or, AWS, whoever, introduces some new feature in their respective platforms, Hortonworks has the ability to, as it were, abstract above and beyond all of that so that the customer, the developer, and the data administrator, all they need to do, if they're a Hortonworks customer, is stay within the DataPlane Services environment to be able to deploy with harmonized metadata and harmonized policies, and harmonized schemas and so forth and so on, and query optimization across these sprawling environments. So Hortonworks, I think, knows where their bread is buttered and it needs to stay on the DPS, DataPlane Services, side which is why a couple months ago in Berlin, Hortonworks made a, I think, the most significant announcement of the year for them and really for the industry, was that they announced the Data Steward Studio in Berlin. That really clearly was to address the GDPR mandate that was coming up, but really, data stewardship as an end-to-end workflow for lots of, you know, core enterprise applications, absolutely essential. Data Steward Studio is a DataPlane Service that can operate across multi-cloud environments. Hortonworks is going to keep on, you know... They didn't have any DPS, DataPlane Services, announcements here in San Jose this week but you can best believe that next year at this time at this show, and in the interim they'll probably have a number of significant announcements to deepen that portfolio. Once again it's to grease the wheels towards a more purely public cloud future in which there will be Hortonworks DNA inside most of their customers' environments going forward.
>> I want to ask you about themes of this year's conference. The thing is is that you were in Berlin at the last big Hortonworks DataWorks Summit. >> (speaks in foreign language) >> And really GDPR dominated the conversations because the new rules and regulations hadn't yet taken effect and companies were sort of bracing for what life was going to be like under GDPR. Now the rules are here, they're here to stay, and companies are really grappling with it, trying to understand the changes and how they can exist in this new regime. What would you say are the biggest themes... We're still talking about GDPR, of course, but what would you say are the bigger themes that are this week's conference? Is it scalability, is it... I mean, what would you say we're going, what do you think has dominated the conversations here? >> Well scalability is not the big theme this week though there are significant scalability announcements this week in the context of HDP 3.0, the ability to persist in a scale-out fashion across multi-cloud, billions of files. Storage efficiency is an important piece of the overall announcement with support for erasure coding, blah blah blah. That's not, you know, that's... Already, Hortonworks, like all of their cloud providers and other big data providers, provide very scalable environments for storage, workload management. That was not the hugest, buzzy theme in terms of the announcements this week. The buzz of course was HDP 3.0. Containerization, that's important, but you know, we just came out of the day two keynote. AI is not a huge focus yet for a lot of the Hortonworks customers who are here, the developers. They're, you know, most of their customers are not yet that far along in their deep learning journeys and whatever but they're definitely going there. There's plenty of really cool keynote discussions including the guy with the autonomous vehicles or whatever that, the thing we just came out of. 
That was not the predominant theme this week here in terms of the HDP 3.0. I think what it comes down to is that with HDP 3.0... Hive, though you tend to take it for granted, it's been in Hadoop from the very start, practically, Hive is now a full enterprise database and that's the core, one of the cores, of HDP 3.0. Hive itself, Hive 3.0 now is its version, is ACID compliant and that may be totally geeky to most of the world but that enables it to support transactional applications. So more big data in every environment is supporting more traditional enterprise application, transactional applications that require like two-phase commit and all that goodness. The fact is, you know, Hortonworks, from what I can see, is the first of the big data vendors to incorporate those enhancements to Hive 3.0 because they're so completely tuned in to the Hive environment in terms of a committer. I think in many ways that is the predominant theme in terms of the new stuff that will actually resonate with the developers, their customers here at the show. And with the, you know, enterprises in general, they can put more of their traditional enterprise application workloads on big data environments and specifically, Hortonworks hopes, its HDP 3.0. >> Well I'm excited to learn more here on theCube with you today. We've got a lot of great interviews lined up and a lot of interesting content. We got a great crew too so this is a fun show to do. >> Sure is. >> We will have more from day two of the.

Published Date : Jun 20 2018


ENTITIES

Entity | Category | Confidence
James Kobielus | PERSON | 0.99+
Rebecca Knight | PERSON | 0.99+
Hortonworks' | ORGANIZATION | 0.99+
Hortonworks | ORGANIZATION | 0.99+
2011 | DATE | 0.99+
Jim | PERSON | 0.99+
IBM | ORGANIZATION | 0.99+
Berlin | LOCATION | 0.99+
AWS | ORGANIZATION | 0.99+
San Jose | LOCATION | 0.99+
Microsoft | ORGANIZATION | 0.99+
Google | ORGANIZATION | 0.99+
Silicon Valley | LOCATION | 0.99+
James | PERSON | 0.99+
23 industries | QUANTITY | 0.99+
Yahoo | ORGANIZATION | 0.99+
San Jose, California | LOCATION | 0.99+
Hive 3.0 | TITLE | 0.99+
2020s | DATE | 0.99+
next year | DATE | 0.99+
this week | DATE | 0.99+
32 countries | QUANTITY | 0.99+
Hive | TITLE | 0.99+
11th year | QUANTITY | 0.99+
yesterday | DATE | 0.99+
first time | QUANTITY | 0.99+
GDPR | TITLE | 0.98+
last month | DATE | 0.98+
DataPlane Services | ORGANIZATION | 0.98+
One | QUANTITY | 0.98+
Scott Gnau | PERSON | 0.98+
2008 | DATE | 0.98+
three | QUANTITY | 0.98+
2,100 attendees | QUANTITY | 0.98+
HDP 3.0 | TITLE | 0.98+
today | DATE | 0.98+
Data Steward Studio | ORGANIZATION | 0.98+
two-phase | QUANTITY | 0.98+
one | QUANTITY | 0.97+
DataWorks Summit 2018 | EVENT | 0.96+
DataPlane | ORGANIZATION | 0.96+
Day two | QUANTITY | 0.96+
billions of files | QUANTITY | 0.95+
first | QUANTITY | 0.95+
day two | QUANTITY | 0.95+
DPS | ORGANIZATION | 0.95+
Data Platform 3.0 | TITLE | 0.94+
Hortonworks DataWorks Summit | EVENT | 0.94+
DataWorks | EVENT | 0.92+

Scott Gnau, Hortonworks | Big Data SV 2018


 

>> Narrator: Live from San Jose, it's the Cube. Presenting Big Data Silicon Valley. Brought to you by SiliconANGLE Media and its ecosystem partners. >> Welcome back to the Cube's continuing coverage of Big Data SV. >> This is our tenth Big Data event, our fifth year in San Jose. We are down the street from the Strata Data Conference. We invite you to come down and join us, come on down! We are at Forager Tasting Room & Eatery, super cool place. We've got a cocktail event tonight, and an analyst briefing tomorrow morning. We are excited to welcome back to the Cube, Scott Gnau, the CTO of Hortonworks. Hey, Scott, welcome back. >> Thanks for having me, and I really love what you've done with the place. I think there's as much energy here as I've seen in the entire show. So, thanks for having me over. >> Yeah! >> We have done a pretty good thing to this place that we're renting for the day. So, thanks for stopping by and talking with George and I. So, February, Hortonworks announced some news about Hortonworks DataFlow. What was in that announcement? What does that do to help customers simplify data in motion? What industries is it going to be most impactful for? I'm thinking, you know, GDPR is a couple months away, kind of what's new there? >> Well, yeah, and there are a couple of topics in there, right? So, obviously, we're very committed to, which I think is one of our unique value propositions, is we're committed to really creating an easy to use data management platform, as it were, for the entire lifecycle of data, from when data are created at the edge and as data are streaming from one place to another place, and, at rest, analytics get run, analytics get pushed back out to the edge.
So, that entire lifecycle is really the footprint that we're looking at, and when you dig a level into that, obviously, the data in motion piece is hugely important, and so I think one of the things that we've looked at is we don't want to be just a streaming engine or just a tool for creating pipes and data flows and so on. We really want to create that entire experience around what needs to happen for data that's moving, whether it be acquisition at the edge in a protected way with provenance and encryption, whether it be applying streaming analytics as the data are flowing and everywhere kind of in between, and so that's what HDF represents, and what we released in our latest release, which, to your point, was just a few weeks ago, is a way for our customers to go build their data in motion applications using a very simple drag and drop GUI interface. So, they don't have to understand all of the different animals in the zoo, and the different technologies that are in play. It's like, "I want to do this." Okay, here's a GUI tool, you can have all of the different operators that are represented by the different underlying technologies that we provide as Hortonworks DataFlow, and you can stream them together, and then, you can make those applications and test those applications. One of the biggest enhancements that we did, is we made it very easy then for once those things are built in a laptop environment or in a dev environment, to be published out to production or to be published out to other developers who might want to enhance them and so on. So, the idea is to make it consumable inside of an enterprise, and when you think about data in motion and IOT and all those use cases, it's not going to be one department, one organization, or one person that's doing it. It's going to be a team of people that are distributed just like the data and the sensors, and, so, being able to have that sharing capability is what we've enhanced in the experience.
>> So, you were just saying, before we went live, that you're here having speed dates with customers. What are some of the things... >> It's a little bit more sincere than that, but yeah. >> (laughs) Isn't speed dating sincere? It's 2018, I'm not sure. (Scott laughs) What are some of the things that you're hearing from customers, and how is that helping to drive what's coming out from Hortonworks? >> So, the two things that I'm hearing, right, number one, certainly, is that they really appreciate our approach to the entire lifecycle of data, because customers are really experiencing huge data volume increases and data just from everywhere, and it's no longer just from the ERP system inside the firewall. It's from third party, it's from sensors, it's from mobile devices, and, so, they really do appreciate kind of the territory that we cover with the tools and technologies we bring to market, and, so, that's been very rewarding. Clearly, customers who are now well into this path, they're starting to think about, in this new world, data governance, and data governance, I just took all of the energy out of the room, governance, it sounds like, you know, hard. What I mean by data governance, really, is customers need to understand, with all of this diverse, connected data everywhere, in the cloud, on-prem, in sensors, third party, partners, is, frankly, they need a trail of breadcrumbs that say what is it, where'd it come from, who had access to it, and then, what did they do with it?
If you start to piece that together, that's what they really need to understand, the data estate that belongs to them, so they can turn that into refined product, and, so, when you then segue into one of your earlier questions, GDPR is, certainly, a triggering point where it's like, okay, the penalties are huge, oh my God, it's a whole new set of regulations that I have to comply with, and when you think about that trail of breadcrumbs that I just described, that actually becomes a roadmap for compliance under regulations like GDPR, where if a European customer calls up and says, "Forget my data," the only way that you can guarantee that you forgot that person's data, is to actually understand where it all is, and that requires proper governance tools and techniques, and, so, when I say governance, it's, really, not like, you know, the governor and the government, and all that. That's an aspect, but the real, important part is how do I keep all of that connectivity so that I can understand the landscape of data that I've got access to, and I'm hearing a lot of energy around that, and when you think about an IOT kind of world, distributed processing, multiple hybrid cloud footprints, data is just everywhere, and, so, the perimeter is no longer fixed, it's kind of variable, and being able to keep track of that is a very important thing for our customers. >> So, continuing on that theme, Scott. Data lakes seem to be the first major new repository we added after we had data warehouses and data marts, and it looked like the governance solutions were sort of around that perimeter of the data lake. Tell us, you were alluding to, sort of, how many more repositories, whether at rest or in motion, there are for data. Do we have to solve the governance problem end-to-end before we can build meaningful applications?
>> So, I would argue personally, that governance is one of the most strategic things for us as an industry, collectively, to go solve in a universal way, and what I mean by that, is throughout my career, which is probably longer than I'd like to admit, in an EDW centric world, where things are somewhat easier in terms of the perimeter and where the data came from, data sources were much more controlled, typically ERP systems, owned wholly by a company. Even in that era, true data governance, meta data management, and that provenance was never really solved adequately. There were 300 different solutions, none of which really won. They were all different, non-compatible, and the problem was easier. In this new world, with connected data, the problem is infinitely more difficult to go solve, and, so, that same kind of approach of 300 different proprietary solutions I don't think is going to work. >> So, tell us, how does that approach have to change and who can make that change? >> So, one of the things, obviously, that we're driving is we're leveraging our position in the open community to try to use the community to create that common infrastructure, common set of APIs for meta data management, and, of course, we call that Apache Atlas, and we work with a lot of partners, some of whom are customers, some of whom are other vendors, even some of whom could be considered competitors, to try to drive an Apache open source kind of project to become that standard layer that's common into which vendors can bring their applications. 
So, now, if I have a common API for tracking meta data in that trail of breadcrumbs that's commonly understood, I can bring in an application that helps customers go develop the taxonomy of the rules that they want to implement, and, then, that helps visualize all of the other functionality, which is also extremely important, and that's where I think specialization comes into play, but having that common infrastructure, I think, is a really important thing, because that's going to enable data, data lakes, IOT to be trusted, and if it's not trusted, it's not going to be successful. >> Okay, there's a chicken and an egg there it sounds like, potentially. >> Am I the chicken or the egg? >> Well, you're the CTO. (Lisa laughs) >> Okay. >> The thing I was thinking of was, the broader the scope of trust that you're trying to achieve at first, the more difficult the problem, do you see customers wanting to pick off one high value application, not necessarily that's about managing what's in Atlas, in the meta data, so much as they want to do an IOT app and they'll implement some amount of governance to solve that app. In other words, which comes first? Do they have to do the end-to-end meta data management and governance, or do they pick a problem off first? >> In this case, I think it's chicken or egg. I mean, you could start from either point. I see customers who are implementing applications in the IOT space, and they're saying, "Hey, this requires a new way to think of governance, so, I'm going to go and build that out, but I'm going to think about it being pluggable into the next app."
I also see a lot of customers, especially in highly regulated industries, and especially in highly regulated jurisdictions, who are stepping back and saying, "Forget the applications, this is a data opportunity, and, so, I want to go solve my data fabric, and I want to have some consistency across that data fabric into which I can publish data for specific applications and guarantee that, holistically, I am compliant and that I'm sitting inside of our corporate mission and all of those things." >> George: Okay. >> So, one of the things you mention, and we talk about this a lot, is the proliferation of data. It's so many, so many different sources, and companies have an opportunity, you had mentioned the phrase data opportunity, there is massive opportunity there, but you said, you know, from even a GDPR perspective alone, I can't remove the data if I don't know where it is, to the breadcrumbs. As a marketer, we use terms like get a 360 degree view of your customer. Is that actually really something that customers can achieve leveraging data? Can they actually really get, say a retailer, a 360, a complete view of their customer? >> Alright, 358. >> That's pretty good! >> And we're getting there. (Lisa laughs) Yeah, I mean, obviously, the idea is to get a much broader view, and 360 is a marketing term. I'm not a marketing person. >> Yes. >> But it, certainly, creates a much broader view of highly personalized information that helps you interact with your customer better, and, yes, we're seeing customers do that today and have great success with it and actually change and build new business models based on that capability, for sure. The folks who've done that have realized that in this new world, the way that that works is you have to have a lot of people have access to a lot of data, and that's scary, because that's not the way it used to be, right? >> Right.
>> It used to be you go to the DBA and you ask for access, and then, your boss has to sign off and say it's what you asked for. In this world, you need to have access to all of it. So, when you think about this new governance capability where as part of the governance integrated with security, personalized information can be encrypted, it can be blurred out, but you still have access to the data to look at the relationships to be found in the data to build out those sophisticated models. So, that's where not only is it a new opportunity for governance just because the sources, the variety, and the different landscape, but it's, ultimately, very much required, because if you're the CSO, you're not going to give the marketing team access to all of its customer data unless you understand that, right, but it has to be, "I'm just giving it to you, and I know that it's automatically protected," versus, "I'm going to let you ask for it," to be successful. >> Right. >> I guess, following up on that, it sounds like what we were talking about, chicken or egg. Are you seeing an accelerating shift from where data is sort of collected, centrally, from applications, or, what we hear on Amazon, is the amount coming off the edge is accelerating? >> It is, and I think that that is a big drive to, frankly, faster cloud adoption, you know, the analytic space, particularly, has been a laggard in cloud adoption for many reasons, and we've talked about it previously, but one of the biggest reasons, obviously, is that data has gravity, data movement is expensive, and, so, now, when you think about where data is being created, where it lives, being further out on the edge, and may live its entire lifecycle in the cloud, you're seeing a reversal of gravity more towards cloud, and that, again, creates more opportunities in terms of driving a more varied perimeter and just keeping track of where all the assets are.
Finally, I think it also leads to this notion of managing entire lifecycle of data. One of the implications of that is if data is not going to be centralized, it's going to live in different places, applications have to be portable to move to where the data exists. So, when I think about that landscape of creating ubiquitous data management within Hortonworks' portfolio, that's one of the big values that we can create for our customers. Not only can we be an on-ramp to their hybrid architecture, but as we become that on-ramp, we can also guarantee the portability of the applications that they've built out to those cloud footprints and, ultimately, even out to the edge. >> So, a quick question, then, to clarify on that, or drill down, would that mean you could see scenarios where Hortonworks is managing the distribution of models that do the inferencing on the edge, and you're collecting, bringing back the relevant data, however that's defined, to do the retraining of any models or recreation of new models. >> Absolutely, absolutely. That's one of the key things about the NiFi project in general and Hortonworks DataFlow, specifically, is the ability to selectively move data, and the selectivity can be based on analytic models as well. So, the easiest case to think about is self-driving cars. We all understand how that works, right? A self-driving car has cameras, and it's looking at things going on. It's making decisions, locally, based on models that have been delivered, and they have to be done locally, because of latency, right, but, selectively, hey, here's something that I saw as an image I didn't recognize. I need to send that up, so that it can be added to my lexicon of what images are and what action should be taken. So, of course, that's all very futuristic, but we understand how that works, but that has application in things that are very relevant today. Think about jet engines that have diagnostics running. 
Do I need to send that terabyte of data an hour over an expensive thing? No, but I have a model that runs locally that says, "Wow, this thing looks interesting. Let me send a gigabyte now for immediate action." So, that decision making capability is extremely important. >> Well, Scott, thanks so much for taking some time to come chat with us once again on the Cube. We appreciate your insights. >> Appreciate it, time flies. This is great. >> Doesn't it? When you're having fun! >> Yeah. >> Alright, we want to thank you for watching the Cube. I'm Lisa Martin with George Gilbert. We are live at Forager Tasting Room in downtown San Jose at our own event, Big Data SV. We'd love for you to come on down and join us tonight, today, tonight, and tomorrow. Stick around, we'll be right back with our next guest after a short break. (techno music) >> Narrator: Since the dawn of the cloud, the Cube

Published Date : Mar 7 2018


ENTITIES

Entity | Category | Confidence
George | PERSON | 0.99+
Scott | PERSON | 0.99+
Hortonworks | ORGANIZATION | 0.99+
Amazon | ORGANIZATION | 0.99+
George Gilbert | PERSON | 0.99+
Scott Gnau | PERSON | 0.99+
Lisa Martin | PERSON | 0.99+
San Jose | LOCATION | 0.99+
February | DATE | 0.99+
360 degree | QUANTITY | 0.99+
2018 | DATE | 0.99+
tomorrow | DATE | 0.99+
358 | OTHER | 0.99+
GDPR | TITLE | 0.99+
today | DATE | 0.99+
tomorrow morning | DATE | 0.99+
fifth year | QUANTITY | 0.99+
tonight | DATE | 0.99+
Lisa | PERSON | 0.99+
SiliconANGLE Media | ORGANIZATION | 0.99+
first | QUANTITY | 0.99+
Hortonworks' | ORGANIZATION | 0.99+
one department | QUANTITY | 0.99+
one organization | QUANTITY | 0.99+
two things | QUANTITY | 0.99+
360 | QUANTITY | 0.98+
one person | QUANTITY | 0.98+
one | QUANTITY | 0.98+
Cube | ORGANIZATION | 0.97+
Strata Data Conference | EVENT | 0.96+
300 different solutions | QUANTITY | 0.96+
an hour | QUANTITY | 0.95+
One | QUANTITY | 0.95+
tenth | QUANTITY | 0.95+
300 different proprietary solutions | QUANTITY | 0.95+
Big Data SV 2018 | EVENT | 0.93+
few weeks ago | DATE | 0.92+
one data | QUANTITY | 0.87+
Atlas | TITLE | 0.86+
Hortonworks DataFlow | ORGANIZATION | 0.85+
Big Data | EVENT | 0.85+
Cube | COMMERCIAL_ITEM | 0.84+
Silicon Valley | LOCATION | 0.83+
European | OTHER | 0.82+
DBA | ORGANIZATION | 0.82+
Apache | TITLE | 0.79+
Tasting | ORGANIZATION | 0.76+
Apache | ORGANIZATION | 0.73+
CTO | PERSON | 0.72+
Sensors | ORGANIZATION | 0.71+
downtown San Jose | LOCATION | 0.7+
Forager Tasting Room | LOCATION | 0.67+
SV | EVENT | 0.66+
terabyte of data | QUANTITY | 0.66+
NiFi | ORGANIZATION | 0.64+
Forager | LOCATION | 0.62+
Narrator: | TITLE | 0.6+
Big Data | ORGANIZATION | 0.55+
Room | LOCATION | 0.52+
Eatery | ORGANIZATION | 0.45+

Shaun Connolly, Hortonworks - DataWorks Summit Europe 2017 - #DW17 - #theCUBE


 

>> Announcer: Coverage of DataWorks Summit Europe 2017, brought to you by Hortonworks. >> Welcome back everyone. Live here in Munich, Germany for theCUBE's special presentation of Hortonworks Hadoop Summit, now called DataWorks 2017. I'm John Furrier, with my co-host Dave Vellante. Our next guest is Shaun Connolly, Vice President of Corporate Strategy, Chief Strategy Officer. Shaun, great to see you again. >> Thanks for having me guys. Always a pleasure. >> Super exciting. Obviously we're always pontificating on the status of Hadoop, and "Hadoop is dead, long live Hadoop," but rumors of its demise are greatly exaggerated. The reality is that there are no major shifts in the trends, other than the fact that the amplification with AI and machine learning has upleveled the narrative around data to the mainstream. Big data has been written on gen one on Hadoop, DevOps, culture, open-source. Starting with Hadoop, you guys certainly have been way out in front of all the trends in how you guys have been rolling out the products. But now, with IoT and AI as that sizzle, the future of self-driving cars, smart cities, you're starting to really see demand for comprehensive solutions that involve data-centric thinking. Okay, that's one. Two, open-source continues to dominate: MuleSoft went public, you guys went public years ago, Cloudera filed their S-1. A crop of public companies that are open-source; we haven't seen that since Red Hat. >> Exactly. '99 is when Red Hat went public. >> Data-centric, a big megatrend with open-source powering it; you couldn't be happier with the stars lining up. >> Yeah, well we definitely placed our bets on that. We went public in 2014 and it's nice to see that graduating class of Talend and MuleSoft, with Cloudera coming out. That, I think, just helps socialize the movement of enterprise open-source, whether it's for on-prem or powering cloud solutions pushed out to the edge, and technologies that are relevant in IoT. That's the wave. 
We had a panel earlier today where Dahl Jeppe from Centrica, British Gas, was talking about his ... the digitization of energy and virtual power plant notions. He can't achieve that without open-source powering and fueling that. >> And the thing about it is just kind of ... For me personally, being my age in this generation of the computer industry since I was 19, to see open-source go mainstream the way it is, it gets even better every time, but it really is the thousand flowers bloom strategy. Throwing the seeds of innovation out there. I want to ask you a strategy question. You guys, from a performance standpoint, I would say kind of got hammered in the public market. Cloudera's valuation privately is 4.1 billion, you guys are close to 700 million. Certainly Cloudera's going to get a haircut, looks like. The public market is based on the multiples, from Dave's and my intro, but there's so much value being created. Where's the value for you guys as you look at the horizon? You're talking about white spaces that are really developing with use cases that are creating value. The practitioners in the field are creating value, real value for customers. >> So you covered some of the trends, but I'll translate 'em into how the customers are deploying. Cloud computing and IoT are somewhat related. One is a centralization, the other is decentralization, so it actually calls for a connected data architecture, as we refer to it. We're working with a variety of IoT-related use cases. Coca-Cola East Japan spoke at Tokyo Summit about beverage replenishment analytics. Getting vending machine analytics from vending machines even on Mount Fuji. And optimizing their flow-through of inventory in just-in-time delivery. That's an IoT-related story, run on Azure. It's a cloud-related story and it's a big data analytics story that's actually driving better margins for the business and actually better revenues because they're getting the inventory where it needs to be so people can buy it. 
Those are really interesting use cases that we're seeing being deployed, and it's at this convergence of IoT, cloud and big data. Ultimately that leads to AI, but I think that's what we're seeing the rise of. >> Can you help us understand that sort of value chain? You've got the edge, you've got the cloud, you need something in-between, you're calling it a connected data platform. How do you guys participate in that value chain? >> When we went public our primary workhorse platform was Hortonworks Data Platform. We had first class cloud services with Azure HDInsight and Hortonworks Data Cloud for AWS, curated cloud services, pay-as-you-go, and Hortonworks DataFlow, which I call our connective tissue. It manages all of your data motion, it's a data logistics platform, it's like FedEx for data delivery. It goes all the way out to the edge. There's a little component called MiNiFi, mini and NiFi, which does secure intelligent analytics at the edge and transmission. With these smart manufacturing lines, you're gathering the data, you're doing analytics on the manufacturing lines, and then you're bringing the historical stuff into the data center where you can do historical analytics across manufacturing lines. Those are the use cases for a connected data architecture-- >> Dave: A subset of that data comes back, right? >> A subset of the data, yep. The key events of that data, it may not be full of-- >> 10%, half, 90%? >> It depends. If you have operational events that you want to store, sometimes you may want to bring full fidelity of that data so you can do ... As you manufacture stuff and it gets deployed and you're seeing issues in the field, like Western Digital hard drives with failures in the field, they want that data at full fidelity for the connected data architecture and analytics around that data. You need to ... One of the terms I use is, in the new world, you need to play it where it lies. If it's out at the edge, you need to play it there. 
If it makes a stop in the cloud, you need to play it there. If it comes into the data center, you also need to play it there. >> So a couple years ago, you and I were doing a panel at our Big Data NYC event and I used the term "profitless prosperity." I got the hairy eyeball from you, but nonetheless, we talked about you guys as a steward of the industry; you have to invest in open-source projects. And it's expensive. I mean HDFS itself, YARN, Tez, you guys lead a lot of those initiatives. >> Shaun: With the community, yeah, but we-- >> With the community yeah, but you provided contributions and co-leadership let's say. You're there at the front of the pack. How do we project it forward without making forward-looking statements, but how does this industry become a cashflow positive industry? >> We've been a public company since the end of 2014. The markets turned at the beginning of 2016: prior to that, high growth with some losses was palatable; after that, losses were not palatable. That hit us, Splunk, Tableau, most of the IT sector. That's just the nature of the public markets. As more public open-source, data-driven companies come in, I think it will better educate the market on the value. There's only so much I can do to control the stock price. What I can do from a business perspective is hit key measures on a path to profitability. At the end of Q4 2016, we hit what we call adjusted EBITDA breakeven, which is a stepping stone. On our earnings call at the end of 2016 we ended with 185 million in revenue for the year. Only five years into this journey, so that's a high revenue growth pace, and we basically stated that in Q3 or Q4 of '17, we will hit operating cashflow neutrality. So we are operating the business-- >> John: But you guys also hit 100 million at a record pace too, I believe. >> Yeah, in four years. So revenue is one thing, but operating margins, like if you look at our margins on our subscription business for instance, we've got 84% margin on that. 
It's a really nice margin business. We can make those margins better, but that's a software margin. >> You know what's ironic, we were talking about Red Hat off camera. Here's Red Hat kicking butt, really hitting on all cylinders, three billion dollars in bookings. One would think, okay, hey, I can maybe project forth some of these open-source companies. Maybe the flip side of this is, oh wow, we want it now. To your point, the market kind of flipped, but you would think that Red Hat is an indicator of how an open-source model can work. >> By the way, Red Hat went public in '99, so it was a different trajectory. You know, I charted their trajectory out. Oracle's trajectory was different. Even in inflation-adjusted dollars, they didn't hit 100 million in four years; I think it was seven or eight years or what have you. Salesforce did it in five. So these SaaS models and these subscription models and the cloud services, which is an area that's near and dear to my heart-- >> John: Goes faster. >> You get multiple revenue streams across different products. We're a multi-product cloud service company. Not just a single platform. >> So we were actually teasing this out on our-- >> And that's how you grow the business, and that's how Red Hat did it. >> Well, I want to get your thoughts on this while we're just kind of riffing live here, because Dave and I were talking on our intro segment about the business model and how there's some camouflage out there, at least from my standpoint. One of the main areas that I was kind of pointing at and trying to poke at, and want to get your reaction to, is in the classic enterprise go-to-market: you have an expensive sales force, you guys pay handsomely for that today. Incubating that market, getting the profitability for it is a good thing, but there's also channels, VARs, ISVs, and so on. You guys have an open-source channel that's kind of not a VAR or an ISV; these are entrepreneurs and/or businesses themselves. 
There's got to be a monetization shift there for you guys in the subscription business certainly. When you look at these partners, they're co-developing, they're in open-source, you can almost see the dots connecting. Is this a new ecosystem? There's always been an ecosystem, but now you have kind of a monetization inherent in a pure open distribution model. >> It forces you to collaborate. IBM was on stage talking about our system certified on Power Systems. Many may look at IBM as competitive; we view them as a partner. Amazon, some may view them as a competitor with us; they've been a great partner in our Hortonworks Data Cloud for AWS. So it forces you to think about how you collaborate around deeply engineered systems and value, and we get great revenue streams that are pulled through, that they can sell into the market to their ecosystems. >> How do you envision monetizing the partners? Let's just say Dave and I start this epic idea and we create some connective tissue with your orchestrator, called the Data Platform you have, and we start making some serious bang. We make a billion dollars. Do you get paid on that if it's open-source? I mean, would we be more subscriptions? I'm trying to see how the tide comes in, whose boats float on the rising tide of the innovation in these white spaces. >> Platform thinking is you provide the platform. You provide the platform for 10x value that rides atop that platform. That's how the model works. So if you're riding atop the platform, I expect you and that ecosystem to drive at least 10x above and beyond what I would make as a platform provider in that space. >> So you expect some contributions? >> That's how it works. You need a thousand flowers to be running on the platform. >> You saw that with VMware. They hit 10x and ultimately got to 15 or 16, 17x. >> Shaun: Exactly. >> I think they don't talk about it anymore. I think it's probably trading the other way. >> You know, in my days at JBoss and Red Hat it was somewhere between 15 and 20x. 
That was the value that was created on top of the platforms. >> What about the ... I want to ask you about the forking of the Hadoop distros. I mean there was a time when everybody was announcing Hadoop distros. John Furrier announced SiliconANGLE was announcing a Hadoop distro. So we saw consolidation, and then you guys announced the ODP, then the ODPI initiative, but there seems to be a bit of a forking in Hadoop distros. Is that a fair statement? Unfair? >> I think if you look at how the Linux market played out, you have clearly Red Hat, you had Canonical Ubuntu, you had SUSE. You're always going to have curated platforms for different purposes. We have a strong opinion and a strong focus in the area of IoT, fast analytic data from the edge, and a centralized platform with HDP in the cloud and on-prem. Others in the market, Cloudera is running sort of a different play where they're curating different elements and investing in different elements. Doesn't make either one bad or good, we are just going after the markets slightly differently. The other point I'll make there is, in 2014, if you looked at the Venn diagrams, there was a lot of overlap. Now if you draw the areas of focus, there's a lot of white space that we're going after that they aren't going after, and they're going after other places, and other new vendors are going after others. With the market dynamics of IoT, cloud and AI, you're going to see folks chase the market opportunities. >> Is that diversity not a problem for customers now, or is it challenging? >> There has to be a core level of interoperability and that's one of the reasons why we're collaborating with folks in the ODPI, as an example. When it comes to some of the core components, there still has to be a level of predictability, because if you're an ISV riding atop, you're slowed down by death by infinite certification and choices. So ultimately it has to come down to just a much more sane approach to what you can rely on. 
>> When you guys announced ODP, then ODPI, the extension, Mike Olson wrote a blog saying it's not necessary, people came out against it. Now we're three years in, looking back. Was he right or not? >> I think the ODPI takeaway this year is there's more that we can do above and beyond the Hadoop platform. It's expanded to include SQL and other things recently, so there's been some movement on this spec, but frankly, you talk to John Mertic at ODPI, you talk to SAS and others, I think we want to be a bit more aggressive in the areas that we go after and try and drive there from a standardization perspective. >> We had Wei Wang on earlier-- >> Shaun: There's more we can do and there's more we should do. >> We had Wei on with Microsoft at our Big Data SV event a couple weeks ago. Talk about the Microsoft relationship with you guys. It seems to be doing very well. Comments on that. >> Microsoft was one of the two companies we chose to partner with early on, so in 2011, 2012, Microsoft and Teradata were the two. Microsoft was "how do I democratize and make this technology easy for people?" That's manifested itself as an Azure cloud service, Azure HDInsight-- >> Which is growing like crazy. >> Which is globally deployed, and we just had another update. It's fundamentally changed our engineering and delivery model. This latest release was a cloud-first delivery model, so one of the things that we're proud of is the interactive SQL and the LLAP technology that's in HDP; that went out through Azure HDInsight and Hortonworks Data Cloud first. Then it was certified in HDP 2.6, and it went to Power at the same time. It's that cadence of delivery and cloud-first delivery model. We couldn't do it without a partnership with Microsoft. I think we've really learned what it takes-- >> If you look at Microsoft at that time. I remember interviewing you on theCUBE. Microsoft was trading at something like $26 a share at that time, around their low point. Now the stock is performing really well. 
Satya Nadella, very cloud oriented-- >> Shaun: They're very open-source. >> They're very open-source and friendly, they've been donating a lot to the OCP, to the data center piece. An extremely different Microsoft, so you slipped into that beautiful spot, reacted on that growth. >> I think as one of the stalwarts of enterprise software providers, I think they've done a really great job of bending the curve towards cloud and still having a mixed portfolio, but incenting a field, and incenting a channel, and selling cloud and growing that revenue stream, that's nontrivial, that's hard. >> They know the enterprise sales motions too. I want to ask you how that's going overall within Hortonworks. What are some of the conversations that you're involved in with customers today? Again, we were saying in our opening segment, it's on YouTube if you're not watching, but the customer is the forcing function right now. They're really putting the pressure on the suppliers, you're one of them, to get tight, reduce friction, lower cost of ownership, get into the cloud, flywheel. And so you see a lot-- >> I'll throw in another aspect. Some of the more late-majority adopters traditionally, over and over I hear that by 2025 they want to power down the data center and have more things running in the public cloud, if not most everything. That's another eight years or what have you, so it's still a journey, but this journey is making that an imperative because of the operational, because of the agility, because of better predictability, ease of use. That's fundamental. >> As you get into the connective tissue, I love that example, with Kubernetes containers, you've got developers, a big open-source participant, and you've got all the stuff you have, you just start to see some coalescing around cloud native. How do you guys look at that conversation? 
>> I view container platforms, whether they're container services that are running on a cloud or what have you, as the new lightweight rail that everything will ride atop. The cloud currently plays a key role in that; I think that's going to be the de facto way. Particularly if you go with cloud-first models, particularly for delivery. You need that packaging notion and you need the agility of updates that that's going to provide. I think Red Hat as a partner has been doing great things on hardening that, making it secure. There's others in the ecosystem as well as the cloud providers. All three cloud providers actually are investing in it. >> John: So it's good for your business? >> It removes friction of deployment ... And I ride atop that new rail. It can't get here soon enough from my perspective. >> So I want to ask about clouds. You were talking about the Microsoft shift. Personally, I think Microsoft realized, holy cow, we could actually make a lot of money if we're selling hardware services. We can make more money if we're selling the full stack. It was sort of an epiphany, and Amazon seems to be doing the same thing. You mentioned earlier, you know, Amazon is a great partner, even though a lot of people look at them as a competitor. It seems like Amazon, Azure, etc., they're building out their own big data stack and offering it as a service. People say that's a threat to you guys. Is it a threat, or is it a tailwind, or is it "it is what it is"? >> This is why I bring up that industry-wide we always have waves of centralization and decentralization. They're playing out simultaneously right now with cloud and IoT. The fact of the matter is that you're going to have multiple clouds, on-prem data, and data at the edge. That's the problem I am looking to facilitate and solve. 
I don't view them as competitors, I view them as partners, because we need to collaborate, because there's a value chain of the flow of the data and some of it's going to be running through and on those platforms. >> The cloud's not going to solve the edge problem. Too expensive. It's just physics. >> So I think that's where things need to go. I think that's why we talk about this notion of connected data. I don't talk hybrid cloud computing, that's for compute. I talk about how do you connect to your data, how do you know where your data is, and are you getting the right value out of the data by playing it where it lies. >> I think IoT has been a great sweet trend for the big data industry. It really accelerates the value proposition of the cloud too, because now you have a connected network, you can have your cake and eat it too. Central and distributed. >> There's different dynamics in the US versus Europe, as an example. In the US, definitely, we're seeing cloud adoption that's independent of IoT. Here in Europe, I would argue the smart mobility initiatives, the smart manufacturing initiatives, and the connected grid initiatives are bringing cloud in, so it's IoT and cloud, and that's opening up the cloud opportunity here. >> Interesting. So on the prospects for Hortonworks, cashflow positive in Q4, you guys have made a public statement. Any other thoughts you want to share? >> Just continue to grow the business, focus on these customer use cases, get them to talk about them at things like DataWorks Summit, and then the more the merrier. The more data-oriented, open-source driven companies that can graduate in the public markets, I think is awesome. I think it will just help the industry. >> Operating in the open, with full transparency-- >> Shaun: On the business and the code. (laughter) >> Welcome to the party baby. This is theCUBE here at DataWorks 2017 in Munich, Germany. Live coverage, I'm John Furrier with Dave Vellante. Stay with us. 
More great coverage coming after this short break. (upbeat music)

Published Date : Apr 5 2017


SENTIMENT ANALYSIS :

ENTITIES

Entity | Category | Confidence
Dave | PERSON | 0.99+
Dave Vellante | PERSON | 0.99+
John | PERSON | 0.99+
Europe | LOCATION | 0.99+
Amazon | ORGANIZATION | 0.99+
2014 | DATE | 0.99+
John Furrier | PERSON | 0.99+
Microsoft | ORGANIZATION | 0.99+
John Mertic | PERSON | 0.99+
Mike Olson | PERSON | 0.99+
Shaun | PERSON | 0.99+
IBM | ORGANIZATION | 0.99+
Shaun Connolly | PERSON | 0.99+
Centric | ORGANIZATION | 0.99+
Teradata | ORGANIZATION | 0.99+
Oracle | ORGANIZATION | 0.99+
Coca-Cola | ORGANIZATION | 0.99+
John Furrier | PERSON | 0.99+
2016 | DATE | 0.99+
4.1 billion | QUANTITY | 0.99+
Cloudera | ORGANIZATION | 0.99+
AWS | ORGANIZATION | 0.99+
90% | QUANTITY | 0.99+
two | QUANTITY | 0.99+
100 million | QUANTITY | 0.99+
five | QUANTITY | 0.99+
2011 | DATE | 0.99+
Mount Fuji | LOCATION | 0.99+
US | LOCATION | 0.99+
seven | QUANTITY | 0.99+
185 million | QUANTITY | 0.99+
eight years | QUANTITY | 0.99+
four years | QUANTITY | 0.99+
10x | QUANTITY | 0.99+
Dahl Jeppe | PERSON | 0.99+
YouTube | ORGANIZATION | 0.99+
FedEx | ORGANIZATION | 0.99+
Hortonworks | ORGANIZATION | 0.99+
100 million | QUANTITY | 0.99+
one | QUANTITY | 0.99+
MuleSoft | ORGANIZATION | 0.99+
2025 | DATE | 0.99+
Red Hat | ORGANIZATION | 0.99+
three years | QUANTITY | 0.99+
15 | QUANTITY | 0.99+
two companies | QUANTITY | 0.99+
2012 | DATE | 0.99+
Munich, Germany | LOCATION | 0.98+
Hadoop | TITLE | 0.98+
DataWorks 2017 | EVENT | 0.98+
Wei Wang | PERSON | 0.98+
Wei | PERSON | 0.98+
10% | QUANTITY | 0.98+
eight years | QUANTITY | 0.98+
20x | QUANTITY | 0.98+
Hortonworks Hadoop Summit | EVENT | 0.98+
end of 2016 | DATE | 0.98+
three billion dollars | QUANTITY | 0.98+
SiliconANGLE | ORGANIZATION | 0.98+
Azure | ORGANIZATION | 0.98+
DataWorks Summit | EVENT | 0.97+