Rob Bearden, Hortonworks | DataWorks Summit 2018

>> Live from San Jose in the heart of Silicon Valley, it's theCUBE covering DataWorks Summit 2018, brought to you by Hortonworks. >> Welcome back to theCUBE's live coverage of DataWorks Summit here in San Jose, California. I'm your host, Rebecca Knight, along with my co-host, James Kobielus. We're joined by Rob Bearden. He is the CEO of Hortonworks. So thanks so much for coming on theCUBE again, Rob. >> Thank you for having us. >> So you just got off of the keynote on the main stage. The big theme is really about modern data architecture. So we're going to have this modern data architecture. What is it all about? How do you think about it? What's your approach? And how do you walk customers through this process? >> Well, there's a lot of moving parts in enabling a modern data architecture. One of the first steps, what we're trying to do, is unlock the siloed transactional applications, and to get that data into a central architecture so you can get real time insights around the inclusive dataset. But what we're really trying to accomplish then within that modern data architecture is to bring all types of data, whether it be real time streaming data, whether it be sensor data, IoT data, whether it be data that's coming from a connected car across the network, and to be able to bring all that data together in real time, and give the enterprise the ability to be able to take best in class action so that you get a very prescriptive outcome of what you want. 
So if we bring that data under management from point of origination and out on the edge, and then have the platforms that move that through its entire lifecycle, and that's our HDF platform, it gives the customer the ability to, after they capture it at the edge, move it, and then have the ability to process it as an event happens, a condition changes, various conditions come together, have the ability to process and take the exact action that you want to see performed against that, and then bring it to rest, and that's where our HDP platform comes into play where then all that data can be aggregated so you can have a holistic insight, and have real time interactions on that data. But then it becomes about deploying those datasets and workloads on the tier that's most economically and architecturally pragmatic. So if that's on-prem, we make sure that we are architected for that on-prem deployment or private cloud or even across multiple public clouds simultaneously, and give the enterprise the ability to support each of those native environments. And so we think hybrid cloud architecture is really where the vast majority of our customers, today and in the future, are going to want to be able to run and deploy their applications and workloads. And that's where our DataPlane Service offering gives them the ability to have that hybrid architecture and the architectural latitude to move workloads and datasets across each tier transparently to what storage file format that they did or where that application is, and we provide all the tooling to mask the complexity of doing that, and then we ensure that it has one common security framework, one common governance through its entire lifecycle, and one management platform to handle that entire data lifecycle. 
And that's the modern data architecture: to be able to bring all data under management, all types of data under management, and manage that in real time through its lifecycle till it comes to rest, and deploy that across whatever architecture tier is most appropriate financially and from a performance standpoint, on cloud or on-prem. >> Rob, this morning at the keynote here in day one at DataWorks San Jose, you presented this whole architecture that you described in the context of what you call hybrid clouds to enable connected communities and with HDP, Hortonworks Data Platform 3.0 is one of the prime announcements, you brought containerization into the story. Could you connect those dots, containerization, connected communities, and HDP 3.0? >> Well, HDP 3.0 is really the foundation for enabling that hybrid architecture natively, and what it's done is it separated the storage from the compute, and so now we have the ability to deploy those workloads via a container strategy across whichever tier makes the most sense, and to move those applications and datasets around, and to be able to leverage each tier in the deployment architectures that are most pragmatic. And then what that lets us do then is be able to bring all of the different data types, whether it be customer data, supply chain data, product data. So imagine an industrial piece of equipment, an airplane, is flying from Atlanta, Georgia to London, and you want to be able to make sure you really understand how well each component is performing, so that if that plane is going to need service when it gets there, it doesn't miss the turnaround and leave 300 passengers stranded or delayed, right? Now with our Connected platform, we have the ability to take every piece of data from every component that's generated and see that in real time, and let the airlines make that real time. >> Delineate essentially. 
>> And ensure that we know every person that touched it and looked at that data through its entire lifecycle, from the ground crew to the pilots to the operations team to the service folks on the ground to the reservation agents, and we can prove that if somehow that data has been breached, that we know exactly at what point it was breached and who did or didn't get to see it, and can prevent that because of the security models that we put in place. >> And that relates to compliance and mandates such as the General Data Protection Regulation, GDPR, in the EU. At DataWorks Berlin a few months ago, you laid out, Hortonworks laid out, announced a new product called the Data Steward Studio to enable GDPR compliance. Can you give our listeners now who may not have been following the Berlin event a bit of an update on Data Steward Studio, how it relates to the whole data lineage, or set of requirements that you're describing, and then going forward what is Hortonworks's roadmap for supporting the full governance lifecycle for the Connected community, from data lineage through like model governance and so forth. Can you just connect a few dots that will be helpful? >> Absolutely. What's important certainly, driven by GDPR, is the requirement to be able to prove that you understand who's touched that data and who has not had access to it, and that you ensure that you're in compliance with the GDPR regulations which are significant, but essentially what they say is you have to protect the personal data and attributes of that data of the individual. And so what's very important is that you've got to be able to have the systems that not just secure the data, but understand who has the accessibility at any point in time that you've ever maintained that individual's data. 
And so it's not just about when you've had a transaction with that individual, but it's the rest of the history that you've kept or the multiple datasets that you may try to correlate to try to expand relationship with that customer, and you need to make sure that you can ensure not only that you've secured their data, but then you're protecting and governing who has access to it and when. And as importantly that you can prove in the event of a breach that you had control of that, and who did or did not access it, because if you can't prove, in any breach, that it was secure, and that no one accessed it who was not supposed to, you can be opened up for hundreds of thousands of dollars or even multiple millions of dollars of fines just because you can't prove that it was not accessed, and that's what the variety of our platforms, you mentioned Data Steward Studio, is part of. DataPlane is one of the capabilities that gives us the ability. The core engine that does that is Atlas, and that's the open source governance platform that we developed through the community that really drives all the capabilities for governance that moves through each of our products, HDP, HDF, and then of course DataPlane and Data Steward Studio take advantage of that and how it moves and replicates data and manages that process for us. >> One of the things that we were talking about before the cameras were rolling was this idea of data driven business models, how they are disrupting current contenders, new rivals coming on the scene all the time. Can you talk a little bit about what you're seeing and what are some of the most exciting and maybe also some of the most threatening things that you're seeing? >> Sure, in the traditional legacy enterprise, it's very procedural driven. You think about classic Encore ERP. It's worked very hard to have a very rigid, very structural procedural order to cash cycle that has not a great deal of flexibility. 
And it takes through a design process, it builds product, that then you sell product to a customer, and then you service that customer, and then you learn from that transaction different ways to automate or improve efficiencies in their supply chain. But it's very procedural, very linear. And in the new world of connected data models, you want to bring transparency and real time understanding and connectivity between the enterprise, the customer, the product, and the supply chain, and that you can take real time best in practice action. So for example you understand how well your product is performing. Is your customer using it correctly? Are they frustrated with that? Are they using it in the patterns and the frequency that they should be if they are going to expand their use and buy more, and if they're not, how do we engage in that cycle? How do we understand if they're going through a re-review and another buying of something similar that may not be with you for a different reason. And when we have real time visibility to our customer's interaction, understand our product's performance through its entire lifecycle, then we can bring real time efficiency with linking those together with our supply chain into the various relationships we have with our customers. To do that, it requires the modern data architecture, bringing data under management from the point it originates, whether it's from the product or the customer interacting with the company, or the customer interacting potentially with our ecosystem partners, mutual partners, and then letting the best in practice supply chain techniques, make sure that we're bringing the highest level of service and support to that entire lifecycle. 
And when we bring data under management, manage it through its lifecycle and have the historical view at rest, and leverage that across every tier, that's when we get these high velocity, deep transparency, and connectivity between each of the constituents in the value chain, and that's what our platforms give them the ability to do. >> Not only your platform, you guys have been in business now for I think seven years or so, and you shifted from being in the minds of many and including your own strategy from being the premier data at rest company in terms of the Hadoop platform to being one of the premier data in motion companies. Is that really where you're going? To be more of a completely streaming-focused solution provider in a multi-cloud environment? And I hear a lot of Kafka in your story now that it's like, oh yeah, that's right, Hortonworks is big on Kafka. Can you give us just a quick sense of how you're making that shift towards low latency real time streaming, big data, or small data for that matter, with embedded analytics and machine learning? >> So, we have evolved from certainly being the leader in global data platforms with all the work that we do collaboratively, in and through the community, to make Hadoop an enterprise viable data platform that has the ability to run mission critical workloads and apps at scale, ensuring that it has all the enterprise facilities from security and governance and management. But you're right, we have expanded our footprint aggressively. And we saw the opportunity to actually create more value for our customers by giving them the ability to not wait till they bring data under management to gain an insight, because in that case, they're having to be reactive, post-event, post-transaction. We want to give them the ability to shift their business model to being interactive, pre-event, pre-condition. 
The way to do that, we learned, was to be able to bring the data under management from the point of origination, and that's what we used MiNiFi and NiFi for, and then HDF, to move it through its lifecycle, and, to your point, we have the intellect, we have the insight, and then we have the ability then to process the best in class outcome based on what we know the variables are we're trying to solve for as that's happening. >> And there's the word, the phrase ACID, which of course is a transactional data paradigm, I hear that all over your story now in streaming. So, what you're saying is it's a completely enterprise-grade streaming environment from end to end for the new era of edge computing. Would that be a fair way of-- >> It's very much so. And our model and strategy has always been bring the other best in class engines for what they do well for their particular dataset. A couple of examples of that, one, you brought up Kafka, another is Spark. And they do what they do really well. But what we do is make sure that they fit inside an overall data architecture that then embodies their access to a much broader central dataset that goes from point of origination to point of rest on a whole central architecture, and then benefit from our security, governance, and operations model, being able to manage those engines. So what we're trying to do is eliminate the silos for our customers, and having siloed datasets that just do particular functions. We give them the ability to have an enterprise modern data architecture, we manage the things that bring that forward for the enterprise to have the modern data driven business models by bringing the governance, the security, the operations management, ensure that those workflows go from beginning to end seamlessly. >> Do you, go ahead. >> So I was just going to ask about the customer concerns. So here you are, you've now given them this ability to make these real time changes, what's sort of next? 
What's on their mind now and what do you see as the future of what you want to deliver next? >> First and foremost we got to make sure we get this right, and we really bring this modern data architecture forward, and make sure that we truly have the governance correct, the security models correct. One pane of glass to manage this. And really enable that hybrid data architecture, and let them leverage the cloud tier where it's architecturally and financially pragmatic to do it, and give them the ability to leg into a cloud architecture without risk of either being locked in or misunderstanding where the lines of demarcation of workloads or datasets are, and not getting the economies or efficiencies they should. And we solved that with DataPlane. So we're working very hard with the community, with our ecosystem and strategic partners to make sure that we're enabling the ability to bring each type of data from any source and deploy it across any tier with a common security, governance, and management framework. So then what's next is now that we have this high velocity of data through its entire lifecycle on one common set of platforms, then we can start enabling the modern applications to function. And we can go look back into some of the legacy technologies that are very procedural based and are dependent on a transaction or an event happening before they can run their logic to get an outcome because that grinds the customer in post world activity. We want to make sure that we're bringing that kind of, for example, supply chain functionality, to the modern data architecture, so that we can put real time inventory allocation based on the patterns that our customers go in either how they're using the product, or frustrations they've had, or success they've had. 
And we know through artificial intelligence and machine learning that there's a high probability not only they will buy or use or expand their consumption of whatever that they have of our product or service, but they will probably do these other things as well if we do those things. >> Predictive logic as opposed to procedural, yes, AI. >> And very much so. And so what's next will be the modern applications on top of this that become very predictive and enabling versus very procedural, post-transaction. We're a little ways downstream. That's looking out. >> That's next year's conference. >> That's probably next year's conference. >> Well, Rob, thank you so much for coming on theCUBE, it's always a pleasure to have you. >> Thank you both for having us, and thank you for being here, and enjoy the summit. >> We're excited. >> Thank you. >> Will do. >> I'm Rebecca Knight for Jim Kobielus. We will have more from DataWorks Summit just after this. (upbeat music)
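
Editor's sketch: the audit requirement Bearden describes in the GDPR exchange (being able to show who did and who did not access a piece of data over its lifecycle) can be illustrated with a toy access log. This is only an illustration under stated assumptions. The names `AccessEvent`, `AuditLog`, `who_accessed`, and `who_did_not_access` are hypothetical and are not part of Atlas, DataPlane, or any Hortonworks product API.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class AccessEvent:
    """One recorded touch of a data record (hypothetical schema)."""
    user: str
    record_id: str
    action: str
    at: datetime


class AuditLog:
    """Append-only, in-memory access log; a real system would persist this."""

    def __init__(self):
        self._events = []

    def record(self, user, record_id, action, at=None):
        # Default the timestamp to "now" in UTC when the caller gives none.
        event = AccessEvent(user, record_id, action, at or datetime.now(timezone.utc))
        self._events.append(event)
        return event

    def who_accessed(self, record_id):
        # Everyone with at least one logged event on this record.
        return {e.user for e in self._events if e.record_id == record_id}

    def who_did_not_access(self, record_id, all_users):
        # Known users with no logged event on this record.
        return set(all_users) - self.who_accessed(record_id)


log = AuditLog()
log.record("pilot", "engine-7-telemetry", "read")
log.record("ground_crew", "engine-7-telemetry", "read")
log.record("reservation_agent", "booking-1", "read")
```

With a log like this, both halves of the question from the interview (who touched the data, and who provably did not) become simple set queries, assuming every access path actually writes to the log.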

Published Date : Jun 20 2018


ENTITIES

Entity	Category	Confidence
James Kobielus	PERSON	0.99+
Rebecca Knight	PERSON	0.99+
Rob Bearden	PERSON	0.99+
Jim Kobielus	PERSON	0.99+
London	LOCATION	0.99+
300 passengers	QUANTITY	0.99+
San Jose	LOCATION	0.99+
Rob	PERSON	0.99+
Silicon Valley	LOCATION	0.99+
Hortonworks	ORGANIZATION	0.99+
seven years	QUANTITY	0.99+
hundreds of thousands of dollars	QUANTITY	0.99+
San Jose, California	LOCATION	0.99+
each component	QUANTITY	0.99+
GDPR	TITLE	0.99+
DataWorks Summit	EVENT	0.99+
one	QUANTITY	0.99+
One	QUANTITY	0.98+
millions of dollars	QUANTITY	0.98+
Atlas	TITLE	0.98+
first steps	QUANTITY	0.98+
HDP 3.0	TITLE	0.97+
One pane	QUANTITY	0.97+
both	QUANTITY	0.97+
DataWorks Summit 2018	EVENT	0.97+
First	QUANTITY	0.96+
next year	DATE	0.96+
each	QUANTITY	0.96+
DataPlane	TITLE	0.96+
theCUBE	ORGANIZATION	0.96+
Hadoop	TITLE	0.96+
DataWorks	ORGANIZATION	0.95+
Spark	TITLE	0.95+
today	DATE	0.94+
EU	LOCATION	0.93+
this morning	DATE	0.91+
Atlanta,	LOCATION	0.91+
Berlin	LOCATION	0.9+
each type	QUANTITY	0.88+
Global Data Protection Regulation GDPR	TITLE	0.87+
one common	QUANTITY	0.86+
few months ago	DATE	0.85+
NiFi	ORGANIZATION	0.85+
Data Platform 3.0	TITLE	0.84+
each tier	QUANTITY	0.84+
Data Studio	ORGANIZATION	0.84+
Data Studio	TITLE	0.83+
day one	QUANTITY	0.83+
one management platform	QUANTITY	0.82+
MiNiFi	ORGANIZATION	0.82+
San	LOCATION	0.71+
DataPlane	ORGANIZATION	0.69+
Kafka	TITLE	0.67+
Encore ERP	TITLE	0.66+
one common set	QUANTITY	0.65+
Data Steward Studio	ORGANIZATION	0.65+
HDF	ORGANIZATION	0.59+
Georgia	LOCATION	0.55+
announcements	QUANTITY	0.51+
Jose	ORGANIZATION	0.47+

Scott Gnau, Hortonworks | DataWorks Summit 2018

>> Live from San Jose, in the heart of Silicon Valley, it's theCUBE. Covering DataWorks Summit 2018. Brought to you by Hortonworks. >> Welcome back to theCUBE's live coverage of DataWorks Summit here in San Jose, California. I'm your host, Rebecca Knight, along with my cohost James Kobielus. We're joined by Scott Gnau, he is the chief technology officer at Hortonworks. Welcome back to theCUBE, Scott. >> Great to be here. >> It's always fun to have you on the show. So, you have really spent your entire career in the data industry. I want to start off at 10,000 feet, and just have you talk about where we are now, in terms of customer attitudes, in terms of the industry, in terms of where customers feel, how they're dealing with their data and how they're thinking about their approach in their business strategy. >> Well I have to say, 30 plus years ago starting in the data field, it wasn't as exciting as it is today. Of course, I always found it very exciting. >> Exciting means nerve-wracking. Keep going. >> Or nerve-wracking. But you know, we've been predicting it. I remember even, you know, 10, 15 years ago before big data was a thing, it's like oh all this data's going to come, and it's going to be you know 10x what it is. And we were wrong. It was like 5000x, you know, what it is. And I think the really exciting part is that data really used to be relegated frankly, to big companies as a derivative work of ERP systems, and so on and so forth. And while that's very interesting, and certainly enabled a whole level of productivity for industry, when you compare that to all of the data flying around everywhere today, whether it be Twitter feeds and even doing live polls, like we did in the opening session today. Data is just being created everywhere. And the same thing applies to that data that applied to the ERP data of old. And that is being able to harness, manage and understand that data is a new business creating opportunity. 
And you know, we were with some analysts the other day, and I think one of the more quoted things that came out of that when I was speaking with them, was really, like railroads and shipping in the 1800s and oil in the 1900s, data really is the wealth creator of this century. And so that creates a very nerve-wracking environment. It also creates an environment of very agile and very important technological breakthroughs that enable those things to be turned into wealth. >> So thinking about that, in terms of where we are at this point in time and on the main stage this morning someone had likened it to the interstate highway system, that really revolutionized transportation, but also commerce. >> I love that actually. I may steal it in some of my future presentations. >> That's good but we'll know where you pilfered it. >> Well, perhaps if data is oil, the edge, in containerized applications and piping data, you know, microbursts of data across the internet of things, is sort of like the new fracking. You know, you're being able to extract more of this precious resource from the territory. >> Hopefully not quite as damaging to the environment. >> Maybe not. I'm sorry for environmentalists if I just offended you, I apologize. >> But I think you know, all of those analogies are very true, and I particularly like the interstate one this morning. Because when I think about what we've done in our core HDP platform, and I know Arun was here talking about all the great advances that we built into this, kind of the core Hadoop platform. Very traditional. Store data, analyze data but also bring in new kinds of algorithms, rapid innovation and so on. That's really great but that's kind of half of the story. In a device connected world, in a consumer centric world, capturing data at the edge, moving and processing data at the edge is the new normal, right? 
And so just like the interstate highway system actually created new ways of commerce because we could move people and things more efficiently, moving data and processing data more efficiently is kind of the second part of the opportunity that we have in this new deluge of data. And that's really where we've been with our Hortonworks DataFlow. And really saying that the complete package of managing data from origination at the edge all the way through analytic to decision that's triggered back at the edge is like the holy grail, right? And building a technology for that footprint, is why I'm certainly excited today. It's not the caffeine, it's just the opportunity of making all of that work. >> You know, one of the, I think the key announcement for me at this show, that you guys made on HDP 3.0 was containerization of more of the capabilities of your distributed environment so that these capabilities, in terms of processing. First of all, capturing and analyzing and moving that data, can be pushed closer to the end points. Can you speak a bit Scott, about this new capability or this containerization support? Within HDP 3.0 but really in your broader portfolio and where you're going with that in terms of addressing edge applications perhaps, autonomous vehicles or you know, whatever you might put into a new smart phone or whatever you put at the edge. Describe the potential of containerization to sort of break this ecosystem wide open. >> Yeah, I think there are a couple of aspects to containerization and by the way, we're like so excited about kind of the cloud first, containerized HDP 3.0 that we launched here today. There's a lot of great tech that our customers have been clamoring for that they can take advantage of. And it's really just the beginning, which again is part of the excitement of being in the technology space and certainly being part of Hortonworks. So containerization affords a couple of things. Certainly, agility. Agility in deploying applications. 
So, you know for 30 years we've built these enterprise software stacks that were very integrated, hugely complicated systems that could bring together multiple different applications, different workloads and manage all that in a multi-tenancy kind of environment. And that was because we had to do that, right? Servers were getting bigger, they were more powerful but not particularly well distributed. Obviously in a containerized world, you now turn that whole paradigm on its head and you say, you know what? I'm just going to collect these three microservices that I need to do this job. I can isolate them. I can have them run in a serverless technology. I can actually allocate in the cloud servers to go run, and when they're done they go away. And I don't pay for them anymore. So thinking about kind of that from a software development deployment implementation perspective, there are huge implications but the real value for customers is agility, right? I don't have to wait until next year to upgrade my enterprise software stack to take advantage of this new algorithm. I can simply isolate it inside of a container, have it run, and have it go away. And get the answer, right? And so when I think about, and a number of our keynotes this morning were talking about just kind of the exponential rate of change, this is really the net new norm. Because the only way we can do things faster, is in fact to be able to provide this. >> And it's not just microservices. Also orchestrating them through Kubernetes, and so forth, so they can be. >> Sure. That's the how versus yeah. >> Quickly deployed as an ensemble and then quickly de-provisioned when you don't need them anymore. >> Yeah so then there's obviously the cost aspect, right? >> Yeah. >> So if you're going to run a whole bunch of stuff or even if you have something as mundane as a really big merge join inside of Hive. 
Let me spin up a thousand extra containers to go do that big thing, and then have them go away when it's done. >> And oh, by the way, you'll be deployed on. >> And only pay for it while I'm using it. >> And then you can possibly distribute those containers across different public clouds depending on what's most cost effective at any point in time, Azure or AWS or whatever it might be. >> And I tease with Arun, you know the only thing that we haven't solved is for the speed of light, but we're working on it. >> In talking about how this warp speed change, being the new norm, can you talk about some of the most exciting use cases you've seen in terms of the customers and clients that are using Hortonworks in the coolest ways. >> Well I mean obviously autonomous vehicles is one that has captured all of our imaginations. 'Cause we understand how that works. But it's a perfect use case for this kind of technology. But the technology also applies in fraud detection and prevention. It applies in healthcare management, in proactive personalized medicine delivery, and in generating better outcomes for treatment. So, you know, all across. >> It will bind us in every aspect of our lives including the consumer realm increasingly, yeah. >> Yeah, all across the board. And you know one of the things that really changed, right, is well a couple things. A lot of bandwidth so you can start to connect these things. The devices themselves are particularly smart, so you don't any longer have to transfer all the data to a mainframe and then wait three weeks, sorry, wait three weeks for your answer and then come back. You can have analytic models running on an edge device. And think about, you know, that is really real time. And that actually kind of solves for the speed of light. 'Cause you're not waiting for those things to go back and forth. 
So there are a lot of new opportunities and those architectures really depend on some of the core tenets of ultimately containerization, stateless application deployment and delivery. And they also depend on the ability to create feedback loops to do point-to-point and peer kinds of communication between devices. This is a whole new world of how data get moved and how the decisions around data movement get made. And certainly that's what we're excited about, building with the core components. The other implication of all of this, and we've known each other for a long time. Data has gravity. Data movement's expensive. It takes time, frankly, you have to pay for the bandwidth and all that kind of stuff. So being able to play the data where it lies becomes a lot more interesting from an application portability perspective and with all of these new sensors, devices and applications out there, a lot more data is living its entire lifecycle in the cloud. And so being able to create that connective tissue. >> Or as being as terralexical on the edge. >> And even on the edge. >> And with machine learning, let me just butt in a second. One of the areas that we're focusing on increasingly in Wikibon in terms of our focus on machine learning at the edge, is more and more machine learning frameworks are coming into the browser world. JavaScript for the most part, like TensorFlow.js, you know more of this inferencing and training is going to happen inside your browser. That blows a lot of people's minds. It may not be heavy hitting machine learning, but it'll be good enough for a lot of things that people do in their normal life. Where you don't want to round trip back to the cloud. It's all happening right there, in you know, Chrome or whatever you happen to be using. >> Yeah and so the point being now, you know when I think about the early days, talking about scalability, I remember shipping my first one terabyte database. And then the first 10 terabyte database. 
Yeah, it doesn't sound very exciting. When I think about scalability of the future, it's really going to, scalability is not going to be defined as petabytes or exabytes under management. It's really going to be defined as petabytes or exabytes affected across a grid of storage and processing devices. And that's a whole new technology paradigm, and really that's kind of the driving force behind what we've been building and what we've been talking about at this conference. >> Excellent. >> So when you're talking about these things. I mean how much, are the companies themselves prepared, and do they have the right kind of talent to use the kinds of insights that you're able to extract? And then act on them in real time. 'Cause you're talking about how this is saving a lot of the waiting around time. So is this really changing the way business gets done, and do companies have the talent to execute? >> Sure. I mean it's changing the way business gets done. We showed a quote on stage this morning from the CEO of Marriott, right? So, I think there are a couple of pieces. One is businesses are increasingly data driven and business strategy is increasingly the data strategy. And so it starts from the top, kind of setting that strategy and understanding the value of that asset and how that needs to be leveraged to drive new business. So that's kind of one piece. And you know, obviously there are more and more folks kind of coming to the realization that that is important. The other thing that's been helpful is, you know, as with any new technology there's always kind of the startup shortage of resource and people start to spool up and learn. You know the really good news, and for the past 10 years I've been working with a number of different university groups. Parents are actually going to universities and demanding that the curriculum include data, and processing and big data and all of these technologies. 
Because they know that their children educated in that kind of a world, number one, they're going to have a fun job to go to everyday. 'Cause it's going to be something different everyday. But number two they're going to be employed for life. (laughing) >> Yeah. >> They will be solvent. >> Frankly the demand has actually created a catch up in supply that we're seeing. And of course, you know, as tools start to get more mature and more integrated, they also become a little bit easier to use. You know, less, there's a little bit easier deployment and so on. So a combination of, I'm seeing a really good supply, there really, obviously we invest in education through the community. And then frankly, the education system itself, and folks saying this is really the hot job of the next century. You know, I can be the new oil baron. Or I can be the new railroad captain. It's actually creating more supply which is also very helpful. >> Data's the heart of what I call the new STEM cell. It's science, technology, engineering, mathematics that you want to implant in the brains of the young as soon as possible. I hear ya. >> Yeah, absolutely. >> Well Scott thanks so much for coming on. But I want to first also, we can't let you go without the fashion statement. You arrived on set wearing it. >> The elephants. >> I mean it was quite a look. >> Well I did it because then you couldn't see I was sweating on my brow. >> Oh please, no, no, no. >> 'Cause I was worried about this tough interview. >> You know one of the things I love about your logo, and I'll just you know, sounds like I'm fawning. The elephant is a very intelligent animal. >> It is indeed. >> My wife's from Indonesia. I remember going back one time they had Asian elephants at one of these safari parks. And watching it perform, and then my son was very little then. The elephant is a very sensitive, intelligent animal. You don't realize 'till you're up close. They pick up all manner of social cues.
I think it's an awesome symbol for a company that's all about data-driven intelligence. >> The elephant never forgets. >> Yeah. >> That's what we know. >> That's right we never forget. >> Him forget 'cause he's got a brain. Or she, I'm sorry. He or she has a brain. >> And it's data driven. >> Yeah. >> Thanks very much. >> Great. Well thanks for coming on theCUBE. I'm Rebecca Knight for James Kobielus. We will have more coming up from DataWorks just after this. (upbeat music)

Published Date : Jun 20 2018

SUMMARY :

in the heart of Silicone Valley, he is the chief technology in terms of the industry, in the data field, Exciting means nerve-wracking. and shipping in the 1800s and on the main stage this I love that actually. where you pilfered it. is sort of like the new fracking. to the environment. I apologize. And really saying that the of more of the capabilities of the cloud servers to go run, and so forth, so they can be. and then quickly de-provisioned and then have them go away when it's done. And oh, by the way, And then you can possibly is for the speed of light, Hortonworks in the coolest ways. But the technology also including the consumer and how the decisions around terralexical on the edge. One of the areas that we're Yeah and so the point being now, the talent to execute? and demanding that the And of course, you know, in the brains of the young the fashion statement. then you couldn't see 'Cause I was worried and I'll just you know, and then my son was very little then. He or she has a brain. for coming on theCUBE.

ENTITIES

Entity	Category	Confidence
Rebecca Knight	PERSON	0.99+
James Kobielus	PERSON	0.99+
Scott	PERSON	0.99+
Hortonworks	ORGANIZATION	0.99+
Scott Gnau	PERSON	0.99+
Indonesia	LOCATION	0.99+
three weeks	QUANTITY	0.99+
30 years	QUANTITY	0.99+
10x	QUANTITY	0.99+
San Jose	LOCATION	0.99+
Marriott	ORGANIZATION	0.99+
San Jose, California	LOCATION	0.99+
1900s	DATE	0.99+
1800s	DATE	0.99+
10,000 feet	QUANTITY	0.99+
Silicone Valley	LOCATION	0.99+
one piece	QUANTITY	0.99+
Dataworks Summit	EVENT	0.99+
AWS	ORGANIZATION	0.99+
Chrome	TITLE	0.99+
theCUBE	ORGANIZATION	0.99+
next year	DATE	0.98+
next century	DATE	0.98+
today	DATE	0.98+
30 plus years ago	DATE	0.98+
Javascript	TITLE	0.98+
second part	QUANTITY	0.98+
Twitter	ORGANIZATION	0.98+
first	QUANTITY	0.97+
Dataworks	ORGANIZATION	0.97+
One	QUANTITY	0.97+
5000x	QUANTITY	0.97+
Datawork Summit 2018	EVENT	0.96+
HDP 3.0	TITLE	0.95+
one	QUANTITY	0.95+
this morning	DATE	0.95+
HDP 3.0	TITLE	0.94+
three microservices	QUANTITY	0.93+
first one terabyte	QUANTITY	0.93+
First	QUANTITY	0.92+
DataWorks Summit 2018	EVENT	0.92+
JS	TITLE	0.9+
Asian	OTHER	0.9+
3.0	TITLE	0.87+
one time	QUANTITY	0.86+
a thousand extra containers	QUANTITY	0.84+
this morning	DATE	0.83+
15 years ago	DATE	0.82+
Arun	PERSON	0.81+
this century	DATE	0.81+
10,	DATE	0.8+
first 10 terabyte	QUANTITY	0.79+
couple	QUANTITY	0.72+
Azure	ORGANIZATION	0.7+
Kubernetes	TITLE	0.7+
theCUBE	EVENT	0.66+
parks	QUANTITY	0.59+
a second	QUANTITY	0.58+
past 10 years	DATE	0.57+
number two	QUANTITY	0.56+
Wikibot	TITLE	0.55+
HDP	COMMERCIAL_ITEM	0.54+
rd.	QUANTITY	0.48+

Ram Venkatesh, Hortonworks & Sudhir Hasbe, Google | DataWorks Summit 2018

>> Live from San Jose, in the heart of Silicon Valley, it's theCUBE, covering DataWorks Summit 2018. Brought to you by Hortonworks. >> We are wrapping up Day One of coverage of DataWorks here in San Jose, California on theCUBE. I'm your host, Rebecca Knight, along with my co-host, James Kobielus. We have two guests for this last segment of the day. We have Sudhir Hasbe, who is the director of product management at Google and Ram Venkatesh, who is VP of Engineering at Hortonworks. Ram, Sudhir, thanks so much for coming on the show. >> Thank you very much. >> Thank you. >> So, I want to start out by asking you about a joint announcement that was made earlier this morning about using some Hortonworks technology deployed onto Google Cloud. Tell our viewers more. >> Sure, so basically what we announced was support for the Hortonworks Data Platform and Hortonworks DataFlow, HDP and HDF, running on top of the Google Cloud Platform. So this includes deep integration with Google's cloud storage connector layer as well as it's a certified distribution of HDP to run on the Google Cloud Platform. >> I think the key thing is a lot of our customers have been telling us they like the familiar environment of Hortonworks distribution that they've been using on-premises and as they look at moving to cloud, like in GCP, Google Cloud, they want the similar, familiar environment. So, they want the choice to deploy on-premises or Google Cloud, but they want the familiarity of what they've already been using with Hortonworks products. So this announcement actually helps customers pick and choose like whether they want to run Hortonworks distribution on-premises, they want to do it in cloud, or they want to build this hybrid solution where the data can reside on-premises, can move to cloud and build these common, hybrid architectures. So, that's what this does. >> So, HDP customers can store data in the Google Cloud.
They can execute ephemeral workloads, analytic workloads, machine learning in the Google Cloud. And there's some tie-in between Hortonworks's real-time or low latency or streaming capabilities from HDF in the Google Cloud. So, could you describe, at a full sort of detail level, the degrees of technical integration between your two offerings here. >> You want to take that? >> Sure, I'll handle that. So, essentially, deep in the heart of HDP, there's the HDFS layer that includes Hadoop compatible file system which is a pluggable file system layer. So, what Google has done is they have provided an implementation of this API for the Google Cloud Storage Connector. So this is the GCS Connector. We've taken the connector and we've actually continued to refine it to work with our workloads and now Hortonworks is actually bundling, packaging, and making this connector be available as part of HDP. >> So bilateral data movement between them? Bilateral workload movement? >> No, think of this as being very efficient when our workloads are running on top of GCP. When they need to get at data, they can get at data that is in the Google Cloud Storage buckets in a very, very efficient manner. So, since we have fairly deep expertise on workloads like Apache Hive and Apache Spark, we've actually done work in these workloads to make sure that they can run efficiently, not just on HDFS, but also in the cloud storage connector. This is a critical part of making sure that the architecture is actually optimized for the cloud. So, at our scale, as our customers are moving their workloads from on-premise to the cloud, it's not just functional parity, but they also need sort of the operational and the cost efficiency that they're looking for as they move to the cloud. So, to do that, we need to enable this fundamental disaggregated storage pattern. See, on-prem, the big win with Hadoop was we could bring the processing to where the data was.
In the cloud, we need to make sure that we work well when storage and compute are disaggregated and they're scaled elastically, independent of each other. So this is a fairly fundamental architectural change. We want to make sure that we enable this in a first-class manner. >> I think that's a key point, right. I think what cloud allows you to do is scale the storage and compute independently. And so, with storing data in Google Cloud Storage, you can like scale that horizontally and then just leverage that as your storage layer. And the compute can independently scale by itself. And what this is allowing customers of HDP and HDF to do is store the data on GCP, on the cloud storage, and then just use the scale, the compute side of it with HDP and HDF. >> So, if you'll indulge me to name another Hortonworks partner for just a hypothetical. Let's say one of your customers is using IBM Data Science Experience to do TensorFlow modeling and training, can they then inside of HDP on GCP, can they use the compute infrastructure inside of GCP to do the actual modeling which is more compute intensive and then the separate decoupled storage infrastructure to do the training which is more storage intensive? Is that a capability that would be available to your customers? With this integration with Google? >> Yeah, so where we are going with this is we are saying, IBM DSX and other solutions that are built on top of HDP, they can transparently take advantage of the fact that they have HDP compute infrastructure to run against. So, you can run your machine learning training jobs, you can run your scoring jobs and you can have the same unmodified DSX experience whether you're running against an on-premise HDP environment or an in-cloud HDP environment. Further, that's sort of the benefit for partners and partner solutions.
From a customer standpoint, the big value prop here is that customers, they're used to securing and governing their data on-prem in their particular way with HDP, with Apache Ranger, Atlas, and so forth. So, when they move to the cloud, we want this experience to be seamless from a management standpoint. So, from a data management standpoint, we want all of their learning from a security and governance perspective to apply when they are running in Google Cloud as well. So, we've had this capability on Azure and on AWS, so with this partnership, we are announcing the same type of deep integration with GCP as well. >> So Hortonworks is that one pane of glass across all your product partners for all manner of jobs. Go ahead, Rebecca. >> Well, I just wanted to ask about, we've talked about the reason, the impetus for this. With the customer, it's more familiar for customers, it offers the seamless experience, But, can you delve a little bit into the business problems that you're solving for customers here? >> A lot of times, our customers are at various points on their cloud journey, that for some of them, it's very simple, they're like there's a broom coming by and the datacenter is going away in 12 months and I need to be in the cloud. So, this is where there is a wholesale movement of infrastructure from on-premise to the cloud. Others are exploring individual business use cases. So, for example, one of our large customers, a travel partner, so they are exploring their new pricing model and they want to roll out this pricing model in the cloud. They have on-premise infrastructure, they know they have that for a while. They are spinning up new use cases in the cloud typically for reasons of agility. So, if you, typically many of our customers, they operate large, multi-tenant clusters on-prem. That's nice for, so a very scalable compute for running large jobs. 
But, if you want to run, for example, a new version of Spark, you have to upgrade the entire cluster before you can do that. Whereas in this sort of model, what they can say is, they can bring up a new workload and just have the specific versions and dependency that it needs, independent of all of their other infrastructure. So this gives them agility where they can move as fast as... >> Through the containerization of the Spark jobs or whatever. >> Correct, and so containerization as well as even spinning up an entire new environment. Because, in the cloud, given that you have access to elastic compute resources, they can come and go. So, your workloads are much more independent of the underlying cluster than they are on-premise. And this is where sort of the core business benefits around agility, speed of deployment, things like that come into play. >> And also, if you look at the total cost of ownership, really take an example where customers are collecting all this information through the month. And, at month end, you want to do closing of books. And so that's a great example where you want ephemeral workloads. So this is like do it once in a month, finish the books and close the books. That's a great scenario for cloud where you don't have to on-premises create an infrastructure, keep it ready. So that's one example where now, in the new partnership, you can collect all the data through the on-premises if you want throughout the month. But, move that and leverage cloud to go ahead and scale and do this workload and finish the books and all. That's one, the second example I can give is, a lot of customers collecting, like they run their e-commerce platforms and all on-premises, let's say they're running it. They can still connect all these events through HDP that may be running on-premises with Kafka and then, what you can do is, in-cloud, in GCP, you can deploy HDP, HDF, and you can use the HDF from there for real-time stream processing. 
So, collect all these clickstream events, use them, make decisions like, hey, which products are selling better? Should we go ahead and give? How many people are looking at that product? Or how many people have bought it? That kind of aggregation and real-time at scale, now you can do in-cloud and build these hybrid architectures that are there. And enable scenarios where in the past, to do that kind of stuff, you would have to procure hardware, deploy hardware, all of that. Which all goes away. In-cloud, you can do that much more flexibly and just use whatever capacity you have. >> Well, you know, ephemeral workloads are at the heart of what many enterprise data scientists do. Real-world experiments, ad-hoc experiments, with certain datasets. You build a TensorFlow model or maybe a model in Caffe or whatever and you deploy it out to a cluster and so the life of a data scientist is often nothing but a stream of new tasks that are all ephemeral in their own right but are part of an ongoing experimentation program that's, you know, they're building and testing assets that may be or may not be deployed in the production applications. That's you know, so I can see a clear need for that, well, that capability of this announcement in lots of working data science shops in the business world. >> Absolutely. >> And I think coming down to, if you really look at the partnership, right. There are two or three key areas where it's going to have a huge advantage for our customers. One is analytics at-scale at a lower cost, like total cost of ownership, reducing that, running at-scale analytics. That's one of the big things. Again, as I said, the hybrid scenarios. Most customers, enterprise customers have huge deployments of infrastructure on-premises and that's not going to go away. Over a period of time, leveraging cloud is a priority for a lot of customers but they will be in these hybrid scenarios.
And what this partnership allows them to do is have these scenarios that can span across cloud and on-premises infrastructure that they are building and get business value out of all of these. And then, finally, we at Google believe that the world will be more and more real-time over a period of time. Like, we already are seeing a lot of these real-time scenarios with IoT events coming in and people making real-time decisions. And this is only going to grow. And this partnership also provides the whole streaming analytics capabilities in-cloud at-scale for customers to build these hybrid plus also real-time streaming scenarios with this package. >> Well it's clear from Google what the Hortonworks partnership gives you in this competitive space, in the multi-cloud space. It gives you that ability to support hybrid cloud scenarios. You're one of the premier public cloud providers and we all know about. And clearly now that you've got the Hortonworks partnership, you have that ability to support those kinds of highly hybridized deployments for your customers, many of whom I'm sure have those requirements. >> That's perfect, exactly right. >> Well a great note to end on. Thank you so much for coming on theCUBE. Sudhir, Ram, thank you so much. >> Thank you, thanks a lot. >> Thank you. >> I'm Rebecca Knight for James Kobielus, we will have more tomorrow from DataWorks. We will see you tomorrow. This is theCUBE signing off. >> From sunny San Jose. >> That's right.
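
The clickstream scenario Sudhir describes in this interview (counting how many people looked at a product and how many bought it) can be illustrated with a minimal sketch. This is only an illustration in plain Python, not how HDF, Kafka, or any Hortonworks or Google product implements it; the event fields (`product`, `action`), the function names, and the sample data are all hypothetical.

```python
from collections import Counter, defaultdict


def aggregate_clickstream(events):
    """Count how often each action occurred for each product."""
    counts = defaultdict(Counter)
    for event in events:
        counts[event["product"]][event["action"]] += 1
    return counts


def best_sellers(counts, top_n=2):
    """Return the product names with the most purchases, highest first."""
    ranked = sorted(counts, key=lambda product: counts[product]["purchase"], reverse=True)
    return ranked[:top_n]


# Hypothetical events; in the scenario described they would arrive as a live stream.
sample_events = [
    {"product": "tent", "action": "view"},
    {"product": "tent", "action": "view"},
    {"product": "tent", "action": "purchase"},
    {"product": "stove", "action": "view"},
    {"product": "stove", "action": "purchase"},
    {"product": "stove", "action": "purchase"},
    {"product": "lantern", "action": "view"},
]

if __name__ == "__main__":
    counts = aggregate_clickstream(sample_events)
    for product in best_sellers(counts):
        print(product, dict(counts[product]))
```

A real deployment would compute these counts continuously over time windows rather than over a fixed list, which is the part the streaming platform is meant to handle.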

Published Date : Jun 20 2018

SUMMARY :

in the heart of Silicon Valley, for coming on the show. So, I want to start out by asking you to run on the Google Cloud Platform. and as they look at moving to cloud, in the Google Cloud. So, essentially, deep in the heart of HDP, and the cost efficiency is scale the storage and to do the training which and you can have the same that one pane of glass With the customer, it's and just have the specific of the Spark jobs or whatever. of the underlying cluster and then, what you can and so the life of a data that the world will be And clearly now that you got, Sudhir, Ram, that you so much. We will see you tomorrow.

ENTITIES

Entity	Category	Confidence
James Kobielus	PERSON	0.99+
Rebecca Knight	PERSON	0.99+
Rebecca	PERSON	0.99+
two	QUANTITY	0.99+
Sudhir	PERSON	0.99+
Ram Venkatesh	PERSON	0.99+
San Jose	LOCATION	0.99+
HortonWorks	ORGANIZATION	0.99+
Sudhir Hasbe	PERSON	0.99+
Google	ORGANIZATION	0.99+
Hortonworks	ORGANIZATION	0.99+
Silicon Valley	LOCATION	0.99+
two guests	QUANTITY	0.99+
San Jose, California	LOCATION	0.99+
DataWorks	ORGANIZATION	0.99+
tomorrow	DATE	0.99+
Ram	PERSON	0.99+
AWS	ORGANIZATION	0.99+
one example	QUANTITY	0.99+
one	QUANTITY	0.99+
two offerings	QUANTITY	0.98+
12 months	QUANTITY	0.98+
One	QUANTITY	0.98+
Day One	QUANTITY	0.98+
DataWorks Summit 2018	EVENT	0.97+
IBM	ORGANIZATION	0.97+
second example	QUANTITY	0.97+
Google Cloud Platform	TITLE	0.96+
Atlas	ORGANIZATION	0.96+
Google Cloud	TITLE	0.94+
Apache Ranger	ORGANIZATION	0.92+
three key areas	QUANTITY	0.92+
Hadoop	TITLE	0.91+
Kafka	TITLE	0.9+
theCUBE	ORGANIZATION	0.88+
earlier this morning	DATE	0.87+
Apache Hive	ORGANIZATION	0.86+
GCP	TITLE	0.86+
one pane	QUANTITY	0.86+
IBM Data Science	ORGANIZATION	0.84+
Azure	TITLE	0.82+
Spark	TITLE	0.81+
first	QUANTITY	0.79+
HDF	ORGANIZATION	0.74+
once in a month	QUANTITY	0.73+
HDP	ORGANIZATION	0.7+
TensorFlow	OTHER	0.69+
Hortonworks DataPlatform	ORGANIZATION	0.67+
Apache Spark	ORGANIZATION	0.61+
GCS	OTHER	0.57+
HDP	TITLE	0.5+
DSX	TITLE	0.49+
Cloud Storage	TITLE	0.47+

John Kreisa, Hortonworks | DataWorks Summit 2018

>> Live from San José, in the heart of Silicon Valley, it's theCUBE! Covering DataWorks Summit 2018. Brought to you by Hortonworks. (electro music) >> Welcome back to theCUBE's live coverage of DataWorks here in sunny San José, California. I'm your host, Rebecca Knight, along with my co-host, James Kobielus. We're joined by John Kreisa. He is the VP of marketing here at Hortonworks. Thanks so much for coming on the show. >> Thank you for having me. >> We've enjoyed watching you on the main stage, it's been a lot of fun. >> Thank you, it's been great. It's been great general sessions, some great talks. Talking about the technology, we've heard from some customers, some third parties, and most recently from Kevin Slavin from The Shed which is really amazing. >> So I really want to get into this event. You have 2,100 attendees from 23 different countries, 32 different industries. >> Yep. >> This started as a small, tiny little thing! >> That's right. >> Didn't Yahoo start it in 2008? >> It did, yeah. >> You changed names a few years ago, but it's still the same event, looming larger and larger. >> Yeah! >> It's been great, it's gone international as you've said. It's actually the 17th total event that we've done. >> Yeah. >> If you count the ones we've done in Europe and Asia. It's a global community around data, so it's no surprise. The growth has been phenomenal, the energy is great, the innovations that the community is talking about, the ecosystem is talking about, is really great. It just continues to evolve as an event, it continues to bring new ideas and share those ideas. >> What are you hearing from customers? What are they buzzing about? Every morning on the main stage, you do different polls that say, "how much are you using machine learning? What portion of your data are you moving to the cloud?" What are you learning? >> So it's interesting because we've done similar polls in our show in Berlin, and the results are very similar.
We did the cloud poll and there's a lot of buzz around cloud. What we're hearing is there's a lot of companies that are thinking about, or are somewhere along their cloud journey. It's exactly what their overall plans are, and there's a lot of news about maybe cloud will eat everything, but if you look at the poll results, something like 75% of the attendees said they have cloud in their plans. Only about 12% said they're going to move everything to the cloud, so a lot of hybrid with cloud. It's how to figure out which workloads to run where, how to think about that strategy in terms of where to deploy the data, where to deploy the workloads and what that should look like and that's one of the main things that we're hearing and talking a lot about. >> We've been seeing that at Wikibon, and our recent update to the market forecast showed that public cloud will dominate increasingly in the coming decade, but hybrid cloud will be a long transition period for many or most enterprises who are still firmly rooted in on-premises deployment, so forth and so on. Clearly, the bulk of your customers, the bulk of your customer deployments are on premise. >> They are. >> So you're working from a good starting point which means you've got what, 1,400 customers? >> That's right, thereabouts. >> Predominantly on premises, but many of them here at this show want to sustain their investment in a vendor that provides them with that flexibility as they decide they want to use Google or Microsoft or AWS or IBM for a particular workload that their existing investment to Hortonworks doesn't prevent them from facilitating. It moves that data and those workloads. >> That's right. The fact that we want to help them do that, a lot of our customers have, I'll call it a multi-cloud strategy.
They want to be able to work with an Amazon or a Google or any of the other vendors in the space equally well and have the ability to move workloads around and that's one of the things that we can help them with. >> One of the things you also did yesterday on the main stage, was you talked about this conference in the greater context of the world and what's going on right now. This is happening against the backdrop of the World Cup, and you said that this is really emblematic of data because this is a game, a tournament that generates tons of data. >> A tremendous amount of data. >> It's showing how data can launch new business models, disrupt old ones. Where do you think we're at right now? For someone who's been in this industry for a long time, just lay the scene. >> I think we're still very much at the beginning. Even though the conference has been around for awhile, the technology has been. It's emerging so fast and just evolving so fast that we're still at the beginning of all the transformations. I've been listening to the customer presentations here and all of them are at some point along the journey. Many are really still starting. Even in some of the polls that we had today talked about the fact that they're very much at the beginning of their journey with things like streaming or some of the A.I. machine learning technologies. They're at various stages, so I believe we're really at the beginning of the transformation that we'll see. >> That reminds me of another detail of your product portfolio or your architecture streaming and edge deployments are also in the future for many of your customers who still primarily do analytics on data at rest. You made an investment in a number of technologies NiFi from streaming. There's something called MiNiFi that has been discussed here at this show as an enabler for streaming all the way out to edge devices. 
What I'm getting at is that's indicative of Arun Murthy, one of your co-founders, has made- it was a very good discussion for us analysts and also here at the show. That is one of many investments you're making is to prepare for a future that will set workloads that will be more predominant in the coming decade. One of the new things I've heard this week that I'd not heard in terms of emphasis from you guys is more of an emphasis on data warehousing as an important use case for HDP in your portfolios, specifically with Hive. The Hive 3.0 now in HDP 3.0. >> Yes. >> With the enhancements to Hive to support more real time and low latency, but also there's ACID capabilities there. I'm hearing something- what you guys are doing is consistent with one of your competitors, Cloudera. They're going deeper into data warehousing too because they recognize they've got to go there like you do to be able to absorb more of your customers' workloads. I think that's important that you guys are making that investment. You're not just big data, you're all data and all data applications. Potentially, if your customers want to go there and engage you. >> Yes. >> I think that was a significant, subtle emphasis that me as an analyst noticed. >> Thank you. There were so many enhancements in 3.0 that were brought from the community that it was hard to talk about everything in depth, but you're right. The enhancements to Hive in terms of performance have really enabled it to take on a greater set of workloads and interactivity that we know that our customers want. The advantage being that you have a common data layer in the back end and you can run all this different work. It might be data warehousing, high speed query workloads, but you can do it on that same data with Spark and data-science related workloads. Again, it's that common pool backend of the data lake and having that ability to do it with common security and governance.
It's one of the benefits our customers are telling us they really appreciate. >> One of the things we've also heard this morning was talking about data analytics in terms of brand value and brand protection importantly. FedEx, exactly. Talking about, the speaker said, we've all seen these apology commercials. What do you think- is it damage control? What is the customer motivation here? >> Well a company can have billions of dollars of market cap wiped out by breaches in security, and we've seen it. This is not theoretical, these are actual occurrences that we've seen. Really, they're trying to protect the brand and the business and continue to be viable. They can get knocked back so far that it can take years to recover from the impact. They're looking at the security aspects of it, the governance of their data, the regulations of GDPR. These things you've mentioned have real financial impact on the businesses, and I think it's brand and the actual operations and finances of the businesses that can be impacted negatively. >> When you're thinking about Hortonworks's marketing messages going forward, how do you want to be described now, and then how do you want customers to think of you five or 10 years from now? >> I want them to think of us as a partner to help them with their data journey, on all aspects of their data journey, whether they're collecting data from the edge, you mentioned NiFi and things like that. Bringing that data back, processing it in motion, as well as processing it at rest, regardless of where that data lands. On premise, in the cloud, somewhere in between, the hybrid, multi-cloud strategy. We really want to be thought of as their partner in their data journey. That's really what we're doing. >> Even going forward, one of the things you were talking about earlier is the company's sort of saying, "we want to be boring. We want to help you do all the stuff-" >> There's a lot of money in boring. >> There's a lot of money, right! Exactly!
As you said, a partner in their data journey. Is it "we'll do anything and everything"? Are you going to do niche stuff? >> That's a good question. Not everything. We are focused on the data layer. The movement of data, the process and storage, and truly the analytic applications that can be built on top of the platform. Right now we've stuck to our strategy. It's been very consistent since the beginning of the company in terms of taking these open source technologies, making them enterprise viable, developing an eco-system around it and fostering a community around it. That's been our strategy since before the company even started. We want to continue to do that and we will continue to do that. There's so much innovation happening in the community that we quickly bring that into the products and make sure that's available in a trusted, enterprise-tested platform. That's really one of the things we see our customers- over and over again they select us because we bring innovation to them quickly, in a safe and consumable way. >> Before we came on camera, I was telling Rebecca that Hortonworks has done a sensational job of continuing to align your product roadmaps with those of your leading partners. IBM, AWS, Microsoft. In many ways, your primary partners are not them, but the entire open source community. 26 open source projects in which Hortonworks represents and incorporated in your product portfolio in which you are a primary player and committer. You're a primary ingester of innovation from all the communities in which you operate. >> We do. >> That is your core business model. >> That's right. We both foster the innovation and we help drive the information ourselves with our engineers and architects. You're absolutely right, Jim. It's the ability to get that innovation, which is happening so fast in the community, into the product and companies need to innovate. Things are happening so fast. 
Moore's Law was mentioned multiple times on the main stage, you know, and how it's impacting different parts of the organization. It's not just the technology, but business models are evolving quickly. We heard a little bit about Trimble, and if you've seen Tim Leonard's talk that he gave around what they're doing in terms of logistics and the ability to go all the way out to the farmer and impact what's happening at the farm and tracking things down to the level of a tomato or an egg all the way back and just understand that. It's evolving business models. It's not just the tech but the evolution of business models. Rob talked about it yesterday. I think those are some of the things that are kind of key. >> Let me stay on that point really quick. Industrial internet like precision agriculture and everything it relates to, is increasingly relying on visual analysis, parts and eggs and whatever it might be. That is convolutional neural networks, that is A.I., it has to be trained, and it has to be trained increasingly in the cloud where the data lives. The data lives in HDP clusters and whatnot. In many ways, no matter where the world goes in terms of industrial IoT, there will be massive clusters of HDFS and object storage driving it and also embedded A.I. models that have to follow a specific DevOps life cycle. You guys have a strong orientation in your portfolio towards that degree of real-time streaming, as it were, of tasks that go through the entire life cycle. From preparing the data, to modeling, to training, to deploying it out, to Google or IBM or wherever else they want to go. So I'm thinking that you guys are in a good position for that as well. >> Yeah. >> I just wanted to ask you finally, what is the takeaway? We're talking about the attendees, talking about the community that you're cultivating here, theme, ideas, innovation, insight. What do you hope an attendee leaves with?
>> I hope that the attendee leaves educated, understanding the technology and the impacts that it can have so that they will go back and change their business and continue to drive their data projects. The whole intent is really, and we even changed the format of the conference for more educational opportunities. For me, I want attendees to- a satisfied attendee would be one that learned about the things they came to learn so that they could go back to achieve the goals that they have when they get back. Whether it's business transformation, technology transformation, some combination of the two. To me, that's what I hope that everyone is taking away and that they want to come back next year when we're in Washington, D.C. and- >> My stomping ground. >> His hometown. >> Easy trip for you. They'll probably send you out here- (laughs) >> Yeah, that's right. >> Well John, it's always fun talking to you. Thank you so much. >> Thank you very much. >> We will have more from theCUBE's live coverage of DataWorks right after this. I'm Rebecca Knight for James Kobielus. (upbeat electro music)

Published Date : Jun 20 2018

ENTITIES

Entity	Category	Confidence
James Kobielus	PERSON	0.99+
Rebecca Knight	PERSON	0.99+
IBM	ORGANIZATION	0.99+
Rebecca	PERSON	0.99+
Microsoft	ORGANIZATION	0.99+
Tim Leonard	PERSON	0.99+
AWS	ORGANIZATION	0.99+
Arun Murthy	PERSON	0.99+
Jim	PERSON	0.99+
Kevin Slavin	PERSON	0.99+
Europe	LOCATION	0.99+
John Kreisa	PERSON	0.99+
Berlin	LOCATION	0.99+
Amazon	ORGANIZATION	0.99+
John	PERSON	0.99+
Google	ORGANIZATION	0.99+
2008	DATE	0.99+
Washington, D.C.	LOCATION	0.99+
Asia	LOCATION	0.99+
75%	QUANTITY	0.99+
Rob	PERSON	0.99+
five	QUANTITY	0.99+
San José	LOCATION	0.99+
next year	DATE	0.99+
Yahoo	ORGANIZATION	0.99+
Silicon Valley	LOCATION	0.99+
32 different industries	QUANTITY	0.99+
World Cup	EVENT	0.99+
yesterday	DATE	0.99+
23 different countries	QUANTITY	0.99+
one	QUANTITY	0.99+
1,400 customers	QUANTITY	0.99+
today	DATE	0.99+
two	QUANTITY	0.99+
2,100 attendees	QUANTITY	0.99+
FedEx	ORGANIZATION	0.99+
10 years	QUANTITY	0.99+
26 open source projects	QUANTITY	0.99+
Hortonworks	ORGANIZATION	0.98+
17th	QUANTITY	0.98+
both	QUANTITY	0.98+
One	QUANTITY	0.98+
billions of dollars	QUANTITY	0.98+
Cloudera	ORGANIZATION	0.97+
about 12%	QUANTITY	0.97+
theCUBE	ORGANIZATION	0.97+
this week	DATE	0.96+
DataWorks Summit 2018	EVENT	0.95+
NiFi	ORGANIZATION	0.91+
this morning	DATE	0.89+
HIVE 3.0	OTHER	0.86+
Spark	TITLE	0.86+
few year ago	DATE	0.85+
Wikibon	ORGANIZATION	0.85+
The Shed	ORGANIZATION	0.84+
San José, California	LOCATION	0.84+
tons	QUANTITY	0.82+
H.D.P	LOCATION	0.82+
DataWorks	EVENT	0.81+
things	QUANTITY	0.78+
DataWorks	ORGANIZATION	0.74+
MiNiFi	TITLE	0.62+
data	QUANTITY	0.61+
Moore	TITLE	0.6+
years	QUANTITY	0.59+
coming decade	DATE	0.59+
Trimble	ORGANIZATION	0.59+
GVPR	ORGANIZATION	0.58+
3.0	OTHER	0.56+

Cindy Maike, Hortonworks | DataWorks Summit 2018

>> Live from San Jose in the heart of Silicon Valley, it's theCUBE, covering DataWorks Summit 2018, brought to you by Hortonworks. >> Welcome back to theCUBE's live coverage of DataWorks here in San Jose, California. I'm your host, Rebecca Knight, along with my co-host, James Kobielus. We're joined by Cindy Maike. She is the VP Industry Solutions and GM Insurance and Healthcare at Hortonworks. Thanks so much for coming on theCUBE, Cindy. >> Thank you, thank you, look forward to it. >> So, before the cameras were rolling we were talking about the business case for data, for data analytics. Walk our viewers through how you, how you think about the business case and your approach to sort of selling it. >> So, when you think about data and analytics, I mean, as industries we've been very good sometimes at doing kind of like the operational reporting. To me that's looking in the rearview mirror, something's already happened, but when you think about data and analytics, especially big data it's about what questions haven't I been able to answer. And, a lot of companies when they embark on it they're like, let's do it for technology's sake, but from a business perspective when we, as our industry GMs we are out there working with our customers it's like, what questions can't you answer today and how can I look at existing data or new data sources to actually help me answer questions. I mean, we were talking a little bit about the usage of sensors and so forth around telematics and the insurance industry, connected homes, connected lives, connected cars, those are some types of concepts. In other industries we're looking at industrial internet of things, so how do I actually make the operations more efficient? How do I actually deploy time series analysis to actually help us become more profitable? And, that's really where companies are about. You know, I think in our keynote this morning we were talking about new communities and it's what does that mean? 
How do we actually leverage data to either monetize new data sources or make us more profitable? >> You're a former insurance CFO, so let's delve into that use case a little bit and talk about the questions that I haven't asked yet. What are some of those and how are companies putting this thing to work? >> Yeah so, the insurance industry you know, it's kind of frustrating sometimes where as an insurance company you sit there and you always monitor what your combined ratio is, especially if you're a property casualty company and you go, yeah, but that tells me information like once a month, you know, but I was actually with a chief marketing officer recently and she's like, she came from the retail industry and she goes, I need to understand what's going on in my business on any given day. And so, how can we leverage better real time information to say, what customers are we interacting with? You know, what customers should we not be interacting with? And then you know, the last thing insurance companies want to do is go out and say, we want you as a customer and then you decline their business because they're not risk worthy. So, that's where we're seeing the insurance industry and I'll focus a lot on insurance here, but it's how do we leverage data to change that customer engagement process, look at connected ecosystems and it's a good time to be well fundamentally in the insurance industry, we're seeing a lot of use cases, but also in the retail industry, new data opportunities that are out there. We talked a little bit before the interview started on shrinkage and you know, the retail industry, especially in the food, any type of consumer type packages, we're starting to see the usage of sensors to actually help companies move fresh food around to reduce their shrinkage. You know, we've got. >> Sorry, just define shrinkage, 'cause I'm not even sure I understand, it's not that your apple is getting smaller. It refers to perishable goods, you explain it. 
>> Right, so you're actually looking at, how do we make sure that my produce or items that are perishable, you know, I want to minimize the amount of inventory write offs that I have to do, so that would be the shrinkage and this one major retail chain is, they have a lot of consumer goods that they're actually saying, you know what, their shrinkage was pretty high, so they're now using sensors to help them monitor should we, do we need to move certain types of produce? Do we need to look at food before it expires you know, to make sure that we're not doing an inventory write off. >> You say sensors and it's kind of, are you referring to cameras taking photos of the produce or are you referring to other types of chemical analysis or whatever it might be, I don't know. >> Yeah, so it's actually a little bit of both. It's how do I actually you know, looking at certain types of products, so we all know when you walk into a grocery store or some type of department store, there's cameras all over the place, so it's not just looking at security, but it's also looking at you know, are those goods moving? And so, you can't move people around a store, but I can actually use the visualization and now with deep machine learning you can actually look at that and say, you know what, those bananas are getting a little ripe. We need to like move those or we need to help turn the inventory. And then, there's also things with bar coding you know, when you think of things that are on the shelves. So, how do I look at those bar codes because in the past you would've taken somebody down the aisle. They would've like checked that, but no, now we're actually looking up the bar codes and say, do we need to move this? Do we need to put these things on sale? >> At this conference we're hearing just so much excitement and talk about data as the new oil and it is an incredible strategic asset, but you were also saying that it could become a liability. 
Talk about the point at which it becomes a liability. >> It becomes a liability when one, we don't know what to do with it, or we make decisions off of dated data, so you think about you know, I'll give you an example, in the healthcare industry. You know, medical procedures have changed so immensely. The advancement in technology, precision medicine, but if we're making healthcare decisions on medical procedures from 10 years ago, so you really need to say how do I leverage you know, newer data sets, so over time if you make your algorithms based on data that's 10, 20 years old, it's good in certain things, but you know, you can make some bad business decisions if the data is not recent. So, that's when I talk about the liability aspect. >> Okay, okay, and then, thinking about how you talk with, collaborate with customers, what is your approach in the sense of how you help them think through their concerns, their anxieties? >> So, a lot of times it's really kind of understanding what's their business strategy. What are their financial, what are their operational goals? And you say, what can we look at from a data perspective, both data that we have today or data that we can acquire from new data sources to help them actually achieve their business goals and you know, specifically in the insurance industry we focus on top line growth with growing your premium or decreasing your combined ratio. So, what are the types of data sources and the analytical use cases that we can actually you know, use? See the exact same thing in manufacturing, so. >> And, have customer attitudes evolved over time since you've been in the industry? How would you describe their mindsets right now? >> I think we still have some industries that we struggle with, but it's actually you know, I mentioned healthcare, the way we're seeing data being used in the healthcare industry, I mean, it's about precision medicine. You look at genomics research. 
It says that something like 58 percent of the world's population would actually do a genomics test if they could actually use that information. So, it's interesting to see. >> So, the struggle is with people's concern about privacy encroachment, is that the primary struggle? >> There's a little bit of that and companies are saying, you know, I want to make sure that it's not being used against me, but there was actually a recent article in Best's Review, which is an insurance trade magazine, that says, you know, if I actually have a genomic test can the insurance industry use that against me? So, I mean, there's still a little bit of concern. >> Which is a legitimate concern. >> It is, it is, absolutely and then also you know, we see globally with just you know, the General Data Protection Regulation, the GDPR, you know, how are companies using my information and data? So you know, consumers have to be comfortable with the type of data, but outside of the consumer side there's so much data in the industry and you made the comment about you know, data's the new oil. My issue with that is, we don't use oil straight in a car, we don't put crude in a car, so once we do something with it, which is the analytical side, then that's where we get the business insight. So, data for data's sake is just data. It's the business insights that are really important. >> Looking ahead at Hortonworks five, 10 years from now I mean, how much, how much will your business account for the total business of Hortonworks do you think, in the sense of as you've said, this is healthcare and insurance represents such huge potential possibilities and opportunities for the company? Where do you see the trajectory? 
>> The trajectory I believe is really in those analytical apps, so we were working with a lot of partners that are like you know, how do I accelerate those business value because like I said, it's like we're not just into data management, we're in the data age and what does that mean? It's like turning those things into business value and I've got to be able to I think from an industry perspective, you know be working with the right partners and then also customers because they lack some of the skillsets. So, who can actually accelerate the time to value of using data for profitability? >> Is your primary focus area at helping regulated industries with their data analytics challenges and using IOT or does it also cover unregulated? >> Unregulated as well. >> Are the analytics requirements different between regulated and unregulated in terms of the underlying capabilities they require in terms of predictive modeling, of governance and so forth and how does Hortonworks differentiate their response to those needs? >> Yeah, so it varies a little bit based upon their regulations. I mean, even if you look at life sciences, life sciences is very, very regulated on how long do I have to keep the data? How can I actually use the data? So, if you look at those industries that maybe aren't regulated as much, so we'll get away from financial services, highly regulated across all different areas, but I'll also look at say business insurance, not as much regulated as like you and I as consumers, because insurance companies can use any type of data to actually do the pricing and doing the underwriting and the actual claims. So, still regulated based upon the solvency, but not regulated on how we use it to evaluate risk. Manufacturing, definitely some regulation there from a work safety perspective, but you can use the data to optimize your yields you know, however you see fit. 
So, we see a mixture of everything, but I think from a Hortonworks perspective it's being able to share data across multiple industries 'cause we talk about connected ecosystems and connected ecosystems are really going to change business of the future. >> So, how so? I mean, especially in bringing it back to this conference, to DataWorks, and the main stage this morning we heard so much about these connected communities and really it's all about the ecosystem, what do you see as the biggest change going forward? >> So, you look at, and I'll give you the context of the insurance industry. You look at companies like Arity, which is a division of Allstate, what they're doing actually working with the car manufacturers, so at some point in time you know, the automotive industry, General Motors tried this 20 years ago, they didn't quite get it with OnStar and GMAC Insurance. Now, you actually have the opportunity with you know, maybe on the front man for the insurance industry. So, I can now start to collect the data from the vehicle. I'm using that for driving of the vehicle, but I can also use it to help a driver drive more safely. >> And upsize their experience of actually driving, making it more pleasant as well as safer. There's many layers of what can be done now with the same data. Some of those uses impinge or relate to regulated concern or mandatory concerns, then some are purely for competitive differentiation of the whole issue of experience. >> Right, and you think about certain aspects that the insurance industry just has you know, a negative connotation and we have an image challenge on what data can and cannot be used, so, but a lot of people opt in to an automotive manufacturer and share that type of data, so moving forward who's to say with the connected ecosystem I still have the insurance company in the background doing all the underwriting, but my distribution channel is now the car dealer. >> I love it, great. That's a great note to end on. 
Thanks so much for coming on theCUBE. Thank you Cindy. I'm Rebecca Knight for James Kobielus. We will have more from theCUBE's live coverage of DataWorks in just a little bit. (upbeat music)

Published Date : Jun 19 2018

ENTITIES

Entity	Category	Confidence
James Kobielus	PERSON	0.99+
Rebecca Knight	PERSON	0.99+
Rebecca Knight	PERSON	0.99+
Cindy	PERSON	0.99+
Hortonworks	ORGANIZATION	0.99+
Cindy Maike	PERSON	0.99+
General Motors	ORGANIZATION	0.99+
General Data Protection act	TITLE	0.99+
San Jose	LOCATION	0.99+
10	QUANTITY	0.99+
Silicon Valley	LOCATION	0.99+
San Jose, California	LOCATION	0.99+
58 percent	QUANTITY	0.99+
Arity	ORGANIZATION	0.99+
GDPR	TITLE	0.98+
20 years ago	DATE	0.98+
On Star	ORGANIZATION	0.98+
once a month	QUANTITY	0.98+
GM Insurance	ORGANIZATION	0.97+
theCUBE	ORGANIZATION	0.97+
Data Works Summit 2018	EVENT	0.96+
one	QUANTITY	0.96+
today	DATE	0.96+
DataWorks Summit 2018	EVENT	0.95+
both	QUANTITY	0.95+
10 years ago	DATE	0.94+
VP Industry Solutions	ORGANIZATION	0.94+
GMAC Insurance	ORGANIZATION	0.92+
this morning	DATE	0.9+
both data	QUANTITY	0.84+
five	QUANTITY	0.78+
20 years	QUANTITY	0.75+
10 years	QUANTITY	0.72+
Dataworks	ORGANIZATION	0.59+
Data Works	TITLE	0.59+
Best Review	TITLE	0.54+
theCUBE	EVENT	0.54+
State	ORGANIZATION	0.49+

Dan Potter, Attunity & Ali Bajwa, Hortonworks | DataWorks Summit 2018

>> Live from San Jose in the heart of Silicon Valley, it's theCUBE, covering DataWorks Summit 2018, brought to you by Hortonworks. >> Welcome back to theCUBE's live coverage of DataWorks here in sunny San Jose, California. I'm your host Rebecca Knight along with my co-host James Kobielus. We're joined by Dan Potter. He is the VP Product Management at Attunity and also Ali Bajwa, who is the principal partner solutions engineer at Hortonworks. Thanks so much for coming on theCUBE. >> Pleasure to be here. >> It's good to be here. >> So I want to start with you, Dan, and have you tell our viewers a little bit about the company based in Boston, Massachusetts, what Attunity does. >> Attunity, we're a data integration vendor. We are best known as a provider of real-time data movement from transactional systems into data lakes, into clouds, into streaming architectures, so it's a modern approach to data integration. So as these core transactional systems are being updated, we're able to take those changes and move those changes where they're needed when they're needed for analytics for new operational applications, for a variety of different tasks. >> Change data capture. >> Change data capture is the heart of our-- >> They are well known in this business. They have change data capture. Go ahead. >> We are. >> So tell us about the announcement today that Attunity has made at the Hortonworks-- >> Yeah, thank you, it's a great announcement because it showcases the collaboration between Attunity and Hortonworks and it's all about taking the metadata that we capture in that integration process. So we're a piece of a data lake architecture. As we are capturing changes from those source systems, we are also capturing the metadata, so we understand the source systems, we understand how the data gets modified along the way. 
We use that metadata internally and now we've built extensions to share that metadata into Atlas and to be able to extend that out through Atlas to higher data governance initiatives, so Data Steward Studio, into the DataPlane Services, so it's really important to be able to take the metadata that we have and to add to it the metadata that's from the other sources of information. >> Sure, for more of the transactional semantics of what Hortonworks has been describing they've baked in to HDP in your overall portfolios. Is that true? I mean, that supports those kind of requirements. >> With HDP, what we're seeing is you know the EDW optimization play has become more and more important for a lot of customers as they try to optimize the data that their EDWs are working on, so it really gels well with what we've done here with Attunity and then on the Atlas side with the integration on the governance side with GDPR and other sort of regulations coming into play now, you know, those sort of things are becoming more and more important, you know, specifically around the governance initiative. We actually have a talk just on Thursday morning where we're actually showcasing the integration as well. >> So can you talk a little bit more about that for those who aren't going to be there for Thursday. GDPR was really a big theme at the DataWorks Berlin event and now we're in this new era and it's not talked about too, too much, I mean we-- >> And global businesses who have business in the EU, but also all over the world, are trying to be systematic and consistent about how they manage PII everywhere. So GDPR, though it's an EU regulation, really in many ways is having ripple effects across the world in terms of practices. >> Absolutely and at the heart of understanding how you protect yourself and comply, I need to understand my data, and that's where metadata comes in. 
So having a holistic understanding of all of the data that resides in your data lake or in your cloud, metadata becomes a key part of that. And also in terms of enforcing that, if I understand my customer data, where the customer data comes from, the lineage from that, then I'm able to apply the protections of the masking on top of that data. So it's really, the GDPR effect has had, you know, it's created a broad-scale need for organizations to really get a handle on metadata so the timing of our announcement just works real well. >> And one nice thing about this integration is that you know it's not just about being able to capture the data in Atlas, but now with the integration of Atlas and Ranger, you can do enforcement of policies based on classifications as well, so if you can tag data as PCI, PII, personal data, that can get enforced through Ranger to say, hey, only certain admins can access certain types of data and now all that becomes possible once we've taken the initial steps of the Atlas integration. >> So with this collaboration, and it's really deepening an existing relationship, so how do you go to market? How do you collaborate with each other and then also service clients? >> You want to? >> Yeah, so from an engineering perspective, we've got deep roots in terms of being a first-class provider into the Hortonworks platform, both HDP and HDF. Last year about this time, we announced our support for ACID merge capabilities, so the leading-edge work that Hortonworks has done in bringing ACID compliance capabilities into Hive, was a really important one, so our change data capture capabilities are able to feed directly into that and be able to support those extensions. >> Yeah, we have a lot of you know really key customers together with Attunity and you know maybe as a result of that they are actually our ISV of the Year as well, which they probably showcase on their booth there. >> We're very proud of that. 
Yeah, no, it's a nice honor for us to get that distinction from Hortonworks and it's also a proof point to the collaboration that we have commercially. You know our sales reps work hand in hand. When we go into a large organization, we both sell to very large organizations. These are big transformative initiatives for these organizations and they're looking for solutions not technologies, so the fact that we can come in, we can show the proof points from other customers that are successfully using our joint solution, that's really, it's critical. >> And I think it helps that they're integrating with some of our key technologies because, you know, that's where our sales force and our customers really see, you know, that as well as that's where we're putting in the investment and that's where these guys are also investing, so it really, you know, helps the story together. So with Hive, we're doing a lot of investment of making it closer and closer to a sort of real-time database, where you can combine historical insights as well as your, you know, real-time insights, with the new ACID merge capabilities where you can do the inserts, updates and deletes, and so that's exactly what Attunity's integrating with. With Atlas, we're doing a lot of investments there and that's exactly what these guys are integrating with. So I think our customers and prospects really see that and that's where all the wins are coming from. >> Yeah, and I think together there were two main barriers that we saw in terms of customers getting the most out of their data lake investment. One of them was, as I'm moving data into my data lake, I need to be able to put some structure around this, I need to be able to handle continuously updating data from multiple sources and that's what we introduce with Attunity Compose for Hive, building out the structure in an automated fashion so I've got analytics-ready data and using the ACID merge capabilities just made those updates much easier. 
The second piece was metadata. Business users need to have confidence in the data that they're using. Where did this come from? How is it modified? And overcoming both of those is really helping organizations make the most of those investments. >> How would you describe customer attitudes right now in terms of their approach to data because I mean, as we've talked about, data is the new oil, so there's a real excitement and there's a buzz around it and yet there's also so many high-profile cases of breaches and security concerns, so what would you say, is it that customers, are they more excited or are they more trepidatious? How would you describe the CIO mindset right now? >> So I think security and governance has become top of mind right, so more and more the surveys that we've taken with our customers, right, you know, more and more customers are more concerned about security, they're more concerned about governance. The joke is that we talk to some of our customers and they keep talking to us about Atlas, which is sort of one of the newer offerings on governance that we have, but then we ask, "Hey, what about Ranger for enforcement?" And they're like, "Oh, yeah, that's a standard now." So we have Ranger, now it's a question of you know how do we get our you know hooks into the Atlas and all that kind of stuff, so yeah, definitely, as you mentioned, because of GDPR, because of all these kind of issues that have happened, it's definitely become top of mind. >> And I would say the other side of that is there's real excitement as well about the possibilities. Now bringing together all of this data, AI, machine learning, real-time analytics and real-time visualization. There's analytic capabilities now that organizations have never had, so there's great excitement, but there's also trepidation. You know, how do we solve for both of those? And together, we're doing just that. 
>> But as you mentioned, if you look at Europe, some of the European companies that are more hit by GDPR, they're actually excited that now they can, you know, really get to understand their data more and do better things with it as a result of you know the GDPR initiative. >> Absolutely. >> Are you using machine learning inside of Attunity in a Hortonworks context to find patterns in that data in real time? >> So we enable data scientists to build those models. So we're not only bringing the data together but again, part of the announcement last year is the way we structure that data in Hive, we provide a complete historic data store so every single transaction that has happened and we send those transactions as they happen, it's a big append, so if you're a data scientist, I want to understand the complete history of the transactions of a customer to be able to build those models, so building those out in Hive and making those analytics ready in Hive, that's what we do, so we're a key enabler to machine learning. >> Making analytics ready rather than do the analytics in the stream, yeah. >> Absolutely. 
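[Editor's sketch] The "complete historic data store" described above, where every change is appended as it happens and the current picture is derived by applying inserts, updates and deletes in order, can be illustrated roughly as follows. This is a minimal sketch in plain Python, not Attunity's or Hive's implementation; the `ChangeEvent` type, its fields, and both function names are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class ChangeEvent:
    """One captured change from a source system (hypothetical shape)."""
    seq: int              # position in the source transaction log
    op: str               # "insert", "update", or "delete"
    key: str              # primary key of the affected row
    row: Optional[dict]   # new row image; None for deletes


def append_history(history: List[ChangeEvent], events: List[ChangeEvent]) -> None:
    """Append-only history: every change is kept, nothing is overwritten."""
    history.extend(events)


def current_state(history: List[ChangeEvent]) -> Dict[str, dict]:
    """Replay the history in log order to derive the latest row per key.

    This mirrors merge semantics: inserts and updates upsert the row,
    deletes remove it.
    """
    state: Dict[str, dict] = {}
    for ev in sorted(history, key=lambda e: e.seq):
        if ev.op in ("insert", "update"):
            state[ev.key] = ev.row
        elif ev.op == "delete":
            state.pop(ev.key, None)
        else:
            raise ValueError(f"unknown operation: {ev.op}")
    return state
```

A data scientist would read the full `history` list to see every transaction for a customer, while reporting queries would read the result of `current_state`.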
>> Yeah, the other side to that is that because they're integrated with Atlas, you know, now we have a new capability called DataPlane and Data Steward Studio so the idea there is around multi-everything, so more and more customers have multiple clusters whether it's on-prem, in the cloud, so now more and more customers are looking at how do I get a single pane of glass view across all my data whether it's on-prem, in the cloud, whether it's IOT, whether it's data at rest, right, so that's where DataPlane comes in and with the Data Steward Studio, which is our second offering on top of DataPlane, they can kind of get that view across all their clusters, so as soon as you know the data lands from Attunity into Atlas, you can get a view into that across as a part of Data Steward Studio, and one of the nice things we do in Data Steward Studio is that we also have machine learning models to do some profiling, to figure out that hey, this looks like a credit card, so maybe I should suggest this as a tag of sensitive data and now the end user, the end administrator has the option of you know saying that okay, yeah, this is a credit card, I'll accept that tag, or they can reject that and pick one of their own. >> Will any of this going forward of the Attunity CDC change data capture capability be containerized for deployment to the edges in HDP 3.0? I mean, 'cause it seems, I mean for Internet of Things, edge analytics and so forth, change data capture, is it absolutely necessary to make the entire, some call it the fog computing, cloud or whatever, to make it a completely transactional environment for all applications from micro endpoint to micro endpoint? Are there any plans to do that going forward? 
>> Yeah, so I think with HDP 3.0 as you mentioned right, one of the key factors that was coming into play was around time to value, so with containerization now being able to bring third-party apps on top of YARN through Docker, I think that's definitely an avenue that we're looking at. >> Yes, we're excited about that with 3.0 as well, so that's definitely in the cards for us. >> Great, well, Ali and Dan, thank you so much for coming on theCUBE. It's fun to have you here. >> Nice to be here, thank you guys. >> Great to have you. >> Thank you, it was a pleasure. >> I'm Rebecca Knight, for James Kobielus, we will have more from DataWorks in San Jose just after this. (techno music)
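[Editor's sketch] Earlier in this conversation, Ali describes profiling that notices "this looks like a credit card" and suggests a sensitive-data tag that an administrator can accept or reject. The speakers say this is done with machine learning models; the sketch below swaps in a much simpler rule-based stand-in (the Luhn checksum) purely to illustrate the suggest-then-confirm idea. The function names, the tag string, and the 0.9 threshold are all assumptions for illustration and are not part of Data Steward Studio or Atlas.

```python
import re
from typing import Iterable, Optional


def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(c) for c in number if c.isdigit()]
    if not 13 <= len(digits) <= 19:   # typical payment card number lengths
        return False
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:                # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def suggest_tag(values: Iterable[str], threshold: float = 0.9) -> Optional[str]:
    """Suggest a tag when most sampled values look like card numbers.

    The result is only a suggestion; a human still accepts or rejects it.
    """
    sample = [v for v in values if v]
    if not sample:
        return None
    hits = sum(
        1 for v in sample
        if re.fullmatch(r"[\d \-]+", v) and luhn_valid(v)
    )
    # "SENSITIVE_CREDIT_CARD" is a made-up tag name for this sketch.
    return "SENSITIVE_CREDIT_CARD" if hits / len(sample) >= threshold else None
```

In a governance flow like the one described, an accepted tag could then drive a classification-based access policy, which is the Atlas and Ranger enforcement step the speakers mention.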

Published Date : Jun 19 2018

Arun Murthy, Hortonworks | DataWorks Summit 2018

>> Live from San Jose in the heart of Silicon Valley, it's theCUBE, covering DataWorks Summit 2018, brought to you by Hortonworks. >> Welcome back to theCUBE's live coverage of DataWorks here in San Jose, California. I'm your host, Rebecca Knight, along with my cohost, Jim Kobielus. We're joined by Aaron Murthy, Arun Murthy, sorry. He is the co-founder and chief product officer of Hortonworks. Thank you so much for returning to theCUBE. It's great to have you on. >> Yeah, likewise. It's been a fun time getting back, yeah. >> So you were on the main stage this morning in the keynote, and you were describing the journey, the data journey that so many customers are on right now, and you were talking about the cloud, saying that the cloud is part of the strategy but it really needs to fit into the overall business strategy. Can you describe a little bit about your approach to that? >> Absolutely, and the way we look at this is we help customers leverage data to actually deliver better capabilities, better services, better experiences, to their customers, and that's the business we are in. Now with that obviously we look at cloud as a really key part of it, of the overall strategy in terms of how you want to manage data on-prem and on the cloud. We kind of joke that we ourselves live in a world of real-time data. We just live in it and data is everywhere. You might have trucks on the road, you might have drones, you might have sensors, and you have it all over the world. At that point, we've kind of got to a point where enterprises understand that they'll manage all the infrastructure, but in a lot of cases, it will make a lot more sense to actually lease some of it, and that's the cloud. It's the same way, if you're delivering packages, you don't go buy planes and lay out roads, you go to FedEx and actually let them handle that for you. That's kind of what the cloud is.
So that is why we really fundamentally believe that we have to help customers leverage infrastructure wherever it makes sense pragmatically, both from an architectural standpoint and from a financial standpoint, and that's kind of why we talked about how your cloud strategy is part of your data strategy, which is actually fundamentally part of your business strategy. >> So how are you helping customers to leverage this? What is on their minds and what's your response? >> Yeah, it's really interesting, like I said, cloud is cloud, and infrastructure management is certainly something that's at the foremost, at the top of the mind for every CIO today. And what we've consistently heard is they need a way to manage all this data and all this infrastructure in a hybrid, multi-tenant, multi-cloud fashion. Because in some geos you might not have your favorite cloud vendor. You know, going to parts of Asia is a great example. You might have to use one of the Chinese clouds. You go to parts of Europe, especially with things like the GDPR, the data residency laws and so on, you have to be very, very cognizant of where your data gets stored and where your infrastructure is present. And that is why we fundamentally believe it's really important to give enterprises a fabric with which they can manage all of this. And hide the details of all of the underlying infrastructure from them as much as possible. >> And that's DataPlane Services. >> And that's DataPlane Services, exactly. >> The Hortonworks DataPlane Services we launched in October of last year. Actually I was on theCUBE talking about it back then too. We see a lot of interest, a lot of excitement around it because now they understand that, again, this doesn't mean that we drive it down to the least common denominator. It is about helping enterprises leverage the key differentiators of each of the cloud vendors' products. For example, Google, with which we announced a partnership, they are really strong on AI and ML.
So if you are running TensorFlow and you want to deal with things like Kubernetes, GKE is a great place to do it. And, for example, you can now go to Google Cloud and get TPUs, which work great for TensorFlow. Similarly, a lot of customers run on Amazon for a bunch of the operational stuff, Redshift as an example. So in the world we live in, we want to help the CIO leverage the best piece of the cloud but then give them a consistent way to manage and govern that data. We were joking on stage that IT has just about learned how to deal with Kerberos and Hadoop. And now we're telling them, "Oh, go figure out IAM on Google," which is also IAM on Amazon, but they are completely different. The only thing that's consistent is the name. So I think we have a unique opportunity, especially with the open source technologies like Atlas, Ranger, Knox and so on, to be able to draw a consistent fabric over this, with security and governance. And help the enterprise leverage the best parts of the cloud to put a best fit architecture together, but which also happens to be a best of breed architecture. >> So the fabric is everything you're describing, all the Apache open source projects in which Hortonworks is a primary committer and contributor, are able to manage schemas and policies and metadata and so forth across this distributed heterogeneous fabric of public and private cloud segments within a distributed environment. >> Exactly. >> That's increasingly being containerized in terms of the applications for deployment to edge nodes. Containerization is a big theme in HDP 3.0, which you announced at this show. >> Yeah. >> So, if you could give us a quick sense for how that containerization capability plays into more of an edge focus for what your customers are doing. >> Exactly, great point, and again, the fabric is obviously, the core parts of the fabric are the open source projects, but we've also done a lot of net new innovation with DataPlane which, by the way, is also open source.
It's a new product and a new platform that you can actually leverage, to lay it out over the open source ones you're familiar with. And again, like you said, containerization, what is actually driving the fundamentals of this, the details matter, the scale at which we operate, we're talking about thousands of nodes, terabytes of data. The details really matter because a 5% improvement at that scale leads to millions of dollars in optimization for capex and opex. So that's why all of that, the details are being fueled and driven by the community, which is kind of what we deliver with HDP 3. And one of the key ones, like you said, is containerization, because now we can actually get complete agility in terms of how you deploy the applications. You get isolation not only at the resource management level with containers but you also get it at the software level, which means, if two data scientists wanted to use a different version of Python or Scala or Spark or whatever it is, they get that consistently and holistically. So now they can actually go from the test dev cycle into production in a completely consistent manner. So that's why containers are so big, because now we can actually leverage it across the stack and the things like MiNiFi showing up. We can actually-- >> Define MiNiFi before you go further. What is MiNiFi for our listeners? >> Great question. Yeah, so we've always had NiFi-- >> Real-time. >> Real-time data flow management, and NiFi was still sort of within the data center. What MiNiFi is, is actually a really, really small layer, a small thin library if you will, that you can throw on a phone, a doorbell, a sensor, and that gives you all the capabilities of NiFi but at the edge. >> Mmm. >> Right? And it's actually not just data flow, but what is really cool about NiFi is it's actually command and control. So you can actually do bidirectional command and control, so you can actually change in real time the flows you want, the processing you do, and so on.
So what we're trying to do with MiNiFi is actually not just collect data from the edge but also push the processing as much as possible to the edge, because we really do believe a lot more processing is going to happen at the edge, especially with the ASICs and so on coming out. There will be custom hardware that you can throw and essentially leverage that hardware at the edge to actually do this processing. And we believe, you know, we want to do that even at the cost of data not actually landing up at rest, because at the end of the day we're in the insights business, not in the data storage business. >> Well I want to get back to that. You were talking about innovation and how so much of it is driven by the open source community, and you're a veteran of the big data open source community. How do we maintain that? How does that continue to be the fuel? >> Yeah, and a lot of it starts with just being consistent. From day one, James was around back then, in 2011 we started, we've always said, "We're going to be open source," because we fundamentally believed that the community is going to out-innovate any one vendor regardless of how much money they have in the bank. So we really do believe that's the best way to innovate, mostly because there is a sense of shared ownership of that product. It's not just one vendor throwing some code out there trying to shove it down the customer's throat. And we've seen this over and over again, right. Three years ago, we talk about a lot of the DataPlane stuff that comes from Atlas and Ranger and so on, none of these existed. These actually came from the fruits of the collaboration with the community, with actually some very large enterprises being a part of it. So it's a great example of how we continue to drive it, because we fundamentally believe that that's the best way to innovate and continue to believe so. >> Right.
And the community, the Apache community as a whole, has so many different projects that, for example, in streaming, there is Kafka, >> Okay. >> and there are others that address a core set of common requirements but in different ways, >> Exactly. >> supporting different approaches, for example, they are doing streaming with stateless transactions and so forth, or stateless semantics and so forth. Seems to me that Hortonworks is shifting towards being more of a streaming-oriented vendor, away from data at rest. Though, I should say HDP 3.0 has got great scalability and storage efficiency capabilities baked in. I wonder if you could just break it down a little bit, what the innovations or enhancements are in HDP 3.0 for those of your core customers, which is most of them, who are managing massive multi-terabyte, multi-petabyte distributed, federated, big data lakes. What's in HDP 3.0 for them? >> Oh, lots. Again, like I said, we obviously spend a lot of time on the streaming side because that's where we see. We live in a real-time world. But again, we don't do it at the cost of our core business, which continues to be HDP. And as you can see, the community continues to drive. We talked about containerization, a massive step up for the Hadoop community. We've also added support for GPUs. Again, if you think about Trove's at scale machine learning. >> Graphics processing units, >> Graphical-- >> AI, deep learning >> Yeah, it's huge. Deep learning, TensorFlow and so on, really, really need a custom, sort of GPU, if you will. So that's coming. That's in HDP 3. We've added a whole bunch of scalability improvements with HDFS. We've added federation because now we can go from, you can go over a billion files, a billion objects in HDFS. We also added capabilities for-- >> But you indicated yesterday when we were talking that very few of your customers need that capacity yet, but you think they will so-- >> Oh for sure.
Again, part of this is as we enable more sources of data in real time, that's the fuel which drives it, and that was always the strategy behind the HDF product. It was about, can we leverage the synergies between the real-time world, feed that into what you do today in your classic enterprise with data at rest, and that is what is driving the necessity for scale. >> Yes. >> Right. We've done that. We spent a lot of work, again, lowering the total cost of ownership, the TCO, so we added erasure coding. >> What is that exactly? >> Yeah, so erasure coding is a classic sort of storage concept which allows you to actually, sort of, you know, HDFS has always been three replicas, so for redundancy, fault tolerance and recovery. Now, it sounds okay having three replicas because it's cheap disk, right. But when you start to think about our customers running 70, 80 petabytes of data, those three replicas add up, because you've now gone from 80 petabytes of effective data to actually a quarter of an exabyte in terms of raw storage. So now what we can do with erasure coding is actually, instead of storing the three blocks, we actually store parity. We store the encoding of it, which means we can actually go down from three to like two, one and a half, whatever we want to do. So, if we can get from three blocks to one and a half, especially for your cold data, >> Yeah >> the ones you're not accessing every day, it results in a massive savings in terms of your infrastructure costs. And that's kind of what we're in the business of doing, helping customers do better with the data they have, whether it's on-prem or on the cloud. That's sort of, we want to help customers be comfortable getting more data under management, along with security and the lower TCO. The other sort of big piece I'm really excited about in HDP 3 is all the work that's happened in the Hive community for what we call the real-time database. >> Yes. >> As you guys know, you follow the whole SQL wars in the Hadoop space.
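The storage arithmetic behind the erasure coding discussion above can be worked through in a few lines. This is a minimal sketch, not Hortonworks tooling: it assumes a Reed-Solomon layout of 6 data blocks plus 3 parity blocks (the RS-6-3 scheme that Hadoop 3 ships as a built-in policy), which gives the "one and a half" overhead mentioned in the conversation. The function names are hypothetical, and the figure of 80 is just the number quoted above, in whatever unit applies.

```python
def replication_raw(effective: float, replicas: int = 3) -> float:
    """Raw storage needed when every block is stored `replicas` times."""
    return effective * replicas


def erasure_coded_raw(effective: float, data_blocks: int = 6, parity_blocks: int = 3) -> float:
    """Raw storage needed when each group of data blocks gets extra parity blocks.

    Overhead factor is (data + parity) / data, e.g. 9 / 6 = 1.5 for RS(6, 3).
    """
    return effective * (data_blocks + parity_blocks) / data_blocks


effective = 80.0  # illustrative amount of effective data; same unit in, same unit out
print(replication_raw(effective))    # 240.0 with three replicas
print(erasure_coded_raw(effective))  # 120.0 with RS(6, 3)
```

Under these assumptions the same effective data needs half the raw capacity, which is the saving described for data that is not accessed every day. The trade-off, not shown here, is extra computation when blocks have to be reconstructed from parity.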
>> And Hive has changed a lot in the last several years, this is very different from what it was five years ago. >> The only thing that's the same from five years ago is the name. (laughing) >> So again, the community has done a phenomenal job, kind of, really taking sort of a, we used to call it like a SQL engine on HDFS. From there, to drive it with 3.0, it's now like, with Hive 3, which is part of HDP 3, it's a full-fledged database. It's got full ACID support. In fact, the ACID support is so good that writing ACID tables is at least as fast as writing non-ACID tables now. And you can do that not only on-- >> Transactional database. >> Exactly. Now not only can you do it on-prem, you can do it on S3. So you can actually drive the transactions through Hive on S3. We've done a lot of work to actually, you were there yesterday when we were talking about some of the performance work we've done with LLAP and so on, to actually give consistent performance both on-prem and in the cloud, and this is a lot of effort simply because the performance characteristics you get from the storage layer with HDFS versus S3 are significantly different. So now we have been able to bridge those with things like LLAP. We've done a lot of work and sort of enhanced the security model around it, governance and security. So now you get things like column-level masking, row-level filtering, all the standard stuff that you would expect and more from an enterprise data warehouse. We talked to a lot of our customers, they're doing literally tens of thousands of views because they don't have the capabilities that exist in Hive now. >> Mmm-hmm. >> And I'm sitting here kind of being amazed that for an open source set of tools to have the best security and governance at this point is pretty amazing, coming from where we started off.
>> And it's absolutely essential for GDPR compliance and HIPAA compliance and every other mandate and sensitivity that requires you to protect personally identifiable information, so very important. So in many ways Hortonworks has one of the premier big data catalogs for all manner of compliance requirements that your customers are chasing. >> Yeah, and James, you wrote about it in the context of Data Steward Studio, which we introduced >> Yes. >> You know, things like consent management, having--- >> A consent portal >> A consent portal >> In which the customer can indicate the degree to which >> Exactly. >> they require controls over the management of their PII, possibly the right to be forgotten and so forth. >> Yeah, it's the right to be forgotten, it's consent even for analytics. Within the context of GDPR, you have to allow the customer to opt out of analytics, them being part of an analytic itself, right. >> Yeah. >> So things like those are now something we enable through the enhanced security models that are done in Ranger. So now, the really cool part of what we've done with GDPR is that we can get all these capabilities on existing data and existing applications by just adding a security policy, not rewriting. It's a massive, massive, massive deal, which I cannot tell you how much customers are excited about, because they now understand. They were sort of freaking out that I have to go to 30, 40, 50 thousand enterprise apps and change them to take advantage, to actually provide consent, and the right to be forgotten. The fact that you can do that now by changing a security policy with Ranger is huge for them. >> Arun, thank you so much for coming on theCUBE. It's always so much fun talking to you. >> Likewise. Thank you so much. >> I learned something every time I listen to you. >> Indeed, indeed. I'm Rebecca Knight for James Kobielus, we will have more from theCUBE's live coverage of DataWorks just after this. (Techno music)
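The point made above, that consent and masking can be enforced by adding a policy rather than rewriting applications, can be sketched in plain Python. This is an illustration of the idea only and does not use or imitate the Apache Ranger API: the `Policy` class, `apply_policy`, and the column names are all hypothetical. The "existing application" here is `average_spend`, which is left untouched while the policy layer decides what rows and values it gets to see.

```python
from dataclasses import dataclass, field


@dataclass
class Policy:
    """Hypothetical policy: columns to mask and customers who opted out of analytics."""
    masked_columns: set = field(default_factory=set)
    opted_out_ids: set = field(default_factory=set)


def apply_policy(rows, policy, id_column="customer_id"):
    """Filter out opted-out customers (row-level) and mask sensitive columns (column-level)."""
    visible = []
    for row in rows:
        if row[id_column] in policy.opted_out_ids:
            continue  # consent withdrawn: the row never reaches the application
        visible.append(
            {k: ("***" if k in policy.masked_columns else v) for k, v in row.items()}
        )
    return visible


def average_spend(rows):
    """Stand-in for an existing analytics application; it is not modified."""
    return sum(r["spend"] for r in rows) / len(rows)


rows = [
    {"customer_id": 1, "card": "4111111111111111", "spend": 10.0},
    {"customer_id": 2, "card": "5500000000000004", "spend": 30.0},
]
policy = Policy(masked_columns={"card"}, opted_out_ids={2})

print(average_spend(apply_policy(rows, policy)))  # customer 2 is excluded
```

Changing who is excluded or which columns are masked means editing the policy object only, which is the property the speaker highlights for existing applications.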

Published Date : Jun 19 2018

John Kreisa, Hortonworks | Dataworks Summit EU 2018

>> Narrator: From Berlin, Germany, it's theCUBE. Covering Dataworks Summit Europe 2018. Brought to you by Hortonworks. >> Hello, welcome to theCUBE. We're here at Dataworks Summit 2018 in Berlin, Germany. I'm James Kobielus. I'm the lead analyst for Big Data Analytics, within the Wikibon team of SiliconAngle Media. Our guest is John Kreisa. He's the VP for Marketing at Hortonworks, of course, the host company of Dataworks Summit. John, it's great to have you. >> Thank you Jim, it's great to be here. >> We go way back, so you know it's always great to reconnect with you guys at Hortonworks. You guys are on a roll, it's been seven years I think since you guys were founded. I remember the founding of Hortonworks. I remember when it splashed in the Wall Street Journal. It was like oh wow, this big data thing, this Hadoop thing is actually, it's a market, it's a segment and you guys have built it. You know, you and your competitors, your partners, your ecosystem continues to grow. You guys went IPO a few years ago. Your latest numbers are pretty good. You're continuing to grow in revenues, in customer acquisitions, your deal sizes are growing. So Hortonworks remains on a roll. So, I'd like you to talk right now, John, and give us a sense of where Hortonworks is at in terms of engaging with the marketplace, in terms of trends that you're seeing, in terms of how you're addressing them. But talk about first of all the Dataworks Summit. How many attendees do you have from how many countries? Just give us sort of the layout of this show. >> I don't have all of the final counts yet. >> This is year six of the show? >> This is year six in Europe, absolutely, thank you. So it's great, we've moved it around different locations. Great venue, great host city here in Berlin. Super excited about it, I know we have representatives from more than 51 countries.
If you think about that, drawing from a really broad set of countries, well beyond, as you know, because you've interviewed some of the folks, beyond just Europe. We've had them from South America, U.S., Africa, and Asia as well, so really a broad swath of the open-source and big data community, which is great. The final attendance is going to be in the 1,250 to 1,300 range, pending the final numbers, but a great-sized conference. The energy level's been really great, the sessions have been, you know, oversubscribed, standing room only in many of the popular sessions. So the community's strong, I think that's the thing that we really see here and that we're really continuing to invest in. It's something that Hortonworks was founded around. You referenced the founding, and driving the community forward and investing is something that has been part of our mantra since we started and it remains that way today. >> Right. So first of all what is Hortonworks? Now how does Hortonworks position itself? Clearly Hadoop is your foundation, but you, just like Cloudera, MapR, you guys have all continued to evolve to address a broader range of use-cases with a deeper stack of technology with fairly extensive partner ecosystems. So what kind of a beast is Hortonworks? It's an elephant, but what kind of an elephant is it? >> We're an elephant or riding on the elephant I'd say, so we're a global data management company. That's what we're helping organizations do. Really the end-to-end lifecycle of their data, helping them manage it regardless of where it is, whether it's on-premise or in the cloud, really through hybrid data architectures. That's really how we've seen the market evolve is, we started off in terms of our strategy with the platform based on Hadoop, as you said, to store, process, and analyze data at scale. The kind of fundamental use-case for Hadoop.
Then as the company emerged, as the market kind of continued to evolve, we moved to and saw the opportunity really, capturing data from the edge. As IoT and kind of edge use cases emerged it made sense for us to add to the platform and create the Hortonworks DataFlow. >> James: Apache NiFi >> Apache NiFi, exactly, HDF underneath, with associated additional open-source projects in there. Kafka and some streaming and things like that. So that was, now, move data, capture data in motion, move it back and put it into the platform for those large data applications that organizations are building on the core platform. It's also the next evolution, seeing great attach rates with that, the really strong interest in the Apache NiFi, you know, the meetup here for NiFi was oversubscribed, so really really strong interest in that. And then, the market's continued to evolve with cloud and cloud architectures, customers wanting to deploy in the cloud. You know, you saw we had that poll yesterday in the general session about cloud with really interesting results, but we saw that there were really companies wanting to deploy in a hybrid way. Some of them wanted to move specific workloads to the cloud. >> Multi-cloud, public, private. >> Exactly right, and multi-data center. >> The majority of your customer deployments are on prem. >> They are. >> Rob Bearden, your CEO, I think he said in a recent article on SiliconAngle that two-thirds of your deployments are on prem. Is that percentage going down over time? Are more of your customers shifting toward a public cloud orientation? Does Hortonworks worry about that? You've got partnerships, clearly, with the likes of IBM, AWS, and Microsoft Azure and so forth, so do you guys see that as an opportunity, as a worrisome trend? >> No, we see it very much as an opportunity.
And that's because we do have customers who are wanting to put more workloads and run things in the cloud, however, there's still almost always a component that's going to be on premise. And that creates a challenge for organizations. How do they manage the security and governance and really the overall operations of those deployments as they're in the cloud and on premise. And, to your point, multi-cloud. And so you get some complexity in there around that deployment and particularly with the regulations, we talked about GDPR earlier today. >> Oh, by the way, the Data Steward Studio demo today was really, really good. It showed that, first of all, you cover the entire range of core requirements for compliance. So that was actually the primary announcement at this show; Scott Gnau announced that. You demoed it today, I think you guys are off to a good start, yeah. >> We've gotten really, and thank you for that, we've gotten really good feedback on our DataPlane Services strategy, right, it provides that single pane of glass. >> I should say to our viewers that Data Steward Studio is the second of the services under the DataPlane, the Hortonworks DataPlane Services Portfolio. >> That's right, that's exactly right. >> Go ahead, keep going. >> So, you know, we see that as an opportunity. We think we're very strongly positioned in the market, being the first to bring that kind of solution to the customers and our large customers that we've been talking about and who have been starting to use DataPlane have been very, very positive. I mean they see it as something that is going to help them really kind of maintain control over these deployments as they start to spread around, as they grow their uses of the thing. >> And it's built to operate across the multi-cloud, I know this as well in terms of executing the consent or withdrawal of consent that the data subject makes through what is essentially a consent portal. >> That's right, that's right.
>> That was actually a very compelling demonstration in that regard. >> It was good, and they worked very hard on it. And I was speaking to an analyst yesterday, and they were saying that they're seeing an increasing number of the customers, enterprises, wanting to have a multi-cloud strategy. They don't want to get locked into any one public cloud vendor, so, what they want is somebody who can help them maintain that common security and governance across their different deployments, and they see DataPlane Services as the way that's going to help them do that. >> So John, how is Hortonworks, what's your road map, how do you see the company in your go to market evolving over the coming years in terms of geographies, in terms of your focuses? Focus, in terms of the use-cases and workloads that the Hortonworks portfolio addresses. How is that shifting? You mentioned the Edge. AI, machine learning, deep learning. You are a reseller of IBM Data Science Experience. >> DSX, that's right. >> So, let's just focus on that. Do you see more customers turning to Hortonworks and IBM for a complete end-to-end pipeline for the ingest, for the preparation, modeling, training and so forth? And deployment of operationalized AI? Is that something you see going forward as an evolution path for your capabilities? >> I'd say yes, long-term, or even in the short-term. So, they have to get their data house in order, if you will, before they get to some of those other things, so we're still, Hortonworks' strategy has always been focused on the platform aspect, right? The data-at-rest platform, data-in-motion platform, and now a platform for managing common security and governance across those different deployments. Building on that is the data science, machine learning, and AI opportunity, but our strategy there, as opposed to trying to do it ourselves, is to partner, so we've got the strong partnership with IBM, resell their DSX product.
And also other partnerships around to deliver those other capabilities, like machine learning and AI, from our partner ecosystem, which you referenced. We have over 2,300 partners, so a very, very strong ecosystem. And so, we're going to stick to our strategy of the platforms enabling that, which will subsequently enable data science, machine learning, and AI on top. And then, if you want me to talk about our strategy in terms of growth, so we already operate globally. We've got offices in I think 19 different countries. So we're really covering the globe in terms of the demand for Hortonworks products and beginning implements. >> Where's the fastest growing market in terms of regions for Hortonworks? >> Yeah, I mean, international generally is our fastest growing region, faster than the U.S. But we're seeing very strong growth in APAC, actually, so India, Asian countries, Singapore, and then up and through to Japan. There's a lot of growth out in the Asian region. And, you know, they're sort of moving directly to digital transformation projects at really large scale. Big banks, telcos, from a workload standpoint I'd say the patterns are very similar to what we've seen. I've been at Hortonworks for six and a half years, as it turns out, and the patterns we saw initially in terms of adoption in the U.S. became the patterns we saw in terms of adoption in Europe and now those patterns of adoption are the same in Asia. So, once a company realizes they need to either drive out operational costs or build new data applications, the patterns tend to be the same whether it's retail, financial services, telco, manufacturing. You can sort of replicate those as they move forward. 
>> So going forward, how is Hortonworks evolving as a company in terms of, for example with GDPR, Data Steward, data governance as a strong focus going forward, are you shifting your model in terms of your target customer away from the data engineers, the Hadoop cluster managers who are still very much the center of it, towards more data governance, towards more business analyst level of focus. Do you see Hortonworks shifting in that direction in terms of your focus, go to market, your message and everything? >> I would say it's not a shifting as much as an expansion, so we definitely are continuing to invest in the core platform, in Hadoop, and you would have heard of some of the changes that are coming in the core Hadoop 3.0 and 3.1 platform here. Alan and others can talk about those details, and in Apache NiFi. But, to your point, as we bring and have brought Data Steward Studio and DataPlane Services online, that allows us to address a different user within the organization, so it's really an expansion. We're not de-investing in any other things. It's really here's another way in a natural evolution of the way that we're helping organizations solve data problems. >> That's great, well thank you. This has been John Kreisa, he's the VP for marketing at Hortonworks. I'm James Kobielus of Wikibon SiliconAngle Media here at Dataworks Summit 2018 in Berlin. And it's been great, John, and thank you very much for coming on theCUBE. >> Great, thanks for your time. (techno music)

Published Date : Apr 19 2018

ENTITIES

Entity	Category	Confidence
Alan	PERSON	0.99+
James Kobielus	PERSON	0.99+
Jim	PERSON	0.99+
Rob Bearden	PERSON	0.99+
IBM	ORGANIZATION	0.99+
John Kreisa	PERSON	0.99+
Europe	LOCATION	0.99+
John	PERSON	0.99+
Asia	LOCATION	0.99+
AWS	ORGANIZATION	0.99+
Hortonworks	ORGANIZATION	0.99+
Berlin	LOCATION	0.99+
yesterday	DATE	0.99+
Africa	LOCATION	0.99+
South America	LOCATION	0.99+
SiliconAngle Media	ORGANIZATION	0.99+
U.S.	LOCATION	0.99+
1,250	QUANTITY	0.99+
Scott Gnau	PERSON	0.99+
1,300	QUANTITY	0.99+
Berlin, Germany	LOCATION	0.99+
seven years	QUANTITY	0.99+
six and a half years	QUANTITY	0.99+
Japan	LOCATION	0.99+
Hadoop	TITLE	0.99+
Asian	LOCATION	0.99+
second	QUANTITY	0.98+
over 2,300 partners	QUANTITY	0.98+
today	DATE	0.98+
two-thirds	QUANTITY	0.98+
19 different countries	QUANTITY	0.98+
Dataworks Summit	EVENT	0.98+
more than 51 countries	QUANTITY	0.98+
Hadoop 3.0	TITLE	0.98+
first	QUANTITY	0.98+
James	PERSON	0.98+
Data Steward Studio	ORGANIZATION	0.98+
Dataworks Summit EU 2018	EVENT	0.98+
Dataworks Summit 2018	EVENT	0.97+
Cloudera	ORGANIZATION	0.97+
MapR	ORGANIZATION	0.96+
GDPR	TITLE	0.96+
DataPlane Services	ORGANIZATION	0.96+
Singapore	LOCATION	0.96+
year six	QUANTITY	0.95+
2018	EVENT	0.95+
Wikibon SiliconAngle Media	ORGANIZATION	0.94+
India	LOCATION	0.94+
Hadoop	ORGANIZATION	0.94+
APAC	ORGANIZATION	0.93+
Big Data Analytics	ORGANIZATION	0.93+
3.1	TITLE	0.93+
Wall Street Journal	TITLE	0.93+
one	QUANTITY	0.93+
Apache	ORGANIZATION	0.92+
Wikibon	ORGANIZATION	0.92+
NiFi	TITLE	0.92+

Alan Gates, Hortonworks | Dataworks Summit 2018

(techno music) >> (announcer) From Berlin, Germany it's theCUBE covering DataWorks Summit Europe 2018. Brought to you by Hortonworks. >> Well hello, welcome to theCUBE. We're here on day two of DataWorks Summit 2018 in Berlin, Germany. I'm James Kobielus. I'm lead analyst for Big Data Analytics in the Wikibon team of SiliconANGLE Media. And who we have here today, we have Alan Gates who's one of the founders of Hortonworks and Hortonworks of course is the host of DataWorks Summit and he's going to be, well, hello Alan. Welcome to theCUBE. >> Hello, thank you. >> Yeah, so Alan, so you and I go way back. Essentially, what we'd like you to do first of all is just explain a little bit of the genesis of Hortonworks. Where it came from, your role as a founder from the beginning, how that's evolved over time but really how the company has evolved specifically with the focus on the community, the Hadoop community, the Open Source community. You have a deepening open source stack which you build upon with Atlas and Ranger and so forth. Give us a sense for all of that, Alan. >> Sure. So as I think it's well-known, we started as the team at Yahoo that really was driving a lot of the development of Hadoop. We were one of the major players in the Hadoop community. Worked on that for, I was in that team for four years. I think the team itself was going for about five. And it became clear that there was an opportunity to build a business around this. Some others had already started to do so. We wanted to participate in that. We worked with Yahoo to spin out Hortonworks and actually they were a great partner in that. Helped us get that spun out. And the leadership team of the Hadoop team at Yahoo became the founders of Hortonworks and brought along a number of the other engineering, a bunch of the other engineers to help get started. And really at the beginning, we were. It was Hadoop, Pig, Hive, you know, a few of the very, HBase, the kind of, the beginning projects.
So pretty small toolkit. And we were, our early customers were very engineering heavy people, or companies who knew how to take those tools and build something directly on those tools right? >> Well, you started off with the Hadoop community as a whole started off with a focus on the data engineers of the world >> Yes. >> And I think it's shifted, and confirm for me, over time that you focus increasingly with your solutions on the data scientists who are doing the development of the applications, and the data stewards from what I can see at this show. >> I think it's really just a part of the adoption curve right? When you're early on that curve, you have people who are very into the technology, understand how it works, and want to dive in there. So those tend to be, as you said, the data engineering types in this space. As that curve grows out, you get, it becomes wider and wider. There's still plenty of data engineers that are our customers, that are working with us but as you said, the data analysts, the BI people, data scientists, data stewards, all those people are now starting to adopt it as well. And they need different tools than the data engineers do. They don't want to sit down and write Java code or you know, some of the data scientists might want to work in Python in a notebook like Zeppelin or Jupyter but some may want to use SQL or even Tableau or something on top of SQL to do the presentation. Of course, data stewards want tools more like Atlas to help manage all their stuff. So that does drive us to, one, put more things into the toolkit so you see the addition of projects like Apache Atlas and Ranger for security and all that. Another area of growth, I would say is also the kind of data that we're focused on. So early on, we were focused on data at rest. You know, we're going to store all this stuff in HDFS and as the kind of data scene has evolved, there's a lot more focus now on a couple things.
One is data, what we call data-in-motion for our HDF product where you've got in a stream manager like Kafka or something like that >> (James) Right >> So there's processing that kind of data. But now we also see a lot of data in various places. It's not just oh, okay I have a Hadoop cluster on premise at my company. I might have some here, some on premise somewhere else and I might have it in several clouds as well. >> OK, your focus has shifted like the industry in general towards streaming data in multi-clouds where your, it's more stateful interactions and so forth? I think you've made investments in Apache NiFi so >> (Alan) yes. >> Give us a sense for your NiFi versus Kafka and so forth inside of your product strategy or your >> Sure. So NiFi is really focused on that data at the edge, right? So you're bringing data in from sensors, connected cars, airplane engines, all those sorts of things that are out there generating data and you need, you need to figure out what parts of the data to move upstream, what parts not to. What processing can I do here so that I don't have to move upstream? When I have an error event or a warning event, can I turn up the amount of data I'm sending in, right? Say this airplane engine is suddenly heating up maybe a little more than it's supposed to. Maybe I should ship more of the logs upstream when the plane lands and connects than I would otherwise. That's the kind o' thing that Apache NiFi focuses on. I'm not saying it runs in all those places but my point is, it's that kind o' edge processing. Kafka is still going to be running in a data center somewhere. It's still a pretty heavy weight technology in terms of memory and disk space and all that so it's not going to be run on some sensor somewhere. But it is that data-in-motion right?
I've got millions of events streaming through a set of Kafka topics watching all that sensor data that's coming in from NiFi and reacting to it, maybe putting some of it in the data warehouse for later analysis, all those sorts of things. So that's kind o' the differentiation there between Kafka and NiFi. >> Right, right, right. So, going forward, do you see more of your customers working on internet of things projects, is that, we don't often, at least in the industry or popular mind, associate Hortonworks with edge computing and so forth. Is that? >> I think that we will have more and more customers in that space. I mean, our goal is to help our customers with their data wherever it is. >> (James) Yeah. >> When it's on the edge, when it's in the data center, when it's moving in between, when it's in the cloud. All those places, that's where we want to help our customers store and process their data. Right? So, I wouldn't want to say that we're going to focus on just the edge or the internet of things but that certainly has to be part of our strategy 'cause it has to be part of what our customers are doing. >> When I think about the Hortonworks community, now we have to broaden our understanding because you have a tight partnership with IBM which obviously is well-established, huge and global. Give us a sense for as you guys have teamed more closely with IBM, how your community has changed or broadened or shifted in its focus or has it? >> I don't know that it's shifted the focus. I mean IBM was already part of the Hadoop community. They were already contributing. Obviously, they've contributed very heavily on projects like Spark and some of those. They continue some of that contribution. So I wouldn't say that it's shifted it, it's just we are working more closely together as we both contribute to those communities, working more closely together to present solutions to our mutual customer base. But I wouldn't say it's really shifted the focus for us.
>> Right, right. Now at this show, we're in Europe right now, but it doesn't matter that we're in Europe. GDPR is coming down fast and furious now. Data Steward Studio, we had the demonstration today, it was announced yesterday. And it looks like a really good tool for the main, the requirements for compliance which is discover and inventory your data which is really set up a consent portal, what I like to refer to. So the data subject can then go and make a request to have my data forgotten and so forth. Give us a sense going forward, for how or if Hortonworks, IBM, and others in your community are going to work towards greater standardization in the functional capabilities of the tools and platforms for enabling GDPR compliance. 'Cause it seems to me that you're going to need, the industry's going to need to have some reference architecture for these kind o' capabilities so that going forward, either your ecosystem of partners can build add on tools in some common, like the framework that was laid out today looks like a good basis. Is there anything that you're doing in terms of pushing towards more Open Source standardization in that area? >> Yes, there is. So actually one of my responsibilities is the technical management of our relationship with ODPI which >> (James) yes. >> Mandy Chessell referenced yesterday in her keynote and that is where we're working with IBM, with ING, with other companies to build exactly those standards. Right? Because we do want to build it around Apache Atlas. We feel like that's a good tool for the basis of that but we know one, that some people are going to want to bring their own tools to it. 
They're not necessarily going to want to use that one platform so we want to do it in an open way that they can still plug in their metadata repositories and communicate with others and we want to build the standards on top of that of how do you properly implement these features that GDPR requires like right to be forgotten, like you know, what are the protocols around PII data? How do you prevent a breach? How do you respond to a breach? >> Will that all be under the umbrella of ODPI, that initiative of the partnership or will it be a separate group or? >> Well, so certainly Apache Atlas is part of Apache and remains so. What ODPI is really focused on is that next layer up of how do we engage, not the programmers 'cause programmers can engage really well at the Apache level but the next level up. We want to engage the data professionals, the people whose job it is, the compliance officers. The people who don't sit and write code and frankly if you connect them to the engineers, there's just going to be an impedance mismatch in that conversation. >> You got policy wonks and you got tech wonks so. They understand each other at the wonk level. >> That's a good way to put it. And so that's where ODPI is really coming in, is that group of compliance people that speak a completely different language. But we still need to get them all talking to each other as you said, so that there's specifications around. How do we do this? And what is compliance? >> Well Alan, thank you very much. We're at the end of our time for this segment. This has been great. It's been great to catch up with you and Hortonworks has been evolving very rapidly and it seems to me that, going forward, I think you're well-positioned now for the new GDPR age to take your overall solution portfolio, your partnerships, and your capabilities to the next level and really in terms of in an Open Source framework. In many ways though, you're not entirely 100% like nobody is, purely Open Source.
You're still very much focused on open frameworks for building fairly scalable, very scalable solutions for enterprise deployment. Well, this has been Jim Kobielus with Alan Gates of Hortonworks here on theCUBE at DataWorks Summit 2018 in Berlin. We'll be back fairly quickly with another guest and thank you very much for watching our segment. (techno music)

Published Date : Apr 19 2018

ENTITIES

Entity	Category	Confidence
IBM	ORGANIZATION	0.99+
James Kobielus	PERSON	0.99+
Mandy Chessell	PERSON	0.99+
Alan	PERSON	0.99+
Yahoo	ORGANIZATION	0.99+
Jim Kobielus	PERSON	0.99+
Europe	LOCATION	0.99+
Hortonworks	ORGANIZATION	0.99+
Alan Gates	PERSON	0.99+
four years	QUANTITY	0.99+
James	PERSON	0.99+
ING	ORGANIZATION	0.99+
Berlin	LOCATION	0.99+
yesterday	DATE	0.99+
Apache	ORGANIZATION	0.99+
SQL	TITLE	0.99+
Java	TITLE	0.99+
GDPR	TITLE	0.99+
Python	TITLE	0.99+
100%	QUANTITY	0.99+
Berlin, Germany	LOCATION	0.99+
SiliconANGLE Media	ORGANIZATION	0.99+
DataWorks Summit	EVENT	0.99+
Atlas	ORGANIZATION	0.99+
DataWorks Summit 2018	EVENT	0.98+
Data Steward Studio	ORGANIZATION	0.98+
today	DATE	0.98+
one	QUANTITY	0.98+
NiFi	ORGANIZATION	0.98+
Dataworks Summit 2018	EVENT	0.98+
Hadoop	ORGANIZATION	0.98+
one platform	QUANTITY	0.97+
2018	EVENT	0.97+
both	QUANTITY	0.97+
millions of events	QUANTITY	0.96+
Hbase	ORGANIZATION	0.95+
Tablo	TITLE	0.95+
ODPI	ORGANIZATION	0.94+
Big Data Analytics	ORGANIZATION	0.94+
One	QUANTITY	0.93+
theCUBE	ORGANIZATION	0.93+
NiFi	COMMERCIAL_ITEM	0.92+
day two	QUANTITY	0.92+
about five	QUANTITY	0.91+
Kafka	TITLE	0.9+
Zeppelin	ORGANIZATION	0.89+
Atlas	TITLE	0.85+
Ranger	ORGANIZATION	0.84+
Jupyter	ORGANIZATION	0.83+
first	QUANTITY	0.82+
Apache Atlas	ORGANIZATION	0.82+
Hadoop	TITLE	0.79+

Joe Morrissey, Hortonworks | Dataworks Summit 2018

>> Narrator: From Berlin, Germany, it's theCUBE! Covering Dataworks Summit Europe 2018. Brought to you by Hortonworks. >> Well, hello. Welcome to theCUBE. I'm James Kobielus. I'm lead analyst at Wikibon for big data analytics. Wikibon, of course, is the analyst team inside of SiliconANGLE Media. One of our core offerings is theCUBE and I'm here with Joe Morrissey. Joe is the VP for International at Hortonworks and Hortonworks is the host of Dataworks Summit. We happen to be at Dataworks Summit 2018 in Berlin! Berlin, Germany. And so, Joe, it's great to have you. >> Great to be here! >> We had a number of conversations today with Scott Gnau and others from Hortonworks and also from your customers and partners. Now, you're International, you're VP for International. We've had a partner of yours from South Africa on theCUBE today. We've had a customer of yours from Uruguay. So there's been a fair amount of international presence. We had Munich Re from Munich, Germany. Clearly Hortonworks is, you've been in business as a company for seven years now, I think it is, and you've established quite a presence worldwide, I'm looking at your financials in terms of your customer acquisition, it just keeps going up and up so you're clearly doing a great job of bringing the business in throughout the world. Now, you've told me before the camera went live that you focus on both Europe and Asia Pacific, so I'd like to open it up to you, Joe. Tell us how Hortonworks is doing worldwide and the kinds of opportunities you're selling into. >> Absolutely. 2017 was a record year for us. We grew revenues by over 40% globally. I joined to lead the internationalization of the business and you know, not a lot of people know that Hortonworks is actually one of the fastest growing software companies in history. We were the fastest to get to $100 million. Also, now the fastest to get to $200 million but the majority of that revenue contribution was coming from the United States.
When I joined, it was about 15% of international contribution. By the end of 2017, we'd grown that to 31%, so that's a significant improvement in contribution overall from our international customer base even though the company was growing globally at a very fast rate. >> And that's also not only fast by any stretch of the imagination in terms of growth, some have said, "Oh well, maybe Hortonworks, just like Cloudera, maybe they're going to plateau off because the bloom is off the rose of Hadoop." But really, Hadoop is just getting going as a market segment or as a platform but you guys have diversified well beyond that. So give us a sense for going forward. What are your customers? What kind of projects are you positioning and selling Hortonworks solutions into now? Is it a different, well you've only been there 18 months, but is it shifting towards more things to do with streaming, NiFi and so forth? Does it shift into more data science related projects? 'Cause this is worldwide. >> Yeah. That's a great question. This company was founded on the premise that data volumes and diversity of data is continuing to explode and we believe that it was necessary for us to come and bring enterprise-grade security and management and governance to the core Hadoop platform to make it really ready for the enterprise, and that's what the first evolution of our journey was really all about. A number of years ago, we acquired a company called Onyara, and the logic behind that acquisition was we believe companies now wanted to go out to the point of origin, of creation of data, and manage data throughout its entire life cycle and derive pre-event as well as post-event analytical insight into their data. So what we've seen is our customers are moving beyond just unifying data in the data lake and deriving post-transaction insight into their data. They're now going all the way out to the edge.
They're deriving insight from their data in real time all the way from the point of creation and getting pre-transaction insight into data as well so-- >> Pre-transaction data, can you define what you mean by pre-transaction data? >> Well, I think if you look at it, it's really the difference between data in motion and data at rest, right? >> Oh, yes. >> A specific example would be if a customer walks into the store and they've interacted in the store maybe on social before they come in or in some other fashion, before they've actually made the purchase. >> Engagement data, interaction data, yes. >> Engagement, exactly. Exactly. Right. So that's one example, but that also extends out to use cases in IoT as well, so data in motion and streaming data, as you mentioned earlier, has since become a very, very significant use case that we're seeing a lot of adoption for. Data science, I think companies are really coming to the realization that that's an essential role in the organization. If we really believe that data is the most important asset, that it's the crucial asset in the new economy, then data scientist becomes a really essential role for any company. >> How do your Asian customers' requirements differ, or do they differ from your European, 'cause European customers clearly already have their backs against the wall. We have five weeks until GDPR goes into effect. Do many of your Asian customers, I'm sure a fair number sell into Europe, are they putting a full court, I was going to say in the U.S., a full court press on complying with GDPR, or do they have equivalent privacy mandates in various countries in Asia or a bit of both? >> I think that one of the primary drivers I see in Asia is that a lot of companies there don't have the years of legacy architecture that European companies need to contend with. In some cases, that means that they can move towards next generation data-orientated architectures much quicker than European companies have.
They don't have layers of legacy tech that they need to sunset. A great example of that is Reliance. Reliance is the largest company in India, they've got a subsidiary called Jio, which is the fastest growing telco in the world. They've implemented our technology to build a next-generation OSS system to improve their service delivery on their network. >> Operational support system. >> Exactly. They were able to do that from the ground up because they formed their telco division around being a data-only company and giving away voice for free. So they can, to some extent, move quicker and innovate a little faster in that regard. I do see much more emphasis on regulatory compliance in Europe than I see in Asia. I do think that GDPR amongst other regulations is a big driver of that. The other factor though I think that's influencing that is Cloud and Cloud strategy in general. What we've found is that customers are drawn to the Cloud for a number of reasons. The economics sometimes can be attractive, the ability to be able to leverage the Cloud vendors' skills in terms of implementing complex technology is attractive, but most importantly, the elasticity and scalability that the Cloud provides is hugely important. Now, the key concern for customers as they move to the Cloud though, is how do they leverage that as a platform in the context of an overall data strategy, right? And when you think about what a data strategy is all about, it all comes down to understanding what your data assets are and ensuring that you can leverage them for a competitive advantage but do so in a regulatory compliant manner, whether that's data in motion or data at rest. Whether it's on-prem or in the Cloud or in data across multiple Clouds. That's very much a top of mind concern for European companies.
>> For your customers around the globe, specifically of course, your area of Europe and Asia, what percentage of your customers are deploying Hortonworks into a purely public Cloud environment like HDInsight in Microsoft Azure or HDP inside of AWS, in a public Cloud versus in a private on-premises deployment versus in a hybrid public-private multi Cloud? Is it mostly on-prem? >> Most of our business is still on-prem to be very candid. I think almost all of our customers are looking at migrating some workloads to the Cloud. Even those that had intended to have a Cloud-first strategy have now realized that not all workloads belong in the Cloud. Some are actually more economically viable to be on-prem, and some just won't ever be able to move to the Cloud because of regulation. In addition to that, most of our customers are telling us that they actually want Cloud optionality. They don't want to be locked in to a single vendor, so we very much view the future as hybrid Cloud, as multi Cloud, and we hear our customers telling us that rather than just have a Cloud strategy, they need a data strategy. They need a strategy to be able to manage data no matter where it lives, on which tier, to ensure that they are regulatory compliant with that data. But then to be able to understand that they can secure, govern, and manage those data assets at any tier. >> What percentage of your deals involve a partner? Like IBM is a major partner. Do you do a fair amount of co-marketing and joint sales and joint deals with IBM and other partners or are they mostly Hortonworks-led? >> No, partners are absolutely critical to our success in the international sphere. Our partner revenue contribution across EMEA in the past year grew, every region grew by over 150% in terms of channel contribution. Our total channel business was 28% of our total, right? That's a very significant contribution. The growth rate is very high. IBM are a big part of that, as are many other partners.
We've got a very significant reseller channel, we've got IHV and ISV partners that are critical to our success also. Where we're seeing the most impact with IBM is where we go to some of these markets where we haven't had a presence previously, and they've got deep and long-standing relationships and that helps us accelerate time to value with our customers. >> Yeah, it's been a very good and solid partnership going back several years. Well, Joe, this is great, we have to wrap it up, we're at the end of our time slot. This has been Joe Morrissey who is the VP for International at Hortonworks. We're on theCUBE here at Dataworks Summit 2018 in Berlin, and want to thank you all for watching this segment and tune in tomorrow, we'll have a full slate of further discussions with Hortonworks, with IBM and others tomorrow on theCUBE. Have a good one. (upbeat music)

Published Date : Apr 18 2018

ENTITIES

Entity	Category	Confidence
James Kobielus	PERSON	0.99+
Joe Morrissey	PERSON	0.99+
IBM	ORGANIZATION	0.99+
Asia	LOCATION	0.99+
Europe	LOCATION	0.99+
Joe	PERSON	0.99+
Uruguay	LOCATION	0.99+
Hortonworks	ORGANIZATION	0.99+
India	LOCATION	0.99+
Scott Gnau	PERSON	0.99+
seven years	QUANTITY	0.99+
Wikibon	ORGANIZATION	0.99+
28%	QUANTITY	0.99+
South Africa	LOCATION	0.99+
Onyara	ORGANIZATION	0.99+
Berlin	LOCATION	0.99+
United States	LOCATION	0.99+
$100 million	QUANTITY	0.99+
$200 million	QUANTITY	0.99+
31%	QUANTITY	0.99+
five weeks	QUANTITY	0.99+
18 months	QUANTITY	0.99+
GO	ORGANIZATION	0.99+
tomorrow	DATE	0.99+
2017	DATE	0.99+
both	QUANTITY	0.99+
GDPR	TITLE	0.99+
one example	QUANTITY	0.99+
one	QUANTITY	0.98+
today	DATE	0.98+
U.S.	LOCATION	0.98+
Dataworks Summit 2018	EVENT	0.98+
AWS	ORGANIZATION	0.98+
Berlin, Germany	LOCATION	0.98+
over 40%	QUANTITY	0.98+
Microsoft	ORGANIZATION	0.98+
Reliance	ORGANIZATION	0.98+
over 150%	QUANTITY	0.97+
Dataworks Summit	EVENT	0.97+
EMEA	ORGANIZATION	0.97+
first evolution	QUANTITY	0.96+
2018	EVENT	0.96+
European	OTHER	0.96+
SiliconANGLE Media	ORGANIZATION	0.95+
Munich, Germany	LOCATION	0.95+
One	QUANTITY	0.95+
end of 2017	DATE	0.94+
Hadoop	TITLE	0.93+
Cloudera	ORGANIZATION	0.93+
about 15%	QUANTITY	0.93+
past year	DATE	0.92+
theCUBE	ORGANIZATION	0.92+
single vendor	QUANTITY	0.91+
telco	ORGANIZATION	0.89+
Munich Re	ORGANIZATION	0.88+

Scott Gnau, Hortonworks | Dataworks Summit EU 2018

(upbeat music) >> Announcer: From Berlin, Germany, it's The Cube, covering DataWorks Summit Europe 2018. Brought to you by Hortonworks. >> Hi, welcome to The Cube, we're separating the signal from the noise and tuning into the trends in data and analytics. Here at DataWorks Summit 2018 in Berlin, Germany. This is the sixth year, I believe, that DataWorks has been held in Europe. Last year I believe it was at Munich, now it's in Berlin. It's a great show. The host is Hortonworks and our first interviewee today is Scott Gnau, who is the chief technology officer of Hortonworks. Of course Hortonworks established themselves about seven years ago as one of the up-and-coming startups commercializing a then brand new technology called Hadoop and MapReduce. They've moved well beyond that in terms of their go-to-market strategy, their product portfolio, their partnerships. So Scott, this morning, it's great to have ya'. How are you doing? >> Glad to be back and good to see you. It's been a while. >> You know, yes, I mean, you're an industry veteran. We've both been around the block a few times but I remember you years ago. You were at Teradata and I was at another analyst firm. And now you're with Hortonworks. And Hortonworks is really on a roll. I know you're not Rob Bearden, so I'm not going to go into the financials, but your financials look pretty good, your latest. You're growing, your deal sizes are growing. Your customer base is continuing to deepen. So you guys are on a roll. So we're here in Europe, we're here in Berlin in particular. It's five weeks--you did the keynote this morning--it's five weeks until GDPR. The sword of Damocles, the GDPR sword of Damocles. It's not just affecting European-based companies, but it's affecting North American companies and others who do business in Europe. 
So your keynote this morning, your core theme was that, if you're an enterprise, your business strategy is equated with your cloud strategy now, is really equated with your data strategy. And you got into a lot of that. It was a really good discussion. And where GDPR comes into the picture is the fact that protecting data, personal data of your customers is absolutely important, in fact it's imperative and mandatory, and will be in five weeks or you'll face a significant penalty if you're not managing that data and providing customers with the right to have it erased, or the right to withdraw consent to have it profiled, and so forth. So enterprises all over the world, especially in Europe, are racing as fast as they can to get compliant with GDPR by the May 25th deadline. So, one of the things you discussed this morning, you had an announcement overnight that Hortonworks has released a new solution in technical preview called The Data Steward Studio. And I'm wondering if you can tie that announcement to GDPR? It seems like data stewardship would have a strong value for your customers. >> Yeah, there's definitely a big tie-in. GDPR is certainly creating a milestone, kind of a trigger, for people to really think about their data assets. But it's certainly even larger than that, because when you even think about driving digitization of a business, driving new business models and connecting data and finding new use cases, it's all about finding the data you have, understanding what it is, where it came from, what's the lineage of it, who had access to it, what did they do to it? These are all governance kinds of things, which are also now mandated by laws like GDPR. And so it's all really coming together in the context of the new modern data architecture era that we live in, where a lot of data that we have access to, we didn't create. 
And so it was created outside the firewall by a device, by some application running with some customer, and so capturing and interpreting and governing that data is very different than taking derivative transactions from an ERP system, which are already adjudicated and understood, and governing that kind of a data structure. And so this is a need that's driven from many different perspectives, it's driven from the new architecture, the way IoT devices are connecting and just creating a data bomb, that's one thing. It's driven by business use cases, just saying what are the assets that I have access to, and how can I try to determine patterns between those assets where I didn't even create some of them, so how do I adjudicate that? >> Discovering and cataloging your data-- >> Discovering it, cataloging it, actually even... When I even think about data, just think the files on my laptop, that I created, and I don't remember what half of them are. So creating the metadata, creating that trail of bread crumbs that lets you piece together what's there, what's the relevance of it, and how, then, you might use it for some correlation. And then you get in, obviously, to the regulatory piece that says sure, if I'm a new customer and I ask to be forgotten, the only way that you can guarantee to forget me is to know where all of my data is. >> If you remember that they are your customer in the first place and you know where all that data is, if you're even aware that it exists, that's the first and foremost thing for an enterprise to be able to assess their degree of exposure to GDPR. >> So, right. It's like a whole new use case. It's a microcosm of all of these really big things that are going on. And so what we've been trying to do is really leverage our expertise in metadata management using the Apache Atlas project. >> Interviewer: You and IBM have done some major work-- >> We work with IBM and the community on Apache Atlas. 
You know, metadata tagging is not the most interesting topic for some people, but in the context that I just described, it's kind of important. And so I think one of the areas where we can really add value for the industry is leveraging our lowest common denominator, open source, open community kind of development to really create a standard infrastructure, a standard open infrastructure for metadata tagging, into which all of these use cases can now plug. Whether it's I want to discover data and create metadata about the data based on patterns that I see in the data, or I've inherited data and I want to ensure that the metadata stay with that data through its life cycle, so that I can guarantee the lineage of the data, and be compliant with GDPR-- >> And in fact, tomorrow we will have Mandy Chessell from IBM, a key Hortonworks partner, discussing the open metadata framework you're describing and what you're doing. >> And that was part of this morning's keynote close also. It all really flowed nicely together. Anyway, it is really a perfect storm. So what we've done is we've said, let's leverage this lowest common denominator, standard metadata tagging, Apache Atlas, and uplevel it, and not have it be part of a cluster, but actually have it be a cloud service that can be in force across multiple data stores, whether they're in the cloud or whether they're on prem. >> Interviewer: That's the Data Steward Studio? >> Well, Data Plane and Data Steward Studio really enable those things to come together. >> So the Data Steward Studio is the second service >> Like an app. >> under the Hortonworks DataPlane service. >> Yeah, so the whole idea is to be able to tie those things together, and when you think about it in today's hybrid world, and this is where I really started, where your data strategy is your cloud strategy, they can't be separate, because if they're separate, just think about what would happen. So I've copied a bunch of data out to the cloud. 
All memory of any lineage is gone. Or I've got to go set up manually another set of lineage that may not be the same as the lineage it came with. And so being able to provide that common service across footprint, whether it's multiple data centers, whether it's multiple clouds, or both, is a really huge value, because now you can sit back and through that single pane, see all of your data assets and understand how they interact. That obviously has the ability then to provide value like with Data Steward Studio, to discover assets, maybe to discover assets and discover duplicate assets, where, hey, I can save some money if I get rid of this cloud instance, 'cause it's over here already. Or to be compliant and say yeah, I've got these assets here, here, and here, I am now compelled to do whatever: delete, protect, encrypt. I can now go do that and keep a record through the metadata that I did it. >> Yes, in fact that is very much at the heart of compliance, you got to know what assets there are out there. And so it seems to me that Hortonworks is increasingly... the H-word rarely comes up these days. >> Scott: Not Hortonworks, you're talking about Hadoop. >> Hadoop rarely comes up these days. When the industry talks about you guys, it's known that's your core, that's your base, that's where HDP and so forth, great product, great distro. In fact, in your partnership with IBM, a year or more ago, I think it was IBM standardized on HDP in lieu of their distro, 'cause it's so well-established, so mature. But going forward, you guys in many ways, Hortonworks, you have positioned yourselves now. Wikibon sees you as being the premier solution provider of big data governance solutions specifically focused on multi-cloud, on structured data, and so forth. So the announcement today of the Data Steward Studio very much builds on that capability you already have there. So going forward, can you give us a sense to your roadmap in terms of building out DataPlane's service? 
'Cause this is the second of these services under the DataPlane umbrella. Give us a sense for how you'll continue to deepen your governance portfolio in DataPlane. >> Really the way to think about it, there are a couple of things that you touched on that I think are really critical, certainly for me, and for us at Hortonworks to continue to repeat, just to make sure the message got there. Number one, Hadoop is definitely at the core of what we've done, and was kind of the secret sauce. Some very different stuff in the technology, also the fact that it's open source and community, all those kinds of things. But that really created a foundation that allowed us to build the whole beginning of big data data management. And we added and expanded to the traditional Hadoop stack by adding Data in Motion. And so what we've done is-- >> Interviewer: NiFi, I believe, you made a major investment. >> Yeah, so we made a large investment in Apache NiFi, as well as Storm and Kafka as kind of a group of technologies. And the whole idea behind doing that was to expand our footprint so that we would enable our customers to manage their data through its entire lifecycle, from being created at the edge, all the way through streaming technologies, to landing, to analytics, and then even analytics being pushed back out to the edge. So it's really about having that common management infrastructure for the lifecycle of all the data, including Hadoop and many other things. And then in that, obviously as we discuss whether it be regulation, whether it be, frankly, future functionality, there's an opportunity to uplevel those services from an overall security and governance perspective. And just like Hadoop kind of upended traditional thinking... and what I mean by that was not the economics of it, specifically, but just the fact that you could land data without describing it. That seemed so unimportant at one time, and now it's like the key thing that drives the difference. 
Think about sensors that are sending in data that reconfigure firmware, and those streams change. Being able to acquire data and then assess the data is a big deal. So the same thing applies, then, to how we apply governance. I said this morning, traditional governance was hey, I'm this employee, I have access to this file, this file, this file, and nothing else. I don't know what else is out there. I only have access to what my job title describes. And that's traditional data governance. In the new world, that doesn't work. Data scientists need access to all of the data. Now, that doesn't mean we need to give away PII. We can encrypt it, we can tokenize it, but we keep referential integrity. We keep the integrity of the original structures, and those who have a need to actually see the PII can get the token and see the PII. But it's governance thought about inversely to how it's been thought about for 30 years. >> It's so great you've worked governance into an increasingly streaming, real-time, in-motion data environment. Scott, this has been great. It's been great to have you on The Cube. You're an alum of The Cube. I think we've had you at least two or three times over the last few years. >> It feels like 35. Nah, it's pretty fun. >> Yeah, you've been great. So we are here at Dataworks Summit in Berlin. (upbeat music)
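
Editor's sketch: the point Scott makes about tokenizing PII while keeping referential integrity can be illustrated in a few lines of Python. This is a minimal sketch under stated assumptions, not Hortonworks' implementation; the names `SECRET_KEY`, `tokenize`, `TokenVault`, `protect_column`, and `join_on`, and the sample records, are all hypothetical. The idea shown is deterministic tokenization: the same input always maps to the same token, so two datasets can still be joined on the tokenized column, while only an authorized caller can map a token back to the original value.

```python
import hashlib
import hmac

# Hypothetical demo key; a real deployment would keep this in a key manager.
SECRET_KEY = b"demo-key-not-for-production"


def tokenize(value: str, key: bytes = SECRET_KEY) -> str:
    """Return a deterministic token: the same input always yields the same token."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return "tok_" + digest[:16]


class TokenVault:
    """Keeps the token-to-value mapping so authorized readers can recover PII."""

    def __init__(self) -> None:
        self._values = {}

    def protect(self, value: str) -> str:
        token = tokenize(value)
        self._values[token] = value
        return token

    def reveal(self, token: str, authorized: bool) -> str:
        if not authorized:
            raise PermissionError("caller is not cleared to see PII")
        return self._values[token]


def protect_column(rows, column, vault):
    """Return copies of the rows with one PII column replaced by tokens."""
    return [{**row, column: vault.protect(row[column])} for row in rows]


def join_on(left, right, column):
    """Inner-join two lists of dicts on a shared column."""
    index = {}
    for row in right:
        index.setdefault(row[column], []).append(row)
    return [{**l, **r} for l in left for r in index.get(l[column], [])]


vault = TokenVault()
customers = [
    {"email": "ann@example.com", "region": "EU"},
    {"email": "bob@example.com", "region": "US"},
]
orders = [
    {"email": "ann@example.com", "amount": 40},
    {"email": "ann@example.com", "amount": 15},
    {"email": "bob@example.com", "amount": 99},
]

safe_customers = protect_column(customers, "email", vault)
safe_orders = protect_column(orders, "email", vault)

# The join still works on tokens, so analysts never handle the raw emails.
joined = join_on(safe_customers, safe_orders, "email")
print(len(joined))  # 3 joined rows: two orders for Ann, one for Bob
```

A keyed hash is used here, rather than a random token per record, precisely because it keeps joins intact; the trade-off is that equal values are visibly equal, which a real governance design would need to weigh.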

Published Date : Apr 18 2018

SUMMARY :

Brought to you by Hortonworks. So Scott, this morning, it's great to have ya'. Glad to be back and good to see you. So, one of the things you discussed this morning, of the new modern data architecture era that we live in, forgotten, the only way that you can guarantee and foremost thing for an enterprise to be able And so what we've been trying to do is really leverage so that I can guarantee the lineage of the data, discussing the open metadata framework you're describing And that was part of this morning's keynote close also. those things to come together. of lineage that may not be the same as the lineage And so it seems to me that Hortonworks is increasingly... When the industry talks about you guys, it's known And so what we've done is-- Interviewer: NiFi, I believe, you made So the same thing applies, then, to how we apply governance. It's been great to have you on The Cube. Nah, it's pretty fun.. So we are here at Dataworks Summit in Berlin.

ENTITIES

Entity	Category	Confidence
Europe	LOCATION	0.99+
Scott	PERSON	0.99+
IBM	ORGANIZATION	0.99+
Berlin	LOCATION	0.99+
Scott Gnau	PERSON	0.99+
Hortonworks	ORGANIZATION	0.99+
Teradata	ORGANIZATION	0.99+
Last year	DATE	0.99+
May 25th	DATE	0.99+
five weeks	QUANTITY	0.99+
Mandy Chessell	PERSON	0.99+
GDPR	TITLE	0.99+
Munich	LOCATION	0.99+
Rob Bearden	PERSON	0.99+
second service	QUANTITY	0.99+
30 years	QUANTITY	0.99+
both	QUANTITY	0.99+
tomorrow	DATE	0.99+
first	QUANTITY	0.99+
Berlin, Germany	LOCATION	0.99+
second	QUANTITY	0.99+
DataPlane	ORGANIZATION	0.99+
sixth year	QUANTITY	0.98+
three times	QUANTITY	0.98+
first interviewee	QUANTITY	0.98+
Dataworks Summit	EVENT	0.98+
one	QUANTITY	0.97+
this morning	DATE	0.97+
DataWorks Summit 2018	EVENT	0.97+
MapReduce	ORGANIZATION	0.96+
Hadoop	TITLE	0.96+
Hadoop	ORGANIZATION	0.96+
one time	QUANTITY	0.96+
35	QUANTITY	0.96+
single pane	QUANTITY	0.96+
NiFi	ORGANIZATION	0.96+
today	DATE	0.94+
DataWorks Summit Europe 2018	EVENT	0.93+
Data Steward Studio	ORGANIZATION	0.93+
Dataworks Summit EU 2018	EVENT	0.92+
about seven years ago	DATE	0.91+
a year or	DATE	0.88+
years	DATE	0.87+
Storm	ORGANIZATION	0.87+
Wikibon	ORGANIZATION	0.86+
Apache NiFi	ORGANIZATION	0.85+
The Cube	PERSON	0.84+
North American	OTHER	0.84+
DataWorks	ORGANIZATION	0.84+
Data Plane	ORGANIZATION	0.76+
Data Steward Studio	TITLE	0.75+
Kafka	ORGANIZATION	0.75+

Raj Verma, Hortonworks - DataWorks Summit 2017

>> Announcer: Live from San Jose, in the heart of Silicon Valley, it's theCUBE, covering DataWorks Summit 2017. Brought to you by Hortonworks. >> Welcome back to theCUBE, we are live, on day two of the DataWorks Summit. I'm Lisa Martin. #DWS17, join the conversation. We've had a great day and a half. We have learned from a ton of great influencers and leaders about really what's going on with big data, data science, how things are changing. My cohost is George Gilbert. We're joined by my old buddy, the COO of Hortonworks, Rajnish Verma. Raj, it's great to have you on theCUBE. >> It's great to be here, Lisa. Great to see you as well, it's been a while. >> It has, so yesterday on the customer panel, the Raj I know had a great conversation with customers; Duke Energy was one. You also had Black Knight on the financial services side. >> Rajnish: And HSC. >> Yes, on the insurance side, and one of the things that, a couple things that really caught my attention, one was when Duke said, kind of, where they were using data and moving to Hadoop, but they are now a digital company. They're now a technology company that sells electricity and products, which I thought was fantastic. Another thing that I found really interesting about that was they all talked about how the need to leverage big data, and glean insights and monetize that, really requires this cultural shift. So I know you love customer interactions. Talk to us about what you're seeing. Those are three great industry examples. What are you seeing? Where are customers on this sort of maturity model where big data and Hadoop are concerned? >> Sure, happy to. So one thing that I enjoy the most about my job is meeting customers and talking to them about the art of the possible. And some of the stuff that they're doing was only science fiction, really, about two or three years ago. 
And there are a couple of questions that you've just asked me as to where they are on their journey, what are they trying to accomplish, et cetera. I remember about, five, seven, 10 years ago where Marc Andreessen said "Software is eating the world." And to be honest with you, now, it's now more like every company is a data company. I wouldn't say data is eating the world, but without effective monetization of your data assets, you can't be a force to be reckoned with as a company. So that is a common theme that we are seeing irrespective of industry, irrespective of customer, irrespective of really the size of the customer. The only thing that sort of varies is the amount and complexity of data, from one company to the other. Now, I'm new to Hortonworks, as you know. It's really my fifth month here. And one of the things that I've seen, and, Lisa, as you know, I'm coming from TIBCO. So we've been dealing with data. I have been involved with data for over a decade and a half now, right. So the difference was, 15 years ago, we were dealing with really structured data and we actually connected the structured data and gleaned insights into structured data. Now, today, a seminal challenge that every CIO or chief data officer is trying to solve is how do you get actionable insights into semi-structured and unstructured data. Now, so, getting insights into that data first requires the ability to aggregate data, right. Once you've aggregated data, you also need a platform to make sense of data in real-time, that is being streamed at you. Now once you do those two things, then you put yourself in a position to analyze that data. So in that journey, as you asked, where our customers are. Some are defining their data aggregation strategy. The others, having defined data aggregation, they're talking about streaming analytics as a platform, and then the others are talking about data science and machine learning and deep learning, as a journey. 
Now, you saw the customer panel yesterday. But the one point I'd like to make is, it's not only the Duke Energies and the Black Knights of the world, or the HSC, who I believe are big, large firms that are using data. Even a company like, an old agricultural company, or I shouldn't say old but steeped in heritage is probably the right word. 96, 97-year-old agricultural company that's in the animal feed business. Animal feed. Multi-billion dollar animal feed business. They use data to monetize their business model. What they say is, they've been feeding animals for the last 70 years. So now they go to a farmer and they have enough data about how to feed animals, that they can actually tell the farmer, that this hog that you have right now, which is 17 pounds, I can guarantee you that I will have him or her on a nutrition that, by four months, it'll be 35 pounds. How much are you willing to pay? So even in the animal feed business, data is being used to drive not only insights, but monetization models. >> Wow. >> So. >> That's outstanding. >> Thank you. >> So in getting to that level of sophistication, it's not like every firm sort of has the skills and technology in place to do that. What are some of the steps that you find that they typically have to go through to get to that level of maturity? Like, where do they make mistakes? Where do they find the skills to manage on-prem infrastructure, if it is on-prem? What about if they're trying to do a hybrid cloud setup? How complex is that? >> I think that's where the power of the community comes through at multiple levels. So we're committed to the open-source movement. We're committed to the community-based development of data. Now, this community-based business model does a few things. Firstly, it keeps the innovation at the leading edge, bleeding edge, number one. 
But as you heard the panel talk about yesterday, one of the biggest benefits that our customers see of using open source, is, sure, economics is good, but that's not the leading reason. Keeping up with innovation, very high up there. Avoiding vendor lock-in, again very, very high up there. But one of the biggest reasons that CIOs gave me for choosing open source as a business model is more to do with the fact that they can attract good talent, and without open source, you can't actually attract talent. And I can relate to that because I have a sophomore at home. And it just occurred to me that she's 15 now but she's been using open source since she was 11. The iPhone and, she downloads an application for free. She uses it, and if she stretches the limit of that, then she orders something more in a paid model. So the community helps people do a few things. Be able to fail fast if they need to. The second is, it lowers the barriers of entry, right. Because it's really free. You can have the same model. The third is, you can rely on the community for support and methodologies and best practices and lessons learned from implementations. The fourth is, it's a great hiring ground in terms of bringing people in and attracting Millennial talent, young talent, and sought-after talent. So that's really probably the answer that I would have for that. >> When you talk about the business model, the open-source business model and the attraction on the customer side, that sounded like there's this analogy with sort of the agro-business customer in the sense that they're offering data along with their traditional product. If your traditional product is open-source data management, what Arun started telling us this morning was the machine learning that goes along with operating not only your own sort of internal workloads but customers', and being able to offer prescriptive advice on operations, essentially IT operations. 
Is that the core, will that become the core of sort of value-add through data for an open-source business model like yours? >> I don't want to be speculative but I'll probably answer it another way. I think our vision, which was set by our founder Rob Bearden, and he took you guys through that yesterday, was way back when, we did say that our mission in life is to manage the world's data. So that mission hasn't changed. And the second was, we would do it as an open-source community or as a big contributing part of that community. And that has really not changed. Now, we feel that machine learning and data science and deep learning are areas that we're very, very excited about, our customers are very, very excited about. Now, the one thing that we did cover yesterday and I think earlier today as well, I'm a computer science engineer. And when I was in college, way back when, 25 years ago, I was interested in AI and ML. And it has existed for 50 years. The reason why it hasn't been available to the common man, so to speak, is because of two reasons. One is, it did not have a source of data that it could sit on top of, that makes machine learning and AI effective. Or at least not a commercially-viable option to do so. Now, there is one. The second is, the compute power required to run some of the large algorithms that really give you insights into machine learning and AI. So we've become the platform on which customers can take advantage of excellent machine learning and AI tools to get insights. Now, those are two independent sort of categories. One is the open source community providing the platform. And then what tools the customer has used to apply data science and machine learning, so. >> So, all right. I'm thinking something that is slightly different and maybe the nuance is making it tough to articulate. 
But it's how can Hortonworks take the data platform and data science tools that you use to help understand how to operate Hortonworks, whether it's on a customer prem, or in the cloud. In other words, how can you use machine learning to make it a sort of a more effective and automated managed service? >> Yeah, and I think that's, the nuance is not lost on me. I think what I'm trying to sort of categorize is, for that to happen, you require two things. One is a data aggregator across on-prem and cloud. Because when you have data which is multi-tenant, you have a lot of issues with data security, data governance, all the rest of it. Now, that is what we plan to manage for the world, so to speak. Now, on top of that, customers who require to have data science or deep learning to be used, we provide that platform. Now, whether that is used as a service by the customer, which we would be happy to provide, or it is used in-house, on-prem, on various cloud models, that's more a customer decision. We don't want to force that decision. However, from the art of the possible perspective, yes it's possible. >> I love the mission to manage the world's data. >> Thank you. >> That's a lofty goal, but yesterday's announcements with IBM were pretty, pretty transformative. In your opinion as chief operating officer, how do you see this extension of this technology and strategic partnership helping Hortonworks on the next level of managing the world's data? >> Absolutely, it's game-changing for us. We're very, very excited. Our colleagues are very, very excited about the opportunity to partner. It's also a big validation of the fact that we now have a pretty large open-source community that contributes to this cause. So we're very excited about that. The opportunity is in actually our partnering with a leader in data science, machine learning, and AI, a company that is steeped in heritage, is known for game-changing, next technology moves. 
And the fact that we're powering it from a data perspective is something that we're very, very excited and pleased about. And the opportunities are limitless. >> I love that, and I know you are a game-changer, in your fifth month. We thank you so much, Raj, for joining us. It was great to see you. Continued success, >> Thank you. >> at managing the world's data and being that game-changer, yourself, and for Hortonworks as well. >> Thank you Lisa, good to see you. >> You've been watching theCUBE. Again, we're live, day two of the DataWorks Summit, #DWS17. For my cohost, George Gilbert, I'm Lisa Martin. Stick around guys, we'll be right back with more great content. (jingle)

Published Date : Jun 14 2017

SUMMARY :

in the heart of Silicon Valley, Raj, it's great to have you on theCUBE. Great to see you as well, it's been a while. You also had Black Knight on the financial services side. Yes, on the insurance side, and one of the things that, But the one point I'd like to make is, What are some of the steps that you find is more to do with the fact that they can attract and the attraction on the customer side, Now, the one thing that we did cover yesterday and maybe the nuance is making it tough to articulate. for that to happen, you require two things. on the next level of managing the world's data? about the opportunity to partner. I love that, and I know you are a game-changer, at managing the world's data of the DataWorks Summit, #DWS17.

ENTITIES

Entity	Category	Confidence
George Gilbert	PERSON	0.99+
IBM	ORGANIZATION	0.99+
Marc Andreessen	PERSON	0.99+
Lisa Martin	PERSON	0.99+
Duke Energy	ORGANIZATION	0.99+
Lisa	PERSON	0.99+
TIBCO	ORGANIZATION	0.99+
Duke Energies	ORGANIZATION	0.99+
Raj Verma	PERSON	0.99+
35 pounds	QUANTITY	0.99+
Raj	PERSON	0.99+
Rob Bearden	PERSON	0.99+
50 years	QUANTITY	0.99+
San Jose	LOCATION	0.99+
17 pounds	QUANTITY	0.99+
fifth month	QUANTITY	0.99+
Silicon Valley	LOCATION	0.99+
Rajnish Verma	PERSON	0.99+
HSC	ORGANIZATION	0.99+
one	QUANTITY	0.99+
yesterday	DATE	0.99+
15	QUANTITY	0.99+
four months	QUANTITY	0.99+
One	QUANTITY	0.99+
Hortonworks	ORGANIZATION	0.99+
Black Knights	ORGANIZATION	0.99+
Duke	ORGANIZATION	0.99+
two reasons	QUANTITY	0.99+
two	QUANTITY	0.99+
two things	QUANTITY	0.99+
iPhone	COMMERCIAL_ITEM	0.99+
Firstly	QUANTITY	0.99+
second	QUANTITY	0.99+
third	QUANTITY	0.99+
one company	QUANTITY	0.99+
DataWorks Summit 2017	EVENT	0.98+
DataWorks Summit	EVENT	0.98+
three	QUANTITY	0.98+
#DWS17	EVENT	0.98+
Multi-billion dollar	QUANTITY	0.98+
fourth	QUANTITY	0.98+
one thing	QUANTITY	0.98+
today	DATE	0.97+
15 years ago	DATE	0.97+
11	QUANTITY	0.96+
this morning	DATE	0.95+
25 years ago	DATE	0.95+
one point	QUANTITY	0.94+
day two	QUANTITY	0.93+
Rajnish	PERSON	0.93+
first	QUANTITY	0.93+
five	DATE	0.91+
three years ago	DATE	0.91+
theCUBE	ORGANIZATION	0.9+
96, 97 year old	QUANTITY	0.89+
Hortonworks - DataWorks Summit 2017	EVENT	0.87+
earlier today	DATE	0.87+
COO	PERSON	0.86+
10 years ago	DATE	0.86+
about two	DATE	0.84+
seven	DATE	0.8+
couple	QUANTITY	0.8+
Hadoop	ORGANIZATION	0.75+
over a decade and a half	QUANTITY	0.72+
last 70 years	DATE	0.69+

Arun Murthy, Hortonworks | DataWorks Summit 2017

>> Announcer: Live from San Jose, in the heart of Silicon Valley, it's theCUBE covering DataWorks Summit 2017. Brought to you by Hortonworks. >> Good morning, welcome to theCUBE. We are live at day 2 of the DataWorks Summit, and have had a great day so far, yesterday and today, I'm Lisa Martin with my co-host George Gilbert. George and I are very excited to be joined by a multiple-time CUBE alum, the co-founder and VP of Engineering at Hortonworks, Arun Murthy. Hey, Arun. >> Thanks for having me, it's good to be back. >> Great to have you back, so yesterday, great energy at the event. You could see and hear behind us, great energy this morning. One of the things that was really interesting yesterday, besides the IBM announcement, and we'll dig into that, was that we had your CEO on, as well as Rob Thomas from IBM, and Rob said, you know, one of the interesting things over the last five years was that there have been only 10 companies that have beat the S&P 500, have outperformed, in each of the last five years, and those companies have made big bets on data science and machine learning. And as we heard yesterday, these four mega-trends: IoT, cloud, streaming analytics, and now the fourth big leg, data science. Talk to us about what Hortonworks is doing, you've been here from the beginning, as a co-founder, as I've mentioned, you've been with Hadoop since it was a little baby. How is Hortonworks evolving to become one of those big users making big bets on helping your customers, and yourselves, leverage machine learning to really drive the business forward? >> Absolutely, a great question. So, you know, if you look at some of the history of Hadoop, it started off with this notion of a data lake, and then, I'm talking about the enterprise side of Hadoop, right? I've been working on Hadoop for about 12 years now, you know, the last six of it has been as a vendor selling Hadoop to enterprises. 
They started off with this notion of a data lake, and as people have adopted that vision of a data lake, you know, you bring all the data in, and now you're starting to get governance and security, and all of that. Obviously, one of the best ways to get value out of the data is the notion of, you know, can you, sort of, predict what is going to happen in your world, with your customers, and, you know, whatever it is, with the data that you already have. So on that notion, you know, Rob, our CEO, talks about how we're trying to move from a post-transactional world to a pre-transactional world, and doing the analytics and data science will be, obviously, central to that. We could talk about, and there's so many applications of it, something as simple as, you know, we did a demo last year of how we're working with a freight company, and we're starting to show them, you know, how to predict which drivers and which routes are going to have issues as they're trying to move, all right? Four years ago we did the same demo, and we would show that this driver had an issue on this route, but now we can actually predict and let you know to take preventive measures up front. Similarly, internally, you know, you can take things from, you know, machine learning and log analytics and so on. We have an internal problem, you know, where we have to test two different versions of HDP itself, and as you can imagine, it's a really, really hard problem. We have to support 10 operating systems, seven databases, like, if you multiply that matrix, it's, you know, tens of thousands of options. So, as we do all that testing, we now use machine learning internally to look through the logs and kind of predict where the failures were, and help our own, sort of, software engineers understand where the problems were, right? 
An extension of that has been, you know, the work we've done in SmartSense, which is a service we offer our enterprise customers. We collect logs from their Hadoop clusters, and then we can actually help them understand where they can either tune their applications, or even tune their hardware, right? You know, we have this example I really like where, at a really large enterprise financial services client, they had literally, you know, hundreds and thousands of machines on HDP, and using SmartSense, we actually found that there were 25 machines which had a bad NIC configuration, and we proved to them that by fixing those, we got 30% throughput back on their cluster. At that scale, it's a lot of money, it's a lot of capex, it's a lot of opex. So, as a company, we try to use it ourselves as much as we, kind of, try to help our customers adopt it. Does that make sense? >> Yeah, let's drill down on that even a little more, 'cause it's pretty easy to understand what's the standard telemetry you would want out of hardware, but as you, sort of, move up the stack the metrics, I guess, become more custom. So how do you learn, not just from one customer, but from many customers, especially when you can't standardize what you're supposed to pull out of them? >> Yeah so, we're sort of really big believers in, sort of, dogfooding our own stuff, right? So, we talk about the notion of a data lake. We actually run a SmartSense data lake where we get data across, you know, the hundreds of our customers, and we can actually do predictive machine learning on that data in our own data lake. Right? And to your point about how we go up the stack, this is, kind of, where we feel like we have a natural advantage because we work on all the layers, whether it's the SQL engine, or the storage engine, or, you know, above and beyond the hardware. So, as we build these models, we understand that we need more, or different, telemetry, right? 
And we put that back into the product so the next version of HDP will have the metrics that we wanted. And now we've been doing this for a couple of years, which means we've done three, four, five turns of the crank. Obviously it's something we can always get better at, but I feel like, compared to where we were a couple of years ago when SmartSense first came out, it's actually matured quite a lot from that perspective. >> So, there's a couple of different paths you can add to this, which is customers might want, as part of their big data workloads, some non-Hortonworks, you know, services or software when it's on-prem, and then can you also extend this management to the Cloud if they want a hybrid setup where, in the not too distant future, the Cloud vendor will also be a provider for this type of management? >> So absolutely, in fact it's true today. You know, Microsoft's a great partner of ours. We work with them to enable SmartSense on HDI, which means we can actually get the same telemetry back, whether you're running the data on an on-prem HDP, or you're running this on HDI. Similarly, we shipped a version of our Cloud product, our Hortonworks Data Cloud, on Amazon, and again SmartSense is pre-plumbed there, so whether you're on Amazon, or Microsoft, or on-prem, we get the same telemetry, we get the same data back. If you're a customer using many of these products, we can actually give you that telemetry back. Similarly, you guys probably know this, you were probably there at the analyst event when they announced the Flex Support subscription, which means that now you can actually take the support subscription you get from Hortonworks and use it on-prem or on the Cloud. 
>> So in terms of transforming HDP, for example, I just want to make sure I'm understanding this: you're pulling in data from customers to help evolve the product, and that data can be on-prem, it can be in Microsoft Azure, it can be in AWS? >> Exactly. HDP can be running in any of these. We will actually pull all of that into our data lake, and we actually do the analytics there and then present it back to the customers. So, in our support subscription, the way this works is we do the analytics in our lake, and it pushes it back, in fact, to our support team tickets, and our sales force, and all the support mechanisms. And they get a set of recommendations saying, hey, we know these are the workloads you're running, we see these are the opportunities for you to do better, whether it's tuning the hardware, tuning an application, tuning the software. We sort of send the recommendations back, and the customer can go and say, oh, that makes sense, I accept that, and we'll, you know, we'll update the recommendation for you automatically. Or you can say, maybe I don't want to change my kernel parameters, let's have a conversation. And if the customer, you know, is going through with that, then they can go and change it on their own. We do that, sort of, back and forth with the customer. >> One thing that just pops into my mind is, we talked a lot yesterday about data governance, and also yesterday on stage were >> Arun: With IBM >> Yes exactly. When we think of, you know, really data-intensive industries, retail, financial services, insurance, healthcare, manufacturing, are there particular industries where you're really leveraging this, kind of, bi-directional flow, because there are no governance restrictions, or maybe I shouldn't say none, but. Give us a sense of which particular industries are really helping to fuel the evolution of the Hortonworks data lake. >> So, I think healthcare is a great example. 
You know, when we started off, sort of, this open-source project, Apache Atlas, you know, a couple of years ago, we got a lot of traction in the healthcare and, sort of, insurance industry. You know, folks like Aetna were actually founding members of that, you know, sort of consortium of doing this, right? And we're starting to see them get a lot of leverage out of all of this. Similarly now as we go into, you know, Europe and expand there, things like GDPR are really, really important, right? And, you guys know GDPR is a really big deal. Like, if you're not compliant by, I think it's like March of next year, you pay a portion of your revenue as fines. That's, you know, big money for everybody. So, I think that's why we're really excited about the partnership with IBM, because we feel like the two of us can help a lot of customers, especially in countries that are significantly more highly regulated than the United States, to actually leverage our, sort of, joint portfolio of products. And IBM's been a great contributor to Atlas, they've adopted it wholesale as you saw, you know, in the announcements yesterday. >> So, you're doing a Keynote tomorrow, so give us maybe the top three things. You're giving the Keynote on Data Lake 3.0, walk us through the evolution. Data Lakes 1.0, 2.0, 3.0, where you are now, and what folks can expect to hear and see in your Keynote. >> Absolutely. So as we've, kind of, continued to work with customers and we see the maturity model of customers, you know, initially people are standing up a data lake, and then they'd want, you know, sort of security, basic security and what it covers, and so on. Now they want governance, and as we're starting to go on that journey, clearly our customers are pushing us to help them get more value from the data. It's not just about standing up the data lake, and obviously managing data with governance, it's also about, can you help us, you know, do machine learning, can you help us build other apps, and so on. 
So, as we look to that, there's a fundamental evolution that, you know, the Hadoop ecosystem had to go through. With the advent of technologies like, you know, Docker, it's really important first to help the customers bring more than just workloads which are sort of native to Hadoop. You know, Hadoop started off with MapReduce, obviously Spark's been great, and now we're starting to see technologies like Flink coming, but increasingly, you know, we want to do data science. For mass-market data science, obviously, you know, people want to use Spark, but the mass market is still Python, and R, and so on, right? >> Lisa: Non-native, okay. >> Non-native. Which are not really built for Hadoop, you know, these predate Hadoop by a long way, right. So now as we bring these applications in, having technology like Docker is really important, because now we can actually containerize these apps. It's not just about running Spark, you know, running Spark with R, or running Spark with Python, which you can do today. The problem is, in a true multi-tenant governed system, you want not just R, but you want a specific set of libraries for R, right. And the libraries, you know, George wants might be completely different than what I want. And, you know, you can't do a multi-tenant system where you install both of them simultaneously. So Docker is a really elegant solution to problems like those. So now we can actually bring those technologies into a Docker container, so George's Docker containers will not, you know, conflict with mine. And you can actually be off to the races, you know, doing data science. Which is really key for technologies like DSX, right? Because with DSX, if you see, obviously DSX supports Spark with technologies like, you know, Zeppelin, which is a front-end, but they also have Jupyter, which is going to work for the mass-market users of Python and R, right? 
So we want to make sure there's no friction, whether it's, sort of, the guys using Spark or the guys using R, and equally importantly, DSX, you know, in the roadmap will also support things like, you know, the classic IBM portfolio, SPSS and so on. So bringing all of those things in together, making sure they run with the data in the data lake, and also the compute in the data lake, is really big for us. >> Wow, so it sounds like your Keynote's going to be very educational for the folks that are attending tomorrow. So, last question for you. One of the themes that occurred in the Keynote this morning was sharing a fun fact about the speakers. What's a fun fact about Arun Murthy? >> Great question. I guess, you know, people have been looking for folks with, you know, 10 years of experience on Hadoop. I'm here finally, right? There's not a lot of people but, you know, it's fun to be one of those people who've worked on this for about 10 years. Obviously, I look forward to working on this for another 10 or 15 more, but it's been an amazing journey. >> Excellent. Well, we thank you again for sharing time with us on theCUBE. You've been watching theCUBE live on day 2 of the DataWorks Summit, hashtag DWS17. For my co-host George Gilbert, I am Lisa Martin. Stick around, we've got great content coming your way.

Published Date : Jun 14 2017

SUMMARY :

Brought to you by Hortonworks. We are live at day 2 of the DataWorks Summit, and Rob said, you know, one of the interesting and we're starting to show them, you know, when you can't standardize what you're or the storage engine, or, you know, some non-Hortonworks, you know, services when, you know, we work with, you know, And if the customer, you know, Yes exactly, when we think of, you know, Similarly now as we go into, you know, Data Lakes 1.0, 2.0, 3.0, where you are now, with advance of technologies like, you know, And the libraries, you know, George wants One of the themes that occurred in the Keynote this morning There's not a lot of people but, you know, Well, we thank you again for sharing time again

ENTITIES

Entity	Category	Confidence
George Gilbert	PERSON	0.99+
Lisa Martin	PERSON	0.99+
IBM	ORGANIZATION	0.99+
Rob	PERSON	0.99+
Hortonworks	ORGANIZATION	0.99+
Rob Thomas	PERSON	0.99+
George	PERSON	0.99+
Lisa	PERSON	0.99+
30%	QUANTITY	0.99+
San Jose	LOCATION	0.99+
Microsoft	ORGANIZATION	0.99+
Amazon	ORGANIZATION	0.99+
25 machines	QUANTITY	0.99+
10 operating systems	QUANTITY	0.99+
hundreds	QUANTITY	0.99+
Arun Murthy	PERSON	0.99+
Silicon Valley	LOCATION	0.99+
two	QUANTITY	0.99+
Aetna	ORGANIZATION	0.99+
10 years	QUANTITY	0.99+
Arun	PERSON	0.99+
today	DATE	0.99+
Spark	TITLE	0.99+
yesterday	DATE	0.99+
AWS	ORGANIZATION	0.99+
both	QUANTITY	0.99+
Python	TITLE	0.99+
last year	DATE	0.99+
Four years ago	DATE	0.99+
15	QUANTITY	0.99+
tomorrow	DATE	0.99+
CUBE	ORGANIZATION	0.99+
three	QUANTITY	0.99+
DataWorks Summit	EVENT	0.99+
seven databases	QUANTITY	0.98+
four	QUANTITY	0.98+
DataWorks Summit 2017	EVENT	0.98+
United States	LOCATION	0.98+
Dataworks Summit	EVENT	0.98+
10	QUANTITY	0.98+
Europe	LOCATION	0.97+
10 companies	QUANTITY	0.97+
One	QUANTITY	0.97+
one customer	QUANTITY	0.97+
thousands of machines	QUANTITY	0.97+
about 10 years	QUANTITY	0.96+
GDPR	TITLE	0.96+
Docker	TITLE	0.96+
Smartsense	ORGANIZATION	0.96+
about 12 years	QUANTITY	0.95+
this morning	DATE	0.95+
each	QUANTITY	0.95+
two different versions	QUANTITY	0.95+
five turns	QUANTITY	0.94+
R	TITLE	0.93+
four meta-trains	QUANTITY	0.92+
day 2	QUANTITY	0.92+
Data Lakes 1.0	COMMERCIAL_ITEM	0.92+
Flink	ORGANIZATION	0.91+
first	QUANTITY	0.91+
HDP	ORGANIZATION	0.91+

Jamie Engesser, Hortonworks & Madhu Kochar, IBM - DataWorks Summit 2017

>> Narrator: Live from San Jose, in the heart of Silicon Valley, it's theCUBE. Covering DataWorks Summit 2017, brought to you by Hortonworks. (digitalized music) >> Welcome back to theCUBE. We are live at day one of the DataWorks Summit, in the heart of Silicon Valley. I'm Lisa Martin with theCUBE, with my co-host George Gilbert. We're very excited to be joined by our next two guests. We're going to be talking about a lot of the passion and the energy that came from the keynote this morning and some big announcements. Please welcome Madhu Kochar, VP of analytics and product development and client success at IBM, and Jamie Engesser, VP of product management at Hortonworks. Welcome guys! >> Thank you. >> Glad to be here. >> First time on theCUBE, George and I are thrilled to have you. So, in the last six to eight months, doing my research, there have been announcements between IBM and Hortonworks. You guys have been partners for a very long time, with announcements on technology partnerships with servers and storage, and presumably all of that gives Hortonworks, Jamie, a great opportunity to tap into IBM's enterprise install base. But boy, today? Socks blown off with this big announcement between IBM and Hortonworks. Jamie, kind of walk us through that, or sorry, Madhu, I'm going to ask you first. Walk us through this announcement today. What does it mean for the IBM-Hortonworks partnership? >> Oh my God, what an exciting, exciting day, right? We've been working towards this one, so three main things come out of the announcement today. First is really the adoption by Hortonworks of IBM's data science and machine learning. As you heard in the announcement, we brought machine learning to our mainframe, where the most trusted data is. Now bringing that to open source, big data on Hadoop, great, right? Amazing. 
Number two is obviously the whole aspect around our Big SQL, which is bringing complex-query analytics, where it brings all the data together from all the various sources, and making that available on HDP and Hadoop, with Hortonworks really adopting that. Amazing announcement. Number three, what we gain out of this hugely, obviously from an IBM perspective, is the whole platform. We've been on this journey together with Hortonworks since 2015 with ODPi, and we've all been champions in the open source, delivering a lot of that. As we start to look at it, it makes sense to merge that as a platform, and give our clients what's most needed out there, as we take our journey towards machine learning, AI, and enhancing the enterprise data warehousing strategy. >> Awesome. Jamie, from your perspective on the product management side, what is this? What's the impact and the potential downstream implications for Hortonworks? >> I think there's two things. I think Hortonworks has always been very committed to the open source community. I think with Hortonworks and IBM partnering on this, number one is it brings a much bigger community to bear, to really push innovation on top of Hadoop. That innovation is going to come through the community, and I think that partnership drives two of the biggest contributors to the community to do more together. So I think that's number one, the community interest. The second thing is when you look at Hadoop adoption, we're seeing that people want to get more and more value out of Hadoop adoption, and they want to access more and more data sets, to, number one, get more and more value. We're seeing the data science platform become really fundamental to that. They're also seeing the extension to say, not only do I need data science to get and add new insights, but I need to aggregate more data. 
So we're also seeing the notion of, how do I use Big SQL on top of Hadoop, and then federate data from my mainframe, which has got some very valuable data on it, DB2 instances, and the rest of the data repositories out there. So now we get a better federation model, to allow our customers to access more of the data that they can make better business decisions on, and they can use data science on top of that to get new learnings from that data. >> Let me build on that. Let's say that I'm a Telco customer, and the two of you come together to me and say, we don't want to talk to you about Hadoop. We want to talk to you about solving a problem where you've got data in applications and many places, including inaccessible stuff. You have a limited number of data scientists, and the problem of cleaning all the data. And even if you build models, there's the challenge of integrating them with operational applications. So what do the two of you tell me, the Telco customer? >> Yeah, so maybe I'll go first. So the Telco, the main use case or the main application, as I've been talking to many of the largest Telco companies here in the U.S. and even outside of the U.S., is all about their churn rate. They want to know when the calls are dropping, why are they dropping, why are the clients going to the competition and such? There's so much data. The data is just streaming and they want to understand that. I think if you bring the Data Science Experience and machine learning to that data, then, as I said, it doesn't matter now where the data resides. Hadoop, mainframes, wherever, we can bring that data. You can do a transformation of that, clean up the data. The quality of the data is there so that you can start feeding that data into the models, and that's when the models learn. The more data there is, the better it is, so they train, and then you can really drive the insights out of it. Now data science, the framework which is available, it's like a team sport. 
You can bring in many other data scientists into the organization who could have different analyst reports to render or provide results into. So being a team sport, being a collaboration, bringing that together with that clean data, I think it's going to change the world. I think the business side can have instant value from the data they're going to see. >> Let me just test the edge conditions on that. Some of that data is streaming and you might apply the analytics in real time. Some of it is, I think as you were telling us before, sort of locked up as dark data. The question is how much of that data, the streaming stuff and the dark data, how much do you have to land in a Hadoop repository versus how much do you just push the analytics out to and have it inform a decision? >> Maybe I can take a first thought on it. I think there's a couple of things in that. There's the learnings, and then how do I execute the learnings? I think the first step of it is, I tend to land the data, and going to the Telecom churn model, I want to see all the touch points. So I want to see the person that came through the website, he went into the store, he called in to us, so I need to aggregate all that data to get a better view of what's the chain of steps that happened for somebody to churn. Once I end up diagnosing that, I go through the data science of that, to learn the models that are being executed on that data, and that's the data at rest. What I want to do is build the model out so that now I can take that model, and I can prescriptively run it in the stream of data. So I know that that customer just hung up the phone, now he walked into the store, and we can sense that he's in the store because we just registered that he's asking about his billing details. The system can now dynamically diagnose from those two activities that this is a high churn risk, so notify that teller in the store that there's a chance of him rolling out. 
If you look at that, that required the machine learning and data science side to build the analytical model, and it required the data-flow management and streaming analytics to consume that model to make a real-time insight out of it, to ultimately stop the churn from happening. Let's just give the customer a discount at the end of the day. That type of stuff; so you need to marry those two. >> It's interesting, you articulated that very clearly. Although then the question I have is now not on the technical side, but on the go-to-market side. You guys have to work very, very closely, and this is calling at a level that I assume is not very normal for Hortonworks, and it's something that is a natural sales motion for IBM. >> So maybe I'll speak up first, and then I'll let you add some color to that. When I look at it, I think there's a lot of natural synergies. IBM and Hortonworks have been partnered since day one. We've always continued on that path. If you look at it, and I'll bring up community again and open source again, but we've worked very well in the community. I think that's incubated and fostered a really strong relationship. I think at the end of the day we both look at what's going to be the outcome for the customer and work back from that, and we tend to really engage at that level. So what's the outcome, and then how do we make a better product to get to that outcome? So I think there are a lot of natural synergies in that. I think to your point, there's lots of pieces that we need to integrate better together, and we will do that jointly over time. I think we're already starting with the Data Science Experience. A bunch of integration touchpoints there. I think you're going to see it in the information governance space, with Atlas being a key underpinning and Information Governance Catalog on top of that, ultimately moving up to IBM's unified governance. We'll start getting more synergies there as well, and on the Big SQL side. 
I think when you look at the different pods, there's a lot of synergies that our customers will be driving, and that's one of the driving factors, along with the fact that the organizations are very well aligned. >> And as a VP of engineering, there's a lot of integration points which were already identified, and Big SQL is already working really well on the Hortonworks HDP platform. We've got good integration going, but I think more and more on the data science. I think at the end of the day we end up talking to very similar clients, so going with a joint go-to-market strategy, it's a win-win. Jamie and I were talking earlier. I think in this type of a partnership, our community is winning and our clients are winning, so really good solutions. >> And that's what it's all about. Speaking of clients, you gave a great example with Telco. We were talking to Rob Thomas and Rob Bearden earlier on in the program today. They talked about how the data science conversation is at the C-suite, so walk us through an example, whether it's a Telco or maybe a healthcare organization, what is that conversation that you're having? How is a Telco helping foster what was announced today and this partnership? >> Madhu: Do you want to take it? >> Maybe I'll start. When we look at a Telco, I think there's a natural evolution, and we start looking at that problem of how does a Telco consume and operate data science at a larger scale? So at the C-suite it becomes a people-process discussion. There's not a lot of tools currently that really help the people and process side of it. It's kind of an artisan capability today in the data science space. What we're trying to do is, I think I mentioned team sport, but also give the tooling to say there's step one, which is we need to start learning and training the right teams and the right approach. Step two is start giving them access to the right data, etcetera, to work through that. 
And step three, giving them all the tooling to support that, and tooling becomes things like TensorFlow etcetera, things like Zeppelin, Jupyter, a bunch of the capabilities evolved by the open source community. So first, learning and training. The second step in that is give them the access to the right data to consume it, and then third, give them the right tooling. I think those three things are helping us to drive the right capabilities out of it. But to your point, elevating up to the C-suite, it's really that they think people and process, and I think giving them the right tooling for their people and the right processes gets them there. Moving data science from an art to a science is, I would argue, at the top level. >> On the client success side, how instrumental though are your clients, like maybe on the Telco side, in actually fostering the development of the technology, or helping IBM make the decision to standardize on HDP as their big data platform? >> Oh, huge, huge. A lot of our clients, especially as they are looking at big data, many of them are actually helping us get committers into the code. They're adding, providing; we can't move fast enough in engineering. They are coming up and saying, "Hey, we're going to help and code up and do some code development with you." They've been really pushing our limits. A lot of clients I ended up working with on the Hadoop side, you know, for example, my entire information integration suite is very much running on top of HDP today. So they are saying, OK, what's next? We want to see better integration. So I called a few clients yesterday saying, "Hey, under embargo, this is something that's going to get announced." Amazing, amazing results, and they're just very excited about this. So we are starting to get a lot of push, and actually from the clients who do have large development communities as well. Like a lot of banks today, they write a lot of their own applications. 
We're starting to see them co-developing stuff with us and becoming committers. >> Lisa: You have a question? >> Well, if I just were to jump in. How do you see over time the mix of apps starting to move from completely custom developed, sort of the way the original big data applications were all written, down to the metal in MapReduce? For shops that don't have a lot of data scientists, how are we going to see applications become more self-service, more pre-packaged? >> So maybe I'll give a little bit of perspective. Right now I think IBM has got really good synergies on what I'll call vertical solutions for vertical organizations, financial, etcetera. I would say Hortonworks has taken a more horizontal approach. We're more of a platform solution. An example of one where it's kind of marrying the two is if you move up the stack from Hortonworks as a platform to the next level up, which is Hortonworks as a solution. One of the examples that we've invested heavily in is cybersecurity, in an Apache project called Metron. It's less about Metron and more about cybersecurity. People want to solve a problem. They want to defend against an attacker immediately, and what that means is we need to give them out-of-the-box models to detect a lot of common patterns. What we're doing there is we're investing in some of the data science and pre-packaged models to identify attack vectors and then try to resolve that, or at least notify you that there's a concern. It's an example of the data science behind it, pre-packaging that data science to solve a specific problem. That's in the cybersecurity space, and that case happens to be horizontal, where Hortonworks' strength is. I think in the IBM case, there are a lot more vertical apps that we can apply it to. Fraud, adjudication, etcetera. >> So it sounds like we're really just hitting the tip of the iceberg here with the potential. 
We want to thank you both for joining us on theCUBE today, sharing your excitement about this deepening, expanding partnership between Hortonworks and IBM. Madhu and Jamie, thank you so much for joining George and I today on theCUBE. >> Thank you. >> Thank you Lisa and George. >> Appreciate it. >> Thank you. >> And for my co-host George Gilbert, I am Lisa Martin. You're watching us live on theCUBE, from day one of the DataWorks Summit in Silicon Valley. Stick around, we'll be right back. (digitalized music)

Published Date : Jun 14 2017

SUMMARY :

brought to you by Hortonworks. that came from the keynote this morning So, in the last six to eight months doing my research, of the biggest contributors to the community and the two of you come together to me and say, from the data they going to see. and you might apply the analytics in real time. and data science side to build the analytical model, and it's something that is a natural sales motion for IBM. and on the big sequel side. I think in end of the day we end up talking They talked about the data science conversation is of the open source community evolved capabilities. and actually the clients who do have sort of the way the original big data applications of the data science and pre-packaged models of the iceberg here, with the potential. from day one of the DataWorks Summit in Silicon Valley.

Scott Gnau, Hortonworks & Tendü Yogurtçu, Syncsort - DataWorks Summit 2017

>> Man's Voiceover: Live, from San Jose, in the heart of Silicon Valley, it's theCUBE, covering DataWorks Summit 2017, brought to you by Hortonworks. (upbeat music) >> Welcome back to theCUBE, we are live at Day One of the DataWorks Summit, we've had a great day here, I'm surprised that we still have our voices left. I'm Lisa Martin, with my co-host George Gilbert. We have been talking with great innovators today across this great community, folks from Hortonworks, of course, IBM, partners, now I'd like to welcome back to theCUBE, who was here this morning in the green shoes, the CTO of Hortonworks, Scott Gnau, welcome back Scott! >> Great to be here yet again. >> Yet again! And we have another CTO, we've got CTO corner over here, with CUBE Alumni and the CTO of Syncsort, Tendu Yogurtcu. Welcome back to theCUBE, both of you. >> Pleasure to be here, thank you. >> So, guys, what's new with the partnership? I know that Syncsort, you have 87%, or 87, of the Fortune 100 companies as customers. Scott, 60 of the Fortune 100 companies are customers of Hortonworks. Talk to us about the partnership that you have with Syncsort, what's new, what's going on there? >> You know there's always something new in our partnership. We launched our partnership, what, a year and a half ago or so? >> Yes. >> And it was really built on the foundation of helping our customers get time to value very quickly, right, and leveraging our mutual strengths. And we've been back on theCUBE a couple of times and we continue to have new things to talk about, whether it be new customer successes or new feature functionalities or new integration of our technology. And so it's not just something that's static and sitting still, but it's a partnership that has had a great foundation in value and continues to grow. 
And, you know, with some of the latest moves that I'm sure Tendu will bring us up to speed on that Syncsort has made, customers who have jumped on the bandwagon with us together are able to get much more benefit than originally they even intended. >> Let me talk about some of the things actually happening with Syncsort and with the partnership. Thank you Scott. The Trillium acquisition has been transformative for us really. We have achieved quite a lot within the last six months. Delivering joint solutions between our data integration, DMX-h, and Trillium data quality and profiling portfolio, and that was kind of our first step, very much focused on the data governance. We are going to have a data quality for Data Lake product available later this year, and this week actually we will be announcing our partnership with the Collibra data governance platform, basically making business rules and technical metadata available through the Collibra dashboards for data scientists. And in terms of our joint solution and joint offering for data warehouse optimization and the bundle that we launched early February of this year, that's in production, large, complex production deployments have already happened. Our customers access all their data, all enterprise data, including legacy data warehouse, new data sources, as well as legacy mainframe, in the data lake, so we will be announcing again in a week or so change data capture capabilities from legacy data stores into Hadoop, keeping that data fresh and giving more choices to our customers in terms of populating the data lake, as well as use cases like archiving data into cloud. >> Tendu, let me try and unpack what was a very dense, in a good way, lot of content. Sticking my foot in my mouth every 30 seconds (laughter) >> Scott Voiceover: I think he called you dense. 
(laughter) >> So help us visualize a scenario where you have maybe DMX-h bringing data in, you might have change data capture coming from a live database >> Tendu Voiceover: Yes. >> and you've got the data quality at work as well. Help us picture how much faster and higher fidelity the data flow might be relative to-- >> Sure, absolutely. So, our bundle and our joint solution with Hortonworks really focuses on business use cases. And one of those use cases is enterprise data warehouse optimization, where we make all data, all enterprise data, accessible in the data lake. Now, if you are an insurance company managing claims, or you are building a data as a service, Hadoop as a service architecture, there are multiple ways that you can keep that data fresh in the data lake. And you can have change data capture by basically taking snapshots of the data and comparing in the data lake, which is a viable method of doing it. But as the data volumes are growing and the real time analytics requirements of the business are growing, we recognize our customers are also looking for alternative ways that they can actually capture the change in real time, when the change is just like less than 10% of the data, original data set, and keep the data fresh in the data lake. So that enables faster analytics, real time analytics, as well as in the case that if you are doing something from on-premise to the cloud or archiving data, it also saves on the resources like the network bandwidth and overall resource efficiency. Now, while we are doing this, obviously we are accessing the data and the data goes through our processing engines. What Trillium brings to the table is the unmatched capabilities that are on profiling that data, getting better understanding of that data. 
So we will be focused on delivering products around that, because as we understand data we can also help our customers to create the business rules, to cleanse that data, and preserve the fidelity of the data and integrity of the data. >> So, with the change data capture it sounds like near real time, you're capturing changes in near real time, could that serve as a streaming solution that then is also populating the history as well? >> Absolutely. We can go through streaming or message queues. We also offer more efficient proprietary ways of streaming the data to Hadoop. >> So, I assume the message queues refers to, probably, Kafka, and then your own optimized solution for sort of maximum performance, lowest latency. >> Yes, we can go either through Kafka queues, which is very efficient as well. We can also go through proprietary methods. >> So, Scott, help us understand then now the governance capabilities that, um, I'm having a senior moment (laughter) I'm getting too many of these! (laughter) Help us understand the governance capabilities that Syncsort's adding to the, sort of, mix with the data warehouse optimization package and how it relates to what you're doing. >> Yeah, right. So what we talked about even again this morning, right, the whole notion of the value of open squared, right, open source and open ecosystem. And I think this is clearly an open ecosystem kind of play. So we've done a lot of work since we initially launched the partnership and through the different product releases, where our engineering teams and the Syncsort teams have done some very good low-level integration of our mutual technologies so that the Syncsort tool can exploit those horizontal core services like YARN for multi-tenancy and workload management and of course Atlas for data governance. So as then the Syncsort team adds feature functionality on the outside of that tool, that simply accretes to the benefit of what we've built together. 
And so that's why I say customers who started down this journey with us together are now going to get the benefit of additional options from that ecosystem, that they can plug in additional feature functionality. And at the same time we're really thrilled because, and we've talked about this many times, right, the whole notion of governance and metadata management in the big data space is a big deal. And so the fact that we're able to come to the table with an open source solution to create common metadata tagging that then gets utilized by multiple different applications, I think, creates extreme value for the industry and frankly for our customers, because now, regardless of the application they choose, or the applications that they choose, they can at least have that common trusted infrastructure where all of that information is tagged and it stays with the data through the data's life cycle. >> So your partnership sounds very, very symbiotic, that there's changes made on one side that reflect the other. Give us an example of where is your common customer, and this might not be, well, they're all over the place, who has got an enterprise data warehouse, are you finding more customers that are looking to modernize this? That have multi-cloud, core, edge, IoT devices, that's a pretty distributed environment, versus customers that might be still more on prem? What's kind of the mix there? >> Can I start and then I will let you build on. I want to add something to what Scott said earlier. Atlas is a very important integration point for us, and in terms of the partnership that you mentioned, the relation, I think one of the strengths of our partnership is at many different levels, it's not just executive level, it's cross functional and also from very close field teams, marketing teams and engineering field teams working together. And in terms of our customers, it's really organizations are trying to move toward modern data architecture. 
And as they are trying to build the modern data architecture, there are the data in motion piece I will let Scott talk about, data at rest piece, and as we have so much data coming from cloud, originating through mobile and web, in the enterprise, especially the Fortune 500, Fortune 100 we talked about, insurance, health care, telco, financial services and banking have a lot of legacy data stores. So really our joint solution and the couple of first use cases, business use cases we targeted were around that. How do we enable these data stores and data in the modern data architecture? I will let Scott-- >> Yeah, I agree. And so certainly we have a lot of customers already who are joint customers, and so they can get the value of the partnership kind of 'cause they've already made the right decision, right. I also think, though, there's a lot of greenfield opportunity for us because there are hundreds if not thousands of customers out there who have legacy data systems where their data is kind of locked away. And by the way, it's not to say the systems aren't functioning and doing a good job, they are. They're running business facing applications and all of that's really great, but that is a source of raw material that belongs also in the data lake, right, and can certainly enhance the value of all the other data that's being built there. And so the value, frankly, of our partnership is really creating that easy bridge to kind of unlock that data from those legacy systems and get it in the data lake, and then from there, the sky's the limit, right. Is it reference data that can then be used for consistency of response when you're joining it to social data and web data? Frankly, is it an online archive, and optimization of the overall data fabric and offloading some of the historical data that may not even be used in legacy systems and having a place to put it where it actually can be accessed. And so, there are a lot of great use cases. 
You're right, it's a very symbiotic relationship. I think there's only upside because we really do complement each other and there is a distinct value proposition not just for our existing customers but frankly for a large set of customers out there that have, kind of, the data locked away. >> So, how do you see the data warehouse optimization sort of solution set continuing to expand its functional footprint? What are some things to keep pushing out the edge conditions, the realm of possibilities? >> Some of the areas that we are jointly focused on is we are liberating that data from the enterprise data warehouse or legacy architectures. Through the Syncsort DMX-h we actually understand the path that data traveled from, the metadata is something that we can now integrate into Atlas and publish into Atlas and have Atlas as the open data governance solution. So that's an area that definitely we see an opportunity to grow and also strengthen that joint solution. >> Sure, I mean extended provenance is kind of what you're describing, and that's a big deal when you think about some of these legacy systems where frankly 90% of the costs of implementing them originally was actually building out those business rules and that metadata. And so being able to preserve that and bring it over into a common or an open platform is a really big deal. I'd say inside of the platform, of course, as we continue to create new performance advantages in, you know, the latest releases of Hive as an example, where we can get low latency query response times, there's a whole new class of workloads that now is appropriate to move into this platform, and you'll see us continue to move along those lines as we advance the technology from the open community. >> Well, congratulations on continuing this great, symbiotic as we said, partnership. It sounds like it's incredibly strong on the technology side, on the strategic side, on the GTM side. 
I loved how you said liberating data so that companies can really unlock its transformational value. We want to thank both of you. Scott, for coming back on theCUBE-- >> Thank you. >> --twice in one day. >> Twice in one day. >> Tendu, thank you as well-- >> Thank you. >> --for coming back to theCUBE. >> Always a pleasure. >> For both of our CTOs that have joined us from Hortonworks and Syncsort and my co-host George Gilbert, I am Lisa Martin, you've been watching theCUBE live from day one of the DataWorks Summit. Stick around, we've got great guests coming up. (upbeat music)

Published Date : Jun 13 2017

Scott Gnau, Hortonworks - DataWorks Summit 2017

>> Announcer: Live, from San Jose, in the heart of Silicon Valley, it's theCUBE, covering DataWorks Summit 2017. Brought to you by Hortonworks. >> Welcome back to theCUBE. We are live at DataWorks Summit 2017. I'm Lisa Martin with my cohost, George Gilbert. We've just come from this energetic, laser light show infused keynote, and we're very excited to be joined by one of the keynotes today, the CTO of Hortonworks, Scott Gnau. Scott, welcome back to theCUBE. >> Great to be here, thanks for having me. >> Great to have you back here. One of the things that you talked about in your keynote today was collaboration. You talked about the modern data architecture and one of the things that I thought was really interesting is that now where Hortonworks is, you are empowering cross-functional teams, operations managers, business analysts, data scientists, really helping enterprises drive the next generation of value creation. Tell us a little bit about that. >> Right, great. Thanks for noticing, by the way. I think the next, the important thing, kind of as a natural evolution for us as a company and as a community is, and I've seen this time and again in the tech industry, we've kind of moved from really cool breakthrough tech, more into a solutions base. So I think this whole notion is really about how we're making that natural transition. And when you think about all the cool technology and all the breakthrough algorithms and all that, that's really great, but how do we then take that and turn it to value really quickly and in a repeatable fashion. So, the notion that I launched today is really making these three personas really successful. If you can focus, combining all of the technology, usability and even some services around it, to make each of those folks more successful in their job. So I've broken it down really into three categories. We know the traditional business analyst, right? 
They've got SQL and they've been doing predictive modeling of structured data for a very long time, and there's a lot of value generated from that. Making the business analyst successful in a Hadoop-inspired world is extremely valuable. And why is that? Well, it's because Hadoop actually now brings a lot more breadth of data and frankly a lot more depth of data than they've ever had access to before. But being able to communicate with that business analyst in a language they understand, SQL, being able to make all those tools work seamlessly, is the next extension of success for the business analyst. We spent a lot of time this morning talking about data scientists, the next great frontier where you bring together lots and lots and lots and lots of data, for instance, Skin and Math and Heavy Compute, with the data scientists and really enable them to go build out that next generation of high definition kind of analytics, all right, and we're all, certainly I am, captured by the notion of self-driving cars, and you think about a self-driving car, and the success of that is purely based on the successful data science. In those cameras and those machines being able to infer images more accurately than a human being, and then make decisions about what those images mean. That's all data science, and it's all about raw processing power and lots and lots and lots of data to make those models trained and more accurate than what would otherwise happen. So enabling the data scientist to be successful, obviously, that's a use case. You know, certainly voice activated, voice response kinds of systems, for better customer service; better fraud detection, you know, the cost of a false positive is a hundred times the cost of missing a fraudulent behavior, right? That's because you've irritated a really good customer. So being able to really train those models in high definition is extremely valuable. 
So bringing together not just the data, but the tool set so that data scientists can actually act as a team and collaborate and spend less of their time finding the data, and more of their time providing the models. And I said this morning, last but not least, the operations manager. This is really, really, really important. And a lot of times, especially geeks like myself, are just, ah, operations guys are just a pain in the neck. Really, really, really important. We've got data that we've never thought of. Making sure that it's secured properly, making sure that we're managing within the regulations of privacy requirements, making sure that we're governing it and making sure how that data is used, alongside our corporate mission, is really important. So creating that tool set so that the operations manager can be confident in turning these massive files of data over to the business analyst and to the data scientist and be confident that the company's mission, the regulation that they're working within in those jurisdictions are all in compliance. And so that's what we're building on, and that stack, of course, is built on open source Apache Atlas and open source Apache Ranger and it really makes for an enterprise grade experience. >> And a couple things to follow on to that, we've heard of this notion for years, that there is a shortage of data scientists, and now, it's such a core strategic enabler of business transformation. Is this collaboration, this team support that was talked about earlier, is this helping to spread data science across these personas to enable more of them to be data scientists? >> Yeah, I think there are two aspects to it, right? One is certainly really great data scientists are hard to find; they're scarce. They're unique creatures. And so, to the extent that we're able to combine the tool set to make the data scientists that we have more productive, and I think the numbers are astronomical, right? 
You could argue that, with the wrong tool set, a data scientist might spend 80% or 90% of his or her time just finding the data and only 10% working on the problem. If we can flip that around and make it 10% finding the data and 90%, that's like an order of magnitude more breadth of data science coverage that we get from the same pool of data scientists, so I think that from an efficiency perspective, that's really huge. The second thing, though, is that by looking at these personas and the tools that we're rolling out, can we start to package up things that the data scientists are learning and move those models into the business analyst's desktop. So, now, not only is there more breadth and depth of data, but frankly, there's more depth and breadth of models that can be run, but inferred with traditional business process, which means, turning that into better decision making, turning that into better value for the business, just kind of happens automatically. So, you're leveraging the value of data scientists. >> Let me follow that up, Scott. So, right now the biggest time sink for the data scientist or the data engineer is data cleansing and transformation. Where do the cloud vendors fit in in terms of having trained some very broad horizontal models in terms of vision, natural language understanding, text to speech, so where they have accumulated a lot of data assets, and then they created models that were trained and could be customized. Do you see a role for, not just next-gen UI related models coming from the cloud vendors, but for other vendors who have data assets to provide more fully baked models so that you don't have to start from scratch? >> Absolutely. So, one of the things that I talked about also this morning is this notion, and I said it this morning, kind of open squared, open community, open source, and open ecosystem, I think it's now open to the third power, right, and it's talking about open models and algorithms. 
And I think all of those same things are really creating a tremendous opportunity, the likes of which we've not seen before, and I think it's really driving the velocity in the market, right, so there's no, because we're collaborating in the open, things just get done faster and more efficiently, whether it be in the core open source stuff or whether it be in the open ecosystem, being able to pull tools in. Of course, the announcement earlier today, with IBM's Data Science Experience software as a framework for the data scientists to work as a team, but that thing in and of itself is also very open. You can plug in Python, you can plug in open source models and libraries, some of which were developed in the cloud and published externally. So, it's all about continued availability of open collaboration that is the hallmark of this wave of technology. >> Okay, so we have this issue of how much can we improve the productivity with better tools or with some amount of data. But then, the part that everyone's also pointing out, besides the cloud experience, is also the ability to operationalize the models and get them into production either in bespoke apps or packaged apps. How's that going to sort of play out over time? >> Well, I think two things you'll see. One, certainly in the near term, again, with our collaboration with IBM and the Data Science Experience. One of the key things there is not only, not just making the data scientists be able to be more collaborative, but also the ease with which they can publish their models out into the wild. And so, kind of closing that loop to action is really important. I think, longer term, what you're going to see, and I gave a hint of this a little bit in my keynote this morning, is, I believe in five years, we'll be talking about scalability, but scalability won't be the way we think of it today, right? Oh, I have this many petabytes under management, or, petabytes. That's upkeep. 
But truly, scalability is going to be how many connected devices do you have interacting, and how many analytics can you actually push from model perspective, actually out to the sensor or out to the device to run locally. Why is that important? Think about it as a consumer with a mobile device. The time of interaction, your attention span, do you get an offer in the right time, and is that offer relevant. It can't be rules based, it has to be models based. There's no time for the electrons to move from your device across a power grid, run an analytic and have it come back. It's going to happen locally. So scalability, I believe, is going to be determined in terms of the CPU cycles and the total interconnected IoT network that you're working in. What does that mean from your original question? That means applications have to be portable, models have to be portable so that they can execute out to the edge where it's required. And so that's, obviously, part of the key technology that we're working with in Hortonworks DataFlow and the combination of Apache NiFi and Apache Kafka and Storm to really combine that, "How do I manage, not only data in motion, but ultimately, how do I move applications and analytics to the data and not be required to move the data to the analytics?" >> So, question for you. You talked about real time offers, for example. We talk a lot about predictive analytics, advanced analytics, data wrangling. What are your thoughts on preemptive analytics? >> Well, I think that, while that sounds a little bit spooky, because we're kind of mind reading, I think those things can start to exist. Certainly because we now have access to all of the data and we have very sophisticated data science models that allow us to understand and predict behavior, yeah, the timing of real time analytics or real time offer delivery, could actually, from our human being perception, arrive before I thought about it. And isn't that really cool in a way. 
I'm thinking about, I need to go do X,Y,Z. Here's a relevant offer, boom. So it's no longer, I clicked here, I clicked here, I clicked here, and in five seconds I get a relevant offer, but before I even thought to click, I got a relevant offer. And again, to the extent that it's relevant, it's not spooky. >> Right. >> If it's irrelevant, then you deal with all of the other downstream impact. So that, again, points to more and more and more data and more and more and more accurate and sophisticated models to make sure that that relevance exists. >> Exactly. Well, Scott Gnau, CTO of Hortonworks, thank you so much for stopping by theCUBE once again. We appreciate your conversation and insights. And for George Gilbert, I am Lisa Martin. You're watching theCUBE live, from day one of the DataWorks Summit in the heart of Silicon Valley. Stick around, though, we'll be right back.

Published Date : Jun 13 2017

John Kreisa, Hortonworks - DataWorks Summit Europe 2017 #DWS17 #theCUBE

>> Announcer: Live from Munich, Germany, it's theCUBE, covering DataWorks Summit Europe 2017. Brought to you by HORTONWORKS. (electronic music) (crowd) >> Okay, welcome back everyone, we are here live in Munich, Germany, for DataWorks 2017, formerly Hadoop Summit, the European version. Again, different kind of show than the main show in North America, in San Jose, but it's a great show, a lot of great topics. I'm John Furrier, my co-host, Dave Vellante. Our next guest is John Kreisa, Vice President of International Marketing. Great to see you emceeing the event. Great job, great event! >> John Kreisa: Great. >> Classic European event, it's got the European vibe. >> Yep. >> Germany everything's tightly buttoned down, very professional. (laughing) But big IOT message-- >> Yes. >> Because in Germany a lot of industrial action-- >> That's right. >> And then Europe, in general, a lot of smart cities, a lot of mobility, and issues. >> Umm-hmm. >> So a lot of IOT, a lot of meat on the bone here. >> Yep. >> So congratulations! >> John Kreisa: Thank you. >> How's your thoughts? Are you happy with the event? Give us by the numbers, how many people, what's the focus? >> Sure, yeah, no, thanks, John, Dave. Long-time CUBE attendee, I'm really excited to be here. Always great to have you guys here-- >> Thanks. >> Thanks. >> And be participating. This is a great event this year. We did change the name as you mentioned from Hadoop Summit to DataWorks Summit. Perhaps, I'll just riff on that a little bit. I think that really was in response to the change in the community, the breadth of technologies. You mentioned IOT, machine learning, and AI, which we had some of in the keynotes. So just a real expansion from data loading, data streaming, analytics, and machine learning and artificial intelligence, which all sit on top and use the core Hadoop platform. We felt like it was time to expand the conference itself. 
Open up the aperture to really bring in the other technologies that were involved, and really represent what was already starting to kind of feed into Hadoop Summit, so it's kind of a natural change, a natural evolution. >> And there's a 2-year visibility. We talked about this two years ago. >> John Kreisa: Yeah, yeah. >> That you are starting to see this aperture open up a little bit. >> Yeah. >> But it's interesting. I want to get your thoughts on this because Dave and I were talking yesterday. It's like we've been to every single Hadoop Summit. Even theCUBE's been following it all as you know. It's interesting the big data space was created by the Hadoop ecosystem. >> Umm-hmm. >> So, yeah, you rode in on the Hadoop horse. >> Yeah. >> I get that. A lot of people don't get that. They say, Oh, Hadoop's dead, but it's not. >> No. >> It's evolving to a much broader scope. >> That's right. >> And you guys saw that two years ago. Comment on your reaction to Hadoop is not dead. >> Yeah, wow (laughing). It's far from dead if you look at the momentum, largest conference ever here in Europe. I think strong interest from them. I think we had a very good customer panel, which talked about the usage, right. How they were really transforming. You had Walgreens Boots talking about how they're redoing their shelf, shelving, and how they're redesigning their stores. Danske Bank talking about how they're analyzing, how they replenish their cash machines. Centrica talking about how they redo their... Or how they're driving down cost of energy by being smarter around energy consumption. So, these are real transformative use cases, and so, it's far from dead. Really what might be confusing people is probably the fact that there are so many other technologies and markets that are being enabled by these open source technologies and the breadth of the platform. And I think that's maybe people see it kind of move a little bit back as a platform play. 
And so, we talk more about streaming and analytics and machine learning, but all that's enabled by Hadoop. It's all riding on top of this platform. And I think people kind of just misconstrue the fact that there's one enabling-- >> It's a fundamental element, obviously. >> John Kreisa: Yeah. >> But what's the new expansion? IOT, as I mentioned, is big here. >> Umm-hmm. >> But there's a lot more in connective tissue going on, as Shaun Connolly calls it. >> Yeah, yep. >> What are those other things? >> Yeah, so I think, as you said, smart cities, smart devices, the analytics, getting the value out of the technologies. The ability to load it and capture it in new ways with new open source technology, NiFi and some of those other things, Kafka we've heard of. And some of those technologies are enabling the broader use cases, so I don't think it's... I think that's really the fundamental change and shift that we see. It's why we renamed it to DataWorks Summit because it's all about the data, right. That's the thing-- >> But I think... Well, if you think about from a customer perspective, to me anyway, what's happened is we went through the adolescent phase of getting this stuff to work and-- >> Yeah. >> And figuring out, Okay, what's the relationship with my enterprise data warehouse, and then they realize, Wow, the enterprise data warehouse is critical to my big data platform. >> Umm-hmm. >> So what customers have done as they've evolved, as Hadoop has evolved, their big data platforms internally-- >> Umm-hmm. >> And now they're turning to their business saying, Okay, we have this platform. Let's now really start to go up the steep part of the S-curve and get more value out of it. >> John Kreisa: Umm-hmm. >> Do you agree with that scenario? >> I would definitely agree with that. I think that as companies have, and in particular here in Europe, it's interesting because they kind of waited for the technology to mature and it's reached that inflection point. 
To your point, Dave, such that they're really saying, Alright, let's really get this into production. Let's really drive value out of the data that they see and know they have. And there's sort of... We see a sense of urgency here in Europe, to get going and really start to get that value out. >> Yeah, and we call it a ratchet game. (laughing) The ratchet is, Okay, you get the technology to work. Okay, you still got to keep the lights on. Okay, and oh, by the way, we need some data governance. Let's ratchet it up that side. Oh, we need a CDO! >> Umm-hmm. >> And so, because if you just try to ratchet up one side of the house (laughing) (cross-talk)-- >> Well, Carlo from HPE said it great on our last segment. >> Yeah. >> And I thought this was fundamental. And this was kind of like you had a CUBE moment where it's like, Wow, that's a really amazing insight. And he said something profound, The data is now foundational to all conversations. >> Right. >> And that's from a business standpoint. It's not always been the case. Now, it's like, Okay, you can look at data as a fundamental foundation building block. >> Right. >> And then react from there. So if you get the data locked in, to Dave's point about compliance, you can then do clever things. You can have a conversation about a dynamic edge or-- >> Right. >> Something else. So the foundational data is really now fundamental, and I think that is... Changes, it's not a database issue. It's just all data. >> Right, now all data-- >> All databases. >> You're right, it's all data. It's driving the business in all different functions. It's operational efficiency. It's new applications. It's customer intimacy. All of those different ways that all these companies are going, We've got this data. We now have the systems, and we can go ahead and move forward with it. 
And I think that's the momentum that we're seeing here in Europe, as evidenced by the conference and those kinds of things, just I think really shows how maybe... We used to say... I'd say when I first moved over here, that Europe was maybe a year and a half behind the U.S., in terms of adoption. I'd say that's shrunk to where a lot of the conversations are the exact same conversations that we're having with big European companies, that we're having with U.S. companies. >> And, even in... >> Yeah. >> Like we were just talking to Carlo, He was like, Well, and Europe is ahead in things like certain IOT-- >> Yeah. >> And Industrial IOT. >> Yeah. >> Yeah. >> Even IOT analytics. Some of the... Tesla notwithstanding some of the automated vehicles. >> John Kreisa: Correct. >> Autonomous vehicles activity that's going on. >> John Kreisa: That's right. >> Certainly with Daimler and others. So there's an advancement. It almost reminds me of the early days of mobile, so... (laughing) >> It's actually, it's a good point. If you look at... Squint through some of the perspectives, it depends on where you are in the room and what your view is. You could argue there are many things that Europe is advanced on and where we're behind. If you look at Amazon Web Services, for instance. >> Umm-hmm. >> They are clearly running as fast as they can to deploy regions. >> Umm-hmm. >> So the scoop's coming out now. I'm hearing buzz that there's another region coming out. >> Right. >> From Amazon soon (laughing). They can't go fast enough. Google is putting out regions again. >> Right. >> Data centers are now pushing global, yet, there's more industrial here than there is there. So it's interesting perspective. It depends on how you look at it! >> Yeah, yeah, no, I think it's... And it's perfectly fair to say there are many places where it's more advanced. I think in this technology and open source technologies, in general, are helping drive some of those and enable some of those trends. >> Yeah. 
>> Because if you have the sensors, you need a place to store and analyze that data whether it's smart cars or smart cities, or energy, smart energy, all those different places. That's really where we are. >> What's different in the international theater that you're involved in because you've been on both sides. >> Yep. >> As you came from the U.S. then when we first met. What's different out here now? And I see the gaps closing? What other things are notable that you could share? >> Yeah, yeah, so I'd say, we still see customers in the U.S. that are still very much wanting to use the shiniest, new thing, like the very latest version of Spark or the very latest version of NiFi or some other technologies. They want to push and use that latest version. In Europe, now the conversations are slightly different, in terms of understanding the security and governance. I think there's a lot more consciousness, if you will, around data here. There's other rules and regulations that are coming into place. And I think they're a little bit more advanced in how they think of-- >> Yeah. >> Data, personal data, how to be treated, and so, consequently, those are where the conversations are about the platform. How do we secure it? How does it get governed? So that you need regulations-- >> John Furrier: It's not as fast, as loose as the U.S. >> Yeah, it's not as fast. And you look and see some of the regulations. (laughing) My wife asked me if we should set up a VPN on our home WiFi because of this new rule about being able to sell the personal data. I've said, Well, we're not in the U.S., but perhaps, when we move to the U.S. >> In order to get the right to block chain (laughing). (cross-talk) >> Yeah, absolutely (cross-talk). >> John Furrier: Encrypt everything. >> (laughing) Yeah, exactly. >> Well, another topic is... Let's talk about the ecosystem a little bit. >> Umm-hmm. 
>> You've got now some additional public brethren, obviously Cloudera's, there's been a lot of talk here about-- >> Umm-hmm. >> Talend and Alteryx have gone public. >> Yeah. >> The ecosystem you've evolved that. IBM was up on stage with you guys. >> Yeah, yep. >> So that continues to be-- >> Gallium C. >> Can we talk about that a little bit? >> Gallium C >> Gallium C. >> We had a great... Partners are great. We've always been about the ecosystem. We were talking about before we came on-screen that for us it's not a Barney partnership. They're very much of substance, engineering to try to drive value for the customers. It's where we see that value in that joint value. So IBM is working with us across all of the DataWorks Summit, but, even in all of the engineering work that we're doing, participated in HDP 2.6 announcement that we just did. And I'm sure what you covered with Shaun and others, but those partnerships really help drive value for the customer. >> Umm-hmm. >> For us, it's all making sure the customer is successful. And to make a complete solution, it is a range of products, right. It is whether it's data warehousing, servers, networks, all of the different analytics, right. There's not one product that is the complete solution. It does take a stack, a multitude of technologies, to make somebody successful. >> Cloudera's S-1 was filed, that's been part of the conversation, and we've been digging into it, it's great to see the numbers. >> Umm-hmm. >> Anything surprise you in the S-1? Any advice you'd give to open source companies looking to go public because, as Dave pointed out, there's a string now of comrades in arms, if you will, MuleSoft, that's doing very well. >> Yeah, yeah. >> And Alteryx just went public. >> Yeah. >> You guys have been public for a long time. You guys been operating the public open-- >> Yeah. >> Both open source, pure open source. But also on the public markets. You guys have experience. You got some scar tissue. 
>> John Kreisa: (laughing) Yeah, yeah. >> What's your advice to Cloudera or others that are... Because there certainly will be a rush for more public companies. >> Yeah. >> It's a fantastic trend. >> I think it is a fantastic trend. I completely agree. And I think that it shows the strength of the market. It shows both the big data market, in general, the analytics market, kind of all the different components that are represented in some of those IPOs or planned IPOs. I think that for us, we're always driving for success of the customer, and I think any of the open source companies, they have to look at their business plan and take it step-wise in approach, that keeps an eye on making the customer successful because that's ultimately what's going to drive the company success and drive revenue for it and continue to do it. But we welcome as many companies as possible to come into the public market because A: it just allows everybody to operate in an open and honest way, in terms of comparison and understanding how growth is. But B: it shows that strength of how open source and related technologies can help-- >> Yeah. >> Drive things forward. >> And it's good for the customer, too, because now they can compare-- >> Yes! >> Apples to apples-- >> Exactly. >> vis-à-vis Cloudera, and what's interesting is that they had such a head start on you guys, HORTONWORKS, but the numbers are almost identical. >> Umm-hmm, yeah. >> Really close. >> Yeah, I think it's indicative of the opportunity that they're now coming out and there's rumors of other companies coming out. And I think it just gives that visibility. We welcome it, absolutely-- >> Yeah. >> To show because we're very proud of our performance and now our growth. And I think that's something that we stand behind and stand on top of. And we want to see others come out and show what they got. >> Let's talk about events, if we can? >> Yeah. >> We were there at the first Hadoop Summit in San Jose. 
Thrilled to be-- >> John Kreisa: In a few years. >> In Dublin last year. >> Yeah. >> So what's the event strategy? I love going into the local flavor. >> Umm-hmm. >> Last year we had the Irish singers. This year we had a great (laughing) local band. >> John Kreisa: (laughing) Yeah, yeah, yeah. >> So I don't know if you've announced where next year's going to be? Maybe you can share with us some of the roll-out strategies? >> Yeah, so first of all, DataWorks Summit is a great event as you guys know, And you guys are long-time participants, so it's a great partnership. We're moving them international, of course, we did a couple... We are already international, but moving a couple to Asia last year so-- >> Right. >> Those were a tremendous success, we actually exceeded our targets, in terms of how many people we thought would go. >> Dave: Where did you do those? >> We were in Melbourne and Tokyo. >> Dave: That's right, yeah. >> Yeah, so in both places great community, kind of rushed to the event and kind of understanding, really showed that there is truly a global kind of data community around Hadoop and other related technologies. So from here as you guys know because you're going to be there, we're thinking about San Jose and really wanting to make sure that's a great event. It's already stacking up to be tremendous, call for papers is all done. And all that's announced so, even the sessions we're really starting to build for that, We'll be later this year. We'll be in Sydney, so we're going to have to take DataWorks into Sydney, Australia, in September. So throughout the rest of this year, there's going to be continued building momentum and just really global participation in this community, which is great. >> Yeah. >> Yeah. >> Yeah, it's fantastic. >> Yeah, Sydney should be great. >> Yeah. >> Looking forward to it. We're going to expand theCUBE down under. Dave and I are excited-- >> Dave: Yeah, let's talk about that. >> We got a lot of interest (laughing). >> Alright. 
>> John, great to have you-- >> Come on down. >> On theCUBE again. Great to see you. Congratulations, I'm going to see you up on stage. >> Thank you. >> Doing the emcee. Great show, a lot of great presenters and great customer testimonials. And as always the sessions are packed. And good learning, great community. >> Yeah. >> Congratulations on your ecosystem. This is theCUBE broadcasting live from Munich, Germany for DataWorks 2017, presented by HORTONWORKS and Yahoo. I'm John Furrier with Dave Vellante. Stay with us, great interviews on day two still up. Stay with us. (electronic music)

Published Date : Apr 6 2017


ENTITIES

Entity	Category	Confidence
Dave Vellante	PERSON	0.99+
Dave	PERSON	0.99+
IBM	ORGANIZATION	0.99+
Europe	LOCATION	0.99+
John Kreisa	PERSON	0.99+
John Furrier	PERSON	0.99+
Carlo	PERSON	0.99+
Sydney	LOCATION	0.99+
Asia	LOCATION	0.99+
Shaun Connolly	PERSON	0.99+
2-year	QUANTITY	0.99+
San Jose	LOCATION	0.99+
Amazon	ORGANIZATION	0.99+
Tokyo	LOCATION	0.99+
Dublin	LOCATION	0.99+
Melbourne	LOCATION	0.99+
San Jose	LOCATION	0.99+
North America	LOCATION	0.99+
John	PERSON	0.99+
Last year	DATE	0.99+
U.S.	LOCATION	0.99+
Daimler	ORGANIZATION	0.99+
Germany	LOCATION	0.99+
Google	ORGANIZATION	0.99+
Amazon Web Services	ORGANIZATION	0.99+
September	DATE	0.99+
Centrica	ORGANIZATION	0.99+
Tesla	ORGANIZATION	0.99+
Yahoo	ORGANIZATION	0.99+
last year	DATE	0.99+
Walgreens Boots	ORGANIZATION	0.99+
Both	QUANTITY	0.99+
both sides	QUANTITY	0.99+
HORTONWORKS	ORGANIZATION	0.99+
This year	DATE	0.99+
S-1	TITLE	0.99+
yesterday	DATE	0.98+
next year	DATE	0.98+
Munich, Germany	LOCATION	0.98+
Shaun	PERSON	0.98+
HPE	ORGANIZATION	0.98+
Hadoop Summit	EVENT	0.98+
both	QUANTITY	0.98+
two years ago	DATE	0.98+
a year and a half	QUANTITY	0.98+
one product	QUANTITY	0.98+
DataWorks 2017	EVENT	0.98+
this year	DATE	0.98+
Sydney, Australia	LOCATION	0.97+
DataWorks Summit	EVENT	0.97+
Apples	ORGANIZATION	0.97+
day two	QUANTITY	0.97+
Hortonworks	ORGANIZATION	0.97+
Gallium C	ORGANIZATION	0.96+
Gallium C.	ORGANIZATION	0.96+

Shaun Connolly, Hortonworks - DataWorks Summit Europe 2017 - #DW17 - #theCUBE

>> Announcer: Covering DataWorks Summit Europe 2017 brought to you by Hortonworks. >> Welcome back everyone. Live here in Munich, Germany for theCUBE's special presentation of Hortonworks Hadoop Summit now called DataWorks 2017. I'm John Furrier, my co-host Dave Vellante, our next guest is Shaun Connolly, Vice President of Corporate Strategy, Chief Strategy Officer. Shaun great to see you again. >> Thanks for having me guys. Always a pleasure. >> Super exciting. Obviously we're always pontificating on the status of Hadoop and Hadoop is dead, long live Hadoop, but rumors of its demise are greatly over-exaggerated, but the reality is that there are no major shifts in the trends other than the fact that the amplification with AI and machine learning has upleveled the narrative to mainstream around data, big data has been written on gen one on Hadoop, DevOps, culture, open-source. Starting with Hadoop you guys certainly have been way out in front of all the trends. How you guys have been rolling out the products. But it's now with IoT and AI as that sizzle, the future self driving cars, smart cities, you're starting to really see demand for comprehensive solutions that involve data-centric thinking. Okay, said one. Two, open-source continues to dominate MuleSoft went public, you guys went public years ago, Cloudera filed their S-1. A crop of public companies that are open-source, haven't seen that since Red Hat. >> Exactly. 99 is when Red Hat went public. >> Data-centric, big megatrend with open-source powering it, you couldn't be happier for the stars lining up. >> Yeah, well we definitely placed our bets on that. We went public in 2014 and it's nice to see that graduating class of Talend and MuleSoft, Cloudera coming out. That just I think helps socialize the movement that enterprise open-source, whether it's for on-prem or powering cloud solutions pushed out to the edge, and technologies that are relevant in IoT. That's the wave. 
We had a panel earlier today where Dahl Jeppe from Centrica British Gas, was talking about his ... The digitization of energy and virtual power plant notions. He can't achieve that without open-source powering and fueling that. >> And the thing about it is is just kind of ... For me personally being my age in this generation of computer industry since I was 19, to see the open-source go mainstream the way it is, it even gets better every time, but it really is the thousand flowers bloom strategy. Throwing the seeds out there of innovation. I want to ask you as a strategy question, you guys from a performance standpoint, I would say kind of got hammered in the public market. Cloudera's valuation privately is 4.1 billion, you guys are close to 700 million. Certainly Cloudera's going to get a haircut looks like. The public market is based on the multiples from Dave and I's intro, but there's so much value being created. Where's the value for you guys as you look at the horizon? You're talking about white spaces that are really developing with use cases that are creating value. The practitioners in the field creating value, real value for customers. >> So you covered some of the trends, but I'll translate 'em into how the customers are deploying. Cloud computing and IoT are somewhat related. One is a centralization, the other is decentralization, so it actually calls for a connected data architecture as we refer to it. We're working with a variety of IoT-related use cases. Coca-Cola East Japan spoke at Tokyo Summit about beverage replenishment analytics. Getting vending machine analytics from vending machines even on Mount Fuji. And optimizing their flow-through of inventory in just-in-time delivery. That's an IoT-related story that runs on Azure. It's a cloud-related story and it's a big data analytics story that's actually driving better margins for the business and actually better revenues because they're getting the inventory where it needs to be so people can buy it. 
Those are really interesting use cases that we're seeing being deployed and it's at this convergence of IoT cloud and big data. Ultimately that leads to AI, but I think that's what we're seeing the rise of. >> Can you help us understand that sort of value chain. You've got the edge, you got the cloud, you need something in-between, you're calling it connected data platform. How do you guys participate in that value chain? >> When we went public our primary workhorse platform was Hortonworks Data Platform. We had first class cloud services with Azure HDInsight and Hortonworks Data Cloud for AWS, curated cloud services pay-as-you-go, and Hortonworks DataFlow, I call it our connective tissue, it manages all of your data motion, it's a data logistics platform, it's like FedEx for data delivery. It goes all the way out to the edge. There's a little component called MiNiFi, mini and NiFi, which does secure intelligent analytics at the edge and transmission. These smart manufacturing lines, you're gathering the data, you're doing analytics on the manufacturing lines, and then you're bringing the historical stuff into the data center where you can do historical analytics across manufacturing lines. Those are the use cases that are connect the data archives-- >> Dave: A subset of that data comes back, right? >> A subset of the data, yep. The key events of that data it may not be full of-- >> 10%, half, 90%? >> It depends if you have operational events that you want to store, sometimes you may want to bring full fidelity of that data so you can do ... As you manufacture stuff and when it got deployed and you're seeing issues in the field, like Western Digital hard drives that fail in the field, they want that data full fidelity to connect the data architecture and analytics around that data. You need to ... One of the terms I use is in the new world, you need to play it where it lies. If it's out at the edge, you need to play it there. 
If it makes a stop in the cloud, you need to play it there. If it comes into the data center, you also need to play it there. >> So a couple years ago, you and I were doing a panel at our Big Data NYC event and I used the term "profitless prosperity," I got the hairy eyeball from you, but nonetheless, we talked about you guys as a steward of the industry, you have to invest in open-source projects. And it's expensive. I mean HDFS itself, YARN, Tez, you guys lead a lot of those initiatives. >> Shaun: With the community, yeah, but we-- >> With the community yeah, but you provided contributions and co-leadership let's say. You're there at the front of the pack. How do we project it forward without making forward-looking statements, but how does this industry become a cashflow positive industry? >> We've been a public company since the end of 2014. The markets turned at the beginning of 2016; prior to that, high growth with some losses was palatable, then losses were not palatable. That hit us, Splunk, Tableau, most of the IT sector. That's just the nature of the public markets. As more public open-source, data-driven companies will come in I think it will better educate the market of the value. There's only so much I can do to control the stock price. What I can do from a business perspective is hit key measures from a path to profitability. The end of Q4 2016, we hit what we call the adjusted EBITDA breakeven, which is a stepping stone. On our earnings call at the end of 2016 we ended with 185 million in revenue for the year. Only five years into this journey, so that's a hard revenue growth pace and we basically stated in Q3 or Q4 of '17, we will hit operating cashflow neutrality. So we are operating business-- >> John: But you guys also hit a 100 million at record pace too, I believe. >> Yeah, in four years. So revenue is one thing, but operating margins, like if you look at our margins on our subscription business for instance, we've got 84% margin on that. 
It's a really nice margin business. We can make that better margins, but that's a software margin. >> You know what's ironic, we were talking about Red Hat off camera. Here's Red Hat kicking butt, really hitting all cylinders, three billion dollars in bookings, one would think, okay hey I can maybe project forth some of these open-source companies. Maybe the flip side of this, oh wow we want it now. To your point, the market kind of flipped, but you would think that Red Hat is an indicator of how an open-source model can work. >> By the way Red Hat went public in 99, so it was a different trajectory, like you know I charted their trajectory out. Oracle's trajectory was different. Even in inflation-adjusted dollars they didn't hit 100 million in four years, I think it was seven or eight years or what have you. Salesforce did it in five. So these SaaS models and these subscription models and the cloud services, which is an area that's near and dear to my heart. >> John: Goes faster. >> You get multiple revenue streams across different products. We're a multi-product cloud service company. Not just a single platform. >> So we were actually teasing this out on our-- >> And that's how you grow the business, and that's how Red Hat did it. >> Well I want to get your thoughts on this while we're just kind of ripping live here because Dave and I were talking on our intro segment about the business model and how there's some camouflage out there, at least from my standpoint. One of the main areas that I was kind of pointing at and trying to poke at and want to get your reaction to is in the classic enterprise go-to-market, you have an expensive sales force, you guys pay handsomely for that today. Incubating that market, getting the profitability for it is a good thing, but there's also channels, VARs, ISVs, and so on. You guys have an open-source channel that's kind of not a VAR or an ISV, these are entrepreneurs and/or businesses themselves. 
There's got to be a monetization shift there for you guys in the subscription business certainly. When you look at these partners, they're co-developing, they're in open-source, you can almost see the dots connecting. Is this new ecosystem, there's always been an ecosystem, but now that you have kind of a monetization inherently in a pure open distribution model. >> It forces you to collaborate. IBM was on stage talking about our system certified on the Power Systems. Many may look at IBM as competitive, we view them as a partner. Amazon, some may view them as a competitor with us, they've been a great partner in our for AWS. So it forces you to think about how do you collaborate around deeply engineered systems and value and we get great revenue streams that are pulled through that they can sell into the market to their ecosystems. >> How do you vision monetizing the partners? Let's just say Dave and I start this epic idea and we create some connective tissue with your orchestrator called the Data Platform you have and we start making some serious bang. We make a billion dollars. Do you get paid on that if it's open-source? I mean would we be more subscriptions? I'm trying to see how the tide comes in, whose boats float on the rising tide of the innovation in these white spaces. >> Platform thinking is you provide the platform. You provide the platform for 10x value that rides atop that platform. That's how the model works. So if you're riding atop the platform, I expect you and that ecosystem to drive at least 10x above and beyond what I would make as a platform provider in that space. >> So you expect some contributions? >> That's how it works. You need a thousand flowers to be running on the platform. >> You saw that with VMware. They hit 10x and ultimately got to 15 or 16, 17x. >> Shaun: Exactly. >> I think they don't talk about it anymore. I think it's probably trading the other way. >> You know my days at JBoss Red Hat it was somewhere between 15 to 20x. 
That was the value that was created on top of the platforms. >> What about the ... I want to ask you about the forking of the Hadoop distros. I mean there was a time when everybody was announcing Hadoop distros. John Furrier announced SiliconANGLE was announcing a Hadoop distro. So we saw consolidation, and then you guys announced the ODP, then the ODPI initiative, but there seems to be a bit of a forking in Hadoop distros. Is that a fair statement? Unfair? >> I think if you look at how the Linux market played out. You had clearly Red Hat, you had Canonical Ubuntu, you had SUSE. You're always going to have curated platforms for different purposes. We have a strong opinion and a strong focus in the area of IoT, fast analytic data from the edge, and a centralized platform with HDP in the cloud and on-prem. Others in the market, Cloudera is running sort of a different play where they're curating different elements and investing in different elements. Doesn't make either one bad or good, we are just going after the markets slightly differently. The other point I'll make there is in 2014 if you looked at the Venn diagrams, there was a lot of overlap. Now if you draw the areas of focus, there's a lot of white space that we're going after that they aren't going after, and they're going after other places and other new vendors are going after others. With the market dynamics of IoT, cloud and AI, you're going to see folks chase the market opportunities. >> Is that dispersity not a problem for customers now or is it challenging? >> There has to be a core level of interoperability and that's one of the reasons why we're collaborating with folks in the ODPI, as an example. There's still, when it comes to some of the core components, there has to be a level of predictability, because if you're an ISV riding atop, you're slowed down by death by infinite certification and choices. So ultimately it has to come down to just a much more sane approach to what you can rely on.
>> When you guys announced ODP, then ODPI, the extension, Mike Olson wrote a blog saying it's not necessary, people came out against it. Now we're three years in looking back. Was he right or not? >> I think the ODPI takeaway this year is there's more that we can do above and beyond the Hadoop platform. It's expanded to include SQL and other things recently, so there's been some movement on this spec, but frankly you talk to John Mertic at ODPI, you talk to SAS and others, I think we want to be a bit more aggressive in the areas that we go after and try and drive there from a standardization perspective. >> We had Wei Wang on earlier-- >> Shaun: There's more we can do and there's more we should do. >> We had Wei on with Microsoft at our Big Data SV event a couple weeks ago. Talk about the Microsoft relationship with you guys. It seems to be doing very well. Comments on that. >> Microsoft was one of the two companies we chose to partner with early on, so in 2011, 2012 Microsoft and Teradata were the two. Microsoft was how do I democratize and make this technology easy for people. That's manifested itself as Azure Cloud Service, Azure HDInsight-- >> Which is growing like crazy. >> Which is globally deployed and we just had another update. It's fundamentally changed our engineering and delivery model. This latest release was a cloud first delivery model, so one of the things that we're proud of is the interactive SQL and the LLAP technology that's in HDP, that went out through Azure HDInsight and Hortonworks Data Cloud first. Then it certified in HDP 2.6 and it went to Power at the same time. It's that cadence of delivery and cloud first delivery model. We couldn't do it without a partnership with Microsoft. I think we've really learned what it takes-- >> If you look at Microsoft at that time. I remember interviewing you on theCUBE. Microsoft was trading something like $26 a share at that time, around their low point. Now the stock is performing really well.
Satya Nadella, very cloud oriented-- >> Shaun: They're very open-source. >> They're very open-source friendly, they've been donating a lot to the OCP, to the data center piece. Extremely different Microsoft, so you slipped into that beautiful spot, reacted on that growth. >> I think as one of the stalwarts of enterprise software providers, I think they've done a really great job of bending the curve towards cloud and still having a mixed portfolio, but incenting a field, and incenting a channel, and selling cloud and growing that revenue stream, that's nontrivial, that's hard. >> They know the enterprise sales motions too. I want to ask you how that's going overall within Hortonworks. What are some of the conversations that you're involved in with customers today? Again we were saying in our opening segment, it's on YouTube if you're not watching, but the customer is the forcing function right now. They're really putting the pressure on the suppliers, you're one of them, to get tight, reduce friction, lower costs of ownership, get into the cloud, flywheel. And so you see a lot-- >> I'll throw in another aspect: from some of the more late majority adopters traditionally, over and over I hear that by 2025 they want to power down the data center and have more things running in the public cloud, if not most everything. That's another eight years or what have you, so it's still a journey, but this journey to making that an imperative because of the operational, because of the agility, because of better predictability, ease of use. That's fundamental. >> As you get into the connective tissue, I love that example, with Kubernetes containers, you've got developers, a big open-source participant and you got all the stuff you have, you just start to see some coalescing around the cloud native. How do you guys look at that conversation?
>> I view container platforms, whether they're container services that are running one on cloud or what have you, as the new lightweight rail that everything will ride atop. The cloud currently plays a key role in that, I think that's going to be the de facto way. Particularly if you go with cloud first models, particularly for delivery. You need that packaging notion and you need the agility of updates that that's going to provide. I think Red Hat as a partner has been doing great things on hardening that, making it secure. There's others in the ecosystem as well as the cloud providers. All three cloud providers actually are investing in it. >> John: So it's good for your business? >> It removes friction of deployment ... And I ride atop that new rail. It can't get here soon enough from my perspective. >> So I want to ask about clouds. You were talking about the Microsoft shift, personally I think Microsoft realized holy cow, we could actually make a lot of money if we're selling hardware services. We can make more money if we're selling the full stack. It was sort of an epiphany and so Amazon seems to be doing the same thing. You mentioned earlier you know Amazon is a great partner, even though a lot of people look at them as a competitor, it seems like Amazon, Azure etc., they're building out their own big data stack and offering it as a service. People say that's a threat to you guys, is it a threat or is it a tailwind, or is it, it is what it is? >> This is why I bring up industry-wide we always have waves of centralization, decentralization. They're playing out simultaneously right now with cloud and IoT. The fact of the matter is that you're going to have multiple clouds, on-prem data, and data at the edge. That's the problem I am looking to facilitate and solve.
I don't view them as competitors, I view them as partners because we need to collaborate because there's a value chain of the flow of the data and some of it's going to be running through and on those platforms. >> The cloud's not going to solve the edge problem. Too expensive. It's just physics. >> So I think that's where things need to go. I think that's why we talk about this notion of connected data. I don't talk hybrid cloud computing, that's for compute. I talk about how do you connect to your data, how do you know where your data is and are you getting the right value out of the data by playing it where it lies. >> I think IoT has been a great sweet trend for the big data industry. It really accelerates the value proposition of the cloud too because now you have a connected network, you can have your cake and eat it too. Central and distributed. >> There's different dynamics in the US versus Europe, as an example. US definitely we're seeing a cloud adoption that's independent of IoT. Here in Europe, I would argue the smart mobility initiatives, the smart manufacturing initiatives, and the connected grid initiatives are bringing cloud in, so it's IoT and cloud and that's opening up the cloud opportunity here. >> Interesting. So on a prospects for Hortonworks cashflow positive Q4 you guys have made a public statement, any other thoughts you want to share. >> Just continue to grow the business, focus on these customer use cases, get them to talk about them at things like DataWorks Summit, and then the more the merrier, the more data-oriented open-source driven companies that can graduate in the public markets, I think is awesome. I think it will just help the industry. >> Operating in the open, with full transparency-- >> Shaun: On the business and the code. (laughter) >> Welcome to the party baby. This is theCUBE here at DataWorks 2017 in Munich, Germany. Live coverage, I'm John Furrier with Dave Vellante. Stay with us. 
More great coverage coming after this short break. (upbeat music)

Published Date : Apr 5 2017

ENTITIES

Entity	Category	Confidence
Dave	PERSON	0.99+
Dave Vellante	PERSON	0.99+
John	PERSON	0.99+
Europe	LOCATION	0.99+
Amazon	ORGANIZATION	0.99+
2014	DATE	0.99+
John Furrier	PERSON	0.99+
Microsoft	ORGANIZATION	0.99+
John Mertic	PERSON	0.99+
Mike Olson	PERSON	0.99+
Shaun	PERSON	0.99+
IBM	ORGANIZATION	0.99+
Shaun Connolly	PERSON	0.99+
Centric	ORGANIZATION	0.99+
Teradata	ORGANIZATION	0.99+
Oracle	ORGANIZATION	0.99+
Coca-Cola	ORGANIZATION	0.99+
John Furrier	PERSON	0.99+
2016	DATE	0.99+
4.1 billion	QUANTITY	0.99+
Cloudera	ORGANIZATION	0.99+
AWS	ORGANIZATION	0.99+
90%	QUANTITY	0.99+
two	QUANTITY	0.99+
100 million	QUANTITY	0.99+
five	QUANTITY	0.99+
2011	DATE	0.99+
Mount Fuji	LOCATION	0.99+
US	LOCATION	0.99+
seven	QUANTITY	0.99+
185 million	QUANTITY	0.99+
eight years	QUANTITY	0.99+
four years	QUANTITY	0.99+
10x	QUANTITY	0.99+
Dahl Jeppe	PERSON	0.99+
YouTube	ORGANIZATION	0.99+
FedEx	ORGANIZATION	0.99+
Hortonworks	ORGANIZATION	0.99+
100 million	QUANTITY	0.99+
one	QUANTITY	0.99+
MuleSoft	ORGANIZATION	0.99+
2025	DATE	0.99+
Red Hat	ORGANIZATION	0.99+
three years	QUANTITY	0.99+
15	QUANTITY	0.99+
two companies	QUANTITY	0.99+
2012	DATE	0.99+
Munich, Germany	LOCATION	0.98+
Hadoop	TITLE	0.98+
DataWorks 2017	EVENT	0.98+
Wei Wang	PERSON	0.98+
Wei	PERSON	0.98+
10%	QUANTITY	0.98+
eight years	QUANTITY	0.98+
20x	QUANTITY	0.98+
Hortonworks Hadoop Summit	EVENT	0.98+
end of 2016	DATE	0.98+
three billion dollars	QUANTITY	0.98+
SiliconANGLE	ORGANIZATION	0.98+
Azure	ORGANIZATION	0.98+
DataWorks Summit	EVENT	0.97+

Day Two Kickoff | DataWorks Summit 2018

>> Live from San Jose, in the heart of Silicon Valley, it's theCube. Covering DataWorks Summit 2018. Brought to you by Hortonworks. >> Welcome back to day two of theCube's live coverage of DataWorks here in San Jose, California. I'm your host, Rebecca Knight along with my co-host James Kobielus. James, it's great to be here with you in the hosting seat again. >> Day two, yes. >> Exactly. So here we are, this conference, 2,100 attendees from 32 countries, 23 industries. It's a relatively big show. They do three of them during the year. One of the things that I really-- >> It's a well-established show too. I think this is like the 11th year since Yahoo started up the first Hadoop summit in 2008. >> Right, right. >> So it's an established event, yeah go. >> Exactly, exactly. But I really want to talk about Hortonworks the company. This is something that you had brought up in an analyst report before the show started and that was talking about Hortonworks' cash flow positivity for the first time. >> Which is good. >> Which is good, which is a positive sign and yet what are the prospects for this company's financial health? We're still not seeing really clear signs of robust financial growth. >> I think the signs are good for the simple reason they're making significant investments now to prepare for the future that's almost inevitable. And the future that's almost inevitable, and when I say the future, the 2020s, the decade that's coming. Most of their customers will shift more of their workloads, maybe not entirely yet, to public cloud environments for everything they're doing, AI, machine learning, deep learning. And clearly the beneficiaries of that trend will be the public cloud providers, all of whom are Hortonworks' partners and established partners, AWS, Microsoft with Azure, Google with, you know, Google Cloud Platform, IBM with IBM Cloud. Hortonworks, and this is... 
You know, their partnerships with these cloud providers go back several years so it's not a new initiative for them. They've seen the writing on the wall practically from the start of Hortonworks' founding in 2011 and they now need to go deeper towards making their solution portfolio capable of being deployable on-prem, in cloud, public clouds, and in various and sundry funky combinations called hybrid multi-clouds. Okay, so, they've been making those investments in those partnerships and in public cloud enabling the Hortonworks Data Platform. Here at this show, DataWorks 2018 here in San Jose, they've released the latest major version, HDP 3.0 of their core platform with a lot of significant enhancements related to things that their customers are increasingly doing-- >> Well I want to ask you about those enhancements. >> But also they have partnership announcements, the deep ones of integration and, you know, lift and shift of the Hortonworks portfolio of HDP with Hortonworks DataFlow and DataPlane Services, so that those solutions can operate transparently on those public cloud environments as the customers, as and when the customers choose to shift their workloads. 'Cause Hortonworks really... You know, like Scott Gnau yesterday, I mean just laid it on the line, they know that the more of the public cloud workloads will predominate now in this space. They're just making these speculative investments that they absolutely have to now to prepare the way. So I think this cost that they're incurring now to prepare their entire portfolio for that inevitable future is the right thing to do and that's probably why they still have not attained massive rock and rollin' positive cash flow yet but I think that they're preparing the way for them to do so in the coming decade. >> So their financial future is looking brighter and they're doing the right things. >> Yeah, yes. >> So now let's talk tech. And this is really where you want to be, Jim, I know you. 
>> Oh I get sleep now and I don't think about tech constantly. >> So as you've said, they're really doing a lot of emphasis now on their public cloud partnerships. >> Yes. >> But they've also launched several new products and upgrades to existing products, what are you seeing that excites you and that you think really will be potential game changers? >> You know, this is geeky but this is important 'cause it's at the very heart of Hortonworks Data Platform 3.0, containerization of more... When you're a data scientist, and you're building a machine learning model using data that's maintained, and is persisted, and processed within Hortonworks Data Platform or any other big data platform, you want the ability increasingly for developing machine learning, deep learning, AI in general, to take that application you might build while you're using TensorFlow models, that you build on HDP, they will containerize it in Docker and, you know, orchestrate it all through Kubernetes and all that wonderful stuff, and deploy it out, those AI, out to increasingly edge computing, mobile computing, embedded computing environments where, you know, the real venture capital mania's happening, things like autonomous vehicles, and you know, drones, and you name it. So the fact is that Hortonworks has made that in many ways the premier new feature of HDP 3.0 announced here this week at the show. That very much harmonizes with what their partners, where their partners are going with containerization of AI. IBM, one of their premier partners, very recently, like last month, I think it was, announced the latest version of IBM, what do they call it, IBM Cloud Private, which has embedded as a core feature containerization within that environment which is a prem-based environment of AI and so forth. The fact that Hortonworks continues to maintain close alignment with the capabilities that its public cloud partners are building to their respective portfolios is important. 
But also Hortonworks with its, they call it, you know, a single pane of glass, the DataPlane Services for metadata and monitoring and governance and compliance across this sprawling hybrid multi-cloud, these scenarios. The fact that they're continuing to make, in fact, really focusing on deep investments in that portfolio, so that when an IBM introduces or, AWS, whoever, introduces some new feature in their respective platforms, Hortonworks has the ability to, as it were, abstract above and beyond all of that so that the customer, the developer, and the data administrator, all they need to do, if they're a Hortonworks customer, is stay within the DataPlane Services and environment to be able to deploy with harmonized metadata and harmonized policies, and harmonized schemas and so forth and so on, and query optimization across these sprawling environments. So Hortonworks, I think, knows where their bread is buttered and it needs to stay on the DPS, DataPlane Services, side which is why a couple months ago in Berlin, Hortonworks made a, I think, the most significant announcement of the year for them and really for the industry, was that they announced the Data Steward Studio in Berlin. Tech really clearly was who addressed the GDPR mandate that was coming up but really did a stewardship as an end-to-end workflow for lots of, you know, core enterprise applications, absolutely essential. Data Steward Studio is a DataPlane Service that can operate across multi-cloud environments. Hortonworks is going to keep on, you know... They didn't have a DPS, DataPlane Services, announcements here in San Jose this week but you can best believe that next year at this time at this show, and in the interim they'll probably have a number of significant announcements to deepen that portfolio. Once again it's to grease the wheels towards a more purely public cloud future in which there will be Hortonworks DNA inside most of their customers' environments going forward. 
>> I want to ask you about themes of this year's conference. The thing is is that you were in Berlin at the last big Hortonworks DataWorks Summit. >> (speaks in foreign language) >> And really GDPR dominated the conversations because the new rules and regulations hadn't yet taken effect and companies were sort of bracing for what life was going to be like under GDPR. Now the rules are here, they're here to stay, and companies are really grappling with it, trying to understand the changes and how they can exist in this new regime. What would you say are the biggest themes... We're still talking about GDPR, of course, but what would you say are the bigger themes that are this week's conference? Is it scalability, is it... I mean, what would you say we're going, what do you think has dominated the conversations here? >> Well scalability is not the big theme this week though there are significant scalability announcements this week in the context of HDP 3.0, the ability to persist in a scale-out fashion across multi-cloud, billions of files. Storage efficiency is an important piece of the overall announcement with support for erasure coding, blah blah blah. That's not, you know, that's... Already, Hortonworks, like all of their cloud providers and other big data providers, provide very scalable environments for storage, workload management. That was not the hugest, buzzy theme in terms of the announcements this week. The buzz of course was HDP 3.0. Containerization, that's important, but you know, we just came out of the day two keynote. AI is not a huge focus yet for a lot of the Hortonworks customers who are here, the developers. They're, you know, most of their customers are not yet that far along in their deep learning journeys and whatever but they're definitely going there. There's plenty of really cool keynote discussions including the guy with the autonomous vehicles or whatever that, the thing we just came out of. 
That was not the predominant theme this week here in terms of the HDP 3.0. I think what it comes down to is that with HDP 3.0... Hive, though you tend to take it for granted, it's been in Hadoop from the very start, practically, Hive is now a full enterprise database and that's the core, one of the cores, of HDP 3.0. Hive itself, Hive 3.0 now is its version, is ACID compliant and that may be totally geeky to most of the world but that enables it to support transactional applications. So more big data in every environment is supporting more traditional enterprise applications, transactional applications that require like two-phase commit and all that goodness. The fact is, you know, Hortonworks, from what I can see, is the first of the big data vendors to incorporate those enhancements to Hive 3.0 because they're so completely tuned in to the Hive environment in terms of a committer. I think in many ways that is the predominant theme in terms of the new stuff that will actually resonate with the developers, their customers here at the show. And with the, you know, enterprises in general, they can put more of their traditional enterprise application workloads on big data environments and specifically, Hortonworks hopes, its HDP 3.0. >> Well I'm excited to learn more here on theCube with you today. We've got a lot of great interviews lined up and a lot of interesting content. We got a great crew too so this is a fun show to do. >> Sure is. >> We will have more from day two of the.

Published Date : Jun 20 2018

ENTITIES

Entity	Category	Confidence
James Kobielus	PERSON	0.99+
Rebecca Knight	PERSON	0.99+
Hortonworks'	ORGANIZATION	0.99+
Hortonworks	ORGANIZATION	0.99+
2011	DATE	0.99+
Jim	PERSON	0.99+
IBM	ORGANIZATION	0.99+
Berlin	LOCATION	0.99+
AWS	ORGANIZATION	0.99+
San Jose	LOCATION	0.99+
Microsoft	ORGANIZATION	0.99+
Google	ORGANIZATION	0.99+
Silicon Valley	LOCATION	0.99+
James	PERSON	0.99+
23 industries	QUANTITY	0.99+
Yahoo	ORGANIZATION	0.99+
San Jose, California	LOCATION	0.99+
Hive 3.0	TITLE	0.99+
2020s	DATE	0.99+
next year	DATE	0.99+
this week	DATE	0.99+
32 countries	QUANTITY	0.99+
Hive	TITLE	0.99+
11th year	QUANTITY	0.99+
yesterday	DATE	0.99+
first time	QUANTITY	0.99+
GDPR	TITLE	0.98+
last month	DATE	0.98+
DataPlane Services	ORGANIZATION	0.98+
One	QUANTITY	0.98+
Scott Gnau	PERSON	0.98+
2008	DATE	0.98+
three	QUANTITY	0.98+
2,100 attendees	QUANTITY	0.98+
HDP 3.0	TITLE	0.98+
today	DATE	0.98+
Data Steward Studio	ORGANIZATION	0.98+
two-phase	QUANTITY	0.98+
one	QUANTITY	0.97+
DataWorks Summit 2018	EVENT	0.96+
DataPlane	ORGANIZATION	0.96+
Day two	QUANTITY	0.96+
billions of files	QUANTITY	0.95+
first	QUANTITY	0.95+
day two	QUANTITY	0.95+
DPS	ORGANIZATION	0.95+
Data Platform 3.0	TITLE	0.94+
Hortonworks DataWorks Summit	EVENT	0.94+
DataWorks	EVENT	0.92+

Rob Thomas, IBM Analytics | IBM Fast Track Your Data 2017

>> Announcer: Live from Munich, Germany, it's theCUBE. Covering IBM: Fast Track Your Data. Brought to you by IBM. >> Welcome, everybody, to Munich, Germany. This is Fast Track Your Data brought to you by IBM, and this is theCUBE, the leader in live tech coverage. We go out to the events, we extract the signal from the noise. My name is Dave Vellante, and I'm here with my co-host Jim Kobielus. Rob Thomas is here, he's the General Manager of IBM Analytics, and longtime CUBE guest, good to see you again, Rob. >> Hey, great to see you. Thanks for being here. >> Dave: You're welcome, thanks for having us. So we're talking about, we missed each other last week at the Hortonworks DataWorks Summit, but you came on theCUBE, you guys had the big announcement there. You're sort of getting out, doing a Hadoop distribution, right? TheCUBE gave up our Hadoop distributions several years ago so. It's good that you joined us. But, um, that's tongue-in-cheek. Talk about what's going on with Hortonworks. You guys are now going to be partnering with them essentially to replace BigInsights, you're going to continue to service those customers. But there's more than that. What's that announcement all about? >> We're really excited about that announcement, that relationship, just to kind of recap for those that didn't see it last week. We are making a huge partnership with Hortonworks, where we're bringing data science and machine learning to the Hadoop community. So IBM will be adopting HDP as our distribution, and that's what we will drive into the market from a Hadoop perspective. Hortonworks is adopting IBM Data Science Experience and IBM machine learning to be a core part of their Hadoop platform. And I'd say this is a recognition. One is, companies should do what they do best. We think we're great at data science and machine learning. Hortonworks is the best at Hadoop. Combine those two things, it'll be great for clients. 
And, we also talked about extending that to things like Big SQL, where they're partnering with us on Big SQL, around modernizing data environments. And then third, which relates a little bit to what we're here in Munich talking about, is governance, where we're partnering closely with them around unified governance, Apache Atlas, advancing Atlas in the enterprise. And so, it's a lot of dimensions to the relationship, but I can tell you since I was on theCUBE a week ago with Rob Bearden, client response has been amazing. Rob and I have done a number of client visits together, and clients see the value of unlocking insights in their Hadoop data, and they love this, which is great. >> Now, I mean, the Hadoop distro, I mean early on you got into that business, just, you had to do it. You had to be relevant, you want to be part of the community, and a number of folks did that. But it's really sort of best left to a few guys who want to do that, and Apache open source is really, I think, the way to go there. Let's talk about Munich. You guys chose this venue. There's a lot of talk about GDPR, you've got some announcements around unified governance, but why Munich? >> So, there's something interesting that I see happening in the market. So first of all, you look at the last five years. There's only 10 companies in the world that have outperformed the S&P 500, in each of those five years. And we started digging into who those companies are and what they do. They are all applying data science and machine learning at scale to drive their business. And so, something's happening in the market. That's what leaders are doing. And I look at what's happening in Europe, and I say, I don't see the European market being that aggressive yet around data science, machine learning, how you apply data for competitive advantage, so we wanted to come do this in Munich. And it's a bit of a wake-up call, almost, to say hey, this is what's happening.
We want to encourage clients across Europe to think about how do they start to do something now. >> Yeah, of course, GDPR is also a hook. The European Union and you guys have made some talk about that, you've got some keynotes today, and some breakout sessions that are discussing that, but talk about the two announcements that you guys made. There's one on DB2, there's another one around unified governance, what do those mean for clients? >> Yeah, sure, so first of all on GDPR, it's interesting to me, it's kind of the inverse of Y2K, which is there's very little hype, but there's huge ramifications. And Y2K was kind of the opposite. So look, it's coming, May 2018, clients have to be GDPR-compliant. And there's a misconception in the market that that only impacts companies in Europe. It actually impacts any company that does any type of business in Europe. So, it impacts everybody. So we are announcing a platform for unified governance that makes sure clients are GDPR-compliant. We've integrated software technology across analytics, IBM security, some of the assets from the Promontory acquisition that IBM did last year, and we are delivering the only platform for unified governance. And that's what clients need to be GDPR-compliant. The second piece is data has to become a lot simpler. As you think about my comment, who's leading the market today? Data's hard, and so we're trying to make data dramatically simpler. And so for example, with DB2, what we're announcing is you can download and get started using DB2 in 15 minutes or less, and anybody can do it. Even you can do it, Dave, which is amazing. >> Dave: (laughs) >> For the first time ever, you can-- >> We'll test that, Rob. >> Let's go test that. I would love to see you do it, because I guarantee you can. Even my son can do it. I had my son do it this weekend before I came here, because I wanted to see how simple it was. 
So that announcement is really about bringing, or introducing a new era of simplicity to data and analytics. We call it Download And Go. We started with SPSS, we did that back in March. Now we're bringing Download And Go to DB2, and to our governance catalog. So the idea is make data really simple for enterprises. >> You had a community edition previous to this, correct? There was-- >> Rob: We did, but it wasn't this easy. >> Wasn't this simple, okay. >> Not anybody could do it, and I want to make it so anybody can do it. >> Is simplicity, the rate of simplicity, the only differentiator of the latest edition, or I believe you have Kubernetes support now with this new addition, can you describe what that involves? >> Yeah, sure, so there's two main things that are new functionally-wise, Jim, to your point. So one is, look, we're big supporters of Kubernetes. And as we are helping clients build out private clouds, the best answer for that in our mind is Kubernetes, and so when we released Data Science Experience for Private Cloud earlier this quarter, that was on Kubernetes, extending that now to other parts of the portfolio. The other thing we're doing with DB2 is we're extending JSON support for DB2. So think of it as, you're working in a relational environment, now just through SQL you can integrate with non-relational environments, JSON, documents, any type of no-SQL environment. So we're finally bringing to fruition this idea of a data fabric, which is I can access all my data from a single interface, and that's pretty powerful for clients. >> Yeah, more cloud data development. Rob, I wonder if you can, we can go back to the machine learning, one of the core focuses of this particular event and the announcements you're making. Back in the fall, IBM made an announcement of Watson machine learning, for IBM Cloud, and World of Watson. In February, you made an announcement of IBM machine learning for the z platform. 
What are the machine learning announcements at this particular event, and can you sort of connect the dots in terms of where you're going, in terms of what sort of innovations are you driving into your machine learning portfolio going forward? >> I have a fundamental belief that machine learning is best when it's brought to the data. So, we started with, like you said, Watson machine learning on IBM Cloud, and then we said well, what's the next big corpus of data in the world? That's an easy answer, it's the mainframe, that's where all the world's transactional data sits, so we did that. Last week with the Hortonworks announcement, we said we're bringing machine learning to Hadoop, so we've kind of covered all the landscape of where data is. Now, the next step is about how do we bring a community into this? And the way that you do that is we don't dictate a language, we don't dictate a framework. So if you want to work with IBM on machine learning, or in Data Science Experience, you choose your language. Python, great. Scala or Java, you pick whatever language you want. You pick whatever machine learning framework you want, we're not trying to dictate that because there's different preferences in the market, so what we're really talking about here this week in Munich is this idea of an open platform for data science and machine learning. And we think that is going to bring a lot of people to the table. >> And with open, one thing, with open platform in mind, one thing to me that is conspicuously missing from the announcement today, correct me if I'm wrong, is any indication that you're bringing support for the deep learning frameworks like TensorFlow into this overall machine learning environment. Am I wrong? I know you have Power AI. Is there a piece of Power AI in these announcements today? >> So, stay tuned on that. We are, it takes some time to do that right, and we are doing that. 
But we want to optimize so that you can do machine learning with GPU acceleration on Power AI, so stay tuned on that one. But we are supporting multiple frameworks, so if you want to use TensorFlow, that's great. If you want to use Caffe, that's great. If you want to use Theano, that's great. That is our approach here. We're going to allow you to decide what's the best framework for you. >> So as you look forward, maybe it's a question for you, Jim, but Rob I'd love you to chime in. What does that mean for businesses? I mean, is it just more automation, more capabilities as you evolve that timeline, without divulging any sort of secrets? What do you think, Jim? Or do you want me to ask-- >> What do I think, what do I think you're doing? >> No, you ask about deep learning, like, okay, that's, I don't see that, Rob says okay, stay tuned. What does it mean for a business, that, if like-- >> Yeah. >> If I'm planning my roadmap, what does that mean for me in terms of how I should think about the capabilities going forward? >> Yeah, well what it means for a business, first of all, is what they're going, they're using deep learning for, is doing things like video analytics, and speech analytics and more of the challenges involving convolutional neural networks to do pattern recognition on complex data objects for things like connected cars, and so forth. Those are the kind of things that can be done with deep learning. >> Okay. And so, Rob, you're talking about here in Europe how the uptick in some of the data orientation has been a little bit slower, so I presume from your standpoint you don't want to over-rotate, to some of these things. But what do you think, I mean, it sounds like there is a difference between certainly Europe and those top 10 companies in the S&P, outperforming the S&P 500. What's the barrier, is it just an understanding of how to take advantage of data, is it cultural, what's your sense of this?
>> So, to some extent, data science is easy, data culture is really hard. And so I do think that culture's a big piece of it. And the reason we're kind of starting with a focus on machine learning, simplistic view, machine learning is a general-purpose framework. And so it invites a lot of experimentation, a lot of engagement, we're trying to make it easier for people to on-board. As you get to things like deep learning as Jim's describing, that's where the market's going, there's no question. Those tend to be very domain-specific, vertical-type use cases and to some extent, what I see clients struggle with, they say well, I don't know what my use case is. So we're saying, look, okay, start with the basics. A general purpose framework, do some tests, do some iteration, do some experiments, and once you find out what's hunting and what's working, then you can go to a deep learning type of approach. And so I think you'll see an evolution towards that over time, it's not either-or. It's more of a question of sequencing. >> One of the things we've talked to you about on theCUBE in the past, you and others, is that IBM obviously is a big services business. This big data is complicated, but great for services, but one of the challenges that IBM and other companies have had is how do you take that service expertise, codify it to software and scale it at large volumes and make it adoptable? I thought the Watson data platform announcement last fall, I think at the time you called it Data Works, and then so the name evolved, was really a strong attempt to do that, to package a lot of expertise that you guys had developed over the years, maybe even some different software modules, but bring them together in a scalable software package. So is that the right interpretation, how's that going, what's the uptake been like? >> So, it's going incredibly well. 
What's interesting to me is what everybody remembers from that announcement is the Watson Data Platform, which is a decomposable framework for doing these types of use cases on the IBM cloud. But there was another piece of that announcement that is just as critical, which is we introduced something called the Data First method. And that is the recipe book to say to a client, so given where you are, how do you get to this future on the cloud? And that's the part that people, clients, struggle with, is how do I get from step to step? So with Data First, we said, well look. There's different approaches to this. You can start with governance, you can start with data science, you can start with data management, you can start with visualization, there's different entry points. You figure out the right one for you, and then we help clients through that. And we've made Data First method available to all of our business partners so they can go do that. We work closely with our own consulting business on that, GBS. But that to me is actually the thing from that event that has had, I'd say, the biggest impact on the market, is just helping clients map out an approach, a methodology, to getting on this journey. >> So that was a catalyst, so this is not a sequential process, you can start, you can enter, like you said, wherever you want, and then pick up the other pieces from a maturity model standpoint? >> Exactly, because everybody is at a different place in their own life cycle, and so we want to make that flexible. >> I have a question about the clients, the customers' use of Watson Data Platform in a DevOps context. So, are more of your customers looking to use Watson Data Platform to automate more of the stages of the machine learning development and the training and deployment pipeline, and do you see, IBM, do you see yourself taking the platform and evolving it into a more full-fledged automated data science release pipelining tool? Or am I misunderstanding that?
>> Rob: No, I think that-- >> Your strategy. >> Rob: You got it right, I would just, I would expand a little bit. So, one is it's a very flexible way to manage data. When you look at the Watson Data Platform, we've got relational stores, we've got column stores, we've got in-memory stores, we've got the whole suite of open-source databases under the Compose.io umbrella, we've got Cloudant. So we've delivered a very flexible data layer. Now, in terms of how you apply data science, we say, again, choose your model, choose your language, choose your framework, that's up to you, and we allow clients, many clients start by building models on their private cloud, then we say you can deploy those into the Watson Data Platform, so therefore then they're running on the data that you have as part of that data fabric. So, we're continuing to deliver a very fluid data layer which then you can apply data science, apply machine learning there, and there's a lot of data moving into the Watson Data Platform because clients see that flexibility. >> All right, Rob, we're out of time, but I want to kind of set up the day. We're doing CUBE interviews all morning here, and then we cut over to the main tent. You can get all of this on IBMgo.com, you'll see the schedule. Rob, you've got, you're kicking off a session. We've got Hilary Mason, we've got a breakout session on GDPR, maybe set up the main tent for us. >> Yeah, main tent's going to be exciting. We're going to debunk a lot of misconceptions about data and about what's happening. Marc Altshuller has got a great segment on what he calls the death of correlations, so we've got some pretty engaging stuff. Hilary's got a great piece that she was talking to me about this morning. It's going to be interesting. We think it's going to provoke some thought and ultimately provoke action, and that's the intent of this week. >> Excellent, well Rob, thanks again for coming to theCUBE. It's always a pleasure to see you.
>> Rob: Thanks, guys, great to see you. >> You're welcome; all right, keep it right there, buddy. We'll be back with our next guest. This is theCUBE, we're live from Munich, Fast Track Your Data, right back. (upbeat electronic music)

Published Date : Jun 22 2017

SUMMARY :

Brought to you by IBM. This is Fast Track Your Data brought to you by IBM, Hey, great to see you. It's good that you joined us. and machine learning to the Hadoop community. You had to be relevant, you want to be part of the community, So first of all, you look at the last five years. but talk about the two announcements that you guys made. Even you can do it, Dave, which is amazing. I would love to see you do it, because I guarantee you can. but it wasn't this easy. and I want to make it so anybody can do it. extending that now to other parts of the portfolio. What are the machine learning announcements at this And the way that you do that is we don't dictate I know you have Power AI. We're going to allow you to decide So as you look forward, maybe it's a question No, you ask about deep learning, like, okay, that's, and speech analytics and more of the challenges But what do you think, I mean, it sounds like And the reason we're kind of starting with a focus One of the things we've talked to you about on theCUBE And that is the recipe book to say to a client, process, you can start, you can enter, and deployment pipeline, and do you see, IBM, models on their private cloud, then we say you can deploy and then we cut over to the main tent. and that's the intent of this week. It's always a pleasure to see you. This is theCUBE, we're live from Munich,

ENTITIES

Entity	Category	Confidence
Jim Kobielus	PERSON	0.99+
Dave Vellante	PERSON	0.99+
IBM	ORGANIZATION	0.99+
Jim	PERSON	0.99+
Europe	LOCATION	0.99+
Rob	PERSON	0.99+
Marc Altshuller	PERSON	0.99+
Hilary	PERSON	0.99+
Hilary Mason	PERSON	0.99+
Rob Bearden	PERSON	0.99+
February	DATE	0.99+
Dave	PERSON	0.99+
Hortonworks	ORGANIZATION	0.99+
Rob Thomas	PERSON	0.99+
May 2018	DATE	0.99+
March	DATE	0.99+
Munich	LOCATION	0.99+
Scala	TITLE	0.99+
Apache	ORGANIZATION	0.99+
second piece	QUANTITY	0.99+
Last week	DATE	0.99+
Java	TITLE	0.99+
last year	DATE	0.99+
two announcements	QUANTITY	0.99+
10 companies	QUANTITY	0.99+
GDPR	TITLE	0.99+
Python	TITLE	0.99+
DB2	TITLE	0.99+
15 minutes	QUANTITY	0.99+
last week	DATE	0.99+
IBM Analytics	ORGANIZATION	0.99+
European Union	ORGANIZATION	0.99+
five years	QUANTITY	0.99+
JSON	TITLE	0.99+
Watson Data Platform	TITLE	0.99+
third	QUANTITY	0.99+
One	QUANTITY	0.99+
this week	DATE	0.98+
today	DATE	0.98+
a week ago	DATE	0.98+
two things	QUANTITY	0.98+
SQL	TITLE	0.98+
last fall	DATE	0.98+
2017	DATE	0.98+
Munich, Germany	LOCATION	0.98+
each	QUANTITY	0.98+
Y2K	ORGANIZATION	0.98+

Show Wrap - Data Platforms 2017 - #DataPlatforms2017

>> Announcer: Live from the Wigwam in Phoenix, Arizona. It's theCUBE. Covering Data Platforms 2017. Brought to you by Qubole. >> Hey welcome back everybody. Jeff Frick here with theCUBE along with George Gilbert from Wikibon. We've had a tremendous day here at DataPlatforms 2017 at the historic Wigwam Resort, just outside of Phoenix, Arizona. George, you've been to a lot of big data shows. What's your impression? >> I thought we're at the, we're sort of at the edge of what could be a real bridge to something new, which is, we've built big data systems as traditional software for deployment on traditional infrastructure. Even if you were going to put it in a virtual machine, it's still not a cloud. You're still dealing with server abstractions. But what's happening with Qubole is, they're saying, once you go to the cloud, whether it's Amazon, Azure, Google or Oracle, you're going to be dealing with services. Services are very different. It greatly simplifies the administrative experience, the developer experience, and more than that, they're focused on, they're focused on turning Qubole the product into Qubole the service, so that they can automate the management of it. And we know that big data has been choking itself on complexity. Both admin and developer complexity. And they're doing something unique, both on sort of the big data platform management, but also data science operations. And their point, their contention, which we still have to do a little more homework on, is that the vendors who started with software on-prem, can't really make that change very easily without breaking what they've done on-prem. Cuz they have traditional perpetual license physical software as opposed to services, which is what is in the cloud. >> The question is, are people going to wait for them to figure it out.
I talked to somebody in the hallway earlier this morning and we were talking about their move to put all their data into, it was S3, on their data lake. And he said, it's part of a much bigger transformational process that we're doing inside the company. And so, this move, from "is public cloud viable?" to "give me a reason why it shouldn't go to the cloud," has really kicked in big time. And we hear over and over and over that speed and agility, not just in deploying applications, but in operating as a company, is the key to success. And we hear over and over how short the tenure is on the Fortune 500 now, compared to what it used to be. So if you're not speedy and agile, which you pretty much have to use cloud, and software driven automated decision-making >> Yeah. >> that's powered by machine learning to eat. >> Those two things. >> A huge percentage of your transaction and decision-making, you're going to get smoked by the person that is. >> Let's sort of peel that back. I was talking to Monte Zweben who is the co-founder of Splice Machine, one of the most advanced databases that's sort of come out of nowhere over the last couple of years. And it's now, I think, in closed beta on Amazon. He showed me, like a couple of screens for spinning it up and configuring it on Amazon. And he said, if I were doing that on-prem, he goes I needed a Hadoop cluster with HBase. It would take me like four plus months. And that's an example of software versus services. >> Jeff: Right. >> And when you said, when you pointed out that, automated decision-making, powered by machine learning, that's the other part, which is these big data systems ultimately are in the service of creating machine learning models that will inform ever better decisions with ever greater speed and the key then is to plug those models into existing systems of record. >> Jeff: Right. Right.
>> Because we're not going to, >> We're not going to rip those out and rebuild them from scratch. >> Right. But as you just heard, you can pull the data out that you need, run it through a new age application. >> George: Yeah. >> And then feed it back into the old system. >> George: Yes. >> The other thing that came up, it was Oskar, I have to look him up, Oskar Austegard from Gannett was on one of the panels. We always talk about the flexibility to add capacity very easily in a cloud-based solution. But he talked about in the separation of storage and compute, that they actually have times where they turn off all their compute. It's off. Off. >> And that was, if you had to boil down the fundamental compatibility break between on-prem and in the cloud, the Qubole folks, both the CEO and CMO said, look, you cannot reconcile what's essentially server-centric, where the storage is attached to the compute node, the server, with cloud, where you have storage separate from compute and allowing you to spin it down completely. He said those are just fundamentally incompatible. >> Yeah, yeah. And also, Andretti, one of the founders in his talk, he talked about the big three trends, which we just kind of talked about, he summarized them right in serverless. This continual push towards smaller and smaller units >> George: Yeah. >> of storage and compute. And the increasing speed of networks is one, from virtual servers to just no servers, to just compute. The second one is automation, you've got to move to automation. >> George: Right. >> If you're not, you're going to get passed by your competitor that is. Or the competitor that you don't even know exists that's going to come out from over your shoulder. And the third one was the intelligence, right. There is a lot of intelligence that can be applied. And I think the other cusp that we're on, is this continuing crazy increase in compute horsepower. Which just keeps going.
That the speed and the intelligence of these machines is growing at an exponential curve, not a linear curve. It's going to be bananas in the not-too-distant future. >> We're soaking up more and more that intelligence with machine learning. The training part of machine learning where the datasets to train a model are immense. Not only are the datasets large, but the amount of time to sort of chug through them to come up with the, just the right mix of variables and values for those variables. Or maybe even multiple models. So that we're going to see in the cloud. And that's going to chew up more and more cycles. Even as we have >> Jeff: Right. Right. >> specialized processors. >> Jeff: Right. But in the data ops world, in theory yes, but I don't have to wait to get it right. Right? I can get it 70% right. >> George: Yeah. >> Which is better than not right. >> George: Yeah. >> And I can continue to iterate over time. That, I think, was the genius of DevOps. To stop writing PRDs and MRDs. >> George: Yeah. >> And deliver something. And then listen and adjust. >> George: Yeah. >> And within the data ops world, it's the same thing. Don't try to figure it all out. Take the data you know, have some hypothesis. Build some models and iterate. That's really tough to compete with. >> George: Yeah. >> Fast, fast, fast iteration. >> We're doing actually a fair amount of research on that. On the Wikibon side. Which is, if you build, if you build an enterprise application that has, that is reinforced or informed by models in many different parts, in other words, you're modeling more and more digital entities within the business. >> Jeff: Right. >> Each of those has feedback loops. >> Jeff: Right. Right. >> And when you get the whole thing orchestrated and moving or learning in concert then you have essentially what Michael Porter many years ago called competitive advantage.
Which is when each business process reinforces all the other business processes in service of delivering a value proposition. And those models represent business processes and when they're learning and orchestrated all together, you have a, what Trump called a fine-tuned machine. >> I won't go there. >> Leaving out that it was bigly and it was finely-tuned machine. >> Yeah, yeah. But at the end of the day, if you're using resources and effort to improve a different resource and effort, you're getting a multiplier effect. >> Yes. >> And that's really the key part. Final thought as we go out of here. Are you excited about this? Do you see, they showed the picture of the NASA headquarters with the big giant snowball truck loading up? Do you see more and more of this big enterprise data going into S3, going into Google Cloud, going into Microsoft Azure? >> You're asking-- >> Is this the solution for the data lake swamp issue that we've been talking about? >> You're asking the 64 dollar question. Which is, companies, we sensed a year ago at the Hortonworks DataWorks Summit, which was in June, down in San Jose last year. That was where we first got the sense that people were sort of throwing in the towel on trying to build large scale big data platforms on-prem. And what changes now is, are they now evaluating Hortonworks versus Cloudera versus MapR in the cloud or are they widening their consideration as Qubole suggests. Because now they want to look, not only at Cloud Native Hadoop, but they actually might want to look at Cloud Native Services that aren't necessarily related to Hadoop. >> Right. Right. And we know as a service wins. It continues. Platform as a service. Software as a service. Time and time again, as a service either eats a lot of share from the incumbent or knocks the incumbent out. So, Hadoop as a service, regardless of your distro, via one of these types of companies on Amazon, it seems like it's got to win, right. It's going to win.
>> Yeah but the difference is, so far, so far, the Clouderas and the MapRs and the Hortonworks of the world are more software than service when they're in the cloud. They don't hide all the knobs. You still need a highly trained admin to get them up-- >> But not if you buy it as a service, in theory, right. It's going to be packaged up by somebody else and they'll have your knobs all set. >> They're not designed yet that way. >> HDInsight >> Then they better be careful cuz it might be a new, as a service distro, of the Hadoop system. >> My point, which is what this is. >> Okay, very good, we'll leave it at that. So George, thanks for spending the day with me. Good show as always. >> And I'll be in a better mood next time when you don't steal my candy bars. >> All right. He's George Gilbert. I'm Jeff Frick. You're watching theCUBE. We're at the historic 99 years young, Wigwam Resort, just outside of Phoenix, Arizona. DataPlatforms 2017. Thanks for watching. It's been a busy season. It'll continue to be a busy season. So keep it tuned. SiliconAngle.TV or YouTube.com/SiliconAngle. Thanks for watching.

Published Date : May 26 2017

SUMMARY :

Brought to you by Kubo. at the historic Wigwam Resort, is that the vendors who started with software on-prem, but in operating as a company, is the key to success. you're going to get smoked by the person that is. over the last couple of years. and the key then is to plug those models Jeff: Right. We're not going to to rip those out But as you just heard, We always talk about the flexibility to add capacity And that was And also, Andretti, one of the founders in his talk, And the increasing speed of networks is one, And the third one was the intelligence, right. but the amount of time to sort of chug through them Jeff: Right. But in the data ops world, in theory yes, And I can continue to iterate over time. And then listen and adjust. Take the data you know, have some hypothesis. On the Wikibon side. Jeff: Right. And when you get the whole thing orchestrated Leaving out that it was Bigley But the end of the day, if you're using resources And that's really the key part. You're asking the 64 dollar question. a lot of share from the incumbent and the Hortonworks of the world It's going to be packaged up by somebody else of the Hadoop system. which is what this is. So George, thanks for spending the day with me. And I'll be in a better mood next time We're at the historic 99 years young, Wigwam Resort,

ENTITIES

Entity	Category	Confidence
Jeff Frick	PERSON	0.99+
Jeff	PERSON	0.99+
George	PERSON	0.99+
George Goodwin	PERSON	0.99+
George Gilbert	PERSON	0.99+
Michael Porter	PERSON	0.99+
Andretti	PERSON	0.99+
San Jose	LOCATION	0.99+
Amazon	ORGANIZATION	0.99+
64 dollar	QUANTITY	0.99+
70%	QUANTITY	0.99+
Trump	PERSON	0.99+
Oskar Austegard	PERSON	0.99+
June	DATE	0.99+
Oracle	ORGANIZATION	0.99+
Oskar	PERSON	0.99+
Google	ORGANIZATION	0.99+
NASA	ORGANIZATION	0.99+
Kubo	ORGANIZATION	0.99+
one	QUANTITY	0.99+
last year	DATE	0.99+
Hortonworks	ORGANIZATION	0.99+
four plus months	QUANTITY	0.99+
99 years	QUANTITY	0.99+
third one	QUANTITY	0.99+
Phoenix, Arizona	LOCATION	0.99+
a year ago	DATE	0.99+
Splice Machine	ORGANIZATION	0.98+
Both	QUANTITY	0.98+
Microsoft	ORGANIZATION	0.98+
Hadoop	TITLE	0.98+
both	QUANTITY	0.97+
Azure	ORGANIZATION	0.97+
Each	QUANTITY	0.96+
Monte Zweben	PERSON	0.96+
first	QUANTITY	0.94+
MapRs	ORGANIZATION	0.94+
earlier this morning	DATE	0.92+
Wigwam Resort	LOCATION	0.92+
two things	QUANTITY	0.92+
2017	DATE	0.92+
#DataPlatforms2017	EVENT	0.89+
Wikibon	ORGANIZATION	0.89+
second one	QUANTITY	0.89+
three trends	QUANTITY	0.89+
each business process	QUANTITY	0.87+
DataPlatforms	TITLE	0.86+
theCUBE	ORGANIZATION	0.85+
Cloudera	ORGANIZATION	0.85+
Hortonworks DataWorks Summit	EVENT	0.85+
Wigwam Resort	ORGANIZATION	0.85+
Kubo	PERSON	0.84+
Gannett	ORGANIZATION	0.82+
MapR	ORGANIZATION	0.8+
S3	TITLE	0.8+
many years ago	DATE	0.78+
DataPlatforms 2017	EVENT	0.74+
years	DATE	0.73+
YouTube.com/SiliconAngle	OTHER	0.72+
Clouderas	ORGANIZATION	0.7+
Cloud Native	TITLE	0.67+
Platforms	TITLE	0.67+
Google Cloud	TITLE	0.64+
Cloud Native Hadoop	TITLE	0.64+
last couple	DATE	0.64+
Azure	TITLE	0.61+

Search Results for Hortonworks DataWorks Summit: