
Rob Bearden, Hortonworks | DataWorks Summit 2018



>> Live from San Jose in the heart of Silicon Valley, it's theCUBE covering DataWorks Summit 2018, brought to you by Hortonworks. >> Welcome back to theCUBE's live coverage of DataWorks Summit here in San Jose, California. I'm your host, Rebecca Knight, along with my co-host, James Kobielus. We're joined by Rob Bearden. He is the CEO of Hortonworks. So thanks so much for coming on theCUBE again, Rob. >> Thank you for having us. >> So you just got off the keynote on the main stage. The big theme is really about modern data architecture. So let's talk about this modern data architecture. What is it all about? How do you think about it? What's your approach? And how do you walk customers through this process? >> Well, there are a lot of moving parts in enabling a modern data architecture. One of the first steps is to unlock the siloed transactional applications and get that data into a central architecture so you can get real-time insights around the inclusive dataset. But what we're really trying to accomplish within that modern data architecture is to bring all types of data together, whether it be real-time streaming data, whether it be sensor data, IoT data, whether it be data that's coming from a connected core across the network, and to be able to bring all that data together in real time, and give the enterprise the ability to take best-in-class action so that you get the very prescriptive outcome that you want.
So if we bring that data under management from point of origination and out on the edge, and then have the platforms that move that through its entire lifecycle, and that's our HDF platform, it gives the customer the ability to, after they capture it at the edge, move it, and then have the ability to process it as an event happens, a condition changes, various conditions come together, have the ability to process and take the exact action that you want to see performed against that, and then bring it to rest, and that's where our HDP platform comes into play, where all that data can be aggregated so you can have a holistic insight and have real-time interactions on that data. But then it becomes about deploying those datasets and workloads on the tier that's most economically and architecturally pragmatic. So if that's on-prem, we make sure that we are architected for that on-prem deployment or private cloud or even across multiple public clouds simultaneously, and give the enterprise the ability to support each of those native environments. And so we think hybrid cloud architecture is really where the vast majority of our customers, today and in the future, are going to want to run and deploy their applications and workloads. And that's where our DataPlane Service offering gives them the ability to have that hybrid architecture and the architectural latitude to move workloads and datasets across each tier transparently, regardless of the storage file format or where that application resides, and we provide all the tooling to mask the complexity of doing that, and then we ensure that it has one common security framework, one common governance through its entire lifecycle, and one management platform to handle that entire data lifecycle.
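The pattern Bearden describes here, capture events at the edge, act on each one while it is in flight, then land everything at rest for holistic analysis, can be sketched in a few lines. This is a conceptual illustration only; the `SensorEvent` and `process` names are invented for the sketch and are not HDF or NiFi APIs:

```python
from dataclasses import dataclass

@dataclass
class SensorEvent:
    component: str
    metric: str
    value: float

def process(events, threshold, on_alert):
    """Evaluate each event as it arrives; take action when a
    condition is met, then land every event in the at-rest store."""
    at_rest = []                      # stands in for the historical, at-rest store
    for ev in events:
        if ev.metric == "vibration" and ev.value > threshold:
            on_alert(ev)              # act while the event is still in flight
        at_rest.append(ev)            # aggregate for holistic, after-the-fact insight
    return at_rest

alerts = []
events = [SensorEvent("engine-2", "vibration", 0.4),
          SensorEvent("engine-2", "vibration", 0.9)]
store = process(events, threshold=0.7, on_alert=alerts.append)
# one alert fired in flight; both events landed at rest
```

The point of the shape is that the in-flight action and the at-rest aggregation are two stages of one pipeline, not two separate silos.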
And that's the modern data architecture: the ability to bring all data, all types of data, under management, manage it in real time through its lifecycle until it comes to rest, and deploy it across whatever architecture tier is most appropriate financially and from a performance standpoint, on cloud or on-prem. >> Rob, this morning at the keynote here on day one at DataWorks San Jose, you presented this whole architecture that you described in the context of what you call hybrid clouds to enable connected communities, and with HDP, Hortonworks Data Platform 3.0, as one of the prime announcements, you brought containerization into the story. Could you connect those dots: containerization, connected communities, and HDP 3.0? >> Well, HDP 3.0 is really the foundation for enabling that hybrid architecture natively, and what it's done is separate the storage from the compute, and so now we have the ability to deploy those workloads via a container strategy across whichever tier makes the most sense, and to move those applications and datasets around, and to leverage each tier in the deployment architectures that are most pragmatic. And then what that lets us do is bring all of the different data types together, whether it be customer data, supply chain data, product data. So imagine an industrial piece of equipment, an airplane, flying from Atlanta, Georgia to London, and you want to be able to make sure you really understand how well each component is performing, so that if that plane is going to need service when it gets there, it doesn't miss the turnaround and leave 300 passengers stranded or delayed, right? Now with our Connected platform, we have the ability to take every piece of data from every component that's generated and see that in real time, and let the airlines act on that in real time. >> Delineate essentially.
>> And ensure that we know every person that touched it and looked at that data through its entire lifecycle, from the ground crew to the pilots to the operations team to the service folks on the ground to the reservation agents, and we can prove, if somehow that data has been breached, that we know exactly at what point it was breached and who did or didn't get to see it, and can prevent that because of the security models that we put in place. >> And that relates to compliance and mandates such as the General Data Protection Regulation (GDPR) in the EU. At DataWorks Berlin a few months ago, you laid out, Hortonworks laid out, announced a new product called Data Steward Studio to enable GDPR compliance. Can you give our listeners who may not have been following the Berlin event a bit of an update on Data Steward Studio, how it relates to the whole data lineage set of requirements that you're describing, and then going forward, what is Hortonworks's roadmap for supporting the full governance lifecycle for the connected community, from data lineage through model governance and so forth? Can you just connect a few dots that would be helpful? >> Absolutely. What's important, certainly driven by GDPR, is the requirement to be able to prove that you understand who's touched that data and who has not had access to it, and that you ensure that you're in compliance with the GDPR regulations, which are significant, but essentially what they say is you have to protect the personal data and attributes of that data of the individual. And so what's very important is that you've got to have the systems that not just secure the data, but understand who has had access at any point in time that you've ever maintained that individual's data.
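The capability described above, being able to show at any point in time who has or has not touched an individual's data, is essentially an append-only access log that can be queried per data subject. A minimal sketch of the idea (the `AccessLog` class and its names are hypothetical; Apache Atlas's actual governance model is far richer):

```python
from collections import defaultdict

class AccessLog:
    """Append-only record of who touched which subject's data, so an
    investigation can list every accessor, or prove there were none."""
    def __init__(self):
        self._events = defaultdict(list)   # subject_id -> [(user, action), ...]

    def record(self, subject_id, user, action):
        self._events[subject_id].append((user, action))

    def accessors(self, subject_id):
        return {user for user, _ in self._events[subject_id]}

log = AccessLog()
log.record("cust-42", "ground_crew", "read")
log.record("cust-42", "reservations", "read")
# query: everyone who saw cust-42's data; and proof no one saw cust-99's
```

The append-only property is what makes the log usable as evidence: entries are added, never rewritten.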
And so it's not just about when you've had a transaction with that individual; it's the rest of the history that you've kept, or the multiple datasets that you may try to correlate to try to expand the relationship with that customer, and you need to make sure not only that you've secured their data, but that you're protecting and governing who has access to it and when. And as importantly, that you can prove, in the event of a breach, that you had control of that data and who did or did not access it, because if you can't prove that it was secure, and that no one who wasn't supposed to have access got to it, you can be opened up to hundreds of thousands of dollars or even multiple millions of dollars of fines, just because you can't prove that it was not accessed. And that's what the variety of our platforms, you mentioned Data Steward Studio, is part of. DataPlane is one of the capabilities that gives us that ability. The core engine that does that is Atlas, and that's the open source governance platform that we developed through the community that really drives all the capabilities for governance that move through each of our products, HDP and HDF, and of course DataPlane and Data Steward Studio take advantage of that in how they move and replicate data and manage that process for us. >> One of the things that we were talking about before the cameras were rolling was this idea of data-driven business models, how they are disrupting current contenders, with new rivals coming on the scene all the time. Can you talk a little bit about what you're seeing, and what are some of the most exciting, and maybe also some of the most threatening, things that you're seeing? >> Sure. In the traditional legacy enterprise, it's very procedurally driven. You think about classic Encore ERP. It's worked very hard to have a very rigid, very structured, procedural order-to-cash cycle that does not have a great deal of flexibility.
And it takes you through a design process, it builds product, then you sell that product to a customer, and then you service that customer, and then you learn from that transaction different ways to automate or improve efficiencies in the supply chain. But it's very procedural, very linear. And in the new world of connected data models, you want to bring transparency and real-time understanding and connectivity between the enterprise, the customer, the product, and the supply chain, so that you can take real-time, best-in-practice action. So for example, you understand how well your product is performing. Is your customer using it correctly? Are they frustrated with it? Are they using it in the patterns and the frequency that they should be if they are going to expand their use and buy more, and if they're not, how do we engage in that cycle? How do we understand if they're going through a re-evaluation and another buying cycle for something similar that may not be with you, for a different reason? And when we have real-time visibility into our customer's interaction, and understand our product's performance through its entire lifecycle, then we can bring real-time efficiency, linking those together with our supply chain and the various relationships we have with our customers. To do that, it requires the modern data architecture: bringing data under management from the point it originates, whether it's from the product or the customer interacting with the company, or the customer interacting potentially with our ecosystem partners, mutual partners, and then letting best-in-practice supply chain techniques make sure that we're bringing the highest level of service and support to that entire lifecycle.
And when we bring data under management, manage it through its lifecycle, have the historical view at rest, and leverage that across every tier, that's when we get this high velocity, deep transparency, and connectivity between each of the constituents in the value chain, and that's what our platforms give them the ability to do. >> Not only your platform; you guys have been in business now for I think seven years or so, and you've shifted, in the minds of many and in your own strategy, from being the premier data-at-rest company, in terms of the Hadoop platform, to being one of the premier data-in-motion companies. Is that really where you're going? To be more of a completely streaming-focused solution provider in a multi-cloud environment? And I hear a lot of Kafka in your story now, it's like, oh yeah, that's right, Hortonworks is big on Kafka. Can you give us just a quick sense of how you're making that shift towards low-latency, real-time streaming, big data, or small data for that matter, with embedded analytics and machine learning? >> So, we have evolved from certainly being the leader in global data platforms, with all the work that we do collaboratively through the community, to make Hadoop an enterprise-viable data platform that has the ability to run mission-critical workloads and apps at scale, ensuring that it has all the enterprise facilities from security and governance and management. But you're right, we have expanded our footprint aggressively. And we saw the opportunity to actually create more value for our customers by giving them the ability to not wait until they bring data under management to gain an insight, because in that case, they'd have to be reactive, post-event, post-transaction. We want to give them the ability to shift their business model to being interactive, pre-event, pre-condition.
The way to do that, we learned, was to bring the data under management from the point of origination, and that's what we use MiNiFi and NiFi for, and then HDF, to move it through its lifecycle, and to your point, we have the intellect, we have the insight, and then we have the ability to process the best-in-class outcome based on what we know the variables are that we're trying to solve for as that's happening. >> And there's the word, the acronym ACID, which of course is a transactional data paradigm, and I hear that all over your story now in streaming. So, what you're saying is it's a completely enterprise-grade streaming environment from end to end for the new era of edge computing. Would that be a fair way of-- >> It's very much so. And our model and strategy has always been to bring in the other best-in-class engines for what they do well for their particular dataset. A couple of examples of that: one, you brought up Kafka; another is Spark. And they do what they do really well. But what we do is make sure that they fit inside an overall data architecture that then gives them access to a much broader central dataset that goes from point of origination to point of rest on a whole central architecture, and then benefit from our security, governance, and operations model being able to manage those engines. So what we're trying to do is eliminate the silos for our customers, and the siloed datasets that just do particular functions. We give them the ability to have an enterprise modern data architecture, and we manage the things that bring that forward for the enterprise to have the modern data-driven business models, bringing the governance, the security, and the operations management that ensure those workflows go from beginning to end seamlessly. >> Do you, go ahead. >> So I was just going to ask about the customer concerns. So here you are, you've now given them this ability to make these real-time changes, what's sort of next?
What's on their mind now, and what do you see as the future of what you want to deliver next? >> First and foremost, we've got to make sure we get this right, and that we really bring this modern data architecture forward, and make sure that we truly have the governance correct, the security models correct, one pane of glass to manage this. And really enable that hybrid data architecture, and let them leverage the cloud tier where it's architecturally and financially pragmatic to do it, and give them the ability to leg into a cloud architecture without risk of either being locked in or misunderstanding where the lines of demarcation of workloads or datasets are, and not getting the economies or efficiencies they should. And we solved that with DataPlane. So we're working very hard with the community, with our ecosystem and strategic partners, to make sure that we're enabling the ability to bring each type of data from any source and deploy it across any tier with a common security, governance, and management framework. So then what's next is, now that we have this high velocity of data through its entire lifecycle on one common set of platforms, we can start enabling the modern applications to function. And we can go look back into some of the legacy technologies that are very procedurally based and are dependent on a transaction or an event happening before they can run their logic to get an outcome, because that keeps the customer in reactive, post-event activity. We want to make sure that we're bringing that kind of, for example, supply chain functionality to the modern data architecture, so that we can put real-time inventory allocation in place based on the patterns our customers are in, whether in how they're using the product, or frustrations they've had, or successes they've had.
And we know through artificial intelligence and machine learning that there's a high probability not only that they will buy or use or expand their consumption of whatever they have of our product or service, but that they will probably do these other things as well if we do those things. >> Predictive logic as opposed to procedural, yes, AI. >> Very much so. And so what's next will be bringing the modern applications on top of this that become very predictive and enabling, versus very procedural and post-transaction. We're a little ways downstream. That's looking out. >> That's next year's conference. >> That's probably next year's conference. >> Well, Rob, thank you so much for coming on theCUBE, it's always a pleasure to have you. >> Thank you both for having us, and thank you for being here, and enjoy the summit. >> We're excited. >> Thank you. >> We'll do. >> I'm Rebecca Knight for Jim Kobielus. We will have more from DataWorks Summit just after this. (upbeat music)

Published Date : Jun 20 2018


Day Two Kickoff | DataWorks Summit 2018



>> Live from San Jose, in the heart of Silicon Valley, it's theCUBE, covering DataWorks Summit 2018. Brought to you by Hortonworks. >> Welcome back to day two of theCUBE's live coverage of DataWorks here in San Jose, California. I'm your host, Rebecca Knight, along with my co-host, James Kobielus. James, it's great to be here with you in the hosting seat again. >> Day two, yes. >> Exactly. So here we are, this conference, 2,100 attendees from 32 countries, 23 industries. It's a relatively big show. They do three of them during the year. One of the things that I really-- >> It's a well-established show too. I think this is like the 11th year since Yahoo started up the first Hadoop summit in 2008. >> Right, right. >> So it's an established event, yeah go. >> Exactly, exactly. But I really want to talk about Hortonworks the company. This is something that you had brought up in an analyst report before the show started, and that was Hortonworks' cash flow positivity for the first time. >> Which is good. >> Which is good, which is a positive sign, and yet what are the prospects for this company's financial health? We're still not seeing really clear signs of robust financial growth. >> I think the signs are good, for the simple reason that they're making significant investments now to prepare for the future that's almost inevitable. And the future that's almost inevitable, when I say the future, is the 2020s, the decade that's coming. Most of their customers will shift more of their workloads, maybe not entirely yet, to public cloud environments for everything they're doing: AI, machine learning, deep learning. And clearly the beneficiaries of that trend will be the public cloud providers, all of whom are Hortonworks' partners, and established partners: AWS, Microsoft with Azure, Google with, you know, Google Cloud Platform, IBM with IBM Cloud. Hortonworks, and this is...
You know, their partnerships with these cloud providers go back several years, so it's not a new initiative for them. They've seen the writing on the wall practically from the start of Hortonworks' founding in 2011, and they now need to go deeper towards making their solution portfolio capable of being deployable on-prem, in public clouds, and in various and sundry funky combinations called hybrid multi-clouds. Okay, so, they've been making those investments in those partnerships and in public cloud enabling the Hortonworks Data Platform. Here at this show, DataWorks 2018 here in San Jose, they've released the latest major version, HDP 3.0, of their core platform, with a lot of significant enhancements related to things that their customers are increasingly doing-- >> Well I want to ask you about those enhancements. >> But also they have partnership announcements, the deep ones of integration and, you know, lift and shift of the Hortonworks portfolio of HDP, with Hortonworks DataFlow and DataPlane Services, so that those solutions can operate transparently on those public cloud environments as and when the customers choose to shift their workloads. 'Cause Hortonworks really... You know, like Scott Gnau yesterday, I mean, just laid it on the line: they know that more of the public cloud workloads will predominate now in this space. They're just making these speculative investments that they absolutely have to make now to prepare the way. So I think this cost that they're incurring now to prepare their entire portfolio for that inevitable future is the right thing to do, and that's probably why they still have not attained massive, rock-and-rollin' positive cash flow yet, but I think that they're preparing the way for them to do so in the coming decade. >> So their financial future is looking brighter and they're doing the right things. >> Yeah, yes. >> So now let's talk tech. And this is really where you want to be, Jim, I know you.
>> Oh I get sleep now and I don't think about tech constantly. >> So as you've said, they're really putting a lot of emphasis now on their public cloud partnerships. >> Yes. >> But they've also launched several new products and upgrades to existing products. What are you seeing that excites you and that you think really will be potential game changers? >> You know, this is geeky, but this is important, 'cause it's at the very heart of Hortonworks Data Platform 3.0: containerization. When you're a data scientist, and you're building a machine learning model using data that's maintained, and is persisted, and processed within Hortonworks Data Platform or any other big data platform, you increasingly want the ability, for developing machine learning, deep learning, AI in general, to take that application you might build, say using TensorFlow models, on HDP, containerize it in Docker and, you know, orchestrate it all through Kubernetes and all that wonderful stuff, and deploy it out to increasingly edge computing, mobile computing, embedded computing environments where, you know, the real venture capital mania's happening, things like autonomous vehicles, and you know, drones, and you name it. So the fact is that Hortonworks has made that in many ways the premier new feature of HDP 3.0, announced here this week at the show. That very much harmonizes with where their partners are going with containerization of AI. IBM, one of their premier partners, very recently, like last month I think it was, announced the latest version of, what do they call it, IBM Cloud Private, which has containerization embedded as a core feature within that environment, which is a prem-based environment, for AI and so forth. The fact that Hortonworks continues to maintain close alignment with the capabilities that its public cloud partners are building into their respective portfolios is important.
But also, Hortonworks has what they call, you know, a single pane of glass, the DataPlane Services, for metadata and monitoring and governance and compliance across these sprawling hybrid multi-cloud scenarios. They're continuing to make, in fact really focusing on, deep investments in that portfolio, so that when an IBM, or AWS, whoever, introduces some new feature in their respective platforms, Hortonworks has the ability to, as it were, abstract above and beyond all of that, so that the customer, the developer, and the data administrator, all they need to do, if they're a Hortonworks customer, is stay within the DataPlane Services environment to be able to deploy with harmonized metadata, harmonized policies, harmonized schemas and so forth, and query optimization across these sprawling environments. So Hortonworks, I think, knows where their bread is buttered, and it needs to stay on the DPS, DataPlane Services, side, which is why a couple months ago in Berlin, Hortonworks made what I think was the most significant announcement of the year for them, and really for the industry: they announced the Data Steward Studio in Berlin. That really clearly addressed the GDPR mandate that was coming up, but it really treats data stewardship as an end-to-end workflow for lots of, you know, core enterprise applications, absolutely essential. Data Steward Studio is a DataPlane Service that can operate across multi-cloud environments. Hortonworks is going to keep on, you know... They didn't have any DPS, DataPlane Services, announcements here in San Jose this week, but you can best believe that by next year at this time at this show, and in the interim, they'll probably have a number of significant announcements to deepen that portfolio. Once again, it's to grease the wheels towards a more purely public cloud future in which there will be Hortonworks DNA inside most of their customers' environments going forward.
>> I want to ask you about the themes of this year's conference. The thing is that you were in Berlin at the last big Hortonworks DataWorks Summit. >> (speaks in foreign language) >> And really, GDPR dominated the conversations, because the new rules and regulations hadn't yet taken effect and companies were sort of bracing for what life was going to be like under GDPR. Now the rules are here, they're here to stay, and companies are really grappling with it, trying to understand the changes and how they can exist in this new regime. What would you say are the biggest themes... We're still talking about GDPR, of course, but what would you say are the bigger themes at this week's conference? Is it scalability, is it... I mean, what do you think has dominated the conversations here? >> Well, scalability is not the big theme this week, though there are significant scalability announcements this week in the context of HDP 3.0: the ability to persist, in a scale-out fashion across multi-cloud, billions of files. Storage efficiency is an important piece of the overall announcement, with support for erasure coding, blah blah blah. That's not, you know, that's... Already, Hortonworks, like all of the cloud providers and other big data providers, provides very scalable environments for storage and workload management. That was not the hugest, buzziest theme in terms of the announcements this week. The buzz of course was HDP 3.0. Containerization, that's important, but you know, we just came out of the day two keynote. AI is not a huge focus yet for a lot of the Hortonworks customers who are here, the developers. Most of their customers are not yet that far along in their deep learning journeys, but they're definitely going there. There's plenty of really cool keynote discussions, including the one with the autonomous vehicles, the thing we just came out of.
That was not the predominant theme this week here in terms of HDP 3.0. I think what it comes down to is that with HDP 3.0... Hive, though you tend to take it for granted, it's been in Hadoop practically from the very start, Hive is now a full enterprise database, and that's one of the cores of HDP 3.0. Hive itself, now at version 3.0, is ACID compliant, and that may be totally geeky to most of the world, but it enables Hive to support transactional applications. So more big data in every environment is supporting more traditional enterprise applications, transactional applications that require things like two-phase commit and all that goodness. The fact is, you know, Hortonworks, from what I can see, is the first of the big data vendors to incorporate those enhancements in Hive 3.0, because they're so completely tuned in to the Hive environment as a committer. I think in many ways that is the predominant theme in terms of the new stuff that will actually resonate with the developers, their customers, here at the show. And with enterprises in general: they can put more of their traditional enterprise application workloads on big data environments, and specifically, Hortonworks hopes, on HDP 3.0. >> Well, I'm excited to learn more here on theCUBE with you today. We've got a lot of great interviews lined up and a lot of interesting content. We've got a great crew too, so this is a fun show to do. >> Sure is. >> We will have more from day two of the show.
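An aside on the ACID point discussed in this segment: the guarantee Hive 3 adds is the one relational engines have long offered, namely that a multi-statement change either commits as a whole or rolls back as if it never ran. The sketch below illustrates that atomicity using Python's built-in SQLite purely as a stand-in ACID engine, not Hive itself:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE inventory (part TEXT PRIMARY KEY, qty INTEGER)")
conn.execute("INSERT INTO inventory VALUES ('engine-fan', 10)")
conn.commit()

try:
    with conn:  # one transaction: both statements commit, or neither does
        conn.execute("UPDATE inventory SET qty = qty - 1 WHERE part = 'engine-fan'")
        conn.execute("INSERT INTO inventory VALUES ('engine-fan', 99)")  # violates PK
except sqlite3.IntegrityError:
    pass  # the whole transaction was rolled back

qty = conn.execute("SELECT qty FROM inventory WHERE part = 'engine-fan'").fetchone()[0]
# qty is still 10: the failed transaction left no partial update behind
```

The `with conn:` block commits on success and rolls back on any exception, which is exactly the all-or-nothing behavior transactional applications depend on.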

Published Date : Jun 20 2018


Dan Potter, Attunity & Ali Bajwa, Hortonworks | DataWorks Summit 2018


 

>> Live from San Jose in the heart of Silicon Valley, it's theCUBE, covering DataWorks Summit 2018, brought to you by Hortonworks. >> Welcome back to theCUBE's live coverage of DataWorks here in sunny San Jose, California. I'm your host Rebecca Knight along with my co-host James Kobielus. We're joined by Dan Potter. He is the VP of Product Management at Attunity, and also Ali Bajwa, who is the principal partner solutions engineer at Hortonworks. Thanks so much for coming on theCUBE. >> Pleasure to be here. >> It's good to be here. >> So I want to start with you, Dan, and have you tell our viewers a little bit about the company, based in Boston, Massachusetts, and what Attunity does. >> Attunity, we're a data integration vendor. We are best known as a provider of real-time data movement from transactional systems into data lakes, into clouds, into streaming architectures, so it's a modern approach to data integration. So as these core transactional systems are being updated, we're able to take those changes and move those changes where they're needed, when they're needed, for analytics, for new operational applications, for a variety of different tasks. >> Change data capture. >> Change data capture is the heart of our-- >> They are well known in this business. They have change data capture. Go ahead. >> We are. >> So tell us about the announcement today that Attunity has made at the Hortonworks-- >> Yeah, thank you, it's a great announcement because it showcases the collaboration between Attunity and Hortonworks, and it's all about taking the metadata that we capture in that integration process. So we're a piece of a data lake architecture. As we are capturing changes from those source systems, we are also capturing the metadata, so we understand the source systems, we understand how the data gets modified along the way.
We use that metadata internally, and now we've built extensions to share that metadata into Atlas and to be able to extend that out through Atlas to higher-level data governance initiatives, so Data Steward Studio, into the DataPlane Services, so it's really important to be able to take the metadata that we have and to add to it the metadata that's from the other sources of information. >> Sure, for more of the transactional semantics of what Hortonworks has been describing, they've baked that in to HDP in your overall portfolios. Is that true? I mean, that supports those kind of requirements. >> With HDP, what we're seeing is, you know, the EDW optimization play has become more and more important for a lot of customers as they try to optimize the data that their EDWs are working on, so it really gels well with what we've done here with Attunity, and then on the Atlas side, with the integration on the governance side, with GDPR and other sort of regulations coming into play now, you know, those sort of things are becoming more and more important, specifically around the governance initiative. We actually have a talk just on Thursday morning where we're actually showcasing the integration as well. >> So can you talk a little bit more about that for those who aren't going to be there on Thursday. GDPR was really a big theme at the DataWorks Berlin event, and now we're in this new era and it's not talked about too, too much, I mean we-- >> And global businesses who have operations in the EU, but also all over the world, are trying to be systematic and consistent about how they manage PII everywhere. So GDPR is an EU regulation, but really in many ways it's having ripple effects across the world in terms of practices. >> Absolutely, and at the heart of understanding how you protect yourself and comply, I need to understand my data, and that's where metadata comes in.
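The metadata handoff Potter describes — capturing lineage during change data capture and publishing it to a catalog — can be sketched as a toy lineage graph. This is an illustrative assumption of the general shape of such records, not Attunity's or Atlas's actual data model or API:

```python
# Illustrative lineage graph: datasets are nodes, processes are edges
# recording how data moved from source systems into the lake.
class LineageGraph:
    def __init__(self):
        self.edges = []  # (source_dataset, process, target_dataset)

    def record(self, source, process, target):
        self.edges.append((source, process, target))

    def upstream(self, dataset):
        # Walk edges backwards to find every origin of a dataset.
        found = set()
        frontier = {dataset}
        while frontier:
            current = frontier.pop()
            for src, _, dst in self.edges:
                if dst == current and src not in found:
                    found.add(src)
                    frontier.add(src)
        return found


graph = LineageGraph()
# Hypothetical dataset names, for illustration only.
graph.record("oracle.orders", "cdc_replicate", "lake.raw_orders")
graph.record("lake.raw_orders", "hive_merge", "lake.orders_curated")
print(graph.upstream("lake.orders_curated"))  # the curated table traces back to both upstream datasets
```

This is the kind of question a governance tool answers from such metadata: "where did this table's data come from?"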
So having a holistic understanding of all of the data that resides in your data lake or in your cloud, metadata becomes a key part of that. And also in terms of enforcing that, if I understand my customer data, where the customer data comes from, the lineage of that, then I'm able to apply the protections of the masking on top of that data. So really, the GDPR effect has, you know, created a broad-scale need for organizations to really get a handle on metadata, so the timing of our announcement just works real well. >> And one nice thing about this integration is that, you know, it's not just about being able to capture the data in Atlas, but now with the integration of Atlas and Ranger, you can do enforcement of policies based on classifications as well. So if you can tag data as PCI, PII, personal data, that can get enforced through Ranger to say, hey, only certain admins can access certain types of data, and now all that becomes possible once we've taken the initial steps of the Atlas integration. >> So with this collaboration, and it's really deepening an existing relationship, so how do you go to market? How do you collaborate with each other and then also service clients? >> You want to? >> Yeah, so from an engineering perspective, we've got deep roots in terms of being a first-class provider into the Hortonworks platform, both HDP and HDF. Last year about this time, we announced our support for ACID merge capabilities, so the leading-edge work that Hortonworks has done in bringing ACID compliance capabilities into Hive was a really important one, so our change data capture capabilities are able to feed directly into that and be able to support those extensions. >> Yeah, we have a lot of, you know, really key customers together with Attunity, and, you know, maybe as a result of that they are actually our ISV of the Year as well, which they probably showcase on their booth there. >> We're very proud of that.
Yeah, no, it's a nice honor for us to get that distinction from Hortonworks, and it's also a proof point of the collaboration that we have commercially. You know, our sales reps work hand in hand. When we go into a large organization, we both sell to very large organizations. These are big transformative initiatives for these organizations, and they're looking for solutions, not technologies, so the fact that we can come in, we can show the proof points from other customers that are successfully using our joint solution, that's really, it's critical. >> And I think it helps that they're integrating with some of our key technologies because, you know, that's where our sales force and our customers really see, you know, that as well as that's where we're putting in the investment, and that's where these guys are also investing, so it really, you know, helps the story together. So with Hive, we're doing a lot of investment in making it closer and closer to a sort of real-time database, where you can combine historical insights as well as your, you know, real-time insights, with the new ACID merge capabilities where you can do the inserts, updates and deletes, and so that's exactly what Attunity's integrating with, with Atlas. We're doing a lot of investments there and that's exactly what these guys are integrating with. So I think our customers and prospects really see that, and that's where all the wins are coming from. >> Yeah, and I think together there were two main barriers that we saw in terms of customers getting the most out of their data lake investment. One of them was, as I'm moving data into my data lake, I need to be able to put some structure around this, I need to be able to handle continuously updating data from multiple sources, and that's what we introduced with Attunity Compose for Hive, building out the structure in an automated fashion so I've got analytics-ready data, and using the ACID merge capabilities just made those updates much easier.
The second piece was metadata. Business users need to have confidence in the data that they're using. Where did this come from? How was it modified? And overcoming both of those is really helping organizations make the most of those investments. >> How would you describe customer attitudes right now in terms of their approach to data? Because, I mean, as we've talked about, data is the new oil, so there's a real excitement and there's a buzz around it, and yet there's also so many high-profile cases of breaches and security concerns, so what would you say, is it that customers, are they more excited or are they more trepidatious? How would you describe the CIO mindset right now? >> So I think security and governance have become top of mind, right. So more and more, in the surveys that we've taken with our customers, right, you know, more and more customers are more concerned about security, they're more concerned about governance. The joke is that we talk to some of our customers and they keep talking to us about Atlas, which is sort of one of the newer offerings on governance that we have, but then we ask, "Hey, what about Ranger for enforcement?" And they're like, "Oh, yeah, that's a standard now." So we have Ranger, now it's a question of, you know, how do we get our, you know, hooks into Atlas and all that kind of stuff, so yeah, definitely, as you mentioned, because of GDPR, because of all these kind of issues that have happened, it's definitely become top of mind. >> And I would say the other side of that is there's real excitement as well about the possibilities. Now bringing together all of this data, AI, machine learning, real-time analytics and real-time visualization. There are analytic capabilities now that organizations have never had, so there's great excitement, but there's also trepidation. You know, how do we solve for both of those? And together, we're doing just that.
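The tag-based enforcement pattern discussed here — classify a column once in the catalog, and the policy engine enforces it everywhere — can be caricatured in a few lines. All names below are invented for illustration; this sketches the general idea, not the Ranger API:

```python
# Toy tag-based access control: classifications on columns drive masking,
# in the spirit of Atlas tags enforced by Ranger policies.
TAGS = {"ssn": {"PII"}, "card_number": {"PCI"}, "city": set()}
MASKED_TAGS = {"PII", "PCI"}  # tags that non-admins may not see in the clear

def read_row(row, user_is_admin=False):
    result = {}
    for column, value in row.items():
        # The policy keys off the tag, not the application or the column name,
        # so newly tagged data is protected without rewriting any app.
        if not user_is_admin and TAGS.get(column, set()) & MASKED_TAGS:
            result[column] = "****"
        else:
            result[column] = value
    return result

row = {"ssn": "123-45-6789", "card_number": "4111111111111111", "city": "San Jose"}
print(read_row(row))                      # sensitive columns masked for a regular user
print(read_row(row, user_is_admin=True))  # returned in the clear for an admin
```

The useful property is that tagging a new column as PII is a one-line catalog change, and every downstream read picks up the masking automatically.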
>> But as you mentioned, if you look at Europe, some of the European companies that are more hit by GDPR, they're actually excited that now they can, you know, really get to understand their data more and do better things with it as a result of, you know, the GDPR initiative. >> Absolutely. >> Are you using machine learning inside of Attunity, in a Hortonworks context, to find patterns in that data in real time? >> So we enable data scientists to build those models. So we're not only bringing the data together but, again, part of the announcement last year is the way we structure that data in Hive. We provide a complete historic data store, so every single transaction that has happened, and we send those transactions as they happen, it's a big append. So if you're a data scientist, I want to understand the complete history of the transactions of a customer to be able to build those models. So building those out in Hive and making those analytics-ready in Hive, that's what we do, so we're a key enabler to machine learning. >> Making it analytics-ready rather than doing the analytics in Spark, yeah. >> Absolutely.
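The append-only change history Potter describes can be modeled simply: every change lands in the log, the current state is derived by replaying it, and the full history stays queryable for data scientists. A minimal sketch, with invented field names, of the semantics an ACID merge provides:

```python
# Change stream as an append-only log; the "current" table is derived by
# applying the latest change per key, the way an ACID MERGE would.
changes = [
    {"op": "insert", "id": 1, "balance": 100, "ts": 1},
    {"op": "insert", "id": 2, "balance": 50, "ts": 2},
    {"op": "update", "id": 1, "balance": 75, "ts": 3},
    {"op": "delete", "id": 2, "balance": None, "ts": 4},
]

def current_state(log):
    # Replay the log in timestamp order to get the latest value per key.
    latest = {}
    for change in sorted(log, key=lambda c: c["ts"]):
        if change["op"] == "delete":
            latest.pop(change["id"], None)
        else:
            latest[change["id"]] = change["balance"]
    return latest

def history(log, key):
    # The full append log stays queryable, e.g. for model building.
    return [c for c in log if c["id"] == key]

print(current_state(changes))                    # {1: 75}
print([c["op"] for c in history(changes, 1)])    # ['insert', 'update']
```

The same log thus serves two audiences: analysts who want the merged "now" view, and data scientists who want every transaction that ever happened.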
>> Yeah, the other side to that is that, because they're integrated with Atlas, you know, now we have a new capability called DataPlane and Data Steward Studio, so the idea there is around multi-everything. More and more customers have multiple clusters, whether it's on-prem, in the cloud, so now more and more customers are looking at how do I get a single pane of glass across all my data, whether it's on-prem, in the cloud, whether it's IoT, whether it's data at rest, right? So that's where DataPlane comes in, and with Data Steward Studio, which is our second offering on top of DataPlane, they can kind of get that view across all their clusters. So as soon as, you know, the data lands from Attunity into Atlas, you can get a view into it as part of Data Steward Studio, and one of the nice things we do in Data Steward Studio is that we also have machine learning models to do some profiling, to figure out that, hey, this looks like a credit card, so maybe I should suggest this as a tag of sensitive data, and now the end user, the end administrator, has the option of, you know, saying that, okay, yeah, this is a credit card, I'll accept that tag, or they can reject that and pick one of their own. >> Will any of this, going forward, of the Attunity CDC, change data capture capability, be containerized for deployment to the edges in HDP 3.0? I mean, 'cause it seems, I mean, for Internet of Things, edge analytics and so forth, change data capture, is it absolutely necessary to make the entire, some call it the fog computing, cloud or whatever, a completely transactional environment for all applications from micro-endpoint to micro-endpoint? Are there any plans to do that going forward?
>> Yeah, so I think with HDP 3.0, as you mentioned, right, one of the key factors that came into play was around time to value, so with containerization now being able to bring third-party apps on top of YARN through Docker, I think that's definitely an avenue that we're looking at. >> Yes, we're excited about that with 3.0 as well, so that's definitely in the cards for us. >> Great, well, Ali and Dan, thank you so much for coming on theCUBE. It's fun to have you here. >> Nice to be here, thank you guys. >> Great to have you. >> Thank you, it was a pleasure. >> I'm Rebecca Knight, for James Kobielus, we will have more from DataWorks in San Jose just after this. (techno music)
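As a footnote to the profiling capability Bajwa mentioned — suggesting a "credit card" tag for sensitive-looking columns — a toy profiler can be built around the Luhn checksum that real card numbers satisfy. This is purely illustrative of the idea, not Data Steward Studio's actual models:

```python
# Toy column profiler: suggest a "credit_card" tag when most values in a
# column pass the Luhn checksum that real card numbers satisfy.
def luhn_valid(value):
    digits = [int(c) for c in value if c.isdigit()]
    if len(digits) < 13:
        return False
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:  # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

def suggest_tags(column_values, threshold=0.8):
    # Propose the tag only when a clear majority of values look like cards;
    # a human steward would then accept or reject the suggestion.
    hits = sum(luhn_valid(v) for v in column_values)
    return {"credit_card"} if hits / len(column_values) >= threshold else set()

cards = ["4111111111111111", "5500005555555559"]   # well-known test numbers
cities = ["San Jose", "Berlin", "Boston"]
print(suggest_tags(cards))    # {'credit_card'}
print(suggest_tags(cities))   # set()
```

Accepting the suggested tag is what would then feed the enforcement side (Ranger-style policies keyed on the classification).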

Published Date : Jun 19 2018



Arun Murthy, Hortonworks | DataWorks Summit 2018


 

>> Live from San Jose in the heart of Silicon Valley, it's theCUBE, covering DataWorks Summit 2018, brought to you by Hortonworks. >> Welcome back to theCUBE's live coverage of DataWorks here in San Jose, California. I'm your host, Rebecca Knight, along with my cohost, Jim Kobielus. We're joined by Aaron Murphy, Arun Murthy, sorry. He is the co-founder and chief product officer of Hortonworks. Thank you so much for returning to theCUBE. It's great to have you on. >> Yeah, likewise. It's been a fun time getting back, yeah. >> So you were on the main stage this morning in the keynote, and you were describing the journey, the data journey that so many customers are on right now, and you were talking about the cloud, saying that the cloud is part of the strategy but it really needs to fit into the overall business strategy. Can you describe a little bit about your approach to that? >> Absolutely, and the way we look at this is we help customers leverage data to actually deliver better capabilities, better services, better experiences, to their customers, and that's the business we are in. Now with that, obviously, we look at cloud as a really key part of it, of the overall strategy in terms of how you want to manage data on-prem and on the cloud. We kind of joke that we ourselves live in a world of real-time data. We just live in it and data is everywhere. You might have trucks on the road, you might have drones, you might have sensors, and you have it all over the world. At that point, we've kind of got to a point where enterprises understand that they'll manage all the infrastructure, but in a lot of cases, it will make a lot more sense to actually lease some of it, and that's the cloud. It's the same way, if you're delivering packages, you don't go buy planes and lay out roads, you go to FedEx and actually let them handle that for you. That's kind of what the cloud is.
So that is why we really fundamentally believe that we have to help customers leverage infrastructure wherever it makes sense pragmatically, both from an architectural standpoint and from a financial standpoint, and that's kind of why we talked about how your cloud strategy is part of your data strategy, which is actually fundamentally part of your business strategy. >> So how are you helping customers to leverage this? What is on their minds and what's your response? >> Yeah, it's really interesting. Like I said, cloud is cloud, and infrastructure management is certainly something that's at the foremost, at the top of the mind for every CIO today. And what we've consistently heard is they need a way to manage all this data and all this infrastructure in a hybrid, multi-tenant, multi-cloud fashion. Because in some geos you might not have your favorite cloud provider. You know, parts of Asia are a great example. You might have to use one of the Chinese clouds. You go to parts of Europe, especially with things like the GDPR, the data residency laws and so on, you have to be very, very cognizant of where your data gets stored and where your infrastructure is present. And that is why we fundamentally believe it's really important to have and give enterprises a fabric with which they can manage all of this. And hide the details of all of the underlying infrastructure from them as much as possible. >> And that's DataPlane Services. >> And that's DataPlane Services, exactly. >> The Hortonworks DataPlane Services we launched in October of last year. Actually I was on theCUBE talking about it back then too. We see a lot of interest, a lot of excitement around it because now they understand that, again, this doesn't mean that we drive it down to the least common denominator. It is about helping enterprises leverage the key differentiators of each of the cloud providers' products. For example, Google, which we announced a partnership with, they are really strong on AI and ML.
So if you are running TensorFlow and you want to deal with things like Kubernetes, GKE is a great place to do it. And, for example, you can now go to Google Cloud and get TPUs, which work great for TensorFlow. Similarly, a lot of customers run on Amazon for a bunch of the operational stuff, Redshift as an example. So in the world we live in, we want to help the CIO leverage the best piece of the cloud, but then give them a consistent way to manage and govern that data. We were joking on stage that IT has just about learned how to deal with Kerberos and Hadoop, and now we're telling them, "Oh, go figure out IAM on Google," which is also IAM on Amazon, but they are completely different. The only thing that's consistent is the name. So I think we have a unique opportunity, especially with the open source technologies like Atlas, Ranger, Knox and so on, to be able to draw a consistent fabric over this for security and governance. And help the enterprise leverage the best parts of the cloud to put a best-fit architecture together, but which also happens to be a best-of-breed architecture. >> So the fabric is everything you're describing, all the Apache open source projects in which Hortonworks is a primary committer and contributor, are able to share schemas and policies and metadata and so forth across this distributed, heterogeneous fabric of public and private cloud segments within a distributed environment. >> Exactly. >> That's increasingly being containerized in terms of the applications for deployment to edge nodes. Containerization is a big theme in HDP 3.0 which you announced at this show. >> Yeah. >> So, if you could give us a quick sense for how that containerization capability plays into more of an edge focus for what your customers are doing. >> Exactly, great point, and again, the fabric is obviously, the core parts of the fabric are the open source projects, but we've also done a lot of net new innovation with DataPlane which, by the way, is also open source.
It's a new product and a new platform that you can actually leverage, to lay it out over the open source ones you're familiar with. And again, like you said, containerization, what is actually driving the fundamentals of this, the details matter, the scale at which we operate, we're talking about thousands of nodes, terabytes of data. The details really matter because a 5% improvement at that scale leads to millions of dollars in optimization for capex and opex. So that's why all of that, the details, are being fueled and driven by the community, which is kind of what we've delivered with HDP3. And the key ones, like you said, are containerization, because now we can actually get complete agility in terms of how you deploy the applications. You get isolation not only at the resource management level with containers, but you also get it at the software level, which means, if two data scientists wanted to use a different version of Python or Scala or Spark or whatever it is, they get that consistently and holistically. They can actually go from the test-dev cycle into production in a completely consistent manner. So that's why containers are so big, because now we can actually leverage it across the stack, and then things like MiNiFi showing up. We can actually-- >> Define MiNiFi before you go further. What is MiNiFi for our listeners? >> Great question. Yeah, so we've always had NiFi-- >> Real-time. >> Real-time data flow management, and NiFi was still sort of within the data center. What MiNiFi does is actually now a really, really small layer, a small thin library if you will, that you can throw on a phone, a doorbell, a sensor, and that gives you all the capabilities of NiFi but at the edge. >> Mmm. Right? >> And it's actually not just data flow, but what is really cool about NiFi is it's actually command and control. So you can actually do bidirectional command and control, so you can actually change in real time the flows you want, the processing you do, and so on.
So what we're trying to do with MiNiFi is actually not just collect data from the edge, but also push the processing as much as possible to the edge, because we really do believe a lot more processing is going to happen at the edge, especially with the ASICs and so on coming out. There will be custom hardware that you can throw out there and essentially leverage at the edge to actually do this processing. And we believe, you know, we want to do that even at the cost of data not actually landing at rest, because at the end of the day we're in the insights business, not in the data storage business. >> Well I want to get back to that. You were talking about innovation and how so much of it is driven by the open source community, and you're a veteran of the big data open source community. How do we maintain that? How does that continue to be the fuel? >> Yeah, and a lot of it starts with just being consistent. From day one, James was around back then, in 2011 when we started, we've always said, "We're going to be open source," because we fundamentally believed that the community is going to out-innovate any one vendor, regardless of how much money they have in the bank. So we really do believe that's the best way to innovate, mostly because there is a sense of shared ownership of that product. It's not just one vendor throwing some code out there trying to shove it down the customer's throat. And we've seen this over and over again, right. Three years ago, we talked about it, a lot of the DataPlane stuff comes from Atlas and Ranger and so on. None of these existed. These actually came from the fruits of the collaboration with the community, with actually some very large enterprises being a part of it. So it's a great example of how we continue to drive it, because we fundamentally believe that that's the best way to innovate, and we continue to believe so. >> Right.
And the community, the Apache community as a whole, has so many different projects that, for example, in streaming, there is Kafka, >> Okay. >> and there are others that address a core set of common requirements but in different ways, >> Exactly. >> supporting different approaches, for example, they are doing streaming with stateless transactions and so forth, or stateless semantics and so forth. Seems to me that Hortonworks is shifting towards being more of a streaming-oriented vendor, away from data at rest. Though, I should say, HDP 3.0 has got great scalability and storage efficiency capabilities baked in. I wonder if you could just break it down a little bit, what the innovations or enhancements are in HDP 3.0 for those of your core customers, which is most of them, who are managing massive multi-terabyte, multi-petabyte, distributed, federated big data lakes. What's in HDP 3.0 for them? >> Oh, for lots. Again, like I said, we obviously spend a lot of time on the streaming side, because that's where we see it. We live in a real-time world. But again, we don't do it at the cost of our core business, which continues to be HDP. And as you can see, the community continues to drive it. We talked about containerization, a massive step up for the Hadoop community. We've also added support for GPUs. Again, if you think about truly at-scale machine learning. >> Graphics processing units, >> Graphical-- >> AI, deep learning. >> Yeah, it's huge. Deep learning, TensorFlow and so on, really, really need a custom sort of GPU, if you will. So that's coming. That's in HDP3. We've added a whole bunch of scalability improvements with HDFS. We've added federation, because now you can go over a billion files, a billion objects in HDFS. We also added capabilities for-- >> But you indicated yesterday when we were talking that very few of your customers need that capacity yet, but you think they will, so-- >> Oh, for sure.
Again, part of this is, as we enable more sources of data in real time, that's the fuel which drives it, and that was always the strategy behind the HDF product. It was about, can we leverage the synergies between the real-time world, feed that into what you do today in your classic enterprise with data at rest, and that is what is driving the necessity for scale. >> Yes. >> Right. We've done that. We spent a lot of work, again, lowering the total cost of ownership, the TCO, so we added erasure coding. >> What is that exactly? >> Yeah, so erasure coding is a classic sort of storage concept. You know, HDFS has always had three replicas, for redundancy, fault tolerance and recovery. Now, it sounds okay having three replicas because it's cheap disk, right? But when you start to think about our customers running 70, 80 petabytes of data, those three replicas add up, because you've now gone from 80 petabytes of effective data to actually a quarter of an exabyte in terms of raw storage. So now what we can do with erasure coding is actually, instead of storing the three blocks, we actually store parity. We store the encoding of it, which means we can actually go down from three to like two, one and a half, whatever we want to do. So, if we can get from three blocks to one and a half, especially for your core data, >> Yeah
>> And hive has changed a lot in the last several years, this is very different from what it was five years ago. >> The only thing that's same from five years ago is the name (laughing) >> So again, the community has done a phenomenal job, kind of, really taking sort of a, we used to call it like a sequel engine on HDFS. From there, to drive it with 3.0, it's now like, with Hive 3 which is part of HDP3 it's a full fledged database. It's got full asset support. In fact, the asset support is so good that writing asset tables is at least as fast as writing non-asset tables now. And you can do that not only on-- >> Transactional database. >> Exactly. Now not only can you do it on prem, you can do it on S3. So you can actually drive the transactions through Hive on S3. We've done a lot of work to actually, you were there yesterday when we were talking about some of the performance work we've done with LAP and so on to actually give consistent performance both on-prem and the cloud and this is a lot of effort simply because the performance characteristics you get from the storage layer with HDFS versus S3 are significantly different. So now we have been able to bridge those with things with LAP. We've done a lot of work and sort of enhanced the security model around it, governance and security. So now you get things like account level, masking, row-level filtering, all the standard stuff that you would expect and more from an Enprise air house. We talked to a lot of our customers, they're doing, literally tens of thousands of views because they don't have the capabilities that exist in Hive now. >> Mmm-hmm 6 And I'm sitting here kind of being amazed that for an open source set of tools to have the best security and governance at this point is pretty amazing coming from where we started off. 
>> And it's absolutely essential for GDPR compliance, and compliance with HIPAA and every other mandate and sensitivity that requires you to protect personally identifiable information, so very important. So in many ways Hortonworks has one of the premier big data catalogs for all manner of compliance requirements that your customers are chasing. >> Yeah, and James, you wrote about it in the context of Data Steward Studio, which we introduced. >> Yes. >> You know, things like consent management, having--- >> A consent portal. >> A consent portal. >> In which the customer can indicate the degree to which >> Exactly. >> they require controls over the management of their PII, possibly to be forgotten, and so forth. >> Yeah, the right to be forgotten, and consent even for analytics. Within the context of GDPR, you have to allow the customer to opt out of analytics, of being part of an analytic itself, right. >> Yeah. >> So things like those are now something we enable through the enhanced security models that are done in Ranger. So the really cool part of what we've done now with GDPR is that we can get all these capabilities on existing data and existing applications by just adding a security policy, not rewriting them. It's a massive, massive deal, which I cannot tell you how much customers are excited about, because they now understand. They were sort of freaking out that they would have to go to 30, 40, 50 thousand enterprise apps and change them to actually provide consent and the right to be forgotten. The fact that you can do that now by changing a security policy with Ranger is huge for them. >> Arun, thank you so much for coming on theCUBE. It's always so much fun talking to you. >> Likewise. Thank you so much. >> I learn something every time I listen to you. >> Indeed, indeed. I'm Rebecca Knight for James Kobielus; we will have more from theCUBE's live coverage of DataWorks just after this. (Techno music)

Published Date : Jun 19 2018


John Kreisa, Hortonworks | Dataworks Summit EU 2018


 

>> Narrator: From Berlin, Germany, it's theCUBE. Covering Dataworks Summit Europe 2018. Brought to you by Hortonworks. >> Hello, welcome to theCUBE. We're here at Dataworks Summit 2018 in Berlin, Germany. I'm James Kobielus. I'm the lead analyst for Big Data Analytics within the Wikibon team of SiliconANGLE Media. Our guest is John Kreisa. He's the VP for Marketing at Hortonworks, of course, the host company of Dataworks Summit. John, it's great to have you. >> Thank you, Jim, it's great to be here. >> We go way back, so you know it's always great to reconnect with you guys at Hortonworks. You guys are on a roll; it's been seven years, I think, since you were founded. I remember the founding of Hortonworks. I remember when it splashed in the Wall Street Journal. It was like, oh wow, this big data thing, this Hadoop thing is actually a market, a segment, and you guys have built it. You know, you and your competitors, your partners, your ecosystem continue to grow. You went IPO a few years ago. Your latest numbers are pretty good. You're continuing to grow in revenues and customer acquisitions, and your deal sizes are growing. So Hortonworks remains on a roll. So, I'd like you to talk right now, John, and give us a sense of where Hortonworks is at in terms of engaging with the marketplace, in terms of trends that you're seeing, in terms of how you're addressing them. But talk about, first of all, the Dataworks Summit. How many attendees do you have, from how many countries? Just give us sort of the layout of this show. >> I don't have all of the final counts yet.
If you think about that, drawing from a really broad set of countries, well beyond, as you know, because you've interviewed some of the folks, just Europe. We've had them from South America, the U.S., Africa, and Asia as well, so really a broad swath of the open-source and big data community, which is great. The final attendance is going to be in the 1,250 to 1,300 range; we're still waiting on the final numbers, but it's a great-sized conference. The energy level's been really great; the sessions have been, you know, oversubscribed, standing room only in many of the popular sessions. So the community's strong. I think that's the thing that we really see here and that we're really continuing to invest in. It's something that Hortonworks was founded around. You referenced the founding, and driving the community forward and investing in it is something that has been part of our mantra since we started, and it remains that way today. >> Right. So first of all, what is Hortonworks? Now, how does Hortonworks position itself? Clearly Hadoop is your foundation, but you, just like Cloudera and MapR, have all continued to evolve to address a broader range of use cases with a deeper stack of technology and fairly extensive partner ecosystems. So what kind of a beast is Hortonworks? It's an elephant, but what kind of an elephant is it? >> We're an elephant, or riding on the elephant, I'd say; we're a global data management company. That's what we're helping organizations do: really the end-to-end lifecycle of their data, helping them manage it regardless of where it is, whether it's on-premise or in the cloud, through hybrid data architectures. That's really how we've seen the market evolve: we started off, in terms of our strategy, with the platform based on Hadoop, as you said, to store, process, and analyze data at scale. The kind of fundamental use case for Hadoop.
Then as the company emerged and the market continued to evolve, we saw the opportunity in capturing data from the edge. As IoT and edge use cases emerged, it made sense for us to add to the platform and create Hortonworks DataFlow. >> James: Apache NiFi. >> Apache NiFi, exactly, HDF underneath, with associated additional open-source projects in there, Kafka and some streaming and things like that. So that was: move data, capture data in motion, move it back and put it into the platform for those large data applications that organizations are building on the core platform. It's also the next evolution; we're seeing great attach rates with that and really strong interest in Apache NiFi. You know, the meetup here for NiFi was oversubscribed, so really, really strong interest in that. And then the market continued to evolve with cloud and cloud architectures, with customers wanting to deploy in the cloud. You know, you saw we had that poll yesterday in the general session about cloud, with really interesting results, but we saw that companies really wanted to deploy in a hybrid way. Some of them wanted to move specific workloads to the cloud. >> Multi-cloud, public, private. >> Exactly right, and multi-data center. >> The majority of your customer deployments are on-prem. >> They are. >> Rob Bearden, your CEO, I think he said in a recent article on SiliconANGLE that two-thirds of your deployments are on-prem. Is that percentage going down over time? Are more of your customers shifting toward a public cloud orientation? Does Hortonworks worry about that? You've got partnerships, clearly, with the likes of IBM, AWS, and Microsoft Azure and so forth, so do you guys see that as an opportunity, or as a worrisome trend? >> No, we see it very much as an opportunity.
And that's because we do have customers who are wanting to put more workloads and run things in the cloud; however, there's still almost always a component that's going to be on-premise. And that creates a challenge for organizations: how do they manage the security and governance, and really the overall operations, of those deployments as they're in the cloud and on-premise. And, to your point, multi-cloud. And so you get some complexity in there around that deployment, particularly with the regulations; we talked about GDPR earlier today. >> Oh, by the way, the Data Steward Studio demo today was really, really good. It showed that, first of all, you cover the entire range of core requirements for compliance. So that was actually the primary announcement at this show; Scott Gnau announced that. You demoed it today; I think you guys are off to a good start. >> We've gotten really good feedback, and thank you for that, on our DataPlane Services strategy. It provides that single pane of glass. >> I should say to our viewers that Data Steward Studio is the second of the services under DataPlane, the Hortonworks DataPlane Services portfolio. >> That's right, that's exactly right. >> Go ahead, keep going. >> So, you know, we see that as an opportunity. We think we're very strongly positioned in the market, being the first to bring that kind of solution to the customers, and our large customers that we've been talking about, who have been starting to use DataPlane, have been very, very positive. I mean, they see it as something that is going to help them maintain control over these deployments as they start to spread around and as they grow their use of it. >> And it's built to operate across multi-cloud, I know this as well, in terms of executing the consent, or withdrawal of consent, that the data subject makes through what is essentially a consent portal. >> That's right, that's right.
>> That was actually a very compelling demonstration in that regard. >> It was good, and they worked very hard on it. And I was speaking to an analyst yesterday, and they were saying that they're seeing an increasing number of customers, enterprises, wanting to have a multi-cloud strategy. They don't want to get locked into any one public cloud vendor, so what they want is somebody who can help them maintain that common security and governance across their different deployments, and they see DataPlane Services as the way that's going to help them do that. >> So John, how is Hortonworks, what's your road map, how do you see the company and your go-to-market evolving over the coming years, in terms of geographies, in terms of your focuses? Focus, in terms of the use cases and workloads that the Hortonworks portfolio addresses. How is that shifting? You mentioned the edge. AI, machine learning, deep learning. You are a reseller of IBM Data Science Experience. >> DSX, that's right. >> So, let's just focus on that. Do you see more customers turning to Hortonworks and IBM for a complete end-to-end pipeline for the ingest, for the preparation, modeling, training and so forth? And deployment of operationalized AI? Is that something you see going forward as an evolution path for your capabilities? >> I'd say yes, long-term, or even in the short term. So, they have to get their data house in order, if you will, before they get to some of those other things, so we're still, Hortonworks' strategy has always been focused on the platform aspect, right? The data-at-rest platform, the data-in-motion platform, and now a platform for managing common security and governance across those different deployments. Building on that is the data science, machine learning, and AI opportunity, but our strategy there, as opposed to trying to do it ourselves, is to partner, so we've got the strong partnership with IBM and resell their DSX product.
And also other partnerships to deliver those capabilities, like machine learning and AI, from our partner ecosystem, which you referenced. We have over 2,300 partners, so a very, very strong ecosystem. And so we're going to stick to our strategy of the platforms enabling that, which will subsequently enable data science, machine learning, and AI on top. And then, if you want me to talk about our strategy in terms of growth: we already operate globally. We've got offices in, I think, 19 different countries. So we're really covering the globe in terms of the demand for Hortonworks products and beginning implementations.
>> So going forward, how is Hortonworks evolving as a company in terms of, for example with GDPR, Data Steward, data governance as a strong focus going forward? Are you shifting your model, in terms of your target customer, away from the data engineers and Hadoop cluster managers, who are still very much the center of it, towards more data governance, towards more of a business analyst level of focus? Do you see Hortonworks shifting in that direction in terms of your focus, go-to-market, your message and everything? >> I would say it's not a shift as much as an expansion, so we definitely are continuing to invest in the core platform, in Hadoop, and you would have heard of some of the changes that are coming in the core Hadoop 3.0 and 3.1 platform here. Alan and others can talk about those details, and in Apache NiFi. But, to your point, as we bring and have brought Data Steward Studio and DataPlane Services online, that allows us to address a different user within the organization, so it's really an expansion. We're not de-investing in anything else. It's really another way, a natural evolution, of how we're helping organizations solve data problems. >> That's great, well thank you. This has been John Kreisa, the VP for marketing at Hortonworks. I'm James Kobielus of Wikibon SiliconANGLE Media here at Dataworks Summit 2018 in Berlin. And it's been great, John; thank you very much for coming on theCUBE. >> Great, thanks for your time. (techno music)

Published Date : Apr 19 2018


Muggie van Staden, Obsidian | Dataworks Summit 2018


 

>> Voiceover: From Berlin, Germany, it's theCUBE, covering DataWorks Summit Europe 2018, brought to you by Hortonworks. >> Hi, hello, welcome to theCUBE, I'm James Kobielus. I'm the lead analyst for Big Data Analytics at Wikibon, which is the team inside of SiliconANGLE Media that focuses on emerging trends and technologies. We are here on theCUBE at DataWorks Summit 2018 in Berlin, Germany. And I have a guest here. This is, Muggie, and if I get it wrong, Muggie Van Staden. >> That's good enough, yep. >> Who is with Obsidian, which is a South Africa-based partner of Hortonworks. And I'm not familiar with Obsidian, so I'm going to ask Muggie to tell us a little bit about your company, what you do, your focus on open source, and really the opportunities you see for big data, for Hadoop, in South Africa, really the African continent as a whole. So, Muggie?
Everybody's nervous of Cloud. We have the joys that we don't really have any Cloud players locally yet. The two big players are in Microsoft and Amazon are planning some data centers soon. So the guys have different challenges to Europe and to the States. But big data, the big banks are looking at it, starting to deploy nice Hadoop clusters, starting to ingest data, starting to get real business value out of it, and we're there to help, and hopefully the four is the start for us and we can help lots of customers on this journey. >> Are South African-based companies, because you are so distant in terms of miles on the planet from Europe, from the EU, is any company in South Africa, or many companies, concerned at all about the global, or say the general data protection regulation, GDPR? US-based companies certainly are 'cause they operate in Europe. So is that a growing focus for them? And we have five weeks until GDPR kicks in. So tell me about it. >> Yeah, so from a South African point of view, some of the banks and some of the companies would have subsidiaries in Europe. So for them it's a very real thing. But we have our own Act called PoPI, which is the protection of private information, so very similar. So everybody's keeping an eye on it. Everybody's worried. I think everybody's worried for the first company to be fined. And then they will all make sure that they get their things right. But, I think not just because of a legislation, I think it's something that everybody should worry about. How do we protect data? How do we make sure the right people have access to the correct data when they should and nobody violates that because I mean, in this day and age, you know, Google and Amazon and those guys probably know more about me than my family does. So it's a challenge for everybody. And I think it's just the right thing for companies to do is to make sure that the data that they do have that they really do take good care of it. 
We trust them with our money and now we're trusting them with our data. So it's a real challenge for everybody. >> So how long has Obsidian been a partner of Hortonworks, and how has your role, or partnership I should say, evolved over that time, and how do you see it evolving going forward? >> We've been a partner for about three or four years now. We're also a training partner in South Africa for them. And as they as a company have evolved, we've had to evolve with them. You know, they started with HDP as the Hadoop platform; now they're doing NiFi and HDF, so we have to learn all of those technologies as well. But we're very, very excited about where they're going with the DataPlane service, managing a customer's data across multiple clusters and multiple clouds, because that's realistically where we see all the customers going: on-premise clusters plus, typically, multiple Clouds, and how do you manage that? And we are very excited to walk this road together with Hortonworks and all the South African customers that we have. >> So you say your customers are deploying multiple Clouds. Public Clouds or hybrid private-public Clouds? Give us a sense, for South Africa, of whether public Cloud is a major deployment option or choice for financial services firms that you work with. >> Not necessarily financial services; most of them are kicking tires at this stage, nobody's really put major workloads in there. As I mentioned, both Amazon and Microsoft are planning to put data centers down in South Africa very soon, and I think that will spur a big movement towards Cloud. But we do have some customers, unfortunately not Hortonworks customers, that are actually mostly in the Cloud. And they are now starting to look at a multi-Cloud strategy, so ideally to be in the three or four major Cloud providers, spinning up the right workloads in the right Cloud, and we're there to help.
>> One of the most predominant workloads that your customers are running in the Cloud, is it backend, in terms of data ingest and transformation? Is it a bit of data warehousing with unstructured data? Is it a bit of things like queryable archiving? I want to get a sense for what is predominant right now in workloads. >> Yeah, I think most of them start with (mumbles) environments. (mumbles) one customer that's heavily into Cloud from a data point of view; literally, it's their data warehouse, they put everything in there. I think from the banking customers, most of them are considering DR of their existing Hadoop clusters, maybe a subset of their data and not necessarily everything. And I think some of them are also considering putting their unstructured data out in the Cloud, because that's where most of it's coming from. I mean, if you have Twitter, Facebook, LinkedIn data, it's a bit silly to pull all of that into your environment. Why not just put it in the Cloud, that's where it's coming from, analyze it there, and connect it back to your data where relevant? So I think a lot of the customers would love to get there, and now Hortonworks makes it so much easier to do that. I think a lot of them will start moving in that direction. >> Now, excuse me, so are any or many of your customers doing development and training of machine learning algorithms and models in their Clouds? And to the extent that they are, are they using tools like the IBM Data Science Experience that Hortonworks resells for that? >> I think it's definitely on the radar for a lot of them. I'm not aware of anybody using it yet, but lots of people are looking at it and are excited about the partnership between IBM and Hortonworks. And IBM has been a longstanding player in the South African market, and it's exciting for us as well to bring them into the whole Hortonworks ecosystem and together solve real-world problems.
>> Give us a sense for how built out the big data infrastructure is in neighboring countries like Botswana or Angola or Mozambique and so forth. Is that an area, are those regions, that your company operates in and sells into? >> We don't have offices, but we don't have a problem going in and helping customers there, so we've had projects in the past, not data related, where we've flown in and helped people. Most of the banks, from a South African point of view, have branches into Africa. So it's on the roadmap; some are a little bit ahead of others, but it's definitely on the roadmap to actually put down Hadoop clusters in some of the major countries throughout Africa. There's a big debate: do you put it down there, or do you leave the data in South Africa? So they're all going through their own legislation, but it's definitely on the roadmap for all of them to actually take their data and knowledge in data science up into Africa. >> Now you say that in South Africa proper there are privacy regulations, you know, maybe not the same as GDPR, but equivalent. Throughout Africa, at least throughout Southern Africa, how is privacy regulation lacking, or is it emerging? >> I think it's emerging. A lot of the countries do have the basic rule that their data shouldn't leave the country. So everybody wants that data sovereignty, and that's why a lot of them will not go to Cloud, and that's part of the challenge for the banks: if they have banks up in Botswana, etc., and Botswana's rule is that data has to stay in country, they have to figure out a way to connect that data to get the value for all of their customers. So real-world challenges for everybody. >> When you're going into and selling into an emerging or developing nation, do you need to provide upfront consulting to help the customer bootstrap their own understanding of the technology and make the business case and so forth? And how consultative is the selling process...
>> Absolutely, and what we see with the banks, most of them even have a consultative approach within their own environment, so you would have the South African team maybe flying into the team at (mumbles) Botswana, sharing some of the learnings that they've had, and then helping those guys get up to speed. The reality is the skills are not necessarily in country. So there's a lot of training, a lot of help, to go and say, we've done this, let us upskill you, and be a part of that process. So we sometimes send in teams to come and do two, three days of training, basics, etc., so that ultimately the guys can operationalize in each country by themselves. >> So, that's very interesting. What do you want to take away from this event? What have you found most interesting, in terms of the sessions you've been in and the community showcase, that you can take back to Obsidian, back in your country, and apply? Like the announcement this morning of Data Steward Studio. Do you see it as possible that your customers might be eager to use that for curation of the data in their clusters? >> Definitely, and one of the key messages for me was Scott, the CTO's, message about your data strategy, your Cloud strategy, and your business strategy: it is effectively the same thing. And I think that's the biggest message that I would like to take back to the South African customers, to go and say, you need to start thinking about this. You know, as Cloud becomes a bigger reality for us, we have to align; we have to go and say, how do we get your data where it belongs? So, you know, we like to say to our customers, we help the teams get the right code to the right compute and the right data, and I think it's absolutely critical for all of the customers to go and say, well, where is that data going to sit? Where is the right compute for that piece of data? And can we get it there, can we manage it, etc.? And align to business strategy.
Everybody's trying to do digital transformation, and those three things go very much hand-in-hand. >> Well, Muggie, thank you very much. We're at the end of our slot. This has been great. It's been excellent to learn more about Obsidian and the work you're doing in South Africa, providing big data solutions or working with customers to build the big data infrastructure in the financial industry down there. So this has been theCUBE. We've been speaking with Muggie Van Staden of Obsidian Systems, and here at DataWorks Summit 2018 in Berlin. Thank you very much.

Published Date : Apr 18 2018


SENTIMENT ANALYSIS :

ENTITIES

Entity | Category | Confidence
IBM | ORGANIZATION | 0.99+
James Kobielus | PERSON | 0.99+
Amazon | ORGANIZATION | 0.99+
Hortonworks | ORGANIZATION | 0.99+
Microsoft | ORGANIZATION | 0.99+
Europe | LOCATION | 0.99+
Muggie Van Staden | PERSON | 0.99+
Africa | LOCATION | 0.99+
Google | ORGANIZATION | 0.99+
Muggie van Staden | PERSON | 0.99+
Botswana | LOCATION | 0.99+
Mozambique | LOCATION | 0.99+
Angola | LOCATION | 0.99+
Muggie | PERSON | 0.99+
Scott | PERSON | 0.99+
South Africa | LOCATION | 0.99+
James | PERSON | 0.99+
Southern Africa | LOCATION | 0.99+
two | QUANTITY | 0.99+
LinkedIn | ORGANIZATION | 0.99+
Berlin | LOCATION | 0.99+
three day | QUANTITY | 0.99+
three | QUANTITY | 0.99+
GDPR | TITLE | 0.99+
Facebook | ORGANIZATION | 0.99+
Berlin, Germany | LOCATION | 0.99+
Twitter | ORGANIZATION | 0.99+
Obsidian Systems | ORGANIZATION | 0.99+
first company | QUANTITY | 0.99+
five weeks | QUANTITY | 0.99+
four | QUANTITY | 0.99+
first partnerships | QUANTITY | 0.99+
three | DATE | 0.99+
Today | DATE | 0.98+
Linux | TITLE | 0.98+
23 years ago | DATE | 0.98+
DataWorks Summit 2018 | EVENT | 0.98+
both | QUANTITY | 0.97+
EU | LOCATION | 0.97+
Wikibon | ORGANIZATION | 0.97+
one | QUANTITY | 0.97+
PoPI | TITLE | 0.97+
Data Steward Studio | ORGANIZATION | 0.97+
each country | QUANTITY | 0.97+
Cloud | TITLE | 0.97+
US | LOCATION | 0.96+
last night | DATE | 0.96+
SiliconANGLE Media | ORGANIZATION | 0.96+
four years | QUANTITY | 0.96+
DataWorks Summit | EVENT | 0.96+
Hadoop | ORGANIZATION | 0.96+
One | QUANTITY | 0.96+
Dataworks Summit 2018 | EVENT | 0.95+
Hadoop | ORGANIZATION | 0.93+
about three | QUANTITY | 0.93+
two big players | QUANTITY | 0.93+
theCUBE | ORGANIZATION | 0.93+

Scott Gnau, Hortonworks | Dataworks Summit EU 2018


 

(upbeat music) >> Announcer: From Berlin, Germany, it's The Cube, covering DataWorks Summit Europe 2018. Brought to you by Hortonworks. >> Hi, welcome to The Cube, we're separating the signal from the noise and tuning into the trends in data and analytics. Here at DataWorks Summit 2018 in Berlin, Germany. This is the sixth year, I believe, that DataWorks has been held in Europe. Last year I believe it was at Munich, now it's in Berlin. It's a great show. The host is Hortonworks and our first interviewee today is Scott Gnau, who is the chief technology officer of Hortonworks. Of course Hortonworks established itself about seven years ago as one of the up and coming start-ups commercializing a then brand new technology called Hadoop and MapReduce. They've moved well beyond that in terms of their go to market strategy, their product portfolio, their partnerships. So Scott, this morning, it's great to have ya'. How are you doing? >> Glad to be back and good to see you. It's been awhile. >> You know, yes, I mean, you're an industry veteran. We've both been around the block a few times but I remember you years ago. You were at Teradata and I was at another analyst firm. And now you're with Hortonworks. And Hortonworks is really on a roll. I know you're not Rob Bearden, so I'm not going to go into the financials, but your financials look pretty good, your latest. You're growing, your deal sizes are growing. Your customer base is continuing to deepen. So you guys are on a roll. So we're here in Europe, we're here in Berlin in particular. It's five weeks--you did the keynote this morning--it's five weeks until GDPR. The sword of Damocles, the GDPR sword of Damocles. It's not just affecting European based companies, but it's affecting North American companies and others who do business in Europe. 
So your keynote this morning, your core theme was that, if you're in enterprise, your business strategy is equated with your cloud strategy now, is really equated with your data strategy. And you got to a lot of that. It was a really good discussion. And where GDPR comes into the picture is the fact that protecting data, personal data of your customers is absolutely important, in fact it's imperative and mandatory, and will be in five weeks or you'll face a significant penalty if you're not managing that data and providing customers with the right to have it erased, or the right to withdraw consent to have it profiled, and so forth. So enterprises all over the world, especially in Europe, are racing as fast as they can to get compliant with GDPR by the May 25th deadline time. So, one of the things you discussed this morning, you had an announcement overnight that Hortonworks has released a new solution in technical preview called The Data Steward Studio. And I'm wondering if you can tie that announcement to GDPR? It seems like data stewardship would have a strong value for your customers. >> Yeah, there's definitely a big tie-in. GDPR is certainly creating a milestone, kind of a trigger, for people to really think about their data assets. But it's certainly even larger than that, because when you even think about driving digitization of a business, driving new business models and connecting data and finding new use cases, it's all about finding the data you have, understanding what it is, where it came from, what's the lineage of it, who had access to it, what did they do to it? These are all governance kinds of things, which are also now mandated by laws like GDPR. And so it's all really coming together in the context of the new modern data architecture era that we live in, where a lot of data that we have access to, we didn't create. 
And so it was created outside the firewall by a device, by some application running with some customer, and so capturing and interpreting and governing that data is very different than taking derivative transactions from an ERP system, which are already adjudicated and understood, and governing that kind of a data structure. And so this is a need that's driven from many different perspectives, it's driven from the new architecture, the way IoT devices are connecting and just creating a data bomb, that's one thing. It's driven by business use cases, just saying what are the assets that I have access to, and how can I try to determine patterns between those assets where I didn't even create some of them, so how do I adjudicate that? >> Discovering and cataloging your data-- >> Discovering it, cataloging it, actually even... When I even think about data, just think the files on my laptop, that I created, and I don't remember what half of them are. So creating the metadata, creating that trail of bread crumbs that lets you piece together what's there, what's the relevance of it, and how, then, you might use it for some correlation. And then you get in, obviously, to the regulatory piece that says sure, if I'm a new customer and I ask to be forgotten, the only way that you can guarantee to forget me is to know where all of my data is. >> If you remember that they are your customer in the first place and you know where all that data is, if you're even aware that it exists, that's the first and foremost thing for an enterprise to be able to assess their degree of exposure to GDPR. >> So, right. It's like a whole new use case. It's a microcosm of all of these really big things that are going on. And so what we've been trying to do is really leverage our expertise in metadata management using the Apache Atlas project. >> Interviewer: You and IBM have done some major work-- >> We work with IBM and the community on Apache Atlas. 
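Gnau's description of Atlas-style metadata tagging, registering each asset as a typed entity, attaching classifications, and then discovering governed assets by tag, can be sketched in miniature. The type names, tag names, and store below are illustrative stand-ins for the idea, not Apache Atlas' actual type system or API:

```python
# A minimal sketch of classification-based metadata tagging: each data
# asset is registered as a typed entity, classifications (tags) are
# attached to it, and governed assets can then be found by tag.
# Type and tag names are illustrative, not Apache Atlas' type system.
from dataclasses import dataclass, field

@dataclass
class Entity:
    qualified_name: str      # unique asset name, e.g. "sales_db.customers@prod"
    type_name: str           # e.g. "hive_table"
    classifications: set = field(default_factory=set)

class MetadataStore:
    def __init__(self):
        self._entities = {}

    def register(self, entity):
        self._entities[entity.qualified_name] = entity

    def tag(self, qualified_name, classification):
        self._entities[qualified_name].classifications.add(classification)

    def find_by_tag(self, classification):
        return sorted(e.qualified_name for e in self._entities.values()
                      if classification in e.classifications)

store = MetadataStore()
store.register(Entity("sales_db.customers@prod", "hive_table"))
store.register(Entity("clickstream.events@cloud", "kafka_topic"))
store.tag("sales_db.customers@prod", "PII")
print(store.find_by_tag("PII"))  # → ['sales_db.customers@prod']
```

The point of the sketch is the query at the end: once tags travel with the asset, a compliance question like "where is all my PII?" becomes a lookup rather than an archaeology project.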
You know, metadata tagging is not the most interesting topic for some people, but in the context that I just described, it's kind of important. And so I think one of the areas where we can really add value for the industry is leveraging our lowest common denominator, open source, open community kind of development to really create a standard infrastructure, a standard open infrastructure for metadata tagging, into which all of these use cases can now plug. Whether it's I want to discover data and create metadata about the data based on patterns that I see in the data, or I've inherited data and I want to ensure that the metadata stay with that data through its life cycle, so that I can guarantee the lineage of the data, and be compliant with GDPR-- >> And in fact, tomorrow we will have Mandy Chessell from IBM, a key Hortonworks partner, discussing the open metadata framework you're describing and what you're doing. >> And that was part of this morning's keynote close also. It all really flowed nicely together. Anyway, it is really a perfect storm. So what we've done is we've said, let's leverage this lowest common denominator, standard metadata tagging, Apache Atlas, and uplevel it, and not have it be part of a cluster, but actually have it be a cloud service that can be in force across multiple data stores, whether they're in the cloud or whether they're on prem. >> Interviewer: That's the Data Steward Studio? >> Well, Data Plane and Data Steward Studio really enable those things to come together. >> So the Data Steward Studio is the second service >> Like an app. >> under the Hortonworks DataPlane service. >> Yeah, so the whole idea is to be able to tie those things together, and when you think about it in today's hybrid world, and this is where I really started, where your data strategy is your cloud strategy, they can't be separate, because if they're separate, just think about what would happen. So I've copied a bunch of data out to the cloud. 
All memory of any lineage is gone. Or I've got to go set up manually another set of lineage that may not be the same as the lineage it came with. And so being able to provide that common service across footprint, whether it's multiple data centers, whether it's multiple clouds, or both, is a really huge value, because now you can sit back and through that single pane, see all of your data assets and understand how they interact. That obviously has the ability then to provide value like with Data Steward Studio, to discover assets, maybe to discover assets and discover duplicate assets, where, hey, I can save some money if I get rid of this cloud instance, 'cause it's over here already. Or to be compliant and say yeah, I've got these assets here, here, and here, I am now compelled to do whatever: delete, protect, encrypt. I can now go do that and keep a record through the metadata that I did it. >> Yes, in fact that is very much at the heart of compliance, you got to know what assets there are out there. And so it seems to me that Hortonworks is increasingly... the H-word rarely comes up these days. >> Scott: Not Hortonworks, you're talking about Hadoop. >> Hadoop rarely comes up these days. When the industry talks about you guys, it's known that's your core, that's your base, that's where HDP and so forth, great product, great distro. In fact, in your partnership with IBM, a year or more ago, I think it was IBM standardized on HDP in lieu of their distro, 'cause it's so well-established, so mature. But going forward, you guys in many ways, Hortonworks, you have positioned yourselves now. Wikibon sees you as being the premier solution provider of big data governance solutions specifically focused on multi-cloud, on structured data, and so forth. So the announcement today of the Data Steward Studio very much builds on that capability you already have there. So going forward, can you give us a sense to your roadmap in terms of building out DataPlane's service? 
'Cause this is the second of these services under the DataPlane umbrella. Give us a sense for how you'll continue to deepen your governance portfolio in DataPlane. >> Really the way to think about it, there are a couple of things that you touched on that I think are really critical, certainly for me, and for us at Hortonworks to continue to repeat, just to make sure the message got there. Number one, Hadoop is definitely at the core of what we've done, and was kind of the secret sauce. Some very different stuff in the technology, also the fact that it's open source and community, all those kinds of things. But that really created a foundation that allowed us to build the whole beginning of big data data management. And we added and expanded to the traditional Hadoop stack by adding Data in Motion. And so what we've done is-- >> Interviewer: NiFi, I believe, you made a major investment. >> Yeah, so we made a large investment in Apache NiFi, as well as Storm and Kafka as kind of a group of technologies. And the whole idea behind doing that was to expand our footprint so that we would enable our customers to manage their data through its entire lifecycle, from being created at the edge, all the way through streaming technologies, to landing, to analytics, and then even analytics being pushed back out to the edge. So it's really about having that common management infrastructure for the lifecycle of all the data, including Hadoop and many other things. And then in that, obviously as we discuss whether it be regulation, whether it be, frankly, future functionality, there's an opportunity to uplevel those services from an overall security and governance perspective. And just like Hadoop kind of upended traditional thinking... and what I mean by that was not the economics of it, specifically, but just the fact that you could land data without describing it. That seemed so unimportant at one time, and now it's like the key thing that drives the difference. 
Think about sensors that are sending in data that reconfigure firmware, and those streams change. Being able to acquire data and then assess the data is a big deal. So the same thing applies, then, to how we apply governance. I said this morning, traditional governance was hey, I started this employee, I have access to this file, this file, this file, and nothing else. I don't know what else is out there. I only have access to what my job title describes. And that's traditional data governance. In the new world, that doesn't work. Data scientists need access to all of the data. Now, that doesn't mean we need to give away PII. We can encrypt it, we can tokenize it, but we keep referential integrity. We keep the integrity of the original structures, and those who have a need to actually see the PII can get the token and see the PII. But it's governance thought inversely from how it's been thought about for 30 years. >> It's so great you've worked governance into an increasingly streaming, real-time in motion data environment. Scott, this has been great. It's been great to have you on The Cube. You're an alum of The Cube. I think we've had you at least two or three times over the last few years. >> It feels like 35. Nah, it's pretty fun. >> Yeah, you've been great. So we are here at Dataworks Summit in Berlin. (upbeat music)
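Gnau's "tokenize it, but keep referential integrity" point can be sketched with a keyed hash: the same PII value always maps to the same token, so joins across datasets still work while the raw value stays hidden from anyone without the key. Key management is elided here; the hard-coded key is an assumption for the sketch only:

```python
# Deterministic tokenization of PII with a keyed HMAC: equal inputs give
# equal tokens, so referential integrity (joins) survives, but the raw
# value is not recoverable without the key. In practice the key would
# live in a managed key store, not in source code.
import hmac
import hashlib

SECRET_KEY = b"governance-team-key"  # illustrative; use a KMS in practice

def tokenize(pii_value: str) -> str:
    return hmac.new(SECRET_KEY, pii_value.encode(), hashlib.sha256).hexdigest()[:16]

orders   = [("alice@example.com", "order-1"), ("bob@example.com", "order-2")]
payments = [("alice@example.com", "pmt-9")]

tok_orders   = [(tokenize(email), order) for email, order in orders]
tok_payments = [(tokenize(email), pmt) for email, pmt in payments]

# The tokenized keys still join, even though no raw email is present.
joined = [(order, pmt)
          for key_o, order in tok_orders
          for key_p, pmt in tok_payments
          if key_o == key_p]
print(joined)  # → [('order-1', 'pmt-9')]
```

A data scientist working on the tokenized tables can correlate orders to payments freely; only someone holding the key (or a reverse-lookup service gated by governance) can get back to the underlying identity.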

Published Date : Apr 18 2018


SENTIMENT ANALYSIS :

ENTITIES

Entity | Category | Confidence
Europe | LOCATION | 0.99+
Scott | PERSON | 0.99+
IBM | ORGANIZATION | 0.99+
Berlin | LOCATION | 0.99+
Scott Gnau | PERSON | 0.99+
Hortonworks | ORGANIZATION | 0.99+
Teradata | ORGANIZATION | 0.99+
Last year | DATE | 0.99+
May 25th | DATE | 0.99+
five weeks | QUANTITY | 0.99+
Mandy Chessell | PERSON | 0.99+
GDPR | TITLE | 0.99+
Munich | LOCATION | 0.99+
Rob Bearden | PERSON | 0.99+
second service | QUANTITY | 0.99+
30 years | QUANTITY | 0.99+
both | QUANTITY | 0.99+
tomorrow | DATE | 0.99+
first | QUANTITY | 0.99+
Berlin, Germany | LOCATION | 0.99+
second | QUANTITY | 0.99+
DataPlane | ORGANIZATION | 0.99+
sixth year | QUANTITY | 0.98+
three times | QUANTITY | 0.98+
first interviewee | QUANTITY | 0.98+
Dataworks Summit | EVENT | 0.98+
one | QUANTITY | 0.97+
this morning | DATE | 0.97+
DataWorks Summit 2018 | EVENT | 0.97+
MapReduce | ORGANIZATION | 0.96+
Hadoop | TITLE | 0.96+
Hadoop | ORGANIZATION | 0.96+
one time | QUANTITY | 0.96+
35 | QUANTITY | 0.96+
single pane | QUANTITY | 0.96+
NiFi | ORGANIZATION | 0.96+
today | DATE | 0.94+
DataWorks Summit Europe 2018 | EVENT | 0.93+
Data Steward Studio | ORGANIZATION | 0.93+
Dataworks Summit EU 2018 | EVENT | 0.92+
about seven years ago | DATE | 0.91+
a year or | DATE | 0.88+
years | DATE | 0.87+
Storm | ORGANIZATION | 0.87+
Wikibon | ORGANIZATION | 0.86+
Apache NiFi | ORGANIZATION | 0.85+
The Cube | PERSON | 0.84+
North American | OTHER | 0.84+
DataWorks | ORGANIZATION | 0.84+
Data Plane | ORGANIZATION | 0.76+
Data Steward Studio | TITLE | 0.75+
Kafka | ORGANIZATION | 0.75+

Rob Thomas, IBM | Big Data NYC 2017


 

>> Voiceover: Live from midtown Manhattan, it's theCUBE! Covering Big Data New York City 2017. Brought to you by SiliconANGLE Media and its ecosystem sponsors. >> Okay, welcome back everyone, live in New York City this is theCUBE's coverage of, eighth year doing Hadoop World now, evolved into Strata Hadoop, now called Strata Data, it's had many incarnations but O'Reilly Media running their event in conjunction with Cloudera, mainly an O'Reilly Media show. We do our own show called Big Data NYC here with our community with theCUBE bringing you the best interviews, the best people, entrepreneurs, thought leaders, experts, to get the data and try to project the future and help users find the value in data. My next guest is Rob Thomas, who is the General Manager of IBM Analytics, theCUBE Alumni, been on multiple times successfully executing in the San Francisco Bay area. Great to see you again. >> Yeah John, great to see you, thanks for having me. >> You know IBM has really been interesting through its own transformation and a lot of people will throw IBM in that category but you guys have been transforming okay and the scoreboard has yet to show in my mind what's truly happening because if you still look at this industry, we're only eight years into what Hadoop evolved into now as a large data set but the analytics game just seems to be getting started with the cloud now coming over the top, you're starting to see a lot of cloud conversations in the air. Certainly there's a lot of AI washing, you know, AI this, but it's machine learning and deep learning at the heart of it as innovation but a lot more work on the analytics side is coming. You guys are at the center of that. What's the update? What's your view of this analytics market? >> Most enterprises struggle with complexity. That's the number one problem when it comes to analytics. It's not imagination, it's not willpower, in many cases, it's not even investment, it's just complexity. 
We are trying to make data really simple to use and the way I would describe it is we're moving from a world of products to platforms. Today, if you want to go solve a data governance problem you're typically integrating 10, 15 different products. And the burden then is on the client. So, we're trying to make analytics a platform game. And my view is an enterprise has to have three platforms if they're serious about analytics. They need a data manager platform for managing all types of data, public, private cloud. They need unified governance, so governance of all types of data, and they need a data science platform for machine learning. If a client has those three platforms, they will be successful with data. And what I see now is really mixed. We've got 10 products that do that, five products that do this, but it has to be integrated in a platform. >> You as an IBM or the customer has these tools? >> Yeah, when I go see clients that's what I see is data... >> John: Disparate data log. >> Yeah, they have disparate tools and so we are unifying what we deliver from a product perspective to this platform concept. >> You guys announced an integrated analytic system, got to see my notes here, I want to get into that in a second but interesting you bring up the word platform because you know, platforms have always been kind of reserved for the big supplier but you're talking about customers having a platform, not a supplier delivering a platform per se 'cause this is where the integration thing becomes interesting. We were joking yesterday on theCUBE here, kind of just kind of ad hoc conceptually like the world has turned into a tool shed. I mean everyone has a tool shed or knows someone that has a tool shed where you have the tools in the back and they're rusty. And so, this brings up the tool conversation, there's too many tools out there that try to be platforms. >> Rob: Yes. >> And if you have too many tools, you're not really doing the platform game right. 
And complexity also turns into when you bought a hammer it turned into a lawn mower. Right so, a lot of these companies have been groping and trying to iterate what their tool was into something else it wasn't built for. So, as the industry evolves, that's natural Darwinism if you will, they will fall to the wayside. So talk about that dynamic because you still need tooling >> Rob: Yes. but tool will be a function of the work as Peter Burris would say, so talk about how does a customer really get that platform out there without sacrificing the tooling that they may have bought or want to get rid of. >> Well, so think about the, in enterprise today, what the data architecture looks like is, I've got this box that has this software on it, use your terms, has these types of tools on it, and it's isolated and if you want a different set of tooling, okay, move that data to this other box where we have the other tooling. So, it's very isolated in terms of how platforms have evolved or technology platforms today. When I talk about an integrated platform, we are big contributors to Kubernetes. We're making that foundational in terms of what we're doing on Private Cloud and Public Cloud is if you move to that model, suddenly what was a bunch of disparate tools are now microservices against a common architecture. And so it totally changes the nature of the data platform in an enterprise. It's a much more fluid data layer. The term I use sometimes is you have data as a service now, available to all your employees. That's totally different than I want to do this project, so step one, make room in the data center, step two, bring in a server. It's a much more flexible approach so that's what I mean when I say platform. >> So operationalizing it is a lot easier than just going down the linear path of provisioning. 
All right, so let's bring up the complexity issue because integrated and unified are two different concepts that kind of mean the same thing depending on how you look at it. When you look at the data integration problem, you've got all this complexity around governance, it's a lot of moving parts of data. How does a customer actually execute without compromising the integrity of their policies that they need to have in place? So in other words, what are the baby steps that someone can take, the customers take through with what you guys are dealing with them, how do they get into the game, how do they take steps towards the outcome? They might not have the big money to push it all at once, they might want to take a risk management approach. >> I think there's a clear recipe for doing this right and we have experience of doing it well and doing it not so well, so over time we've gotten some, I'd say a pretty good perspective on that. My view is very simple, data governance has to start with a catalog. And the analogy I use is, you have to do for data what libraries do for books. And think about a library, the first thing you do with books, card catalog. You know where, you basically itemize everything, you know exactly where it sits. If you've got multiple copies of the same book, you can distinguish between which one is which. As books get older they go to archives, to microfilm or something like that. That's what you have to do with your data. >> On the front end. >> On the front end. And it starts with a catalog. And the reason I say that is, I see some organizations that start with, hey, let's go start ETL, I'll create a new warehouse, create a new Hadoop environment. That might be the right thing to do but without having a basis of what you have, which is the catalog, that's where I think clients need to start. 
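Thomas' card-catalog analogy can be sketched as a toy catalog: every copy of a dataset is itemized with where it lives and when it was created, duplicate copies are distinguishable, and older copies can be flagged for archive. The field names below are illustrative, not any IBM product's schema:

```python
# A toy data catalog in the card-catalog spirit: itemize every dataset
# copy (name, location, creation date), distinguish duplicates, and flag
# stale copies for archive. Field names are illustrative only.
from datetime import date

catalog = []

def register(name, location, created):
    catalog.append({"name": name, "location": location, "created": created})

def copies_of(name):
    return [entry for entry in catalog if entry["name"] == name]

def archive_candidates(before):
    return [entry for entry in catalog if entry["created"] < before]

register("customers", "warehouse://prod/customers", date(2017, 9, 1))
register("customers", "s3://backup/customers", date(2015, 1, 1))  # older duplicate

print(len(copies_of("customers")))                          # → 2
print(archive_candidates(date(2016, 1, 1))[0]["location"])  # → s3://backup/customers
```

Trivial as it is, this is the baseline Thomas describes: once every copy is itemized, questions like "which duplicate can I retire?" or "where does this regulated data live?" have answers before any ETL or warehouse work begins.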
>> Well, I would just add one more level of complexity just to kind of reinforce, first of all I agree with you but here's another example that would reinforce this step. Let's just say you write some machine learning and some algorithms and a new policy from the government comes down. Hey, you know, we're dealing with Bitcoin differently or whatever, some GDPR kind of thing happens where someone gets hacked and a new law comes out. How do you inject that policy? You got to rewrite the code, so I'm thinking that if you do this right, you don't have to do a lot of rewriting of applications because the library or the catalog will handle it. Is that right, am I getting that right? >> That's right 'cause then you have a baseline is what I would describe it as. It's codified in the form of a data model or in the form of an ontology for how you're looking at unstructured data. You have a baseline so then as changes come, you can easily adjust to those changes. Where I see clients struggle is if you don't have that baseline then you're constantly trying to change things on the fly and that makes it really hard to get to this... >> Well, really hard, expensive, they have to rewrite apps. >> Exactly. >> Rewrite algorithms and machine learning things that were built probably by people that maybe left the company, who knows, right? So the consequences are pretty grave, I mean, pretty big. >> Yes. >> Okay, so let's go back to something that you said yesterday. You were on theCUBE yesterday with Hortonworks CEO, Rob Bearden and you were commenting about AI or AI washing. You said quote, "You can't have AI without IA." A play on letters there, sequence of letters which was really an interesting comment, we kind of referenced it pretty much all day yesterday. Information architecture is the IA and AI is the artificial intelligence basically saying if you don't have some sort of architecture AI really can't work. 
Which really means models have to be understood, with the learning machine kind of approach. Expand more on that 'cause that was I think a fundamental thing that we're seeing at the show this week, this in New York is a model for the models. Who trains the machine learning? Machines got to learn somewhere too so there's learning for the learning machines. This is a real complex data problem and a half. If you don't set up the architecture it may not work, explain. >> So, there's two big problems enterprises have today. One is trying to operationalize data science and machine learning at scale, the other one is getting to the cloud but let's focus on the first one for a minute. The reason clients struggle to operationalize this at scale is because they start a data science project and they build a model for one discrete data set. Problem is that only applies to that data set, it doesn't, you can't pick it up and move it somewhere else so this idea of data architecture just to kind of follow through, whether it's the catalog or how you're managing your data across multiple clouds becomes fundamental because ultimately you want to be able to provide machine learning across all your data because machine learning is about predictions and it's hard to do really good predictions on a subset. But that pre-req is the need for an information architecture that comprehends the fact that you're going to build models and you want to train those models. As new data comes in, you want to keep the training process going. And that's the biggest challenge I see clients struggling with. So they'll have success with their first ML project but then the next one becomes progressively harder because now they're trying to use more data and they haven't prepared their architecture for that. >> Great point. Now, switching to data science. You spoke many times with us on theCUBE about data science, we know you're passionate about you guys doing a lot of work on that. 
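Thomas' point that models get welded to one discrete data set can be sketched by binding a model to a declared feature schema, the information architecture, so that any source conforming to the schema can keep training it. The "model" below is a deliberately trivial running mean, chosen only to keep the sketch short; the schema contract is the part that illustrates the idea:

```python
# A model bound to a declared feature schema: any data source that
# conforms to the schema can keep training it, so the model is not
# welded to the first data set it saw. The "model" is a trivial
# running mean, used only to keep the sketch self-contained.
SCHEMA = ("age", "income")  # the agreed feature contract

class SchemaBoundModel:
    def __init__(self, schema):
        self.schema = set(schema)
        self.count = 0
        self.total_income = 0.0

    def train(self, rows):
        for row in rows:
            if set(row) != self.schema:
                raise ValueError("row does not conform to the declared schema")
            self.count += 1
            self.total_income += row["income"]

    def predict_mean_income(self):
        return self.total_income / self.count

model = SchemaBoundModel(SCHEMA)
model.train([{"age": 30, "income": 50000.0}])  # first data set
model.train([{"age": 41, "income": 70000.0}])  # new data, same contract
print(model.predict_mean_income())  # → 60000.0
```

The design choice mirrors the interview: the schema (the IA) lives outside any one project, so training continues as new conforming data arrives, and a non-conforming source fails loudly instead of silently corrupting the model.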
We've observed, and Jim Kobielus and I were talking yesterday, there's too much work still on the data science guys' plate. They're still doing a lot of what I call sys admin-like work, not the right word, but like administrative building and wrangling. They're not doing enough data science and there's enough proof points now to show that data science actually impacts business, whether it's military having data intelligence to execute something, to selling something at the right time, or even for work or play or consumer use, all the proof is out there. So why aren't we going faster, why aren't the data scientists more effective, what is it going to take for the data science to have a seamless environment that works for them? They're still doing a lot of wrangling and they're still getting down in the weeds. Is that just the role they have or how does it get easier for them, that's the big catch? >> That's not the role. So they're a victim of their architecture to some extent and that's why they end up spending 80% of their time on data prep, data cleansing, that type of thing. Look, I think we solved that. That's why when we introduced the integrated analytic system this week, that whole idea was get rid of all the data prep that you need because land the data in one place, machine learning and data science is built into that. So everything that the data scientist struggles with today goes away. We can federate to data on cloud, on any cloud, we can federate to data that's sitting inside Hortonworks so it looks like one system but machine learning is built into it from the start. So we've eliminated the need for all of that data movement, for all that data wrangling 'cause we organized the data, we built the catalog, and we've made it really simple. And so if you go back to the point I made, so one issue is clients can't apply machine learning at scale, the other one is they're struggling to get to the cloud. 
I think we've nailed those problems 'cause now with a click of a button, you can scale this to part of the cloud. >> All right, so how does the customer get their hands on this? Sounds like it's a great tool, you're saying it's leading edge. We'll take a look at it, certainly I'll do a review on it with the team but how do I get it, how do I get a hold of this? What do I do, download it, you guys supply it to me, is it some open source, how do your customers and potential customers engage with this product? >> However they want to but I'll give you some examples. So, we have an analytic system built on Spark, you can bring the whole box into your data center and right away you're ready for data science. That's one way. Somebody like you, you're going to want to go get the containerized version, you go download it on the web and you'll be up and running instantly with a highly performing warehouse integrated with machine learning and data science built on Spark, using Jupyter. Any developer can go use that and get value out of it. You can also say I want to run it on my desktop. >> And that's free? >> Yes. >> Okay. >> There's a trial version out there. >> That's the open source, yeah, that's the free version. >> There's also a version on public cloud so if you don't want to download it, you want to run it outside your firewall, you can go run it on IBM cloud on the public cloud so... >> Just your cloud, Amazon? >> No, not today. >> John: Just IBM cloud, okay, I got it. >> So there's a variety of ways that you can go use this and I think what you'll find... >> But you have a freemium model that people can get started on so they'll download it to their data center, is that also free too? >> Yeah, absolutely. >> Okay, so all the base stuff is free. >> We also have a desktop version too so you can download... >> What URL can people look at this? >> Go to datascience.ibm.com, that's the best place to start a data science journey.
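Rob's federation point — "it looks like one system" even though the data sits in different places — can be illustrated with a toy. The sketch below uses SQLite's `ATTACH` as a stand-in for two independent stores (IBM's actual federation layer works very differently under the hood; the table names and data here are invented):

```python
import sqlite3

# Two independent "stores": an on-prem style DB and a cloud style DB.
onprem = sqlite3.connect(":memory:")
onprem.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
onprem.executemany("INSERT INTO customers VALUES (?, ?)",
                   [(1, "Acme"), (2, "Globex")])

# Attach a second, separate database under the name "cloud".
onprem.execute("ATTACH DATABASE ':memory:' AS cloud")
onprem.execute("CREATE TABLE cloud.orders (customer_id INTEGER, total REAL)")
onprem.executemany("INSERT INTO cloud.orders VALUES (?, ?)",
                   [(1, 10.0), (1, 5.0), (2, 7.5)])

# One SQL query spans both stores, as if they were one system.
rows = onprem.execute(
    "SELECT c.name, SUM(o.total) FROM customers c "
    "JOIN cloud.orders o ON o.customer_id = c.id "
    "GROUP BY c.name ORDER BY c.name").fetchall()
print(rows)  # [('Acme', 15.0), ('Globex', 7.5)]
```

The value proposition in the interview is the same shape at enterprise scale: the query author never sees where the bytes live, so no data movement or wrangling is required before analysis.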
>> Okay, multi-cloud, Common Cloud is what people are calling it, you guys have Common SQL engine. What is this product, how does it relate to the whole multi-cloud trend? Customers are looking for multiple clouds. >> Yeah, so Common SQL is the idea of integrating data wherever it is, whatever form it's in, ANSI SQL compliant so what you would expect for a SQL query and the type of response you get back, you get that back with Common SQL no matter where the data is. Now when you start thinking multi-cloud you introduce a whole other bunch of factors. Network, latency, all those types of things so what we talked about yesterday with the announcement of Hortonworks Dataplane which is kind of extending the YARN environment across multi-clouds, that's something we can plug in to. So, I think let's be honest, the multi-cloud world is still pretty early. >> John: Oh, really early. >> Our focus is delivery... >> I don't think it really exists actually. >> I think... >> It's multiple clouds but no one's actually moving workloads across all the clouds, I haven't found any. >> Yeah, I think it's hard for latency reasons today. We're trying to deliver an outstanding... >> But people are saying, I mean this is head room I got but people are saying, I'd love to have a preferred future of multi-cloud even though they're kind of getting their own shops in order, retrenching, and re-platforming it but that's not a bad ask. I mean, I'm a user, I want to move from if I don't like IBM's cloud or I got a better service, I can move around here. If Amazon is too expensive I want to move to IBM, you got product differentiation, I might want to be in your cloud. So again, this is the customers mindset, right. If you have something really compelling on your cloud, do I have to go all in on IBM cloud to run my data? You shouldn't have to, right? >> I agree, yeah I don't think any enterprise will go all in on one cloud.
I think it's delusional for people to think that, so you're going to have this world. So the reason when we built IBM Cloud Private we did it on Kubernetes was we said, that can be a substrate if you will, that provides a level of standards across multiple cloud type environments. >> John: And it's got some traction too so it's a good bet there. >> Absolutely. >> Rob, final word, just talk about the personas who you now engage with from IBM's standpoint. I know you have a lot of great developer stuff going on, you've done some great work, you've got a free product out there but you still got to make money, you got to provide value to IBM, who are you selling to, what's the main thing, you've got multiple stakeholders, could you just clarify the stakeholders that you're serving in the marketplace? >> Yeah, I mean, the emerging stakeholder that we speak with more and more than we used to is chief marketing officers who have real budgets for data and data science and trying to change how they're performing their job. That's a major stakeholder, CTOs, CIOs, any C level, >> Chief data officer. >> Chief data officer. You know chief data officers, honestly, it's a mixed bag. Some organizations they're incredibly empowered and they're driving the strategy. Others, they're figureheads and so you got to know how the organizations do it.
I know you guys do a lot of stuff out in the open, events they can connect with IBM, things going on? >> So we do, so we're doing a big event here in New York on November first and second where we're rolling out a lot of our new data products and cloud products so that's one coming up pretty soon. The biggest thing we've changed this year is there's such a craving from clients for education, so we've started doing what we're calling Analytics University where we actually go to clients and we'll spend a day or two days, go really deep on open languages, open source. That's become kind of a new focus for us. >> A lot of re-skilling going on too with the transformation, right? >> Rob: Yes, absolutely. >> All right, Rob Thomas here, General Manager IBM Analytics inside theCUBE. CUBE alumni, breaking it down, giving his perspective. He's got two books out there, The Data Revolution was the first one. >> Big Data Revolution. >> Big Data Revolution and the new one is Every Company is a Tech Company. Love that title which is true, check it out on Amazon. Rob Thomas, Big Data Revolution, first book and then second book is Every Company is a Tech Company. It's theCUBE live from New York. More coverage after the short break. (theCUBE jingle) (theCUBE jingle) (calm soothing music)

Published Date : Oct 2 2017



Arun Murthy, Hortonworks | BigData NYC 2017


 

>> Coming back when we were a DOS spreadsheet company. I did a short stint at Microsoft and then joined Frank Quattrone when he spun out of Morgan Stanley to create what would become the number three tech investment (upbeat music) >> Host: Live from mid-town Manhattan, it's theCUBE covering the BigData New York City 2017. Brought to you by SiliconANGLE Media and its ecosystem sponsors. (upbeat electronic music) >> Welcome back, everyone. We're here, live, on day two of our three days of coverage of BigData NYC. This is our event that we put on every year. It's our fifth year doing BigData NYC in conjunction with Hadoop World which evolved into Strata Conference, which evolved into Strata Hadoop, now called Strata Data. Probably next year will be called Strata AI, but we're still theCUBE, we'll always be theCUBE and this is our BigData NYC, our eighth year covering the BigData world since Hadoop World. And then as Hortonworks came on we started covering Hortonworks' data summit. >> Arun: DataWorks Summit. >> DataWorks Summit. Arun Murthy, my next guest, Co-Founder and Chief Product Officer of Hortonworks. Great to see you, looking good. >> Likewise, thank you. Thanks for having me. >> Boy, what a journey. Hadoop, years ago, >> 12 years now. >> I still remember, you guys came out of Yahoo, you guys put Hortonworks together and then since, gone public, first to go public, then Cloudera just went public. So, the Hadoop World is pretty much out there, everyone knows where it's at, it's got a nice use case, but the whole world's moved around it. You guys have been, really the first of the Hadoop players, even before Cloudera, on this notion of data in flight, or, as I call it, real-time data but I think, you guys call it data-in-motion. Batch, we all know what Batch does, a lot of things to do with Batch, you can optimize it, it's not going anywhere, it's going to grow. Real-time data-in-motion's a huge deal. Give us the update.
>> Absolutely, you know, we've obviously been in this space, personally, I've been in this for about 12 years now. So, we've had a lot of time to think about it. >> Host: Since you were 12? >> Yeah. (laughs) Almost. Probably look like it. So, back in 2014 and '15 when we, sort of, went public and we started looking around, the thesis always was, yes, Hadoop is important, we're going to allow you to manage lots and lots of data, but a lot of the stuff we've done since the beginning, starting with YARN and so on, was really to enable the use cases beyond the whole traditional transactions and analytics. And Rob, our CEO, his vision's always been we've got to get into a pre-transactional world, if you will, rather than the post-transactional analytics and BI and so on. So that's where it started. And increasingly, the obvious next step was to say, look enterprises want to be able to get insights from data, but they also want, increasingly, they want to get insights and they want to deal with it in real-time. You know, while you're in your shopping cart. They want to make sure you don't abandon your shopping cart. If you were sitting at a retailer and you're in an aisle and you're about to walk away from a dress, you want to be able to do something about it. So, this notion of real-time is really important because it helps the enterprise connect with the customer at the point of action, if you will, and provide value right away rather than having to try to do this post-transaction. So, it's been a really important journey. We went and bought this company called Onyara, which is a bunch of geeks like us who started off with the government, built this Apache NiFi thing, huge community. It's just, like, taking off at this point. It's been a fantastic thing to join hands and join the team and keep pushing in the whole streaming data space. >> There's a real, I don't mean to tangent but I do since you brought up community I wanted to bring this up.
It's been the theme here this week. It's more and more obvious that the community role is becoming central, beyond open-source. We all know open-source, standing on the shoulders of those before us, you know. And the Linux Foundation showing code numbers going from $64 million to billions in the next five, ten years, exponential growth of new code coming in. So open-source certainly blew me away. But now community is translating to other things, you start to see blockchain, very community based. That's a whole new currency market that's changing the financial landscape, ICOs and what-not, that's just one data point. Businesses, marketing communities, you're starting to see data as a fundamental thing around communities. And certainly it's going to change the vendor landscape. So you guys, compared to Cloudera and others, have always been community driven. >> Yeah our philosophy has been simple. You know, more eyes and more hands are better than fewer. And it's been one of the cornerstones of our founding thesis, if you will. And you saw how that's gone on over the course of the six years we've been around. Super-excited to have someone like IBM join hands, it happened at DataWorks Summit in San Jose. That announcement, again, is a reflection of the fact that we've been very, very community driven and very, very ecosystem driven. >> Communities are fundamentally built on trust and partnering. >> Arun: Exactly >> Coding is pretty obvious, you code with your friends. You code with people who are good, they become your friends. There's an honor system among you. You're starting to see that in the corporate deals. So explain the dynamic there and some of the successes that you guys have had on the product side where one plus one equals more than two. One plus one equals five or three.
They've decided to focus on their strengths which is around Watson and machine learning and for us to focus on our strengths around data management, infrastructure, cloud and so on. So this combination of DSX, which is their data science work experience, along with Hortonworks is really powerful. We are seeing that over and over again. Just yesterday we announced the whole Dataplane thing, we were super excited about it. And now to get IBM to say, we'll get in our technologies and our IP, big data, whether it's big Quality or big Insights or big SEQUEL, and the word has been phenomenal. >> Well the Dataplane announcement, finally people who know me know that I hate the term data lake. I always said it's always been a data ocean. So I get redemption because now the data lakes, now it's admitting it's a horrible name but just saying stitching together the data lakes, Which is essentially a data ocean. Data lakes are out there and you can form these data lakes, or data sets, batch, whatever, but connecting them and integrating them is a huge issue, especially with security. >> And a lot of it is, it's also just pragmatism. We start off with this notion of data lake and say, hey, you got too many silos inside the enterprise in one data center, you want to put them together. But then increasingly, as Hadoop has become more and more mainstream, I can't remember the last time I had to explain what Hadoop is to somebody. As it has become mainstream, couple things have happened. One is, we talked about streaming data. We see all the time, especially with HTF. We have customers streaming data from autonomous cars. You have customers streaming from security cameras. You can put a small minify agent in a security camera or smart phone and can stream it all the way back. Then you get into physics. You're up against the laws of physics. If you have a security camera in Japan, why would you want to move it all the way to California and process it. 
You'd rather do it right there, right? So this notion of a regional data center becomes really important. >> And that talks to the Edge as well. >> Exactly, right. So you want to have something in Japan that collects all of the security cameras in Tokyo, and you do analysis and push what you want back here, right. So that's physics. The other thing we are increasingly seeing is with data sovereignty rules especially things like GDPR, there are now regulatory reasons where data has to naturally stay in different regions. Customer data from Germany cannot move to France or vice versa, right. >> Data governance is a huge issue and this is the problem I have with data governance. I am really looking for a solution so if you can illuminate this it would be great. So there is going to be an Equifax out there again. >> Arun: Oh, for sure. >> And the problem is, is that going to force some regulation change? So what we see is, certainly on the mugi bond side, I see it personally is that, you can almost see that something else will happen that'll force some policy regulation or governance. You don't want to screw up your data. You also don't want to rewrite your applications or rewrite your machine learning algorithms. So there's a lot of wasted potential by not structuring the data properly. Can you comment on what's the preferred path? >> Absolutely, and that's why we've been working on things like Dataplane for almost a couple of years now. Which is to say, you have to have data and policies which make sense, given a context. And the context is going to change by application, by usage, by compliance, by law. So, now to manage 20, 30, 50, a 100 data lakes, would it be better, not saying lakes, data ponds, >> [Host] Any Data. >> Any data >> Any data pool, stream, river, ocean, whatever. (laughs) >> Jacuzzis. Data jacuzzis, right. So what you want is a holistic fabric, I like the term, you know Forrester uses, they call it the fabric. >> Host: Data fabric.
>> Data fabric, right? You want a fabric over these so you can actually control and maintain governance and security centrally, but apply it with context. Last but not least, is you want to do this whether it's on-prem or on the cloud, or multi-cloud. So we've been working with a bank. They were probably based in Germany but for GDPR they had to stand up something in France now. They had French customers, but for a bunch of new reasons, regulation reasons, they had to stand up something in France. So they had their own data center, and then they had one cloud provider, right, who I won't name. And they were great, things are working well. Now they want to expand the similar offering to customers in Asia. It turns out their favorite cloud vendor was not available in Asia or they were not available in a time frame which made sense for the offering. So they had to go with cloud vendor two. So now although each of the vendors will do their job in terms of giving you all the security and governance and so on, the fact that you have to manage it three ways, one for on-prem, one for cloud vendor A and B, was really hard, too hard for them. So this notion of a fabric across these things, which is Dataplane. And that, by the way, is based on all the open source technologies we love like Atlas and Ranger. By the way, that is also what IBM is betting on and what the entire ecosystem, but it seems like a no-brainer at this point. That was the kind of reason why we foresaw the need for something like a Dataplane and obviously couldn't be more excited to have something like that in the market today as a net new service that people can use. >> You get the catalogs, security controls, data integration. >> Arun: Exactly. >> Then you get the cloud, whatever, pick your cloud scenario, you can do that. Killer architecture, I liked it a lot. I guess the question I have for you personally is what's driving the product decisions at Hortonworks?
And the second part of that question is, how does that change your ecosystem engagement? Because you guys have been very friendly in a partnering sense and also very good with the ecosystem. How are you guys deciding the product strategies? Does it bubble up from the community? Is there an ivory tower, let's go take that hill? >> It's both, because what typically happens is obviously we've been in the community now for a long time. Working publicly now with well over 1,000 customers not only puts a lot of responsibility on our shoulders but it's also very nice because it gives us a vantage point which is unique. That's number one. The second one is, being in the community, we also see that people are starting to solve the problems. So it's another element for us. So you have one, the enterprise side, we see what the enterprises are facing which is kind of where Dataplane came in, but we also saw in the community where people are starting to ask us about hey, can you do multi-cluster Atlas? Or multi-cluster Ranger? Put two and two together and say there is a real need. >> So you get some consensus. >> You get some consensus, and you also see that on the enterprise side. Last but not least is when we went to friends like IBM and said hey, we're doing this. This is where we can position this, right. So we can actually bring in IGC, you can bring BigQuality and bring all these type, >> [Host] So things had clicked with IBM? >> Exactly. >> Rob Thomas was thinking the same thing. Bring in the Power systems and the horsepower. >> Exactly, yep. We announced something, for example, we have been working with the Power guys and NVIDIA, for deep learning, right. That sort of stuff is what clicks if you're in the community long enough, if you have the vantage point of the enterprise long enough, it feels like the two of them click. And that's, frankly, my job. >> Great, and you've got obviously the landscape. The waves are coming in.
So I've got to ask you, the big waves are coming in and you're seeing people starting to get hip to a couple of key things that they got to get their hands on. They need to have the big surfboards, metaphorically speaking. They got to have some good products, big emphasis on real value. Don't give me any hype, don't give me a head fake. You know, AI wash, okay, and people can see right through that. Alright, that's clear. But AI's great. We all cheer for AI but the reality is, everyone knows that's pretty much b.s. except for core machine learning is on the front edge of innovation. So that's cool, but value. [Laughs] Hey, I've got to integrate and operationalize my data so that's the big wave that's coming. Comment on the community piece because enterprises now are realizing as open source becomes the dominant source of value for them, they are now really going to the next level. It used to be like the emerging enterprises that knew open source. The guys would volunteer and they may not go deeper in the community. But now more people in the enterprises are in open source communities, they are recruiting from open source communities, and that's impacting their business. What's your advice for someone who's been in the community of open source? Lessons you've learned, what is the best practice, from your standpoint on philosophy, how to build into the community, how to build a community model. >> Yeah, I mean, at the end of the day, my best advice is to say look, the community is defined by the people who contribute. So, you get a voice if you contribute. Which means, if that's the fundamental truth. Which means you have to get your legal policies and so on to a point that you can actually start to let your employees contribute. That kicks off a flywheel, where you can actually go then recruit the best talent, because the best talent wants to stand out. GitHub is a resume now. It is not a Word doc.
If you don't allow them to build that resume they're not going to come by and it's just a fundamental truth. >> It's self governing, it's reality. >> It's reality, exactly. Right and we see that over and over again. It's taken time but, as with these things, the flywheel has turned enough. >> A whole new generation's coming online. If you look at the young kids coming in now, it is an amazing environment. You've got TensorFlow, all this cool stuff happening. It's just amazing. >> You know, 20 years ago that wouldn't happen because the Googles of the world wouldn't open source it. Now increasingly, >> The secret's out, open source works. >> Yeah, (laughs) shh. >> Tell everybody. You know they know already but, this is changing some of how H.R. works and how people collaborate, >> And the policies around it. The legal policies around contribution so, >> Arun, great to see you. Congratulations. It's been fun to watch the Hortonworks journey. I want to appreciate you and Rob Bearden for supporting theCUBE here in BigData NYC. If it wasn't for Hortonworks and Rob Bearden and your support, theCUBE would not be part of the Strata Data, which we are not allowed to broadcast into, for the record. O'Reilly Media does not allow theCUBE or our analysts inside their venue. They've excluded us and that's a bummer for them. They're a closed organization. But I want to thank Hortonworks and you guys for supporting us. >> Arun: Likewise. >> We really appreciate it. >> Arun: Thanks for having me back. >> Thanks and shout out to Rob Bearden. Good luck as CPO, it's a fun job, you know, no pressure. I mean, a lot of pressure. A whole lot. >> Arun: Alright, thanks. >> More Cube coverage after this short break. (upbeat electronic music)
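One technical thread worth pulling out of this interview is Arun's physics argument: process near the camera in Tokyo and push back only what you need, instead of shipping the raw stream to California. A minimal sketch of that edge pattern (field names, scores, and the threshold are all hypothetical, invented for illustration — this is not how HDF or MiNiFi is configured):

```python
def summarize_at_edge(readings, threshold):
    """Reduce a raw local stream to the small summary worth shipping back."""
    alerts = [r for r in readings if r["motion_score"] >= threshold]
    return {
        "camera": readings[0]["camera"],
        "frames_seen": len(readings),
        "alerts": len(alerts),
        "max_motion": max(r["motion_score"] for r in readings),
    }

# 10,000 raw frames stay in the regional data center...
raw = [{"camera": "tokyo-7", "motion_score": (i * 37) % 100}
       for i in range(10_000)]
summary = summarize_at_edge(raw, threshold=95)
# ...and only this tiny record crosses the ocean.
print(summary)
```

The same shape satisfies the sovereignty point too: the raw frames never leave the region, only a derived summary does, which is often what regulations like GDPR push architectures toward.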

Published Date : Sep 28 2017



Jagane Sundar & Pranav Rastogi | Big Data NYC 2017


 

>> Announcer: Live from Midtown Manhattan, it's theCUBE, covering Big Data, New York City, 2017. Brought to you by SiliconANGLE Media and its ecosystem sponsors. >> Okay, welcome back, everyone. Live in Manhattan, this is theCUBE's coverage of our fifth year doing Big Data, NYC; eighth year covering Hadoop World, which has now evolved into Strata Data which is right around the corner. We're doing that in conjunction with that event. This is, again, where we have the thought leaders, we have the experts, we have the entrepreneurs and CEOs come in, of course. The who's who in tech. And my next two guests: Jagane Sundar, CUBE alumni, who was on yesterday. CTO of WANdisco, one of the hottest companies, most valuable companies in the space for their unique IP, and not a lot of people know what they're doing. So congratulations on that. But you're here with one of your partners, a company I've heard of, called Microsoft, also doing extremely well with Azure Cloud. We've got Pranav Rastogi, who's a program manager for Microsoft Azure. You guys have an event going on as well at Microsoft Ignite which has been creating a lot of buzz this year again. As usual, they have a good show, but this year the Cloud certainly has taken front and center. Welcome to theCUBE, and good to see you again. >> Thank you. >> Thank you. >> Alright, so talk about the partnership. You guys, Jagane deals with all the Cloud guys. You're here with Microsoft. What's going on with Microsoft? Obviously they've been, if you look at the stock price. From 20-something to a complete changeover of the leadership to Satya Nadella. The company has mobilized. The Cloud has got traction, putting a dent in the universe. Certainly, Amazon feels a little bit of pain there. But, in general, a lot more work to do. What are you guys doing together? Share the relationship.
>> So, we just announced a product that's a one-click deployment in the Microsoft Azure Cloud, of WANdisco's Fusion Replication technology. So, if you've got some data assets, Hadoop or Cloud object stores on-premise, and you want to create a hybrid or a Cloud environment with Azure and Picture, ours is the only way of doing Active/Active. >> Active/Active. And there is some stuff out there that's looking like Active/Active. DataPlane by Hortonworks. But it's not fully Active/Active. We talked a little bit about that yesterday. >> Jagane: Yes. >> Microsoft, you guys, what's interesting about these guys besides the Active/Active? It's a unique thing. It's an ingredient for you guys. >> Yes, the interesting thing for us is, the biggest problem that we think customers have from a big data perspective is, if you look at the landscape of the ecosystem in terms of open source projects that are available, it's very hard to (a) figure out how do I use this software, and (b) how do I install it? And so what we have done is created an experience in Azure HDInsight where you can discover these applications within the context of your cluster, and you can install these applications by one-click install, which installs the application, configures it, and then you're good to go. We think that this is going to sort of increase the productivity of users trying to make sense out of big data. The key challenge we think customers have today is setting up some sort of hybrid environment: how do you connect your on-premise data and move it to the Cloud? And there are different use cases that you can have; you can move parts of the data and you can experiment easily in the Cloud. So what we've done is, we've enabled WANdisco as an application on our HDInsight application platform, where customers can install it using a single-click deploy, connect it with the data that's sitting on-prem, and use the Active/Active feature to have both these environments running simultaneously and they're in sync.
>> So one benefit is the one-click thing, that's on your side, right? You guys are enabling that. So, okay, I get that. That's totally cool. We'll get to that in a second. I want to kind of drill down on that. But what's the benefit to the customers that you guys are having? So, I'm a customer, I one-click, I want some WANdisco Active/Active. Why am I doing it? What does the Cloud change? How does your Cloud change from that experience? >> One example of what's going to change is, in an on-premise environment you have a cluster running, but you're kind of limited in what you can do with the cluster, because you've already set up the number of nodes and the workloads you're running are fairly finite. But what's happening in reality today is, lots of users, especially in the machine learning space, and AI space, and the analytics space, are using a lot of open source libraries and technologies, and they're using them on top of Hadoop, and they're using them on top of Spark. However, experimenting with these technologies is hard on-prem because it's a locked environment. So we believe, with the Cloud, especially with it offering WANdisco and HDInsight, once you move the data you can start spinning up clusters, you can start installing more open source libraries, experiment, and you can shut down the clusters when you're done. So it's going to increase your efficiency, it's going to allow you to experiment faster, and it's going to reduce cost as well, because you don't have to have the cluster running all the time, and once you are done with your experimentation, then you can decide which way you want to go. So, it's going to remove the-- >> Jagane, what's your experience with Azure? A lot of people have been, some people have been critical, and rightfully so. You guys are moving as fast as you can. You can only go as fast as you can, but the success of the Cloud has been phenomenal. You guys have done a great job with the Cloud.
Got to give you props on that. Your customers are benefiting, or Microsoft's customers are benefiting. How's the relationship? Are you getting more customers through these guys? Are you bringing customers from on-prem to Cloud? How's the customer flow going? >> Almost all of our customers who have on-prem instances of Hadoop are considering Cloud in one form or the other. Different Clouds have different strengths, as they've found-- >> Interviewer: And different technologies. >> Indeed. And Azure's strengths appear to be the HDInsight piece of it, and as Pranav just mentioned, the cool thing is, you can replicate into the Cloud, start up a 50-node Spark cluster today to run a query that may return results to you really fast. Now, remember this is data that you can write to both in the Cloud and on-premise. It's kept consistent by our technology. Or tomorrow you may find that somebody tells you Hive with the new Tez enhancements is faster; sure, spin up a hundred-node Hive cluster in the Cloud, HDInsight supports that really well. You're getting consistent data and your queries will respond much faster than on-premise. >> We've had Oliver Chu on before, with Hortonworks, obviously they're partnering there. HDInsight's been getting a lot of traction lately. Where's that going? We've seen some good buzz on that. Good people talking about it. What's the latest update on your end? >> HDInsight is doing really well. Customers love the ease of creating a cluster using just a few clicks and the benefits that customers get; clusters are optimized for certain scenarios. So if you're doing data science, you can create a Spark cluster and install open source libraries. We have Microsoft R Server running on Spark, which is unique to Microsoft, which lots of customers have appreciated.
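The elasticity argument made above, spin up a big cluster only while the experiment runs, then shut it down, can be put in rough numbers. This is a back-of-the-envelope sketch only: the node counts, hours, and per-node-hour rate below are made-up illustration values, not real Azure HDInsight pricing.

```python
# Back-of-the-envelope comparison of an always-on cluster vs. an ephemeral
# burst cluster.  All numbers are hypothetical illustration values.

HOURS_PER_MONTH = 730          # average hours in a month
NODE_HOUR_RATE = 0.50          # assumed $/node-hour (not real pricing)

def always_on_cost(nodes, hours=HOURS_PER_MONTH, rate=NODE_HOUR_RATE):
    """Cost of keeping a fixed-size cluster running all month."""
    return nodes * hours * rate

def ephemeral_cost(nodes, hours_used, rate=NODE_HOUR_RATE):
    """Cost of a cluster that exists only while the experiment runs."""
    return nodes * hours_used * rate

fixed = always_on_cost(nodes=20)                  # permanent 20-node cluster
burst = ephemeral_cost(nodes=100, hours_used=40)  # 5x the nodes, 40 hours/month

print(f"always-on 20 nodes : ${fixed:,.0f}/month")
print(f"burst 100 nodes    : ${burst:,.0f}/month")
```

Even with five times the nodes for the experiment, paying only for the hours used comes out cheaper in this toy scenario, which is the "shut down the clusters when you're done" point being made in the conversation.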
You also have streaming scenarios that you can do using open source technologies, like we have Apache Kafka running on our stack, which is becoming very popular from an ingestion perspective. Folks have been-- >> Has the Kubernetes craze come down to your group yet? Has it trickled down? It seems to be going crazy. You hired an amazing person from Google, Brendan Burns, we've interviewed before. He's part of the original Kubernetes spec; he now works for Microsoft. What's the buzz on the Kubernetes container world there? >> In general, Microsoft Azure has seen great benefits out of it. We are seeing lots of traction in that space. From my role in particular, I focus more on the HDInsight big data space, which is kind of outside of what we do with Kubernetes work. >> And your relationship is going strong with WANdisco? >> Pranav: Yes. >> Right. >> We just launched this offering just about yesterday, and we're looking forward to getting customers onto the stack. >> That's awesome. What's your take on the industry right now? Obviously, the partnerships are becoming clearer as people can see there's (mumbles). You're starting to see the notion of infrastructure and services changing. More and more people want services, and then you've got the classic infrastructure, which looks like it's going to be hybrid. That's pretty clear, we see that. Services versus infrastructure, how should customers think about how they architect their environments? So they can take advantage of the Active/Active and also have a robust, clean, not a lot of re-skilling going on, but more of a good organization from a personnel standpoint, but yet get to a hybrid architecture? >> So, it depends; the Cloud gives you lots of options to meet the customers where they are. Different customers have different kinds of requirements.
Customers who have specialized some of their applications will probably want to go more of an infrastructure route, but customers also love to have some of the PaaS benefits where, you know, I have a service running where I don't have to worry about the infrastructure: how does patching happen, how do OS updates happen, how does maintenance happen. They want to sort of rely on the Microsoft Azure Cloud provider to take care of it, so that they can focus on their application-specific logic, or business-specific logic, or analytical workloads, and worry about optimizing those parts of the application, because that is their core-- >> It's been great. I want to get your thoughts real quick. Share some color. What's going on inside Microsoft? Obviously, open source has become a really big part of the culture, even just at Ignite. More Linux news is coming. You guys have been involved in Linux. Obviously, open source with Azure; a ton of stuff, I know, is built in the Microsoft Cloud on open source. You're contributing now to Kubernetes, as I mentioned earlier. Seems to be a good cultural shift at Microsoft. What's the vibe on open source internally at Microsoft? Can you share just some anecdotal insight into what the vibe is like inside, around open source? >> The vibe has increased quite a lot around open source. You rightly mentioned, just recently we've announced SQL Server on Linux as well, at the Ignite conference. You can also deploy SQL Server in a Docker container, which is quite revolutionary if you think about how far we have come. Open source is so pervasive, it's used in a lot of these projects. Microsoft employees are contributing back to open source projects in terms of bug fixes, feature requests, or documentation updates.
It's a very, very active community, and by and large I think customers are benefiting a lot, because there are so many folks working together on open source projects and making them successful. And especially around the Azure stack, we also ensure that you can run these open source workloads reliably in the Cloud. From an enterprise perspective, you get the best of both worlds. You get the latest innovations happening in open source, plus the reliability of the managed platform that Azure provides at an enterprise scale. >> So again, obviously the Microsoft partnership is huge, all the Clouds as well. Where do you want to take the relationship with Microsoft? What happens next? You guys are just going to continue to do business, you're like expecting the one-click's nice, I have some questions on that. What happens next? >> So, I see our partnership becoming deeper. We see the value that HDInsight brings to the ecosystem, and all of that value is captured by the data. At the end of the day, if you have stale data, if you have data that you can't rely on, the applications are useless. So we see ourselves getting more and more deeply embedded in the system. We see ourselves as an essential part of the data strategy for Azure. >> Yeah, we see continuous integration as a development concept, continuous analytics as a term, that's being kicked around. We were talking yesterday about, here in theCUBE, real time. I want some data real time, and IT goes back, "Here it is, it's real time!" No, but the data's three weeks old. I mean, real time (laughs) is a word that doesn't mean I got to see it really fast, low latency response. Well, that's not the data I want. I meant the data in real time, not you giving me a real time query. So again, this brings up a mind shift in terms of the new way to do business in the Cloud and hybrid. It's changing the game.
As customers scratch their heads and try to figure out how to make their organizations more DevOps oriented, what do you guys see as advice for those managers who are really getting behind it, really want to make change, who kind of have to herd the cats a little bit, and maybe break out security and put it in its own group? Or you come in and say, okay IT guys, we're going to change our operating model, even on-prem, we'll use some bursting to the Cloud, Azure's got 365 on there, lot of coolness developing. What's the advice for the mindset of the change agents out there that are going to do the transformation? >> My advice would be, if you've done the same thing by hand over two times, it's time you automated it, but-- >> Interviewer: Two times?! >> Two times. >> No three rule? Three strikes you're out? >> You're saying two, contrarian. >> That's a careful statement. Because, if you try automating something that you've never actually tried by hand, that's a disaster as well. A couple times, so you know how it's supposed to work. >> Interviewer: Get a good groove on it. >> Right, then you optimize, you automate, and then you turn the knobs. So, you try a hundred-node cluster, maybe that's going to be faster. Maybe after a certain point, you don't get any improvements, so you know how to-- >> So take some baby steps, and one easy way to do it is to automate something that you've done. >> Jagane: Yes, exactly. >> That's almost risk-free, relatively speaking. Thoughts, advice to change agents out there. This is your industry hat on. You can take your Microsoft hat off. >> Baby steps. So you start small, you get familiar with the environment, and the toolsets are provided so that you get a consistent experience with what you were doing on-prem and sort of in a hybrid space.
And the whole idea is, as you get more comfortable, the benefits of the Cloud far outweigh any sort of cultural changes that need to happen-- >> Guys, thanks for coming on theCUBE, really appreciate it. Thoughts on Big Data NYC this week? What do you think? >> I think it's a conference that has a lot of Cloud hanging over it and people are scratching their heads. Including vendors, customers, everybody scratching their head, but there is a lot of Cloud in this conference, although this is not a Cloud conference. >> Yeah, they're trying to make it an AI conference. A lot of AI washing, certainly, we're seeing that everywhere. But again, nothing wrong with hyping up AI. It's good for society. It really is cool, but still, talking about baby steps, AI is still not there. It seems like, with AI, from when I got my CS degree in the 80's, there's not been a lot of innovation; well, machine learning is getting better, but there's a long way to go on AI. Don't you think? >> Yes, you know, a few of the announcements we've made this week are all about making it easier for developers to get started with AI and machine learning, and our whole hope is, with these investments that we've done, and the Azure machine learning improvements and the companion app and the workbench, it allows you to get started very easily with AI and machine learning models, and you can apply and build these models, do a CICD process, and deploy these models, and be more effective in the space. >> Yeah, and also the tooling market has kind of gotten out of control. We were just joking the other day that there's this tool shed mindset where everything is in the tool shed, and people bought a hammer and turned it into a lawnmower. So it's like, you've got to be careful which tools you have. Think about a platform. Think holistically, but if you take the baby steps and implement it, certainly it's there. My personal opinion, I think the Cloud is the equalizer. Cloud can bring compute power that changes what a tool was built for.
Even go back six years: the tools that were out there six years ago are completely changed by the impact of potentially unlimited compute horsepower. So, okay, that resets a little bit. You agree? >> I do. I totally agree. >> Who wins, who loses on the reset? >> The Cloud is an equalizer, but there is a mindset shift that goes with that. Those who can adapt to the mindset shift will win. Those who cannot, and are still clinging to their old practices, will have a hard time. >> Yeah, it's exciting. If you're still reinventing Hadoop from 2011, then you're probably not in good shape right now. >> Jagane: Not a good place to be. >> Using Hadoop is great for batch, but you can't make that be a lawnmower. That's my opinion. Okay, thanks for coming on. I appreciate it. (laughs) You're smiling, you got something that you, no? >> Pranav: (laughs) Thank you so much for that comment. >> Yeah, tool sheds are out there, be careful. Guys, do your job. Congratulations on your partnership, appreciate it. This is theCUBE, live in New York. More after this short break. We'll be right back.

Published Date : Sep 27 2017



Jagane Sundar, WANdisco | BigData NYC 2017


 

>> Announcer: Live from midtown Manhattan, it's theCUBE, covering BigData New York City 2017, brought to you by SiliconANGLE Media and its ecosystem sponsors. >> Okay, welcome back everyone, here live in New York City. This is theCUBE's special presentation of our annual event with theCUBE and Wikibon Research called BigData NYC. It's our own event that we have every year, celebrating what's going on in the big data world now. It's evolving to all data, cloud applications, AI, you name it, it's happening. In the enterprise, the impact is huge for developers, the impact is huge. I'm John Furrier, cohost of theCUBE, with Peter Burris, Head of Research, SiliconANGLE Media and General Manager of Wikibon Research. Our next guest is Jagane Sundar, who's the CTO of WANdisco, a CUBE alumni, great to see you again as usual here on theCUBE. >> Thank you John, thank you Peter, it's great to be back on theCUBE. >> So we've been talking big data for many years, certainly with you guys, and it's been a great evolution. I don't want to get into the whole backstory and history, we covered that before, but right now is a really, really important time. We see, you know, the hurricanes come through, we see the floods in Texas, we've seen Florida, and Puerto Rico now in the main conversation. You're seeing it, you're seeing disasters happen. Disaster recovery's been the low-hanging fruit for you guys, and we talked about this when New York City got flooded years and years ago. This is a huge issue for IT, because they have to have disaster recovery. But now it's moving beyond just disaster recovery. It's cloud. What's the update from WANdisco? You guys have a unique perspective on this. >> Yes, absolutely. So we have capabilities to replicate between the cloud and Hadoop, multi data centers across geos, so disasters are not a problem for us. And we have some unique technologies we use.
One of the things we do is we can replicate in an active-active mode between different cloud vendors, between cloud and on-prem Hadoop, and we are the only game in town. Nobody else can do that. >> So okay, let me just stop right there. When you say the only game in town, I get a little skeptical here. Are you saying that nobody does active-active replication at all? >> That is exactly what I'm saying. We had some wonderful announcements from Hortonworks, they have a great product called the Dataplane. But if you dig deep, you'll find that it's actually an active-passive architecture, because to do active-active, you need this capability called the Paxos algorithm for resolving conflict. That's a very hard algorithm to implement. We have over 10 years' experience in that. That's what gives us our ability to do this active-active replication, between clouds, between on-prem and cloud. >> All right, so just to take that a step further, I know we're having a CTO conversation, but the classic cliche is skate to where the puck is going to be. So you kind of didn't just decide one morning you're going to be the active-active for cloud. You kind of backed into this. You know, the world spun in your direction, the puck came to you guys. Is that a fair statement? >> That is a very fair statement. We've always known there's tremendous value in this technology we own, and with the global infrastructure trends, we knew that this was coming. It wasn't called the cloud when we started out, but that's exactly what it is now, and we're benefiting from it. >> And the cloud is just a data center, it's just, you don't own it. (mumbles) Peter, what's your reaction to this? Because when he says the only game in town, it implies some scarcity. >> Well, WANdisco has a patent, and it actually is very interesting technology, if I can summarize very quickly.
You do continuous replication based on writes that are performed against the database, so that you can have two writers and two separate databases and you guarantee that they will be synchronized at some point in time, because you guarantee that the writing of the logs and the messaging to both locations >> Absolutely. >> in order, which is a big issue. You guys put a stamp on the stuff, and it actually writes to the different locations with order guaranteed, and that's not the way most replication software works. >> Yes, that's exactly right. That's very hard to do, and that's the only way for you to allow your clients in different data centers to write to the same data store, whether it's a database, a Hadoop folder, whether it's a bucket in a cloud object store, it doesn't matter. The core fact remains, the Paxos algorithm is the only way for you to do active-active replication, and ours is the only Paxos implementation that can work over the >> John: And that's patented by you guys? >> Yes, it's patented. >> And so for someone to replicate that, they'd have to essentially reverse engineer and have a little twist on it to not get around the patents. Are you licensing the technology or are you guys hoarding it for yourselves? >> We have different ways of engaging with partners. We are very reasonable with that, and we work with several powerful partners. >> So you partner with the technology. >> Yes. >> But the key thing, John, in answer to your question is that it's unassailable. I mean there's no argument, that is, as companies move more towards a digital way of doing things, largely driven by what customers want, your data becomes more of an asset. As your data becomes more of an asset, you make money by using that data in more places, more applications and more times.
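The ordered-delivery idea Peter just summarized can be sketched as a toy model. This is only an illustration of total-order replication in general, not WANdisco's patented implementation; in the real product the write order comes out of a Paxos-style agreement step, which the plain sequence numbers below merely stand in for.

```python
# Toy sketch: two writable replicas stay consistent because every write is
# assigned a slot in one agreed total order, and each replica applies writes
# strictly in that order, no matter how the network reorders delivery.

class Replica:
    def __init__(self, name):
        self.name = name
        self.store = {}       # key -> value
        self.next_seq = 0     # next slot this replica may apply
        self.pending = {}     # writes that arrived ahead of their turn

    def deliver(self, seq, key, value):
        """Accept a write; apply buffered writes strictly in sequence order."""
        self.pending[seq] = (key, value)
        while self.next_seq in self.pending:
            k, v = self.pending.pop(self.next_seq)
            self.store[k] = v
            self.next_seq += 1

us, eu = Replica("us-east"), Replica("eu-west")

# Two conflicting writes to the same key get slots 0 and 1 in the total order.
us.deliver(0, "/tx/42", "begin")
us.deliver(1, "/tx/42", "commit")

# The network delivers them to the other replica in the opposite order;
# the later slot is buffered until the earlier one arrives.
eu.deliver(1, "/tx/42", "commit")
assert eu.store == {}          # slot 1 waits for slot 0
eu.deliver(0, "/tx/42", "begin")

# Both replicas converge to the same final state despite the reordering.
assert us.store == eu.store == {"/tx/42": "commit"}
```

The hard part a real system solves, and the single counter here does not, is getting every site to agree on those slot assignments without a central sequencer, which is exactly what consensus protocols like Paxos provide.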
That is possible with data, but the problem is you end up with consistency issues. For certain applications it's not an issue; if you're basically reading data, it's not an issue. But the minute that you're trying to write on behalf of a particular business event or a particular value proposition, then you have a challenge; you are limited in how you can do it unless you have this kind of a technology. And so this notion of continuous replication, in a world that's going to become increasingly dependent upon data, data that is increasingly distributed, data that you want to ensure has common governance and policy in place, technologies like the ones WANdisco provides are going to be increasingly important to the overall way that a business organizes itself, institutes its work, and makes sure it takes care of its data assets. >> Okay, so my next question then, thanks for the clarification, it's good input there, and thanks for summarizing it like that, 'cause I couldn't have done that. But when we last talked, I always was enamored by the fact that you guys have the data center replication thing down. I always saw that as a great thing for you guys. Okay, I get that, that's an on-premise situation, you have active-active, good for disaster recovery, lot of use cases, people should be beating down your door 'cause you have a better mousetrap, I get that. Now how does that translate to the cloud? So take me through why the cloud now fits nicely with that same paradigm. >> So, I mean, these are industry trends, right. What we've found is that the cloud object stores are very, very cost effective and efficient, so customers are moving towards that. They're using their Hadoop applications, but on cloud object stores. Now it's trivial for us to add plugins that enable us to replicate between a cloud object store on one side, and Hadoop on the other side. It could also be another cloud object store from a different cloud provider on the other side.
Once you have that capability, now customers are freed from lock-in from either a cloud vendor or a Hadoop vendor, and they love that, they're looking at it as another way to leverage their data assets. And we enable them to do that without fear of lock-in from any of these vendors. >> So on the cloud side, the regions have always been a big thing. So we've heard Amazon have a region down here, and there was fix it. We saw at VMworld push their VMware solution to only one western region. What's the geo landscape look like in the cloud? Does that relate to anything in your tech? >> So yes, it does relate, and one of the things that people forget is that when you create an Amazon S3 bucket, for example, you specify a region. Well, but this is the cloud, isn't it worldwide? Turns out that object store actually resides in one region, and you can use some shaky technologies like cross-region replication to eventually get the data to the other region. >> Peter: Which just boosts the prices you pay. >> Yes, not just boost the price. >> Well they're trying to save price but then they're exposed on reliability. >> Reliability, exactly. You don't know when the data's going to be there, there are no guarantees. What we offer is, take your cloud storage, but we'll guarantee that we can replicate it in a synchronous fashion to another region. Could be the same provider, could be another provider. That gives tremendous benefits to the customers. >> So you actually have a guarantee when you go to customers, say with an SLA guarantee? Do you back it up with like money back, what's the guarantee? >> So the guarantees are, you know we are willing to back it up with contracts and such like, and our customers put us through rigorous testing procedures, naturally. But we stand up to every one of those. We can scale and maintain the consistency guarantees that they need for modern businesses. >> Okay, so take me through the benefits. Who wants this? 
Because you can almost get kind of sucked into the complexities of it, and the nuances of cloud and everything as Peter laid out, it's pretty complex even as he simplified it. Who buys this? (laughs) I mean, who's the guy, is it the IT department, is it the ops guy, is it the facilities, who... >> So we sell to the IT departments, and they absolutely love the technology. But to go back to your initial statement, we have all these disasters happening, you know, hopefully people are all doing reasonably okay at the end of these horrible disasters, but if you're an enterprise of any size, it doesn't have to be a big enterprise, you cannot go back to your users or customers and say that because of a hurricane you cannot have access to your data. That's sometimes legally not allowed, and other times it's just suicide for a business >> And HPE in Houston, it's a huge plant down there. >> Jagane: Indeed. >> They got hit hard. >> Yep, in those sort of circumstances, you want to make sure that your data is available in multiple data centers spread throughout the world, and we give you that capability. >> Okay, what are some of the successes? Let's talk through now, obviously you've got the technology, I get that. Where's the stakes in the ground? Who's adopting it? I know you do a lot of biz dev deals. I don't know if they're actually OEM-type deals, or they're just licensing deals. Take us through to where your successes are with this technology. >> So, biz dev wise, we have a mix of OEM deals and licenses and co-selling agreements. The strong ones are all OEMs, of course. We have great partnerships with IBM, Amazon, Microsoft, just wonderful partnerships. The actual end customers, we started off selling mostly to the financial industry because they have a legal mandate, so they were the first to look into this sort of a thing. But now we've expanded into automobile companies. 
A lot of the auto companies are generating vast amounts of data from their cars, and you can't push all that data into a single data center, that's just not reasonable. You want to push that data into a single data store that's distributed across the world, ingesting wherever the car is closest. We offer that capability that nobody else can, so we've got big auto manufacturers signed up, we've got big retailers signed up for exactly the same capability. You cannot imagine ingesting all that data into a single location. You want this replicated across, you want it available no matter what happens to any single region or a data center. So we've got tremendous success in retail, banking, and a lot of this is through partnerships again. >> Well congratulations. I got to ask, you know, what's new with you guys? Obviously you have success with the active-active. We'll dig into the Hortonworks thing to check your comment around them not having it, so we'll certainly look at the Dataplane, which we like. We interviewed Rob Bearden. Love the announcement, but they don't have the active-active, we're going to document that, and get that on the record. But you guys are doing well. What's new here, what's in New York, what are some of your wins, can you just give a quick update on what's going on at WANdisco? >> Okay, so quick recap, we love the Hortonworks Dataplane as well. We think that we can build value into that ecosystem by building a plugin for them. And we love the whole technology. I have wonderful friends there as well. As for our own company, we see all of our, a lot of our business coming from cloud and hybrid environments. It's just the reality of the situation.
You had, you know, 20 years ago, you had NFS, which was the great upender of all storage, but turned out to be very expensive, and then seven or so years ago you had HDFS come along, and that upended the cost model of NFS and SANs, which those industries were still working their way through. And now we have cloud object stores, which have upended the HDFS model; it's much more cost-efficient to operate using cloud object stores. So we will be there, we have replication products for that. >> John: And you're in the major clouds, you in Azure? >> Yes, we are in Azure. >> Google? >> Jagane: Yes, absolutely. >> AWS? >> AWS, of course. >> Oracle? >> Oracle, of course. >> So you got all the top four companies. >> We're in all of them. >> All right, so here's the next question is, >> And you're also in IBM stuff too. >> Yes, we're built tightly into IBM. >> So you've got a pretty strong legacy >> And a monopoly. >> On the mainframe. >> Like the fiber channel of replication. (John and Jagane laugh) That was a bad analogy. I mean it's like... Well, I mean fiber channel has only limited suppliers 'cause they have unique technology, it was highly important. 
So I got to ask you, you're a student of the industry, I know that, knowing you personally. What's been the success formula that keeps the winners around today, and what do people need to do going forward? 'Cause we've seen the train wreck, we've seen the dead bodies in the industry, we've kind of seen what's happened, there've been some survivors. Why did the current list of characters and companies survive, and what's the winning formula in your opinion to stay relevant as big data grows in a huge way from IoT to AI to cloud and everything in between? >> I'll quote Stephen Hawking in this. Intelligence is the capability to adapt to changes. That's what keeps industries, that's what keeps companies, that's what keeps executives around. If you can adapt to change, if you can see things coming, and adapt your core values, your core technology to that, you can offer customers a value proposition that's going to last a long time. >> And in a big data space, what is that adaptive key focus, what should they be focused on? >> I think at this point, it's extracting information from this volume of data, whether you use machine learning in the modern days, or whether it was simple hive queries, that's the value proposition, and making sure the data's available everywhere so you can do that processing on it, that remains the strength. >> So the whole concept of digital business suggests that increasingly we're going to see our assets rendered in some form as data. >> Yes. >> And we want to be able to ensure that that data is able to be where it needs to be when it needs to be there for any number of reasons. It's a very, very interesting world we're entering into. >> Peter, I think you have a good grasp on this, and I love the narrative of programming the world in real time. What's the phrase you use? It's real time but it's programming the world... Programming the real world. >> Yeah, programming the real world. 
>> That's a huge, that means something completely different, it's not a tech thing, it's not a speed or feed. >> Well, the way we think about it is that we look at IoT as a big information transducer, where information's in one form, and then you turn it into another form to do different kinds of work. And that big data's a crucial feature in how you take data from one form and turn it into another form so that it can perform work. But then you have to be able to turn that around and have it perform work back in the real world. There's a lot of new development, a lot of new technology that's coming on to help us do that. But any way you look at it, we're going to have to move data with some degree of consistency, we're still going to have to worry about making sure that if our policy says that that action needs to take place there, and that action needs to take place there, that it actually happens the way we want it to, and that's going to require a whole raft of new technologies. We're just at the very beginning of this. >> And active-active, things like active-active in what you're talking about really is about value creation. >> Well, the thing that makes active-active interesting is, again, borrowing from your terms, it's a new term to both of us, I think, today. I like it actually. But the thing that makes it interesting is the idea that you can have a source here that is writing things, and you can have a source over there that is writing things, and as a consequence, you can nonetheless look at a distributed database and keep it consistent. 
Well you going to lock up the edge? You're going to lock up the edge too, the cloud. >> We do like this notion of the edge cloud and all the intermediate steps. We think that replicating data between those systems or running consistent compute across those systems is an interesting problem for us to solve. We've got all the ingredients to solve that problem. We will be on that. >> Jagane Sundar, CTO of WANdisco, back on theCUBE, bringing it down. New tech, whole new generation of modern apps and infrastructure happening in distributed and decentralized networks. Of course theCUBE's got it covered for you, and more live coverage here in New York City for BigData NYC, our annual event, theCUBE and Wikibon here in Hell's Kitchen in Manhattan, more live coverage after this short break.
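The active-active idea discussed in the interview, two independently writable replicas that nonetheless stay consistent, can be sketched as a toy model. This is an illustration only, not WANdisco's actual Paxos-based implementation: the `Sequencer` class below is a hypothetical stand-in for a consensus service, and all names are invented for the example.

```python
class Sequencer:
    """Stand-in for a consensus service: totally orders all operations.
    A real system would reach this agreement via Paxos or Raft rather
    than a single in-memory list."""
    def __init__(self):
        self.log = []          # the single agreed-upon operation log

    def submit(self, op):
        self.log.append(op)


class Replica:
    """A data center holding a full, writable copy of the store."""
    def __init__(self, name, sequencer):
        self.name = name
        self.sequencer = sequencer
        self.store = {}
        self.applied = 0       # index into the global log we've applied

    def write(self, key, value):
        # Local writes are first placed into the global order...
        self.sequencer.submit((key, value))
        self.sync()

    def sync(self):
        # ...and every replica applies log entries in that same order,
        # so all replicas converge regardless of where a write landed.
        for key, value in self.sequencer.log[self.applied:]:
            self.store[key] = value
        self.applied = len(self.sequencer.log)


seq = Sequencer()
us_east = Replica("us-east", seq)
eu_west = Replica("eu-west", seq)

us_east.write("customer:42", "Houston")   # write lands in one region
eu_west.write("customer:42", "New York")  # concurrent write elsewhere
us_east.sync()
eu_west.sync()

assert us_east.store == eu_west.store     # both replicas converge
print(us_east.store)
```

The point of the sketch is only that a single agreed operation order is what lets two writers on opposite sides of the world produce one consistent database; distributing the sequencer itself (so there is no single point of failure) is exactly the hard part that consensus protocols solve.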

Published Date : Sep 27 2017

