Image Title

Search Results for Apache NiFi:

John Kreisa, Hortonworks | Dataworks Summit EU 2018


 

>> Narrator: From Berlin, Germany, it's theCUBE. Covering Dataworks Summit Europe 2018. Brought to you by Hortonworks. >> Hello, welcome to theCUBE. We're here at Dataworks Summit 2018 in Berlin, Germany. I'm James Kobielus. I'm the lead analyst for Big Data Analytics, within the Wikibon team of SiliconAngle Media. Our guest is John Kreisa. He's the VP for Marketing at Hortonworks, of course, the host company of Dataworks Summit. John, it's great to have you. >> Thank you Jim, it's great to be here. >> We go long back, so you know it's always great to reconnect with you guys at Hortonworks. You guys are on a roll, it's been seven years I think since you guys were founded. I remember the founding of Hortonworks. I remember when it splashed in the Wall Street Journal. It was like oh wow, this big data thing, this Hadoop thing is actually, it's a market, it's a segment and you guys have built it. You know, you and your competitors, your partners, your ecosystem continues to grow. You guys went IPO a few years ago. Your latest numbers are pretty good. You're continuing to grow in revenues, in customer acquisitions, your deal sizes are growing. So Hortonworks remains on a roll. So, I'd like you to talk right now, John, and give us a sense of where Hortonworks is at in terms of engaging with the marketplace, in terms of trends that you're seeing, in terms of how you're addressing them. But talk about first of all the Dataworks Summit. How many attendees do you have from how many countries? Just give us sort of the layout of this show. >> I don't have all of the final counts yet. >> This is year six of the show? >> This is year six in Europe, absolutely, thank you. So it's great, we've moved it around different locations. Great venue, great host city here in Berlin. Super excited about it, I know we have representatives from more than 51 countries. If you think about that, drawing from a really broad set of countries, well beyond, as you know, because you've interviewed some of the folks beyond just Europe. We've had them from South America, U.S., Africa, and Asia as well, so really a broad swath of the open-source and big data community, which is great. The final attendance is going to be 1,250 to 1,300 range. The final numbers, but a great sized conference. The energy level's been really great, the sessions have been, you know, oversubscribed, standing room only in many of the popular sessions. So the community's strong, I think that's the thing that we really see here and that we're really continuing to invest in. It's something that Hortonworks was founded around. You referenced the founding, and driving the community forward and investing is something that has been part of our mantra since we started and it remains that way today. >> Right. So first of all what is Hortonworks? Now how does Hortonworks position itself? Clearly Hadoop is your foundation, but you, just like Cloudera, MapR, you guys have all continued to evolve to address a broader range of use-cases with a deeper stack of technology with fairly extensive partner ecosystems. So what kind of a beast is Hortonworks? It's an elephant, but what kind of an elephant is it? >> We're an elephant or riding on the elephant I'd say, so we're a global data management company. That's what we're helping organizations do. Really the end-to-end lifecycle of their data, helping them manage it regardless of where it is, whether it's on-premise or in the cloud, really through hybrid data architectures. That's really how we've seen the market evolve is, we started off in terms of our strategy with the platform based on Hadoop, as you said, to store, process, and analyze data at scale. The kind of fundamental use-case for Hadoop. Then as the company emerged, as the market kind of continued to evolve, we moved to and saw the opportunity really, capturing data from the edge. As IOT and kind of edge-use cases emerged it made sense for us to add to the platform and create the Hortonworks DataFlow. >> James: Apache NiFi >> Apache NiFi, exactly, HDF underneath, with associated additional open-source projects in there. Kafka and some streaming and things like that. So that was now move data, capture data in motion, move it back and put it into the platform for those large data applications that organizations are building on the core platform. It's also the next evolution, seeing great attach rates with that, the really strong interest in the Apache NiFi, you know, the meetup here for NiFi was oversubscribed, so really really strong interest in that. And then, the markets continued to evolve with cloud and cloud architectures, customers wanting to deploy in the cloud. You know, you saw we had that poll yesterday in the general session about cloud with really interesting results, but we saw that there was really companies wanting to deploy in a hybrid way. Some of them wanted to move specific workloads to the cloud. >> Multi-cloud, public, private. >> Exactly right, and multi-data center. >> The majority of your customer deployments are on prem. >> They are. >> Rob Bearden, your CEO, I think he said in a recent article on SiliconAngle that two-thirds of your deployments are on prem. Is that percentage going down over time? Are more of your customers shifting toward a public cloud orientation? Does Hortonworks worry about that? You've got partnerships, clearly, with the likes of IBM, AWS, and Microsoft Dasher and so forth, so do you guys see that as an opportunity, as a worrisome trend? >> No, we see it very much as an opportunity. And that's because we do have customers who are wanting to put more workloads and run things in the cloud, however, there's still almost always a component that's going to be on premise. And that creates a challenge for organizations. How do they manage the security and governance and really the overall operations of those deployments as they're in the cloud and on premise. And, to your point, multi-cloud. And so you get some complexity in there around that deployment and particularly with the regulations, we talked about GDPR earlier today. >> Oh, by the way, the Data Steward Studio demo today was really, really good. It showed that, first of all, you cover the entire range of core requirements for compliance. So that was actually the primary announcement at this show; Scott Gnau announced that. You demoed it today, I think you guys are off on a good start, yeah. We've gotten really, and thank you for that, we've gotten really good feedback on our DataPlane Services strategy, right, it provides that single pane of glass. >> I should say to our viewers that Data Steward Studio is the second of the services under the DataPlane, the Hortonworks DataPlane Services Portfolio. >> That's right, that's exactly right. >> Go ahead, keep going. >> So, you know, we see that as an opportunity. We think we're very strongly positioned in the market, being the first to bring that kind of solution to the customers and our large customers that we've been talking about and who have been starting to use DataPlane have been very, very positive. I mean they see it as something that is going to help them really kind of maintain control over these deployments as they start to spread around, as they grow their uses of the thing. >> And it's built to operate across the multi-cloud, I know this as well in terms of executing the consent or withdrawal of consent that the data subject makes through what is essentially a consent portal. >> That's right, that's right. >> That was actually a very compelling demonstration in that regard. >> It was good, and they worked very hard on it. And I was speaking to an analyst yesterday, and they were saying that they're seeing an increasing number of the customers, enterprises, wanting to have a multi-cloud strategy. They don't want to get locked into any one public cloud vendor, so, what they want is somebody who can help them maintain that common security and governance across their different deployments, and they see DataPlane Services is the way that's going to help them do that. >> So John, how is Hortonworks, what's your road map, how do you see the company in your go to market evolving over the coming years in terms of geographies, in terms of your focuses? Focus, in terms of the use-cases and workloads that the Hortonworks portfolio addresses. How is that shifting? You mentioned the Edge. AI, machine learning, deep learning. You are a reseller of IBM Data Science Experience. >> DSX, that's right. >> So, let's just focus on that. Do you see more customers turning to Hortonworks and IBM for a complete end-to-end pipeline for the ingest, for the preparation, modeling, training and so forth? And deployment of operationalized AI? Is that something you see going forward as an evolution path for your capabilities? >> I'd say yes, long-term, or even in the short-term. So, they have to get their data house in order, if you will, before they get to some of those other things, so we're still, Hortonworks strategy has always been focused on the platform aspect, right? The data-at-rest platform, data-in-motion platform, and now a platform for managing common security and governance across those different deployments. Building on that is the data science, machine learning, and AI opportunity, but our strategy there, as opposed to trying to trying to do it ourselves, is to partner, so we've got the strong partnership with IBM, resell their DSX product. And also other partnerships around to deliver those other capabilities, like machine learning and AI, from our partner ecosystem, which you referenced. We have over 2,300 partners, so a very, very strong ecosystem. And so, we're going to stick to our strategy of the platforms enabling that, which will subsequently enable data science, machine learning, and AI on top. And then, if you want me to talk about our strategy in terms of growth, so we already operate globally. We've got offices in I think 19 different countries. So we're really covering the globe in terms of the demand for Hortonworks products and beginning implements. >> Where's the fastest growing market in terms of regions for Hortonworks? >> Yeah, I mean, international generally is our fastest growing region, faster than the U.S. But we're seeing very strong growth in APAC, actually, so India, Asian countries, Singapore, and then up and through to Japan. There's a lot of growth out in the Asian region. And, you know, they're sort of moving directly to digital transformation projects at really large scale. Big banks, telcos, from a workload standpoint I'd say the patterns are very similar to what we've seen. I've been at Hortonworks for six and a half years, as it turns out, and the patterns we saw initially in terms of adoption in the U.S. became the patterns we saw in terms of adoption in Europe and now those patterns of adoption are the same in Asia. So, once a company realizes they need to either drive out operational costs or build new data applications, the patterns tend to be the same whether it's retail, financial services, telco, manufacturing. You can sort of replicate those as they move forward. >> So going forward, how is Hortonworks evolving as a company in terms of, for example with GDPR, Data Steward, data governance as a strong focus going forward, are you shifting your model in terms of your target customer away from the data engineers, the Hadoop cluster managers who are still very much the center of it, towards more data governance, towards more business analyst level of focus. Do you see Hortonworks shifting in that direction in terms of your focus, go to market, your message and everything? >> I would say it's not a shifting as much as an expansion, so we definitely are continuing to invest in the core platform, in Hadoop, and you would have heard of some of the changes that are coming in the core Hadoop 3.0 and 3.1 platform here. Alan and others can talk about those details, and in Apache NiFi. But, to your point, as we bring and have brought Data Steward Studio and DataPlane Services online, that allows us to address a different user within the organization, so it's really an expansion. We're not de-investing in any other things. It's really here's another way in a natural evolution of the way that we're helping organizations solve data problems. >> That's great, well thank you. This has been John Kreisa, he's the VP for marketing at Hortonworks. I'm James Kobielus of Wikibon SiliconAngle Media here at Dataworks Summit 2018 in Berlin. And it's been great, John, and thank you very much for coming on theCUBE. >> Great, thanks for your time. (techno music)

Published Date : Apr 19 2018

SUMMARY :

Brought to you by Hortonworks. of course, the host company of Dataworks Summit. to reconnect with you guys at Hortonworks. the sessions have been, you know, oversubscribed, you guys have all continued to evolve to address the platform based on Hadoop, as you said, in the Apache NiFi, you know, the meetup here so do you guys see that as an opportunity, and really the overall operations of those Oh, by the way, the Data Steward Studio demo today is the second of the services under the DataPlane, being the first to bring that kind of solution that the data subject makes through in that regard. an increasing number of the customers, Focus, in terms of the use-cases and workloads for the preparation, modeling, training and so forth? Building on that is the data science, machine learning, in terms of adoption in the U.S. the data engineers, the Hadoop cluster managers in the core platform, in Hadoop, and you would have This has been John Kreisa, he's the Great, thanks for your time.

SENTIMENT ANALYSIS :

ENTITIES

EntityCategoryConfidence
AlanPERSON

0.99+

James KobielusPERSON

0.99+

JimPERSON

0.99+

Rob BeardenPERSON

0.99+

IBMORGANIZATION

0.99+

John KreisaPERSON

0.99+

EuropeLOCATION

0.99+

JohnPERSON

0.99+

AsiaLOCATION

0.99+

AWSORGANIZATION

0.99+

HortonworksORGANIZATION

0.99+

BerlinLOCATION

0.99+

yesterdayDATE

0.99+

AfricaLOCATION

0.99+

South AmericaLOCATION

0.99+

SiliconAngle MediaORGANIZATION

0.99+

U.S.LOCATION

0.99+

1,250QUANTITY

0.99+

Scott GnauPERSON

0.99+

1,300QUANTITY

0.99+

Berlin, GermanyLOCATION

0.99+

seven yearsQUANTITY

0.99+

six and a half yearsQUANTITY

0.99+

JapanLOCATION

0.99+

HadoopTITLE

0.99+

AsianLOCATION

0.99+

secondQUANTITY

0.98+

over 2,300 partnersQUANTITY

0.98+

todayDATE

0.98+

two-thirdsQUANTITY

0.98+

19 different countriesQUANTITY

0.98+

Dataworks SummitEVENT

0.98+

more than 51 countriesQUANTITY

0.98+

Hadoop 3.0TITLE

0.98+

firstQUANTITY

0.98+

JamesPERSON

0.98+

Data Steward StudioORGANIZATION

0.98+

Dataworks Summit EU 2018EVENT

0.98+

Dataworks Summit 2018EVENT

0.97+

ClouderaORGANIZATION

0.97+

MapRORGANIZATION

0.96+

GDPRTITLE

0.96+

DataPlane ServicesORGANIZATION

0.96+

SingaporeLOCATION

0.96+

year sixQUANTITY

0.95+

2018EVENT

0.95+

Wikibon SiliconAngle MediaORGANIZATION

0.94+

IndiaLOCATION

0.94+

HadoopORGANIZATION

0.94+

APACORGANIZATION

0.93+

Big Data AnalyticsORGANIZATION

0.93+

3.1TITLE

0.93+

Wall Street JournalTITLE

0.93+

oneQUANTITY

0.93+

ApacheORGANIZATION

0.92+

WikibonORGANIZATION

0.92+

NiFiTITLE

0.92+

Scott Gnau, Hortonworks | Dataworks Summit EU 2018


 

(upbeat music) >> Announcer: From Berlin, Germany, it's The Cube, covering DataWorks Summit Europe 2018. Brought to you by Hortonworks. >> Hi, welcome to The Cube, we're separating the signal from the noise and tuning into the trends in data and analytics. Here at DataWorks Summit 2018 in Berlin, Germany. This is the sixth year, I believe, that DataWorks has been held in Europe. Last year I believe it was at Munich, now it's in Berlin. It's a great show. The host is Hortonworks and our first interviewee today is Scott Gnau, who is the chief technology officer of Hortonworks. Of course Hortonworks got established themselves about seven years ago as one of the up and coming start ups commercializing a then brand new technology called Hadoop and MapReduce. They've moved well beyond that in terms of their go to market strategy, their product portfolio, their partnerships. So Scott, this morning, it's great to have ya'. How are you doing? >> Glad to be back and good to see you. It's been awhile. >> You know, yes, I mean, you're an industry veteran. We've both been around the block a few times but I remember you years ago. You were at Teradata and I was at another analyst firm. And now you're with Hortonworks. And Hortonworks is really on a roll. I know you're not Rob Bearden, so I'm not going to go into the financials, but your financials look pretty good, your latest. You're growing, your deal sizes are growing. Your customer base is continuing to deepen. So you guys are on a roll. So we're here in Europe, we're here in Berlin in particular. It's five weeks--you did the keynote this morning, It's five weeks until GDPR. The sword of Damacles, the GDPR sword of Damacles. It's not just affecting European based companies, but it's affecting North American companies and others who do business in Europe. So your keynote this morning, your core theme was that, if you're in enterprise, your business strategy is equated with your cloud strategy now, is really equated with your data strategy. And you got to a lot of that. It was a really good discussion. And where GDPR comes into the picture is the fact that protecting data, personal data of your customers is absolutely important, in fact it's imperative and mandatory, and will be in five weeks or you'll face a significant penalty if you're not managing that data and providing customers with the right to have it erased, or the right to withdraw consent to have it profiled, and so forth. So enterprises all over the world, especially in Europe, are racing as fast as they can to get compliant with GDPR by the May 25th deadline time. So, one of the things you discussed this morning, you had an announcement overnight that Hortonworks has released a new solution in technical preview called The Data Steward Studio. And I'm wondering if you can tie that announcement to GDPR? It seems like data stewardship would have a strong value for your customers. >> Yeah, there's definitely a big tie-in. GDPR is certainly creating a milestone, kind of a trigger, for people to really think about their data assets. But it's certainly even larger than that, because when you even think about driving digitization of a business, driving new business models and connecting data and finding new use cases, it's all about finding the data you have, understanding what it is, where it came from, what's the lineage of it, who had access to it, what did they do to it? These are all governance kinds of things, which are also now mandated by laws like GDPR. And so it's all really coming together in the context of the new modern data architecture era that we live in, where a lot of data that we have access to, we didn't create. And so it was created outside the firewall by a device, by some application running with some customer, and so capturing and interpreting and governing that data is very different than taking derivative transactions from an ERP system, which are already adjudicated and understood, and governing that kind of a data structure. And so this is a need that's driven from many different perspectives, it's driven from the new architecture, the way IoT devices are connecting and just creating a data bomb, that's one thing. It's driven by business use cases, just saying what are the assets that I have access to, and how can I try to determine patterns between those assets where I didn't even create some of them, so how do I adjudicate that? >> Discovering and cataloging your data-- >> Discovering it, cataloging it, actually even... When I even think about data, just think the files on my laptop, that I created, and I don't remember what half of them are. So creating the metadata, creating that trail of bread crumbs that lets you piece together what's there, what's the relevance of it, and how, then, you might use it for some correlation. And then you get in, obviously, to the regulatory piece that says sure, if I'm a new customer and I ask to be forgotten, the only way that you can guarantee to forget me is to know where all of my data is. >> If you remember that they are your customer in the first place and you know where all that data is, if you're even aware that it exists, that's the first and foremost thing for an enterprise to be able to assess their degree of exposure to GDPR. >> So, right. It's like a whole new use case. It's a microcosm of all of these really big things that are going on. And so what we've been trying to do is really leverage our expertise in metadata management using the Apache Atlas project. >> Interviewer: You and IBM have done some major work-- >> We work with IBM and the community on Apache Atlas. You know, metadata tagging is not the most interesting topic for some people, but in the context that I just described, it's kind of important. And so I think one of the areas where we can really add value for the industry is leveraging our lowest common denominator, open source, open community kind of development to really create a standard infrastructure, a standard open infrastructure for metadata tagging, into which all of these use cases can now plug. Whether it's I want to discover data and create metadata about the data based on patterns that I see in the data, or I've inherited data and I want to ensure that the metadata stay with that data through its life cycle, so that I can guarantee the lineage of the data, and be compliant with GDPR-- >> And in fact, tomorrow we will have Mandy Chessell from IBM, a key Hortonworks partner, discussing the open metadata framework you're describing and what you're doing. >> And that was part of this morning's keynote close also. It all really flowed nicely together. Anyway, it is really a perfect storm. So what we've done is we've said, let's leverage this lowest common denominator, standard metadata tagging, Apache Atlas, and uplevel it, and not have it be part of a cluster, but actually have it be a cloud service that can be in force across multiple data stores, whether they're in the cloud or whether they're on prem. >> Interviewer: That's the Data Steward Studio? >> Well, Data Plane and Data Steward Studio really enable those things to come together. >> So the Data Steward Studio is the second service >> Like an app. >> under the Hortonworks DataPlane service. >> Yeah, so the whole idea is to be able to tie those things together, and when you think about it in today's hybrid world, and this is where I really started, where your data strategy is your cloud strategy, they can't be separate, because if they're separate, just think about what would happen. So I've copied a bunch of data out to the cloud. All memory of any lineage is gone. Or I've got to go set up manually another set of lineage that may not be the same as the lineage it came with. And so being able to provide that common service across footprint, whether it's multiple data centers, whether it's multiple clouds, or both, is a really huge value, because now you can sit back and through that single pane, see all of your data assets and understand how they interact. That obviously has the ability then to provide value like with Data Steward Studio, to discover assets, maybe to discover assets and discover duplicate assets, where, hey, I can save some money if I get rid of this cloud instance, 'cause it's over here already. Or to be compliant and say yeah, I've got these assets here, here, and here, I am now compelled to do whatever: delete, protect, encrypt. I can now go do that and keep a record through the metadata that I did it. >> Yes, in fact that is very much at the heart of compliance, you got to know what assets there are out there. And so it seems to me that Hortonworks is increasingly... the H-word rarely comes up these days. >> Scott: Not Hortonworks, you're talking about Hadoop. >> Hadoop rarely comes up these days. When the industry talks about you guys, it's known that's your core, that's your base, that's where HDP and so forth, great product, great distro. In fact, in your partnership with IBM, a year or more ago, I think it was IBM standardized on HDP in lieu of their distro, 'cause it's so well-established, so mature. But going forward, you guys in many ways, Hortonworks, you have positioned yourselves now. Wikibon sees you as being the premier solution provider of big data governance solutions specifically focused on multi-cloud, on structured data, and so forth. So the announcement today of the Data Steward Studio very much builds on that capability you already have there. So going forward, can you give us a sense to your roadmap in terms of building out DataPlane's service? 'Cause this is the second of these services under the DataPlane umbrella. Give us a sense for how you'll continue to deepen your governance portfolio in DataPlane. >> Really the way to think about it, there are a couple of things that you touched on that I think are really critical, certainly for me, and for us at Hortonworks to continue to repeat, just to make sure the message got there. Number one, Hadoop is definitely at the core of what we've done, and was kind of the secret sauce. Some very different stuff in the technology, also the fact that it's open source and community, all those kinds of things. But that really created a foundation that allowed us to build the whole beginning of big data data management. And we added and expanded to the traditional Hadoop stack by adding Data in Motion. And so what we've done is-- >> Interviewer: NiFi, I believe, you made a major investment. >> Yeah, so we made a large investment in Apache NiFi, as well as Storm and Kafka as kind of a group of technologies. And the whole idea behind doing that was to expand our footprint so that we would enable our customers to manage their data through its entire lifecycle, from being created at the edge, all the way through streaming technologies, to landing, to analytics, and then even analytics being pushed back out to the edge. So it's really about having that common management infrastructure for the lifecycle of all the data, including Hadoop and many other things. And then in that, obviously as we discuss whether it be regulation, whether it be, frankly, future functionality, there's an opportunity to uplevel those services from an overall security and governance perspective. And just like Hadoop kind of upended traditional thinking... and what I mean by that was not the economics of it, specifically, but just the fact that you could land data without describing it. That seemed so unimportant at one time, and now it's like the key thing that drives the difference. Think about sensors that are sending in data that reconfigure firmware, and those streams change. Being able to acquire data and then assess the data is a big deal. So the same thing applies, then, to how we apply governance. I said this morning, traditional governance was hey, I started this employee, I have access to this file, this file, this file, and nothing else. I don't know what else is out there. I only have access to what my job title describes. And that's traditional data governance. In the new world, that doesn't work. Data scientists need access to all of the data. Now, that doesn't mean we need to give away PII. We can encrypt it, we can tokenize it, but we keep referential integrity. We keep the integrity of the original structures, and those who have a need to actually see the PII can get the token and see the PII. But it's governance thought inversely as it's been thought about for 30 years. >> It's so great you've worked governance into an increasingly streaming, real-time in motion data environment. Scott, this has been great. It's been great to have you on The Cube. You're an alum of The Cube. I think we've had you at least two or three times over the last few years. >> It feels like 35. Nah, it's pretty fun.. >> Yeah, you've been great. So we are here at Dataworks Summit in Berlin. (upbeat music)

Published Date : Apr 18 2018

SUMMARY :

Brought to you by Hortonworks. So Scott, this morning, it's great to have ya'. Glad to be back and good to see you. So, one of the things you discussed this morning, of the new modern data architecture era that we live in, forgotten, the only way that you can guarantee and foremost thing for an enterprise to be able And so what we've been trying to do is really leverage so that I can guarantee the lineage of the data, discussing the open metadata framework you're describing And that was part of this morning's keynote close also. those things to come together. of lineage that may not be the same as the lineage And so it seems to me that Hortonworks is increasingly... When the industry talks about you guys, it's known And so what we've done is-- Interviewer: NiFi, I believe, you made So the same thing applies, then, to how we apply governance. It's been great to have you on The Cube. Nah, it's pretty fun.. So we are here at Dataworks Summit in Berlin.

SENTIMENT ANALYSIS :

ENTITIES

EntityCategoryConfidence
EuropeLOCATION

0.99+

ScottPERSON

0.99+

IBMORGANIZATION

0.99+

BerlinLOCATION

0.99+

Scott GnauPERSON

0.99+

HortonworksORGANIZATION

0.99+

TeradataORGANIZATION

0.99+

Last yearDATE

0.99+

May 25thDATE

0.99+

five weeksQUANTITY

0.99+

Mandy ChessellPERSON

0.99+

GDPRTITLE

0.99+

MunichLOCATION

0.99+

Rob BeardenPERSON

0.99+

second serviceQUANTITY

0.99+

30 yearsQUANTITY

0.99+

bothQUANTITY

0.99+

tomorrowDATE

0.99+

firstQUANTITY

0.99+

Berlin, GermanyLOCATION

0.99+

secondQUANTITY

0.99+

DataPlaneORGANIZATION

0.99+

sixth yearQUANTITY

0.98+

three timesQUANTITY

0.98+

first intervieweeQUANTITY

0.98+

Dataworks SummitEVENT

0.98+

oneQUANTITY

0.97+

this morningDATE

0.97+

DataWorks Summit 2018EVENT

0.97+

MapReduceORGANIZATION

0.96+

HadoopTITLE

0.96+

HadoopORGANIZATION

0.96+

one timeQUANTITY

0.96+

35QUANTITY

0.96+

single paneQUANTITY

0.96+

NiFiORGANIZATION

0.96+

todayDATE

0.94+

DataWorks Summit Europe 2018EVENT

0.93+

Data Steward StudioORGANIZATION

0.93+

Dataworks Summit EU 2018EVENT

0.92+

about seven years agoDATE

0.91+

a year orDATE

0.88+

yearsDATE

0.87+

StormORGANIZATION

0.87+

WikibonORGANIZATION

0.86+

Apache NiFiORGANIZATION

0.85+

The CubePERSON

0.84+

North AmericanOTHER

0.84+

DataWorksORGANIZATION

0.84+

Data PlaneORGANIZATION

0.76+

Data Steward StudioTITLE

0.75+

KafkaORGANIZATION

0.75+

Scott Gnau, Hortonworks - DataWorks Summit 2017


 

>> Announcer: Live, from San Jose, in the heart of Silicon Valley, it's The Cube, covering DataWorks Summit 2017. Brought to you by Hortonworks. >> Welcome back to The Cube. We are live at DataWorks Summit 2017. I'm Lisa Martin with my cohost, George Gilbert. We've just come from this energetic, laser light show infused keynote, and we're very excited to be joined by one of the keynotes today, the CTO of Hortonworks, Scott Gnau. Scott, welcome back to The Cube. >> Great to be here, thanks for having me. >> Great to have you back here. One of the things that you talked about in your keynote today was collaboration. You talked about the modern data architecture and one of the things that I thought was really interesting is that now where Horton Works is, you are empowering cross-functional teams, operations managers, business analysts, data scientists, really helping enterprises drive the next generation of value creation. Tell us a little bit about that. >> Right, great. Thanks for noticing, by the way. I think the next, the important thing, kind of as a natural evolution for us as a company and as a community is, and I've seen this time and again in the tech industry, we've kind of moved from really cool breakthrough tech, more into a solutions base. So I think this whole notion is really about how we're making that natural transition. And when you think about all the cool technology and all the breakthrough algorithms and all that, that's really great, but how do we then take that and turn it to value really quickly and in a repeatable fashion. So, the notion that I launched today is really making these three personas really successful. If you can focus, combining all of the technology, usability and even some services around it, to make each of those folks more successful in their job. So I've broken it down really into three categories. We know the traditional business analyst, right? They've Sequel and they've been doing predictive modeling of structured data for a very long time, and there's a lot of value generated from that. Making the business analyst successful Hadoop inspired world is extremely valuable. And why is that? Well, it's because Hadoop actually now brings a lot more breadth of data and frankly a lot more depth of data than they've ever had access to before. But being able to communicate with that business analyst in a language they understand, Sequel, being able to make all those tools work seamlessly, is the next extension of success for the business analyst. We spent a lot of time this morning talking about data scientists, the next great frontier where you bring together lots and lots and lots and lots of data, for instance, Skin and Math and Heavy Compute, with the data scientists and really enable them to go build out that next generation of high definition kind of analytics, all right, and we're all, certainly I am, captured by the notion of self-driving cars, and you think about a self-driving car, and the success of that is purely based on the successful data science. In those cameras and those machines being able to infer images more accurately than a human being, and then make decisions about what those images mean. That's all data science, and it's all about raw processing power and lots and lots and lots of data to make those models train and more accurate than what would otherwise happen. So enabling the data scientist to be successful, obviously, that's a use case. You know, certainly voice activated, voice response kinds of systems, for better customer service; better fraud detection, you know, the cost of a false positive is a hundred times the cost of missing a fraudulent behavior, right? That's because you've irritated a really good customer. So being able to really train those models in high definition is extremely valuable. So bringing together the data, but the tool set so that data scientists can actually act as a team and collaborate and spend less of their time finding the data, and more of their time providing the models. And I said this morning, last but not least, the operations manager. This is really, really, really important. And a lot of times, especially geeks like myself, are just, ah, operations guys are just a pain in the neck. Really, really, really important. We've got data that we've never thought of. Making sure that it's secured properly, making sure that we're managing within the regulations of privacy requirements, making sure that we're governing it and making sure how that data is used, alongside our corporate mission is really important. So creating that tool set so that the operations manager can be confident in turning these massive files of data to the business analyst and to the data scientist and be confident that the company's mission, the regulation that they're working within in those jurisdictions are all in compliance. And so that's what we're building on, and that stack, of course, is built on open source Apache Atlas and open source Apache Ranger and it really makes for an enterprise grade experience. >> And a couple things to follow on to that, we've heard of this notion for years, that there is a shortage of data scientists, and now, it's such a core strategic enabler of business transformation. Is this collaboration, this team support that was talked about earlier, is this helping to spread data science across these personas to enable more of the to be data scientists? >> Yeah, I think there are two aspects to it, right? One is certainly really great data scientists are hard to find; they're scarce. They're unique creatures. And so, to the extent that we're able to combine the tool set to make the data scientists that we have more productive, and I think the numbers are astronomical, right? You could argue that, with the wrong tool set, a data scientist might spend 80% or 90% of his or her time just finding the data and only 10% working on the problem. If we can flip that around and make it 10% finding the data and 90%, that's like, in order of magnitude, more breadth of data science coverage that we get from the same pool of data scientists, so I think that from an efficiency perspective, that's really huge. The second thing, though, is that by looking at these personas and the tools that we're rolling out, can we start to package up things that the data scientists are learning and move those models into the business analysts desktop. So, now, not only is there more breadth and depth of data, but frankly, there's more depth and breadth of models that can be run, but inferred with traditional business process, which means, turning that into better decision making, turning that into better value for the business, just kind of happens automatically. So, you're leveraging the value of data scientists. >> Let me follow that up, Scott. So, if the, right now the biggest time sync for the data scientist or the data engineer is data cleansing and transformation. Where do the cloud vendors fit in in terms of having trained some very broad horizontal models in terms of vision, natural language understanding, text to speech, so where they have accumulated a lot of data assets, and then they created models that were trained and could be customized. Do you see a role for, not just mixed gen UI related models coming from the cloud vendors, but for other vendors who have data assets to provide more fully baked models so that you don't have to start from scratch? >> Absolutely. So, one of the things that I talked about also this morning is this notion, and I said it this morning, kind of opens where open community, open source, and open ecosystem, I think it's now open to the third power, right, and it's talking about open models and algorithms. And I think all of those same things are really creating a tremendous opportunity, the likes of which we've not seen before, and I think it's really driving the velocity in the market, right, so there's no, because we're collaborating in the open, things just get done faster and more efficiently, whether it be in the core open source stuff or whether it be in the open ecosystem, being able to pull tools in. Of course, the announcement earlier today, with IBMs Data Science Experience software as a framework for the data scientists to work as a team, but that thing in and of itself is also very open. You can plug in Python, you can plug in open source models and libraries, some of which were developed in the cloud and published externally. So, it's all about continued availability of open collaboration that is the hallmark of this wave of technology. >> Okay, so we have this issue of how much can we improve the productivity with better tools or with some amount of data. But then, the part that everyone's also point out, besides the cloud experience, is also the ability to operationalize the models and get them into production either in Bespoke apps or packaged apps. How's that going to sort of play out over time? >> Well, I think two things you'll see. One, certainly in the near term, again, with our collaboration with IBM and the Data Science Experience. One of the key things there is not only, not just making the data scientists be able to be more collaborative, but also the ease of which they can publish their models out into the wild. And so, kind of closing that loop to action is really important. I think, longer term, what you're going to see, and I gave a hint of this a little bit in my keynote this morning, is, I believe in five years, we'll be talking about scalability, but scalability won't be the way we think of it today, right? Oh, I have this many petabytes under management, or, petabytes. That's upkeep. But truly, scalability is going to be how many connected devices do you have interacting, and how many analytics can you actually push from model perspective, actually out to the center or out to the device to run locally. Why is that important? Think about it as a consumer with a mobile device. The time of interaction, your attention span, do you get an offer in the right time, and is that offer relevant. It can't be rules based, it has to be models based. There's no time for the electrons to move from your device across a power grid, run an analytic and have it come back. It's going to happen locally. So scalability, I believe, is going to be determined in terms of the CPU cycles and the total interconnected IOT network that you're working in. What does that mean from your original question? That means applications have to be portable, models have to be portable so that they can execute out to the edge where it's required. And so that's, obviously, part of the key technology that we're working with in Portworks Data Flow and the combination of Apache Nifi and Apache Caca and Storm to really combine that, "How do I manage, not only data in motion, but ultimately, how do I move applications and analytics to the data and not be required to move the data to the analytics?" >> So, question for you. You talked about real time offers, for example. We talk a lot about predicted analytics, advanced analytics, data wrangling. What are your thoughts on preemptive analytics? >> Well, I think that, while that sounds a little bit spooky, because we're kind of mind reading, I think those things can start to exist. Certainly because we now have access to all of the data and we have very sophisticated data science models that allow us to understand and predict behavior, yeah, the timing of real time analytics or real time offer delivery, could actually, from our human being perception, arrive before I thought about it. And isn't that really cool in a way. I'm thinking about, I need to go do X,Y,Z. Here's a relevant offer, boom. So it's no longer, I clicked here, I clicker here, I clicked here, and in five seconds I get a relevant offer, but before I even though to click, I got a relevant offer. And again, to the extent that it's relevant, it's not spooky. >> Right. >> If it's irrelevant, then you deal with all of the other downstream impact. So that, again, points to more and more and more data and more and more and more accurate and sophisticated models to make sure that that relevance exists. >> Exactly. Well, Scott Gnau, CTO of Hortonworks, thank you so much for stopping by The Cube once again. We appreciate your conversation and insights. And for George Gilbert, I am Lisa Martin. You're watching The Cube live, from day one of the DataWorks Summit in the heart of Silicon Valley. Stick around, though, we'll be right back.

Published Date : Jun 13 2017

SUMMARY :

in the heart of Silicon Valley, it's The Cube, the CTO of Hortonworks, Scott Gnau. One of the things that you talked about So enabling the data scientist to be successful, And a couple things to follow on to that, and the tools that we're rolling out, for the data scientist or the data engineer as a framework for the data scientists to work as a team, is also the ability to operationalize the models not just making the data scientists be able to be You talked about real time offers, for example. And again, to the extent that it's relevant, So that, again, points to more and more and more data of the DataWorks Summit in the heart of Silicon Valley.

SENTIMENT ANALYSIS :

ENTITIES

EntityCategoryConfidence
Lisa MartinPERSON

0.99+

George GilbertPERSON

0.99+

ScottPERSON

0.99+

IBMORGANIZATION

0.99+

80%QUANTITY

0.99+

San JoseLOCATION

0.99+

10%QUANTITY

0.99+

90%QUANTITY

0.99+

Scott GnauPERSON

0.99+

Silicon ValleyLOCATION

0.99+

IBMsORGANIZATION

0.99+

PythonTITLE

0.99+

two aspectsQUANTITY

0.99+

five secondsQUANTITY

0.99+

HortonworksORGANIZATION

0.99+

OneQUANTITY

0.99+

DataWorks Summit 2017EVENT

0.98+

Horton WorksORGANIZATION

0.98+

HadoopTITLE

0.98+

oneQUANTITY

0.98+

DataWorks SummitEVENT

0.98+

todayDATE

0.98+

eachQUANTITY

0.98+

five yearsQUANTITY

0.97+

thirdQUANTITY

0.96+

second thingQUANTITY

0.96+

Apache CacaORGANIZATION

0.95+

three personasQUANTITY

0.95+

this morningDATE

0.95+

Apache NifiORGANIZATION

0.95+

this morningDATE

0.94+

three categoriesQUANTITY

0.94+

CTOPERSON

0.93+

The CubeTITLE

0.9+

SequelPERSON

0.89+

Apache RangerORGANIZATION

0.88+

two thingsQUANTITY

0.86+

hundred timesQUANTITY

0.85+

PortworksORGANIZATION

0.82+

earlier todayDATE

0.8+

Data Science ExperienceTITLE

0.79+

The CubeORGANIZATION

0.78+

Apache AtlasORGANIZATION

0.75+

StormORGANIZATION

0.74+

day oneQUANTITY

0.74+

waveEVENT

0.69+

one of the keynotesQUANTITY

0.66+

lotsQUANTITY

0.63+

yearsQUANTITY

0.53+

HortonworksEVENT

0.5+

lots of dataQUANTITY

0.49+

SequelORGANIZATION

0.46+

FlowORGANIZATION

0.39+