Scott Gnau, Hortonworks | DataWorks Summit 2018


 

>> Live from San Jose, in the heart of Silicon Valley, it's theCUBE. Covering DataWorks Summit 2018. Brought to you by Hortonworks. >> Welcome back to theCUBE's live coverage of DataWorks Summit here in San Jose, California. I'm your host, Rebecca Knight, along with my cohost James Kobielus. We're joined by Scott Gnau, he is the chief technology officer at Hortonworks. Welcome back to theCUBE, Scott. >> Great to be here. >> It's always fun to have you on the show. So, you have really spent your entire career in the data industry. I want to start off at 10,000 feet, and just have you talk about where we are now, in terms of customer attitudes, in terms of the industry, in terms of where customers feel, how they're dealing with their data and how they're thinking about their approach in their business strategy. >> Well I have to say, 30 plus years ago, starting in the data field, it wasn't as exciting as it is today. Of course, I always found it very exciting. >> Exciting means nerve-wracking. Keep going. >> Or nerve-wracking. But you know, we've been predicting it. I remember even, you know, 10, 15 years ago, before big data was a thing, it's like, oh, all this data's going to come, and it's going to be, you know, 10x what it is. And we were wrong. It was like 5000x what it is, you know. And I think the really exciting part is that data really used to be relegated, frankly, to big companies as a derivative work of ERP systems, and so on and so forth. And while that's very interesting, and certainly enabled a whole level of productivity for industry, when you compare that to all of the data flying around everywhere today, whether it be Twitter feeds or even doing live polls, like we did in the opening session today, data is just being created everywhere. And the same thing applies to that data that applied to the ERP data of old. And that is, being able to harness, manage and understand that data is a new business-creating opportunity. 
And you know, we were with some analysts the other day, and I think one of the more quoted things that came out of that when I was speaking with them was really that, like railroads and shipping in the 1800s and oil in the 1900s, data really is the wealth creator of this century. And so that creates a very nerve-wracking environment. It also creates an environment of very agile and very important technological breakthroughs that enable those things to be turned into wealth. >> So thinking about that, in terms of where we are at this point in time, on the main stage this morning someone had likened it to the interstate highway system, that really revolutionized transportation, but also commerce. >> I love that actually. I may steal it in some of my future presentations. >> That's good, but we'll know where you pilfered it. >> Well perhaps if data is oil, the edge, in containerized applications and piping data, you know, microbursts of data across the internet of things, is sort of like the new fracking. You know, you're able to extract more of this precious resource from the territory. >> Hopefully not quite as damaging to the environment. >> Maybe not. I'm sorry to environmentalists if I just offended you, I apologize. >> But I think, you know, all of those analogies are very true, and I particularly like the interstate one this morning. Because when I think about what we've done in our core HDP platform, and I know Arun was here talking about all the great advances that we built into this, kind of the core Hadoop platform. Very traditional. Store data, analyze data, but also bring in new kinds of algorithms, rapid innovation and so on. That's really great, but that's kind of half of the story. In a device-connected world, in a consumer-centric world, capturing data at the edge, moving and processing data at the edge is the new normal, right? 
And so just like the interstate highway system actually created new ways of commerce, because we could move people and things more efficiently, moving data and processing data more efficiently is kind of the second part of the opportunity that we have in this new deluge of data. And that's really where we've been with our Hortonworks DataFlow. And really saying that the complete package of managing data from origination at the edge all the way through analytics to a decision that's triggered back at the edge is like the holy grail, right? And building a technology for that footprint is why I'm certainly excited today. It's not the caffeine, it's just the opportunity of making all of that work. >> You know, one of the, I think the key announcement for me at this show, that you guys made on HDP 3.0, was containerization of more of the capabilities of your distributed environment, so that these capabilities, in terms of processing--first of all, capturing, analyzing and moving that data--can be pushed closer to the end points. Can you speak a bit, Scott, about this new capability, or this containerization support? Within HDP 3.0, but really in your broader portfolio, and where you're going with that in terms of addressing edge applications: perhaps autonomous vehicles, or, you know, whatever you might put into a new smartphone or whatever you put at the edge. Describe the potential of containerization to sort of break this ecosystem wide open. >> Yeah, I think there are a couple of aspects to containerization, and by the way, we're like so excited about kind of the cloud-first, containerized HDP 3.0 that we launched here today. There's a lot of great tech that our customers have been clamoring for that they can take advantage of. And it's really just the beginning, which again is part of the excitement of being in the technology space and certainly being part of Hortonworks. So containerization affords a couple of things. Certainly, agility. Agility in deploying applications. 
So, you know, for 30 years we've built these enterprise software stacks that were very integrated, hugely complicated systems that could bring together multiple different applications, different workloads, and manage all that in a multi-tenancy kind of environment. And that was because we had to do that, right? Servers were getting bigger, they were more powerful, but not particularly well distributed. Obviously in a containerized world, you now turn that whole paradigm on its head and you say, you know what? I'm just going to collect these three microservices that I need to do this job. I can isolate them. I can have them run in a serverless technology. I can actually allocate cloud servers to go run, and when they're done they go away. And I don't pay for them anymore. So thinking about kind of that from a software development, deployment, implementation perspective, there are huge implications, but the real value for customers is agility, right? I don't have to wait until next year to upgrade my enterprise software stack to take advantage of this new algorithm. I can simply isolate it inside of a container, have it run, and have it go away. And get the answer, right? And so when I think about, and a number of our keynotes this morning were talking about just kind of the exponential rate of change, this is really the net new norm. Because the only way we can do things faster is, in fact, to be able to provide this. >> And it's not just microservices. It's also orchestrating them through Kubernetes, and so forth, so they can be. >> Sure. That's the how, yeah. >> Quickly deployed as an ensemble, and then quickly de-provisioned when you don't need them anymore. >> Yeah, so then there's obviously the cost aspect, right? >> Yeah. >> So if you're going to run a whole bunch of stuff, or even if you have something as mundane as a really big merge join inside of Hive. 
Let me spin up a thousand extra containers to go do that big thing, and then have them go away when it's done. >> And oh, by the way, you'll be deployed on. >> And only pay for it while I'm using it. >> And then you can possibly distribute those containers across different public clouds, depending on what's most cost-effective at any point in time, Azure or AWS or whatever it might be. >> And I teased Arun, you know, the only thing that we haven't solved for is the speed of light, but we're working on it. >> Talking about this warp-speed change being the new norm, can you talk about some of the most exciting use cases you've seen, in terms of the customers and clients that are using Hortonworks in the coolest ways? >> Well, I mean, obviously autonomous vehicles is one that has captured all of our imaginations. 'Cause we understand how that works. But it's a perfect use case for this kind of technology. But the technology also applies in fraud detection and prevention. It applies in healthcare management, in proactive personalized medicine delivery, and in generating better outcomes for treatment. So, you know, all across. >> It will be in every aspect of our lives, including the consumer realm increasingly, yeah. >> Yeah, all across the board. And you know, one of the things that really changed, right, is, well, a couple things. A lot of bandwidth, so you can start to connect these things. The devices themselves are particularly smart, so you don't any longer have to transfer all the data to a mainframe and then wait three weeks for your answer and then come back. You can have analytic models running on an edge device. And think about, you know, that is really real time. And that actually kind of solves for the speed of light. 'Cause you're not waiting for those things to go back and forth. 
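The elastic pattern Scott describes -- spin workers up just for one job, then let them go away and stop paying for them -- can be sketched in miniature with Python's standard library. This is an illustrative stand-in, not Hortonworks code: the thread pool plays the role of a container orchestrator, and the partitioned intersection is a toy version of the big Hive merge join he mentions; every name here is invented.

```python
from concurrent.futures import ThreadPoolExecutor

def partition_join(partition):
    # Stand-in for one shard of a large merge join: each "container"
    # processes its own partition independently.
    left, right = partition
    return sorted(set(left) & set(right))

def run_elastic_job(partitions, max_workers=4):
    # The `with` block is the whole lifecycle: the workers exist only
    # while the job runs -- the "pay only while I'm using it" model
    # from the interview, in miniature.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(partition_join, partitions))

shards = [([1, 2, 3], [2, 3, 4]), ([5, 6], [6, 7])]
print(run_elastic_job(shards))  # [[2, 3], [6]]
```

In a real deployment the pool would be a fleet of containers scheduled by an orchestrator such as Kubernetes, but the shape of the idea is the same: resources are scoped to the job, not to the stack.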
So there are a lot of new opportunities, and those architectures really depend on some of the core tenets of, ultimately, containerization: stateless application deployment and delivery. And they also depend on the ability to create feedback loops, to do point-to-point and peer kinds of communication between devices. This is a whole new world of how data gets moved and how the decisions around data movement get made. And certainly that's what we're excited about building with the core components. The other implication of all of this, and we've known each other for a long time. Data has gravity. Data movement's expensive. It takes time, frankly, you have to pay for the bandwidth and all that kind of stuff. So being able to play the data where it lies becomes a lot more interesting from an application portability perspective, and with all of these new sensors, devices and applications out there, a lot more data is living its entire lifecycle in the cloud. And so being able to create that connective tissue. >> Or living its entire lifecycle on the edge. >> And even on the edge. >> With machine learning, let me just butt in a second. One of the areas that we're focusing on increasingly at Wikibon, in terms of our focus on machine learning at the edge, is that more and more machine learning frameworks are coming into the browser world, JavaScript frameworks like TensorFlow.js. You know, more of this inferencing and training is going to happen inside your browser. That blows a lot of people's minds. It may not be heavy-hitting machine learning, but it'll be good enough for a lot of things that people do in their normal life. Where you don't want to round-trip back to the cloud. It's all happening right there in, you know, Chrome or whatever you happen to be using. >> Yeah, and so the point being now, you know, when I think about the early days, talking about scalability, I remember shipping my first one-terabyte database. And then the first 10-terabyte database. 
Yeah, it doesn't sound very exciting. When I think about scalability of the future, scalability is not going to be defined as petabytes or exabytes under management. It's really going to be defined as petabytes or exabytes affected across a grid of storage and processing devices. And that's a whole new technology paradigm, and really that's kind of the driving force behind what we've been building and what we've been talking about at this conference. >> Excellent. >> So when you're talking about these things, I mean, how much are the companies themselves prepared, and do they have the right kind of talent to use the kinds of insights that you're able to extract? And then act on them in real time. 'Cause you're talking about how this is saving a lot of the waiting-around time. So is this really changing the way business gets done, and do companies have the talent to execute? >> Sure. I mean, it's changing the way business gets done. We showed a quote on stage this morning from the CEO of Marriott, right? So, I think there are a couple of pieces. One is, businesses are increasingly data driven, and business strategy is increasingly the data strategy. And so it starts from the top, kind of setting that strategy and understanding the value of that asset and how that needs to be leveraged to drive new business. So that's kind of one piece. And you know, obviously there are more and more folks kind of coming to the realization that that is important. The other thing that's been helpful is, you know, as with any new technology, there's always kind of the startup shortage of resources as people start to spool up and learn. You know, the really good news, and for the past 10 years I've been working with a number of different university groups: parents are actually going to universities and demanding that the curriculum include data, and processing, and big data, and all of these technologies. 
Because they know that their children, educated in that kind of a world, number one, they're going to have a fun job to go to every day. 'Cause it's going to be something different every day. But number two, they're going to be employed for life. (laughing) >> Yeah. >> They will be solvent. >> Frankly, the demand has actually created a catch-up in supply that we're seeing. And of course, you know, as tools start to get more mature and more integrated, they also become a little bit easier to use. You know, there's a little bit easier deployment and so on. So it's a combination: I'm seeing a really good supply; obviously we invest in education through the community; and then, frankly, the education system itself, and folks saying this is really the hot job of the next century. You know, I can be the new oil baron. Or I can be the new railroad captain. It's actually creating more supply, which is also very helpful. >> Data's at the heart of what I call the new STEM cell. It's science, technology, engineering, mathematics that you want to implant in the brains of the young as soon as possible. I hear ya. >> Yeah, absolutely. >> Well, Scott, thanks so much for coming on. But first, we can't let you go without the fashion statement. You arrived on set wearing it. >> The elephants. >> I mean, it was quite a look. >> Well, I did it because then you couldn't see I was sweating on my brow. >> Oh please, no, no, no. >> 'Cause I was worried about this tough interview. >> You know, one of the things I love about your logo, and I'll just, you know, sound like I'm fawning: the elephant is a very intelligent animal. >> It is indeed. >> My wife's from Indonesia. I remember going back one time, they had Asian elephants at one of these safari parks. And watching it perform, and my son was very little then. The elephant is a very sensitive, intelligent animal. You don't realize 'till you're up close. They pick up all manner of social cues. 
I think it's an awesome symbol for a company that's all about data-driven intelligence. >> The elephant never forgets. >> Yeah. >> That's what we know. >> That's right, we never forget. >> He can't forget, 'cause he's got a brain. Or she, I'm sorry. He or she has a brain. >> And it's data driven. >> Yeah. >> Thanks very much. >> Great. Well, thanks for coming on theCUBE. I'm Rebecca Knight, for James Kobielus. We will have more coming up from DataWorks just after this. (upbeat music)

Published Date : Jun 20 2018



Scott Gnau, Hortonworks | DataWorks Summit EU 2018


 

(upbeat music) >> Announcer: From Berlin, Germany, it's The Cube, covering DataWorks Summit Europe 2018. Brought to you by Hortonworks. >> Hi, welcome to The Cube. We're separating the signal from the noise and tuning into the trends in data and analytics, here at DataWorks Summit 2018 in Berlin, Germany. This is the sixth year, I believe, that DataWorks has been held in Europe. Last year I believe it was in Munich, now it's in Berlin. It's a great show. The host is Hortonworks, and our first interviewee today is Scott Gnau, who is the chief technology officer of Hortonworks. Of course Hortonworks established itself about seven years ago as one of the up-and-coming startups commercializing a then brand-new technology called Hadoop and MapReduce. They've moved well beyond that in terms of their go-to-market strategy, their product portfolio, their partnerships. So Scott, this morning, it's great to have ya. How are you doing? >> Glad to be back and good to see you. It's been awhile. >> You know, yes, I mean, you're an industry veteran. We've both been around the block a few times, but I remember you years ago. You were at Teradata and I was at another analyst firm. And now you're with Hortonworks. And Hortonworks is really on a roll. I know you're not Rob Bearden, so I'm not going to go into the financials, but your financials look pretty good, your latest. You're growing, your deal sizes are growing. Your customer base is continuing to deepen. So you guys are on a roll. So we're here in Europe, we're here in Berlin in particular. You did the keynote this morning, and it's five weeks until GDPR. The sword of Damocles, the GDPR sword of Damocles. It's not just affecting European-based companies, but it's affecting North American companies and others who do business in Europe. 
So your keynote this morning, your core theme was that if you're an enterprise, your business strategy is equated with your cloud strategy now, is really equated with your data strategy. And you got to a lot of that. It was a really good discussion. And where GDPR comes into the picture is the fact that protecting data, personal data of your customers, is absolutely important; in fact it's imperative and mandatory, and will be in five weeks, or you'll face a significant penalty if you're not managing that data and providing customers with the right to have it erased, or the right to withdraw consent to have it profiled, and so forth. So enterprises all over the world, especially in Europe, are racing as fast as they can to get compliant with GDPR by the May 25th deadline. So, one of the things you discussed this morning: you had an announcement overnight that Hortonworks has released a new solution in technical preview called Data Steward Studio. And I'm wondering if you can tie that announcement to GDPR? It seems like data stewardship would have a strong value for your customers. 
And so it was created outside the firewall by a device, by some application running with some customer, and so capturing and interpreting and governing that data is very different than taking derivative transactions from an ERP system, which are already adjudicated and understood, and governing that kind of a data structure. And so this is a need that's driven from many different perspectives, it's driven from the new architecture, the way IoT devices are connecting and just creating a data bomb, that's one thing. It's driven by business use cases, just saying what are the assets that I have access to, and how can I try to determine patterns between those assets where I didn't even create some of them, so how do I adjudicate that? >> Discovering and cataloging your data-- >> Discovering it, cataloging it, actually even... When I even think about data, just think the files on my laptop, that I created, and I don't remember what half of them are. So creating the metadata, creating that trail of bread crumbs that lets you piece together what's there, what's the relevance of it, and how, then, you might use it for some correlation. And then you get in, obviously, to the regulatory piece that says sure, if I'm a new customer and I ask to be forgotten, the only way that you can guarantee to forget me is to know where all of my data is. >> If you remember that they are your customer in the first place and you know where all that data is, if you're even aware that it exists, that's the first and foremost thing for an enterprise to be able to assess their degree of exposure to GDPR. >> So, right. It's like a whole new use case. It's a microcosm of all of these really big things that are going on. And so what we've been trying to do is really leverage our expertise in metadata management using the Apache Atlas project. >> Interviewer: You and IBM have done some major work-- >> We work with IBM and the community on Apache Atlas. 
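As a rough illustration of the metadata-tagging idea discussed here -- breadcrumbs that travel with every copy of a data asset, so that discovery and GDPR-style erasure become catalog lookups -- consider this toy sketch. It is illustrative only: Apache Atlas and Data Steward Studio model far richer metadata than this, and every asset name, store name and tag below is invented.

```python
class MetadataCatalog:
    """Toy single-pane catalog: every asset keeps tags and lineage
    breadcrumbs no matter which store holds it."""

    def __init__(self):
        self.assets = {}  # asset_id -> {"store", "tags", "lineage"}

    def register(self, asset_id, store, tags=(), derived_from=None):
        parent = self.assets.get(derived_from, {})
        self.assets[asset_id] = {
            "store": store,
            # Tags travel with every derived copy, so governance
            # survives a copy out to the cloud.
            "tags": set(tags) | set(parent.get("tags", ())),
            "lineage": list(parent.get("lineage", []))
                       + ([derived_from] if derived_from else []),
        }

    def find_by_tag(self, tag):
        return sorted(a for a, m in self.assets.items() if tag in m["tags"])

    def forget(self, tag):
        # GDPR-style erasure: you can only guarantee to forget someone
        # if the catalog knows every copy, in every store.
        doomed = self.find_by_tag(tag)
        for asset in doomed:
            del self.assets[asset]
        return doomed

catalog = MetadataCatalog()
catalog.register("orders_raw", "on-prem", tags={"pii:customer42"})
catalog.register("orders_copy", "cloud", derived_from="orders_raw")
print(catalog.forget("pii:customer42"))  # ['orders_copy', 'orders_raw']
```

The point of the sketch is the propagation step in `register`: because the tag follows the derived copy, a right-to-be-forgotten request can find both the original and the cloud copy through one lookup.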
You know, metadata tagging is not the most interesting topic for some people, but in the context that I just described, it's kind of important. And so I think one of the areas where we can really add value for the industry is leveraging our lowest common denominator, open source, open community kind of development to really create a standard infrastructure, a standard open infrastructure for metadata tagging, into which all of these use cases can now plug. Whether it's I want to discover data and create metadata about the data based on patterns that I see in the data, or I've inherited data and I want to ensure that the metadata stay with that data through its life cycle, so that I can guarantee the lineage of the data, and be compliant with GDPR-- >> And in fact, tomorrow we will have Mandy Chessell from IBM, a key Hortonworks partner, discussing the open metadata framework you're describing and what you're doing. >> And that was part of this morning's keynote close also. It all really flowed nicely together. Anyway, it is really a perfect storm. So what we've done is we've said, let's leverage this lowest common denominator, standard metadata tagging, Apache Atlas, and uplevel it, and not have it be part of a cluster, but actually have it be a cloud service that can be in force across multiple data stores, whether they're in the cloud or whether they're on prem. >> Interviewer: That's the Data Steward Studio? >> Well, Data Plane and Data Steward Studio really enable those things to come together. >> So the Data Steward Studio is the second service >> Like an app. >> under the Hortonworks DataPlane service. >> Yeah, so the whole idea is to be able to tie those things together, and when you think about it in today's hybrid world, and this is where I really started, where your data strategy is your cloud strategy, they can't be separate, because if they're separate, just think about what would happen. So I've copied a bunch of data out to the cloud. 
All memory of any lineage is gone. Or I've got to go set up manually another set of lineage that may not be the same as the lineage it came with. And so being able to provide that common service across footprint, whether it's multiple data centers, whether it's multiple clouds, or both, is a really huge value, because now you can sit back and through that single pane, see all of your data assets and understand how they interact. That obviously has the ability then to provide value like with Data Steward Studio, to discover assets, maybe to discover assets and discover duplicate assets, where, hey, I can save some money if I get rid of this cloud instance, 'cause it's over here already. Or to be compliant and say yeah, I've got these assets here, here, and here, I am now compelled to do whatever: delete, protect, encrypt. I can now go do that and keep a record through the metadata that I did it. >> Yes, in fact that is very much at the heart of compliance, you got to know what assets there are out there. And so it seems to me that Hortonworks is increasingly... the H-word rarely comes up these days. >> Scott: Not Hortonworks, you're talking about Hadoop. >> Hadoop rarely comes up these days. When the industry talks about you guys, it's known that's your core, that's your base, that's where HDP and so forth, great product, great distro. In fact, in your partnership with IBM, a year or more ago, I think it was IBM standardized on HDP in lieu of their distro, 'cause it's so well-established, so mature. But going forward, you guys in many ways, Hortonworks, you have positioned yourselves now. Wikibon sees you as being the premier solution provider of big data governance solutions specifically focused on multi-cloud, on structured data, and so forth. So the announcement today of the Data Steward Studio very much builds on that capability you already have there. So going forward, can you give us a sense to your roadmap in terms of building out DataPlane's service? 
'Cause this is the second of these services under the DataPlane umbrella. Give us a sense for how you'll continue to deepen your governance portfolio in DataPlane. >> Really the way to think about it, there are a couple of things that you touched on that I think are really critical, certainly for me, and for us at Hortonworks to continue to repeat, just to make sure the message got there. Number one, Hadoop is definitely at the core of what we've done, and was kind of the secret sauce. Some very different stuff in the technology, also the fact that it's open source and community, all those kinds of things. But that really created a foundation that allowed us to build the whole beginning of big data data management. And we added and expanded to the traditional Hadoop stack by adding Data in Motion. And so what we've done is-- >> Interviewer: NiFi, I believe, you made a major investment. >> Yeah, so we made a large investment in Apache NiFi, as well as Storm and Kafka as kind of a group of technologies. And the whole idea behind doing that was to expand our footprint so that we would enable our customers to manage their data through its entire lifecycle, from being created at the edge, all the way through streaming technologies, to landing, to analytics, and then even analytics being pushed back out to the edge. So it's really about having that common management infrastructure for the lifecycle of all the data, including Hadoop and many other things. And then in that, obviously as we discuss whether it be regulation, whether it be, frankly, future functionality, there's an opportunity to uplevel those services from an overall security and governance perspective. And just like Hadoop kind of upended traditional thinking... and what I mean by that was not the economics of it, specifically, but just the fact that you could land data without describing it. That seemed so unimportant at one time, and now it's like the key thing that drives the difference. 
Think about sensors that are sending in data that reconfigure firmware, and those streams change. Being able to acquire data and then assess the data is a big deal. So the same thing applies, then, to how we apply governance. I said this morning, traditional governance was hey, I started this employee, I have access to this file, this file, this file, and nothing else. I don't know what else is out there. I only have access to what my job title describes. And that's traditional data governance. In the new world, that doesn't work. Data scientists need access to all of the data. Now, that doesn't mean we need to give away PII. We can encrypt it, we can tokenize it, but we keep referential integrity. We keep the integrity of the original structures, and those who have a need to actually see the PII can get the token and see the PII. But it's governance thought inversely as it's been thought about for 30 years. >> It's so great you've worked governance into an increasingly streaming, real-time in motion data environment. Scott, this has been great. It's been great to have you on The Cube. You're an alum of The Cube. I think we've had you at least two or three times over the last few years. >> It feels like 35. Nah, it's pretty fun.. >> Yeah, you've been great. So we are here at Dataworks Summit in Berlin. (upbeat music)

Published Date : Apr 18 2018



Scott Gnau, Hortonworks | Big Data SV 2018


 

>> Narrator: Live from San Jose, it's the Cube. Presenting Big Data Silicon Valley. Brought to you by SiliconANGLE Media and its ecosystem partners. >> Welcome back to the Cube's continuing coverage of Big Data SV. >> This is our tenth Big Data event, our fifth year in San Jose. We are down the street from the Strata Data Conference. We invite you to come down and join us, come on down! We are at Forager Tasting Room & Eatery, super cool place. We've got a cocktail event tonight, and an endless briefing tomorrow morning. We are excited to welcome back to the Cube, Scott Gnau, the CTO of Hortonworks. Hey, Scott, welcome back. >> Thanks for having me, and I really love what you've done with the place. I think there's as much energy here as I've seen in the entire show. So, thanks for having me over. >> Yeah! >> We have done a pretty good thing to this place that we're renting for the day. So, thanks for stopping by and talking with George and I. So, February, Hortonworks announced some news about Hortonworks DataFlow. What was in that announcement? What does that do to help customers simplify data in motion? What industries is it going to be most impactful for? I'm thinking, you know, GDPR is a couple months away, kind of what's new there? >> Well, yeah, and there are a couple of topics in there, right? So, obviously, we're very committed to, which I think is one of our unique value propositions, is we're committed to really creating an easy to use data management platform, as it were, for the entire lifecycle of data, from when data is created at the edge and as data are streaming from one place to another place, and, at rest, analytics get run, analytics get pushed back out to the edge.
So, that entire lifecycle is really the footprint that we're looking at, and when you dig a level into that, obviously, the data in motion piece is hugely important. And so I think one of the things that we've looked at is we don't want to be just a streaming engine or just a tool for creating pipes and data flows and so on. We really want to create that entire experience around what needs to happen for data that's moving, whether it be acquisition at the edge in a protected way with provenance and encryption, whether it be applying streaming analytics as the data are flowing and everywhere kind of in between, and so that's what HDF represents, and what we released in our latest release, which, to your point, was just a few weeks ago, is a way for our customers to go build their data in motion applications using a very simple drag and drop GUI interface. So, they don't have to understand all of the different animals in the zoo, and the different technologies that are in play. It's like, "I want to do this." Okay, here's a GUI tool, you can have all of the different operators that are represented by the different underlying technologies that we provide as Hortonworks DataFlow, and you can string them together, and then, you can make those applications and test those applications. One of the biggest enhancements that we did is we made it very easy, once those things are built in a laptop environment or in a dev environment, for them to be published out to production or to be published out to other developers who might want to enhance them and so on. So, the idea is to make it consumable inside of an enterprise, and when you think about data in motion and IOT and all those use cases, it's not going to be one department, one organization, or one person that's doing it. It's going to be a team of people that are distributed just like the data and the sensors, and, so, being able to have that sharing capability is what we've enhanced in the experience.
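The workflow Scott describes, stringing prebuilt operators into a flow and then publishing the flow as a unit, can be sketched as plain function composition; the GUI canvas is doing something conceptually similar. The operators below are invented stand-ins for illustration, not actual HDF or NiFi processors.

```python
from functools import reduce

# Each "operator" is a small function, like one box on the flow canvas.
def acquire(records):
    return [r.strip() for r in records]        # e.g. read raw events from an edge source

def filter_empty(records):
    return [r for r in records if r]           # drop blank events

def enrich(records):
    return [{"event": r, "source": "edge-1"} for r in records]  # attach provenance metadata

def build_flow(*operators):
    """Chain operators into one deployable 'flow', so a pipeline built
    in a dev environment can be published and reused as a single unit."""
    return lambda data: reduce(lambda acc, op: op(acc), operators, data)

flow = build_flow(acquire, filter_empty, enrich)
out = flow(["  temp=70 ", "", "temp=71"])
print(out)  # the empty event is dropped, the rest are enriched
```

Publishing the composed `flow` object, rather than its pieces, is the sketch's analogue of sharing a tested flow with other developers.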
>> So, you were just saying, before we went live, that you're here having speed dates with customers. What are some of the things... >> It's a little bit more sincere than that, but yeah. >> (laughs) Isn't speed dating sincere? It's 2018, I'm not sure. (Scott laughs) What are some of the things that you're hearing from customers, and how is that helping to drive what's coming out from Hortonworks? >> So, the two things that I'm hearing, right, number one, certainly, is that they really appreciate our approach to the entire lifecycle of data, because customers are really experiencing huge data volume increases and data just from everywhere, and it's no longer just from the ERP system inside the firewall. It's from third party, it's from sensors, it's from mobile devices, and, so, they really do appreciate kind of the territory that we cover with the tools and technologies we bring to market, and, so, that's been very rewarding. Clearly, customers who are now well into this path, they're starting to think about, in this new world, data governance, and data governance, I just took all of the energy out of the room, governance, it sounds like, you know, hard. What I mean by data governance, really, is customers need to understand, with all of this diverse, connected data everywhere, in the cloud, on-prem, with sensors, third party, partners, is, frankly, they need a trail of breadcrumbs that say what is it, where'd it come from, who had access to it, and then, what did they do with it?
If you start to piece that together, that's what they really need to understand, the data estate that belongs to them, so they can turn that into refined product, and, so, when you then segue into one of your earlier questions, GDPR is, certainly, a triggering point where it's like, okay, the penalties are huge, oh my God, it's a whole new set of regulations that I have to comply with, and when you think about that trail of breadcrumbs that I just described, that actually becomes a roadmap for compliance under regulations like GDPR, where if a European customer calls up and says, "Forget my data," the only way that you can guarantee that you forgot that person's data is to actually understand where it all is, and that requires proper governance, tools, and techniques, and, so, when I say governance, it's, really, not like, you know, the governor and the government, and all that. That's an aspect, but the real, important part is how do I keep all of that connectivity so that I can understand the landscape of data that I've got access to, and I'm hearing a lot of energy around that, and when you think about an IOT kind of world, distributed processing, multiple hybrid cloud footprints, data is just everywhere, and, so, the perimeter is no longer fixed, it's kind of variable, and being able to keep track of that is a very important thing for our customers. >> So, continuing on that theme, Scott. Data lakes seem to be the first major new repository we added after we had data warehouses and data marts, and it looked like the governance solutions were sort of around that perimeter of the data lake. Tell us, you were alluding to, sort of, how many more repositories, whether at rest or in motion, there are for data. Do we have to solve the governance problem end-to-end before we can build meaningful applications?
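Scott's point that the breadcrumb trail "becomes a roadmap for compliance" can be made concrete: if every place a subject's data lands is registered as it moves, a right-to-erasure request reduces to a lookup plus deletes. A toy sketch, with store names and records invented for illustration:

```python
# Toy lineage registry recording every store a subject's data reached
# as it flowed through the pipeline (all names are hypothetical).
lineage = {
    "alice@example.com": ["warehouse.orders", "lake.raw_clicks", "cloud.archive"],
}

# The stores themselves, keyed by subject.
stores = {
    "warehouse.orders": {"alice@example.com": {"order_id": 1}},
    "lake.raw_clicks":  {"alice@example.com": {"page": "/pricing"}},
    "cloud.archive":    {"alice@example.com": {"year": 2017}},
    "lake.other":       {"bob@example.com": {"page": "/docs"}},
}

def forget(subject: str) -> int:
    """Honor a right-to-erasure request: delete the subject's data from
    every store the lineage trail says it reached. Without the trail,
    'forget me' cannot be guaranteed, because copies go unnoticed."""
    removed = 0
    for store_name in lineage.pop(subject, []):
        if subject in stores.get(store_name, {}):
            del stores[store_name][subject]
            removed += 1
    return removed

print(forget("alice@example.com"))  # deletes from all three registered locations
```

The registry itself is the hard part in practice; the sketch only shows why erasure is a lookup once it exists.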
>> So, I would argue personally, that governance is one of the most strategic things for us as an industry, collectively, to go solve in a universal way, and what I mean by that, is throughout my career, which is probably longer than I'd like to admit, in an EDW centric world, where things are somewhat easier in terms of the perimeter and where the data came from, data sources were much more controlled, typically ERP systems, owned wholly by a company. Even in that era, true data governance, meta data management, and that provenance was never really solved adequately. There were 300 different solutions, none of which really won. They were all different, non-compatible, and the problem was easier. In this new world, with connected data, the problem is infinitely more difficult to go solve, and, so, that same kind of approach of 300 different proprietary solutions I don't think is going to work. >> So, tell us, how does that approach have to change and who can make that change? >> So, one of the things, obviously, that we're driving is we're leveraging our position in the open community to try to use the community to create that common infrastructure, common set of APIs for meta data management, and, of course, we call that Apache Atlas, and we work with a lot of partners, some of whom are customers, some of whom are other vendors, even some of whom could be considered competitors, to try to drive an Apache open source kind of project to become that standard layer that's common into which vendors can bring their applications. 
So, now, if I have a common API for tracking meta data in that trail of breadcrumbs that's commonly understood, I can bring in an application that helps customers go develop the taxonomy of the rules that they want to implement, and, then, that helps visualize all of the other functionality, which is also extremely important, and that's where I think specialization comes into play, but having that common infrastructure, I think, is a really important thing, because that's going to enable data, data lakes, IOT to be trusted, and if it's not trusted, it's not going to be successful. >> Okay, there's a chicken and an egg there it sounds like, potentially. >> Am I the chicken or the egg? >> Well, you're the CTO. (Lisa laughs) >> Okay. >> The thing I was thinking of was, the broader the scope of trust that you're trying to achieve at first, the more difficult the problem, do you see customers wanting to pick off one high value application, not necessarily that's about managing what's in Atlas, in the meta data, so much as they want to do an IOT app and they'll implement some amount of governance to solve that app. In other words, which comes first? Do they have to do the end-to-end meta data management and governance, or do they pick a problem off first? >> In this case, I think it's chicken or egg. I mean, you could start from either point. I see customers who are implementing applications in the IOT space, and they're saying, "Hey, this requires a new way to think of governance, "so, I'm going to go and build that out, but I'm going to "think about it being pluggable into the next app." 
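The "common infrastructure" argument, one shared metadata layer that every application reads and writes through the same interface, can be sketched as a minimal tag store. This is a conceptual stand-in, not the real Apache Atlas API; the class, its methods, and the dataset names are invented.

```python
class MetadataStore:
    """A shared tag registry: any tool can tag a data set, and every
    other tool sees the same tags through the same interface."""

    def __init__(self):
        self._tags = {}  # dataset name -> set of classification tags

    def tag(self, dataset: str, *tags: str) -> None:
        self._tags.setdefault(dataset, set()).update(tags)

    def tags_of(self, dataset: str) -> set:
        return self._tags.get(dataset, set())

    def find(self, tag: str) -> list:
        """Let any application discover data by classification."""
        return sorted(d for d, tags in self._tags.items() if tag in tags)

atlas_like = MetadataStore()
atlas_like.tag("lake.customers", "PII", "finance")  # tagged once by an ingest tool
atlas_like.tag("lake.clicks", "clickstream")

# A separate governance or analytics tool queries the very same store,
# so the tags travel with the data regardless of which application reads it.
print(atlas_like.find("PII"))  # ['lake.customers']
```

The value Scott describes comes from the interface being common: swap either tool out and the tags survive, because they live in the shared layer rather than inside any one application.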
I also see a lot of customers, especially in highly regulated industries, and especially in highly regulated jurisdictions, who are stepping back and saying, "Forget the applications, this is a data opportunity, and, so, I want to go solve my data fabric, and I want to have some consistency across that data fabric into which I can publish data for specific applications and guarantee that, holistically, I am compliant and that I'm sitting inside of our corporate mission and all of those things." >> George: Okay. >> So, one of the things you mention, and we talk about this a lot, is the proliferation of data. It's so many, so many different sources, and companies have an opportunity, you had mentioned the phrase data opportunity, there is massive opportunity there, but you said, you know, from even a GDPR perspective alone, I can't remove the data if I don't know where it is, to the breadcrumbs point. As a marketer, we use terms like get a 360 degree view of your customer. Is that actually really something that customers can achieve leveraging data? Can they actually really get, say a retailer, a 360, a complete view of their customer? >> Alright, 358. >> That's pretty good! >> And we're getting there. (Lisa laughs) Yeah, I mean, obviously, the idea is to get a much broader view, and 360 is a marketing term. I'm not a marketing person, >> Yes. But it, certainly, creates a much broader view of highly personalized information that helps you interact with your customer better, and, yes, we're seeing customers do that today and have great success with it and actually change and build new business models based on that capability, for sure. The folks who've done that have realized that in this new world, the way that that works is you have to have a lot of people have access to a lot of data, and that's scary, because that's not the way it used to be, right? >> Right.
>> It used to be you go to the DBA and you ask for access, and then, your boss has to sign off and say it's what you asked for. In this world, you need to have access to all of it. So, when you think about this new governance capability where, as part of the governance integrated with security, personalized information can be encrypted, it can be blurred out, but you still have access to the data to look at the relationships to be found in the data to build out those sophisticated models. So, that's where not only is it a new opportunity for governance, just because of the sources, the variety, the different landscape, but it's, ultimately, very much required, because if you're the CSO, you're not going to give the marketing team access to all of its customer data unless you understand that, right, but it has to be, "I'm just giving it to you, and I know that it's automatically protected." versus, "I'm going to let you ask for it." to be successful. >> Right. >> I guess, following up on that, it sounds like what we were talking about, chicken or egg. Are you seeing an accelerating shift from where data is sort of collected, centrally, from applications, or, what we hear on Amazon, is the amount coming off the edge is accelerating? >> It is, and I think that that is a big drive to, frankly, faster cloud adoption, you know, the analytic space, particularly, has been a laggard in cloud adoption for many reasons, and we've talked about it previously, but one of the biggest reasons, obviously, is that data has gravity, data movement is expensive, and, so, now, when you think about where data is being created, where it lives, being further out on the edge, and may live its entire lifecycle in the cloud, you're seeing a reversal of gravity more towards cloud, and that, again, creates more opportunities in terms of driving a more varied perimeter and just keeping track of where all the assets are.
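The "automatically protected" access model Scott contrasts with asking the DBA can be sketched as masking applied at read time based on role. Everything here, the roles, the masking rule, and the record, is invented for illustration.

```python
MASK_ROLES = {"marketing"}  # hypothetical roles that get masked PII automatically

def mask_email(value: str) -> str:
    """Blur an email while keeping enough shape for analytics."""
    local, _, domain = value.partition("@")
    return local[0] + "***@" + domain

def read_record(record: dict, role: str) -> dict:
    """Grant access to everyone, but apply masking by role, so the
    security office never has to field one-off access requests."""
    if role in MASK_ROLES:
        record = dict(record)  # copy so the stored record stays intact
        record["email"] = mask_email(record["email"])
    return record

row = {"email": "alice@example.com", "ltv": 1200}
print(read_record(row, "marketing"))   # email is blurred for this role
print(read_record(row, "fraud_team"))  # full fidelity for approved roles
```

The policy decision lives in one place, so giving a whole team access no longer means exposing the raw PII to that team.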
Finally, I think it also leads to this notion of managing entire lifecycle of data. One of the implications of that is if data is not going to be centralized, it's going to live in different places, applications have to be portable to move to where the data exists. So, when I think about that landscape of creating ubiquitous data management within Hortonworks' portfolio, that's one of the big values that we can create for our customers. Not only can we be an on-ramp to their hybrid architecture, but as we become that on-ramp, we can also guarantee the portability of the applications that they've built out to those cloud footprints and, ultimately, even out to the edge. >> So, a quick question, then, to clarify on that, or drill down, would that mean you could see scenarios where Hortonworks is managing the distribution of models that do the inferencing on the edge, and you're collecting, bringing back the relevant data, however that's defined, to do the retraining of any models or recreation of new models. >> Absolutely, absolutely. That's one of the key things about the NiFi project in general and Hortonworks DataFlow, specifically, is the ability to selectively move data, and the selectivity can be based on analytic models as well. So, the easiest case to think about is self-driving cars. We all understand how that works, right? A self-driving car has cameras, and it's looking at things going on. It's making decisions, locally, based on models that have been delivered, and they have to be done locally, because of latency, right, but, selectively, hey, here's something that I saw as an image I didn't recognize. I need to send that up, so that it can be added to my lexicon of what images are and what action should be taken. So, of course, that's all very futuristic, but we understand how that works, but that has application in things that are very relevant today. Think about jet engines that have diagnostics running. 
Do I need to send that terabyte of data an hour over an expensive thing? No, but I have a model that runs locally that says, "Wow, this thing looks interesting. Let me send a gigabyte now for immediate action." So, that decision making capability is extremely important. >> Well, Scott, thanks so much for taking some time to come chat with us once again on the Cube. We appreciate your insights. >> Appreciate it, time flies. This is great. >> Doesn't it? When you're having fun! >> Yeah. >> Alright, we want to thank you for watching the Cube. I'm Lisa Martin with George Gilbert. We are live at Forager Tasting Room in downtown San Jose at our own event, Big Data SV. We'd love for you to come on down and join us today, tonight, and tomorrow. Stick around, we'll be right back with our next guest after a short break. (techno music)
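The jet-engine example, score locally and transmit only what the model flags, can be sketched as a simple edge filter. The threshold, the stand-in "model", and the telemetry values are invented; a real deployment would evaluate a trained model on the device.

```python
ANOMALY_THRESHOLD = 3.0  # hypothetical cutoff on the local model's score

def local_score(reading: dict) -> float:
    """Stand-in for a model evaluated on the device itself: here, just
    how far the vibration level strays from its expected value of 5.0."""
    return abs(reading["vibration"] - 5.0)

def filter_at_edge(readings):
    """Send upstream only what the local model flags, instead of
    shipping a terabyte an hour over an expensive link."""
    return [r for r in readings if local_score(r) > ANOMALY_THRESHOLD]

telemetry = [
    {"engine": "e1", "vibration": 5.2},  # normal, stays local
    {"engine": "e1", "vibration": 9.7},  # anomalous, sent upstream
    {"engine": "e2", "vibration": 4.9},  # normal, stays local
]
print(filter_at_edge(telemetry))  # only the anomalous reading is transmitted
```

The same shape covers the self-driving-car case Scott mentions: the unrecognized image is the reading whose score clears the threshold, so only it gets sent up to improve the shared model.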

Published Date : Mar 7 2018



Scott Gnau, Hortonworks & Tendü Yogurtçu, Syncsort - DataWorks Summit 2017


 

>> Man's Voiceover: Live, from San Jose, in the heart of Silicon Valley, it's theCUBE, covering DataWorks Summit 2017, brought to you by Hortonworks. (upbeat music) >> Welcome back to theCUBE, we are live at Day One of the DataWorks Summit, we've had a great day here, I'm surprised that we still have our voices left. I'm Lisa Martin, with my co-host George Gilbert. We have been talking with great innovators today across this great community, folks from Hortonworks, of course, IBM, partners, now I'd like to welcome back to theCUBE, who was here this morning in the green shoes, the CTO of Hortonworks, Scott Gnau, welcome back Scott! >> Great to be here yet again. >> Yet again! And we have another CTO, we've got CTO corner over here, with CUBE Alumni and the CTO of Syncsort, Tendu Yogurtcu. Welcome back to theCUBE, both of you. >> Pleasure to be here, thank you. >> So, guys, what's new with the partnership? I know that at Syncsort, you have 87%, or 87, of the Fortune 100 companies as customers. Scott, 60 of the Fortune 100 companies are customers of Hortonworks. Talk to us about the partnership that you have with Syncsort, what's new, what's going on there? >> You know there's always something new in our partnership. We launched our partnership, what, a year and a half ago or so? >> Yes. And it was really built on the foundation of helping our customers get time to value very quickly, right, and leveraging our mutual strengths. And we've been back on theCUBE a couple of times and we continue to have new things to talk about whether it be new customer successes or new feature functionalities or new integration of our technology. And so it's not just something that's static and sitting still, but it's a partnership that had a great foundation in value and continues to grow.
And, ya know, with some of the latest moves that I'm sure Tendu will bring us up to speed on that Syncsort has made, customers who have jumped on the bandwagon with us together are able to get much more benefit than originally they even intended. >> Let me talk about some of the things actually happening with Syncsort and with the partnership. Thank you Scott. And the Trillium acquisition has been transformative for us really. We have achieved quite a lot within the last six months. Delivering joint solutions between our data integration, DMX-h, and Trillium data quality and profiling portfolio, and that was kind of our first step, very much focused on the data governance. We are going to have data quality for Data Lake product available later this year and this week actually we will be announcing our partnership with Collibra data governance platform, basically making business rules and technical meta data available through the Collibra dashboards for data scientists. And in terms of our joint solution and joint offering for data warehouse optimization and the bundle that we launched early February of this year, that's in production, a large complex production deployment's already happened. Our customers access all their data, all enterprise data including legacy data, warehouse, new data sources as well as legacy mainframe in the data lake, so we will be announcing again in a week or so change data capture capabilities from legacy data storage into Hadoop, keeping that data fresh and giving more choices to our customers in terms of populating the data lake as well as use cases like archiving data into cloud.
(laughter) >> So help us visualize a scenario where you have maybe DMX-h bringing data in, you might have change data capture coming from a live database >> Tendu Voiceover: Yes. >> and you've got the data quality at work as well. Help us picture how much faster and higher fidelity the data flow might be relative to >> Sure, absolutely. So, our bundle and our joint solution with Hortonworks really focuses on business use cases. And one of those use cases is enterprise data warehouse optimization where we make all data, all enterprise data accessible in the data lake. Now, if you are an insurance company managing claims or you are building a data as a service, Hadoop as a service architecture, there are multiple ways that you can keep that data fresh in the data lake. And you can have change data capture by basically taking snapshots of the data and comparing in the data lake, which is a viable method of doing it. But, as the data volumes are growing and the real time analytics requirements of the business are growing, we recognize our customers are also looking for alternative ways that they can actually capture the change in real time when the change is just like less than 10% of the data, original data set, and keep the data fresh in the data lake. So that enables faster analytics, real time analytics, as well as in the case that if you are doing something from on-premise to the cloud or archiving data, it also saves on the resources like the network bandwidth and overall resource efficiency. Now, while we are doing this, obviously we are accessing the data and the data goes through our processing engines. What Trillium brings to the table is the unmatched capabilities around profiling that data, getting better understanding of that data.
So we will be focused on delivering products around that because as we understand data we can also help our customers to create the business rules, to cleanse that data, and preserve the fidelity of the data and integrity of the data. >> So, with the change data capture it sounds like near real time, you're capturing changes in near real time, could that serve as a streaming solution that then is also populating the history as well? >> Absolutely. We can go through streaming or message queues. We also offer more efficient proprietary ways of streaming the data to Hadoop. >> So the, I assume the message queues refers to, probably Kafka and then your own optimized solution for sort of maximum performance, lowest latency. >> Yes, we can do either true Kafka queues, which is very efficient as well. We can also go through proprietary methods. >> So, Scott, help us understand now the governance capabilities that, um I'm having a senior moment (laughter) I'm getting too many of these! (laughter) Help us understand the governance capabilities that Syncsort's adding to the, sort of, mix with the data warehouse optimization package and how it relates to what you're doing. >> Yeah, right. So what we talked about even again this morning, right, the whole notion of the value of open squared, right, open source and open ecosystem. And I think this is clearly an open ecosystem kind of play. So we've done a lot of work since we initially launched the partnership and through the different product releases where our engineering teams and the Syncsort teams have done some very good low-level integration of our mutual technologies so that the Syncsort tool can exploit those horizontal core services like Yarn for multi-tenancy and workload management and of course Atlas for data governance. So as the Syncsort team adds feature functionality on the outside of that tool, that simply accretes to the benefit of what we've built together.
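Tendu distinguishes snapshot comparison from capturing changes in real time. The snapshot-diff approach she calls viable can be sketched directly, and the sketch also shows why it strains as volumes grow, since both full snapshots must be held and scanned. The keys and status values below are invented.

```python
def snapshot_diff(old: dict, new: dict):
    """Derive inserts, updates, and deletes by comparing two full
    snapshots keyed by primary key. Log-based CDC avoids this full
    scan by reading only the source's stream of changes."""
    inserts = {k: v for k, v in new.items() if k not in old}
    deletes = {k: v for k, v in old.items() if k not in new}
    updates = {k: v for k, v in new.items() if k in old and old[k] != v}
    return inserts, updates, deletes

# Yesterday's and today's snapshots of a claims table (toy data).
yesterday = {1: "open", 2: "closed", 3: "open"}
today     = {1: "open", 2: "paid",   4: "open"}

ins, upd, dele = snapshot_diff(yesterday, today)
print(ins, upd, dele)  # {4: 'open'} {2: 'paid'} {3: 'open'}
```

When the real change is under 10% of the data set, as in her example, a change stream delivers just that fraction, while the diff still touches every row of both snapshots.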
And so that's why I say customers who started down this journey with us together are now going to get the benefit of additional options from that ecosystem that they can plug in additional feature functionality. And at the same time we're really thrilled because, and we've talked about this many times, right, the whole notion of governance and meta data management in the big data space is a big deal. And so the fact that we're able to come to the table with an open source solution to create common meta data tagging that then gets utilized by multiple different applications I think creates extreme value for the industry and frankly for our customers because now, regardless of the application they choose, or the applications that they choose, they can at least have that common trusted infrastructure where all of that information is tagged and it stays with the data through the data's life cycle. >> So your partnership sounds very, very symbiotic, that there's changes made on one side that reflect the other. Give us an example of your common customer, and this might not be, well, they're all over the place, who has got an enterprise data warehouse, are you finding more customers that are looking to modernize this? That have multi-cloud, core edge, IOT devices, that's a pretty distributed environment, versus customers that might be still more on prem? What's kind of the mix there? >> Can I start and then I will let you build on. I want to add something to what Scott said earlier. Atlas is a very important integration point for us and in terms of the partnership that you mentioned, the relation, I think one of the strengths of our partnership is at many different levels, it's not just executive level, it's cross functional and also from very close field teams, marketing teams and engineering field teams working together. And in terms of our customers, it's really that organizations are trying to move toward modern data architecture.
And as they are trying to build the modern data architecture, there is the data in motion piece, which I will let Scott talk about, the data at rest piece, and as we have so much data coming from cloud, originating through mobile and web in the enterprise, especially the Fortune 500 that we talk to, the Fortune 100 we talked about, insurance, health care, telco, financial services and banking have a lot of legacy data stores. So our, really, joint solution and the couple of first use cases, business use cases we targeted were around that. How do we enable these data stores and data in the modern data architecture? I will let Scott
You're right, it's a very symbiotic relationship. I think there's only upside, because we really do complement each other, and there is a distinct value proposition not just for our existing customers but frankly for a large set of customers out there that have, kind of, the data locked away. >> So, how do you see the data warehouse optimization solution set continuing to expand its functional footprint? What are some things that keep pushing out the edge conditions, the realm of possibilities? >> Some of the areas that we are jointly focused on: as we are liberating that data from the enterprise data warehouse or legacy architectures through Syncsort DMX-h, we actually understand the path that data traveled, and that metadata is something we can now integrate into Atlas, publish into Atlas, and have Atlas as the open data governance solution. So that's an area where we definitely see an opportunity to grow and also strengthen the joint solution. >> Sure, I mean, extended provenance is kind of what you're describing, and that's a big deal when you think about some of these legacy systems, where frankly 90% of the cost of implementing them originally was actually building out those business rules and that metadata. And so being able to preserve that and bring it over into a common, open platform is a really big deal. I'd say inside of the platform, of course, we continue to create new performance advantages in, you know, the latest releases of Hive, as an example, where we can get low-latency query response times. There's a whole new class of workloads that is now appropriate to move into this platform, and you'll see us continue to move along those lines as we advance the technology from the open community.
I loved how you said "liberating data," so that companies can really unlock its transformational value. We want to thank both of you, Scott for coming back on theCUBE >> Thank you. twice in one day. >> Twice in one day. Tendu, thank you as well >> Thank you. for coming back to theCUBE. >> Always a pleasure. For both of our CTOs who have joined us from Hortonworks and Syncsort, and my co-host George Gilbert, I am Lisa Martin. You've been watching theCUBE live from day one of the DataWorks Summit. Stick around, we've got great guests coming up. (upbeat music)

Published Date : Jun 13 2017


Scott Gnau, Hortonworks - DataWorks Summit 2017


 

>> Announcer: Live, from San Jose, in the heart of Silicon Valley, it's The Cube, covering DataWorks Summit 2017. Brought to you by Hortonworks. >> Welcome back to The Cube. We are live at DataWorks Summit 2017. I'm Lisa Martin with my cohost, George Gilbert. We've just come from this energetic, laser-light-show-infused keynote, and we're very excited to be joined by one of the keynote speakers today, the CTO of Hortonworks, Scott Gnau. Scott, welcome back to The Cube. >> Great to be here, thanks for having me. >> Great to have you back here. One of the things that you talked about in your keynote today was collaboration. You talked about the modern data architecture, and one of the things that I thought was really interesting is that where Hortonworks is now, you are empowering cross-functional teams: operations managers, business analysts, data scientists, really helping enterprises drive the next generation of value creation. Tell us a little bit about that. >> Right, great. Thanks for noticing, by the way. I think the important thing, kind of as a natural evolution for us as a company and as a community, and I've seen this time and again in the tech industry, is that we've kind of moved from really cool breakthrough tech more into a solutions base. So I think this whole notion is really about how we're making that natural transition. And when you think about all the cool technology and all the breakthrough algorithms and all that, that's really great, but how do we then take that and turn it into value really quickly and in a repeatable fashion? So, the notion that I launched today is really about making these three personas successful, by focusing on combining all of the technology, usability and even some services around it, to make each of those folks more successful in their job. So I've broken it down really into three categories. We know the traditional business analyst, right?
They've got SQL, and they've been doing predictive modeling of structured data for a very long time, and there's a lot of value generated from that. Making the business analyst successful in a Hadoop-inspired world is extremely valuable. And why is that? Well, it's because Hadoop actually now brings a lot more breadth of data and frankly a lot more depth of data than they've ever had access to before. But being able to communicate with that business analyst in a language they understand, SQL, being able to make all those tools work seamlessly, is the next extension of success for the business analyst. We spent a lot of time this morning talking about data scientists, the next great frontier, where you bring together lots and lots of data, for instance scan- and math-heavy compute, with the data scientists, and really enable them to go build out that next generation of high-definition analytics, all right. And we're all, certainly I am, captured by the notion of self-driving cars, and you think about a self-driving car, and the success of that is purely based on successful data science: those cameras and those machines being able to infer images more accurately than a human being, and then make decisions about what those images mean. That's all data science, and it's all about raw processing power and lots and lots of data to make those models train and be more accurate than what would otherwise happen. So enabling the data scientist to be successful, obviously, that's a use case. You know, certainly voice-activated, voice-response kinds of systems for better customer service; better fraud detection, you know, the cost of a false positive is a hundred times the cost of missing a fraudulent behavior, right? That's because you've irritated a really good customer. So being able to really train those models in high definition is extremely valuable.
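The 100:1 cost ratio Scott quotes for fraud detection has a direct consequence for how aggressively a model should flag transactions. A back-of-envelope sketch, with invented illustrative numbers (only the 100:1 ratio comes from the interview), shows why a too-eager threshold is expensive:

```python
# Expected-cost model for the "false positive costs 100x a miss" quote.
# All transaction data below is made up for illustration.
COST_MISSED_FRAUD = 1.0       # relative units
COST_FALSE_POSITIVE = 100.0   # "a hundred times," per the quote

def expected_cost(threshold, scored_txns):
    """scored_txns: list of (fraud_probability_estimate, is_actually_fraud)."""
    cost = 0.0
    for p, is_fraud in scored_txns:
        flagged = p >= threshold
        if flagged and not is_fraud:
            cost += COST_FALSE_POSITIVE   # irritated a good customer
        elif not flagged and is_fraud:
            cost += COST_MISSED_FRAUD     # fraud slipped through
    return cost

sample = [(0.95, True), (0.90, False), (0.40, True), (0.10, False)]

# A timid threshold misses fraud cheaply; an aggressive one pays 100x
# for every good customer it annoys.
print(expected_cost(0.99, sample))  # both frauds missed: 2.0
print(expected_cost(0.50, sample))  # one false positive + one miss: 101.0
```

This is exactly why "training those models in high definition" matters: only a sharper model, not a different threshold, can reduce both error types at once.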
So bringing together the data, but also the tool set, so that data scientists can actually act as a team and collaborate, and spend less of their time finding the data and more of their time providing the models. And I said this morning, last but not least, the operations manager. This is really, really important. A lot of times, especially for geeks like myself, it's, ah, the operations guys are just a pain in the neck. No: really, really important. We've got data that we've never thought of. Making sure that it's secured properly, making sure that we're managing it within the regulations of privacy requirements, making sure that we're governing it and governing how that data is used alongside our corporate mission is really important. So creating that tool set so that the operations manager can be confident in turning these massive files of data over to the business analyst and to the data scientist, and be confident that the company's mission and the regulations that they're working within in those jurisdictions are all in compliance. And so that's what we're building on, and that stack, of course, is built on open source Apache Atlas and open source Apache Ranger, and it really makes for an enterprise-grade experience. >> And a couple things to follow on to that: we've heard of this notion for years that there is a shortage of data scientists, and now it's such a core strategic enabler of business transformation. Is this collaboration, this team support that was talked about earlier, helping to spread data science across these personas, to enable more of them to be data scientists? >> Yeah, I think there are two aspects to it, right? One is certainly that really great data scientists are hard to find; they're scarce. They're unique creatures. And so, to the extent that we're able to combine the tool set to make the data scientists that we have more productive, I think the numbers are astronomical, right?
You could argue that, with the wrong tool set, a data scientist might spend 80% or 90% of his or her time just finding the data and only 10% working on the problem. If we can flip that around and make it 10% finding the data and 90% working on the problem, that's an order of magnitude more data science coverage that we get from the same pool of data scientists, so I think that from an efficiency perspective, that's really huge. The second thing, though, is that by looking at these personas and the tools that we're rolling out, can we start to package up things that the data scientists are learning and move those models onto the business analyst's desktop? So, now, not only is there more breadth and depth of data, but frankly, there's more depth and breadth of models that can be run and combined with traditional business process, which means turning that into better decision making, turning that into better value for the business, just kind of happens automatically. So, you're leveraging the value of data scientists. >> Let me follow that up, Scott. So, right now the biggest time sink for the data scientist or the data engineer is data cleansing and transformation. Where do the cloud vendors fit in, in terms of having trained some very broad horizontal models for vision, natural language understanding, text to speech, where they have accumulated a lot of data assets and then created models that were trained and could be customized? Do you see a role not just for those models coming from the cloud vendors, but for other vendors who have data assets to provide more fully baked models, so that you don't have to start from scratch? >> Absolutely. So, one of the things that I talked about also this morning is this notion of open: open community, open source, and open ecosystem. I think it's now open to the third power, right, and it's talking about open models and algorithms.
And I think all of those same things are really creating a tremendous opportunity, the likes of which we've not seen before, and I think it's really driving the velocity in the market, right. Because we're collaborating in the open, things just get done faster and more efficiently, whether it be in the core open source stuff or whether it be in the open ecosystem, being able to pull tools in. Of course, there was the announcement earlier today of IBM's Data Science Experience software as a framework for the data scientists to work as a team, but that thing in and of itself is also very open. You can plug in Python, you can plug in open source models and libraries, some of which were developed in the cloud and published externally. So, it's all about continued availability of open collaboration, and that is the hallmark of this wave of technology. >> Okay, so we have this issue of how much we can improve productivity with better tools or with some amount of data. But then, the part that everyone also points out, besides the cloud experience, is the ability to operationalize the models and get them into production, either in bespoke apps or packaged apps. How's that going to sort of play out over time? >> Well, I think two things you'll see. One, certainly in the near term, again, with our collaboration with IBM and the Data Science Experience: one of the key things there is not just making the data scientists able to be more collaborative, but also the ease with which they can publish their models out into the wild. And so, kind of closing that loop to action is really important. I think, longer term, what you're going to see, and I gave a hint of this a little bit in my keynote this morning, is, I believe in five years, we'll be talking about scalability, but scalability won't be the way we think of it today, right? Oh, I have this many petabytes under management. That's old hat.
But truly, scalability is going to be how many connected devices you have interacting, and how many analytics you can actually push, from a model perspective, out to the sensor or out to the device to run locally. Why is that important? Think about it as a consumer with a mobile device. The time of interaction, your attention span, do you get an offer at the right time, and is that offer relevant? It can't be rules-based; it has to be model-based. There's no time for the electrons to move from your device across a power grid, run an analytic and have it come back. It's going to happen locally. So scalability, I believe, is going to be determined in terms of the CPU cycles and the total interconnected IoT network that you're working in. What does that mean for your original question? That means applications have to be portable, models have to be portable, so that they can execute out at the edge where it's required. And so that's, obviously, part of the key technology that we're working on with Hortonworks DataFlow and the combination of Apache NiFi, Apache Kafka and Storm, to really combine that: "How do I manage not only data in motion, but ultimately, how do I move applications and analytics to the data, and not be required to move the data to the analytics?" >> So, question for you. You talked about real-time offers, for example. We talk a lot about predictive analytics, advanced analytics, data wrangling. What are your thoughts on preemptive analytics? >> Well, I think that, while that sounds a little bit spooky, because we're kind of mind reading, I think those things can start to exist. Certainly, because we now have access to all of the data and we have very sophisticated data science models that allow us to understand and predict behavior, the timing of real-time analytics or real-time offer delivery could actually, from our human perception, arrive before I thought about it. And isn't that really cool, in a way?
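The "move the model to the data" idea Scott describes, where a centrally trained model is shipped to the device and scored locally with no round trip, can be sketched as follows. This is an illustrative toy, not Hortonworks DataFlow or NiFi: real deployments ship trained models over those pipelines, while here the "model" is just a few linear coefficients serialized as JSON.

```python
import json

# "Central" side: pretend these weights came from training a model.
model = {"weights": [0.8, -0.5], "bias": 0.1, "threshold": 0.0}
payload = json.dumps(model)          # this small payload travels to the edge

# "Edge" side: deserialize once, then score each sensor reading locally,
# with no network hop between reading and decision.
def make_scorer(payload):
    m = json.loads(payload)
    def score(features):
        z = m["bias"] + sum(w * x for w, x in zip(m["weights"], features))
        return z > m["threshold"]    # e.g. "make the offer" / "flag the part"
    return score

score = make_scorer(payload)
print(score([1.0, 0.2]))   # 0.1 + 0.8 - 0.1 = 0.8 > 0 -> True
print(score([0.0, 1.0]))   # 0.1 - 0.5 = -0.4     -> False
```

The design point is the asymmetry: the payload that moves is a few bytes of model, while the sensor stream that would otherwise have to cross the network stays on the device.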
I'm thinking about, I need to go do X, Y, Z. Here's a relevant offer, boom. So it's no longer, I clicked here, I clicked here, I clicked here, and in five seconds I get a relevant offer; before I even thought to click, I got a relevant offer. And again, to the extent that it's relevant, it's not spooky. >> Right. >> If it's irrelevant, then you deal with all of the other downstream impact. So that, again, points to more and more data, and more and more accurate and sophisticated models, to make sure that that relevance exists. >> Exactly. Well, Scott Gnau, CTO of Hortonworks, thank you so much for stopping by The Cube once again. We appreciate your conversation and insights. And for George Gilbert, I am Lisa Martin. You're watching The Cube live, from day one of the DataWorks Summit in the heart of Silicon Valley. Stick around, though, we'll be right back.

Published Date : Jun 13 2017


Scott Gnau | DataWorks Summit Europe 2017


 

>> Announcer: Live from Munich, Germany, it's theCUBE. Covering Dataworks Summit Europe 2017. Brought to you by Hortonworks. (soft technological music) >> Okay, welcome back everyone. We're here in Munich, Germany for Dataworks Summit 2017, formerly Hadoop Summit, powered by Hortonworks. It's their event, but now called Dataworks because data is at the center of the value proposition: Hadoop plus AI, real-time data and storage. I'm John, my cohost David. Our next guest is Scott Gnau, the CTO of Hortonworks, joining us again from the keynote stage. Good to see you again. >> Thanks for having me back, great to be here. >> Good having you back. Let's get down and dirty and get technical. I'm super excited about the conversations that are happening in the industry right now, for a variety of reasons. One is you can't get more excited about what's happening in the data business. Machine learning and AI have really brought up the hype, and made it human: people can visualize AI, see the self-driving cars, and understand how software's powering all this. But it's still data-driven, and Hadoop is extending into data science, that natural extension, and Cloudera has filed their S-1 to go public. So it brings back the conversations about this open source community that's been doing all this work in the big data industry, originally riding in on the horse of Hadoop. You guys have an update to your Hadoop data platform, which we'll get to in a second, but I want to ask you about the stories around Hadoop. I'd say Hadoop was the first horse that everyone rode in on in the big data industry... When I say big data, I mean DevOps, cloud, the whole open source ethos. But it's evolving, not being replaced. So I want you to clarify your position on this, because we were just talking about some of the false premises: a lot of stories are being written about the demise of Hadoop. Long live Hadoop. >> Yeah, well, how long do we have?
(laughing) I think you hit it first. We're at Dataworks Summit 2017, and we rebranded; it was previously Hadoop Summit. We rebranded it to really recognize that there's this bigger thing going on, and it's not just Hadoop. Hadoop is a big contributor, a big driver, a very important part of the ecosystem, but it's more than that. It's really about being able to manage and deliver analytic content on all data, across that data's lifecycle: from when it gets created at the edge, to it moving through networks, to it being landed and stored in a cluster, to analytics being run and decisions going back out. It's that entire lifecycle, and you mentioned some of the megatrends; I talked about this in the opening keynote this morning. With AI and streaming and IoT, all of these things kind of converging are creating a much larger problem set and, frankly, an opportunity for us as an industry to go solve. So that's the context that we're really looking-- >> And there's real demand there. This is not like, I mean, there's certainly a hype factor on AI, but IoT is real. You have data now, not just as a back office concept; you have a front-facing, business-centric... I mean, there's real customer demand here. >> There's real customer demand, and it really creates the ability to dramatically change a business. A simple example that I used onstage this morning is to think about the electric utility business. I live in Southern California; by the way, I studied to be an electrical engineer. 25, 30 years ago, that business, not entirely simple, was about building a big power plant and distributing electrons out to all the consumers of electrons. One direction. And optimization of that grid, network and business was very hard, and there were billions of dollars at stake. Fast forward to today: you've still got those generating plants online, but you've also got folks like me generating their own power and putting it back into the grid. So now you've got bidirectional electrons.
The optimization is totally different. How do you figure out how most effectively to create capacity and distribute that capacity? Because created capacity that's not consumed is 100% spoiled. So it's a huge data problem, but it's a huge data problem meeting IoT, right? Smart meter devices out at the edge creating data, doing it in real time. A cloud blew over, my generating capacity on my roof went down, so I've got to pull from the grid. Combining all of that data to make real-time decisions, we're talking hundreds of billions of dollars, and it's being done today in an industry that's not a high-tech Silicon Valley kind of industry; electric utilities are taking advantage of this technology today. >> So, we were talking off-camera about some commentary that Hadoop has failed, and obviously you take exception to that. And you also made the point that it's not just about Hadoop, but in a way it is, because Hadoop was the catalyst of all this open source work. Why has Hadoop not failed, in your view? >> Well, because we have customers, and you know, the great thing about conferences like this is we're actually able to get a lot of folks to come in and talk about what they're doing with the technology, how they're driving business benefit, and share that business benefit with their colleagues. So we see that business benefit coming along. In any hype cycle, people can go down a path; maybe they had false expectations early on. Six years ago we were talking about, hey, is open source Hadoop going to come along and replace the EDW? Complete fallacy, right? What I talked about, that opportunity, being able to store all kinds of disparate data, being able to manage and maneuver analytics in real time, that value proposition is very different than some of the legacy technology.
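The bidirectional-grid arithmetic in Scott's utility example is easy to make concrete. A toy sketch, with invented meter readings: each smart meter reports consumption (positive) or rooftop generation (negative), and the utility sizes real-time dispatch from the net, not from consumption alone.

```python
# Toy version of the bidirectional-grid math: readings are invented.
# Positive kW = consuming from the grid; negative kW = solar pushing back.
readings = [
    {"meter": "A", "kw": 3.0},    # consuming
    {"meter": "B", "kw": -1.5},   # rooftop solar feeding the grid
    {"meter": "C", "kw": 2.5},
    {"meter": "D", "kw": -0.5},
]

consumed = sum(r["kw"] for r in readings if r["kw"] > 0)
generated = -sum(r["kw"] for r in readings if r["kw"] < 0)
net_demand = consumed - generated   # what the plants must actually supply

print(consumed, generated, net_demand)  # 5.5 2.0 3.5
```

In the one-directional world only `consumed` existed; with distributed generation, dispatching against gross consumption would overproduce by `generated`, and since unconsumed capacity is "100% spoiled," the net figure is the one that has to be computed continuously from the edge data.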
So if you view it as, hey, this thing is going to replace that thing, okay, maybe not. But the point is, it's very successful for what it was designed for-- >> Just to clarify what you just said there: you guys never took that position. Cloudera did, with Impala; that was their initial positioning. Would you give me that, or do you not agree? >> Publicly they would say, oh, it's not a replacement, but you're right, I mean, the actions were maybe designed to do that. >> And set in the marketplace that that might be one of the outcomes. >> Yeah, but they pivoted quickly when they realized that was a failed strategy. But that became a premise that people locked in on. >> If that becomes your yardstick for measuring, then, so-- >> But wouldn't you agree that Hadoop in many respects was designed to solve some of the problems that the EDW never could? >> Exactly. So, you know, again, when you think about the variety of data, when you think about the analytic content, doing time series analysis is very hard to do in a relational model, so it's a new tool in the workbench to go solve analytic problems. And when you look at it from that perspective, and I use the utility example, the manufacturing example, financial, consumer finance, telco, all of these companies are using this technology, leveraging this technology, to solve problems they couldn't solve, or, frankly, to build new businesses that they couldn't build before, because they didn't have access to that real time-- >> And so money did shift from pouring money into the EDW with limited returns, because you were at the flat part of the S-curve, to, hey, let's put it over here in this so-called big data thing, and that's why the market, I think, was conditioned to sort of come to that simple conclusion. But dollars, the spending, did shift, did it not?
>> Yeah, I mean, if you subscribe to that herd mentality, the net new expenditure on the new technology is always going to outpace the growth of the existing, plateaued technologies. That's just math. >> The growth, yes, but not the size, not the absolute dollars. And so you have a lot of companies right now struggling in the traditional legacy space, and you've got this rocket ship going in-- >> And again, I think if you think about the converging forces that are out there, in addition to IoT and streaming, frankly, Hadoop is an enabler of AI. When you think about the success of AI and machine learning, it's about having massive, massive amounts of data, right? And I think back 25 years ago, my first data mart was 30 gigabytes, and we thought that was all the data in the world. Now it fits on your phone. So when you think about just having the utter capacity and the ability to actually process that capacity of data, these are technology breakthroughs that have been driven in the core open source Hadoop community. When combined with the ability to execute in clouds and ephemeral kinds of workloads, you put all that stuff together, and now, instead of going to the capital committee for 20 million dollars for a bunch of hardware to do an exabyte kind of study where you may not get an answer that means anything, you can spin that up in the cloud and, for a couple of thousand dollars, get the answer. Take that answer and go build a new system of insight that's going to drive your business. And this is a whole new area of opportunity driven by the convergence of all that. >> So I agree. I mean, it's absurd to say Hadoop and big data have failed; it's crazy.
Okay, but despite the growth, what I call profitless prosperity, can the industry fund itself? I mean, you've got to make big bets: YARN, Tez, different clouds. How does the industry turn into one that is profitable and growing? >> Well, I mean, obviously it creates new business models and new ways of monetizing and deploying software. You know, one of the things that is core to our belief system is that really leveraging, working with and nurturing the community is going to be a key success factor for our business. Nurturing that innovation and collaboration across the community, to keep up with the rate and pace of change, is one of the aspects of being relevant as a business, and then obviously creating a great service experience for our customers, so that they know they can depend on enterprise-class support, enterprise-class security and governance, and operational management, in the cloud and on-prem. Creating that value proposition, along with the advanced and accelerated delivery of innovation, is where I think we intersect uniquely in the industry. >> And one of the things that I think people point out, and I have this conversation all the time with people who try to squint through the Wall Street implications of the value proposition of the industry, and I want to get your thoughts on this, is that open source in this era we're living in brings so much value outside of just Hortonworks, your company. Dave made a comment in the intro package we were doing that the practitioners are getting a lot of value, people out in the field. So these are the white spaces of value, and they're actually transformative. Can you give some examples where things are getting done that are of real value, use cases that you guys can highlight? I think that's the unwritten story that no one thought about, that rising tide floating all boats happening?
>> Yeah, yes. I mean, some of those use cases, again, really involve integrating legacy, traditional transactional information, right, very valuable information about a company, its operations, its customers, its products, and all this kind of thing, with the ability to do real-time sensor management, and ultimately having a technology stack that enables the connection of all of those sources of data for an analytic. And that's an important differentiation. You know, for the first 25 years of my career, it was all about, let's pull all this data into one place, and then let's do something with it, and then we can push analytics back out. Not an entirely bad model, but a model that breaks in the world of IoT-connected devices: there frankly isn't enough money to spend on bandwidth to make that happen, and as fast as the speed of light is, it creates latency, so those decisions aren't going to be able to be made in time. So we're seeing, even in the traditional industries, I mentioned the utility business, think about manufacturing, oil and gas, right, sensors everywhere, companies being able to take advantage not of collecting all the sensor data centrally, but of actually creating analytics based on sensor data and putting those analytics out at the sensors, to make real-time decisions that can affect hundreds of millions of dollars of production or equipment. Those are the use cases that we're seeing deployed today, and that's complete white space that was unavailable before.
>> Yeah, it's about real time, real-time flexibility, and choice, you know, motherhood and apple pie. >> And the major highlights of that are? >> So the upgrades are really inside of Hive. We now have operational analytic query capabilities, where you get tactical response times, second, sub-second kind of response times. >> You know, Hadoop and Hive weren't previously known for that kind of tactical response. >> We've been able to now add, inside of that technology, the ability to do that workload. We have customers who are building these white space applications, who have hundreds or thousands of users or applications that depend on consistency of very quick analytic response time. We now deliver that inside the platform. What's really cool about it, in addition to the fact that it works, is that we did it inside of Hive. So we didn't create yet another project, or yet another thing that a customer has to integrate with or rewrite their application for. Any Hive-based application can now take advantage of this performance enhancement, and that's part of our thinking of it as a platform. The second thing inside of that that we've done, that really caters to those kinds of workloads, is we've really enhanced the ability to do incremental data acquisition, right? Whether it be streaming, whether it be batch upserts, right, doing upserts in SQL parlance, being able to do that data maintenance in an ACID-compliant fashion, completely automatically and behind the scenes, so that those applications, again, can just kind of run without any heavy lifting. >> Just a data-in-motion kind of thing going on. >> Right, it's anywhere from data in motion, even to batch, to mini-batch, and anywhere kind of in between. But when we're doing those incremental data loads, you know, it's easy to get the same file twice by mistake. You don't want to double count; you want to have sanctity of the transactions. We now handle that inside of Hive with ACID compliance.
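The guarantee described above, that receiving the same file twice must not double count, is idempotent ingestion. A minimal sketch of the concept in Python (illustrative only; this is not how Hive implements its ACID merge, and the checksum-ledger approach here is an assumption for the example):

```python
import hashlib

class IdempotentLoader:
    """Counts records from incoming files exactly once, keyed by content hash."""
    def __init__(self):
        self.seen = set()   # hashes of files already ingested
        self.total = 0      # running record count

    def ingest(self, lines):
        # Hash the file content; an accidental redelivery hashes identically.
        digest = hashlib.sha256("\n".join(lines).encode()).hexdigest()
        if digest in self.seen:
            return 0        # same file twice: no-op, totals stay correct
        self.seen.add(digest)
        self.total += len(lines)
        return len(lines)

loader = IdempotentLoader()
batch = ["txn1,100", "txn2,250"]
loader.ingest(batch)    # first delivery counts 2 records
loader.ingest(batch)    # duplicate delivery is ignored
print(loader.total)     # 2, not 4
```

Real systems get the same effect with transactional upserts (a merge on a key) rather than whole-file hashes, but the invariant is the same: replaying an input must leave the result unchanged.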
>> So a layperson question for the CTO, if I may. You mentioned Hadoop was not known for sort of real-time response. You just mentioned ACID; it was never, in the early days, known for sort of ACID compliance. Others would say, you know, Hadoop, the original big data platform, was not designed for the matrix math of AI, for example. Are these misconceptions? And, like Tim Berners-Lee, when we met Tim Berners-Lee, Web 2.0, "this is what the web was designed for," would you say the same thing about Hadoop? >> Yeah. Ultimately, from my perspective, and kind of netting it out, Hadoop was designed for the easy acquisition of data, the easy onboarding of data, and then once you've onboarded that data, it also was known for enabling new kinds of analytics that could be plugged in, certainly starting out with MapReduce on HDFS, as it was before. But the whole idea is, I now have a flexible way to easily acquire data in its native form, without having to apply schema, without having to have any formatting or structure. I can get it exactly as it was and store it, and then I can apply whatever schema, whatever rules, whatever analytics on top of that I want. So the center of gravity, to my mind, has really moved up to YARN, which enables a multi-tenancy approach to having pluggable, multiple different kinds of file formats, and pluggable different kinds of analytics and data access methods, whether it be SQL, whether it be machine learning, whether it be HBase lookup and indexing, and anywhere kind of in between. It's that Swiss Army knife, as it were, for handling all of this new stuff that is changing every second we sit here; data has changed. >> And just a quick follow-up, if I can, a clarification: so you said new types of analytics can be plugged in, by design, because of its openness, is that right?
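The "acquire in native form, apply schema later" model Scott describes is usually called schema-on-read. A small Python sketch of the idea (the data and field names are hypothetical; real deployments do this with engines like Hive or Spark over HDFS rather than in-process lists):

```python
import json

# Land raw events exactly as they arrive: no schema enforced at write time,
# so records of different shapes coexist in the same store.
raw_store = [
    '{"user": "a", "clicks": 3}',
    '{"device": "sensor-7", "temp_c": 21.5}',
    '{"user": "a", "clicks": 4, "referrer": "ad"}',
]

def read_with_schema(store, fields):
    """Apply a schema only at read time: project the requested fields,
    yielding None wherever a record lacks one."""
    for line in store:
        rec = json.loads(line)
        yield tuple(rec.get(f) for f in fields)

# Two different "schemas" over the same raw data, defined after the fact.
clicks_view = list(read_with_schema(raw_store, ["user", "clicks"]))
temps_view = list(read_with_schema(raw_store, ["device", "temp_c"]))
print(clicks_view)  # [('a', 3), (None, None), ('a', 4)]
print(temps_view)   # [(None, None), ('sensor-7', 21.5), (None, None)]
```

The point of the design choice: because no structure was imposed on write, both views, and any future one, can be defined long after the data landed.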
>> By design, because of its openness and the flexibility that the platform was built for. In addition, on the performance side, we've also got a new update to Spark, and usability, consumability, and collaboration for data scientists using the latest versions of Spark inside the platform. We've got a whole lot of other features and functions that our customers have asked for. And then on the flexibility and choice side, it's available on public cloud infrastructure as a service, public cloud platform as a service, on-prem on x86, and net new on-prem on Power. >> Just one final question for you. As the industry evolves, what are some of the key areas that open source can pivot to that really take advantage of the machine learning and AI trends going on? Because you start to see that really increase the narrative around the importance of data, and a lot of people are scratching their heads going, okay, I need to do the back office, set up my IT, have all those great open source projects, the Hadoop data platform, but then I've got to get down and dirty. I might do multiple clouds, with the hybrid cloud thing going on. I might want to leverage the new cool stuff, containers and Kubernetes and microservices, and almost DevOps. Where's that transition happening? As a CTO, how do you talk to customers about this transition, this evolution of how the data business is getting more and more mainstream?
>> Yeah, I mean, I think the big thing that people have had to get over is that we've reversed polarity from, again, 30 years of "I want a stack vendor to have an integrated stack of everything, plug and play, it's integrated end to end, and it might not be a hundred percent of what I want, but look at the cost leverage that I get out of the stack versus going to do something that's perfect." This world is the opposite. It's about enabling the ecosystem, and by the way, it's a combination of open source and proprietary software; some of our partners have proprietary software, and that's okay. But it's really about enabling the ecosystem, and I think the biggest service that we as an open source community can do is to continue to kind of keep that standard kernel for the platform, and make it very usable and very easy for many apps and software providers and other folks. >> A thousand-flowers-bloom kind of concept, and that's what you've done with the white spaces; those use cases are evolving very rapidly, and then the bigger apps are kind of settling into a workload with real time. >> Yeah, all the time. You know, think about the next generation of IT professional, the next generation of business professional: they grew up with iPhones, they grew up in an app world. I mean, they download an app, they try it, it's a widget, boom, and it's going to help me get something done. But it's not a big stack that I'm going to spend 30 years to implement. And if I like it, then I want to take those widgets and connect them together to do things that I haven't been able to do before, and that's how this ecosystem is really-- >> Great DevOps culture, very agile, that's their mindset. So Scott, congratulations on your 2.6 upgrade. >> Scott: We're thrilled about it.
>> Great stuff. ACID compliance, a really big deal; again, these compliance things, the little things, are important in the enterprise. Great. All right, thanks for coming on theCUBE. This is theCUBE at Dataworks in Munich, Germany. I'm John; thanks for watching. More coverage live here in Germany after this short break.

Published Date : Apr 5 2017



Scott Gnau, Hortonworks Big Data SV 17 #BigDataSV #theCUBE


 

>> Narrator: Live from San Jose, California, it's theCUBE, covering Big Data Silicon Valley 2017. >> Welcome back everyone. We're here live in Silicon Valley. This is theCUBE's coverage of Big Data Silicon Valley, our event in conjunction with O'Reilly Strata Hadoop; of course we have our Big Data NYC event, and we have our special popup event in New York and Silicon Valley. This is our Silicon Valley version. I'm John Furrier, with my co-host Jeff Frick, and our next guest is Scott Gnau, CTO of Hortonworks. Great to have you on, good to see you again. >> Scott: Thanks for having me. >> You guys have an event coming up in Munich, so I know that there's a slew of new announcements coming up with Hortonworks in April, next month in Munich for your EU event, and you're going to be holding a little bit of that back, but some interesting news this morning. We had Wei Wang yesterday with the Microsoft Azure HDInsight team. That's flowering nicely, a good bet there, but the question has always been, at least from people in the industry, and we've been questioning you guys on: hey, where's your cloud strategy? Because as a disruptor you guys have been very successful with your always-open approach. Microsoft, your guy was basically like, that's why we go with Hortonworks: because of pure open source, committed to that from day one, never wavered. The question is cloud first; AI, machine learning, this is a sweet spot for IoT. You're starting to see the collision between cloud and data, and in the intersection of that is deep learning, IoT, a lot of amazing new stuff going to be really popping out of this. Your thoughts, and your cloud strategy? >> Obviously we see cloud as an enabler for these use cases. In many instances the use cases can be ephemeral. They might not be tied immediately to an ROI, so you're going to go to the capital committee and all this kind of stuff, versus let me go prove some value very quickly.
It's one of the key enablers, core ingredients, and when we say cloud first, we really mean it. It's something where the solutions work together. At the same time, cloud becomes important. Our cloud strategy, and I think we've talked about this in many different venues, is really twofold. One is we want to give a common experience to our customers across whatever footprint they choose, whether it be they roll their own, they do it on-prem, they do it in public cloud, and they have a choice of different public cloud vendors. We want to give them a similar experience, a good experience that is enterprise-grade, a platform-level experience, so not a point-solution kind of one function and then get rid of it, but really being able to extend the platform. What I mean by that, of course, is being able to have common security, common governance, common operational management. Being able to have a blueprint of the footprint so that there's compatibility of applications that get written. And those applications can move as they decide to change their mind about where their platform is hosting the data. So our goal really is to give them a great and common experience across all of those footprints, number one. Then number two, to offer a lot of choices across all of those domains as well, whether it be, hey, I want to do infrastructure as a service and I know what I want, on one end of the spectrum, to, I'm not sure exactly what I want, but I want to spin up a data science cluster really quickly; boom, here's a platform-as-a-service offer that runs and is available, very easy to consume, comes preconfigured, and kind of everywhere in between. >> By the way, yesterday Wei was pointing out 99.99 SLAs on some of the stuff coming out. >> They're amazing, and obviously in the platform-as-a-service space, you also get the benefit of other cloud services that can plug in, that wouldn't necessarily be something you'd expect to be typical of a core Hadoop platform.
Getting the SLAs, getting the disaster recovery, getting all of the things that cloud providers can provide behind the scenes is some additional upside, obviously, as well in those deployment options. Having that common look and feel, making it easy, making it frictionless, are all of the core components of our strategy, and we saw a lot of success with that coming out of year end last year. We see rapid customer adoption. We see rapid customer success, and frankly, I would say that 99.9% of customers that I talk to are hybrid, where they have a foot on-prem and they have a foot in cloud, and they may have a foot in multiple clouds. I think that's indicative of what's going on in the world. Think about the gravity of data. Data movement is expensive. Analytics and multi-core chipsets give us the ability to process and crunch numbers at unprecedented rates, but movement of data is actually kind of hard. There's latency; it can be expensive. A lot of data in the future, IoT data, machine data, is going to be created and live its entire lifecycle in the cloud, so the notion of being able to support hybrid with a common look and feel, I think, very strategically positions us to help our customers be successful when they start actually dealing with data that lives its entire lifecycle outside the four walls of the data center. >> You guys really did a good job, I thought, on having that clean positioning of data at rest, but also you had the data in motion, which I think was ahead of its time; you guys really nailed that, and you also had the IoT edge in mind. We talked, I think, two years ago, and this was really not on everyone's radar, but you guys saw that, so you've made some good bets on the HDInsight, and we talked about that yesterday with Wei on here from Microsoft. So edge analytics and data in motion are very key right now, because that batch and streaming world is coming together and IoT is flooding it with all this kind of data.
We've seen the success in the cloud, where analytics have been super successful, powered by the cloud. I've got to ask you, with Microsoft as your preferred cloud provider, what's the current status for customers who have data in motion, specifically IoT too? It's the common question we're getting; not necessarily the Microsoft question, but, okay, I've got edge coming in strong-- >> Scott: Mm-hmm >> and I'm going to run certainly hybrid in a multi-cloud world, but I want to put the cloud stuff for most of the analytics, and how do I deal with the edge? >> Wow, there's a lot there. (laughs) >> John: You got 10 seconds, go! (laughs) You have Microsoft as your premier cloud, and you have an Amazon relationship with a marketplace and whatnot. You've got a great relationship with Microsoft. >> Yeah. I think it boils down to a bigger macro thing, and hopefully I'll peel into some specifics. I think, number one, we as an industry kind of shortchange ourselves talking about Hadoop, Hadoop, Hadoop, Hadoop, Hadoop. I think it's bigger than Hadoop; not different than, but certainly bigger than, right? And this is where we started with the whole connected platforms thinking. Traditional Hadoop comes from traditional thinking of data at rest. So I've got some data, I've stored it, and I want to run some analytics, and I want to be able to scale it, and all those kinds of things. Really good stuff, but only part of the issue. The other part of the issue is data that's moving, data that's being created outside of the four walls of the data center, data that's coming from devices. How do I manage and move and handle all of that? Of course there have been different hype cycles on streaming and streaming analytics and data flow and all those things. What we wanted to do is take a very protracted look at the problem set of the future.
We said, look, it's really about the entire lifecycle of data, from the inception to the demise of the data, or the data being deleted, which very infrequently happens these days. >> Or cold storage-- >> Cold storage, whatever. You know, it's created at the edge, it moves through, it moves in different places, it's landed, it's analyzed, there are models built. But as models get deployed back out to the edge, that entire problem set is a problem set that I think we, certainly we at Hortonworks, are looking to address with the solutions. That actually is accelerated by the notion of multiple cloud footprints, because when you think about a customer that may have multiple cloud footprints and trying to tie the data together, it creates a unique opportunity. I think there's a reversal in the way people need to think about the future of compute. Having been around for a little bit of time, it's always been: let me bring all the data together to the applications, have the applications run, and then I'll send answers back. That is impossible in this new world order, whether it be the cloud or the fog or any of the things in between or the data center. Data are going to be distributed, and data movement will become the expensive thing, so it will be very important to be able to have applications that are deployable across a grid, and applications move to the data instead of data moving to the application. Or at least to have a choice and be able to be selective. So I believe that ultimately, scalability five years from now, ten years from now, is not going to be about how many exabytes I have in my cloud instance; that will be part of it. It will be about how many edge devices can I have computing and analyzing simultaneously, and coordinating this information with each other, to optimize customer experience, to optimize the way an autonomous car drives, or anywhere in between. >> It's totally radical, but it's also innovative.
You mentioned the cost of moving data will be the issue. >> Scott: Yeah. >> So that's going to change the architecture of the edge. What are you seeing with customers, cuz we're seeing a lot of people taking a protracted view like you were talking about and looking at the architectures, specifically around okay. There's some pressure, but there's no real gun to the head yet, but there's certainly pressure to do architectural thinking around edge and some of the things you mentioned. Patterns, things you can share, anecdotal stories, customer references. >> You know the common thing is that customers go, "Yep, that's going to be interesting. "It's not hitting me right now, "but I know it's going to be important. "How can I ease into it and kind of without the suspenders "how can I prove this is going to work and all that." We've seen a lot of certainly interest in that. What's interesting is we're able to apply some of that futuristic IoT technology in Hortonworks data flow that includes NiFi and MiNiFi out to the edge to traditional problems like, let me get the data from the branches into the central office and have that roundtrip communication to a banker who's talking to a customer and has the benefit of all the analytics at home, but I can guarantee that roundtrip of data and analytics. Things that we thought were solid before, can be solved very easily and efficiently with this technology, which is then also extensible even out further to the edge. In many instances, I've been surprised by customer adoption with them saying, "Yeah, I get that, but gee this helps me "solve a problem that I've had for the last 20 years "and it's very easy and it sets me up "on the right architectural course, "for when I start to add in those edge devices, "I know exactly how I'm going to go do it." It's been actually a really good conversation that's very pragmatic with immediate ROI, but again positioning people for the future that they know is coming. 
Doing that, by the way, we're also able to prove the security. Think about security is a big issue that everyone's talking about, cyber security and everything. That's typically security about my data center where I've got this huge fence around it and it's very controlled. Think about edge devices are now outside that fence, so security and privacy and provenance become really, really interesting in that world. It's been gratifying to be able to go prove that technology today and again put people on that architectural course that positions them to be able to go out further to the edge as their business demands it. >> That's such great validation when they come back to you with a different solution based on what you just proposed. >> Scott: Yep. >> That means they really start to understand, they really start to see-- >> Scott: Yep. >> How it can provide value to them. >> Absolutely, absolutely. That is all happening and again like I said this I think the notion of the bigger problem set, where it's not just storing data and analyzing data, but how do I have portable applications and portable applications that move further and further out to the edge is going to be the differentiation. The future successful deployments out there because those deployments and folks are able to adopt that kind of technology will have a time to market advantage, they'll have a latency advantage in terms of interaction with a customer, not waiting for that roundtrip of really being able to push out customized, tailored interactions, whether it be again if it's driving your car and stopping on time, which is kind of important, to getting a coupon when you're walking past a store and anywhere in between. >> It's good you guys have certainly been well positioned for being flexible, being an open source has been a great advantage. I got to ask you the final question for the folks watching, I'm sure you guys answer this either to investors or whatnot and customers. 
A lot's changed in the past five years, and a lot's happening right now. You just illustrated it, the scenario with the edge: very robust, dynamic, changing, but yet a value opportunity for businesses. What's the biggest thing that's changing right now in the Hortonworks view of the world that's notable, that you think is worth highlighting to people watching who are your customers, investors, or people in the industry? >> I think you brought up a good point: the whole notion of open, and the whole groundswell around open source, open community development, as a new paradigm for delivering software. I talked a little bit about a new paradigm of the gravity of data and sensors, and this new problem set that we've got to go solve; that's kind of one piece of this storm. The other piece of the storm is the adoption and the wave of open, open community collaboration of developers, versus integrated silo stacks of software. That's manifesting itself in two places, and obviously I think we're an example of helping to create that. Open collaboration means quicker time to market, and more innovation and accelerated innovation in an increasingly complex world. That's one requirement slash advantage of being in the open world. I think the other thing that's happening is the generation of workforce. When I think about when I got my first job, I typed a resume with a typewriter. I'm dating myself. >> White out. >> Scott: Yeah, with white out. (laughter) >> I wasn't a good typist. >> A resume today is basically a name and a GitHub address. Here's my body of work, and it's out there for everybody to see, and that's the mentality-- >> And they have their cute videos up there as well, of course. >> Scott: Well, yeah, I'm sure. (laughter) >> So it's kind of like that shift: this is now the new paradigm for software delivery. >> This is important. You've got theCUBE interview, but I mean you're seeing it-- >> Is that the open source? >> In the entertainment.
No, we're seeing people put huge interviews on their LinkedIn, so this notion of collaboration in the software engineering mindset. You go back to when we grew up in software engineering, then it went to open source; now GitHub is essentially a social network for your body of work. You're starting to see the software development open source concepts apply to data engineering; data science is still early days. Media creation, whatnot. So I think that's a really key point, and the data science tools are still in their infancy. >> I think open, and by the way, I'm not here to suggest that everything will be open, but I think a majority and-- >> Collaborative. >> The majority of the problem that we're solving will be collaborative; it will be ecosystem-driven, and where there's an extremely large market, open will be the most efficient way to address it. And certainly no one's arguing that data and big data is not a large market. >> Yep. You guys are all in on the cloud now, you've got the Microsoft relationship; any other updates that you think are worth sharing with folks? >> You've got to come back and see us in Munich, then. >> All right. We'll be there; theCUBE will be there in Munich in April. We have the Hortonworks coverage going on at DataWorks; the conference is now called DataWorks in Munich. This is theCUBE here with Scott Gnau, the CTO of Hortonworks. Breaking it down, I'm John Furrier with Jeff Frick. More coverage from Big Data SV in conjunction with Strata Hadoop after the short break. (upbeat music)

Published Date : Mar 15 2017


Scott Gnau - Hadoop Summit 2013 - theCUBE - #HadoopSummit


 

>> Live at Hadoop Summit, this is SiliconANGLE and Wikibon's exclusive coverage of Hadoop Summit. This is theCUBE, our flagship program; we go out to the events and extract the signal from the noise. I'm joined by my co-host Jeff Kelly. Jeff, welcome to theCUBE. Scott, welcome to theCUBE; great to have you here. So you helped kick off the show this morning with your keynote, talking about a number of things, among them the new Teradata plans for Hadoop. You brought the appliance on stage, which I thought was great. I loved it. >> I was joined by a dancing appliance. >> Okay, great, it was fantastic, a good-looking appliance, it was. But why don't you tell us a little bit about yourself, kind of your role, and then we'll get into what Teradata is doing here at the show, and some of the strategies you're taking toward the big data market. >> Okay, great. Well, I'm Scott Gnau, I'm from Teradata Labs, and Teradata Labs is actually the organization within Teradata that is responsible for research, development, engineering, product management, product marketing, all the products, all of the technology that we roll out; kind of the innovation engine of Teradata is what we're responsible for. And we've obviously been affiliated with Hadoop Summit; we were here last year, and it's really great to be back. Having been in the data warehouse, big data, data analytics business for a long time, the one thing I have to say about this whole movement in the Hadoop space is that it's unlike anything else I've seen, in that it's every geography, it's every industry, and there's so much energy and emotion around it. It's unlike any other transition that I've seen. And even in the difference between our visit here last year and this year, we've seen the promise turn into reality, where we've got customers who are implementing, where we've got businesses who are driving value from the solutions they're integrating with the solutions that they've already got, and being able to
demonstrate that value really emphasizes the importance, and I think will help to continue the momentum that we feel in this market. >> Scott, one of the things I want to ask you about: obviously a theme at Hadoop Summit is offloading data warehouses, and what the benefit is there. But you have a relationship with Hortonworks, and we were talking earlier with Merv, an analyst at Gartner, about the early adopters and the mainstream getting it now. But there's always a question of value, right? Where's the value? Because there's legacy involved, right? Most of the web-based companies are going to be cloud, they'll be SaaS; they might have a greenfield, clean sheet of paper to work with on big data. But an existing enterprise, a large financial institution, an insurance company, or what have you, they have legacy technology, and yet they want Hadoop, they want to bring it in. When you talk to folks out there, what are some of the challenges and opportunities they have with that environment, and with the technology specifically? >> Sure. That was like a long question; there are a lot of threads in there. I want to really try to hit on a couple of important themes, because, you know, you hear it here, and I get asked a lot about it. One of the things that people often say is, you know, why are you here? This whole Hadoop thing is offloading data warehouses; isn't that bad? Doesn't that bother you? And the answer is absolutely not. Certainly there's some hype around that, and, you know, some marketing around that. But when you really look at the technology and the value of what it brings to the table, it's a new technology that really allows us to harness new kinds of data and store those new kinds of data in their native format. And, you know, storing detailed data in the native format really enables the best world-class analytics; we've seen this happen for, you know, as long as my career has been in the traditional data space. So that's a really good thing. The way I view it,
though is sure will some work load move around the infrastructure from the data warehouse to a Hadoop cluster potentially right and by the way if Hadoop is a great solution for it it should go there all right but at the same time there is more demand than there is supply of technology and what I mean by that is the demand for analytics is so extreme that actually adding this tool to the toolkit gives customers more choice and gives them the opportunity to really catch up with the backlog of things that they've wanted to invest in overtime and then the final point really I view what's happening here as perhaps one of the single largest opportunities for expansion of the role and size and scope of the data warehouse in an enterprise because one of the big things that Hadoop brings to the table is a whole lot of raw material a whole lot more data data that used to be thrown away data that never existed a year ago is now going to be able to capture be captured be stored be refined be analyzed and as companies start to find relationships as companies start to find actionable tidbits from the analytics in this huge source of raw material I think it's actually an opportunity for upside for them to integrate more data into their data warehouse where they can actually do the real-time interaction and streaming that's going to get them to the demonstrable business benefit so it's the modernization of the enterprise it's its modernization the way I look at it is also it's sometimes the word incremental can be it can sound like it we're trying to downplay it but I see it as incremental in that it's different data and it's incremental data it's incremental subject areas its new stuff that's going to come into the environment and based on what we've seen in the history of analytics right that there's no end to the value that companies find and there's no end to competition in their businesses so this is a huge opportunity for the entire community to deliver more analytics and i 
think that there's actually more upside for traditional legacy data warehouse vendors and there is anything I think that's a really important point because as you said a lot of people think about that offloading workloads but it's also about offloading we're close but bringing in new data doing more analytics and then moving some of that into back into the data warehouse you can actually create more value from it yeah I mean one of the things that I've seen is you know over time and Moore's law is something that's been going on for some time right and and cost erosion in Hardware has been going on for a long time and you think about the thing that you buy today for your bi implementation the hardware costs what twenty percent of what it costs three four years ago and you know what revenues continue to increase because they're such pent-up demand that as it gets less expensive it becomes more consumable and I think the same thing it's really going to continue to happen as we add in these new technologies and these new data types so one of the things I want to commend teradata for doing is focusing on kind of that reference protector and helping customers understand how this new technology of Hadoop and big data fits in with everything else that they're doing talk a little bit a bit about how from a reference architecture and then maybe even from a product perspective how teradata goes about turning this into a reality for enterprise customers who you know really you know they're not looking to just kick the tires of the Duke they want they want to use this for its really support you know applications and workflows they're really you know critical to their business yeah I think you know one of the biggest things that we can do to help the industry and to help our customers really is to define a realistic roadmap that's consumable for them in their enterprise and so while it's certainly easy to have marketing release or press release it says uh this new technology 
does everything in slices bread it washes your car does all these things in reality there are very few things like that in the world right but the new technologies and the new innovations really do fit into some very interesting new use cases and so by providing this integrated roadmap of how customers can deploy and fit these technologies together is a really great education process and it's been extremely well received by our customers and prospects I have to tell you that even in advance of the announcement of the things that we had here today we've already got customers who have gone down this path with us because it's such a compelling value proposition the other thing is that we don't actually put specific technology in those boxes it's a reference architecture we hope that there's some teradata product in there but at the same time we you know our customers understand that there is choice in the marketplace and the best solution is going to win and by providing this reference architecture I think we helped elevate ourselves to more of a trusted advisor status with with the the industry and in how we see these things fitting together and providing very effective very low-risk kinds of solutions well I think you hit on something that trusted advisor I think companies and enterprises are just crying out for some leadership and to help to help them really understand how they're going to make this a reality in their organizations and you know you mentioned kind of the openness and being you know allowing enterprises shoots a technology that fits that fits the the work case of course you know you hope that stared at in a lot of cases but it could be something else so talk a little bit about your relationship with hortonworks so I know you announced today kind of a reseller agreement you're going to be actually reselling the the subscription service to Hortonworks service offering talk about that a little bit and also I want to dive into the tech as well the Hadoop 
appliance I mentioned earlier like you announced and maybe just kind of walk us through some of the news to them sure so I mean obviously we have a strategic relationship with Hortonworks and it's our second year here at Summit and it really started with I think a very common view of what's happening in the marketplace and how these technologies should really play well together at the same time we also really believe that it's important that the community embrace the open source Apache version of the software so that it doesn't become fragmented and become obsolete right so Horton is spot-on in terms of business model and putting everything back into the Apache open source version so that means that I think this is the version that will win and this will be the version that companies can count on to be sustainable so i think that there's an advantage there implied so that's said i think it fits into the right place we've got a great engineering relationship and a great common vision on how the enterprise architecture and how the pieces can fit together and be optimized for different workloads for different service levels and for different applications so having that common vision and kind of I think bringing to Best of Breed providers together with Wharton works on the on the Hadoop side and teradata for what we're very well known for I think it's really the best of all worlds and we work together to lay out this reference architecture and so it's not just you know tur data came down from the mountain said this should be your reference architecture we've got some validation we got some validation of use cases and then we went to work from an engineering perspective on how we go build these things out and make them work and optimize them and support them end to end because obviously not only in you know with the all of the new solutions is their kind of a scarcity of talent and some confusion support becomes really really important so one of the things we added to 
our portfolio we announced today is an expanded relationship on the support side where customers can come to teradata for integrated support of all of their data analytics environments whether it be teradata whether it be asked her whether it be Hadoop with hdb and you know that's a really nice thing where there's one phone number to call we've got fully integrated processes we can help with a global footprint in the 80 countries where we do business and obviously Hortonworks with the with the extreme depth and ability to manage the content of the kernel can get it done unlike anyone else Scott we've been talking enterprise-grade all morning as you did those the theme of the keynote mer from our garden about security compliance I mean these are meat and potatoes enterprise issues right so I got to ask you what's what are you guys looking at what's what's coming next obviously the platform to do has a stabilized developers going to want to program on it in different environments but the reality in the enterprise is a certain requirement so what are you looking at in the labs that's coming around the corner that's it going to be really really important for customers to realize the value of scaling and harnessing the big data of Hadoop with the existing infrastructure yeah I mean I think there are two things that will continue to do one is will look to build out kind of that framework of ecosystem and in all of the keynotes this morning you know everyone talked about the value of the ecosystem and it's amazing the ecosystem how they're just more and more logos this year than there were last year and I think that that will continue but really building out that ecosystem so that those things that are important can be realized and they can be realized in a very repeatable fashion I think in addition to that kind of ease of use right because despite the fact that we have burgeoning numbers of newly minted data scientists and people getting into the marketplace that's 
really good there still aren't enough and so de-risking things by making them easier to deploy and easier to support i think is a key focus area and then you know finally I said two things but now third you know finally it will say to me I'd all right we'll continue to look at performance and just making sure that we have the best density the best performance the cost performance value proposition that our customers will want because I also continue to believe that the supply of data will outstrip any customers ability to invest in infrastructure I'd love to get your take on want to go back to mention to what you mentioned about the you know the Hadoop distribution focusing on a patchy and moving a patchy compatible so I take that number one to me and Tara day is not going to be coming out with their own Hadoop distribution absolutely not but how do you think about that yeah I think we can say that pretty definitively so but what about how do you see this whole Hadoop market playing out them you've got a Hortonworks Cloudera map are some others how do you see this playing out in the next year or so I mean is this you mentioned you think again that's kind of the open source of patchy versions going to kind of win when do you think that's going to happen you've got some competitors in the market and different business models hot yeah you know there are different business models and different innovators and you know my crystal ball is probably only about as clear as anyone elses but you know kind of for the long term I think it's best for the industry if if it mimics a model similar to the way Linux is deployed where this kind of a duopoly maybe three vendors it's very largely open source there's a lot of portability between I think that really strengthens the position of Hadoop as a tech as a core technology and foundation for some of the things that we're doing and so I would hope that in you know the most successful outcome would be that we'd end up with a duopoly 
or or you know maybe three kind of providers around a similar colonel because that would that would remove fragmentation from the market by the way I think it you know where we are software company so I think it's fair for companies to have value add proprietary software that's not a bad thing but at the file system level at a core two level I think the open source community cannot be out innovated right and and so I think that that's a really important thing so I think you know hopefully we'll get to that duopoly or maybe three companies that kind of have that I don't know if we will but I sure hope we do and I think the if I were to bet on it I would say it's odds on that that will be the case now will that be 18 months three years five years I don't know Scott thanks for coming inside the cube obviously you guys have a great position in the market place and the enterprise message is straw here that's what the demand is we're seeing a lot of trends out there that want the enterprise grade big data which is not just once there's but Hadoop's a big part of it Thanks coming inside the cube and sharing your perspective and what you got working on certainly having the new products come out to be great so thanks for coming onto the cube this is SiliconANGLE and wiki bonds coverage of hadoop summit we'll be right back with our next guest after this short break you
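The pipeline Gnau describes (land raw data in Hadoop in its native format, refine it there, then integrate only the refined results into the warehouse for real-time interaction) can be sketched in miniature. This is an illustrative sketch, not Teradata's or Hortonworks' actual architecture: a list of JSON lines stands in for raw files in HDFS, and an in-memory SQLite database stands in for the warehouse.

```python
# Illustrative sketch: raw detail is refined where it lands, and only the
# refined aggregates are loaded into the warehouse for interactive queries.
import json
import sqlite3
from collections import defaultdict

# 1. Raw material in native format, as it might land in a Hadoop cluster:
#    one JSON record per line, including fields no warehouse schema anticipated.
raw_events = [
    '{"user": "alice", "action": "click", "ms": 120}',
    '{"user": "alice", "action": "click", "ms": 95}',
    '{"user": "bob", "action": "search", "ms": 340}',
]

# 2. Refine in place: parse and aggregate without copying raw detail anywhere.
totals = defaultdict(lambda: {"events": 0, "ms": 0})
for line in raw_events:
    rec = json.loads(line)
    totals[rec["user"]]["events"] += 1
    totals[rec["user"]]["ms"] += rec["ms"]

# 3. Integrate only the refined result into the warehouse (SQLite stands in),
#    where fast, interactive queries run against a small, clean table.
wh = sqlite3.connect(":memory:")
wh.execute("CREATE TABLE user_activity (user TEXT, events INT, total_ms INT)")
wh.executemany(
    "INSERT INTO user_activity VALUES (?, ?, ?)",
    [(u, t["events"], t["ms"]) for u, t in totals.items()],
)

for row in wh.execute("SELECT user, events, total_ms FROM user_activity ORDER BY user"):
    print(row)
```

The point of the pattern is in step 3: the warehouse never absorbs the raw detail, only the refined subject areas, which is the "incremental data, incremental subject areas" expansion Gnau is describing.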

Published Date : Jul 2 2013


Today’s Data Challenges and the Emergence of Smart Data Fabrics


 

(intro music) >> Now, as we all know, businesses are awash with data, from financial services to healthcare to supply chain and logistics and more. Our activities, and increasingly, actions from machines are generating new and more useful information in much larger volumes than we've ever seen. Now, meanwhile, our data-hungry society's expectations for experiences are increasingly elevated. Everybody wants to leverage and monetize all this new data coming from smart devices and innumerable sources around the globe. All this data, it surrounds us, but more often than not, it lives in silos, which makes it very difficult to consume, share, and make valuable. These factors, combined with new types of data and analytics, make things even more complicated. Data from ERP systems to images, to data generated from deep learning and machine learning platforms, this is the reality that organizations are facing today. And as such, effectively leveraging all of this data has become an enormous challenge. So, today, we're going to be discussing these modern data challenges and the emergence of so-called "Smart Data Fabrics" as a key solution to said challenges. To do so, we're joined by thought leaders from InterSystems. This is a really creative technology provider that's attacking some of the most challenging data obstacles. InterSystems tells us that they're dedicated to helping customers address their critical scalability, interoperability, and speed-to-value challenges. And in this first segment, we welcome Scott Gnau, he's the global Head of Data Platforms at InterSystems, to discuss the context behind these issues and how smart data fabrics provide a solution. Scott, welcome. Good to see you again. >> Thanks a lot. It's good to be here. >> Yeah. So, look, you and I go back, you know, several years and, you know, you've worked in Tech, you've worked in Data Management your whole career. You've seen many data management solutions, you know, from the early days. 
And then we went through the Hadoop era together and you've come across a number of customer challenges that sort of changed along the way. And they've evolved. So, what are some of the most pressing issues that you see today when you're talking to customers and, you know, put on your technical hat if you want to. >> (chuckles) Well, Dave, I think you described it well. It's a perfect storm out there. You know, combined with there's just data everywhere and it's coming off of devices, it's coming from new different kinds of paradigms of processing and people are trying to capture and harness the value from this data. At the same time, you talked about silos and I've talked about data silos through my entire career. And I think, I think the interesting thing about it is for so many years we've talked about, "We've got to reduce the silos and we've got to integrate the data, we've got to consolidate the data." And that was a really good paradigm for a long time. But frankly, the perfect storm that you described? The sources are just too varied. The required agility for a business unit to operate and manage their customers is creating an enormous pressure and I think ultimately, silos aren't going away. So, there's a realization that, "Okay, we're going to have these silos, we want to manage them, but how do we really take advantage of data that may live across different parts of our business and in different organizations?" And then of course, the expectation of the consumer is at an all-time high, right? They expect that we're going to treat them and understand their needs or they're going to find some other provider. So, you know, pulling all of this together really means that, you know, our customers and businesses around the world are struggling to keep up and it's forcing a real, a new paradigm shift in underlying data management, right? 
We started, you know, many, many years ago with data marts and then data warehouses and then we graduated to data lakes, where we expanded beyond just traditional transactional data into all kinds of different data. And at each step along the way, we help businesses to thrive and survive and compete and win. But with the perfect storm that you've described, I think those technologies are now just a piece of the puzzle that is really required for success. And this is really what's leading to data fabrics and data meshes in the industry. >> So what are data fabrics? What problems do they solve? How do they work? Can you just- >> Yeah. So the idea behind it is, and this is not to the exclusion of other technologies that I described in data warehouses and data lakes and so on, but data fabrics kind of take the best of those worlds but add in the notion of being able to do data connectivity with provenance as a way to integrate data versus data consolidation. And when you think about it, you know, data has gravity, right? It's expensive to move data. It's expensive in terms of human cost to do ETL processes where you don't have known provenance of data. So, being able to play data where it lies and connect the information from disparate systems to learn new things about your business is really the ultimate goal. You think about in the world today, we hear about issues with the supply chain and supply and logistics is a big issue, right? Why is that an issue? Because all of these companies are data-driven. They've got lots of access to data. They have formalized and automated their processes, they've installed software, and all of that software is in different systems within different companies. But being able to connect that information together, without changing the underlying system, is an important way to learn and optimize for supply and logistics, as an example. And that's a key use case for data fabrics. 
Being able to connect, have provenance, not interfere with the operational system, but glean additional knowledge by combining multiple different operational systems' data together. >> And to your point, data is by its very nature, you know, distributed around the globe, it's on different clouds, it's in different systems. You mentioned "data mesh" before. How do data fabrics relate to this concept of data mesh? Are they competing? Are they complementary? >> Ultimately, we think that they're complementary. And we actually like to talk about smart data fabrics as a way to kind of combine the best of the two worlds. >> What is that? >> The biggest thing really is there's a lot around data fabric architecture that talks about centralized processing. And in data meshes, it's more about distributed processing. Ultimately, we think a smart data fabric will support both and have them be interchangeable and be able to be used where it makes the most sense. There are some things where it makes sense to process, you know, for a local business unit, or even on a device for real-time kinds of implementations. There are some other areas where centralized processing of multiple different data sources make sense. And what we're saying is, "Your technology and the architecture that you define behind that technology should allow for both where they make the most sense." 
But being able to glean additional insights from that data combined with data from a partner combined with data from a customer or combined with algorithmic data that, you know, you may create some sort of forecast and that you want to fit into. And being able to combine that together without interfering with the operational process and get those answers quickly is an important thing. So, seeing through the silos and being able to do the connectivity, being able to have interoperability, and then, combining that with flexibility on the analytics and flexibility on the algorithms you might want to run against that data. Because in today's world, of course, you know, certainly there's the notion of predictive modeling and relational theory, but also now adding in machine learning, deep learning algorithms, and have all of those things kind of be interchangeable is another important concept behind data fabric. So you're not relegated to one type of processing. You're saying, "It's data and I have multiple different processing engines and I may want to interchange them over time." >> So, I know, well actually, you know, when you said "real time", I infer from that, I don't have a zillion copies of the data and it's not in a bunch of silos. Is that a correct premise? >> You try to minimize your copies of the data? >> Yeah. Okay. >> There's certainly, there's a nirvana that says, "There's only ever one copy of data." That's probably impossible. But you certainly don't want to be forced into making multiple copies of data to support different processing engines unnecessarily. >> And so, you've recently made some enhancements to the data fabric capability that takes it, you know, ostensibly to the next level. Is that the smart piece? Is that machine intelligence? Can you describe what's in there? >> Well, you know, ultimately, the business benefit is be able to have a single source of the truth for a company. 
And so, what we're doing is combining multiple technologies in a single set of software that makes that software agile and supportable and not fragile for deployment of applications. At its core, what we're saying is, you know, we want to be able to consume any kind of data and I think your data fabric architecture is predicated on the fact that you're going to have relational data, you're going to have document data, you may have key-value store data, you may have images, you may have other things, and you want to be able to not be limited by the kind of data that you want to process. And so that certainly is what we build into our product set. And then, you want to be able to have any kind of algorithm, where appropriate, run against that data without having to do a bunch of massive ETL processes or make another copy of the data and move it somewhere else. And so, to that end, we have, taking our award-winning engine, which, you know, provides, you know, traditional analytic capabilities and relational capabilities, we've now integrated machine learning. So, you basically can bring machine learning algorithms to the data without having to move data to the machine learning algorithm. What does that mean? Well, number one, your application developer doesn't have to think differently to take advantage of the new algorithm. So that's a really good thing. The other thing that happens is if you, you're playing that algorithm where the data actually exists from your operational system, that means the round trip from running the model to inferring some decision you want to make to actually implementing that decision can happen instantaneously, as opposed to, you know, other kinds of architectures, where you may want to make a copy of the data and move it somewhere else. That takes time, latency. Now the data gets stale, your model may not be as efficient because you're running against stale data. 
We've now taken all of that off the table by being able to pull that processing inside the data fabric, inside of the single source of truth. >> And you got to manage all that complexity. So you got one system, so that makes it, you know, cost-effective, and you're bringing modern tooling to the platform. Is that right? >> That's correct. >> How can people learn more and maybe continue the conversation with you if they have other questions? (both chuckle) >> Call or write. >> Yeah. >> Yeah, I mean, certainly, check out our website. We've got a lot of information about the different kinds of solutions, the different industries, the different technologies. Reach out: scottg@intersystems.com. >> Excellent. Thank you, Scott. Really appreciate it and great to see you again. >> Good to see you. >> All right, keep it right there. We have a demo coming up next. You want to see smart data fabrics in action? Stay tuned. (ambient music)
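The connect-rather-than-consolidate idea at the heart of this conversation can be illustrated with a minimal sketch. This is not InterSystems' actual API; two in-memory SQLite databases stand in for independent operational systems, and the hypothetical `fabric_view` function plays the role of the fabric layer, querying each system where its data lives and attaching provenance to every value it returns.

```python
# Illustrative sketch only: a "fabric" layer connects two operational systems
# in place, joins their answers, and tags each field with its provenance,
# rather than ETL-ing everything into one consolidated copy.
import sqlite3

# Two independent operational systems, e.g. an order system and a logistics
# partner's shipment system. Each keeps its own data; neither is modified.
orders = sqlite3.connect(":memory:")
orders.execute("CREATE TABLE orders (order_id TEXT, item TEXT)")
orders.execute("INSERT INTO orders VALUES ('o1', 'widget')")

shipments = sqlite3.connect(":memory:")
shipments.execute("CREATE TABLE shipments (order_id TEXT, status TEXT)")
shipments.execute("INSERT INTO shipments VALUES ('o1', 'delayed')")

def fabric_view(order_id):
    """Answer a cross-system question by querying each system in place."""
    item = orders.execute(
        "SELECT item FROM orders WHERE order_id = ?", (order_id,)
    ).fetchone()[0]
    status = shipments.execute(
        "SELECT status FROM shipments WHERE order_id = ?", (order_id,)
    ).fetchone()[0]
    # Provenance travels with every value the fabric returns.
    return {
        "order_id": order_id,
        "item": {"value": item, "source": "orders_system"},
        "status": {"value": status, "source": "logistics_partner"},
    }

print(fabric_view("o1"))
```

Nothing is copied and neither operational system changes; the supply-chain answer ("order o1's widget is delayed") only exists in the combined view, which is the business benefit Gnau attributes to the fabric approach.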

Published Date : Feb 17 2023



Day Two Kickoff | DataWorks Summit 2018


 

>> Live from San Jose, in the heart of Silicon Valley, it's theCUBE. Covering DataWorks Summit 2018. Brought to you by Hortonworks. >> Welcome back to day two of theCUBE's live coverage of DataWorks here in San Jose, California. I'm your host, Rebecca Knight, along with my co-host James Kobielus. James, it's great to be here with you in the hosting seat again. >> Day two, yes. >> Exactly. So here we are, this conference, 2,100 attendees from 32 countries, 23 industries. It's a relatively big show. They do three of them during the year. One of the things that I really-- >> It's a well-established show too. I think this is like the 11th year since Yahoo started up the first Hadoop summit in 2008. >> Right, right. >> So it's an established event, yeah go. >> Exactly, exactly. But I really want to talk about Hortonworks the company. This is something that you had brought up in an analyst report before the show started and that was talking about Hortonworks' cash flow positivity for the first time. >> Which is good. >> Which is good, which is a positive sign and yet what are the prospects for this company's financial health? We're still not seeing really clear signs of robust financial growth. >> I think the signs are good for the simple reason they're making significant investments now to prepare for the future that's almost inevitable. And the future that's almost inevitable, and when I say the future, the 2020s, the decade that's coming. Most of their customers will shift more of their workloads, maybe not entirely yet, to public cloud environments for everything they're doing, AI, machine learning, deep learning. And clearly the beneficiaries of that trend will be the public cloud providers, all of whom are Hortonworks' partners and established partners, AWS, Microsoft with Azure, Google with, you know, Google Cloud Platform, IBM with IBM Cloud. Hortonworks, and this is... 
You know, their partnerships with these cloud providers go back several years so it's not a new initiative for them. They've seen the writing on the wall practically from the start of Hortonworks' founding in 2011 and they now need to go deeper towards making their solution portfolio capable of being deployable on-prem, in cloud, public clouds, and in various and sundry funky combinations called hybrid multi-clouds. Okay, so, they've been making those investments in those partnerships and in public cloud enabling the Hortonworks Data Platform. Here at this show, DataWorks 2018 here in San Jose, they've released the latest major version, HDP 3.0 of their core platform with a lot of significant enhancements related to things that their customers are increasingly doing-- >> Well I want to ask you about those enhancements. >> But also they have partnership announcements, the deep ones of integration and, you know, lift and shift of the Hortonworks portfolio of HDP with Hortonworks DataFlow and DataPlane Services, so that those solutions can operate transparently on those public cloud environments as the customers, as and when the customers choose to shift their workloads. 'Cause Hortonworks really... You know, like Scott Gnau yesterday, I mean just laid it on the line, they know that the more of the public cloud workloads will predominate now in this space. They're just making these speculative investments that they absolutely have to now to prepare the way. So I think this cost that they're incurring now to prepare their entire portfolio for that inevitable future is the right thing to do and that's probably why they still have not attained massive rock and rollin' positive cash flow yet but I think that they're preparing the way for them to do so in the coming decade. >> So their financial future is looking brighter and they're doing the right things. >> Yeah, yes. >> So now let's talk tech. And this is really where you want to be, Jim, I know you. 
>> Oh, I get sleep now, and I don't think about tech constantly. >> So as you've said, they're really putting a lot of emphasis now on their public cloud partnerships. >> Yes. >> But they've also launched several new products and upgrades to existing products. What are you seeing that excites you and that you think really will be potential game changers? >> You know, this is geeky but this is important, 'cause it's at the very heart of Hortonworks Data Platform 3.0: containerization. When you're a data scientist building a machine learning model using data that's maintained, persisted, and processed within Hortonworks Data Platform or any other big data platform, you increasingly want the ability, for developing machine learning, deep learning, AI in general, to take that application you might build, say TensorFlow models built on HDP, containerize it in Docker and, you know, orchestrate it all through Kubernetes and all that wonderful stuff, and deploy those AI applications out to edge computing, mobile computing, embedded computing environments where, you know, the real venture capital mania's happening, things like autonomous vehicles, drones, and you name it. So the fact is that Hortonworks has made that in many ways the premier new feature of HDP 3.0 announced here this week at the show. That very much harmonizes with where their partners are going with containerization of AI. IBM, one of their premier partners, very recently, last month I think it was, announced the latest version of, what do they call it, IBM Cloud Private, which has containerization embedded as a core feature of that environment, an on-premises environment for AI and so forth. The fact that Hortonworks continues to maintain close alignment with the capabilities that its public cloud partners are building into their respective portfolios is important. 
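To make the containerized-model workflow described above a bit more concrete, here is a rough sketch of how a trained model might be packaged for serving. The image, directory names, and model name are illustrative assumptions, not HDP 3.0 specifics; TensorFlow Serving's stock Docker image is used here as the serving runtime:

```dockerfile
# Hypothetical Dockerfile packaging a trained TensorFlow model for serving.
# Assumes the model was exported in SavedModel format (e.g., trained against
# data held in a big data platform) into ./exported_model on the build host.
FROM tensorflow/serving:latest

# TensorFlow Serving looks for models under /models/<MODEL_NAME>/<version>/
COPY ./exported_model /models/my_model/1
ENV MODEL_NAME=my_model

# The serving image exposes gRPC on 8500 and a REST API on 8501 by default
EXPOSE 8501
```

Once built and pushed to a registry, an image like this is what a Kubernetes Deployment would replicate and schedule, whether the cluster runs on-prem, at the edge, or in a public cloud, which is the portability point being made here.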
But also Hortonworks with its, they call it, you know, single pane of glass, the DataPlane Services for metadata and monitoring and governance and compliance across these sprawling hybrid multi-cloud scenarios. The fact that they're continuing to make, in fact really focusing on, deep investments in that portfolio, so that when an IBM, or AWS, whoever, introduces some new feature in their respective platforms, Hortonworks has the ability to, as it were, abstract above and beyond all of that, so that the customer, the developer, and the data administrator, all they need to do, if they're a Hortonworks customer, is stay within the DataPlane Services environment to be able to deploy with harmonized metadata, harmonized policies, harmonized schemas, and so forth and so on, and query optimization across these sprawling environments. So Hortonworks, I think, knows where their bread is buttered and it needs to stay on the DPS, DataPlane Services, side, which is why a couple months ago in Berlin, Hortonworks made, I think, the most significant announcement of the year for them and really for the industry: they announced the Data Steward Studio in Berlin. That really clearly was addressed to the GDPR mandate that was coming up, but it really treats data stewardship as an end-to-end workflow for lots of, you know, core enterprise applications, absolutely essential. Data Steward Studio is a DataPlane Service that can operate across multi-cloud environments. Hortonworks is going to keep on, you know... They didn't have any DPS, DataPlane Services, announcements here in San Jose this week, but you can best believe that next year at this time at this show, and in the interim, they'll probably have a number of significant announcements to deepen that portfolio. Once again, it's to grease the wheels towards a more purely public cloud future in which there will be Hortonworks DNA inside most of their customers' environments going forward. 
>> I want to ask you about the themes of this year's conference. The thing is that you were in Berlin at the last big Hortonworks DataWorks Summit. >> (speaks in foreign language) >> And really GDPR dominated the conversations because the new rules and regulations hadn't yet taken effect, and companies were sort of bracing for what life was going to be like under GDPR. Now the rules are here, they're here to stay, and companies are really grappling with it, trying to understand the changes and how they can exist in this new regime. What would you say are the biggest themes... We're still talking about GDPR, of course, but what would you say are the bigger themes of this week's conference? Is it scalability, is it... I mean, what do you think has dominated the conversations here? >> Well scalability is not the big theme this week, though there are significant scalability announcements this week in the context of HDP 3.0: the ability to persist, in a scale-out fashion across multi-cloud, billions of files. Storage efficiency is an important piece of the overall announcement, with support for erasure coding and so on. That's not, you know, that's... Already, Hortonworks, like all of the cloud providers and other big data providers, provides very scalable environments for storage, workload management. That was not the hugest, buzziest theme in terms of the announcements this week. The buzz of course was HDP 3.0. Containerization, that's important, but you know, we just came out of the day two keynote. AI is not a huge focus yet for a lot of the Hortonworks customers who are here, the developers. They're, you know, most of their customers are not yet that far along in their deep learning journeys and whatever, but they're definitely going there. There's plenty of really cool keynote discussions, including the one on autonomous vehicles, the session we just came out of. 
That was not the predominant theme this week here in terms of HDP 3.0. I think what it comes down to is that with HDP 3.0... Hive, though you tend to take it for granted, it's been in Hadoop from the very start, practically, Hive is now a full enterprise database, and that's the core, one of the cores, of HDP 3.0. Hive itself, now at version 3.0, is ACID compliant, and that may be totally geeky to most of the world, but it enables Hive to support transactional applications. So more big data in every environment is supporting more traditional enterprise applications, transactional applications that require things like two-phase commit and all that goodness. The fact is, you know, Hortonworks, from what I can see, is the first of the big data vendors to incorporate those enhancements to Hive 3.0, because they're so completely tuned in to the Hive environment in terms of committers. I think in many ways that is the predominant theme in terms of the new stuff that will actually resonate with the developers, their customers here at the show. And with, you know, enterprises in general, they can put more of their traditional enterprise application workloads on big data environments and specifically, Hortonworks hopes, its HDP 3.0. >> Well I'm excited to learn more here on theCUBE with you today. We've got a lot of great interviews lined up and a lot of interesting content. We got a great crew too so this is a fun show to do. >> Sure is. >> We will have more from day two of the.
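The ACID capability in Hive 3 that Jim highlights above can be sketched in Hive SQL. Table and column names here are made up for illustration; the key points are that transactional tables are stored as ORC with the transactional property set, and that row-level UPDATE, DELETE, and MERGE then become available:

```sql
-- Illustrative example: a transactional (ACID) table in Hive 3.
-- Managed ORC tables in HDP 3.0 are transactional by default;
-- the TBLPROPERTIES clause below makes that explicit.
CREATE TABLE account_balances (
  account_id BIGINT,
  balance    DECIMAL(18,2)
)
STORED AS ORC
TBLPROPERTIES ('transactional'='true');

-- Row-level DML that classic, pre-ACID Hive could not do:
UPDATE account_balances SET balance = balance - 100.00 WHERE account_id = 42;
DELETE FROM account_balances WHERE balance = 0;

-- MERGE supports upsert-style transactional workloads:
MERGE INTO account_balances AS t
USING daily_updates AS s
ON t.account_id = s.account_id
WHEN MATCHED THEN UPDATE SET balance = s.balance
WHEN NOT MATCHED THEN INSERT VALUES (s.account_id, s.balance);
```

This is what "putting traditional transactional workloads on big data environments" looks like at the SQL level: statements that a conventional enterprise database application would issue, now runnable against a Hive warehouse.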

Published Date : Jun 20 2018

John Kreisa, Hortonworks | Dataworks Summit EU 2018


 

>> Narrator: From Berlin, Germany, it's theCUBE. Covering Dataworks Summit Europe 2018. Brought to you by Hortonworks. >> Hello, welcome to theCUBE. We're here at Dataworks Summit 2018 in Berlin, Germany. I'm James Kobielus. I'm the lead analyst for Big Data Analytics within the Wikibon team of SiliconAngle Media. Our guest is John Kreisa. He's the VP for Marketing at Hortonworks, of course, the host company of Dataworks Summit. John, it's great to have you. >> Thank you Jim, it's great to be here. >> We go way back, so you know it's always great to reconnect with you guys at Hortonworks. You guys are on a roll, it's been seven years I think since you guys were founded. I remember the founding of Hortonworks. I remember when it splashed in the Wall Street Journal. It was like oh wow, this big data thing, this Hadoop thing is actually, it's a market, it's a segment and you guys have built it. You know, you and your competitors, your partners, your ecosystem continues to grow. You guys went IPO a few years ago. Your latest numbers are pretty good. You're continuing to grow in revenues, in customer acquisitions, your deal sizes are growing. So Hortonworks remains on a roll. So, I'd like you to talk right now, John, and give us a sense of where Hortonworks is at in terms of engaging with the marketplace, in terms of trends that you're seeing, in terms of how you're addressing them. But talk about first of all the Dataworks Summit. How many attendees do you have from how many countries? Just give us sort of the layout of this show. >> I don't have all of the final counts yet. >> This is year six of the show? >> This is year six in Europe, absolutely, thank you. So it's great, we've moved it around to different locations. Great venue, great host city here in Berlin. Super excited about it, I know we have representatives from more than 51 countries. 
If you think about that, drawing from a really broad set of countries, well beyond, as you know, because you've interviewed some of the folks beyond just Europe. We've had them from South America, U.S., Africa, and Asia as well, so really a broad swath of the open-source and big data community, which is great. The final attendance is going to be in the 1,250 to 1,300 range. Those are the final numbers, but it's a great-sized conference. The energy level's been really great, the sessions have been, you know, oversubscribed, standing room only in many of the popular sessions. So the community's strong, I think that's the thing that we really see here and that we're really continuing to invest in. It's something that Hortonworks was founded around. You referenced the founding, and driving the community forward and investing is something that has been part of our mantra since we started, and it remains that way today. >> Right. So first of all, what is Hortonworks? How does Hortonworks position itself? Clearly Hadoop is your foundation, but you, just like Cloudera and MapR, you guys have all continued to evolve to address a broader range of use-cases with a deeper stack of technology and fairly extensive partner ecosystems. So what kind of a beast is Hortonworks? It's an elephant, but what kind of an elephant is it? >> We're an elephant, or riding on the elephant I'd say, so we're a global data management company. That's what we're helping organizations do. Really the end-to-end lifecycle of their data, helping them manage it regardless of where it is, whether it's on-premise or in the cloud, really through hybrid data architectures. That's really how we've seen the market evolve: we started off in terms of our strategy with the platform based on Hadoop, as you said, to store, process, and analyze data at scale. The kind of fundamental use-case for Hadoop. 
Then as the company emerged, as the market kind of continued to evolve, we moved to and saw the opportunity really, capturing data from the edge. As IoT and kind of edge use-cases emerged, it made sense for us to add to the platform and create the Hortonworks DataFlow. >> James: Apache NiFi >> Apache NiFi, exactly, HDF underneath, with associated additional open-source projects in there. Kafka and some streaming and things like that. So that was now move data, capture data in motion, move it back and put it into the platform for those large data applications that organizations are building on the core platform. It's also the next evolution, seeing great attach rates with that, really strong interest in the Apache NiFi, you know, the meetup here for NiFi was oversubscribed, so really, really strong interest in that. And then, the market's continued to evolve with cloud and cloud architectures, customers wanting to deploy in the cloud. You know, you saw we had that poll yesterday in the general session about cloud with really interesting results, but we saw that there were really companies wanting to deploy in a hybrid way. Some of them wanted to move specific workloads to the cloud. >> Multi-cloud, public, private. >> Exactly right, and multi-data center. >> The majority of your customer deployments are on prem. >> They are. >> Rob Bearden, your CEO, I think he said in a recent article on SiliconAngle that two-thirds of your deployments are on prem. Is that percentage going down over time? Are more of your customers shifting toward a public cloud orientation? Does Hortonworks worry about that? You've got partnerships, clearly, with the likes of IBM, AWS, and Microsoft Azure and so forth, so do you guys see that as an opportunity, as a worrisome trend? >> No, we see it very much as an opportunity. 
And that's because we do have customers who are wanting to put more workloads and run things in the cloud, however, there's still almost always a component that's going to be on premise. And that creates a challenge for organizations: how do they manage the security and governance and really the overall operations of those deployments as they're in the cloud and on premise. And, to your point, multi-cloud. And so you get some complexity in there around that deployment, and particularly with the regulations, we talked about GDPR earlier today. >> Oh, by the way, the Data Steward Studio demo today was really, really good. It showed that, first of all, you cover the entire range of core requirements for compliance. So that was actually the primary announcement at this show; Scott Gnau announced that. You demoed it today, I think you guys are off to a good start, yeah. >> And thank you for that, we've gotten really good feedback on our DataPlane Services strategy, right, it provides that single pane of glass. >> I should say to our viewers that Data Steward Studio is the second of the services under the DataPlane, the Hortonworks DataPlane Services Portfolio. >> That's right, that's exactly right. >> Go ahead, keep going. >> So, you know, we see that as an opportunity. We think we're very strongly positioned in the market, being the first to bring that kind of solution to the customers, and our large customers that we've been talking about and who have been starting to use DataPlane have been very, very positive. I mean they see it as something that is going to help them really kind of maintain control over these deployments as they start to spread around, as they grow their uses of the thing. >> And it's built to operate across the multi-cloud, I know this as well, in terms of executing the consent or withdrawal of consent that the data subject makes through what is essentially a consent portal. >> That's right, that's right. 
>> That was actually a very compelling demonstration in that regard. >> It was good, and they worked very hard on it. And I was speaking to an analyst yesterday, and they were saying that they're seeing an increasing number of the customers, enterprises, wanting to have a multi-cloud strategy. They don't want to get locked into any one public cloud vendor, so what they want is somebody who can help them maintain that common security and governance across their different deployments, and they see DataPlane Services as the way that's going to help them do that. >> So John, how is Hortonworks, what's your road map, how do you see the company and your go to market evolving over the coming years in terms of geographies, in terms of your focus, in terms of the use-cases and workloads that the Hortonworks portfolio addresses? How is that shifting? You mentioned the edge. AI, machine learning, deep learning. You are a reseller of IBM Data Science Experience. >> DSX, that's right. >> So, let's just focus on that. Do you see more customers turning to Hortonworks and IBM for a complete end-to-end pipeline for the ingest, for the preparation, modeling, training and so forth? And deployment of operationalized AI? Is that something you see going forward as an evolution path for your capabilities? >> I'd say yes, long-term, or even in the short-term. So, they have to get their data house in order, if you will, before they get to some of those other things, so we're still, Hortonworks' strategy has always been focused on the platform aspect, right? The data-at-rest platform, the data-in-motion platform, and now a platform for managing common security and governance across those different deployments. Building on that is the data science, machine learning, and AI opportunity, but our strategy there, as opposed to trying to do it ourselves, is to partner, so we've got the strong partnership with IBM, and resell their DSX product. 
And also other partnerships to deliver those other capabilities, like machine learning and AI, from our partner ecosystem, which you referenced. We have over 2,300 partners, so a very, very strong ecosystem. And so, we're going to stick to our strategy of the platforms enabling that, which will subsequently enable data science, machine learning, and AI on top. And then, if you want me to talk about our strategy in terms of growth, so we already operate globally. We've got offices in I think 19 different countries. So we're really covering the globe in terms of the demand for Hortonworks products and beginning implementations. >> Where's the fastest growing market in terms of regions for Hortonworks? >> Yeah, I mean, international generally is our fastest growing region, faster than the U.S. But we're seeing very strong growth in APAC, actually, so India, Asian countries, Singapore, and then up and through to Japan. There's a lot of growth out in the Asian region. And, you know, they're sort of moving directly to digital transformation projects at really large scale. Big banks, telcos, from a workload standpoint I'd say the patterns are very similar to what we've seen. I've been at Hortonworks for six and a half years, as it turns out, and the patterns we saw initially in terms of adoption in the U.S. became the patterns we saw in terms of adoption in Europe, and now those patterns of adoption are the same in Asia. So, once a company realizes they need to either drive out operational costs or build new data applications, the patterns tend to be the same whether it's retail, financial services, telco, manufacturing. You can sort of replicate those as they move forward. 
>> So going forward, how is Hortonworks evolving as a company in terms of, for example with GDPR, data stewardship and data governance as a strong focus going forward? Are you shifting your model in terms of your target customer away from the data engineers, the Hadoop cluster managers, who are still very much the center of it, towards more data governance, towards more of a business-analyst level of focus? Do you see Hortonworks shifting in that direction in terms of your focus, go to market, your message and everything? >> I would say it's not a shifting as much as an expansion, so we definitely are continuing to invest in the core platform, in Hadoop, and you would have heard of some of the changes that are coming in the core Hadoop 3.0 and 3.1 platform here. Alan and others can talk about those details, and about Apache NiFi as well. But, to your point, as we bring and have brought Data Steward Studio and DataPlane Services online, that allows us to address a different user within the organization, so it's really an expansion. We're not de-investing in anything; it's really, here's another way, in a natural evolution of the way that we're helping organizations solve data problems. >> That's great, well thank you. This has been John Kreisa, he's the VP for marketing at Hortonworks. I'm James Kobielus of Wikibon SiliconAngle Media here at Dataworks Summit 2018 in Berlin. And it's been great, John, and thank you very much for coming on theCUBE. >> Great, thanks for your time. (techno music)

Published Date : Apr 19 2018

Day Two Keynote Analysis | Dataworks Summit 2018


 

>> Announcer: From Berlin, Germany, it's theCUBE. Covering DataWorks Summit Europe 2018. Brought to you by Hortonworks. (electronic music) >> Hello and welcome to theCUBE on day two of Dataworks Summit 2018 from Berlin. It's been a great show so far. We have just completed the day two keynote, and in just a moment I'll bring ya up to speed on the major points and the presentations from that. It's been a great conference. Fairly well attended here. The hallway chatter and discussion's been great. The breakouts have been stimulating. For me the takeaway is that yesterday at the keynote, Scott Gnau, the CTO of Hortonworks, the show host, announced Data Steward Studio, DSS they call it, part of the Hortonworks DataPlane Services portfolio, and it could not be more timely, because we are now five weeks away from GDPR, that's the General Data Protection Regulation, becoming the law of the land. When I say the land, the EU, but really any company that operates in the EU, and that includes many U.S.-based and APAC-based and other companies, will need to comply with the GDPR as of May 25th and ongoing, in terms of protecting the personal data of EU citizens. And that means a lot of different things. Data Steward Studio, announced yesterday, was demo'd today by Hortonworks, and it was a really excellent demo, and showed that it's a powerful solution for a number of things that are at the core of GDPR compliance. The demo covered the capability of the solution to discover and inventory personal data within a distributed data lake or enterprise data environment, number one. Number two, the ability of the solution to centralize consent, to provide a consent portal, essentially, that data subjects can then use to review the data that's kept on them and to make fine-grained consents, or withdraw consent, for use in profiling of the data that they own. 
And then, number three, they demonstrated the capability of the solution to execute the data subjects' requests in terms of the handling of their personal data. Those are the three main points in terms of enabling, adding the teeth to enforce, GDPR in an operational setting in any company that needs to comply with GDPR. So, what we're going to see, I believe, going forward, really in the whole global economy and in the big data space, is that Hortonworks and others in the data lake industry, and there's many others, are going to need to roll out similar capabilities in their portfolios, 'cause their customers are absolutely going to demand it. In fact the deadline is fast approaching, it's only five weeks away. One of the interesting takeaways from the keynote this morning was the fact that John Kreisa, the VP for marketing at Hortonworks, today took a quick survey of those in the audience, a poll, asking how ready they are to comply with GDPR as of May 25th, and it was a bit eye-opening. I wasn't surprised, but I think it was 19 or 20%, I don't have the numbers in front of me, said that they won't be ready to comply. I believe it was somewhere between 20 and 30% who said they will be able to comply. About 40%, don't quote me on that, but a fair plurality, said that they're preparing. So that indicates that they're not entirely 100% sure that they will be able to comply 100% to the letter of the law as of May 25th. I think that's probably accurate in terms of ballpark figures. I think there's a lot of, I know there's a lot of companies, users racing for compliance by that date. And so really GDPR is definitely the headline banner, umbrella story around this event and really around the big data community world-wide right now, in terms of enterprise investments in the compliance software, services, and capabilities needed to comply with GDPR. That was important. 
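The consent workflow sketched in those three points, record consent, let the subject withdraw it, and enforce it before any profiling use, can be illustrated in miniature. This is a toy sketch, not Data Steward Studio's actual API; all names here are invented for illustration:

```python
from dataclasses import dataclass, field

@dataclass
class ConsentRegistry:
    """Toy GDPR-style consent ledger: which subject consented to which processing purpose."""
    grants: dict = field(default_factory=dict)  # subject_id -> set of purposes

    def grant(self, subject_id: str, purpose: str) -> None:
        # Record a subject's consent for a specific processing purpose
        self.grants.setdefault(subject_id, set()).add(purpose)

    def withdraw(self, subject_id: str, purpose: str) -> None:
        # Honor a withdrawal request: remove the purpose if present
        self.grants.get(subject_id, set()).discard(purpose)

    def is_permitted(self, subject_id: str, purpose: str) -> bool:
        return purpose in self.grants.get(subject_id, set())

def profile_customer(registry: ConsentRegistry, subject_id: str) -> str:
    # Enforcement point: profiling only proceeds with recorded consent
    if not registry.is_permitted(subject_id, "profiling"):
        return "refused: no consent on record"
    return "profiled"

registry = ConsentRegistry()
registry.grant("alice", "profiling")
print(profile_customer(registry, "alice"))   # consent on file, so profiling proceeds
registry.withdraw("alice", "profiling")
print(profile_customer(registry, "alice"))   # withdrawal honored, so profiling refused
```

The real systems add discovery, audit trails, and policy propagation across clouds on top of this, but the enforcement-before-processing shape is the core of what the demo showed.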
That wasn't the only thing that was covered in, not only the keynotes, but in the sessions here so far. AI, clearly AI and machine learning are hot themes in terms of the innovation side of big data. There's compliance, there's GDPR, but really innovation in terms of what enterprises are doing with their data, with their analytics. They're building more and more AI and embedding that in conversational UIs and chatbots, and they're embedding AI in all manner of e-commerce applications, internal applications in terms of search, as well as things like face recognition, voice recognition, and so forth and so on. So, what we've seen here at the show is what I've been seeing for quite some time: that more of the actual developers who are working with big data are the data scientists of the world. And more of the traditional coders are getting up to speed very rapidly on the new state of the art for building machine learning, deep learning, AI, and natural language processing into their applications. That said, Hortonworks has become a fairly substantial player in the machine learning space. In fact, you know, really across their portfolio, many of the discussions here I've seen show that everybody's buzzing about getting up to speed on frameworks for building and deploying and iterating and refining machine learning models in operational environments. So that's definitely a hot theme. And so there was an AI presentation this morning, from the first gentleman that came on, that laid out the broad parameters of what developers are doing and looking to do with data that they maintain in their lakes, training data to both build the models and train them and deploy them. So, that was also something I expected, and it's good to see at Dataworks Summit that there is a substantial focus on that in addition, of course, to GDPR and compliance. It's been about seven years now since Hortonworks was essentially spun off of Yahoo. 
It's been I think about three years or so since they went IPO. And what I can see is that they are making great progress in terms of their growth, in terms of not just the finances, but their customer acquisition and their deal size and also customer satisfaction. I get a sense from talking to many of the attendees at this event that Hortonworks has become a fairly blue-chip vendor, that they're really in many ways continuing to grow their footprint of Hortonworks products and services with most of their partners, such as IBM. And from what I can see, everybody was rapt with attention around Data Steward Studio, and I sensed sort of a sigh of relief that it looks like a fairly good solution, and so I have no doubt that a fair number of those in this hall right now are probably, as we say in the U.S., kicking the tires of DSS and probably going to expedite their adoption of it. So, with that said, we have day two here, so what we're going to have is Alan Gates, one of the founders of Hortonworks, coming on in just a few minutes, and I'll be interviewing him, asking about the vibrancy and the health of the community, the Hortonworks ecosystem, developers, partners, and so forth, as well as of course the open source communities for Hadoop and Ranger and Atlas and so forth, the growing stack of open source code upon which Hortonworks has built their substantial portfolio of solutions. Following him we'll have John Kreisa, the VP for marketing. I'm going to ask John to give us an update on, really, sort of the health of Hortonworks as a business in terms of their reach out to the community, in terms of their messaging obviously, and have him really position Hortonworks in the community in terms of who he sees them competing with. What segments is Hortonworks in now? The whole Hadoop segment increasingly... Hadoop is there. It's the foundation. The word is not invoked in the context of discussions of Hortonworks as much now as it was in the past. 
And the same thing goes for, say, Cloudera, one of their closest traditional rivals, closest in the sense that people associate the two. I was at the Cloudera analyst event the other week in Santa Monica, California. It was the same thing. I think both of these vendors are on a similar path to become fairly substantial data warehousing and data governance suppliers to the enterprises of the world that have traditionally gone with the likes of IBM and Oracle and SAP and so forth. So I think Hortonworks has definitely evolved into a far more diversified solution provider than people realize. And that's really one of the takeaways from Dataworks Summit. With that said, this is Jim Kobielus. I'm the lead analyst, I should've said that at the outset, at SiliconANGLE Media's Wikibon team, focused on big data analytics. I'm your host this week on theCUBE at Dataworks Summit Berlin. I'll close out this segment and we'll get ready to talk to the Hortonworks and IBM personnel. I understand there's a gentleman from Accenture on as well today on theCUBE here at Dataworks Summit Berlin. (electronic music)

Published Date : Apr 19 2018


Joe Morrissey, Hortonworks | Dataworks Summit 2018


 

>> Narrator: From Berlin, Germany, it's theCUBE! Covering Dataworks Summit Europe 2018. Brought to you by Hortonworks. >> Well, hello. Welcome to theCUBE. I'm James Kobielus. I'm lead analyst at Wikibon for big data analytics. Wikibon, of course, is the analyst team inside of SiliconANGLE Media. One of our core offerings is theCUBE, and I'm here with Joe Morrissey. Joe is the VP for International at Hortonworks, and Hortonworks is the host of Dataworks Summit. We happen to be at Dataworks Summit 2018 in Berlin! Berlin, Germany. And so, Joe, it's great to have you. >> Great to be here! >> We had a number of conversations today with Scott Gnau and others from Hortonworks, and also from your customers and partners. Now, you're VP for International. We've had a partner of yours from South Africa on theCUBE today. We've had a customer of yours from Uruguay. So there's been a fair amount of international presence. We had Munich Re from Munich, Germany. Clearly, Hortonworks has been in business as a company for seven years now, I think it is, and you've established quite a presence worldwide. Looking at your financials, in terms of your customer acquisition, it just keeps going up and up, so you're clearly doing a great job of bringing the business in throughout the world. Now, you've told me before the camera went live that you focus on both Europe and Asia-Pacific, so I'd like to open it up to you, Joe. Tell us how Hortonworks is doing worldwide and the kinds of opportunities you're selling into. >> Absolutely. 2017 was a record year for us. We grew revenues by over 40% globally. I joined to lead the internationalization of the business, and you know, not a lot of people know that Hortonworks is actually one of the fastest growing software companies in history. We were the fastest to get to $100 million, and now the fastest to get to $200 million, but the majority of that revenue contribution was coming from the United States. 
When I joined, international was about 15% of total contribution. By the end of 2017, we'd grown that to 31%, so that's a significant improvement in contribution overall from our international customer base, even though the company was growing globally at a very fast rate. >> And that's fast by any stretch of the imagination in terms of growth. Some have said, "Oh well, maybe Hortonworks, just like Cloudera, is going to plateau off because the bloom is off the rose of Hadoop." But really, Hadoop is just getting going as a market segment, or as a platform, and you guys have diversified well beyond that. So give us a sense for going forward: what kind of projects are you positioning and selling Hortonworks solutions into now? Is it shifting, well, you've only been there 18 months, but is it shifting towards more things to do with streaming, NiFi and so forth? Does it shift into more data science related projects? 'Cause this is worldwide. >> Yeah. That's a great question. This company was founded on the premise that data volumes and the diversity of data are continuing to explode, and we believed it was necessary for us to bring enterprise-grade security and management and governance to the core Hadoop platform to make it really ready for the enterprise, and that's what the first evolution of our journey was really all about. A number of years ago, we acquired a company called Onyara, and the logic behind that acquisition was that we believed companies now wanted to go out to the point of origin, of creation of data, manage data throughout its entire life cycle, and derive pre-event as well as post-event analytical insight into their data. So what we've seen is our customers moving beyond just unifying data in the data lake and deriving post-transaction insight from their data. They're now going all the way out to the edge. 
They're deriving insight from their data in real time, all the way from the point of creation, and getting pre-transaction insight into data as well, so-- >> Pre-transaction data? Can you define what you mean by pre-transaction data? >> Well, I think if you look at it, it's really the difference between data in motion and data at rest, right? >> Oh, yes. >> A specific example would be if a customer walks into the store and they've interacted with the store, maybe on social before they come in, or in some other fashion, before they've actually made the purchase. >> Engagement data, interaction data, yes. >> Engagement, exactly. Exactly. Right. So that's one example, but that also extends out to use cases in IoT as well. So data in motion and streaming data, as you mentioned earlier, have since become a very, very significant use case that we're seeing a lot of adoption for. Data science, I think companies are really coming to the realization that that's an essential role in the organization. If we really believe that data is the most important asset, that it's the crucial asset in the new economy, then data scientist becomes a really essential role for any company. >> How do your Asian customers' requirements differ, or do they differ, from your European? Because European customers clearly already have their backs against the wall. We have five weeks until GDPR goes into effect. Do many of your Asian customers, and I'm sure a fair number sell into Europe, are they putting a full court, I was going to say in the U.S., a full court press on complying with GDPR? Or do they have equivalent privacy mandates in various countries in Asia, or a bit of both? >> I think that one of the primary drivers I see in Asia is that a lot of companies there don't have the years of legacy architecture that European companies need to contend with. In some cases, that means they can move towards next-generation, data-oriented architectures much quicker than European companies have. 
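Joe's distinction between data at rest (post-transaction insight) and data in motion (pre-transaction insight) can be sketched as a sliding window over engagement events, where only the recent past informs a decision made before the purchase happens. This is an illustrative toy only; the event kinds, window length, and scoring are invented, and it is not a Unifi or Hortonworks API.

```python
from collections import deque
from datetime import datetime, timedelta

class EngagementWindow:
    """Keeps the last N minutes of pre-transaction engagement events
    (store visits, social touches) for one customer."""
    def __init__(self, minutes=30):
        self.span = timedelta(minutes=minutes)
        self.events = deque()

    def add(self, ts, kind):
        self.events.append((ts, kind))
        # Evict events that fell out of the window: for a pre-transaction
        # decision on data in motion, only the recent past matters.
        while self.events and ts - self.events[0][0] > self.span:
            self.events.popleft()

    def pre_transaction_score(self):
        # Toy scoring: more recent touches means a warmer prospect.
        return len(self.events)

w = EngagementWindow(minutes=30)
now = datetime(2018, 4, 18, 10, 0)
w.add(now, "social_mention")
w.add(now + timedelta(minutes=5), "store_entry")
w.add(now + timedelta(minutes=40), "app_open")  # first two events age out
print(w.pre_transaction_score())  # -> 1
```

A data-at-rest pipeline would instead aggregate all events after the fact; the point of the streaming shape is that the score is available at the moment of engagement.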
They don't have layers of legacy tech that they need to sunset. A great example of that is Reliance. Reliance is the largest company in India; they've got a subsidiary called Jio, which is the fastest growing telco in the world. They've implemented our technology to build a next-generation OSS system to improve their service delivery on their network. >> Operational support system. >> Exactly. They were able to do that from the ground up, because they formed their telco division around being a data-only company and giving away voice for free. So they can, to some extent, move quicker and innovate a little faster in that regard. I do see much more emphasis on regulatory compliance in Europe than I see in Asia. I do think that GDPR, amongst other regulations, is a big driver of that. The other factor, though, that I think is influencing that is cloud, and cloud strategy in general. What we've found is that customers are drawn to the cloud for a number of reasons. The economics sometimes can be attractive; the ability to leverage the cloud vendors' skills in terms of implementing complex technology is attractive; but most importantly, the elasticity and scalability that the cloud provides is hugely important. Now, the key concern for customers as they move to the cloud, though, is how they leverage that as a platform in the context of an overall data strategy, right? And when you think about what a data strategy is all about, it all comes down to understanding what your data assets are and ensuring that you can leverage them for a competitive advantage, but do so in a regulatory-compliant manner, whether that's data in motion or data at rest, whether it's on-prem or in the cloud, or data across multiple clouds. That's very much a top-of-mind concern for European companies. 
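The "data strategy, not just a cloud strategy" point reduces to keeping an inventory of data assets across every tier, so a compliance question like "which assets hold personal data?" can be answered wherever the data lives. The sketch below is hypothetical: the asset names, fields, and function are invented for illustration and are not a Unifi Software API.

```python
# A minimal registry of data assets spanning on-prem, SaaS, cloud, and edge.
ASSETS = [
    {"name": "erp_orders",   "tier": "on_prem",      "personal_data": False},
    {"name": "crm_contacts", "tier": "saas",         "personal_data": True},
    {"name": "clickstream",  "tier": "public_cloud", "personal_data": True},
    {"name": "sensor_feed",  "tier": "edge",         "personal_data": False},
]

def assets_needing_gdpr_controls(assets):
    """Personal data is in scope regardless of tier: on-prem, cloud, or edge."""
    return sorted(a["name"] for a in assets if a["personal_data"])

print(assets_needing_gdpr_controls(ASSETS))  # -> ['clickstream', 'crm_contacts']
```

The design choice worth noting is that the tier is an attribute of the asset, not the organizing principle; that is what makes the inventory survive a migration between clouds.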
>> For your customers around the globe, specifically of course your areas of Europe and Asia, what percentage of your customers are deploying Hortonworks into a purely public cloud environment, like HDInsight on Microsoft Azure or HDP inside of AWS, versus a private, on-premises deployment, versus a hybrid public-private multi-cloud? Is it mostly on-prem? >> Most of our business is still on-prem, to be very candid. I think almost all of our customers are looking at migrating some workloads to the cloud. Even those that had intended a cloud-first strategy have now realized that not all workloads belong in the cloud. Some are actually more economically viable on-prem, and some just won't ever be able to move to the cloud because of regulation. In addition to that, most of our customers are telling us that they actually want cloud optionality. They don't want to be locked in to a single vendor, so we very much view the future as hybrid cloud, as multi-cloud, and we hear our customers telling us that rather than just a cloud strategy, they need a data strategy: a strategy to manage data no matter where it lives, on whichever tier, to ensure that they are regulatory compliant with that data, and then to be able to secure, govern, and manage those data assets at any tier. >> What percentage of your deals involve a partner? Like IBM, which is a major partner. Do you do a fair amount of co-marketing and joint sales and joint deals with IBM and other partners, or are they mostly Hortonworks-led? >> No, partners are absolutely critical to our success in the international sphere. Our partner revenue contribution across EMEA in the past year grew, every region grew, by over 150% in terms of channel contribution. Our total channel business was 28% of our total, right? That's a very significant contribution, and the growth rate is very high. IBM are a big part of that, as are many other partners. 
We've got a very significant reseller channel, and we've got IHV and ISV partners that are critical to our success also. Where we're seeing the most impact with IBM is where we go to some of these markets where we haven't had a presence previously, and they've got deep and long-standing relationships, and that helps us accelerate time to value with our customers. >> Yeah, it's been a very good and solid partnership going back several years. Well, Joe, this is great. We have to wrap it up; we're at the end of our time slot. This has been Joe Morrissey, the VP for International at Hortonworks. We're on theCUBE here at Dataworks Summit 2018 in Berlin. We want to thank you all for watching this segment, and tune in tomorrow; we'll have a full slate of further discussions with Hortonworks, with IBM, and others tomorrow on theCUBE. Have a good one. (upbeat music)

Published Date : Apr 18 2018


Keynote Analysis | Dataworks Summit 2018


 

>> Narrator: From Berlin, Germany, it's theCUBE! Covering DataWorks Summit, Europe 2018. (upbeat music) Brought to you by Hortonworks. (upbeat music) >> Hello, and welcome to theCUBE. I'm James Kobielus. I'm the lead analyst for Big Data analytics in the Wikibon team of SiliconANGLE Media, and we're here at DataWorks Summit 2018 in Berlin, Germany. And it's an excellent event, and we are here for two days of hard-hitting interviews with industry experts focused on the hot issues facing customers, enterprises, in Europe and the world over, related to the management of data and analytics. And what's super hot this year, and it will remain hot as an issue, is data privacy and privacy protection. Five weeks from now, a new regulation of the European Union called the General Data Protection Regulation takes effect, and it's a mandate that is effecting any business that is not only based in the EU but that does business in the EU. It's coming fairly quickly, and enterprises on both sides of the Atlantic and really throughout the world are focused on GDPR compliance. So that's a hot issue that was discussed this morning in the keynote, and so what we're going to be doing over the next two days, we're going to be having experts from Hortonworks, the show's host, as well as IBM, Hortonworks is one of their lead partners, as well as a customer, Munich Re, will appear on theCUBE and I'll be interviewing them about not just GDPR but really the trends facing the Big Data industry. Hadoop, of course, Hortonworks got started about seven years ago as one of the solution providers that was focused on commercializing the open source Hadoop code base, and they've come quite a ways. They've had their recent financials were very good. They continue to rock 'n' roll on the growth side and customer acquisitions and deal sizes. So we'll be talking a little bit later to Scott Gnau, their chief technology officer, who did the core keynote this morning. 
He'll be talking not only about how the business is doing but about a new product announcement, the Data Steward Studio that Hortonworks announced overnight. This new solution is directly related to, and useful for, GDPR compliance, and we'll ask Scott to bring us more insight there. But what we'll be doing over the next two days is extracting signal from noise. The Big Data space continues to grow and develop. Hadoop has been around for a number of years now, but in many ways it's been superseded in the agenda, as the priorities of enterprises building applications from data shift to some newer, primarily open source technologies such as Apache Spark, TensorFlow for building deep learning, and so forth. We'll be discussing the trend towards the deepening of the open source data analytics stack with our guests. We'll be talking with a European-based reinsurance company, Munich Re, about the data lake that they have built for their internal operations, and we'll be asking Andres Kohlmaier, their lead of data engineering, to discuss how they're using it, how they're managing their data lake, and possibly to give us some insight about how it will serve them in achieving GDPR compliance and sustaining it going forward. So we'll be looking at trends, not just in compliance, not just in the underlying technologies, but in the applications that Hadoop and Spark and so forth are being used for, and those initiatives are the same in Europe as worldwide in terms of what enterprises are doing. They're moving away from Big Data environments built primarily on data at rest, which has been Hadoop's sweet spot, towards more streaming architectures. And so Hortonworks, as I said, the show's host, has been going more deeply towards streaming architectures with its investments in NiFi and so forth. We'll be asking them to give us some insight about where they're going with that. 
We'll also be looking at the growth of multi-cloud Big Data environments. What we're seeing is a trend in the marketplace away from predominantly premises-based Big Data platforms towards public cloud-based Big Data platforms. And so Hortonworks, they are partners with a number of the public cloud providers, like IBM, which I mentioned. They've also got partnerships with Microsoft Azure, with Amazon Web Services, with Google, and so forth. We'll be asking our guests to give us some insight about where they're going in terms of their support for multi-clouds, support for edge computing, analytics, and the internet of things. Big Data increasingly is evolving towards more of a focus on serving applications at the edge, like mobile devices that have autonomous smarts, as for self-driving vehicles. Big Data is critically important for feeding, for modeling and building, the AI needed to power the intelligence in endpoints. Not just self-driving cars but intelligent appliances, conversational user interfaces for mobile devices and consumer appliances: you know, Amazon's got their Alexa, Apple's got their Siri, and so forth. So we'll be looking at those trends as well, towards pushing more of that intelligence towards the edge, and the power and the role of Big Data and data-driven algorithms, like machine learning, in driving those kinds of applications. 
Regarding the move towards more public cloud-based big data environments in the enterprise, I'll be asking Hortonworks, who of course built their business and their revenue stream primarily on on-premises deployments, to give us a sense for how they plan to evolve as a business as their customers move towards more public cloud-facing deployments. And IBM, of course, will be here in force. Tomorrow, which is a Thursday, we have several representatives from IBM to talk about their initiatives and partnerships with Hortonworks and others in the area of metadata management, and in the area of machine learning and AI development tools and collaboration platforms. We'll also be discussing the push by IBM and Hortonworks to enable greater depths of governance applied to enterprise deployments of Big Data: both data governance, an area where Hortonworks and IBM as partners have achieved a lot of traction in terms of recognition among the pacesetters in data governance for multi-cloud, unstructured, Big Data environments, but also model governance: the governing, the version control, and so forth of machine learning and AI models. Model governance is a huge push by enterprises who increasingly are doing data science, which is what machine learning is all about. Taking that competency, that practice, and turning it into more of an industrialized pipeline of building, training, and deploying into an operational environment a steady stream of machine-learning models into multiple applications: you know, edge applications, conversational UIs, search engines, eCommerce environments that are driven increasingly by machine learning that's able to process Big Data in real time and deliver next best actions and so forth, more intelligence, into all applications. 
So we'll be asking Hortonworks and IBM to net out where they're going with their partnership in terms of enabling a multi-layered governance environment that allows this pipeline, this machine-learning pipeline, this data science pipeline, to be deployed as an operational capability into more organizations. Also, one of the areas where I'll be probing our guests is automation in the machine learning pipeline. That's been a hot theme that Wikibon has seen in our research. A lot of vendors in the data science arena are adding automation capabilities to their machine-learning tools. Automation is critically important for productivity. Data scientists as a discipline are in limited supply. I mean, experienced, trained, seasoned data scientists fetch a high price. There aren't that many of them, so more of the work they do needs to be automated. It can be automated by the increasingly mature tools on the market from a growing range of vendors. I'll be asking IBM and Hortonworks to net out where they're going with automation inside their Big Data and machine learning tools and partnerships going forward. So really what we're going to be doing over the next few days is looking at these trends, but it's going to come back down to GDPR as a core envelope that many companies attending this event, DataWorks Summit, Berlin, are facing. So I'm James Kobielus with theCUBE. Thank you very much for joining us, and we look forward to starting our interviews in just a little while. Our first up will be Scott Gnau from Hortonworks. Thank you very much. (upbeat music)

Published Date : Apr 18 2018


Chris Selland, Unifi Software | Big Data SV 2018


 

>> Voiceover: Live from San Jose, it's The Cube. Presenting Big Data Silicon Valley, brought to you by SiliconANGLE Media and its ecosystem partners. >> Welcome back to The Cube, our continuing coverage of our event, Big Data SV. We're on day two of this event. I'm Lisa Martin, with George Gilbert. We had a great day yesterday, learning a lot and really peeling back the layers of big data, looking at it from different perspectives, from challenges to opportunities. Joining us next is one of our Cube alumni, Chris Selland, the VP of Strategic Alliances at Unifi Software. Chris, great to see you, welcome back! >> Thank you, Lisa, it's great to be here. I have to say, as an alumni and a many-time speaker, this venue is spectacular. Congratulations on the growth of The Cube, and this is an awesome venue. I've been on The Cube a bunch of times, and this is as nice as I've ever seen it. >> Yeah, this is pretty cool. >> Onward and upward. This place is great. Isn't it cool? >> It really is. This is our 10th Big Data event; we've been having five now in San Jose, and we do our fifth one in New York City in the fall. And it's always interesting because we get the chance, George and I, and the other hosts, to really look at what is going on from different perspectives in the industry of big data. So before we kind of dig into that, tell us a little bit about Unifi Software: what do you guys do, what is unique and differentiating about Unifi? >> Sure, yeah, so I joined Unifi a little over a year ago. You know, I was attracted to the company because it really, I think, is aligned with where the market is going, and Peter Burris talked this morning about networks of data. Unifi is fundamentally a data catalog and data preparation platform, kind of combined or unified together. So, you know, so people say, "What do you do?" We're a data catalog with integrated data preparation. 
And the idea behind that, to go to Peter's, you know, mention of networks of data, is that data is becoming more and more distributed in terms of where it is, where it lives, where it sits. This idea of we're going to put everything in the data warehouse, and then we're going to put everything in the data lake, well, in reality, some of the data's in the warehouse, some of the data's in the lake, some of the data's in SaaS applications, some of the data's in blob storage. And where is all of that data, what is it, and what can I do with it? That's really the fundamental problem that we solve. And, by the way, we solve it for business people, because it's not just data scientists anymore, it's really going out into the entire business community now: you know, marketing people, operations people, finance people, they need data to do their jobs. Their jobs are becoming more data driven, but they're not necessarily data people. They don't know what schemas are, or joins are, but they know, "I need better data to be able to do my job more effectively." So that's really what we're helping with. >> So, Chris, it's kind of interesting, if you distill, you know, the capability down to the catalog and the prep-- >> Chris: Yep. >> So that it's ready for a catalog. But that sort of thing is like investment in infrastructure, in terms of, like, building the highway system: for those early highways, there's got to be routes, you know, a reason to build them out. What are some of those early use cases that justify the investment in data infrastructure? 
There absolutely are, and by the way, those routes don't go away; those routes, you know, just like cities, right? New routes get built on top of them. So we're very much, you know, about, there's still data sitting in mainframes and legacy systems, and you know, that data is absolutely critical for many large organizations. We do a lot of work in banking and financial services, and healthcare. They're still-- >> George: Are there common use cases that they start with? >> A lot of times-- >> Like, either by industry or just cross-sectional? >> Well, it's interesting, because, you know, analysts like yourselves have tended to put data catalog, which is a relatively new term, although some other big analyst firm that's having another conference this week, one that starts with a "G," right? They were telling us recently that data catalog is now the number one search term they're getting. But it's been, by many analysts, also kind of lumped in, lumped in's the wrong word, but incorporated with data governance. So traditionally, governance, another word that starts with "G," has been the term. So, we're not a traditional data governance platform, per se, but cataloging data has to have a foundation of security and governance. You know, think about what's going on in the world right now, both in the court of law and the court of public opinion, things like GDPR, right? So GDPR sort of says any customer data you have needs to be managed a certain way, with a certain level of sensitivity, and then there's other capabilities you need to open up to customers, like the right to be forgotten. So that means I need to have really good knowledge of, control over, and governance over my customer data. I talked about all those business people before. Certainly marketers are a great example. Marketers want all the customer data they can get, right? But there's social security numbers, PII: who should be able to see and use what? Because, if this data is used inappropriately, then it can cause a lot of problems. So, IT kind of sits in a-- they want to enable the business, but at the same time, there's a lot of risk there. 
So, anyway, going back to your question: the catalog market has kind of evolved out of the governance market, with more of a focus on, you know, enabling the business, but making sure that it's done in a secure and well-governed way. >> George: Guard rails. >> Yes, guard rails, exactly, good way to say it. So, yep, that's good, I said about 500 words, and you distilled it to about two, right? Perfect, yep. 
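The "guard rails" idea Chris and George land on can be sketched as role-based views over customer records, where sensitive fields are masked for business users but visible to stewards. Everything here is hypothetical for illustration: the field names, roles, and masking rule are invented, and this is not how Unifi or any specific catalog enforces policy.

```python
# Fields treated as sensitive PII in this toy policy.
SENSITIVE = {"ssn", "date_of_birth"}

def mask(value):
    # Keep the last 4 characters visible, mask the rest.
    return "*" * max(len(value) - 4, 0) + value[-4:]

def view_for(role, record):
    if role == "data_steward":  # stewards see everything
        return dict(record)
    # Business roles (e.g. marketers) get masked sensitive fields.
    return {k: (mask(v) if k in SENSITIVE else v) for k, v in record.items()}

customer = {"name": "Ada Lovelace", "ssn": "123-45-6789", "segment": "premium"}
print(view_for("marketer", customer)["ssn"])  # -> '*******6789'
```

The marketer still gets the non-sensitive fields needed to do the job, which is the point of guard rails as opposed to simply locking the data away.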
We're working very closely with them on their marketing cloud, their experience cloud, which is what they call their enterprise marketing cloud, which obviously, big, big focus on customer data, and then we've been working with a number of organizations in the sort of professional services, system integration space. We've had a lot of success with a firm called Access Group. We announced the partnership with them about two weeks ago. They've been a great partner for us, as well. So, you know, it's all about an ecosystem. Making customers successful is about getting an ecosystem together, so it's a really exciting place to be. >> So, Chris, it's actually interesting, it sounds like there are sort of two classic routes to market. One is essentially people building your solution into theirs, whether it's an application or, you know, >> Chris: An enabling layer. >> Yes. >> Chris: Yes. >> Even higher layer. But with corporate developers, you know, it's almost like we spent years experimenting with these data lakes. But they were a little too opaque. >> Chris: Yes. >> And you know, it's not just that you provide the guard rails, but you also provide, sort of some transparency-- >> Chris: Yes. >> Into that. Have you seen a greater success rate within organizations who curate their data lakes, as opposed to those who, you know, who don't? >> Yes, absolutely. I think Peter said it very well in his presentation this morning, as well. That, you know, generally when you see data lake, we associate it with Hadoop. There are use cases that Hadoop is very good for, but there are others where it might not be the best fit. 
Which, to the early point about networks of data and distributed data, so companies that have, or organizations that have approached Hadoop with a "let's use it for what it's good for," as opposed to "let's just dump everything in there and figure it out later," and there have been a lot of the latter, but the former have done, generally speaking, a lot better, and that's what you're seeing. And we actually use Hadoop as a part of our platform, at least for the data preparation and transformation side of what we do. We use it as an enabling technology, as well. >> You know, it's funny, actually, when you talk about, as Peter talked about, networks of data versus centralized repositories. Scott Gnau, CTO of Hortonworks, was on yesterday, and he was talking about how he had originally come from Teradata, and that they had tried to do work, that he had tried to push them in the direction of recognizing that not all the analytic data was going to be in Teradata, you know, but they had to look more broadly with Hadapt, and I forgot what the rest of, you know-- >> Chris: Right, Aster, and-- >> Aster, yeah. >> Chris: Yes, exactly, yep. >> But what was interesting is that Hortonworks was moving towards the "we believe everything is going to be in the data lake," but now, with their data plane service, they're talking about, you know, "We have to give you visibility and access." You mediate access to data everywhere. >> Chris: Right. >> So maybe help, so for folks who aren't, like, all bought into Hortonworks, for example, how much, you know, explain how you work relative to data plane service. >> Well, you know, maybe I could step back and give you a more general answer, because I agree with that philosophically, right? 
That, as I think we've been talking about here, with the networks of data, that goes back to my prior statement that there's, you know, there's different types of data platforms that have different use cases, and different types of solutions should be built on top of them, so things are getting more distributed. I think that, you know, Hortonworks, like every company, has to make the investments that, as we are, make their customers successful. So, using Hadoop, and Hortonworks is one of our supported Hadoop platforms, we do work with them on engagements, but you know, it's all about making customers successful, ultimately. It's not about a particular product, it's about, you know, which data belongs in which location, and for what use case and what purpose, and then at the same time, when we're taking all of these different data sets and data sources, and cataloging them and preparing them and creating our output, where should we put that and catalog that, so we can create kind of a continuous improvement cycle, as well, and for those types-- >> A flywheel. >> A flywheel, exactly, continuous improvement flywheel, and for those types of purposes, you know, that's actually a great use case for, you know, Hortonworks, Hadoop. That's a lot of what we typically use it for. We can actually put the data any place our customers define, but that's very often what we do with it, and then, but doing it in a very structured and organized way. As opposed to, you know, a lot of the early Hadoop projects that went bad, and this isn't specific to any particular distro, it was just like, "Let's just dump it all into Hadoop because it's cheaper." You know, "'Cause it's cheaper than the warehouse, so let's just put it all in there, and we'll figure out what to do with it later." That's bad, but if you're using it in a structured way, it can be extremely useful. 
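The "catalog your own output" flywheel described here can be sketched in a few lines. All the names below are invented for illustration (this is not Unifi's API): the idea is simply that a preparation step registers its result back in the catalog, with lineage to its inputs, so the next user can discover it.

```python
# Illustrative sketch only: after a prep job runs, register its output
# back in the catalog so it feeds the continuous improvement flywheel.
from datetime import datetime, timezone

catalog = {}  # dataset name -> metadata entry

def register(name, location, source_names, tags=()):
    """Catalog a dataset, recording lineage back to its inputs."""
    catalog[name] = {
        "location": location,
        "lineage": list(source_names),
        "tags": set(tags),
        "registered": datetime.now(timezone.utc).isoformat(),
    }

def prepare_and_catalog(sources, output_name, output_location, transform):
    """Run a prep step, then immediately make its output discoverable."""
    result = transform(sources)
    register(output_name, output_location, sources, tags={"prepared"})
    return result

# A trivial "prep" step: join two raw datasets the catalog already knows.
register("raw_orders", "hdfs:///raw/orders", [])
register("raw_customers", "hdfs:///raw/customers", [])
prepare_and_catalog(
    ["raw_orders", "raw_customers"],
    "orders_enriched", "hdfs:///prepared/orders_enriched",
    transform=lambda srcs: f"joined({', '.join(srcs)})",
)
print(catalog["orders_enriched"]["lineage"])  # ['raw_orders', 'raw_customers']
```

Because registration happens inside the prep step rather than as a separate hand-off, nothing prepared can exist without being visible in the catalog, which is the integration advantage discussed below.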
At the same point, and at the same time, not everything that's going to go there belongs there, if you're being thoughtful about it. So you're seeing a lot more thoughtfulness these days, which is good. Which is good for customers, and it's good for us on the vendor side. Us, Hortonworks, everybody, so. >> So is there, maybe you can tell us about the different approaches, like the advantage of integrating the data prep with the catalog service, because as soon as you're done with data prep it's visible within the catalog. >> Chris: Absolutely, that's one, yep. >> When, let's say when people do derive additional views into the data, how are they doing that in a way that then gets also registered back in the catalog, for further discovery? >> Yeah, well, having the integrated data preparation is a huge differentiator for us, there are a lot of data catalog products out there, but our huge differentiator, one of them, is the fact that we have integrated data preparation. We don't have to hand off to another product, so that, as you said, gives us the ability to then catalog our output and build that flywheel, that continuous improvement flywheel, and it also just basically simplifies things for customers, hence our name. So, you know, it really kind of starts there. I think I, the second part of your question I didn't really, rewind back on that for me, it was-- >> Go ahead. >> Well, I'm not sure I remember it, right now, either. >> We all need more coffee. >> Exactly, we all need more coffee. >> So I'll ask you this last question, then. >> Yes, please. >> What are, so here we are in March 2018, what are you looking forward to, in terms of momentum and evolution of Unifi this year? 
>> Well, a lot of it, and tying into my role, I mentioned we will be at Adobe Summit in two weeks, so if you're going to be at Adobe Summit, come see us there, some of the work that we're doing with our partners, some of the events we're doing with people like Microsoft and Access, but really it's also just customer success, I mean, we're seeing tremendous momentum on the customer side, working with our customers, working with our partners, and again, as I mentioned, we're seeing so much more thoughtfulness in the market these days, and less talk about, you know, the speeds and feeds, and more around business solutions. That's really also where our professional services, system integration partners, many of whom I've been with this week, really help, because they're building out solutions. You know, GDPR is coming in May, right? And you're starting to really see a groundswell of, okay, you know, and that's not about, you know, speeds and feeds. That's ultimately about making sure that I'm compliant with, you know, this huge regulatory environment. And at the same time, the court of public opinion is just as important. You know, we want to make sure that we're doing the right thing with data. Spread it throughout the organization, make ourselves successful and make our customers successful. So, it's a lot of fun. >> That's, fun is good. >> Exactly, fun is good. >> Well, we thank you so much, Chris, for stopping back by theCUBE and sharing your insights, what you're hearing in the big data industry, and some of the momentum that you're looking forward to carrying throughout the year. >> It's always a pleasure, and you, too. So, love the venue. >> Lisa: All right. >> Thank you, Lisa, thank you, George. >> Absolutely. We want to thank you for watching theCUBE. You're watching our coverage of our event, Big Data SV, hashtag BigDataSV, for George, I almost said George Martin. For George Gilbert. >> George: I wish. >> George R.R., yeah. 
You would not be here if you were George R.R. Martin. >> George: No, I wouldn't. >> That was a really long way to say thank you for watching. I'm Lisa Martin, for this George. Stick around, we'll be right back with our next guest. (techno music)

Published Date : Mar 8 2018

ENTITIES

Entity | Category | Confidence
George Gilbert | PERSON | 0.99+
Lisa Martin | PERSON | 0.99+
Chris | PERSON | 0.99+
Peter | PERSON | 0.99+
Chris Selland | PERSON | 0.99+
George | PERSON | 0.99+
Microsoft | ORGANIZATION | 0.99+
Scott Gnau | PERSON | 0.99+
Lisa | PERSON | 0.99+
March 2018 | DATE | 0.99+
Adobe | ORGANIZATION | 0.99+
San Jose | LOCATION | 0.99+
Peter Burris | PERSON | 0.99+
Unifi | ORGANIZATION | 0.99+
New York City | LOCATION | 0.99+
George R.R. Martin | PERSON | 0.99+
AWS | ORGANIZATION | 0.99+
Unifi Software | ORGANIZATION | 0.99+
May | DATE | 0.99+
Teradata | ORGANIZATION | 0.99+
George Martin | PERSON | 0.99+
George R.R. | PERSON | 0.99+
Hortonworks | ORGANIZATION | 0.99+
Access Group | ORGANIZATION | 0.99+
yesterday | DATE | 0.99+
both | QUANTITY | 0.99+
five | QUANTITY | 0.99+
GDPR | TITLE | 0.99+
SiliconANGLE Media | ORGANIZATION | 0.98+
this week | DATE | 0.98+
Hadapt | ORGANIZATION | 0.98+
fifth one | QUANTITY | 0.98+
about 500 words | QUANTITY | 0.98+
Hadoop | TITLE | 0.98+
Adobe Summit | EVENT | 0.98+
one | QUANTITY | 0.98+
two weeks | QUANTITY | 0.98+
One | QUANTITY | 0.96+
Aster | PERSON | 0.96+
this morning | DATE | 0.96+
this year | DATE | 0.95+
two weeks ago | DATE | 0.95+
today | DATE | 0.95+
The Cube | ORGANIZATION | 0.95+
Cube | ORGANIZATION | 0.93+
Big Data | EVENT | 0.91+
day two | QUANTITY | 0.91+
Access | ORGANIZATION | 0.9+

George Chow, Simba Technologies - DataWorks Summit 2017



>> (Announcer) Live from San Jose, in the heart of Silicon Valley, it's theCUBE covering DataWorks Summit 2017, brought to you by Hortonworks. >> Hi everybody, this is George Gilbert, Big Data and Analytics Analyst with Wikibon. We are wrapping up our show on theCUBE today at DataWorks 2017 in San Jose. It has been a very interesting day, and we have a special guest to help us do a survey of the wrap-up, George Chow from Simba. We used to call him Chief Technology Officer, now he's Technology Fellow, but when he was explaining the difference in titles to me, I thought he said Technology Felon. (George Chow laughs) But he's since corrected me. >> Yes, very much so >> So George and I have been, we've been looking at both Spark Summit last week and DataWorks this week. What are some of the big advances that really caught your attention? >> What's caught my attention actually is how much manufacturing has really, I think, caught onto the streaming data. I think last week was very notable that both Volkswagen and Audi actually had case studies for how they're using streaming data. And I think just before the break now, there was also a similar session from Ford, showcasing what they are doing around streaming data. >> And are they using the streaming analytics capabilities for autonomous driving, or is it other telemetry that they're analyzing? >> The, what is it, I think the Volkswagen study was production, because I still have to review the notes, but the one for Audi was actually quite interesting because it was for managing paint defects. >> (George Gilbert) For paint-- >> Paint defects. >> (George Gilbert) Oh. >> So what they were doing, they were essentially recording the environmental conditions that they were painting the cars in, basically the entire pipeline-- >> To predict when there would be imperfections. >> (George Chow) Yes. >> Because paint is an extremely high-value sort of step in the assembly process. 
Yes, what they are trying to do is to essentially make a connection between downstream defects, like future defects, and somewhat trying to pinpoint the causes upstream. So the idea is that if they record all the environmental conditions early on, they could turn around and hopefully figure it out later on. >> Okay, this sounds really, really concrete. So what are some of the surprising environmental variables that they're tracking, and then what's the technology that they're using to build the model and then anticipate if there's a problem? >> I think the surprising findings, they said, were actually, I think it was humidity or fan speed, if I recall, at the time when the paint was being applied, because essentially, paint has to be... Paint is very sensitive to the conditions under which it is being applied to the body. So my recollection is that one of the findings was that it was a narrow window during which the conditions were, like, ideal, in terms of having the least amount of defects. >> So, had they built a digital twin style model, where it's like a digital replica of some aspects of the car, or was it more of a predictive model that had telemetry coming at it, and when it's outside certain bounds they know they're going to have defects downstream? >> I think they're still working on the predictive model, or actually the model is still being built, because they are essentially trying to build that model to figure out how they should be tuning the production pipeline. >> Got it, so this is sort of still in the development phase? >> (George Chow) Yeah, yeah >> And can you tell us, did they talk about the technologies that they're using? >> I remember the... It's a little hazy now, because after a couple weeks of conferences I don't remember the specifics, so I was counting on the recordings to come out in a couple weeks' time. So I'll definitely share that. It's a case study to keep an eye on. 
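The "narrow ideal window" idea described above can be sketched very simply. The data and bin width below are entirely made up (the Audi case study did not publish numbers): bin historical paint runs by booth humidity, compute a defect rate per bin, and look for the window where the rate bottoms out.

```python
# Toy sketch of the analysis described in the interview: find the
# humidity window with the lowest downstream paint-defect rate.
from collections import defaultdict

# (humidity %, defect observed later?) for past paint runs -- invented data.
runs = [
    (38, True), (41, True), (44, False), (45, False), (46, False),
    (47, False), (49, True), (52, True), (55, True), (45, False),
]

def defect_rate_by_bin(runs, bin_width=5):
    """Group runs into humidity bins and return the defect rate per bin."""
    stats = defaultdict(lambda: [0, 0])  # bin start -> [defects, total]
    for humidity, defective in runs:
        b = (humidity // bin_width) * bin_width
        stats[b][0] += int(defective)
        stats[b][1] += 1
    return {b: defects / total for b, (defects, total) in stats.items()}

rates = defect_rate_by_bin(runs)
best = min(rates, key=rates.get)
print(f"lowest defect rate in humidity bin [{best}, {best + 5})")
```

A real model would condition on many variables at once (fan speed, temperature, line speed), but the shape of the question, which upstream window predicts the fewest downstream defects, is the same.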
So tell us, were there other ones where this use of real-time or near real-time data had some applications that we couldn't do before, because we now can do things with very low latency? >> I think that's the one that I was looking forward to with Ford. That was the session just earlier, I think about an hour ago. The session actually consisted of a demo that was being done live, you know. It was being streamed to us where they were showcasing the data that was coming off a car that's been rigged up. >> So what data were they tracking and what were they trying to anticipate here? >> They didn't give enough detail, but it was basically data coming off of the CAN bus of the car, so if anybody is familiar with the-- >> Oh that's right, you're a car guru, and you and I compare, well our latest favorite is the Porsche Macan >> Yes, yes. >> SUV, okay. >> But yeah, they were looking at streaming the performance data of the car as well as the location data. >> Okay, and... Oh, this sounds more like a test case, like can we get telemetry data that might be good for insurance or for... >> Well they've built out the system enough using the Lambda Architecture with Kafka, so they were actually consuming the data in real time, and the demo was actually exactly seeing the data being ingested and being acted on. So in the case they were doing a simplistic visualization of just placing the car on the Google Map so you can basically follow the car around. >> Okay so, what were the technical components in the car, and then, how much data were they sending to some, or where was the data being sent to, or how much of the data? >> The data was actually sent, streamed, all the way into Ford's own data centers. So they were using NiFi with all the right proxy-- >> (George Gilbert) NiFi being from Hortonworks there. >> Yeah, yeah >> The Hortonworks data flow, okay >> Yeah, with all the appropriate proxies and firewalls to bring it all the way into a secure environment. 
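The Lambda Architecture mentioned here can be sketched without any of the real infrastructure. In this stdlib-only toy (Kafka, NiFi, and the actual Ford pipeline are replaced by a plain list of events, purely for illustration), a speed layer updates a real-time view per event, a batch layer periodically recomputes from the full log, and a serving layer merges the two to answer queries.

```python
# Minimal sketch of the Lambda Architecture pattern: batch + speed
# layers over an append-only event log, merged at query time.
events = []       # stands in for the Kafka topic of car telemetry

batch_view = {}   # batch layer: authoritative, recomputed periodically
speed_view = {}   # speed layer: incremental, covers events since last batch

def ingest(car_id, speed_mph):
    """Speed layer: update the real-time view as each event arrives."""
    events.append((car_id, speed_mph))
    speed_view[car_id] = max(speed_view.get(car_id, 0), speed_mph)

def run_batch():
    """Batch layer: recompute from the full log, then reset the speed layer."""
    for car_id, mph in events:
        batch_view[car_id] = max(batch_view.get(car_id, 0), mph)
    speed_view.clear()

def query(car_id):
    """Serving layer: merge batch and speed views for the answer."""
    return max(batch_view.get(car_id, 0), speed_view.get(car_id, 0))

ingest("car-1", 40)
ingest("car-1", 55)
run_batch()            # batch view now knows the max of 55
ingest("car-1", 48)    # arrives after the batch run; speed layer covers it
print(query("car-1"))  # 55
```

The design point is that the batch layer can be slow and thorough while the speed layer stays cheap and approximate, and neither needs to know about the other except at query time.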
>> Wow >> So it was quite impressive from the point of view of, it was live data coming off of the 4G modem, well actually being uploaded through the 4G modem in the car. >> Wow, okay, did they say how much compute and storage they needed in the device, in this case the car? >> I think they were using a very lightweight platform. They were streaming apparently from a Raspberry Pi. >> (George Gilbert) Oh, interesting. >> But they were very guarded about what was inside the data center because, you know, for competitive reasons, they couldn't share much about how big or how large a scale they could operate at. >> Okay, so Simba has been doing ODBC and JDBC drivers, the standard APIs, to databases for a long time. That was all about, that was an era where either it was interactive or batch. So, how is streaming, sort of big picture, going to change the way applications are built? >> Well, one way to think about streaming is that if you look at many of these APIs, into these systems, like Spark is a good example, where they're trying to harmonize streaming and batch, or rather, to take away the need to deal with it as a streaming system as opposed to a batch system, because it's obviously much easier to think about and reason about your system when it is traditional, like in the traditional batch model. So, the way that I see it also happening is that streaming systems will, you could say will adapt, will actually become easier to build, and everyone is trying to make it easier to build, so that you don't have to think about and reason about it as a streaming system. 
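The harmonization George describes, writing the logic once so it runs the same way over batch and streaming inputs, can be illustrated in miniature (this is the general idea, not Spark's actual API): express the transformation against an iterable, and feed it either a finite list or an unbounded generator.

```python
# Sketch of the "write it once" idea: the same business logic runs over
# a finite batch or an unbounded stream, because it is expressed against
# an iterable rather than against a specific engine.
from typing import Iterable, Iterator

def over_threshold(readings: Iterable[float], limit: float) -> Iterator[float]:
    """The business logic, written once, with no batch-vs-stream awareness."""
    for r in readings:
        if r > limit:
            yield r

# Batch mode: a finite list.
print(list(over_threshold([1.0, 7.5, 3.2, 9.1], limit=5.0)))  # [7.5, 9.1]

# Streaming mode: the same function over a (potentially endless) generator.
def sensor_stream():
    for r in (2.0, 6.0, 8.0):  # imagine this never terminates
        yield r

for alert in over_threshold(sensor_stream(), limit=5.0):
    print("alert:", alert)
```

This is the reasoning-about-your-system point: the developer thinks in the familiar batch model, and the engine underneath decides whether the input ever ends.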
And I assume if you were in a streaming-first solution, you would explicitly know whether you have all the data or not, as opposed to late-arriving stuff that might come later. >> Yes, but what I'm referring to is actually the programming model. All I'm saying is that more and more people will want streaming applications, but more and more people need to develop them quickly, without having to build them in a very specialized fashion. So when you look at, let's say the example of Spark, when they focus on structured streaming, the whole idea is to make it possible for you to develop the app without having to write it from scratch. And the comment about SQL is actually exactly on point, because the idea is that you want to work with the data, you could say, without a lot of work to account for the fact that it is actually streaming data that could arrive out of order, even, so the whole idea is that if you can build applications in a more consistent way, irrespective of whether it's batch or streaming, you're better off. >> So, last week even though we didn't have a major release of Spark, we had like a point release, or a discussion about the 2.2 release, and that's of course very relevant for our big data ecosystem since Spark has become the compute engine for it. Explain the significance where the reaction time, the latency for Spark, went down from several hundred milliseconds to one millisecond or below. What are the implications for the programming model and for the applications you can build with it. >> Actually, hitting that new threshold, the millisecond, is actually a very important milestone because when you look at a typical scenario, let's say with AdTech where you're serving ads, you really only have, maybe, on the order of about 100 or maybe 200 milliseconds max to actually turn around. >> And that max includes a bunch of things, not just the calculation. 
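The out-of-order problem raised here has a standard shape: events carry an event time, arrive late, and are aggregated into event-time windows, with a watermark deciding when a window is complete. The toy below is not Spark's implementation, just the concept in plain Python with invented numbers (10-second windows, 5 seconds of allowed lateness).

```python
# Toy event-time windowing with a watermark: late events are accepted
# while their window is still open, and dropped once it is finalized.
windows = {}          # window start -> count of events in that window
WINDOW = 10           # window width, in seconds of event time
ALLOWED_LATENESS = 5  # watermark trails the max event time by this much
max_event_time = 0

def on_event(event_time):
    """Assign the event to its event-time window unless it is too late."""
    global max_event_time
    watermark = max_event_time - ALLOWED_LATENESS
    start = (event_time // WINDOW) * WINDOW
    if start + WINDOW <= watermark:
        return False  # window already finalized; event is dropped
    windows[start] = windows.get(start, 0) + 1
    max_event_time = max(max_event_time, event_time)
    return True

# Events arrive out of order; t=3 is late but inside the lateness bound.
for t in [1, 12, 3, 14, 27]:
    on_event(t)
print(windows)            # {0: 2, 10: 2, 20: 1}
print(on_event(2))        # False: the watermark has passed window [0, 10)
```

The trade-off the conversation circles around lives in `ALLOWED_LATENESS`: a larger value means fewer dropped stragglers but a longer wait before any window's answer is final.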
Yeah, and that, let's say 100 milliseconds, includes transfer time, which means that in your real budget, you only have allowances for maybe under 10 to 20 milliseconds to compute and do any work. So being able to actually have a system that delivers millisecond-level performance actually gives you the ability to use Spark right now in that scenario. >> Okay, so in other words, now they can claim, even if it's not per-event processing, they can claim that they can react so fast that it's as good as per-event processing, is that fair to say? >> Yes, yes that's very fair. >> Okay, that's significant. So, what type... How would you see applications changing? We've only got another minute or two, but how do you see applications changing now that Spark has been designed for people that have traditional, batch-oriented skills, but who can now learn how to do streaming, real-time applications without learning anything really new. How will that change what we see next year? >> Well I think we should be careful to not pigeonhole Spark as something built for batch, because I think the idea is that, you could say, the originators of Spark know that it's all about the ease of development, and it's the ease of reasoning about your system. It's not that the technology is built for batch; it's that the knowledge and experience you have, and an API that is actually familiar, you should be able to leverage for something that you build for streaming. That's the power, you could say. That's the strength of what the Spark project has taken on. >> Okay, we're going to have to end it on that note. There's so much more to go through. George, you will be back as a favorite guest on the show. There will be many more interviews to come. >> Thank you. >> With that, this is George Gilbert. We are at DataWorks 2017 in San Jose. We had a great day today. We learned a lot from Rob Bearden and Rob Thomas up front about the IBM deal. 
We had Scott Gnau, CTO of Hortonworks on several times, and we've come away with an appreciation for a partnership now between IBM and Hortonworks that can take the two of them into a set of use cases that neither one on its own could really handle before. So today was a significant day. Tune in tomorrow, we have another great set of guests. Keynotes start at nine, and our guests will be on starting at 11. So with that, this is George Gilbert, signing out. Have a good night. (energetic, echoing chord and drum beat)

Published Date : Jun 13 2017

ENTITIES

Entity | Category | Confidence
IBM | ORGANIZATION | 0.99+
George | PERSON | 0.99+
Hortonworks | ORGANIZATION | 0.99+
George Gilbert | PERSON | 0.99+
Scott Gnau | PERSON | 0.99+
Rob Bearden | PERSON | 0.99+
Audi | ORGANIZATION | 0.99+
Rob Thomas | PERSON | 0.99+
San Jose | LOCATION | 0.99+
George Chow | PERSON | 0.99+
Ford | ORGANIZATION | 0.99+
last week | DATE | 0.99+
Silicon Valley | LOCATION | 0.99+
one millisecond | QUANTITY | 0.99+
two | QUANTITY | 0.99+
next year | DATE | 0.99+
100 milliseconds | QUANTITY | 0.99+
200 millisecond | QUANTITY | 0.99+
today | DATE | 0.99+
tomorrow | DATE | 0.99+
Volkswagon | ORGANIZATION | 0.99+
this week | DATE | 0.99+
Google Map | TITLE | 0.99+
AdTech | ORGANIZATION | 0.99+
DataWorks 2017 | EVENT | 0.98+
DataWorks Summit 2017 | EVENT | 0.98+
both | QUANTITY | 0.98+
11 | DATE | 0.98+
Spark | TITLE | 0.98+
Wikibon | ORGANIZATION | 0.96+
under 10 | QUANTITY | 0.96+
one | QUANTITY | 0.96+
20 milliseconds | QUANTITY | 0.95+
Spark Summit | EVENT | 0.94+
first solution | QUANTITY | 0.94+
SQL | TITLE | 0.93+
hundred milliseconds | QUANTITY | 0.93+
2.2 | QUANTITY | 0.92+
one way | QUANTITY | 0.89+
Spark | ORGANIZATION | 0.88+
Lambda Architecture | TITLE | 0.87+
Kafka | TITLE | 0.86+
minute | QUANTITY | 0.86+
Porche Macan | ORGANIZATION | 0.86+
about 100 | QUANTITY | 0.85+
ODBC | TITLE | 0.84+
DataWorks | EVENT | 0.84+
NiFi | TITLE | 0.84+
about an hour ago | DATE | 0.8+
JDBC | TITLE | 0.79+
Raspberry Pi | COMMERCIAL_ITEM | 0.76+
Simba | ORGANIZATION | 0.75+
Simba Technologies | ORGANIZATION | 0.74+
couples weeks' | QUANTITY | 0.7+
CTO | PERSON | 0.68+
theCUBE | ORGANIZATION | 0.67+
twin | QUANTITY | 0.67+
couple weeks | QUANTITY | 0.64+