
Search Results for DataWorks Summit 2018:

VideoClipper Reel | DataWorks Summit 2018


 

Like railroads and shipping in the 1800s and oil in the 1900s, data really is the wealth creator of this century, and so that creates a very nerve-wracking environment. It also creates an environment of very agile and very important technological breakthroughs that enable those things to be turned into [value]. We believe everything is data-driven, and in fact we would argue that data is more valuable than oil or diamonds or plutonium or platinum or silver or anything else; it is the most valuable asset, whether you be a global Fortune 500 or a midsize company [inaudible]. We're in the business of helping customers do better with the data they have without spending more, whether it's on-prem or in the cloud. We want to help customers be comfortable getting more data under management, along with security and governance and a lower TCO. [Imagine a plane] flying from Atlanta, Georgia to London: you want to be able to make sure you really understand how well each component is performing, so that if that plane is going to need service when it gets there, it doesn't miss the turnaround and leave 300 passengers stranded or delayed. Right now, with our connected [platforms], we have the ability to take every piece of data from every component that's generated, see that in real time, and let the airlines make [real time decisions]. [Music]

Published Date : Jun 26 2018

**Summary and Sentiment Analysis are not shown because of an improper transcript.**

ENTITIES

| Entity | Category | Confidence |
| --- | --- | --- |
| London | LOCATION | 0.99+ |
| 300 passengers | QUANTITY | 0.99+ |
| Atlanta Georgia | LOCATION | 0.99+ |
| TCO | ORGANIZATION | 0.99+ |
| each component | QUANTITY | 0.99+ |
| 1900s | DATE | 0.99+ |
| 1800s | DATE | 0.98+ |
| DataWorks Summit 2018 | EVENT | 0.94+ |
| every piece | QUANTITY | 0.93+ |
| every component | QUANTITY | 0.91+ |
| third | QUANTITY | 0.9+ |
| this century | DATE | 0.75+ |
| VideoClipper | EVENT | 0.73+ |
| 500 | QUANTITY | 0.7+ |

Steve Wooledge, Arcadia Data & Satya Ramachandran, Neustar | DataWorks Summit 2018


 

(upbeat electronic music) >> Live from San Jose, in the heart of Silicon Valley, it's theCUBE. Covering DataWorks Summit 2018, brought to you by Hortonworks. (electronic whooshing) >> Welcome back to theCUBE's live coverage of DataWorks, here in San Jose, California. I'm your host, Rebecca Knight, along with my co-host, James Kobielus. We have two guests in this segment, we have Steve Wooledge, he is the VP of Product Marketing at Arcadia Data, and Satya Ramachandran, who is the VP of Engineering at Neustar. Thanks so much for coming on theCUBE. >> Our pleasure and thank you. >> So let's start out by setting the scene for our viewers. Tell us a little bit about what Arcadia Data does. >> Arcadia Data is focused on getting business value from these modern scale-out architectures, like Hadoop, and the Cloud. We started in 2012 to solve the problem of how do we get value into the hands of the business analysts that understand a little bit more about the business, in addition to empowering the data scientists to deploy their models and value to a much broader audience. So I think that's been, in some ways, the last mile of value that people need to get out of Hadoop and data lakes, is to get it into the hands of the business. So that's what we're focused on. >> And start seeing the value, as you said. >> Yeah, seeing is believing, a picture is a thousand words, all those good things. And what's really emerging, I think, is companies are realizing that traditional BI technology won't solve the scale and user concurrency issues, because architecturally, big data's different, right? We're on the scale-out, MPP architectures now, like Hadoop, the data complexity and variety has changed, but the BI tools are still the same, and you pull the data out of the system to put it into some little micro cube to do some analysis. Companies want to go after all the data, and view the analysis across a much broader set, and that's really what we enable. >> I want to hear about the relationship between your two companies, but Satya, tell us a little about Neustar, what you do. >> Neustar is an information services company, we are built around identity. We are the premier identity provider, the most authoritative identity provider for the US. And we built a whole bunch of services around that identity platform. I am part of the marketing solutions group, and I head the analytics engineering for marketing solutions. The product that I work on helps marketers do their annual planning, as well as their campaign or tactical planning, so that they can fine tune their campaigns on an ongoing basis. >> So how do you use Arcadia Data's primary product? >> So we are a predictive analytics platform, the reporting solution, we use Arcadia for the reporting part of it. So we have multi terabytes of advertising data in our [platform], and so we use Arcadia to provide fast access to our customers, and also very granular and explorative analysis of this data. High (mumbles) and explorative analysis of this data. >> So you say you help your customers with their marketing campaigns, so are you doing predictive analytics? And are you doing churn analysis and so forth? And how does Arcadia fit into all of that? >> So we get data and then they build an activation model, which tells how the marketing spend corresponds to the revenue. We not only do historical analysis, we also do predictive, in the sense that the marketers frequently do what-if analysis, saying that, what if I moved my budget from paid search to TV?
And how does it affect the revenue? So all of this modeling is built by Neustar, the modeling platform is built by Neustar, but the last mile of taking these reports and providing this explorative analysis of the results, that is provided by the reporting solution, which is Arcadia. >> Well, I mean, the thing about data analytics, is that it really is going to revolutionize marketing. That famous marketing adage of, I know my advertising works, I just don't know which half, and now we're really going to be able to figure out which half. Can you talk a little bit about return on investment and what your clients see? >> Sure, we've got some major Fortune 500 companies that have said publicly that they've realized over a billion dollars of incremental value. And that could be across both marketing analytics, and how we better treat our messaging, our brand, to reach our intended audience. There's things like supply chain and being able to more realtime analyze what-if analysis for different routes, it's things like cyber security and stopping fraud and waste and things like that at a much grander scale than what was really possible in the past. >> So we're here at DataWorks and it's the Hortonworks show. Give us a sense of the degree of your engagement or partnership with Hortonworks and participation in their partner ecosystem. >> Yeah, absolutely. Hortonworks is one of our key partners, and what we did that's different architecturally, is we built our BI server directly into the data platforms. So what I mean by that is, we take the concept of a BI server, we install it and run it on the data nodes of Hortonworks Data Platform. We inherit the security directly out of systems like Apache Ranger, so that all that administration and scale is done at Hadoop economics, if you will, and it leverages the things that are already in place. So that has huge advantages both in terms of scale, but also simplicity, and then you get the performance, the concurrency that companies need to deploy out to like, 5,000 users directly on that Hadoop cluster. So, Hortonworks is a fantastic partner for us and a large number of our customers run on Hortonworks, as well as other platforms, such as Amazon Web Services, where Satya's got his system deployed. >> At the show they announced Hortonworks Data Platform 3.0. There's containerization there, there's updates to Hive to enable it to be more of a realtime analytics engine, and also a data warehousing engine. In Arcadia Data, do you follow their product enhancements, in terms of your own product roadmap with any specific, fixed cycle? Are you going to be leveraging the new features in HDP 3.0 going forward to add value to your customers' ability to do interactive analysis of this data in close to realtime? >> Sure, yeah, no, because we're a native-- >> 'Cause marketing campaigns are often in realtime increasingly, especially when you're using, you know, you got a completely digital business. >> Yeah, absolutely. So we benefit from the innovations happening within the Hortonworks Data Platform. So, because we're a native BI tool that runs directly within that system, you know, with changes in Hive, or different things within HDFS, in terms of performance or compression and things like that, our customers generally benefit from that directly, so yeah. >> Satya, going forward, what are some of the problems that you want to solve for your clients? What are their biggest pain points and where do you see Neustar? >> So, data is the new oil, right?
So, marketers, also for them now, data is the biggest thing they're going after. They want faster analysis, they want to be able to get to insights as fast as they can, and they want to obviously get, work on as large amount of data as possible. The variety of sources is becoming higher and higher and higher, in terms of marketing. There used to be a few channels in the '70s and '80s, and the '90s kind of increased, now you have like, hundreds of channels, if not thousands of channels. And they want visibility across all of that. It's the ability to work across this variety of data, increasing volume at a very high speed. Those are high level challenges that we have at Neustar. >> Great. >> So the difference, marketing attribution analysis you say is one of the core applications of your solution portfolio. How is that more challenging now than it had been in the past? We have far more marketing channels, digital and so forth, then how does the state-of-the-art of marketing attribution analysis, how is it changing to address this multiplicity of channels and media for advertising and for influencing the customer on social media and so forth? And then, you know, can you give us a sense for then, what are the necessary analytical tools needed for that? We often hear about social graph analysis or semantic analysis, or behavioral analytics and so forth, all of this makes it very challenging. How can you determine exactly what influences a customer now in this day and age, where, you think, you know, Twitter is an influencer over the conversation. How can you nail that down to specific, you know, KPIs or specific things to track? >> So I think, from our, like you pointed out, the variety is increasing, right? And I think the marketers now have a lot more options than what they had, and that that's a blessing, and it's also a curse. Because then I don't know where I'm going to move my marketing spending to. So, attribution right now, is still sitting at the headquarters, it's kind of sitting at a very high level and it is answering questions. Like we said, with the Fortune 100 companies, it's still answering questions to the CMOs, right? Where attribution will take us, next step is to then lower down, where it's able to answer the regional headquarters on what needs to happen, and more importantly, on every store, I'm able to then answer and tailor my attribution model to a particular store. Let's take Ford for an example, right? Now, instead of the CMO suite, but, if I'm able to go to every dealer, and I'm able to personalize my attribution to that particular dealer, then it becomes a lot more useful. The challenge there is it all needs to be connected. Whatever model we are working on for the dealer, needs to be connected up to the headquarters. >> Yes, and that personalization, it very much leverages the kind of things that Steve was talking about at Arcadia. Being able to analyze all the data to find those micro, micro, micro segments that can be influenced to varying degrees, so yeah. I like where you're going with this, 'cause it very much relates to the power of distributed, federated big data fabrics like Hortonworks' offers. >> And so streaming analytics is coming to the fore, and it's been talked about for the longest period of time, but we have real use cases for streaming analytics right now. Similarly, the data volumes are, indeed, becoming a lot larger. So both of them are doing a lot more right now. >> Yes. >> Great.
>> Well, Satya and Steve, thank you so much for coming on theCUBE, this was really, really fun talking to you. >> Excellent. >> Thanks, it was great to meet you. Thanks for having us. >> I love marketing talk. >> (laughs) It's fun. I'm Rebecca Knight, for James Kobielus, stay tuned to theCUBE, we will have more coming up from our live coverage of DataWorks, just after this. (upbeat electronic music)
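The what-if analysis Satya describes above, shifting budget between channels and predicting the revenue impact, can be made concrete with a toy media-response model. The sketch below is illustrative only: the channel names, coefficients, and diminishing-returns form are assumptions for the example, not Neustar's actual activation model, which would be fit from historical spend and revenue data.

```python
import numpy as np

# Toy media-response model: each channel's revenue contribution follows a
# diminishing-returns curve, revenue_c = beta_c * log(1 + spend_c).
# The betas are made-up stand-ins for coefficients a real attribution
# platform would estimate from historical campaign data.
BETAS = {"paid_search": 4.2, "tv": 6.8, "display": 2.1}

def predicted_revenue(spend):
    """Predict total revenue (in $M) for a spend plan (in $M per channel)."""
    return sum(beta * np.log1p(spend[ch]) for ch, beta in BETAS.items())

# Baseline plan vs. a what-if: move $2M from paid search to TV.
baseline = {"paid_search": 10.0, "tv": 20.0, "display": 5.0}
what_if = {"paid_search": 8.0, "tv": 22.0, "display": 5.0}

lift = predicted_revenue(what_if) - predicted_revenue(baseline)
print(f"Predicted revenue lift from the budget shift: {lift:+.2f} $M")
```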

Published Date : Jun 20 2018

SUMMARY :

Rebecca Knight and James Kobielus interview Steve Wooledge, VP of Product Marketing at Arcadia Data, and Satya Ramachandran, VP of Engineering at Neustar. Wooledge explains how Arcadia Data gets business value out of scale-out architectures like Hadoop by running its BI server natively on the data nodes, inheriting security from Apache Ranger and scaling to thousands of concurrent users. Ramachandran describes how Neustar uses Arcadia for fast, granular, exploratory reporting on multi-terabyte advertising datasets, and how marketing attribution is moving from headquarters-level models toward per-store and per-dealer personalization, with streaming analytics and growing channel variety as the next challenges.


ENTITIES

| Entity | Category | Confidence |
| --- | --- | --- |
| James Kobielus | PERSON | 0.99+ |
| Steve Wooledge | PERSON | 0.99+ |
| Rebecca Knight | PERSON | 0.99+ |
| Satya Ramachandran | PERSON | 0.99+ |
| Steve | PERSON | 0.99+ |
| Hortonworks | ORGANIZATION | 0.99+ |
| Neustar | ORGANIZATION | 0.99+ |
| Arcadia Data | ORGANIZATION | 0.99+ |
| Ford | ORGANIZATION | 0.99+ |
| Satya | PERSON | 0.99+ |
| 2012 | DATE | 0.99+ |
| San Jose | LOCATION | 0.99+ |
| two companies | QUANTITY | 0.99+ |
| Silicon Valley | LOCATION | 0.99+ |
| two guests | QUANTITY | 0.99+ |
| Arcadia | ORGANIZATION | 0.99+ |
| San Jose, California | LOCATION | 0.99+ |
| Amazon Web Services | ORGANIZATION | 0.99+ |
| US | LOCATION | 0.99+ |
| both | QUANTITY | 0.99+ |
| Hortonworks' | ORGANIZATION | 0.99+ |
| 5,000 users | QUANTITY | 0.99+ |
| Dataworks | ORGANIZATION | 0.98+ |
| theCUBE | ORGANIZATION | 0.98+ |
| one | QUANTITY | 0.97+ |
| Twitter | ORGANIZATION | 0.96+ |
| hundreds of channels | QUANTITY | 0.96+ |
| Dataworks Summit 2018 | EVENT | 0.96+ |
| DataWorks Summit 2018 | EVENT | 0.93+ |
| thousands of channels | QUANTITY | 0.93+ |
| over a billion dollars | QUANTITY | 0.93+ |
| Data Platform 3.0 | TITLE | 0.9+ |
| '70s | DATE | 0.86+ |
| Arcadia | TITLE | 0.84+ |
| Hadoop | TITLE | 0.84+ |
| HDP 3.0 | TITLE | 0.83+ |
| '90s | DATE | 0.82+ |
| Apache Ranger | ORGANIZATION | 0.82+ |
| thousand words | QUANTITY | 0.76+ |
| HDFS | TITLE | 0.76+ |
| multi terabytes | QUANTITY | 0.75+ |
| Hive | TITLE | 0.69+ |
| Neustar | TITLE | 0.67+ |
| Fortune | ORGANIZATION | 0.62+ |
| 80s | DATE | 0.55+ |
| 500 | QUANTITY | 0.45+ |
| 100 | QUANTITY | 0.4+ |
| theCUBE | TITLE | 0.39+ |

Partha Seetala, Robin Systems | DataWorks Summit 2018


 

>> Live from San Jose, in the heart of Silicon Valley, it's theCUBE. Covering DataWorks Summit 2018. Brought to you by Hortonworks. >> Welcome back everyone, you are watching day two of theCUBE's live coverage of DataWorks here in San Jose, California. I'm your host, Rebecca Knight. I'm coming at you with my cohost James Kobielus. We're joined by Partha Seetala, he is the Chief Technology Officer at Robin Systems, thanks so much for coming on theCUBE. >> Pleasure to be here. >> You're a first timer, so we promise we don't bite. >> Actually I'm not, I was on theCUBE- >> Oh! >> At DockerCon in 2016. >> Oh well excellent, okay, so now you're a veteran, right. >> Yes, ma'am. >> So Robin Systems, as before the cameras were rolling, we were talking about it, it's about four years old, based here in San Jose, venture backed company. Tell us a little bit more about the company and what you do. >> Absolutely. First of all, thanks for hosting me here. Like you said, Robin is a Silicon Valley based company. Our focus is in allowing applications, such as big data, databases, NoSQL and AI ML, to run within the Kubernetes platform. What we have built is a product that converges storage, complex storage, networking, application workflow management, along with Kubernetes to create a one click experience where users can get a managed services kind of feel when they're deploying these applications. They can also do one click life cycle management on these apps. Our thesis has initially been to, instead of looking at this problem from the infrastructure up into the application, to actually look at it from the applications down and then say, "Let the applications drive the underlying infrastructure to meet the user's requirements." >> Is that your differentiating factor, would you say? >> Yeah, I think it is because most of the folks out there today are looking at it as if it's a component based play, it's like they want to bring storage to Kubernetes or networking to Kubernetes but the challenges are not really around storage and networking. If you talk to the operations folk they say that, "You know what? Those are underlying problems but my challenge is more along the lines of, okay, my CIO says the initiative is to make my applications mobile. They want to go across to different Clouds. That's my challenge." The line of business user says, "I want to get a managed services experience." Yes, storage is the thing that you want to manage underneath, but I want to go and click and create my, let's say, an Oracle database or distributions log. >> In terms of the developer experience here, from the application down, give us a sense for how Robin Systems tooling your product enables that degree of specification of the application logic that will then get containerized within? >> Absolutely, like I said, we want applications to drive the infrastructure. What it means is that we, Robin is a software platform. We layer ourselves on top of the machines that we sit on, whether it is bare metal machines on premises, or VMs, or even in Azure, Google Cloud, as well as AWS. Then we make the underlying compute, storage, network resources almost invisible. We treat it as a pool of resources. Now once you have this pool of resources, they can be attached to the applications that are being deployed as apps inside containers. I mean, it's a software play, installed on machines. Once it's installed, the experience now moves away from infrastructure into applications.
You log in, you can see a portal, you have a lot of applications in that portal. We ship support for about 25 applications or some such. >> So these are templates? >> Yes. >> That the developer can then customize to their specific requirements? Or no? >> Absolutely, we ship reference templates for pretty much a wide variety of the most popular big data, NoSQL, database, AI ML applications today. But again, as I said, it's a reference implementation. Typically customers take the reference recommendation and they enhance it or they use that to onboard their custom apps, for example, or the apps that we don't ship out of the box. So it's a very open, extensible platform but the goal being that whatever the application might be, in fact we keep saying that, if it runs somewhere else, it runs on Robin, right? So the idea here is that you can bring anything, and with just the flip of a switch, you can make it a one click deploy, one click manage, one click mobile across Clouds. >> You keep mentioning this one click and this idea of it being so easy, so convenient, so seamless, is that what you say is the biggest concern of your customers? Is this ease and speed? Or what are some other things that are on their minds that you want to deliver? >> Right, so one click of course is a user experience part but what is the real challenge? The real challenge is, there are a wide variety of tools being used by enterprises today. Even in the data analytics pipeline, there's a lot across the data store and processing pipeline. Users don't want to deal with setting it up and keeping it up and running. They don't want that, they want to get the job done, right? Now when you only want to get the job done, you really want to hide the underlying details of those platforms, and the best way to convey that, the best way to give that experience, is to make it a single click experience from the UI. So I keep calling it all one click because that is the experience that you get to hide the underlying complexity for these apps. >> Give us a sense for how, we're here at DataWorks and it's the Hortonworks show.
Discuss with us your partnership with Hortonworks and you know, we've heard the announcement of HDP 3.0 and containerization support, just give us a rough sense for how you align or partner with Hortonworks in this area. >> Absolutely. It's kind of interesting because Hortonworks is a data management platform, if you think about it from that point of view, and when we engaged with them first- So some of our customers have been using the product, Hortonworks, on top of Robin, so orchestrating Hortonworks, making it a lot easier to use. >> Right. >> One of the requirements was, "Are you certified with Hortonworks?" And the challenge that Hortonworks also had is they had never certified a container based deployment of Hortonworks before. They actually were very skeptical, you know, "You guys are saying all these things. Can you actually containerize and run Hortonworks?" So we worked with Hortonworks and we are, I mean if you go to the Hortonworks website, you'll see that we are the first in the entire industry who have been certified as a container based play that can actually deploy and manage Hortonworks. They have certified us by running a wide variety of tests, which they call the Q80 Test Suite, and when we got certified the only other players in the market that got that stamp of approval were Microsoft with Azure and EMC with Isilon. >> So you're in good company? >> I think we are in great company. >> You're certified to work with HDP 3.0 or the prior version or both? >> When we got certified we were still on the 2.X version of Hortonworks; HDP 3.0 is a relatively newer version. But our plan is that we want to continue working with Hortonworks to get certified as they release the platform, and also help them, because HDP 3.0 also has some container based orchestration and deployment, so we want to help them provide the underlying infrastructure so that it becomes easier for YARN to spin up more containers. >> The higher level security and governance and all these things you're describing, they have to be over the Kubernetes layer. Hortonworks supports it in their data plane services portfolio. Does Robin Systems' solution portfolio tap into any of that, or do you provide your own layer of sort of security and metadata management and so forth? >> Yeah, so we don't want- >> In context of what you offer? >> Right, so we don't want to take away the security model that the application itself provides, because they might have set it up so that they are doing governance; it's not just logins and access control and things like this. Some governance is built in. We don't want to change that. We want to keep the same experience and the same workflow that customers have, so we just integrate with whatever security the application has. We, of course, provide security in terms of isolating these different apps that are running on the Robin platform, where the security or the access into the application itself is left to the apps themselves. When I say apps, I'm talking about Hortonworks. >> Yeah, sure. >> Or any other databases. >> Moving forward, as you think about ways you're going to augment and enhance and alter the Robin platform, what are some of the biggest trends that are driving your decision making around that, in the sense of, as we know that companies are living with this deluge of data, how are you helping them manage it better? >> Sure. I think there are a few trends that we are closely watching. One is around Cloud mobility.
CIOs want their applications along with their data to be available where their end users are. It's almost like a follow-the-sun model, where you might have generated the data in one Cloud and at a different time, different time zone, you'll basically want to keep the app as well as the data moving. So we are following that very closely. How we can enable the mobility of data and apps a lot easier in that world. The other one is around the general AI ML workflow. One of the challenges there, of course, you have great apps like TensorFlow or Theano or Caffe, these are very good AI ML toolkits, but one of the challenges that people face is they are buying these very expensive, let's say NVIDIA DGX Boxes, these boxes cost about $150,000 each, how do you keep these boxes busy so that you're getting a good return on investment? It will require you to better manage the resources offered with these boxes. We are also monitoring that space, and we're seeing how we can take the Robin platform and enable the better utilization of GPUs or the sharing of GPUs for running your AI ML kind of workload. >> Great. >> Those are, I think, two key trends that we are closely watching. >> We'll be discussing those at the next DataWorks Summit, I'm sure, at some other time in the future. >> Absolutely. >> Thank you so much for coming on theCUBE, Partha. >> Thank you. >> Thank you, my pleasure. Thanks. >> I'm Rebecca Knight for James Kobielus. We will have more from DataWorks coming up in just a little bit. (techno beat music)
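Robin's one-click tooling is proprietary, but the pattern Partha describes, rendering an application template into Kubernetes objects and attaching GPU requests so expensive DGX-class boxes stay busy, can be sketched with the official Kubernetes Python client. Everything below (the helper name, image, namespace, and resource numbers) is an illustrative assumption, not Robin's API.

```python
from kubernetes import client, config

def deploy_from_template(name, image, replicas=1, gpus=0):
    """Render a minimal app 'template' into a Kubernetes Deployment.

    A platform like the one described would also wire up storage,
    networking, and lifecycle hooks; this sketch covers only compute,
    plus an optional GPU request (one way schedulers keep GPUs busy).
    """
    resources = client.V1ResourceRequirements(
        limits={"nvidia.com/gpu": str(gpus)} if gpus else None
    )
    container = client.V1Container(name=name, image=image, resources=resources)
    template = client.V1PodTemplateSpec(
        metadata=client.V1ObjectMeta(labels={"app": name}),
        spec=client.V1PodSpec(containers=[container]),
    )
    spec = client.V1DeploymentSpec(
        replicas=replicas,
        selector=client.V1LabelSelector(match_labels={"app": name}),
        template=template,
    )
    deployment = client.V1Deployment(
        api_version="apps/v1",
        kind="Deployment",
        metadata=client.V1ObjectMeta(name=name),
        spec=spec,
    )
    config.load_kube_config()  # assumes a reachable cluster and kubeconfig
    client.AppsV1Api().create_namespaced_deployment(
        namespace="default", body=deployment
    )

# Hypothetical one-click deploy of a GPU-backed TensorFlow app.
deploy_from_template("tf-train", "tensorflow/tensorflow:latest-gpu", gpus=1)
```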

Published Date : Jun 20 2018

SUMMARY :

Rebecca Knight and James Kobielus interview Partha Seetala, Chief Technology Officer at Robin Systems. Robin converges storage, networking, and application workflow management with Kubernetes to give big data, database, NoSQL, and AI/ML applications a one-click, managed-services experience, on premises or across the major public clouds. Seetala discusses Robin's certification as the first container-based platform to deploy and manage Hortonworks, and the trends the company is watching: cloud mobility of applications and data, and better utilization and sharing of GPUs for AI/ML workloads.


ENTITIES

| Entity | Category | Confidence |
| --- | --- | --- |
| Rebecca Knight | PERSON | 0.99+ |
| Hortonworks | ORGANIZATION | 0.99+ |
| Jame Kobielus | PERSON | 0.99+ |
| San Jose | LOCATION | 0.99+ |
| AWS | ORGANIZATION | 0.99+ |
| James Kobielus | PERSON | 0.99+ |
| Microsoft | ORGANIZATION | 0.99+ |
| Robin Systems | ORGANIZATION | 0.99+ |
| Partha Seetala | PERSON | 0.99+ |
| Silicon Valley | LOCATION | 0.99+ |
| San Jose, California | LOCATION | 0.99+ |
| Oracle | ORGANIZATION | 0.99+ |
| one click | QUANTITY | 0.99+ |
| Google | ORGANIZATION | 0.99+ |
| one | QUANTITY | 0.99+ |
| 2016 | DATE | 0.99+ |
| both | QUANTITY | 0.99+ |
| HTP 3.0 | TITLE | 0.99+ |
| NVIDIA | ORGANIZATION | 0.99+ |
| first | QUANTITY | 0.99+ |
| DataWorks | ORGANIZATION | 0.99+ |
| Robin | ORGANIZATION | 0.98+ |
| Kubernetes | TITLE | 0.98+ |
| One | QUANTITY | 0.98+ |
| TensorFlow | TITLE | 0.98+ |
| about $150,000 each | QUANTITY | 0.98+ |
| about 25 applications | QUANTITY | 0.98+ |
| one click | QUANTITY | 0.98+ |
| Partha | PERSON | 0.98+ |
| Isilon | ORGANIZATION | 0.97+ |
| DGX Box | COMMERCIAL_ITEM | 0.97+ |
| today | DATE | 0.96+ |
| First | QUANTITY | 0.96+ |
| DockerCon | EVENT | 0.96+ |
| Azure | ORGANIZATION | 0.96+ |
| Theano | TITLE | 0.96+ |
| DataWorks Summit 2018 | EVENT | 0.95+ |
| theCUBE | ORGANIZATION | 0.94+ |
| Caffe | TITLE | 0.91+ |
| Azure | TITLE | 0.91+ |
| Robin | PERSON | 0.91+ |
| Robin | TITLE | 0.9+ |
| two key trends | QUANTITY | 0.89+ |
| HDP 3.0 | TITLE | 0.87+ |
| EMC | ORGANIZATION | 0.86+ |
| single click | QUANTITY | 0.86+ |
| day two | QUANTITY | 0.84+ |
| DataWorks Summit | EVENT | 0.83+ |
| three big public clouds | QUANTITY | 0.82+ |
| DataWorks | EVENT | 0.81+ |

Rob Bearden, Hortonworks | DataWorks Summit 2018


 

>> Live from San Jose in the heart of Silicon Valley, it's theCUBE covering DataWorks Summit 2018, brought to you by Hortonworks. >> Welcome back to theCUBE's live coverage of DataWorks Summit here in San Jose, California. I'm your host, Rebecca Knight, along with my co-host, James Kobielus. We're joined by Rob Bearden. He is the CEO of Hortonworks. So thanks so much for coming on theCUBE again, Rob. >> Thank you for having us. >> So you just got off of the keynote on the main stage. The big theme is really about modern data architecture. So we're going to have this modern data architecture. What is it all about? How do you think about it? What's your approach? And how do you walk customers through this process? >> Well, there's a lot of moving parts in enabling a modern data architecture. One of the first steps is what we're trying to do is unlock the siloed transactional applications, and to get that data into a central architecture so you can get real time insights around the inclusive dataset. But what we're really trying to accomplish then within that modern data architecture is to bring all types of data, whether it be real time streaming data, whether it be sensor data, IoT data, whether it be data that's coming from a connected car across the network, and to be able to bring all that data together in real time, and give the enterprise the ability to be able to take best in class action so that you get a very prescriptive outcome of what you want. So if we bring that data under management from point of origination and out on the edge, and then have the platforms that move that through its entire lifecycle, and that's our HDF platform, it gives the customer the ability to, after they capture it at the edge, move it, and then have the ability to process it as an event happens, a condition changes, various conditions come together, have the ability to process and take the exact action that you want to see performed against that, and then bring it to rest, and that's where our HDP platform comes into play where then all that data can be aggregated so you can have a holistic insight, and have real time interactions on that data. But then it then becomes about deploying those datasets and workloads on the tier that's most economically and architecturally pragmatic. So if that's on-prem, we make sure that we are architected for that on-prem deployment or private cloud or even across multiple public clouds simultaneously, and give the enterprise the ability to support each of those native environments. And so we think hybrid cloud architecture is really where the vast majority of our customers today and in the future, are going to want to be able to run and deploy their applications and workloads. And that's where our DataPlane Service Offering gives them the ability to have that hybrid architecture and the architectural latitude to move workloads and datasets across each tier, transparently to whatever storage file format they use or wherever that application is, and we provide all the tooling to mask the complexity of doing that, and then we ensure that it has one common security framework, one common governance through its entire lifecycle, and one management platform to handle that entire lifecycle of data.
And that's the modern data architecture: to be able to bring all data under management, all types of data under management, and manage that in real time through its lifecycle till it comes to rest, and deploy that across whatever architecture tier is most appropriate financially and from a performance standpoint, on cloud or prem. >> Rob, this morning at the keynote here in day one at DataWorks San Jose, you presented this whole architecture that you described in the context of what you call hybrid clouds to enable connected communities, and with HDP, Hortonworks Data Platform 3.0 being one of the prime announcements, you brought containerization into the story. Could you connect those dots, containerization, connected communities, and HDP 3.0? >> Well, HDP 3.0 is really the foundation for enabling that hybrid architecture natively, and what it's done is it separated the storage from the compute, and so now we have the ability to deploy those workloads via a container strategy across whichever tier makes the most sense, and to move those applications and datasets around, and to be able to leverage each tier in the deployment architectures that are most pragmatic. And then what that lets us do then is be able to bring all of the different data types, whether it be customer data, supply chain data, product data. So imagine an industrial piece of equipment, say an airplane, is flying from Atlanta, Georgia to London, and you want to be able to make sure you really understand how well each component is performing, so that if that plane is going to need service when it gets there, it doesn't miss the turnaround and leave 300 passengers stranded or delayed, right? Now with our Connected platform, we have the ability to take every piece of data from every component that's generated and see that in real time, and let the airlines make that real time. >> Data lineage, essentially. >> And ensure that we know every person that touched it and looked at that data through its entire lifecycle, from the ground crew to the pilots to the operations team to the service folks on the ground to the reservation agents, and we can prove that if somehow that data has been breached, that we know exactly at what point it was breached and who did or didn't get to see it, and can prevent that because of the security models that we put in place. >> And that relates to compliance and mandates such as the General Data Protection Regulation, GDPR, in the EU. At DataWorks Berlin a few months ago, you laid out, Hortonworks laid out, announced a new product called the Data Steward Studio to enable GDPR compliance. Can you give our listeners now who may not have been following the Berlin event a bit of an update on Data Steward Studio, how it relates to the whole data lineage set of requirements that you're describing, and then going forward, what is Hortonworks's roadmap for supporting the full governance lifecycle for the Connected community, from data lineage through model governance and so forth? Can you just connect a few dots that will be helpful? >> Absolutely. What's important certainly, driven by GDPR, is the requirement to be able to prove that you understand who's touched that data and who has not had access to it, and that you ensure that you're in compliance with the GDPR regulations, which are significant, but essentially what they say is you have to protect the personal data and attributes of that data of the individual.
And so what's very important is that you've got to be able to have the systems that not just secure the data, but understand who has had access at any point in time that you've ever maintained that individual's data. And so it's not just about when you've had a transaction with that individual, but it's the rest of the history that you've kept, or the multiple datasets that you may try to correlate to try to expand the relationship with that customer, and you need to make sure that you can ensure not only that you've secured their data, but that you're protecting and governing who has access to it and when. And as importantly, that you can prove in the event of a breach that you had control of that, and who did or did not access it, because if you can't prove, in the event of any breach, that it was secure, and that no one who wasn't supposed to access it did so, you can be opened up for hundreds of thousands of dollars or even multiple millions of dollars of fines just because you can't prove that it was not accessed, and that's what the variety of our platforms, you mentioned Data Studio, is part of. DataPlane is one of the capabilities that gives us that ability. The core engine that does that is Atlas, and that's the open source governance platform that we developed through the community that really drives all the capabilities for governance that move through each of our products, HDP and HDF, and then of course DataPlane and Data Studio take advantage of that in how data moves and replicates, and manage that process for us. >> One of the things that we were talking about before the cameras were rolling was this idea of data driven business models, how they are disrupting current contenders, new rivals coming on the scene all the time. Can you talk a little bit about what you're seeing and what are some of the most exciting and maybe also some of the most threatening things that you're seeing? >> Sure, in the traditional legacy enterprise, it's very procedural driven. You think about classic Encore ERP. It's worked very hard to have a very rigid, very structured, procedural order-to-cash cycle that has not a great deal of flexibility. And it takes you through a design process, it builds product, then you sell product to a customer, and then you service that customer, and then you learn from that transaction different ways to automate or improve efficiencies in the supply chain. But it's very procedural, very linear. And in the new world of connected data models, you want to bring transparency and real time understanding and connectivity between the enterprise, the customer, the product, and the supply chain, so that you can take real time best in practice action. So for example, you understand how well your product is performing. Is your customer using it correctly? Are they frustrated with it? Are they using it in the patterns and the frequency that they should be if they are going to expand their use and buy more, and if they're not, how do we engage in that cycle? How do we understand if they're going through a re-review and another buying cycle of something similar that may not be with you, for a different reason? And when we have real time visibility into our customer's interaction, understand our product's performance through its entire lifecycle, then we can bring real time efficiency, linking those together with our supply chain, into the various relationships we have with our customers.
To do that, it requires the modern data architecture, bringing data under management from the point it originates, whether it's from the product, or the customer interacting with the company, or the customer interacting potentially with our ecosystem partners, mutual partners, and then letting the best in practice supply chain techniques make sure that we're bringing the highest level of service and support to that entire lifecycle. And when we bring data under management, manage it through its lifecycle and have the historical view at rest, and leverage that across every tier, that's when we get these high velocity, deep transparency, and connectivity between each of the constituents in the value chain, and that's what our platforms give them the ability to do. >> Not only your platform, you guys have been in business now for I think seven years or so, and you shifted, in the minds of many and including your own strategy, from being the premier data at rest company in terms of the Hadoop platform to being one of the premier data in motion companies. Is that really where you're going? To be more of a completely streaming-focused solution provider in a multi-cloud environment? And I hear a lot of Kafka in your story now, and it's like, oh yeah, that's right, Hortonworks is big on Kafka. Can you give us just a quick sense of how you're making that shift towards low latency real time streaming, big data, or small data for that matter, with embedded analytics and machine learning? >> So, we have evolved from certainly being the leader in global data platforms with all the work that we do collaboratively, and in through the community, to make Hadoop an enterprise viable data platform that has the ability to run mission critical workloads and apps at scale, ensuring that it has all the enterprise facilities from security and governance and management. But you're right, we have expanded our footprint aggressively. And we saw the opportunity to actually create more value for our customers by giving them the ability to not wait till they bring data under management to gain an insight, because in that case, they happen to be reactive, post event, post transaction. We want to give them the ability to shift their business model to being interactive, pre-event, pre-condition. The way to do that, we learned, was to be able to bring the data under management from the point of origination, and that's what we use MiNiFi and NiFi for, and then HDF, to move it through its lifecycle, and to your point, we have the intellect, we have the insight, and then we have the ability to process the best in class outcome based on what we know the variables are we're trying to solve for, as that's happening.
But what we do is make sure that they fit inside an overall data architecture that then embodies their access to a much broader central dataset that goes from point of origination to point of rest on a whole central architecture, and then benefit from our security, governance, and operations model, being able to manage those engines. So what we're trying to do is eliminate the silos for our customers, and having siloed datasets that just do particular functions. We give them the ability to have an enterprise modern data architecture, we manage the things that bring that forward for the enterprise to have the modern data driven business models by bringing the governance, the security, the operations management, ensure that those workflows go from beginning to end seamlessly. >> Do you, go ahead. >> So I was just going to ask about the customer concerns. So here you are, you've now given them this ability to make these real time changes, what's sort of next? What's on their mind now and what do you see as the future of what you want to deliver next? >> First and foremost we got to make sure we get this right, and we really bring this modern data architecture forward, and make sure that we truly have the governance correct, the security models correct. One pane of glass to manage this. And really enable that hybrid data architecture, and let them leverage the cloud tier where it's architecturally and financially pragmatic to do it, and give them the ability to leg into a cloud architecture without risk of either being locked in or misunderstanding where the lines of demarcation of workloads or datasets are, and not getting the economies or efficiencies they should. And we solved that with DataPlane. So we're working very hard with the community, with our ecosystem and strategic partners to make sure that we're enabling the ability to bring each type of data from any source and deploy it across any tier with a common security, governance, and management framework. So then what's next is now that we have this high velocity of data through its entire lifecycle on one common set of platforms, then we can start enabling the modern applications to function. And we can go look back into some of the legacy technologies that are very procedural based and are dependent on a transaction or an event happening before they can run their logic to get an outcome because that grinds the customer in post world activity. We want to make sure that we're bringing that kind of, for example, supply chain functionality, to the modern data architecture, so that we can put real time inventory allocation based on the patterns that our customers go in either how they're using the product, or frustrations they've had, or success they've had. And we know through artificial intelligence and machine learning that there's a high probability not only they will buy or use or expand their consumption of whatever that they have of our product or service, but it will probably to these other things as well if we do those things. >> Predict the logic as opposed to procedural, yes, AI. >> And very much so. And so it'll be bringing those what's next will be the modern applications on top of this that become very predictive and enabler versus very procedural post to that post transaction. We're little ways downstream. That's looking out. >> That's next year's conference. >> That's probably next year's conference. >> Well, Rob, thank you so much for coming on theCUBE, it's always a pleasure to have you. 
>> Thank you both for having us, and thank you for being here, and enjoy the summit. >> We're excited. >> Thank you. >> We'll do. >> I'm Rebecca Knight for Jim Kobielus. We will have more from DataWorks Summit just after this. (upbeat music)

Published Date : Jun 20 2018

SUMMARY :

Rebecca Knight and James Kobielus interview Rob Bearden, CEO of Hortonworks. Bearden lays out the modern data architecture: capturing all types of data at the point of origination, moving and processing it in motion with HDF, bringing it to rest in HDP, and deploying workloads across on-prem, private, and public cloud tiers through DataPlane with common security, governance, and management. He also covers containerization in HDP 3.0, GDPR compliance through Atlas-based governance and Data Steward Studio, and the shift from procedural, post-transaction business models to predictive, real-time, data-driven ones.


ENTITIES

| Entity | Category | Confidence |
| --- | --- | --- |
| James Kobielus | PERSON | 0.99+ |
| Rebecca Knight | PERSON | 0.99+ |
| Rob Bearden | PERSON | 0.99+ |
| Jim Kobielus | PERSON | 0.99+ |
| London | LOCATION | 0.99+ |
| 300 passengers | QUANTITY | 0.99+ |
| San Jose | LOCATION | 0.99+ |
| Rob | PERSON | 0.99+ |
| Silicon Valley | LOCATION | 0.99+ |
| Hortonworks | ORGANIZATION | 0.99+ |
| seven years | QUANTITY | 0.99+ |
| hundreds of thousands of dollars | QUANTITY | 0.99+ |
| San Jose, California | LOCATION | 0.99+ |
| each component | QUANTITY | 0.99+ |
| GDPR | TITLE | 0.99+ |
| DataWorks Summit | EVENT | 0.99+ |
| one | QUANTITY | 0.99+ |
| One | QUANTITY | 0.98+ |
| millions of dollars | QUANTITY | 0.98+ |
| Atlas | TITLE | 0.98+ |
| first steps | QUANTITY | 0.98+ |
| HDP 3.0 | TITLE | 0.97+ |
| One pane | QUANTITY | 0.97+ |
| both | QUANTITY | 0.97+ |
| DataWorks Summit 2018 | EVENT | 0.97+ |
| First | QUANTITY | 0.96+ |
| next year | DATE | 0.96+ |
| each | QUANTITY | 0.96+ |
| DataPlane | TITLE | 0.96+ |
| theCUBE | ORGANIZATION | 0.96+ |
| Hadoop | TITLE | 0.96+ |
| DataWorks | ORGANIZATION | 0.95+ |
| Spark | TITLE | 0.95+ |
| today | DATE | 0.94+ |
| EU | LOCATION | 0.93+ |
| this morning | DATE | 0.91+ |
| Atlanta, | LOCATION | 0.91+ |
| Berlin | LOCATION | 0.9+ |
| each type | QUANTITY | 0.88+ |
| Global Data Protection Regulation GDPR | TITLE | 0.87+ |
| one common | QUANTITY | 0.86+ |
| few months ago | DATE | 0.85+ |
| NiFi | ORGANIZATION | 0.85+ |
| Data Platform 3.0 | TITLE | 0.84+ |
| each tier | QUANTITY | 0.84+ |
| Data Studio | ORGANIZATION | 0.84+ |
| Data Studio | TITLE | 0.83+ |
| day one | QUANTITY | 0.83+ |
| one management platform | QUANTITY | 0.82+ |
| MiNiFi | ORGANIZATION | 0.82+ |
| San | LOCATION | 0.71+ |
| DataPlane | ORGANIZATION | 0.69+ |
| Kafka | TITLE | 0.67+ |
| Encore ERP | TITLE | 0.66+ |
| one common set | QUANTITY | 0.65+ |
| Data Steward Studio | ORGANIZATION | 0.65+ |
| HDF | ORGANIZATION | 0.59+ |
| Georgia | LOCATION | 0.55+ |
| announcements | QUANTITY | 0.51+ |
| Jose | ORGANIZATION | 0.47+ |

Tim Vincent & Steve Roberts, IBM | DataWorks Summit 2018


 

>> Live from San Jose, in the heart of Silicon Valley, it's theCUBE, covering DataWorks Summit 2018. Brought to you by Hortonworks. >> Welcome back everyone to day two of theCUBE's live coverage of DataWorks, here in San Jose, California. I'm your host, Rebecca Knight, along with my co-host James Kobielus. We have two guests on this panel today, we have Tim Vincent, he is the VP of Cognitive Systems Software at IBM, and Steve Roberts, who is the Offering Manager for Big Data on IBM Power Systems. Thanks so much for coming on theCUBE. >> Oh thank you very much. >> Thanks for having us. >> So we're now in this new era, this Cognitive Systems era. Can you set the scene for our viewers, and tell our viewers a little bit about what you do and why it's so important
And now you start thinking about, okay you've got this new skill in data scientists, which is really, really hard to find, they're very, very valuable. And you're giving them systems that take hours and weeks to do what the need to do. And you know, so they're trying to drive these models and get a high degree of accuracy in their predictions, and they just can't do it. So there's foresight on the technology side and there's clear demand on the customer side as well. >> Before the cameras were rolling you were talking about how the term data scientists and app developers is used interchangeably, and that's just wrong. >> And actually let's hear, 'cause I'd be in this whole position that I agree with it. I think it's the right framework. Data science is a team sport but application development has an even larger team sport in which data scientists, data engineers play a role. So, yeah we want to hear your ideas on the broader application development ecosystem, and where data scientists, and data engineers, and sort, fall into that broader spectrum. And then how IBM is supporting that entire new paradigm of application development, with your solution portfolio including, you know Power, AI on Power? >> So I think you used the word collaboration and team sport, and data science is a collaborative team sport. But you're 100% correct, there's also a, and I think it's missing to a great degree today, and it's probably limiting the actual value AI in the industry, and that's had to be data scientists and the application developers interact with each other. Because if you think about it, one of the models I like to think about is a consumer-producer model. Who consumes things and who produces things? And basically the data scientists are producing a specific thing, which is you know simply an AI model, >> Machine models, deep-learning models. >> Machine learning and deep learning, and the application developers are consuming those things and then producing something else, which is the application logic which is driving your business processes, and this view. So they got to work together. But there's a lot of confusion about who does what. You know you see people who talk with data scientists, build application logic, and you know the number of people who are data scientists can do that is, you know it exists, but it's not where the value, the value they bring to the equation. And the application developers developing AI models, you know they exist, but it's not the most prevalent form fact. >> But you know it's kind of unbalanced Tim, in the industry discussion of these role definitions. Quite often the traditional, you know definition, our sculpting of data scientist is that they know statistical modeling, plus data management, plus coding right? But you never hear the opposite, that coders somehow need to understand how to build statistical models and so forth. Do you think that the coders of the future will at least on some level need to be conversant with the practices of building,and tuning, or training the machine learning models or no? >> I think it's absolutely happen. And I will actually take it a step further, because again the data scientist skill is hard for a lot of people to find. >> Yeah. >> And as such is a very valuable skill. 
And what we're seeing, and one of the offerings that we're putting out, is something called PowerAI Vision, and it takes it up another level above the application developer, which is how do you actually really unlock the capabilities of AI to the business persona, the subject matter expert. So in the case of vision, how do you actually allow somebody to build a model without really knowing what a deep learning algorithm is, what kind of neural nets you use, how to do data preparation. So we build a tool set which is, you know, effectively a SME tool set, which allows you to automatically label, it actually allows you to tag and label images, and then as you're tagging and labeling images it learns from that and actually it helps automate the labeling of the image. >> Is this distinct from data science experience on the one hand, which is geared towards the data scientists, and I think Watson Analytics among your tools, which is geared towards the SME, is this a third tool, or an overlap? >> Yeah this is a third tool, which is really again one of the co-optimized capabilities that I talked about; it's a tool that we built out that really is leveraging the combination of what we do in Power, the interconnect which we have with the GPUs, which is the NVLink interconnect, which gives us basically a 10X improvement in bandwidth between the CPU and GPU. That allows you to actually train your models much more quickly, so we're seeing about a 4X improvement over competitive technologies that are also using GPUs. And if we're looking at machine learning algorithms, we've recently come out with some technology we call Snap ML, which allows you to push machine learning, >> Snap ML, >> Yeah, it allows you to push machine learning algorithms down into the GPUs, and this is, we're seeing about a 40 to 50X improvement over traditional processing. So it's coupling all these capabilities, but really allowing a business persona to do something specific, which is allow them to build out AI models to do recognition on either images or videos. >> Is there a pre-existing library of models in the solution that they can tap into? >> Basically it allows, it has a, >> Are they pre-trained? >> No they're not pre-trained models, that's one of the differences in it. It actually has a set of models that it picks for you, and actually so, >> Oh yes, okay. >> So this is why it helps the business persona, because it's helping them with labeling the data. It's also helping select the best model. It's doing things under the covers to optimize things like hyper-parameter tuning, but you know the end-user doesn't have to know about all these things, right? So you're tryin' to lift, and it comes back to your point on application developers, it allows you to lift the barrier for people to do these tasks. >> Even for professional data scientists, there may be a vast library of models and they don't necessarily know what is the best fit for the particular task. Ideally you should have, the infrastructure should recommend and choose, under various circumstances, the models, and the algorithms, the libraries, whatever, for the task, great. >> One extra feature of PowerAI Enterprise is that it does include a way to do a quick visual inspection of a model's accuracy with a small data sample before you invest in scaling over a cluster or large data set. So you can get a visual indicator as to whether the model is moving towards accuracy or you need to go and test an alternate model.
>> So it's like a dashboard of, like, Gini coefficients and all that stuff, okay. >> Exactly, it gives you a snapshot view. And the other thing I was going to mention, you guys talked about application development and data scientists, and of course a big message here at the conference is, you know, data science meets big data, and the work that Hortonworks is doing involving the notion of container support in YARN, GPU awareness in YARN, bringing Data Science Experience, which can include the PowerAI capability that Tim was talking about, as a workload tightly coupled with Hadoop. And this is where our Power servers have really been built, not as just a monolithic building block that always has the same ratio of compute and storage, but as fit-for-purpose servers that can address either GPU-optimized workloads, providing the bandwidth enhancements that Tim talked about with the GPU, or data-dense servers that can now support two terabytes of memory, double the overall memory bandwidth on the box, 44 cores that can support up to 176 threads for parallelization of Spark workloads, SQL workloads, distributed data science workloads. So it's really about choosing the combination of servers that can meet this evolving workload need, 'cause Hadoop isn't now just MapReduce, it's a multitude of workloads that you need to be able to mix and match, and bring various capabilities to the table for compute, and that's where Power8, now Power9, has really been built for this kind of combination of workloads, where you can add acceleration where it makes sense, add big data, smaller core, smaller memory, where it makes sense, pick and choose. >> So Steve, at this show, at DataWorks 2018 here in San Jose, the prime announcement, the partnership announced between IBM and Hortonworks, was IHAH, which I believe is IBM Hosted Analytics with Hortonworks. What I want to know is, that solution, I mean, it runs on top of HDP 3.0 and so forth, is there any tie-in from an offering management standpoint between that and PowerAI, so you can build models in the PowerAI environment and then deploy them out in conjunction with IHAH? Going forward, I mean, I just wanted to get a sense of whether those kinds of integrations are coming. >> Well, the same data science capability, Data Science Experience, whether you choose to run it in the public cloud, or run it in private cloud, or on-prem, it's the same data science package. You know, PowerAI has a set of optimized deep-learning libraries that can provide an advantage that applies when you choose to run those deployments on our Power systems, alright, so we can provide additional value in terms of these optimized libraries, these memory bandwidth improvements. So really it depends upon the customer requirements and whether a Power foundation would make sense in some of those deployment models. I mean, for us here with Power9, we've recently announced a whole series of Linux Power9 servers. That's our latest family, including, as I mentioned, storage-dense servers, the one we're showcasing on the floor here today, along with GPU-rich servers. We're releasing fresh reference architectures. It's really to support combinations of clustered models that can, as I mentioned, be fit for purpose for the workload, to bring data science and big data together in the right combination. And we're working towards cloud models as well that can support mixing Power in ICP with big data solutions. >> And before we wrap, I just wanted to add one thing.
I think in the reference architecture you describe, I'm excited about the fact that you've commercialized distributed deep learning, for the growing number of instances where you're going to build containerized AI and distribute pieces of it across this multi-cloud; you need the underlying middleware fabric to allow all those pieces to play together into some larger applications. So I've been following DDL, because your research lab has been posting information about that, you know, for quite a while. So I'm excited that you guys have finally commercialized it. I think IBM does a really good job of commercializing what comes out of the lab, like with Watson. >> Great, well, a good note to end on. Thanks so much for joining us. >> Oh, thank you. Thank you for the, >> Thank you. >> We will have more from theCUBE's live coverage of DataWorks coming up just after this. (bright electronic music)

Published Date : Jun 20 2018

SUMMARY :

Tim Vincent and Steve Roberts of IBM Cognitive Systems discuss how data scientists and application developers should interact in a consumer-producer model: data scientists produce AI models, and application developers consume them in the application logic that drives business processes. They cover PowerAI Vision, a tool set that lets subject matter experts label data and build vision models without deep-learning expertise; Snap ML, which pushes machine learning algorithms down to GPUs for roughly 40 to 50X speedups; and Power9 servers with the NVLink interconnect, which pair with Hortonworks' container and GPU support in YARN to bring data science and big data workloads together.

SENTIMENT ANALYSIS :

ENTITIES

Entity | Category | Confidence
James Kobielus | PERSON | 0.99+
Rebecca Knight | PERSON | 0.99+
Bob | PERSON | 0.99+
Steve Roberts | PERSON | 0.99+
Tim Vincent | PERSON | 0.99+
IBM | ORGANIZATION | 0.99+
James | PERSON | 0.99+
Hortonworks | ORGANIZATION | 0.99+
Bob Picciano | PERSON | 0.99+
Steve | PERSON | 0.99+
San Jose | LOCATION | 0.99+
100% | QUANTITY | 0.99+
44 cores | QUANTITY | 0.99+
two guests | QUANTITY | 0.99+
Tim | PERSON | 0.99+
Silicon Valley | LOCATION | 0.99+
10X | QUANTITY | 0.99+
Nvidia | ORGANIZATION | 0.99+
San Jose, California | LOCATION | 0.99+
IBM Power Systems | ORGANIZATION | 0.99+
Cognitive Systems Software | ORGANIZATION | 0.99+
today | DATE | 0.99+
three hours | QUANTITY | 0.99+
one | QUANTITY | 0.99+
both | QUANTITY | 0.99+
Cognitive Systems | ORGANIZATION | 0.99+
University of Waterloo | ORGANIZATION | 0.98+
third tool | QUANTITY | 0.98+
DataWorks Summit 2018 | EVENT | 0.97+
50X | QUANTITY | 0.96+
PowerAI | TITLE | 0.96+
DataWorks 2018 | EVENT | 0.93+
theCUBE | ORGANIZATION | 0.93+
two terabytes | QUANTITY | 0.93+
up to 176 threads | QUANTITY | 0.92+
40 | QUANTITY | 0.91+
about | DATE | 0.91+
Power9 | COMMERCIAL_ITEM | 0.89+
a year and a half ago | DATE | 0.89+
IHAH | ORGANIZATION | 0.88+
4X | QUANTITY | 0.88+
IHAH | TITLE | 0.86+
DataWorks | TITLE | 0.85+
Watson | ORGANIZATION | 0.84+
Linux Power9 | TITLE | 0.83+
Snap ML | OTHER | 0.78+
Power8 | COMMERCIAL_ITEM | 0.77+
Spark | TITLE | 0.76+
first | QUANTITY | 0.73+
PowerAI | ORGANIZATION | 0.73+
One extra | QUANTITY | 0.71+
DataWorks | ORGANIZATION | 0.7+
day two | QUANTITY | 0.69+
HDP 3.0 | TITLE | 0.68+
Watson Analytics | ORGANIZATION | 0.65+
Power | ORGANIZATION | 0.58+
NVLink | OTHER | 0.57+
YARN | ORGANIZATION | 0.55+
Hadoop | TITLE | 0.55+
theCUBE | EVENT | 0.53+
Moore | ORGANIZATION | 0.45+
Analytics | ORGANIZATION | 0.43+
Power9 | ORGANIZATION | 0.41+
Host | TITLE | 0.36+

Mike McNamara, NetApp | DataWorks Summit 2018


 

>> Live, from San Jose, in the heart of Silicon Valley, it's theCUBE, covering DataWorks Summit 2018. Brought to you by Hortonworks. >> Welcome back everyone to theCUBE's live coverage of DataWorks here in San Jose, California. I'm your host, Rebecca Knight, along with my cohost James Kobielus. We are joined by Mike McNamara, who leads Senior Product and Solutions Marketing at NetApp. Thanks so much for coming on theCUBE. >> Thanks for having me. >> You're a first timer, >> Yes, >> So this is very exciting! >> Happy to be here. >> Welcome. >> Thanks. >> So, before the cameras were rolling, we were talking about how NetApp has been in this space for a while, but is really just starting to be recognized as a player. So, talk a little bit about your company's evolution. >> Sure. So, the whole analytics space is something NetApp was in a long time ago, and then sort of got out of, and then over the last several years we've gotten back in, and we recognize it's a huge opportunity for data storage and data management. If you look at IDC data, it's a massive, massive market. But the opportunity for us is, like, you know what, they're mainly using a direct-attached storage model where compute and storage are tied together. And now, with data just exploding and growing like crazy, it's always been growing, but now it seems like it's just growing like crazy, and customers wanting to have data on-prem, but also being able to move it off to the cloud, we're like, hey, this is a great opportunity for us to come in with an external storage solution that can show them the benefits of something more reliable, with an opportunity to move their data off to the cloud; we've got great solutions for that. So it's gone well, but it's been a little bit different. Like at this show, a lot of the people, the data scientists, data engineers, some know us, some still don't, like, so, NetApp, what do you guys do? And so it's a little bit of an education, 'cause it's not a traditional buyer, if you will. We look at them as influencers, but it's only one influence; we traditionally have sold to, say, a Vice President of Infrastructure, as an example, or maybe a Director of Storage Admin, but most of those folks are not here, so this is just kind of a new market where we're making inroads. >> How do data scientists, or do they, influence the purchase of storage solutions, or data management solutions? >> Sure, so they want to have access to the data, they want to be able to analyze it quickly and effectively, they want to make sure it's always available, you know, at their fingertips, so to speak. We can help them by giving them very fast, very reliable solutions, and especially with our software, they want to, for example, do some virtual clone of that data, and just do some testing on that without impacting their production data; we can do that in a snap, so we can make their lives a lot easier, so we can show them how, hey, mister data scientist, we can make your life a little easier-- >> Or miss data scientist. >> Or miss, we were talking about that, >> There are a lot of women in this field. >> Yeah, yeah. >> More than we realize, and they're great. >> So we can help you do your job better, and then he or she can influence who's making the purchase decisions. >> Yeah, training sets, test sets, validation sets of data for the machine learning and analytics development pipeline; yes, you need a solid storage infrastructure to do it right. >> Absolutely.
>> So, when you're getting inside the head of your potential buyer here, the VP of Infrastructure, or data admin, what is it that you're hearing from those people most, what are their concerns, what keeps them up at night, and where do you come in? >> Yeah, so one of the concerns is, oftentimes, hey, do you have cloud storage, are you connected to the cloud? You know, I'm doing things on-prem now, but is there a path? So that's a big one. And we, NetApp, pride ourselves on having the most cloud-connected all-flash storage in the industry. So, that's a big focus, a big push for us. If you saw our marketing, it shows data authority for the hybrid cloud, so we really honestly do, whether it's with Google, or Azure, or AWS, we know our software runs in those environments. It also runs on-premises, but because it's the same ONTAP software, we can move data between those environments. So we've got a real good story there, so we can, you know, boom, check the box, we've got you covered if you want to utilize the cloud. And I think the next piece of that is just protecting the data. You know, again, I said data is just growing so much, I want to make sure it's always available, and we can back it up and all that, and that's been a core, core strength, versus a lot of these traditional solutions they've been using; these direct-attached models just don't have anywhere near the enterprise-grade data protection that NetApp has always prided itself on, over many decades now. And so, we can help them do that. And quite honestly, a lot of people think, well, you know, you guys are external storage, how do you compare versus direct-attached storage from a total cost standpoint? That's another one. I can tell you definitively, and we've got data to back it up, from a total cost of ownership point of view, because of the advantages we bring from up-time, and you know, from RAID. But you know, in a Hadoop environment, oftentimes there are three copies of data. With our solution, thanks to a good piece of software, there's only one copy of your data, so three versus one is a big saving. But even what we do with the data, compressing it and compacting it, a lot of benefits. So, we do have, honest to goodness, upwards of 50% better total cost of ownership versus a DAS model. >> Do you use machine learning within your portfolio? I'm hearing of more stories, >> Great question, yeah. >> Incorporating machine learning to automate or facilitate more of the functions in the data protection or data management life-cycle. >> Yeah, that's a great question, and we do use it. So we've got a piece of software which we call Active IQ; it was previously referred to as ASUP, or AutoSupport, that may ring a bell. But to answer your question, we've got thousands upon thousands of NetApp systems out there, and for those customers that allow us, we have, think of it as kind of a call-home feature, where we're getting data back from all our installed customers, and then we will go and do predictive analytics, and do some machine learning on that data, so then we can go back to those customers and say, hey, you know what, you've got this volume that's unprotected, you should protect this. Or we can show them, if you were to move that data off into our cloud environment, here's maybe the performance you would see. So we do do a lot of that predictive-- >> Predictive performance assessment, it sounds like there's anomaly detection in there as well.
>> Anomaly as well, letting them know, hey, you know, it's time for this drive, it may fail on you, let's ship you out a new drive now before it happens. So yeah, a lot of predictive analysis going on from an analytics standpoint. And you know, it's a huge benefit to our customers. Huge benefit. >> I know you're also doing a push toward artificial intelligence, so I'd like to hear more about that, and then also, if there's any best practices that have emerged. >> Sure, sure, so yes. That is another big area, so it's kind of a logical progression from where we were, if you will, in the analytics space and data lakes, but now moving into artificial intelligence, which has always been around, but it's really taking a more prominent role. I mean, just a quick fun fact, I read that, you know, at the royal wedding that recently happened, did you know that Amazon used artificial intelligence to help us, the TV viewer, identify who the guests were? >> Ooh. >> So, you know, it's like, it's everywhere, right? And so for us, we see that trend, a ton of data that needs to be managed, and so we kind of look at it from the edge, to the core, to the cloud, those three, not pillars, but directional ways: taking data from IoT sensors at the edge, bringing it into the core, doing training, and then, if the customer so chooses, out to the cloud. So, yeah, it is a big push for us now, and we're doing a lot with Nvidia, which is a key partner of ours. >> Really? This is a bit futuristic, but I can see a role going forward for AI to look into large data volumes, like video objects, to find things like faces, and poses and gestures and so forth, and to use that intelligence to reduce the data sets down, to de-duplicate, so that you can use less storage and then re-construct the original video objects or whatever going forward, I mean, as a potential use of AI within storage efficiency. >> Yep, yeah, you're right, and that again, like in the analytics space, how we roll our in-line efficiency capabilities and data protection in is, you know, very important, and then being able to move the data off into the cloud, if the customer so chooses, or just wants to use the cloud. So yeah, some of the same benefits of cloud connectivity, performance and efficiency that apply to analytics certainly apply to AI. You know, another fun fact about AI, which might help us, you and I living in the Boston area, is that I've read IBM has a patent out to use AI in traffic signaling, in conjunction with cameras, so hopefully, you know, if that works well it could alleviate-- >> Lead them out of the Tip O'Neill tunnel easy. (laughing) >> You got it, maybe worse in D.C. (laughing) >> I'd like to hear, though, if you have any best practices with this move into AI: how are you experimenting with it, and how are you finding it used most efficiently and effectively? >> Yeah, so I think one way we are eating our own dog food, so to speak, in that we're using it internally. We're using it on our customers' data, as I was explaining, to help look at trends and do analysis. So that's one, and then it's other things, just, you know, partnering with companies like Nvidia as well and coming out with joint solutions, so we're doing work with them on different solution areas. >> Great, great. Well, Mike, thanks so much for coming on theCUBE, >> Thanks for having me! >> It was fun having you. >> You survived! >> Yes! (laughs) >> We'll look forward to many more CUBE conversations.
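(A hedged sketch of the kind of telemetry-driven anomaly detection described above: flagging drives whose call-home metrics drift before they fail. The feature names and the IsolationForest choice are illustrative assumptions, not NetApp's actual Active IQ pipeline.)

    # Illustrative sketch of drive-telemetry anomaly detection (the features
    # and model choice are assumptions, not the real Active IQ pipeline).
    import numpy as np
    from sklearn.ensemble import IsolationForest

    rng = np.random.default_rng(7)
    # columns: latency_ms, reallocated_sectors, temperature_c (hypothetical telemetry)
    healthy = rng.normal([5.0, 2.0, 35.0], [1.0, 1.0, 3.0], size=(500, 3))
    failing = rng.normal([9.0, 40.0, 48.0], [2.0, 10.0, 4.0], size=(5, 3))

    detector = IsolationForest(contamination=0.01, random_state=7).fit(healthy)
    flags = detector.predict(failing)  # -1 marks an anomalous drive
    for drive_id, flag in enumerate(flags):
        if flag == -1:
            print(f"drive {drive_id}: anomalous telemetry, consider proactive replacement")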
>> Great to hear from NetApp, you're very much in the game. >> Indeed, indeed. >> Alright, thank you very much. >> I'm Rebecca Knight for James Kobielus, we will have more from theCUBE's coverage of DataWorks coming up in just a little bit. (electronic music)

Published Date : Jun 20 2018

SUMMARY :

Mike McNamara of NetApp explains how the company re-entered the analytics market with external storage that decouples compute from storage, delivering upwards of 50% better total cost of ownership versus direct-attached models that keep three copies of data. He describes cloud-connected flash storage spanning Google, Azure, and AWS; Active IQ, which applies predictive analytics and machine learning to call-home telemetry to flag failing drives before they fail; and NetApp's edge-to-core-to-cloud push into AI alongside key partner Nvidia.

SENTIMENT ANALYSIS :

ENTITIES

Entity | Category | Confidence
James Kobielus | PERSON | 0.99+
Rebecca Knight | PERSON | 0.99+
Mike McNamara | PERSON | 0.99+
IBM | ORGANIZATION | 0.99+
Nvidia | ORGANIZATION | 0.99+
Mike | PERSON | 0.99+
50% | QUANTITY | 0.99+
Amazon | ORGANIZATION | 0.99+
San Jose | LOCATION | 0.99+
Silicon Valley | LOCATION | 0.99+
Google | ORGANIZATION | 0.99+
AWS | ORGANIZATION | 0.99+
San Jose, California | LOCATION | 0.99+
D.C. | LOCATION | 0.99+
Boston | LOCATION | 0.99+
one copy | QUANTITY | 0.99+
three | QUANTITY | 0.99+
one | QUANTITY | 0.98+
theCUBE | ORGANIZATION | 0.98+
DataWorks Summit 2018 | EVENT | 0.98+
three copies | QUANTITY | 0.98+
NetApp | ORGANIZATION | 0.97+
Hortonworks | ORGANIZATION | 0.94+
ASUP | TITLE | 0.91+
IDC | ORGANIZATION | 0.88+
Azure | ORGANIZATION | 0.86+
thousands of thousands | QUANTITY | 0.86+
NetApp | TITLE | 0.82+
RAID | TITLE | 0.8+
DataWorks | EVENT | 0.76+
Vice President of Infrastructure | PERSON | 0.71+
Active IQ | TITLE | 0.69+
one influence | QUANTITY | 0.69+
a lot of the people | QUANTITY | 0.66+
of women | QUANTITY | 0.66+
last | DATE | 0.65+
years | DATE | 0.63+
CUBE | ORGANIZATION | 0.56+
ton | QUANTITY | 0.54+
first | QUANTITY | 0.53+
DataWorks | TITLE | 0.51+
NetApp | QUANTITY | 0.4+

Scott Gnau, Hortonworks | DataWorks Summit 2018


 

>> Live from San Jose, in the heart of Silicon Valley, it's theCUBE. Covering DataWorks Summit 2018. Brought to you by Hortonworks. >> Welcome back to theCUBE's live coverage of DataWorks Summit here in San Jose, California. I'm your host, Rebecca Knight, along with my cohost James Kobielus. We're joined by Scott Gnau, he is the chief technology officer at Hortonworks. Welcome back to theCUBE, Scott. >> Great to be here. >> It's always fun to have you on the show. So, you have really spent your entire career in the data industry. I want to start off at 10,000 feet, and just have you talk about where we are now, in terms of customer attitudes, in terms of the industry, in terms of where customers feel, how they're dealing with their data and how they're thinking about their approach in their business strategy. >> Well, I have to say, 30 plus years ago, starting in the data field, it wasn't as exciting as it is today. Of course, I always found it very exciting. >> Exciting means nerve-wracking. Keep going. >> Or nerve-wracking. But you know, we've been predicting it. I remember even, you know, 10, 15 years ago, before big data was a thing, it's like, oh, all this data's going to come, and it's going to be, you know, 10x what it is. And we were wrong. It was like 5000x what it was. And I think the really exciting part is that data really used to be relegated, frankly, to big companies as a derivative work of ERP systems, and so on and so forth. And while that's very interesting, and it certainly enabled a whole level of productivity for industry, when you compare that to all of the data flying around everywhere today, whether it be Twitter feeds, or even doing live polls, like we did in the opening session today, data is just being created everywhere. And the same thing applies to that data that applied to the ERP data of old, and that is that being able to harness, manage and understand that data is a new business-creating opportunity. And you know, we were with some analysts the other day, and I think one of the more quoted things that came out of that when I was speaking with them was really, like railroads and shipping in the 1800s and oil in the 1900s, data really is the wealth creator of this century. And so that creates a very nerve-wracking environment. It also creates an environment of very agile and very important technological breakthroughs that enable those things to be turned into wealth. >> So thinking about that, in terms of where we are at this point in time, and on the main stage this morning someone had likened it to the interstate highway system, that really revolutionized transportation, but also commerce. >> I love that, actually. I may steal it in some of my future presentations. >> That's good, but we'll know where you pilfered it. >> Well, perhaps if data is oil, the edge, in containerized applications piping, you know, microbursts of data across the internet of things, is sort of like the new fracking. You know, you're being able to extract more of this precious resource from the territory. >> Hopefully not quite as damaging to the environment. >> Maybe not. I'm sorry, environmentalists, if I just offended you, I apologize. >> But I think, you know, all of those analogies are very true, and I particularly like the interstate one this morning. Because when I think about what we've done in our core HDP platform, and I know Arun was here talking about all the great advances that we built into this kind of core Hadoop platform. Very traditional.
Store data, analyze data, but also bring in new kinds of algorithms, rapid innovation, and so on. That's really great, but that's kind of half of the story. In a device-connected world, in a consumer-centric world, capturing data at the edge, moving and processing data at the edge is the new normal, right? And so just like the interstate highway system actually created new ways of commerce because we could move people and things more efficiently, moving data and processing data more efficiently is kind of the second part of the opportunity that we have in this new deluge of data. And that's really where we've been with our Hortonworks DataFlow. And really saying that the complete package of managing data from origination at the edge, all the way through analytics, to a decision that's triggered back at the edge, is like the holy grail, right? And building a technology for that footprint is why I'm certainly excited today. It's not the caffeine, it's just the opportunity of making all of that work. >> You know, I think the key announcement for me at this show, that you guys made on HDP 3.0, was containerization of more of the capabilities of your distributed environment, so that these capabilities, in terms of processing, first of all, capturing, analyzing and moving that data, can be pushed closer to the end points. Can you speak a bit, Scott, about this new capability or this containerization support? Within HDP 3.0, but really in your broader portfolio, and where you're going with that in terms of addressing edge applications, perhaps autonomous vehicles, or you know, whatever you might put into a new smart phone or whatever you put at the edge. Describe the potential of containerization to sort of break this ecosystem wide open. >> Yeah, I think there are a couple of aspects to containerization, and by the way, we're like so excited about kind of the cloud-first, containerized HDP 3.0 that we launched here today. There's a lot of great tech that our customers have been clamoring for that they can take advantage of, and it's really just the beginning, which again is part of the excitement of being in the technology space and certainly being part of Hortonworks. So containerization affords a couple of things. Certainly, agility. Agility in deploying applications. So, you know, for 30 years we've built these enterprise software stacks that were very integrated, hugely complicated systems that could bring together multiple different applications, different workloads, and manage all that in a multi-tenancy kind of environment. And that was because we had to do that, right? Servers were getting bigger, they were more powerful, but not particularly well distributed. Obviously in a containerized world, you now turn that whole paradigm on its head and you say, you know what? I'm just going to collect these three microservices that I need to do this job. I can isolate them. I can have them run in a server-less technology. I can actually allocate servers in the cloud to go run, and when they're done they go away. And I don't pay for them anymore. So thinking about kind of that from a software development, deployment, implementation perspective, there are huge implications, but the real value for customers is agility, right? I don't have to wait until next year to upgrade my enterprise software stack to take advantage of this new algorithm. I can simply isolate it inside of a container, have it run, and have it go away. And get the answer, right?
And so when I think about, and a number of our keynotes this morning were talking about just kind of the exponential rate of change, this is really the net new norm. Because the only way we can do things faster is, in fact, to be able to provide this. >> And it's not just microservices. Also orchestrating them through Kubernetes, and so forth, so they can be. >> Sure. That's the how versus the what, yeah. >> Quickly deployed as an ensemble, and then quickly de-provisioned when you don't need them anymore. >> Yeah, so then there's obviously the cost aspect, right? >> Yeah. >> So if you're going to run a whole bunch of stuff, or even if you have something as mundane as a really big merge join inside of Hive, let me spin up a thousand extra containers to go do that big thing, and then have them go away when it's done. >> And oh, by the way, you'll be deployed on demand. >> And only pay for it while I'm using it. >> And then you can possibly distribute those containers across different public clouds depending on what's most cost effective at any point in time, Azure or AWS or whatever it might be. >> And as I teased with Arun, you know, the only thing that we haven't solved for is the speed of light, but we're working on it. >> Talking about this warp-speed change being the new norm, can you talk about some of the most exciting use cases you've seen, in terms of the customers and clients that are using Hortonworks in the coolest ways? >> Well, I mean, obviously autonomous vehicles is one that's captured all of our imagination, 'cause we understand how that works. But it's a perfect use case for this kind of technology. But the technology also applies in fraud detection and prevention. It applies in healthcare management, in proactive personalized medicine delivery, and in generating better outcomes for treatment. So, you know, all across. >> It will be in every aspect of our lives, including the consumer realm, increasingly, yeah. >> Yeah, all across the board. And you know, one of the things that really changed, right, is, well, a couple things. A lot of bandwidth, so you can start to connect these things. The devices themselves are particularly smart, so you no longer have to transfer all the data to a mainframe and then wait three weeks, sorry, wait three weeks for your answer, and then come back. You can have analytic models running on an edge device. And think about, you know, that is really real time. And that actually kind of solves for the speed of light, 'cause you're not waiting for those things to go back and forth. So there are a lot of new opportunities, and those architectures really depend on some of the core tenets of, ultimately, containerization, stateless application deployment and delivery. And they also depend on the ability to create feedback loops to do point-to-point and peer kinds of communication between devices. This is a whole new world of how data gets moved and how the decisions around data movement get made. And certainly that's what we're excited about, building with the core components. The other implication of all of this, and we've known each other for a long time: data has gravity. Data movement's expensive. It takes time, and frankly, you have to pay for the bandwidth and all that kind of stuff. So being able to play the data where it lies becomes a lot more interesting from an application portability perspective, and with all of these new sensors, devices and applications out there, a lot more data is living its entire lifecycle in the cloud.
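(A hedged sketch of the ephemeral "thousand containers for one big join" pattern above, in PySpark terms: dynamic allocation lets executors scale out for the job and disappear when it finishes, so you only pay while it runs. The configuration keys are standard Spark settings; the paths and sizing numbers are illustrative assumptions.)

    # Sketch of an ephemeral, elastically-scaled job (standard Spark config
    # keys; the paths and sizing numbers are illustrative assumptions).
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("big-merge-join")
        .config("spark.dynamicAllocation.enabled", "true")
        .config("spark.dynamicAllocation.minExecutors", "2")
        .config("spark.dynamicAllocation.maxExecutors", "1000")  # burst for the join
        .config("spark.shuffle.service.enabled", "true")
        .getOrCreate()
    )

    orders = spark.read.parquet("/data/orders")        # hypothetical paths
    customers = spark.read.parquet("/data/customers")

    # The big merge join: executors spin up to handle the shuffle, then idle
    # ones are released automatically once the work completes.
    result = orders.join(customers, "customer_id")
    result.write.parquet("/data/orders_enriched")

    spark.stop()  # when it's done, it goes away, and you stop paying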
And so being able to create that connective tissue. >> Or, for that matter, living its lifecycle out at the edge. >> And even on the edge. >> On machine learning, let me just butt in for a second. One of the areas that we're focusing on increasingly at Wikibon, in terms of our focus on machine learning at the edge, is more and more machine learning frameworks are coming into the browser world. JavaScript, for the most part, like TensorFlow.js; you know, more of this inferencing and training is going to happen inside your browser. That blows a lot of people's minds. It may not be heavy-hitting machine learning, but it'll be good enough for a lot of things that people do in their normal life. Where you don't want to round-trip back to the cloud, it's all happening right there in, you know, Chrome or whatever you happen to be using. >> Yeah, and so the point being now, you know, when I think about the early days, talking about scalability, I remember shipping my first one-terabyte database. And then the first 10-terabyte database. Yeah, it doesn't sound very exciting. When I think about scalability of the future, scalability is not going to be defined as petabytes or exabytes under management. It's really going to be defined as petabytes or exabytes affected across a grid of storage and processing devices. And that's a whole new technology paradigm, and really that's kind of the driving force behind what we've been building and what we've been talking about at this conference. >> Excellent. >> So when you're talking about these things, I mean, how much are the companies themselves prepared, and do they have the right kind of talent to use the kinds of insights that you're able to extract? And then act on them in real time, 'cause you're talking about how this is saving a lot of the waiting-around time. So is this really changing the way business gets done, and do companies have the talent to execute? >> Sure. I mean, it's changing the way business gets done. We showed a quote on stage this morning from the CEO of Marriott, right? So, I think there are a couple of pieces. One is businesses are increasingly data-driven, and business strategy is increasingly the data strategy. And so it starts from the top, kind of setting that strategy and understanding the value of that asset and how that needs to be leveraged to drive new business. So that's kind of one piece. And you know, obviously there are more and more folks kind of coming to the realization that that is important. The other thing that's been helpful is, you know, as with any new technology, there's always kind of the start-up shortage of resources, and people start to spool up and learn. You know, the really good news, and for the past 10 years I've been working with a number of different university groups: parents are actually going to universities and demanding that the curriculum include data, and processing, and big data, and all of these technologies. Because they know that their children, educated in that kind of a world, number one, they're going to have a fun job to go to every day, 'cause it's going to be something different every day. But number two, they're going to be employed for life. (laughing) >> Yeah. >> They will be solvent. >> Frankly, the demand has actually created a catch-up in supply that we're seeing. And of course, you know, as tools start to get more mature and more integrated, they also become a little bit easier to use. You know, there's a little bit easier deployment and so on.
So a combination of, I'm seeing a really good supply there; obviously we invest in education through the community. And then frankly, the education system itself, and folks saying this is really the hot job of the next century. You know, I can be the new oil baron, or I can be the new railroad captain. It's actually creating more supply, which is also very helpful. >> Data's at the heart of what I call the new STEM cell. It's science, technology, engineering, mathematics that you want to implant in the brains of the young as soon as possible. I hear ya. >> Yeah, absolutely. >> Well, Scott, thanks so much for coming on. But first, we can't let you go without the fashion statement. You arrived on set wearing it. >> The elephants. >> I mean, it was quite a look. >> Well, I did it because then you couldn't see I was sweating on my brow. >> Oh please, no, no, no. >> 'Cause I was worried about this tough interview. >> You know, one of the things I love about your logo, and I'll just, you know, it sounds like I'm fawning: the elephant is a very intelligent animal. >> It is indeed. >> My wife's from Indonesia. I remember going back one time, they had Asian elephants at one of these safari parks. And watching it perform, and my son was very little then. The elephant is a very sensitive, intelligent animal. You don't realize 'til you're up close. They pick up all manner of social cues. I think it's an awesome symbol for a company that's all about data-driven intelligence. >> The elephant never forgets. >> Yeah. >> That's what we know. >> That's right, we never forget. >> He won't forget, 'cause he's got a brain. Or she, I'm sorry. He or she has a brain. >> And it's data-driven. >> Yeah. >> Thanks very much. >> Great. Well, thanks for coming on theCUBE. I'm Rebecca Knight for James Kobielus. We will have more coming up from DataWorks just after this. (upbeat music)

Published Date : Jun 20 2018

SUMMARY :

Hortonworks CTO Scott Gnau argues that data is this century's wealth creator, the way railroads and shipping were in the 1800s and oil in the 1900s. He explains how containerization in HDP 3.0 brings agility and cost savings by letting workloads spin up and disappear elastically; how Hortonworks DataFlow manages data from origination at the edge through analytics to decisions triggered back at the edge; and why data gravity makes processing data where it lies essential. He also discusses use cases from autonomous vehicles to fraud detection, and how universities are closing the data-skills gap.

SENTIMENT ANALYSIS :

ENTITIES

Entity | Category | Confidence
Rebecca Knight | PERSON | 0.99+
James Kobielus | PERSON | 0.99+
Scott | PERSON | 0.99+
Hortonworks | ORGANIZATION | 0.99+
Scott Gnau | PERSON | 0.99+
Indonesia | LOCATION | 0.99+
three weeks | QUANTITY | 0.99+
30 years | QUANTITY | 0.99+
10x | QUANTITY | 0.99+
San Jose | LOCATION | 0.99+
Marriott | ORGANIZATION | 0.99+
San Jose, California | LOCATION | 0.99+
1900s | DATE | 0.99+
1800s | DATE | 0.99+
10,000 feet | QUANTITY | 0.99+
Silicon Valley | LOCATION | 0.99+
one piece | QUANTITY | 0.99+
Dataworks Summit | EVENT | 0.99+
AWS | ORGANIZATION | 0.99+
Chrome | TITLE | 0.99+
theCUBE | ORGANIZATION | 0.99+
next year | DATE | 0.98+
next century | DATE | 0.98+
today | DATE | 0.98+
30 plus years ago | DATE | 0.98+
Javascript | TITLE | 0.98+
second part | QUANTITY | 0.98+
Twitter | ORGANIZATION | 0.98+
first | QUANTITY | 0.97+
Dataworks | ORGANIZATION | 0.97+
One | QUANTITY | 0.97+
5000x | QUANTITY | 0.97+
DataWorks Summit 2018 | EVENT | 0.96+
HDP 3.0 | TITLE | 0.95+
one | QUANTITY | 0.95+
this morning | DATE | 0.95+
HDP 3.0 | TITLE | 0.94+
three microservices | QUANTITY | 0.93+
first one terabyte | QUANTITY | 0.93+
First | QUANTITY | 0.92+
DataWorks Summit 2018 | EVENT | 0.92+
JS | TITLE | 0.9+
Asian | OTHER | 0.9+
3.0 | TITLE | 0.87+
one time | QUANTITY | 0.86+
a thousand extra containers | QUANTITY | 0.84+
this morning | DATE | 0.83+
15 years ago | DATE | 0.82+
Arun | PERSON | 0.81+
this century | DATE | 0.81+
10, | DATE | 0.8+
first 10 terabyte | QUANTITY | 0.79+
couple | QUANTITY | 0.72+
Azure | ORGANIZATION | 0.7+
Kubernetes | TITLE | 0.7+
theCUBE | EVENT | 0.66+
parks | QUANTITY | 0.59+
a second | QUANTITY | 0.58+
past 10 years | DATE | 0.57+
number two | QUANTITY | 0.56+
Wikibon | TITLE | 0.55+
HDP | COMMERCIAL_ITEM | 0.54+
rd. | QUANTITY | 0.48+

Ram Venkatesh, Hortonworks & Sudhir Hasbe, Google | DataWorks Summit 2018


 

>> Live from San Jose, in the heart of Silicon Valley, it's theCUBE, covering DataWorks Summit 2018. Brought to you by Hortonworks. >> We are wrapping up Day One of coverage of DataWorks here in San Jose, California, on theCUBE. I'm your host, Rebecca Knight, along with my co-host, James Kobielus. We have two guests for this last segment of the day. We have Sudhir Hasbe, who is the director of product management at Google, and Ram Venkatesh, who is VP of Engineering at Hortonworks. Ram, Sudhir, thanks so much for coming on the show. >> Thank you very much. >> Thank you. >> So, I want to start out by asking you about a joint announcement that was made earlier this morning about using some Hortonworks technology deployed onto Google Cloud. Tell our viewers more. >> Sure, so basically what we announced was support for the Hortonworks DataPlatform and Hortonworks DataFlow, HDP and HDF, running on top of the Google Cloud Platform. So this includes deep integration with Google's Cloud Storage connector layer, as well as a certified distribution of HDP to run on the Google Cloud Platform. >> I think the key thing is a lot of our customers have been telling us they like the familiar environment of the Hortonworks distribution that they've been using on-premises, and as they look at moving to cloud, like GCP, Google Cloud, they want that similar, familiar environment. So, they want the choice to deploy on-premises or on Google Cloud, but they want the familiarity of what they've already been using with Hortonworks products. So this announcement actually helps customers pick and choose: whether they want to run the Hortonworks distribution on-premises, they want to do it in cloud, or they want to build this hybrid solution where the data can reside on-premises and can move to cloud, and build this common, hybrid architecture. So, that's what this does. >> So, HDP customers can store data in the Google Cloud. They can execute ephemeral workloads, analytic workloads, machine learning in the Google Cloud. And there's some tie-in between Hortonworks's real-time or low-latency or streaming capabilities from HDF in the Google Cloud. So, could you describe, at a fuller sort of detail level, the degrees of technical integration between your two offerings here? >> You want to take that? >> Sure, I'll handle that. So, essentially, deep in the heart of HDP there's the HDFS layer, which includes a Hadoop-compatible file system API; it's a pluggable file system layer. So, what Google has done is they have provided an implementation of this API for the Google Cloud Storage connector. So this is the GCS connector. We've taken the connector and we've actually continued to refine it to work with our workloads, and now Hortonworks is actually bundling, packaging, and making this connector available as part of HDP. >> So bilateral data movement between them? Bilateral workload movement? >> No, think of this as being very efficient when our workloads are running on top of GCP. When they need to get at data, they can get at data that is in the Google Cloud Storage buckets in a very, very efficient manner. So, since we have fairly deep expertise on workloads like Apache Hive and Apache Spark, we've actually done work in these workloads to make sure that they can run efficiently, not just on HDFS, but also on the cloud storage connector. This is a critical part of making sure that the architecture is actually optimized for the cloud.
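(A hedged sketch of what that pluggable file system integration looks like from a job's point of view: with the GCS connector on the classpath, gs:// paths behave like HDFS paths. The connector class names below follow its public documentation, but the bucket name and session setup here are illustrative assumptions.)

    # Sketch of reading Google Cloud Storage data through the Hadoop-compatible
    # file system layer (connector classes per its docs; bucket is hypothetical).
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("hdp-on-gcp")
        .config("spark.hadoop.fs.gs.impl",
                "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem")
        .config("spark.hadoop.fs.AbstractFileSystem.gs.impl",
                "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS")
        .getOrCreate()
    )

    # Compute runs in ephemeral cluster nodes; storage stays in the bucket,
    # so the two scale independently.
    events = spark.read.parquet("gs://example-analytics-bucket/events/")
    events.groupBy("event_type").count().show()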
So, at our scale, as our customers are moving their workloads from on-premise to the cloud, it's not just functional parity; they also need sort of the operational and the cost efficiency that they're looking for as they move to the cloud. So, to do that, we need to enable this fundamental disaggregated storage pattern. See, on-prem, the big win with Hadoop was that we could bring the processing to where the data was. In the cloud, we need to make sure that we work well when storage and compute are disaggregated and they're scaled elastically, independent of each other. So this is a fairly fundamental architectural change. We want to make sure that we enable this in a first-class manner. >> I think that's a key point, right. I think what cloud allows you to do is scale the storage and compute independently. And so, with storing data in Google Cloud Storage, you can scale that horizontally and then just leverage that as your storage layer, and the compute can independently scale by itself. And what this allows customers of HDP and HDF to do is store the data on GCP, on Cloud Storage, and then just scale the compute side of it with HDP and HDF. >> So, if you'll indulge me to name another Hortonworks partner, for just a hypothetical. Let's say one of your customers is using IBM Data Science Experience to do TensorFlow modeling and training. Can they then, inside of HDP on GCP, use the compute infrastructure inside of GCP to do the actual modeling and training, which is more compute-intensive, and the separate, decoupled storage infrastructure to hold the training data, which is more storage-intensive? Is that a capability that would be available to your customers, with this integration with Google? >> Yeah, so where we are going with this is we are saying, IBM DSX and other solutions that are built on top of HDP can transparently take advantage of the fact that they have HDP compute infrastructure to run against. So, you can run your machine learning training jobs, you can run your scoring jobs, and you can have the same unmodified DSX experience whether you're running against an on-premise HDP environment or an in-cloud HDP environment. Further, that's sort of the benefit for partners and partner solutions. From a customer standpoint, the big value prop here is that customers are used to securing and governing their data on-prem in their particular way with HDP, with Apache Ranger, Atlas, and so forth. So, when they move to the cloud, we want this experience to be seamless from a management standpoint. So, from a data management standpoint, we want all of their learning from a security and governance perspective to apply when they are running in Google Cloud as well. So, we've had this capability on Azure and on AWS, and with this partnership, we are announcing the same type of deep integration with GCP as well. >> So Hortonworks is that one pane of glass across all your product partners for all manner of jobs. Go ahead, Rebecca. >> Well, I just wanted to ask about, we've talked about the reason, the impetus for this. With the customer, it's more familiar for customers, it offers the seamless experience. But can you delve a little bit into the business problems that you're solving for customers here? >> A lot of times, our customers are at various points on their cloud journey. For some of them, it's very simple: there's a broom coming by, the datacenter is going away in 12 months, and I need to be in the cloud.
So, this is where there is a wholesale movement of infrastructure from on-premise to the cloud. Others are exploring individual business use cases. So, for example, one of our large customers, a travel partner: they are exploring a new pricing model, and they want to roll out this pricing model in the cloud. They have on-premise infrastructure, and they know they'll have that for a while. They are spinning up new use cases in the cloud, typically for reasons of agility. Typically, many of our customers operate large, multi-tenant clusters on-prem. That's nice for very scalable compute for running large jobs. But if you want to run, for example, a new version of Spark, you have to upgrade the entire cluster before you can do that. Whereas in this sort of model, they can bring up a new workload with just the specific versions and dependencies that it needs, independent of all of their other infrastructure. So this gives them agility, where they can move as fast as... >> Through the containerization of the Spark jobs or whatever. >> Correct, and so containerization, as well as even spinning up an entire new environment. Because in the cloud, given that you have access to elastic compute resources, they can come and go. So, your workloads are much more independent of the underlying cluster than they are on-premise. And this is where sort of the core business benefits around agility, speed of deployment, things like that come into play. >> And also, if you look at the total cost of ownership, take an example where customers are collecting all this information through the month, and at month end you want to do the closing of the books. That's a great example where you want ephemeral workloads. This is like, do it once a month, finish the books and close the books. That's a great scenario for cloud, where you don't have to create infrastructure on-premises and keep it ready. So that's one example where now, in the new partnership, you can collect all the data on-premises if you want throughout the month, but move that and leverage cloud to go ahead and scale, do this workload, and finish the books. That's one. The second example I can give is, a lot of customers run their e-commerce platforms and all that on-premises, let's say. They can still collect all these events through HDP that may be running on-premises, with Kafka, and then, what you can do is, in-cloud, in GCP, you can deploy HDP and HDF, and you can use the HDF from there for real-time stream processing. So, collect all these clickstream events and use them to make decisions like, hey, which products are selling better? Should we go ahead and give a discount? How many people are looking at that product, or how many people have bought it? That kind of aggregation in real-time at scale, you can now do in-cloud, and build these hybrid architectures. And you enable scenarios where, in the past, to do that kind of stuff, you would have to procure hardware, deploy hardware, all of that, which all goes away. In-cloud, you can do that much more flexibly and just use whatever capacity you have. >> Well, you know, ephemeral workloads are at the heart of what many enterprise data scientists do. Real-world experiments, ad-hoc experiments, with certain datasets.
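(A hedged sketch of the clickstream pattern just described: consuming e-commerce events from Kafka and keeping running per-product counts. The topic name, brokers, and JSON event shape are illustrative assumptions, and kafka-python stands in for whatever streaming engine, such as HDF/NiFi or Spark Streaming, a real deployment would use.)

    # Minimal clickstream aggregation sketch (topic, brokers, and event shape
    # are assumptions; kafka-python stands in for a full streaming engine).
    import json
    from collections import Counter
    from kafka import KafkaConsumer

    consumer = KafkaConsumer(
        "clickstream-events",                    # hypothetical topic
        bootstrap_servers=["broker1:9092"],      # hypothetical broker
        value_deserializer=lambda v: json.loads(v.decode("utf-8")),
    )

    views = Counter()
    purchases = Counter()

    for message in consumer:
        event = message.value                    # e.g. {"product": "p123", "action": "view"}
        if event["action"] == "view":
            views[event["product"]] += 1
        elif event["action"] == "purchase":
            purchases[event["product"]] += 1
            # Feed a real-time decision, e.g. promote products converting well.
            print(event["product"], "views:", views[event["product"]],
                  "purchases:", purchases[event["product"]])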
You build a TensorFlow model, or maybe a model in Caffe, or whatever, and you deploy it out to a cluster, and so the life of a data scientist is often nothing but a stream of new tasks that are all ephemeral in their own right, but are part of an ongoing experimentation program where, you know, they're building and testing assets that may or may not be deployed in production applications. So, you know, I can see a clear need for that capability of this announcement in lots of working data science shops in the business world. >> Absolutely. >> And I think, coming down to it, if you really look at the partnership, right, there are two or three key areas where it's going to have a huge advantage for our customers. One is analytics at scale at a lower cost; like total cost of ownership, reducing that, running at-scale analytics. That's one of the big things. Again, as I said, the hybrid scenarios: most enterprise customers have huge deployments of infrastructure on-premises, and that's not going to go away. Over a period of time, leveraging cloud is a priority for a lot of customers, but they will be in these hybrid scenarios. And what this partnership allows them to do is have these scenarios that can span across cloud and the on-premises infrastructure that they are building, and get business value out of all of these. And then, finally, we at Google believe that the world will be more and more real-time over a period of time. Like, we already are seeing a lot of these real-time scenarios with IoT events coming in and people making real-time decisions. And this is only going to grow. And this partnership also provides the whole streaming analytics capability in-cloud, at scale, for customers to build these hybrid plus real-time streaming scenarios. >> Well, it's clear from Google's perspective what the Hortonworks partnership gives you in this competitive space, the multi-cloud space. It gives you that ability to support hybrid cloud scenarios. You're one of the premier public cloud providers, as we all know. And clearly, now that you've got the Hortonworks partnership, you have that ability to support those kinds of highly hybridized deployments for your customers, many of whom I'm sure have those requirements. >> That's perfect, exactly right. >> Well, a great note to end on. Thank you so much for coming on theCUBE. Sudhir, Ram, thank you so much. >> Thank you, thanks a lot. >> Thank you. >> I'm Rebecca Knight for James Kobielus. We will have more tomorrow from DataWorks. We will see you tomorrow. This is theCUBE signing off. >> From sunny San Jose. >> That's right.

Published Date : Jun 20 2018

SUMMARY :

Sudhir Hasbe of Google and Ram Venkatesh of Hortonworks discuss the newly announced support for HDP and HDF on the Google Cloud Platform, including a certified distribution and deep integration with the Google Cloud Storage connector, which plugs into HDFS's pluggable file system layer. They explain how disaggregating storage and compute lets customers scale each independently, run ephemeral workloads like month-end closing at a lower total cost of ownership, and build hybrid architectures spanning on-premises Kafka event collection and in-cloud real-time stream processing, with consistent security and governance via Apache Ranger and Atlas.

SENTIMENT ANALYSIS :

ENTITIES

Entity | Category | Confidence
James Kobielus | PERSON | 0.99+
Rebecca Knight | PERSON | 0.99+
Rebecca | PERSON | 0.99+
two | QUANTITY | 0.99+
Sudhir | PERSON | 0.99+
Ram Venkatesh | PERSON | 0.99+
San Jose | LOCATION | 0.99+
HortonWorks | ORGANIZATION | 0.99+
Sudhir Hasbe | PERSON | 0.99+
Google | ORGANIZATION | 0.99+
Hortonworks | ORGANIZATION | 0.99+
Silicon Valley | LOCATION | 0.99+
two guests | QUANTITY | 0.99+
San Jose, California | LOCATION | 0.99+
DataWorks | ORGANIZATION | 0.99+
tomorrow | DATE | 0.99+
Ram | PERSON | 0.99+
AWS | ORGANIZATION | 0.99+
one example | QUANTITY | 0.99+
one | QUANTITY | 0.99+
two offerings | QUANTITY | 0.98+
12 months | QUANTITY | 0.98+
One | QUANTITY | 0.98+
Day One | QUANTITY | 0.98+
DataWorks Summit 2018 | EVENT | 0.97+
IBM | ORGANIZATION | 0.97+
second example | QUANTITY | 0.97+
Google Cloud Platform | TITLE | 0.96+
Atlas | ORGANIZATION | 0.96+
Google Cloud | TITLE | 0.94+
Apache Ranger | ORGANIZATION | 0.92+
three key areas | QUANTITY | 0.92+
Hadoop | TITLE | 0.91+
Kafka | TITLE | 0.9+
theCUBE | ORGANIZATION | 0.88+
earlier this morning | DATE | 0.87+
Apache Hive | ORGANIZATION | 0.86+
GCP | TITLE | 0.86+
one pane | QUANTITY | 0.86+
IBM Data Science | ORGANIZATION | 0.84+
Azure | TITLE | 0.82+
Spark | TITLE | 0.81+
first | QUANTITY | 0.79+
HDF | ORGANIZATION | 0.74+
once in a month | QUANTITY | 0.73+
HDP | ORGANIZATION | 0.7+
TensorFlow | OTHER | 0.69+
Hortonworks DataPlatform | ORGANIZATION | 0.67+
Apache Spark | ORGANIZATION | 0.61+
GCS | OTHER | 0.57+
HDP | TITLE | 0.5+
DSX | TITLE | 0.49+
Cloud Storage | TITLE | 0.47+

Stephanie McReynolds, Alation | DataWorks Summit 2018


 

>> Live from San Jose, in the heart of Silicon Valley, it's theCUBE, covering DataWorks Summit 2018, brought to you by Hortonworks. >> Welcome back to theCUBE's live coverage of DataWorks here in San Jose, California. I'm your host, Rebecca Knight, along with my co-host, James Kobielus. We're joined by Stephanie McReynolds. She is the Vice President of Marketing at Alation. Thanks so much for returning to theCUBE, Stephanie. >> Thank you for having me again. >> So, before the cameras were rolling, we were talking about Kevin Slavin's talk on the main stage this morning, against the background of this concern about AI and automation coming to take people's jobs. His overarching point was that we shouldn't let the algorithms take over, and that humans actually are an integral piece of this loop. So, riff on that a little bit. >> Yeah, what I found fascinating about what he presented were actual examples where having a human in the loop of AI decision-making had a more positive impact than just letting the algorithms decide for you and turning it into kind of a black box. And the issue is not so much that the algorithms make the wrong decision; there are very few cases where they do. What happens the majority of the time is that the algorithms actually can't be understood by a human. So if you have to roll back >> They're opaque, yeah. >> in your decision-making, or uncover it, >> I mean, who can crack what a convolutional neural network does, layer by layer? Nobody can. >> Right, right. And so, his point was, if we want to avoid not just poor outcomes, but also make sure that the robots don't take over the world, right, which is where every, like, media person goes first, right? (Rebecca and James laugh) You really need a human in the loop of this process. And a really interesting example he gave was what happened with the 2015 storm. He talked about 16 different algorithms that do weather predictions, and only one algorithm mis-predicted that there would be a huge weather storm on the east coast. So if there had been a human in the loop, we wouldn't have, you know, caused all this crisis, right? The human could've >> And this is the storm >> Easily seen. >> That shut down the subway system, >> That's right. That's right. >> And really canceled New York City for a few days there, yeah. >> That's right. So I find this pretty meaningful, because Alation is in the data cataloging space, and we have a lot of opportunity to take technical metadata and automate the collection of technical and business metadata, and do all this stuff behind the scenes. >> And you make the discovery of it, and the analysis of it. >> We do the discovery of this, leading to actual recommendations to users of data that you could turn into automated analyses or automated recommendations. >> Algorithmically augmented human judgment is what it's all about, the way I see it. What do you think? >> Yeah, but I think there's a deeper insight that he was sharing, which is that it's not just human judgment that is required, but for humans to actually be in the loop of the analysis as it moves from stage to stage, so that we can try to influence, or at least understand, what's happening with that algorithm. And I think that's a really interesting point.
You know, there's a number of data cataloging vendors, you know, some analysts will say there's anywhere from 10 to 30 different vendors in the data cataloging space, and as vendors, we kind of have this debate. Some vendors have more advanced AI and machine learning capabilities, and other vendors haven't automated at all. And I think that the answer, if you really want humans to adopt analytics, and to be comfortable with the decision-making of those algorithms, you need to have a human in the loop, in the middle of that process, of not only making the decision, but actually managing the data that flows through these systems. >> Well, algorithmic transparency and accountability is an increasing requirement. It's a requirement for GDPR compliance, for example. >> That's right. >> That I don't see yet with Wiki, but we don't see a lot of solution providers offering solutions to enable more of an automated roll-up of a narrative of an algorithmic decision path. But that clearly is a capability as it comes along, and it will. That will absolutely depend on a big data catalog managing the data, the metadata, but also helping to manage the tracking of what models were used to drive what decision, >> That's right. >> And what scenario. So that, that plays into what Alation >> So we talk, >> And others in your space do. >> We call that data catalog, almost as if the data's the only thing that we're tracking, but in addition to that, that metadata or the data itself, you also need to track the business semantics, how the business is using or applying that data and that algorithmic logic, so that might be logic that's just being used to transform that data, or it might be logic to actually make and automate decision, like what they're talking about GDPR. >> It's a data artifact catalog. These are all artifacts that, they are derived in many ways, or supplement and complement the data. >> That's right. >> They're all, it's all the logic, like you said. >> And what we talk about is, how do you create transparency into all those artifacts, right? So, a catalog starts with this inventory that creates a foundation for transparency, but if you don't make those artifacts accessible to a business person, who might not understand what is metadata, what is a transformation script. If you can't make that, those artifacts accessible to a, what I consider a real, or normal human being, right, (James laughs) I love to geek out, but, (all laugh) at some point, not everyone is going to understand. >> She's the normal human being in this team. >> I'm normal. I'm normal. >> I'm the abnormal human being among the questioners here. >> So, yeah, most people in the business are just getting our arms around how do we trust the output of analytics, how do we understand enough statistics and know what to apply to solve a business problem or not, and then we give them this like, hairball of technical artifacts and say, oh, go at it. You know, here's your transparency. >> Well, I want to ask about that, that human that we're talking about, that needs to be in the loop at every stage. What, that, surely, we can make the data more accessible, and, but it also requires a specialized skill set, and I want to ask you about the talent, because I noticed on your LinkedIn, you said, hey, we're hiring, so let me know. >> That's right, we're always hiring. We're a startup, growing well. >> So I want to know from you, I mean, are you having difficulty with filling roles? I mean, what is at the pipeline here? 
Are people getting the skills that they need? >> Yeah, I mean, there's a wide, what I think is a misnomer is there's actually a wide variety of skills, and I think we're adding new positions to this pool of skills. So I think what we're starting to see is an expectation that true business people, if you are in a finance organization, or you're in a marketing organization, or you're in a sales organization, you're going to see a higher level of data literacy be expected of that, that business person, and that's, that doesn't mean that they have to go take a Python course and learn how to be a data scientist. It means that they have to understand statistics enough to realize what the output of an algorithm is, and how they should be able to apply that. So, we have some great customers, who have formally kicked off internal training programs that are data literacy programs. Munich Re Insurance is a good example. They spoke with James a couple of months ago in Berlin. >> Yeah, this conference in Berlin, yeah. >> That's right, that's right, and their chief data officer has kicked off a formal data literacy training program for their employees, so that they can get business people comfortable enough and trusting the data, and-- >> It's a business culture transformation initiative that's very impressive. >> Yeah. >> How serious they are, and how comprehensive they are. >> But I think we're going to see that become much more common. Pfizer has taken, who's another customer of ours, has taken on a similar initiative, and how do they make all of their employees be able to have access to data, but then also know when to apply it to particular decision-making use cases. And so, we're seeing this need for business people to get a little bit of training, and then for new roles, like information stewards, or data stewards, to come online, folks who can curate the data and the data assets, and help be kind of translators in the organization. >> Stephanie, will there be a need for a algorithm curator, or a model curator, to, you know, like a model whisperer, to explain how these AI, convolutional, recurrent, >> Yeah. >> Whatever, all these neural, how, what they actually do, you know. Would there be a need for that going forward? Another as a normal human being, who can somehow be bilingual in neural net and in standard language? >> I think, I think so. I mean, I think we've put this pressure on data scientists to be that person. >> Oh my gosh, they're so busy doing their job. How can we expect them to explain, and I mean, >> Right. >> And to spend 100% of their time explaining it to the rest of us? >> And this is the challenge with some of the regulations like GDPR. We aren't set up yet, as organizations, to accommodate this complexity of understanding, and I think that this part of the market is going to move very quickly, so as vendors, one of the things that we can do is continue to help by building out applications that make it easy for information stewardship. How do you lower the barrier for these specialist roles and make it easy for them to do their job by using AI and machine learning, where appropriate, to help scale the manual work, but keeping a human in the loop to certify that data asset, or to add additional explanation and then taking their work and using AI, machine learning, and automation to propagate that work out throughout the organization, so that everyone then has access to those explanations. 
So you're no longer requiring the data scientists to hold like, I know other organizations that hold office hours, and the data scientist like sits at a desk, like you did in college, and people can come in and ask them questions about neural nets. That's just not going to scale at today's pace of business. >> Right, right. >> You know, the term that I used just now, the algorithm or model whisperer, you know, the recommend-er function that is built into your environment, in similar data catalog, is a key piece of infrastructure to rank the relevance rank, you know, the outputs of the catalog or responses to queries that human beings might make. You know, the recommendation ranking is critically important to help human beings assess the, you know, what's going on in the system, and give them some advice about how to, what avenues to explore, I think, so. >> Yeah, yeah. And that's part of our definition of data catalog. It's not just this inventory of technical metadata. >> That would be boring, and dry, and useless. >> But that's where, >> For most human beings. >> That's where a lot of vendor solutions start, right? >> Yeah. >> And that's an important foundation. >> Yeah, for people who don't live 100% of their work day inside the big data catalog. I hear what you're saying, you know. >> Yeah, so people who want a data catalog, how you make that relevant to the business is you connect those technical assets, that technical metadata with how is the business actually using this in practice, and how can we have proactive recommendation or the recommendation engines, and certifications, and this information steward then communicating through this platform to others in the organization about how do you interpret this data and how do you use it to actually make business decisions. And I think that's how we're going to close the gap between technology adoption and actual data-driven decision-making, which we're not quite seeing yet. We're only seeing about 30, when they survey, only about 36% of companies are actually confident they're making data-driven decisions, even though there have been, you know, millions, if not billions of dollars that have gone into the data analytics market and investments, and it's because as a manager, I don't quite have the data literacy yet, and I don't quite have the transparency across the rest of the organization to close that trust gap on analytics. >> Here's my feeling, in terms of cultural transformations across businesses in general. I think the legal staff of every company is going to need to get real savvy on using those kinds of tools, like your catalog, with recommendation engines, to support e-discovery, or discovery of the algorithmic decision paths that were taken by their company's products, 'cause they're going to be called by judges and juries, under a subpoena and so forth, and so on, to explain all this, and they're human beings who've got law degrees, but who don't know data, and they need the data environment to help them frame up a case for what we did, and you know, so, we being the company that's involved. >> Yeah, and our politicians. I mean, anyone who's read Cathy's book, Weapons of Math Destruction, there are some great use cases of where, >> Math, M-A-T-H, yeah. >> Yes, M-A-T-H. But there are some great examples of where algorithms can go wrong, and many of our politicians and our representatives in government aren't quite ready to have that conversation. 
I think anyone who watched the Zuckerberg hearings you know, in congress saw the gap of knowledge that exists between >> Oh my gosh. >> The legal community, and you know, and the tech community today. So there's a lot of work to be done to get ready for this new future. >> But just getting back to the cultural transformation needed to be, to make data-driven decisions, one of the things you were talking about is getting the managers to trust the data, and we're hearing about what are the best practices to have that happen in the sense, of starting small, be willing to experiment, get out of the lab, try to get to insight right away. What are, what would your best advice be, to gain trust in the data? >> Yeah, I think the biggest gap is this issue of transparency. How do you make sure that everyone understands each step of the process and has access to be able to dig into that. If you have a foundation of transparency, it's a lot easier to trust, rather than, you know, right now, we have kind of like the high priesthood of analytics going on, right? (Rebecca laughs) And some believers will believe, but a lot of folks won't, and, you know, the origin story of Alation is really about taking these concepts of the scientific revolution and scientific process and how can we support, for data analysis, those same steps of scientific evaluation of a finding. That means that you need to publish your data set, you need to allow others to rework that data, and come up with their own findings, and you have to be open and foster conversations around data in your organization. One other customer of ours, Meijer, who's a grocery store in the mid-west, and if you're west coast or east coast-based, you might not have heard of them-- >> Oh, Meijers, thrifty acres. I'm from Michigan, and I know them, yeah. >> Gigantic. >> Yeah, there you go. Gigantic grocery chain in the mid-west, and, Joe Oppenheimer there actually introduced a program that he calls the social contract for analytics, and before anyone gets their license to use Tableau, or MicroStrategy, or SaaS, or any of the tools internally, he asks those individuals to sign a social contract, which basically says that I'll make my work transparent, I will document what I'm doing so that it's shareable, I'll use certain standards on how I format the data, so that if I come up with a, with a really insightful finding, it can be easily put into production throughout the rest of the organization. So this is a really simple example. His inspiration for that social contract was his high school freshman. He was entering high school and had to sign a social contract, that he wouldn't make fun of the teachers, or the students, you know, >> I love it. >> Very simple basics. >> Yeah, right, right, right. >> I wouldn't make fun of the teacher. >> We all need social contract. >> Oh my gosh, you have to make fun of the teacher. >> I think it was a little more formal than that, in the language, but that was the concept. >> That's violating your civil rights as a student. I'm sorry. (Stephanie laughs) >> Stephanie, always so much fun to have you here. Thank you so much for coming on. >> Thank you. It's a pleasure to be here. >> I'm Rebecca Knight, for James Kobielus. We'll have more of theCUBE's live coverage of DataWorks just after this.
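Stephanie's earlier point about recommendation ranking with steward certification lends itself to a small sketch. The following is a hedged illustration only, not Alation's implementation: the asset fields, the popularity scoring, and the certification gate are all assumptions chosen to show the human-in-the-loop shape of the idea.

```python
# Illustrative human-in-the-loop catalog recommendation: assets are
# auto-ranked by observed query popularity, but only steward-certified
# assets are surfaced to business users. All names and fields here are
# hypothetical, not Alation's actual data model.
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class CatalogAsset:
    name: str
    query_count: int             # machine signal: how often analysts use it
    certified_by: Optional[str]  # human signal: steward sign-off, or None

def recommend(assets: List[CatalogAsset], top_n: int = 3) -> List[CatalogAsset]:
    # Machine side: rank by usage, a deliberately naive relevance score.
    ranked = sorted(assets, key=lambda a: a.query_count, reverse=True)
    # Human side: propagate only what a steward has certified.
    return [a for a in ranked if a.certified_by is not None][:top_n]

assets = [
    CatalogAsset("sales.orders_curated", 412, certified_by="steward_01"),
    CatalogAsset("tmp.orders_scratch", 377, certified_by=None),  # popular but unvetted
    CatalogAsset("finance.revenue_daily", 201, certified_by="steward_01"),
]
for asset in recommend(assets):
    print(asset.name)
```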

Published Date : Jun 20 2018

SUMMARY :

Stephanie McReynolds, Vice President of Marketing at Alation, discusses why humans need to stay in the loop of algorithmic decision-making, how data catalogs create the transparency that builds trust in analytics, GDPR's demand for explainable algorithmic decisions, and how customers such as Munich Re, Pfizer, and Meijer are building data literacy and information stewardship programs to close the gap between analytics investment and data-driven decision-making.

SENTIMENT ANALYSIS :

ENTITIES

EntityCategoryConfidence
James KobielusPERSON

0.99+

Stephanie McReynoldsPERSON

0.99+

Rebecca KnightPERSON

0.99+

RebeccaPERSON

0.99+

MichiganLOCATION

0.99+

StephaniePERSON

0.99+

BerlinLOCATION

0.99+

JamesPERSON

0.99+

100%QUANTITY

0.99+

Kevin SlavinPERSON

0.99+

San JoseLOCATION

0.99+

millionsQUANTITY

0.99+

CathyPERSON

0.99+

Silicon ValleyLOCATION

0.99+

PfizerORGANIZATION

0.99+

LinkedInORGANIZATION

0.99+

Munich Re InsuranceORGANIZATION

0.99+

San Jose, CaliforniaLOCATION

0.99+

congressORGANIZATION

0.99+

New York CityLOCATION

0.99+

Joe OppenheimerPERSON

0.99+

PythonTITLE

0.99+

10QUANTITY

0.99+

MeijersORGANIZATION

0.99+

ZuckerbergPERSON

0.99+

16 different algorithmsQUANTITY

0.99+

Weapons of Math DestructionTITLE

0.99+

GDPRTITLE

0.99+

OneQUANTITY

0.98+

each stepQUANTITY

0.98+

theCUBEORGANIZATION

0.98+

about 36%QUANTITY

0.98+

DataWorks Summit 2018EVENT

0.97+

TableauTITLE

0.97+

about 30QUANTITY

0.97+

HortonworksORGANIZATION

0.97+

AlationORGANIZATION

0.96+

one algorithmQUANTITY

0.96+

30 different vendorsQUANTITY

0.95+

billions of dollarsQUANTITY

0.95+

2015DATE

0.95+

SaaSTITLE

0.94+

oneQUANTITY

0.94+

GiganticORGANIZATION

0.93+

firstQUANTITY

0.9+

MicroStrategyTITLE

0.88+

this morningDATE

0.88+

couple of months agoDATE

0.84+

todayDATE

0.81+

MeijerORGANIZATION

0.77+

WikiTITLE

0.74+

Vice PresidentPERSON

0.72+

DataWorksORGANIZATION

0.71+

AlationPERSON

0.53+

DataWorksEVENT

0.43+

John Kreisa, Hortonworks | DataWorks Summit 2018


 

>> Live from San José, in the heart of Silicon Valley, it's theCUBE! Covering DataWorks Summit 2018. Brought to you by Hortonworks. (electro music) >> Welcome back to theCUBE's live coverage of DataWorks here in sunny San José, California. I'm your host, Rebecca Knight, along with my co-host, James Kobielus. We're joined by John Kreisa. He is the VP of marketing here at Hortonworks. Thanks so much for coming on the show. >> Thank you for having me. >> We've enjoyed watching you on the main stage, it's been a lot of fun. >> Thank you, it's been great. It's been great general sessions, some great talks. Talking about the technology, we've heard from some customers, some third parties, and most recently from Kevin Slavin from The Shed, which is really amazing. >> So I really want to get into this event. You have 2,100 attendees from 23 different countries, 32 different industries. >> Yep. This started as a small, >> That's right. tiny little thing! >> Didn't Yahoo start it in 2008? >> It did, yeah. >> You changed names a few years ago, but it's still the same event, looming larger and larger. >> Yeah! >> It's been great, it's gone international as you've said. It's actually the 17th total event that we've done. >> Yeah. >> If you count the ones we've done in Europe and Asia. It's a global community around data, so it's no surprise. The growth has been phenomenal, the energy is great, the innovations that the community is talking about, the ecosystem is talking about, are really great. It just continues to evolve as an event, it continues to bring new ideas and share those ideas. >> What are you hearing from customers? What are they buzzing about? Every morning on the main stage, you do different polls that say, "how much are you using machine learning? What portion of your data are you moving to the cloud?" What are you learning? >> So it's interesting because we've done similar polls in our show in Berlin, and the results are very similar. We did the cloud poll and there's a lot of buzz around cloud. What we're hearing is there's a lot of companies that are thinking about, or are somewhere along their cloud journey. It's exactly what their overall plans are, and there's a lot of news about maybe cloud will eat everything, but if you look at the poll results, something like 75% of the attendees said they have cloud in their plans. Only about 12% said they're going to move everything to the cloud, so a lot of hybrid with cloud. It's how to figure out which workloads to run where, how to think about that strategy in terms of where to deploy the data, where to deploy the workloads and what that should look like, and that's one of the main things that we're hearing and talking a lot about. >> We've been seeing that at Wikibon; our recent update to the market forecast showed that public cloud will dominate increasingly in the coming decade, but hybrid cloud will be a long transition period for many or most enterprises who are still firmly rooted in on-premises deployment, so forth and so on. Clearly, the bulk of your customers, both of your customer deployments, are on premise. >> They are. >> So you're working from a good starting point which means you've got what, 1,400 customers? >> That's right, thereabouts.
>> Predominantly on premises, but many of them here at this show want to sustain their investment in a vendor that provides them with that flexibility as they decide they want to use Google or Microsoft or AWS or IBM for a particular workload, so that their existing investment in Hortonworks doesn't prevent them from facilitating that. It moves that data and those workloads. >> That's right. The fact that we want to help them do that, a lot of our customers have, I'll call it a multi-cloud strategy. They want to be able to work with an Amazon or a Google or any of the other vendors in the space equally well and have the ability to move workloads around, and that's one of the things that we can help them with. >> One of the things you also did yesterday on the main stage was you talked about this conference in the greater context of the world and what's going on right now. This is happening against the backdrop of the World Cup, and you said that this is really emblematic of data because this is a game, a tournament that generates tons of data. >> A tremendous amount of data. >> It's showing how data can launch new business models, disrupt old ones. Where do you think we're at right now? For someone who's been in this industry for a long time, just lay the scene. >> I think we're still very much at the beginning. Even though the conference has been around for a while, the technology hasn't; it's emerging so fast and just evolving so fast that we're still at the beginning of all the transformations. I've been listening to the customer presentations here and all of them are at some point along the journey. Many are really still starting. Even some of the polls that we had today talked about the fact that they're very much at the beginning of their journey with things like streaming or some of the A.I. machine learning technologies. They're at various stages, so I believe we're really at the beginning of the transformation that we'll see. >> That reminds me of another detail of your product portfolio, or your architecture: streaming and edge deployments are also in the future for many of your customers who still primarily do analytics on data at rest. You made an investment in a number of technologies, NiFi for streaming. There's something called MiNiFi that has been discussed here at this show as an enabler for streaming all the way out to edge devices. What I'm getting at is that's indicative of the investments Arun Murthy, one of your co-founders, has made- it was a very good discussion for us analysts and also here at the show. That is one of many investments you're making to prepare for a future that will see workloads that will be more predominant in the coming decade. One of the new things I've heard this week that I'd not heard in terms of emphasis from you guys is more of an emphasis on data warehousing as an important use case for HDP in your portfolios, specifically with Hive. The Hive 3.0 now in- HDP 3.0. >> Yes. >> With the enhancements to Hive to support more real time and low latency, but also there's ACID capabilities there. I'm hearing something- what you guys are doing is consistent with one of your competitors, Cloudera. They're going deeper into data warehousing too because they recognize they've got to go there like you do to be able to absorb more of your customers' workloads. I think that's important that you guys are making that investment. You're not just big data, you're all data and all data applications. Potentially, if your customers want to go there and engage you. >> Yes.
>> I think that was a significant, subtle emphasis that I as an analyst noticed. >> Thank you. There were so many enhancements in 3.0 that were brought from the community that it was hard to talk about everything in depth, but you're right. The enhancements to Hive in terms of performance have really enabled it to take on a greater set of workloads and interactivity that we know that our customers want. The advantage being that you have a common data layer in the back end and you can run all this different work. It might be data warehousing, high speed query workloads, but you can do it on that same data with Spark and data-science related workloads. Again, it's that common pooled backend of the data lake and having that ability to do it with common security and governance. It's one of the benefits our customers are telling us they really appreciate. >> One of the things we've also heard this morning was talking about data analytics in terms of brand value and, importantly, brand protection. FedEx, exactly. Talking about, the speaker said, we've all seen these apology commercials. What do you think- is it damage control? What is the customer motivation here? >> Well a company can have billions of dollars of market cap wiped out by breaches in security, and we've seen it. This is not theoretical, these are actual occurrences that we've seen. Really, they're trying to protect the brand and the business and continue to be viable. They can get knocked back so far that it can take years to recover from the impact. They're looking at the security aspects of it, the governance of their data, the regulations of GDPR. These things you've mentioned have real financial impact on the businesses, and I think it's brand and the actual operations and finances of the businesses that can be impacted negatively. >> When you're thinking about Hortonworks's marketing messages going forward, how do you want to be described now, and then how do you want customers to think of you five or 10 years from now? >> I want them to think of us as a partner to help them with their data journey, on all aspects of their data journey, whether they're collecting data from the edge, you mentioned NiFi and things like that. Bringing that data back, processing it in motion, as well as processing it at rest, regardless of where that data lands. On premise, in the cloud, somewhere in between, the hybrid, multi-cloud strategy. We really want to be thought of as their partner in their data journey. That's really what we're doing. >> Even going forward, one of the things you were talking about earlier is the company's sort of saying, "we want to be boring. We want to help you do all the stuff-" >> There's a lot of money in boring. >> There's a lot of money, right! Exactly! As you said, a partner in their data journey. Is it "we'll do anything and everything"? Are you going to do niche stuff? >> That's a good question. Not everything. We are focused on the data layer. The movement of data, the processing and storage, and truly the analytic applications that can be built on top of the platform. Right now we've stuck to our strategy. It's been very consistent since the beginning of the company in terms of taking these open source technologies, making them enterprise viable, developing an ecosystem around it and fostering a community around it. That's been our strategy since before the company even started. We want to continue to do that and we will continue to do that.
There's so much innovation happening in the community that we quickly bring that into the products and make sure that's available in a trusted, enterprise-tested platform. That's really one of the things we see with our customers- over and over again they select us because we bring innovation to them quickly, in a safe and consumable way. >> Before we came on camera, I was telling Rebecca that Hortonworks has done a sensational job of continuing to align your product roadmaps with those of your leading partners. IBM, AWS, Microsoft. In many ways, your primary partners are not them, but the entire open source community. 26 open source projects that Hortonworks represents and has incorporated in your product portfolio, in which you are a primary player and committer. You're a primary ingester of innovation from all the communities in which you operate. >> We do. >> That is your core business model. >> That's right. We both foster the innovation and we help drive the innovation ourselves with our engineers and architects. You're absolutely right, Jim. It's the ability to get that innovation, which is happening so fast in the community, into the product, and companies need to innovate. Things are happening so fast. Moore's Law was mentioned multiple times on the main stage, you know, and how it's impacting different parts of the organization. It's not just the technology, but business models are evolving quickly. We heard a little bit about Trimble, and if you've seen Tim Leonard's talk that he gave around what they're doing in terms of logistics and the ability to go all the way out to the farmer and impact what's happening at the farm and tracking things down to the level of a tomato or an egg all the way back and just understand that. It's evolving business models. It's not just the tech but the evolution of business models. Rob talked about it yesterday. I think those are some of the things that are kind of key. >> Let me stay on that point really quick. The industrial internet, like precision agriculture and everything it relates to, is increasingly relying on visual analysis, of parts and eggs and whatever it might be. That is convolutional neural networks, that is A.I., it has to be trained, and it has to be trained increasingly in the cloud where the data lives. The data lives in HDP clusters and whatnot. In many ways, no matter where the world goes in terms of industrial IoT, there will be massive clusters of HDFS and object storage driving it and also embedded A.I. models that have to follow a specific DevOps life cycle. You guys have a strong orientation in your portfolio towards that degree of real-time streaming, as it were, of tasks that go through the entire life cycle. From preparing the data, to modeling, to training, to deploying it out, to Google or IBM or wherever else they want to go. So I'm thinking that you guys are in a good position for that as well. >> Yeah. >> I just wanted to ask you finally, what is the takeaway? We're talking about the attendees, talking about the community that you're cultivating here, theme, ideas, innovation, insight. What do you hope an attendee leaves with? >> I hope that the attendee leaves educated, understanding the technology and the impacts that it can have, so that they will go back and change their business and continue to drive their data projects. The whole intent is really, and we even changed the format of the conference for more educational opportunities.
For me, I want attendees to- a satisfied attendee would be one that learned about the things they came to learn so that they could go back to achieve the goals that they have when they get back. Whether it's business transformation, technology transformation, some combination of the two. To me, that's what I hope that everyone is taking away and that they want to come back next year when we're in Washington, D.C. and- >> My stomping ground. >> His hometown. >> Easy trip for you. They'll probably send you out here- (laughs) >> Yeah, that's right. >> Well John, it's always fun talking to you. Thank you so much. >> Thank you very much. >> We will have more from theCUBE's live coverage of DataWorks right after this. I'm Rebecca Knight for James Kobielus. (upbeat electro music)

Published Date : Jun 20 2018

SUMMARY :

John Kreisa, VP of marketing at Hortonworks, recaps DataWorks Summit 2018: 2,100 attendees from 23 countries across 32 industries, audience polls showing most companies planning hybrid and multi-cloud deployments rather than an all-in move to public cloud, the HDP 3.0 release, and Hortonworks' strategy of fostering open source innovation and partnering across the data journey, from edge collection with NiFi to data at rest, on premise and in the cloud.

SENTIMENT ANALYSIS :

ENTITIES

EntityCategoryConfidence
James KobielusPERSON

0.99+

Rebecca KnightPERSON

0.99+

IBMORGANIZATION

0.99+

RebeccaPERSON

0.99+

MicrosoftORGANIZATION

0.99+

Tim LeonardPERSON

0.99+

AWSORGANIZATION

0.99+

Arun MurthyPERSON

0.99+

JimPERSON

0.99+

Kevin SlavinPERSON

0.99+

EuropeLOCATION

0.99+

John KreisaPERSON

0.99+

BerlinLOCATION

0.99+

AmazonORGANIZATION

0.99+

JohnPERSON

0.99+

GoogleORGANIZATION

0.99+

2008DATE

0.99+

Washington, D.C.LOCATION

0.99+

AsiaLOCATION

0.99+

75%QUANTITY

0.99+

RobPERSON

0.99+

fiveQUANTITY

0.99+

San JoséLOCATION

0.99+

next yearDATE

0.99+

YahooORGANIZATION

0.99+

Silicon ValleyLOCATION

0.99+

32 different industriesQUANTITY

0.99+

World CupEVENT

0.99+

yesterdayDATE

0.99+

23 different countriesQUANTITY

0.99+

oneQUANTITY

0.99+

1,400 customersQUANTITY

0.99+

todayDATE

0.99+

twoQUANTITY

0.99+

2,100 attendeesQUANTITY

0.99+

FedexORGANIZATION

0.99+

10 yearsQUANTITY

0.99+

26 open source projectsQUANTITY

0.99+

HortonworksORGANIZATION

0.98+

17thQUANTITY

0.98+

bothQUANTITY

0.98+

OneQUANTITY

0.98+

billions of dollarsQUANTITY

0.98+

ClouderaORGANIZATION

0.97+

about 12%QUANTITY

0.97+

theCUBEORGANIZATION

0.97+

this weekDATE

0.96+

DataWorks Summit 2018EVENT

0.95+

NiFiORGANIZATION

0.91+

this morningDATE

0.89+

HIVE 3.0OTHER

0.86+

SparkTITLE

0.86+

few year agoDATE

0.85+

WikibanORGANIZATION

0.85+

The ShedORGANIZATION

0.84+

San José, CaliforniaLOCATION

0.84+

tonsQUANTITY

0.82+

H.D.PLOCATION

0.82+

DataWorksEVENT

0.81+

thingsQUANTITY

0.78+

DataWorksORGANIZATION

0.74+

MiNiFiTITLE

0.62+

dataQUANTITY

0.61+

MooreTITLE

0.6+

yearsQUANTITY

0.59+

coming decadeDATE

0.59+

TrumbleORGANIZATION

0.59+

GVPRORGANIZATION

0.58+

3.0OTHER

0.56+

Day Two Kickoff | DataWorks Summit 2018


 

>> Live from San Jose, in the heart of Silicon Valley, it's theCube. Covering DataWorks Summit 2018. Brought to you by Hortonworks. >> Welcome back to day two of theCube's live coverage of DataWorks here in San Jose, California. I'm your host, Rebecca Knight along with my co-host James Kobielus. James, it's great to be here with you in the hosting seat again. >> Day two, yes. >> Exactly. So here we are, this conference, 2,100 attendees from 32 countries, 23 industries. It's a relatively big show. They do three of them during the year. One of the things that I really-- >> It's a well-established show too. I think this is like the 11th year since Yahoo started up the first Hadoop summit in 2008. >> Right, right. >> So it's an established event, yeah go. >> Exactly, exactly. But I really want to talk about Hortonworks the company. This is something that you had brought up in an analyst report before the show started and that was talking about Hortonworks' cash flow positivity for the first time. >> Which is good. >> Which is good, which is a positive sign and yet what are the prospects for this company's financial health? We're still not seeing really clear signs of robust financial growth. >> I think the signs are good for the simple reason they're making significant investments now to prepare for the future that's almost inevitable. And the future that's almost inevitable, and when I say the future, the 2020s, the decade that's coming. Most of their customers will shift more of their workloads, maybe not entirely yet, to public cloud environments for everything they're doing, AI, machine learning, deep learning. And clearly the beneficiaries of that trend will be the public cloud providers, all of whom are Hortonworks' partners and established partners, AWS, Microsoft with Azure, Google with, you know, Google Cloud Platform, IBM with IBM Cloud. Hortonworks, and this is... You know, their partnerships with these cloud providers go back several years so it's not a new initiative for them. They've seen the writing on the wall practically from the start of Hortonworks' founding in 2011 and they now need to go deeper towards making their solution portfolio capable of being deployable on-prem, in cloud, public clouds, and in various and sundry funky combinations called hybrid multi-clouds. Okay, so, they've been making those investments in those partnerships and in public cloud enabling the Hortonworks Data Platform. Here at this show, DataWorks 2018 here in San Jose, they've released the latest major version, HDP 3.0 of their core platform with a lot of significant enhancements related to things that their customers are increasingly doing-- >> Well I want to ask you about those enhancements. >> But also they have partnership announcements, the deep ones of integration and, you know, lift and shift of the Hortonworks portfolio of HDP with Hortonworks DataFlow and DataPlane Services, so that those solutions can operate transparently on those public cloud environments as the customers, as and when the customers choose to shift their workloads. 'Cause Hortonworks really... You know, like Scott Gnau yesterday, I mean just laid it on the line, they know that the more of the public cloud workloads will predominate now in this space. They're just making these speculative investments that they absolutely have to now to prepare the way. 
So I think this cost that they're incurring now to prepare their entire portfolio for that inevitable future is the right thing to do and that's probably why they still have not attained massive rock and rollin' positive cash flow yet but I think that they're preparing the way for them to do so in the coming decade. >> So their financial future is looking brighter and they're doing the right things. >> Yeah, yes. >> So now let's talk tech. And this is really where you want to be, Jim, I know you. >> Oh I get sleep now and I don't think about tech constantly. >> So as you've said, they're really doing a lot of emphasis now on their public cloud partnerships. >> Yes. >> But they've also launched several new products and upgrades to existing products, what are you seeing that excites you and that you think really will be potential game changers? >> You know, this is geeky but this is important 'cause it's at the very heart of Hortonworks Data Platform 3.0, containerization of more... When you're a data scientist, and you're building a machine learning model using data that's maintained, and is persisted, and processed within Hortonworks Data Platform or any other big data platform, you want the ability increasingly for developing machine learning, deep learning, AI in general, to take that application you might build while you're using TensorFlow models, that you build on HDP, they will containerize it in Docker and, you know, orchestrate it all through Kubernetes and all that wonderful stuff, and deploy it out, those AI, out to increasingly edge computing, mobile computing, embedded computing environments where, you know, the real venture capital mania's happening, things like autonomous vehicles, and you know, drones, and you name it. So the fact is that Hortonworks has made that in many ways the premier new feature of HDP 3.0 announced here this week at the show. That very much harmonizes with what their partners, where their partners are going with containerization of AI. IBM, one of their premier partners, very recently, like last month, I think it was, announced the latest version of IBM, what do they call it, IBM Cloud Private, which has embedded as a core feature containerization within that environment which is a prem-based environment of AI and so forth. The fact that Hortonworks continues to maintain close alignment with the capabilities that its public cloud partners are building to their respective portfolios is important. But also Hortonworks with its, they call it, you know, a single pane of glass, the DataPlane Services for metadata and monitoring and governance and compliance across this sprawling hybrid multi-cloud, these scenarios. The fact that they're continuing to make, in fact, really focusing on deep investments in that portfolio, so that when an IBM introduces or, AWS, whoever, introduces some new feature in their respective platforms, Hortonworks has the ability to, as it were, abstract above and beyond all of that so that the customer, the developer, and the data administrator, all they need to do, if they're a Hortonworks customer, is stay within the DataPlane Services and environment to be able to deploy with harmonized metadata and harmonized policies, and harmonized schemas and so forth and so on, and query optimization across these sprawling environments. 
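Jim's containerization point maps onto a concrete Hadoop 3.x capability: YARN can launch application containers inside Docker images. A hedged sketch of what submitting a dockerized Spark training job might look like on such a cluster; the image name, the script, and the assumption that the node managers have the Docker runtime enabled are illustrative, not anything announced on stage.

```python
# Hypothetical submission of a dockerized Spark/TensorFlow training job
# on a Docker-enabled YARN cluster (Hadoop 3.x / HDP 3.0 era). The image
# and script are assumptions; spark-submit must be on the PATH.
import subprocess

image = "registry.example.com/ml/tf-train:1.0"
cmd = [
    "spark-submit",
    "--master", "yarn",
    "--deploy-mode", "cluster",
    # Ask YARN's container runtime to launch the driver and executors in
    # Docker, so the Python/TensorFlow dependencies travel with the image
    # instead of being pre-installed on every node.
    "--conf", "spark.yarn.appMasterEnv.YARN_CONTAINER_RUNTIME_TYPE=docker",
    "--conf", f"spark.yarn.appMasterEnv.YARN_CONTAINER_RUNTIME_DOCKER_IMAGE={image}",
    "--conf", "spark.executorEnv.YARN_CONTAINER_RUNTIME_TYPE=docker",
    "--conf", f"spark.executorEnv.YARN_CONTAINER_RUNTIME_DOCKER_IMAGE={image}",
    "train_model.py",  # hypothetical training script
]
subprocess.run(cmd, check=True)
```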
So Hortonworks, I think, knows where their bread is buttered and it needs to stay on the DPS, DataPlane Services, side, which is why a couple months ago in Berlin, Hortonworks made, I think, the most significant announcement of the year for them and really for the industry, when they announced the Data Steward Studio in Berlin, which really clearly addressed the GDPR mandate that was coming up, but really treats data stewardship as an end-to-end workflow for lots of, you know, core enterprise applications, absolutely essential. Data Steward Studio is a DataPlane Service that can operate across multi-cloud environments. Hortonworks is going to keep on, you know... They didn't have DPS, DataPlane Services, announcements here in San Jose this week, but you can best believe that next year at this time at this show, and in the interim, they'll probably have a number of significant announcements to deepen that portfolio. Once again it's to grease the wheels towards a more purely public cloud future in which there will be Hortonworks DNA inside most of their customers' environments going forward. >> I want to ask you about themes of this year's conference. The thing is that you were in Berlin at the last big Hortonworks DataWorks Summit. >> (speaks in foreign language) >> And really GDPR dominated the conversations because the new rules and regulations hadn't yet taken effect and companies were sort of bracing for what life was going to be like under GDPR. Now the rules are here, they're here to stay, and companies are really grappling with it, trying to understand the changes and how they can exist in this new regime. What would you say are the biggest themes... We're still talking about GDPR, of course, but what would you say are the bigger themes at this week's conference? Is it scalability, is it... I mean, what would you say, what do you think has dominated the conversations here? >> Well scalability is not the big theme this week, though there are significant scalability announcements this week in the context of HDP 3.0, the ability to persist in a scale-out fashion across multi-cloud, billions of files. Storage efficiency is an important piece of the overall announcement with support for erasure coding, blah blah blah. That's not, you know, that's... Already, Hortonworks, like all of their cloud providers and other big data providers, provide very scalable environments for storage, workload management. That was not the hugest, buzzy theme in terms of the announcements this week. The buzz of course was HDP 3.0. Containerization, that's important, but you know, we just came out of the day two keynote. AI is not a huge focus yet for a lot of the Hortonworks customers who are here, the developers. They're, you know, most of their customers are not yet that far along in their deep learning journeys and whatever, but they're definitely going there. There's plenty of really cool keynote discussions, including the guy with the autonomous vehicles or whatever, the thing we just came out of. That was not the predominant theme this week here in terms of HDP 3.0. I think what it comes down to is that with HDP 3.0... Hive, though you tend to take it for granted, it's been in Hadoop from the very start, practically, Hive is now a full enterprise database and that's the core, one of the cores, of HDP 3.0.
Hive itself, now at version 3.0, is ACID compliant, and that may be totally geeky to most of the world, but that enables it to support transactional applications. So more big data in every environment is supporting more traditional enterprise applications, transactional applications that require, like, two-phase commit and all that goodness. The fact is, you know, Hortonworks, from what I can see, is the first of the big data vendors to incorporate those enhancements to Hive 3.0, because they're so completely tuned in to the Hive environment as a committer. I think in many ways that is the predominant theme in terms of the new stuff that will actually resonate with the developers, their customers here at the show. And with the, you know, enterprises in general, they can put more of their traditional enterprise application workloads on big data environments and specifically, Hortonworks hopes, its HDP 3.0. >> Well I'm excited to learn more here on theCube with you today. We've got a lot of great interviews lined up and a lot of interesting content. We got a great crew too so this is a fun show to do. >> Sure is. >> We will have more from day two of the.
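To ground the ACID point: in Hive 3 on HDP 3.0, managed ORC tables are transactional, which is what unlocks row-level UPDATE and DELETE. A minimal sketch, assuming a reachable HiveServer2 endpoint and the community pyhive client; the host, user, and table are hypothetical.

```python
# Minimal sketch of Hive 3 ACID tables over HiveServer2; the endpoint
# and table are hypothetical (pip install 'pyhive[hive]').
from pyhive import hive

conn = hive.connect(host="hive.example.com", port=10000, username="etl")
cur = conn.cursor()

# Making the transactional property explicit; in Hive 3 / HDP 3.0,
# managed ORC tables are ACID by default.
cur.execute("""
    CREATE TABLE IF NOT EXISTS orders (
        order_id BIGINT,
        status   STRING
    )
    STORED AS ORC
    TBLPROPERTIES ('transactional'='true')
""")

# Row-level mutation, the capability ACID support adds on top of the
# classic append-only Hive.
cur.execute("UPDATE orders SET status = 'shipped' WHERE order_id = 42")
cur.execute("DELETE FROM orders WHERE status = 'cancelled'")
```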

Published Date : Jun 20 2018

SUMMARY :

Rebecca Knight and James Kobielus open day two of DataWorks Summit 2018 by assessing Hortonworks' newly cash-flow-positive quarter and its investments in public cloud partnerships, then turn to the technology: HDP 3.0's containerization support, the DataPlane Services layer for governance across hybrid multi-cloud environments, and a now ACID-compliant Hive 3.0 that can take on traditional transactional enterprise workloads.

SENTIMENT ANALYSIS :

ENTITIES

EntityCategoryConfidence
James KobielusPERSON

0.99+

Rebecca KnightPERSON

0.99+

Hortonworks'ORGANIZATION

0.99+

HortonworksORGANIZATION

0.99+

2011DATE

0.99+

JimPERSON

0.99+

IBMORGANIZATION

0.99+

BerlinLOCATION

0.99+

AWSORGANIZATION

0.99+

San JoseLOCATION

0.99+

MicrosoftORGANIZATION

0.99+

GoogleORGANIZATION

0.99+

Silicon ValleyLOCATION

0.99+

JamesPERSON

0.99+

23 industriesQUANTITY

0.99+

YahooORGANIZATION

0.99+

San Jose, CaliforniaLOCATION

0.99+

Hive 3.0TITLE

0.99+

2020sDATE

0.99+

next yearDATE

0.99+

this weekDATE

0.99+

32 countriesQUANTITY

0.99+

HiveTITLE

0.99+

11th yearQUANTITY

0.99+

yesterdayDATE

0.99+

first timeQUANTITY

0.99+

GDPRTITLE

0.98+

last monthDATE

0.98+

DataPlane ServicesORGANIZATION

0.98+

OneQUANTITY

0.98+

Scott GnauPERSON

0.98+

2008DATE

0.98+

threeQUANTITY

0.98+

2,100 attendeesQUANTITY

0.98+

HDP 3.0TITLE

0.98+

todayDATE

0.98+

Data Steward StudioORGANIZATION

0.98+

two-phaseQUANTITY

0.98+

oneQUANTITY

0.97+

DataWorks Summit 2018EVENT

0.96+

DataPlaneORGANIZATION

0.96+

Day twoQUANTITY

0.96+

billions of filesQUANTITY

0.95+

firstQUANTITY

0.95+

day twoQUANTITY

0.95+

DPSORGANIZATION

0.95+

Data Platform 3.0TITLE

0.94+

Hortonworks DataWorks SummitEVENT

0.94+

DataWorksEVENT

0.92+

Pandit Prasad, IBM | DataWorks Summit 2018


 

>> From San Jose, in the heart of Silicon Valley, it's theCube. Covering DataWorks Summit 2018. Brought to you by Hortonworks. (upbeat music) >> Welcome back to theCUBE's live coverage of DataWorks here in sunny San Jose, California. I'm your host Rebecca Knight along with my co-host James Kobielus. We're joined by Pandit Prasad. He is in analytics projects, strategy, and management at IBM Analytics. Thanks so much for coming on the show. >> Thanks Rebecca, glad to be here. >> So, why don't you just start out by telling our viewers a little bit about what you do, in terms of the Hortonworks relationship and the other parts of your job. >> Sure, as you said I am in Offering Management, which is also known as Product Management for IBM, and I manage the big data portfolio from an IBM perspective. I was also working with Hortonworks on developing this relationship, nurturing that relationship, so it's been a year since the IBM-Hortonworks partnership. We announced this partnership exactly last year at the same conference. And now it's been a year, so this year has been a journey in aligning the two portfolios together. Right, so Hortonworks had HDP and HDF. IBM also had similar products, so we have, for example, Big SQL, Hortonworks has Hive, so how do Hive and Big SQL align together? IBM has a Data Science Experience, where does that come into the picture on top of HDP? So it means before this partnership, if you look into the market, it has been you sell Hadoop, you sell a SQL engine, you sell Data Science. So what this year has given us is more of a solution sell. Now with this partnership we go to the customers and say here is an end-to-end experience for you. You start with Hadoop, you put more analytics on top of it, you then bring Big SQL for complex queries and federation and visualization stories, and then finally you put Data Science on top of it, so it gives you a complete end-to-end solution, the end-to-end experience for getting the value out of the data. >> Now IBM a few years back released a Watson data platform for team data science with DSX, data science experience, as one of the tools for data scientists. Is Watson data platform still the core, I call it dev ops for data science and maybe that's the wrong term, that IBM provides to market, or is there sort of a broader dev ops framework within which IBM goes to market these tools? >> Sure, Watson data platform one year ago was more of a cloud platform and it had many components of it, and now we are getting a lot of components on to the (mumbles) and data science experience is one part of it, so data science experience... >> So Watson analytics as well for subject matter experts and so forth. >> Yes. And again Watson has a whole suite of business-based offerings; data science experience is more of a particular aspect of the focus, specifically on the data science, and that's been now available on prem, and now we are building this on-prem stack, so we have HDP, HDF, Big SQL, Data Science Experience, and we are working towards adding more and more to that portfolio. >> Well you have a broader reference architecture and a stack of solutions, AI on Power and so forth, for more of the deep learning development. In your relationship with Hortonworks, are they reselling more of those tools into their customer base to supplement, extend what they already resell, DSX, or is that outside of the scope of the relationship?
>> No it is all part of the relationship, these three have been the core of what we announced last year and then there are other solutions. We have the whole governance solution, right, so again it goes back to the partnership; HDP brings with it Atlas. IBM has a whole suite of governance products including the governance catalog. How do you expand the story from being a Hadoop-centric story to an enterprise data lake story, and then now we are taking that to the cloud; that's what Truata is all about. Rob Thomas came out with a blog yesterday morning talking about Truata. If you look at it, it is nothing but a governed data lake hosted offering, if you want to simplify it. That's one way to look at it; it caters to the GDPR requirements as well. >> For GDPR, for the IBM-Hortonworks partnership, is the lead solution for GDPR compliance, is it Hortonworks Data Steward Studio or is it any number of solutions that IBM already has for data governance and curation, or is it a combination of all of that in terms of what you, as partners, propose to customers for soup to nuts GDPR compliance? Give me a sense for... >> It is a combination of all of those, so it has HDP, it has HDF, it has Big SQL, it has Data Science Experience, it has IBM governance catalog, it has IBM data quality and it has a bunch of security products, like Guardium, and it has some new IBM proprietary components that are very specific towards data (cough drowns out speaker) and how do you deal with the personal data and sensitive personal data as classified by GDPR. I'm supposed to query some high level information but I'm not allowed to query deep into the personal information, so how do you block those queries, how do you understand those; these are not necessarily part of Data Steward Studio. These are some of the proprietary components that are thrown into the mix by IBM. >> One of the requirements that is not often talked about under GDPR, Ricky of Hortonworks got into it a little bit in his presentation, was the notion, the requirement, that if you are using an EU citizen's PII to drive algorithmic outcomes, that they have the right to full transparency. It's the algorithmic decision paths that were taken. I remember IBM had a tool under the Watson brand that wraps up a narrative of that sort. Is that something that IBM still, it was called Watson Curator a few years back, is that a solution that IBM still offers, because I'm getting a sense right now that Hortonworks has a specific solution, not to say that they may not be working on it, that addresses that side of GDPR, do you know what I'm referring to there? >> I'm not aware of something from the Hortonworks side beyond the Data Steward Studio, which offers basically identification of what some of the...
The last Hortonworks conference was sort of before it came into effect and now we're in this new era. How would you say companies are reacting? Are they in the right space for it, in the sense that they're really still understanding the ripple effects and how it's all going to play out? How would you describe your interactions with companies in terms of how they're dealing with these new requirements? >> They are still trying to understand the requirements and interpret the requirements, coming to terms with what that really means. For example I met with a customer and they are a multi-national company. They have data centers across different geos and they asked me, I have somebody from Asia trying to query the data, so the query should go to Europe, but the query processing should not happen in Asia, the query processing all should happen in Europe, and only the output of the query should be sent back to Asia. You won't be able to think in these terms before the GDPR guidance era. >> Right, exceedingly complicated. >> Decoupling storage from processing enables those kinds of fairly complex scenarios for compliance purposes. >> It's not just about the access to data, now you are getting into where the processing happens and where the results are getting displayed, so we are getting... >> Severe penalties for not doing that, so your customers need to keep up. There was an announcement at this show at DataWorks 2018 of an IBM-Hortonworks solution, IBM Hosted Analytics with Hortonworks. I wonder if you could speak a little bit about that, Pandit, in terms of what's provided; it's a subscription service? If you could tell us what subset of IBM's analytics portfolio is hosted for Hortonworks' customers? >> Sure, as you said, it is a hosted offering. Initially we are starting off with a base offering with three products; it will have HDP, Big SQL, that is IBM Db2 Big SQL, and DSX, Data Science Experience. Those are the three solutions; again as I said, it is hosted on IBM Cloud, so customers have a choice of different configurations they can choose, whether it be VMs or bare metal. I should say this is probably the only offering, as of today, that offers a bare metal configuration in the cloud. >> It's geared to data scientist developers who will build machine-learning models and train them in IBM Cloud, but in a hosted HDP in IBM Cloud. Is that correct? >> Yeah, I would rephrase that a little bit. There are several different offerings on the cloud today and we can think about them, as you said, for ad-hoc or ephemeral workloads, also geared towards low cost. You think about this offering as taking your on-prem data center experience directly onto the cloud. It is geared towards very high performance. The hardware and the software are all configured, optimized for providing high performance, not necessarily for ad-hoc workloads, or ephemeral workloads; they are capable of handling massive, sticky workloads. It is not meant for "I turn on this massive computing power for a couple of hours and then switch it off," but rather, "I'm going to run these massive workloads as if they are located in my data center," that's number one. It comes with the complete set of HDP. If you think about what is currently in the cloud, you have Hive and HBase, the SQL engines, and the storage all separate; security is optional, governance is optional. This comes with the whole enchilada. It has security and governance all baked in.
It provides the option to use Big SQL, because once you get on Hadoop, the next experience is: I want to run complex workloads. I want to run federated queries across Hadoop as well as other data stores. How do I handle those? And then it comes with Data Science Experience, also configured for best performance and integrated together. As a part of this partnership, I mentioned earlier that we have made progress towards providing the story of an end-to-end solution. The next steps of that are: yes, I can say that it's an end-to-end solution, but do the products look and feel as if they are one solution? That's what we are getting into, and I have mentioned some of those integrations. For example Big SQL, an IBM product: we have been working on integrating it very closely with HDP. It can be deployed through Ambari, and it is integrated with Atlas and Ranger for security. We are improving the integrations with Atlas for governance. >> Say you're building a Spark machine learning model inside DSX on HDP, within IH (mumbles) IBM hosting with Hortonworks, on HDP 3.0. Can you then containerize that machine learning Spark model and then deploy it into an edge scenario? >> Sure. First was Big SQL; the next one is DSX. DSX is integrated with HDP as well. We could run DSX workloads on HDP before, but here is what we have done now. If I want to run a DSX workload, say a Python workload, I need to have the Python libraries on all the nodes I want to deploy to. Suppose you are running a big cluster, a 500-node cluster: I need to have the Python libraries on all 500 nodes, and I need to maintain the versioning of them. If I upgrade the versions, then I need to go and upgrade and make sure all of them are perfectly aligned. >> In this first version, will you be able to build a Spark model and a TensorFlow model and containerize them and deploy them? >> Yes. >> Across a multi-cloud, and orchestrate them with Kubernetes to do all that meshing? Is that a capability now, or planned for the future within this portfolio? >> Yeah, we have that capability demonstrated at the booth today, so that is a new integration. We can run what we call a virtual Python environment. DSX can containerize it and run against data that lives in the HDP cluster. Now we are making use of both the data in the cluster and the infrastructure of the cluster itself for running the workloads. >> In terms of the layers of the stack, is it also incorporating the IBM distributed deep learning technology that you've recently announced? Which I think is highly differentiated, because deep learning is increasingly becoming a set of capabilities that run across a distributed mesh, playing together as if they're one unified application. Is that a capability now in this solution, or will it be in the near future? DDL, distributed deep learning? >> No, we have not yet. >> I know that's on the PowerAI platform currently, gotcha. >> It's what we'll be talking about at next year's conference. >> That's definitely on the roadmap. We are starting with the base configuration of bare metal and VM configurations; the next one, depending on how customers react to it, is definitely bare metal with GPUs, optimized for TensorFlow workloads. >> Exciting. We'll stay tuned in the coming months and years; I'm sure you guys will have that. >> Pandit, thank you so much for coming on theCUBE. We appreciate it. I'm Rebecca Knight, for James Kobielus. We will have more from theCUBE's live coverage of DataWorks just after this.
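As a rough illustration of the DSX-on-HDP workflow discussed above (train where the data lives, then persist the model so a containerized scorer can pick it up), here is a hedged PySpark sketch. The paths, column names and dataset are invented for illustration; this is not DSX product code.

```python
# Minimal PySpark sketch: train against data that stays in the cluster, then
# save the fitted pipeline so a Docker/Kubernetes-packaged scorer can load it,
# rather than hand-installing Python libraries on all 500 nodes.
from pyspark.sql import SparkSession
from pyspark.ml import Pipeline
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import LogisticRegression

spark = SparkSession.builder.appName("dsx-style-training").getOrCreate()

df = spark.read.parquet("hdfs:///data/transactions")   # hypothetical HDFS dataset
assembler = VectorAssembler(inputCols=["amount", "tenure"], outputCol="features")
lr = LogisticRegression(labelCol="churned")            # hypothetical label column
model = Pipeline(stages=[assembler, lr]).fit(df)

# Loadable later by a containerized scoring service, on-cluster or at the edge.
model.write().overwrite().save("hdfs:///models/churn_lr")
```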

Published Date : Jun 19 2018

Cindy Maike, Hortonworks | DataWorks Summit 2018


 

>> Live from San Jose in the heart of Silicon Valley, it's theCUBE, covering DataWorks Summit 2018, brought to you by Hortonworks. >> Welcome back to theCUBE's live coverage of DataWorks here in San Jose, California. I'm your host, Rebecca Knight, along with my co-host, James Kobielus. We're joined by Cindy Maike. She is the VP of Industry Solutions and GM of Insurance and Healthcare at Hortonworks. Thanks so much for coming on theCUBE, Cindy. >> Thank you, thank you, I look forward to it. >> So, before the cameras were rolling we were talking about the business case for data, for data analytics. Walk our viewers through how you think about the business case and your approach to, sort of, selling it. >> So, when you think about data and analytics, as industries we've sometimes been very good at doing the operational reporting. To me that's looking in the rearview mirror; something's already happened. But when you think about data and analytics, especially big data, it's about what questions haven't I been able to answer. And a lot of companies, when they embark on it, they're like, let's do it for technology's sake. But from a business perspective, when we as industry GMs are out there working with our customers, it's: what questions can't you answer today, and how can I look at existing data and new data sources to actually help me answer those questions? I mean, we were talking a little bit about the usage of sensors and so forth around telematics in the insurance industry; connected homes, connected lives, connected cars, those are some of the concepts. In other industries we're looking at the industrial internet of things: how do I actually make operations more efficient? How do I deploy time series analysis to help us become more profitable? And that's really what companies are about. You know, in our keynote this morning we were talking about new communities, and what does that mean? How do we actually leverage data to either monetize new data sources or make us more profitable? >> You're a former insurance CFO, so let's delve into that use case a little bit and talk about the questions that haven't been answered yet. What are some of those, and how are companies putting this to work? >> Yeah, so the insurance industry, you know, it's kind of frustrating sometimes. As an insurance company you sit there and you always monitor what your combined ratio is, especially if you're a property and casualty company, and you go, yeah, but that tells me information like once a month. But I was actually with a chief marketing officer recently, and she came from the retail industry, and she goes, I need to understand what's going on in my business on any given day. And so, how can we leverage better real-time information to say, what customers are we interacting with? What customers should we not be interacting with? And then, you know, the last thing insurance companies want to do is go out and say, we want you as a customer, and then decline the business because the customer is not risk-worthy. So that's where we're seeing the insurance industry, and I'll focus a lot on insurance here, but it's: how do we leverage data to change that customer engagement process and look at connected ecosystems? It's a good time to be in the industry; fundamentally in insurance we're seeing a lot of use cases, but also in the retail industry there are new data opportunities out there. We talked a little bit before the interview started about shrinkage, and you know, in the retail industry, especially in food and any type of consumer packaged goods, we're starting to see the usage of sensors to actually help companies move fresh food around to reduce their shrinkage. You know, we've got-
We talked a little bit before the interview started on shrinkage and you know, the retail industry's especially in the food, any type of consumer type packages, we're starting to see the usage of sensors to actually help companies move fresh food around to reduce their shrinkage. You know, we've got. >> Sorry, just define shrinkage, 'cause I'm not even sure I understand, it's not that your gapple is getting smaller. It refers to perishable goods, you explain it. >> Right, so you're actually looking at, how do we make sure that my produce or items that are perishable, you know, I want to minimize the amount of inventory write offs that I have to do, so that would be the shrinkage and this one major retail chain is, they have a lot of consumer goods that they're actually saying, you know what, their shrinkage was pretty high, so they're now using sensors to help them monitor should we, do we need to move certain types of produce? Do we need to look at food before it expires you know, to make sure that we're not doing an inventory write off. >> You say sensors and it's kind of, are you referring to cameras taking photos of the produce or are you referring to other types of chemical analysis or whatever it might be, I don't know. >> Yeah, so it's actually a little bit of both. It's how do I actually you know, looking at certain types of products, so we all know when you walk into a grocery store or some type of department store, there's cameras all over the place, so it's not just looking at security, but it's also looking at you know, are those goods moving? And so, you can't move people around a store, but I can actually use the visualization and now with deep machine learning you can actually look at that and say, you know what, those bananas are getting a little ripe. We need to like move those or we need to help turn the inventory. And then, there's also things with bar coding you know, when you think of things that are on the shelves. So, how do I look at those bar codes because in the past you would've taken somebody down the isle. They would've like checked that, but no, now we're actually looking up the bar codes and say, do we need to move this? Do we need to put these things on sale? >> At this conference we're hearing just so much excitement and talk about data as the new oil and it is an incredible strategic asset, but you were also saying that it could become a liability. Talk about the point at which it becomes a liability. >> It becomes a liability when one, we don't know what to do with it, or we make decisions off of data data, so you think about you know, I'll give you an example, in the healthcare industry. You know, medical procedures have changed so immensely. The advancement in technology, precision medicine, but if we're making healthcare decisions on medical procedures from 10 years ago, so you really need to say how do I leverage you know, newer data stats, so over time if you make your algorithms based on data that's 10, 20 years old, it's good in certain things, but you know, you can make some bad business decisions if the data is not recent. So, that's when I talk about the liability aspect. >> Okay, okay, and then, thinking about how you talk with, collaborate with customers, what is your approach in the sense of how you help them think through their concerns, their anxieties? >> So, a lot of times it's really kind of understanding what's their business strategy. What are their financial, what are their operational goals? 
And you say, what can we look at from a data perspective, both data that we have today and data that we can acquire from new data sources, to help them actually achieve their business goals? And, you know, specifically in the insurance industry we focus on top-line growth, growing your premium, or decreasing your combined ratio. So, what are the types of data sources and the analytical use cases that we can actually use? You see the exact same thing in manufacturing, so. >> And have customer attitudes evolved over time since you've been in the industry? How would you describe their mindsets right now? >> I think we still have some industries that we struggle with, but actually, you know, I mentioned healthcare; the way we're seeing data being used in the healthcare industry, I mean, it's about precision medicine. You look at genomics research. It says that something like 58 percent of the world's population would actually do a genomics test if they could actually use that information. So, it's interesting to see. >> So, the struggle is with people's concern about privacy encroachment, is that the primary struggle? >> There's a little bit of that, and companies are saying, you know, I want to make sure that it's not being used against me. There was actually a recent article in Best's Review, which is an insurance trade magazine, that asks, you know, if I actually have a genomic test, can the insurance industry use that against me? So, I mean, there's still a little bit of concern. >> Which is a legitimate concern. >> It is, it is, absolutely. And then also, you know, we see globally, with the General Data Protection Regulation, the GDPR, you know, how are companies using my information and data? So consumers have to be comfortable with the type of data being used. But outside of the consumer side, there's so much data in the industry, and you made the comment about, you know, data being the new oil. I have a thing with that, which is: we don't use oil straight in a car, we don't put crude in a car. So once we do something with the data, which is the analytical side, that's where we get the business insight. Data for data's sake is just data. It's the business insights that are really important. >> Looking ahead at Hortonworks five, 10 years from now, how much will your business account for of the total business of Hortonworks, do you think? In the sense that, as you've said, healthcare and insurance represent such huge potential possibilities and opportunities for the company. Where do you see the trajectory? >> The trajectory I believe is really in those analytical apps. So we are working with a lot of partners that are asking, you know, how do I accelerate that business value? Because like I said, we're not just in data management, we're in the data age, and what does that mean? It's turning those things into business value, and I've got to be able to, I think from an industry perspective, be working with the right partners and then also customers, because they lack some of the skillsets. So, who can actually accelerate the time to value of using data for profitability? >> Is your primary focus area helping regulated industries with their data analytics challenges and using IoT, or does it also cover unregulated? >> Unregulated as well.
>> Are the analytics requirements different between regulated and unregulated, in terms of the underlying capabilities they require, in terms of predictive modeling, of governance and so forth, and how does Hortonworks differentiate its response to those needs? >> Yeah, so it varies a little bit based upon the regulations. I mean, even if you look at life sciences, life sciences is very, very regulated on how long do I have to keep the data, how can I actually use the data. Then if you look at those industries that maybe aren't regulated as much, so we'll get away from financial services, which is highly regulated across all different areas, I'll also look at, say, business insurance, which is not as regulated as insurance for you and me as consumers, because insurance companies can use any type of data to actually do the pricing, do the underwriting and handle the actual claims. So, still regulated based upon solvency, but not regulated on how we use data to evaluate risk. Manufacturing, definitely some regulation there from a work safety perspective, but you can use the data to optimize your yields, you know, however you see fit. So, we see a mixture of everything, but I think from a Hortonworks perspective it's being able to share data across multiple industries, 'cause we talk about connected ecosystems, and connected ecosystems are really going to change the business of the future. >> So, how so? I mean, especially bringing it back to this conference, to DataWorks, and the main stage this morning, where we heard so much about these connected communities, and really it's all about the ecosystem: what do you see as the biggest change going forward? >> So, I'll give you the context of the insurance industry. You look at companies like Arity, which is a division of Allstate, and what they're doing, actually working with the car manufacturers. So at some point in time, you know, the automotive industry, General Motors, tried this 20 years ago; they didn't quite get it with OnStar and GMAC Insurance. Now you actually have the opportunity with, you know, maybe the car manufacturer as the front man for the insurance industry. So I can now start to collect the data from the vehicle. I'm using that for the driving of the vehicle, but I can also use it to help a driver drive more safely. >> And enhance their experience of actually driving, making it more pleasant as well as safer. There are many layers of what can be done now with the same data. Some of those uses impinge on or relate to regulated or mandatory concerns, and some are purely for competitive differentiation on the whole issue of experience. >> Right, and you think about certain aspects where the insurance industry just has, you know, a negative connotation, and we have an image challenge on what data can and cannot be used. But a lot of people opt in with an automotive manufacturer and share that type of data. So moving forward, who's to say, with the connected ecosystem, that I still have the insurance company in the background doing all the underwriting, but my distribution channel is now the car dealer? >> I love it, great. That's a great note to end on. Thanks so much for coming on theCUBE. Thank you, Cindy. I'm Rebecca Knight, for James Kobielus. We will have more from theCUBE's live coverage of DataWorks in just a little bit. (upbeat music)
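As a toy illustration of the sensor-driven shrinkage example above, the sketch below flags perishable stock for relocation or markdown before it becomes a write-off. The fields, thresholds and sensor signal are invented for illustration only.

```python
# Purely illustrative: decide which perishable SKUs need action, in the spirit
# of the sensor-monitoring example from the interview.
from datetime import date, timedelta

def review_stock(items, today=None, warn_days=2):
    """items: dicts with 'sku', 'expires' (date), 'temp_ok' (bool sensor reading)."""
    today = today or date.today()
    actions = []
    for item in items:
        days_left = (item["expires"] - today).days
        if not item["temp_ok"]:
            actions.append((item["sku"], "move: sensor reports temperature excursion"))
        elif days_left <= warn_days:
            actions.append((item["sku"], f"discount: expires in {days_left} day(s)"))
    return actions

stock = [{"sku": "bananas-01", "expires": date.today() + timedelta(days=1), "temp_ok": True},
         {"sku": "milk-07", "expires": date.today() + timedelta(days=9), "temp_ok": False}]
print(review_stock(stock))  # both SKUs get flagged, for different reasons
```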

Published Date : Jun 19 2018


Peter Smails, ImanisData | DataWorks Summit 2018


 

>> Live from San Jose in the heart of Silicon Valley, it's the Cube, covering Dataworks Summit 2018, brought to you by Hortonworks. (upbeat music) >> Welcome back to The Cube's live coverage of Dataworks here in San Jose, California. I'm your host Rebecca Knight along with my co-host James Kobielus. We're joined by Peter Smails. He is the vice president of marketing at Imanis Data. Thanks so much for coming on The Cube. >> Thanks for having me, glad to be here. >> So you've been in the data storage solution industry for a long time, but you're new to Imanis. What made you jump? What was it about Imanis? >> Yep, so very easy to answer that: it's a hot market. Essentially, what Imanis is all about is that we're an enterprise data management company. The reason I jumped here is that, if I put it in market context, if I take a small step back, here's what's happening. You've got your traditional application world, right? On-prem, typically RDBMS-based applications; that's the old world. In the new world, everybody's moving to microservices-based applications, for IoT, for customer 360, for customer analysis, whatever you want. They're building these new, modern applications, and they're building those applications not on traditional RDBMSs; they're building them on microservices-based architectures built on top of Hadoop, or built on NoSQL databases. Those applications, as they go mainstream and into production environments, require data management. They require backup and recovery. They require disaster recovery. They require archiving, etc. They require the whole plethora of data management capabilities. Nobody's touching that market. It's a blue ocean. So, that's why I'm here. >> Imanis, as you were saying, is one of the greatest little companies no one's ever heard of. You've been around five years. (laughter) >> No, the company is not new. So, the thing that's exciting as a marketeer is that we're not sort of out there just pitching untested technology. We're getting into customers that people would die to get into, big blue-chip companies, because we're addressing a problem that's material. They roll out these new applications, and they've got to have data management solutions for them. The company's been around five years, and I've only been on board about a month, but what that's meant is that over the last five years they've had the opportunity, and it's an enterprise product, and you don't build an enterprise product overnight, to really gestate the platform, gestate the technology, and prove it in real-world scenarios. And now the opportunity for us as a company is that we're doubling down from a marketing standpoint, we're doubling down from a sales infrastructure standpoint. So the timing's right to essentially put this thing on the map and make sure everybody does know exactly what we do, because we're solving a real-world problem. >> You're backup and restore but much more. Lay out the broad set of enterprise data management capabilities that Imanis Data currently supports in your product portfolio, and where you're going in terms of evolving what you offer. >> Yeah, that's great, I love that question. So, think of us this way: the platform itself is this highly scalable, distributed architecture. Okay, so we scale on multiple, and I'll come directly to your question, we scale in a number of different ways.
One is, we're infinitely scalable just in terms of computational power, so we're built for big data by definition. Number two is, we scale very well from a storage efficiency standpoint, so we can store very large volumes of data, which is a requirement. We also scale very much from the use case standpoint, so we support use cases throughout the life cycle. The one that gets all the attention is obviously backup and recovery, because you have to protect your data. But if I look at it from a life cycle standpoint, our number one use case is test/dev. A lot of these organizations building these new apps now want to spin up subsets of their data, 'cause they're supporting things like CI/CD. Okay, so they want to be able to do rapid testing and such. >> DevOps and stuff like that. >> Yeah, DevOps and so forth. So, they need test/dev. So we help them automate and orchestrate the process of test/dev, supporting things like sampling. I may have a one-petabyte dataset; I'm not going to do test/dev against that. I want to take 10 percent of that and spin that up, and I want to do some masking of personal, PII data. So we can do masking and sampling against that to support test/dev. We do backup and recovery. We do disaster recovery. So some customers, particularly in the big data space, may for now say, well, I have replicas, so for some of this data, it's not permanent data, it's transient data, but I do care about DR. So, DR is a key use case. We also do archiving. So if you just think of data through the life cycle, we support all of those. The piece in terms of where we're going, what's truly unique in addition to everything I just mentioned, is that we're the only data management platform that's machine learning based. Okay, so machine learning gets a lot of attention and all that type of stuff, but we're actually delivering machine learning-enabled capabilities today, so. >> And we discussed this before the interview: there's a bit of anomaly detection. How exactly are you using machine learning? What value does it provide to an enterprise data administrator? You have ML inside your tool. >> Inside our platform, great question. Very specifically, in the product we're delivering today there's a capability called ThreatSense. Okay, so the number one use case I mentioned is backup and recovery. Within backup and recovery, what ThreatSense will do, with no user intervention whatsoever, is analyze your backups as they go forward. And what it will do is learn what a normal pattern looks like across something like 50 different metrics, the details of which I couldn't give you right now, but essentially a whole bunch of different metrics that we look at to establish: this is what a normal baseline looks like for you, or for you, kind of thing. Great, that's number one. Number two, we then constantly analyze: is anything occurring that is knocking things outside of that, creating an anomaly? Does something fall outside of that? And when it does, we notify the administrators: you might want to look at this, something could've happened. So the value, very specifically, is around ransomware. Typically, one of the ways you're going to detect ransomware is that you will see an anomaly in your backup set, because your data set will change materially. So we will be able to tell you... >> 'Cause somebody's holding it for ransom is what you're saying.
>> Correct, so something's going to happen in your data pattern. >> You've lost data that should be there, or whatever it might be. >> Correct, it could be that you lost data, your change rate went way up, or something. >> Yeah, gotcha. >> There's any number of things that could trigger it. And then we let the administrator know: it happened here. So today we don't then turn around and just automatically solve that, but to your point about where we're going, we've already broken the ice on delivering machine learning-enabled data management. >> That might indicate you want to checkpoint your backups to, like, a few days before this was detected. So at least you know what data is most likely missing. So yeah, I understand. >> Bingo, that's exactly where we're going with that now. As you can imagine, having a machine learning-powered data management platform at our core, there are many different ways we can go with that. When do I back up? What data do I back up? How do I create the optimal RTO and RPO? From a storage management standpoint, when do I put what data where? There's the whole library science of data management. The future of data management is machine learning based. There's too much data; there's too much complexity for humans to just handle. You need to bring machine learning into the equation to help you harness the power of your data. We've broken the ice; we've got a long way to go, but we've got the platform to start with, and we've already introduced the first use case around this. You can imagine all the places we can take this going forward. >> Very exciting. >> So you're a company that's using machine learning right now. What, in your opinion, will separate the winners from the losers? >> In terms of vendors, or in terms of the customers? >> Well, in terms of both. >> Yeah, let me answer that two ways, sort of inward/outward, starting with how we are unique. We are very unique in that we're infinitely scalable; we are a single pane of glass for all of your distributed systems. We are very unique in terms of our multi-stage data reduction. And from a technology differentiation standpoint, we're the only vendor that's doing machine learning-based stuff. >> Multi-stage data reduction, I want to break that down. What does that actually mean in practice? >> Sure, so we get the question frequently: is that compression or deduplication, or is there something else in there? There's a couple of different things, actually. So why does that matter? A lot of customers will ask the question: well, by definition, NoSQL- or Hadoop-based environments are all based on replicas, so why back things up? First of all, replication isn't backup. So that's lesson number one. Point-in-time backup is very different from replication; replication replicates bad data just as quickly as it replicates good. When you back up these very large data sets, you have to be incredibly efficient in how you do that. What we do with multi-stage data reduction is, one, we will do deduplication, variable-length deduplication; we will do compression; we will do erasure coding. But the other thing we also do in there is what we call a global deduplication pool. So when we're deduping your data, we're actually deduping it against a very large data set. So there's value in that; this is where size matters. Your data's all secured,
but the larger the size of the data that I'm actually storing, the higher the percentage of deduplication I can get, because I've got a bigger pool to reduce against. So the net result is that we're incredibly efficient when you're talking about petabyte-scale data management. We're incredibly efficient to the tune of 10X, easily 10X over traditional deduplication, and multiple times over technologies that are more current, if you will. So back to your question: we are confident that we have a very strong head start. Our opportunity now is that we've got to drive awareness, which is why we're here. We've got to make sure everybody knows who we are, how we're unique and how we're different. And you guys are great; love being on The Cube. From a customer standpoint, the customers that are going to win, and this is sort of a cliche, but it's true, are the ones that best harness their data. They're the ones that are going to win. They're going to be more competitive; they're going to be able to find ways to be differentiated. And the only way they're going to do that is to make the appropriate investments in their data infrastructure, in their data lakes, in their data management tools, so that they can harness all that data. >> Where do you see the future of your Hortonworks partnership going? >> So, we support a broad ecosystem, and Hortonworks is just as important as any of our other data source partners. Where we see that unfolding, and we play an important part in this, we feel our value, let me put it that way, we feel our value in helping Hortonworks is that more and more organizations are going mainstream with these applications. These are not corner cases anymore. This is not sort of in-the-lab stuff. This is the real deal: mainstream enterprises running business-critical applications. The value we bring is that you're not going to rely on those platforms without an enterprise data management solution that delivers what we deliver. So our value there is that we can go to market together, and there's all kinds of ways we can go to market together. But net-net, our value is that we provide a very important enterprise data management capability for customers that are deploying in these business-critical environments. >> Great. >> Very good. As more of the data gets persisted out at the edge devices and the Internet of Things and so forth, what are the challenges in terms of protecting that data, backup and restore, deduplication and so forth, and to what extent is your company, Imanis Data, maybe addressing those kinds of more distributed data management requirements going forward? Do you see that on the rise? Are you hearing that from customers, that they want to do more of that, more of an edge cloud environment? Or is that way too far in the future? >> I don't think it's way too far in the future, but I do think there's an inside-out progression. So my position on that is, it's not that there isn't edge work going on. What I would contend is that the big problem right now, from an enterprise mainstreaming standpoint, is more about getting the house in order, just your core house in order, as you move from sort of a traditional four-wall data center to a hybrid cloud environment. Maybe not quite at the edge yet; a combination of how do I leverage on-prem and the cloud, so to speak, and how do I get the core data lake, in the case of Hortonworks, how do I get that big data lake, sorted out?
You're touching on, I think, a longer discussion, which is: where is the analysis going on? Where is the data going to persist? You know, where do you do some of that computational work? You get all this information out at the edge; does all that information end up going into the data lake? Do you move the storage to where the lake is? Do you start pushing some of the lake functionality out to the edge, where you then have to start doing some of the... So it's a much more complicated discussion. >> I know we had this discussion over lunch. This may be outside your wheelhouse, but let me just ask it anyway. We've seen, at Wikibon, I cover AI and distributed training and distributed inference and such, that the edges are capturing the data, and more and more there's a trend toward performing local training of their models, their embedded models, from the data they capture. But quite often edge devices don't have a ton of storage, and they're not going to retain it that long. But some of that data will need to be archived, will need to be persisted in a way and managed as a core resource. So we see that kind of requirement, maybe not now, but in a few years' time: distributed training, and persistence of that data, protection of that data, becoming a mainstream enterprise requirement, where AI and machine learning, the whole pipeline, is a concern. Like I said, that's probably outside you guys' wheelhouse, and probably outside the realm for your customers, but that kind of thing is coming, as the likes of Hortonworks and IBM and everybody else are starting to look at it and implement it: containerization of analytics and data management out to all these micro devices. >> Yes, and I think you're right there. And to your point, we're kind of going where the data is, in volume, kind of thing, and it's going in that direction. And frankly, where we see that happening, that's where the cloud plays a big role as well, because there's edge, but how do you get to the edge? You can get to the edge through the cloud. So, again, we run on AWS, we run on GCP, we run on Azure. So, to be clear, in terms of the data we can protect, we've got a broad portfolio, a broad ecosystem of Hadoop-based big data data sources that we support, as well as NoSQL. If they're running on AWS or GCP or Azure, we support ADLS, we support Azure's data lake stuff, HDInsight; we support a whole bunch of different things, both from a cloud standpoint and on-prem, which is where we're seeing some of that edge work happening. >> Great. Well, Peter, thank you so much for coming on The Cube. It's always a pleasure to have you on. >> Yes, thanks for having me, and I look forward to being back sometime soon. >> We'll have you. >> Thank you both. >> When the time is right. >> Indeed, we will have more from The Cube's live coverage of Dataworks just after this. (upbeat music)
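For readers who want a feel for the baseline-plus-anomaly idea Peter describes, here is a deliberately simple sketch. It is not Imanis Data's ThreatSense implementation: the metric names, the z-score test and the threshold are assumptions standing in for the "50 different metrics" mentioned above.

```python
# Illustrative only: learn a per-metric baseline from historical backup runs,
# then flag a run whose profile deviates sharply (e.g., a ransomware-style
# spike in change rate or deletions).
import statistics

METRICS = ["changed_bytes", "new_files", "deleted_files"]

def baseline(history):
    """history: list of dicts, one per backup run, keyed by METRICS."""
    return {m: (statistics.mean(r[m] for r in history),
                statistics.stdev(r[m] for r in history)) for m in METRICS}

def anomalies(run, base, z_threshold=3.0):
    flagged = {}
    for m in METRICS:
        mean, sd = base[m]
        z = abs(run[m] - mean) / sd if sd else 0.0
        if z > z_threshold:
            flagged[m] = round(z, 1)   # how far outside the learned baseline
    return flagged                      # empty means the run looks normal

history = [{"changed_bytes": 10_000 + i * 100, "new_files": 50 + i,
            "deleted_files": 5 + (i % 3)} for i in range(30)]
suspect = {"changed_bytes": 900_000, "new_files": 40, "deleted_files": 4_000}
print(anomalies(suspect, baseline(history)))  # changed_bytes and deleted_files flag
```

An administrator alerted this way could, as James suggests, restore from a backup taken before the anomaly first appeared.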

Published Date : Jun 19 2018


Dan Potter, Attunity & Ali Bajwa, Hortonworks | DataWorks Summit 2018


 

>> Live from San Jose in the heart of Silicon Valley, it's theCUBE, covering DataWorks Summit 2018, brought to you by Hortonworks. >> Welcome back to theCUBE's live coverage of DataWorks here in sunny San Jose, California. I'm your host Rebecca Knight along with my co-host James Kobielus. We're joined by Dan Potter, who is the VP of Product Management at Attunity, and also Ali Bajwa, who is a principal partner solutions engineer at Hortonworks. Thanks so much for coming on theCUBE. >> Pleasure to be here. >> It's good to be here. >> So I want to start with you, Dan, and have you tell our viewers a little bit about the company, based in Boston, Massachusetts, and what Attunity does. >> Attunity, we're a data integration vendor. We are best known as a provider of real-time data movement from transactional systems into data lakes, into clouds, into streaming architectures; it's a modern approach to data integration. So as these core transactional systems are being updated, we're able to take those changes and move them where they're needed, when they're needed, for analytics, for new operational applications, for a variety of different tasks. >> Change data capture. >> Change data capture is the heart of our... >> They are well known in this business. They have change data capture. Go ahead. >> We are. >> So tell us about the announcement today that Attunity has made at the Hortonworks... >> Yeah, thank you. It's a great announcement because it showcases the collaboration between Attunity and Hortonworks, and it's all about taking the metadata that we capture in that integration process. We're a piece of a data lake architecture. As we are capturing changes from those source systems, we are also capturing the metadata, so we understand the source systems, we understand how the data gets modified along the way. We use that metadata internally, and now we've built extensions to share that metadata into Atlas, and to be able to extend that out through Atlas to higher-level data governance initiatives, so Data Steward Studio, and into the DataPlane Services. It's really important to be able to take the metadata that we have and add to it the metadata that's from the other sources of information. >> Sure, and more of the transactional semantics that Hortonworks has been describing, they've baked into HDP, in your overall portfolios. Is that true? I mean, that supports those kinds of requirements. >> With HDP, what we're seeing is, you know, the EDW optimization play has become more and more important for a lot of customers as they try to optimize the data that their EDWs are working on. So it really gels well with what we've done here with Attunity, and then on the Atlas side, with the integration on the governance side, with GDPR and other sorts of regulations coming into play now, those sorts of things are becoming more and more important, specifically around the governance initiative. We actually have a talk on Thursday morning where we're showcasing the integration as well. >> So can you talk a little bit more about that, for those who aren't going to be there on Thursday? GDPR was really a big theme at the DataWorks Berlin event, and now we're in this new era, and it's not talked about too, too much. I mean, we--
So GDPR are those in EU regulation, really in many ways it's having ripple effects across the world in terms of practices. >> Absolutely and at the heart of understanding how you protect yourself and comply, I need to understand my data, and that's where metadata comes in. So having a holistic understanding of all of the data that resides in your data lake or in your cloud, metadata becomes a key part of that. And also in terms of enforcing that, if I understand my customer data, where the customer data comes from, the lineage from that, then I'm able to apply the protections of the masking on top of that data. So it's really, the GDPR effect has had, you know, it's created a broad-scale need for organizations to really get a handle on metadata so the timing of our announcement just works real well. >> And one nice thing about this integration is that you know it's not just about being able to capture the data in Atlas, but now with the integration of Atlas and Ranger, you can do enforcement of policies based on classifications as well, so if you can tag data as PCI, PII, personal data, that can get enforced through Ranger to say, hey, only certain admins can access certain types of data and now all that becomes possible once we've taken the initial steps of the Atlas integration. >> So with this collaboration, and it's really deepening an existing relationship, so how do you go to market? How do you collaborate with each other and then also service clients? >> You want to? >> Yeah, so from an engineering perspective, we've got deep roots in terms of being a first-class provider into the Hortonworks platform, both HDP and HDF. Last year about this time, we announced our support for acid merge capabilities, so the leading-edge work that Hortonworks has done in bringing acid compliance capabilities into Hive, was a really important one, so our change to data capture capabilities are able to feed directly into that and be able to support those extensions. >> Yeah, we have a lot of you know really key customers together with Attunity and you know maybe a a result of that they are actually our ISV of the Year as well, which they probably showcase on their booth there. >> We're very proud of that. Yeah, no, it's a nice honor for us to get that distinction from Hortonworks and it's also a proof point to the collaboration that we have commercially. You know our sales reps work hand in hand. When we go into a large organization, we both sell to very large organizations. These are big transformative initiatives for these organizations and they're looking for solutions not technologies, so the fact that we can come in, we can show the proof points from other customers that are successfully using our joint solution, that's really, it's critical. >> And I think it helps that they're integrating with some of our key technologies because, you know, that's where our sales force and our customers really see, you know, that as well as that's where we're putting in the investment and that's where these guys are also investing, so it really, you know, helps the story together. So with Hive, we're doing a lot of investment of making it closer and closer to a sort of real-time database, where you can combine historical insights as well as your, you know, real-time insights. with the new acid merge capabilities where you can do the inserts, updates and deletes, and so that's exactly what Attunity's integrating with with Atlas. 
We're doing a lot of investment there, and that's exactly what these guys are integrating with. So I think our customers and prospects really see that, and that's where all the wins are coming from. >> Yeah, and I think together there were two main barriers that we saw in terms of customers getting the most out of their data lake investment. One of them was: as I'm moving data into my data lake, I need to be able to put some structure around it; I need to be able to handle continuously updating data from multiple sources. And that's what we introduced with Attunity Compose for Hive, building out the structure in an automated fashion so I've got analytics-ready data, and using the ACID merge capabilities just made those updates much easier. The second piece was metadata. Business users need to have confidence in the data that they're using: where did this come from? How was it modified? And overcoming both of those is really helping organizations make the most of those investments. >> How would you describe customer attitudes right now in terms of their approach to data? Because, as we've talked about, data is the new oil, so there's a real excitement and a buzz around it, and yet there are also so many high-profile cases of breaches and security concerns. So what would you say: are customers more excited, or are they more trepidatious? How would you describe the CIO mindset right now? >> So I think security and governance have become top of mind, right? More and more, in the surveys that we've taken with our customers, more and more customers are concerned about security and concerned about governance. The joke is that we talk to some of our customers and they keep talking to us about Atlas, which is one of the newer offerings on governance that we have, but then we ask, "Hey, what about Ranger for enforcement?" And they're like, "Oh, yeah, that's a standard now." So we have Ranger; now it's a question of how do we get our hooks into Atlas and all that kind of stuff. So yeah, definitely, as you mentioned, because of GDPR, because of all these kinds of issues that have happened, it's definitely become top of mind. >> And I would say the other side of that is that there's real excitement as well about the possibilities: now bringing together all of this data, AI, machine learning, real-time analytics and real-time visualization. There are analytic capabilities now that organizations have never had. So there's great excitement, but there's also trepidation. You know, how do we solve for both of those? And together, we're doing just that. >> But as you mentioned, if you look at Europe, some of the European companies that are more affected by GDPR are actually excited that now they can really get to understand their data more and do better things with it as a result of the GDPR initiative. >> Absolutely. >> Are you using machine learning inside of Attunity, in a Hortonworks context, to find patterns in that data in real time? >> So we enable data scientists to build those models.
So we're not only bringing the data together; again, part of the announcement last year is the way we structure that data in Hive. We provide a complete historic data store of every single transaction that has happened, and we send those transactions as they happen, as a big append. So if you're a data scientist, I want to understand the complete history of the transactions of a customer to be able to build those models. Building those out in Hive and making them analytics-ready in Hive, that's what we do, so we're a key enabler of machine learning. >> Making it analytics-ready, rather than doing the analytics in the stream, yeah. >> Absolutely. >> Yeah, the other side of that is that, because they're integrated with Atlas, you know, now we have a new capability called DataPlane, and Data Steward Studio. The idea there is around multi-everything. More and more customers have multiple clusters, whether on-prem or in the cloud, so more and more customers are looking at: how do I get a single pane of glass across all my data, whether it's on-prem, in the cloud, whether it's IoT, whether it's data at rest, right? So that's where DataPlane comes in, and with Data Steward Studio, which is our second offering on top of DataPlane, they can get that view across all their clusters. So as soon as the data lands from Attunity into Atlas, you can get a view into it as part of Data Steward Studio. And one of the nice things we do in Data Steward Studio is that we also have machine learning models to do some profiling, to figure out that, hey, this looks like a credit card, so maybe I should suggest this as a tag of sensitive data. And now the end user, the administrator, has the option of saying, okay, yeah, this is a credit card, I'll accept that tag, or they can reject it and pick one of their own. >> Will any of this, going forward, the Attunity CDC change data capture capability, be containerized for deployment to the edges in HDP 3.0? 'Cause it seems, for Internet of Things, edge analytics and so forth, change data capture, isn't it absolutely necessary to make the entire, some call it the fog computing cloud or whatever, a completely transactional environment for all applications, from micro endpoint to micro endpoint? Are there any plans to do that going forward? >> Yeah, so I think with HDP 3.0, as you mentioned, one of the key factors coming into play was around time to value. So with containerization now being able to bring third-party apps on top of YARN through Docker, I think that's definitely an avenue that we're looking at. >> Yes, we're excited about that with 3.0 as well, so that's definitely in the cards for us. >> Great, well, Ali and Dan, thank you so much for coming on theCUBE. It's fun to have you here. >> Nice to be here, thank you guys. >> Great to have you. >> Thank you, it was a pleasure. >> I'm Rebecca Knight, for James Kobielus; we will have more from DataWorks in San Jose just after this. (techno music)
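To ground the change-data-capture pattern Dan and Ali describe, an append-only stream of source changes that is periodically folded into an analytics-ready table via Hive's ACID MERGE, here is an illustrative sketch. The table and column names are hypothetical, and the real Attunity/Hive integration generates this plumbing rather than requiring hand-written code.

```python
# Illustrative CDC pattern: keep the raw change stream as complete history for
# data scientists, and fold the latest change per key into a current view
# (the job Hive's ACID MERGE does at scale).

MERGE_SQL = """
MERGE INTO customers t USING staged_changes s ON t.id = s.id
WHEN MATCHED AND s.op = 'DELETE' THEN DELETE
WHEN MATCHED THEN UPDATE SET name = s.name, tier = s.tier
WHEN NOT MATCHED AND s.op <> 'DELETE' THEN INSERT VALUES (s.id, s.name, s.tier)
"""  # the general shape of a Hive ACID merge; tables and columns are invented

changes = [  # what a CDC feed might deliver: (op, key, payload, sequence_no)
    ("INSERT", 1, {"name": "Ada", "tier": "gold"}, 100),
    ("UPDATE", 1, {"name": "Ada", "tier": "platinum"}, 101),
    ("DELETE", 2, None, 102),
]

def apply_changes(current, feed):
    """Fold an ordered change feed into the current view (a dict keyed by id)."""
    for op, key, payload, _seq in sorted(feed, key=lambda c: c[3]):
        if op == "DELETE":
            current.pop(key, None)
        else:                       # INSERT and UPDATE both upsert the latest image
            current[key] = payload
    return current

print(apply_changes({2: {"name": "Bob", "tier": "silver"}}, changes))
# -> {1: {'name': 'Ada', 'tier': 'platinum'}}; the raw `changes` list stays as history
```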

Published Date : Jun 19 2018


Eric Herzog, IBM | DataWorks Summit 2018


 

>> Live from San Jose in the heart of Silicon Valley, it's theCUBE, covering DataWorks Summit 2018, brought to you by Hortonworks. >> Welcome back to theCUBE's live coverage of DataWorks here in San Jose, California. I'm your host, Rebecca Knight, along with my co-host, James Kobielus. We have with us Eric Herzog. He is the Chief Marketing Officer and VP of Global Channels at the IBM Storage Division. Thanks so much for coming on theCUBE once again, Eric. >> Well, thank you. We always love to be on theCUBE and talk to all of theCUBE analysts about various topics: data, storage, multi-cloud, the works. >> And before the cameras were rolling, we were talking about how you might be the biggest CUBE alum, in the sense that you've been on theCUBE more times than anyone else. >> I know I'm in the top five, but I may be number one; I'd have to check with Dave Vellante and crew and see. >> Exactly, and often wearing a Hawaiian shirt. >> Yes. >> Yes, I was on theCUBE last week from Cisco Live. I was not wearing a Hawaiian shirt, and Stu and John gave me a hard time about why I was not wearing a Hawaiian shirt. So I made sure I showed up to the DataWorks show... >> Stu, Dave, get a load. >> You're in California with a tan, so it fits, it's good. >> So we were talking a little bit before the cameras were rolling, and you were saying one of the points that is sort of central to your professional life is that it's not just about the storage, it's about the data. So riff on that a little bit. >> Sure. So at IBM we believe everything is data-driven, and in fact we would argue that data is more valuable than oil or diamonds or plutonium or platinum or silver or anything else. It is the most valuable asset, whether you be a global Fortune 500, whether you be a midsize company or whether you be Herzog's Bar and Grill. Data is what you use with your suppliers, with your customers, with your partners. Literally everything around your company is really built around the data, so it's about managing it most effectively and making sure, A, it's always performant, because when it's not performant, they go away. As you probably know, Google did a survey showing that after one or two seconds people go off your website, they click somewhere else, so it has to be performant. Obviously, in today's 365, 7-by-24 company, it needs to always be resilient and reliable, and it always needs to be available; otherwise, if the storage goes down, guess what? Your AI doesn't work, your cloud doesn't work, whatever the workload; and if you're more traditional, your Oracle, your SQL, you know, SAP, none of those workloads work if you don't have a solid storage foundation underneath your data-driven enterprise. >> So with that ethos in mind, talk about the products that you are launching, that you newly launched, and also your product roadmap going forward. >> Sure. So for us, everything really is about storage being this critical foundation for the data-driven, multi-cloud enterprise. And as I've said before on theCUBE, all of our storage software is now cloud-ified, so if you need to automatically tier out to IBM Cloud or Amazon or Azure, we will automatically move the data placement around, from on-prem out to a cloud. And for certain customers who may be multi-cloud, in this case using multiple private cloud providers, which happens due to either legal reasons or procurement reasons or geographic reasons for the larger enterprises, we can handle that as well.
That's part of it, the second thing is we just announced earlier today an artificial intelligence, an AI reference architecture, that incorporates a full stack from the very bottom, both servers and storage, all the way up through the top layer, then the applications on top, so we just launched that today. >> AI for storage management, or AI for running a range of applications? >> Regular AI, artificial intelligence from an application perspective. So we announced that reference architecture today. Basically think of the reference architecture as your recipe, your blueprint, of how to put it all together. Some of the components are from IBM, such as Spectrum Scale and Spectrum Computing from my division, our servers from our Cloud division. Some are open source, TensorFlow, Caffe, things like that. Basically it gives you what the stack needs to be, and what you need to do in various AI workloads, applications and use cases. >> I believe you have distributed deep learning as an IBM capability, that's part of that stack, is that correct? >> That is part of the stack, it's like in the middle of the stack. >> Is it, correct me if I'm wrong, that's containerization of AI functionality? >> Right. >> For distributed deployment? >> Right. >> In an orchestrated Kubernetes fabric, is that correct? >> Yeah, so when you look at it from an IBM perspective, while we clearly support the virtualized world, the VMwares, the Hyper-Vs, the KVMs and the OVMs, and we will continue to do that, we're also heavily invested in the container environment. For example, one of our other divisions, the IBM Cloud Private division, has announced a solution that's all about private Clouds, you can either get it hosted at IBM or literally buy our stack- >> Rob Thomas in fact demoed it this morning, here. >> Right, exactly. And you could create- >> At DataWorks. >> Private Cloud initiative, and there are companies that, whether it be for security purposes or whether it be for legal reasons or other reasons, don't want to use public Cloud providers, be it IBM, Amazon, Azure, Google or any of the big public Cloud providers, they want a private Cloud, and IBM either A, will host it or B, with IBM Cloud Private. All of that infrastructure is built around a containerized environment. We support the older world, the virtualized world, and the newer world, the container world. In fact, our storage allows you to have persistent storage in a containers environment, Dockers and Kubernetes, and that works on all of our block storage, and that's a freebie, by the way, we don't charge for that. >> You've worked in the data storage industry for a long time, can you talk a little bit about how the marketing message has changed and evolved since you first began in this industry, and in terms of what customers want to hear and what assuages their fears? >> Sure, so nobody cares about speeds and feeds, okay? Except me, because I've been doing storage for 32 years. >> And him, he might care. (laughs) >> But when you look at it, the decision makers today, the CIOs, in 32 years, including seven start-ups, IBM and EMC, I've never, ever, ever met a CIO who used to be a storage guy, ever. So, they don't care. They know that they need storage and the other infrastructure, including servers and networking, but think about it, when the app is slow, who do they blame? Usually they blame the storage guy first, secondarily they blame the server guy, thirdly they blame the networking guy. They never look to see that their code stack is improperly done.
Really what you have to do is talk applications, workloads and use cases, which is what the AI reference architecture does. What my team does in non-AI workloads, it's all about, again, data driven, multi Cloud infrastructure. They want to know how you're going to make a new workload, like AI, fast. How you're going to make their Cloud resilient, whether it's private or hybrid. In fact, IBM storage sells a ton of technology to large public Cloud providers that do not have the initials IBM. We sell gobs of storage to other public Cloud providers, both big, medium and small. It's really all about the applications, workloads and use cases, and that's what gets people excited. You basically need a position, just like I talked about with the AI foundations, storage is the critical foundation. We happen to be, knocking on wood, let's hope there's no earthquake, since I've lived here my whole life, and I've been in earthquakes, I was in the '89 quake. Literally fell down a bunch of stairs in the '89 quake. If there's an earthquake, as great as IBM storage is, or any other storage or servers, it's crushed. Boom, you're done! Okay, well you need to make sure that your infrastructure, really your data, is covered by the right infrastructure and that it's always resilient, it's always performing and is always available. And that's what IBM drives is about, that's the message, not about how many gigabytes per second in bandwidth or what's the- Not that we can't spew that stuff when we talk to the right person, but in general people don't care about it. What they want to know is, "Oh, that SAP workload took 30 hours and now it takes 30 minutes?" We have public references that will say that. "Oh, you mean I can use eight to ten times less storage for the same money?" Yes, and we have public references that will say that. So that's what it's really about. Storage has really moved on from a speeds-and-feeds nerd sort of thing, and now all the nerds are doing AI and Caffe and TensorFlow and all of that, they're all hackers, right? It used to be storage guys who used to do that, and to a lesser extent server guys and definitely networking guys. That's all shifted to the software side, so you've got to talk the languages. What can we do with Hortonworks? By the way, we were named in Q1 of 2018 as the Hortonworks infrastructure partner of the year. We work with Hortonworks all the time, at all levels, whether it be with our channel partners, whether it be with our direct end users, however the customer wants to consume, we work with Hortonworks very closely, and other providers as well in that big data analytics and the AI infrastructure world, that's what we do. >> So the containerization side of the IBM AI stack, and the containerization capabilities in Hortonworks Data Platform 3.0, can you give us a sense for how you plan to, or do you plan at IBM, to work with Hortonworks to bring these capabilities, your reference architecture, into more, or bring their environment for that matter, into more of an alignment with what you're offering? >> So we haven't made an exact decision on how we're going to do it, but we interface with Hortonworks on a continual basis. >> Yeah.
>> We're working to figure out what's the right solution, whether that be an integrated solution of some type, whether that be something that we do through an adjunct to our reference architecture or some reference architecture that they have, but we always make sure, again, we are their partner of the year for infrastructure, named in Q1, and that's because we work very tightly with Hortonworks and make sure that what we do ties out with them, hits the right applications, workloads and use cases, the big data world, the analytic world and the AI world, so that we're tied off, you know, together, to make sure that we deliver the right solutions to the end user, because that's what matters most, what gets the end users fired up, not what gets Hortonworks or IBM fired up, it's what gets the end users fired up. >> When you're trying to get into the head space of the CIO, and get your message out there, I mean what is it, what would you say is it that keeps them up at night? What are their biggest pain points, and then how do you come in and solve them? >> I'd say the number one pain point for most CIOs is application delivery, okay? Whether that be to the line of business, put it this way, let's take an old workload, okay? Let's take that SAP example, that CIO was under pressure because they were trying, in this case it was a giant retailer who was shipping stuff every night, all over the world. Well guess what? The green undershirts in the wrong size went to Paducah, Kentucky, and then one of the other stores, in Singapore, which needed those green shirts, they ended up with shoes, and the reason is, they couldn't run that SAP workload in a couple hours. Now they run it in 30 minutes. It used to take 30 hours. So since they're shipping every night, you're basically missing a cycle, essentially, and you're not delivering the right thing from a retail infrastructure perspective to each of their nodes, if you will, to their retail locations. So they care about what they need to do to deliver to the business the right applications, workloads and use cases on the right timeframe, and they can't go down, people get fired for that at the CIO level, right? If something goes down, the CIO is gone, and obviously for certain companies that are more in the modern mode, okay? People who are delivering stuff and their primary transactional vehicle is the internet, not retail, not through partners, not through people like IBM, but their primary transactional vehicle is a website, if that website is not resilient, performant and always reliable, then guess what? They are shut down and they're not selling anything to anybody, which is not true if you're Nordstrom's, right? Someone can always go into the store and buy something, right, and figure it out? Almost all old retailers have not only a connection to the core but they literally have a server and storage in every retail location, so if the core goes down, guess what, they can transact. In the era of the internet, you don't do that anymore. Right? If you're shipping only on the internet, you're shipping on the internet, so whether it be a new workload or an old workload, okay? If you're doing the whole IoT thing, for example, I know a company that I was working with, it's a giant, private mining company. They have those giant, like three story dump trucks you see on the Discovery Channel. Those things cost them a hundred million dollars, so they have five thousand sensors on every dump truck.
It's a fricking dump truck, but guess what, they've got five thousand sensors on there so they can monitor and make sure they take proactive action, because if that goes down, whether these be diamond mines or these be uranium mines or whatever it is, it costs them hundreds of millions of dollars to have a thing go down. That's, if you will, trying to take it out of the traditional high tech area, which we all talk about, whether it be Apple or Google, or IBM, okay great, now let's put it to some other workload. In this case, this is the use of IoT, in a big data analytics environment with AI based infrastructure, to manage dump trucks. >> I think you're talking about what's called "digital twins" in a networked environment for materials management, supply chain management and so forth. Are those requirements growing in terms of industrial IoT requirements of that sort, and how does that affect the amount of data that needs to be stored, the sophistication of the AI and the stream computing that needs to be provisioned? Can you talk to that? >> The amount of data is growing exponentially. It's growing at yottabytes and zettabytes a year now, not at just exabytes anymore. In fact, everybody on their iPhone or their laptop, I've got a 10GB phone, okay? My laptop, which happens to be a Power Book, has two terabytes of flash, on a laptop. So just imagine how much data's being generated if you're in a giant factory, whether you be in the warehouse space, whether you be in healthcare, whether you be in government, whether you be in the financial sector, and now all those additional regulations, such as GDPR in Europe and other regulations across the world about what you have to do with your healthcare data, what you have to do with your finance data, the amount of data being stored. And then on top of it, quite honestly, from an AI big data analytics perspective, the more data you have, the more valuable it is, the more you can mine it. It's as if the world ran on oil, forget the pollution side, let's assume oil didn't cause pollution. Okay, great, then guess what? You would be using oil everywhere and you wouldn't be using solar, you'd be using oil, and by the way you'd need more and more and more, and how much oil you have and how you control that would be the power. That right now is the power of data, and if anything it's getting more and more and more. So again, you always have to be able to be resilient with that data, you always have to interact with things, like we do with Hortonworks or other application workloads. Our AI reference architecture is another perfect example of the things you need to do to provide, you know, at the base infrastructure, the right foundation. If you have the wrong foundation to a building, it falls over. Whether it be your house, a hotel, this convention center, if it had the wrong foundation, it falls over. >> Actually, to follow the oil analogy just a little bit further, the more of this data you have, the more PII there is, and the more the workloads need to scale up, especially for things like data masking. >> Right. >> When you have compliance requirements like GDPR, so you want to process the data but you need to mask it first, therefore you need clusters that conceivably are optimized for high volume, highly scalable masking in real time, to drive the downstream app, to feed the downstream applications and to feed the data scientists, you know, data lakes, whatever, and so forth and so on?
>> That's why you need things like incredible compute, which IBM offers with the Power platform. And why you need storage that, again, can scale up. >> Yeah. >> Can get as big as you need it to be. For example, in our reference architecture we use both what we call Spectrum Scale, which is a big data analytics workload performance engine, it's multi-threaded, multi-tasking. In fact one of the largest banks in the world, if you happen to bank with them, your credit card fraud detection is being done on our stuff, okay? But at the same time we have what's called IBM Cloud Object Storage, which is an object store. You want to take every one of those searches for fraud, and when they find out that no one stole my MasterCard or the Visa, you still want to put it in there, because then you mine it later and see patterns of how people are trying to steal stuff, because it's all being done digitally anyway. You want to be able to do that. So you A, want to handle it very quickly and resiliently, but then you want to be able to mine it later, as you said, mining the data. >> Or do high value anomaly detection in the moment, to be able to tag the more anomalous data that you can then sift through later, or maybe in the moment for real-time mitigation. >> Well that's highly compute intensive, it's AI intensive and it's highly storage intensive on the performance side, and then what happens is you store it all for, let's say, further analysis so you can tell people, "When you get your AmEx card, do this and they won't steal it." Well the only way to do that is you use AI on this ocean of data, where you're analyzing all this fraud that has happened, to look at patterns, and then you tell me, as a consumer, what to do. Whether it be in the financial business, in this case the credit card business, healthcare, government, manufacturing. One of our resellers actually developed an AI based tool that can scan boxes and cans for faults on an assembly line, and has actually sold it to a beer company and to a soda company, so that instead of people looking at the cans, like you see on the Food Channel, to pull them off, guess what? It's all automatically done. There's no people pulling the can off, "Oh, that can is damaged," and looking at it, and by the way, sometimes they slip through. Now, using cameras and this AI based infrastructure from IBM, with our storage underneath the hood, they're able to do this. >> Great. Well Eric, thank you so much for coming on theCUBE. It's always been a lot of fun talking to you. >> Great, well thank you very much. We love being on theCUBE and appreciate it, and hope everyone enjoys the DataWorks conference. >> We will have more from DataWorks just after this. (techno beat music)
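
(The in-the-moment anomaly tagging Herzog and Kobielus discuss — score each sensor reading as it arrives, flag the outliers, keep everything for later mining — can be sketched minimally as below. The window size, threshold and data are illustrative assumptions, not an IBM API.)

```python
# A minimal sketch of streaming anomaly tagging for a sensor feed,
# in the spirit of the dump-truck example. Thresholds are assumptions.
from collections import deque
from statistics import mean, stdev

WINDOW = 100        # readings kept for the rolling baseline
Z_THRESHOLD = 4.0   # deviations from baseline that count as anomalous

def detect_anomalies(readings):
    """Yield (index, value) for readings far outside the rolling baseline."""
    window = deque(maxlen=WINDOW)
    for i, value in enumerate(readings):
        if len(window) >= 10:  # need some history before judging
            mu, sigma = mean(window), stdev(window)
            if sigma > 0 and abs(value - mu) / sigma > Z_THRESHOLD:
                yield i, value  # tag now; archive everything for later mining
        window.append(value)

# Example: a hydraulic-pressure feed with one bad spike.
feed = [100.0 + (i % 7) * 0.5 for i in range(500)]
feed[250] = 160.0
print(list(detect_anomalies(feed)))  # -> [(250, 160.0)]
```

In the architecture described above, the flagged readings would drive the immediate action while the full feed lands in an object store for the later pattern mining Herzog mentions.
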

Published Date : Jun 19 2018


SENTIMENT ANALYSIS :

ENTITIES

EntityCategoryConfidence
Diane GreenePERSON

0.99+

Eric HerzogPERSON

0.99+

James KobielusPERSON

0.99+

Jeff HammerbacherPERSON

0.99+

DianePERSON

0.99+

IBMORGANIZATION

0.99+

Mark AlbertsonPERSON

0.99+

MicrosoftORGANIZATION

0.99+

AmazonORGANIZATION

0.99+

Rebecca KnightPERSON

0.99+

JenniferPERSON

0.99+

ColinPERSON

0.99+

Dave VellantePERSON

0.99+

CiscoORGANIZATION

0.99+

Rob HofPERSON

0.99+

UberORGANIZATION

0.99+

Tricia WangPERSON

0.99+

FacebookORGANIZATION

0.99+

SingaporeLOCATION

0.99+

James ScottPERSON

0.99+

ScottPERSON

0.99+

Ray WangPERSON

0.99+

DellORGANIZATION

0.99+

Brian WaldenPERSON

0.99+

Andy JassyPERSON

0.99+

VerizonORGANIZATION

0.99+

Jeff BezosPERSON

0.99+

Rachel TobikPERSON

0.99+

AlphabetORGANIZATION

0.99+

Zeynep TufekciPERSON

0.99+

TriciaPERSON

0.99+

StuPERSON

0.99+

Tom BartonPERSON

0.99+

GoogleORGANIZATION

0.99+

Sandra RiveraPERSON

0.99+

JohnPERSON

0.99+

QualcommORGANIZATION

0.99+

Ginni RomettyPERSON

0.99+

FranceLOCATION

0.99+

Jennifer LinPERSON

0.99+

Steve JobsPERSON

0.99+

SeattleLOCATION

0.99+

BrianPERSON

0.99+

NokiaORGANIZATION

0.99+

EuropeLOCATION

0.99+

Peter BurrisPERSON

0.99+

Scott RaynovichPERSON

0.99+

RadisysORGANIZATION

0.99+

HPORGANIZATION

0.99+

DavePERSON

0.99+

EricPERSON

0.99+

Amanda SilverPERSON

0.99+

Tendü Yogurtçu, Syncsort | DataWorks Summit 2018


 

>> Live from San Jose, in the heart of Silicon Valley, it's theCUBE, covering DataWorks Summit 2018. Brought to you by Hortonworks. >> Welcome back to theCUBE's live coverage of DataWorks here in San Jose, California. I'm your host, Rebecca Knight, along with my cohost, James Kobielus. We're joined by Tendü Yogurtçu, she is the CTO of Syncsort. Thanks so much for coming on theCUBE, for returning to theCUBE I should say. >> Thank you Rebecca and James. It's always a pleasure to be here. >> So you've been on theCUBE before, and the last time you were talking about Syncsort's growth. So can you give our viewers a company update? Where are you now? >> Absolutely, Syncsort has seen extraordinary growth within the last three years. We tripled our revenue, doubled our employees and expanded the product portfolio significantly. Because of this phenomenal growth that we have seen, we also embarked on a new initiative, refreshing our brand. We rebranded, and this was necessitated by the fact that we have such a broad portfolio of products, and we are actually showing our new brand here, articulating the value our products bring: optimizing existing infrastructure, assuring data security and availability, and advancing the data by integrating into next generation analytics platforms. So it's very exciting times in terms of Syncsort's growth. >> So the last time you were on the show it was pre-GDPR, but we were talking before the cameras were rolling, and you were explaining the kinds of adoption you're seeing and what, in this new era, you're seeing from customers and hearing from customers. Can you tell our viewers a little bit about it?
So our focus and most recent announcements have been around really helping our customers with real-time resilient changes that capture, keeping the data fresh, feeding into the downstream applications with the streaming and messaging data frames, for example Kafka, Amazon Kinesis, as well as keeping the persistent stores and how to Data Lake on-premise in the cloud fresh. >> Puts you into great alignment with your partner Hortonworks so, Tendu I wonder if we are here at DataWorks, it's Hortonworks' show, if you can break out for our viewers, what is the nature, the levels of your relationship, your partnership with Hortonworks and how the Syncsort portfolio plays with HDP 3.0 with Hortonworks DataFlow and the data plan services at a high level. >> Absolutely, so we have been a longtime partner with Hortonworks and a couple of years back, we strengthened our partnership. Hortonworks is reselling Syncsort and we have actually a prescriptive solution for Hadoop and ETL onboarding in Hadoop jointly. And it's very complementary, our strategy is very complementary because what Hortonworks is trying and achieving, is creating that abstraction and future-proofing and interaction consistency around referred as this morning. Across the platform, whether it's on-premise or in the cloud or across multiple clouds. We are providing the data application layer consistency and future-proofing on top of the platform. Leveraging the tools in the platform for orchestration, integrating with HTP, certifying with Trange or HTP, all of the tools DataFlow and at last of course for lineage. >> The theme of this conference is ideas, insights and innovation and as a partner of Hortonworks, can you describe what it means for you to be at this conference? What kinds of community and deepening existing relationships, forming new ones. Can you talk about what happens here? >> This is one of the major events around data and it's DataWorks as opposed to being more specific to the Hadoop itself, right? Because stack is evolving and data challenges are evolving. For us, it means really the interactions with the customers, the organizations and the partners here. Because the dynamics of the use cases is also evolving. For example Data Lake implementations started in U.S. And we started MER European organizations moving to streaming, data streaming applications faster than U.S. >> Why is that? >> Yeah. >> Why are Europeans moving faster to streaming than we are in North America? >> I think a couple of different things might participate. The open sources really enabling organizations to move fast. When the Data Lake initiative started, we have seen a little bit slow start in Europe but more experimentation with the Open Source Stack. And by that the more transformative use cases started really evolving. Like how do I manage interactions of the users with the remote controls as they are watching live TV, type of transformative use cases became important. And as we move to the transformative use cases, streaming is also very critical because lots of data is available and being able to keep the cloud data stores as well as on-premise data stores and downstream applications with fresh data becomes important. We in fact in early June announced that Syncsort's now's a part of Microsoft One Commercial Partner Program. With that our integrate solutions with data integration and data quality are Azure gold certified and Azure ready. 
We are in co-sale agreement and we are helping jointly a lot of customers, moving data and workloads to Azure and keeping those data stores close to platforms in sync. >> Right. >> So lots of exciting things, I mean there's a lot happening with the application space. There's also lots still happening connected to the governance cases that we have seen. Feeding security and IT operations data into again modern day, next generation analytics platforms is key. Whether it's Splunk, whether it's Elastic, as part of the Hadoop Stack. So we are still focused on governance as part of this multi-cloud and on-premise the cloud implementations as well. We in fact launched our Ironstream for IBMI product to help customers, not just making this state available for mainframes but also from IBMI into Splunk, Elastic and other security information and event management platforms. And today we announced work flow optimization across on-premise and multi-cloud and cloud platforms. So lots of focus across to optimize, assure and integrate portfolio of products helping customers with the business use cases. That's really our focus as we innovate organically and also acquire technologies and solutions. What are the problems we are solving and how we can help our customers with the business and operation analytics, targeting those mega trends around data governance, cloud streaming and also data science. >> What is the biggest trend do you think that is sort of driving all of these changes? As you said, the data is evolving. The use cases are evolving. What is it that is keeping your customers up at night? >> Right now it's still governance, keeping them up at night, because this evolving architecture is also making governance more complex, right? If we are looking at financial services, banking, insurance, healthcare, there are lots of existing infrastructures, mission critical data stores on mainframe IBMI in addition to this gravity of data changing and lots of data with the online businesses generated in the cloud. So how to govern that also while optimizing and making those data stores available for next generation analytics, makes the governance quite complex. So that really keeps and creates a lot of opportunity for the community, right? All of us here to address those challenges. >> Because it sounds to me, I'm hearing Splunk, Advanced Machine did it, I think of the internet of things and sensor grids. I'm hearing IBM mainframes, that's transactional data, that's your customer data and so forth. It seems like much of this data that you're describing that customers are trying to cleanse and consolidate and provide strict governance on, is absolutely essential for them to drive more artificial intelligence into end applications and mobile devices that are being used to drive the customer experience. Do you see more of your customers using your tools to massage the data sets as it were than data scientists then use to build and train their models for deployment into edge applications. Is that an emerging area where your customers are deploying Syncsort? >> Thank you for asking that question. >> It's a complex question. (laughing) But thanks for impacting it... >> It is a complex question but it's very important question. Yes and in the previous discussions, we have seen, and this morning also, Rob Thomas from IBM mentioned it as well, that machine learning and artificial intelligence data science really relies on high-quality data, right? It's 1950s anonymous computer scientist says garbage in, garbage out. 
>> Yeah. >> When we are using artificial intelligence and machine learning, the implications, the impact of bad data multiplies. It multiplies with the training on historical data. It multiplies with the insights that we are getting out of that. So data scientists today are still spending significant time on preparing the data for the AI pipeline, the data science pipeline, and that's where we shine. Because our Integrate portfolio accesses the data from all enterprise data stores and cleanses and matches and prepares it in a trusted manner for use for advanced analytics with machine learning, artificial intelligence. >> Yeah, 'cause the magic of machine learning for predictive analytics is that you build a statistical model based on the most valid data set for the domain of interest. If the data is junk, then you're going to be building a junk model that will not be able to do its job. So, for want of a nail, the kingdom was lost. For want of a Syncsort (laughing) data cleansing and, you know, governance tool, the whole AI superstructure will fall down. >> Yes, yes absolutely. >> Yeah, good. >> Well thank you so much, Tendü, for coming on theCUBE and for giving us a lot of background and information. >> Thank you for having me, thank you. >> Good to have you. >> Always a pleasure. >> I'm Rebecca Knight for James Kobielus. We will have more from theCUBE's live coverage of DataWorks 2018 just after this. (upbeat music)

Published Date : Jun 19 2018


SENTIMENT ANALYSIS :

ENTITIES

EntityCategoryConfidence
RebeccaPERSON

0.99+

James KobielusPERSON

0.99+

JamesPERSON

0.99+

IBMORGANIZATION

0.99+

AmazonORGANIZATION

0.99+

Rebecca KnightPERSON

0.99+

MicrosoftORGANIZATION

0.99+

Tendu YogurtcuPERSON

0.99+

HortonworksORGANIZATION

0.99+

EuropeLOCATION

0.99+

Rob ThomasPERSON

0.99+

San JoseLOCATION

0.99+

U.S.LOCATION

0.99+

Silicon ValleyLOCATION

0.99+

SyncsortORGANIZATION

0.99+

1950sDATE

0.99+

San Jose, CaliforniaLOCATION

0.99+

Hortonworks'ORGANIZATION

0.99+

North AmericaLOCATION

0.99+

early JuneDATE

0.99+

DataWorksORGANIZATION

0.99+

over 7000 customersQUANTITY

0.99+

OneQUANTITY

0.98+

theCUBEORGANIZATION

0.98+

DataWorks Summit 2018EVENT

0.97+

ElasticTITLE

0.97+

oneQUANTITY

0.96+

todayDATE

0.96+

IBMITITLE

0.96+

fourQUANTITY

0.95+

SplunkTITLE

0.95+

Tendü YogurtçuPERSON

0.95+

KafkaTITLE

0.94+

this morningDATE

0.94+

Data LakeORGANIZATION

0.93+

DataWorksTITLE

0.92+

iPipelineCOMMERCIAL_ITEM

0.91+

DataWorks 2018EVENT

0.91+

SplunkPERSON

0.9+

ETLORGANIZATION

0.87+

AzureTITLE

0.85+

Google CloudORGANIZATION

0.83+

HadoopTITLE

0.82+

last three yearDATE

0.82+

couple of years backDATE

0.81+

SyncsortPERSON

0.8+

HTPTITLE

0.78+

EuropeanOTHER

0.77+

TenduPERSON

0.74+

EuropeansPERSON

0.72+

Data Protection RegulationTITLE

0.71+

KinesisTITLE

0.7+

least one clusterQUANTITY

0.7+

IronstreamCOMMERCIAL_ITEM

0.66+

ProgramTITLE

0.61+

AzureORGANIZATION

0.54+

Commercial PartnerOTHER

0.54+

DataFlowTITLE

0.54+

OneTITLE

0.54+

CTOPERSON

0.53+

3.0TITLE

0.53+

TrangeTITLE

0.53+

StackTITLE

0.51+

Arun Murthy, Hortonworks | DataWorks Summit 2018


 

>> Live from San Jose in the heart of Silicon Valley, it's theCUBE, covering DataWorks Summit 2018, brought to you by Hortonworks. >> Welcome back to theCUBE's live coverage of DataWorks here in San Jose, California. I'm your host, Rebecca Knight, along with my cohost, Jim Kobielus. We're joined by Arun Murthy. He is the co-founder and chief product officer of Hortonworks. Thank you so much for returning to theCUBE. It's great to have you on. >> Yeah, likewise. It's been a fun time getting back, yeah. >> So you were on the main stage this morning in the keynote, and you were describing the journey, the data journey, that so many customers are on right now, and you were talking about the cloud, saying that the cloud is part of the strategy but it really needs to fit into the overall business strategy. Can you describe a little bit about your approach to that? >> Absolutely, and the way we look at this is, we help customers leverage data to actually deliver better capabilities, better services, better experiences, to their customers, and that's the business we are in. Now with that, obviously we look at cloud as a really key part of it, of the overall strategy, in terms of how you want to manage data on-prem and on the cloud. We kind of joke that we ourselves live in a world of real-time data. We just live in it and data is everywhere. You might have trucks on the road, you might have drones, you might have sensors, and you have it all over the world. At this point, we've kind of got to a point where enterprises understand that they could manage all the infrastructure, but in a lot of cases it will make a lot more sense to actually lease some of it, and that's the cloud. It's the same way, if you're delivering packages, you don't go buy planes and lay out roads, you go to FedEx and actually let them handle that for you. That's kind of what the cloud is. So that is why we really fundamentally believe that we have to help customers leverage infrastructure wherever it makes sense pragmatically, both from an architectural standpoint and from a financial standpoint, and that's kind of why we talked about how your cloud strategy is part of your data strategy, which is actually fundamentally part of your business strategy. >> So how are you helping customers to leverage this? What is on their minds and what's your response? >> Yeah, it's really interesting. Like I said, cloud is cloud, and infrastructure management is certainly something that's at the foremost, at the top of the mind for every CIO today. And what we've consistently heard is they need a way to manage all this data and all this infrastructure in a hybrid, multi-tenant, multi-cloud fashion. Because in some geos you might not have your favorite cloud vendor. You know, parts of Asia are a great example. You might have to use one of the Chinese clouds. You go to parts of Europe, especially with things like the GDPR, the data residency laws and so on, you have to be very, very cognizant of where your data gets stored and where your infrastructure is present. And that is why we fundamentally believe it's really important to give enterprises a fabric with which they can manage all of this, and hide the details of all of the underlying infrastructure from them as much as possible. >> And that's DataPlane Services. >> And that's DataPlane Services, exactly. >> The Hortonworks DataPlane Services we launched in October of last year. Actually I was on theCUBE talking about it back then too.
We see a lot of interest, a lot of excitement around it, because now they understand that, again, this doesn't mean that we drive it down to the least common denominator. It is about helping enterprises leverage the key differentiators of each of the cloud vendors' products. For example, Google, with which we announced a partnership, they are really strong on AI and ML. So if you are running TensorFlow and you want to deal with things like Kubernetes, GKE is a great place to do it. And, for example, you can now go to Google Cloud and get TPUs, which work great for TensorFlow. Similarly, a lot of customers run on Amazon for a bunch of the operational stuff, Redshift as an example. So in the world we live in, we want to help the CIO leverage the best pieces of the cloud, but then give them a consistent way to manage and govern that data. We were joking on stage that IT has just about learned how to deal with Kerberos and Hadoop, and now we're telling them, "Oh, go figure out IAM on Google," which is also IAM on Amazon, but they are completely different. The only thing that's consistent is the name. So I think we have a unique opportunity, especially with the open source technologies like Atlas, Ranger, Knox and so on, to be able to draw a consistent fabric over this for security and governance, and help the enterprise leverage the best parts of the cloud to put a best fit architecture together, which also happens to be a best of breed architecture. >> So the fabric is everything you're describing: all the Apache open source projects in which Hortonworks is a primary committer and contributor are able to share schemas and policies and metadata and so forth across this distributed, heterogeneous fabric of public and private cloud segments within a distributed environment. >> Exactly. >> That's increasingly being containerized, in terms of the applications, for deployment to edge nodes. Containerization is a big theme in HDP 3.0, which you announced at this show. >> Yeah. >> So, if you could give us a quick sense for how that containerization capability plays into more of an edge focus for what your customers are doing. >> Exactly, great point, and again, the core parts of the fabric are obviously the open source projects, but we've also done a lot of net new innovation with DataPlane which, by the way, is also open source. It's a new product and a new platform that you can actually leverage, to lay it out over the open source ones you're familiar with. And again, like you said, containerization is what is actually driving the fundamentals of this. The details matter, the scale at which we operate, we're talking about thousands of nodes, terabytes of data. The details really matter, because a 5% improvement at that scale leads to millions of dollars in optimization for capex and opex. So that's why all of that, the details, are being fueled and driven by the community, which is kind of what we've delivered with HDP 3. And the key ones, like you said, are containerization, because now we can actually get complete agility in terms of how you deploy the applications. You get isolation not only at the resource management level with containers, but you also get it at the software level, which means, if two data scientists wanted to use a different version of Python or Scala or Spark or whatever it is, they get that consistently and holistically. Now they can actually go from the test-dev cycle into production in a completely consistent manner.
So that's why containers are so big, because now we can actually leverage it across the stack, and then things like MiNiFi are showing up. We can actually-- >> Define MiNiFi before you go further. What is MiNiFi for our listeners? >> Great question. Yeah, so we've always had NiFi-- >> Real-time >> Real-time data flow management, and NiFi was still sort of within the data center. What MiNiFi does is, it's actually now a really, really small layer, a small thin library if you will, that you can throw on a phone, a doorbell, a sensor, and that gives you all the capabilities of NiFi but at the edge. >> Mmm. >> Right? And it's actually not just data flow, but what is really cool about NiFi is it's actually command and control. So you can actually do bidirectional command and control, so you can actually change in real-time the flows you want, the processing you do, and so on. So what we're trying to do with MiNiFi is actually not just collect data from the edge but also push the processing as much as possible to the edge, because we really do believe a lot more processing is going to happen at the edge, especially with the ASICs and so on coming out. There will be custom hardware that you can deploy and essentially leverage at the edge to actually do this processing. And we believe, you know, we want to do that even at the cost of data not actually landing up at rest, because at the end of the day we're in the insights business, not in the data storage business. >> Well I want to get back to that. You were talking about innovation and how so much of it is driven by the open source community, and you're a veteran of the big data open source community. How do we maintain that? How does that continue to be the fuel? >> Yeah, and a lot of it starts with just being consistent. From day one, James was around back then, in 2011 we started, we've always said, "We're going to be open source," because we fundamentally believed that the community is going to out-innovate any one vendor, regardless of how much money they have in the bank. So we really do believe that's the best way to innovate, mostly because there is a sense of shared ownership of that product. It's not just one vendor throwing some code out there, trying to shove it down the customers' throats. And we've seen this over and over again, right. Three years ago, a lot of what we talk about, the DataPlane stuff, comes from Atlas and Ranger and so on. None of these existed. These actually came from the fruits of the collaboration with the community, with actually some very large enterprises being a part of it. So it's a great example of how we continue to drive it, because we fundamentally believe that that's the best way to innovate, and we continue to believe so. >> Right. And the community, the Apache community as a whole, has so many different projects; for example, in streaming, there is Kafka, >> Okay. >> and there are others that address a core set of common requirements but in different ways, >> Exactly. >> supporting different approaches, for example, doing streaming with stateless transactions, or stateless semantics, and so forth. Seems to me that Hortonworks is shifting towards being more of a streaming oriented vendor, away from data at rest. Though, I should say, HDP 3.0 has got great scalability and storage efficiency capabilities baked in.
I wonder if you could just break it down a little bit, what the innovations or enhancements are in HDP 3.0 for those of your core customers, which is most of them, who are managing massive multi-terabyte, multi-petabyte distributed, federated, big data lakes. What's in HDP 3.0 for them? >> Oh, lots. Again, like I said, we obviously spend a lot of time on the streaming side, because that's where we see things going. We live in a real-time world. But again, we don't do it at the cost of our core business, which continues to be HDP. And as you can see, the community trend is driving it; we talked about containerization, a massive step up for the Hadoop community. We've also added support for GPUs. Again, if you think about true at-scale machine learning. >> Graphics processing units, >> Graphical-- >> AI, deep learning >> Yeah, it's huge. Deep learning, TensorFlow and so on really, really need custom, sort of, GPUs, if you will. So that's coming. That's in HDP 3. We've added a whole bunch of scalability improvements with HDFS. We've added federation, because now you can go over a billion files, a billion objects, in HDFS. We also added capabilities for-- >> But you indicated yesterday when we were talking that very few of your customers need that capacity yet, but you think they will, so-- >> Oh for sure. Again, part of this is, as we enable more sources of data in real-time, that's the fuel which drives it, and that was always the strategy behind the HDF product. It was about, can we leverage the synergies between the real-time world, feed that into what you do today in your classic enterprise with data at rest, and that is what is driving the necessity for scale. >> Yes. >> Right. We've done that. We put in a lot of work, again, lowering the total cost of ownership, the TCO, so we added erasure coding. >> What is that exactly? >> Yeah, so erasure coding is a classic sort of storage concept. You know, HDFS has always had three replicas, for redundancy, fault tolerance and recovery. Now, it sounds okay having three replicas because it's cheap disk, right. But when you start to think about our customers running 70, 80 petabytes of data, those three replicas add up, because you've now gone from 80 petabytes of effective data to actually a quarter of an exabyte in terms of raw storage. So now what we can do with erasure coding is, instead of storing the three blocks, we actually store parity. We store the encoding of it, which means we can actually go down from three to, like, two, or one and a half, whatever we want to do. So if we can get from three blocks to one and a half, especially for your cold data, the data you're not accessing every day, it results in massive savings in terms of your infrastructure costs. And that's kind of what we're in the business of doing, helping customers do better with the data they have, whether it's on-prem or on the cloud. We want to help customers be comfortable getting more data under management, along with security and the lower TCO. The other sort of big piece I'm really excited about in HDP 3 is all the work that's happened in the Hive community for what we call the real-time database. >> Yes. >> As you guys know, you've followed the whole SQL wars in the Hadoop space. >> And Hive has changed a lot in the last several years, this is very different from what it was five years ago.
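
(Before the Hive discussion continues, the erasure-coding arithmetic above is worth pinning down. RS(6,3) — six data blocks plus three parity blocks — is the Reed-Solomon policy HDFS 3 ships with; the 80-petabyte figure is simply the example from the conversation.)

```python
# The storage math behind replication vs. erasure coding, as a sketch.
def raw_storage(effective_pb, scheme):
    if scheme == "3x-replication":
        overhead = 3.0           # three full copies of every block
    elif scheme == "RS(6,3)":
        overhead = (6 + 3) / 6   # 6 data blocks + 3 parity blocks = 1.5x
    else:
        raise ValueError(scheme)
    return effective_pb * overhead

for scheme in ("3x-replication", "RS(6,3)"):
    print(scheme, raw_storage(80, scheme), "PB raw")
# 3x-replication -> 240.0 PB raw (~a quarter of an exabyte)
# RS(6,3)        -> 120.0 PB raw, half the footprint
```
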
>> The only thing that's the same from five years ago is the name. (laughing) So again, the community has done a phenomenal job, really taking sort of, we used to call it like a SQL engine on HDFS. From there, to drive it with 3.0, it's now, with Hive 3, which is part of HDP 3, a full-fledged database. It's got full ACID support. In fact, the ACID support is so good that writing ACID tables is at least as fast as writing non-ACID tables now. And you can do that not only on-- >> Transactional database. >> Exactly. Now not only can you do it on prem, you can do it on S3. So you can actually drive the transactions through Hive on S3. We've done a lot of work to actually, you were there yesterday when we were talking about some of the performance work we've done with LLAP and so on, to actually give consistent performance both on-prem and in the cloud, and this is a lot of effort, simply because the performance characteristics you get from the storage layer with HDFS versus S3 are significantly different. So now we have been able to bridge those with things like LLAP. We've done a lot of work and sort of enhanced the security model around it, governance and security. So now you get things like column-level masking, row-level filtering, all the standard stuff that you would expect and more from an enterprise warehouse. We talked to a lot of our customers, they're doing literally tens of thousands of views, because they don't have the capabilities that exist in Hive now. >> Mmm-hmm. And I'm sitting here kind of being amazed that for an open source set of tools to have the best security and governance at this point is pretty amazing, coming from where we started off. >> And it's absolutely essential for GDPR compliance, and compliance with HIPAA and every other mandate and sensitivity that requires you to protect personally identifiable information, so very important. So in many ways Hortonworks has one of the premier big data catalogs for all manner of compliance requirements that your customers are chasing. >> Yeah, and James, you wrote about it in the context of Data Steward Studio, which we introduced >> Yes. >> You know, things like consent management, having--- >> A consent portal >> A consent portal >> In which the customer can indicate the degree to which >> Exactly. >> they require controls over their management of their PII, possibly to be forgotten, and so forth. >> Yeah, the right to be forgotten, and it's consent even for analytics. Within the context of GDPR, you have to allow the customer to opt out of analytics, of them being part of an analytic itself, right. >> Yeah. >> So things like those are now something we enable through the enhanced security models that are done in Ranger. So now, the really cool part of what we've done with GDPR is that we can get all these capabilities on existing data and existing applications by just adding a security policy, not rewriting them. It's a massive, massive, massive deal, which I cannot tell you how much customers are excited about, because they now understand. They were sort of freaking out that "I have to go to 30, 40, 50 thousand enterprise apps and change them to take advantage, to actually provide consent, and the right to be forgotten." The fact that you can do that now by changing a security policy with Ranger is huge for them. >> Arun, thank you so much for coming on theCUBE. It's always so much fun talking to you. >> Likewise. Thank you so much. >> I learned something every time I listen to you. >> Indeed, indeed.
I'm Rebecca Knight for James Kobielus, we will have more from theCUBE's live coverage of DataWorks just after this. (Techno music)
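
(A sketch of the Hive 3 "full-fledged database" behavior Murthy describes — ACID tables over ORC, including UPDATE and DELETE. It assumes a reachable HiveServer2 endpoint; the host and table names are illustrative, and in Hive 3 managed ORC tables are transactional by default, so the property is shown only for emphasis.)

```python
# A minimal sketch of Hive 3 ACID tables driven from Python.
from pyhive import hive  # pip install pyhive

conn = hive.Connection(host="hiveserver2.example.com", port=10000)
cur = conn.cursor()

cur.execute("""
    CREATE TABLE orders (id INT, status STRING)
    STORED AS ORC
    TBLPROPERTIES ('transactional' = 'true')
""")
cur.execute("INSERT INTO orders VALUES (1, 'new'), (2, 'new')")
cur.execute("UPDATE orders SET status = 'shipped' WHERE id = 1")  # ACID-only
cur.execute("DELETE FROM orders WHERE id = 2")                    # ACID-only
```
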

Published Date : Jun 19 2018


SENTIMENT ANALYSIS :

ENTITIES

EntityCategoryConfidence
Jim KobielusPERSON

0.99+

Rebecca KnightPERSON

0.99+

JamesPERSON

0.99+

Aaron MurphyPERSON

0.99+

Arun MurphyPERSON

0.99+

ArunPERSON

0.99+

2011DATE

0.99+

GoogleORGANIZATION

0.99+

5%QUANTITY

0.99+

80 terabytesQUANTITY

0.99+

FedExORGANIZATION

0.99+

twoQUANTITY

0.99+

Silicon ValleyLOCATION

0.99+

HortonworksORGANIZATION

0.99+

San JoseLOCATION

0.99+

AmazonORGANIZATION

0.99+

Arun MurthyPERSON

0.99+

HortonWorksORGANIZATION

0.99+

yesterdayDATE

0.99+

San Jose, CaliforniaLOCATION

0.99+

three replicasQUANTITY

0.99+

James KobeilusPERSON

0.99+

three blocksQUANTITY

0.99+

GDPRTITLE

0.99+

PythonTITLE

0.99+

EuropeLOCATION

0.99+

millions of dollarsQUANTITY

0.99+

ScalaTITLE

0.99+

SparkTITLE

0.99+

theCUBEORGANIZATION

0.99+

five years agoDATE

0.99+

one and a halfQUANTITY

0.98+

EnpriseORGANIZATION

0.98+

threeQUANTITY

0.98+

Hive 3TITLE

0.98+

Three years agoDATE

0.98+

bothQUANTITY

0.98+

AsiaLOCATION

0.97+

50 thousandQUANTITY

0.97+

TCOORGANIZATION

0.97+

MiNiFiTITLE

0.97+

ApacheORGANIZATION

0.97+

40QUANTITY

0.97+

AltasORGANIZATION

0.97+

Hortonworks DataPlane ServicesORGANIZATION

0.96+

DataWorks Summit 2018EVENT

0.96+

30QUANTITY

0.95+

thousands of nodesQUANTITY

0.95+

A6COMMERCIAL_ITEM

0.95+

KerberosORGANIZATION

0.95+

todayDATE

0.95+

KnoxORGANIZATION

0.94+

oneQUANTITY

0.94+

hiveTITLE

0.94+

two data scientistsQUANTITY

0.94+

eachQUANTITY

0.92+

ChineseOTHER

0.92+

TensorFlowTITLE

0.92+

S3TITLE

0.91+

October of last yearDATE

0.91+

RangerORGANIZATION

0.91+

HadoobORGANIZATION

0.91+

HIPATITLE

0.9+

CUBEORGANIZATION

0.9+

tens of thousandsQUANTITY

0.9+

one vendorQUANTITY

0.89+

last several yearsDATE

0.88+

a billion objectsQUANTITY

0.86+

70, 80 hundred terabytes of dataQUANTITY

0.86+

HTP3.0TITLE

0.86+

two 1/4 of an exobyteQUANTITY

0.86+

Atlas andORGANIZATION

0.85+

DataPlane ServicesORGANIZATION

0.84+

Google CloudTITLE

0.82+

Piotr Mierzejewski, IBM | Dataworks Summit EU 2018


 

>> Announcer: From Berlin, Germany, it's theCUBE, covering Dataworks Summit Europe 2018, brought to you by Hortonworks. (upbeat music) >> Well hello, I'm James Kobielus, and welcome to theCUBE. We are here at Dataworks Summit 2018, in Berlin, Germany. It's a great event, Hortonworks is the host, they made some great announcements. They've had partners doing the keynotes and the sessions, breakouts, and IBM is one of their big partners. Speaking of IBM, from IBM we have a program manager, Piotr, I'll get this right, Piotr Mierzejewski. Your focus is on data science, machine learning, and Data Science Experience, which is one of the IBM products for working data scientists to build and to train models in team data science enterprise operational environments, so Piotr, welcome to theCUBE. I don't think we've had you before. >> Thank you. >> You're a program manager. I'd like you to discuss what you do for IBM, I'd like you to discuss Data Science Experience. I know that Hortonworks is a reseller of Data Science Experience, so I'd like you to discuss the partnership going forward, and how you and Hortonworks are serving your customers, data scientists and others in those teams who are building and training and deploying machine learning and deep learning, AI, into operational applications. So Piotr, I give it to you now. >> Thank you. Thank you for inviting me here, very excited. This is a very loaded question, and I would like to begin, before I get actually to why the partnership makes sense, I would like to begin with two things. First, there is no machine learning without data. And second, machine learning is not easy. Especially, especially-- >> James: I never said it was! (Piotr laughs) >> Well there is this kind of perception, like you can have a data scientist working on their Mac, working on some machine learning algorithms, and they can create a recommendation engine in, let's say, two, three days' time. This is because of the explosion of open source in that space. You have thousands of libraries, from Python, from R, from Scala, you have access to Spark. All these various open source offerings are enabling data scientists to actually do this wonderful work. However, when you start talking about bringing machine learning to the enterprise, this is not an easy thing to do. You have to think about governance, resiliency, the data access, actual model deployments, which are not trivial, when you have to expose this in a uniform fashion to various business units. Now, all this has to actually work in private cloud and public cloud environments, on a variety of hardware, a variety of different operating systems. Now that is not trivial. (laughs) Now, when a data scientist is going to deploy a model, he needs to be able to actually explain how the model was created. He has to be able to explain what data was used. He needs to ensure-- >> Explicable AI, or explicable machine learning, yeah, that's a hot focus of concern for enterprises everywhere, especially in a world where governance and tracking and lineage, GDPR and so forth, are so hot. >> Yes, you've mentioned all the right things. Now, so given those two things, there's no ML without data, and ML is not easy, why does the partnership between Hortonworks and IBM make sense? Well, you're looking at the number one industry leading big data platform from Hortonworks.
Then, you look at DSX Local, which, I'm proud to say, I've been there since the first line of code, and I'm feeling very passionate about the product, is the merger between the two; the ability to integrate them tightly together gives your data scientists secure access to data, ability to leverage the Spark that runs inside a Hortonworks cluster, ability to actually work in a platform like DSX that doesn't limit you to just one kind of technology but allows you to work with multiple technologies, ability to actually work on not only-- >> When you say technologies here, you're referring to frameworks like TensorFlow, and-- >> Precisely. Very good, now that part I'm going to get into very shortly, (laughs) so please don't steal my thunder. >> James: Okay. >> Now, what I was saying is that not only are DSX and Hortonworks integrated to the point that you can actually manage your Hadoop clusters, Hadoop environments within DSX, you can actually work on your Python models and your analytics within DSX and then push it remotely to be executed where your data is. Now, why is this important? If you work with data that's megabytes, gigabytes, maybe you know you can pull it in, but truly what you want to do when you move to the terabytes and the petabytes of data, what happens is that you actually have to push the analytics to where your data resides, and leverage, for example, YARN, a resource manager, to distribute your workloads and actually train your models on your actual HDP cluster. That's one of the huge value propositions. Now, mind you, this is all done in a secure fashion, with the ability to actually install DSX on the edge nodes of the HDP clusters. >> James: Hmm... >> As of HDP 2.6.4, DSX has been certified to actually work with HDP. Now, this partnership embarked, we embarked on this partnership about 10 months ago. Now, it often happens that there are announcements, but there is not much materializing after such announcements. This is not true in the case of DSX and HDP. We have had, just recently we have had a release of DSX 1.2 which I'm super excited about. Now, let's talk about those open-source toolings in the various platforms. Now, you don't want to force your data scientists to actually work with just one environment. Some of them might prefer to work on Spark, some of them like their RStudio, they're statisticians, they like R, others like Python, with Zeppelin, say Jupyter Notebooks. Now, how about TensorFlow? What are you going to do when actually, you know, you have to do the deep learning workloads, when you want to use neural nets? Well, DSX does support the ability to actually bring in GPU nodes and do the TensorFlow training. As a sidecar approach, you can append the node, you can scale the platform horizontally and vertically, and train your deep learning workloads, and actually remove the sidecar out. So you can add it to the cluster and remove it at will. Now, DSX also actually not only satisfies the needs of your programmer data scientists, that actually code in Python and Scala or R, but actually allows your business analysts to work and create models in a visual fashion.
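To make Piotr's "push the analytics to where your data resides" point concrete, here is a minimal sketch of a PySpark training job running under YARN on an HDP-style cluster. This is illustrative only, not IBM or Hortonworks reference code: the HDFS paths, column names, and label are invented assumptions, and a real DSX deployment would handle the remote submission on your behalf.

```python
# A minimal sketch of training where the data lives: a PySpark job that
# runs under YARN so the work is distributed across the cluster and the
# terabytes never leave HDFS. Paths and columns below are hypothetical.
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import LogisticRegression

spark = (SparkSession.builder
         .appName("train-where-the-data-is")
         .master("yarn")           # hand scheduling to the resource manager
         .getOrCreate())

df = spark.read.parquet("hdfs:///data/curated/transactions")  # assumed path

# Assemble assumed numeric columns into the feature vector Spark ML expects.
features = VectorAssembler(
    inputCols=["amount", "frequency", "tenure_days"],
    outputCol="features").transform(df)

model = LogisticRegression(labelCol="churned",
                           featuresCol="features").fit(features)
model.save("hdfs:///models/churn_lr")  # persist the model back to HDFS
spark.stop()
```

The design point is the `master("yarn")` setting: the resource manager fans the training out across the nodes holding the data, rather than pulling the data to the notebook.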
As of DSX 1.2, you can actually, we have embedded, integrated, an SPSS modeler, redesigned, rebranded; this is an amazing technology from IBM that's been around for a while, very well established, but now with the new interface, embedded inside the DSX platform, it allows your business analysts to actually train and create the model in a visual fashion and, what is beautiful-- >> Business analysts, not traditional data scientists. >> Not traditional data scientists. >> That sounds equivalent to how IBM, a few years back, was able to bring more of a visual experience to SPSS proper to enable the business analysts of the world to build and do data-mining and so forth with structured data. Go ahead, I don't want to steal your thunder here. >> No, no, precisely. (laughs) >> But I see it's the same phenomenon, you bring the same capability to greatly expand the range of data professionals who can do, in this case, do machine learning hopefully as well as professional, dedicated data scientists. >> Certainly, now what we have to also understand is that data science is actually a team sport. It involves various stakeholders from the organization. From executives, that actually give you the business use case, to your data engineers that actually understand where your data is and can grant the access-- >> James: They manage the Hadoop clusters, many of them, yeah. >> Precisely. So they manage the Hadoop clusters, they actually manage your relational databases, because we have to realize that not all the data is in the data lakes yet; you have legacy systems, which DSX allows you to actually connect to and integrate to get data from. It also allows you to actually consume data from streaming sources, so if you actually have a Kafka message bus and actually were streaming data from your applications or IoT devices, you can actually integrate all those various data sources and federate them within DSX to use for training machine learning models. Now, this is all around predictive analytics. But what if I tell you that right now with DSX you can actually do prescriptive analytics as well? With the 1.2, again I'm going to be coming back to this 1.2 DSX, with the most recent release we have actually added decision optimization, an industry-leading solution from IBM-- >> Prescriptive analytics, gotcha-- >> Yes, for prescriptive analysis. So now if you have warehouses, or you have a fleet of trucks, or you want to optimize the flow in, let's say, a utility company, whether it be for power or, let's say, for water, you can actually create and train prescriptive models within DSX and deploy them the same fashion as you will deploy and manage your SPSS streams as well as the machine learning models from Spark, from Python, so with XGBoost, TensorFlow, Keras, all those various aspects. >> James: Mmmhmm. >> Now what's going to get really exciting in the next two months, DSX will actually bring in natural language processing and text analysis and sentiment analysis via WEX. So Watson Explorer, it's another offering from IBM... >> James: It's called, what is the name of it? >> Watson Explorer. >> Oh Watson Explorer, yes. >> Watson Explorer, yes. >> So now you're going to have this collaborative platform, extendable! An extendable collaborative platform that can actually install and run in your data centers without the need to access the internet. That's actually critical. Yes, we can deploy on AWS. Yes, we can deploy on Azure.
On Google Cloud, definitely we can deploy in SoftLayer and we're very good at that, however in the majority of cases we find that the customers have challenges bringing the data out to the cloud environments. Hence, with DSX, we designed it to actually deploy and run and scale everywhere. Now, how have we done it? We've embraced open source. This was a huge shift within IBM to realize that yes, we do have 350,000 employees, yes, we could develop container technologies, but why? Why not embrace what are actually industry standards, with Docker and Kubernetes as they became industry standards? Bring in RStudio, the Jupyter, the Zeppelin notebooks, bring in the ability for a data scientist to choose the environments they want to work with and actually extend them and make the deployments of web services, applications, the models, and those are actually full releases; I'm not only talking about the model, I'm talking about the scripts that can go with that, the ability to actually pull the data in and allow the models to be re-trained, evaluated, and actually re-deployed without taking them down. Now that's what actually becomes, that's what is the true differentiator when it comes to DSX, and all done in either your public or private cloud environments. >> So that's coming in the next version of DSX? >> Outside of DSX-- >> James: We're almost out of time, so-- >> Oh, I'm so sorry! >> No, no, no. It's my job as the host to let you know that. >> Of course. (laughs) >> So if you could summarize where DSX is going in 30 seconds or less as a product, the next version is, what is it? >> It's going to be the 1.2.1. >> James: Okay. >> 1.2.1, and we're expecting to release at the end of June. What's going to be unique in the 1.2.1 is infusing the text and sentiment analysis, so natural language processing, with predictive and prescriptive analysis for both developers and your business analysts. >> James: Yes. >> So essentially a platform not only for your data scientists but pretty much every single persona inside the organization. >> Including your marketing professionals who are baking sentiment analysis into what they do. Thank you very much. This has been Piotr Mierzejewski of IBM. He's a Program Manager for DSX and for ML, AI, and data science solutions, and of course a strong partnership is with Hortonworks. We're here at Dataworks Summit in Berlin. We've had two excellent days of conversations with industry experts including Piotr. We want to thank everyone, we want to thank the host of this event, Hortonworks, for having us here. We want to thank all of our guests, all these experts, for sharing their time out of their busy schedules. We want to thank everybody at this event for all the fascinating conversations; the breakouts have been great, the whole buzz here is exciting. GDPR's coming down and everybody's gearing up and getting ready for that, but everybody's also focused on innovative and disruptive uses of AI and machine learning in business, and using tools like DSX. I'm James Kobielus for the entire CUBE team, SiliconANGLE Media, wishing you all, wherever you are, whenever you watch this, have a good day and thank you for watching theCUBE. (upbeat music)
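Earlier in the conversation Piotr mentions decision optimization for prescriptive analytics, for example optimizing a fleet of trucks. As a hedged illustration of what such a model looks like in code, here is a toy allocation problem using docplex, IBM's Decision Optimization Python client. The capacities, costs, and demand are made-up numbers, and whether a given DSX installation exposes docplex exactly this way is an assumption.

```python
# A toy prescriptive-analytics sketch: choose how many small and large
# trucks to dispatch so demand is met at minimum cost. Figures are invented.
from docplex.mp.model import Model

m = Model(name="fleet_allocation")

small = m.integer_var(lb=0, name="small_trucks")  # assume 10-ton capacity
large = m.integer_var(lb=0, name="large_trucks")  # assume 25-ton capacity

# Must move at least 180 tons in total.
m.add_constraint(10 * small + 25 * large >= 180, "meet_demand")

# Assumed per-truck operating costs: 400 for small, 750 for large.
m.minimize(400 * small + 750 * large)

solution = m.solve()
if solution:
    print(f"small={small.solution_value}, large={large.solution_value}, "
          f"cost={m.objective_value}")
```

The same "describe constraints, let the solver prescribe the action" shape applies whether the resource is trucks, warehouse slots, or water flow in a utility network.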

Published Date : Apr 19 2018

SUMMARY :

In this segment from Dataworks Summit EU 2018 in Berlin, James Kobielus of theCUBE interviews Piotr Mierzejewski, program manager at IBM for Data Science Experience (DSX). Piotr argues that there is no machine learning without data and that enterprise machine learning demands governance, resiliency, and secure data access. He describes the Hortonworks partnership: DSX Local is certified against HDP, can push Python and Spark analytics to where the data resides via YARN, and supports open-source tooling such as RStudio, Jupyter, Zeppelin, and TensorFlow, along with GPU nodes for deep learning. DSX 1.2 adds an embedded SPSS modeler for business analysts and decision optimization for prescriptive analytics, while the 1.2.1 release, expected at the end of June, will add natural language processing and sentiment analysis via Watson Explorer.

SENTIMENT ANALYSIS :

ENTITIES

Entity | Category | Confidence
Piotr Mierzejewski | PERSON | 0.99+
James Kobielus | PERSON | 0.99+
James | PERSON | 0.99+
IBM | ORGANIZATION | 0.99+
Piotr | PERSON | 0.99+
Hortonworks | ORGANIZATION | 0.99+
30 seconds | QUANTITY | 0.99+
Berlin | LOCATION | 0.99+
AWS | ORGANIZATION | 0.99+
Python | TITLE | 0.99+
Spark | TITLE | 0.99+
two | QUANTITY | 0.99+
First | QUANTITY | 0.99+
Scala | TITLE | 0.99+
Berlin, Germany | LOCATION | 0.99+
350,000 employees | QUANTITY | 0.99+
DSX | ORGANIZATION | 0.99+
Mac | COMMERCIAL_ITEM | 0.99+
two things | QUANTITY | 0.99+
RStudio | TITLE | 0.99+
DSX | TITLE | 0.99+
DSX 1.2 | TITLE | 0.98+
both developers | QUANTITY | 0.98+
second | QUANTITY | 0.98+
GDPR | TITLE | 0.98+
Watson Explorer | TITLE | 0.98+
Dataworks Summit 2018 | EVENT | 0.98+
first line | QUANTITY | 0.98+
Dataworks Summit Europe 2018 | EVENT | 0.98+
SiliconANGLE Media | ORGANIZATION | 0.97+
end of June | DATE | 0.97+
TensorFlow | TITLE | 0.97+
thousands of libraries | QUANTITY | 0.96+
R | TITLE | 0.96+
Jupyter | ORGANIZATION | 0.96+
1.2.1 | OTHER | 0.96+
two excellent days | QUANTITY | 0.95+
Dataworks Summit | EVENT | 0.94+
Dataworks Summit EU 2018 | EVENT | 0.94+
SPSS | TITLE | 0.94+
one | QUANTITY | 0.94+
Azure | TITLE | 0.92+
one kind | QUANTITY | 0.92+
theCUBE | ORGANIZATION | 0.92+
HDP | ORGANIZATION | 0.91+

Mandy Chessell, IBM | Dataworks Summit EU 2018



>> Announcer: From Berlin, Germany, it's the Cube covering Dataworks Summit Europe 2018. Brought to you by Hortonworks. (electronic music) >> Well hello, welcome to the Cube, I'm James Kobielus. I'm the lead analyst for big data analytics within the Wikibon team of SiliconANGLE Media. I'm hosting the Cube this week at Dataworks Summit 2018 in Berlin, Germany. It's been an excellent event. Hortonworks, the host, had... We've completed two days of keynotes. They made an announcement of the Data Steward Studio as the latest of their offerings and demonstrated it this morning, to address GDPR compliance, which of course is hot and heavy, coming down on enterprises both in the EU and around the world including in the U.S., and the May 25th deadline is fast approaching. One of Hortonworks' prime partners is IBM. And today on this Cube segment we have Mandy Chessell. Mandy is a distinguished engineer at IBM who did an excellent keynote yesterday all about metadata and metadata management. Mandy, great to have you. >> Hi and thank you. >> So I wonder if you can just reprise or summarize the main takeaways from your keynote yesterday on metadata and its role in GDPR compliance and so forth, and the broader strategies that enterprise customers have regarding managing their data in this new multi-cloud world where Hadoop and open source platforms are critically important for storing and processing data. So Mandy go ahead. >> So, metadata's not new. I mean it's basically information about data. And a lot of companies are trying to build a data catalog, which is not a catalog, you know, actually containing their data, it's a catalog that describes their data. >> James: Is it different from an index or a glossary? How's the catalog different from-- >> Yeah, so catalog actually includes both. So it is a list of all the data sets plus links to glossary definitions of what those data items mean within the data sets, plus information about the lineage of the data. It includes information about who's using it, what they're using it for, how it should be governed. >> James: It's like a governance repository. >> So governance is part of it. So the governance part is really saying, "This is how you're allowed to use it, "this is how the data's classified," "these are the automated actions that are going to happen "on the data as it's used "within the operational environment." >> James: Yeah. >> So there's that aspect to it, but there is the collaboration side. Hey, I've been using this data set, it's great. Or, actually this data set is full of errors, we can't use it. So you've got feedback to data set owners as well as exchange and collaboration between data scientists working with the data. So it's really, it is a central resource for an organization that has a strong data strategy, is interested in becoming a data-driven organization as such, so, you know, this becomes their major catalog over their data assets, and how they're using it. So when a regulator comes in and says, "can you show up, show me that you're "managing personal data?" The data catalog will have the information about where personal data's located, what type of infrastructure it's sitting on, how it's being used by different services. So they can really show that they know what they're doing and then from that they can show how processes are using the metadata in order to use the data appropriately day to day.
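Mandy's point about showing a regulator where personal data is located is the kind of question a metadata catalog can answer programmatically. As a rough sketch, assuming an Apache Atlas server (discussed next in the interview) with entities carrying a hypothetical "PII" classification, a call against Atlas's v2 REST basic-search API might look like this; the host, credentials, and classification name all depend entirely on the deployment.

```python
# A small sketch of asking Apache Atlas "which Hive tables are tagged PII?"
# Endpoint and credentials below are placeholders, not real defaults to
# rely on in production.
import requests

ATLAS = "http://atlas-host:21000"   # hypothetical Atlas endpoint
AUTH = ("admin", "admin")           # placeholder credentials

resp = requests.get(
    f"{ATLAS}/api/atlas/v2/search/basic",
    params={"typeName": "hive_table", "classification": "PII"},
    auth=AUTH,
)
resp.raise_for_status()

# Each hit is an entity header with a GUID plus display attributes.
for entity in resp.json().get("entities", []):
    print(entity["guid"], entity["attributes"].get("qualifiedName"))
```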
>> So Apache Atlas, so it's basically a catalog, if I understand correctly, at least for IBM and Hortonworks, it's Hadoop, it's Apache Atlas, and Apache Atlas is essentially a metadata open source code base. >> Mandy: Yes, yes. >> So explain what Atlas is in this context. >> So yes, Atlas is a collection of code, but it supports a server, a graph-based metadata server. It also supports-- >> James: A graph-based >> Both: Metadata server >> Yes >> James: I'm sorry, so explain what you mean by graph-based in this context. >> Okay, so it runs using the JanusGraph graph repository. And this is very good for metadata 'cause if you think about what it is, it's connecting dots. It's basically saying this data set means this value and needs to be classified in this way and this-- >> James: Like a semantic knowledge graph >> It is, yes actually. And on top of it we impose a type system that describes the different types of things you need to control and manage in a data catalog, but the graph, the Atlas component, gives you that graph-based repository underneath, but on top we've built what we call the open metadata and governance libraries. They run inside Atlas so when you run Atlas you will have all the open metadata interfaces, but you can also take those libraries and connect them and load them actually into another vendor's product. And what they're doing is allowing metadata to be exchanged between repositories of different types. And this becomes incredibly important as an organization increases their maturity and their use of data because you can't just have knowledge about data in a single server, it just doesn't scale. You need to get that knowledge into every runtime environment, into the data tools that people are using across the organization. And so it needs to be distributed. >> Mandy I'm wondering, the whole notion of what you catalog in that repository, does it include, or does Apache Atlas support adding metadata relevant to data derivative assets like machine learning models-- >> Mandy: Absolutely. >> So forth. >> Mandy: Absolutely, so we have base types in the upper metadata layer, but also it's a very flexible and extensible type system. So, if you've got a specialist machine learning model that needs additional information stored about it, that can easily be added to the runtime environment. And then it will be managed through the open metadata protocols as if it was part of the native type system. >> Because, of course, as an analyst, one of my core areas is artificial intelligence, and one of the hot themes in artificial, well, there's a broad umbrella called AI safety. >> Mandy: Yeah. >> And one of the core subsets of that is something called explicable AI, being able to identify the lineage of a given algorithmic decision back to which machine learning models were fed from what data. >> Mandy: Yeah. >> Through what action, like when, let's say, a self-driving vehicle hits a human being, for legal, you know, discovery, whatever. So what I'm getting at, what I'm working through to is the extent to which the Hortonworks, IBM big data catalog running Atlas can be a foundation for explicable AI either now or in the future. We see a lot of enterprise, me as an analyst at least, sees lots of enterprises that are exploring this topic, but it's not to the point where it's in production, explicable AI, but where clearly companies like IBM are exploring building a stack or an architecture for doing this kind of thing in a standardized way. What are your thoughts there?
Is IBM working on bringing, say, Atlas and the overall big data catalog into that kind of a use case? >> Yes, yeah, so if you think about what's required, you need to understand the data that was used to train the AI, what data's been fed to it since it was deployed because that's going to change its behavior, and then also a view of how that data's going to change in the future so you can start to anticipate issues that might arise from the model's changing behavior. And this is where the data catalog can actually associate and maintain information about the data that's being used with the algorithm. You can also associate the checking mechanism that's constantly monitoring the profile of the data so you can see where the data is changing over time, that will obviously affect the behavior of the machine learning model. So it's really about providing, not just information about the model itself, but also the data that's feeding it, how those characteristics are changing over time so that you know the model is continuing to work into the future. >> So tell us about the IBM, Hortonworks partnership on metadata and so forth. >> Mandy: Okay. >> How is that evolving? So, you know, your partnership is fairly tight. You clearly, you've got ODPI, you've got the work that you're doing related to the big data catalog. What can we expect to see in the near future in terms of initiatives building on all of that for governance of big data in the multi-cloud environment? >> Yeah so Hortonworks started the Apache Atlas project a couple of years ago with a number of their customers. And they built a base repository and a set of APIs that allow it to work in the Hadoop environment. We came along last year, formed our partnership. That partnership includes this open metadata and governance layer. So since then we worked with ING as well, and ING bring the, sort of, user perspective, this is the organization's use of the data. And, so between the three of us we are basically transforming Apache Atlas from a Hadoop-focused metadata repository to an enterprise-focused metadata repository. Plus enabling other vendors to connect into the open metadata ecosystem. So we're standardizing types, standardizing the format of metadata, there's a protocol for exchanging metadata between repositories. And this is all coming from that three-way partnership where you've got a consuming organization, you've got a company who's used to building enterprise middleware, and you've got Hortonworks with their knowledge of open source development in their Hadoop environment. >> Quick out of left field, as you develop this architecture, clearly you're leveraging Hadoop HDFS for storage. Are you looking at, at least evaluating, maybe using blockchain for more distributed management of the metadata in these heterogeneous environments in the multi-cloud, or not? >> So Atlas itself does run on HDFS, but doesn't need to run on HDFS, it's got other storage environments so that we can run it outside of Hadoop. When it comes to blockchain, so blockchain is for sharing data between partners, small amounts of data that basically express agreements, so it's like a ledger. There are some aspects that we could use for metadata management. It's more that we actually need to put metadata management into blockchain. So the agreements and contracts that are stored in blockchain are only meaningful if we understand the data that's there, what its quality is, where it came from, what it means.
And so actually there's a very interesting distributed metadata question that comes with the blockchain technology. And I think that's an important area of research. >> Well Mandy we're at the end of our time. Thank you very much. We could go on and on. You're a true expert and it's great to have you on the Cube. >> Thank you for inviting me. >> So this is James Kobielus with Mandy Chessell of IBM. We are here this week in Berlin at Dataworks Summit 2018. It's a great event and we have some more interviews coming up so thank you very much for tuning in. (electronic music)
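A footnote on Mandy's earlier point about a checking mechanism that constantly monitors the profile of the data feeding a model: one generic way to read that is as a drift test between the training-era distribution of a feature and what the deployed model has seen recently. The sketch below uses a two-sample Kolmogorov-Smirnov test; the threshold and the synthetic data are arbitrary illustrations, not an IBM-specific mechanism.

```python
# A minimal data-drift signal: compare a feature's training-time sample
# against a recent sample and flag when they look like different
# distributions. Alpha and the synthetic inputs are arbitrary choices.
import numpy as np
from scipy.stats import ks_2samp

def drift_report(train_sample: np.ndarray, recent_sample: np.ndarray,
                 alpha: float = 0.01) -> dict:
    """Two-sample Kolmogorov-Smirnov test as a simple drift check."""
    stat, p_value = ks_2samp(train_sample, recent_sample)
    return {
        "ks_statistic": stat,
        "p_value": p_value,
        "drifted": p_value < alpha,  # reject "same distribution" at alpha
    }

rng = np.random.default_rng(42)
baseline = rng.normal(loc=0.0, scale=1.0, size=5_000)  # training era
today = rng.normal(loc=0.4, scale=1.0, size=5_000)     # shifted inputs
print(drift_report(baseline, today))  # expect drifted=True
```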

Published Date : Apr 19 2018

SUMMARY :

James Kobielus interviews Mandy Chessell, distinguished engineer at IBM, following her Dataworks Summit keynote on metadata. She explains that a data catalog describes an organization's data sets, their glossary definitions, lineage, usage, and governance rules, and is central to demonstrating GDPR compliance. Apache Atlas provides a graph-based metadata server on JanusGraph, extended with open metadata and governance libraries so metadata can be exchanged across different vendors' repositories, and its flexible type system can also describe machine learning models in support of explainable AI. She closes with the three-way partnership among Hortonworks, IBM, and ING that is transforming Atlas into an enterprise-grade metadata repository.

SENTIMENT ANALYSIS :

ENTITIES

Entity | Category | Confidence
James Kobielus | PERSON | 0.99+
Mandy Chessell | PERSON | 0.99+
IBM | ORGANIZATION | 0.99+
ING | ORGANIZATION | 0.99+
James | PERSON | 0.99+
three | QUANTITY | 0.99+
Berlin | LOCATION | 0.99+
Mandy | PERSON | 0.99+
Hortonworks | ORGANIZATION | 0.99+
May 25th | DATE | 0.99+
last year | DATE | 0.99+
U.S. | LOCATION | 0.99+
two days | QUANTITY | 0.99+
Atlas | TITLE | 0.99+
yesterday | DATE | 0.99+
Berlin, Germany | LOCATION | 0.99+
SiliconANGLE Media | ORGANIZATION | 0.99+
Data Steward Studio | ORGANIZATION | 0.99+
both | QUANTITY | 0.99+
Both | QUANTITY | 0.98+
EU | LOCATION | 0.98+
GDPR | TITLE | 0.98+
One | QUANTITY | 0.98+
one | QUANTITY | 0.98+
Dataworks Summit 2018 | EVENT | 0.97+
Dataworks Summit EU 2018 | EVENT | 0.96+
this week | DATE | 0.94+
single server | QUANTITY | 0.94+
Hadoop | TITLE | 0.94+
today | DATE | 0.93+
this morning | DATE | 0.93+
three-way partnership | QUANTITY | 0.93+
Wikibon | ORGANIZATION | 0.91+
Hortonworks' | ORGANIZATION | 0.9+
Atlas | ORGANIZATION | 0.89+
Dataworks Summit Europe 2018 | EVENT | 0.89+
couple of years ago | DATE | 0.87+
Apache Atlas | TITLE | 0.86+
Cube | COMMERCIAL_ITEM | 0.83+
Apache | ORGANIZATION | 0.82+
JanusGraph | TITLE | 0.79+
hot themes | QUANTITY | 0.68+
Hado | ORGANIZATION | 0.67+
Hadoop HDFS | TITLE | 0.63+

Dave McDonnell, IBM | Dataworks Summit EU 2018



>> Narrator: From Berlin, Germany, it's theCUBE (relaxing music) covering DataWorks Summit Europe 2018. (relaxing music) Brought to you by Hortonworks. (quieting music) >> Well, hello and welcome to theCUBE. We're here at DataWorks Summit 2018 in Berlin, Germany, and it's been a great show. Who we have now is we have IBM. Specifically we have Dave McDonnell of IBM, and we're going to be talkin' with him for the next 10 minutes or so about... Dave, you explain. You are in storage for IBM, and IBM of course is a partner of Hortonworks, who are of course the host of this show. So Dave, now that you've been introduced, give us your capacity or role at IBM. Discuss the partnership with Hortonworks, and really what's your perspective on the market for storage systems for Big Data right now and going forward? And what kind of work loads and what kind of requirements are customers coming to you with for storage systems now?
>> Yeah. >> There's pros and cons to the different (mumbles). >> You have power AI systems, I know that, so that's where they're probably heading, yeah. >> Yes, yes, yes. So of course, we have packages that we've modeled in AI. They feed off of some of the Hortonworks data lakes that we're building. Of course we see a lot of people putting these on new pieces of infrastructure because they don't want to put this on their production applications, so they're extracting data from maybe a Hortonworks data lake number one, Hortonworks data lake number two, some of the EDWs, some external data, and putting that into the AI infrastructure. >> As customers move their cloud infrastructures towards more edge facing environments, or edge applications, how are storage requirements change or evolving in terms of in the move to edge computing. Can you give us a sense for any sort of trends you're seeing in that area? >> Well, if we're going to the world of AI and cognitive applications, all that data that I mighta thrown in the cloud five years ago I now, I'm educated enough 'cause I've been paying bills for a few years on just how expensive it is, and if I'm going to be bringing that data back, some of which I don't even know I'm going to be bringing back, it gets extremely expensive. So we see a pendulum shift coming back where now a lot of data is going to be on host, ah sorry, on premise, but it's not going to stay there. They need the flexibility to move it here, there, or everywhere. So if it's going to come back, how can we bring customers some of that flexibility that they liked about the cloud, the speed, the ease of deployment, even a consumption based model? These are very big changes on a traditional storage manufacturer like ourselves, right? So that's requiring a lot of development in software, it's requiring a lot of development in our business model, and one of the biggest thing you hear us talk about this year is IBM Cloud Private, which does exactly that, >> Right. and it gives them somethin' they can work with that's flexible, it's agile, and allows you to take containerized based applications and move them back and forth as you please. >> Yeah. So containerized applications. So if you can define it for our audience, what is a containerized application? You talk about Docker and orchestrate it through Kubernetes and so forth. So you mentioned Cloud Private. Can you bring us up to speed on what exactly Cloud Private is and in terms of the storage requirements or storage architecture within that portfolio? >> Oh yes, absolutely. So this is a set of infrastructure that's optimized for on-premise deployment that gives you multi-cloud access, not just IBM Cloud, Amazon Web Services, Microsoft Azure, et cetera, and then it also gives you multiple architectural choices basically wrapped by software to allow you to move those containers around and put them where you want them at the right time at the right place given the business requirement at that hour. >> Now is the data storager persisted in the container itself? I know that's fairly difficult to do in a Docker environment. How do ya handle persistence of data for containerized applications within your architecture? >> Okay, some of those are going to be application specific. It's the question of designing the right data management layer depending on the application. So we have software intelligence, some of it from open source, some of which we add on top of open source to bring some of the enterprise resilience and performance needed. 
And of course, you have to be very careful if the biggest trend in the world is unstructured data. Well, okay fine, it's a lot of sensor data. That's still fairly easy to move around. But once we get into things like medical images, lots of video, you know, HD video, 4K video, those are the things which you have to give a lot of thought to how to do that. And that's why we have lots of new partners that we work with the help us with edge cloud, which gives that on premise-like performance in really a cloud-like set up. >> Here's a question out of left field, and you may not have the answer, but I would like to hear your thoughts on this. How has Blockchain, and IBM's been making significant investments in blockchain technology database technology, how is blockchain changing the face of the storage industry in terms of customers' requirements for a storage systems to manage data in distributed blockchains? Is that something you're hearing coming from customers as a requirement? I'm just tryin' to get a sense for whether that's, you know, is it moving customers towards more flash, towards more distributed edge-oriented or edge deployed storage systems? >> Okay, so yes, yes, and yes. >> Okay. So all of a sudden, if you're doing things like a blockchain application, things become even more important than they are today. >> Yeah. >> Okay, so you can't lose a transaction. You can't have a storage going down. So there's a lot more care and thought into the resiliency of the infrastructure. If I'm, you know, buying a diamond from you, I can't accept the excuse that my $100,000 diamond, maybe that's a little optimistic, my $10,000 diamond or yours, you know, the transaction's corrupted because the data's not proper. >> Right. >> Or if I want my privacy, I need to be assured that there's good data governance around that transaction, and that that will be protected for a good 10, 20, and 30 years. So it's elevating the importance of all the infrastructure to a whole different level. >> Switching our focus slightly, so we're here at DataWorks Summit in Berlin. Where are the largest growth markets right now for cloud storage systems? Is it Apache, is it the North America, or where are the growth markets in terms of regions, in terms of vertical industries right now in the marketplace for enterprise grade storage systems for big data in the cloud? >> That's a great question, 'cause we certainly have these conversations globally. I'd say the place where we're seeing the most activity would be the Americas, we see it in China. We have a lot of interesting engagements and people reaching out to us. I would say by market, you can also point to financial services in more than those two regions. Financial services, healthcare, retail, these are probably the top verticals. I think it's probably safe to assume, and we can the federal governments also have a lot of stringent requirements and, you know, requirements, new applications around the space as well. >> Right. GDPR, how is that impacting your customers' storage requirements. The requirement for GDPR compliance, is that moving the needle in terms of their requirement for consolidated storage of the data that they need to maintain? I mean obviously there's a security, but there's just the sheer amount of, there's a leading to consolidation or centralization of storage, of customer data, that would seem to make it easier to control and monitor usage of the data. Is it making a difference at all? >> It's making a big difference. 
Not many people encrypt data today, so there's a whole new level of interest in encryption at many different levels, data at rest, data in motion. There's new levels of focus and attention on performance, on the ability for customers to get their arms around disparate islands of data, because now GDPR is not only a legal requirement that requires you to be able to have it, but you've also got timelines which you're expected to act on a request from a customer to have your data removed. And most of those will have a baseline of 30 days. So you can't fool around now. It's not just a nice to have. It's an actual core part of a business requirement that if you don't have a good strategy for, you could be spending tens of millions of dollars in liability if you're not ready for it. >> Well Dave, thank you very much. We're at the end of our time. This has been Dave McDonnell of IBM talking about system storage and of course a big Hortonworks partner. We are here on day two of the DataWorks Summit, and I'm James Kobielus of Wikibon SiliconANGLE Media, and have a good day. (upbeat music)

Published Date : Apr 19 2018

SUMMARY :

James Kobielus interviews Dave McDonnell, who leads alliances for IBM's storage business unit, about the Hortonworks partnership and storage for big data. Dave describes production-scale data lake workloads, the case for software-defined storage over fixed hardware blocks, and IBM Cloud Private as a way to move containerized applications between on-premise and multi-cloud environments. He also covers blockchain's demands for storage resiliency, growth in the Americas, China, financial services, healthcare, and retail, and how GDPR is driving new interest in encryption and readiness for 30-day data-removal requests.

SENTIMENT ANALYSIS :

ENTITIES

Entity | Category | Confidence
Nicola | PERSON | 0.99+
Michael | PERSON | 0.99+
David | PERSON | 0.99+
Josh | PERSON | 0.99+
Microsoft | ORGANIZATION | 0.99+
Dave | PERSON | 0.99+
Jeremy Burton | PERSON | 0.99+
Paul Gillon | PERSON | 0.99+
GM | ORGANIZATION | 0.99+
Bob Stefanski | PERSON | 0.99+
Lisa Martin | PERSON | 0.99+
Dave McDonnell | PERSON | 0.99+
amazon | ORGANIZATION | 0.99+
John | PERSON | 0.99+
James Kobielus | PERSON | 0.99+
Keith | PERSON | 0.99+
Paul O'Farrell | PERSON | 0.99+
IBM | ORGANIZATION | 0.99+
Keith Townsend | PERSON | 0.99+
BMW | ORGANIZATION | 0.99+
Ford | ORGANIZATION | 0.99+
David Siegel | PERSON | 0.99+
Cisco | ORGANIZATION | 0.99+
Sandy | PERSON | 0.99+
Nicola Acutt | PERSON | 0.99+
Paul | PERSON | 0.99+
David Lantz | PERSON | 0.99+
Stu Miniman | PERSON | 0.99+
three | QUANTITY | 0.99+
Lisa | PERSON | 0.99+
Lithuania | LOCATION | 0.99+
Michigan | LOCATION | 0.99+
AWS | ORGANIZATION | 0.99+
General Motors | ORGANIZATION | 0.99+
Apple | ORGANIZATION | 0.99+
America | LOCATION | 0.99+
Charlie | PERSON | 0.99+
Europe | LOCATION | 0.99+
Pat Gelsing | PERSON | 0.99+
Google | ORGANIZATION | 0.99+
Bobby | PERSON | 0.99+
London | LOCATION | 0.99+
Palo Alto | LOCATION | 0.99+
Dante | PERSON | 0.99+
Switzerland | LOCATION | 0.99+
six-week | QUANTITY | 0.99+
VMware | ORGANIZATION | 0.99+
Seattle | LOCATION | 0.99+
Bob | PERSON | 0.99+
Amazon Web Services | ORGANIZATION | 0.99+
100 | QUANTITY | 0.99+
Michael Dell | PERSON | 0.99+
John Walls | PERSON | 0.99+
Amazon | ORGANIZATION | 0.99+
John Furrier | PERSON | 0.99+
California | LOCATION | 0.99+
Sandy Carter | PERSON | 0.99+

John Kreisa, Hortonworks | Dataworks Summit EU 2018



>> Narrator: From Berlin, Germany, it's theCUBE. Covering Dataworks Summit Europe 2018. Brought to you by Hortonworks. >> Hello, welcome to theCUBE. We're here at Dataworks Summit 2018 in Berlin, Germany. I'm James Kobielus. I'm the lead analyst for Big Data Analytics, within the Wikibon team of SiliconAngle Media. Our guest is John Kreisa. He's the VP for Marketing at Hortonworks, of course, the host company of Dataworks Summit. John, it's great to have you. >> Thank you Jim, it's great to be here. >> We go long back, so you know it's always great to reconnect with you guys at Hortonworks. You guys are on a roll, it's been seven years I think since you guys were founded. I remember the founding of Hortonworks. I remember when it splashed in the Wall Street Journal. It was like oh wow, this big data thing, this Hadoop thing is actually, it's a market, it's a segment and you guys have built it. You know, you and your competitors, your partners, your ecosystem continues to grow. You guys went IPO a few years ago. Your latest numbers are pretty good. You're continuing to grow in revenues, in customer acquisitions, your deal sizes are growing. So Hortonworks remains on a roll. So, I'd like you to talk right now, John, and give us a sense of where Hortonworks is at in terms of engaging with the marketplace, in terms of trends that you're seeing, in terms of how you're addressing them. But talk about first of all the Dataworks Summit. How many attendees do you have from how many countries? Just give us sort of the layout of this show. >> I don't have all of the final counts yet. >> This is year six of the show? >> This is year six in Europe, absolutely, thank you. So it's great, we've moved it around different locations. Great venue, great host city here in Berlin. Super excited about it, I know we have representatives from more than 51 countries. If you think about that, drawing from a really broad set of countries, well beyond, as you know, because you've interviewed some of the folks beyond just Europe. We've had them from South America, U.S., Africa, and Asia as well, so really a broad swath of the open-source and big data community, which is great. The final attendance is going to be 1,250 to 1,300 range. The final numbers, but a great sized conference. The energy level's been really great, the sessions have been, you know, oversubscribed, standing room only in many of the popular sessions. So the community's strong, I think that's the thing that we really see here and that we're really continuing to invest in. It's something that Hortonworks was founded around. You referenced the founding, and driving the community forward and investing is something that has been part of our mantra since we started and it remains that way today. >> Right. So first of all what is Hortonworks? Now how does Hortonworks position itself? Clearly Hadoop is your foundation, but you, just like Cloudera, MapR, you guys have all continued to evolve to address a broader range of use-cases with a deeper stack of technology with fairly extensive partner ecosystems. So what kind of a beast is Hortonworks? It's an elephant, but what kind of an elephant is it? >> We're an elephant or riding on the elephant I'd say, so we're a global data management company. That's what we're helping organizations do. Really the end-to-end lifecycle of their data, helping them manage it regardless of where it is, whether it's on-premise or in the cloud, really through hybrid data architectures. 
That's really how we've seen the market evolve is, we started off in terms of our strategy with the platform based on Hadoop, as you said, to store, process, and analyze data at scale. The kind of fundamental use-case for Hadoop. Then as the company emerged, as the market kind of continued to evolve, we moved to and saw the opportunity really, capturing data from the edge. As IOT and kind of edge-use cases emerged it made sense for us to add to the platform and create the Hortonworks DataFlow. >> James: Apache NiFi >> Apache NiFi, exactly, HDF underneath, with associated additional open-source projects in there. Kafka and some streaming and things like that. So that was now move data, capture data in motion, move it back and put it into the platform for those large data applications that organizations are building on the core platform. It's also the next evolution, seeing great attach rates with that, the really strong interest in the Apache NiFi, you know, the meetup here for NiFi was oversubscribed, so really really strong interest in that. And then, the markets continued to evolve with cloud and cloud architectures, customers wanting to deploy in the cloud. You know, you saw we had that poll yesterday in the general session about cloud with really interesting results, but we saw that there was really companies wanting to deploy in a hybrid way. Some of them wanted to move specific workloads to the cloud. >> Multi-cloud, public, private. >> Exactly right, and multi-data center. >> The majority of your customer deployments are on prem. >> They are. >> Rob Bearden, your CEO, I think he said in a recent article on SiliconAngle that two-thirds of your deployments are on prem. Is that percentage going down over time? Are more of your customers shifting toward a public cloud orientation? Does Hortonworks worry about that? You've got partnerships, clearly, with the likes of IBM, AWS, and Microsoft Dasher and so forth, so do you guys see that as an opportunity, as a worrisome trend? >> No, we see it very much as an opportunity. And that's because we do have customers who are wanting to put more workloads and run things in the cloud, however, there's still almost always a component that's going to be on premise. And that creates a challenge for organizations. How do they manage the security and governance and really the overall operations of those deployments as they're in the cloud and on premise. And, to your point, multi-cloud. And so you get some complexity in there around that deployment and particularly with the regulations, we talked about GDPR earlier today. >> Oh, by the way, the Data Steward Studio demo today was really, really good. It showed that, first of all, you cover the entire range of core requirements for compliance. So that was actually the primary announcement at this show; Scott Gnau announced that. You demoed it today, I think you guys are off on a good start, yeah. We've gotten really, and thank you for that, we've gotten really good feedback on our DataPlane Services strategy, right, it provides that single pane of glass. >> I should say to our viewers that Data Steward Studio is the second of the services under the DataPlane, the Hortonworks DataPlane Services Portfolio. >> That's right, that's exactly right. >> Go ahead, keep going. >> So, you know, we see that as an opportunity. 
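The "data in motion" layer John describes, HDF with Apache NiFi and Kafka, ultimately lands event streams where a consumer can read them. As a small, generic illustration, here is a Kafka consumer using the kafka-python client; the broker address and topic are placeholders, and NiFi itself is configured through its flow-based UI rather than code like this.

```python
# Data in motion at its smallest: read the kind of event stream that an
# HDF/NiFi pipeline typically captures and routes. Broker and topic names
# are hypothetical stand-ins.
import json
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "sensor-events",                          # hypothetical topic
    bootstrap_servers="broker:9092",          # hypothetical broker
    group_id="edge-ingest-demo",
    auto_offset_reset="earliest",
    value_deserializer=lambda raw: json.loads(raw.decode("utf-8")),
)

for message in consumer:
    event = message.value
    # In a real pipeline this is where you would enrich or filter the
    # event, then land it in the data-at-rest platform for analytics.
    print(message.topic, message.partition, message.offset, event)
```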
We think we're very strongly positioned in the market, being the first to bring that kind of solution to the customers and our large customers that we've been talking about and who have been starting to use DataPlane have been very, very positive. I mean they see it as something that is going to help them really kind of maintain control over these deployments as they start to spread around, as they grow their uses of the thing. >> And it's built to operate across the multi-cloud, I know this as well in terms of executing the consent or withdrawal of consent that the data subject makes through what is essentially a consent portal. >> That's right, that's right. >> That was actually a very compelling demonstration in that regard. >> It was good, and they worked very hard on it. And I was speaking to an analyst yesterday, and they were saying that they're seeing an increasing number of the customers, enterprises, wanting to have a multi-cloud strategy. They don't want to get locked into any one public cloud vendor, so, what they want is somebody who can help them maintain that common security and governance across their different deployments, and they see DataPlane Services is the way that's going to help them do that. >> So John, how is Hortonworks, what's your road map, how do you see the company in your go to market evolving over the coming years in terms of geographies, in terms of your focuses? Focus, in terms of the use-cases and workloads that the Hortonworks portfolio addresses. How is that shifting? You mentioned the Edge. AI, machine learning, deep learning. You are a reseller of IBM Data Science Experience. >> DSX, that's right. >> So, let's just focus on that. Do you see more customers turning to Hortonworks and IBM for a complete end-to-end pipeline for the ingest, for the preparation, modeling, training and so forth? And deployment of operationalized AI? Is that something you see going forward as an evolution path for your capabilities? >> I'd say yes, long-term, or even in the short-term. So, they have to get their data house in order, if you will, before they get to some of those other things, so we're still, Hortonworks strategy has always been focused on the platform aspect, right? The data-at-rest platform, data-in-motion platform, and now a platform for managing common security and governance across those different deployments. Building on that is the data science, machine learning, and AI opportunity, but our strategy there, as opposed to trying to trying to do it ourselves, is to partner, so we've got the strong partnership with IBM, resell their DSX product. And also other partnerships around to deliver those other capabilities, like machine learning and AI, from our partner ecosystem, which you referenced. We have over 2,300 partners, so a very, very strong ecosystem. And so, we're going to stick to our strategy of the platforms enabling that, which will subsequently enable data science, machine learning, and AI on top. And then, if you want me to talk about our strategy in terms of growth, so we already operate globally. We've got offices in I think 19 different countries. So we're really covering the globe in terms of the demand for Hortonworks products and beginning implements. >> Where's the fastest growing market in terms of regions for Hortonworks? >> Yeah, I mean, international generally is our fastest growing region, faster than the U.S. 
But we're seeing very strong growth in APAC, actually, so India, Asian countries, Singapore, and then up and through to Japan. There's a lot of growth out in the Asian region. And, you know, they're sort of moving directly to digital transformation projects at really large scale. Big banks, telcos, from a workload standpoint I'd say the patterns are very similar to what we've seen. I've been at Hortonworks for six and a half years, as it turns out, and the patterns we saw initially in terms of adoption in the U.S. became the patterns we saw in terms of adoption in Europe and now those patterns of adoption are the same in Asia. So, once a company realizes they need to either drive out operational costs or build new data applications, the patterns tend to be the same whether it's retail, financial services, telco, manufacturing. You can sort of replicate those as they move forward. >> So going forward, how is Hortonworks evolving as a company in terms of, for example with GDPR, Data Steward, data governance as a strong focus going forward, are you shifting your model in terms of your target customer away from the data engineers, the Hadoop cluster managers who are still very much the center of it, towards more data governance, towards more business analyst level of focus. Do you see Hortonworks shifting in that direction in terms of your focus, go to market, your message and everything? >> I would say it's not a shifting as much as an expansion, so we definitely are continuing to invest in the core platform, in Hadoop, and you would have heard of some of the changes that are coming in the core Hadoop 3.0 and 3.1 platform here. Alan and others can talk about those details, and in Apache NiFi. But, to your point, as we bring and have brought Data Steward Studio and DataPlane Services online, that allows us to address a different user within the organization, so it's really an expansion. We're not de-investing in any other things. It's really here's another way in a natural evolution of the way that we're helping organizations solve data problems. >> That's great, well thank you. This has been John Kreisa, he's the VP for marketing at Hortonworks. I'm James Kobielus of Wikibon SiliconAngle Media here at Dataworks Summit 2018 in Berlin. And it's been great, John, and thank you very much for coming on theCUBE. >> Great, thanks for your time. (techno music)

Published Date : Apr 19 2018

SUMMARY :

Brought to you by Hortonworks. of course, the host company of Dataworks Summit. to reconnect with you guys at Hortonworks. the sessions have been, you know, oversubscribed, you guys have all continued to evolve to address the platform based on Hadoop, as you said, in the Apache NiFi, you know, the meetup here so do you guys see that as an opportunity, and really the overall operations of those Oh, by the way, the Data Steward Studio demo today is the second of the services under the DataPlane, being the first to bring that kind of solution that the data subject makes through in that regard. an increasing number of the customers, Focus, in terms of the use-cases and workloads for the preparation, modeling, training and so forth? Building on that is the data science, machine learning, in terms of adoption in the U.S. the data engineers, the Hadoop cluster managers in the core platform, in Hadoop, and you would have This has been John Kreisa, he's the Great, thanks for your time.

SENTIMENT ANALYSIS :

ENTITIES

Entity | Category | Confidence
Alan | PERSON | 0.99+
James Kobielus | PERSON | 0.99+
Jim | PERSON | 0.99+
Rob Bearden | PERSON | 0.99+
IBM | ORGANIZATION | 0.99+
John Kreisa | PERSON | 0.99+
Europe | LOCATION | 0.99+
John | PERSON | 0.99+
Asia | LOCATION | 0.99+
AWS | ORGANIZATION | 0.99+
Hortonworks | ORGANIZATION | 0.99+
Berlin | LOCATION | 0.99+
yesterday | DATE | 0.99+
Africa | LOCATION | 0.99+
South America | LOCATION | 0.99+
SiliconAngle Media | ORGANIZATION | 0.99+
U.S. | LOCATION | 0.99+
1,250 | QUANTITY | 0.99+
Scott Gnau | PERSON | 0.99+
1,300 | QUANTITY | 0.99+
Berlin, Germany | LOCATION | 0.99+
seven years | QUANTITY | 0.99+
six and a half years | QUANTITY | 0.99+
Japan | LOCATION | 0.99+
Hadoop | TITLE | 0.99+
Asian | LOCATION | 0.99+
second | QUANTITY | 0.98+
over 2,300 partners | QUANTITY | 0.98+
today | DATE | 0.98+
two-thirds | QUANTITY | 0.98+
19 different countries | QUANTITY | 0.98+
Dataworks Summit | EVENT | 0.98+
more than 51 countries | QUANTITY | 0.98+
Hadoop 3.0 | TITLE | 0.98+
first | QUANTITY | 0.98+
James | PERSON | 0.98+
Data Steward Studio | ORGANIZATION | 0.98+
Dataworks Summit EU 2018 | EVENT | 0.98+
Dataworks Summit 2018 | EVENT | 0.97+
Cloudera | ORGANIZATION | 0.97+
MapR | ORGANIZATION | 0.96+
GDPR | TITLE | 0.96+
DataPlane Services | ORGANIZATION | 0.96+
Singapore | LOCATION | 0.96+
year six | QUANTITY | 0.95+
2018 | EVENT | 0.95+
Wikibon SiliconAngle Media | ORGANIZATION | 0.94+
India | LOCATION | 0.94+
Hadoop | ORGANIZATION | 0.94+
APAC | ORGANIZATION | 0.93+
Big Data Analytics | ORGANIZATION | 0.93+
3.1 | TITLE | 0.93+
Wall Street Journal | TITLE | 0.93+
one | QUANTITY | 0.93+
Apache | ORGANIZATION | 0.92+
Wikibon | ORGANIZATION | 0.92+
NiFi | TITLE | 0.92+

Pankaj Sodhi, Accenture | Dataworks Summit EU 2018



>> Narrator: From Berlin, Germany, it's theCUBE. Covering Dataworks Summit Europe 2018. Brought to you by Hortonworks. >> Well hello, welcome to theCUBE. I am James Kobielus. I'm the lead analyst within the Wikibon team at SiliconANGLE Media, focused on big data analytics. And big data analytics is what Dataworks Summit is all about. We are at Dataworks Summit 2018 in Berlin, Germany. We are on day two, and I have, as my special guest here, Pankaj Sodhi, who is the big data practice lead with Accenture. He's based in London, and he's here to discuss really what he's seeing in terms of what his clients are doing with big data. Hello, welcome Pankaj, how's it going? >> Thank you Jim, very pleased to be there. >> Great, great, so what are you seeing in terms of customers' adoption of Hadoop and so forth, big data platforms, for what kind of use cases are you seeing? GDPR is coming down very quickly, and we saw this poll this morning that John Kreisa, of Hortonworks, did from the stage, and it's a little bit worrisome if you're an enterprise data administrator. Really, in enterprises, period, because it sounds like not everybody in this audience, in fact a sizeable portion, is not entirely ready to comply with GDPR on day one, which is May 25th. What are you seeing, in terms of customer readiness, for this new regulation?
And I think what the organizations that have been successful have done is not just looked at the technology aspect, which is just Hadoop in this case, but looked at a mix of architecture, delivery approaches, governance, and skills. I'd like to bring this to life by looking at advanced analytics as a use case. So rather than take the approach of "let's ingest all data into a data lake," it's been driven by a use case mapped to a set of valuable data sets that can be ingested. What's interesting then is that the delivery approach has been to bring together diverse skill sets, for example data engineers, data scientists, data ops and visualization folks, and then use them to actually challenge the architecture and delivery approach. I think this is the key ingredient for success: the modern Hadoop pipeline needs to be iteratively built and deployed, rather than linear and monolithic. So there's this notion of: I have raw data; let me come up with a minimally curated data set, and then look at how I can do feature engineering and build an analytical model. If that works and I need to enhance it, to get additional data attributes, I then enhance the pipeline. This is already starting to challenge organizations' architecture approaches, and how you deploy into production. And I think that's been one of the key differences between the successful organizations and those that have embarked on the journey, ingested the data, but not had a path to production. So I think that's one aspect.
>> How are the data stewards of the world, or are they, challenging the architecture, now that GDPR is coming down fast and furious? We're seeing, for example, Hortonworks' architecture for Data Steward Studio. Are you seeing the data governors, the data stewards of the world, sitting around the virtual table, challenging this architecture further to evolve?
>> I think...
>> To enable privacy by default and so forth?
>> I think, again, the organizations that have been successful were already looking at privacy by design before GDPR came along. Now, one of the reasons a lot of the data lake implementations haven't been as successful is that the business hasn't had the ability to actually curate the data sets, work out what the definitions are, what the curation levels are. So with business glossaries and data architectures, from a GDPR perspective, we see this as an opportunity rather than a threat. To actually make the data usable in the data lakes, we often talk to clients about this concept of the data marketplace. In the data marketplace, what you need to have is well-curated data sets with proper definitions, whether in a business glossary or a data catalog, underpinned by the right user access model, and available, for example, through search or APIs. So, GDPR actually is...
>> It's not a public marketplace; this is an architectural concept.
>> Yes.
>> It could be inside, completely inside, the private data center, but it's reusable data, both through APIs and standard glossaries and metadata and so forth. Is that correct?
>> Correct. So the data marketplace is reusable both internally, for example to unlock access for data scientists who might want to use a data set and then put it into a data lab, and externally: it can be extended, from an API perspective, to a third-party data marketplace for exchanging data with consumers or third parties as organizations look at data monetization as well.
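The data marketplace Pankaj describes, well-curated data sets carrying glossary definitions, an access model, and search, can be pictured as a small catalog structure. A toy sketch, with every field name invented for illustration:

```python
# Toy sketch of a data marketplace catalog: curated data sets carrying their
# business-glossary definition, curation level, and access policy, searchable
# by keyword. Field names are invented for illustration.
from dataclasses import dataclass, field

@dataclass
class MarketplaceEntry:
    name: str                 # e.g. "customer_transactions"
    definition: str           # business-glossary description
    curation_level: str       # e.g. "raw", "minimally curated", "certified"
    allowed_roles: set = field(default_factory=set)  # the user access model
    tags: set = field(default_factory=set)

class DataMarketplace:
    def __init__(self):
        self._entries = {}

    def publish(self, entry: MarketplaceEntry):
        # Only well-described data sets are publishable: definitions first.
        if not entry.definition:
            raise ValueError("refusing to publish an undefined data set")
        self._entries[entry.name] = entry

    def search(self, keyword: str, role: str):
        # Search respects the access model: you only see what you may use.
        return [
            e for e in self._entries.values()
            if role in e.allowed_roles
            and (keyword in e.definition or keyword in e.tags)
        ]

marketplace = DataMarketplace()
marketplace.publish(MarketplaceEntry(
    name="customer_transactions",
    definition="Card transactions, one row per purchase, PII removed",
    curation_level="certified",
    allowed_roles={"data_scientist", "analyst"},
    tags={"payments", "customer"},
))
print(marketplace.search("payments", role="data_scientist"))
```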
And therefore, I think the role of data stewards is changing a bit. Rather than looking at it from a compliance perspective, it's about how we can make data usable to the analysts and the data scientists. So it's actually about focusing on getting the right definitions upfront, and, as we curate, publish, and enrich data, asking what the next definition that comes out of that is, and actually having that available before we publish the data.
>> That's a fascinating concept. So, the notion of a data steward or a data curator: it sounds like you're blending them, where the data curator's job, very much of it, involves identifying the relevance of data and the potential reusability and attractiveness of that data for various downstream uses, and possibly being a player in the ongoing identification of the monetizability of data elements, both internally and externally. Am I describing it correctly?
>> Pankaj: I think you are, yes.
>> Jim: Okay.
>> I think it's an interesting implication for the CDO function, because, rather than see the function being looked at as a policy...
>> Jim: The chief data officer.
>> Yes, chief data officer functions. So rather than the imposition of policies and standards, it's about actually trying to unlock business value. Rather than look at it from a compliance perspective, which is very important, actually flip it around and look at it from a business value perspective.
>> Jim: Hmm.
>> So for example, if you're able to tag and classify data, and then apply the right kind of protection to it, it actually helps the data scientists use that data for their models, while actually following GDPR guidelines. So it's a win-win from that perspective.
>> So, in many ways, the core requirement of GDPR compliance, which is to discover and inventory and essentially tag all of your data at a fine-grained level, can be the greatest thing that ever happened to data monetization. In other words, it's the foundation of data reuse and monetization, unlocking the true value of the data to your business. So it needn't be an overhead burden; it can be the foundation for a new business model.
>> Absolutely. Because I think if you talk about organizations becoming data-driven, you have to look at what a data asset actually means.
>> Jim: Yes.
>> So to me, that's a curated data set with the right level of description, again underpinned by the right privacy controls and the ability to use the data. So I think GDPR is going to be a very good enabler. Again, the small minority of organizations that have been successful have done this: they've had business glossaries and data catalogs. But now, with GDPR, that's almost, I think, going to force the issue. Which I think is a very positive outcome.
>> Now, Pankaj, do you see any of your customers taking this concept of curation and so forth to the next step, in terms of not just data assets but data-derived assets, like machine learning models and so forth? Data scientists build and train and deploy these models and algorithms; that's the core of their job.
>> Man: Mhmm.
>> And model governance is a hot, hot topic we see all over. You've got to have tight controls, not just on the data, but on the models, 'cause they're core business IP. Do you see this architecture evolving among your customers, so that they'll also increasingly be required, or want, to essentially catalog the models and curate them for reusability, possibly monetization opportunities? Is that something that any of your customers are doing or exploring?
>> Some of our customers are looking at that as well. So again, it's exactly an extension of the marketplace. While one aspect of the marketplace is data sets, which you can then combine to run the models, the other aspect is models that you can also search for and subscribe to.
>> Jim: Yeah, like pre-trained models.
>> Correct.
>> They can be golden if they're pre-trained, and if the core domain for which they're trained doesn't change all that often, they can conceivably have great aftermarket value, if you want to resell them.
>> Absolutely. And I think this is also a key enabler for the way data scientists and data engineers expect to operate. So this notion of IDEs, of collaborative notebooks and so forth, and being able to sort of share the outputs of models, and to share those with other folks on the team who can then maybe tweak them for a different algorithm, is a huge, I think, productivity enabler. And we've seen...
>> Jim: Yes.
>> Quite a few of our technology partners working towards enabling these data scientists to move very quickly from a model they may have initially developed on a laptop to actually then deploying it (mumbles). How can you do that very quickly, and reduce the time from an initial hypothesis to production?
>> (mumbles) Modularization of machine learning and deep learning, I'm seeing a lot of that among data scientists in the business world. Well, thank you, Pankaj, we're out of time right now. This has been a very engaging and fascinating discussion, and we thank you very much for coming on theCUBE. This has been Pankaj Sodhi of Accenture. We're here at DataWorks Summit 2018 in Berlin, Germany. It's been a great show, and we have more expert guests that we'll be interviewing later in the day. Thank you very much, Pankaj.
>> Thank you very much, Jim.
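The closing exchange treats models as marketplace assets alongside data sets: searchable, reusable, potentially resellable. A toy sketch of such a model registry, with every name and field invented for illustration:

```python
# Toy sketch of a model registry as an extension of the data marketplace:
# pre-trained models published with the metadata needed to find and reuse them.
# All names and fields are hypothetical.
from dataclasses import dataclass

@dataclass
class ModelEntry:
    name: str            # e.g. "churn_classifier_v3"
    task: str            # e.g. "classification"
    trained_on: str      # marketplace data set used for training
    metrics: dict        # evaluation results recorded at publish time
    artifact_uri: str    # where the serialized model lives

registry = {}

def register(model: ModelEntry):
    registry[model.name] = model

def find_models(task: str, min_auc: float = 0.0):
    # Search for pre-trained models good enough to reuse (or resell).
    return [
        m for m in registry.values()
        if m.task == task and m.metrics.get("auc", 0.0) >= min_auc
    ]

register(ModelEntry(
    name="churn_classifier_v3",
    task="classification",
    trained_on="customer_transactions",
    metrics={"auc": 0.91},
    artifact_uri="hdfs:///models/churn/v3",
))
print(find_models("classification", min_auc=0.9))
```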

Published Date : Apr 19 2018

SUMMARY :

Pankaj Sodhi, big data practice lead at Accenture, sorts Hadoop adopters into three categories and argues that the successful ones pair use-case-driven, hybrid deployments with iteratively built pipelines rather than monolithic ones. He describes the data marketplace: well-curated data sets with proper glossary definitions, the right access model, and search or API access, reusable internally and extensible to third parties for data monetization. He frames GDPR as an enabler rather than a threat, since tagging, classifying, and curating data makes it usable for analysts and data scientists while staying compliant, and notes that the marketplace concept extends to searchable pre-trained models.

SENTIMENT ANALYSIS :

ENTITIES

Entity | Category | Confidence
Pankaj | PERSON | 0.99+
James Kobielus | PERSON | 0.99+
Jim | PERSON | 0.99+
London | LOCATION | 0.99+
Pankaj Sodhi | PERSON | 0.99+
May 25th | DATE | 0.99+
Accenture | ORGANIZATION | 0.99+
John Chrysler | PERSON | 0.99+
Horton Works | ORGANIZATION | 0.99+
Silicon Angled Media | ORGANIZATION | 0.99+
GDPR | TITLE | 0.99+
Berlin, Germany | LOCATION | 0.99+
One | QUANTITY | 0.98+
both | QUANTITY | 0.98+
one aspect | QUANTITY | 0.97+
one | QUANTITY | 0.97+
Data Works Summit | EVENT | 0.96+
two ways | QUANTITY | 0.96+
Data Works Summit 2018 | EVENT | 0.95+
Dataworks Summit EU 2018 | EVENT | 0.93+
Europe | LOCATION | 0.93+
Hadoop | TITLE | 0.92+
day two | QUANTITY | 0.9+
Hadoob | PERSON | 0.87+
2018 | EVENT | 0.84+
day one | QUANTITY | 0.82+
three | QUANTITY | 0.79+
first ones | QUANTITY | 0.77+
theCUBE | ORGANIZATION | 0.76+
Wikbon Team | ORGANIZATION | 0.72+
this morning | DATE | 0.7+
Hadoob | TITLE | 0.7+
GDRP | TITLE | 0.55+
categories | QUANTITY | 0.54+
Big DSO | ORGANIZATION | 0.52+
Hadoob | ORGANIZATION | 0.46+

Alan Gates, Hortonworks | Dataworks Summit 2018


 

(techno music)
>> Announcer: From Berlin, Germany, it's theCUBE, covering DataWorks Summit Europe 2018. Brought to you by Hortonworks.
>> Well hello, welcome to theCUBE. We're here on day two of DataWorks Summit 2018 in Berlin, Germany. I'm James Kobielus, lead analyst for big data analytics on the Wikibon team of SiliconANGLE Media. And who we have here today: Alan Gates, one of the founders of Hortonworks, and Hortonworks of course is the host of DataWorks Summit. Well, hello, Alan. Welcome to theCUBE.
>> Hello, thank you.
>> Yeah, so Alan, you and I go way back. Essentially, what we'd like you to do first of all is just explain a little bit of the genesis of Hortonworks: where it came from, your role as a founder from the beginning, how that's evolved over time, but really how the company has evolved, specifically with the folks in the community, the Hadoop community, the open source community. You have a deepening open source stack that you build upon, with Atlas and Ranger and so forth. Give us a sense for all of that, Alan.
>> Sure. So as I think is well known, we started as the team at Yahoo that really was driving a lot of the development of Hadoop. We were one of the major players in the Hadoop community. I was in that team for four years; I think the team itself was going for about five. And it became clear that there was an opportunity to build a business around this. Some others had already started to do so, and we wanted to participate in that. We worked with Yahoo to spin out Hortonworks, and actually they were a great partner in that, and helped us get it spun out. The leadership team of the Hadoop team at Yahoo became the founders of Hortonworks and brought along a number of the other engineers to help get started. And really at the beginning, it was Hadoop, Pig, Hive, HBase, you know, the beginning projects. So a pretty small toolkit. And our early customers were very engineering-heavy people, or companies who knew how to take those tools and build something directly on those tools, right?
>> Well, you started off, the Hadoop community as a whole started off, with a focus on the data engineers of the world.
>> Yes.
>> And I think it's shifted, and confirm for me, over time, that you focus increasingly with your solutions on the data scientists who are doing the development of the applications, and the data stewards, from what I can see at this show.
>> I think it's really just part of the adoption curve, right? When you're early on that curve, you have people who are very into the technology, understand how it works, and want to dive in there. So those tend to be, as you said, the data engineering types in this space. As that curve grows out, it comes wider and wider. There's still plenty of data engineers that are our customers, that are working with us, but as you said, the data analysts, the BI people, data scientists, data stewards, all those people are now starting to adopt it as well. And they need different tools than the data engineers do. They don't want to sit down and write Java code. Or, you know, some of the data scientists might want to work in Python in a notebook like Zeppelin or Jupyter, but some may want to use SQL, or even Tableau or something on top of SQL, to do the presentation. Of course, data stewards want tools more like Atlas to help manage all their stuff.
So that does drive us to, one, put more things into the toolkit, so you see the addition of projects like Apache Atlas and Ranger for security and all that. Another area of growth, I would say, is also the kind of data that we're focused on. So early on, we were focused on data at rest: you know, we're going to store all this stuff in HDFS. As the data scene has evolved, there's a lot more focus now on a couple of things. One is what we call data-in-motion, for our HDF product, where you've got a stream manager like Kafka or something like that.
>> (James) Right.
>> So there's processing that kind of data. But now we also see a lot of data in various places. It's not just, oh, okay, I have a Hadoop cluster on premise at my company. I might have some here, some on premise somewhere else, and I might have it in several clouds as well.
>> OK, your focus has shifted, like the industry in general, towards streaming data in multi-clouds, where it's more stateful interactions and so forth? I think you've made investments in Apache NiFi, so...
>> (Alan) Yes.
>> Give us a sense for NiFi versus Kafka and so forth inside of your product strategy.
>> Sure. So NiFi is really focused on that data at the edge, right? So you're bringing data in from sensors, connected cars, airplane engines, all those sorts of things that are out there generating data, and you need to figure out what parts of the data to move upstream and what parts not to. What processing can I do here so that I don't have to move it upstream? When I have an error event or a warning event, can I turn up the amount of data I'm sending in, right? Say this airplane engine is suddenly heating up maybe a little more than it's supposed to. Maybe I should ship more of the logs upstream when the plane lands and connects than I would otherwise. That's the kind of thing that Apache NiFi focuses on. I'm not saying it runs in all those places, but my point is, it's that kind of edge processing. Kafka is still going to be running in a data center somewhere. It's still a pretty heavyweight technology in terms of memory and disk space and all that, so it's not going to be run on some sensor somewhere. But it is that data-in-motion, right? I've got millions of events streaming through a set of Kafka topics, watching all that sensor data that's coming in from NiFi and reacting to it, maybe putting some of it in the data warehouse for later analysis, all those sorts of things. So that's kind of the differentiation there between Kafka and NiFi.
>> Right, right, right. So, going forward, do you see more of your customers working on internet of things projects? We don't often, at least in the popular mind, associate Hortonworks with edge computing and so forth. Is that changing?
>> I think that we will have more and more customers in that space. I mean, our goal is to help our customers with their data wherever it is.
>> (James) Yeah.
>> When it's on the edge, when it's in the data center, when it's moving in between, when it's in the cloud. All those places, that's where we want to help our customers store and process their data. Right? So, I wouldn't want to say that we're going to focus on just the edge or the internet of things, but that certainly has to be part of our strategy, 'cause it has to be part of what our customers are doing.
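Alan's airplane-engine example reduces to a simple edge rule: sample normal readings, ship everything during a warning event. A sketch of that logic in plain Python, standing in for what would really be a NiFi flow; the thresholds and field names are invented:

```python
# Sketch of the edge-filtering rule described above: ship a small sample of
# normal readings upstream, but turn the volume up when a component runs hot.
# Plain Python standing in for a NiFi flow; thresholds are invented.
import random

WARN_TEMP_C = 750.0          # hypothetical turbine temperature threshold
NORMAL_SAMPLE_RATE = 0.01    # ship 1% of readings when all is well
ALERT_SAMPLE_RATE = 1.0      # ship everything during a warning event

def should_ship_upstream(reading: dict) -> bool:
    if reading["temp_c"] >= WARN_TEMP_C:
        rate = ALERT_SAMPLE_RATE   # warning event: ship everything
    else:
        rate = NORMAL_SAMPLE_RATE  # steady state: ship a small sample
    return random.random() < rate

readings = [
    {"engine_id": "E1", "temp_c": 610.0},
    {"engine_id": "E1", "temp_c": 762.5},   # running hot: always shipped
]
upstream = [r for r in readings if should_ship_upstream(r)]
print(f"shipping {len(upstream)} of {len(readings)} readings upstream")
```

On the data-center side, events like these would land in the Kafka topics Alan mentions, where they can be consumed and reacted to at scale.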
>> When I think about the Hortonworks community, now we have to broaden our understanding, because you have a tight partnership with IBM, which obviously is well-established, huge, and global. Give us a sense for, as you guys have teamed more closely with IBM, how your community has changed or broadened or shifted in its focus. Or has it?
>> I don't know that it's shifted the focus. I mean, IBM was already part of the Hadoop community; they were already contributing. Obviously, they've contributed very heavily on projects like Spark and some of those, and they continue some of that contribution. So I wouldn't say that it's shifted it. It's just we are working more closely together as we both contribute to those communities, working more closely together to present solutions to our mutual customer base. But I wouldn't say it's really shifted the focus for us.
>> Right, right. Now, we're in Europe right now, but it doesn't matter that we're in Europe: GDPR is coming down fast and furious now. Data Steward Studio, we had the demonstration today; it was announced yesterday. And it looks like a really good tool for the main requirements for compliance, which are to discover and inventory your data and to set up what I like to refer to as a consent portal, so the data subject can go and make a request to have their data forgotten, and so forth. Give us a sense, going forward, for how or whether Hortonworks, IBM, and others in your community are going to work towards greater standardization in the functional capabilities of the tools and platforms for enabling GDPR compliance. 'Cause it seems to me that the industry is going to need some reference architecture for these kinds of capabilities, so that, going forward, your ecosystem of partners can build add-on tools in some common framework; the framework that was laid out today looks like a good basis. Is there anything that you're doing in terms of pushing towards more open source standardization in that area?
>> Yes, there is. So actually, one of my responsibilities is the technical management of our relationship with ODPI, which...
>> (James) Yes.
>> Mandy Chessell referenced yesterday in her keynote, and that is where we're working with IBM, with ING, with other companies to build exactly those standards. Right? Because we do want to build it around Apache Atlas. We feel like that's a good tool for the basis of that, but we know, one, that some people are going to want to bring their own tools to it. They're not necessarily going to want to use that one platform, so we want to do it in an open way, so that they can still plug in their metadata repositories and communicate with others. And we want to build, on top of that, the standards for how you properly implement the features that GDPR requires, like the right to be forgotten. Like, you know, what are the protocols around PII data? How do you prevent a breach? How do you respond to a breach?
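Since the standards discussion centers on Apache Atlas, here is a rough sketch of tagging a data set with a PII classification and then finding everything so tagged. The endpoints follow the Atlas V2 REST API as documented, but should be verified against your Atlas version; the host, credentials, and GUID are placeholders, and a PII classification type is assumed to have been defined already:

```python
# Rough sketch: attach a PII classification to an Atlas entity, then find
# everything carrying that tag. Endpoint paths follow the Atlas V2 REST API
# as documented; verify against your Atlas version. Host, credentials, and
# GUID are placeholders, and the "PII" classification type must already exist.
import requests

ATLAS = "http://atlas.example.com:21000/api/atlas/v2"
AUTH = ("admin", "admin")          # placeholder credentials
TABLE_GUID = "0b5a8e32-..."        # placeholder entity GUID

# 1. Classify: tag the table (e.g. a Hive table of customer records) as PII.
resp = requests.post(
    f"{ATLAS}/entity/guid/{TABLE_GUID}/classifications",
    json=[{"typeName": "PII"}],
    auth=AUTH,
)
resp.raise_for_status()

# 2. Discover: search for every entity classified as PII, the starting point
#    for right-to-be-forgotten and breach-response workflows.
resp = requests.get(
    f"{ATLAS}/search/basic",
    params={"classification": "PII"},
    auth=AUTH,
)
for entity in resp.json().get("entities", []):
    print(entity["typeName"], entity["attributes"].get("qualifiedName"))
```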
>> Will that all be under the umbrella of ODPI, that initiative of the partnership, or will it be a separate group?
>> Well, certainly Apache Atlas is part of Apache and remains so. What ODPI is really focused on is that next layer up: how do we engage, not the programmers, 'cause programmers can engage really well at the Apache level, but the next level up. We want to engage the data professionals, the people whose job it is, the compliance officers. The people who don't sit and write code, and frankly, if you connect them to the engineers, there's just going to be an impedance mismatch in that conversation.
>> You've got policy wonks and you've got tech wonks, so they understand each other at the wonk level.
>> That's a good way to put it. And so that's where ODPI really comes in: that group of compliance people who speak a completely different language. But we still need to get them all talking to each other, as you said, so that there are specifications around how we do this, and what compliance is.
>> Well, Alan, thank you very much. We're at the end of our time for this segment. This has been great; it's been great to catch up with you. Hortonworks has been evolving very rapidly, and it seems to me that, going forward, you're well positioned for the new GDPR age to take your overall solution portfolio, your partnerships, and your capabilities to the next level within an open source framework. Though, in many ways, you're not entirely, 100%, purely open source, like nobody is; you're still very much focused on open frameworks for building very scalable solutions for enterprise deployment. Well, this has been Jim Kobielus with Alan Gates of Hortonworks, here on theCUBE at DataWorks Summit 2018 in Berlin. We'll be back fairly quickly with another guest, and thank you very much for watching our segment. (techno music)

Published Date : Apr 19 2018

SUMMARY :

Alan Gates recounts Hortonworks' origins in the Yahoo team that drove much of Hadoop's development, and traces how the user base has widened along the adoption curve, from data engineers to analysts, data scientists, and data stewards, with the toolkit growing to match (Atlas, Ranger, notebooks, SQL front ends). He contrasts Apache NiFi's edge processing, deciding what sensor data to ship upstream, with Kafka's heavier data-in-motion role in the data center, and describes the ODPI work with IBM, ING, and others to build open standards around Apache Atlas for GDPR requirements such as the right to be forgotten.

SENTIMENT ANALYSIS :

ENTITIES

Entity | Category | Confidence
IBM | ORGANIZATION | 0.99+
James Kobielus | PERSON | 0.99+
Mandy Chessell | PERSON | 0.99+
Alan | PERSON | 0.99+
Yahoo | ORGANIZATION | 0.99+
Jim Kobielus | PERSON | 0.99+
Europe | LOCATION | 0.99+
Hortonworks | ORGANIZATION | 0.99+
Alan Gates | PERSON | 0.99+
four years | QUANTITY | 0.99+
James | PERSON | 0.99+
ING | ORGANIZATION | 0.99+
Berlin | LOCATION | 0.99+
yesterday | DATE | 0.99+
Apache | ORGANIZATION | 0.99+
SQL | TITLE | 0.99+
Java | TITLE | 0.99+
GDPR | TITLE | 0.99+
Python | TITLE | 0.99+
100% | QUANTITY | 0.99+
Berlin, Germany | LOCATION | 0.99+
SiliconANGLE Media | ORGANIZATION | 0.99+
DataWorks Summit | EVENT | 0.99+
Atlas | ORGANIZATION | 0.99+
DataWorks Summit 2018 | EVENT | 0.98+
Data Steward Studio | ORGANIZATION | 0.98+
today | DATE | 0.98+
one | QUANTITY | 0.98+
NiFi | ORGANIZATION | 0.98+
Dataworks Summit 2018 | EVENT | 0.98+
Hadoop | ORGANIZATION | 0.98+
one platform | QUANTITY | 0.97+
2018 | EVENT | 0.97+
both | QUANTITY | 0.97+
millions of events | QUANTITY | 0.96+
Hbase | ORGANIZATION | 0.95+
Tablo | TITLE | 0.95+
ODPI | ORGANIZATION | 0.94+
Big Data Analytics | ORGANIZATION | 0.94+
One | QUANTITY | 0.93+
theCUBE | ORGANIZATION | 0.93+
NiFi | COMMERCIAL_ITEM | 0.92+
day two | QUANTITY | 0.92+
about five | QUANTITY | 0.91+
Kafka | TITLE | 0.9+
Zeppelin | ORGANIZATION | 0.89+
Atlas | TITLE | 0.85+
Ranger | ORGANIZATION | 0.84+
Jupyter | ORGANIZATION | 0.83+
first | QUANTITY | 0.82+
Apache Atlas | ORGANIZATION | 0.82+
Hadoop | TITLE | 0.79+

Day Two Keynote Analysis | Dataworks Summit 2018


 

>> Announcer: From Berlin, Germany, it's theCUBE, covering DataWorks Summit Europe 2018. Brought to you by Hortonworks. (electronic music)
>> Hello and welcome to theCUBE on day two of DataWorks Summit 2018 from Berlin. It's been a great show so far. We have just completed the day two keynote, and in just a moment I'll bring you up to speed on the major points and presentations from it. It's been a great conference, fairly well attended; the hallway chatter and discussion have been great, and the breakouts have been stimulating.

For me the big takeaway is that Hortonworks, the show host, announced Data Steward Studio at yesterday's keynote. Scott Gnau, the CTO of Hortonworks, announced Data Steward Studio, DSS they call it, part of the Hortonworks DataPlane Services portfolio, and it could not be more timely, because we are now five weeks away from GDPR, the General Data Protection Regulation, becoming the law of the land. When I say the land, I mean the EU, but really any company that operates in the EU, and that includes many US-based and Apac-based and other companies, will need to comply with GDPR as of May 25th and ongoing, in terms of protecting the personal data of EU citizens. And that means a lot of different things.

Data Steward Studio, announced yesterday, was demoed today by Hortonworks, and it was a really excellent demo that showed it's a powerful solution for a number of things at the core of GDPR compliance. The demo covered, number one, the capability of the solution to discover and inventory personal data within a distributed data lake or enterprise data environment. Number two, the ability of the solution to centralize consent: to provide a consent portal, essentially, that data subjects can use to review the data that's kept on them, and to make fine-grained consents, or withdraw consents, for use of their data in profiling. And number three, they demonstrated the capability of the solution to execute data subjects' requests in terms of the handling of their personal data. Those are the three main points in terms of adding the teeth to enforce GDPR in an operational setting in any company that needs to comply with it.

So what we're going to see, I believe, in the whole global economy and in the big data space, is that Hortonworks and others in the data lake industry, and there are many others, are going to need to roll out similar capabilities in their portfolios, because their customers are absolutely going to demand it. In fact, the deadline is fast approaching; it's only five weeks away. One of the interesting takeaways from the keynote this morning was a quick poll of the audience by John Kreisa, the VP for marketing at Hortonworks, asking how ready they are to comply with GDPR as of May 25th, and it was a bit eye-opening. I wasn't surprised, but I think it was 19 or 20%, I don't have the numbers in front of me, who said that they won't be ready to comply. I believe somewhere between 20 and 30% said they will be able to comply, and a fair plurality, about 40%, don't quote me on that, said that they're preparing, which indicates that they're not entirely sure they will be able to comply 100% with the letter of the law as of May 25th. I think that's probably accurate in terms of ballpark figures; I know there are a lot of companies and users racing for compliance by that date.
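To picture what "discover and inventory personal data" means mechanically, here is a toy scan: a regex sweep over records that builds an inventory of where personal data appears. This illustrates the idea only; it is not how Data Steward Studio itself is implemented, and real products use far richer detection than two regexes:

```python
# Toy sketch of personal-data discovery: sweep records for likely PII
# (emails, phone numbers) and build an inventory keyed by data set and field.
# Illustrative only; real tools use far richer detection than regexes.
import re
from collections import defaultdict

PII_PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "phone": re.compile(r"\+?\d[\d\s()-]{7,}\d\b"),
}

def inventory_pii(datasets: dict) -> dict:
    """datasets maps a data set name to a list of {field: value} records."""
    found = defaultdict(set)
    for ds_name, records in datasets.items():
        for record in records:
            for field_name, value in record.items():
                for kind, pattern in PII_PATTERNS.items():
                    if isinstance(value, str) and pattern.search(value):
                        found[(ds_name, field_name)].add(kind)
    return dict(found)

sample = {
    "support_tickets": [
        {"id": "1", "body": "reach me at anna@example.com"},
        {"id": "2", "body": "call +49 30 1234 5678 after 5pm"},
    ]
}
print(inventory_pii(sample))
# {('support_tickets', 'body'): {'email', 'phone'}}  (set order may vary)
```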
And so GDPR is definitely the headline, the banner, the umbrella story around this event, and really around the big data community worldwide right now: in terms of enterprise investments, the compliance software, services, and capabilities needed to comply with GDPR are front and center. That was important, but it wasn't the only thing covered, in the keynotes or in the sessions here so far.

Clearly, AI and machine learning are the hot themes on the innovation side of big data. There's compliance, there's GDPR, but there's also innovation in terms of what enterprises are doing with their data and their analytics. They're building more and more AI and embedding it in conversational UIs and chatbots; they're embedding AI in all manner of e-commerce applications and in internal applications such as search, as well as in things like face recognition and voice recognition, and so forth and so on. So what we've seen here at the show, and what I've been seeing for quite some time, is that more of the actual developers working with big data are the data scientists of the world, and more of the traditional coders are getting up to speed very rapidly on the new state of the art for building machine learning, deep learning, AI, and natural language processing into their applications.

That said, Hortonworks has become a fairly substantial player in the machine learning space. Across their portfolio, many of the discussions here show that everybody is buzzing about getting up to speed on frameworks for building, deploying, iterating, and refining machine learning models in operational environments. So that's definitely a hot theme. There was an AI presentation this morning, from the first speaker who came on, that laid out the broad parameters of what developers are doing and looking to do with the data they maintain in their lakes: using training data both to build and train the models and to deploy them. So that was also something I expected, and it's good to see at DataWorks Summit that there is a substantial focus on that, in addition, of course, to GDPR and compliance.

It's been about seven years now since Hortonworks was essentially spun off of Yahoo, and I think about three years or so since they went IPO. What I can see is that they are making great progress, not just in the finances, but in customer acquisition, deal size, and customer satisfaction. I get a sense from talking to many of the attendees at this event that Hortonworks has become a fairly blue-chip vendor, and that they're in many ways continuing to grow the footprint of Hortonworks products and services with their partners, such as IBM. And from what I can see, everybody was rapt with attention around Data Steward Studio, and I sensed a sort of sigh of relief that it looks like a fairly good solution. So I have no doubt that a fair number of those in this hall right now are, as we say in the U.S., kicking the tires of DSS and will probably expedite their adoption of it.
So, with that said, we have day two here. In just a few minutes we'll have Alan Gates, one of the founders of Hortonworks, coming on, and I'll be interviewing him, asking about the vibrancy and health of the community, the Hortonworks ecosystem, developers, partners, and so forth, as well as, of course, the open source communities for Hadoop and Ranger and Atlas and so forth, the growing stack of open source code upon which Hortonworks has built its substantial portfolio of solutions. Following him we'll have John Kreisa, the VP for marketing. I'm going to ask John to give us an update on the health of Hortonworks as a business, in terms of their outreach to the community and their messaging, obviously, and have him really position Hortonworks in the community in terms of who he sees them competing with and what segments Hortonworks is in now. The whole Hadoop segment, increasingly... Hadoop is there; it's the foundation. But the word is not invoked in the context of discussions of Hortonworks as much now as it was in the past. And the same goes for, say, Cloudera, one of their closest traditional rivals, closest in the sense that people associate them. I was at the Cloudera analyst event the other week in Santa Monica, California, and it was the same thing. I think both of these vendors are on a similar path to become fairly substantial data warehousing and data governance suppliers to the enterprises of the world that have traditionally gone with the likes of IBM and Oracle and SAP and so forth. So I think Hortonworks has definitely evolved into a far more diversified solution provider than people realize, and that's really one of the takeaways from Dataworks Summit.

With that said, this is Jim Kobielus, the lead analyst (I should've said that at the outset) on SiliconANGLE Media's Wikibon team, focused on big data analytics, and your host this week on theCUBE at Dataworks Summit Berlin. I'll close out this segment, and we'll get ready to talk to the Hortonworks and IBM personnel; I understand there's a gentleman from Accenture on the Cube as well today here at Dataworks Summit Berlin. (electronic music)

Published Date : Apr 19 2018

SUMMARY :

Day-two keynote analysis from Berlin: Data Steward Studio, announced at the keynote and demoed today, addresses the core of GDPR compliance by discovering and inventorying personal data, centralizing consent, and executing data subject requests, five weeks ahead of the May 25th deadline. An audience poll suggested a sizeable share of organizations will not be fully ready to comply. Beyond compliance, AI and machine learning remain the show's innovation themes, and Hortonworks has evolved into a far more diversified solution provider than people realize.

SENTIMENT ANALYSIS :

ENTITIES

Entity | Category | Confidence
Jim Kobielus | PERSON | 0.99+
John Kreisa | PERSON | 0.99+
Hortonworks | ORGANIZATION | 0.99+
Scott Gnau | PERSON | 0.99+
IBM | ORGANIZATION | 0.99+
John | PERSON | 0.99+
Cloudera | ORGANIZATION | 0.99+
May 25th | DATE | 0.99+
Berlin | LOCATION | 0.99+
Yahoo | ORGANIZATION | 0.99+
five weeks | QUANTITY | 0.99+
Alan Gates | PERSON | 0.99+
Oracle | ORGANIZATION | 0.99+
Hotronworks | ORGANIZATION | 0.99+
Data Steward Studio | ORGANIZATION | 0.99+
General Data Protection Regulation | TITLE | 0.99+
Santa Monica, California | LOCATION | 0.99+
GDPR | TITLE | 0.99+
19 | QUANTITY | 0.99+
both | QUANTITY | 0.99+
100% | QUANTITY | 0.99+
today | DATE | 0.99+
20% | QUANTITY | 0.99+
one | QUANTITY | 0.99+
yesterday | DATE | 0.99+
U.S. | LOCATION | 0.99+
DSS | ORGANIZATION | 0.99+
30% | QUANTITY | 0.99+
Berlin, Germany | LOCATION | 0.98+
Dataworks Summit 2018 | EVENT | 0.98+
three main points | QUANTITY | 0.98+
Atlas | ORGANIZATION | 0.98+
20 | QUANTITY | 0.98+
about seven years | QUANTITY | 0.98+
Accenture | ORGANIZATION | 0.97+
SiliconANGLE | ORGANIZATION | 0.97+
One | QUANTITY | 0.97+
about three years | QUANTITY | 0.97+
Day Two | QUANTITY | 0.97+
first gentleman | QUANTITY | 0.96+
day two | QUANTITY | 0.96+
SAP | ORGANIZATION | 0.96+
EU | LOCATION | 0.95+
Datawork Summit Europe 2018 | EVENT | 0.95+
Dataworks Summit | EVENT | 0.94+
this morning | DATE | 0.91+
About 40% | QUANTITY | 0.91+
Wikibon | ORGANIZATION | 0.9+
EU | ORGANIZATION | 0.9+