Steve Wooledge, Arcadia Data & Satya Ramachandran, Neustar | DataWorks Summit 2018


 

(upbeat electronic music) >> Live from San Jose, in the heart of Silicon Valley, it's theCUBE. Covering DataWorks Summit 2018, brought to you by Hortonworks. (electronic whooshing) >> Welcome back to theCUBE's live coverage of DataWorks, here in San Jose, California. I'm your host, Rebecca Knight, along with my co-host, James Kobielus. We have two guests in this segment, we have Steve Wooledge, he is the VP of Product Marketing at Arcadia Data, and Satya Ramachandran, who is the VP of Engineering at Neustar. Thanks so much for coming on theCUBE. >> Our pleasure and thank you. >> So let's start out by setting the scene for our viewers. Tell us a little bit about what Arcadia Data does. >> Arcadia Data is focused on getting business value from these modern scale-out architectures, like Hadoop, and the Cloud. We started in 2012 to solve the problem of how do we get value into the hands of the business analysts that understand a little bit more about the business, in addition to empowering the data scientists to deploy their models and value to a much broader audience. So I think that's been, in some ways, the last mile of value that people need to get out of Hadoop and data lakes, is to get it into the hands of the business. So that's what we're focused on. >> And start seeing the value, as you said. >> Yeah, seeing is believing, a picture is a thousand words, all those good things. And what's really emerging, I think, is companies are realizing that traditional BI technology won't solve the scale and user concurrency issues, because architecturally, big data's different, right? We're on the scale-out, MPP architectures now, like Hadoop, the data complexity and variety has changed, but the BI tools are still the same, and you pull the data out of the system to put it into some little micro cube to do some analysis. Companies want to go after all the data, and view the analysis across a much broader set, and that's really what we enable. >> I want to hear about the relationship between your two companies, but Satya, tell us a little about Neustar, what you do. >> Neustar is an information services company, we are built around identity. We are the premier identity provider, the most authoritative identity provider for the US. And we built a whole bunch of services around that identity platform. I am part of the marketing solutions group, and I head the analytics engineering for marketing solutions. The product that I work on helps marketers do their annual planning, as well as their campaign or tactical planning, so that they can fine tune their campaigns on an ongoing basis. >> So how do you use Arcadia Data's primary product? >> So we are a predictive analytics platform, the reporting solution, we use Arcadia for the reporting part of it. So we have multi terabytes of advertising data in our values, and so we use Arcadia to provide fast access to our customers, and also very granular and explorative analysis of this data. High (mumbles) and explorative analysis of this data. >> So you say you help your customers with their marketing campaigns, so are you doing predictive analytics? And are you doing churn analysis and so forth? And how does Arcadia fit into all of that? >> So we get data and then they build an activation model, which tells how the marketing spend corresponds to the revenue. We not only do historical analysis, we also do predictive, in the sense that the marketers frequently do what-if analysis, saying that, what if I moved my budget from paid search to TV?
And how does it affect the revenue? So all of this modeling is built by Neustar, the modeling platform is built by Neustar, but the last mile of taking these reports and providing this explorative analysis of the results, that is provided by the reporting solution, which is Arcadia. >> Well, I mean, the thing about data analytics, is that it really is going to revolutionize marketing. That famous marketing adage of, I know my advertising works, I just don't know which half, and now we're really going to be able to figure out which half. Can you talk a little bit about return on investment and what your clients see? >> Sure, we've got some major Fortune 500 companies that have said publicly that they've realized over a billion dollars of incremental value. And that could be across both marketing analytics, and how we better treat our messaging, our brand, to reach our intended audience. There's things like supply chain and being able to do more realtime what-if analysis for different routes, it's things like cyber security and stopping fraud and waste and things like that at a much grander scale than what was really possible in the past. >> So we're here at DataWorks and it's the Hortonworks show. Give us a sense of the degree of your engagement or partnership with Hortonworks and participation in their partner ecosystem. >> Yeah, absolutely. Hortonworks is one of our key partners, and what we did that's different architecturally, is we built our BI server directly into the data platforms. So what I mean by that is, we take the concept of a BI server, we install it and run it on the data nodes of Hortonworks Data Platform. We inherit the security directly out of systems like Apache Ranger, so that all that administration and scale is done at Hadoop economics, if you will, and it leverages the things that are already in place. So that has huge advantages both in terms of scale, but also simplicity, and then you get the performance, the concurrency that companies need to deploy out to like, 5,000 users directly on that Hadoop cluster. So, Hortonworks is a fantastic partner for us and a large number of our customers run on Hortonworks, as well as other platforms, such as Amazon Web Services, where Satya's got his system deployed. >> At the show they announced Hortonworks Data Platform 3.0. There's containerization there, there's updates to Hive to enable it to be more of a realtime analytics engine, and also a data warehousing engine. In Arcadia Data, do you follow their product enhancements, in terms of your own product roadmap with any specific, fixed cycle? Are you going to be leveraging the new features in HDP 3.0 going forward to add value to your customers' ability to do interactive analysis of this data in close to realtime? >> Sure, yeah, no, because we're a native-- >> 'Cause marketing campaigns are increasingly realtime, especially when you're using, you know, you got a completely digital business. >> Yeah, absolutely. So we benefit from the innovations happening within the Hortonworks Data Platform. So, because we're a native BI tool that runs directly within that system, you know, with changes in Hive, or different things within HDFS, in terms of performance or compression and things like that, our customers generally benefit from that directly, so yeah. >> Satya, going forward, what are some of the problems that you want to solve for your clients? What are their biggest pain points and where do you see Neustar? >> So, data is the new oil, right?
So, marketers, also for them now, data is the biggest thing they're going after. They want faster analysis, they want to be able to get to insights as fast as they can, and they obviously want to work on as large an amount of data as possible. The variety of sources is becoming higher and higher and higher, in terms of marketing. There used to be a few channels in the '70s and '80s, and the '90s kind of increased that, now you have like, hundreds of channels, if not thousands of channels. And they want visibility across all of that. It's the ability to work across this variety of data, increasing volume at a very high speed. Those are the high level challenges that we have at Neustar. >> Great. >> So the difference, marketing attribution analysis you say is one of the core applications of your solution portfolio. How is that more challenging now than it had been in the past? We have far more marketing channels, digital and so forth, then how does the state-of-the-art of marketing attribution analysis, how is it changing to address this multiplicity of channels and media for advertising and for influencing the customer on social media and so forth? And then, you know, can you give us a sense for then, what are the necessary analytical tools needed for that? We often hear about social graph analysis or semantic analysis, or behavioral analytics and so forth, all of this makes it very challenging. How can you determine exactly what influences a customer now in this day and age, where, you think, you know, Twitter is an influencer over the conversation. How can you nail that down to specific, you know, KPIs or specific things to track? >> So I think, from our, like you pointed out, the variety is increasing, right? And I think the marketers now have a lot more options than what they had, and that's a blessing, and it's also a curse. Because then I don't know where I'm going to move my marketing spending to. So, attribution right now, is still sitting at the headquarters, it's kind of sitting at a very high level and it is answering questions. Like we said, with the Fortune 100 companies, it's still answering questions to the CMOs, right? Where attribution will take us next is to then go lower down, where it's able to answer the regional headquarters on what needs to happen, and more importantly, on every store, I'm able to then answer and tailor my attribution model to a particular store. Let's take Ford for an example, right? Now, instead of the CMO suite, if I'm able to go to every dealer, and I'm able to personalize my attribution to that particular dealer, then it becomes a lot more useful. The challenge there is it all needs to be connected. Whatever model we are working on for the dealer, needs to be connected up to the headquarters. >> Yes, and that personalization, it very much leverages the kind of things that Steve was talking about at Arcadia. Being able to analyze all the data to find those micro, micro, micro segments that can be influenced to varying degrees, so yeah. I like where you're going with this, 'cause it very much relates to the power of distributed federated big data fabrics like Hortonworks offers. >> And so streaming analytics is coming to the forefront, and it's been talked about for the longest period of time, but we have real use cases for streaming analytics right now. Similarly, the data volumes are, indeed, becoming a lot larger. So both of them are doing a lot more right now. >> Yes. >> Great.
>> Well, Satya and Steve, thank you so much for coming on theCUBE, this was really, really fun talking to you. >> Excellent. >> Thanks, it was great to meet you. Thanks for having us. >> I love marketing talk. >> (laughs) It's fun. I'm Rebecca Knight, for James Kobielus, stay tuned to theCUBE, we will have more coming up from our live coverage of DataWorks, just after this. (upbeat electronic music)
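A rough Python sketch of the kind of what-if budget reallocation Satya describes above. The diminishing-returns response curves, channel names, and dollar figures are invented for illustration; they are not Neustar's actual activation model, which is fit from historical advertising data:

    import math

    # Hypothetical diminishing-returns curves: modeled revenue contribution
    # per channel as a function of spend. All coefficients are made up.
    CHANNELS = {
        "paid_search": lambda spend: 3.2 * math.log1p(spend / 1000.0),
        "tv": lambda spend: 5.0 * math.log1p(spend / 5000.0),
    }

    def projected_revenue(budget):
        """Sum each channel's modeled contribution (arbitrary units)."""
        return sum(curve(budget.get(name, 0.0)) for name, curve in CHANNELS.items())

    baseline = {"paid_search": 50000, "tv": 100000}
    scenario = {"paid_search": 30000, "tv": 120000}  # what if $20k moves to TV?

    print("baseline revenue index:", round(projected_revenue(baseline), 2))
    print("scenario revenue index:", round(projected_revenue(scenario), 2))

The reporting layer's job, per the conversation above, is then to let an analyst explore many such scenarios interactively over the full dataset.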

Published Date : Jun 20 2018

Partha Seetala, Robin Systems | DataWorks Summit 2018


 

>> Live from San Jose, in the heart of Silicon Valley, it's theCUBE. Covering DataWorks Summit 2018. Brought to you by Hortonworks. >> Welcome back everyone, you are watching day two of theCUBE's live coverage of DataWorks here in San Jose, California. I'm your host, Rebecca Knight. I'm coming at you with my cohost James Kobielus. We're joined by Partha Seetala, he is the Chief Technology Officer at Robin Systems, thanks so much for coming on theCUBE. >> Pleasure to be here. >> You're a first timer, so we promise we don't bite. >> Actually I'm not, I was on theCUBE- >> Oh! >> At DockerCon in 2016. >> Oh well excellent, okay, so now you're a veteran, right. >> Yes, ma'am. >> So Robin Systems, as before the cameras were rolling, we were talking about it, it's about four years old, based here in San Jose, venture backed company. Tell us a little bit more about the company and what you do. >> Absolutely. First of all, thanks for hosting me here. Like you said, Robin is a Silicon Valley based company. Our focus is in allowing applications, such as big data, databases, NoSQL and AI/ML, to run within the Kubernetes platform. What we have built is a product that converges storage, complex storage, networking, application workflow management, along with Kubernetes to create a one click experience where users can get a managed services kind of feel when they're deploying these applications. They can also do one click life cycle management on these apps. Our thesis has initially been to, instead of looking at this problem from the infrastructure up into the application, to actually look at it from the applications down and then say, "Let the applications drive the underlying infrastructure to meet the user's requirements." >> Is that your differentiating factor, would you say? >> Yeah, I think it is because most of the folks out there today are looking at it as if it's a component based play, it's like they want to bring storage to Kubernetes or networking to Kubernetes but the challenges are not really around storage and networking. If you talk to the operations folks they say that, "You know what? Those are underlying problems but my challenge is more along the lines of, okay, my CIO says the initiative is to make my applications mobile. They want to go across different Clouds. That's my challenge." The line of business user says, "I want to get a managed services experience." Yes, storage is the thing that you want to manage underneath, but I want to go and click and create my, let's say, an Oracle database or distributions log. >> In terms of the developer experience here, from the application down, give us a sense for how Robin Systems' tooling, your product, enables that degree of specification of the application logic that will then get containerized within? >> Absolutely, like I said, we want applications to drive the infrastructure. What it means is that we, Robin is a software platform. We layer ourselves on top of the machines that we sit on whether it is bare metal machines on premises, or VMs, or even in Azure, Google Cloud as well as AWS. Then we make the underlying compute, storage, network resources almost invisible. We treat it as a pool of resources. Now once you have this pool of resources, they can be attached to the applications that are being deployed inside containers. I mean, it's a software play, installed on machines. Once it's installed, the experience now moves away from infrastructure into applications.
You log in, you can see a portal, you have a lot of applications in that portal. We ship support for about 25 applications or some such. >> So these are templates? >> Yes. >> That the developer can then customize to their specific requirements? Or no? >> Absolutely, we ship reference templates for pretty much a wide variety of the most popular big data, NoSQL, database, AI/ML applications today. But again, as I said, it's a reference implementation. Typically customers take the reference recommendation and they enhance it or they use that to onboard their custom apps, for example, or the apps that we don't ship out of the box. So it's a very open, extensible platform but the goal being that whatever the application might be, in fact we keep saying that, if it runs somewhere else, it runs on Robin, right? So the idea here is that you can bring anything, and with just the flip of a switch, you can make it a one click deploy, one click manage, one click mobile across Clouds. >> You keep mentioning this one click and this idea of it being so easy, so convenient, so seamless, is that what you say is the biggest concern of your customers? Is this ease and speed? Or what are some other things that are on their minds that you want to deliver? >> Right, so one click of course is a user experience part but what is the real challenge? The real challenges, there are a wide variety of tools being used by enterprises today. Even in the data analytics pipeline, there's a lot across the data store and processing pipeline. Users don't want to deal with setting it up and keeping it up and running. They don't want that, they want to get the job done, right? Now when you just want to get the job done, you really want to hide the underlying details of those platforms and the best way to convey that, the best way to give that experience is to make it a single click experience from the UI. So I keep calling it all one click because that is the experience that you get to hide the underlying complexity for these apps. >> Does your environment actually compile executable code based on that one click experience? Or where does the compilation and containerization actually happen in your distributed architecture? >> Alright, so, I think the simplest- >> You're a prem based offering, right? You're not in the Cloud yourself? >> No, we are. We work on all the three big public clouds. >> Oh, okay. >> Whether it is Azure, AWS or Google. >> So your entire application is containerized itself for deployment into these Clouds? >> Yes, it is. >> Okay. >> So the idea here is let's simplify it significantly, right? You have Kubernetes today, it can run anywhere, on premises, in the public Cloud and so on. Kubernetes is a great platform for orchestrating containers but it is largely inaccessible to a certain class of data centric applications. >> Yeah. >> We make that possible. But our take is, just onboarding those applications on Kubernetes does not solve your CXO's or your line of business user's problems. You ought to make the management, from an application point of view, not from a container management point of view, from an application point of view, a lot easier and that is where we kind of create this experience that I'm talking about, one click experience. >> Give us a sense for how, we're here at DataWorks and it's the Hortonworks show.
Discuss with us your partnership with Hortonworks and you know, we've heard the announcement of HDP 3.0 and containerization support, just give us a rough sense for how you align or partner with Hortonworks in this area. >> Absolutely. It's kind of interesting because Hortonworks is a data management platform, if you think about it from that point of view and when we engaged with them first- So some of our customers have been using the product, Hortonworks, on top of Robin, so orchestrating Hortonworks, making it a lot easier to use. >> Right. >> One of the requirements was, "Are you certified with Hortonworks?" And the challenge that Hortonworks also had is they had never certified a container based deployment of Hortonworks before. They actually were very skeptical, you know, "You guys are saying all these things. Can you actually containerize and run Hortonworks?" So we worked with Hortonworks and we are, I mean if you go to the Hortonworks website, you'll see that we are the first in the entire industry who have been certified as a container based play that can actually deploy and manage Hortonworks. They have certified us by running a wide variety of tests, which they call the QATS test suite, and when we got certified the only other players in the market that got that stamp of approval were Microsoft in Azure and EMC with Isilon. >> So you're in good company? >> I think we are in great company. >> You're certified to work with HDP 3.0 or the prior version or both? >> When we got certified we were still on the 2.X version of Hortonworks, HDP 3.0 is a relatively newer version. But our plan is that we want to continue working with Hortonworks to get certified as they release the program and also help them because HDP 3.0 also has some container based orchestration and deployment so we want to help them provide the underlying infrastructure so that it becomes easier for YARN to spin up more containers. >> The higher level security and governance and all these things you're describing, they have to be over the Kubernetes layer. Hortonworks supports it in their data plane services portfolio. Does Robin Systems' solution portfolio tap into any of that, or do you provide your own layer of sort of security and metadata management and so forth? >> Yeah, so we don't want- >> In context of what you offer? >> Right, so we don't want to take away the security model that the application itself provides because they might have set it up so that they are doing governance, it's not just logging in and access control and things like this. Some governance is built in. We don't want to change that. We want to keep the same experience and the same workflow that customers have so we just integrate with whatever security the application has. We, of course, provide security in terms of isolating these different apps that are running on the Robin platform where the security or the access into the application itself is left to the apps themselves. When I say apps, I'm talking about Hortonworks.
CIOs want their applications along with their data to be available where their end users are. It's almost like a follow-the-sun model, where you might have generated the data in one Cloud and at a different time, different time zone, you'll basically want to keep the app as well as data, moving. So we are following that very closely. How we can enable the mobility of data and apps a lot easier in that world. The other one is around the general AI/ML workflow. One of the challenges there, of course, you have great apps like TensorFlow or Theano or Caffe, these are very good AI/ML toolkits but one of the challenges that people face, is they are buying this very expensive, let's say NVIDIA DGX Box, these boxes cost about $150,000 each, how do you keep these boxes busy so that you're getting a good return on investment? It will require you to better manage the resources offered by these boxes. We are also monitoring that space and we're seeing how we can take the Robin platform and enable better utilization of GPUs or the sharing of GPUs for running your AI/ML kind of workloads. >> Great. >> Those are, I think, two key trends that we are closely watching. >> We'll be discussing those at the next DataWorks Summit, I'm sure, at some other time in the future. >> Absolutely. >> Thank you so much for coming on theCUBE, Partha. >> Thank you. >> Thank you, my pleasure. Thanks. >> I'm Rebecca Knight for James Kobielus. We will have more from DataWorks coming up in just a little bit. (techno beat music)
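For a sense of the plumbing that Robin's one-click experience abstracts away, here is a minimal sketch using the official Kubernetes Python client to deploy a containerized data workload. The image, labels, and namespace are placeholders, and this is plain Kubernetes, not Robin's actual API:

    from kubernetes import client, config

    config.load_kube_config()  # reads cluster credentials from ~/.kube/config

    container = client.V1Container(
        name="demo-db",
        image="postgres:11",  # placeholder data workload
        ports=[client.V1ContainerPort(container_port=5432)],
    )
    template = client.V1PodTemplateSpec(
        metadata=client.V1ObjectMeta(labels={"app": "demo-db"}),
        spec=client.V1PodSpec(containers=[container]),
    )
    spec = client.V1DeploymentSpec(
        replicas=1,
        selector=client.V1LabelSelector(match_labels={"app": "demo-db"}),
        template=template,
    )
    deployment = client.V1Deployment(
        api_version="apps/v1",
        kind="Deployment",
        metadata=client.V1ObjectMeta(name="demo-db"),
        spec=spec,
    )
    client.AppsV1Api().create_namespaced_deployment(namespace="default", body=deployment)

A platform like Robin layers storage, networking, and application-aware lifecycle management on top of this so the end user never writes the manifest; the sketch only marks where that automation sits.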

Published Date : Jun 20 2018

Rob Bearden, Hortonworks | DataWorks Summit 2018


 

>> Live from San Jose in the heart of Silicon Valley, it's theCUBE covering DataWorks Summit 2018, brought to you by Hortonworks. >> Welcome back to theCUBE's live coverage of DataWorks Summit here in San Jose, California. I'm your host, Rebecca Knight, along with my co-host, James Kobielus. We're joined by Rob Bearden. He is the CEO of Hortonworks. So thanks so much for coming on theCUBE again, Rob. >> Thank you for having us. >> So you just got off the keynote on the main stage. The big theme is really about modern data architecture. So we're going to have this modern data architecture. What is it all about? How do you think about it? What's your approach? And how do you walk customers through this process? >> Well, there's a lot of moving parts in enabling a modern data architecture. One of the first steps is what we're trying to do is unlock the siloed transactional applications, and to get that data into a central architecture so you can get real time insights around the inclusive dataset. But what we're really trying to accomplish then within that modern data architecture is to bring all types of data whether it be real time streaming data, whether it be sensor data, IoT data, whether it be data that's coming from a connected car across the network, and to be able to bring all that data together in real time, and give the enterprise the ability to be able to take best in class action so that you get a very prescriptive outcome of what you want. So if we bring that data under management from point of origination and out on the edge, and then have the platforms that move that through its entire lifecycle, and that's our HDF platform, it gives the customer the ability to, after they capture it at the edge, move it, and then have the ability to process it as an event happens, a condition changes, various conditions come together, have the ability to process and take the exact action that you want to see performed against that, and then bring it to rest, and that's where our HDP platform comes into play where then all that data can be aggregated so you can have a holistic insight, and have real time interactions on that data. But then it becomes about deploying those datasets and workloads on the tier that's most economically and architecturally pragmatic. So if that's on-prem, we make sure that we are architected for that on-prem deployment or private cloud or even across multiple public clouds simultaneously, and give the enterprise the ability to support each of those native environments. And so we think hybrid cloud architecture is really where the vast majority of our customers today and in the future, are going to want to be able to run and deploy their applications and workloads. And that's where our DataPlane Service Offering gives them the ability to have that hybrid architecture and the architectural latitude to move workloads and datasets across each tier transparently to what storage file format they need or where that application is, and we provide all the tooling to mask the complexity of doing that, and then we ensure that it has one common security framework, one common governance through its entire lifecycle, and one management platform to handle that entire data lifecycle.
And that's the modern data architecture: to be able to bring all data under management, all types of data under management, and manage that in real time through its lifecycle til it comes to rest, and deploy that across whatever architecture tier is most appropriate financially and from a performance standpoint, on cloud or on-prem. >> Rob, this morning at the keynote here in day one at DataWorks San Jose, you presented this whole architecture that you described in the context of what you call hybrid clouds to enable connected communities and with HDP, Hortonworks Data Platform 3.0 is one of the prime announcements, you brought containerization into the story. Could you connect those dots, containerization, connected communities, and HDP 3.0? >> Well, HDP 3.0 is really the foundation for enabling that hybrid architecture natively, and what it's done is it separated the storage from the compute, and so now we have the ability to deploy those workloads via a container strategy across whichever tier makes the most sense, and to move those applications and datasets around, and to be able to leverage each tier in the deployment architectures that are most pragmatic. And then what that lets us do is be able to bring all of the different data types, whether it be customer data, supply chain data, product data. So imagine as an industrial piece of equipment is, an airplane is flying from Atlanta, Georgia to London, and you want to be able to make sure you really understand how well each component is performing, so that if that plane is going to need service when it gets there, it doesn't miss the turnaround and leave 300 passengers stranded or delayed, right? Now with our Connected platform, we have the ability to take every piece of data from every component that's generated and see that in real time, and let the airlines make that real time. >> Delineate essentially. >> And ensure that we know every person that touched it and looked at that data through its entire lifecycle, from the ground crew to the pilots to the operations team to the service folks on the ground to the reservation agents, and we can prove that if somehow that data has been breached, that we know exactly at what point it was breached and who did or didn't get to see it, and can prevent that because of the security models that we put in place. >> And that relates to compliance and mandates such as the General Data Protection Regulation, GDPR, in the EU. At DataWorks Berlin a few months ago, you laid out, Hortonworks laid out, announced a new product called the Data Steward Studio to enable GDPR compliance. Can you give our listeners now who may not have been following the Berlin event a bit of an update on Data Steward Studio, how it relates to the whole data lineage, or set of requirements that you're describing, and then going forward what is Hortonworks's roadmap for supporting the full governance lifecycle for the Connected community, from data lineage through like model governance and so forth. Can you just connect a few dots that will be helpful? >> Absolutely. What's important certainly, driven by GDPR, is the requirement to be able to prove that you understand who's touched that data and who has not had access to it, and that you ensure that you're in compliance with the GDPR regulations which are significant, but essentially what they say is you have to protect the personal data and attributes of that data of the individual.
And so what's very important is that you've got to be able to have the systems that not just secure the data, but understand who has had the accessibility at any point in time that you've ever maintained that individual's data. And so it's not just about when you've had a transaction with that individual, but it's the rest of the history that you've kept or the multiple datasets that you may try to correlate to try to expand the relationship with that customer, and you need to make sure that you can ensure not only that you've secured their data, but then you're protecting and governing who has access to it and when. And as importantly that you can prove in the event of a breach that you had control of that, and who did or did not access it, because if you can't prove, in the event of any breach, that it was secure and that no one who wasn't supposed to have access to it breached it, you can be opened up for hundreds of thousands of dollars or even multiple millions of dollars of fines just because you can't prove that it was not accessed, and that's what the variety of our platforms, you mentioned Data Studio, is part of. DataPlane is one of the capabilities that gives us the ability. The core engine that does that is Atlas, and that's the open source governance platform that we developed through the community that really drives all the capabilities for governance that move through each of our products, HDP and HDF, and then of course DataPlane and Data Studio take advantage of that in how they move and replicate data and manage that process for us. >> One of the things that we were talking about before the cameras were rolling was this idea of data driven business models, how they are disrupting current contenders, new rivals coming on the scene all the time. Can you talk a little bit about what you're seeing and what are some of the most exciting and maybe also some of the most threatening things that you're seeing? >> Sure, in the traditional legacy enterprise, it's very procedural driven. You think about classic Encore ERP. It's worked very hard to have a very rigid, very structured, procedural order-to-cash cycle that has not a great deal of flexibility. And it takes you through a design process, it builds product, then you sell that product to a customer, and then you service that customer, and then you learn from that transaction different ways to automate or improve efficiencies in their supply chain. But it's very procedural, very linear. And in the new world of connected data models, you want to bring transparency and real time understanding and connectivity between the enterprise, the customer, the product, and the supply chain, so that you can take real time best in practice action. So for example you understand how well your product is performing. Is your customer using it correctly? Are they frustrated with that? Are they using it in the patterns and the frequency that they should be if they are going to expand their use and buy more, and if they're not, how do we engage in that cycle? How do we understand if they're going through a re-review and another buying of something similar that may not be with you for a different reason. And when we have real time visibility to our customer's interaction, understand our product's performance through its entire lifecycle, then we can bring real time efficiency with linking those together with our supply chain into the various relationships we have with our customers.
To do that, it requires the modern data architecture, bringing data under management from the point it originates, whether it's from the product or the customer interacting with the company, or the customer interacting potentially with our ecosystem partners, mutual partners, and then letting best in practice supply chain techniques make sure that we're bringing the highest level of service and support to that entire lifecycle. And when we bring data under management, manage it through its lifecycle and have the historical view at rest, and leverage that across every tier, that's when we get this high velocity, deep transparency, and connectivity between each of the constituents in the value chain, and that's what our platforms give them the ability to do. >> Not only your platform, you guys have been in business now for I think seven years or so, and you shifted, in the minds of many and including your own strategy, from being the premier data at rest company in terms of a Hadoop platform to being one of the premier data in motion companies. Is that really where you're going? To be more of a completely streaming-focused solution provider in a multi-cloud environment? And I hear a lot of Kafka in your story now that it's like, oh yeah, that's right, Hortonworks is big on Kafka. Can you give us just a quick sense of how you're making that shift towards low latency real time streaming, big data, or small data for that matter, with embedded analytics and machine learning? >> So, we have evolved from certainly being the leader in global data platforms with all the work that we do collaboratively, and through the community, to make Hadoop an enterprise viable data platform that has the ability to run mission critical workloads and apps at scale, ensuring that it has all the enterprise facilities from security and governance and management. But you're right, we have expanded our footprint aggressively. And we saw the opportunity to actually create more value for our customers by giving them the ability to not wait til they bring data under management to gain an insight, because in that case, they happen to be reactive, post-event, post-transaction. We want to give them the ability to shift their business model to being interactive, pre-event, pre-conditioned. The way to do that, we learned, was to be able to bring the data under management from the point of origination, and that's what we use MiNiFi and NiFi for, and then HDF, to move it through its lifecycle, and to your point, we have the intellect, we have the insight, and then we have the ability to process the best in class outcome based on what we know the variables are we're trying to solve for as that's happening. >> And there's the word, the phrase ACID, which of course is a transactional data paradigm, and I hear that all over your story now in streaming. So, what you're saying is it's a completely enterprise-grade streaming environment from end to end for the new era of edge computing. Would that be a fair way of-- >> It's very much so. And our model and strategy has always been to bring the other best in class engines for what they do well for their particular dataset. A couple of examples of that, one, you brought up Kafka, another is Spark. And they do what they do really well.
But what we do is make sure that they fit inside an overall data architecture that then embodies their access to a much broader central dataset that goes from point of origination to point of rest on a whole central architecture, and then benefit from our security, governance, and operations model, being able to manage those engines. So what we're trying to do is eliminate the silos for our customers, and the siloed datasets that just do particular functions. We give them the ability to have an enterprise modern data architecture, we manage the things that bring that forward for the enterprise to have the modern data driven business models by bringing the governance, the security, the operations management, and ensure that those workflows go from beginning to end seamlessly. >> Do you, go ahead. >> So I was just going to ask about the customer concerns. So here you are, you've now given them this ability to make these real time changes, what's sort of next? What's on their mind now and what do you see as the future of what you want to deliver next? >> First and foremost we've got to make sure we get this right, and we really bring this modern data architecture forward, and make sure that we truly have the governance correct, the security models correct. One pane of glass to manage this. And really enable that hybrid data architecture, and let them leverage the cloud tier where it's architecturally and financially pragmatic to do it, and give them the ability to leg into a cloud architecture without risk of either being locked in or misunderstanding where the lines of demarcation of workloads or datasets are, and not getting the economies or efficiencies they should. And we solved that with DataPlane. So we're working very hard with the community, with our ecosystem and strategic partners to make sure that we're enabling the ability to bring each type of data from any source and deploy it across any tier with a common security, governance, and management framework. So then what's next is now that we have this high velocity of data through its entire lifecycle on one common set of platforms, then we can start enabling the modern applications to function. And we can go look back into some of the legacy technologies that are very procedural based and are dependent on a transaction or an event happening before they can run their logic to get an outcome, because that leaves the customer in post-event activity. We want to make sure that we're bringing that kind of, for example, supply chain functionality, to the modern data architecture, so that we can do real time inventory allocation based on the patterns that our customers are in, either how they're using the product, or frustrations they've had, or success they've had. And we know through artificial intelligence and machine learning that there's a high probability not only that they will buy or use or expand their consumption of whatever they have of our product or service, but they will probably do these other things as well if we do those things. >> Predict the logic as opposed to procedural, yes, AI. >> And very much so. And so what's next will be bringing the modern applications on top of this that become very predictive and enabling versus very procedural and post-transaction. We're a little ways downstream. That's looking out. >> That's next year's conference. >> That's probably next year's conference. >> Well, Rob, thank you so much for coming on theCUBE, it's always a pleasure to have you.
>> Thank you both for having us, and thank you for being here, and enjoy the summit. >> We're excited. >> Thank you. >> We'll do. >> I'm Rebecca Knight for Jim Kobielus. We will have more from DataWorks Summit just after this. (upbeat music)
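The capture-move-process-act pattern Rob describes above, taking action as a condition changes rather than after the data comes to rest, reduces at its simplest to a streaming consumer loop. Here is a minimal Python sketch using the kafka-python library; the topic names, message fields, and threshold rule are invented for illustration, standing in for what MiNiFi/NiFi and HDF manage in a real deployment:

    import json
    from kafka import KafkaConsumer, KafkaProducer

    consumer = KafkaConsumer(
        "sensor-readings",  # hypothetical topic carrying edge telemetry
        bootstrap_servers="localhost:9092",
        value_deserializer=lambda b: json.loads(b.decode("utf-8")),
    )
    producer = KafkaProducer(
        bootstrap_servers="localhost:9092",
        value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    )

    for record in consumer:
        reading = record.value
        # Act on the condition in flight, before the data lands at rest.
        if reading.get("vibration", 0.0) > 0.8:  # made-up threshold rule
            producer.send("maintenance-alerts", {
                "component": reading.get("component"),
                "action": "inspect-on-arrival",
            })

The same events still flow on to the at-rest platform for aggregate, historical analysis; the point of the sketch is only where the in-flight decision happens.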

Published Date : Jun 20 2018

Tim Vincent & Steve Roberts, IBM | DataWorks Summit 2018


 

>> Live from San Jose, in the heart of Silicon Valley, it's theCUBE, covering DataWorks Summit 2018. Brought to you by Hortonworks. >> Welcome back everyone to day two of theCUBE's live coverage of DataWorks, here in San Jose, California. I'm your host, Rebecca Knight, along with my co-host James Kobielus. We have two guests on this panel today, we have Tim Vincent, he is the VP of Cognitive Systems Software at IBM, and Steve Roberts, who is the Offering Manager for Big Data on IBM Power Systems. Thanks so much for coming on theCUBE. >> Oh thank you very much. >> Thanks for having us.
And now you start thinking about, okay you've got this new skill in data scientists, which is really, really hard to find, they're very, very valuable. And you're giving them systems that take hours and weeks to do what the need to do. And you know, so they're trying to drive these models and get a high degree of accuracy in their predictions, and they just can't do it. So there's foresight on the technology side and there's clear demand on the customer side as well. >> Before the cameras were rolling you were talking about how the term data scientists and app developers is used interchangeably, and that's just wrong. >> And actually let's hear, 'cause I'd be in this whole position that I agree with it. I think it's the right framework. Data science is a team sport but application development has an even larger team sport in which data scientists, data engineers play a role. So, yeah we want to hear your ideas on the broader application development ecosystem, and where data scientists, and data engineers, and sort, fall into that broader spectrum. And then how IBM is supporting that entire new paradigm of application development, with your solution portfolio including, you know Power, AI on Power? >> So I think you used the word collaboration and team sport, and data science is a collaborative team sport. But you're 100% correct, there's also a, and I think it's missing to a great degree today, and it's probably limiting the actual value AI in the industry, and that's had to be data scientists and the application developers interact with each other. Because if you think about it, one of the models I like to think about is a consumer-producer model. Who consumes things and who produces things? And basically the data scientists are producing a specific thing, which is you know simply an AI model, >> Machine models, deep-learning models. >> Machine learning and deep learning, and the application developers are consuming those things and then producing something else, which is the application logic which is driving your business processes, and this view. So they got to work together. But there's a lot of confusion about who does what. You know you see people who talk with data scientists, build application logic, and you know the number of people who are data scientists can do that is, you know it exists, but it's not where the value, the value they bring to the equation. And the application developers developing AI models, you know they exist, but it's not the most prevalent form fact. >> But you know it's kind of unbalanced Tim, in the industry discussion of these role definitions. Quite often the traditional, you know definition, our sculpting of data scientist is that they know statistical modeling, plus data management, plus coding right? But you never hear the opposite, that coders somehow need to understand how to build statistical models and so forth. Do you think that the coders of the future will at least on some level need to be conversant with the practices of building,and tuning, or training the machine learning models or no? >> I think it's absolutely happen. And I will actually take it a step further, because again the data scientist skill is hard for a lot of people to find. >> Yeah. >> And as such is a very valuable skill. 
And what we're seeing, and we are actually one of the offerings that we're pulling out is something called PowerAI Vision, and it takes it up another level above the application developer, which is how do you actually really unlock the capabilities of AI to the business persona, the subject matter expert. So in the case of vision, how do you actually allow somebody to build a model without really knowing what a deep learning algorithm is, what kind of narrow NATS you use, how to do data preparation. So we build a tool set which is, you know effectively a SME tool set, which allows you to automatically label, it actually allows you to tag and label images, and then as you're tagging and labeling images it learns from that and actually it helps automate the labeling of the image. >> Is this distinct from data science experience on the one hand, which is geared towards the data scientists and I think Watson Analytics among your tools, is geared towards the SME, this a third tool, or an overlap. >> Yeah this is a third tool, which is really again one of the co-optimized capabilities that I talked about, is it's a tool that we built out that really is leveraging the combination of what we do in Power, the interconnect which we have with the GPU's, which is the NVLink interconnect, which gives us basically a 10X improvement in bandwidth between the CPU and GPU. That allows you to actually train your models much more quickly, so we're seeing about a 4X improvement over competitive technologies that are also using GPU's. And if we're looking at machine learning algorithms, we've recently come out with some technology we call Snap ML, which allows you to push machine learning, >> Snap ML, >> Yeah, it allows you to push machine learning algorithms down into the GPU's, and this is, we're seeing about a 40 to 50X improvement over traditional processing. So it's coupling all these capabilities, but really allowing a business persona to something specific, which is allow them to build out AI models to do recognition on either images or videos. >> Is there a pre-existing library of models in the solution that they can tap into? >> Basically it allows, it has a, >> Are they pre-trained? >> No they're not pre-trained models that's one of the differences in it. It actually has a set of models that allow, it picks for you, and actually so, >> Oh yes, okay. >> So this is why it helps the business persona because it's helping them with labeling the data. It's also helping select the best model. It's doing things under the covers to optimize things like hyper-parameter tuning, but you know the end-user doesn't have to know about all these things right? So you're tryin' to lift, and it comes back to your point on application developers, it allows you to lift the barrier for people to do these tasks. >> Even for professional data scientists, there may be a vast library of models that they don't necessarily know what is the best fit for the particular task. Ideally you should have, the infrastructure should recommend and choose, under various circumstances, the models, and the algorithms, the libraries, whatever for you for to the task, great. >> One extra feature of PowerAI Enterprises is that it does include a way to do a quick visual inspection of a models accuracy with a small data sample before you invest in scaling over a cluster or large data set. So you can get a visual indicator as to the, whether the models moving towards accuracy or you need to go and test an alternate model. 
>> So it's like a dashboard, of like Gini coefficients and all that stuff, okay. >> Exactly, it gives you a snapshot view. And the other thing I was going to mention, you guys talked about application development, data scientists, and of course a big message here at the conference is, you know, data science meets big data, and the work that Hortonworks is doing involving the notion of container support in YARN, GPU awareness in YARN, bringing Data Science Experience, which can include the PowerAI capability that Tim was talking about, as a workload tightly coupled with Hadoop. And this is what our Power servers are really built for: not just a monolithic building block that always has the same ratio of compute and storage, but fit-for-purpose servers that can address either GPU-optimized workloads, providing the bandwidth enhancements that Tim talked about with the GPU, but also day-to-day servers that can now support two terabytes of memory, double the overall memory bandwidth on the box, and 44 cores that can support up to 176 threads for parallelization of Spark workloads, SQL workloads, and distributed data science workloads. So it's really about choosing the combination of servers that can meet this evolving workload need, 'cause Hadoop isn't now just MapReduce, it's a multitude of workloads that you need to be able to mix and match, and bring various capabilities to the table for compute, and that's where Power8, and now Power9, has really been built for this kind of combination of workloads, where you can add acceleration where it makes sense, add big data, smaller core, smaller memory, where it makes sense, pick and choose. >> So Steve, at this show, at DataWorks 2018 here in San Jose, the prime announcement, the partnership announced between IBM and Hortonworks, was IHAH, which I believe is IBM Hosted Analytics with Hortonworks. What I want to know is, that solution, I mean it runs on top of HDP 3.0 and so forth, is there any tie-in from an offering management standpoint between that and PowerAI, so you can build models in the PowerAI environment and then deploy them out in conjunction with IHAH? Going forward, I mean, I just wanted to get a sense of whether those kinds of integrations are coming. >> Well, the same data science capability, Data Science Experience, whether you choose to run it in the public cloud, or run it in a private cloud, or on prem, it's the same data science package. You know, PowerAI has a set of optimized deep-learning libraries that can provide advantage on Power, applied when you choose to run those deployments on our Power systems, alright, so we can provide additional value in terms of these optimized libraries, these memory bandwidth improvements. So really it depends upon the customer requirements and whether a Power foundation would make sense in some of those deployment models. I mean, for us here with Power9 we've recently announced a whole series of Linux Power9 servers. That's our latest family, including, as I mentioned, storage-dense servers. The one we're showcasing on the floor here today, along with GPU-rich servers. We're releasing fresh reference architectures. It's really to support combinations of clustered models that are, as I mentioned, fit for purpose for the workload, to bring data science and big data together in the right combination. And working towards cloud models as well, that can support mixing Power in ICP with big data solutions. >> And before we wrap, one last thought.
I think in the reference architecture you describe, I'm excited about the fact that you've commercialized distributed deep learning, for the growing number of instances where you're going to build containerized AI and distribute pieces of it across this multi-cloud; you need the underlying middleware fabric to allow all those pieces to play together in some larger application. So I've been following DDL, because your research lab has been posting information about that for quite a while. So I'm excited that you guys have finally commercialized it. I think IBM does a really good job of commercializing what comes out of the lab, like with Watson. >> Great, well, a good note to end on. Thanks so much for joining us. >> Oh thank you. Thank you for the, >> Thank you. >> We will have more from theCUBE's live coverage of DataWorks coming up just after this. (bright electronic music)
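For readers curious what the Snap ML usage pattern discussed in this segment roughly looks like, here is a hedged sketch. It assumes the snapml package's scikit-learn-compatible API; exact class names and flags have varied across releases (the PowerAI-era package was pai4sk), so treat this as illustrative only.

```python
# Hedged sketch of GPU-accelerated training in the spirit of Snap ML.
# Assumption: snapml exposes a scikit-learn-style LogisticRegression
# with a use_gpu flag; verify against the installed version's docs.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from snapml import LogisticRegression  # assumption: current package layout

X, y = make_classification(n_samples=100_000, n_features=50, random_state=0)
X = X.astype(np.float32)  # single precision keeps GPU transfers cheap
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# use_gpu pushes the solver down onto the GPU, which is where the
# claimed 40-50x speedups over CPU-bound training come from.
clf = LogisticRegression(use_gpu=True)
clf.fit(X_train, y_train)
print("test accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```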

Published Date : Jun 20 2018


Scott Gnau, Hortonworks | DataWorks Summit 2018

>> Live from San Jose, in the heart of Silicon Valley, it's theCUBE. Covering DataWorks Summit 2018. Brought to you by Hortonworks. >> Welcome back to theCUBE's live coverage of Dataworks Summit here in San Jose, California. I'm your host, Rebecca Knight, along with my cohost James Kobielus. We're joined by Scott Gnau, he is the chief technology officer at Hortonworks. Welcome back to theCUBE, Scott. >> Great to be here. >> It's always fun to have you on the show. So, you have really spent your entire career in the data industry. I want to start off at 10,000 feet, and just have you talk about where we are now, in terms of customer attitudes, in terms of the industry, in terms of how customers feel, how they're dealing with their data and how they're thinking about their approach in their business strategy. >> Well I have to say, 30 plus years ago, starting in the data field, it wasn't as exciting as it is today. Of course, I always found it very exciting. >> Exciting means nerve-wracking. Keep going. >> Or nerve-wracking. But you know, we've been predicting it. I remember even, you know, 10, 15 years ago, before big data was a thing, it's like, oh, all this data's going to come, and it's going to be, you know, 10x what it is. And we were wrong. It was like 5000x, you know, what it is. And I think the really exciting part is that data really used to be relegated, frankly, to big companies as a derivative work of ERP systems, and so on and so forth. And while that's very interesting, and certainly enabled a whole level of productivity for industry, when you compare that to all of the data flying around everywhere today, whether it be Twitter feeds or even doing live polls, like we did in the opening session today, data is just being created everywhere. And the same thing applies to that data that applied to the ERP data of old. And that is, being able to harness, manage and understand that data is a new business-creating opportunity. And you know, we were with some analysts the other day, and I think one of the more quoted things that came out of that when I was speaking with them was really, like railroads and shipping in the 1800s and oil in the 1900s, data really is the wealth creator of this century. And so that creates a very nerve-wracking environment. It also creates an environment of very agile and very important technological breakthroughs that enable those things to be turned into wealth. >> So thinking about that, in terms of where we are at this point in time, and on the main stage this morning someone had likened it to the interstate highway system, that really revolutionized transportation, but also commerce. >> I love that actually. I may steal it in some of my future presentations. >> That's good, but we'll know where you pilfered it. >> Well perhaps if data is oil, the edge, in containerized applications and piping data, you know, microbursts of data across the internet of things, is sort of like the new fracking. You know, you're being able to extract more of this precious resource from the territory. >> Hopefully not quite as damaging to the environment. >> Maybe not. I'm sorry, environmentalists, if I just offended you, I apologize. >> But I think, you know, all of those analogies are very true, and I particularly like the interstate one this morning. Because when I think about what we've done in our core HDP platform, and I know Arun was here talking about all the great advances that we built into this, the kind of the core Hadoop platform. Very traditional.
Store data, analyze data, but also bring in new kinds of algorithms, rapid innovation and so on. That's really great, but that's kind of half of the story. In a device-connected world, in a consumer-centric world, capturing data at the edge, moving and processing data at the edge is the new normal, right? And so just like the interstate highway system actually created new ways of commerce because we could move people and things more efficiently, moving data and processing data more efficiently is kind of the second part of the opportunity that we have in this new deluge of data. And that's really where we've been with our Hortonworks DataFlow. And really saying that the complete package of managing data from origination at the edge all the way through analytics to a decision that's triggered back at the edge is like the holy grail, right? And building a technology for that footprint is why I'm certainly excited today. It's not the caffeine, it's just the opportunity of making all of that work. >> You know, I think the key announcement for me at this show, that you guys made on HDP 3.0, was containerization of more of the capabilities of your distributed environment, so that these capabilities, in terms of processing, first of all, capturing, analyzing and moving that data, can be pushed closer to the end points. Can you speak a bit, Scott, about this new capability or this containerization support? Within HDP 3.0 but really in your broader portfolio, and where you're going with that in terms of addressing edge applications, perhaps autonomous vehicles or, you know, whatever you might put into a new smart phone or whatever you put at the edge. Describe the potential for containerization to sort of break this ecosystem wide open. >> Yeah, I think there are a couple of aspects to containerization, and by the way, we're like so excited about kind of the cloud-first, containerized HDP 3.0 that we launched here today. There's a lot of great tech that our customers have been clamoring for that they can take advantage of. And it's really just the beginning, which again is part of the excitement of being in the technology space and certainly being part of Hortonworks. So containerization affords a couple of things. Certainly, agility. Agility in deploying applications. So, you know, for 30 years we've built these enterprise software stacks that were very integrated, hugely complicated systems that could bring together multiple different applications, different workloads and manage all that in a multi-tenancy kind of environment. And that was because we had to do that, right? Servers were getting bigger, they were more powerful, but not particularly well distributed. Obviously in a containerized world, you now turn that whole paradigm on its head and you say, you know what? I'm just going to collect these three microservices that I need to do this job. I can isolate them. I can have them run in a server-less technology. I can actually allocate servers in the cloud to go run, and when they're done they go away. And I don't pay for them anymore. So thinking about kind of that from a software development, deployment, implementation perspective, there are huge implications, but the real value for customers is agility, right? I don't have to wait until next year to upgrade my enterprise software stack to take advantage of this new algorithm. I can simply isolate it inside of a container, have it run, and have it go away. And get the answer, right?
And so when I think about, and a number of our keynotes this morning were talking about just kind of the exponential rate of change, this is really the net new norm. Because the only way we can do things faster is, in fact, to be able to provide this. >> And it's not just microservices. Also orchestrating them through Kubernetes, and so forth, so they can be. >> Sure. That's the how versus the what, yeah. >> Quickly deployed as an ensemble and then quickly de-provisioned when you don't need them anymore. >> Yeah, so then there's obviously the cost aspect, right? >> Yeah. >> So if you're going to run a whole bunch of stuff, or even if you have something as mundane as a really big merge join inside of Hive. Let me spin up a thousand extra containers to go do that big thing, and then have them go away when it's done. >> And oh, by the way, where you'll be deployed. >> And only pay for it while I'm using it. >> And then you can possibly distribute those containers across different public clouds depending on what's most cost effective at any point in time, Azure or AWS, or whatever it might be. >> And I teased with Arun, you know, the only thing that we haven't solved for is the speed of light, but we're working on it. >> In talking about this warp-speed change being the new norm, can you talk about some of the most exciting use cases you've seen, in terms of the customers and clients that are using Hortonworks in the coolest ways. >> Well, I mean, obviously autonomous vehicles is one that has captured all of our imagination. 'Cause we understand how that works. But it's a perfect use case for this kind of technology. But the technology also applies in fraud detection and prevention. It applies in healthcare management, in proactive personalized medicine delivery, and in generating better outcomes for treatment. So, you know, all across. >> It will be in every aspect of our lives, including the consumer realm increasingly, yeah. >> Yeah, all across the board. And you know, one of the things that really changed, right, is, well, a couple things. A lot of bandwidth, so you can start to connect these things. The devices themselves are particularly smart, so you don't any longer have to transfer all the data to a mainframe and then wait three weeks, sorry, wait three weeks for your answer, and then come back. You can have analytic models running on an edge device. And think about, you know, that is really real time. And that actually kind of solves for the speed of light. 'Cause you're not waiting for those things to go back and forth. So there are a lot of new opportunities, and those architectures really depend on some of the core tenets of, ultimately, containerization, stateless application deployment and delivery. And they also depend on the ability to create feedback loops to do point-to-point and peer kinds of communication between devices. This is a whole new world of how data gets moved and how the decisions around data movement get made. And certainly that's what we're excited about, building with the core components. The other implication of all of this, and we've known each other for a long time. Data has gravity. Data movement's expensive. It takes time, frankly, you have to pay for the bandwidth and all that kind of stuff. So being able to play the data where it lies becomes a lot more interesting from an application portability perspective, and with all of these new sensors, devices and applications out there, a lot more data is living its entire lifecycle in the cloud.
And so being able to create that connective tissue. >> Or data being analyzed right at the edge. >> And even on the edge. >> And with machine learning, let me just butt in a second. One of the areas that we're focusing on increasingly at Wikibon, in terms of our focus on machine learning at the edge, is more and more machine learning frameworks are coming into the browser world. JavaScript, for the most part, like TensorFlow.js, you know, more of this inferencing and training is going to happen inside your browser. That blows a lot of people's minds. It may not be heavy-hitting machine learning, but it'll be good enough for a lot of things that people do in their normal life. Where you don't want to round trip back to the cloud. It's all happening right there, in, you know, Chrome or whatever you happen to be using. >> Yeah, and so the point being now, you know, when I think about the early days, talking about scalability, I remember shipping my first one-terabyte database. And then the first 10-terabyte database. Yeah, it doesn't sound very exciting. When I think about scalability of the future, scalability is not going to be defined as petabytes or exabytes under management. It's really going to be defined as petabytes or exabytes effected across a grid of storage and processing devices. And that's a whole new technology paradigm, and really that's kind of the driving force behind what we've been building and what we've been talking about at this conference. >> Excellent. >> So when you're talking about these things, I mean, how much are the companies themselves prepared, and do they have the right kind of talent to use the kinds of insights that you're able to extract? And then act on them in real time. 'Cause you're talking about how this is saving a lot of the waiting-around time. So is this really changing the way business gets done, and do companies have the talent to execute? >> Sure. I mean, it's changing the way business gets done. We showed a quote on stage this morning from the CEO of Marriott, right? So, I think there are a couple of pieces. One is, businesses are increasingly data driven, and business strategy is increasingly the data strategy. And so it starts from the top, kind of setting that strategy and understanding the value of that asset and how that needs to be leveraged to drive new business. So that's kind of one piece. And you know, obviously there are more and more folks kind of coming to the realization that that is important. The other thing that's been helpful is, you know, as with any new technology, there's always kind of the startup shortage of resource, and people start to spool up and learn. You know, the really good news, and for the past 10 years I've been working with a number of different university groups, parents are actually going to universities and demanding that the curriculum include data, and processing, and big data, and all of these technologies. Because they know that with their children educated in that kind of a world, number one, they're going to have a fun job to go to every day. 'Cause it's going to be something different every day. But number two, they're going to be employed for life. (laughing) >> Yeah. >> They will be solvent. >> Frankly the demand has actually created a catch-up in supply that we're seeing. And of course, you know, as tools start to get more mature and more integrated, they also become a little bit easier to use. You know, there's a little bit easier deployment and so on.
So a combination of, I'm seeing a really good supply. We obviously invest in education through the community. And then frankly, the education system itself, and folks saying this is really the hot job of the next century. You know, I can be the new oil baron. Or I can be the new railroad captain. It's actually creating more supply, which is also very helpful. >> Data's at the heart of what I call the new STEM cell. It's science, technology, engineering, mathematics that you want to implant in the brains of the young as soon as possible. I hear ya. >> Yeah, absolutely. >> Well, Scott, thanks so much for coming on. But first, we can't let you go without the fashion statement. You arrived on set wearing it. >> The elephants. >> I mean, it was quite a look. >> Well I did it because then you couldn't see I was sweating on my brow. >> Oh please, no, no, no. >> 'Cause I was worried about this tough interview. >> You know, one of the things I love about your logo, and I'll just, you know, sounds like I'm fawning. The elephant is a very intelligent animal. >> It is indeed. >> My wife's from Indonesia. I remember going back one time, they had Asian elephants at one of these safari parks. And watching it perform, and my son was very little then. The elephant is a very sensitive, intelligent animal. You don't realize 'till you're up close. They pick up all manner of social cues. I think it's an awesome symbol for a company that's all about data-driven intelligence. >> The elephant never forgets. >> Yeah. >> That's what we know. >> That's right, we never forget. >> He won't forget, 'cause he's got a brain. Or she, I'm sorry. He or she has a brain. >> And it's data driven. >> Yeah. >> Thanks very much. >> Great. Well, thanks for coming on theCUBE. I'm Rebecca Knight for James Kobielus. We will have more coming up from Dataworks just after this. (upbeat music)
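Gnau's "spin up a thousand extra containers, then have them go away" pattern maps onto YARN's Docker container runtime in Hadoop 3.x / HDP 3.0. The sketch below shows roughly how such an ephemeral, containerized Spark job could be submitted; the registry image, executor count, and job script are hypothetical.

```python
# Hedged sketch: launching an ephemeral, containerized Spark job on a
# Hadoop 3.x / HDP 3.0-style cluster where YARN runs the executors
# inside Docker images. Image name, count, and script are hypothetical.
import subprocess

DOCKER_IMAGE = "registry.example.com/analytics/spark-py:latest"  # assumption
cmd = [
    "spark-submit",
    "--master", "yarn",
    "--deploy-mode", "cluster",
    "--num-executors", "1000",  # spin up, do the big merge join, tear down
    # YARN's Docker runtime settings (Hadoop 3.1+); each container carries
    # its own image, so a job can bring its own dependency versions.
    "--conf", "spark.yarn.appMasterEnv.YARN_CONTAINER_RUNTIME_TYPE=docker",
    "--conf", f"spark.yarn.appMasterEnv.YARN_CONTAINER_RUNTIME_DOCKER_IMAGE={DOCKER_IMAGE}",
    "--conf", "spark.executorEnv.YARN_CONTAINER_RUNTIME_TYPE=docker",
    "--conf", f"spark.executorEnv.YARN_CONTAINER_RUNTIME_DOCKER_IMAGE={DOCKER_IMAGE}",
    "big_merge_join.py",  # hypothetical job script
]
subprocess.run(cmd, check=True)
```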

Published Date : Jun 20 2018


Ram Venkatesh, Hortonworks & Sudhir Hasbe, Google | DataWorks Summit 2018

>> Live from San Jose, in the heart of Silicon Valley, it's theCUBE, covering DataWorks Summit 2018. Brought to you by Hortonworks. >> We are wrapping up Day One of coverage of Dataworks here in San Jose, California on theCUBE. I'm your host, Rebecca Knight, along with my co-host, James Kobielus. We have two guests for this last segment of the day. We have Sudhir Hasbe, who is the director of product management at Google, and Ram Venkatesh, who is VP of Engineering at Hortonworks. Ram, Sudhir, thanks so much for coming on the show. >> Thank you very much. >> Thank you. >> So, I want to start out by asking you about a joint announcement that was made earlier this morning about using some Hortonworks technology deployed onto Google Cloud. Tell our viewers more. >> Sure, so basically what we announced was support for the Hortonworks Data Platform and Hortonworks DataFlow, HDP and HDF, running on top of the Google Cloud Platform. So this includes deep integration with Google's cloud storage connector layer, as well as a certified distribution of HDP to run on the Google Cloud Platform. >> I think the key thing is a lot of our customers have been telling us they like the familiar environment of the Hortonworks distribution that they've been using on-premises, and as they look at moving to cloud, like to GCP, Google Cloud, they want the same, familiar environment. So, they want the choice to deploy on-premises or on Google Cloud, but they want the familiarity of what they've already been using with Hortonworks products. So this announcement actually helps customers pick and choose whether they want to run the Hortonworks distribution on-premises, whether they want to do it in cloud, or whether they want to build this hybrid solution where the data can reside on-premises, can move to cloud, and build these common, hybrid architectures. So, that's what this does. >> So, HDP customers can store data in the Google Cloud. They can execute ephemeral workloads, analytic workloads, machine learning in the Google Cloud. And there's some tie-in between Hortonworks's real-time or low-latency streaming capabilities from HDF in the Google Cloud. So, could you describe, at a fairly detailed level, the degrees of technical integration between your two offerings here. >> You want to take that? >> Sure, I'll handle that. So, essentially, deep in the heart of HDP, there's the HDFS layer that includes a Hadoop-compatible file system, which is a pluggable file system layer. So, what Google has done is they have provided an implementation of this API for the Google Cloud Storage Connector. So this is the GCS Connector. We've taken the connector and we've actually continued to refine it to work with our workloads, and now Hortonworks is actually bundling, packaging, and making this connector available as part of HDP.
So, at scale, as our customers are moving their workloads from on-premise to the cloud, it's not just functional parity, but they also need sort of the operational and the cost efficiency that they're looking for as they move to the cloud. So, to do that, we need to enable this fundamental disaggregated storage pattern. See, on-prem, the big win with Hadoop was we could bring the processing to where the data was. In the cloud, we need to make sure that we work well when storage and compute are disaggregated and they're scaled elastically, independent of each other. So this is a fairly fundamental architectural change. We want to make sure that we enable this in a first-class manner. >> I think that's a key point, right. I think what cloud allows you to do is scale the storage and compute independently. And so, with storing data in Google Cloud Storage, you can scale that horizontally and then just leverage that as your storage layer. And the compute can independently scale by itself. And what this is allowing customers of HDP and HDF to do is store the data on GCP, on the cloud storage, and then just use, at scale, the compute side of it with HDP and HDF.
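As a rough illustration of this disaggregated pattern, here is a sketch of a Spark job reading data through the GCS connector's Hadoop-compatible file system. The bucket, key file path, and dataset are hypothetical, and on an HDP-on-GCP deployment much of this configuration would already ship preconfigured.

```python
# Hedged sketch of disaggregated storage and compute: Spark runs in the
# cluster, data persists in a Google Cloud Storage bucket reached via
# the GCS connector. Bucket and key paths below are hypothetical, and
# the connector jar must be on the classpath.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("gcs-demo")
    # GoogleHadoopFileSystem implements Hadoop's pluggable FS API for gs://
    .config("spark.hadoop.fs.gs.impl",
            "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem")
    .config("spark.hadoop.google.cloud.auth.service.account.json.keyfile",
            "/etc/keys/sa.json")  # assumption: service-account key location
    .getOrCreate()
)

# Storage scales in the bucket; compute scales in the cluster.
df = spark.read.parquet("gs://example-bucket/events/")  # hypothetical bucket
df.groupBy("product_id").count().show()
```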
So, this is where there is a wholesale movement of infrastructure from on-premise to the cloud. Others are exploring individual business use cases. So, for example, one of our large customers, a travel partner, so they are exploring their new pricing model and they want to roll out this pricing model in the cloud. They have on-premise infrastructure, they know they have that for a while. They are spinning up new use cases in the cloud typically for reasons of agility. So, if you, typically many of our customers, they operate large, multi-tenant clusters on-prem. That's nice for, so a very scalable compute for running large jobs. But, if you want to run, for example, a new version of Spark, you have to upgrade the entire cluster before you can do that. Whereas in this sort of model, what they can say is, they can bring up a new workload and just have the specific versions and dependency that it needs, independent of all of their other infrastructure. So this gives them agility where they can move as fast as... >> Through the containerization of the Spark jobs or whatever. >> Correct, and so containerization as well as even spinning up an entire new environment. Because, in the cloud, given that you have access to elastic compute resources, they can come and go. So, your workloads are much more independent of the underlying cluster than they are on-premise. And this is where sort of the core business benefits around agility, speed of deployment, things like that come into play. >> And also, if you look at the total cost of ownership, really take an example where customers are collecting all this information through the month. And, at month end, you want to do closing of books. And so that's a great example where you want ephemeral workloads. So this is like do it once in a month, finish the books and close the books. That's a great scenario for cloud where you don't have to on-premises create an infrastructure, keep it ready. So that's one example where now, in the new partnership, you can collect all the data through the on-premises if you want throughout the month. But, move that and leverage cloud to go ahead and scale and do this workload and finish the books and all. That's one, the second example I can give is, a lot of customers collecting, like they run their e-commerce platforms and all on-premises, let's say they're running it. They can still connect all these events through HDP that may be running on-premises with Kafka and then, what you can do is, in-cloud, in GCP, you can deploy HDP, HDF, and you can use the HDF from there for real-time stream processing. So, collect all these clickstream events, use them, make decisions like, hey, which products are selling better?, should we go ahead and give?, how many people are looking at that product?, or how many people have bought it?. That kind of aggregation and real-time at scale, now you can do in-cloud and build these hybrid architectures that are there. And enable scenarios where in past, to do that kind of stuff, you would have to procure hardware, deploy hardware, all of that. Which all goes away. In-cloud, you can do that much more flexibly and just use whatever capacity you have. >> Well, you know, ephemeral workloads are at the heart of what many enterprise data scientists do. Real-world experiments, ad-hoc experiments, with certain datasets. 
You build a TensorFlow model, or maybe a model in Caffe or whatever, and you deploy it out to a cluster, and so the life of a data scientist is often nothing but a stream of new tasks that are all ephemeral in their own right but are part of an ongoing experimentation program, that's, you know, they're building and testing assets that may or may not be deployed in the production applications. So I can see a clear need for that capability of this announcement in lots of working data science shops in the business world. >> Absolutely. >> And I think, coming down to it, if you really look at the partnership, right, there are two or three key areas where it's going to have a huge advantage for our customers. One is analytics at scale at a lower cost, like total cost of ownership, reducing that, running at-scale analytics. That's one of the big things. Again, as I said, the hybrid scenarios. Most enterprise customers have huge deployments of infrastructure on-premises, and that's not going to go away. Over a period of time, leveraging cloud is a priority for a lot of customers, but they will be in these hybrid scenarios. And what this partnership allows them to do is have these scenarios that can span across cloud and the on-premises infrastructure that they are building, and get business value out of all of these. And then, finally, we at Google believe that the world will be more and more real-time over a period of time. Like, we already are seeing a lot of these real-time scenarios with IoT events coming in and people making real-time decisions. And this is only going to grow. And this partnership also provides the whole streaming analytics capability in-cloud, at scale, for customers to build these hybrid plus real-time streaming scenarios with this package. >> Well, it's clear from Google what the Hortonworks partnership gives you in this competitive space, in the multi-cloud space. It gives you that ability to support hybrid cloud scenarios. You're one of the premier public cloud providers, and we all know about that. And clearly now that you have the Hortonworks partnership, you have the ability to support those kinds of highly hybridized deployments for your customers, many of whom I'm sure have those requirements. >> That's perfect, exactly right. >> Well, a great note to end on. Thank you so much for coming on theCUBE. Sudhir, Ram, thank you so much. >> Thank you, thanks a lot. >> Thank you. >> I'm Rebecca Knight for James Kobielus, we will have more tomorrow from DataWorks. We will see you tomorrow. This is theCUBE signing off. >> From sunny San Jose. >> That's right.
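The hybrid clickstream scenario Hasbe outlines, events flowing through Kafka into a streaming job that keeps running aggregates, can be sketched with Spark Structured Streaming. The broker address and topic below are hypothetical, and the job would need the spark-sql-kafka package on its classpath.

```python
# Hedged sketch of the clickstream pattern described above: events land
# in Kafka (on-prem or via HDF), and a streaming job in the cloud keeps
# a running count of product views and purchases. Broker and topic
# names are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StringType, StructType

spark = SparkSession.builder.appName("clickstream").getOrCreate()

schema = StructType().add("product_id", StringType()).add("action", StringType())

clicks = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker.example.com:9092")  # assumption
    .option("subscribe", "clickstream")  # hypothetical topic
    .load()
    .select(from_json(col("value").cast("string"), schema).alias("e"))
)

# Real-time aggregation: which products are being viewed or bought most?
counts = clicks.groupBy("e.product_id", "e.action").count()
query = counts.writeStream.outputMode("complete").format("console").start()
query.awaitTermination()
```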

Published Date : Jun 20 2018


Day Two Kickoff | DataWorks Summit 2018

>> Live from San Jose, in the heart of Silicon Valley, it's theCUBE. Covering DataWorks Summit 2018. Brought to you by Hortonworks. >> Welcome back to day two of theCUBE's live coverage of DataWorks here in San Jose, California. I'm your host, Rebecca Knight, along with my co-host James Kobielus. James, it's great to be here with you in the hosting seat again. >> Day two, yes. >> Exactly. So here we are, this conference, 2,100 attendees from 32 countries, 23 industries. It's a relatively big show. They do three of them during the year. One of the things that I really-- >> It's a well-established show too. I think this is like the 11th year since Yahoo started up the first Hadoop summit in 2008. >> Right, right. >> So it's an established event, yeah, go. >> Exactly, exactly. But I really want to talk about Hortonworks the company. This is something that you had brought up in an analyst report before the show started, and that was talking about Hortonworks' cash flow positivity for the first time. >> Which is good. >> Which is good, which is a positive sign, and yet what are the prospects for this company's financial health? We're still not seeing really clear signs of robust financial growth. >> I think the signs are good, for the simple reason that they're making significant investments now to prepare for the future that's almost inevitable. And the future that's almost inevitable, and when I say the future, I mean the 2020s, the decade that's coming: most of their customers will shift more of their workloads, maybe not entirely yet, to public cloud environments for everything they're doing, AI, machine learning, deep learning. And clearly the beneficiaries of that trend will be the public cloud providers, all of whom are Hortonworks' partners and established partners: AWS, Microsoft with Azure, Google with, you know, Google Cloud Platform, IBM with IBM Cloud. Hortonworks, and this is... You know, their partnerships with these cloud providers go back several years, so it's not a new initiative for them. They've seen the writing on the wall practically from the start of Hortonworks' founding in 2011, and they now need to go deeper towards making their solution portfolio capable of being deployed on-prem, in public clouds, and in various and sundry funky combinations called hybrid multi-clouds. Okay, so, they've been making those investments in those partnerships and in public cloud enablement of the Hortonworks Data Platform. Here at this show, DataWorks 2018 here in San Jose, they've released the latest major version, HDP 3.0, of their core platform, with a lot of significant enhancements related to things that their customers are increasingly doing-- >> Well I want to ask you about those enhancements.
So I think this cost that they're incurring now to prepare their entire portfolio for that inevitable future is the right thing to do and that's probably why they still have not attained massive rock and rollin' positive cash flow yet but I think that they're preparing the way for them to do so in the coming decade. >> So their financial future is looking brighter and they're doing the right things. >> Yeah, yes. >> So now let's talk tech. And this is really where you want to be, Jim, I know you. >> Oh I get sleep now and I don't think about tech constantly. >> So as you've said, they're really doing a lot of emphasis now on their public cloud partnerships. >> Yes. >> But they've also launched several new products and upgrades to existing products, what are you seeing that excites you and that you think really will be potential game changers? >> You know, this is geeky but this is important 'cause it's at the very heart of Hortonworks Data Platform 3.0, containerization of more... When you're a data scientist, and you're building a machine learning model using data that's maintained, and is persisted, and processed within Hortonworks Data Platform or any other big data platform, you want the ability increasingly for developing machine learning, deep learning, AI in general, to take that application you might build while you're using TensorFlow models, that you build on HDP, they will containerize it in Docker and, you know, orchestrate it all through Kubernetes and all that wonderful stuff, and deploy it out, those AI, out to increasingly edge computing, mobile computing, embedded computing environments where, you know, the real venture capital mania's happening, things like autonomous vehicles, and you know, drones, and you name it. So the fact is that Hortonworks has made that in many ways the premier new feature of HDP 3.0 announced here this week at the show. That very much harmonizes with what their partners, where their partners are going with containerization of AI. IBM, one of their premier partners, very recently, like last month, I think it was, announced the latest version of IBM, what do they call it, IBM Cloud Private, which has embedded as a core feature containerization within that environment which is a prem-based environment of AI and so forth. The fact that Hortonworks continues to maintain close alignment with the capabilities that its public cloud partners are building to their respective portfolios is important. But also Hortonworks with its, they call it, you know, a single pane of glass, the DataPlane Services for metadata and monitoring and governance and compliance across this sprawling hybrid multi-cloud, these scenarios. The fact that they're continuing to make, in fact, really focusing on deep investments in that portfolio, so that when an IBM introduces or, AWS, whoever, introduces some new feature in their respective platforms, Hortonworks has the ability to, as it were, abstract above and beyond all of that so that the customer, the developer, and the data administrator, all they need to do, if they're a Hortonworks customer, is stay within the DataPlane Services and environment to be able to deploy with harmonized metadata and harmonized policies, and harmonized schemas and so forth and so on, and query optimization across these sprawling environments. 
So Hortonworks, I think, knows where their bread is buttered and it needs to stay on the DPS, DataPlane Services, side which is why a couple months ago in Berlin, Hortonworks made a, I think, the most significant announcement of the year for them and really for the industry, was that they announced the Data Steward Studio in Berlin. Tech really clearly was who addressed the GDPR mandate that was coming up but really did a stewardship as an end-to-end workflow for lots of, you know, core enterprise applications, absolutely essential. Data Steward Studio is a DataPlane Service that can operate across multi-cloud environments. Hortonworks is going to keep on, you know... They didn't have a DPS, DataPlane Services, announcements here in San Jose this week but you can best believe that next year at this time at this show, and in the interim they'll probably have a number of significant announcements to deepen that portfolio. Once again it's to grease the wheels towards a more purely public cloud future in which there will be Hortonworks DNA inside most of their customers' environments going forward. >> I want to ask you about themes of this year's conference. The thing is is that you were in Berlin at the last big Hortonworks DataWorks Summit. >> (speaks in foreign language) >> And really GDPR dominated the conversations because the new rules and regulations hadn't yet taken effect and companies were sort of bracing for what life was going to be like under GDPR. Now the rules are here, they're here to stay, and companies are really grappling with it, trying to understand the changes and how they can exist in this new regime. What would you say are the biggest themes... We're still talking about GDPR, of course, but what would you say are the bigger themes that are this week's conference? Is it scalability, is it... I mean, what would you say we're going, what do you think has dominated the conversations here? >> Well scalability is not the big theme this week though there are significant scalability announcements this week in the context of HDP 3.0, the ability to persist in a scale-out fashion across multi-cloud, billions of files. Storage efficiency is an important piece of the overall announcement with support for erasure coding, blah blah blah. That's not, you know, that's... Already, Hortonworks, like all of their cloud providers and other big data providers, provide very scalable environments for storage, workload management. That was not the hugest, buzzy theme in terms of the announcements this week. The buzz of course was HDP 3.0. Containerization, that's important, but you know, we just came out of the day two keynote. AI is not a huge focus yet for a lot of the Hortonworks customers who are here, the developers. They're, you know, most of their customers are not yet that far along in their deep learning journeys and whatever but they're definitely going there. There's plenty of really cool keynote discussions including the guy with the autonomous vehicles or whatever that, the thing we just came out of. That was not the predominant theme this week here in terms of the HDP 3.0. I think what it comes down to is that with HDP 3.0... Hive, though you tend to take it for granted, it's been in Hadoop from the very start, practically, Hive is now a full enterprise database and that's the core, one of the cores, of HDP 3.0. 
Hive itself, Hive 3.0 now is its version, is ACID compliant and that may be totally geeky to the most of the world but that enables it to support transactional applications. So more big data in every environment is supporting more traditional enterprise application, transactional applications that require like two-phase commit and all that goodness. The fact is, you know, Hortonworks have, from what I can see, is the first of the big data vendors to incorporate those enhancements to Hive 3.0 because they're so completely tuned in to the Hive environment in terms of a committer. I think in many ways that is the predominant theme in terms of the new stuff that will actually resonate with the developers, their customers here at the show. And with the, you know, enterprises in general, they can put more of their traditional enterprise application workloads on big data environments and specifically, Hortonworks hopes, its HDP 3.0. >> Well I'm excited to learn more here at the on theCube with you today. We've got a lot of great interviews lined up and a lot of interesting content. We got a great crew too so this is a fun show to do. >> Sure is. >> We will have more from day two of the.

Published Date : Jun 20 2018


Pandit Prasad, IBM | DataWorks Summit 2018

>> From San Jose, in the heart of Silicon Valley, it's theCUBE. Covering DataWorks Summit 2018. Brought to you by Hortonworks. (upbeat music) >> Welcome back to theCUBE's live coverage of DataWorks here in sunny San Jose, California. I'm your host Rebecca Knight, along with my co-host James Kobielus. We're joined by Pandit Prasad. He heads analytics projects, strategy, and management at IBM Analytics. Thanks so much for coming on the show. >> Thanks Rebecca, glad to be here. >> So, why don't you just start out by telling our viewers a little bit about what you do, in terms of the Hortonworks relationship and the other parts of your job. >> Sure. As you said, I am in Offering Management, which is also known as Product Management for IBM; I manage the big data portfolio from an IBM perspective. I have also been working with Hortonworks on developing this relationship, nurturing that relationship, and it's been a year since the partnership. We announced this partnership exactly last year at the same conference. And now it's been a year, so this year has been a journey of aligning the two portfolios together. Right, so Hortonworks had HDP and HDF. IBM also had similar products, so we have, for example, Big SQL, Hortonworks has Hive, so how do Hive and Big SQL align together? IBM has Data Science Experience, so where does that come into the picture on top of HDP? Before this partnership, if you look into the market, it has been: you sell Hadoop, you sell a SQL engine, you sell data science. So what this year has given us is more of a solution sell. Now with this partnership we go to the customers and say here is an end-to-end experience for you. You start with Hadoop, you put more analytics on top of it, you then bring Big SQL for complex queries and federation and visualization stories, and then finally you put data science on top of it, so it gives you a complete end-to-end solution, the end-to-end experience for getting the value out of the data. >> Now IBM a few years back released a Watson data platform for team data science, with DSX, Data Science Experience, as one of the tools for data scientists. Is Watson data platform still the core, I call it dev ops for data science, and maybe that's the wrong term, that IBM provides to market, or is there sort of a broader dev ops framework within which IBM goes to market with these tools? >> Sure. Watson data platform one year ago was more of a cloud platform, and it had many components to it, and now we are bringing a lot of those components on premises, and Data Science Experience is one part of it, so Data Science Experience... >> So Watson Analytics as well, for subject matter experts and so forth. >> Yes. And again, Watson has a whole suite of business-focused offerings; Data Science Experience is a particular aspect of the focus, specifically on the data science, and that's now available on prem, and now we are building this on-prem stack, so we have HDP, HDF, Big SQL, Data Science Experience, and we are working towards adding more and more to that portfolio. >> Well, you have a broader reference architecture and a stack of solutions, AI on Power and so forth, for more of the deep learning development. In your relationship with Hortonworks, are they reselling more of those tools into their customer base to supplement, extend what they already resell, DSX, or is that outside of the scope of the relationship?
>> No, it is all part of the relationship. These three have been the core of what we announced last year and then there are other solutions. We have the whole governance solution, right? So again it goes back to the partnership: HDP brings with it Atlas. IBM has a whole suite of governance portfolio including the governance catalog. How do you expand the story from being a Hadoop-centric story to an enterprise data lake story, and now we are taking that to the cloud; that's what Truata is all about. Rob Thomas came out with a blog yesterday morning talking about Truata. If you look at it, it is nothing but a governed data lake hosted offering, if you want to simplify it. That's one way to look at it, and it caters to the GDPR requirements as well. >> For the IBM Hortonworks partnership, is the lead solution for GDPR compliance Hortonworks Data Steward Studio, or is it any number of solutions that IBM already has for data governance and curation, or is it a combination of all of that in terms of what you, as partners, propose to customers for soup-to-nuts GDPR compliance? Give me a sense for... >> It is a combination of all of those, so it has HDP, it has HDF, it has Big SQL, it has Data Science Experience, it has the IBM governance catalog, it has IBM data quality and it has a bunch of security products, like Guardium, and it has some new IBM proprietary components that are very specific towards data (cough drowns out speaker) and how you deal with the personal data and sensitive personal data as classified by GDPR. I'm supposed to query some high-level information but I'm not allowed to query deep into the personal information, so how do you block those queries, how do you understand those? These are not necessarily part of Data Steward Studio. These are some of the proprietary components that are thrown into the mix by IBM. >> One of the requirements that is not often talked about under GDPR, Ricky of Hortonworks got into it a little bit in his presentation, was the requirement that if you are using an EU citizen's PII to drive algorithmic outcomes, they have the right to full transparency into the algorithmic decision paths that were taken. I remember IBM had a tool under the Watson brand that wraps up a narrative of that sort. Is that something that IBM still offers, it was called Watson Curator a few years back, because I'm getting a sense right now that Hortonworks doesn't have a specific solution, not to say that they may not be working on it, that addresses that side of GDPR. Do you know what I'm referring to there? >> I'm not aware of something from the Hortonworks side beyond the Data Steward Studio, which offers basically identification of what some of the... >> Data lineage as opposed to model lineage. It's a subtle distinction. >> It can identify some of the personal information and maybe provide a way to tag it and hence mask it, but the Truata offering is the one that is bringing some new research assets. After the GDPR guidelines became clear, they got fully into how do we cater to those requirements. These are relatively new proprietary components; they are not even being productized yet, that's why I am calling them proprietary components that are going into this hosted service. >> IBM's got a big portfolio so I'll understand if you guys are still working out what position. Rebecca go ahead. >> I just wanted to ask you about this new era of GDPR.
The last Hortonworks conference was sort of before it came into effect and now we're in this new era. How would you say companies are reacting? Are they in the right space for it, in the sense that they're really still understanding the ripple effects and how it's all going to play out? How would you describe your interactions with companies in terms of how they're dealing with these new requirements? >> They are still trying to understand the requirements, interpret the requirements, and come to terms with what that really means. For example I met with a customer and they are a multi-national company. They have data centers across different geos and they asked me: I have somebody from Asia trying to query the data, so the query should go to Europe, but the query processing should not happen in Asia; the query processing should all happen in Europe, and only the output of the query should be sent back to Asia. You wouldn't have thought in these terms before the GDPR era. >> Right, exceedingly complicated. >> Decoupling storage from processing enables those kinds of fairly complex scenarios for compliance purposes. >> It's not just about the access to data; now you are getting into where the processing happens and where the results are getting displayed, so we are getting... >> Severe penalties for not doing that, so your customers need to keep up. There was an announcement at this show, at DataWorks 2018, of an IBM Hortonworks solution, IBM hosted analytics with Hortonworks. I wonder if you could speak a little bit about that, Pandit, in terms of what's provided. It's a subscription service? If you could tell us what subset of IBM's analytics portfolio is hosted for Hortonworks' customers? >> Sure, as you said, it is a hosted offering. Initially we are starting off as a base offering with three products: it will have HDP, Big SQL (IBM Db2 Big SQL) and DSX, Data Science Experience. Those are the three solutions. Again as I said, it is hosted on IBM Cloud, so customers have a choice of different configurations they can choose, whether it be VMs or bare metal. I should say this is probably the only offering, as of today, that offers a bare metal configuration in the cloud. >> It's geared to data scientist developers who will build machine-learning models and train them in IBM Cloud, but in a hosted HDP in IBM Cloud. Is that correct? >> Yeah, I would rephrase that a little bit. There are several different offerings on the cloud today and we can think about them, as you said, for ad-hoc or ephemeral workloads, also geared towards low cost. Think about this offering as taking your on-prem data center experience directly onto the cloud. It is geared towards very high performance. The hardware and the software are all configured and optimized for providing high performance, not necessarily for ad-hoc or ephemeral workloads. It is capable of handling massive, sticky workloads; it is not meant for turning on this massive computing power for a couple of hours and then switching it off, but rather, I'm going to run these massive workloads as if they were located in my data center. That's number one. It comes with the complete set of HDP. If you think about it, currently in the cloud you have Hive and HBase, the SQL engines and the storage are separate, security is optional, governance is optional. This comes with the whole enchilada. It has security and governance all baked in.
It provides the option to use Big SQL, because once you get on Hadoop, the next experience is I want to run complex workloads. I want to run federated queries across Hadoop as well as other data stores. How do I handle those? And then it comes with Data Science Experience, also configured for best performance and integrated together. As a part of this partnership, I mentioned earlier that we have progressed towards providing this story of an end-to-end solution. The next step of that is: yes, I can say that it's an end-to-end solution, but do the products look and feel as if they are one solution? That's what we are getting into, and we have delivered some of those integrations. For example Big SQL, an IBM product: we have been working on integrating it very closely with HDP. It can be deployed through Ambari, and it is integrated with Atlas and Ranger for security. We are improving the integrations with Atlas for governance. >> Say you're building a Spark machine learning model inside DSX on HDP, within IBM hosted analytics with Hortonworks (mumbles) on HDP 3.0. Can you then containerize that machine learning Spark model and deploy it into an edge scenario? >> Sure. First was Big SQL; the next one was DSX. DSX is integrated with HDP as well. We could run DSX workloads on HDP before, but what we have done now is this: if I want to run DSX workloads, say a Python workload, I need to have the Python libraries on all the nodes that I want to deploy to. Suppose you are running a big cluster, a 500-node cluster: I need to have the Python libraries on all 500 nodes and I need to maintain the versioning of them. If I upgrade the versions, then I need to go and upgrade and make sure all of them are perfectly aligned. >> In this first version will you be able to build a Spark model and a TensorFlow model and containerize them and deploy them? >> Yes. >> Across a multi-cloud, and orchestrate them with Kubernetes to do all that meshing: is that a capability now, or planned for the future within this portfolio? >> Yeah, we have that capability demonstrated at the booth today, so that is a new integration. We can run what we call a virtual Python environment. DSX can containerize it and run it against data that's enclosed in the HDP cluster. Now we are making use of both the data in the cluster and the infrastructure of the cluster itself for running the workloads. >> In terms of the layers of the stack, is it also incorporating the IBM distributed deep-learning technology that you've recently announced? Which I think is highly differentiated, because deep learning is increasingly becoming a set of capabilities that run across a distributed mesh, playing together as if they're one unified application. Is that a capability now in this solution, or will it be in the near future? DDL, distributed deep learning? >> No, we have not yet. >> I know that's on the AI Power platform currently, gotcha. >> It's what we'll be talking about at next year's conference. >> That's definitely on the roadmap. We are starting with the base configurations of bare metal and VMs; the next one, depending on how the customers react to it, is definitely bare metal with GPUs optimized for TensorFlow workloads, which we're thinking about. >> Exciting. We'll be tuned in, in the coming months and years; I'm sure you guys will have that. >> Pandit, thank you so much for coming on theCUBE. We appreciate it. I'm Rebecca Knight for James Kobielus. We will have more from theCUBE's live coverage of DataWorks, just after this.
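Prasad's point about maintaining Python libraries across 500 nodes is worth making concrete. The sketch below shows the general technique that a virtual Python environment automates: pack the environment once on the client and ship it to YARN with the job, so no worker needs libraries pre-installed. The job name, archive name, and sample data are hypothetical, and DSX's own mechanism may differ in detail.

    # Minimal sketch: ship a packed conda environment to YARN with a PySpark
    # job, so worker nodes need no pre-installed Python libraries.
    # Assumes the environment was packed once on the client, e.g.:
    #   conda create -y -n score python=3.6 numpy pandas
    #   conda pack -n score -o score_env.tar.gz
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("portable-python-env-demo")   # hypothetical job name
        .master("yarn")
        # Distribute the packed environment; YARN unpacks it under the alias "env".
        .config("spark.yarn.dist.archives", "score_env.tar.gz#env")
        # Point every executor at the Python interpreter inside that archive.
        .config("spark.executorEnv.PYSPARK_PYTHON", "./env/bin/python")
        .getOrCreate()
    )

    # Imports used inside executor-side functions now resolve from the
    # shipped environment instead of each node's system Python.
    def add_noise(row):
        import numpy as np                     # resolved from ./env on the worker
        return (row[0], float(row[1]) + np.random.normal())

    rows = spark.sparkContext.parallelize([("a", 1.0), ("b", 2.0)])
    print(rows.map(add_noise).collect())
    spark.stop()

The trade-off is a larger job submission payload in exchange for zero per-node administration, which is exactly the versioning headache described above.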

Published Date : Jun 19 2018

Dan Potter, Attunity & Ali Bajwa, Hortonworks | DataWorks Summit 2018


 

>> Live from San Jose in the heart of Silicon Valley, it's theCUBE, covering DataWorks Summit 2018, brought to you by Hortonworks. >> Welcome back to theCUBE's live coverage of DataWorks here in sunny San Jose, California. I'm your host Rebecca Knight along with my co-host James Kobielus. We're joined by Dan Potter, he is the VP Product Management at Attunity, and also Ali Bajwa, who is the principal partner solutions engineer at Hortonworks. Thanks so much for coming on theCUBE. >> Pleasure to be here. >> It's good to be here. >> So I want to start with you, Dan, and have you tell our viewers a little bit about the company, based in Boston, Massachusetts: what Attunity does. >> Attunity, we're a data integration vendor. We are best known as a provider of real-time data movement from transactional systems into data lakes, into clouds, into streaming architectures, so it's a modern approach to data integration. As these core transactional systems are updated, we're able to take those changes and move them where they're needed, when they're needed, for analytics, for new operational applications, for a variety of different tasks. >> Change data capture. >> Change data capture is the heart of our-- >> They are well known in this business. They have change data capture. Go ahead. >> We are. >> So tell us about the announcement today that Attunity has made at the Hortonworks-- >> Yeah, thank you, it's a great announcement because it showcases the collaboration between Attunity and Hortonworks, and it's all about taking the metadata that we capture in that integration process. We're a piece of a data lake architecture. As we are capturing changes from those source systems, we are also capturing the metadata, so we understand the source systems, we understand how the data gets modified along the way. We use that metadata internally, and now we've built extensions to share that metadata into Atlas and to be able to extend that out through Atlas to higher-level data governance initiatives, so Data Steward Studio, into the DataPlane Services. It's really important to be able to take the metadata that we have and add to it the metadata from the other sources of information. >> Sure, and more of the transactional semantics that Hortonworks has been describing, they've baked into HDP; your overall portfolio supports those kinds of requirements, is that true? >> With HDP, what we're seeing is, you know, the EDW optimization play has become more and more important for a lot of customers as they try to optimize the data that their EDWs are working on, so it really gels well with what we've done here with Attunity, and then on the Atlas side, with the integration on the governance side, with GDPR and other sorts of regulations coming into play now, those sorts of things are becoming more and more important, specifically around the governance initiative. We actually have a talk just on Thursday morning where we're showcasing the integration as well. >> So can you talk a little bit more about that, for those who aren't going to be there on Thursday? GDPR was really a big theme at the DataWorks Berlin event and now we're in this new era and it's not talked about too, too much, I mean we-- >> And global businesses who have operations in the EU, but also all over the world, are trying to be systematic and consistent about how they manage PII everywhere.
So GDPR is an EU regulation, but really in many ways it's having ripple effects across the world in terms of practices. >> Absolutely, and at the heart of understanding how you protect yourself and comply, I need to understand my data, and that's where metadata comes in. So having a holistic understanding of all of the data that resides in your data lake or in your cloud, metadata becomes a key part of that. And also in terms of enforcing it: if I understand my customer data, where the customer data comes from, the lineage of that, then I'm able to apply the protections of the masking on top of that data. So the GDPR effect has, you know, created a broad-scale need for organizations to really get a handle on metadata, so the timing of our announcement just works really well. >> And one nice thing about this integration is that it's not just about being able to capture the data in Atlas, but now with the integration of Atlas and Ranger, you can do enforcement of policies based on classifications as well. So if you can tag data as PCI, PII, personal data, that can get enforced through Ranger to say, hey, only certain admins can access certain types of data, and all that becomes possible once we've taken the initial steps of the Atlas integration. >> So with this collaboration, and it's really deepening an existing relationship, how do you go to market? How do you collaborate with each other and then also service clients? >> You want to? >> Yeah, so from an engineering perspective, we've got deep roots in terms of being a first-class provider into the Hortonworks platform, both HDP and HDF. Last year about this time, we announced our support for ACID merge capabilities, so the leading-edge work that Hortonworks has done in bringing ACID compliance capabilities into Hive was a really important one; our change data capture capabilities are able to feed directly into that and support those extensions. >> Yeah, we have a lot of really key customers together with Attunity, and maybe as a result of that they are actually our ISV of the Year as well, which they probably showcase on their booth there. >> We're very proud of that. Yeah, no, it's a nice honor for us to get that distinction from Hortonworks and it's also a proof point of the collaboration that we have commercially. Our sales reps work hand in hand. When we go into a large organization, we both sell to very large organizations. These are big transformative initiatives for these organizations and they're looking for solutions, not technologies, so the fact that we can come in and show the proof points from other customers that are successfully using our joint solution, that's critical. >> And I think it helps that they're integrating with some of our key technologies because, you know, that's where our sales force and our customers really see the value, as well as where we're putting in the investment, and that's where these guys are also investing, so it really helps the story together. So with Hive, we're doing a lot of investment in making it closer and closer to a sort of real-time database, where you can combine historical insights as well as your real-time insights, with the new ACID merge capabilities where you can do the inserts, updates and deletes, and so that's exactly what Attunity's integrating with, along with Atlas.
We're doing a lot of investments there and that's exactly what these guys are integrating with. So I think our customers and prospects really see that and that's where all the wins are coming from. >> Yeah, and I think together there were two main barriers that we saw in terms of customers getting the most out of their data lake investment. One of them was: as I'm moving data into my data lake, I need to be able to put some structure around it, I need to be able to handle continuously updating data from multiple sources, and that's what we introduced with Attunity Compose for Hive, building out the structure in an automated fashion so I've got analytics-ready data, and using the ACID merge capabilities just made those updates much easier. The second piece was metadata. Business users need to have confidence in the data that they're using. Where did this come from? How was it modified? And overcoming both of those is really helping organizations make the most of those investments. >> How would you describe customer attitudes right now in terms of their approach to data? Because, as we've talked about, data is the new oil, so there's a real excitement and there's a buzz around it, and yet there are also so many high-profile cases of breaches and security concerns. So what would you say: are customers more excited or more trepidatious? How would you describe the CIO mindset right now? >> So I think security and governance have become top of mind, right. More and more, in the surveys that we've taken with our customers, more and more customers are concerned about security, they're more concerned about governance. The joke is that we talk to some of our customers and they keep talking to us about Atlas, which is one of the newer offerings on governance that we have, but then we ask, "Hey, what about Ranger for enforcement?" And they're like, "Oh, yeah, that's a standard now." So we have Ranger; now it's a question of how do we get our hooks into Atlas and all that kind of stuff. So yeah, definitely, as you mentioned, because of GDPR, because of all these kinds of issues that have happened, it's definitely become top of mind. >> And I would say the other side of that is there's real excitement as well about the possibilities. Now bringing together all of this data: AI, machine learning, real-time analytics and real-time visualization. There are analytic capabilities now that organizations have never had, so there's great excitement, but there's also trepidation. You know, how do we solve for both of those? And together, we're doing just that. >> But as you mentioned, if you look at Europe, some of the European companies that are more hit by GDPR are actually excited that now they can really get to understand their data more and do better things with it as a result of the GDPR initiative. >> Absolutely. >> Are you using machine learning inside of Attunity in a Hortonworks context to find patterns in that data in real time? >> So we enable data scientists to build those models.
So we're not only bringing the data together but, again, part of the announcement last year is the way we structure that data in Hive. We provide a complete historic data store, so every single transaction that has happened is there, and we send those transactions as they happen; it's a big append. So if you're a data scientist, I want to understand the complete history of the transactions of a customer to be able to build those models. Building those out in Hive and making them analytics-ready in Hive, that's what we do, so we're a key enabler of machine learning. >> Making it analytics-ready rather than doing the analytics in the stream, yeah. >> Absolutely. >> Yeah, the other side to that is that because they're integrated with Atlas, you know, now we have a new capability called DataPlane and Data Steward Studio, and the idea there is around multi-everything. More and more customers have multiple clusters, whether on-prem or in the cloud, so now more and more customers are looking at how do I get a single pane of glass across all my data, whether it's on-prem, in the cloud, whether it's IoT, whether it's data at rest. So that's where DataPlane comes in, and with Data Steward Studio, which is our second offering on top of DataPlane, they can get that view across all their clusters. As soon as the data lands from Attunity into Atlas, you can get a view into it as part of Data Steward Studio, and one of the nice things we do in Data Steward Studio is that we also have machine learning models to do some profiling, to figure out that, hey, this looks like a credit card, so maybe I should suggest this as a tag of sensitive data, and now the end user, the end administrator, has the option of saying, okay, yeah, this is a credit card, I'll accept that tag, or they can reject it and pick one of their own. >> Will any of the Attunity CDC change data capture capability, going forward, be containerized for deployment to the edges in HDP 3.0? I mean, 'cause it seems, for Internet of Things, edge analytics and so forth, change data capture: is it absolutely necessary to make the entire, some call it the fog computing cloud or whatever, a completely transactional environment for all applications, from micro endpoint to micro endpoint? Are there any plans to do that going forward? >> Yeah, so I think with HDP 3.0, as you mentioned, one of the key factors coming into play was time to value, so with containerization now being able to bring third-party apps on top of YARN through Docker, I think that's definitely an avenue that we're looking at. >> Yes, we're excited about that with 3.0 as well, so that's definitely in the cards for us. >> Great, well, Ali and Dan, thank you so much for coming on theCUBE. It's been fun to have you here. >> Nice to be here, thank you guys. >> Great to have you. >> Thank you, it was a pleasure. >> I'm Rebecca Knight, for James Kobielus; we will have more from DataWorks in San Jose just after this. (techno music)
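The Hive ACID merge capability that Dan and Ali credit for easing continuous updates reduces, at its core, to a single MERGE statement applied to a transactional table. Below is a hedged sketch of that pattern using the open-source PyHive client; the host, tables, and op-code convention are invented for illustration, and Attunity Compose generates its own equivalent rather than code like this.

    # Minimal sketch: apply a staged batch of CDC rows (insert/update/delete)
    # to a transactional (ACID) Hive table with a single MERGE statement.
    # Host, table names, columns, and the op-code convention are hypothetical.
    from pyhive import hive  # open-source Hive client; any JDBC/ODBC client works too

    conn = hive.connect(host="hive-server.example.com", port=10000)
    cur = conn.cursor()

    cur.execute("""
        MERGE INTO customers AS t            -- ACID target table (ORC, transactional=true)
        USING customers_changes AS s         -- staging table loaded by the CDC tool
        ON t.customer_id = s.customer_id
        WHEN MATCHED AND s.op = 'D' THEN DELETE
        WHEN MATCHED THEN UPDATE SET
            name = s.name,
            email = s.email,
            updated_at = s.updated_at
        WHEN NOT MATCHED THEN INSERT VALUES
            (s.customer_id, s.name, s.email, s.updated_at)
    """)

    conn.close()

One statement handles inserts, updates, and deletes in a single transaction, which is what makes continuously updating data from multiple sources tractable compared with rewriting partitions by hand.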

Published Date : Jun 19 2018


Tendü Yogurtçu, Syncsort | DataWorks Summit 2018


 

>> Live from San Jose, in the heart of Silicon Valley, it's theCUBE, covering DataWorks Summit 2018. Brought to you by Hortonworks. >> Welcome back to theCUBE's live coverage of DataWorks here in San Jose, California. I'm your host, along with my cohost, James Kobielus. We're joined by Tendü Yogurtçu, she is the CTO of Syncsort. Thanks so much for coming on theCUBE, for returning to theCUBE I should say. >> Thank you Rebecca and James. It's always a pleasure to be here. >> So you've been on theCUBE before, and the last time you were here you were talking about Syncsort's growth. So can you give our viewers a company update? Where are you now? >> Absolutely. Syncsort has seen extraordinary growth within the last three years. We tripled our revenue, doubled our employees and expanded the product portfolio significantly. Because of this phenomenal growth that we have seen, we also embarked on a new initiative, refreshing our brand. We rebranded, and this was necessitated by the fact that we have such a broad portfolio of products. We are actually showing our new brand here, articulating the value our products bring: optimizing existing infrastructure, assuring data security and availability, and advancing data by integrating it into next-generation analytics platforms. So it's very exciting times in terms of Syncsort's growth. >> So the last time you were on the show it was pre-GDPR, but we were talking before the cameras were rolling and you were explaining the kinds of adoption you're seeing and what, in this new era, you're seeing from customers and hearing from customers. Can you tell our viewers a little bit about it? >> When we were discussing last time, I talked about four mega trends we are seeing, and those mega trends were primarily driven by advanced business and operational analytics: data governance, cloud, streaming, and data science and artificial intelligence. We really made a lot of announcements and focused on the use cases around data governance, primarily helping our customers with their GDPR, the General Data Protection Regulation, initiatives and how we can create that visibility in the enterprise through the data, by security and lineage, and delivering trusted data sets. Now we are talking about cloud primarily, and the keynotes at this event and our focus are around cloud, primarily driven again by the use cases, right? How the businesses are adapting to the new era. One of the challenges that we see with our enterprise customers, over 7000 customers by the way, is the ability to future-proof their applications, because this is a very rapidly changing stack. We have seen the keynotes talking about the importance of how you connect your existing infrastructure with the future modern, next-generation platforms. How do you future-proof the platform and make it agnostic, whether it's Amazon, Microsoft or Google Cloud, whether it's on-premise in legacy platforms? The data that exists today has to be available in the next-generation platforms. So the challenge we are seeing is how do we keep the data fresh? How do we create that abstraction so applications are future-proofed? Because organizations, even financial services customers, banking, insurance, now have at least one cluster running in the public cloud, and there are private implementations; hybrid becomes the new standard. So our focus and most recent announcements have been around really helping our customers with real-time, resilient change data capture: keeping the data fresh, feeding the downstream applications through streaming and messaging frameworks, for example Kafka and Amazon Kinesis, as well as keeping the persistent stores, the data lake on-premise and in the cloud, fresh.
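As a concrete illustration of the change-data-capture pattern just described, the sketch below shows a downstream consumer applying CDC events from a Kafka topic to keep a store fresh. It uses the open-source kafka-python client; the topic, broker, and event format are hypothetical and are not Syncsort APIs.

    # Minimal sketch: keep a downstream table fresh by applying CDC events
    # (insert/update/delete) published to a Kafka topic. The topic name and
    # the JSON event shape are hypothetical.
    import json
    from kafka import KafkaConsumer  # open-source kafka-python client

    consumer = KafkaConsumer(
        "orders.cdc",                                   # hypothetical CDC topic
        bootstrap_servers=["kafka.example.com:9092"],   # hypothetical broker
        value_deserializer=lambda raw: json.loads(raw.decode("utf-8")),
        auto_offset_reset="earliest",
    )

    table = {}  # stand-in for the real downstream store

    for event in consumer:
        change = event.value                 # e.g. {"op": "u", "key": 42, "row": {...}}
        op, key = change["op"], change["key"]
        if op in ("i", "u"):                 # insert or update: upsert the row
            table[key] = change["row"]
        elif op == "d":                      # delete: drop the row
            table.pop(key, None)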
So our focus and most recent announcements have been around really helping our customers with real-time resilient changes that capture, keeping the data fresh, feeding into the downstream applications with the streaming and messaging data frames, for example Kafka, Amazon Kinesis, as well as keeping the persistent stores and how to Data Lake on-premise in the cloud fresh. >> Puts you into great alignment with your partner Hortonworks so, Tendu I wonder if we are here at DataWorks, it's Hortonworks' show, if you can break out for our viewers, what is the nature, the levels of your relationship, your partnership with Hortonworks and how the Syncsort portfolio plays with HDP 3.0 with Hortonworks DataFlow and the data plan services at a high level. >> Absolutely, so we have been a longtime partner with Hortonworks and a couple of years back, we strengthened our partnership. Hortonworks is reselling Syncsort and we have actually a prescriptive solution for Hadoop and ETL onboarding in Hadoop jointly. And it's very complementary, our strategy is very complementary because what Hortonworks is trying and achieving, is creating that abstraction and future-proofing and interaction consistency around referred as this morning. Across the platform, whether it's on-premise or in the cloud or across multiple clouds. We are providing the data application layer consistency and future-proofing on top of the platform. Leveraging the tools in the platform for orchestration, integrating with HTP, certifying with Trange or HTP, all of the tools DataFlow and at last of course for lineage. >> The theme of this conference is ideas, insights and innovation and as a partner of Hortonworks, can you describe what it means for you to be at this conference? What kinds of community and deepening existing relationships, forming new ones. Can you talk about what happens here? >> This is one of the major events around data and it's DataWorks as opposed to being more specific to the Hadoop itself, right? Because stack is evolving and data challenges are evolving. For us, it means really the interactions with the customers, the organizations and the partners here. Because the dynamics of the use cases is also evolving. For example Data Lake implementations started in U.S. And we started MER European organizations moving to streaming, data streaming applications faster than U.S. >> Why is that? >> Yeah. >> Why are Europeans moving faster to streaming than we are in North America? >> I think a couple of different things might participate. The open sources really enabling organizations to move fast. When the Data Lake initiative started, we have seen a little bit slow start in Europe but more experimentation with the Open Source Stack. And by that the more transformative use cases started really evolving. Like how do I manage interactions of the users with the remote controls as they are watching live TV, type of transformative use cases became important. And as we move to the transformative use cases, streaming is also very critical because lots of data is available and being able to keep the cloud data stores as well as on-premise data stores and downstream applications with fresh data becomes important. We in fact in early June announced that Syncsort's now's a part of Microsoft One Commercial Partner Program. With that our integrate solutions with data integration and data quality are Azure gold certified and Azure ready. 
We are in a co-sell agreement and we are jointly helping a lot of customers move data and workloads to Azure and keep those data stores close to the platforms in sync. >> Right. >> So lots of exciting things. I mean, there's a lot happening in the application space. There's also lots still happening connected to the governance cases that we have seen. Feeding security and IT operations data into, again, modern-day, next-generation analytics platforms is key, whether it's Splunk, whether it's Elastic, as part of the Hadoop stack. So we are still focused on governance as part of these multi-cloud, on-premise and cloud implementations as well. We in fact launched our Ironstream for IBM i product to help customers, making this data available not just from mainframes but also from IBM i, into Splunk, Elastic and other security information and event management platforms. And today we announced workflow optimization across on-premise, multi-cloud and cloud platforms. So lots of focus across the Optimize, Assure and Integrate portfolios of products, helping customers with the business use cases. That's really our focus as we innovate organically and also acquire technologies and solutions: what are the problems we are solving and how can we help our customers with business and operational analytics, targeting those mega trends around data governance, cloud, streaming and also data science. >> What is the biggest trend, do you think, that is sort of driving all of these changes? As you said, the data is evolving, the use cases are evolving. What is it that is keeping your customers up at night? >> Right now it's still governance keeping them up at night, because this evolving architecture is also making governance more complex, right? If we are looking at financial services, banking, insurance, healthcare, there are lots of existing infrastructures, mission-critical data stores on mainframe and IBM i, in addition to the gravity of data changing, with lots of data from the online businesses generated in the cloud. So how to govern that, while also optimizing and making those data stores available for next-generation analytics, makes governance quite complex. And that really creates a lot of opportunity for the community, right? All of us here, to address those challenges. >> Because it sounds to me, I'm hearing Splunk, advanced machine data, I think of the internet of things and sensor grids. I'm hearing IBM mainframes: that's transactional data, that's your customer data and so forth. It seems like much of this data that you're describing, that customers are trying to cleanse and consolidate and provide strict governance on, is absolutely essential for them to drive more artificial intelligence into end applications and mobile devices that are being used to drive the customer experience. Do you see more of your customers using your tools to massage the data sets, as it were, that data scientists then use to build and train their models for deployment into edge applications? Is that an emerging area where your customers are deploying Syncsort? >> Thank you for asking that question. >> It's a complex question. (laughing) But thanks for unpacking it... >> It is a complex question but it's a very important question. Yes, and in previous discussions we have seen, and this morning also Rob Thomas from IBM mentioned it as well, that machine learning and artificial intelligence, data science, really rely on high-quality data, right? As the anonymous computer scientist of the 1950s said: garbage in, garbage out.
>> Yeah. >> When we are using artificial intelligence and machine learning, the implications, the impact of bad data, multiplies. It multiplies with the training on historical data, and it multiplies with the insights that we are getting out of that. So data scientists today are still spending significant time on preparing the data for the AI pipeline, the data science pipeline, and that's where we shine, because our Integrate portfolio accesses the data from all enterprise data stores and cleanses and matches and prepares it in a trusted manner for use in advanced analytics with machine learning and artificial intelligence. >> Yeah, 'cause the magic of machine learning for predictive analytics is that you build a statistical model based on the most valid data set for the domain of interest. If the data is junk, then you're going to be building a junk model that will not be able to do its job. So, for want of a nail, the kingdom was lost. For want of a Syncsort (laughing) data cleansing and, you know, governance tool, the whole AI superstructure will fall down. >> Yes, yes, absolutely. >> Yeah, good. >> Well thank you so much, Tendü, for coming on theCUBE and for giving us a lot of background and information. >> Thank you for having me, thank you. >> Good to have you. >> Always a pleasure. >> I'm Rebecca Knight, for James Kobielus. We will have more from theCUBE's live coverage of DataWorks 2018 just after this. (upbeat music)
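The garbage-in, garbage-out point lends itself to a small example. The sketch below shows the kind of standardizing, validating, and de-duplicating a data-prep step performs before records reach model training; the columns and rules are invented, and this illustrates the general idea rather than Syncsort's implementation.

    # Minimal sketch: cleanse and de-duplicate customer records before they
    # feed a machine-learning pipeline. Columns and rules are hypothetical.
    import pandas as pd

    raw = pd.DataFrame({
        "customer_id": [1, 1, 2, 3],
        "email": ["A@X.COM", "a@x.com ", None, "c@y.com"],
        "spend": ["100", "100", "250", "-5"],
    })

    clean = (
        raw.assign(
            # Standardize: trim whitespace and lowercase emails so duplicates match.
            email=lambda df: df["email"].str.strip().str.lower(),
            # Coerce numerics; bad strings become NaN instead of silently training on junk.
            spend=lambda df: pd.to_numeric(df["spend"], errors="coerce"),
        )
        .dropna(subset=["email"])               # drop rows missing a required field
        .query("spend >= 0")                    # enforce a simple validity rule
        .drop_duplicates(subset="customer_id")  # collapse duplicate records
    )

    print(clean)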

Published Date : Jun 19 2018

Piotr Mierzejewski, IBM | Dataworks Summit EU 2018


 

>> Announcer: From Berlin, Germany, it's theCUBE covering Dataworks Summit Europe 2018, brought to you by Hortonworks. (upbeat music) >> Well hello, I'm James Kobielus and welcome to theCUBE. We are here at Dataworks Summit 2018, in Berlin, Germany. It's a great event; Hortonworks is the host, they made some great announcements. They've had partners doing the keynotes and the sessions, breakouts, and IBM is one of their big partners. Speaking of IBM, from IBM we have a program manager, Piotr, I'll get this right, Piotr Mierzejewski. Your focus is on data science, machine learning, and Data Science Experience, which is one of the IBM products for working data scientists to build and to train models in team data science enterprise operational environments. So Piotr, welcome to theCUBE. I don't think we've had you before. >> Thank you. >> You're a program manager. I'd like you to discuss what you do for IBM, I'd like you to discuss Data Science Experience. I know that Hortonworks is a reseller of Data Science Experience, so I'd like you to discuss the partnership going forward and how you and Hortonworks are serving your customers: data scientists and others in those teams who are building and training and deploying machine learning and deep learning, AI, into operational applications. So Piotr, I give it to you now. >> Thank you. Thank you for inviting me here, very excited. This is a very loaded question, and I would like to begin, before I actually get to why the partnership makes sense, with two things. First, there is no machine learning without data. And second, machine learning is not easy. Especially, especially-- >> James: I never said it was! (Piotr laughs) >> Well, there is this kind of perception, like, you can have a data scientist working on their Mac, working on some machine learning algorithms, and they can create a recommendation engine in, let's say, two, three days' time. This is because of the explosion of open source in that space. You have thousands of libraries: from Python, from R, from Scala, you have access to Spark. All these various open-source offerings are enabling data scientists to do this wonderful work. However, when you start talking about bringing machine learning to the enterprise, this is not an easy thing to do. You have to think about governance, resiliency, data access, and the actual model deployments, which are not trivial when you have to expose them in a uniform fashion to various business units. Now, all this has to work in private cloud and public cloud environments, on a variety of hardware, a variety of different operating systems. Now, that is not trivial. (laughs) And when a data scientist deploys a model, he needs to be able to explain how the model was created. He has to be able to explain what data was used. He needs to ensure-- >> Explicable AI, or explicable machine learning, yeah, that's a hot focus of concern for enterprises everywhere, especially in a world where governance and tracking and lineage, GDPR and so forth, are so hot. >> Yes, you've mentioned all the right things. Now, so given those two things, there's no ML without data, and ML is not easy, why does the partnership between Hortonworks and IBM make sense? Well, you're looking at the number one industry-leading big data platform from Hortonworks.
Then, you look at DSX Local, which, I'm proud to say, I've been there since the first line of code and I feel very passionate about the product, and at the merger between the two. The ability to integrate them tightly together gives your data scientists secure access to data, the ability to leverage the Spark that runs inside a Hortonworks cluster, the ability to work in a platform like DSX that doesn't limit you to just one kind of technology but allows you to work with multiple technologies, the ability to actually work on not only-- >> When you say technologies here, you're referring to frameworks like TensorFlow, and-- >> Precisely. Very good, now that part I'm going to get into very shortly, (laughs) so please don't steal my thunder. >> James: Okay. >> Now, what I was saying is that not only are DSX and Hortonworks integrated to the point that you can manage your Hadoop clusters and environments within DSX; you can also work on your Python models and your analytics within DSX and then push them remotely to be executed where your data is. Now, why is this important? If you work with data that's megabytes, gigabytes, maybe you can pull it in, but truly what you want to do when you move to the terabytes and petabytes of data is push the analytics to where your data resides, and leverage, for example, YARN, a resource manager, to distribute your workloads and actually train your models on your HDP cluster. That's one of the huge value propositions. Now, mind you, this is all done in a secure fashion, with the ability to install DSX on the edge nodes of the HDP clusters. >> James: Hmm... >> As of HDP 2.6.4, DSX has been certified to work with HDP. Now, we embarked on this partnership about 10 months ago, and it often happens that there are announcements but not much materializes after such an announcement. This is not true in the case of DSX and HDP. Just recently we have had the release of DSX 1.2, which I'm super excited about. Now, let's talk about those open-source toolings and the various platforms. You don't want to force your data scientists to work with just one environment. Some of them might prefer to work in Spark; some of them like their RStudio, they're statisticians, they like R; others like Python, with Zeppelin, say, or Jupyter notebooks. Now, how about TensorFlow? What are you going to do when you actually have to run deep learning workloads, when you want to use neural nets? Well, DSX does support the ability to bring in GPU nodes and do the TensorFlow training. As a sidecar approach, you can append the node, scale the platform horizontally and vertically, train your deep learning workloads, and then remove the sidecar; you can attach it to the cluster and remove it at will. Now, DSX not only satisfies the needs of your programmer data scientists, who code in Python and Scala or R, but actually allows your business analysts to work and create models in a visual fashion.
As of DSX 1.2, we have embedded, integrated, the SPSS Modeler, redesigned and rebranded. This is an amazing technology from IBM that's been around for a while, very well established, but now, with the new interface, embedded inside the DSX platform, it allows your business analysts to actually train and create models in a visual fashion and, what is beautiful-- >> Business analysts, not traditional data scientists. >> Not traditional data scientists. >> That sounds equivalent to how IBM, a few years back, was able to bring more of a visual experience to SPSS proper to enable the business analysts of the world to build and do data mining and so forth with structured data. Go ahead, I don't want to steal your thunder here. >> No, no, precisely. (laughs) >> But I see it's the same phenomenon: you bring the same capability to greatly expand the range of data professionals who can do, in this case, machine learning, hopefully as well as professional, dedicated data scientists. >> Certainly. Now, what we have to also understand is that data science is actually a team sport. It involves various stakeholders from the organization, from the executive who actually gives you the business use case, to your data engineers who actually understand where your data is and can grant the access-- >> James: They manage the Hadoop clusters, many of them, yeah. >> Precisely. So they manage the Hadoop clusters, they manage your relational databases, because we have to realize that not all the data is in the data lakes yet. You have legacy systems, which DSX allows you to connect to and integrate to get data from. It also allows you to consume data from streaming sources, so if you have a Kafka message bus and are streaming data from your applications or IoT devices, you can integrate all those various data sources and federate them within DSX to use for training machine learning models. Now, this is all around predictive analytics. But what if I told you that right now with DSX you can do prescriptive analytics as well? With 1.2, again I'm going to be coming back to this DSX 1.2: with the most recent release we have added Decision Optimization, an industry-leading solution from IBM-- >> Prescriptive analytics, gotcha-- >> Yes, for prescriptive analysis. So now if you have warehouses, or you have a fleet of trucks, or you want to optimize the flow in, let's say, a utility company, whether it be for power or, let's say, for water, you can create and train prescriptive models within DSX and deploy them in the same fashion as you would deploy and manage your SPSS streams, as well as the machine learning models from Spark, from Python, with XGBoost, TensorFlow, Keras, all those various aspects. >> James: Mmmhmm. >> Now, what's going to get really exciting in the next two months: DSX will bring in natural language processing and text and sentiment analysis via Watson Explorer. So Watson Explorer, it's another offering from IBM... >> James: It's called, what is the name of it? >> Watson Explorer. >> Oh, Watson Explorer, yes. >> Watson Explorer, yes. >> So now you're going to have this collaborative platform, extendable! An extendable, collaborative platform that can install and run in your data centers without the need to access the internet. That's actually critical. Yes, we can deploy on AWS. Yes, we can deploy on Azure.
On Google Cloud, definitely; we can deploy on SoftLayer, and we're very good at that. However, in the majority of cases we find that the customers have challenges bringing the data out to the cloud environments. Hence, with DSX, we designed it to deploy and run and scale everywhere. Now, how have we done it? We've embraced open source. This was a huge shift within IBM, to realize that yes, we do have 350,000 employees, yes, we could develop container technologies ourselves, but why? Why not embrace what are actually industry standards, with Docker and its equivalents, as they became industry standards? Bring in RStudio, the Jupyter and Zeppelin notebooks, bring in the ability for a data scientist to choose the environments they want to work with and actually extend them, and make the deployments of web services, applications, and models. And those are actually full releases: I'm not only talking about the model, I'm talking about the scripts that go with it, the ability to pull the data in and allow the models to be retrained, evaluated, and redeployed without taking them down. Now, that's what is the true differentiator when it comes to DSX, and it's all done in either your public or private cloud environments. >> So that's coming in the next version of DSX? >> Outside of DSX-- >> James: We're almost out of time, so-- >> Oh, I'm so sorry! >> No, no, no.

Published Date : Apr 19 2018

Joe Morrissey, Hortonworks | Dataworks Summit 2018


 

>> Narrator: From Berlin, Germany, it's theCUBE! Covering Dataworks Summit Europe 2018. Brought to you by Hortonworks. >> Well, hello. Welcome to theCUBE. I'm James Kobielus. I'm lead analyst at Wikibon for big data analytics. Wikibon, of course, is the analyst team inside of SiliconANGLE Media. One of our core offerings is theCUBE, and I'm here with Joe Morrissey. Joe is the VP for International at Hortonworks, and Hortonworks is the host of Dataworks Summit. We happen to be at Dataworks Summit 2018 in Berlin! Berlin, Germany. And so, Joe, it's great to have you. >> Great to be here! >> We had a number of conversations today with Scott Gnau and others from Hortonworks, and also from your customers and partners. Now, you're International, you're VP for International. We've had a partner of yours from South Africa on theCUBE today. We've had a customer of yours from Uruguay. So there's been a fair amount of international presence. We had Munich Re from Munich, Germany. Clearly Hortonworks... you've been in business as a company for seven years now, I think it is, and you've established quite a presence worldwide. I'm looking at your financials in terms of your customer acquisition; it just keeps going up and up, so you're clearly doing a great job of bringing the business in throughout the world. Now, you've told me before the camera went live that you focus on both Europe and Asia-Pacific, so I'd like to open it up to you, Joe. Tell us how Hortonworks is doing worldwide and the kinds of opportunities you're selling into. >> Absolutely. 2017 was a record year for us. We grew revenues by over 40% globally. I joined to lead the internationalization of the business, and you know, not a lot of people know that Hortonworks is actually one of the fastest growing software companies in history. We were the fastest to get to $100 million. Also, now the fastest to get to $200 million. But the majority of that revenue contribution was coming from the United States. When I joined, international contribution was about 15%. By the end of 2017, we'd grown that to 31%, so that's a significant improvement in contribution overall from our international customer base, even though the company was growing globally at a very fast rate. >> And that's not just fast by any stretch of the imagination in terms of growth. Some have said, "Oh well, maybe Hortonworks, just like Cloudera, maybe they're going to plateau off because the bloom is off the rose of Hadoop." But really, Hadoop is just getting going as a market segment or as a platform, but you guys have diversified well beyond that. So give us a sense for going forward. What are your customers? What kind of projects are you positioning and selling Hortonworks solutions into now? Is it a different... well, you've only been there 18 months, but is it shifting towards more things to do with streaming, NiFi and so forth? Does it shift into more data science related projects? 'Cause this is worldwide. >> Yeah. That's a great question. This company was founded on the premise that data volumes and the diversity of data are continuing to explode, and we believed that it was necessary for us to come and bring enterprise-grade security and management and governance to the core Hadoop platform to make it really ready for the enterprise, and that's what the first evolution of our journey was really all about.
A number of years ago, we acquired a company called Onyara, and the logic behind that acquisition was that we believed companies would now want to go out to the point of origin, of creation of data, and manage data throughout its entire life cycle, and derive pre-event as well as post-event analytical insight into their data. So what we've seen is our customers moving beyond just unifying data in the data lake and deriving post-transaction insight from their data. They're now going all the way out to the edge. They're deriving insight from their data in real time, all the way from the point of creation, and getting pre-transaction insight into data as well, so-- >> Pre-transaction data, can you define what you mean by pre-transaction data? >> Well, I think if you look at it, it's really the difference between data in motion and data at rest, right? >> Oh, yes. >> A specific example would be if a customer walks into the store and they've interacted in the store, maybe on social before they come in or in some other fashion, before they've actually made the purchase. >> Engagement data, interaction data, yes. >> Engagement, exactly. Exactly. Right. So that's one example, but that also extends out to use cases in IoT as well, so data in motion and streaming data, as you mentioned earlier, have since become a very, very significant use case that we're seeing a lot of adoption for. Data science, I think companies are really coming to the realization that that's an essential role in the organization. If we really believe that data is the most important asset, that it's the crucial asset in the new economy, then data scientist becomes a really essential role for any company. >> How do your Asian customers' requirements differ, or do they differ, from your European customers'? 'Cause European customers clearly already have their backs against the wall. We have five weeks until GDPR goes into effect. Many of your Asian customers, I'm sure a fair number, sell into Europe. Are they putting a full-court press, as we'd say in the U.S., on complying with GDPR, or do they have equivalent privacy mandates in various countries in Asia, or a bit of both? >> I think that one of the primary drivers I see in Asia is that a lot of companies there don't have the years of legacy architecture that European companies need to contend with. In some cases, that means that they can move towards next-generation data-oriented architectures much quicker than European companies have. They don't have layers of legacy tech that they need to sunset. A great example of that is Reliance. Reliance is the largest company in India; they've got a subsidiary called Jio, which is the fastest growing telco in the world. They've implemented our technology to build a next-generation OSS system to improve their service delivery on their network. >> Operational support system. >> Exactly. They were able to do that from the ground up, because they formed their telco division around being a data-only company and giving away voice for free. So they can, to some extent, move quicker and innovate a little faster in that regard. I do see much more emphasis on regulatory compliance in Europe than I see in Asia. I do think that GDPR, amongst other regulations, is a big driver of that. The other factor, though, that I think is influencing that is Cloud, and Cloud strategy in general. What we've found is that customers are drawn to the Cloud for a number of reasons.
The economics sometimes can be attractive; the ability to leverage the Cloud vendors' skills in terms of implementing complex technology is attractive; but most importantly, the elasticity and scalability that the Cloud provides is hugely important. Now, the key concern for customers as they move to the Cloud, though, is how they leverage that as a platform in the context of an overall data strategy, right? And when you think about what a data strategy is all about, it all comes down to understanding what your data assets are and ensuring that you can leverage them for a competitive advantage, but do so in a regulatory compliant manner, whether that's data in motion or data at rest, whether it's on-prem or in the Cloud, or data across multiple Clouds. That's very much a top-of-mind concern for European companies. >> For your customers around the globe, specifically of course your area of Europe and Asia, what percentage of your customers are deploying Hortonworks into a purely public Cloud environment, like HDInsight on Microsoft Azure or HDP inside of AWS, versus a private on-premises deployment, versus a hybrid public-private multi-Cloud? Is it mostly on-prem? >> Most of our business is still on-prem, to be very candid. I think almost all of our customers are looking at migrating some workloads to the Cloud. Even those that had intended to have a cloud-first strategy have now realized that not all workloads belong in the Cloud. Some are actually more economically viable to be on-prem, and some just won't ever be able to move to the Cloud because of regulation. In addition to that, most of our customers are telling us that they actually want Cloud optionality. They don't want to be locked into a single vendor, so we very much view the future as hybrid Cloud, as multi-Cloud, and we hear our customers telling us that rather than just have a Cloud strategy, they need a data strategy. They need a strategy to be able to manage data no matter where it lives, on which tier, to ensure that they are regulatory compliant with that data, but then to be able to understand that they can secure, govern, and manage those data assets at any tier. >> What percentage of your deals involve a partner? Like, IBM is a major partner. Do you do a fair amount of co-marketing and joint sales and joint deals with IBM and other partners, or are they mostly Hortonworks-led? >> No, partners are absolutely critical to our success in the international sphere. Our partner revenue contribution across EMEA grew in the past year; every region grew by over 150% in terms of channel contribution. Our total channel business was 28% of our total, right? That's a very significant contribution. The growth rate is very high. IBM are a big part of that, as are many other partners. We've got a very significant reseller channel, and we've got IHV and ISV partners that are critical to our success also. Where we're seeing the most impact with IBM is where we go to some of these markets where we haven't had a presence previously, and they've got deep and long-standing relationships, and that helps us accelerate time to value with our customers. >> Yeah, it's been a very good and solid partnership going back several years. Well, Joe, this is great; we have to wrap it up, we're at the end of our time slot. This has been Joe Morrissey, who is the VP for International at Hortonworks.
We're on theCUBE here at Dataworks Summit 2018 in Berlin, and we want to thank you all for watching this segment. Tune in tomorrow, when we'll have a full slate of further discussions with Hortonworks, with IBM, and others on theCUBE. Have a good one. (upbeat music)
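Joe's distinction between post-transaction analysis and pre-transaction insight on data in motion reduces, in practice, to consuming events as a stream instead of querying them at rest. Here is a hedged sketch of that idea using the kafka-python client; the topic name, field names, and the three-touch trigger are hypothetical illustrations, not anything Hortonworks ships.

```python
# Sketch: react to engagement events while they are still in motion,
# before any purchase transaction lands in a system of record.
# Topic and field names are hypothetical.
import json

from kafka import KafkaConsumer  # pip install kafka-python

consumer = KafkaConsumer(
    "engagement-events",
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda raw: json.loads(raw.decode("utf-8")),
)

touches = {}  # running count of engagement touches per customer
for event in consumer:
    customer = event.value["customer_id"]
    touches[customer] = touches.get(customer, 0) + 1
    # Pre-transaction insight: act on the pattern before the purchase.
    if touches[customer] == 3:
        print(f"customer {customer}: high engagement, trigger an offer")
```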

Published Date : Apr 18 2018

Scott Gnau, Hortonworks | Dataworks Summit EU 2018


 

(upbeat music) >> Announcer: From Berlin, Germany, it's The Cube, covering DataWorks Summit Europe 2018. Brought to you by Hortonworks. >> Hi, welcome to The Cube; we're separating the signal from the noise and tuning into the trends in data and analytics, here at DataWorks Summit 2018 in Berlin, Germany. This is the sixth year, I believe, that DataWorks has been held in Europe. Last year I believe it was in Munich; now it's in Berlin. It's a great show. The host is Hortonworks, and our first interviewee today is Scott Gnau, who is the chief technology officer of Hortonworks. Of course Hortonworks established itself about seven years ago as one of the up-and-coming startups commercializing a then brand-new technology, Hadoop and MapReduce. They've moved well beyond that in terms of their go-to-market strategy, their product portfolio, their partnerships. So Scott, this morning, it's great to have ya'. How are you doing? >> Glad to be back and good to see you. It's been awhile. >> You know, yes, I mean, you're an industry veteran. We've both been around the block a few times, but I remember you years ago. You were at Teradata and I was at another analyst firm. And now you're with Hortonworks. And Hortonworks is really on a roll. I know you're not Rob Bearden, so I'm not going to go into the financials, but your financials look pretty good, your latest. You're growing, your deal sizes are growing. Your customer base is continuing to deepen. So you guys are on a roll. So we're here in Europe, we're here in Berlin in particular. It's five weeks--you did the keynote this morning--it's five weeks until GDPR. The sword of Damocles, the GDPR sword of Damocles. It's not just affecting European-based companies; it's affecting North American companies and others who do business in Europe. So your keynote this morning, your core theme was that, if you're an enterprise, your business strategy is now equated with your cloud strategy, which is really equated with your data strategy. And you got to a lot of that. It was a really good discussion. And where GDPR comes into the picture is the fact that protecting data, personal data of your customers, is absolutely important; in fact it's imperative and mandatory, and will be in five weeks, or you'll face a significant penalty if you're not managing that data and providing customers with the right to have it erased, or the right to withdraw consent to have it profiled, and so forth. So enterprises all over the world, especially in Europe, are racing as fast as they can to get compliant with GDPR by the May 25th deadline. So, one of the things you discussed this morning: you had an announcement overnight that Hortonworks has released a new solution in technical preview called Data Steward Studio. And I'm wondering if you can tie that announcement to GDPR? It seems like data stewardship would have a strong value for your customers. >> Yeah, there's definitely a big tie-in. GDPR is certainly creating a milestone, kind of a trigger, for people to really think about their data assets. But it's certainly even larger than that, because when you even think about driving digitization of a business, driving new business models and connecting data and finding new use cases, it's all about finding the data you have, understanding what it is, where it came from, what's the lineage of it, who had access to it, what did they do to it? These are all governance kinds of things, which are also now mandated by laws like GDPR.
And so it's all really coming together in the context of the new modern data architecture era that we live in, where a lot of the data that we have access to, we didn't create. It was created outside the firewall by a device, by some application running with some customer, and so capturing and interpreting and governing that data is very different from taking derivative transactions from an ERP system, which are already adjudicated and understood, and governing that kind of a data structure. And so this is a need that's driven from many different perspectives. It's driven from the new architecture, the way IoT devices are connecting and just creating a data bomb; that's one thing. It's driven by business use cases, just saying what are the assets that I have access to, and how can I try to determine patterns between those assets where I didn't even create some of them, so how do I adjudicate that? >> Discovering and cataloging your data-- >> Discovering it, cataloging it, actually even... When I even think about data, just think of the files on my laptop, that I created, and I don't remember what half of them are. So creating the metadata, creating that trail of bread crumbs that lets you piece together what's there, what's the relevance of it, and how, then, you might use it for some correlation. And then you get in, obviously, to the regulatory piece that says sure, if I'm a new customer and I ask to be forgotten, the only way that you can guarantee to forget me is to know where all of my data is. >> If you remember that they are your customer in the first place, and you know where all that data is, if you're even aware that it exists, that's the first and foremost thing for an enterprise to be able to assess its degree of exposure to GDPR. >> So, right. It's like a whole new use case. It's a microcosm of all of these really big things that are going on. And so what we've been trying to do is really leverage our expertise in metadata management using the Apache Atlas project. >> Interviewer: You and IBM have done some major work-- >> We work with IBM and the community on Apache Atlas. You know, metadata tagging is not the most interesting topic for some people, but in the context that I just described, it's kind of important. And so I think one of the areas where we can really add value for the industry is leveraging our lowest-common-denominator, open source, open community kind of development to really create a standard infrastructure, a standard open infrastructure for metadata tagging, into which all of these use cases can now plug. Whether it's I want to discover data and create metadata about the data based on patterns that I see in the data, or I've inherited data and I want to ensure that the metadata stays with that data through its life cycle, so that I can guarantee the lineage of the data, and be compliant with GDPR-- >> And in fact, tomorrow we will have Mandy Chessell from IBM, a key Hortonworks partner, discussing the open metadata framework you're describing and what you're doing. >> And that was part of this morning's keynote close also. It all really flowed nicely together. Anyway, it is really a perfect storm. So what we've done is we've said, let's leverage this lowest common denominator, standard metadata tagging, Apache Atlas, and uplevel it, and not have it be part of a cluster, but actually have it be a cloud service that can be enforced across multiple data stores, whether they're in the cloud or whether they're on prem.
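Scott's Apache Atlas example, attaching a classification to a data asset so every downstream tool inherits it, maps onto Atlas's v2 REST API. A hedged sketch follows: the endpoint shape reflects the documented Atlas v2 API as best understood here, but the host, credentials, entity GUID, and the "PII" classification type are assumptions for illustration.

```python
# Sketch: tag an existing Atlas entity (e.g. a Hive table) as PII so that
# governance and lineage tooling downstream can act on the classification.
# Host, credentials, GUID, and the PII type name are placeholders.
import requests

ATLAS = "http://atlas.example.com:21000"
AUTH = ("admin", "admin")  # placeholder credentials
guid = "a1b2c3d4-0000-1111-2222-333344445555"  # placeholder entity GUID

resp = requests.post(
    f"{ATLAS}/api/atlas/v2/entity/guid/{guid}/classifications",
    json=[{"typeName": "PII"}],  # the type must already be defined in Atlas
    auth=AUTH,
)
resp.raise_for_status()
print("entity classified as PII")
```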
>> Interviewer: That's the Data Steward Studio? >> Well, Data Plane and Data Steward Studio really enable those things to come together. >> So the Data Steward Studio is the second service... >> Like an app. >> ...under the Hortonworks DataPlane service. >> Yeah, so the whole idea is to be able to tie those things together, and when you think about it in today's hybrid world, and this is where I really started, where your data strategy is your cloud strategy, they can't be separate, because if they're separate, just think about what would happen. So I've copied a bunch of data out to the cloud. All memory of any lineage is gone. Or I've got to go manually set up another set of lineage that may not be the same as the lineage it came with. And so being able to provide that common service across footprint, whether it's multiple data centers, whether it's multiple clouds, or both, is a really huge value, because now you can sit back and, through that single pane, see all of your data assets and understand how they interact. That obviously has the ability then to provide value, like with Data Steward Studio, to discover assets, maybe to discover duplicate assets, where, hey, I can save some money if I get rid of this cloud instance, 'cause it's over here already. Or to be compliant and say yeah, I've got these assets here, here, and here; I am now compelled to do whatever: delete, protect, encrypt. I can now go do that and keep a record through the metadata that I did it. >> Yes, in fact that is very much at the heart of compliance: you've got to know what assets are out there. And so it seems to me that Hortonworks is increasingly... the H-word rarely comes up these days. >> Scott: Not Hortonworks, you're talking about Hadoop. >> Hadoop rarely comes up these days. When the industry talks about you guys, it's known that's your core, that's your base, that's where HDP and so forth... great product, great distro. In fact, in your partnership with IBM, a year or more ago I think it was, IBM standardized on HDP in lieu of their own distro, 'cause it's so well-established, so mature. But going forward, you guys in many ways, Hortonworks, you have positioned yourselves now... Wikibon sees you as being the premier provider of big data governance solutions, specifically focused on multi-cloud, on structured data, and so forth. So the announcement today of the Data Steward Studio very much builds on that capability you already have there. So going forward, can you give us a sense of your roadmap in terms of building out the DataPlane service? 'Cause this is the second of these services under the DataPlane umbrella. Give us a sense for how you'll continue to deepen your governance portfolio in DataPlane. >> Really, the way to think about it, there are a couple of things that you touched on that I think are really critical, certainly for me, and for us at Hortonworks, to continue to repeat, just to make sure the message got there. Number one, Hadoop is definitely at the core of what we've done, and was kind of the secret sauce. Some very different stuff in the technology, also the fact that it's open source and community, all those kinds of things. But that really created a foundation that allowed us to build the whole beginning of big data data management. And we expanded on the traditional Hadoop stack by adding Data in Motion. And so what we've done is-- >> Interviewer: NiFi, I believe, you made a major investment.
Yeah, so we made a large investment in Apache NiFi, as well as Storm and Kafka, as kind of a group of technologies. And the whole idea behind doing that was to expand our footprint so that we would enable our customers to manage their data through its entire lifecycle, from being created at the edge, all the way through streaming technologies, to landing, to analytics, and then even analytics being pushed back out to the edge. So it's really about having that common management infrastructure for the lifecycle of all the data, including Hadoop and many other things. And then in that, obviously, as we discuss whether it be regulation, whether it be, frankly, future functionality, there's an opportunity to uplevel those services from an overall security and governance perspective. And just like Hadoop kind of upended traditional thinking... and what I mean by that was not the economics of it, specifically, but just the fact that you could land data without describing it. That seemed so unimportant at one time, and now it's like the key thing that drives the difference. Think about sensors that are sending in data, that get their firmware reconfigured, and those streams change. Being able to acquire data and then assess the data is a big deal. So the same thing applies, then, to how we apply governance. I said this morning, traditional governance was: hey, I started as this employee, I have access to this file, this file, this file, and nothing else. I don't know what else is out there. I only have access to what my job title describes. And that's traditional data governance. In the new world, that doesn't work. Data scientists need access to all of the data. Now, that doesn't mean we need to give away PII. We can encrypt it, we can tokenize it, but we keep referential integrity. We keep the integrity of the original structures, and those who have a need to actually see the PII can get the token and see the PII. But it's governance thought of inversely from how it's been thought about for 30 years. >> It's so great you've worked governance into an increasingly streaming, real-time, in-motion data environment. Scott, this has been great. It's been great to have you on The Cube. You're an alum of The Cube. I think we've had you at least two or three times over the last few years. >> It feels like 35. Nah, it's pretty fun. >> Yeah, you've been great. So we are here at Dataworks Summit in Berlin. (upbeat music)
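The inverted-governance model Scott closes on, where data scientists get the whole dataset but PII is tokenized with referential integrity intact, can be achieved with keyed deterministic hashing: the same input always produces the same token, so joins and aggregations still line up, while re-identification requires the key. A minimal sketch of the idea (an illustration, not a Hortonworks implementation):

```python
# Deterministic tokenization: equal values map to equal tokens, preserving
# referential integrity across tables, while the raw PII stays protected.
# Whoever is entitled to see PII holds the key and can keep a reverse map.
import hashlib
import hmac

SECRET_KEY = b"rotate-me-and-keep-me-in-a-vault"  # placeholder key

def tokenize(value: str) -> str:
    digest = hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]

emails = ["alice@example.com", "bob@example.com", "alice@example.com"]
tokens = [tokenize(e) for e in emails]

# Both 'alice' rows collapse to the same token, so joins still work.
assert tokens[0] == tokens[2] and tokens[0] != tokens[1]
print(tokens)
```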

Published Date : Apr 18 2018

Vikram Murali, IBM | IBM Data Science For All


 

>> Narrator: Live from New York City, it's theCUBE. Covering IBM Data Science For All. Brought to you by IBM. >> Welcome back to New York here on theCUBE. Along with Dave Vellante, I'm John Walls. We're at Data Science For All, IBM's two-day event, and we'll be here all day long, wrapping up again with that panel discussion from four to five here Eastern Time, so be sure to stick around all day here on theCUBE. Joining us now is Vikram Murali, who is a program director at IBM. Vikram, thanks for joining us here on theCUBE. Good to see you. >> Good to see you too. Thanks for having me. >> You bet. So, among your primary responsibilities: the Data Science Experience. So first off, if you would, share with our viewers a little bit about that. You know, the primary mission. You've had two fairly significant announcements, updates, if you will, here over the past month or so, so share some information about that too if you would. >> Sure. So my team, we build the Data Science Experience, and our goal is to enable data scientists, in their path, to gain insights into data using data science techniques, machine learning, the latest and greatest open source especially, and to be able to collaborate with fellow data scientists, with data engineers, business analysts. And it's all about freedom: giving freedom to data scientists to pick the tool of their choice, and program and code in the language of their choice. So that's the mission of Data Science Experience, when we started this. The two releases that you mentioned, we had in the last 45 days. There was one in September and then there was one on October 30th. Both of these releases are very significant, in the machine learning space especially. We now support the Scikit-Learn, XGBoost, and TensorFlow libraries in Data Science Experience. We have deep integration with Hortonworks Data Platform, which is a hallmark of our partnership with Hortonworks, something that we announced back in the summer. And this last release of Data Science Experience, two days back, specifically can do authentication via Knox with Hadoop. So now our Hadoop customers, our Hortonworks Data Platform customers, can leverage all the goodies that we have in Data Science Experience. It's more deeply integrated with our Hadoop-based environments. >> A lot of people ask me, "Okay, when IBM announces a product like Data Science Experience... You know, IBM has a lot of products in its portfolio. Are they just sort of cobbling together, you know, existing older products and putting a skin on them? Or are they developing them from scratch?" How can you help us understand that? >> That's a great question, and I hear that a lot from our customers as well. Data Science Experience started off with a design-first methodology. And what I mean by that is we are using IBM Design to lead the charge here, along with product and development. And we are actually talking to customers, to data scientists, to data engineers, to enterprises, and we are trying to find out what problems they have in data science today and how we can best address them. So it's not about taking older products and just re-skinning them; Data Science Experience, for example, started off as a brand new product: a completely new slate with completely new code. Now, IBM has done data science and machine learning for a very long time. We have a lot of assets, like SPSS Modeler and Stats, and digital optimization.
And we are re-investing in those products, and we are investing in such a way, and doing product research in such a way, not to make the old fit with the new, but in a way where it fits into the realm of collaboration. How can data scientists leverage our existing products with open source, and how can we do collaboration? So it's not just re-skinning, but building from the ground up. >> So this is really important, because you say architecturally it's built from the ground up. Because, you know, given enough time and enough money, you know, smart people, you can make anything work. So the reason why this is important is you mentioned, for instance, TensorFlow. You know that down the road there's going to be some other tooling, some other open source project that's going to take hold, and your customers are going to say, "I want that." You've got to then integrate that, or you have to choose whether or not to. If it's a super heavy lift, you might not be able to do it, or do it in time to hit the market. If you architected your system to be able to accommodate that... Future-proof is the term everybody uses, so have you done that? How have you done that? I'm sure APIs are involved, but maybe you could add some color. >> Sure. So our Data Science Experience and machine learning... it is a microservices-based architecture, so we are completely Dockerized, and we use Kubernetes under the covers for container orchestration. And all these are tools that are used in The Valley, across different companies, and also in products across IBM as well. So some of these legacy products that you mentioned, we are actually using some of these newer methodologies to re-architect them, and we are Dockerizing them, and the microservice architecture actually helps us address issues that we have today, as well as be open to development and taking newer methodologies and frameworks into consideration that may not exist today. So the microservices architecture... for example, TensorFlow is something that you brought in. We can just spin up a Docker container just for TensorFlow and attach it to our existing Data Science Experience, and it just works. Same thing with other frameworks like XGBoost, and Keras, and Scikit-Learn; all these are frameworks and libraries that have come up in open source within the last, I would say, one-, two-, three-year timeframe. Previously, integrating them into our product would have been a nightmare. We would have had to re-architect our product every time something new came along, but now with the microservice architecture it is very easy for us to take those on. >> We were just talking to Daniel Hernandez a little bit about the Hortonworks relationship at a high level. One of the things that I've... I mean, I've been following Hortonworks since day one, when Yahoo kind of spun them out. And I know those guys pretty well. And they always make a big deal, when they do partnerships, out of deep engineering integration. And so they're very proud of that, so I want to test that a little bit. Can you share with our audience the kind of integrations you've done? What you've brought to the table? What Hortonworks brought to the table? >> Yes. So Data Science Experience today can work side by side with Hortonworks Data Platform, HDP. And we could have actually made that work about two, three months back, but as part of our partnership that was announced back in June, we set up joint engineering teams. We have multiple touch points every day.
We call it co-development, and they have put resources in, we have put resources in, and today, especially with the release that came out on October 30th, Data Science Experience can authenticate securely via Knox. That's what I previously mentioned, and that was a direct example of our partnership with Hortonworks. So that is phase one. Phase two and phase three are going to be deeper integration, so we are planning on making Data Science Experience an Ambari management pack. And so for a Hortonworks customer, if you have HDP already installed, you don't have to install DSX separately. It's going to be a management pack. You just spin it up. And the third phase is going to be... we're going to be using YARN for resource management. YARN is very good at resource management. And for infrastructure as a service for data scientists, we can actually delegate that work to YARN. So Hortonworks, they are putting resources into YARN, doubling down actually. And they are making changes to YARN where it will act as the resource manager not only for the Hadoop and Spark workloads, but also for Data Science Experience workloads. So that is the level of deep engineering that we are engaged in with Hortonworks. >> YARN stands for Yet Another Resource Negotiator. There you go for... >> John: Thank you. >> The trivia of the day. (laughing) Okay, so... But of course, Hortonworks are big on committers. And obviously a big committer to YARN. Probably wouldn't have YARN without Hortonworks. So you mentioned that's kind of what they're bringing to the table, and you guys primarily are focused on the integration as well as some other IBM IP? >> That is true, as well as the Knox piece that I mentioned. We have a Knox committer; we have multiple Knox committers on our side, and that helps us as well. So Knox is part of the HDP package, and we need that expertise on our side to work with Hortonworks developers, to make sure that we are contributing and making inroads into Data Science Experience. That way the integration becomes a lot easier. And from an IBM IP perspective... so Data Science Experience already comes with a lot of packages and libraries that are open source, but IBM Research has worked on a lot of these libraries. I'll give you a few examples: Brunel and PixieDust are things that our developers love. These are visualization libraries that were actually cooked up by IBM Research and then open sourced. And these are prepackaged into Data Science Experience, so there is IBM IP involved, and there are a lot of algorithms, machine learning algorithms, that we put in there. So that comes right out of the package. >> And you guys, the development teams, are really both in The Valley? Is that right? Or are you really distributed around the world? >> Yeah, so we are. The Data Science Experience development team is in North America, between The Valley and Toronto. The Hortonworks team, they are situated about eight miles from where we are in The Valley, so there's a lot of synergy. We work very closely with them, and that's what we see in the product.
But it really helps being in the same timezone, especially working with a partner just eight miles or ten miles away. We have a lot of interaction with them, and that really helps. >> Dave: Yeah. Body language? >> Yeah. >> Yeah. You talked about problems. You talked about issues. You know, customers. What are they now? Before it was like, "First off, I want to get more data." Now they've got more data. Is it figuring out what to do with it? Finding it? Having it available? Having it accessible? Making sense of it? I mean, what's the barrier right now? >> The barrier, I think, for data scientists... the number one barrier continues to be data. There's a lot of data out there. A lot of data is being generated, and the data is dirty. It's not clean. So the number one problem that data scientists have is: how do I get to clean data, and how do I access data? There are so many data repositories, data lakes, and data swamps out there. Data scientists don't want to be in the business of figuring out how to access data. They want to have instant access to data, and-- >> Well, if you would, let me interrupt you. >> Yeah? >> You say it's dirty. Give me an example. >> So it's not structured data, so data scientists-- >> John: So unstructured versus structured? >> Unstructured versus structured. And if you look at all the social media feeds that are being generated, the amount of data that is being generated, it's all unstructured data. So we need to clean up the data, and the algorithms need structured data, or data in a particular format. And data scientists don't want to spend too much time cleaning up that data. And access to data, as I mentioned. And that's where Data Science Experience comes in. Out of the box we have so many connectors available. It's very easy for customers to bring in their own connectors as well, and you have instant access to data. And as part of our partnership with Hortonworks, you don't have to bring data into Data Science Experience. The data is becoming so big, you want to leave it where it is. Instead, push analytics down to where it is. And you can do that. We can connect to remote Spark. We can push analytics down through remote Spark. All of that is possible today with Data Science Experience. The second thing that I hear from data scientists is all the open source libraries. Every day there's a new one. It's a boon and a bane as well. The open source community is very vibrant, and there are a lot of data science competitions, machine learning competitions, that are helping move this community forward. And it's a good thing. The bad thing is data scientists like to work in silos on their laptops. How do you, from an enterprise perspective, take that and move it, scale it, to an enterprise level? And that's where Data Science Experience comes in, because now we provide all the tools, the tools of your choice, open source or proprietary. You have it in here, and you can easily collaborate. You can do all the work that you need with open source packages and libraries, bring your own, as well as collaborate with other data scientists in the enterprise. >> So, you're talking about dirty data. I mean, with Hadoop and schema-on-read, right? We kind of knew this problem was coming. So technology sort of got us into this problem. Can technology help us get out of it? I mean, from an architectural standpoint. When you think about dirty data, can you architect things in to help? >> Yes.
So, if you look at the machine learning pipeline, the pipeline starts with ingesting data and then cleansing or cleaning that data. And then you go into creating a model, training, picking a classifier, and so on. So we have tools built into Data Science Experience, and we're working on tools that will be coming up and down our roadmap, which will help data scientists do that themselves. I mean, they don't have to be really in-depth coders or developers to do that. Python is very powerful. You can do a lot of data wrangling in Python itself, so we are enabling data scientists to do that within the platform, within Data Science Experience. >> If I look at sort of the demographics of the development teams... We were talking about Hortonworks and you guys collaborating. What are they like? I mean, people picture IBM, you know, like this 100-plus-year-old company. What's the persona of the developers on your team? >> The persona? I would say we have a very young, agile development team, and by that I mean... so we've had six releases this year in Data Science Experience, just for the on-premises side of the product, and the cloud side of the product has had huge delivery too. We have releases coming out faster than we can code. And it's not just re-architecting it every time; it's about adding features, giving features that our customers are asking for, and not making them wait for three months, six months, one year. So our releases are becoming a lot more frequent, and customers are loving it. And that is, in part, because of the team. The team is able to evolve. We are very agile, and we have an awesome team. That's all. It's an amazing team. >> But six releases in... >> Yes. We had our initial release in April, and since then we've had about five revisions of the release, where we added a lot more features to our existing releases. A lot more packages, libraries, functionality, and so on. >> So you know what monster you're creating now, don't you? I mean, you know? (laughing) >> I know, we are setting expectations. >> You still have two months left in 2017. >> We do. >> These are not mainframe release cycles. >> They are not, and that's the advantage of the microservices architecture. I mean, when you upgrade, a customer upgrades, right? They don't have to bring that entire system down to upgrade. You can target one particular part, one particular microservice. You componentize it, and just upgrade that particular microservice. It's become very simple, so... >> Well, some of those microservices aren't so micro. >> Vikram: Yeah. Not. Yeah, so it's a balance. >> You're growing, but yeah. >> It's a balance you have to keep, making sure that you componentize it in such a way that when you're doing an upgrade, it affects just one small piece of it, and you don't have to take everything down. >> Dave: Right. >> But, yeah, I agree with you. >> Well, it's been a busy year for you, to say the least, and I'm sure 2017-2018 is not going to slow down. So continued success. >> Vikram: Thank you. >> Wish you well with that. Vikram, thanks for being with us here on theCUBE. >> Thank you. Thanks for having me. >> You bet. >> Back with more of Data Science For All, here in New York City, IBM. Coming up here on theCUBE right after this. >> Cameraman: You guys are clear. >> John: All right. That was great.
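Vikram's description of the machine learning pipeline, ingest, then cleanse, then pick a classifier and train, is exactly the shape scikit-learn's Pipeline object encodes, and it is the kind of thing a data scientist would run inside a DSX notebook. A small self-contained sketch in plain scikit-learn (generic usage, not DSX-specific code), with the cleansing step folded into the pipeline so raw, messy text goes in one end and predictions come out the other:

```python
# Ingest -> cleanse -> train -> classify as a single scikit-learn Pipeline.
# The tiny inline dataset stands in for a real ingest step.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# "Dirty" unstructured feedback with labels (1 = positive sentiment).
texts = ["GREAT product!!", "terrible, broke in a day", "love it",
         "awful support", "works great", "would not recommend"]
labels = [1, 0, 1, 0, 1, 0]

pipeline = Pipeline([
    # Cleansing/structuring: lowercase, strip stop words, vectorize.
    ("tfidf", TfidfVectorizer(lowercase=True, stop_words="english")),
    # Classifier: any scikit-learn estimator can slot in here.
    ("clf", LogisticRegression()),
])

pipeline.fit(texts, labels)
print(pipeline.predict(["this is great", "completely terrible"]))
```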

Published Date : Nov 1 2017

Wrap Up | IBM Fast Track Your Data 2017


 

>> Narrator: Live from Munich, Germany, it's theCUBE, covering IBM Fast Track Your Data. Brought to you by IBM. >> We're back. This is Dave Vellante with Jim Kobielus, and this is theCUBE, the leader in live tech coverage. We go out to the events. We extract the signal from the noise. We are here covering a special presentation, IBM's Fast Track Your Data, and we're in Munich, Germany. It's been a day-long session. We started this morning with a panel discussion with five senior-level data scientists that Jim and I hosted. Then we did CUBE interviews in the morning. We cut away to the main tent. Kate Silverton did a very choreographed, scripted, but very well done main keynote set of presentations. IBM made a couple of announcements today, and then we finished up theCUBE interviews. Jim and I are here to wrap. We're actually running on IBMgo.com. We're running live. Hilary Mason is talking about what she's doing in data science, and we've also got a session on GDPR. You've got to log in to see those sessions, so go to IBMgo.com and you'll find those. Hit the schedule and go to the Hilary Mason and GDPR channels, and check that out. But we're going to wrap now. Jim, two main announcements today. I hesitate to call them big announcements. I mean, they were, you know, just kind of... I think the word you used last night was perfunctory. You know, I mean, they're okay, but they're not game changing. So what did you mean? >> Well, first of all, when you look at... though IBM is not calling this a signature event, it's essentially a signature event. They do these every June or so. You know, in the past several years, the signature events have had like a one-track theme, whether it be IBM announcing they're investing deeply in Spark, or IBM announcing that they're focusing on investing in R as the core language for data science development. This year, at this event in Munich, it's really a three-track event in terms of the broad themes, and I mean they're all important tracks, but none of them is like game-changing. Perhaps IBM doesn't intend them to be, it seems. One of which is obviously Europe. We're holding this in Munich. And a couple of things of importance to European customers, first and foremost GDPR: the deadline next year, in terms of compliance, is approaching, so sound the alarm as it were. And IBM has rolled out compliance and governance tools, the information catalog, the governance catalog, and so forth. Now announcing the consortium with Hortonworks to build governance on top of Apache Atlas, but also IBM announcing that they've opened up a DSX center in England and a machine-learning hub here in Germany, to help their European clients, in those countries especially, to get deeper down into data science and machine learning, in terms of developing those applications. That's important for the audience, the regional audience here. The second track, which is also important, and I alluded to it, is governance. In all of its manifestations, you need a master catalog of all the assets for building and maintaining and controlling your data applications and your data science applications. The catalog, the consortium, the various offerings at IBM were announced and discussed in great detail. They've brought in customers and partners like Northern Trust to talk about the importance of governance, not just as a compliance mandate, but also as a potential strategy for monetizing your data. That's important.
Number three is what I call cloud native data applications and how the state of the art in developing data applications is moving towards containerized and orchestrated environments that involve things like Docker and Kubernetes. The IBM DB2 developer community edition. Been in the market for a few years. The latest version they announced today includes kubernetes support. Includes support for JSON. So it's geared towards new generation of cloud and data apps. What I'm getting at ... Those three core themes are Europe governance and cloud native data application development. Each of them is individually important, but none of them is game changer. And one last thing. Data science and machine learning, is one of the overarching envelope themes of this event. They've had Hilary Mason. A lot of discussion there. My sense I was a little bit disappointed because there wasn't any significant new announcements related to IBM evolving their machine learning portfolio into deep learning or artificial intelligence in an environment where their direct competitors like Microsoft and Google and Amazon are making a huge push in AI, in terms of their investments. There's a bit of a discussion, and Rob Thomas got to it this morning, about DSX. Working with power AI, the IBM platform, I would like to hear more going forward about IBM investments in these areas. So I thought it was an interesting bunch of announcements. I'll backtrack on perfunctory. I'll just say it was good that they had this for a lot of reasons, but like I said, none of these individual announcements is really changing the game. In fact like I said, I think I'm waiting for the fall, to see where IBM goes in terms of doing something that's actually differentiating and innovative. >> Well I think that the event itself is great. You've got a bunch of partners here, a bunch of customers. I mean it's active. IBM knows how to throw a party. They've always have. >> And the sessions are really individually awesome. I mean terms of what you learn. >> The content is very good. I would agree. The two announcements that were sort of you know DB2, sort of what I call community edition. Simpler, easier to download. Even Dave can download DB2. I really don't want to download DB2, but I could, and play with it I guess. You know I'm not database guy, but those of you out there that are, go check it out. And the other one was the sort of unified data governance. They tried to tie it in. I think they actually did a really good job of tying it into GDPR. We're going to hear over the next, you know 11 months, just a ton of GDPR readiness fear, uncertainty and doubt, from the vendor community, kind of like we heard with Y2K. We'll see what kind of impact GDPR has. I mean it looks like it's the real deal Jim. I mean it looks like you know this 4% of turnover penalty. The penalties are much more onerous than any other sort of you know, regulation that we've seen in the past, where you could just sort of fluff it off. Say yeah just pay the fine. I think you're going to see a lot of, well pay the lawyers to delay this thing and battle it. >> And one of our people in theCUBE that we interviewed, said it exactly right. It's like the GDPR is like the inverse of Y2K. In Y2K everybody was freaking out. It was actually nothing when it came down to it. Where nobody on the street is really buzzing. I mean the average person is not buzzing about GDPR, but it's hugely important. 
And like you said, I mean some serious penalties may be in the works for companies that are not complying, companies not just in Europe, but all around the world who do business with European customers. >> Right okay so now bring it back to sort of machine learning, deep learning. You basically said to Rob Thomas, I see machine learning here. I don't see a lot of the deep learning stuff quite yet. He said stay tuned. You know you were talking about TensorFlow and things like that. >> Yeah they supported that ... >> Explain. >> So Rob indicated that IBM very much, like with power AI and DSX, provides an open framework or toolkit for plugging in your, you the developers, preferred machine learning or deep learning toolkit of an open source nature. And there's a growing range of open source deep learning toolkits beyond you know TensorFlow, including Theano and MXNet and so forth, that IBM is supporting within the overall ESX framework, but also within the power AI framework. In other words they've got those capabilities. They're sort of burying that message under a bushel basket, at least in terms of this event. Also one of the things that ... I said this too Mena Scoyal. Watson data platform, which they launched last fall, very important product. Very important platform for collaboration among data science professionals, in terms of the machine learning development pipeline. I wish there was more about the Watson data platform here, about where they're taking it, what the customers are doing with it. Like I said a couple of times, I see Watson data platform as very much a DevOps tool for the new generation of developers that are building machine learning models directly into their applications. I'd like to see IBM, going forward turn Watson data platform into a true DevOps platform, in terms of continuous integration of machine learning and deep learning another statistical models. Continuous training, continuous deployment, iteration. I believe that's where they're going, or probably she will be going. I'd like to see more. I'm expecting more along those lines going forward. What I just described about DevOps for data science is a big theme that we're focusing on at Wikibon, in terms where the industry is going. >> Yeah, yeah. And I want to come back to that again, and get an update on what you're doing within your team, and talk about the research. Before we do that, I mean one of the things we talked about on theCUBE, in the early days of Hadoop is that the guys are going to make the money in this big data business of the practitioners. They're not going to see, you know these multi-hundred billion dollar valuations come out of the Hadoop world. And so far that prediction has held up well. It's the Airbnbs and the Ubers and the Spotifys and the Facebooks and the Googles, the practitioners who are applying big data, that are crushing it and making all the money. You see Amazon now buying Whole Foods. That in our view is a data play, but who's winning here, in either the vendor or the practitioner community? >> Who's winning are the startups with a hot new idea that's changing, that's disrupting some industry, or set of industries with machine learning, deep learning, big data, etc. For example everybody's, with bated breath, waiting for you know self-driving vehicles. And the ecosystem as it develops somebody's going to clean up. 
And one or more companies, companies we probably never heard of, leveraging everything we're describing here today, data science and containerized distributed applications that involve, you know, deep learning for image analysis and sensor analysis and so forth. Putting it all together in some new fabric that changes the way we live on this planet. But as you said, the platforms themselves, whether they be Hadoop or Spark or TensorFlow, whatever, they're open source. You know, and the fact is, by its very nature, with open source based solutions, the profit margins on selling those inexorably migrate to zero. So you're not going to make any money as a tool vendor, or a platform vendor. If you're going to make money, you make money, for example, from providing an ecosystem within which innovation can happen. >> Okay, we have a few minutes left. Let's talk about the research that you're working on. What's exciting you these days? >> Right, right. So I think a lot of people know I've been around the analyst space for a long, long time. I've joined the SiliconANGLE Wikibon team just recently. I used to work for a very large solution provider, and what I do here for Wikibon is I focus on data science as the core of next generation application development. When I say next-generation application development, it's the development of AI, deep learning, machine learning, and the deployment of those data-driven statistical assets into all manner of applications. And you look at the hot stuff, like chatbots for example. Transforming the experience in e-commerce on mobile devices. Siri and Alexa and so forth. Hugely important. So what we're doing is we're focusing on AI and everything. We're focusing on containerization and the building of AI micro-services and the ecosystem of the pipelines and the tools that allow you to do that. DevOps for data science, distributed training, federated training of statistical models, and so forth. We are also very much focusing on the whole distributed containerized ecosystem, Docker, Kubernetes and so forth. Where that's going, in terms of changing the state of the art, in terms of application development. Focusing on the API economy. All of those things that you need to wrap around the payload of AI to deliver it into every ... >> So you're focused on that intersection between AI and the related topics and the developer. Who is winning in that developer community? Obviously Amazon's winning. You got Microsoft doing a good job there. Google, Apple, who else? I mean how's IBM doing for example? Maybe name some names. Who impresses you in the developer community? But specifically let's start with IBM. How is IBM doing in that space? >> IBM's doing really well. IBM has been, for quite a while, very good about engaging with the new generation of developers, using Spark and R and Hadoop and so forth to build applications rapidly and deploy them rapidly into all manner of applications. So IBM has very much reached out to, in the last several years, the Millennials for whom all of these new tools have been their core repertoire from the very start. And I think in many ways, like today, the DB2 developer community edition is very much geared to that market. Saying, you know, to the cloud native application developer, take a second look at DB2. There's a lot in DB2 that you might bring into your next application development initiative, alongside your Spark toolkit and so forth. So IBM has startup envy.
They're a big old company. Been around more than a hundred years. And they're trying to very much bootstrap and restart their brand in this new context, in the 21st century. I think they're making a good effort at doing it. In terms of community engagement, they have a really good community engagement program, all around the world, in terms of hackathons and developer days, you know, meetups here and there. And they get lots of turnout and very loyal customers, and IBM's got the broadest portfolio. >> So you still bleed a little bit of blue. So I got to squeeze it out of you now here. So let me push a little bit on what you're saying. So DB2 is the emphasis here, trying to position DB2 as appealing for developers, but why not some of the other, you know, acquisitions that they've made? I mean you don't hear that much about Cloudant, dashDB, and things of that nature. You would think that those would be more appealing to some of the developer communities than DB2. Or am I mistaken? Is it IBM sort of going after the core, trying to evolve that core, you know, constituency? >> No, they've done a lot of strategic acquisitions, like Cloudant, and like they've acquired graph databases and brought them into their platform. IBM has every type of database or file system that you might need for web or social or Internet of Things. And so for all of the development challenges, IBM has got a really high-quality, fit-to-purpose, best-of-breed underlying data platform for it. They've got huge amounts of developers energized all around the world working on this platform. DB2, in the last several years they've taken all of their platforms, their legacy ... That's the wrong word. All their existing mature platforms, like DB2, and brought them into the IBM cloud. >> I think legacy is the right word. >> Yeah, yeah. >> These things have been around for 30 years. >> And they're not going away, because they're field-proven and ... >> They are evolving. >> And customers have implemented them everywhere. And they're evolving. If you look at how IBM has evolved DB2 in the last several years ... For example, they responded to the challenge from SAP HANA. We brought BLU Acceleration technology, in-memory technology, into DB2 to make it screamingly fast and so forth. IBM has done a really good job of turning around these product groups and the product architectures, making them cloud first. And then reaching out to a new generation of cloud application developers. Like I said today, things like DB2 developer community edition, it's just the next chapter in this ongoing saga of IBM turning itself around. Like I said, each of the individual announcements today is like, okay, that's interesting. I'm glad to see IBM showing progress. None of them is individually disruptive. I think the last week though, I think Hortonworks was disruptive, in the sense that IBM recognized that BigInsights didn't really have a lot of traction in the Hadoop space, not as much as they would have wished. Hortonworks very much does, and IBM has cast its lot to work with HDP, but HDP and Hortonworks recognize they haven't achieved any traction with data scientists, therefore DSX makes sense as part of the Hortonworks portfolio. Likewise Big SQL makes perfect sense as the SQL front end to HDP.
I think the teaming of IBM and Hortonworks is propitious of further things that they'll be doing in the future, not just governance, but really putting together a broader cloud portfolio for the next generation of data scientists doing work in the cloud. >> Do you think Hortonworks is a legitimate acquisition target for IBM? >> Of course they are. >> Why would IBM ... You know, educate us. Why would IBM want to acquire Hortonworks? What does that give IBM? Open source mojo, obviously. >> Yeah, mojo. >> What else? >> Strong loyalty with developers in the Hadoop market. >> It would supercharge the developer angle, and maybe make it more relevant outside of some of those legacy systems. Is that it? >> Yeah, but also remember that Hortonworks came from Yahoo, the team that developed much of what became Hadoop. They've got an excellent team. Strategic team. So in many ways, you can look at Hortonworks as one part acqui-hire, if they ever do that, and one part really substantial and growing solution portfolio that in many ways is complementary to IBM. Hortonworks is really deep on the governance of Hadoop. IBM has gone there, but I think Hortonworks is even deeper, in terms of their laser focus. >> Ecosystem expansion, and it actually really wouldn't be that expensive of an acquisition. I mean it's, you know, north of ... Maybe a billion dollars might get it done. >> Yeah. >> You know, so would you pay a billion dollars for Hortonworks? >> Not out of my own pocket. >> No, I mean if you're IBM. You think that would deliver that kind of value? I mean you know how IBM thinks about acquisitions. They're good at acquisitions. They look at the IRR. They have their formula. They blue-wash the companies and they generally do very well with acquisitions. Do you think Hortonworks would fit that monetization profile? >> I wouldn't say that Hortonworks, in terms of monetization potential, would match say what IBM has achieved by acquiring Netezza. >> Cognos. >> Or SPSS. I mean SPSS has been an extraordinarily successful ... >> Well the day IBM acquired SPSS they tripled the license fees. As a customer I know, ouch, it worked. It was incredibly successful. >> Well, yeah. Cognos was. Netezza was. And SPSS. Those three acquisitions in the last ten years have been extraordinarily pivotal and successful for IBM, to build what they now have, which is really the most comprehensive portfolio of fit-to-purpose data platforms. So in other words all those acquisitions prepared IBM to duke it out now with their primary competitors in this new field, which are Microsoft, who's newly resurgent, and Amazon Web Services. In other words, the two Seattle vendors. Seattle has come on strong, to the point where in big data in the cloud, Seattle is almost eclipsing Silicon Valley as the locus of innovation and really of customer adoption in the cloud space. >> Quite amazing. Well, Google's still hanging in there. >> Oh yeah. >> Alright, Jim. Really a pleasure working with you today. Thanks so much. Really appreciate it. >> Thanks for bringing me on your team. >> And Munich crew, you guys did a great job. Really well done. Chuck, Alex, Patrick wherever he is, and our great makeup lady. Thanks a lot. Everybody back home. We're out. This is Fast Track Your Data. Go to IBMgo.com for all the replays. Youtube.com/SiliconANGLE for all the shows. TheCUBE.net is where we tell you where theCUBE's going to be. Go to wikibon.com for all the research.
Thanks for watching everybody. This is Dave Vellante with Jim Kobielus. We're out.

Published Date : Jun 25 2017

Arun Murthy, Hortonworks | DataWorks Summit 2017


 

>> Announcer: Live from San Jose, in the heart of Silicon Valley, it's theCUBE covering DataWorks Summit 2017. Brought to you by Hortonworks. >> Good morning, welcome to theCUBE. We are live at day 2 of the DataWorks Summit, and have had a great day so far, yesterday and today. I'm Lisa Martin with my co-host George Gilbert. George and I are very excited to be joined by a multiple-time CUBE alumni, the co-founder and VP of Engineering at Hortonworks, Arun Murthy. Hey, Arun. >> Thanks for having me, it's good to be back. >> Great to have you back. So yesterday, great energy at the event. You could see and hear behind us, great energy this morning. One of the things that was really interesting yesterday, besides the IBM announcement, and we'll dig into that, was that we had your CEO on, as well as Rob Thomas from IBM, and Rob said, you know, one of the interesting things over the last five years was that there have been only 10 companies that have beat the S&P 500, have outperformed, in each of the last five years, and those companies have made big bets on data science and machine learning. And as we heard yesterday, these four mega-trends: IoT, cloud, streaming analytics, and now the fourth big leg, data science. Talk to us about what Hortonworks is doing. You've been here from the beginning; as a co-founder, I've mentioned, you've been with Hadoop since it was a little baby. How is Hortonworks evolving to become one of those big users making big bets on helping your customers, and yourselves, leverage machine learning to really drive the business forward? >> Absolutely, a great question. So, you know, if you look at some of the history of Hadoop, it started off with this notion of a data lake, and then, I'm talking about the enterprise side of Hadoop, right? I've been working on Hadoop for about 12 years now; you know, the last six of it has been as a vendor selling Hadoop to enterprises. They started off with this notion of data lake, and as people have adopted that vision of a data lake, you know, you bring all the data in, and now you're starting to get governance and security, and all of that. Obviously, one of the best ways to get value out of the data is the notion of, you know, can you sort of predict what is going to happen in your world, with your customers, and, you know, whatever it is with the data that you already have. So that notion of, you know, Rob, our CEO, talks about how we're trying to move from a post-transactional world to a pre-transactional world, and doing the analytics and data science will obviously be key to that. There are so many applications of it. Something as simple as, you know, we did a demo last year of how we're working with a freight company, and we're starting to show them, you know, how to predict which drivers and which routes are going to have issues as they're trying to move, alright? Four years ago we did the same demo, and we would show that this driver had an issue on this route, but now we can actually predict and let you know to take preventive measures up front. Similarly internally, you know, you can take things from, you know, machine learning and log analytics, and so on. We have an internal problem, you know, where we have to test two different versions of HDP itself, and as you can imagine, it's a really, really hard problem.
We have to support 10 operating systems, seven databases; like, if you multiply that matrix, it's, you know, tens of thousands of options. So, if you do all that testing, we now use machine learning internally to look through the logs and kind of predict where the failures were, and help our own, sort of, software engineers understand where the problems were, right? An extension of that has been, you know, the work we've done in Smartsense, which is a service we offer our enterprise customers. We collect logs from their Hadoop clusters, and then we can actually help them understand where they can either tune their applications, or even tune their hardware, right? We have this example I really like where, at a really large enterprise Financial Services client, they had literally, you know, hundreds and thousands of machines on HDP, and using Smartsense, we actually found that there were 25 machines which had bad NIC configurations, and we proved to them that by fixing those, we got 30% throughput back on their cluster. At that scale, it's a lot of money, it's a lot of CapEx, it's a lot of OpEx. So, as a company, we try it ourselves as much as we, kind of, try to help our customers adopt it. Does that make sense? >> Yeah, let's drill down on that even a little more, 'cause it's pretty easy to understand what's the standard telemetry you would want out of hardware, but as you, sort of, move up the stack the metrics, I guess, become more custom. So how do you learn, not just from one customer, but from many customers, especially when you can't standardize what you're supposed to pull out of them? >> Yeah so, we're sort of really big believers in, sort of, dogfooding our own stuff, right? So, we talk about the notion of a data lake; we actually run a Smartsense data lake where we actually get data across, you know, hundreds of our customers, and we can actually do predictive machine learning on that data in our own data lake. Right? And to your point about how we go up the stack, this is kind of where we feel like we have a natural advantage, because we work on all the layers, whether it's the SQL engine, or the storage engine, or, you know, above and beyond the hardware. So, as we build these models, we understand that we need more, or different, telemetry, right? And we put that back into the product, so the next version of HDP will have the metrics that we wanted. And now we've been doing this for a couple of years, which means we've done three, four, five turns of the crank; obviously something we always get better at, but I feel like, compared to where we were a couple of years ago when Smartsense first came out, it's actually matured quite a lot, from that perspective. >> So, there's a couple different paths you can add to this, which is customers might want, as part of their big data workloads, some non-Hortonworks, you know, services or software when it's on-prem, and then can you also extend this management to the Cloud if they want a hybrid setup where, in the not too distant future, the Cloud vendor will also be a provider for this type of management? >> So absolutely, in fact it's true today. You know, Microsoft's a great partner of ours. We work with them to enable Smartsense on HDI, which means we can actually get the same telemetry back, whether you're running the data on an on-prem HDP, or you're running this on HDI.
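A toy rendering of the log-mining use case Arun describes above, flagging likely failures from the text of test-run logs, might look like the sketch below. The log lines, labels, and model choice are invented for illustration; the actual internal tooling is not public.

```python
# Toy sketch: learn to flag failing test runs from the text of their logs.
# Log lines and labels are invented; the real pipeline is not public.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Tails of past test-run logs, labeled by triage (1 = failed run)
logs = [
    "java.io.IOException: connection reset by peer during shuffle",
    "all map and reduce tasks completed successfully",
    "datanode heartbeat lost, retrying block replication",
    "job finished in 412 seconds, 0 failed tasks",
]
failed = [1, 0, 1, 0]

model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                      LogisticRegression())
model.fit(logs, failed)

# Score a fresh log tail from tonight's 10-OS x 7-database test matrix
new_log = ["fetch failure talking to datanode, retrying"]
print(model.predict_proba(new_log)[0][1])  # estimated failure probability
```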
Similarly, we shipped a version of our Cloud product, our Hortonworks Data Cloud, on Amazon, and again Smartsense is plugged in there, so whether you're on Amazon, or on Microsoft, or on-prem, we get the same telemetry, we get the same data back. If you're a customer using many of these products, we can actually give you that telemetry back. Similarly, you guys probably know this, you were probably there at the analyst event when they announced the Flex Support subscription, which means that now we can actually take the support subscription you get from Hortonworks, and you can actually use it on-prem or on the Cloud. >> So in terms of transforming, HDP for example, just want to make sure I'm understanding this, you're pulling in data from customers to help evolve the product, and that data can be on-prem, it can be in Microsoft Azure, it can be in AWS? >> Exactly. The HDP can be running in any of these; we will actually pull all of them into our data lake, and we actually do the analytics for us and then present it back to the customers. So, in our support subscription, the way this works is we do the analytics in our lake, and it pushes it back, in fact, to our support team tickets, and our sales force, and all the support mechanisms. And they get a set of recommendations saying, Hey, we know these are the workloads you're running, we see these are the opportunities for you to do better, whether it's tuning the hardware, tuning an application, tuning the software. We sort of send the recommendations back, and the customer can go and say, Oh, that makes sense, they accept that and we'll, you know, update the recommendation for you automatically. Or you can say, Maybe I don't want to change my kernel parameters, let's have a conversation. And if the customer, you know, is going through with that, then they can go and change it on their own. We do that, sort of, back and forth with the customer. >> One thing that just pops into my mind is, we talked a lot yesterday about data governance, are there particular, and also yesterday on stage were >> Arun: With IBM >> Yes exactly, when we think of, you know, really data-intensive industries, retail, financial services, insurance, healthcare, manufacturing, are there particular industries where you're really leveraging this, kind of, bi-directional, because there's no governance restrictions, or maybe I shouldn't say none, but. Give us a sense of which particular industries are really helping to fuel the evolution of the Hortonworks data lake. >> So, I think healthcare is a great example. You know, when we started off, sort of, this open source project Atlas, you know, a couple of years ago, we got a lot of traction in the healthcare, sort of, insurance industry. You know, folks like Aetna were actually founding members of that, you know, sort of consortium doing this, right? And we're starting to see them get a lot of leverage out of all of this. Similarly now as we go into, you know, Europe and expand there, things like GDPR are really, really paramount, right? And, you guys know GDPR is a really big deal. Like, you pay, if you're not compliant by, I think it's like March of next year, you pay a portion of your revenue as fines. That's, you know, big money for everybody.
So, I think that's why we're really excited about the partnership with IBM, because we feel like the two of us can help a lot of customers, especially in countries where they're significantly more highly regulated than the United States, to actually leverage our, sort of, giant portfolio of products. And IBM's been a great partner on Atlas; they've adopted it wholesale, as you saw, you know, in the announcements yesterday. >> So, you're doing a Keynote tomorrow, so give us maybe the top three things. You're giving the Keynote on Data Lake 3.0, walk us through the evolution. Data Lakes 1.0, 2.0, 3.0, where you are now, and what folks can expect to hear and see in your Keynote. >> Absolutely. So as we've, kind of, continued to work with customers and we see the maturity model of customers, you know, initially people are standing up a data lake, and then they'd want, you know, sort of security, basic security, what it covers, and so on. Now, they want governance, and as we're starting to go through that journey, clearly our customers are pushing us to help them get more value from the data. It's not just about putting up the data lake, and obviously managing data with governance; it's also about, Can you help us, you know, do machine learning, can you help us build other apps, and so on. So, as we look at this, there's a fundamental evolution that, you know, the Hadoop ecosystem had to go through: with the advent of technologies like, you know, Docker, it's really important first to help the customers bring more than just workloads which are sort of native to Hadoop. You know, Hadoop started off with MapReduce; obviously Spark's been great, and now we're starting to see technologies like Flink coming, but increasingly, you know, we want to do data science. To mass-market data science: obviously, you know, people want to use Spark, but the mass market is still Python, and R, and so on, right? >> Lisa: Non-native, okay. >> Non-native. Which are not really built, you know, these predate Hadoop by a long way, right. So now as we bring these applications in, having technology like Docker is really important, because now we can actually containerize these apps. It's not just about running Spark, you know, running Spark with R, or running Spark with Python, which you can do today. The problem is, in a true multi-tenant governed system, you want not just R, but you want a specific set of libraries for R, right. And the libraries, you know, George wants might be completely different than what I want. And, you know, you can't do a multi-tenant system where you install both of them simultaneously. So Docker is a really elegant solution to problems like those. So now we can actually bring those technologies into a Docker container, so George's Docker containers will not, you know, conflict with mine. And you can actually go off to the races, you know, doing data science. Which is really key for technologies like DSX, right? Because with DSX, if you see, obviously DSX supports Spark with technologies like, you know, Zeppelin which is a front-end, but they also have Jupyter, which is going to serve the mass-market users for Python and R, right? So we want to make sure there's no friction, whether it's, sort of, the guys using Spark, or the guys using R, and equally importantly DSX, you know, in the short-term roadmap will also support things like, you know, the classic IBM portfolio, SPSS and so on.
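The per-user isolation Arun describes can be pictured with a short sketch: two users' notebook sessions run in separate containers, each built with its own library set, over the same read-only data mount. The image names, paths, and labels below are hypothetical.

```python
# Sketch of per-user isolation via containers: conflicting R/Python
# library sets live in separate images, sharing the data lake read-only.
# Image names and mount paths are hypothetical.
import docker

client = docker.from_env()

sessions = [
    ("george", "datasci-notebook:georges-r-libs"),
    ("arun",   "datasci-notebook:aruns-r-libs"),
]

for user, image in sessions:
    client.containers.run(
        image,
        name=f"notebook-{user}",
        detach=True,  # run in the background
        volumes={"/data/lake": {"bind": "/data", "mode": "ro"}},  # shared data, read-only
        labels={"tenant": user},  # bookkeeping for multi-tenant scheduling
    )
```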
So bringing all of those things in together, making sure they run with the data in the data lake, and also the compute in the data lake, is really big for us. >> Wow, so it sounds like your Keynote's going to be very educational for the folks that are attending tomorrow, so last question for you. One of the themes that occurred in the Keynote this morning was sharing a fun fact about the speakers. What's a fun fact about Arun Murthy? >> Great question. I guess, you know, people have been looking for folks with, you know, 10 years of experience on Hadoop. I'm here finally, right? There's not a lot of people, but, you know, it's fun to be one of those people who've worked on this for about 10 years. Obviously, I look forward to working on this for another 10 or 15 more, but it's been an amazing journey. >> Excellent. Well, we thank you again for sharing time with us again on theCUBE. You've been watching theCUBE live on day 2 of the Dataworks Summit, hashtag DWS17, for my co-host George Gilbert. I am Lisa Martin. Stick around, we've got great content coming your way.

Published Date : Jun 14 2017

Jamie Engesser, Hortonworks & Madhu Kochar, IBM - DataWorks Summit 2017


 

>> Narrator: Live from San Jose, in the heart of Silicon Valley, it's theCUBE. Covering DataWorks Summit 2017, brought to you by Hortonworks. (digitalized music) >> Welcome back to theCUBE. We are live at day one of the DataWorks Summit, in the heart of Silicon Valley. I'm Lisa Martin with theCUBE; my co-host George Gilbert. We're very excited to be joined by our next two guests. Going to be talking about a lot of the passion and the energy that came from the keynote this morning and some big announcements. Please welcome Madhu Kochar, VP of analytics and product development and client success at IBM, and Jamie Engesser, VP of product management at Hortonworks. Welcome guys! >> Thank you. >> Glad to be here. >> First time on theCUBE, George and I are thrilled to have you. So, in the last six to eight months doing my research, there's been announcements between IBM and Hortonworks. You guys have been partners for a very long time, and announcements on technology partnerships with servers and storage, and presumably all of that gives Hortonworks, Jamie, a great opportunity to tap into IBM's enterprise install base, but boy today? Socks blown off with this big announcement between IBM and Hortonworks. Jamie, kind of walk us through that, or sorry Madhu, I'm going to ask you first. Walk us through this announcement today. What does it mean for the IBM-Hortonworks partnership? >> Oh my God, what an exciting, exciting day, right? We've been working towards this one, so three main things come out of the announcement today. First is really the adoption by Hortonworks of IBM's data science and machine learning. As you heard in the announcement, we brought machine learning to our mainframe, where the most trusted data is. Now bringing that to the open source, big data on Hadoop, great right, amazing. Number two is obviously the whole aspect around our Big SQL, which is bringing the complex-query analytics, where it brings all the data together from all the various sources, with HDP and Hadoop and Hortonworks really adopting that, amazing announcement. Number three, what we gain out of this humongously, obviously from an IBM perspective, is the whole platform. We've been on this journey together with Hortonworks since 2015 with ODPi, and we've been all champions in the open source, delivering a lot of that. As we start to look at it, it makes sense to merge that as a platform, and give to our clients what's most needed out there, as we take our journey towards machine learning, AI, and enhancing the enterprise data warehousing strategy. >> Awesome. Jamie, from your perspective on the product management side, what is this? What's the impact and potential downstream, great implications for Hortonworks? >> I think there's two things. I think Hortonworks has always been very committed to the open source community. I think with Hortonworks and IBM partnering on this, number one is it brings a much bigger community to bear, to really push innovation on top of Hadoop. That innovation is going to come through the community, and I think that partnership drives two of the biggest contributors to the community to do more together. So I think that's number one, is the community interest. The second thing is when you look at Hadoop adoption, we're seeing that people want to get more and more value out of Hadoop adoption, and they want to access more and more data sets, to number one get more and more value. We're seeing the data science platform become really fundamental to that.
They're also seeing the extension to say, not only do I need data science to get and add new insights, but I need to aggregate more data. So we're also seeing the notion of, how do I use Big SQL on top of Hadoop, but then I can federate data from my mainframe, which has got some very valuable data on it, DB2 instances, and the rest of the data repositories out there. So now we get a better federation model, to allow our customers to access more of the data that they can make better business decisions on, and they can use data science on top of that to get new learnings from that data. >> Let me build on that. Let's say that I'm a Telco customer, and the two of you come together to me and say, we don't want to talk to you about Hadoop. We want to talk to you about solving a problem where you've got data in applications and many places, including inaccessible stuff. You have a limited number of data scientists, and the problem of cleaning all the data. Even if you build models, the challenge of integrating them with operational applications. So what do the two of you tell me, the Telco customer? >> Yeah, so maybe I'll go first. So the Telco, the main use case or the main application, as I've been talking to many of the largest Telco companies here in U.S. and even outside of U.S., is all about their churn rate. They want to know when the calls are dropping, why are they dropping, why are the clients going to the competition, and such? There's so much data. The data is just streaming and they want to understand that. I think if you bring the data science experience and machine learning to that data, then it doesn't matter where the data resides. Hadoop, mainframes, wherever, we can bring that data. You can do a transformation of that, clean up the data. The quality of the data is there, so that you can start feeding that data into the models, and that's when the models learn. The more data it is, the better it is, so they train, and then you can really drive the insights out of it. Now data science, the framework which is available, it's like a team sport. You can bring in many other data scientists into the organization, who could have different analyst reports to go render for or provide results into. So being a team sport, being a collaboration, bringing together with that clean data, I think it's going to change the world. I think the business side can have instant value from the data they're going to see. >> Let me just test the edge conditions on that. Some of that data is streaming and you might apply the analytics in real time. Some of it is, I think as you were telling us before, sort of locked up as dark data. The question is how much of that data, the streaming stuff and the dark data, how much do you have to land in a Hadoop repository versus how much do you just push the analytics out to and have it inform a decision? >> Maybe I can take a first thought on it. I think there's a couple things in that. There's the learnings, and then how do I execute the learnings? I think the first step of it is, I tend to land the data, and going to the Telecom churn model, I want to see all the touch points. So I want to see the person that came through the website. He went into the store, he called into us, so I need to aggregate all that data to get a better view of what's the chain of steps that happened for somebody to churn. Once I end up diagnosing that, go through the data science of that, to learn the models that are being executed on that data, and that's the data at rest.
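As a rough illustration of the two halves Jamie outlines, learning from aggregated touchpoints at rest here, and scoring events as they stream in, which he turns to next, a sketch might look like this. The features, data, and threshold are invented for illustration.

```python
# Rough sketch: train a churn model on touchpoints at rest, then apply
# it per streaming event. Features, data, and threshold are invented.
from sklearn.ensemble import GradientBoostingClassifier

# Aggregated per-subscriber touchpoints (the data at rest):
# [dropped_calls_30d, store_visits_30d, billing_queries_30d]
X = [[9, 2, 3], [1, 0, 0], [7, 1, 2], [0, 1, 0]]
churned = [1, 0, 1, 0]

model = GradientBoostingClassifier().fit(X, churned)

def on_event(subscriber_features):
    """Called as each new event streams in; returns churn risk."""
    risk = model.predict_proba([subscriber_features])[0][1]
    if risk > 0.8:
        print("notify store rep: offer a retention discount")
    return risk

on_event([8, 2, 4])  # e.g., the customer who just walked into the store
```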
What I want to do is build the model out so that now I can take that model, and I can prescriptively run it in this stream of data. So I know that that customer just hung up the phone, now he walked in the store, and we can sense that he's in the store because we just registered that he's asking about his billing details. The system can now dynamically diagnose, by those two activities, that this is a high churn risk, so notify that teller in the store that there's a chance of him rolling out. If you look at that, that required the machine learning and data science side to build the analytical model, and it required the data-flow management and streaming analytics to consume that model to make a real-time insight out of it, to ultimately stop the churn from happening. Let's just give the customer a discount at the end of the day. That type of stuff; so you need to marry those two. >> It's interesting, you articulated that very clearly. Although then the question I have is now not on the technical side, but on the go-to-market side. You guys have to work very, very closely, and this is calling at a level that I assume is not very normal for Hortonworks, and it's something that is a natural sales motion for IBM. >> So maybe I'll first speak up, and then I'll let you add some color to that. When I look at it, I think there's a lot of natural synergies. IBM and Hortonworks have been partnered since day one. We've always continued on the path. If you look at it, and I'll bring up community again and open source again, but we've worked very well in the community. I think that's incubated and fostered a really strong relationship. I think at the end of the day we both look at what's going to be the outcome for the customer and work back from that, and we tend to really engage at that level. So what's the outcome, and then how do we make a better product to get to that outcome? So I think there are a lot of natural synergies in that. I think to your point, there's lots of pieces that we need to integrate better together, and we will join that over time. I think we're already starting with the data science experience. A bunch of integration touchpoints there. I think you're going to see in the information governance space, with Atlas being a key underpinning and information governance catalog on top of that, ultimately moving up to IBM's unified governance, we'll start getting more synergies there as well, and on the Big SQL side. I think when you look at the different pods, there's a lot of synergies that our customers will be driving, and that's the driving factor; the organizations are very well aligned. >> And from a VP of engineering perspective, there's a lot of integration points which were already identified, and Big SQL is already working really well on the Hortonworks HDP platform. We've got good integration going, but I think more and more on the data science. I think at the end of the day we end up talking to very similar clients, so going with a joint go-to-market strategy, it's a win-win. Jamie and I were talking earlier. I think in this type of a partnership, A, our community is winning, and our clients, so really good solutions. >> And that's what it's all about. Speaking of clients, you gave a great example with Telco. When we were talking to Rob Thomas and Rob Bearden earlier on in the program today.
They talked about the data science conversation being at the C-suite, so walk us through an example of whether it's a Telco or maybe a healthcare organization: what is that conversation that you're having? How is a Telco helping foster what was announced today and this partnership? >> Madhu: Do you want to take 'em? >> Maybe I'll start. When we look at a Telco, I think there's a natural evolution, and when we start looking at that problem of how does a Telco consume and operate data science at a larger scale? So at the C-suite it becomes a people-process discussion. There's not a lot of tools currently that really help the people and process side of it. It's kind of an artisan capability today in the data science space. What we're trying to do is, I think I mentioned team sport, but also give the tooling to say there's step one, which is we need to start learning and training the right teams and the right approach. Step two is start giving them access to the right data, etcetera, to work through that. And step three, giving them all the tooling to support that, and tooling becomes things like TensorFlow etcetera, things like Zeppelin, Jupyter, a bunch of the open source community evolved capabilities. So first, learning and training. The second step in that is give them the access to the right data to consume it, and then third, give them the right tooling. I think those three things are helping us to drive the right capabilities out of it. But to your point, elevating up to the C-suite, it's really about people and process, and I think giving them the right tooling for their people and the right processes to get them there. Moving data science from an art to a science is, I would argue, the top-level goal. >> On the client success side, how instrumental though are your clients, like maybe on the Telco side, in actually fostering the development of the technology, or helping IBM make the decision to standardize on HDP as their big data platform? >> Oh, huge, huge. A lot of our clients, especially as they are looking at the big data, many of them are actually helping us get committers into the code. They're adding, providing; feet can't move fast enough in the engineering. They are coming up and saying, "Hey, we're going to help" "and code up and do some code development with you." They've been really pushing our limits. A lot of clients, actually, that I ended up working with on the Hadoop side, for example, their entire information integration suite is very much running on top of HDP today. So they are saying, OK, what's next? We want to see better integration. So as I called a few clients yesterday saying, "Hey, under embargo this is something going to get announced," amazing, amazing results, and they're just very excited about this. So we are starting to get a lot of push, and actually the clients who do have large development communities as well. Like a lot of banks today, they write a lot of their own applications. We're starting to see them co-developing stuff with us and becoming the committers. >> Lisa: You have a question? >> Well, if I just were to jump in. How do you see over time the mix of apps starting to move from completely custom developed, sort of the way the original big data applications were all written, down at the metal in MapReduce? For shops that don't have a lot of data scientists, how are we going to see applications become more self-service, more pre-packaged? >> So maybe I'll give a little bit of perspective.
Right now I think IBM has got really good synergies on what I'll call vertical solutions to vertical organizations, financial, etcetera. I would say Hortonworks has taken a more horizontal approach. We're more of a platform solution. An example of one where it's kind of marrying the two is, if you move up the stack from Hortonworks as a platform to the next level up, which is Hortonworks as a solution, one of the examples that we've invested heavily in is cybersecurity, and in an Apache project called Metron. Less about Metron and more about cybersecurity. People want to solve a problem. They want to defend against an attacker immediately, and what that means is we need to give them out-of-the-box models to detect a lot of common patterns. What we're doing there is we're investing in some of the data science and pre-packaged models to identify attack vectors, and then try to resolve that or at least notify you that there's a concern. It's an example of pre-packaging that data science to solve a specific problem. That's in the cybersecurity space, and that case happens to be horizontal, where Hortonworks' strength is. I think in the IBM case, there are a lot more vertical apps that we can apply to. Fraud, adjudication, etcetera. >> So it sounds like we're really just hitting the tip of the iceberg here, with the potential. We want to thank you both for joining us on theCUBE today, sharing your excitement about this deepening, expanding partnership between Hortonworks and IBM. Madhu and Jamie, thank you so much for joining George and I today on theCUBE. >> Thank you. >> Thank you Lisa and George. >> Appreciate it. >> Thank you.

Published Date : Jun 14 2017

Rob Bearden, Hortonworks & Rob Thomas, IBM Analytics - #DataWorks - #theCUBE


 

>> Announcer: Live from San Jose, in the heart of Silicon Valley, it's theCUBE, covering DataWorks Summit 2017, brought to you by Hortonworks. >> Hi, welcome to theCUBE. We are live in San Jose, in the heart of Silicon Valley at the DataWorks Summit, day one. I'm Lisa Martin, with my co-host, George Gilbert. And we're very excited to be talking to two Robs. With Rob squared on the program this morning. Rob Bearden, the CEO of Hortonworks. Welcome, Rob. >> Thank you for having us. >> And Rob Thomas, the VP, GM rather, of IBM Analytics. So, guys, we just came from this really exciting, high energy keynote. The laser show was fantastic, but one of the great things, Rob, that you kicked off with was really showing the journey that Hortonworks has been on, and in a really pretty short period of time. Tremendous inertia, and you talked about the four mega-trends that are really driving enterprises to modernize their data architecture. Cloud, IoT, streaming data, and the fourth, next leg of this is data science. Data science, you said, will be the transformational next leg in the journey. Tell our viewers a little bit more about that. What does that mean for Hortonworks and your partnership with IBM? >> Well, what I think IBM and Hortonworks now have the ability to do is to bring all the data together across a connected data platform. The data in motion, the data at rest, now in one common platform, irrespective of the deployment architecture, whether it's on-prem across multiple data centers or whether deployed in the cloud. And now that we have the large volume of data and access to it, we can now start to drive the analytics on the data as it moves through each phase of its life cycle. And what really happens now, is now that we have visibility and access to the inclusive life cycle of the data, we can now put a data science framework over that to really understand and learn those patterns, and what's the data telling us, what's the pattern behind that. And we can bring simplification to the data science and turn data science actually into a team sport. Allow them to collaborate, allow them to have access to it. And sort of take the black magic out of doing data science, with the framework of the tool and the power of DSX on top of the connected data platform. Now we can advance rapidly the insights on the data, and what that really does is drive value really quickly back to the customer. And then we can begin to bring smart applications, via the data science, back into the enterprise. So we can now do things like connected car in real time, and have the connected car learn as it's moving, through all the patterns. From a retail standpoint, we can really get smart and accurate about inventory placement and inventory management. From an industrial standpoint, we know in real time, down to the component, what's happening with the machine, and any failures that may happen, and be able to eliminate downtime. Agriculture, same kind of thing. Healthcare, every industry, financial services, fraud detection, money laundering, the advances that we have, but it's all going to be attributable to how machine learning is applied, and the DSX platform is the best platform in the world to do that with. >> And one of the things that I thought was really interesting was that, as we saw enterprises start to embrace Hadoop and Big Data, it was saying, OK, this needs to co-exist and inter-operate with our traditional applications, our traditional technologies.
Now you're saying and seeing data science is going to be a strategic business differentiator. You mentioned a number of industries, and there were several of them on stage today. Give us maybe one of your favorite examples of one of your customers leveraging data science and driving a pretty significant advantage for their business. >> Sure. Yeah, well, to step back a little bit, just a little context: only ten companies have outperformed the S&P 500 in each of the last five years. We start looking at what are they doing. Those are companies that have decided data science and machine learning is critical. They've made a big bet on it, and every company needs to be doing that. So a big part of our message today was, kind of, I'd say, open the eyes of everybody to say there is something happening in the market right now. And it can make a huge difference in how you're applying data analytics to improve your business. We announced our first focus on this back in February, and one of our clients that spoke at that event is a company called Argus Healthcare. And Argus has massive amounts of data sitting on a mainframe, and they were looking for how can we unleash that to do better care of patients, better care for our hospital networks, and they did that with data they had in their mainframe. So they brought data science experience and machine learning to their mainframe; that's what they talked about. What Rob and I have announced today is there's another great trove of data in every organization, which is the data inside Hadoop. HDP, the leading distribution for that, is a great place to start. So the use case that I just shared, which is on the mainframe, that's going to apply anywhere where there's large amounts of data. And right now there's not a great answer for data science on Hadoop, until today, where data science experience plus HDP brings really, I'd say, an elegant approach to it. It makes it a team sport. You can collaborate, you can interact, you can get education right in the platform. So we have the opportunity to create a next generation of data scientists working with data in HDP. That's why we're excited. >> Let me follow up with a question on your intro: in terms of the data science experience as this next major building block to build on the value from the data lake, your two companies have different sorts of go-to-markets. Especially at IBM, with the industry solutions and global business services groups, you guys can actually build semi-custom solutions around this platform, both the data and the data science experience. With Hortonworks, what's your go-to-market motion going to look like, and what are the offerings going to look like to the customer? >> There'll be several. You just described a great example. With IBM professional services, they have the ability to take those industry templates and take these data science models and instantly be able to bring those to the data, and so as part of our joint go-to-market motion, we'll be able now to partner, bring those templates, bring those models to not only our customer base, but also, as part of the new sales go-to-market motion, into the white space, in new customer opportunities. And the whole point is, now we can use the enterprise data platforms to bring the data under management in a mission critical way, and then bring value to it through these kinds of use cases and templates that drive the smart applications into quick time to value.
And just increase that time to value for the customers. >> So, how would you look at the mix changing over time in terms of data scientists working with the data to experiment on the model development, and the two hard parts that you talked about, data prep and operationalization? So in other words, custom models, the issue of deploying it 11 months later because there's no real process for that that's packaged, and then packaged enterprise apps that are going to bake these models in as part of their functionality, you know, the way Salesforce is starting to do and Workday is starting to do. How does that change over time? >> It'll be a layering effect. So today, we now have the ability to bring, through the connected data platforms, all the data under management in a mission critical manner, from point of origination through the entire stream till it comes at rest. Now with the data science, through DSX, we can then have that data science framework to where, you know, the analogy I would use is, instead of it being a black science of how you do data access and go through and build the models and determine what the algorithms are and how that yields a result, the analogy is you don't have to be a mechanic to drive a car anymore. The common person can drive a car. So now we really open up the community of business analysts that can now participate and enable data science through collaboration, and then we can take those models and build the smart apps and evolve the smart apps very rapidly, and we can accelerate that process also now through the partnership with IBM, bringing their core domain value drivers that they've already built, and dropping those into the DSX environments. And so I think we can accelerate the time to value now much faster and more efficiently than we've ever been able to do before. >> You mentioned teamwork a number of times, and I'm curious about, you also talked about the business analyst: what's the governance like to facilitate business analysts and different lines of business that have particular access? And what is that team composed of? >> Yeah, well, so let's look at what's happening in the big enterprises in the world right now. There's two major things going on. One is everybody's recognizing this is a multi-cloud world. There's multiple public cloud options, most clients are building a private cloud. They need a way to manage data as a strategic asset across all those multiple cloud environments. The second piece is, we are moving towards, what I would call, the next generation data fabric, which is your warehousing capabilities, your database capabilities, married with Hadoop, married with other open source data repositories, and doing that in a seamless fashion. So you need a governance strategy for all of that. And the way I describe governance, simple analogy: we do for data what libraries do for books. Libraries create a catalog of books, they know they have different copies of books, some they archive, but they can access all of the intelligence in the library. That's what we do for data. So when we talk about governance and working together, we're both big supporters of the Atlas project, that will continue, but the other piece, kind of this point around enterprise data fabric, is what we're doing with Big SQL. Big SQL is the only 100% ANSI-SQL compliant SQL engine for data across Hadoop and other repositories.
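Since Big SQL is reachable over IBM's standard DB2 client drivers, a federated query joining Hadoop-resident data with a warehouse table can be issued like any other SQL statement. A hedged sketch in Python follows; the connection string, schemas, and table names are hypothetical.

```python
# Hedged sketch: issuing a federated query through Big SQL from Python.
# Connection details, schemas, and tables below are hypothetical.
import ibm_db

conn = ibm_db.connect(
    "DATABASE=bigsql;HOSTNAME=bigsql-head.example.com;PORT=32051;"
    "PROTOCOL=TCPIP;UID=analyst;PWD=secret", "", "")

sql = """
SELECT c.customer_id, c.segment, SUM(t.amount) AS total_spend
FROM   lake.transactions  t   -- table over Hadoop-resident data
JOIN   warehouse.customers c  -- federated warehouse table
ON     t.customer_id = c.customer_id
GROUP BY c.customer_id, c.segment
"""

stmt = ibm_db.exec_immediate(conn, sql)
row = ibm_db.fetch_assoc(stmt)
while row:
    print(row)
    row = ibm_db.fetch_assoc(stmt)
```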
So we'll be working closely together to help enterprises evolve in a multi-cloud world to this enterprise data fabric, and Big SQL's a big capability for that. >> And an immediate example of that is in our EDW optimization suite that we have today: we'll be loading Big SQL as the platform to handle the complex query piece of that. That will go to market almost immediately. >> Follow-up question on the governance: to what extent is it end-to-end governance, meaning from the point of origin through the last mile, you know, where the last mile might be some specialized analytic engine, versus having all the data management capabilities in that fabric, you mentioned operational and analytic, so, like, are customers going to be looking for a provider who can give them sort of end-to-end capabilities on both the governance side and on all the data management capabilities? Is that sort of a critical decision? >> I believe so. I think there's really two use cases for governance. It's either insights or it's compliance. And if your focus is on compliance, something like GDPR, as an example, that's really about the life cycle of data from when it starts to when it can be disposed of. So for the compliance use case, absolutely. When I say insights as a governance use case, that's really about self-service. The ideal world is you can make your data available to anybody in your organization, knowing that they have the right permissions, that they can access it, that they can do it in a protected way, and most companies don't have that advantage today. Part of the idea around data science on HDP is, if you've got the right governance framework in place, suddenly you can enable self-service, which is any data scientist or any business analyst can go find and access the data they need. So it's a really key part of delivering on data science, this governance piece. Now when I talk to clients, it's about understanding where they're going: is this about compliance or is this about insights? Because there's probably a different starting point, but the end game is similar. >> Curious about your target markets. Tyler talked about the go-to-market model a minute ago, are you targeting customers that are on mainframes? And you said, I think, in your keynote, 90% of transactional data is in a mainframe. Is that one of the targets, or is the target, like you mention, Rob, with the EDW optimization solution, customers who have an existing enterprise data warehouse that needs to be modernized? Is it both? >> The good news is it's both. Really, the opportunity and mission is about enabling the next generation data architecture. And within that, again, back to the layering approach, is being able to bring the data under management from point of origination through to the point it comes to rest. Now if we look at it, you know, probably 90% of, at least, transactional data sits in the mainframe, so you have to be able to span all data sets and all deployment architectures: on-prem, multi-data center, as well as public cloud. And that, then, is the opportunity. But for that to ultimately drive value back, you've got to have the simplification of the data science framework and toolset, to have the proper insights and basis on which you can bring the new smart applications. And drive the insights, drive the governance through the entire life cycle.
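The self-service "insights" use case described above has a concrete shape: before an analyst touches any data, they search the governance catalog for data sets they are permitted to use. Below is a hedged sketch against Apache Atlas, the project both companies back in this conversation, using its v2 basic-search REST API; the endpoint, credentials, and the "PII" classification name are illustrative assumptions rather than details from the interview.

```python
# Hedged sketch: discovering governed tables in Apache Atlas.
import requests

ATLAS = "http://atlas.example.com:21000"   # hypothetical Atlas endpoint

resp = requests.get(
    f"{ATLAS}/api/atlas/v2/search/basic",
    params={"typeName": "hive_table", "classification": "PII"},
    auth=("analyst", "secret"),            # hypothetical credentials
)
resp.raise_for_status()

# Each entity is a catalog record, the "library catalog for data"
# analogy from the conversation: its type, qualified name, and enough
# metadata for an analyst to follow up before requesting access.
for entity in resp.json().get("entities", []):
    print(entity["typeName"], entity["attributes"].get("qualifiedName"))
```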
>> On the value front, you know, we talk about, and Hortonworks talks about, the fact that this technology can really help a business unlock transformational value across their organization, across lines of business. This conversation, we just talked about a couple of the customer segments, is this a conversation that you're having at the C-suite initially? Where are the business leaders in terms of understanding that we know there's more value here, we probably can open up new business opportunities, or are you talking more at the data science level? >> Look, it's at different levels. So, data science, machine learning, that is a C-suite topic. A lot of times I'm not sure the audience knows what they're asking for, but they know it's important and they know they need to be doing something. When you go to things like a data architecture, the C-suite discussion there is, I just want to become more productive in how I'm deploying and using technology, because my IT budget's probably not going up, if anything it may be going down, so I've got to become a lot more productive and efficient to do that. So it depends on who you're talking to, there's different levels of dialogue. But there's no question in my mind, I've seen, you know, just look at the major press, the Financial Times, the Wall Street Journal, last year. CEOs are talking about AI, machine learning, using data as a competitive weapon. It is happening and it's happening right now. What we're doing together is saying, how do we make data simple and accessible? How do we make getting there really easy? Because right now it's pretty hard. But we think with the combination of what we're bringing, we make it pretty darn easy. >> So one quick question following up on that, and then I think we're getting close to the end. When the data lakes started out, it seemed like, for many customers, a mandate from on high: we need a big data strategy. That translated into standing up a Hadoop cluster, and that resulted in people realizing that there's a lot to manage there. It sounds like right now people know machine learning is hot, so they need to get data science tools in place, but is there a business capability, sort of like the ETL offload was for the initial Hadoop use cases, where you would go to a customer and recommend: do this, bite this off as something concrete? >> I'll start and then Rob can comment. Look, the issue's not Hadoop, a lot of clients have started with it. The reason there hasn't been, in some cases, the outcomes they wanted is because just putting data into Hadoop doesn't drive an outcome. What drives an outcome is what you do with it. How do you change your business process, how do you change what the company's doing with the data, and that's what this is about, it's kind of that next step in the evolution of Hadoop. And that's starting to happen now. It's not happening everywhere, but we think this will start to propel that discussion. Any thoughts you had, Rob? >> Spot on. The data lake was about releasing the constraints of all the silos and being able to bring those together and aggregate that data. And it was the first basis for being able to have a 360 degree or holistic centralized insight about something, or a pattern, but what data science then does is accelerate those patterns and those lessons learned, and the ability to have a much more detailed and higher velocity insight that you can react to much faster, and actually accelerate the business models around this aggregate.
So it's a foundational approach with Hadoop. And then, as I mentioned in the keynote, the data science platforms, machine learning, and AI are what transformationally open up and accelerate those insights, so that new models and patterns and applications get built to accelerate value. >> Well, speaking of transformation, thank you both so much for taking time to share your transformation and the big news and the announcements with Hortonworks and IBM this morning. Thank you Rob Bearden, CEO of Hortonworks, and Rob Thomas, General Manager of IBM Analytics. I'm Lisa Martin with my co-host, George Gilbert. Stick around. We are live from day one at DataWorks Summit in the heart of Silicon Valley. We'll be right back. (tech music)

Published Date : Jun 13 2017


John Kreisa, Hortonworks– DataWorks Summit Europe 2017 #DWS17 #theCUBE


 

>> Announcer: Live from Munich, Germany, it's theCUBE, covering DataWorks Summit Europe 2017. Brought to you by HORTONWORKS. (electronic music) (crowd) >> Okay, welcome back everyone, we are here live in Munich, Germany, for DataWorks 2017, formerly Hadoop Summit, the European version. Again, a different kind of show than the main show in North America, in San Jose, but it's a great show, a lot of great topics. I'm John Furrier, my co-host, Dave Vellante. Our next guest is John Kreisa, Vice President of International Marketing. Great to see you emceeing the event. Great job, great event! >> John Kreisa: Great. >> Classic European event, it's got the European vibe. >> Yep. >> In Germany everything's tightly buttoned down, very professional. (laughing) But big IOT message-- >> Yes. >> Because in Germany there's a lot of industrial action-- >> That's right. >> And then Europe, in general, a lot of smart cities, a lot of mobility, and issues. >> Umm-hmm. >> So a lot of IOT, a lot of meat on the bone here. >> Yep. >> So congratulations! >> John Kreisa: Thank you. >> What are your thoughts? Are you happy with the event? Give us the by-the-numbers, how many people, what's the focus? >> Sure, yeah, no, thanks, John, Dave. Long-time CUBE attendee, I'm really excited to be here. Always great to have you guys here-- >> Thanks. >> Thanks. >> And be participating. This is a great event this year. We did change the name, as you mentioned, from Hadoop Summit to DataWorks Summit. Perhaps I'll just riff on that a little bit. I think that really was in response to the change in the community, the breadth of technologies. You mentioned IOT, machine learning, and AI, which we had some of in the keynotes. So just a real expansion, from data loading, data streaming, analytics, and machine learning and artificial intelligence, which all sit on top of and use the core Hadoop platform. We felt like it was time to expand the conference itself. Open up the aperture to really bring in the other technologies that were involved, and really represent what was already starting to kind of feed into Hadoop Summit, so it's kind of a natural change, a natural evolution. >> And there's a 2-year visibility. We talked about this two years ago. >> John Kreisa: Yeah, yeah. >> That you were starting to see this aperture open up a little bit. >> Yeah. >> But it's interesting. I want to get your thoughts on this because Dave and I were talking yesterday. It's like we've been to every single Hadoop Summit. Even theCUBE's been following it all as you know. It's interesting, the big data space was created by the Hadoop ecosystem. >> Umm-hmm. >> So, yeah, you rode in on the Hadoop horse. >> Yeah. >> I get that. A lot of people don't get it. They say, Oh, Hadoop's dead, but it's not. >> No. >> It's evolving to a much broader scope. >> That's right. >> And you guys saw that two years ago. Comment on your reaction to Hadoop is not dead. >> Yeah, wow (laughing). It's far from dead if you look at the momentum, largest conference ever here in Europe. I think strong interest from them. I think we had a very good customer panel, which talked about the usage, right. How they were really transforming. You had Walgreens Boots talking about how they're redoing their shelving and how they're redesigning their stores. Danske Bank talking about how they're analyzing how they replenish their cash machines. Centrica talking about how they redo their... Or how they're driving down the cost of energy by being smarter around energy consumption.
So, these are real transformative use cases, and so, it's far from dead. Really, what might be confusing people is probably the fact that there are so many other technologies and markets that are being enabled by these open source technologies and the breadth of the platform. And I think that's maybe why people see it kind of move a little bit back to a platform play. And so, we talk more about streaming and analytics and machine learning, but all that's enabled by Hadoop. It's all riding on top of this platform. And I think people kind of just misconstrue the fact that there's one enabling-- >> It's a fundamental element, obviously. >> John Kreisa: Yeah. >> But what's the new expansion? IOT, as I mentioned, is big here. >> Umm-hmm. >> But there's a lot more in connective tissue going on, as Shaun Connolly calls it. >> Yeah, yep. >> What are those other things? >> Yeah, so I think, as you said, smart cities, smart devices, the analytics, getting the value out of the technologies. The ability to load it and capture it in new ways with new open source technology, NiFi and some of those other things, Kafka we've heard of. And some of those technologies are enabling the broader use cases, so I don't think it's... I think that's really the fundamental change and shift that we see. It's why we renamed it to DataWorks Summit, because it's all about the data, right. That's the thing-- >> But I think... Well, if you think about it from a customer perspective, to me anyway, what's happened is we went through the adolescent phase of getting this stuff to work and-- >> Yeah. >> And figuring out, Okay, what's the relationship with my enterprise data warehouse, and then they realize, Wow, the enterprise data warehouse is critical to my big data platform. >> Umm-hmm. >> And now they're turning to their business saying, Okay, we have this platform. Let's now really start to go up the steep part of the S-curve and get more value out of it. >> John Kreisa: Umm-hmm. >> Do you agree with that scenario? >> I would definitely agree with that. I think that as companies have, and in particular here in Europe, it's interesting because they kind of waited for the technology to mature, and it's reached that inflection point. To your point, Dave, such that they're really saying, Alright, let's really get this into production. Let's really drive value out of the data that they see and know they have. And there's sort of... We see a sense of urgency here in Europe, to get going and really start to get that value out. Yeah, and we call it a ratchet game. (laughing) The ratchet is, Okay, you get the technology to work. Okay, you still got to keep the lights on. Okay, and oh, by the way, we need some data governance. Let's ratchet it up that side. Oh, we need a CDO! >> Umm-hmm. >> And so, because if you just try to ratchet up one side of the house (laughing) (cross-talk)-- >> Well, Carlo from HPE said it great on our last segment. >> Yeah. >> And I thought this was fundamental. And this was kind of like you had a CUBE moment where it's like, Wow, that's a really amazing insight. And he said something profound: The data is now foundational to all conversations. >> Right. >> And that's from a business standpoint. It hasn't always been the case. Now, it's like, Okay, you can look at data as a fundamental foundation building block. >> Right. >> And then react from there.
So if you get the data locked in, to Dave's point about compliance, you can then do clever things. You can have a conversation about a dynamic edge or-- >> Right. >> Something else. So the foundational data is really now fundamental, and I think that changes... It's not a database issue. It's just all data. >> Right, now all data-- >> All databases. >> You're right, it's all data. It's driving the business in all different functions. It's operational efficiency. It's new applications. It's customer intimacy. All of those different ways that all these companies are going, We've got this data. We now have the systems, and we can go ahead and move forward with it. And I think that's the momentum that we're seeing here in Europe, as evidenced by the conference and those kinds of things. I think it really shows how maybe... We used to say... I'd say when I first moved over here that Europe was maybe a year and a half behind the U.S. in terms of adoption. I'd say that's shrunk to where a lot of the conversations are the exact same conversations that we're having with big European companies that we're having with U.S. companies. >> And, even in... >> Yeah. >> Like we were just talking to Carlo, he was like, Well, Europe is ahead in things like certain IOT-- >> Yeah. >> And Industrial IOT. >> Yeah. >> Yeah. >> Even IOT analytics. Some of the... Tesla notwithstanding, some of the automated vehicles. >> John Kreisa: Correct. >> Autonomous vehicles activity that's going on. >> John Kreisa: That's right. >> Certainly with Daimler and others. So there's an advancement. It almost reminds me of the early days of mobile, so... (laughing) >> It's actually, it's a good point. If you look at... Squint through some of the perspectives, it depends on where you are in the room and what your view is. You could argue there are many things that Europe is advanced on and where we're behind. If you look at Amazon Web Services, for instance. >> Umm-hmm. >> They are clearly running as fast as they can to deploy regions. >> Umm-hmm. >> So the scoop's coming out now. I'm hearing buzz that there's another region coming out. >> Right. >> From Amazon soon (laughing). They can't go fast enough. Google is putting out regions again. >> Right. >> Data centers are now pushing global, yet there's more industrial activity here than there. So it's an interesting perspective. It depends on how you look at it! >> Yeah, yeah, no, I think it's... And it's perfectly fair to say there are many places where it's more advanced. I think this technology and open source technologies, in general, are helping drive some of those and enable some of those trends. >> Yeah. >> Because if you have the sensors, you need a place to store and analyze that data, whether it's smart cars or smart cities, or energy, smart energy, all those different places. That's really where we are. >> What's different in the international theater that you're involved in, because you've been on both sides. >> Yep. >> As you came from the U.S. when we first met. What's different out here now? Are the gaps closing? What other things are notable that you could share? >> Yeah, yeah, so I'd say we still see customers in the U.S. that are still very much wanting to use the shiniest, newest thing, like the very latest version of Spark or the very latest version of NiFi or some other technologies. They want to push and use that latest version. In Europe, now, the conversations are slightly different, in terms of understanding the security and governance.
I think there's a lot more consciousness, if you will, around data here. There are other rules and regulations that are coming into place. And I think they're a little bit more advanced in how they think of-- >> Yeah. >> Data, personal data, how it should be treated, and so, consequently, those are where the conversations are about the platform. How do we secure it? How does it get governed? So that you need regulations-- >> John Furrier: It's not as fast and loose as the U.S. >> Yeah, it's not as fast. And you look and see some of the regulations. (laughing) My wife asked me if we should set up a VPN on our home WiFi because of this new rule about being able to sell personal data. I said, Well, we're not in the U.S., but perhaps when we move to the U.S. >> In order to get the right to blockchain (laughing). (cross-talk) >> Yeah, absolutely (cross-talk). >> John Furrier: Encrypt everything. >> (laughing) Yeah, exactly. >> Well, another topic is... Let's talk about the ecosystem a little bit. >> Umm-hmm. >> You've got now some additional public brethren, obviously Cloudera, there's been a lot of talk here about-- >> Umm-hmm. Talend and Alteryx have gone public. >> Yeah. >> The ecosystem, you've evolved that. IBM was up on stage with you guys. >> Yeah, yep. >> So that continues to be-- >> Gallium C. >> Can we talk about that a little bit? >> Gallium C >> Gallium C. >> We had a great... Partners are great. We've always been about the ecosystem. We were talking before we came on-screen that for us these are not Barney partnerships. They're very much of substance, engineering to try to drive value for the customers. It's where we see that value, in that joint value. So IBM is working with us across all of the DataWorks Summits, but even in all of the engineering work that we're doing; they participated in the HDP 2.6 announcement that we just did. And I'm sure that's what you covered with Shaun and others, but those partnerships really help drive value for the customer. >> Umm-hmm. For us, it's all making sure the customer is successful. And to make a complete solution, it is a range of products, right. Whether it's data warehousing, servers, networks, all of the different analytics, right. There's not one product that is the complete solution. It does take a stack, a multitude of technologies, to make somebody successful. >> Cloudera's S-1 was filed, it's been part of the conversation, and we've been digging into it, it's great to see the numbers. >> Umm-hmm. >> Anything surprise you in the S-1? And advice you'd give to open source companies looking to go public? Because, as Dave pointed out, there's a string now of comrades in arms, if you will: MuleSoft, that's doing very well. >> Yeah, yeah. >> And Alteryx just went public. >> Yeah. >> You guys have been public for a long time. You guys have been operating in the public open-- >> Yeah. >> Both open source, pure open source, but also on the public markets. You guys have experience. You got some scar tissue. >> John Kreisa: (laughing) Yeah, yeah. >> What's your advice to Cloudera or others that are... Because there certainly will be a rush of more public companies. >> Yeah. >> It's a fantastic trend.
I think that for us, we're always driving for success of the customer, and I think any of the open source companies, they have to look at their business plan and take it step-wise in approach, that keeps an eye on making the customer successful because that's ultimately what's going to drive the company success and drive revenue for it and continue to do it. But we welcome as many companies as possible to come into the public market because A: it just allows everybody to operate in an open and honest way, in terms of comparison and understanding how growth is. But B: it's shows that strength of how open source and related technologies can help-- >> Yeah. >> Drive things forward. >> And it's good for the customer, too, because now they can compare-- >> Yes! >> Apples to Apples-- >> Exactly. >> Visa V, Cloudera, and what's interesting is that they had such a head start on you guys, HORTONWORKS, but the numbers are almost identical. >> Umm-hmm, yeah. >> Really close. >> Yeah, I think it's indicative of the opportunity that they're now coming out and there's rumors of other companies coming out. And I think it's just gives that visibility. We welcome it, absolutely-- >> Yeah. >> To show because we're very proud of our performance and now are growth. And I think that's something that we stand behind and stand on top of. And we want to see others come out and show what they got. >> Let's talk about events, if we can? >> Yeah. >> We were there at the first Hadoop Summit in San Jose. Thrilled to be-- >> John Kreisa: In a few years. >> In Dublin last year. >> Yeah. >> So what's the event strategy? I love going into the local flavor. >> Umm-hmm. >> Last year we had the Irish singers. This year we had a great (laughing) locaL band. >> John Kreisa: (laughing) Yeah, yeah, yeah. >> So I don't know if you've announced where next year's going to be? Maybe you can share with us some of the roll-out strategies? >> Yeah, so first of all, DataWorks Summit is a great event as you guys know, And you guys are long participants, so it's a great partnership. We've moving them international, of course, we did a couple... We are already international, but moving a couple to Asia last year so-- >> Right. >> Those were a tremendous success, we actually exceeded our targets, in terms of how many people we thought would go. >> Dave: Where did you do those? >> We were in Melburn in Tokyo. >> Dave: That's right, yeah. >> Yeah, so in both places great community, kind of rushed to the event and kind of understanding, really showed that there is truly a global kind of data community around Hadoop and other related technologies. So from here as you guys know because you're going to be there, we're thinking about San Jose and really wanting to make sure that's a great event. It's already stacking up to be tremendous, call for papers is all done. And all that's announced so, even the sessions we're really starting build for that, We'll be later this year. We'll be in Sydney, so we're going to have to take DataWorks into Sydney, Australia, in September. So throughout the rest of this year, there's going to be continued building momentum and just really global participation in this community, which is great. >> Yeah. >> Yeah. >> Yeah, it's fantastic. >> Yeah, Sydney should be great. >> Yeah. >> Looking forward to it. We're going to expand theCUBE down under. Dave and I are are excited-- >> Dave: Yeah, let's talk about that. >> We got a lot of interest (laughing). >> Alright. >> John, great to have you-- >> Come on down. 
>> On theCUBE again. Great to see you. Congratulations, I'm going to see you up on stage. >> Thank you. >> Doing the emcee. Great show, a lot of great presenters and great customer testimonials. And as always the sessions are packed. And good learning, great community. >> Yeah. >> Congratulations on your ecosystem. This is theCUBE broadcasting live from Munich, Germany for DataWorks 2017, presented by HORTONWORKS and Yahoo. I'm John Furrier with Dave Vellante. Stay with us, great interviews on day two still up. Stay with us. (electronic music)

Published Date : Apr 6 2017


Steve Roberts, IBM– DataWorks Summit Europe 2017 #DW17 #theCUBE


 

>> Narrator: Covering DataWorks Summit, Europe 2017, brought to you by Hortonworks. >> Welcome back to Munich everybody. This is The Cube. We're here live at DataWorks Summit, and we are the live leader in tech coverage. Steve Roberts is here as the offering manager for big data on power systems for IBM. Steve, good to see you again. >> Yeah, good to see you Dave. >> So we're here in Munich, a lot of action, good European flavor. It's my second European one, formerly Hadoop Summit, now DataWorks. What's your take on the show? >> I like it. I like the size of the venue. It's the ability to interact and talk to a lot of the different sponsors and clients and partners, the ability to network with a lot of people from a lot of different parts of the world in a short period of time, so it's been great so far, and I'm looking forward to building upon this towards the next DataWorks Summit in San Jose. >> Terri Virnig, a VP in your organization, was up this morning and had a keynote presentation, so IBM got a lot of love in front of a fairly decent sized audience, talking a lot about the ecosystem that's evolving, the openness. Talk a little bit about open generally at IBM, but specifically what it means to your organization in the context of big data. >> Well, I am from the power systems team. We have an initiative that we launched a couple of years ago called Open Power. And Open Power is a foundation of participants innovating from the power processor through all aspects: accelerators, IO, GPUs, advanced analytics packages, system integration, but all to the point of being able to drive open power capability into the market and have power servers delivered not just through IBM, but through a whole ecosystem of partners. This complements quite well the Apache Hadoop and Spark philosophy of openness as it relates to the software stack. So our story's really about being able to marry the benefits of an open ecosystem for open power as it relates to the system infrastructure technology, which drives the same time to innovation, community value, and choice for customers as it relates to a multi-vendor ecosystem, coupled with the same premise as it relates to Hadoop and Spark. And of course, IBM is making significant contributions to Spark as part of the Apache Spark community, and we're a key active member, as is Hortonworks with the ODPi organization forwarding the standards around Hadoop. So this is a one-two combo of open Hadoop, open Spark, either from Hortonworks or from IBM, sitting on the open power platform built for big data. No other story really exists like that in the market today, open on open. >> So Terri mentioned cognitive systems. Bob Picciano has recently taken over and obviously has some cognitive chops, and some systems chops. Is this a rebranding of power? Is it sort of a layer on top? How should we interpret this? >> No, think of it more as a layer on top. So power will now be one of the assets, one of the sort of member family of the cognitive systems portion of IBM. System z can also be used as another great engine for cognitive with certain clients, certain use cases where they want to run cognitive close to the data and they have a lot of data sitting on System z. So power systems is a server really built for big data and machine learning, in particular our S822LC for high performance computing. This is a server which is landing very well in the deep learning, machine learning space.
It offers the Tesla P100 GPU, and with the NVIDIA NVLink technology it can offer up to 2.8x bandwidth benefits CPU to GPU over what would be available through a PCIe Intel combination today. So this drives immediate value when you need to ensure that you're not just exploiting GPUs, but of course also moving your data quickly from the processor to the GPU. >> So I was going to ask you actually, sort of what makes power so well suited for big data and cognitive applications, particularly relative to Intel alternatives. You touched on that. IBM talks a lot about Moore's Law starting to hit its peak, that innovation is going to come from other places. I love that narrative 'cause it's really combinatorial innovation that's going to lead us in the next 50 years, but can we stay on that thread for a bit? What makes power so substantially unique, uniquely suited and qualified to run cognitive systems and big data? >> Yeah, it actually starts with even more of the fundamentals of the power processors. The power processor has eight threads per core, in contrast to Intel's two threads per core. So this just means being able to parallelize your workloads, and workloads that come up in the cognitive space, whether you're running complex queries and need to drive SQL over a lot of parallel pipes, or you're running iterative computation over the same data set, as when you're doing model training, these can all benefit from highly parallelized workloads, which can benefit from this 4x thread advantage. But of course to do this you also need large, fast memory, and we have six times more cache per core versus Broadwell, so this just means you have a lot of memory close to the processor, driving that throughput that you require. And then on top of that, now we get to the ability to add accelerators, and unique accelerators such as, as I mentioned, the NVIDIA NVLink scenario for GPU, or using open CAPI as an approach to attach FPGA or flash to get processor-memory access speeds, but with an attached acceleration device. And so this is economies of scale in terms of being able to offload specialized compute processing to the right accelerator at the right time, so you can drive way more throughput. The upper bound of driving workload through individual nodes, and being able to balance your IO and compute on an individual node, is far superior with the power system server. >> Okay, so multi-threaded, giant memories, and this open CAPI gives you primitive level access, I guess, to a memory extension, instead of having to-- >> Yeah, pluggable accelerators through this high speed memory extension. >> Instead of going through what I often call the horrible storage stack, aka SCSI. And so that's cool, some good technology discussion there. What's the business impact of all that? What are you seeing with clients? >> Well, the business impact is, not everyone is going to start with souped-up accelerated workloads, but they're going to get there. So part of the vision that clients need to understand is that to begin to get more insights from their data, it's hard to predict where your workloads are going to go. So you want to start with a server that provides you some of that headroom for growth. You don't want to keep scaling out horizontally by having to add nodes every time you need to add storage or add more compute capacity.
So firstly, it's the flexibility: being able to bring versatile workloads onto a node or a small number of nodes, and being able to exploit some of these memory advantages and acceleration advantages without necessarily having to build large scale-out clusters. Ultimately, it's about improving time to insights. So with accelerators and with large memory, running workloads on similarly configured clusters, you're simply going to get your results faster. For example, in a recent benchmark we did with a representative set of TPC-DS queries on Hortonworks running on Linux and power servers, we were able to drive 70% more queries per hour over a comparable Intel configuration. So this is just getting more work done on what is now similarly priced infrastructure. 'Cause the power family is a broad family that now includes 1U and 2U scale-out servers, along with our 192-core horsepower machines for the enterprise grade. So we can directly price-compete on a scale-out box, but we offer a lot more flexible choice as clients want to move up in the workload stack or bring accelerators to the table as they start to experiment with machine learning. >> So if I understand that right, I can turn two knobs. I can do the same amount of work for less money, a TCO play. Or, for the same amount of money, I can do more work. >> Absolutely. >> Is that fair? >> Absolutely. Now in some cases, especially in the Hadoop space, the size of your cluster is somewhat gated by how much storage you require. And if you're using the classic scale-up storage model, you're going to have so many nodes no matter what, 'cause you can only put so much storage on the node. So in that case-- >> You're scaling storage. >> Your clusters can look the same, but you can put a lot more workload on that cluster, or you can bring in IBM, a solution like IBM Spectrum Scale and our Elastic Storage Server, which allows you to essentially pull that storage off the nodes and put it in a storage appliance. And at that point, you now have high speed access to storage, 'cause of course the network bandwidth has increased to the point that the performance benefit of local storage is no longer really a driving factor for a classic Hadoop deployment. You can get that high speed access in a storage appliance mode, with the resiliency, at far less cost, 'cause you don't need 3x replication, you just have about a 30% overhead for the software erasure coding. And now with your compute nodes, you can really choose and scale those nodes just for your workload purposes. So you're not bound by the number of nodes equals total storage required divided by storage per node, which is the classic how-big-is-my-cluster calculation. That just doesn't work if you get over 10 nodes, 'cause now you're just starting to get to the point where you're wasting something, right? You're either wasting storage capacity or typically you're wasting compute capacity, 'cause you're over-provisioned on one side or the other.
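The cluster-sizing logic being described reduces to a few lines of arithmetic, sketched below. The node shapes and workload figures are made-up illustrations; only the 3x replication factor and the roughly 30% erasure-coding overhead come from the conversation itself.

```python
# Hypothetical inputs: 500 TB of raw data, nodes with 60 TB of usable disk
# and 40 cores, and a workload that needs about 400 cores in total.
RAW_TB, DISK_PER_NODE_TB = 500, 60
CORES_NEEDED, CORES_PER_NODE = 400, 40

def nodes_required(storage_overhead: float) -> int:
    # Ceiling division via negation: -(-a // b) rounds up.
    storage_nodes = int(-(-RAW_TB * storage_overhead // DISK_PER_NODE_TB))
    compute_nodes = int(-(-CORES_NEEDED // CORES_PER_NODE))
    # The classic how-big-is-my-cluster rule: the larger requirement wins,
    # and the gap between the two is capacity you buy but never use.
    return max(storage_nodes, compute_nodes)

print(nodes_required(3.0))  # 3x replicated local storage: 25 nodes, storage-bound
print(nodes_required(1.3))  # ~30% erasure-coded external tier: 11 nodes, near the compute need
```

Decoupling those two terms, so storage and compute scale independently, is exactly the argument being made here for an external storage tier.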
You can still have classic POSIX applications or you may have new object based applications and you can with a single copy of the data, one virtual file system, which could also be geographically distributed, serving both Hadoop and non-Hadoop workloads, so you're saving then additional replicas of the data from being required by being able to onboard that onto a common data layer. >> So that's a return on asset play. You got an asset that's more fungible across the application portfolio. You can get more value out of it. You don't have to dedicate it to this one workload and then over provision for another one when you got extra capacity sitting here. >> It's a TCO play, but it's also a time saver. It's going to get you time to insight faster 'cause you don't have to keep moving that data around. The time you spend copying data is time you should be spending getting insights from the data, so having a common data layer removes that delay. >> Okay, 'cause it's HDFS ready I don't have to essentially move data from my existing systems into this new stovepipe. >> Yeah, we just present it through the HDFS API as it lands in the file system from the original application. >> So now, all this talk about rings of flexibility, agility, etc, what about cloud? How does cloud fit into this strategy? What do are you guys doing with your colleagues and cohorts at Bluemix, aka SoftLayer. You don't use that term anymore, but we do. When we get our bill it says SoftLayer still, but any rate, you know what I'm talking about. The cloud with IBM, how does it relate to what you guys are doing in power systems? >> Well the cloud is still, really the born on the cloud philosophy of IBM software analytics team is still very much the motto. So as you see in the data science experience, which was launched last year, born in the cloud, all our analytics packages whether it be our BigInsights software or our business intelligence software like Cognos, our future generations are landing first in the cloud. And of course we have our whole arsenal of Watson based analytics and APIs available through the cloud. So what we're now seeing as well as we're taking those born in the cloud, but now also offering a lot of those in an on-premise model. So they can also participate in the hybrid model, so data science experience now coming on premise, we're showing it at the booth here today. Bluemix has a on premise version as well, and the same software library, BigInsights, Cognos, SPSS are all available for on prem deployment. So power is still ideal place for hosting your on prem data and to run your analytics close to the data, and now we can federate that through hybrid access to these elements running in the cloud. So the focus is really being able to, the cloud applications being able to leverage the power and System z's based data through high speed connectors and being able to build hybrid configurations where you're running your analytics where they most make sense based upon your performance requirements, data security and compliance requirements. And a lot of companies, of course, are still not comfortable putting all their jewels in the cloud, so typically there's going to be a mix and match. We are expanding the footprint for cloud based offerings both in terms of power servers offered through SoftLayer, but also through other cloud providers, Nimbix is a partner we're working with right now who actually is offering our Power AI package. 
Power AI is a package of open source deep learning frameworks, packaged by IBM, optimized for Power, in an easily deployed package with IBM support available. And that could be deployed on premise on a power server, but it's also available on a pay-per-drink basis through the Nimbix cloud. >> All right, we covered a lot of ground here. We talked strategy, we talked strategic fit, which I guess is sort of an adjunct to strategy, we talked a little bit about the competition and where you differentiate, some of the deployment models, like cloud, other bits and pieces of your portfolio. Can we talk specifically about the announcements that you have here at this event, and maybe summarize for us? >> Yeah, absolutely. As it relates to IBM and Hadoop and Spark, we really have the full stack support, the rich analytics capabilities that I was mentioning: deep insight, prescriptive insights, streaming analytics with IBM Streams, Cognos Business Intelligence, so this set of technologies is available for both IBM's Hadoop stack and Hortonworks' Hadoop stack today. Our BigInsights and IOP offering, its next release, the 4.3 release, is now out for technical preview and will be available for both Linux on Intel and Linux on power towards the end of this month, so that's one piece of new Hadoop news at the analytics layer. As it relates to power systems, as Hortonworks announced this morning, HDP 2.6 is now available for Linux on power, so we've been partnering closely with Hortonworks to ensure that we have an optimized story for HDP running on power system servers, as with the data point I shared earlier of the 70% improvement in queries per hour. At the storage layer, we have work in progress with Hortonworks to certify the Spectrum Scale file system, which really unlocks the ability to offer this converged storage alternative to the classic Hadoop model. Spectrum Scale actually supports and provides advantages in a classic Hadoop model with local storage, and it can provide the flexibility of offering the same sort of multi-application support in a scale-out model for storage. It also has the ability to form part of a storage appliance that we call Elastic Storage Server, which is a combination of power servers and high density storage enclosures, SSD, spinning disk, or flash, depending on the configuration, and that certification will have that as an available storage appliance, which could underpin either IBM Open Platform or HDP as a Hadoop data lake. But as I mentioned, not just for Hadoop: really for building a common data plane behind mixed analytics workloads that reduces your TCO through a converged storage footprint, but more importantly, provides you that flexibility of not having to create data copies to support multiple applications. >> Excellent. IBM opening up its portfolio to the open source ecosystem. You guys have always had, well not always, but in the last 20 years, major, major investments in open source. They continue on, we're seeing it here. Steve, people are filing in. The evening festivities are about to begin. >> Steve: Yeah, yeah, the party will begin shortly. >> Really appreciate you coming on The Cube, thanks very much. >> Thanks a lot Dave. >> You're welcome. >> Great to talk to you. >> All right, keep it right there everybody. John and I will be back with a wrap up right after this short break, right back.

Published Date : Apr 6 2017


Shaun Connolly, Hortonworks - DataWorks Summit Europe 2017 - #DW17 - #theCUBE


 

>> Announcer: Coverage DataWorks Summit Europe 2017, brought to you by Hortonworks. (electronic music) (crowd) >> Welcome back everyone. Live here in Munich, Germany for theCUBE's special presentation of Hortonworks Hadoop Summit, now called DataWorks 2017. I'm John Furrier, my co-host Dave Vellante, our next guest is Shaun Connolly, Vice President of Corporate Strategy, Chief Strategy Officer. Shaun, great to see you again. >> Thanks for having me guys. Always a pleasure. >> Super exciting. Obviously we're always pontificating on the status of Hadoop, and Hadoop is dead, long live Hadoop, but rumors of its demise are greatly exaggerated. The reality is that there are no major shifts in the trends, other than the fact that the amplification of AI and machine learning has upleveled the narrative to mainstream around data; gen one of big data was written on Hadoop, DevOps, culture, open-source. Starting with Hadoop, you guys certainly have been way out in front of all the trends, in how you guys have been rolling out the products. But it's now, with IoT and AI as that sizzle, the future of self-driving cars, smart cities, you're starting to really see demand for comprehensive solutions that involve data-centric thinking. Okay, that's one. Two, open-source continues to dominate: MuleSoft went public, you guys went public years ago, Cloudera filed their S-1. A crop of public companies that are open-source, haven't seen that since Red Hat. >> Exactly. 99 is when Red Hat went public. >> Data-centric, big megatrend with open-source powering it, you couldn't be happier for the stars lining up. >> Yeah, well, we definitely placed our bets on that. We went public in 2014, and it's nice to see that graduating class of Talend and MuleSoft, Cloudera coming out. That, I think, helps socialize this movement that enterprise open-source, whether it's for on-prem or powering cloud solutions pushed out to the edge, and technologies that are relevant in IoT, that's the wave. We had a panel earlier today where Dahl Jeppe from Centrica's British Gas was talking about his... The digitization of energy and virtual power plant notions. He can't achieve that without open-source powering and fueling that. >> And the thing about it is, just kind of... For me personally, being my age in this generation of the computer industry, since I was 19, to see open-source go mainstream the way it is, it just gets better every time, but it really is the thousand-flowers-bloom strategy. Throwing the seeds of innovation out there. I want to ask you a strategy question: you guys, from a performance standpoint, I would say kind of got hammered in the public market. Cloudera's valuation privately is 4.1 billion, you guys are close to 700 million. Certainly Cloudera's going to get a haircut, it looks like. The public market is based on the multiples, from Dave and I's intro, but there's so much value being created. Where's the value for you guys as you look at the horizon? You're talking about white spaces that are really developing, with use cases that are creating value. The practitioners in the field creating value, real value for customers. >> So you covered some of the trends, but I'll translate 'em into how the customers are deploying. Cloud computing and IoT are somewhat related. One is a centralization, the other is decentralization, so it actually calls for a connected data architecture, as we refer to it. We're working with a variety of IoT-related use cases. Coca-Cola East Japan spoke at Tokyo Summit about beverage replenishment analytics.
Getting vending machine analytics from vending machines even on Mount Fuji, and optimizing their flow-through of inventory in just-in-time delivery. That's an IoT-related use case running on Azure. It's a cloud-related story and it's a big data analytics story that's actually driving better margins for the business, and actually better revenues, 'cause they're getting the inventory where it needs to be so people can buy it. Those are really interesting use cases that we're seeing being deployed, and it's at this convergence of IoT, cloud, and big data. Ultimately that leads to AI, but I think that's what we're seeing the rise of. >> Can you help us understand that sort of value chain. You've got the edge, you got the cloud, you need something in-between, you're calling it connected data platform. How do you guys participate in that value chain? >> When we went public, our primary workhorse platform was Hortonworks Data Platform. We had first class cloud services with Azure HDInsight and Hortonworks Data Cloud for AWS, curated cloud services, pay-as-you-go, and Hortonworks DataFlow, which I call our connective tissue. It manages all of your data motion, it's a data logistics platform, it's like FedEx for data delivery. It goes all the way out to the edge. There's a little component called MiNiFi, mini NiFi, which does secure intelligent analytics at the edge and in transmission. These smart manufacturing lines: you're gathering the data, you're doing analytics on the manufacturing lines, and then you're bringing the historical stuff into the data center, where you can do historical analytics across manufacturing lines. Those are the use cases for a connected data architecture-- >> Dave: A subset of that data comes back, right? >> A subset of the data, yep. The key events of that data, it may not be full of-- >> 10%, half, 90%? >> It depends if you have operational events that you want to store; sometimes you may want to bring the full fidelity of that data so you can do... As you manufacture stuff, and when it got deployed and you're seeing issues in the field, like Western Digital hard drives, when there are failures in the field, they want that data at full fidelity, to connect the data architecture and analytics around that data. You need to... One of the terms I use is, in the new world, you need to play it where it lies. If it's out at the edge, you need to play it there. If it makes a stop in the cloud, you need to play it there. If it comes into the data center, you also need to play it there. >> So a couple years ago, you and I were doing a panel at our Big Data NYC event and I used the term "profitless prosperity," I got the hairy eyeball from you, but nonetheless, we talked about you guys as a steward of the industry, you have to invest in open-source projects. And it's expensive. I mean HDFS itself, YARN, Tez, you guys lead a lot of those initiatives. >> Shaun: With the community, yeah, but we-- >> With the community, yeah, but you provided contributions and co-leadership, let's say. You're there at the front of the pack. How do we project it forward, without making forward-looking statements, but how does this industry become a cashflow positive industry? >> We've been a public company since the end of 2014, and the markets turned beginning in 2016; prior to that, high growth with some losses was palatable, then losses were not palatable. That hit us, Splunk, Tableau, most of the IT sector. That's just the nature of the public markets.
>> So a couple years ago, you and I were doing a panel at our Big Data NYC event and I used the term "profitless prosperity." I got the hairy eyeball from you, but nonetheless, we talked about you guys as a steward of the industry: you have to invest in open-source projects, and it's expensive. I mean HDFS itself, YARN, Tez, you guys lead a lot of those initiatives. >> Shaun: With the community, yeah, but we-- >> With the community, yeah, but you provided contributions and co-leadership, let's say. You're there at the front of the pack. How do we project it forward, without making forward-looking statements: how does this industry become a cashflow-positive industry? >> We've been a public company since the end of 2014, and the markets turned at the beginning of 2016. Prior to that, high growth with some losses was palatable; after that, losses were not palatable. That hit us, Splunk, Tableau, most of the IT sector. That's just the nature of the public markets. As more public open-source, data-driven companies come in, I think it will better educate the market on the value. There's only so much I can do to control the stock price. What I can do from a business perspective is hit key measures on a path to profitability. At the end of Q4 2016, we hit what we call adjusted EBITDA breakeven, which is a stepping stone. On our earnings call at the end of 2016, we reported 185 million in revenue for the year, only five years into this journey, so that's a rapid revenue growth pace, and we basically stated that in Q3 or Q4 of '17 we will hit operating cashflow neutrality. So we are operating the business-- >> John: But you guys also hit 100 million at record pace too, I believe. >> Yeah, in four years. So revenue is one thing, but also operating margins: if you look at the margins on our subscription business, for instance, we've got an 84% margin on that. It's a really nice margin business. We can make those margins better, but that's a software margin. >> You know what's ironic, we were talking about Red Hat off camera. Here's Red Hat kicking butt, really hitting on all cylinders, three billion dollars in bookings. One would think, okay, hey, maybe I can project that forward for some of these open-source companies. Maybe the flip side of this is, oh wow, we want it now. To your point, the market kind of flipped, but you would think that Red Hat is an indicator of how an open-source model can work. >> By the way, Red Hat went public in '99, so it was a different trajectory; I charted their trajectory out. Oracle's trajectory was different too. Even in inflation-adjusted dollars, they didn't hit 100 million in four years; I think it was seven or eight years or what have you. Salesforce did it in five. So these SaaS models, these subscription models, and the cloud services, which is an area that's near and dear to my heart-- >> John: Goes faster. >> You get multiple revenue streams across different products. We're a multi-product cloud service company, not just a single platform. >> So we were actually teasing this out on our-- >> And that's how you grow the business, and that's how Red Hat did it. >> Well I want to get your thoughts on this while we're just kind of ripping live here, because Dave and I were talking in our intro segment about the business model and how there's some camouflage out there, at least from my standpoint. One of the main areas that I was kind of poking at and want to get your reaction to is the classic enterprise go-to-market: you have an expansive sales force, and you guys pay handsomely for that today. Incubating that market and getting to profitability on it is a good thing, but there are also channels, VARs, ISVs, and so on. You guys have an open-source channel that's not quite a VAR or an ISV; these are entrepreneurs and/or businesses themselves. There's got to be a monetization shift there for you guys in the subscription business, certainly. When you look at these partners, they're co-developing, they're in open-source; you can almost see the dots connecting. Is this a new ecosystem? There's always been an ecosystem, but now you have monetization inherent in a pure open distribution model. >> It forces you to collaborate. IBM was on stage talking about our platform being certified on their Power Systems. Many may look at IBM as competitive; we view them as a partner. Amazon, some may view as a competitor of ours; they've been a great partner for our Hortonworks Data Cloud for AWS.
So it forces you to think about how you collaborate around deeply engineered systems and value, and we get great revenue streams that are pulled through, that they can sell into the market to their ecosystems. >> How do you envision monetizing the partners? Let's just say Dave and I start this epic idea and we create some connective tissue with your orchestrator, the Data Platform you have, and we start making some serious bang. We make a billion dollars. Do you get paid on that if it's open-source? Would we be more subscriptions? I'm trying to see how the tide comes in, whose boats float on the rising tide of the innovation in these white spaces. >> Platform thinking is, you provide the platform. You provide the platform for 10x value that rides atop that platform. That's how the model works. So if you're riding atop the platform, I expect you and that ecosystem to drive at least 10x above and beyond what I would make as a platform provider in that space. >> So you expect some contributions? >> That's how it works. You need a thousand flowers to be running on the platform. >> You saw that with VMware. They hit 10x and ultimately got to 15 or 16, 17x. >> Shaun: Exactly. >> I think they don't talk about it anymore. I think it's probably trading the other way. >> In my days at JBoss and Red Hat, it was somewhere between 15 and 20x. That was the value that was created on top of the platforms. >> I want to ask you about the forking of the Hadoop distros. There was a time when everybody was announcing Hadoop distros; John Furrier announced SiliconANGLE was announcing a Hadoop distro. Then we saw consolidation, and then you guys announced the ODP, then the ODPi initiative, but there seems to be a bit of a forking in Hadoop distros. Is that a fair statement? Unfair? >> I think if you look at how the Linux market played out, you clearly had Red Hat, you had Canonical's Ubuntu, you had SUSE. You're always going to have curated platforms for different purposes. We have a strong opinion and a strong focus in the area of IoT, fast analytics on data from the edge, and a centralized platform with HDP in the cloud and on-prem. Others in the market, Cloudera for instance, are running sort of a different play, where they're curating different elements and investing in different elements. That doesn't make either one bad or good; we are just going after the markets slightly differently. The other point I'll make there is, in 2014, if you looked at the Venn diagrams then, there was a lot of overlap. Now if you draw the areas of focus, there's a lot of white space that we're going after that they aren't going after, and they're going after other places, and other new vendors are going after others. With the market dynamics of IoT, cloud, and AI, you're going to see folks chase the market opportunities. >> Is that divergence a problem for customers now, or is it challenging? >> There has to be a core level of interoperability, and that's one of the reasons why we're collaborating with folks in the ODPi, as an example. When it comes to some of the core components, there still has to be a level of predictability, because if you're an ISV riding atop, you're slowed down by death by infinite certifications and choices. So ultimately it has to come down to a much more sane approach to what you can rely on. >> When you guys announced ODP, then ODPi, the extension, Mike Olson wrote a blog saying it's not necessary, and people came out against it. Now we're three years in, looking back. Was he right or not?
>> I think the ODPi takeaway this year is that there's more we can do above and beyond the Hadoop platform. It's expanded to include SQL and other things recently, so there's been some movement on the spec, but frankly, you talk to John Mertic at ODPi, you talk to SAS and others, and I think we want to be a bit more aggressive in the areas that we go after and try to drive from a standardization perspective. >> We had Wei Wang on earlier-- >> Shaun: There's more we can do and there's more we should do. >> We had Wei on with Microsoft at our Big Data SV event a couple weeks ago. Talk about the Microsoft relationship with you guys. It seems to be doing very well. Comments on that. >> Microsoft was one of the two companies we chose to partner with early on; back in 2011, 2012, Microsoft and Teradata were the two. Microsoft was about how do I democratize this technology and make it easy for people. That's manifested itself as an Azure cloud service, Azure HDInsight-- >> Which is growing like crazy. >> Which is globally deployed, and we just had another update. It's fundamentally changed our engineering and delivery model. This latest release was a cloud-first delivery model, so one of the things that we're proud of is the interactive SQL and the LLAP technology that's in HDP; that went out through Azure HDInsight and Hortonworks Data Cloud first. Then it was certified in HDP 2.6, and it went to Power at the same time. It's that cadence of delivery and cloud-first delivery model. We couldn't do it without the partnership with Microsoft. I think we've really learned what it takes-- >> If you look at Microsoft at that time, and I remember interviewing you on theCUBE then, Microsoft was trading at something like $26 a share, around their low point. Now the stock is performing really well. Satya Nadella is very cloud oriented-- >> Shaun: They're very open-source. >> They're very open-source and friendly; they've been donating a lot to the OCP, to the data center piece. An extremely different Microsoft, so you slipped into that beautiful spot and rode that growth. >> As one of the stalwarts of enterprise software providers, I think they've done a really great job of bending the curve towards cloud while still having a mixed portfolio. But incenting a field and incenting a channel to sell cloud and grow that revenue stream, that's nontrivial, that's hard. >> They know the enterprise sales motions too. I want to ask you how that's going overall within Hortonworks. What are some of the conversations that you're involved in with customers today? Again, as we were saying in our opening segment, it's on YouTube if you're not watching, the customers are the forcing function right now. They're really putting the pressure on the suppliers, and you're one of them, to get tight, reduce friction, lower cost of ownership, get into the cloud: flywheel. And so you see a lot-- >> I'll throw in another aspect: some of the more late-majority adopters. Traditionally, over and over I hear that by 2025 they want to power down the data center and have more things running in the public cloud, if not most everything. That's another eight years or what have you, so it's still a journey, but they're making that journey an imperative because of the operational benefits, the agility, the better predictability, the ease of use. That's fundamental.
>> As you get into the connective tissue, I love that example. With Kubernetes and containers, you've got developers, a big open-source community participating, and all the stuff you have; you start to see some coalescing around cloud native. How do you guys look at that conversation? >> I view container platforms, whether they're container services running on a cloud or what have you, as the new lightweight rail that everything will ride atop. The cloud currently plays a key role in that; I think that's going to be the de facto way, particularly if you go to cloud-first models, particularly for delivery. You need that packaging notion, and you need the agility of updates that it's going to provide. I think Red Hat as a partner has been doing great things on hardening that, making it secure. There are others in the ecosystem, as well as the cloud providers; all three cloud providers actually are investing in it. >> John: So it's good for your business? >> It removes friction of deployment ... and I ride atop that new rail. It can't get here soon enough from my perspective. >> So I want to ask about clouds. You were talking about the Microsoft shift; personally I think Microsoft realized, holy cow, we could actually make a lot of money if we're selling hardware services, and we can make more money if we're selling the full stack. It was sort of an epiphany, and Amazon seems to be doing the same thing. You mentioned earlier Amazon is a great partner, even though a lot of people look at them as a competitor. It seems like Amazon, Azure, etc. are building out their own big data stacks and offering them as a service. People say that's a threat to you guys. Is it a threat, is it a tailwind, or is it what it is? >> This is why I bring up that industry-wide we always have waves of centralization and decentralization. They're playing out simultaneously right now with cloud and IoT. The fact of the matter is that you're going to have multiple clouds, on-prem data, and data at the edge. That's the problem I am looking to facilitate and solve. I don't view them as competitors; I view them as partners, because we need to collaborate. There's a value chain of the flow of the data, and some of it's going to be running through and on those platforms. >> The cloud's not going to solve the edge problem. Too expensive. It's just physics. >> So I think that's where things need to go, and I think that's why we talk about this notion of connected data. I don't talk about hybrid cloud computing; that's for compute. I talk about how you connect to your data, how you know where your data is, and whether you're getting the right value out of the data by playing it where it lies. >> I think IoT has been a sweet trend for the big data industry. It really accelerates the value proposition of the cloud too, because now you have a connected network; you can have your cake and eat it too, central and distributed. >> There are different dynamics in the US versus Europe, as an example. In the US we're definitely seeing cloud adoption that's independent of IoT. Here in Europe, I would argue the smart mobility initiatives, the smart manufacturing initiatives, and the connected grid initiatives are bringing cloud in, so it's IoT and cloud together, and that's opening up the cloud opportunity here. >> Interesting. So, on the prospects for Hortonworks: cashflow positive in Q4, you guys have made a public statement. Any other thoughts you want to share?
>> Just continue to grow the business, focus on these customer use cases, get them to talk about them at things like DataWorks Summit, and then, the more the merrier: the more data-oriented, open-source-driven companies that can graduate in the public markets, the better. I think it will just help the industry. >> Operating in the open, with full transparency-- >> Shaun: On the business and the code. (laughter) >> Welcome to the party, baby. This is theCUBE here at DataWorks 2017 in Munich, Germany. Live coverage, I'm John Furrier with Dave Vellante. Stay with us. More great coverage coming after this short break. (upbeat music)

Published Date: Apr 5, 2017

Jim Campigli, WANdisco - #BigDataNYC 2015 - #theCUBE


 

>> Live from New York, it's The Cube, covering Big Data NYC 2015. Brought to you by Hortonworks, IBM, EMC, and Pivotal. Now for your hosts, John Furrier and Dave Vellante. >> Hello, everyone. Welcome back. We're live in New York City for The Cube, a special big data [inaudible 00:00:27], our flagship program, where we go out to the events and extract the [inaudible 00:00:30]. We are here live as part of Strata Hadoop Big Data NYC. I'm John Furrier, with my co-host Dave Vellante. Our next guest is Jim Campigli, the Chief Product Officer at WANdisco. Welcome back to The Cube. Great to see you. >> Thanks, great to be here. >> You've been COO of WANdisco, head of marketing, and now Chief Product Officer for a few years. You guys have always had the patents. David was on earlier, and I asked him specifically, why don't the other guys just do what you do? I wanted you to comment deeper on that, because he had a great answer: patents. But you guys do something that's really hard, that people can't do. >> Right. >> So let's get into it, because Fusion is a big announcement you guys made. A big deal with EMC, a lot of traction with that, and it's one of these things that is kind of talked about, but not talked about. It's really a big deal, so what is the reason you guys are so successful on the product side? >> Well I think, first of all, it starts with the technology that we have patented, and it's this true active-active replication capability that we have. Other software products claim to have active-active replication, but when you drill down on what they're really doing, typically what's happening is they'll have a set of servers that they replicate across, and you can write a transaction at any server, but then that server is responsible for propagating it to all of the other servers in the implementation. There's no mechanism for pre-agreeing to that transaction before it's actually written, so there's no way to avoid conflicts up front, there's no way to effectively handle scenarios where some of the servers in the implementation go down while the replication is in process, and very frequently those solutions end up requiring administrators to do periodic resynchronization, to go back and manually find out what didn't take and deal with all the deltas. We, on the other hand, offer guaranteed consistency. Effectively, with us you can write at any server as well, but the difference is we go through a peer-to-peer agreement process, and once a quorum of the servers in the implementation agree to the transaction, they all accept it, and we make sure everything is written in the same order on every server. And every server knows the last good transaction it processed, so if it goes down at some point in time, as soon as it comes back up it can grab all the transactions it missed during that time slice while it was offline and resync itself automatically, without an administrator having to do anything. And you can use that feature not only for network and server outages that cause downtime, but even for planned maintenance, which is one of the biggest causes of Hadoop availability issues, because obviously if you've got a global deployment, when it's midnight on Sunday in the U.S., it's the start of the business day on Monday in Europe, and then it's the middle of the afternoon in Asia. So if you take Hadoop clusters down, somebody somewhere in the world is going to be going without their applications and data.
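The mechanics Jim outlines, pre-agreement by a quorum, identical write order on every server, and automatic catch-up from the last good transaction, can be modeled in a few lines. The sketch below is an illustration only; WANdisco's engine is a patented Paxos-family protocol, not this code, and all names here are invented for the example:

```python
# Toy model of quorum-agreed, totally ordered replication.
class Replica:
    def __init__(self, name):
        self.name = name
        self.log = []       # transactions applied, in the globally agreed order
        self.online = True

    def apply(self, seq, txn):
        # Every replica writes in the same order; a lagging replica refuses
        # out-of-order writes and waits to be resynced instead.
        if seq != len(self.log):
            return False
        self.log.append(txn)
        return True

def propose(replicas, next_seq, txn):
    """Commit txn at position next_seq only if a quorum is reachable."""
    voters = [r for r in replicas if r.online]
    if len(voters) <= len(replicas) // 2:
        return False                # no quorum: reject up front, never diverge
    for r in voters:
        r.apply(next_seq, txn)
    return True

replicas = [Replica("us"), Replica("eu"), Replica("asia")]
seq = 0
for txn in ["mkdir /data", "put /data/a"]:
    if propose(replicas, seq, txn):
        seq += 1

replicas[2].online = False                    # Asia goes down for maintenance
if propose(replicas, seq, "put /data/b"):     # 2 of 3 still agree: commit
    seq += 1
print([len(r.log) for r in replicas])         # [3, 3, 2] -- Asia is behind

# When Asia comes back, it knows its last good transaction (len(log)) and
# replays only what it missed, in order -- no administrator involved.
replicas[2].online = True
for missed_seq in range(len(replicas[2].log), len(replicas[0].log)):
    replicas[2].apply(missed_seq, replicas[0].log[missed_seq])
print(replicas[2].log == replicas[0].log)     # True: consistent again
```

The point of agreeing on a position before writing is that conflicts are avoided up front, rather than reconciled after the fact the way propagate-then-repair schemes must.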
>> It's interesting; I want to get your comments on this, because it highlights the next conversation we've been hearing all throughout The Cube this week: analytics, outcomes. These are the kinds of things that people talk about, because that means there are checks being written. Hadoop is moving into production. People have done the clusters. The conversation used to be, hey, x number of clusters, you do this, you do that, replication here and there, YARN, all these different buzzwords, really feeds and speeds. Now Hadoop is relevant, but it's kind of invisible. It's under the hood. >> Right. >> Yet it's part of other things in the network, so high availability and non-disruptive operations are the table stakes now. So I want you to talk about that nuance, because that's what we're seeing as the engine of Hadoop deployments. What is that? Take us through that nuance, because that's one of the things that you guys have been doing a lot of work in, making it reliable and stable so people can actually go out and deploy Hadoop and make sure it's always on. >> Well, we really come into play when companies are moving Hadoop out of the lab and into production, when they have defined application SLAs, when they can only have so much downtime. It may be business requirements, it may be regulatory compliance issues; for example, financial services pretty much always have to have their data available, and they have to have a solid backup of the data. That's a hard requirement for them to put anything into production in their data centers. >> The other use case we've been hearing is, okay, I've got Hadoop, I've been playing with it, now I need to scale it up big time. I need to double, triple my clusters. I have to put it with my applications. Then the conversation is, okay, wait, do I need to do more sysadmin work? How do you address that particular piece? I think that's where Fusion comes in, from how I'm reading it, but is that a Fusion value proposition? Is it a WANdisco thing, and what does the customer get, and is that happening? >> Yeah, so there are actually two angles to that, and the first is, how do we maintain that uptime? How do we make sure there's performance availability to meet the SLAs, the production SLAs? The active-active replication that we have patents for, that I described earlier, is embodied in our DConE distributed coordination engine, which is at the core of Fusion, and once a Fusion server is installed with each of your Hadoop clusters, that active-active replication capability is extended to them, and we expose the HDFS API so the client applications, Sqoop, Flume, Impala, Hive, anything that would normally run against a Hadoop cluster, talk through us. If it's been defined for replication, we do the active-active replication of it; otherwise it passes straight through and processes normally on the local cluster. So how does that address the issues you were talking about? What you're getting by default with our active-active replication is effectively continuous hot backup. That means if one cluster or an entire data center goes offline, that data exists elsewhere. Your users can fail over. They can continue accessing the data, running their applications, and as soon as that cluster comes back online, it resyncs automatically. Now what's the other-- >> No user involvement? No admin? >> No user involvement in that.
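A minimal sketch of that passthrough idea, assuming a WebHDFS-compatible endpoint: the application keeps using an ordinary HDFS client, and only the endpoint it points at changes. The hostname fusion-proxy below is made up, and the hdfs PyPI package is used here as a generic WebHDFS client, not as WANdisco's actual interface:

```python
# Hypothetical illustration: client code is unchanged, only the endpoint
# differs. Anything written through the common endpoint that is defined
# for replication would be agreed and applied on every cluster behind it.
from hdfs import InsecureClient

client = InsecureClient("http://fusion-proxy:50070", user="etl")

with client.write("/data/events/part-0.csv", overwrite=True) as writer:
    writer.write(b"id,value\n1,42\n")   # written once, replicated transparently

print(client.list("/data/events"))
```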
Now the only time, and this gets back to what I was talking about earlier, is planned maintenance: if I take servers offline to upgrade the hardware, the operating system, whatever it may be, I can take advantage of that feature, as I was alluding to earlier. I can take the servers of an entire cluster offline, and Fusion knows the last good transactions that were processed on that cluster. As soon as the admin turns it back on, it resyncs itself automatically. So that's how you avoid downtime, even for planned maintenance, if you have to take an entire location offline. Now, to your other question, how do you scale this stuff up? Think about what we do. We eliminate idle standby hardware, because everything is full read-write. You don't have standby read-only backup clusters and servers when we come into the picture. So let's say we walk into an existing implementation and they've got two clusters. One is the active cluster, where everything is being written to, read from, actively being accessed by users. The other is simply taking snapshots or periodic backups, or they're using DistCp or something else, but they really can't get full utilization out of it. We come in with our active-active replication capability, and they don't have to change anything, but what suddenly happens is, as soon as they define what they want replicated, we'll replicate it for them initially to the other clusters. They don't have to pre-sync it, and the cluster that was formerly for disaster recovery, for backup, is now live and fully usable. So guess what? I'm now able to scale up to twice my original implementation by just leveraging that formerly read-only backup cluster that I was-- >> Is there a lot of configuration involved in that, or is it automatic? >> No. Basically, again, you don't have to synchronize the clusters in advance. The way we replicate is based on this concept of folders, and you can think of a folder as basically a collection of files and subdirectories that roll up into root directories, effectively, that typically reflect particular applications people are using with Hadoop, or groups of users that have data sets they access for their various sets of applications. You define the replicated folders, basically a high-level directory that consists of everything in it, and as soon as you do that, we handle the rest automatically. In a new implementation, let's keep it simple and say you just have two clusters, two locations: we'll replicate that folder in its entirety to the target you specify, and then from that point on, we're just moving the deltas over the wire. So you don't have to do anything in advance. And then suddenly that backup hardware is fully usable, and you've doubled the size of your implementation. You've scaled up to 2x. >> So what you were describing before really strikes me: the way you tell the complexity of a product and the value of a product in this space is what happens when something goes wrong. >> Yep. >> That's the question you always ask. How do you recover? Because recovery's a very hard thing, and with your patents, you've got a lot of math inside there. >> Right. >> But you also said something that's interesting, which is that you're an asset utilization play. >> Right. >> You're able to go in relatively simply and say, okay, you've got this asset that's underutilized; I'm now going to give you back some capacity that's on the floor and take advantage of that. >> Right, and you're able to scale up without spending any more on hardware and infrastructure.
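The replicated-folder scheme, one full copy up front and only deltas afterward, can be sketched as a manifest diff. This is illustration only: the product replicates at the transaction level rather than by rescanning and hashing files, and the path below is hypothetical:

```python
# Toy sketch of "replicate the folder once, then move only the deltas
# over the wire," using per-file content hashes to find what changed.
import hashlib
import os

def manifest(root):
    """Map each file under root (relative path) to a hash of its contents."""
    result = {}
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            with open(path, "rb") as f:
                digest = hashlib.sha256(f.read()).hexdigest()
            result[os.path.relpath(path, root)] = digest
    return result

def delta(source_root, target_manifest):
    """Files that are new or changed since the target last synced."""
    return [path for path, digest in manifest(source_root).items()
            if target_manifest.get(path) != digest]

# First pass: the target has nothing, so the whole folder ships.
# Every pass after that ships only what changed.
target_state = {}
to_send = delta("/tmp/replicated_folder", target_state)  # hypothetical path
```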
>> So I'm interested in another company angle. You're now announcing an EMC partnership this week, and they sort of got into this way back in the mainframe days with SRDF. I always thought when I first heard about WANdisco, it's like SRDF for Hadoop, but it's active-active. Then they bought that, yada yada. >> And there are no distance limitations for our active-active. >> So what's the nature of the relationship with EMC? >> Okay, so basically EMC, like the other storage vendors that want to play in the Hadoop space, exposes some form of an HDFS API, and in fact, if you look at Hortonworks or Cloudera, if you go into Cloudera Manager, one of the things it asks you when you're installing is whether you're going to run on regular HDFS storage, effectively a bunch of commodity boxes typically, or whether you're going to use EMC Isilon or the various other options. What we're able to do is replicate across Hadoop clusters running on Isilon, running on EMC ECS, and running on standard HDFS, and what that allows these companies to do is, without modifying those storage systems and without migrating that data off of them, incorporate them into an enterprise-wide data lake, if that's what they want to do, and selectively replicate across all of those different storage systems. It could be a mix of different Hadoop distributions; you could have replication between CDH, HDP, Pivotal, MapR, all of those things, including the EMC storage that I just mentioned. It was mentioned in the press release: Isilon, and ECS, which effectively has Hadoop-compatible API support. And we can create, in effect, a single virtual cluster out of all of those different platforms. >> So is it a go-to-market relationship? Is it an OEM deal? >> Yeah, it was really born out of the fact that we have some mutual customers that want to do exactly what I just described. They have standard Hortonworks or Cloudera deployments in house, they've got data running on Isilon, and they want to deploy a data lake that includes what they've got stored on Isilon with what they've got in HDFS and Hadoop, and replicate across that. >> Like an onerous EMC certification process? >> Yeah, we went through that process. We actually set up environments in our labs where we had EMC Isilon and ECS running, and did demonstration integrations: replication across Isilon to HDP, Isilon to Cloudera, ECS to Isilon to HDP and Cloudera, and so forth. So we did prove it out. They saw that; in fact, they lent us boxes to actually do this in our labs, so they were very motivated, and they're seeing us in some of their bigger accounts.
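A toy facade in the spirit of that "single virtual cluster": client code talks to one interface and never sees which platform holds the data. The backend classes below are in-memory stand-ins, not real Isilon, ECS, or HDFS client libraries:

```python
# Toy sketch of one HDFS-like entry point over heterogeneous, replicated
# backends. Guaranteed consistency is assumed, so any backend can serve reads.
class InMemoryBackend:
    """Stand-in for a distribution- or storage-specific client."""
    def __init__(self):
        self.store = {}
    def write(self, path, data):
        self.store[path] = data
    def read(self, path):
        return self.store[path]

class VirtualCluster:
    """A single facade over every platform in the implementation."""
    def __init__(self, backends):
        self.backends = backends

    def write(self, path, data):
        for backend in self.backends.values():   # replication happens here
            backend.write(path, data)

    def read(self, path):
        # Any backend will do: all replicas hold the same agreed state.
        return next(iter(self.backends.values())).read(path)

fs = VirtualCluster({"hdp": InMemoryBackend(),
                     "isilon": InMemoryBackend(),
                     "ecs": InMemoryBackend()})
fs.write("/data/part-0", b"hello")
print(fs.read("/data/part-0"))                   # b'hello' from any platform
```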
>> Talk about two things. One is non-disruptive operations, meaning I want to be able to deploy stuff, because now that Hadoop has a hardened top with some abstraction layer and a focus on analytics, there's a lot of work going on under the hood, and a large-scale enterprise might have a zillion versions of Hadoop. They might have a little Hortonworks here, they might have something else over there, so there might be some diversity in the distributions. The other one is operational disruption. >> Right. >> What do you guys do there? Is it zero disruption, and how do you deal with multiple versions of the distro? >> Okay, so the simplest way to describe what we're doing is that we provide a common API across all of these different distributions, running on different storage platforms and so forth, so the client applications are always interacting with us. They're not worrying about the nuances of the particular Hadoop APIs that these different things expose. So we're providing a layer of abstraction, effectively; we're transparent, in that sense, operationally, once we're installed. The other thing is, and I mentioned this earlier, you don't have to pre-sync clusters, you don't have to make sure they're all the same versions or the same distros or any of that. Just install us, select the data that you want to replicate, we'll replicate it over initially to the target clusters, and then from that point on you just go. It just works. And we talked about the core patent for active-active replication; we've got other patents that have been approved, three patents now and seven applications pending, that allow this active-active replication to take place while servers are being added to and removed from implementations, without disrupting user access or running applications and so forth. >> Final question for you: sum up the show this week. What's the vibe here? What's the aroma? Is it really Hadoop next? What is the overall Big Data NYC story here at Strata Hadoop? What's the main theme that you're seeing coming out of the show? >> I think the main theme we're starting to see is twofold. One is that we are seeing more and more companies moving this into production. The other is that there's a lot of interest in Spark and the whole fast data concept, and I don't think Spark is necessarily orthogonal to Hadoop at all. I think the two have to coexist. If you think about Spark Streaming and the whole fast data concept, basically Hadoop provides the historical data at rest; it provides the historical context, and the streaming data provides the point-in-time information. What Spark together with Hadoop allows you to do is real-time analysis and real-time informed decision-making, but within historical context, instead of in a single point-in-time vacuum. So I think that's what's happening, and you notice the vendors themselves aren't saying, oh, it's all Spark, forget Hadoop. They're really talking about coexisting. >> Alright, Jim from WANdisco, Chief Product Officer, really in the trenches, talking about what's under the hood and making it all scale in the infrastructure so the analysts can hit the scene. Great to see you again. Thanks for coming on and sharing your insight here on The Cube. Live in New York City, we are here, day two of three days of wall-to-wall coverage of Big Data NYC in conjunction with Strata. We'll be right back with more live coverage in a moment, here in New York City, after this short break.

Published Date: Oct 6, 2015
