Ram Venkatesh, Hortonworks & Sudhir Hasbe, Google | DataWorks Summit 2018

>> Live from San Jose, in the heart of Silicon Valley, it's theCUBE, covering DataWorks Summit 2018. Brought to you by HortonWorks. >> We are wrapping up Day One of coverage of Dataworks here in San Jose, California on theCUBE. I'm your host, Rebecca Knight, along with my co-host, James Kobielus. We have two guests for this last segment of the day. We have Sudhir Hasbe, who is the director of product management at Google and Ram Venkatesh, who is VP of Engineering at Hortonworks. Ram, Sudhir, thanks so much for coming on the show. >> Thank you very much. >> Thank you. >> So, I want to start out by asking you about a joint announcement that was made earlier this morning about using some Hortonworks technology deployed onto Google Cloud. Tell our viewers more. >> Sure, so basically what we announced was support for the Hortonworks DataPlatform and Hortonworks DataFlow, HDP and HDF, running on top of the Google Cloud Platform. So this includes deep integration with Google's cloud storage connector layer as well as it's a certified distribution of HDP to run on the Google Cloud Platform. >> I think the key thing is a lot of our customers have been telling us they like the familiar environment of Hortonworks distribution that they've been using on-premises and as they look at moving to cloud, like in GCP, Google Cloud, they want the similar, familiar environment. So, they want the choice to deploy on-premises or Google Cloud, but they want the familiarity of what they've already been using with Hortonworks products. So this announcement actually helps customers pick and choose like whether they want to run Hortonworks distribution on-premises, they want to do it in cloud, or they wat to build this hybrid solution where the data can reside on-premises, can move to cloud and build these common, hybrid architecture. So, that's what this does. >> So, HDP customers can store data in the Google Cloud. They can execute ephemeral workloads, analytic workloads, machine learning in the Google Cloud. And there's some tie-in between Hortonworks's real-time or low latency or streaming capabilities from HDF in the Google Cloud. So, could you describe, at a full sort of detail level, the degrees of technical integration between your two offerings here. >> You want to take that? >> Sure, I'll handle that. So, essentially, deep in the heart of HDP, there's the HDFS layer that includes Hadoop compatible file system which is a plug-able file system layer. So, what Google has done is they have provided an implementation of this API for the Google Cloud Storage Connector. So this is the GCS Connector. We've taken the connector and we've actually continued to refine it to work with our workloads and now Hortonworks has actually bundling, packaging, and making this connector be available as part of HDP. >> So bilateral data movement between them? Bilateral workload movement? >> No, think of this as being very efficient when our workloads are running on top of GCP. When they need to get at data, they can get at data that is in the Google Cloud Storage buckets in a very, very efficient manner. So, since we have fairly deep expertise on workloads like Apache Hive and Apache Spark, we've actually done work in these workloads to make sure that they can run efficiently, not just on HDFS, but also in the cloud storage connector. This is a critical part of making sure that the architecture is actually optimized for the cloud. So, at our skill and our customers are moving their workloads from on-premise to the cloud, it's not just functional parity, but they also need sort of the operational and the cost efficiency that they're looking for as they move to the cloud. So, to do that, we need to enable these fundamental disaggregated storage pattern. See, on-prem, the big win with Hadoop was we could bring the processing to where the data was. In the cloud, we need to make sure that we work well when storage and compute are disaggregated and they're scaled elastically, independent of each other. So this is a fairly fundamental architectural change. We want to make sure that we enable this in a first-class manner. >> I think that's a key point, right. I think what cloud allows you to do is scale the storage and compute independently. And so, with storing data in Google Cloud Storage, you can like scale that horizontally and then just leverage that as your storage layer. And the compute can independently scale by itself. And what this is allowing customers of HDP and HDF is store the data on GCP, on the cloud storage, and then just use the scale, the compute side of it with HDP and HDF. >> So, if you'll indulge me to a name, another Hortonworks partner for just a hypothetical. Let's say one of your customers is using IBM Data Science Experience to do TensorFlow modeling and training, can they then inside of HDP on GCP, can they use the compute infrastructure inside of GCP to do the actual modeling which is more compute intensive and then the separate decoupled storage infrastructure to do the training which is more storage intensive? Is that a capability that would available to your customers? With this integration with Google? >> Yeah, so where we are going with this is we are saying, IBM DSX and other solutions that are built on top of HDP, they can transparently take advantage of the fact that they have HDP compute infrastructure to run against. So, you can run your machine learning training jobs, you can run your scoring jobs and you can have the same unmodified DSX experience whether you're running against an on-premise HDP environment or an in-cloud HDP environment. Further, that's sort of the benefit for partners and partner solutions. From a customer standpoint, the big value prop here is that customers, they're used to securing and governing their data on-prem in their particular way with HDP, with Apache Ranger, Atlas, and so forth. So, when they move to the cloud, we want this experience to be seamless from a management standpoint. So, from a data management standpoint, we want all of their learning from a security and governance perspective to apply when they are running in Google Cloud as well. So, we've had this capability on Azure and on AWS, so with this partnership, we are announcing the same type of deep integration with GCP as well. >> So Hortonworks is that one pane of glass across all your product partners for all manner of jobs. Go ahead, Rebecca. >> Well, I just wanted to ask about, we've talked about the reason, the impetus for this. With the customer, it's more familiar for customers, it offers the seamless experience, But, can you delve a little bit into the business problems that you're solving for customers here? >> A lot of times, our customers are at various points on their cloud journey, that for some of them, it's very simple, they're like there's a broom coming by and the datacenter is going away in 12 months and I need to be in the cloud. So, this is where there is a wholesale movement of infrastructure from on-premise to the cloud. Others are exploring individual business use cases. So, for example, one of our large customers, a travel partner, so they are exploring their new pricing model and they want to roll out this pricing model in the cloud. They have on-premise infrastructure, they know they have that for a while. They are spinning up new use cases in the cloud typically for reasons of agility. So, if you, typically many of our customers, they operate large, multi-tenant clusters on-prem. That's nice for, so a very scalable compute for running large jobs. But, if you want to run, for example, a new version of Spark, you have to upgrade the entire cluster before you can do that. Whereas in this sort of model, what they can say is, they can bring up a new workload and just have the specific versions and dependency that it needs, independent of all of their other infrastructure. So this gives them agility where they can move as fast as... >> Through the containerization of the Spark jobs or whatever. >> Correct, and so containerization as well as even spinning up an entire new environment. Because, in the cloud, given that you have access to elastic compute resources, they can come and go. So, your workloads are much more independent of the underlying cluster than they are on-premise. And this is where sort of the core business benefits around agility, speed of deployment, things like that come into play. >> And also, if you look at the total cost of ownership, really take an example where customers are collecting all this information through the month. And, at month end, you want to do closing of books. And so that's a great example where you want ephemeral workloads. So this is like do it once in a month, finish the books and close the books. That's a great scenario for cloud where you don't have to on-premises create an infrastructure, keep it ready. So that's one example where now, in the new partnership, you can collect all the data through the on-premises if you want throughout the month. But, move that and leverage cloud to go ahead and scale and do this workload and finish the books and all. That's one, the second example I can give is, a lot of customers collecting, like they run their e-commerce platforms and all on-premises, let's say they're running it. They can still connect all these events through HDP that may be running on-premises with Kafka and then, what you can do is, in-cloud, in GCP, you can deploy HDP, HDF, and you can use the HDF from there for real-time stream processing. So, collect all these clickstream events, use them, make decisions like, hey, which products are selling better?, should we go ahead and give?, how many people are looking at that product?, or how many people have bought it?. That kind of aggregation and real-time at scale, now you can do in-cloud and build these hybrid architectures that are there. And enable scenarios where in past, to do that kind of stuff, you would have to procure hardware, deploy hardware, all of that. Which all goes away. In-cloud, you can do that much more flexibly and just use whatever capacity you have. >> Well, you know, ephemeral workloads are at the heart of what many enterprise data scientists do. Real-world experiments, ad-hoc experiments, with certain datasets. You build a TensorFlow model or maybe a model in Caffe or whatever and you deploy it out to a cluster and so the life of a data scientist is often nothing but a stream of new tasks that are all ephemeral in their own right but are part of an ongoing experimentation program that's, you know, they're building and testing assets that may be or may not be deployed in the production applications. That's you know, so I can see a clear need for that, well, that capability of this announcement in lots of working data science shops in the business world. >> Absolutely. >> And I think coming down to, if you really look at the partnership, right. There are two or three key areas where it's going to have a huge advantage for our customers. One is analytics at-scale at a lower cost, like total cost of ownership, reducing that, running at-scale analytics. That's one of the big things. Again, as I said, the hybrid scenarios. Most customers, enterprise customers have huge deployments of infrastructure on-premises and that's not going to go away. Over a period of time, leveraging cloud is a priority for a lot of customers but they will be in these hybrid scenarios. And what this partnership allows them to do is have these scenarios that can span across cloud and on-premises infrastructure that they are building and get business value out of all of these. And then, finally, we at Google believe that the world will be more and more real-time over a period of time. Like, we already are seeing a lot of these real-time scenarios with IoT events coming in and people making real-time decisions. And this is only going to grow. And this partnership also provides the whole streaming analytics capabilities in-cloud at-scale for customers to build these hybrid plus also real-time streaming scenarios with this package. >> Well it's clear from Google what the Hortonworks partnership gives you in this competitive space, in the multi-cloud space. It gives you that ability to support hybrid cloud scenarios. You're one of the premier public cloud providers and we all know about. And clearly now that you got, you've had the Hortonworks partnership, you have that ability to support those kinds of highly hybridized deployments for your customers, many of whom I'm sure have those requirements. >> That's perfect, exactly right. >> Well a great note to end on. Thank you so much for coming on theCUBE. Sudhir, Ram, that you so much. >> Thank you, thanks a lot. >> Thank you. >> I'm Rebecca Knight for James Kobielus, we will have more tomorrow from DataWorks. We will see you tomorrow. This is theCUBE signing off. >> From sunny San Jose. >> That's right.

Published Date : Jun 20 2018

SUMMARY :

in the heart of Silicon Valley, for coming on the show. So, I want to start out by asking you to run on the Google Cloud Platform. and as they look at moving to cloud, in the Google Cloud. So, essentially, deep in the heart of HDP, and the cost efficiency is scale the storage and to do the training which and you can have the same that one pane of glass With the customer, it's and just have the specific of the Spark jobs or whatever. of the underlying cluster and then, what you can and so the life of a data that the world will be And clearly now that you got, Sudhir, Ram, that you so much. We will see you tomorrow.

ENTITIES

Entity	Category	Confidence
James Kobielus	PERSON	0.99+
Rebecca Knight	PERSON	0.99+
Rebecca	PERSON	0.99+
two	QUANTITY	0.99+
Sudhir	PERSON	0.99+
Ram Venkatesh	PERSON	0.99+
San Jose	LOCATION	0.99+
HortonWorks	ORGANIZATION	0.99+
Sudhir Hasbe	PERSON	0.99+
Google	ORGANIZATION	0.99+
Hortonworks	ORGANIZATION	0.99+
Silicon Valley	LOCATION	0.99+
two guests	QUANTITY	0.99+
San Jose, California	LOCATION	0.99+
DataWorks	ORGANIZATION	0.99+
tomorrow	DATE	0.99+
Ram	PERSON	0.99+
AWS	ORGANIZATION	0.99+
one example	QUANTITY	0.99+
one	QUANTITY	0.99+
two offerings	QUANTITY	0.98+
12 months	QUANTITY	0.98+
One	QUANTITY	0.98+
Day One	QUANTITY	0.98+
DataWorks Summit 2018	EVENT	0.97+
IBM	ORGANIZATION	0.97+
second example	QUANTITY	0.97+
Google Cloud Platform	TITLE	0.96+
Atlas	ORGANIZATION	0.96+
Google Cloud	TITLE	0.94+
Apache Ranger	ORGANIZATION	0.92+
three key areas	QUANTITY	0.92+
Hadoop	TITLE	0.91+
Kafka	TITLE	0.9+
theCUBE	ORGANIZATION	0.88+
earlier this morning	DATE	0.87+
Apache Hive	ORGANIZATION	0.86+
GCP	TITLE	0.86+
one pane	QUANTITY	0.86+
IBM Data Science	ORGANIZATION	0.84+
Azure	TITLE	0.82+
Spark	TITLE	0.81+
first	QUANTITY	0.79+
HDF	ORGANIZATION	0.74+
once in a month	QUANTITY	0.73+
HDP	ORGANIZATION	0.7+
TensorFlow	OTHER	0.69+
Hortonworks DataPlatform	ORGANIZATION	0.67+
Apache Spark	ORGANIZATION	0.61+
GCS	OTHER	0.57+
HDP	TITLE	0.5+
DSX	TITLE	0.49+
Cloud Storage	TITLE	0.47+

Recommend Videos

Sentiment Analysis

AWS Comprehend

Search Results for Hortonworks DataPlatform: