Ram Venkatesh, Cloudera | AWS re:Invent 2020
>> From around the globe, it's theCUBE, with digital coverage of AWS re:Invent 2020, sponsored by Intel, AWS and our community partners.
>> Welcome back, everyone, to theCUBE's coverage of AWS re:Invent 2020 virtual. This is theCUBE virtual. I'm John Furrier, your host. This year we're not in person; we're doing remote interviews because of the pandemic. The whole event is virtual over three weeks, and this week we'll have a lot of coverage, in and out, of what's going on with the news. All that stuff is happening here on theCUBE. Our next guest is a featured segment: Ram Venkatesh, VP of Engineering at Cloudera. Welcome back to theCUBE, CUBE alumni. Last time you were on was 2018, when we had physical events. Great to see you.
>> Likewise, good to be here. Thank you.
>> So, you know, Cloudera obviously merged with Hortonworks. That company has been around for a while, always pioneering this abstraction layer, originally with Hadoop, now with data. All those right calls were made. Data is hot; it's a big part of re:Invent, a big part of the theme: machine learning, AI, edge, data lakes on steroids, higher-level services in the cloud. This is the focus of re:Invent, the big conversations. Give us an update on Cloudera's data platform. What's new?
>> Absolutely. You are really speaking our language with the whole data lake architecture that you alluded to. Cloudera's mission has always been about managing the world's data. What this means for our customers is being able to aggregate data from lots of different sources into central places that we call data lakes, and then apply lots of different types of processing to it to derive business value. With CDP, the Cloudera Data Platform, what we have essentially done is take those same three core tenets, around data lakes, multi-function analytics, and data stewardship and management, and add a bunch of cloud-native capabilities to it. 
So this is fundamentally about things like disaggregated storage and compute: being able to not only take advantage of HDFS, but also, at a pretty deep, fundamental level, cloud storage. This is the form factor that's really, really good for our customers to operate from a TCO perspective if you're going to manage hundreds of terabytes of data, like a lot of customers do. The second key piece that we've done with CDP has to do with us embracing containers and Kubernetes in a big way. Our on-prem heritage is around machines and clusters and things of that nature. But in the cloud context, especially in the context of managed Kubernetes services like Amazon EKS, this lets us spin up our traditional workloads, SQL, Spark, machine learning and so on, in these Kubernetes containerized environments, which lets customers spin these up in seconds as opposed to tens of minutes. And as their processing needs grow and shrink, they can actually scale much, much faster up and down to make sure that they have the right cost-effective footprint for their compute.
>> Go ahead, third piece.
>> The third key piece of all of this, along with cloud-native orchestration and cloud-native storage, is that we've embraced this notion of making sure that you actually have a robust data discovery story around it. Increasingly, the data sets that you create on top of a platform like CDP themselves have value in other use cases, so you want to make sure that these data sets are properly replicated, properly secured and properly governed, so you can go and analyze where a data set came from. Capabilities of security and provenance are increasingly important to our customers. So with CDP we have a really good story around that data stewardship aspect, which is increasingly important as you get into the cloud. 
And you have these sophisticated sharing scenarios.
>> You know, Cloudera has always had, and Hortonworks too, both companies had strong technical chops. It's well documented. Certainly theCUBE's been to all the events and covered both companies since their inception 10 years ago, with big data. But now we're in cloud: big data, fast data, little data, all data. This is what the cloud brings. So I want to get your thoughts on the number one focus of problem solving around cloud: do I migrate, or do I move to the cloud immediately and be born there? Now, we know the hyperscalers and the born-in-the-cloud companies, like the Dropboxes of the world. They were born in the cloud, and all the benefits and goodness came with that. But say I'm going to be pivoting; I'm a company in COVID with a growth strategy. Lift and shift, okay, that's over now. That's the low-hanging fruit; that use case is kind of done. Been there, done that. Is it migration or born in the cloud? Take us through your thoughts on what a company does right now.
>> I think it's a really good question. Think of where our customers are in their own data journey. A few years ago, I would say it was about operating infrastructure; that's where their head was at. Increasingly, I think for them it's about deriving value from the data assets that they already have. This typically means combining data from different sources: structured data, semi-structured data, transactional data, non-transactional data, event-oriented data, messaging data. They want to bring all of that together and analyze it to identify ways to monetize it that they had not thought about when they originally stored the data. So I think it's this drive towards increasing monetization of data assets that's driving the new use cases on the platform. 
Traditionally, it used to be about SQL analysts, or, if you are a data scientist, using Apache Spark. So it was sort of this one function that you would focus on with the data. But increasingly we're seeing that these are collaborative use cases, where you want a little bit of SQL, a little bit of machine learning, potentially a little bit of real-time streaming, or even things like Apache Flink, to actually analyze the data. In this kind of environment, we see that the data that's being generated on-prem is extremely relevant to the use case, but given the speed at which they want to deploy the use case, they really want to make sure that they can take advantage of the cloud's agility and infinite capacity to go do that. So really, the answer is: it's complicated. It's not so much "I'm going to move my data platform that I used to run the old way from here to there." It's "I've got this use case, and I've got to stand this up in six weeks, right in the middle of the pandemic. How do I go do that? The data has to come from my existing line-of-business systems. I'm not going to move those over, but I want to make sure that I can analyze the data from there in some cohesive way." Does that make sense?
>> Totally makes sense. And just to bring that back for the folks watching: I remember when CDP was launching, these data platforms really were meant to replace data warehouses, the old, antiquated way of doing things. But it was interesting. It wasn't just about competing in that old category; it was a new category. So, yeah, you had to have some tooling, some SQL, to wrangle data, and have some prefabricated data fenced out somewhere in some warehouse. But the value was the new use cases of data, where you never know. 
You don't know where it's going to come from until it comes, right? Because you make it addressable; that was the idea of the data platform and data lakes, and then having higher-level services. So to me that's one distinction: a new category coexisting with and disrupting an old category, data warehousing. I've always bought into that. There are some technical things, Spark and all these elements and mechanisms underneath; that's just evolution. But in comes cloud, and I want to get your thoughts on this, because one of the things that's coming out of all my interviews is speed, speed, speed: deploying at large scale, at very high speed. This is the modern application thinking. To make that work, you've got to have the data fabric underneath. This has always been kind of the dream scenario, and it's kind of playing out. So, one, do you believe in that? And two, what is the relationship between Cloudera and AWS? Because I think that interestingly points to this one piece.
>> Absolutely. From my perspective, this is what we call the shared data experience that's central to CDP. The idea is that data generated by the business in one use case is relevant and valid in another use case. That is central to how we see companies leveraging data for the second-order monetization that they're after. So I think this is where getting out of a traditional data warehouse, data-silo context and being able to analyze all of the data that you have is really, really important for many of our customers. For example, many of them increasingly hold what they call data hackathons, where they're asking, "Can we answer this new question from all the data that we have?" That is a type of use case that's really hard to enable unless you have a very cohesive, very homogeneous view of all of your data. 
When it comes to the cloud partners, increasingly we see that the cloud-native services, especially the core storage, compute and security services, are extremely robust. They give us scale that's truly unparalleled in terms of how much data we can address and how quickly we can get access to compute on demand when we need it. And we can do all of this with a very, very mature security and governance fabric that you can fit into. So we see that technologies like S3, for example, have come a long way. We've been along the journey with Amazon on this over the last seven, eight years, and we both learned how to operate our workloads. When you're running at terabyte scale, you really have to pay attention to matters like scale-out and consistency and parallelism; these matter significantly, and there's a certain maturity curve that you have to go through to get there. The last part of that is that because the TCO is so optimized, with the customer operating this without any ops on their side, they can just start consuming data, even if it's a terabyte of data. So this means that we have to have the smarts in the processing engines to think about things like caching, for example, very, very differently, because the way you cache data that's in HDFS is very different from how you would do that in the context of S3. Similarly, the way you think about consistency and metadata is very, very different at that layer. But we made sure that we can abstract these differences out at the platform layer, so that as an application consumer you really get the same experience whether you're running these analytics on-prem or in the cloud. 
And that's really central to how I see this space evolving: we want to meet the customer where they are, rather than forcing them to change the way they work because of the platform that they're on.
>> So could you take a minute to explain some of the integrations with AWS and some customer examples? First of all, cost is a big concern on everyone's mind, because it's still lower cost and higher value with the cloud anyway, but it can get away from you. You're constantly at petabytes of scale; there's a lot of data moving around. That's one thing. Two, integration with higher-level services. Can you explain Cloudera's integration with Amazon? What's the relationship? A customer wants to know: hey, you guys are partnering; explain the partnership, and what does it mean for me?
>> Absolutely. So the way we look at the partnership [inaudible], it's really a four-layer cake. The lowest layer is the core infrastructure services we talked about: storage and compute and security and IAM and so forth. That layer is a very robust integration that goes back a few years. The next layer up from that has to do with analytics: increasingly, as our customers use analytic experiences from Cloudera, they want to combine that with data that's actually in the AWS compute experiences, like Redshift, for example. That's the analytics layer: the Cloudera Data Warehouse offering and how it interops with other services in Amazon that could be relevant. This is where common file formats and open-source formats really help us, to make sure that there's a very strong level of interop at the analytics tier. The third layer up from that has to do with consumption. If you're going to bring an analyst on board, you want to make sure that all of their SQL-like analyst experiences, notebooks, things of that nature are really strong. 
And that's the third layer. The highest layer is really around data sharing. As AWS [inaudible] and technologies like that become more prevalent, customers want to make sure that the data estates they have in the different clouds can actually interoperate. So we provide ways for them to browse and search data, regardless of whether that data is on AWS or on-prem. That's the fourth layer in the stack. And the vertical slice running through all of these is that we have a really strong business relationship with them, both on the commercial market side as well as in AWS Marketplace. By having CDP be a part of the AWS Marketplace, if you have an enterprise agreement with Amazon, you can actually pay for CDP with the credits that you've already purchased. So this is a very, very tight relationship that's designed, again, for these large-scale, speeds-and-feeds kinds of customers.
>> So just to get this right. I love the four-layer cake; the icing is the success of CDP. Love that. Birthday candles can go on top too, when you're successful. But you're saying that you're going to market with Amazon two ways: the Marketplace listing, and then also jointly with their enterprise field programs. Is that right? Because they have this program where you can bundle into the blanket POs or PO processes. Is that right? Can you explain that again?
>> So if you think about this, the stakes that we're talking about are significant. So we want to make sure that we're really aligned with them in terms of our cloud migration strategy and in terms of how the customer actually executes. It's a fairly complex deployment; deploying a large, multi-function data estate takes time. So we're going to make sure that we navigate this together, jointly with AWS,
to make sure that from a best-practices standpoint, for example, we're very well aligned, and that from a cost standpoint, what we're telling the customer architecturally is very well aligned. That's where I think the heart of the engineering relationship between the two companies really is.
>> So if you want Cloudera on Amazon, you just go in and click to buy. Or if you've got a deal with Amazon, in terms of the global marketplace deals they have been rolling out, I could buy there too, right? All right. Well, Ram, thanks for the update and insight. Love the four-layer cake. Great to see the modernization of the data platform from Cloudera. And congratulations on all the hard work you guys have been doing with AWS.
>> Thank you so much. Appreciate it.
>> Okay, good to see you. Okay, I'm John Furrier here on theCUBE, for CUBE virtual for AWS re:Invent 2020 virtual. Thanks for watching.