Tomer Shiran, Dremio | AWS re:Invent 2022

>>Hey everyone. Welcome back to Las Vegas. It's the Cube live at AWS Reinvent 2022. This is our fourth day of coverage. Lisa Martin here with Paul Gillen. Paul, we started Monday night, we filmed and streamed for about three hours. We have had shammed pack days, Tuesday, Wednesday, Thursday. What's your takeaway? >>We're routed final turn as we, as we head into the home stretch. Yeah. This is as it has been since the beginning, this show with a lot of energy. I'm amazed for the fourth day of a conference, how many people are still here I am too. And how, and how active they are and how full the sessions are. Huge. Proud for the keynote this morning. You don't see that at most of the day four conferences. Everyone's on their way home. So, so people come here to learn and they're, and they're still >>Learning. They are still learning. And we're gonna help continue that learning path. We have an alumni back with us, Toron joins us, the CPO and co-founder of Dremeo. Tomer, it's great to have you back on the program. >>Yeah, thanks for, for having me here. And thanks for keeping the, the best session for the fourth day. >>Yeah, you're right. I like that. That's a good mojo to come into this interview with Tomer. So last year, last time I saw you was a year ago here in Vegas at Reinvent 21. We talked about the growth of data lakes and the data lake houses. We talked about the need for open data architectures as opposed to data warehouses. And the headline of the Silicon Angle's article on the interview we did with you was, Dremio Predicts 2022 will be the year open data architectures replace the data warehouse. We're almost done with 2022. Has that prediction come true? >>Yeah, I think, I think we're seeing almost every company out there, certainly in the enterprise, adopting data lake, data lakehouse technology, embracing open source kind of file and table formats. And, and so I think that's definitely happening. Of course, nothing goes away. So, you know, data warehouses don't go away in, in a year and actually don't go away ever. We still have mainframes around, but certainly the trends are, are all pointing in that direction. >>Describe the data lakehouse for anybody who may not be really familiar with that and, and what it's, what it really means for organizations. >>Yeah. I think you could think of the data lakehouse as the evolution of the data lake, right? And so, you know, for, for, you know, the last decade we've had kind of these two options, data lakes and data warehouses and, you know, warehouses, you know, having good SQL support, but, and good performance. But you had to spend a lot of time and effort getting data into the warehouse. You got locked into them, very, very expensive. That's a big problem now. And data lakes, you know, more open, more scalable, but had all sorts of kind of limitations. And what we've done now as an industry with the Lake House, and especially with, you know, technologies like Apache Iceberg, is we've unlocked all the capabilities of the warehouse directly on object storage like s3. So you can insert and update and delete individual records. You can do transactions, you can do all the things you could do with a, a database directly in kind of open formats without getting locked in at a much lower cost. >>But you're still dealing with semi-structured data as opposed to structured data. And there's, there's work that has to be done to get that into a usable form. That's where Drio excels. What, what has been happening in that area to, to make, I mean, is it formats like j s o that are, are enabling this to happen? How, how we advancing the cause of making semi-structured data usable? Yeah, >>Well, I think first of all, you know, I think that's all changed. I think that was maybe true for the original data lakes, but now with the Lake house, you know, our bread and butter is actually structured data. It's all, it's all tables with the schema. And, you know, you can, you know, create table insert records. You know, it's, it's, it's really everything you can do with a data warehouse you can now do in the lakehouse. Now, that's not to say that there aren't like very advanced capabilities when it comes to, you know, j s O and nested data and kind of sparse data. You know, we excel in that as well. But we're really seeing kind of the lakehouse take over the, the bread and butter data warehouse use cases. >>You mentioned open a minute ago. Talk about why it's, why open is important and the value that it can deliver for customers. >>Yeah, well, I think if you look back in time and you see all the challenges that companies have had with kind of traditional data architectures, right? The, the, the, a lot of that comes from the, the, the problems with data warehouses. The fact that they are, you know, they're very expensive. The data is, you have to ingest it into the data warehouse in order to query it. And then it's almost impossible to get off of these systems, right? It takes an enormous effort, tremendous cost to get off of them. And so you're kinda locked in and that's a big problem, right? You also, you're dependent on that one data warehouse vendor, right? You can only do things with that data that the warehouse vendor supports. And if you contrast that to data lakehouse and open architectures where the data is stored in entirely open formats. >>So things like par files and Apache iceberg tables, that means you can use any engine on that data. You can use s SQL Query Engine, you can use Spark, you can use flin. You know, there's a dozen different engines that you can use on that, both at the same time. But also in the future, if you ever wanted to try something new that comes out, some new open source innovation, some new startup, you just take it and point out the same data. So that data's now at the core, at the center of the architecture as opposed to some, you know, vendors logo. Yeah. >>Amazon seems to be bought into the Lakehouse concept. It has big announcements on day two about eliminating the ETL stage between RDS and Redshift. Do you see the cloud vendors as pushing this concept forward? >>Yeah, a hundred percent. I mean, I'm, I'm Amazon's a great, great partner of ours. We work with, you know, probably 10 different teams there. Everything from, you know, the S3 team, the, the glue team, the click site team, you know, everything in between. And, you know, their embracement of the, the, the lake house architecture, the fact that they adopted Iceberg as their primary table format. I think that's exciting as an industry. We're all coming together around standard, standard ways to represent data so that at the end of the day, companies have this benefit of being able to, you know, have their own data in their own S3 account in open formats and be able to use all these different engines without losing any of the functionality that they need, right? The ability to do all these interactions with data that maybe in the past you would have to move the data into a database or, or warehouse in order to do, you just don't have to do that anymore. Speaking >>Of functionality, talk about what's new this year with drio since we've seen you last. >>Yeah, there's a lot of, a lot of new things with, with Drio. So yeah, we now have full Apache iceberg support, you know, with DML commands, you can do inserts, updates, deletes, you know, copy into all, all that kind of stuff is now, you know, fully supported native part of the platform. We, we now offer kind of two flavors of dr. We have, you know, Dr. Cloud, which is our SaaS version fully hosted. You sign up with your Google or, you know, Azure account and, and, and you're up in, you're up and running in, in, in a minute. And then dral software, which you can self host usually in the cloud, but even, even even outside of the cloud. And then we're also very excited about this new idea of data as code. And so we've introduced a new product that's now in preview called Dr. >>Arctic. And the idea there is to bring the concepts of GI or GitHub to the world of data. So things like being able to create a branch and work in isolation. If you're a data scientist, you wanna experiment on your own without impacting other people, or you're a data engineer and you're ingesting data, you want to transform it and test it before you expose it to others. You can do that in a branch. So all these ideas that, you know, we take for granted now in the world of source code and software development, we're bringing to the world of data with Jamar. And when you think about data mesh, a lot of people talking about data mesh now and wanting to kind of take advantage of, of those concepts and ideas, you know, thinking of data as a product. Well, when you think about data as a product, we think you have to manage it like code, right? You have to, and that's why we call it data as code, right? The, all those reasons that we use things like GI have to build products, you know, if we wanna think of data as a product, we need all those capabilities also with data. You know, also the ability to go back in time. The ability to undo mistakes, to see who changed my data and when did they change that table. All of those are, are part of this, this new catalog that we've created. >>Are you talk about data as a product that's sort of intrinsic to the data mesh concept. Are you, what's your opinion of data mesh? Is the, is the world ready for that radically different approach to data ownership? >>You know, we are now in dozens of, dozens of our customers that are using drio for to implement enterprise-wide kind of data mesh solutions. And at the end of the day, I think it's just, you know, what most people would consider common sense, right? In a large organization, it is very hard for a centralized single team to understand every piece of data, to manage all the data themselves, to, you know, make sure the quality is correct to make it accessible. And so what data mesh is first and foremost about is being able to kind of federate the, or distribute the, the ownership of data, the governance of the data still has to happen, right? And so that is, I think at the heart of the data mesh, but thinking of data as kind of allowing different teams, different domains to own their own data to really manage it like a product with all the best practices that that we have with that super important. >>So we we're doing a lot with data mesh, you know, the way that cloud has multiple projects and the way that Jamar allows you to have multiple catalogs and different groups can kind of interact and share data among each other. You know, the fact that we can connect to all these different data sources, even outside your data lake, you know, with Redshift, Oracle SQL Server, you know, all the different databases that are out there and join across different databases in addition to your data lake, that that's all stuff that companies want with their data mesh. >>What are some of your favorite customer stories that where you've really helped them accelerate that data mesh and drive business value from it so that more people in the organization kind of access to data so they can really make those data driven decisions that everybody wants to make? >>I mean, there's, there's so many of them, but, you know, one of the largest tech companies in the world creating a, a data mesh where you have all the different departments in the company that, you know, they, they, they were a big data warehouse user and it kinda hit the wall, right? The costs were so high and the ability for people to kind of use it for just experimentation, to try new things out to collaborate, they couldn't do it because it was so prohibitively expensive and difficult to use. And so what they said, well, we need a platform that different people can, they can collaborate, they can ex, they can experiment with the data, they can share data with others. And so at a big organization like that, the, their ability to kind of have a centralized platform but allow different groups to manage their own data, you know, several of the largest banks in the world are, are also doing data meshes with Dr you know, one of them has over over a dozen different business units that are using, using Dremio and that ability to have thousands of people on a platform and to be able to collaborate and share among each other that, that's super important to these >>Guys. Can you contrast your approach to the market, the snowflakes? Cause they have some of those same concepts. >>Snowflake's >>A very closed system at the end of the day, right? Closed and very expensive. Right? I think they, if I remember seeing, you know, a quarter ago in, in, in one of their earnings reports that the average customer spends 70% more every year, right? Well that's not sustainable. If you think about that in a decade, that's your cost is gonna increase 200 x, most companies not gonna be able to swallow that, right? So companies need, first of all, they need more cost efficient solutions that are, you know, just more approachable, right? And the second thing is, you know, you know, we talked about the open data architecture. I think most companies now realize that the, if you want to build a platform for the future, you need to have the data and open formats and not be locked into one vendor, right? And so that's kind of another important aspect beyond that's ability to connect to all your data, even outside the lake to your different databases, no sequel databases, relational databases, and drs semantic layer where we can accelerate queries. And so typically what you have, what happens with data warehouses and other data lake query engines is that because you can't get the performance that you want, you end up creating lots and lots of copies of data. You, for every use case, you're creating a, you know, a pre-joy copy of that data, a pre aggregated version of that data. And you know, then you have to redirect all your data. >>You've got a >>Governance problem, individual things. It's expensive. It's expensive, it's hard to secure that cuz permissions don't travel with the data. So you have all sorts of problems with that, right? And so what we've done because of our semantic layer that makes it easy to kind of expose data in a logical way. And then our query acceleration technology, which we call reflections, which transparently accelerates queries and gives you subsecond response times without data copies and also without extracts into the BI tools. Cause if you start doing bi extracts or imports, again, you have lots of copies of data in the organization, all sorts of refresh problems, security problems, it's, it's a nightmare, right? And that just collapsing all those copies and having a, a simple solution where data's stored in open formats and we can give you fast access to any of that data that's very different from what you get with like a snowflake or, or any of these other >>Companies. Right. That, that's a great explanation. I wanna ask you, early this year you announced that your Dr. Cloud service would be a free forever, the basic DR. Cloud service. How has that offer gone over? What's been the uptake on that offer? >>Yeah, it, I mean it is, and thousands of people have signed up and, and it's, I think it's a great service. It's, you know, it's very, very simple. People can go on the website, try it out. We now have a test drive as well. If, if you want to get started with just some sample public sample data sets and like a tutorial, we've made that increasingly easy as well. But yeah, we continue to, you know, take that approach of, you know, making it, you know, making it easy, democratizing these kind of cloud data platforms and, and kinda lowering the barriers to >>Adoption. How, how effective has it been in driving sales of the enterprise version? >>Yeah, a lot of, a lot of, a lot of business with, you know, that, that we do like when it comes to, to selling is, you know, folks that, you know, have educated themselves, right? They've started off, they've followed some tutorials. I think generally developers, they prefer the first interaction to be with a product, not with a salesperson. And so that's, that's basically the reason we did that. >>Before we ask you the last question, I wanna just, can you give us a speak peek into the product roadmap as we enter 2023? What can you share with us that we should be paying attention to where Drum is concerned? >>Yeah. You know, actually a couple, couple days ago here at the conference, we, we had a press release with all sorts of new capabilities that we, we we just released. And there's a lot more for, for the coming year. You know, we will shortly be releasing a variety of different performance enhancements. So we'll be in the next quarter or two. We'll be, you know, probably twice as fast just in terms of rock qu speed, you know, that's in addition to our reflections and our career acceleration, you know, support for all the major clouds is coming. You know, just a lot of capabilities in Inre that make it easier and easier to use the platform. >>Awesome. Tomer, thank you so much for joining us. My last question to you is, if you had a billboard in your desired location and it was going to really just be like a mic drop about why customers should be looking at Drio, what would that billboard say? >>Well, DRIO is the easy and open data lake house and, you know, open architectures. It's just a lot, a lot better, a lot more f a lot more future proof, a lot easier and a lot just a much safer choice for the future for, for companies. And so hard to argue with those people to take a look. Exactly. That wasn't the best. That wasn't the best, you know, billboards. >>Okay. I think it's a great billboard. Awesome. And thank you so much for joining Poly Me on the program, sharing with us what's new, what some of the exciting things are that are coming down the pipe. Quite soon we're gonna be keeping our eye Ono. >>Awesome. Always happy to be here. >>Thank you. Right. For our guest and for Paul Gillin, I'm Lisa Martin. You're watching The Cube, the leader in live and emerging tech coverage.

Published Date : Dec 1 2022

SUMMARY :

It's the Cube live at AWS Reinvent This is as it has been since the beginning, this show with a lot of energy. it's great to have you back on the program. And thanks for keeping the, the best session for the fourth day. And the headline of the Silicon Angle's article on the interview we did with you was, So, you know, data warehouses don't go away in, in a year and actually don't go away ever. Describe the data lakehouse for anybody who may not be really familiar with that and, and what it's, And what we've done now as an industry with the Lake House, and especially with, you know, technologies like Apache are enabling this to happen? original data lakes, but now with the Lake house, you know, our bread and butter is actually structured data. You mentioned open a minute ago. The fact that they are, you know, they're very expensive. at the center of the architecture as opposed to some, you know, vendors logo. Do you see the at the end of the day, companies have this benefit of being able to, you know, have their own data in their own S3 account Apache iceberg support, you know, with DML commands, you can do inserts, updates, So all these ideas that, you know, we take for granted now in the world of Are you talk about data as a product that's sort of intrinsic to the data mesh concept. And at the end of the day, I think it's just, you know, what most people would consider common sense, So we we're doing a lot with data mesh, you know, the way that cloud has multiple several of the largest banks in the world are, are also doing data meshes with Dr you know, Cause they have some of those same concepts. And the second thing is, you know, you know, stored in open formats and we can give you fast access to any of that data that's very different from what you get What's been the uptake on that offer? But yeah, we continue to, you know, take that approach of, you know, How, how effective has it been in driving sales of the enterprise version? to selling is, you know, folks that, you know, have educated themselves, right? you know, probably twice as fast just in terms of rock qu speed, you know, that's in addition to our reflections My last question to you is, if you had a Well, DRIO is the easy and open data lake house and, you And thank you so much for joining Poly Me on the program, sharing with us what's new, Always happy to be here. the leader in live and emerging tech coverage.

ENTITIES

Entity	Category	Confidence
Lisa Martin	PERSON	0.99+
Paul Gillen	PERSON	0.99+
Paul Gillin	PERSON	0.99+
Amazon	ORGANIZATION	0.99+
Tomer	PERSON	0.99+
Tomer Shiran	PERSON	0.99+
Toron	PERSON	0.99+
Las Vegas	LOCATION	0.99+
70%	QUANTITY	0.99+
Monday night	DATE	0.99+
Vegas	LOCATION	0.99+
fourth day	QUANTITY	0.99+
Paul	PERSON	0.99+
last year	DATE	0.99+
AWS	ORGANIZATION	0.99+
dozens	QUANTITY	0.99+
Google	ORGANIZATION	0.99+
10 different teams	QUANTITY	0.99+
Dremio	PERSON	0.99+
early this year	DATE	0.99+
SQL Query Engine	TITLE	0.99+
The Cube	TITLE	0.99+
Tuesday	DATE	0.99+
2023	DATE	0.99+
one	QUANTITY	0.98+
a year ago	DATE	0.98+
next quarter	DATE	0.98+
S3	TITLE	0.98+
a quarter ago	DATE	0.98+
twice	QUANTITY	0.98+
Oracle	ORGANIZATION	0.98+
second thing	QUANTITY	0.98+
Drio	ORGANIZATION	0.98+
couple days ago	DATE	0.98+
both	QUANTITY	0.97+
DRIO	ORGANIZATION	0.97+
2022	DATE	0.97+
Lake House	ORGANIZATION	0.96+
thousands of people	QUANTITY	0.96+
Wednesday	DATE	0.96+
Spark	TITLE	0.96+
200 x	QUANTITY	0.96+
first	QUANTITY	0.96+
Drio	TITLE	0.95+
Dremeo	ORGANIZATION	0.95+
two options	QUANTITY	0.94+
about three hours	QUANTITY	0.94+
day two	QUANTITY	0.94+
s3	TITLE	0.94+
Apache Iceberg	ORGANIZATION	0.94+
a minute ago	DATE	0.94+
Silicon Angle	ORGANIZATION	0.94+
hundred percent	QUANTITY	0.93+
Apache	ORGANIZATION	0.93+
single team	QUANTITY	0.93+
GitHub	ORGANIZATION	0.91+
this morning	DATE	0.9+
a dozen different engines	QUANTITY	0.89+
Iceberg	TITLE	0.87+
Redshift	TITLE	0.87+
last	DATE	0.87+
this year	DATE	0.86+
first interaction	QUANTITY	0.85+
two flavors	QUANTITY	0.84+
Thursday	DATE	0.84+
Azure	ORGANIZATION	0.84+
DR. Cloud	ORGANIZATION	0.84+
SQL Server	TITLE	0.83+
four conferences	QUANTITY	0.82+
coming year	DATE	0.82+
over over a dozen different business	QUANTITY	0.81+
one vendor	QUANTITY	0.8+
Poly	ORGANIZATION	0.79+
Jamar	PERSON	0.77+
GI	ORGANIZATION	0.77+
Inre	ORGANIZATION	0.76+
Dr.	ORGANIZATION	0.73+
Lake house	ORGANIZATION	0.71+
Arctic	ORGANIZATION	0.71+
a year	QUANTITY	0.7+
a minute	QUANTITY	0.7+
SQL	TITLE	0.69+
AWS Reinvent 2022	EVENT	0.69+
subsecond	QUANTITY	0.68+
DML	TITLE	0.68+

Ali Ghodsi, Databricks | Cube Conversation Partner Exclusive

(outro music) >> Hey, I'm John Furrier, here with an exclusive interview with Ali Ghodsi, who's the CEO of Databricks. Ali, great to see you. Preview for reinvent. We're going to launch this story, exclusive Databricks material on the notes, after the keynotes prior to the keynotes and after the keynotes that reinvent. So great to see you. You know, you've been a partner of AWS for a very, very long time. I think five years ago, I think I first interviewed you, you were one of the first to publicly declare that this was a place to build a company on and not just post an application, but refactor capabilities to create, essentially a platform in the cloud, on the cloud. Not just an ISV; Independent Software Vendor, kind of an old term, we're talking about real platform like capability to change the game. Can you talk about your experience as an AWS partner? >> Yeah, look, so we started in 2013. I swiped my personal credit card on AWS and some of my co-founders did the same. And we started building. And we were excited because we just thought this is a much better way to launch a company because you can just much faster get time to market and launch your thing and you can get the end users much quicker access to the thing you're building. So we didn't really talk to anyone at AWS, we just swiped a credit card. And eventually they told us, "Hey, do you want to buy extra support?" "You're asking a lot of advanced questions from us." "Maybe you want to buy our advanced support." And we said, no, no, no, no. We're very advanced ourselves, we know what we're doing. We're not going to buy any advanced support. So, you know, we just built this, you know, startup from nothing on AWS without even talking to anyone there. So at some point, I think around 2017, they suddenly saw this company with maybe a hundred million ARR pop up on their radar and it's driving massive amounts of compute, massive amounts of data. And it took a little bit in the beginning just us to get to know each other because as I said, it's like we were not on their radar and we weren't really looking, we were just doing our thing. And then over the years the partnership has deepened and deepened and deepened and then with, you know, Andy (indistinct) really leaning into the partnership, he mentioned us at Reinvent. And then we sort of figured out a way to really integrate the two service, the Databricks platform with AWS . And today it's an amazing partnership. You know, we directly connected with the general managers for the services. We're connected at the CEO level, you know, the sellers get compensated for pushing Databricks, we're, we have multiple offerings on their marketplace. We have a native offering on AWS. You know, we're prominently always sort of marketed and you know, we're aligned also vision wise in what we're trying to do. So yeah, we've come a very, very long way. >> Do you consider yourself a SaaS app or an ISV or do you see yourself more of a platform company because you have customers. How would you categorize your category as a company? >> Well, it's a data platform, right? And actually the, the strategy of the Databricks is take what's otherwise five, six services in the industry or five, six different startups, but do them as part of one data platform that's integrated. So in one word, the strategy of data bricks is "unification." We call it the data lake house. But really the idea behind the data lake house is that of unification, or in more words it's, "The whole is greater than the sum of its parts." So you could actually go and buy five, six services out there or actually use five, six services from the cloud vendors, stitch it together and it kind of resembles Databricks. Our power is in doing those integrated, together in a way in which it's really, really easy and simple to use for end users. So yeah, we're a data platform. I wouldn't, you know, ISV that's a old term, you know, Independent Software Vendor. You know, I think, you know, we have actually a whole slew of ISVs on top of Databricks, that integrate with our platform. And you know, in our marketplace as well as in our partner connect, we host those ISVs that then, you know, work on top of the data that we have in the Databricks, data lake house. >> You know, I think one of the things your journey has been great to document and watch from the beginning. I got to give you guys credit over there and props, congratulations. But I think you're the poster child as a company to what we see enterprises doing now. So go back in time when you guys swiped a credit card, you didn't need attending technical support because you guys had brains, you were refactoring, rethinking. It wasn't just banging out software, you had, you were doing some complex things. It wasn't like it was just write some software hosted on server. It was really a lot more. And as a result your business worth billions of dollars. I think 38 billion or something like that, big numbers, big numbers of great revenue growth as well, billions in revenue. You have customers, you have an ecosystem, you have data applications on top of Databricks. So in a way you're a cloud on top of the cloud. So is there a cloud on top of the cloud? So you have ISVs, Amazon has ISVs. Can you take us through what this means and at this point in history, because this seems to be an advanced version of benefits of platforming and refactoring, leveraging say AWS. >> Yeah, so look, when we started, there was really only one game in town. It was AWS. So it was one cloud. And the strategy of the company then was, well Amazon had this beautiful set of services that they're building bottom up, they have storage, compute, networking, and then they have databases and so on. But it's a lot of services. So let us not directly compete with AWS and try to take out one of their services. Let's not do that because frankly we can't. We were not of that size. They had the scale, they had the size and they were the only cloud vendor in town. So our strategy instead was, let's do something else. Let's not compete directly with say, a particular service they're building, let's take a different strategy. What if we had a unified holistic data platform, where it's just one integrated service end to end. So think of it as Microsoft office, which contains PowerPoint, and Word, and Excel and even Access, if you want to use it. What if we build that and AWS has this really amazing knack for releasing things, you know services, lots of them, every reinvent. And they're sort of a DevOps person's dream and you can stitch these together and you know you have to be technical. How do we elevate that and make it simpler and integrate it? That was our original strategy and it resonated with a segment of the market. And the reason it worked with AWS so that we wouldn't butt heads with AWS was because we weren't a direct replacement for this service or for that service, we were taking a different approach. And AWS, because credit goes to them, they're so customer obsessed, they would actually do what's right for the customer. So if the customer said we want this unified thing, their sellers would actually say, okay, so then you should use Databricks. So they truly are customer obsessed in that way. And I really mean it, John. Things have changed over the years. They're not the only cloud anymore. You know, Azure is real, GCP is real, there's also Alibaba. And now over 70% of our customers are on more than one cloud. So now what we hear from them is, not only want, do we want a simplified, unified thing, but we want it also to work across the clouds. Because those of them that are seriously considering multiple clouds, they don't want to use a service on cloud one and then use a similar service on cloud two. But it's a little bit different. And now they have to do twice the work to make it work. You know, John, it's hard enough as it is, like it's this data stuff and analytics. It's not a walk in the park, you know. You hire an administrator in the back office that clicks a button and its just, now you're a data driven digital transformed company. It's hard. If you now have to do it again on the second cloud with different set of services and then again on a third cloud with a different set of services. That's very, very costly. So the strategy then has changed that, how do we take that unified simple approach and make it also the same and standardize across the clouds, but then also integrate it as far down as we can on each of the clouds. So that you're not giving up any of the benefits that the particular cloud has. >> Yeah, I think one of the things that we see, and I want get your reaction to this, is this rise of the super cloud as we call it. I think you were involved in the Sky paper that I saw your position paper came out after we had introduced Super Cloud, which is great. Congratulations to the Berkeley team, wearing the hat here. But you guys are, I think a driver of this because you're creating the need for these things. You're saying, okay, we went on one cloud with AWS and you didn't hide that. And now you're publicly saying there's other clouds too, increased ham for your business. And customers have multiple clouds in their infrastructure for the best of breed that they have. Okay, get that. But there's still a challenge around the innovation, growth that's still around the corner. We still have a supply chain problem, we still have skill gaps. You know, you guys are unique at Databricks as other these big examples of super clouds that are developing. Enterprises don't have the Databricks kind of talent. They need, they need turnkey solutions. So Adam and the team at Amazon are promoting, you know, more solution oriented approaches higher up on the stack. You're starting to see kind of like, I won't say templates, but you know, almost like application specific headless like, low code, no code capability to accelerate clients who are wanting to write code for the modern error. Right, so this kind of, and then now you, as you guys pointed out with these common services, you're pushing the envelope. So you're saying, hey, I need to compete, I don't want to go to my customers and have them to have a staff or this cloud and this cloud and this cloud because they don't have the staff. Or if they do, they're very unique. So what's your reaction? Because this kind is the, it kind of shows your leadership as a partner of AWS and the clouds, but also highlights I think what's coming. But you share your reaction. >> Yeah, look, it's, first of all, you know, I wish I could take credit for this but I can't because it's really the customers that have decided to go on multiple clouds. You know, it's not Databricks that you know, push this or some other vendor, you know, that, Snowflake or someone who pushed this and now enterprises listened to us and they picked two clouds. That's not how it happened. The enterprises picked two clouds or three clouds themselves and we can get into why, but they did that. So this largely just happened in the market. We as data platforms responded to what they're then saying, which is they're saying, "I don't want to redo this again on the other cloud." So I think the writing is on the wall. I think it's super obvious what's going to happen next. They will say, "Any service I'm using, it better work exactly the same on all the clouds." You know, that's what's going to happen. So in the next five years, every enterprise will say, "I'm going to use the service, but you better make sure that this service works equally well on all of the clouds." And obviously the multicloud vendors like us, are there to do that. But I actually think that what you're going to see happening is that you're going to see the cloud vendors changing the existing services that they have to make them work on the other clouds. That's what's goin to happen, I think. >> Yeah, and I think I would add that, first of all, I agree with you. I think that's going to be a forcing function. Because I think you're driving it. You guys are in a way, one, are just an actor in the driving this because you're on the front end of this and there are others and there will be people following. But I think to me, I'm a cloud vendor, I got to differentiate. Adam, If I'm Adam Saleski, I got to say, "Hey, I got to differentiate." So I don't wan to get stuck in the middle, so to speak. Am I just going to innovate on the hardware AKA infrastructure or am I going to innovate at the higher level services? So what we're talking about here is the tail of two clouds within Amazon, for instance. So do I innovate on the silicon and get low level into the physics and squeeze performance out of the hardware and infrastructure? Or do I focus on ease of use at the top of the stack for the developers? So again, there's a channel of two clouds here. So I got to ask you, how do they differentiate? Number one and number two, I never heard a developer ever say, "I want to run my app or workload on the slower cloud." So I mean, you know, back when we had PCs you wanted to go, "I want the fastest processor." So again, you can have common level services, but where is that performance differentiation with the cloud? What do the clouds do in your opinion? >> Yeah, look, I think it's pretty clear. I think that it's, this is, you know, no surprise. Probably 70% or so of the revenue is in the lower infrastructure layers, compute, storage, networking. And they have to win that. They have to be competitive there. As you said, you can say, oh you know, I guess my CPUs are slower than the other cloud, but who cares? I have amazing other services which only work on my cloud by the way, right? That's not going to be a winning recipe. So I think all three are laser focused on, we going to have specialized hardware and the nuts and bolts of the infrastructure, we can do it better than the other clouds for sure. And you can see lots of innovation happening there, right? The Graviton chips, you know, we see huge price performance benefits in those chips. I mean it's real, right? It's basically a 20, 30% free lunch. You know, why wouldn't you, why wouldn't you go for it there? There's no downside. You know, there's no, "got you" or no catch. But we see Azure doing the same thing now, they're also building their own chips and we know that Google builds specialized machine learning chips, TPU, Tenor Processing Units. So their legs are focused on that. I don't think they can give up that or focused on higher levels if they had to pick bets. And I think actually in the next few years, most of us have to make more, we have to be more deliberate and calculated in the picks we do. I think in the last five years, most of us have said, "We'll do all of it." You know. >> Well you made a good bet with Spark, you know, the duke was pretty obvious trend that was, everyone was shut on that bandwagon and you guys picked a big bet with Spark. Look what happened with you guys? So again, I love this betting kind of concept because as the world matures, growth slows down and shifts and that next wave of value coming in, AKA customers, they're going to integrate with a new ecosystem. A new kind of partner network for AWS and the other clouds. But with aws they're going to need to nurture the next Databricks. They're going to need to still provide that SaaS, ISV like experience for, you know, a basic software hosting or some application. But I go to get your thoughts on this idea of multiple clouds because if I'm a developer, the old days was, old days, within our decade, full stack developer- >> It was two years ago, yeah (John laughing) >> This is a decade ago, full stack and then the cloud came in, you kind had the half stack and then you would do some things. It seems like the clouds are trying to say, we want to be the full stack or not. Or is it still going to be, you know, I'm an application like a PC and a Mac, I'm going to write the same application for both hardware. I mean what's your take on this? Are they trying to do full stack and you see them more like- >> Absolutely. I mean look, of course they're going, they have, I mean they have over 300, I think Amazon has over 300 services, right? That's not just compute, storage, networking, it's the whole stack, right? But my key point is, I think they have to nail the core infrastructure storage compute networking because the three clouds that are there competing, they're formidable companies with formidable balance sheets and it doesn't look like any of them is going to throw in the towel and say, we give up. So I think it's going to intensify. And given that they have a 70% revenue on that infrastructure layer, I think they, if they have to pick their bets, I think they'll focus it on that infrastructure layer. I think the layer above where they're also placing bets, they're doing that, the full stack, right? But there I think the demand will be, can you make that work on the other clouds? And therein lies an innovator's dilemma because if I make it work on the other clouds, then I'm foregoing that 70% revenue of the infrastructure. I'm not getting it. The other cloud vendor is going to get it. So should I do that or not? Second, is the other cloud vendor going to be welcoming of me making my service work on their cloud if I am a competing cloud, right? And what kind of terms of service are I giving me? And am I going to really invest in doing that? And I think right now we, you know, most, the vast, vast, vast majority of the services only work on the one cloud that you know, it's built on. It doesn't work on others, but this will shift. >> Yeah, I think the innovators dilemma is also very good point. And also add, it's an integrators dilemma too because now you talk about integration across services. So I believe that the super cloud movement's going to happen before Sky. And I think what explained by that, what you guys did and what other companies are doing by representing advanced, I call platform engineering, refactoring an existing market really fast, time to value and CAPEX is, I mean capital, market cap is going to be really fast. I think there's going to be an opportunity for those to emerge that's going to set the table for global multicloud ultimately in the future. So I think you're going to start to see the same pattern of what you guys did get in, leverage the hell out of it, use it, not in the way just to host, but to refactor and take down territory of markets. So number one, and then ultimately you get into, okay, I want to run some SLA across services, then there's a little bit more complication. I think that's where you guys put that beautiful paper out on Sky Computing. Okay, that makes sense. Now if you go to today's market, okay, I'm betting on Amazon because they're the best, this is the best cloud win scenario, not the most robust cloud. So if I'm a developer, I want the best. How do you look at their bet when it comes to data? Because now they've got machine learning, Swami's got a big keynote on Wednesday, I'm expecting to see a lot of AI and machine learning. I'm expecting to hear an end to end data story. This is what you do, so as a major partner, how do you view the moves Amazon's making and the bets they're making with data and machine learning and AI? >> First I want to lift off my hat to AWS for being customer obsessed. So I know that if a customer wants Databricks, I know that AWS and their sellers will actually help us get that customer deploy Databricks. Now which of the services is the customer going to pick? Are they going to pick ours or the end to end, what Swami is going to present on stage? Right? So that's the question we're getting. But I wanted to start with by just saying, their customer obsessed. So I think they're going to do the right thing for the customer and I see the evidence of it again and again and again. So kudos to them. They're amazing at this actually. Ultimately our bet is, customers want this to be simple, integrated, okay? So yes there are hundreds of services that together give you the end to end experience and they're very customizable that AWS gives you. But if you want just something simply integrated that also works across the clouds, then I think there's a special place for Databricks. And I think the lake house approach that we have, which is an integrated, completely integrated, we integrate data lakes with data warehouses, integrate workflows with machine learning, with real time processing, all these in one platform. I think there's going to be tailwinds because I think the most important thing that's going to happen in the next few years is that every customer is going to now be obsessed, given the recession and the environment we're in. How do I cut my costs? How do I cut my costs? And we learn this from the customers they're adopting the lake house because they're thinking, instead of using five vendors or three vendors, I can simplify it down to one with you and I can cut my cost. So I think that's going to be one of the main drivers of why people bet on the lake house because it helps them lower their TCO; Total Cost of Ownership. And it's as simple as that. Like I have three things right now. If I can get the same job done of those three with one, I'd rather do that. And by the way, if it's three or four across two clouds and I can just use one and it just works across two clouds, I'm going to do that. Because my boss is telling me I need to cut my budget. >> (indistinct) (John laughing) >> Yeah, and I'd rather not to do layoffs and they're asking me to do more. How can I get smaller budgets, not lay people off and do more? I have to cut, I have to optimize. What's happened in the last five, six years is there's been a huge sprawl of services and startups, you know, you know most of them, all these startups, all of them, all the activity, all the VC investments, well those companies sold their software, right? Even if a startup didn't make it big, you know, they still sold their software to some vendors. So the ecosystem is now full of lots and lots and lots and lots of different software. And right now people are looking, how do I consolidate, how do I simplify, how do I cut my costs? >> And you guys have a great solution. You're also an arms dealer and a innovator. So I have to ask this question, because you're a professor of the industry as well as at Berkeley, you've seen a lot of the historical innovations. If you look at the moment we're in right now with the recession, okay we had COVID, okay, it changed how people work, you know, people working at home, provisioning VLAN, all that (indistinct) infrastructure, okay, yeah, technology and cloud health. But we're in a recession. This is the first recession where the Amazon and the other cloud, mainly Amazon Web Services is a major economic puzzle in the piece. So they were never around before, even 2008, they were too small. They're now a major economic enabler, player, they're serving startups, enterprises, they have super clouds like you guys. They're a force and the people, their customers are cutting back but also they can also get faster. So agility is now an equation in the economic recovery. And I want to get your thoughts because you just brought that up. Customers can actually use the cloud and Databricks to actually get out of the recovery because no one's going to say, stop making profit or make more profit. So yeah, cut costs, be more efficient, but agility's also like, let's drive more revenue. So in this digital transformation, if you take this to conclusion, every company transforms, their company is the app. So their revenue is tied directly to their technology deployment. What's your reaction and comment to that because this is a new historical moment where cloud and scale and data, actually could be configured in a way to actually change the nature of a business in such a short time. And with the recession looming, no one's got time to wait. >> Yeah, absolutely. Look, the secular tailwind in the market is that of, you know, 10 years ago it was software is eating the world, now it's AI's going to eat all of software software. So more and more we're going to have, wherever you have software, which is everywhere now because it's eaten the world, it's going to be eaten up by AI and data. You know, AI doesn't exist without data so they're synonymous. You can't do machine learning if you don't have data. So yeah, you're going to see that everywhere and that automation will help people simplify things and cut down the costs and automate more things. And in the cloud you can also do that by changing your CAPEX to OPEX. So instead of I invest, you know, 10 million into a data center that I buy, I'm going to have headcount to manage the software. Why don't we change this to OPEX? And then they are going to optimize it. They want to lower the TCO because okay, it's in the cloud. but I do want the costs to be much lower that what they were in the previous years. Last five years, nobody cared. Who cares? You know what it costs. You know, there's a new brave world out there. Now there's like, no, it has to be efficient. So I think they're going to optimize it. And I think this lake house approach, which is an integration of the lakes and the warehouse, allows you to rationalize the two and simplify them. It allows you to basically rationalize away the data warehouse. So I think much faster we're going to see the, why do I need the data warehouse? If I can get the same thing done with the lake house for fraction of the cost, that's what's going to happen. I think there's going to be focus on that simplification. But I agree with you. Ultimately everyone knows, everybody's a software company. Every company out there is a software company and in the next 10 years, all of them are also going to be AI companies. So that is going to continue. >> (indistinct), dev's going to stop. And right sizing right now is a key economic forcing function. Final question for you and I really appreciate you taking the time. This year Reinvent, what's the bumper sticker in your mind around what's the most important industry dynamic, power dynamic, ecosystem dynamic that people should pay attention to as we move from the brave new world of okay, I see cloud, cloud operations. I need to really make it structurally change my business. How do I, what's the most important story? What's the bumper sticker in your mind for Reinvent? >> Bumper sticker? lake house 24. (John laughing) >> That's data (indistinct) bumper sticker. What's the- >> (indistinct) in the market. No, no, no, no. You know, it's, AWS talks about, you know, all of their services becoming a lake house because they want the center of the gravity to be S3, their lake. And they want all the services to directly work on that, so that's a lake house. We're Bumper see Microsoft with Synapse, modern, you know the modern intelligent data platform. Same thing there. We're going to see the same thing, we already seeing it on GCP with Big Lake and so on. So I actually think it's the how do I reduce my costs and the lake house integrates those two. So that's one of the main ways you can rationalize and simplify. You get in the lake house, which is the name itself is a (indistinct) of two things, right? Lake house, "lake" gives you the AI, "house" give you the database data warehouse. So you get your AI and you get your data warehousing in one place at the lower cost. So for me, the bumper sticker is lake house, you know, 24. >> All right. Awesome Ali, well thanks for the exclusive interview. Appreciate it and get to see you. Congratulations on your success and I know you guys are going to be fine. >> Awesome. Thank you John. It's always a pleasure. >> Always great to chat with you again. >> Likewise. >> You guys are a great team. We're big fans of what you guys have done. We think you're an example of what we call "super cloud." Which is getting the hype up and again your paper speaks to some of the innovation, which I agree with by the way. I think that that approach of not forcing standards is really smart. And I think that's absolutely correct, that having the market still innovate is going to be key. standards with- >> Yeah, I love it. We're big fans too, you know, you're doing awesome work. We'd love to continue the partnership. >> So, great, great Ali, thanks. >> Take care (outro music)

Published Date : Nov 23 2022

SUMMARY :

after the keynotes prior to the keynotes and you know, we're because you have customers. I wouldn't, you know, I got to give you guys credit over there So if the customer said we So Adam and the team at So in the next five years, But I think to me, I'm a cloud vendor, and calculated in the picks we do. But I go to get your thoughts on this idea Or is it still going to be, you know, And I think right now we, you know, So I believe that the super cloud I can simplify it down to one with you and startups, you know, and the other cloud, And in the cloud you can also do that I need to really make it lake house 24. That's data (indistinct) of the gravity to be S3, and I know you guys are going to be fine. It's always a pleasure. We're big fans of what you guys have done. We're big fans too, you know,

ENTITIES

Entity	Category	Confidence
Amazon	ORGANIZATION	0.99+
John	PERSON	0.99+
Ali Ghodsi	PERSON	0.99+
Adam	PERSON	0.99+
AWS	ORGANIZATION	0.99+
2013	DATE	0.99+
Google	ORGANIZATION	0.99+
Alibaba	ORGANIZATION	0.99+
2008	DATE	0.99+
five vendors	QUANTITY	0.99+
Adam Saleski	PERSON	0.99+
five	QUANTITY	0.99+
John Furrier	PERSON	0.99+
Ali	PERSON	0.99+
Databricks	ORGANIZATION	0.99+
three vendors	QUANTITY	0.99+
70%	QUANTITY	0.99+
Wednesday	DATE	0.99+
Excel	TITLE	0.99+
38 billion	QUANTITY	0.99+
four	QUANTITY	0.99+
Amazon Web Services	ORGANIZATION	0.99+
Word	TITLE	0.99+
three	QUANTITY	0.99+
two clouds	QUANTITY	0.99+
Andy	PERSON	0.99+
three clouds	QUANTITY	0.99+
10 million	QUANTITY	0.99+
PowerPoint	TITLE	0.99+
one	QUANTITY	0.99+
two	QUANTITY	0.99+
twice	QUANTITY	0.99+
Second	QUANTITY	0.99+
over 300 services	QUANTITY	0.99+
one game	QUANTITY	0.99+
second cloud	QUANTITY	0.99+
Snowflake	ORGANIZATION	0.99+
Sky	ORGANIZATION	0.99+
one word	QUANTITY	0.99+
OPEX	ORGANIZATION	0.99+
two things	QUANTITY	0.98+
two years ago	DATE	0.98+
Access	TITLE	0.98+
over 300	QUANTITY	0.98+
six years	QUANTITY	0.98+
over 70%	QUANTITY	0.98+
five years ago	DATE	0.98+

Ali Ghosdi, Databricks | AWS Partner Exclusive

Published Date : Nov 23 2022

SUMMARY :

ENTITIES

Entity	Category	Confidence
John	PERSON	0.99+
Amazon	ORGANIZATION	0.99+
Ali Ghodsi	PERSON	0.99+
Adam	PERSON	0.99+
AWS	ORGANIZATION	0.99+
2013	DATE	0.99+
Google	ORGANIZATION	0.99+
Alibaba	ORGANIZATION	0.99+
2008	DATE	0.99+
Ali Ghosdi	PERSON	0.99+
five vendors	QUANTITY	0.99+
Adam Saleski	PERSON	0.99+
five	QUANTITY	0.99+
John Furrier	PERSON	0.99+
Ali	PERSON	0.99+
Databricks	ORGANIZATION	0.99+
three vendors	QUANTITY	0.99+
70%	QUANTITY	0.99+
Wednesday	DATE	0.99+
Excel	TITLE	0.99+
38 billion	QUANTITY	0.99+
four	QUANTITY	0.99+
Amazon Web Services	ORGANIZATION	0.99+
Word	TITLE	0.99+
three	QUANTITY	0.99+
two clouds	QUANTITY	0.99+
Andy	PERSON	0.99+
three clouds	QUANTITY	0.99+
10 million	QUANTITY	0.99+
PowerPoint	TITLE	0.99+
one	QUANTITY	0.99+
two	QUANTITY	0.99+
twice	QUANTITY	0.99+
Second	QUANTITY	0.99+
over 300 services	QUANTITY	0.99+
one game	QUANTITY	0.99+
second cloud	QUANTITY	0.99+
Snowflake	ORGANIZATION	0.99+
Sky	ORGANIZATION	0.99+
one word	QUANTITY	0.99+
OPEX	ORGANIZATION	0.99+
two things	QUANTITY	0.98+
two years ago	DATE	0.98+
Access	TITLE	0.98+
over 300	QUANTITY	0.98+
six years	QUANTITY	0.98+
over 70%	QUANTITY	0.98+
five years ago	DATE	0.98+

The Truth About MySQL HeatWave

>>When Oracle acquired my SQL via the Sun acquisition, nobody really thought the company would put much effort into the platform preferring to focus all the wood behind its leading Oracle database, Arrow pun intended. But two years ago, Oracle surprised many folks by announcing my SQL Heatwave a new database as a service with a massively parallel hybrid Columbia in Mary Mary architecture that brings together transactional and analytic data in a single platform. Welcome to our latest database, power panel on the cube. My name is Dave Ante, and today we're gonna discuss Oracle's MySQL Heat Wave with a who's who of cloud database industry analysts. Holgar Mueller is with Constellation Research. Mark Stammer is the Dragon Slayer and Wikibon contributor. And Ron Westfall is with Fu Chim Research. Gentlemen, welcome back to the Cube. Always a pleasure to have you on. Thanks for having us. Great to be here. >>So we've had a number of of deep dive interviews on the Cube with Nip and Aggarwal. You guys know him? He's a senior vice president of MySQL, Heatwave Development at Oracle. I think you just saw him at Oracle Cloud World and he's come on to describe this is gonna, I'll call it a shock and awe feature additions to to heatwave. You know, the company's clearly putting r and d into the platform and I think at at cloud world we saw like the fifth major release since 2020 when they first announced MySQL heat wave. So just listing a few, they, they got, they taken, brought in analytics machine learning, they got autopilot for machine learning, which is automation onto the basic o l TP functionality of the database. And it's been interesting to watch Oracle's converge database strategy. We've contrasted that amongst ourselves. Love to get your thoughts on Amazon's get the right tool for the right job approach. >>Are they gonna have to change that? You know, Amazon's got the specialized databases, it's just, you know, the both companies are doing well. It just shows there are a lot of ways to, to skin a cat cuz you see some traction in the market in, in both approaches. So today we're gonna focus on the latest heat wave announcements and we're gonna talk about multi-cloud with a native MySQL heat wave implementation, which is available on aws MySQL heat wave for Azure via the Oracle Microsoft interconnect. This kind of cool hybrid action that they got going. Sometimes we call it super cloud. And then we're gonna dive into my SQL Heatwave Lake house, which allows users to process and query data across MyQ databases as heatwave databases, as well as object stores. So, and then we've got, heatwave has been announced on AWS and, and, and Azure, they're available now and Lake House I believe is in beta and I think it's coming out the second half of next year. So again, all of our guests are fresh off of Oracle Cloud world in Las Vegas. So they got the latest scoop. Guys, I'm done talking. Let's get into it. Mark, maybe you could start us off, what's your opinion of my SQL Heatwaves competitive position? When you think about what AWS is doing, you know, Google is, you know, we heard Google Cloud next recently, we heard about all their data innovations. You got, obviously Azure's got a big portfolio, snowflakes doing well in the market. What's your take? >>Well, first let's look at it from the point of view that AWS is the market leader in cloud and cloud services. They own somewhere between 30 to 50% depending on who you read of the market. And then you have Azure as number two and after that it falls off. There's gcp, Google Cloud platform, which is further way down the list and then Oracle and IBM and Alibaba. So when you look at AWS and you and Azure saying, hey, these are the market leaders in the cloud, then you start looking at it and saying, if I am going to provide a service that competes with the service they have, if I can make it available in their cloud, it means that I can be more competitive. And if I'm compelling and compelling means at least twice the performance or functionality or both at half the price, I should be able to gain market share. >>And that's what Oracle's done. They've taken a superior product in my SQL heat wave, which is faster, lower cost does more for a lot less at the end of the day and they make it available to the users of those clouds. You avoid this little thing called egress fees, you avoid the issue of having to migrate from one cloud to another and suddenly you have a very compelling offer. So I look at what Oracle's doing with MyQ and it feels like, I'm gonna use a word term, a flanking maneuver to their competition. They're offering a better service on their platforms. >>All right, so thank you for that. Holger, we've seen this sort of cadence, I sort of referenced it up front a little bit and they sat on MySQL for a decade, then all of a sudden we see this rush of announcements. Why did it take so long? And and more importantly is Oracle, are they developing the right features that cloud database customers are looking for in your view? >>Yeah, great question, but first of all, in your interview you said it's the edit analytics, right? Analytics is kind of like a marketing buzzword. Reports can be analytics, right? The interesting thing, which they did, the first thing they, they, they crossed the chasm between OTP and all up, right? In the same database, right? So major engineering feed very much what customers want and it's all about creating Bellevue for customers, which, which I think is the part why they go into the multi-cloud and why they add these capabilities. And they certainly with the AI capabilities, it's kind of like getting it into an autonomous field, self-driving field now with the lake cost capabilities and meeting customers where they are, like Mark has talked about the e risk costs in the cloud. So that that's a significant advantage, creating value for customers and that's what at the end of the day matters. >>And I believe strongly that long term it's gonna be ones who create better value for customers who will get more of their money From that perspective, why then take them so long? I think it's a great question. I think largely he mentioned the gentleman Nial, it's largely to who leads a product. I used to build products too, so maybe I'm a little fooling myself here, but that made the difference in my view, right? So since he's been charged, he's been building things faster than the rest of the competition, than my SQL space, which in hindsight we thought was a hot and smoking innovation phase. It kind of like was a little self complacent when it comes to the traditional borders of where, where people think, where things are separated between OTP and ola or as an example of adjacent support, right? Structured documents, whereas unstructured documents or databases and all of that has been collapsed and brought together for building a more powerful database for customers. >>So I mean it's certainly, you know, when, when Oracle talks about the competitors, you know, the competitors are in the, I always say they're, if the Oracle talks about you and knows you're doing well, so they talk a lot about aws, talk a little bit about Snowflake, you know, sort of Google, they have partnerships with Azure, but, but in, so I'm presuming that the response in MySQL heatwave was really in, in response to what they were seeing from those big competitors. But then you had Maria DB coming out, you know, the day that that Oracle acquired Sun and, and launching and going after the MySQL base. So it's, I'm, I'm interested and we'll talk about this later and what you guys think AWS and Google and Azure and Snowflake and how they're gonna respond. But, but before I do that, Ron, I want to ask you, you, you, you can get, you know, pretty technical and you've probably seen the benchmarks. >>I know you have Oracle makes a big deal out of it, publishes its benchmarks, makes some transparent on on GI GitHub. Larry Ellison talked about this in his keynote at Cloud World. What are the benchmarks show in general? I mean, when you, when you're new to the market, you gotta have a story like Mark was saying, you gotta be two x you know, the performance at half the cost or you better be or you're not gonna get any market share. So, and, and you know, oftentimes companies don't publish market benchmarks when they're leading. They do it when they, they need to gain share. So what do you make of the benchmarks? Have their, any results that were surprising to you? Have, you know, they been challenged by the competitors. Is it just a bunch of kind of desperate bench marketing to make some noise in the market or you know, are they real? What's your view? >>Well, from my perspective, I think they have the validity. And to your point, I believe that when it comes to competitor responses, that has not really happened. Nobody has like pulled down the information that's on GitHub and said, Oh, here are our price performance results. And they counter oracles. In fact, I think part of the reason why that hasn't happened is that there's the risk if Oracle's coming out and saying, Hey, we can deliver 17 times better query performance using our capabilities versus say, Snowflake when it comes to, you know, the Lakehouse platform and Snowflake turns around and says it's actually only 15 times better during performance, that's not exactly an effective maneuver. And so I think this is really to oracle's credit and I think it's refreshing because these differentiators are significant. We're not talking, you know, like 1.2% differences. We're talking 17 fold differences, we're talking six fold differences depending on, you know, where the spotlight is being shined and so forth. >>And so I think this is actually something that is actually too good to believe initially at first blush. If I'm a cloud database decision maker, I really have to prioritize this. I really would know, pay a lot more attention to this. And that's why I posed the question to Oracle and others like, okay, if these differentiators are so significant, why isn't the needle moving a bit more? And it's for, you know, some of the usual reasons. One is really deep discounting coming from, you know, the other players that's really kind of, you know, marketing 1 0 1, this is something you need to do when there's a real competitive threat to keep, you know, a customer in your own customer base. Plus there is the usual fear and uncertainty about moving from one platform to another. But I think, you know, the traction, the momentum is, is shifting an Oracle's favor. I think we saw that in the Q1 efforts, for example, where Oracle cloud grew 44% and that it generated, you know, 4.8 billion and revenue if I recall correctly. And so, so all these are demonstrating that's Oracle is making, I think many of the right moves, publishing these figures for anybody to look at from their own perspective is something that is, I think, good for the market and I think it's just gonna continue to pay dividends for Oracle down the horizon as you know, competition intens plots. So if I were in, >>Dave, can I, Dave, can I interject something and, and what Ron just said there? Yeah, please go ahead. A couple things here, one discounting, which is a common practice when you have a real threat, as Ron pointed out, isn't going to help much in this situation simply because you can't discount to the point where you improve your performance and the performance is a huge differentiator. You may be able to get your price down, but the problem that most of them have is they don't have an integrated product service. They don't have an integrated O L T P O L A P M L N data lake. Even if you cut out two of them, they don't have any of them integrated. They have multiple services that are required separate integration and that can't be overcome with discounting. And the, they, you have to pay for each one of these. And oh, by the way, as you grow, the discounts go away. So that's a, it's a minor important detail. >>So, so that's a TCO question mark, right? And I know you look at this a lot, if I had that kind of price performance advantage, I would be pounding tco, especially if I need two separate databases to do the job. That one can do, that's gonna be, the TCO numbers are gonna be off the chart or maybe down the chart, which you want. Have you looked at this and how does it compare with, you know, the big cloud guys, for example, >>I've looked at it in depth, in fact, I'm working on another TCO on this arena, but you can find it on Wiki bod in which I compared TCO for MySEQ Heat wave versus Aurora plus Redshift plus ML plus Blue. I've compared it against gcps services, Azure services, Snowflake with other services. And there's just no comparison. The, the TCO differences are huge. More importantly, thefor, the, the TCO per performance is huge. We're talking in some cases multiple orders of magnitude, but at least an order of magnitude difference. So discounting isn't gonna help you much at the end of the day, it's only going to lower your cost a little, but it doesn't improve the automation, it doesn't improve the performance, it doesn't improve the time to insight, it doesn't improve all those things that you want out of a database or multiple databases because you >>Can't discount yourself to a higher value proposition. >>So what about, I wonder ho if you could chime in on the developer angle. You, you followed that, that market. How do these innovations from heatwave, I think you used the term developer velocity. I've heard you used that before. Yeah, I mean, look, Oracle owns Java, okay, so it, it's, you know, most popular, you know, programming language in the world, blah, blah blah. But it does it have the, the minds and hearts of, of developers and does, where does heatwave fit into that equation? >>I think heatwave is gaining quickly mindshare on the developer side, right? It's not the traditional no sequel database which grew up, there's a traditional mistrust of oracles to developers to what was happening to open source when gets acquired. Like in the case of Oracle versus Java and where my sql, right? And, but we know it's not a good competitive strategy to, to bank on Oracle screwing up because it hasn't worked not on Java known my sequel, right? And for developers, it's, once you get to know a technology product and you can do more, it becomes kind of like a Swiss army knife and you can build more use case, you can build more powerful applications. That's super, super important because you don't have to get certified in multiple databases. You, you are fast at getting things done, you achieve fire, develop velocity, and the managers are happy because they don't have to license more things, send you to more trainings, have more risk of something not being delivered, right? >>So it's really the, we see the suite where this best of breed play happening here, which in general was happening before already with Oracle's flagship database. Whereas those Amazon as an example, right? And now the interesting thing is every step away Oracle was always a one database company that can be only one and they're now generally talking about heat web and that two database company with different market spaces, but same value proposition of integrating more things very, very quickly to have a universal database that I call, they call the converge database for all the needs of an enterprise to run certain application use cases. And that's what's attractive to developers. >>It's, it's ironic isn't it? I mean I, you know, the rumor was the TK Thomas Curian left Oracle cuz he wanted to put Oracle database on other clouds and other places. And maybe that was the rift. Maybe there was, I'm sure there was other things, but, but Oracle clearly is now trying to expand its Tam Ron with, with heatwave into aws, into Azure. How do you think Oracle's gonna do, you were at a cloud world, what was the sentiment from customers and the independent analyst? Is this just Oracle trying to screw with the competition, create a little diversion? Or is this, you know, serious business for Oracle? What do you think? >>No, I think it has lakes. I think it's definitely, again, attriting to Oracle's overall ability to differentiate not only my SQL heat wave, but its overall portfolio. And I think the fact that they do have the alliance with the Azure in place, that this is definitely demonstrating their commitment to meeting the multi-cloud needs of its customers as well as what we pointed to in terms of the fact that they're now offering, you know, MySQL capabilities within AWS natively and that it can now perform AWS's own offering. And I think this is all demonstrating that Oracle is, you know, not letting up, they're not resting on its laurels. That's clearly we are living in a multi-cloud world, so why not just make it more easy for customers to be able to use cloud databases according to their own specific, specific needs. And I think, you know, to holder's point, I think that definitely lines with being able to bring on more application developers to leverage these capabilities. >>I think one important announcement that's related to all this was the JSON relational duality capabilities where now it's a lot easier for application developers to use a language that they're very familiar with a JS O and not have to worry about going into relational databases to store their J S O N application coding. So this is, I think an example of the innovation that's enhancing the overall Oracle portfolio and certainly all the work with machine learning is definitely paying dividends as well. And as a result, I see Oracle continue to make these inroads that we pointed to. But I agree with Mark, you know, the short term discounting is just a stall tag. This is not denying the fact that Oracle is being able to not only deliver price performance differentiators that are dramatic, but also meeting a wide range of needs for customers out there that aren't just limited device performance consideration. >>Being able to support multi-cloud according to customer needs. Being able to reach out to the application developer community and address a very specific challenge that has plagued them for many years now. So bring it all together. Yeah, I see this as just enabling Oracles who ring true with customers. That the customers that were there were basically all of them, even though not all of them are going to be saying the same things, they're all basically saying positive feedback. And likewise, I think the analyst community is seeing this. It's always refreshing to be able to talk to customers directly and at Oracle cloud there was a litany of them and so this is just a difference maker as well as being able to talk to strategic partners. The nvidia, I think partnerships also testament to Oracle's ongoing ability to, you know, make the ecosystem more user friendly for the customers out there. >>Yeah, it's interesting when you get these all in one tools, you know, the Swiss Army knife, you expect that it's not able to be best of breed. That's the kind of surprising thing that I'm hearing about, about heatwave. I want to, I want to talk about Lake House because when I think of Lake House, I think data bricks, and to my knowledge data bricks hasn't been in the sites of Oracle yet. Maybe they're next, but, but Oracle claims that MySQL, heatwave, Lakehouse is a breakthrough in terms of capacity and performance. Mark, what are your thoughts on that? Can you double click on, on Lakehouse Oracle's claims for things like query performance and data loading? What does it mean for the market? Is Oracle really leading in, in the lake house competitive landscape? What are your thoughts? >>Well, but name in the game is what are the problems you're solving for the customer? More importantly, are those problems urgent or important? If they're urgent, customers wanna solve 'em. Now if they're important, they might get around to them. So you look at what they're doing with Lake House or previous to that machine learning or previous to that automation or previous to that O L A with O ltp and they're merging all this capability together. If you look at Snowflake or data bricks, they're tacking one problem. You look at MyQ heat wave, they're tacking multiple problems. So when you say, yeah, their queries are much better against the lake house in combination with other analytics in combination with O ltp and the fact that there are no ETLs. So you're getting all this done in real time. So it's, it's doing the query cross, cross everything in real time. >>You're solving multiple user and developer problems, you're increasing their ability to get insight faster, you're having shorter response times. So yeah, they really are solving urgent problems for customers. And by putting it where the customer lives, this is the brilliance of actually being multicloud. And I know I'm backing up here a second, but by making it work in AWS and Azure where people already live, where they already have applications, what they're saying is, we're bringing it to you. You don't have to come to us to get these, these benefits, this value overall, I think it's a brilliant strategy. I give Nip and Argo wallet a huge, huge kudos for what he's doing there. So yes, what they're doing with the lake house is going to put notice on data bricks and Snowflake and everyone else for that matter. Well >>Those are guys that whole ago you, you and I have talked about this. Those are, those are the guys that are doing sort of the best of breed. You know, they're really focused and they, you know, tend to do well at least out of the gate. Now you got Oracle's converged philosophy, obviously with Oracle database. We've seen that now it's kicking in gear with, with heatwave, you know, this whole thing of sweets versus best of breed. I mean the long term, you know, customers tend to migrate towards suite, but the new shiny toy tends to get the growth. How do you think this is gonna play out in cloud database? >>Well, it's the forever never ending story, right? And in software right suite, whereas best of breed and so far in the long run suites have always won, right? So, and sometimes they struggle again because the inherent problem of sweets is you build something larger, it has more complexity and that means your cycles to get everything working together to integrate the test that roll it out, certify whatever it is, takes you longer, right? And that's not the case. It's a fascinating part of what the effort around my SQL heat wave is that the team is out executing the previous best of breed data, bringing us something together. Now if they can maintain that pace, that's something to to, to be seen. But it, the strategy, like what Mark was saying, bring the software to the data is of course interesting and unique and totally an Oracle issue in the past, right? >>Yeah. But it had to be in your database on oci. And but at, that's an interesting part. The interesting thing on the Lake health side is, right, there's three key benefits of a lakehouse. The first one is better reporting analytics, bring more rich information together, like make the, the, the case for silicon angle, right? We want to see engagements for this video, we want to know what's happening. That's a mixed transactional video media use case, right? Typical Lakehouse use case. The next one is to build more rich applications, transactional applications which have video and these elements in there, which are the engaging one. And the third one, and that's where I'm a little critical and concerned, is it's really the base platform for artificial intelligence, right? To run deep learning to run things automatically because they have all the data in one place can create in one way. >>And that's where Oracle, I know that Ron talked about Invidia for a moment, but that's where Oracle doesn't have the strongest best story. Nonetheless, the two other main use cases of the lake house are very strong, very well only concern is four 50 terabyte sounds long. It's an arbitrary limitation. Yeah, sounds as big. So for the start, and it's the first word, they can make that bigger. You don't want your lake house to be limited and the terabyte sizes or any even petabyte size because you want to have the certainty. I can put everything in there that I think it might be relevant without knowing what questions to ask and query those questions. >>Yeah. And you know, in the early days of no schema on right, it just became a mess. But now technology has evolved to allow us to actually get more value out of that data. Data lake. Data swamp is, you know, not much more, more, more, more logical. But, and I want to get in, in a moment, I want to come back to how you think the competitors are gonna respond. Are they gonna have to sort of do a more of a converged approach? AWS in particular? But before I do, Ron, I want to ask you a question about autopilot because I heard Larry Ellison's keynote and he was talking about how, you know, most security issues are human errors with autonomy and autonomous database and things like autopilot. We take care of that. It's like autonomous vehicles, they're gonna be safer. And I went, well maybe, maybe someday. So Oracle really tries to emphasize this, that every time you see an announcement from Oracle, they talk about new, you know, autonomous capabilities. It, how legit is it? Do people care? What about, you know, what's new for heatwave Lakehouse? How much of a differentiator, Ron, do you really think autopilot is in this cloud database space? >>Yeah, I think it will definitely enhance the overall proposition. I don't think people are gonna buy, you know, lake house exclusively cause of autopilot capabilities, but when they look at the overall picture, I think it will be an added capability bonus to Oracle's benefit. And yeah, I think it's kind of one of these age old questions, how much do you automate and what is the bounce to strike? And I think we all understand with the automatic car, autonomous car analogy that there are limitations to being able to use that. However, I think it's a tool that basically every organization out there needs to at least have or at least evaluate because it goes to the point of it helps with ease of use, it helps make automation more balanced in terms of, you know, being able to test, all right, let's automate this process and see if it works well, then we can go on and switch on on autopilot for other processes. >>And then, you know, that allows, for example, the specialists to spend more time on business use cases versus, you know, manual maintenance of, of the cloud database and so forth. So I think that actually is a, a legitimate value proposition. I think it's just gonna be a case by case basis. Some organizations are gonna be more aggressive with putting automation throughout their processes throughout their organization. Others are gonna be more cautious. But it's gonna be, again, something that will help the overall Oracle proposition. And something that I think will be used with caution by many organizations, but other organizations are gonna like, hey, great, this is something that is really answering a real problem. And that is just easing the use of these databases, but also being able to better handle the automation capabilities and benefits that come with it without having, you know, a major screwup happened and the process of transitioning to more automated capabilities. >>Now, I didn't attend cloud world, it's just too many red eyes, you know, recently, so I passed. But one of the things I like to do at those events is talk to customers, you know, in the spirit of the truth, you know, they, you know, you'd have the hallway, you know, track and to talk to customers and they say, Hey, you know, here's the good, the bad and the ugly. So did you guys, did you talk to any customers my SQL Heatwave customers at, at cloud world? And and what did you learn? I don't know, Mark, did you, did you have any luck and, and having some, some private conversations? >>Yeah, I had quite a few private conversations. The one thing before I get to that, I want disagree with one point Ron made, I do believe there are customers out there buying the heat wave service, the MySEQ heat wave server service because of autopilot. Because autopilot is really revolutionary in many ways in the sense for the MySEQ developer in that it, it auto provisions, it auto parallel loads, IT auto data places it auto shape predictions. It can tell you what machine learning models are going to tell you, gonna give you your best results. And, and candidly, I've yet to meet a DBA who didn't wanna give up pedantic tasks that are pain in the kahoo, which they'd rather not do and if it's long as it was done right for them. So yes, I do think people are buying it because of autopilot and that's based on some of the conversations I had with customers at Oracle Cloud World. >>In fact, it was like, yeah, that's great, yeah, we get fantastic performance, but this really makes my life easier and I've yet to meet a DBA who didn't want to make their life easier. And it does. So yeah, I've talked to a few of them. They were excited. I asked them if they ran into any bugs, were there any difficulties in moving to it? And the answer was no. In both cases, it's interesting to note, my sequel is the most popular database on the planet. Well, some will argue that it's neck and neck with SQL Server, but if you add in Mariah DB and ProCon db, which are forks of MySQL, then yeah, by far and away it's the most popular. And as a result of that, everybody for the most part has typically a my sequel database somewhere in their organization. So this is a brilliant situation for anybody going after MyQ, but especially for heat wave. And the customers I talk to love it. I didn't find anybody complaining about it. And >>What about the migration? We talked about TCO earlier. Did your t does your TCO analysis include the migration cost or do you kind of conveniently leave that out or what? >>Well, when you look at migration costs, there are different kinds of migration costs. By the way, the worst job in the data center is the data migration manager. Forget it, no other job is as bad as that one. You get no attaboys for doing it. Right? And then when you screw up, oh boy. So in real terms, anything that can limit data migration is a good thing. And when you look at Data Lake, that limits data migration. So if you're already a MySEQ user, this is a pure MySQL as far as you're concerned. It's just a, a simple transition from one to the other. You may wanna make sure nothing broke and every you, all your tables are correct and your schema's, okay, but it's all the same. So it's a simple migration. So it's pretty much a non-event, right? When you migrate data from an O LTP to an O L A P, that's an ETL and that's gonna take time. >>But you don't have to do that with my SQL heat wave. So that's gone when you start talking about machine learning, again, you may have an etl, you may not, depending on the circumstances, but again, with my SQL heat wave, you don't, and you don't have duplicate storage, you don't have to copy it from one storage container to another to be able to be used in a different database, which by the way, ultimately adds much more cost than just the other service. So yeah, I looked at the migration and again, the users I talked to said it was a non-event. It was literally moving from one physical machine to another. If they had a new version of MySEQ running on something else and just wanted to migrate it over or just hook it up or just connect it to the data, it worked just fine. >>Okay, so every day it sounds like you guys feel, and we've certainly heard this, my colleague David Foyer, the semi-retired David Foyer was always very high on heatwave. So I think you knows got some real legitimacy here coming from a standing start, but I wanna talk about the competition, how they're likely to respond. I mean, if your AWS and you got heatwave is now in your cloud, so there's some good aspects of that. The database guys might not like that, but the infrastructure guys probably love it. Hey, more ways to sell, you know, EC two and graviton, but you're gonna, the database guys in AWS are gonna respond. They're gonna say, Hey, we got Redshift, we got aqua. What's your thoughts on, on not only how that's gonna resonate with customers, but I'm interested in what you guys think will a, I never say never about aws, you know, and are they gonna try to build, in your view a converged Oola and o LTP database? You know, Snowflake is taking an ecosystem approach. They've added in transactional capabilities to the portfolio so they're not standing still. What do you guys see in the competitive landscape in that regard going forward? Maybe Holger, you could start us off and anybody else who wants to can chime in, >>Happy to, you mentioned Snowflake last, we'll start there. I think Snowflake is imitating that strategy, right? That building out original data warehouse and the clouds tasking project to really proposition to have other data available there because AI is relevant for everybody. Ultimately people keep data in the cloud for ultimately running ai. So you see the same suite kind of like level strategy, it's gonna be a little harder because of the original positioning. How much would people know that you're doing other stuff? And I just, as a former developer manager of developers, I just don't see the speed at the moment happening at Snowflake to become really competitive to Oracle. On the flip side, putting my Oracle hat on for a moment back to you, Mark and Iran, right? What could Oracle still add? Because the, the big big things, right? The traditional chasms in the database world, they have built everything, right? >>So I, I really scratched my hat and gave Nipon a hard time at Cloud world say like, what could you be building? Destiny was very conservative. Let's get the Lakehouse thing done, it's gonna spring next year, right? And the AWS is really hard because AWS value proposition is these small innovation teams, right? That they build two pizza teams, which can be fit by two pizzas, not large teams, right? And you need suites to large teams to build these suites with lots of functionalities to make sure they work together. They're consistent, they have the same UX on the administration side, they can consume the same way, they have the same API registry, can't even stop going where the synergy comes to play over suite. So, so it's gonna be really, really hard for them to change that. But AWS super pragmatic. They're always by themselves that they'll listen to customers if they learn from customers suite as a proposition. I would not be surprised if AWS trying to bring things closer together, being morely together. >>Yeah. Well how about, can we talk about multicloud if, if, again, Oracle is very on on Oracle as you said before, but let's look forward, you know, half a year or a year. What do you think about Oracle's moves in, in multicloud in terms of what kind of penetration they're gonna have in the marketplace? You saw a lot of presentations at at cloud world, you know, we've looked pretty closely at the, the Microsoft Azure deal. I think that's really interesting. I've, I've called it a little bit of early days of a super cloud. What impact do you think this is gonna have on, on the marketplace? But, but both. And think about it within Oracle's customer base, I have no doubt they'll do great there. But what about beyond its existing install base? What do you guys think? >>Ryan, do you wanna jump on that? Go ahead. Go ahead Ryan. No, no, no, >>That's an excellent point. I think it aligns with what we've been talking about in terms of Lakehouse. I think Lake House will enable Oracle to pull more customers, more bicycle customers onto the Oracle platforms. And I think we're seeing all the signs pointing toward Oracle being able to make more inroads into the overall market. And that includes garnishing customers from the leaders in, in other words, because they are, you know, coming in as a innovator, a an alternative to, you know, the AWS proposition, the Google cloud proposition that they have less to lose and there's a result they can really drive the multi-cloud messaging to resonate with not only their existing customers, but also to be able to, to that question, Dave's posing actually garnish customers onto their platform. And, and that includes naturally my sequel but also OCI and so forth. So that's how I'm seeing this playing out. I think, you know, again, Oracle's reporting is indicating that, and I think what we saw, Oracle Cloud world is definitely validating the idea that Oracle can make more waves in the overall market in this regard. >>You know, I, I've floated this idea of Super cloud, it's kind of tongue in cheek, but, but there, I think there is some merit to it in terms of building on top of hyperscale infrastructure and abstracting some of the, that complexity. And one of the things that I'm most interested in is industry clouds and an Oracle acquisition of Cerner. I was struck by Larry Ellison's keynote, it was like, I don't know, an hour and a half and an hour and 15 minutes was focused on healthcare transformation. Well, >>So vertical, >>Right? And so, yeah, so you got Oracle's, you know, got some industry chops and you, and then you think about what they're building with, with not only oci, but then you got, you know, MyQ, you can now run in dedicated regions. You got ADB on on Exadata cloud to customer, you can put that OnPrem in in your data center and you look at what the other hyperscalers are, are doing. I I say other hyperscalers, I've always said Oracle's not really a hyperscaler, but they got a cloud so they're in the game. But you can't get, you know, big query OnPrem, you look at outposts, it's very limited in terms of, you know, the database support and again, that that will will evolve. But now you got Oracle's got, they announced Alloy, we can white label their cloud. So I'm interested in what you guys think about these moves, especially the industry cloud. We see, you know, Walmart is doing sort of their own cloud. You got Goldman Sachs doing a cloud. Do you, you guys, what do you think about that and what role does Oracle play? Any thoughts? >>Yeah, let me lemme jump on that for a moment. Now, especially with the MyQ, by making that available in multiple clouds, what they're doing is this follows the philosophy they've had the past with doing cloud, a customer taking the application and the data and putting it where the customer lives. If it's on premise, it's on premise. If it's in the cloud, it's in the cloud. By making the mice equal heat wave, essentially a plug compatible with any other mice equal as far as your, your database is concern and then giving you that integration with O L A P and ML and Data Lake and everything else, then what you've got is a compelling offering. You're making it easier for the customer to use. So I look the difference between MyQ and the Oracle database, MyQ is going to capture market more market share for them. >>You're not gonna find a lot of new users for the Oracle debate database. Yeah, there are always gonna be new users, don't get me wrong, but it's not gonna be a huge growth. Whereas my SQL heatwave is probably gonna be a major growth engine for Oracle going forward. Not just in their own cloud, but in AWS and in Azure and on premise over time that eventually it'll get there. It's not there now, but it will, they're doing the right thing on that basis. They're taking the services and when you talk about multicloud and making them available where the customer wants them, not forcing them to go where you want them, if that makes sense. And as far as where they're going in the future, I think they're gonna take a page outta what they've done with the Oracle database. They'll add things like JSON and XML and time series and spatial over time they'll make it a, a complete converged database like they did with the Oracle database. The difference being Oracle database will scale bigger and will have more transactions and be somewhat faster. And my SQL will be, for anyone who's not on the Oracle database, they're, they're not stupid, that's for sure. >>They've done Jason already. Right. But I give you that they could add graph and time series, right. Since eat with, Right, Right. Yeah, that's something absolutely right. That's, that's >>A sort of a logical move, right? >>Right. But that's, that's some kid ourselves, right? I mean has worked in Oracle's favor, right? 10 x 20 x, the amount of r and d, which is in the MyQ space, has been poured at trying to snatch workloads away from Oracle by starting with IBM 30 years ago, 20 years ago, Microsoft and, and, and, and didn't work, right? Database applications are extremely sticky when they run, you don't want to touch SIM and grow them, right? So that doesn't mean that heat phase is not an attractive offering, but it will be net new things, right? And what works in my SQL heat wave heat phases favor a little bit is it's not the massive enterprise applications which have like we the nails like, like you might be only running 30% or Oracle, but the connections and the interfaces into that is, is like 70, 80% of your enterprise. >>You take it out and it's like the spaghetti ball where you say, ah, no I really don't, don't want to do all that. Right? You don't, don't have that massive part with the equals heat phase sequel kind of like database which are more smaller tactical in comparison, but still I, I don't see them taking so much share. They will be growing because of a attractive value proposition quickly on the, the multi-cloud, right? I think it's not really multi-cloud. If you give people the chance to run your offering on different clouds, right? You can run it there. The multi-cloud advantages when the Uber offering comes out, which allows you to do things across those installations, right? I can migrate data, I can create data across something like Google has done with B query Omni, I can run predictive models or even make iron models in different place and distribute them, right? And Oracle is paving the road for that, but being available on these clouds. But the multi-cloud capability of database which knows I'm running on different clouds that is still yet to be built there. >>Yeah. And >>That the problem with >>That, that's the super cloud concept that I flowed and I I've always said kinda snowflake with a single global instance is sort of, you know, headed in that direction and maybe has a league. What's the issue with that mark? >>Yeah, the problem with the, with that version, the multi-cloud is clouds to charge egress fees. As long as they charge egress fees to move data between clouds, it's gonna make it very difficult to do a real multi-cloud implementation. Even Snowflake, which runs multi-cloud, has to pass out on the egress fees of their customer when data moves between clouds. And that's really expensive. I mean there, there is one customer I talked to who is beta testing for them, the MySQL heatwave and aws. The only reason they didn't want to do that until it was running on AWS is the egress fees were so great to move it to OCI that they couldn't afford it. Yeah. Egress fees are the big issue but, >>But Mark the, the point might be you might wanna root query and only get the results set back, right was much more tinier, which been the answer before for low latency between the class A problem, which we sometimes still have but mostly don't have. Right? And I think in general this with fees coming down based on the Oracle general E with fee move and it's very hard to justify those, right? But, but it's, it's not about moving data as a multi-cloud high value use case. It's about doing intelligent things with that data, right? Putting into other places, replicating it, what I'm saying the same thing what you said before, running remote queries on that, analyzing it, running AI on it, running AI models on that. That's the interesting thing. Cross administered in the same way. Taking things out, making sure compliance happens. Making sure when Ron says I don't want to be American anymore, I want to be in the European cloud that is gets migrated, right? So tho those are the interesting value use case which are really, really hard for enterprise to program hand by hand by developers and they would love to have out of the box and that's yet the innovation to come to, we have to come to see. But the first step to get there is that your software runs in multiple clouds and that's what Oracle's doing so well with my SQL >>Guys. Amazing. >>Go ahead. Yeah. >>Yeah. >>For example, >>Amazing amount of data knowledge and, and brain power in this market. Guys, I really want to thank you for coming on to the cube. Ron Holger. Mark, always a pleasure to have you on. Really appreciate your time. >>Well all the last names we're very happy for Romanic last and moderator. Thanks Dave for moderating us. All right, >>We'll see. We'll see you guys around. Safe travels to all and thank you for watching this power panel, The Truth About My SQL Heat Wave on the cube. Your leader in enterprise and emerging tech coverage.

Published Date : Nov 1 2022

SUMMARY :

Always a pleasure to have you on. I think you just saw him at Oracle Cloud World and he's come on to describe this is doing, you know, Google is, you know, we heard Google Cloud next recently, They own somewhere between 30 to 50% depending on who you read migrate from one cloud to another and suddenly you have a very compelling offer. All right, so thank you for that. And they certainly with the AI capabilities, And I believe strongly that long term it's gonna be ones who create better value for So I mean it's certainly, you know, when, when Oracle talks about the competitors, So what do you make of the benchmarks? say, Snowflake when it comes to, you know, the Lakehouse platform and threat to keep, you know, a customer in your own customer base. And oh, by the way, as you grow, And I know you look at this a lot, to insight, it doesn't improve all those things that you want out of a database or multiple databases So what about, I wonder ho if you could chime in on the developer angle. they don't have to license more things, send you to more trainings, have more risk of something not being delivered, all the needs of an enterprise to run certain application use cases. I mean I, you know, the rumor was the TK Thomas Curian left Oracle And I think, you know, to holder's point, I think that definitely lines But I agree with Mark, you know, the short term discounting is just a stall tag. testament to Oracle's ongoing ability to, you know, make the ecosystem Yeah, it's interesting when you get these all in one tools, you know, the Swiss Army knife, you expect that it's not able So when you say, yeah, their queries are much better against the lake house in You don't have to come to us to get these, these benefits, I mean the long term, you know, customers tend to migrate towards suite, but the new shiny bring the software to the data is of course interesting and unique and totally an Oracle issue in And the third one, lake house to be limited and the terabyte sizes or any even petabyte size because you want keynote and he was talking about how, you know, most security issues are human I don't think people are gonna buy, you know, lake house exclusively cause of And then, you know, that allows, for example, the specialists to And and what did you learn? The one thing before I get to that, I want disagree with And the customers I talk to love it. the migration cost or do you kind of conveniently leave that out or what? And when you look at Data Lake, that limits data migration. So that's gone when you start talking about So I think you knows got some real legitimacy here coming from a standing start, So you see the same And you need suites to large teams to build these suites with lots of functionalities You saw a lot of presentations at at cloud world, you know, we've looked pretty closely at Ryan, do you wanna jump on that? I think, you know, again, Oracle's reporting I think there is some merit to it in terms of building on top of hyperscale infrastructure and to customer, you can put that OnPrem in in your data center and you look at what the So I look the difference between MyQ and the Oracle database, MyQ is going to capture market They're taking the services and when you talk about multicloud and But I give you that they could add graph and time series, right. like, like you might be only running 30% or Oracle, but the connections and the interfaces into You take it out and it's like the spaghetti ball where you say, ah, no I really don't, global instance is sort of, you know, headed in that direction and maybe has a league. Yeah, the problem with the, with that version, the multi-cloud is clouds And I think in general this with fees coming down based on the Oracle general E with fee move Yeah. Guys, I really want to thank you for coming on to the cube. Well all the last names we're very happy for Romanic last and moderator. We'll see you guys around.

ENTITIES

Entity	Category	Confidence
Mark	PERSON	0.99+
Ron Holger	PERSON	0.99+
Ron	PERSON	0.99+
Mark Stammer	PERSON	0.99+
IBM	ORGANIZATION	0.99+
Ron Westfall	PERSON	0.99+
Ryan	PERSON	0.99+
AWS	ORGANIZATION	0.99+
Dave	PERSON	0.99+
Walmart	ORGANIZATION	0.99+
Larry Ellison	PERSON	0.99+
Microsoft	ORGANIZATION	0.99+
Alibaba	ORGANIZATION	0.99+
Oracle	ORGANIZATION	0.99+
Google	ORGANIZATION	0.99+
Holgar Mueller	PERSON	0.99+
Amazon	ORGANIZATION	0.99+
Constellation Research	ORGANIZATION	0.99+
Goldman Sachs	ORGANIZATION	0.99+
17 times	QUANTITY	0.99+
two	QUANTITY	0.99+
David Foyer	PERSON	0.99+
44%	QUANTITY	0.99+
1.2%	QUANTITY	0.99+
4.8 billion	QUANTITY	0.99+
Jason	PERSON	0.99+
Uber	ORGANIZATION	0.99+
Fu Chim Research	ORGANIZATION	0.99+
Dave Ante	PERSON	0.99+

Breaking Analysis Further defining Supercloud W/ tech leaders VMware, Snowflake, Databricks & others

from the cube studios in palo alto in boston bringing you data driven insights from the cube and etr this is breaking analysis with dave vellante at our inaugural super cloud 22 event we further refined the concept of a super cloud iterating on the definition the salient attributes and some examples of what is and what is not a super cloud welcome to this week's wikibon cube insights powered by etr you know snowflake has always been what we feel is one of the strongest examples of a super cloud and in this breaking analysis from our studios in palo alto we unpack our interview with benoit de javille co-founder and president of products at snowflake and we test our super cloud definition on the company's data cloud platform and we're really looking forward to your feedback first let's examine how we defl find super cloudant very importantly one of the goals of super cloud 22 was to get the community's input on the definition and iterate on previous work super cloud is an emerging computing architecture that comprises a set of services which are abstracted from the underlying primitives of hyperscale clouds we're talking about services such as compute storage networking security and other native tooling like machine learning and developer tools to create a global system that spans more than one cloud super cloud as shown on this slide has five essential properties x number of deployment models and y number of service models we're looking for community input on x and y and on the first point as well so please weigh in and contribute now we've identified these five essential elements of a super cloud let's talk about these first the super cloud has to run its services on more than one cloud leveraging the cloud native tools offered by each of the cloud providers the builder of the super cloud platform is responsible for optimizing the underlying primitives of each cloud and optimizing for the specific needs be it cost or performance or latency or governance data sharing security etc but those primitives must be abstracted such that a common experience is delivered across the clouds for both users and developers the super cloud has a metadata intelligence layer that can maximize efficiency for the specific purpose of the super cloud i.e the purpose that the super cloud is intended for and it does so in a federated model and it includes what we call a super pass this is a prerequisite that is a purpose-built component and enables ecosystem partners to customize and monetize incremental services while at the same time ensuring that the common experiences exist across clouds now in terms of deployment models we'd really like to get more feedback on this piece but here's where we are so far based on the feedback we got at super cloud 22. we see three deployment models the first is one where a control plane may run on one cloud but supports data plane interactions with more than one other cloud the second model instantiates the super cloud services on each individual cloud and within regions and can support interactions across more than one cloud with a unified interface connecting those instantiations those instances to create a common experience and the third model superimposes its services as a layer or in the case of snowflake they call it a mesh on top of the cloud on top of the cloud providers region or regions with a single global instantiation a single global instantiation of those services which spans multiple cloud providers this is our understanding from a comfort the conversation with benoit dejaville as to how snowflake approaches its solutions and for now we're going to park the service models we need to more time to flesh that out and we'll propose something shortly for you to comment on now we peppered benoit dejaville at super cloud 22 to test how the snowflake data cloud aligns to our concepts and our definition let me also say that snowflake doesn't use the term data cloud they really want to respect and they want to denigrate the importance of their hyperscale partners nor do we but we do think the hyperscalers today anyway are building or not building what we call super clouds but they are but but people who bar are building super clouds are building on top of hyperscale clouds that is a prerequisite so here are the questions that we tested with snowflake first question how does snowflake architect its data cloud and what is its deployment model listen to deja ville talk about how snowflake has architected a single system play the clip there are several ways to do this you know uh super cloud as as you name them the way we we we picked is is to create you know one single system and that's very important right the the the um [Music] there are several ways right you can instantiate you know your solution uh in every region of a cloud and and you know potentially that region could be a ws that region could be gcp so you are indeed a multi-cloud solution but snowflake we did it differently we are really creating cloud regions which are superposed on top of the cloud provider you know region infrastructure region so we are building our regions but but where where it's very different is that each region of snowflake is not one in instantiation of our service our service is global by nature we can move data from one region to the other when you land in snowflake you land into one region but but you can grow from there and you can you know exist in multiple clouds at the same time and that's very important right it's not one single i mean different instantiation of a system is one single instantiation which covers many cloud regions and many cloud providers snowflake chose the most advanced level of our three deployment models dodgeville talked about too presumably so it could maintain maximum control and ensure that common experience like the iphone model next we probed about the technical enablers of the data cloud listen to deja ville talk about snow grid he uses the term mesh and then this can get confusing with the jamaicani's data mesh concept but listen to benoit's explanation well as i said you know first we start by building you know snowflake regions we have today furry region that spawn you know the world so it's a worldwide worldwide system with many regions but all these regions are connected together they are you know meshed together with our technology we name it snow grid and that makes it hard because you know regions you know azure region can talk to a ws region or gcp regions and and as a as a user of our cloud you you don't see really these regional differences that you know regions are in different you know potentially clown when you use snowflake you can exist your your presence as an organization can be in several regions several clouds if you want geographic and and and both geographic and cloud provider so i can share data irrespective of the the cloud and i'm in the snowflake data cloud is that correct i can do that today exactly and and that's very critical right what we wanted is to remove data silos and and when you instantiate a system in one single region and that system is locked in that region you cannot communicate with other parts of the world you are locking the data in one region right and we didn't want to do that we wanted you know data to be distributed the way customer wants it to be distributed across the world and potentially sharing data at world scale now maybe there are many ways to skin the other cat meaning perhaps if a platform does instantiate in multiple places there are ways to share data but this is how snowflake chose to approach the problem next question how do you deal with latency in this big global system this is really important to us because while snowflake has some really smart people working as engineers and and the like we don't think they've solved for the speed of light problem the best people working on it as we often joke listen to benoit deja ville's comments on this topic so yes and no the the way we do it it's very expensive to do that because generally if you want to join you know data which is in which are in different regions and different cloud it's going to be very expensive because you need to move you know data every time you join it so the way we do it is that you replicate the subset of data that you want to access from one region from other regions so you can create this data mesh but data is replicated to make it very cheap and very performant too and is the snow grid does that have the metadata intelligence yes to actually can you describe that a little bit yeah snow grid is both uh a way to to exchange you know metadata about so each region of snowflake knows about all the other regions of snowflake every time we create a new region diary you know the metadata is distributed over our data cloud not only you know region knows all the regions but knows you know every organization that exists in our clouds where this organization is where data can be replicated by this organization and then of course it's it's also used as a way to uh uh exchange data right so you can exchange you know beta by scale of data size and we just had i was just receiving an email from one of our customers who moved more than four petabytes of data cross-region cross you know cloud providers in you know few days and you know it's a lot of data so it takes you know some time to move but they were able to do that online completely online and and switch over you know to the diff to the other region which is failover is very important also so yes and no probably means typically no he says yes and no probably means no so it sounds like snowflake is selectively pulling small amounts of data and replicating it where necessary but you also heard him talk about the metadata layer which is one of the essential aspects of super cloud okay next we dug into security it's one of the most important issues and we think one of the hardest parts related to deploying super cloud so we've talked about how the cloud has become the first line of defense for the cso but now with multi-cloud you have multiple first lines of defense and that means multiple shared responsibility models and multiple tool sets from different cloud providers and an expanded threat surface so listen to benoit's explanation here please play the clip this is a great question uh security has always been the most important aspect of snowflake since day one right this is the question that every customer of ours has you know how you can you guarantee the security of my data and so we secure data really tightly in region we have several layers of security it starts by by encrypting it every data at rest and that's very important a lot of customers are not doing that right you hear these attacks for example on on cloud you know where someone left you know their buckets uh uh open and then you know you can access the data because it's a non-encrypted uh so we are encrypting everything at rest we are encrypting everything in transit so a region is very secure now you know you never from one region you never access data from another region in snowflake that's why also we replicate data now the replication of that data across region or the metadata for that matter is is really highly secure so snow grits ensure that everything is encrypted everything is you know we have multiple you know encryption keys and it's you know stored in hardware you know secure modules so we we we built you know snow grids such that it's secure and it allows very secure movement of data so when we heard this explanation we immediately went to the lowest common denominator question meaning when you think about how aws for instance deals with data in motion or data and rest it might be different from how another cloud provider deals with it so how does aws uh uh uh differences for example in the aws maturity model for various you know cloud capabilities you know let's say they've got a faster nitro or graviton does it do do you have to how does snowflake deal with that do they have to slow everything else down like imagine a caravan cruising you know across the desert so you know every truck can keep up let's listen it's a great question i mean of course our software is abstracting you know all the cloud providers you know infrastructure so that when you run in one region let's say aws or azure it doesn't make any difference as far as the applications are concerned and and this abstraction of course is a lot of work i mean really really a lot of work because it needs to be secure it needs to be performance and you know every cloud and it has you know to expose apis which are uniform and and you know cloud providers even though they have potentially the same concept let's say blob storage apis are completely different the way you know these systems are secure it's completely different the errors that you can get and and the retry you know mechanism is very different from one cloud to the other performance is also different we discovered that when we were starting to port our software and and and you know we had to completely rethink how to leverage blob storage in that cloud versus that cloud because just of performance too so we had you know for example to you know stripe data so all this work is work that's you know you don't need as an application because our vision really is that applications which are running in our data cloud can you know be abstracted of all this difference and and we provide all the services all the workload that this application need whether it's transactional access to data analytical access to data you know managing you know logs managing you know metrics all of these is abstracted too such that they are not you know tied to one you know particular service of one cloud and and distributing this application across you know many regions many cloud is very seamless so from that answer we know that snowflake takes care of everything but we really don't understand the performance implications in you know in that specific case but we feel pretty certain that the promises that snowflake makes around governance and security within their data sharing construct construct will be kept now another criterion that we've proposed for super cloud is a super pass layer to create a common developer experience and an enabler for ecosystem partners to monetize please play the clip let's listen we build it you know a custom build because because as you said you know what exists in one cloud might not exist in another cloud provider right so so we have to build you know on this all these this components that modern application mode and that application need and and and and that you know goes to machine learning as i say transactional uh analytical system and the entire thing so such that they can run in isolation basically and the objective is the developer experience will be identical across those clouds yes right the developers doesn't need to worry about cloud provider and actually our system we have we didn't talk about it but the marketplace that we have which allows actually to deliver we're getting there yeah okay now we're not going to go deep into ecosystem today we've talked about snowflakes strengths in this regard but snowflake they pretty much ticked all the boxes on our super cloud attributes and definition we asked benoit dejaville to confirm that this is all shipping and available today and he also gave us a glimpse of the future play the clip and we are still developing it you know the transactional you know unistore as we call it was announced in last summit so so they are still you know working properly but but but that's the vision right and and and that's important because we talk about the infrastructure right you mentioned a lot about storage and compute but it's not only that right when you think about application they need to use the transactional database they need to use an analytical system they need to use you know machine learning so you need to provide also all these services which are consistent across all the cloud providers so you can hear deja ville talking about expanding beyond taking advantage of the core infrastructure storage and networking et cetera and bringing intelligence to the data through machine learning and ai so of course there's more to come and there better be at this company's valuation despite the recent sharp pullback in a tightening fed environment okay so i know it's cliche but everyone's comparing snowflakes and data bricks databricks has been pretty vocal about its open source posture compared to snowflakes and it just so happens that we had aligotsy on at super cloud 22 as well he wasn't in studio he had to do remote because i guess he's presenting at an investor conference this week so we had to bring him in remotely now i didn't get to do this interview john furrier did but i listened to it and captured this clip about how data bricks sees super cloud and the importance of open source take a listen to goatzee yeah i mean let me start by saying we just we're big fans of open source we think that open source is a force in software that's going to continue for you know decades hundreds of years and it's going to slowly replace all proprietary code in its way we saw that you know it could do that with the most advanced technology windows you know proprietary operating system very complicated got replaced with linux so open source can pretty much do anything and what we're seeing with the data lake house is that slowly the open source community is building a replacement for the proprietary data warehouse you know data lake machine learning real-time stack in open source and we're excited to be part of it for us delta lake is a very important project that really helps you standardize how you lay out your data in the cloud and with it comes a really important protocol called delta sharing that enables you in an open way actually for the first time ever share large data sets between organizations but it uses an open protocol so the great thing about that is you don't need to be a database customer you don't even like databricks you just need to use this open source project and you can now securely share data sets between organizations across clouds and it actually does so really efficiently just one copy of the data so you don't have to copy it if you're within the same cloud so the implication of ellie gotzi's comments is that databricks with delta sharing as john implied is playing a long game now i don't know if enough about the databricks architecture to comment in detail i got to do more research there so i reached out to my two analyst friends tony bear and sanji mohan to see what they thought because they cover these companies pretty closely here's what tony bear said quote i've viewed the divergent lake house strategies of data bricks and snowflake in the context of their roots prior to delta lake databrick's prime focus was the compute not the storage layer and more specifically they were a compute engine not a database snowflake approached from the opposite end of the pool as they originally fit the mold of the classic database company rather than a specific compute engine per se the lake house pushes both companies outside of their original comfort zones data bricks to storage snowflake to compute engine so it makes perfect sense for databricks to embrace the open source narrative at the storage layer and for snowflake to continue its walled garden approach but in the long run their strategies are already overlapping databricks is not a 100 open source company its practitioner experience has always been proprietary and now so is its sql query engine likewise snowflake has had to open up with the support of iceberg for open data lake format the question really becomes how serious snowflake will be in making iceberg a first-class citizen in its environment that is not necessarily officially branding a lake house but effectively is and likewise can databricks deliver the service levels associated with walled gardens through a more brute force approach that relies heavily on the query engine at the end of the day those are the key requirements that will matter to data bricks and snowflake customers end quote that was some deep thought by by tony thank you for that sanjay mohan added the following quote open source is a slippery slope people buy mobile phones based on open source android but it's not fully open similarly databricks delta lake was not originally fully open source and even today its photon execution engine is not we are always going to live in a hybrid world snowflake and databricks will support whatever model works best for them and their customers the big question is do customers care as deeply about which vendor has a higher degree of openness as we technology people do i believe customers evaluation criteria is far more nuanced than just to decipher each vendor's open source claims end quote okay so i had to ask dodgeville about their so-called wall garden approach and what their strategy is with apache iceberg here's what he said iceberg is is very important so just to to give some context iceberg is an open you know table format right which was you know first you know developed by netflix and netflix you know put it open source in the apache community so we embrace that's that open source standard because because it's widely used by by many um many you know companies and also many companies have you know really invested a lot of effort in building you know big data hadoop solution or data like solution and they want to use snowflake and they couldn't really use snowflake because all their data were in open you know formats so we are embracing icebergs to help these companies move through the cloud but why we have been relentless with direct access to data direct access to data is a little bit of a problem for us and and the reason is when you direct access to data now you have direct access to storage now you have to understand for example the specificity of one cloud versus the other so as soon as you start to have direct access to data you lose your you know your cloud diagnostic layer you don't access data with api when you have direct access to data it's very hard to secure data because you need to grant access direct access to tools which are not you know protected and you see a lot of you know hacking of of data you know because of that so so that was not you know direct access to data is not serving well our customers and that's why we have been relented to do that because it's it's cr it's it's not cloud diagnostic it's it's you you have to code that you have to you you you need a lot of intelligence while apis access so we want open apis that's that's i guess the way we embrace you know openness is is by open api versus you know you access directly data here's my take snowflake is hedging its bets because enough people care about open source that they have to have some open data format options and it's good optics and you heard benoit deja ville talk about the risks of directly accessing the data and the complexities it brings now is that maybe a little fud against databricks maybe but same can be said for ollie's comments maybe flooding the proprietaryness of snowflake but as both analysts pointed out open is a spectrum hey i remember unix used to equal open systems okay let's end with some etr spending data and why not compare snowflake and data bricks spending profiles this is an xy graph with net score or spending momentum on the y-axis and pervasiveness or overlap in the data set on the x-axis this is data from the january survey when snowflake was holding above 80 percent net score off the charts databricks was also very strong in the upper 60s now let's fast forward to this next chart and show you the july etr survey data and you can see snowflake has come back down to earth now remember anything above 40 net score is highly elevated so both companies are doing well but snowflake is well off its highs and data bricks has come down somewhat as well databricks is inching to the right snowflake rocketed to the right post its ipo and as we know databricks wasn't able to get to ipo during the covet bubble ali gotzi is at the morgan stanley ceo conference this week they got plenty of cash to withstand a long-term recession i'm told and they've started the message that they're a billion dollars in annualized revenue i'm not sure exactly what that means i've seen some numbers on their gross margins i'm not sure what that means i've seen some numbers on their net retention revenue or net revenue retention again i'll reserve judgment until we see an s1 but it's clear both of these companies have momentum and they're out competing in the market well as always be the ultimate arbiter different philosophies perhaps is it like democrats and republicans well it could be but they're both going after a solving data problem both companies are trying to help customers get more value out of their data and both companies are highly valued so they have to perform for their investors to paraphrase ralph nader the similarities may be greater than the differences okay that's it for today thanks to the team from palo alto for this awesome super cloud studio build alex myerson and ken shiffman are on production in the palo alto studios today kristin martin and sheryl knight get the word out to our community rob hoff is our editor-in-chief over at siliconangle thanks to all please check out etr.ai for all the survey data remember these episodes are all available as podcasts wherever you listen just search breaking analysis podcasts i publish each week on wikibon.com and siliconangle.com and you can email me at david.vellante at siliconangle.com or dm me at devellante or comment on my linkedin posts and please as i say etr has got some of the best survey data in the business we track it every quarter and really excited to be partners with them this is dave vellante for the cube insights powered by etr thanks for watching and we'll see you next time on breaking analysis [Music] you

Published Date : Aug 14 2022

SUMMARY :

and and the retry you know mechanism is

ENTITIES

Entity	Category	Confidence
netflix	ORGANIZATION	0.99+
john furrier	PERSON	0.99+
palo alto	ORGANIZATION	0.99+
tony bear	PERSON	0.99+
boston	LOCATION	0.99+
sanji mohan	PERSON	0.99+
ken shiffman	PERSON	0.99+
both	QUANTITY	0.99+
today	DATE	0.99+
ellie gotzi	PERSON	0.99+
VMware	ORGANIZATION	0.99+
Snowflake	ORGANIZATION	0.99+
siliconangle.com	OTHER	0.99+
more than four petabytes	QUANTITY	0.99+
first point	QUANTITY	0.99+
kristin martin	PERSON	0.99+
both companies	QUANTITY	0.99+
first question	QUANTITY	0.99+
rob hoff	PERSON	0.99+
more than one	QUANTITY	0.99+
second model	QUANTITY	0.98+
alex myerson	PERSON	0.98+
third model	QUANTITY	0.98+
one region	QUANTITY	0.98+
one copy	QUANTITY	0.98+
one region	QUANTITY	0.98+
five essential elements	QUANTITY	0.98+
android	TITLE	0.98+
100	QUANTITY	0.98+
first line	QUANTITY	0.98+
Databricks	ORGANIZATION	0.98+
sheryl	PERSON	0.98+
more than one cloud	QUANTITY	0.98+
first	QUANTITY	0.98+
iphone	COMMERCIAL_ITEM	0.98+
super cloud 22	EVENT	0.98+
each cloud	QUANTITY	0.98+
each	QUANTITY	0.97+
sanjay mohan	PERSON	0.97+
john	PERSON	0.97+
republicans	ORGANIZATION	0.97+
this week	DATE	0.97+
hundreds of years	QUANTITY	0.97+
siliconangle	ORGANIZATION	0.97+
each week	QUANTITY	0.97+
data lake house	ORGANIZATION	0.97+
one single region	QUANTITY	0.97+
january	DATE	0.97+
dave vellante	PERSON	0.96+
each region	QUANTITY	0.96+
one	QUANTITY	0.96+
dave vellante	PERSON	0.96+
tony	PERSON	0.96+
above 80 percent	QUANTITY	0.95+
more than one cloud	QUANTITY	0.95+
more than one cloud	QUANTITY	0.95+
data lake	ORGANIZATION	0.95+
five essential properties	QUANTITY	0.95+
democrats	ORGANIZATION	0.95+
first time	QUANTITY	0.95+
july	DATE	0.94+
linux	TITLE	0.94+
etr	ORGANIZATION	0.94+
devellante	ORGANIZATION	0.93+
dodgeville	ORGANIZATION	0.93+
each vendor	QUANTITY	0.93+
super cloud 22	ORGANIZATION	0.93+
delta lake	ORGANIZATION	0.92+
three deployment models	QUANTITY	0.92+
first lines	QUANTITY	0.92+
dejaville	LOCATION	0.92+
day one	QUANTITY	0.92+

Analyst Predictions 2022: The Future of Data Management

[Music] in the 2010s organizations became keenly aware that data would become the key ingredient in driving competitive advantage differentiation and growth but to this day putting data to work remains a difficult challenge for many if not most organizations now as the cloud matures it has become a game changer for data practitioners by making cheap storage and massive processing power readily accessible we've also seen better tooling in the form of data workflows streaming machine intelligence ai developer tools security observability automation new databases and the like these innovations they accelerate data proficiency but at the same time they had complexity for practitioners data lakes data hubs data warehouses data marts data fabrics data meshes data catalogs data oceans are forming they're evolving and exploding onto the scene so in an effort to bring perspective to the sea of optionality we've brought together the brightest minds in the data analyst community to discuss how data management is morphing and what practitioners should expect in 2022 and beyond hello everyone my name is dave vellante with the cube and i'd like to welcome you to a special cube presentation analyst predictions 2022 the future of data management we've gathered six of the best analysts in data and data management who are going to present and discuss their top predictions and trends for 2022 in the first half of this decade let me introduce our six power panelists sanjeev mohan is former gartner analyst and principal at sanjamo tony bear is principal at db insight carl olufsen is well-known research vice president with idc dave meninger is senior vice president and research director at ventana research brad shimon chief analyst at ai platforms analytics and data management at omnia and doug henschen vice president and principal analyst at constellation research gentlemen welcome to the program and thanks for coming on thecube today great to be here thank you all right here's the format we're going to use i as moderator are going to call on each analyst separately who then will deliver their prediction or mega trend and then in the interest of time management and pace two analysts will have the opportunity to comment if we have more time we'll elongate it but let's get started right away sanjeev mohan please kick it off you want to talk about governance go ahead sir thank you dave i i believe that data governance which we've been talking about for many years is now not only going to be mainstream it's going to be table stakes and all the things that you mentioned you know with data oceans data lakes lake houses data fabric meshes the common glue is metadata if we don't understand what data we have and we are governing it there is no way we can manage it so we saw informatica when public last year after a hiatus of six years i've i'm predicting that this year we see some more companies go public uh my bet is on colibra most likely and maybe alation we'll see go public this year we we i'm also predicting that the scope of data governance is going to expand beyond just data it's not just data and reports we are going to see more transformations like spark jaws python even airflow we're going to see more of streaming data so from kafka schema registry for example we will see ai models become part of this whole governance suite so the governance suite is going to be very comprehensive very detailed lineage impact analysis and then even expand into data quality we already seen that happen with some of the tools where they are buying these smaller companies and bringing in data quality monitoring and integrating it with metadata management data catalogs also data access governance so these so what we are going to see is that once the data governance platforms become the key entry point into these modern architectures i'm predicting that the usage the number of users of a data catalog is going to exceed that of a bi tool that will take time and we already seen that that trajectory right now if you look at bi tools i would say there are 100 users to a bi tool to one data catalog and i i see that evening out over a period of time and at some point data catalogs will really become you know the main way for us to access data data catalog will help us visualize data but if we want to do more in-depth analysis it'll be the jumping-off point into the bi tool the data science tool and and that is that is the journey i see for the data governance products excellent thank you some comments maybe maybe doug a lot a lot of things to weigh in on there maybe you could comment yeah sanjeev i think you're spot on a lot of the trends uh the one disagreement i think it's it's really still far from mainstream as you say we've been talking about this for years it's like god motherhood apple pie everyone agrees it's important but too few organizations are really practicing good governance because it's hard and because the incentives have been lacking i think one thing that deserves uh mention in this context is uh esg mandates and guidelines these are environmental social and governance regs and guidelines we've seen the environmental rags and guidelines imposed in industries particularly the carbon intensive industries we've seen the social mandates particularly diversity imposed on suppliers by companies that are leading on this topic we've seen governance guidelines now being imposed by banks and investors so these esgs are presenting new carrots and sticks and it's going to demand more solid data it's going to demand more detailed reporting and solid reporting tighter governance but we're still far from mainstream adoption we have a lot of uh you know best of breed niche players in the space i think the signs that it's going to be more mainstream are starting with things like azure purview google dataplex the big cloud platform uh players seem to be uh upping the ante and and addressing starting to address governance excellent thank you doug brad i wonder if you could chime in as well yeah i would love to be a believer in data catalogs um but uh to doug's point i think that it's going to take some more pressure for for that to happen i recall metadata being something every enterprise thought they were going to get under control when we were working on service oriented architecture back in the 90s and that didn't happen quite the way we we anticipated and and uh to sanjeev's point it's because it is really complex and really difficult to do my hope is that you know we won't sort of uh how do we put this fade out into this nebulous nebula of uh domain catalogs that are specific to individual use cases like purview for getting data quality right or like data governance and cyber security and instead we have some tooling that can actually be adaptive to gather metadata to create something i know is important to you sanjeev and that is this idea of observability if you can get enough metadata without moving your data around but understanding the entirety of a system that's running on this data you can do a lot to help with with the governance that doug is talking about so so i just want to add that you know data governance like many other initiatives did not succeed even ai went into an ai window but that's a different topic but a lot of these things did not succeed because to your point the incentives were not there i i remember when starbucks oxley had come into the scene if if a bank did not do service obviously they were very happy to a million dollar fine that was like you know pocket change for them instead of doing the right thing but i think the stakes are much higher now with gdpr uh the floodgates open now you know california you know has ccpa but even ccpa is being outdated with cpra which is much more gdpr like so we are very rapidly entering a space where every pretty much every major country in the world is coming up with its own uh compliance regulatory requirements data residence is becoming really important and and i i think we are going to reach a stage where uh it won't be optional anymore so whether we like it or not and i think the reason data catalogs were not successful in the past is because we did not have the right focus on adoption we were focused on features and these features were disconnected very hard for business to stop these are built by it people for it departments to to take a look at technical metadata not business metadata today the tables have turned cdo's are driving this uh initiative uh regulatory compliances are beating down hard so i think the time might be right yeah so guys we have to move on here and uh but there's some some real meat on the bone here sanjeev i like the fact that you late you called out calibra and alation so we can look back a year from now and say okay he made the call he stuck it and then the ratio of bi tools the data catalogs that's another sort of measurement that we can we can take even though some skepticism there that's something that we can watch and i wonder if someday if we'll have more metadata than data but i want to move to tony baer you want to talk about data mesh and speaking you know coming off of governance i mean wow you know the whole concept of data mesh is decentralized data and then governance becomes you know a nightmare there but take it away tony we'll put it this way um data mesh you know the the idea at least is proposed by thoughtworks um you know basically was unleashed a couple years ago and the press has been almost uniformly almost uncritical um a good reason for that is for all the problems that basically that sanjeev and doug and brad were just you know we're just speaking about which is that we have all this data out there and we don't know what to do about it um now that's not a new problem that was a problem we had enterprise data warehouses it was a problem when we had our hadoop data clusters it's even more of a problem now the data's out in the cloud where the data is not only your data like is not only s3 it's all over the place and it's also including streaming which i know we'll be talking about later so the data mesh was a response to that the idea of that we need to debate you know who are the folks that really know best about governance is the domain experts so it was basically data mesh was an architectural pattern and a process my prediction for this year is that data mesh is going to hit cold hard reality because if you if you do a google search um basically the the published work the articles and databases have been largely you know pretty uncritical um so far you know that you know basically learning is basically being a very revolutionary new idea i don't think it's that revolutionary because we've talked about ideas like this brad and i you and i met years ago when we were talking about so and decentralizing all of us was at the application level now we're talking about at the data level and now we have microservices so there's this thought of oh if we manage if we're apps in cloud native through microservices why don't we think of data in the same way um my sense this year is that you know this and this has been a very active search if you look at google search trends is that now companies are going to you know enterprises are going to look at this seriously and as they look at seriously it's going to attract its first real hard scrutiny it's going to attract its first backlash that's not necessarily a bad thing it means that it's being taken seriously um the reason why i think that that uh that it will you'll start to see basically the cold hard light of day shine on data mesh is that it's still a work in progress you know this idea is basically a couple years old and there's still some pretty major gaps um the biggest gap is in is in the area of federated governance now federated governance itself is not a new issue uh federated governance position we're trying to figure out like how can we basically strike the balance between getting let's say you know between basically consistent enterprise policy consistent enterprise governance but yet the groups that understand the data know how to basically you know that you know how do we basically sort of balance the two there's a huge there's a huge gap there in practice and knowledge um also to a lesser extent there's a technology gap which is basically in the self-service technologies that will help teams essentially govern data you know basically through the full life cycle from developed from selecting the data from you know building the other pipelines from determining your access control determining looking at quality looking at basically whether data is fresh or whether or not it's trending of course so my predictions is that it will really receive the first harsh scrutiny this year you are going to see some organization enterprises declare premature victory when they've uh when they build some federated query implementations you're going to see vendors start to data mesh wash their products anybody in the data management space they're going to say that whether it's basically a pipelining tool whether it's basically elt whether it's a catalog um or confederated query tool they're all going to be like you know basically promoting the fact of how they support this hopefully nobody is going to call themselves a data mesh tool because data mesh is not a technology we're going to see one other thing come out of this and this harks back to the metadata that sanji was talking about and the catalogs that he was talking about which is that there's going to be a new focus on every renewed focus on metadata and i think that's going to spur interest in data fabrics now data fabrics are pretty vaguely defined but if we just take the most elemental definition which is a common metadata back plane i think that if anybody is going to get serious about data mesh they need to look at a data fabric because we all at the end of the day need to speak you know need to read from the same sheet of music so thank you tony dave dave meninger i mean one of the things that people like about data mesh is it pretty crisply articulates some of the flaws in today's organizational approaches to data what are your thoughts on this well i think we have to start by defining data mesh right the the term is already getting corrupted right tony said it's going to see the cold hard uh light of day and there's a problem right now that there are a number of overlapping terms that are similar but not identical so we've got data virtualization data fabric excuse me for a second sorry about that data virtualization data fabric uh uh data federation right uh so i i think that it's not really clear what each vendor means by these terms i see data mesh and data fabric becoming quite popular i've i've interpreted data mesh as referring primarily to the governance aspects as originally you know intended and specified but that's not the way i see vendors using i see vendors using it much more to mean data fabric and data virtualization so i'm going to comment on the group of those things i think the group of those things is going to happen they're going to happen they're going to become more robust our research suggests that a quarter of organizations are already using virtualized access to their data lakes and another half so a total of three quarters will eventually be accessing their data lakes using some sort of virtualized access again whether you define it as mesh or fabric or virtualization isn't really the point here but this notion that there are different elements of data metadata and governance within an organization that all need to be managed collectively the interesting thing is when you look at the satisfaction rates of those organizations using virtualization versus those that are not it's almost double 68 of organizations i'm i'm sorry um 79 of organizations that were using virtualized access express satisfaction with their access to the data lake only 39 expressed satisfaction if they weren't using virtualized access so thank you uh dave uh sanjeev we just got about a couple minutes on this topic but i know you're speaking or maybe you've spoken already on a panel with jamal dagani who sort of invented the concept governance obviously is a big sticking point but what are your thoughts on this you are mute so my message to your mark and uh and to the community is uh as opposed to what dave said let's not define it we spent the whole year defining it there are four principles domain product data infrastructure and governance let's take it to the next level i get a lot of questions on what is the difference between data fabric and data mesh and i'm like i can compare the two because data mesh is a business concept data fabric is a data integration pattern how do you define how do you compare the two you have to bring data mesh level down so to tony's point i'm on a warp path in 2022 to take it down to what does a data product look like how do we handle shared data across domains and govern it and i think we are going to see more of that in 2022 is operationalization of data mesh i think we could have a whole hour on this topic couldn't we uh maybe we should do that uh but let's go to let's move to carl said carl your database guy you've been around that that block for a while now you want to talk about graph databases bring it on oh yeah okay thanks so i regard graph database as basically the next truly revolutionary database management technology i'm looking forward to for the graph database market which of course we haven't defined yet so obviously i have a little wiggle room in what i'm about to say but that this market will grow by about 600 percent over the next 10 years now 10 years is a long time but over the next five years we expect to see gradual growth as people start to learn how to use it problem isn't that it's used the problem is not that it's not useful is that people don't know how to use it so let me explain before i go any further what a graph database is because some of the folks on the call may not may not know what it is a graph database organizes data according to a mathematical structure called a graph a graph has elements called nodes and edges so a data element drops into a node the nodes are connected by edges the edges connect one node to another node combinations of edges create structures that you can analyze to determine how things are related in some cases the nodes and edges can have properties attached to them which add additional informative material that makes it richer that's called a property graph okay there are two principal use cases for graph databases there's there's semantic proper graphs which are used to break down human language text uh into the semantic structures then you can search it organize it and and and answer complicated questions a lot of ai is aimed at semantic graphs another kind is the property graph that i just mentioned which has a dazzling number of use cases i want to just point out is as i talk about this people are probably wondering well we have relational databases isn't that good enough okay so a relational database defines it uses um it supports what i call definitional relationships that means you define the relationships in a fixed structure the database drops into that structure there's a value foreign key value that relates one table to another and that value is fixed you don't change it if you change it the database becomes unstable it's not clear what you're looking at in a graph database the system is designed to handle change so that it can reflect the true state of the things that it's being used to track so um let me just give you some examples of use cases for this um they include uh entity resolution data lineage uh um social media analysis customer 360 fraud prevention there's cyber security there's strong supply chain is a big one actually there's explainable ai and this is going to become important too because a lot of people are adopting ai but they want a system after the fact to say how did the ai system come to that conclusion how did it make that recommendation right now we don't have really good ways of tracking that okay machine machine learning in general um social network i already mentioned that and then we've got oh gosh we've got data governance data compliance risk management we've got recommendation we've got personalization anti-money money laundering that's another big one identity and access management network and i.t operations is already becoming a key one where you actually have mapped out your operation your your you know whatever it is your data center and you you can track what's going on as things happen there root cause analysis fraud detection is a huge one a number of major credit card companies use graph databases for fraud detection risk analysis tracking and tracing churn analysis next best action what-if analysis impact analysis entity resolution and i would add one other thing or just a few other things to this list metadata management so sanjay here you go this is your engine okay because i was in metadata management for quite a while in my past life and one of the things i found was that none of the data management technologies that were available to us could efficiently handle metadata because of the kinds of structures that result from it but grass can okay grafts can do things like say this term in this context means this but in that context it means that okay things like that and in fact uh logistics management supply chain it also because it handles recursive relationships by recursive relationships i mean objects that own other objects that are of the same type you can do things like bill materials you know so like parts explosion you can do an hr analysis who reports to whom how many levels up the chain and that kind of thing you can do that with relational databases but yes it takes a lot of programming in fact you can do almost any of these things with relational databases but the problem is you have to program it it's not it's not supported in the database and whenever you have to program something that means you can't trace it you can't define it you can't publish it in terms of its functionality and it's really really hard to maintain over time so carl thank you i wonder if we could bring brad in i mean brad i'm sitting there wondering okay is this incremental to the market is it disruptive and replaceable what are your thoughts on this space it's already disrupted the market i mean like carl said go to any bank and ask them are you using graph databases to do to get fraud detection under control and they'll say absolutely that's the only way to solve this problem and it is frankly um and it's the only way to solve a lot of the problems that carl mentioned and that is i think it's it's achilles heel in some ways because you know it's like finding the best way to cross the seven bridges of konigsberg you know it's always going to kind of be tied to those use cases because it's really special and it's really unique and because it's special and it's unique uh it it still unfortunately kind of stands apart from the rest of the community that's building let's say ai outcomes as the great great example here the graph databases and ai as carl mentioned are like chocolate and peanut butter but technologically they don't know how to talk to one another they're completely different um and you know it's you can't just stand up sql and query them you've got to to learn um yeah what is that carlos specter or uh special uh uh yeah thank you uh to actually get to the data in there and if you're gonna scale that data that graph database especially a property graph if you're gonna do something really complex like try to understand uh you know all of the metadata in your organization you might just end up with you know a graph database winter like we had the ai winter simply because you run out of performance to make the thing happen so i i think it's already disrupted but we we need to like treat it like a first-class citizen in in the data analytics and ai community we need to bring it into the fold we need to equip it with the tools it needs to do that the magic it does and to do it not just for specialized use cases but for everything because i i'm with carl i i think it's absolutely revolutionary so i had also identified the principal achilles heel of the technology which is scaling now when these when these things get large and complex enough that they spill over what a single server can handle you start to have difficulties because the relationships span things that have to be resolved over a network and then you get network latency and that slows the system down so that's still a problem to be solved sanjeev any quick thoughts on this i mean i think metadata on the on the on the word cloud is going to be the the largest font uh but what are your thoughts here i want to like step away so people don't you know associate me with only meta data so i want to talk about something a little bit slightly different uh dbengines.com has done an amazing job i think almost everyone knows that they chronicle all the major databases that are in use today in january of 2022 there are 381 databases on its list of ranked list of databases the largest category is rdbms the second largest category is actually divided into two property graphs and rdf graphs these two together make up the second largest number of data databases so talking about accolades here this is a problem the problem is that there's so many graph databases to choose from they come in different shapes and forms uh to bright's point there's so many query languages in rdbms is sql end of the story here we've got sci-fi we've got gremlin we've got gql and then your proprietary languages so i think there's a lot of disparity in this space but excellent all excellent points sanji i must say and that is a problem the languages need to be sorted and standardized and it needs people need to have a road map as to what they can do with it because as you say you can do so many things and so many of those things are unrelated that you sort of say well what do we use this for i'm reminded of the saying i learned a bunch of years ago when somebody said that the digital computer is the only tool man has ever devised that has no particular purpose all right guys we gotta we gotta move on to dave uh meninger uh we've heard about streaming uh your prediction is in that realm so please take it away sure so i like to say that historical databases are to become a thing of the past but i don't mean that they're going to go away that's not my point i mean we need historical databases but streaming data is going to become the default way in which we operate with data so in the next say three to five years i would expect the data platforms and and we're using the term data platforms to represent the evolution of databases and data lakes that the data platforms will incorporate these streaming capabilities we're going to process data as it streams into an organization and then it's going to roll off into historical databases so historical databases don't go away but they become a thing of the past they store the data that occurred previously and as data is occurring we're going to be processing it we're going to be analyzing we're going to be acting on it i mean we we only ever ended up with historical databases because we were limited by the technology that was available to us data doesn't occur in batches but we processed it in batches because that was the best we could do and it wasn't bad and we've continued to improve and we've improved and we've improved but streaming data today is still the exception it's not the rule right there's there are projects within organizations that deal with streaming data but it's not the default way in which we deal with data yet and so that that's my prediction is that this is going to change we're going to have um streaming data be the default way in which we deal with data and and how you label it what you call it you know maybe these databases and data platforms just evolve to be able to handle it but we're going to deal with data in a different way and our research shows that already about half of the participants in our analytics and data benchmark research are using streaming data you know another third are planning to use streaming technologies so that gets us to about eight out of ten organizations need to use this technology that doesn't mean they have to use it throughout the whole organization but but it's pretty widespread in its use today and has continued to grow if you think about the consumerization of i.t we've all been conditioned to expect immediate access to information immediate responsiveness you know we want to know if an uh item is on the shelf at our local retail store and we can go in and pick it up right now you know that's the world we live in and that's spilling over into the enterprise i.t world where we have to provide those same types of capabilities um so that's my prediction historical database has become a thing of the past streaming data becomes the default way in which we we operate with data all right thank you david well so what what say you uh carl a guy who's followed historical databases for a long time well one thing actually every database is historical because as soon as you put data in it it's now history it's no longer it no longer reflects the present state of things but even if that history is only a millisecond old it's still history but um i would say i mean i know you're trying to be a little bit provocative in saying this dave because you know as well as i do that people still need to do their taxes they still need to do accounting they still need to run general ledger programs and things like that that all involves historical data that's not going to go away unless you want to go to jail so you're going to have to deal with that but as far as the leading edge functionality i'm totally with you on that and i'm just you know i'm just kind of wondering um if this chain if this requires a change in the way that we perceive applications in order to truly be manifested and rethinking the way m applications work um saying that uh an application should respond instantly as soon as the state of things changes what do you say about that i i think that's true i think we do have to think about things differently that's you know it's not the way we design systems in the past uh we're seeing more and more systems designed that way but again it's not the default and and agree 100 with you that we do need historical databases you know that that's clear and even some of those historical databases will be used in conjunction with the streaming data right so absolutely i mean you know let's take the data warehouse example where you're using the data warehouse as context and the streaming data as the present you're saying here's a sequence of things that's happening right now have we seen that sequence before and where what what does that pattern look like in past situations and can we learn from that so tony bear i wonder if you could comment i mean if you when you think about you know real-time inferencing at the edge for instance which is something that a lot of people talk about um a lot of what we're discussing here in this segment looks like it's got great potential what are your thoughts yeah well i mean i think you nailed it right you know you hit it right on the head there which is that i think a key what i'm seeing is that essentially and basically i'm going to split this one down the middle is i don't see that basically streaming is the default what i see is streaming and basically and transaction databases um and analytics data you know data warehouses data lakes whatever are converging and what allows us technically to converge is cloud native architecture where you can basically distribute things so you could have you can have a note here that's doing the real-time processing that's also doing it and this is what your leads in we're maybe doing some of that real-time predictive analytics to take a look at well look we're looking at this customer journey what's happening with you know you know with with what the customer is doing right now and this is correlated with what other customers are doing so what i so the thing is that in the cloud you can basically partition this and because of basically you know the speed of the infrastructure um that you can basically bring these together and or and so and kind of orchestrate them sort of loosely coupled manner the other part is that the use cases are demanding and this is part that goes back to what dave is saying is that you know when you look at customer 360 when you look at let's say smart you know smart utility grids when you look at any type of operational problem it has a real-time component and it has a historical component and having predictives and so like you know you know my sense here is that there that technically we can bring this together through the cloud and i think the use case is that is that we we can apply some some real-time sort of you know predictive analytics on these streams and feed this into the transactions so that when we make a decision in terms of what to do as a result of a transaction we have this real time you know input sanjeev did you have a comment yeah i was just going to say that to this point you know we have to think of streaming very different because in the historical databases we used to bring the data and store the data and then we used to run rules on top uh aggregations and all but in case of streaming the mindset changes because the rules normally the inference all of that is fixed but the data is constantly changing so it's a completely reverse way of thinking of uh and building applications on top of that so dave menninger there seemed to be some disagreement about the default or now what kind of time frame are you are you thinking about is this end of decade it becomes the default what would you pin i i think around you know between between five to ten years i think this becomes the reality um i think you know it'll be more and more common between now and then but it becomes the default and i also want sanjeev at some point maybe in one of our subsequent conversations we need to talk about governing streaming data because that's a whole other set of challenges we've also talked about it rather in a two dimensions historical and streaming and there's lots of low latency micro batch sub second that's not quite streaming but in many cases it's fast enough and we're seeing a lot of adoption of near real time not quite real time as uh good enough for most for many applications because nobody's really taking the hardware dimension of this information like how do we that'll just happen carl so near real time maybe before you lose the customer however you define that right okay um let's move on to brad brad you want to talk about automation ai uh the the the pipeline people feel like hey we can just automate everything what's your prediction yeah uh i'm i'm an ai fiction auto so apologies in advance for that but uh you know um i i think that um we've been seeing automation at play within ai for some time now and it's helped us do do a lot of things for especially for practitioners that are building ai outcomes in the enterprise uh it's it's helped them to fill skills gaps it's helped them to speed development and it's helped them to to actually make ai better uh because it you know in some ways provides some swim lanes and and for example with technologies like ottawa milk and can auto document and create that sort of transparency that that we talked about a little bit earlier um but i i think it's there's an interesting kind of conversion happening with this idea of automation um and and that is that uh we've had the automation that started happening for practitioners it's it's trying to move outside of the traditional bounds of things like i'm just trying to get my features i'm just trying to pick the right algorithm i'm just trying to build the right model uh and it's expanding across that full life cycle of building an ai outcome to start at the very beginning of data and to then continue on to the end which is this continuous delivery and continuous uh automation of of that outcome to make sure it's right and it hasn't drifted and stuff like that and because of that because it's become kind of powerful we're starting to to actually see this weird thing happen where the practitioners are starting to converge with the users and that is to say that okay if i'm in tableau right now i can stand up salesforce einstein discovery and it will automatically create a nice predictive algorithm for me um given the data that i that i pull in um but what's starting to happen and we're seeing this from the the the companies that create business software so salesforce oracle sap and others is that they're starting to actually use these same ideals and a lot of deep learning to to basically stand up these out of the box flip a switch and you've got an ai outcome at the ready for business users and um i i'm very much you know i think that that's that's the way that it's going to go and what it means is that ai is is slowly disappearing uh and i don't think that's a bad thing i think if anything what we're going to see in 2022 and maybe into 2023 is this sort of rush to to put this idea of disappearing ai into practice and have as many of these solutions in the enterprise as possible you can see like for example sap is going to roll out this quarter this thing called adaptive recommendation services which which basically is a cold start ai outcome that can work across a whole bunch of different vertical markets and use cases it's just a recommendation engine for whatever you need it to do in the line of business so basically you're you're an sap user you look up to turn on your software one day and you're a sales professional let's say and suddenly you have a recommendation for customer churn it's going that's great well i i don't know i i think that's terrifying in some ways i think it is the future that ai is going to disappear like that but i am absolutely terrified of it because um i i think that what it what it really does is it calls attention to a lot of the issues that we already see around ai um specific to this idea of what what we like to call it omdia responsible ai which is you know how do you build an ai outcome that is free of bias that is inclusive that is fair that is safe that is secure that it's audible etc etc etc etc that takes some a lot of work to do and so if you imagine a customer that that's just a sales force customer let's say and they're turning on einstein discovery within their sales software you need some guidance to make sure that when you flip that switch that the outcome you're going to get is correct and that's that's going to take some work and so i think we're going to see this let's roll this out and suddenly there's going to be a lot of a lot of problems a lot of pushback uh that we're going to see and some of that's going to come from gdpr and others that sam jeeve was mentioning earlier a lot of it's going to come from internal csr requirements within companies that are saying hey hey whoa hold up we can't do this all at once let's take the slow route let's make ai automated in a smart way and that's going to take time yeah so a couple predictions there that i heard i mean ai essentially you disappear it becomes invisible maybe if i can restate that and then if if i understand it correctly brad you're saying there's a backlash in the near term people can say oh slow down let's automate what we can those attributes that you talked about are non trivial to achieve is that why you're a bit of a skeptic yeah i think that we don't have any sort of standards that companies can look to and understand and we certainly within these companies especially those that haven't already stood up in internal data science team they don't have the knowledge to understand what that when they flip that switch for an automated ai outcome that it's it's gonna do what they think it's gonna do and so we need some sort of standard standard methodology and practice best practices that every company that's going to consume this invisible ai can make use of and one of the things that you know is sort of started that google kicked off a few years back that's picking up some momentum and the companies i just mentioned are starting to use it is this idea of model cards where at least you have some transparency about what these things are doing you know so like for the sap example we know for example that it's convolutional neural network with a long short-term memory model that it's using we know that it only works on roman english uh and therefore me as a consumer can say oh well i know that i need to do this internationally so i should not just turn this on today great thank you carl can you add anything any context here yeah we've talked about some of the things brad mentioned here at idc in the our future of intelligence group regarding in particular the moral and legal implications of having a fully automated you know ai uh driven system uh because we already know and we've seen that ai systems are biased by the data that they get right so if if they get data that pushes them in a certain direction i think there was a story last week about an hr system that was uh that was recommending promotions for white people over black people because in the past um you know white people were promoted and and more productive than black people but not it had no context as to why which is you know because they were being historically discriminated black people being historically discriminated against but the system doesn't know that so you know you have to be aware of that and i think that at the very least there should be controls when a decision has either a moral or a legal implication when when you want when you really need a human judgment it could lay out the options for you but a person actually needs to authorize that that action and i also think that we always will have to be vigilant regarding the kind of data we use to train our systems to make sure that it doesn't introduce unintended biases and to some extent they always will so we'll always be chasing after them that's that's absolutely carl yeah i think that what you have to bear in mind as a as a consumer of ai is that it is a reflection of us and we are a very flawed species uh and so if you look at all the really fantastic magical looking supermodels we see like gpt three and four that's coming out z they're xenophobic and hateful uh because the people the data that's built upon them and the algorithms and the people that build them are us so ai is a reflection of us we need to keep that in mind yeah we're the ai's by us because humans are biased all right great okay let's move on doug henson you know a lot of people that said that data lake that term's not not going to not going to live on but it appears to be have some legs here uh you want to talk about lake house bring it on yes i do my prediction is that lake house and this idea of a combined data warehouse and data lake platform is going to emerge as the dominant data management offering i say offering that doesn't mean it's going to be the dominant thing that organizations have out there but it's going to be the predominant vendor offering in 2022. now heading into 2021 we already had cloudera data bricks microsoft snowflake as proponents in 2021 sap oracle and several of these fabric virtualization mesh vendors join the bandwagon the promise is that you have one platform that manages your structured unstructured and semi-structured information and it addresses both the beyond analytics needs and the data science needs the real promise there is simplicity and lower cost but i think end users have to answer a few questions the first is does your organization really have a center of data gravity or is it is the data highly distributed multiple data warehouses multiple data lakes on-premises cloud if it if it's very distributed and you you know you have difficulty consolidating and that's not really a goal for you then maybe that single platform is unrealistic and not likely to add value to you um you know also the fabric and virtualization vendors the the mesh idea that's where if you have this highly distributed situation that might be a better path forward the second question if you are looking at one of these lake house offerings you are looking at consolidating simplifying bringing together to a single platform you have to make sure that it meets both the warehouse need and the data lake need so you have vendors like data bricks microsoft with azure synapse new really to the data warehouse space and they're having to prove that these data warehouse capabilities on their platforms can meet the scaling requirements can meet the user and query concurrency requirements meet those tight slas and then on the other hand you have the or the oracle sap snowflake the data warehouse uh folks coming into the data science world and they have to prove that they can manage the unstructured information and meet the needs of the data scientists i'm seeing a lot of the lake house offerings from the warehouse crowd managing that unstructured information in columns and rows and some of these vendors snowflake in particular is really relying on partners for the data science needs so you really got to look at a lake house offering and make sure that it meets both the warehouse and the data lake requirement well thank you doug well tony if those two worlds are going to come together as doug was saying the analytics and the data science world does it need to be some kind of semantic layer in between i don't know weigh in on this topic if you would oh didn't we talk about data fabrics before common metadata layer um actually i'm almost tempted to say let's declare victory and go home in that this is actually been going on for a while i actually agree with uh you know much what doug is saying there which is that i mean we i remembered as far back as i think it was like 2014 i was doing a a study you know it was still at ovum predecessor omnia um looking at all these specialized databases that were coming up and seeing that you know there's overlap with the edges but yet there was still going to be a reason at the time that you would have let's say a document database for json you'd have a relational database for tran you know for transactions and for data warehouse and you had you know and you had basically something at that time that that resembles to do for what we're considering a day of life fast fo and the thing is what i was saying at the time is that you're seeing basically blur you know sort of blending at the edges that i was saying like about five or six years ago um that's all and the the lake house is essentially you know the amount of the the current manifestation of that idea there is a dichotomy in terms of you know it's the old argument do we centralize this all you know you know in in in in in a single place or do we or do we virtualize and i think it's always going to be a yin and yang there's never going to be a single single silver silver bullet i do see um that they're also going to be questions and these are things that points that doug raised they're you know what your what do you need of of of your of you know for your performance there or for your you know pre-performance characteristics do you need for instance hiking currency you need the ability to do some very sophisticated joins or is your requirement more to be able to distribute and you know distribute our processing is you know as far as possible to get you know to essentially do a kind of brute force approach all these approaches are valid based on you know based on the used case um i just see that essentially that the lake house is the culmination of it's nothing it's just it's a relatively new term introduced by databricks a couple years ago this is the culmination of basically what's been a long time trend and what we see in the cloud is that as we start seeing data warehouses as a checkbox item say hey we can basically source data in cloud and cloud storage and s3 azure blob store you know whatever um as long as it's in certain formats like you know like you know parquet or csv or something like that you know i see that as becoming kind of you know a check box item so to that extent i think that the lake house depending on how you define it is already reality um and in some in some cases maybe new terminology but not a whole heck of a lot new under the sun yeah and dave menger i mean a lot of this thank you tony but a lot of this is going to come down to you know vendor marketing right some people try to co-opt the term we talked about data mesh washing what are your thoughts on this yeah so um i used the term data platform earlier and and part of the reason i use that term is that it's more vendor neutral uh we've we've tried to uh sort of stay out of the the vendor uh terminology patenting world right whether whether the term lake house is what sticks or not the concept is certainly going to stick and we have some data to back it up about a quarter of organizations that are using data lakes today already incorporate data warehouse functionality into it so they consider their data lake house and data warehouse one in the same about a quarter of organizations a little less but about a quarter of organizations feed the data lake from the data warehouse and about a quarter of organizations feed the data warehouse from the data lake so it's pretty obvious that three quarters of organizations need to bring this stuff together right the need is there the need is apparent the technology is going to continue to verge converge i i like to talk about you know you've got data lakes over here at one end and i'm not going to talk about why people thought data lakes were a bad idea because they thought you just throw stuff in a in a server and you ignore it right that's not what a data lake is so you've got data lake people over here and you've got database people over here data warehouse people over here database vendors are adding data lake capabilities and data lake vendors are adding data warehouse capabilities so it's obvious that they're going to meet in the middle i mean i think it's like tony says i think we should there declare victory and go home and so so i it's just a follow-up on that so are you saying these the specialized lake and the specialized warehouse do they go away i mean johnny tony data mesh practitioners would say or or advocates would say well they could all live as just a node on the on the mesh but based on what dave just said are we going to see those all morph together well number one as i was saying before there's always going to be this sort of you know kind of you know centrifugal force or this tug of war between do we centralize the data do we do it virtualize and the fact is i don't think that work there's ever going to be any single answer i think in terms of data mesh data mesh has nothing to do with how you physically implement the data you could have a data mesh on a basically uh on a data warehouse it's just that you know the difference being is that if we use the same you know physical data store but everybody's logically manual basically governing it differently you know um a data mission is basically it's not a technology it's a process it's a governance process um so essentially um you know you know i basically see that you know as as i was saying before that this is basically the culmination of a long time trend we're essentially seeing a lot of blurring but there are going to be cases where for instance if i need let's say like observe i need like high concurrency or something like that there are certain things that i'm not going to be able to get efficiently get out of a data lake um and you know we're basically i'm doing a system where i'm just doing really brute forcing very fast file scanning and that type of thing so i think there always will be some delineations but i would agree with dave and with doug that we are seeing basically a a confluence of requirements that we need to essentially have basically the element you know the ability of a data lake and a data laid out their warehouse we these need to come together so i think what we're likely to see is organizations look for a converged platform that can handle both sides for their center of data gravity the mesh and the fabric vendors the the fabric virtualization vendors they're all on board with the idea of this converged platform and they're saying hey we'll handle all the edge cases of the stuff that isn't in that center of data gradient that is off distributed in a cloud or at a remote location so you can have that single platform for the center of of your your data and then bring in virtualization mesh what have you for reaching out to the distributed data bingo as they basically said people are happy when they virtualize data i i think yes at this point but to this uh dave meningas point you know they have convert they are converging snowflake has introduced support for unstructured data so now we are literally splitting here now what uh databricks is saying is that aha but it's easy to go from data lake to data warehouse than it is from data warehouse to data lake so i think we're getting into semantics but we've already seen these two converge so is that so it takes something like aws who's got what 15 data stores are they're going to have 15 converged data stores that's going to be interesting to watch all right guys i'm going to go down the list and do like a one i'm going to one word each and you guys each of the analysts if you wouldn't just add a very brief sort of course correction for me so sanjeev i mean governance is going to be the maybe it's the dog that wags the tail now i mean it's coming to the fore all this ransomware stuff which really didn't talk much about security but but but what's the one word in your prediction that you would leave us with on governance it's uh it's going to be mainstream mainstream okay tony bear mesh washing is what i wrote down that's that's what we're going to see in uh in in 2022 a little reality check you you want to add to that reality check is i hope that no vendor you know jumps the shark and calls their offering a data mesh project yeah yeah let's hope that doesn't happen if they do we're going to call them out uh carl i mean graph databases thank you for sharing some some you know high growth metrics i know it's early days but magic is what i took away from that it's the magic database yeah i would actually i've said this to people too i i kind of look at it as a swiss army knife of data because you can pretty much do anything you want with it it doesn't mean you should i mean that's definitely the case that if you're you know managing things that are in a fixed schematic relationship probably a relational database is a better choice there are you know times when the document database is a better choice it can handle those things but maybe not it may not be the best choice for that use case but for a great many especially the new emerging use cases i listed it's the best choice thank you and dave meninger thank you by the way for bringing the data in i like how you supported all your comments with with some some data points but streaming data becomes the sort of default uh paradigm if you will what would you add yeah um i would say think fast right that's the world we live in you got to think fast fast love it uh and brad shimon uh i love it i mean on the one hand i was saying okay great i'm afraid i might get disrupted by one of these internet giants who are ai experts so i'm gonna be able to buy instead of build ai but then again you know i've got some real issues there's a potential backlash there so give us the there's your bumper sticker yeah i i would say um going with dave think fast and also think slow uh to to talk about the book that everyone talks about i would say really that this is all about trust trust in the idea of automation and of a transparent invisible ai across the enterprise but verify verify before you do anything and then doug henson i mean i i look i think the the trend is your friend here on this prediction with lake house is uh really becoming dominant i liked the way you set up that notion of you know the the the data warehouse folks coming at it from the analytics perspective but then you got the data science worlds coming together i still feel as though there's this piece in the middle that we're missing but your your final thoughts we'll give you the last well i think the idea of consolidation and simplification uh always prevails that's why the appeal of a single platform is going to be there um we've already seen that with uh you know hadoop platforms moving toward cloud moving toward object storage and object storage becoming really the common storage point for whether it's a lake or a warehouse uh and that second point uh i think esg mandates are uh are gonna come in alongside uh gdpr and things like that to uh up the ante for uh good governance yeah thank you for calling that out okay folks hey that's all the time that that we have here your your experience and depth of understanding on these key issues and in data and data management really on point and they were on display today i want to thank you for your your contributions really appreciate your time enjoyed it thank you now in addition to this video we're going to be making available transcripts of the discussion we're going to do clips of this as well we're going to put them out on social media i'll write this up and publish the discussion on wikibon.com and siliconangle.com no doubt several of the analysts on the panel will take the opportunity to publish written content social commentary or both i want to thank the power panelist and thanks for watching this special cube presentation this is dave vellante be well and we'll see you next time [Music] you

Published Date : Jan 8 2022

SUMMARY :

the end of the day need to speak you

ENTITIES

Entity	Category	Confidence
381 databases	QUANTITY	0.99+
2014	DATE	0.99+
2022	DATE	0.99+
2021	DATE	0.99+
january of 2022	DATE	0.99+
100 users	QUANTITY	0.99+
jamal dagani	PERSON	0.99+
last week	DATE	0.99+
dave meninger	PERSON	0.99+
sanji	PERSON	0.99+
second question	QUANTITY	0.99+
15 converged data stores	QUANTITY	0.99+
dave vellante	PERSON	0.99+
microsoft	ORGANIZATION	0.99+
three	QUANTITY	0.99+
sanjeev	PERSON	0.99+
2023	DATE	0.99+
15 data stores	QUANTITY	0.99+
siliconangle.com	OTHER	0.99+
last year	DATE	0.99+
sanjeev mohan	PERSON	0.99+
six	QUANTITY	0.99+
two	QUANTITY	0.99+
carl	PERSON	0.99+
tony	PERSON	0.99+
carl olufsen	PERSON	0.99+
six years	QUANTITY	0.99+
david	PERSON	0.99+
carlos specter	PERSON	0.98+
both sides	QUANTITY	0.98+
2010s	DATE	0.98+
first backlash	QUANTITY	0.98+
five years	QUANTITY	0.98+
today	DATE	0.98+
dave	PERSON	0.98+
each	QUANTITY	0.98+
three quarters	QUANTITY	0.98+
first	QUANTITY	0.98+
single platform	QUANTITY	0.98+
lake house	ORGANIZATION	0.98+
both	QUANTITY	0.98+
this year	DATE	0.98+
doug	PERSON	0.97+
one word	QUANTITY	0.97+
this year	DATE	0.97+
wikibon.com	OTHER	0.97+
one platform	QUANTITY	0.97+
39	QUANTITY	0.97+
about 600 percent	QUANTITY	0.97+
two analysts	QUANTITY	0.97+
ten years	QUANTITY	0.97+
single platform	QUANTITY	0.96+
five	QUANTITY	0.96+
one	QUANTITY	0.96+
three quarters	QUANTITY	0.96+
california	LOCATION	0.96+
google	ORGANIZATION	0.96+
single	QUANTITY	0.95+

Tomer Shiran, Dremio | AWS re:Invent 2021

>>Good morning. Welcome back to the cubes. Continuing coverage of AWS reinvent 2021. I'm Lisa Martin. We have two live sets here. We've got over a hundred guests on the program this week with our live sets of remote sets, talking about the next decade in cloud innovation. And I'm pleased to be welcoming back. One of our cube alumni timbers. She ran the founder and CPO of Jenny-O to the program. Tom is going to be talking about why 2022 is the year open data architectures surpass the data warehouse Timur. Welcome back to the >>Cube. Thanks for having me. It's great to be here. It's >>Great to be here at a live event in person, my goodness, sitting side by side with guests. Talk to me a little bit about before we kind of dig into the data lake house versus the data warehouse. I want to, I want to unpack that with you. Talk to me about what what's going on at Jemena you guys were on the program earlier this summer, but what are some of the things going on right now in the fall of 2021? >>Yeah, for us, it's a big year of, uh, a lot of product news, a lot of new products, new innovation, a company's grown a lot. We're, uh, you know, probably three times bigger than we were a year ago. So a lot of, a lot of new, new folks on the team and, uh, many, many new customers. >>It's good, always new customers, especially during the last 22 months, which have been obviously incredibly challenging, but I want to unpack this, the difference between a data lake and data lake house, but I love the idea of a lake house by the way, but talk to me about what the differences are similarities and how customers are benefiting. Sure. Yeah. >>I think you could think of the lake house as kind of the evolution of the lake, right? So we have, we've had data lakes for a while. Now, the transition to the cloud made them a lot more powerful and now a lot of new capabilities coming into the world of data lakes really make the, that whole kind of concept that whole architecture, much more powerful to the point that you really are not going to need a data warehouse anymore. Right. And so it kind of gives you the best of both worlds, all the advantages that we had with data lakes, the flexibility to use different processing engines, to have data in your own account and open formats, um, all those benefits, but also the benefits that you had with warehouses, where you could do transactions and get high performance for your, uh, BI workloads and things like that. So the lake house makes kind of both of those come together and gives you the, the benefits of both >>Elizabeth talk to me about from a customer lens perspective, what are some of the key benefits and how does the customer go about from say they've got data warehouses, data lakes to actually evolving to the lake house. >>You know, data warehouses have been around forever, right? And you know, there's, there's been some new innovation there as we've kind of moved to the cloud, but fundamentally there are very close and very proprietary architecture that gets very expensive quickly. And so, you know, with a data warehouse, you have to take your data and load it into the warehouse, right. You know, whether that's a, you know, Terra data or snowflake or any, any other, uh, you know, database out there, that's, that's what you do. You bring the data into the engine. Um, the data lake house is a really different architecture. It's one where you actually, you're having, you have data as its own tier, right? Stored in open formats, things like parquet files and iceberg tables. And you're basically bringing the engines to the data instead of the data to the engine. And so now all of a sudden you can start to take advantage of all this innovation that's happening on the same set of data without having to copy and move it around. So whether that's, you know, Dremio for high performance, uh, BI workloads and SQL type of analysis, a spark for kind of batch processing and machine learning, Flink for streaming. So lots of different technologies that you can use on the, on the same data and the data stays in the customer's own account, right? So S3 effectively becomes their new data warehouse. >>Okay. So it can imagine during the last 22 months of this scattered work from Eddie, and we're still in this work from anywhere environment with so much data being generated at the edge of the edge, expanding that bringing the engines to the data is probably now more timely than ever. >>Yeah. I think the, the growth in data, uh, you see it everywhere, right? That that's the reason so many companies like ourselves are doing so well. Right? It's, it's, there's so much new data, so many new use cases and every company wants to be data-driven right. They all want to be, you know, to, to democratize data within the organization. Um, you know, but you need the platforms to be able to do that. Right. And so, uh, that's very hard if you have to constantly move data around, if you have to take your data, you know, which maybe is landing in S3, but move it into, you know, subsets of it into a data warehouse. And then from there move, you know, substance of that into, you know, BI extracts, right? Tableau extracts power BI imports, and you have to create cubes and lots of copies within the data warehouse. There's no way you're going to be able to provide self-service and data democratization. And so really requires a new architecture. Um, and that's one of the main things that we've been focused on at Dremio, um, is really taking the, the, the lake house and the lake and making it, not just something that data scientists use for, you know, really kind of advanced use cases, but even your production BI workloads can actually now run on the lake house when you're using a SQL technology. Like, and then >>It's really critical because as you talked about this, you know, companies, every company, these days is a data company. If they're not, they have to be, or there's a competitor in the rear view mirror that is going to be able to take over what they're doing. So this really is really critical, especially considering another thing that we learned in the last 22 months is that there's no real-time data access is no longer, a nice to have. It's really an essential for businesses in any organization. >>I think, you know, we, we see it even in our own company, right? The folks that are joining the workforce now, they, they learn sequel in school, right. They, they, they don't want to report on their desk, printed out every Monday morning. They want access to the database. How do I connect my whatever tool I want, or even type sequel by hand. And I want access to the data and I want to just use it. Right. And I want the performance of course, to be fast because otherwise I'll get frustrated and I won't use it, which has been the status quo for a long time. Um, and that's basically what we're solving >>The lake house versus a data warehouse, better able to really facilitate data democratization across an organization. >>Yeah. Because there's a big, you know, people don't talk a lot about the story before the story, right. With, with a data warehouse, the data never starts there. Right. You typically first have your data in something like an S3 or perhaps in other databases, right. And then you have to kind of ETL at all into, um, into that warehouse. And that's a lot of work. And typically only a small subset of the data gets ETL into that data warehouse. And then the user wants to query something that's not in the warehouse. And somebody has to go from engineering, spend, you know, a month or two months, you know, respond to that ticket and wiring up some new ETL, uh, to get the data in. And so it's a big problem, right? And so if you can have a system that can query the data directly in S3 and even join it with sources, uh, outside of that things like your Oracle database, your, your SQL server database here, you know, Mongo, DB, et cetera. Well, now you can really have the ability to expose data to your, to your users within the company and make it very self-service. They can, they can query any data at any time and get a fast response time that that's, that's what they need >>At self-service is key there. Speaking of self-service and things that are new. I know you guys dromio cloud launched that recently, new SAS offering. Talk to me about that. What's going on there. Yeah. >>We want to stream your cloud. We, we spent about two years, um, working on that internally and, uh, really the goal was to simplify how we deliver all of the, kind of the benefits that we've had in our product. Right. Sub-second response times on the lake, a semantic layer, the ability to connect to multiple sources, but take away the pain of having to, you know, install and manage software. Right. And so we did it in a way that the user doesn't have to think about versions. They don't have to think about upgrades. They don't have to monitor anything. It's basically like running and using Gmail. Right? You log in, you, you get to use it, right. You don't have to be very sophisticated. There's no, not a lot of administration you have to do. Um, it basically makes it a lot, a lot simpler. >>And what's the adoption been like so far? >>It's been great. It's been limited availability, but we've been onboarding customers, uh, every week now. Um, many startups, many of the world's largest companies. So that's been, that's been really exciting actually. >>So quite a range of customers. And one of the things, it sounds like you want me to has grown itself during the pandemic. We've seen acceleration of, of that, of, of, uh, startups, of a lot of companies, of cloud adoption of migration. What are some, how have your customer conversations changed in the last 22 months as businesses and every industry kind of scrambled in the beginning to, to survive and now are realizing that they need to modernize, to thrive and to be competitive and to have competitive advantage. >>I think I've seen a few different trends here. One is certainly, there's been a lot of, uh, acceleration of movement to the cloud, right? With, uh, uh, you know, how different businesses have been impacted. It's required them to be more agile, more elastic, right. They don't necessarily know how much workload they're gonna have at any point in time. So having that flexibility, both in terms of the technology that can, you know, with Dremio cloud, we scale, for example, infinitely, like you can have, you know, one query a day, or you can have a thousand queries a second and the system just takes care of it. Right. And so that's really important to these companies that are going through, you know, being impacted in various different ways, right? You had the companies, you know, the Peloton and zooms of the world that were business was exploding. >>And then of course, you know, the travel and hospitality industries, and that went to zero, all of a sudden it's been recovering nicely, uh, you know, since then, but so that flexibility, um, has been really important to customers. I think the other thing is just they've realized that they have to leverage data, right? Because in parallel to this pandemic has been also really a boom in technology, right? And so every industry is being disrupted by new startups, whether it's the insurance industry, the financial services, a lot of InsureTech, FinTech, you know, different, uh, companies that are trying to take advantage of data. So if you, as a, as an enterprise are not doing that, you know, that's a problem. >>It is a problem. It's definitely something that I think every business and every industry needs to be very acutely aware of because from a competitive advantage perspective, you know, there's someone in that rear view mirror who is going to be focused on data. I have a real solid, modern data strategy. That's going to be able to take over if a company is resting on its laurels at all. So here we are at reinvent, they talked a lot about, um, I just came off of Adam psyllid speeds. So Lipsey's keynote. But talk to me about the jumbo AWS partnership. I know AWS its partner ecosystem is huge. You're one of the partners, but talk to me about what's going on with the partnership. How long have you guys been partners? What are the advantages for your customers? >>You know, we've been very close partners with AWS for, for a number of years now, and it kind of spans many different parts of AWS from kind of the, uh, the engineering organization. So very close relationship with the S3 team, the C2 team, uh, you know, just having dinner last night with, uh, Kevin Miller, the GM of S3. Um, and so that's kind of one side of things is really the engineering integration. You know, we're the first technology to integrate with AWS lake formation, which is Amazon's data lake security technology. So we do a lot of work together on kind of upcoming features that Amazon is releasing. Um, and then also they've been really helpful on the go-to-market side of things on the sales and marketing, um, whether it's, you know, blogs on the Amazon blog, where their sales teams actually promoting Dremio to their customers, um, uh, to help them be successful. So it's really been a good, good partnership. >>And there they are, every time I talked to somebody from Amazon, we always talk about their kind of customer first focus, their customer obsession sounds like you're, there's deep alignment on from the technical engineering perspective, sales and marketing. Talk to me a little bit about cultural alignment, because when you're going into customer conversations, I imagine they want to see one unified team. >>Yeah. You know, I think Amazon does have that customer first and obviously we do as well. And we, you know, we have to right as a, as a startup for us, you know, if a customer has a problem, the whole company will jump on that problem. Right. So that's where we call it customer obsession internally. Um, and I think that's very much what we've seen, you know, with, with AWS as well as the desire to make the customer successful comes before. Okay. How does this affect a specific Amazon product? Right? Because anytime a customer is, uh, you know, using Dremio on AWS, they're also consuming many different AWS services and they're bringing data into AWS. And so, um, I, I think for both of us, it's all about how do we solve customer problems and make them successful with their data in this case. Yup. >>Solving those customer problems is the whole reason that we're all here. Right. Talk to me a little bit about, um, as we have just a few more minutes here, we, when we hear terms like, future-proof, I always want to dig in with, with folks like yourself, chief product officers, what does it actually mean? How do you enable businesses to create these future-proof data architectures that are gonna allow them to scale and be really competitive? Sure. >>So yeah, I think many companies have been, have experienced. What's known as lock-in right. They, they invest in some technology, you know, we've seen this with, you know, databases and data warehouses, right? You, you start using that and you can really never get off and prices go up and you find out that you're spending 10 times more, especially now with the cloud data warehouses 10 times more than you thought you were going to be spending. And at that point it becomes very difficult. Right? What do you do? And so, um, one of the great things about the data lake and the lake house architecture is that the data stays stored in the customer's own account. Right? It's in their S3 buckets in source formats, like parquet files and iceberg tables. Um, and they can use many different technologies on that. So, you know, today the best technology for, for, you know, sequel and, you know, powering your, your mission critical BI is, is Dremio, but tomorrow they might be something else, right. >>And that customer can then take that, uh, uh, that company can take that new technology point at the same data and start using it right. That they don't have to go through some really crazy migration process. And, you know, we see that with Teradata data and Oracle, right? The, the, the old school vendors, um, that's always been a pain. And now it is with the, with the newer, uh, cloud data warehouses, you see a lot of complaints around that, so that the lake house is fundamentally designed. Especially if you choose open source formats, like iceberg tables, as opposed to say a Delta, like you're, you're really, you know, future-proofing yourself. Right. Um, >>Got it. Talk to me about some of the things as we wrap up here that, that attendees can learn and see and touch and feel and smell at the jumbo booth at this reinvent. >>Yeah. I think there's a, there's a few different things they can, uh, they can watch, uh, watch a demo or play around with the dremmel cloud and they can talk to our team about what we're doing with Apache iceberg. It's a iceberg to me is one of the more exciting projects, uh, in this space because, you know, it's just created by Netflix and apple Salesforce, AWS just announced support for iceberg with that, with their products, Athena and EMR. So it's really kind of emerging as the standard table format, the way to represent data in open formats in S3. We've been behind iceberg now for, for a while. And so that to us is very exciting. We're happy to chat with folks at the booth about that. Um, Nessie is another project that we created an source project for, uh, really providing a good experience for your data, where you have version control and branching, and kind of trying to reinvent, uh, data engineering, data management. So that's another cool project that there, uh, we can talk about at the booth. >>So lots of opportunity there for attendees to learn even thank you, Tomer for joining me on the program today, talking about the difference between a data warehouse data lake, the lake house, did a great job explaining that Jamil cloud what's going on and how you guys are deepening that partnership with AWS. We appreciate your time. Thank you. Thanks for having me. My pleasure for Tomer. She ran I'm Lisa Martin. You're watching the cube. Our coverage of AWS reinvent continues after this.

Published Date : Nov 30 2021

SUMMARY :

She ran the founder and CPO of Jenny-O to the program. It's great to be here. Talk to me about what what's going on at Jemena you guys were on the program earlier this summer, We're, uh, you know, probably three times bigger than we were a year data lake house, but I love the idea of a lake house by the way, but talk to me about what the differences are similarities So the lake house makes kind of both of those come together and gives you the, the benefits of both Elizabeth talk to me about from a customer lens perspective, what are some of the key benefits and how does the customer go You know, whether that's a, you know, Terra data or snowflake or any, any other, uh, you know, database out there, expanding that bringing the engines to the data is probably now more timely than ever. And so, uh, that's very hard if you have to constantly move data around, if you have to take your data, It's really critical because as you talked about this, you know, companies, every company, these days is a data company. I think, you know, we, we see it even in our own company, right? The lake house versus a data warehouse, better able to really facilitate data democratization across spend, you know, a month or two months, you know, respond to that ticket and wiring up some new ETL, I know you guys dromio cloud launched that recently, to, you know, install and manage software. Um, many startups, many of the world's largest companies. And one of the things, it sounds like you want me to has grown itself during the pandemic. So having that flexibility, both in terms of the technology that can, you know, And then of course, you know, the travel and hospitality industries, and that went to zero, all of a sudden it's been recovering nicely, You're one of the partners, but talk to me about what's going on with the partnership. um, whether it's, you know, blogs on the Amazon blog, where their sales teams actually And there they are, every time I talked to somebody from Amazon, we always talk about their kind of customer first focus, And we, you know, we have to right as a, as a startup for us, you know, if a customer has a problem, the whole company will jump on that problem. How do you enable businesses to create these future-proof They, they invest in some technology, you know, we've seen this with, you know, databases and data warehouses, And, you know, we see that with Teradata data and Oracle, right? Talk to me about some of the things as we wrap up here that, that attendees can learn and see and uh, in this space because, you know, it's just created by Netflix and apple Salesforce, So lots of opportunity there for attendees to learn even thank you, Tomer for joining me on the program

ENTITIES

Entity	Category	Confidence
Lisa Martin	PERSON	0.99+
Amazon	ORGANIZATION	0.99+
Kevin Miller	PERSON	0.99+
AWS	ORGANIZATION	0.99+
Tom	PERSON	0.99+
10 times	QUANTITY	0.99+
10 times	QUANTITY	0.99+
Tomer	PERSON	0.99+
Oracle	ORGANIZATION	0.99+
Elizabeth	PERSON	0.99+
two months	QUANTITY	0.99+
Tomer Shiran	PERSON	0.99+
Teradata	ORGANIZATION	0.99+
Netflix	ORGANIZATION	0.99+
both	QUANTITY	0.99+
Lipsey	PERSON	0.99+
Dremio	PERSON	0.99+
tomorrow	DATE	0.99+
apple	ORGANIZATION	0.99+
a month	QUANTITY	0.99+
One	QUANTITY	0.99+
fall of 2021	DATE	0.98+
today	DATE	0.98+
Eddie	PERSON	0.98+
one	QUANTITY	0.98+
both worlds	QUANTITY	0.98+
Adam psyllid	PERSON	0.98+
Gmail	TITLE	0.98+
S3	TITLE	0.97+
next decade	DATE	0.97+
SQL	TITLE	0.97+
a year ago	DATE	0.97+
three times	QUANTITY	0.97+
two live sets	QUANTITY	0.97+
2022	DATE	0.97+
this week	DATE	0.96+
iceberg	TITLE	0.96+
Dremio	ORGANIZATION	0.96+
first	QUANTITY	0.96+
about two years	QUANTITY	0.95+
Apache	ORGANIZATION	0.95+
Tableau	TITLE	0.95+
Monday morning	DATE	0.94+
SAS	ORGANIZATION	0.94+
one query	QUANTITY	0.94+
Jemena	ORGANIZATION	0.94+
earlier this summer	DATE	0.93+
second	QUANTITY	0.93+
first focus	QUANTITY	0.92+
last 22 months	DATE	0.91+
Delta	ORGANIZATION	0.9+
zero	QUANTITY	0.9+
2021	DATE	0.89+
last night	DATE	0.87+
a thousand queries	QUANTITY	0.85+
Mongo	ORGANIZATION	0.85+
a day	QUANTITY	0.84+
first technology	QUANTITY	0.82+
pandemic	EVENT	0.81+
a second	QUANTITY	0.8+

Greg Rokita, Edmunds.com & Joel Minnick, Databricks | AWS re:Invent 2021

>>We'll come back to the cubes coverage of AWS reinvent 2021, the industry's most important hybrid event. Very few hybrid events, of course, in the last two years. And the cube is excited to be here. Uh, this is our ninth year covering AWS reinvent this the 10th reinvent we're here with Joel Minnick, who the vice president of product and partner marketing at smoking hot company, Databricks and Greg Rokita, who is executive director of technology at Edmonds. If you're buying a car or leasing a car, you gotta go to Edmund's. We're gonna talk about busting data silos, guys. Great to see you again. >>Welcome. Welcome. Glad to be here. >>All right. So Joel, what the heck is a lake house? This is all over the place. Everybody's talking about lake house. What is it? >>And it did well in a nutshell, a Lakehouse is the ability to have one unified platform to handle all of your traditional analytics workloads. So your BI and reporting Trisha, the lake, the workloads that you would have for your data warehouse on the same platform as the workloads that you would have for data science and machine learning. And so if you think about kind of the way that, uh, most organizations have built their infrastructure in the cloud today, what we have is generally customers will land all their data in a data lake and a data lake is fantastic because it's low cost, it's open. It's able to handle lots of different kinds of data. Um, but the challenges that data lakes have is that they don't necessarily scale very well. It's very hard to govern data in a data lake house. It's very hard to manage that data in a data lake, sorry, in a, in a data lake. >>And so what happens is that customers then move the data out of a data lake into downstream systems and what they tend to move it into our data warehouses to handle those traditional reporting kinds of workloads that they have. And they do that because data warehouses are really great at being able to have really great scale, have really great performance. The challenge though, is that data warehouses really only work for structured data. And regardless of what kind of data warehouse you adopt, all data warehouse and platforms today are built on some kind of proprietary format. So once you've put that data into the data warehouse, that's, that is kind of what you're locked into. The promise of the data lake house was to say, look, what if we could strip away all of that complexity and having to move data back and forth between all these different systems and keep the data exactly where it is today and where it is today is in the data lake. >>And then being able to apply a transaction layer on top of that. And the Databricks case, we do that through a technology and open source technology called data lake, or sorry, Delta lake. And what Delta lake allows us to do is when you need it, apply that performance, that reliability, that quality, that scale that you would expect out of a data warehouse directly on your data lake. And if I can do that, then what I'm able to do now is operate from one single source of truth that handles all of my analytics workloads, both my traditional analytics workloads and my data science and machine learning workloads, and being able to have all of those workloads on one common platform. It means that now not only do I get much, much more simple in the way that my infrastructure works and therefore able to operate at much lower costs, able to get things to production much, much faster. >>Um, but I'm also able to now to leverage open source in a much bigger way being that lake house is inherently built on an open platform. Okay. So I'm no longer locked into any kind of data format. And finally, probably one of the most, uh, lasting benefits of a lake house is that all the roles that have to take that have to touch my data for my data engineers, to my data analyst, my data scientists, they're all working on the same data, which means that collaboration that has to happen to go answer really hard problems with data. I'm now able to do much, much more easy because those silos that traditionally exist inside of my environment no longer have to be there. And so Lakehouse is that is the promise to have one single source of truth, one unified platform for all of my data. Okay, >>Great. Thank you for that very cogent description of what a lake house is now. Let's I want to hear from the customer to see, okay, this is what he just said. True. So actually, let me ask you this, Greg, because the other problem that you, you didn't mention about the data lake is that with no schema on, right, it gets messy and Databricks, I think, correct me if I'm wrong, has begun to solve that problem, right? Through series of tooling and AI. That's what Delta liked us. It's a man, like it's a managed service. Everybody thought you were going to be like the cloud era of spark and Brittany Britain, a brilliant move to create a managed service. And it's worked great. Now everybody has a managed service, but so can you paint a picture at Edmonds as to what you're doing with, maybe take us through your journey the early days of a dupe, a data lake. Oh, that sounds good. Throw it in there, paint a picture as to how you guys are using data and then tie it into what y'all just said. >>As Joel said, that they'll the, it simplifies the architecture quite a bit. Um, in a modern enterprise, you have to deal with a variety of different data sources, structured semi-structured and unstructured in the form of images and videos. And with Delta lake and built a lake, you can have one system that handles all those data sources. So what that does is that basically removes the issue of multiple systems that you have to administer. It lowers the cost, and it provides consistency. If you have multiple systems that deal with data, you always arise as the issue as to which data has to be loaded into which system. And then you have issues with consistency. Once you have issues with consistency, business users, as analysts will stop trusting your data. So that was very critical for us to unify the system of data handling in the one place. >>Additionally, you have a massive scalability. So, um, I went to the talk with from apple saying that, you know, he can process two years worth of data. Instead of just two days in an Edmonds, we have this use case of backfilling the data. So often we changed the logic and went to new. We need to reprocess massive amounts of data with the lake house. We can reprocess months worth of data in, in a matter of minutes or hours. And additionally at the data lake houses based on open, uh, open standards, like parquet that allowed us, allowed us to basically hope open source and third-party tools on top of the Delta lake house. Um, for example, a Mattson, we use a Matson for data discovery, and finally, uh, the lake house approach allows us for different skillsets of people to work on the same source data. We have analysts, we have, uh, data engineers, we have statisticians and data scientists using their own programming languages, but working on the same core of data sets without worrying about duplicating data and consistency issues between the teams. >>So what, what is, what are the primary use cases where you're using house Lakehouse Delta? >>So, um, we work, uh, we have several use cases, one of them more interesting and important use cases as vehicle pricing, you have used the Edmonds. So, you know, you go to our website and you use it to research vehicles, but it turns out that pricing and knowing whether you're getting a good or bad deal is critical for our, uh, for our business. So with the lake house, we were able to develop a data pipeline that ingests the transactions, curates the transactions, cleans them, and then feeds that curated a curated feed into the machine learning model that is also deployed on the lake house. So you have one system that handles this huge complexity. And, um, as you know, it's very hard to find unicorns that know all those technologies, but because we have flexibility of using Scala, Java, uh, Python and SQL, we have different people working on different parts of that pipeline on the same system and on the same data. So, um, having Lakehouse really enabled us to be very agile and allowed us to deploy new sources easily when we, when they arrived and fine tune the model to decrease the error rates for the price prediction. So that process is ongoing and it's, it's a very agile process that kind of takes advantage of the, of the different skill sets of different people on one system. >>Because you know, you guys democratized by car buying, well, at least the data around car buying because as a consumer now, you know, I know what they're paying and I can go in of course, but they changed their algorithms as well. I mean, the, the dealers got really smart and then they got kickbacks from the manufacturer. So you had to get smarter. So it's, it's, it's a moving target, I guess. >>Great. The pricing is actually very complex. Like I, I don't have time to explain it to you, but knowing, especially in this crazy market inflationary market where used car prices are like 38% higher year over year, and new car prices are like 10% higher and they're changing rapidly. So having very responsive pricing model is, is extremely critical. Uh, you, I don't know if you're familiar with Zillow. I mean, they almost went out of business because they mispriced their, uh, their houses. So, so if you own their stock, you probably under shorthand of it, but, you know, >>No, but it's true because I, my lease came up in the middle of the pandemic and I went to Edmonds, say, what's this car worth? It was worth like $7,000. More than that. Then the buyout costs the residual value. I said, I'm taking it, can't pass up that deal. And so you have to be flexible. You're saying the premises though, that open source technology and Delta lake and lake house enabled that flexible. >>Yes, we are able to ingest new transactions daily recalculate our model within less than an hour and deploy the new model with new pricing, you know, almost real time. So, uh, in this environment, it's very critical that you kind of keep up to date and ingest their latest transactions as they prices change and recalculate your model that predicts the future prices. >>Because the business lines inside of Edmond interact with the data teams, you mentioned data engineers, data scientists, analysts, how do the business people get access to their data? >>Originally, we only had a core team that was using Lakehouse, but because the usage was so powerful and easy, we were able to democratize it across our units. So other teams within software engineering picked it up and then analysts picked it up. And then even business users started using the dashboarding and seeing, you know, how the price has changed over time and seeing other, other metrics within the, >>What did that do for data quality? Because I feel like if I'm a business person, I might have context of the data that an analyst might not have. If they're part of a team that's servicing all these lines of business, did you find that the data quality, the collaboration affected data? >>Th the biggest thing for us was the fact that we don't have multiple systems now. So you don't have to load the data. Whenever you have to load the data from one system to another, there is always a lag. There's always a delay. There is always a problematic job that didn't do the copy correctly. And the quality is uncertain. You don't know which system tells you the truth. Now we just have one layer of data. Whether you do reports, whether you're data processing or whether you do modeling, they all read the same data. And the second thing is that with the dashboarding capabilities, people that were not very technical that before we could only use Tableau and Tableau is not the easiest thing to use as if you're not technical. Now they can use it. So anyone can see how our pricing data looks, whether you're an executive, whether you're an analyst or a casual business users, >>But Hey, so many questions, you guys are gonna have to combat. I'm gonna run out of time, but you now allow a consumer to buy a car directly. Yes. Right? So that's a new service that you launched. I presume that required new data. We give, we >>Give consumers offers. Yes. And, and that offer you >>Offered to buy my league. >>Exactly. And that offer leverages the pricing that we develop on top of the lake house. So the most important thing is accurately giving you a very good offer price, right? So if we give you a price, that's not so good. You're going to go somewhere else. If we give you price, that's too high, we're going to go bankrupt like Zillow debt, right. >>It took to enable that you're working off the same dataset. Yes. You're going to have to spin up a, did you have to inject new data? Was there a new data source that we're working on? >>Once we curate the data sources and once we clean it, we see the directly to the model. And all of those components are running on the lake house, whether you're curating the data, cleaning it or running the model. The nice thing about lake house is that machine learning is the first class citizen. If you use something like snowflake, I'm not going to slam snowflake here, but you >>Have two different use case. You have >>To, you have to load it into a different system later. You have to load it into a different system. So like good luck doing machine learning on snowflake. Right. >>Whereas, whereas Databricks, that's kind of your raison d'etre >>So what are your, your, your data engineer? I feel like I should be a salesman or something. Yeah. I'm not, I'm not saying that. Just, just because, you know, I was told to, like, I'm saying it because of that's our use case, >>Your use case. So question for each of you, what, what business results did you see when you went to kind of pre lake house, post lake house? What are the, any metrics you can share? And then I wonder, Joel, if you could share a sort of broader what you're seeing across your customer base, but Greg, what can you tell us? Well, >>Uh, before their lake house, we had two different systems. We had one for processing, which was still data breaks. And the second one for serving and we iterated over Nateeza or Redshift, but we figured that maintaining two different system and loading data from one to the other was a huge overhead administration security costs. Um, the fact that you had to consistency issues. So the fact that you can have one system, um, with, uh, centralized data, solves all those issues. You have to have one security mechanism, one administrative mechanism, and you don't have to load the data from one system to the other. You don't have to make compromises. >>It's scale is not a problem because of the cloud, >>Because you can spend clusters at will for different use cases. So your clusters are independent. You have processing clusters that are not affecting your serving clusters. So, um, in the past, if you were running a serving, say on Nateeza or Redshift, if you were doing heavy processing, your reports would be affected, but now all those clusters are separated. So >>Consumer data consumer can take that data from the producer independ >>Using its own cluster. Okay. >>Yeah. I'll give you the final word, Joel. I know it's been, I said, you guys got to come back. This is what have you seen broadly? >>Yeah. Well, I mean, I think Greg's point about scale. It's an interesting one. So if you look at cross the entire Databricks platform, the platform is launching 9 million VMs every day. Um, and we're in total processing over nine exabytes a month. So in terms of just how much data the platform is able to flow through it, uh, and still maintain a extremely high performance is, is bar none out there. And then in terms of, if you look at just kind of the macro environment of what's happening out there, you know, I think what's been most exciting to watch or what customers are experiencing traditionally or, uh, on the traditional data warehouse and kinds of workloads, because I think that's where the promise of lake house really comes into its own is saying, yes, I can run these traditional data warehousing workloads that require a high concurrency high scale, high performance directly on my data lake. >>And, uh, I think probably the two most salient data points to raise up there is, uh, just last month, Databricks announced it's set the world record for the, for the, uh, TPC D S 100 terabyte benchmark. So that is a place where Databricks at the lake house architecture, that benchmark is built to measure data warehouse performance and the lake house beat data warehouse and sat their own game in terms of overall performance. And then what's that spends from a price performance standpoint, it's customers on Databricks right now are able to enjoy that level of performance at 12 X better price performance than what cloud data warehouses provide. So not only are we jumping on this extremely high scale and performance, but we're able to do it much, much more efficiently. >>We're gonna need a whole nother section second segment to talk about benchmarking that guys. Thanks so much, really interesting session and thank you and best of luck to both join the show. Thank you for having us. Very welcome. Okay. Keep it right there. Everybody you're watching the cube, the leader in high-tech coverage at AWS reinvent 2021

Published Date : Nov 30 2021

SUMMARY :

Great to see you again. Glad to be here. This is all over the place. and reporting Trisha, the lake, the workloads that you would have for your data warehouse on And regardless of what kind of data warehouse you adopt, And what Delta lake allows us to do is when you need it, that all the roles that have to take that have to touch my data for as to how you guys are using data and then tie it into what y'all just said. And with Delta lake and built a lake, you can have one system that handles all Additionally, you have a massive scalability. So you have one system that So you had to get smarter. So, so if you own their stock, And so you have to be flexible. less than an hour and deploy the new model with new pricing, you know, you know, how the price has changed over time and seeing other, other metrics within the, lines of business, did you find that the data quality, the collaboration affected data? So you don't have to load But Hey, so many questions, you guys are gonna have to combat. So the most important thing is accurately giving you a very good offer did you have to inject new data? I'm not going to slam snowflake here, but you You have To, you have to load it into a different system later. Just, just because, you know, I was told to, And then I wonder, Joel, if you could share a sort of broader what you're seeing across your customer base, but Greg, So the fact that you can have one system, So, um, in the past, if you were running a serving, Okay. This is what have you seen broadly? So if you look at cross the entire So not only are we jumping on this extremely high scale and performance, but we're able to do it much, Thanks so much, really interesting session and thank you and best of luck to both join the show.

ENTITIES

Entity	Category	Confidence
Joel	PERSON	0.99+
Greg	PERSON	0.99+
Joel Minnick	PERSON	0.99+
$7,000	QUANTITY	0.99+
Greg Rokita	PERSON	0.99+
38%	QUANTITY	0.99+
two days	QUANTITY	0.99+
10%	QUANTITY	0.99+
Java	TITLE	0.99+
Databricks	ORGANIZATION	0.99+
two years	QUANTITY	0.99+
one system	QUANTITY	0.99+
one	QUANTITY	0.99+
Scala	TITLE	0.99+
apple	ORGANIZATION	0.99+
Python	TITLE	0.99+
SQL	TITLE	0.99+
ninth year	QUANTITY	0.99+
last month	DATE	0.99+
lake house	ORGANIZATION	0.99+
two different systems	QUANTITY	0.99+
Tableau	TITLE	0.99+
2021	DATE	0.99+
9 million VMs	QUANTITY	0.99+
second thing	QUANTITY	0.99+
less than an hour	QUANTITY	0.99+
Lakehouse	ORGANIZATION	0.98+
12 X	QUANTITY	0.98+
Delta	ORGANIZATION	0.98+
Delta lake house	ORGANIZATION	0.98+
one layer	QUANTITY	0.98+
one common platform	QUANTITY	0.98+
both	QUANTITY	0.97+
AWS	ORGANIZATION	0.97+
Zillow	ORGANIZATION	0.97+
Brittany Britain	PERSON	0.97+
Edmunds.com	ORGANIZATION	0.97+
two different system	QUANTITY	0.97+
Edmonds	ORGANIZATION	0.97+
over nine exabytes a month	QUANTITY	0.97+
today	DATE	0.96+
Lakehouse Delta	ORGANIZATION	0.96+
Delta lake	ORGANIZATION	0.95+
Trisha	PERSON	0.95+
data lake	ORGANIZATION	0.94+
Mattson	ORGANIZATION	0.92+
second segment	QUANTITY	0.92+
each	QUANTITY	0.92+
Matson	ORGANIZATION	0.91+
two most salient data points	QUANTITY	0.9+
Edmonds	LOCATION	0.89+
100 terabyte	QUANTITY	0.87+
one single source	QUANTITY	0.86+
first class	QUANTITY	0.85+
Nateeza	TITLE	0.85+
one security	QUANTITY	0.85+
Redshift	TITLE	0.84+

Howard Levenson

>>AWS public sector summit here in person in Washington, D. C. For two days live. Finally a real event. I'm john for your host of the cube. Got a great guest Howard Levinson from data bricks, regional vice president and general manager of the federal team for data bricks. Uh Super unicorn. Is it a decade corn yet? It's uh, not yet public but welcome to the cube. >>I don't know what the next stage after unicorn is, but we're growing rapidly. >>Thank you. Our audience knows David bricks extremely well. Always been on the cube many times. Even back, we were covering them back when big data was big data. Now it's all data everything. So we watched your success. Congratulations. Thank you. Um, so there's no, you know, not a big bridge for us across to see you here at AWS public sector summit. Tell us what's going on inside the data bricks amazon relationship. >>Yeah. It's been a great relationship. You know, when the company got started some number of years ago we got a contract with the government to deliver the data brooks capability and they're classified cloud in amazon's classified cloud. So that was the start of a great federal relationship today. Virtually all of our businesses in AWS and we run in every single AWS environment from commercial cloud to Govcloud to secret top secret environments and we've got customers doing great things and experiencing great results from data bricks and amazon. >>The federal government's the classic, I call migration opportunity. Right? Because I mean, let's face it before the pandemic even five years ago, even 10 years ago. Glacier moving speed slow, slow and they had to get modernized with the pandemic forced really to do it. But you guys have already cleared the runway with your value problems. You've got lake house now you guys are really optimized for the cloud. >>Okay, hardcore. Yeah. We are, we only run in the cloud and we take advantage of every single go fast feature that amazon gives us. But you know john it's The Office of Management and Budget. Did a study a couple of years ago. I think there were 28,000 federal data centers, 28,000 federal data centers. Think about that for a minute and just think about like let's say in each one of those data centers you've got a handful of operational data stores of databases. The federal government is trying to take all of that data and make sense out of it. The first step to making sense out of it is bringing it all together, normalizing it. Fed aerating it and that's exactly what we do. And that's been a real win for our federal clients and it's been a real exciting opportunity to watch people succeed in that >>endeavour. We have another guest on. And she said those data center huggers tree huggers data center huggers, majority of term people won't let go. Yeah. So but they're slowly dying away and moving on to the cloud. So migrations huge. How are you guys migrating with your customers? Give us an example of how it's working. What are some of the use cases? >>So before I do that I want to tell you a quick story. I've I had the luxury of working with the Air Force Chief data officer Ailene vedrine and she is commonly quoted as saying just remember as as airmen it's not your data it's the Air Force's data. So people were data center huggers now their data huggers but all of that data belongs to the government at the end of the day. So how do we help in that? Well think about all this data sitting in all these operational data stores they're getting it's getting updated all the time. But you want to be able to Federated this data together and make some sense out of it. So for like an organization like uh us citizenship and immigration services they had I think 28 different data sources and they want to be able to pull that data basically in real time and bring it into a data lake. Well that means doing a change data capture off of those operational data stores transforming that data and normalizing it so that you can then enjoy it. And we've done that I think they're now up to 70 data sources that are continually ingested into their data lake. And from there they support thousands of users doing analysis and reports for the whole visa processing system for the United States, the whole naturalization environment And their efficiency has gone up I think by their metrics by 24 x. >>Yeah. I mean Sandy carter was just on the cube earlier. She's the Vice president partner ecosystem here at public sector. And I was coming to her that federal game has changed, it used to be hard to get into you know everybody and you navigate the trip wires and all the subtle hints and and the people who are friends and it was like cloak and dagger and so people were locked in on certain things databases and data because now has to be freely available. I know one of the things that you guys are passionate about and this is kind of hard core architectural thing is that you need horizontally scalable data to really make a I work right. Machine learning works when you have data. How far along are these guys in their thinking when you have a customer because we're seeing progress? How far along are we? >>Yeah, we still have a long way to go in the federal government. I mean, I tell everybody, I think the federal government's probably four or five years behind what data bricks top uh clients are doing. But there are clearly people in the federal government that have really ramped it up and are on a par were even exceeding some of the commercial clients, U. S. C. I. S CBP FBI or some of the clients that we work with that are pretty far ahead and I'll say I mentioned a lot about the operational data stores but there's all kinds of data that's coming in at U S. C. I. S. They do these naturalization interviews, those are captured in real text. So now you want to do natural language processing against them, make sure these interviews are of the highest quality control, We want to be able to predict which people are going to show up for interviews based on their geospatial location and the day of the week and other factors the weather perhaps. So they're using all of these data types uh imagery text and structure data all in the Lake House concept to make predictions about how they should run their >>business. So that's a really good point. I was talking with keith brooks earlier directive is development, go to market strategy for AWS public sector. He's been there from the beginning this the 10th year of Govcloud. Right, so we're kind of riffing but the jpl Nasa Jpl, they did production workloads out of the gate. Yeah. Full mission. So now fast forward today. Cloud Native really is available. So like how do you see the the agencies in the government handling Okay. Re platform and I get that but now to do the reef acting where you guys have the Lake House new things can happen with cloud Native technologies, what's the what's the what's the cross over point for that point. >>Yeah, I think our Lake House architecture is really a big breakthrough architecture. It used to be, people would take all of this data, they put it in a Hadoop data lake, they'd end up with a data swamp with really not good control or good data quality. And uh then they would take the data from the data swamp where the data lake and they curate it and go through an E. T. L. Process and put a second copy into their data warehouse. So now you have two copies of the data to governance models. Maybe two versions of the data. A lot to manage. A lot to control with our Lake House architecture. You can put all of that data in the data lake it with our delta format. It comes in a curated way. Uh there's a catalogue associated with the data. So you know what you've got. And now you can literally build an ephemeral data warehouse directly on top of that data and it exists only for the period of time that uh people need it. And so it's cloud Native. It's elastically scalable. It terminates when nobody's using it. We run the whole center for Medicaid Medicare services. The whole Medicaid repository for the United States runs in an ephemeral data warehouse built on Amazon S three. >>You know, that is a huge call out, I want to just unpack that for a second. What you just said to me puts the exclamation point on cloud value because it's not your grandfather's data warehouse, it's like okay we do data warehouse capability but we're using higher level cloud services, whether it's governance stuff for a I to actually make it work at scale for those environments. I mean that that to me is re factoring that's not re platform Ng. Just re platform that's re platform Ng in the cloud and then re factoring capability for on uh new >>advantages. It's really true. And now you know at CMS, they have one copy of the data so they do all of their reporting, they've got a lot of congressional reports that they need to do. But now they're leveraging that same data, not making a copy of it for uh the center for program integrity for fraud. And we know how many billions of dollars worth of fraud exist in the Medicaid system. And now we're applying artificial intelligence and machine learning on entity analytics to really get to the root of those problems. It's a game >>changer. And this is where the efficiency comes in at scale. Because you start to see, I mean we always talk on the cube about like how software is changed the old days you put on the shelf shelf where they called it. Uh that's our generation. And now you got the cloud, you didn't know if something is hot or not until the inventory is like we didn't sell through in the cloud. If you're not performing, you suck basically. So it's not working, >>it's an instant Mhm. >>Report card. So now when you go to the cloud, you think the data lake and uh the lake house what you guys do uh and others like snowflake and were optimized in the cloud, you can't deny it. And then when you compare it to like, okay, so I'm saving you millions and millions if you're just on one thing, never mind the top line opportunities. >>So so john you know, years ago people didn't believe the cloud was going to be what it is. Like pretty much today, the clouds inevitable. It's everywhere. I'm gonna make you another prediction. Um And you can say you heard it here first, the data warehouse is going away. The Lake house is clearly going to replace it. There's no need anymore for two separate copies, there's no need for a proprietary uh storage copy of your data and people want to be able to apply more than sequel to the data. Uh Data warehouses, just restrict. What about an ocean house? >>Yeah. Lake is kind of small. When you think about this lake, Michigan is pretty big now, I think it's I >>think it's going to go bigger than that. I think we're talking about Sky Computer, we've been a cloud computing, we're going to uh and we're going to do that because people aren't gonna put all of their data in one place, they're going to have, it spread across different amazon regions or or or amazon availability zones and you're going to want to share data and you know, we just introduced this delta sharing capability. I don't know if you're familiar with it but it allows you to share data without a sharing server directly from picking up basically the amazon, you RLS and sharing them with different organizations. So you're sharing in place. The data actually isn't moving. You've got great governance and great granularity of the data that you choose to share and data sharing is going to be the next uh >>next break. You know, I really loved the Lake House were fairly sing gateway. I totally see that. So I totally would align with that and say I bet with you on that one. The Sky net Skynet, the Sky computing. >>See you're taking it away man, >>I know Skynet got anything that was computing in the Sky is Skynet that's terminated So but that's real. I mean I think that's a concept where it's like, you know what services and functions does for servers, you don't have a data, >>you've got to be able to connect data, nobody lives in an island. You've got to be able to connect data and more data. We all know more data produces better results. So how do you get more data? You connect to more data sources, >>Howard great to have you on talk about the relationship real quick as we end up here with amazon, What are you guys doing together? How's the partnership? >>Yeah, I mean the partnership with amazon is amazing. We have, we work uh, I think probably 95% of our federal business is running in amazon's cloud today. As I mentioned, john we run across uh, AWS commercial AWS GovCloud secret environment. See to us and you know, we have better integration with amazon services than I'll say some of the amazon services if people want to integrate with glue or kinesis or Sagemaker, a red shift, we have complete integration with all of those and that's really, it's not just a partnership at the sales level. It's a partnership and integration at the engineering level. >>Well, I think I'm really impressed with you guys as a company. I think you're an example of the kind of business model that people might have been afraid of which is being in the cloud, you can have a moat, you have competitive advantage, you can build intellectual property >>and, and john don't forget, it's all based on open source, open data, like almost everything that we've done. We've made available to people, we get 30 million downloads of the data bricks technology just for people that want to use it for free. So no vendor lock in. I think that's really important to most of our federal clients into everybody. >>I've always said competitive advantage scale and choice. Right. That's a data bricks. Howard? Thanks for coming on the key, appreciate it. Thanks again. Alright. Cube coverage here in Washington from face to face physical event were on the ground. Of course, we're also streaming a digital for the hybrid event. This is the cubes coverage of a W. S. Public sector Summit will be right back after this short break.

Published Date : Sep 28 2021

SUMMARY :

to the cube. Um, so there's no, you know, So that was the start of a great federal relationship But you guys have already cleared the runway with your value problems. But you know john it's The How are you guys migrating with your customers? So before I do that I want to tell you a quick story. I know one of the things that you guys are passionate So now you want to do natural language processing against them, make sure these interviews are of the highest quality So like how do you see the So now you have two copies of the data to governance models. I mean that that to me is re factoring that's not re platform And now you know at CMS, they have one copy of the data talk on the cube about like how software is changed the old days you put on the shelf shelf where they called So now when you go to the cloud, you think the data lake and uh the lake So so john you know, years ago people didn't believe the cloud When you think about this lake, Michigan is pretty big now, I think it's I of the data that you choose to share and data sharing is going to be the next uh So I totally would align with that and say I bet with you on that one. I mean I think that's a concept where it's like, you know what services So how do you get more See to us and you know, we have better integration with amazon services Well, I think I'm really impressed with you guys as a company. I think that's really important to most of our federal clients into everybody. Thanks for coming on the key, appreciate it.

ENTITIES

Entity	Category	Confidence
amazon	ORGANIZATION	0.99+
Howard Levinson	PERSON	0.99+
Washington	LOCATION	0.99+
Skynet	ORGANIZATION	0.99+
Howard	PERSON	0.99+
AWS	ORGANIZATION	0.99+
two copies	QUANTITY	0.99+
Washington, D. C.	LOCATION	0.99+
two days	QUANTITY	0.99+
30 million	QUANTITY	0.99+
two versions	QUANTITY	0.99+
keith brooks	PERSON	0.99+
95%	QUANTITY	0.99+
two separate copies	QUANTITY	0.99+
Howard Levenson	PERSON	0.99+
millions	QUANTITY	0.99+
Ailene vedrine	PERSON	0.99+
one copy	QUANTITY	0.99+
four	QUANTITY	0.99+
Sky	ORGANIZATION	0.99+
10 years ago	DATE	0.99+
five years	QUANTITY	0.99+
first step	QUANTITY	0.99+
28 different data sources	QUANTITY	0.99+
Michigan	LOCATION	0.99+
Amazon	ORGANIZATION	0.99+
Sky Computer	ORGANIZATION	0.98+
United States	LOCATION	0.98+
28,000 federal data centers	QUANTITY	0.98+
billions of dollars	QUANTITY	0.98+
28,000 federal data centers	QUANTITY	0.98+
five years ago	DATE	0.98+
second copy	QUANTITY	0.98+
thousands of users	QUANTITY	0.98+
pandemic	EVENT	0.98+
AWS	EVENT	0.97+
today	DATE	0.97+
10th year	QUANTITY	0.97+
W. S. Public sector Summit	EVENT	0.97+
Lake House	LOCATION	0.97+
john	PERSON	0.96+
Air Force	ORGANIZATION	0.96+
one	QUANTITY	0.96+
Nasa	ORGANIZATION	0.96+
Sky net	ORGANIZATION	0.96+
each one	QUANTITY	0.96+
Medicaid Medicare	ORGANIZATION	0.95+
one thing	QUANTITY	0.94+
24	QUANTITY	0.94+
data bricks	ORGANIZATION	0.94+
U S. C. I. S.	LOCATION	0.92+
up to 70 data sources	QUANTITY	0.91+
Chief data officer	PERSON	0.9+
first	QUANTITY	0.89+
Govcloud	ORGANIZATION	0.88+
Cloud Native	TITLE	0.88+
one place	QUANTITY	0.87+
GovCloud	TITLE	0.87+
couple of years ago	DATE	0.86+
Office of Management and Budget	ORGANIZATION	0.85+
Sandy carter	PERSON	0.84+
years ago	DATE	0.83+
AWS public sector summit	EVENT	0.83+
U. S. C. I. S	ORGANIZATION	0.81+
Medicaid	ORGANIZATION	0.79+
a minute	QUANTITY	0.77+
number of years ago	DATE	0.77+
a second	QUANTITY	0.75+
center huggers	ORGANIZATION	0.72+
Ng	TITLE	0.71+

Rahul Pathak, AWS | AWS re:Invent 2020

>>from around the globe. It's the Cube with digital coverage of AWS reinvent 2020 sponsored by Intel and AWS. Yeah, welcome back to the cubes. Ongoing coverage of AWS reinvent virtual Cuba's Gone Virtual along with most events these days are all events and continues to bring our digital coverage of reinvent With me is Rahul Pathak, who is the vice president of analytics at AWS A Ro. It's great to see you again. Welcome. And thanks for joining the program. >>They have Great co two and always a pleasure. Thanks for having me on. >>You're very welcome. Before we get into your leadership discussion, I want to talk about some of the things that AWS has announced. Uh, in the early parts of reinvent, I want to start with a glue elastic views. Very notable announcement allowing people to, you know, essentially share data across different data stores. Maybe tell us a little bit more about glue. Elastic view is kind of where the name came from and what the implication is, >>Uh, sure. So, yeah, we're really excited about blue elastic views and, you know, as you mentioned, the idea is to make it easy for customers to combine and use data from a variety of different sources and pull them together into one or many targets. And the reason for it is that you know we're really seeing customers adopt what we're calling a lake house architectural, which is, uh, at its core Data Lake for making sense of data and integrating it across different silos, uh, typically integrated with the data warehouse, and not just that, but also a range of other purpose. Both stores like Aurora, Relation of Workloads or dynamodb for non relational ones. And while customers typically get a lot of benefit from using purpose built stores because you get the best possible functionality, performance and scale forgiven use case, you often want to combine data across them to get a holistic view of what's happening in your business or with your customers. And before glue elastic views, customers would have to either use E. T. L or data integration software, or they have to write custom code that could be complex to manage, and I could be are prone and tough to change. And so, with elastic views, you can now use sequel to define a view across multiple data sources pick one or many targets. And then the system will actually monitor the sources for changes and propagate them into the targets in near real time. And it manages the anti pipeline and can notify operators if if anything, changes. And so the you know the components of the name are pretty straightforward. Blues are survivalists E T Elling data integration service on blue elastic views about our about data integration their views because you could define these virtual tables using sequel and then elastic because it's several lists and will scale up and down to deal with the propagation of changes. So we're really excited about it, and customers are as well. >>Okay, great. So my understanding is I'm gonna be able to take what's called what the parlance of materialized views, which in my laypersons terms assumes I'm gonna run a query on the database and take that subset. And then I'm gonna be ableto thio. Copy that and move it to another data store. And then you're gonna automatically keep track of the changes and keep everything up to date. Is that right? >>Yes. That's exactly right. So you can imagine. So you had a product catalog for example, that's being updated in dynamodb, and you can create a view that will move that to Amazon Elasticsearch service. You could search through a current version of your catalog, and we will monitor your dynamodb tables for any changes and make sure those air all propagated in the real time. And all of that is is taken care of for our customers as soon as they defined the view on. But they don't be just kept in sync a za long as the views in effect. >>Let's see, this is being really valuable for a person who's building Looks like I like to think in terms of data services or data products that are gonna help me, you know, monetize my business. Maybe, you know, maybe it's a simple as a dashboard, but maybe it's actually a product. You know, it might be some content that I want to develop, and I've got transaction systems. I've got unstructured data, may be in a no sequel database, and I wanna actually combine those build new products, and I want to do that quickly. So So take me through what I would have to do. You you sort of alluded to it with, you know, a lot of e t l and but take me through in a little bit more detail how I would do that, you know, before this innovation. And maybe you could give us a sense as to what the possibilities are with glue. Elastic views? >>Sure. So, you know, before we announced elastic views, a customer would typically have toe think about using a T l software, so they'd have to write a neat L pipeline that would extract data periodically from a range of sources. They then have to write transformation code that would do things like matchup types. Make sure you didn't have any invalid values, and then you would combine it on periodically, Write that into a target. And so once you've got that pipeline set up, you've got to monitor it. If you see an unusual spike in data volume, you might have to add more. Resource is to the pipeline to make a complete on time. And then, if anything changed in either the source of the destination that prevented that data from flowing in the way you would expect it, you'd have toe manually, figure that out and have data, quality checks and all of that in place to make sure everything kept working but with elastic views just gets much simpler. So instead of having to write custom transformation code, you right view using sequel and um, sequel is, uh, you know, widely popular with data analysts and folks that work with data, as you well know. And so you can define that view and sequel. The view will look across multiple sources, and then you pick your destination and then glue. Elastic views essentially monitors both the source for changes as well as the source and the destination for any any issues like, for example, did the schema changed. The shape of the data change is something briefly unavailable, and it can monitor. All of that can handle any errors, but it can recover from automatically. Or if it can't say someone dropped an important table in the source. That was part of your view. You can actually get alerted and notified to take some action to prevent bad data from getting through your system or to prevent your pipeline from breaking without your knowledge and then the final pieces, the elasticity of it. It will automatically deal with adding more resource is if, for example, say you had a spiky day, Um, in the markets, maybe you're building a financial services application and you needed to add more resource is to process those changes into your targets more quickly. The system would handle that for you. And then, if you're monetizing data services on the back end, you've got a range of options for folks subscribing to those targets. So we've got capabilities like our, uh, Amazon data exchange, where people can exchange and monetize data set. So it allows this and to end flow in a much more straightforward way. It was possible before >>awesome. So a lot of automation, especially if something goes wrong. So something goes wrong. You can automatically recover. And if for whatever reason, you can't what happens? You quite ask the system and and let the operator No. Hey, there's an issue. You gotta go fix it. How does that work? >>Yes, exactly. Right. So if we can recover, say, for example, you can you know that for a short period of time, you can't read the target database. The system will keep trying until it can get through. But say someone dropped a column from your source. That was a key part of your ultimate view and destination. You just can't proceed at that point. So the pipeline stops and then we notify using a PS or an SMS alert eso that programmatic action can be taken. So this effectively provides a really great way to enforce the integrity of data that's going between the sources and the targets. >>All right, make it kindergarten proof of it. So let's talk about another innovation. You guys announced quicksight que, uh, kind of speaking to the machine in my natural language, but but give us some more detail there. What is quicksight Q and and how doe I interact with it. What What kind of questions can I ask it >>so quick? Like you is essentially a deep, learning based semantic model of your data that allows you to ask natural language questions in your dashboard so you'll get a search bar in your quick side dashboard and quick site is our service B I service. That makes it really easy to provide rich dashboards. Whoever needs them in the organization on what Q does is it's automatically developing relationships between the entities in your data, and it's able to actually reason about the questions you ask. So unlike earlier natural language systems, where you have to pre define your models, you have to pre define all the calculations that you might ask the system to do on your behalf. Q can actually figure it out. So you can say Show me the top five categories for sales in California and it'll look in your data and figure out what that is and will prevent. It will present you with how it parse that question, and there will, in line in seconds, pop up a dashboard of what you asked and actually automatically try and take a chart or visualization for that data. That makes sense, and you could then start to refine it further and say, How does this compare to what happened in New York? And we'll be able to figure out that you're tryingto overlay those two data sets and it'll add them. And unlike other systems, it doesn't need to have all of those things pre defined. It's able to reason about it because it's building a model of what your data means on the flight and we pre trained it across a variety of different domains So you can ask a question about sales or HR or any of that on another great part accused that when it presents to you what it's parsed, you're actually able toe correct it if it needs it and provide feedback to the system. So, for example, if it got something slightly off you could actually select from a drop down and then it will remember your selection for the next time on it will get better as you use it. >>I saw a demo on in Swamis Keynote on December 8. That was basically you were able to ask Quick psych you the same question, but in different ways, you know, like compare California in New York or and then the data comes up or give me the top, you know, five. And then the California, New York, the same exact data. So so is that how I kind of can can check and see if the answer that I'm getting back is correct is ask different questions. I don't have to know. The schema is what you're saying. I have to have knowledge of that is the user I can. I can triangulate from different angles and then look and see if that's correct. Is that is that how you verify or there are other ways? >>Eso That's one way to verify. You could definitely ask the same question a couple of different ways and ensure you're seeing the same results. I think the third option would be toe, uh, you know, potentially click and drill and filter down into that data through the dash one on, then the you know, the other step would be at data ingestion Time. Typically, data pipelines will have some quality controls, but when you're interacting with Q, I think the ability to ask the question multiple ways and make sure that you're getting the same result is a perfectly reasonable way to validate. >>You know what I like about that answer that you just gave, and I wonder if I could get your opinion on this because you're you've been in this business for a while? You work with a lot of customers is if you think about our operational systems, you know things like sales or E r. P systems. We've contextualized them. In other words, the business lines have inject context into the system. I mean, they kind of own it, if you will. They own the data when I put in quotes, but they do. They feel like they're responsible for it. There's not this constant argument because it's their data. It seems to me that if you look back in the last 10 years, ah, lot of the the data architecture has been sort of generis ized. In other words, the experts. Whether it's the data engineer, the quality engineer, they don't really have the business context. But the example that you just gave it the drill down to verify that the answer is correct. It seems to me, just in listening again to Swamis Keynote the other day is that you're really trying to put data in the hands of business users who have the context on the domain knowledge. And that seems to me to be a change in mindset that we're gonna see evolve over the next decade. I wonder if you could give me your thoughts on that change in the data architecture data mindset. >>David, I think you're absolutely right. I mean, we see this across all the customers that we speak with there's there's an increasing desire to get data broadly distributed into the hands of the organization in a well governed and controlled way. But customers want to give data to the folks that know what it means and know how they can take action on it to do something for the business, whether that's finding a new opportunity or looking for efficiencies. And I think, you know, we're seeing that increasingly, especially given the unpredictability that we've all gone through in 2020 customers are realizing that they need to get a lot more agile, and they need to get a lot more data about their business, their customers, because you've got to find ways to adapt quickly. And you know, that's not gonna change anytime in the future. >>And I've said many times in the The Cube, you know, there are industry. The technology industry used to be all about the products, and in the last decade it was really platforms, whether it's SAS platforms or AWS cloud platforms, and it seems like innovation in the coming years, in many respects is coming is gonna come from the ecosystem and the ability toe share data we've We've had some examples today and then But you hit on. You know, one of the key challenges, of course, is security and governance. And can you automate that if you will and protect? You know the users from doing things that you know, whether it's data access of corporate edicts for governance and compliance. How are you handling that challenge? >>That's a great question, and it's something that really emphasized in my leadership session. But the you know, the notion of what customers are doing and what we're seeing is that there's, uh, the Lake House architectural concept. So you've got a day late. Purpose build stores and customers are looking for easy data movement across those. And so we have things like blue elastic views or some of the other blue features we announced. But they're also looking for unified governance, and that's why we built it ws late formation. And the idea here is that it can quickly discover and catalog customer data assets and then allows customers to define granular access policies centrally around that data. And once you have defined that, it then sets customers free to give broader access to the data because they put the guardrails in place. They put the protections in place. So you know you can tag columns as being private so nobody can see them on gun were announced. We announced a couple of new capabilities where you can provide row based control. So only a certain set of users can see certain rose in the data, whereas a different set of users might only be able to see, you know, a different step. And so, by creating this fine grained but unified governance model, this actually sets customers free to give broader access to the data because they know that they're policies and compliance requirements are being met on it gets them out of the way of the analyst. For someone who can actually use the data to drive some value for the business, >>right? They could really focus on driving value. And I always talk about monetization. However monetization could be, you know, a generic term, for it could be saving lives, admission of the business or the or the organization I meant to ask you about acute customers in bed. Uh, looks like you into their own APs. >>Yes, absolutely so one of quick sites key strengths is its embed ability. And on then it's also serverless, so you could embed it at a really massive scale. And so we see customers, for example, like blackboard that's embedding quick side dashboards into information. It's providing the thousands of educators to provide data on the effectiveness of online learning. For example, on you could embed Q into that capability. So it's a really cool way to give a broad set of people the ability to ask questions of data without requiring them to be fluent in things like Sequel. >>If I ask you a question, we've talked a little bit about data movement. I think last year reinvent you guys announced our A three. I think it made general availability this year. And remember Andy speaking about it, talking about you know, the importance of having big enough pipes when you're moving, you know, data around. Of course you do. Doing tearing. You also announced Aqua Advanced Query accelerator, which kind of reduces bringing the computer. The data, I guess, is how I would think about that reducing that movement. But then we're talking about, you know, glue, elastic views you're copying and moving data. How are you ensuring you know, maintaining that that maximum performance for your customers. I mean, I know it's an architectural question, but as an analytics professional, you have toe be comfortable that that infrastructure is there. So how does what's A. W s general philosophy in that regard? >>So there's a few ways that we think about this, and you're absolutely right. I think there's data volumes were going up, and we're seeing customers going from terabytes, two petabytes and even people heading into the exabyte range. Uh, there's really a need to deliver performance at scale. And you know, the reality of customer architectures is that customers will use purpose built systems for different best in class use cases. And, you know, if you're trying to do a one size fits all thing, you're inevitably going to end up compromising somewhere. And so the reality is, is that customers will have more data. We're gonna want to get it to more people on. They're gonna want their analytics to be fast and cost effective. And so we look at strategies to enable all of this. So, for example, glue elastic views. It's about moving data, but it's about moving data efficiently. So What we do is we allow customers to define a view that represents the subset of their data they care about, and then we only look to move changes as efficiently as possible. So you're reducing the amount of data that needs to get moved and making sure it's focused on the essential. Similarly, with Aqua, what we've done, as you mentioned, is we've taken the compute down to the storage layer, and we're using our nitro chips to help with things like compression and encryption. And then we have F. P. J s in line to allow filtering an aggregation operation. So again, you're tryingto quickly and effectively get through as much data as you can so that you're only sending back what's relevant to the query that's being processed. And that again leads to more performance. If you can avoid reading a bite, you're going to speed up your queries. And that Awkward is trying to do. It's trying to push those operations down so that you're really reducing data as close to its origin as possible on focusing on what's essential. And that's what we're applying across our analytics portfolio. I would say one other piece we're focused on with performance is really about innovating across the stack. So you mentioned network performance. You know, we've got 100 gigabits per second throughout now, with the next 10 instances and then with things like Grab it on to your able to drive better price performance for customers, for general purpose workloads. So it's really innovating at all layers. >>It's amazing to watch it. I mean, you guys, it's a It's an incredible engineering challenge as you built this hyper distributed system. That's now, of course, going to the edge. I wanna come back to something you mentioned on do wanna hit on your leadership session as well. But you mentioned the one size fits all, uh, system. And I've asked Andy Jassy about this. I've had a discussion with many folks that because you're full and and of course, you mentioned the challenges you're gonna have to make tradeoffs if it's one size fits all. The flip side of that is okay. It's simple is you know, 11 of the Swiss Army knife of database, for example. But your philosophy is Amazon is you wanna have fine grained access and to the primitives in case the market changes you, you wanna be able to move quickly. So that puts more pressure on you to then simplify. You're not gonna build this big hairball abstraction layer. That's not what he gonna dio. Uh, you know, I think about, you know, layers and layers of paint. I live in a very old house. Eso your That's not your approach. So it puts greater pressure on on you to constantly listen to your customers, and and they're always saying, Hey, I want to simplify, simplify, simplify. We certainly again heard that in swamis presentation the other day, all about, you know, minimizing complexity. So that really is your trade office. It puts pressure on Amazon Engineering to continue to raise the bar on simplification. Isn't Is that a fair statement? >>Yeah, I think so. I mean, you know, I think any time we can do work, so our customers don't have to. I think that's a win for both of us. Um, you know, because I think we're delivering more value, and it makes it easier for our customers to get value from their data way. Absolutely believe in using the right tool for the right job. And you know you talked about an old house. You're not gonna build or renovate a house of the Swiss Army knife. It's just the wrong tool. It might work for small projects, but you're going to need something more specialized. The handle things that matter. It's and that is, uh, that's really what we see with that, you know, with that set of capabilities. So we want to provide customers with the best of both worlds. We want to give them purpose built tools so they don't have to compromise on performance or scale of functionality. And then we want to make it easy to use these together. Whether it's about data movement or things like Federated Queries, you can reach into each of them and through a single query and through a unified governance model. So it's all about stitching those together. >>Yeah, so far you've been on the right side of history. I think it serves you well on your customers. Well, I wanna come back to your leadership discussion, your your leadership session. What else could you tell us about? You know, what you covered there? >>So we we've actually had a bunch of innovations on the analytics tax. So some of the highlights are in m r, which is our managed spark. And to do service, we've been able to achieve 1.7 x better performance and open source with our spark runtime. So we've invested heavily in performance on now. EMR is also available for customers who are running and containerized environment. So we announced you Marnie chaos on then eh an integrated development environment and studio for you Marco D M R studio. So making it easier both for people at the infrastructure layer to run em are on their eks environments and make it available within their organizations but also simplifying life for data analysts and folks working with data so they can operate in that studio and not have toe mess with the details of the clusters underneath and then a bunch of innovation in red shift. We talked about Aqua already, but then we also announced data sharing for red Shift. So this makes it easy for red shift clusters to share data with other clusters without putting any load on the central producer cluster. And this also speaks to the theme of simplifying getting data from point A to point B so you could have central producer environments publishing data, which represents the source of truth, say into other departments within the organization or departments. And they can query the data, use it. It's always up to date, but it doesn't put any load on the producers that enables these really powerful data sharing on downstream data monetization capabilities like you've mentioned. In addition, like Swami mentioned in his keynote Red Shift ML, so you can now essentially train and run models that were built in sage maker and optimized from within your red shift clusters. And then we've also automated all of the performance tuning that's possible in red ships. So we really invested heavily in price performance, and now we've automated all of the things that make Red Shift the best in class data warehouse service from a price performance perspective up to three X better than others. But customers can just set red shift auto, and it'll handle workload management, data compression and data distribution. Eso making it easier to access all about performance and then the other big one was in Lake Formacion. We announced three new capabilities. One is transactions, so enabling consistent acid transactions on data lakes so you can do things like inserts and updates and deletes. We announced row based filtering for fine grained access control and that unified governance model and then automated storage optimization for Data Lake. So customers are dealing with an optimized small files that air coming off streaming systems, for example, like Formacion can auto compact those under the covers, and you can get a 78 x performance boost. It's been a busy year for prime lyrics. >>I'll say that, z that it no great great job, bro. Thanks so much for coming back in the Cube and, you know, sharing the innovations and, uh, great to see you again. And good luck in the coming here. Well, >>thank you very much. Great to be here. Great to see you. And hope we get Thio see each other in person against >>I hope so. All right. And thank you for watching everybody says Dave Volonte for the Cube will be right back right after this short break

Published Date : Dec 10 2020

SUMMARY :

It's great to see you again. They have Great co two and always a pleasure. to, you know, essentially share data across different And so the you know the components of the name are pretty straightforward. And then you're gonna automatically keep track of the changes and keep everything up to date. So you can imagine. services or data products that are gonna help me, you know, monetize my business. that prevented that data from flowing in the way you would expect it, you'd have toe manually, And if for whatever reason, you can't what happens? So if we can recover, say, for example, you can you know that for a So let's talk about another innovation. that you might ask the system to do on your behalf. but in different ways, you know, like compare California in New York or and then the data comes then the you know, the other step would be at data ingestion Time. But the example that you just gave it the drill down to verify that the answer is correct. And I think, you know, we're seeing that increasingly, You know the users from doing things that you know, whether it's data access But the you know, the notion of what customers are doing and what we're seeing is that admission of the business or the or the organization I meant to ask you about acute customers And on then it's also serverless, so you could embed it at a really massive But then we're talking about, you know, glue, elastic views you're copying and moving And you know, the reality of customer architectures is that customers will use purpose built So that puts more pressure on you to then really what we see with that, you know, with that set of capabilities. I think it serves you well on your customers. speaks to the theme of simplifying getting data from point A to point B so you could have central in the Cube and, you know, sharing the innovations and, uh, great to see you again. thank you very much. And thank you for watching everybody says Dave Volonte for the Cube will be right back right after

ENTITIES

Entity	Category	Confidence
Rahul Pathak	PERSON	0.99+
Andy Jassy	PERSON	0.99+
AWS	ORGANIZATION	0.99+
David	PERSON	0.99+
California	LOCATION	0.99+
New York	LOCATION	0.99+
Andy	PERSON	0.99+
Swiss Army	ORGANIZATION	0.99+
Amazon	ORGANIZATION	0.99+
December 8	DATE	0.99+
Dave Volonte	PERSON	0.99+
last year	DATE	0.99+
2020	DATE	0.99+
third option	QUANTITY	0.99+
Swami	PERSON	0.99+
each	QUANTITY	0.99+
both	QUANTITY	0.99+
A. W	PERSON	0.99+
this year	DATE	0.99+
10 instances	QUANTITY	0.98+
A three	COMMERCIAL_ITEM	0.98+
78 x	QUANTITY	0.98+
two petabytes	QUANTITY	0.98+
five	QUANTITY	0.97+
Amazon Engineering	ORGANIZATION	0.97+
Red Shift ML	TITLE	0.97+
Formacion	ORGANIZATION	0.97+
11	QUANTITY	0.96+
one	QUANTITY	0.96+
one way	QUANTITY	0.96+
Intel	ORGANIZATION	0.96+
One	QUANTITY	0.96+
five categories	QUANTITY	0.94+
Aqua	ORGANIZATION	0.93+
Elasticsearch	TITLE	0.93+
terabytes	QUANTITY	0.93+
both worlds	QUANTITY	0.93+
next decade	DATE	0.92+
two data sets	QUANTITY	0.91+
Lake Formacion	ORGANIZATION	0.9+
single query	QUANTITY	0.9+
Data Lake	ORGANIZATION	0.89+
thousands of educators	QUANTITY	0.89+
Both stores	QUANTITY	0.88+
Thio	PERSON	0.88+
agile	TITLE	0.88+
Cuba	LOCATION	0.87+
dynamodb	ORGANIZATION	0.86+
1.7 x	QUANTITY	0.86+
Swamis	PERSON	0.84+
EMR	TITLE	0.82+
one size	QUANTITY	0.82+
Red Shift	TITLE	0.82+
up to three X	QUANTITY	0.82+
100 gigabits per second	QUANTITY	0.82+
Marnie	PERSON	0.79+
last decade	DATE	0.79+
reinvent 2020	EVENT	0.74+
Invent	EVENT	0.74+
last 10 years	DATE	0.74+
Cube	COMMERCIAL_ITEM	0.74+
today	DATE	0.74+
A Ro	EVENT	0.71+
three new capabilities	QUANTITY	0.71+
two	QUANTITY	0.7+
E T Elling	PERSON	0.69+
Eso	ORGANIZATION	0.66+
Aqua	TITLE	0.64+
Cube	ORGANIZATION	0.63+
Query	COMMERCIAL_ITEM	0.63+
SAS	ORGANIZATION	0.62+
Aurora	ORGANIZATION	0.61+
Lake House	ORGANIZATION	0.6+
Sequel	TITLE	0.58+
P.	PERSON	0.56+

Ash Ashutosh V1

>>from around the globe. It's the cue with digital coverage of active EO data driven 2020. Brought to you by activity. We're back. This is the cubes coverage. Our ongoing coverage of active FiOS data driven. Of course, we've gone virtual this year. Ash. Ashutosh is here. He's the founder, president and CEO of Active Eo. Great to see you again. >>Likewise, They always always good to see you. >>We have We're in a little meet up, You and I in Boston. I always enjoy our conversations. Little did we know that, You know, a few months later, we would only be talking at this type of distance and, uh and of course, it's sad. I mean, a data driven is one of our favorite events is intimate, its customer content driven. The theme this year is you call it the next normal. Some people call it the new abnormal, the next normal. What's that all about? >>I think it's pretty pretty fascinating to see when we walked in in March, all of us were shocked by the effect of this pandemic. And for a while we all scrambled around trying to figure out How do you react to this one, and everybody reacted very differently. But most people have this tendency to think that this is going to be a pretty broom environment with lots of unknown variables, and it is important for us to try to figure out how to get a get our hands on this. By the time we came on. For six weeks into that, almost all of us have figured out this is Ah, this is not something you fight again. This is not something you wait, what, it to go away? But this is one. Did you figure out how to live in and you figured out how to work around it? And that, we believe, is the next long. It's not about trying to create a new abnormal. It's not about creating a new normal, but it's truly one that basically says that is it. That is a way, perhaps packed forward. There's a is a way to create this next normal, and you just figured out how to live with the environment, behalf and the normal outcomes of companies that have done remarkably well as a result of these actions. Fact. If you're being one of them, >>it's quite amazing isn't it? I mean, I've talked to a lot of tech companies, CEOs and their customers, and it's almost like they feel the first reaction was course they cared about their there, their employees and their broader families. Number one number two was many companies, as you know, saw a tailwind, and it initially didn't want to be seen as ambulance chasing. And then, of course, the entrepreneurial spirit kicked in and they said, Okay, we can only control what we can control and tech companies in particular just exceedingly Well, I don't think anybody really predicted that early >>on. Yeah, I, um I think of the heart, We're all human beings, and the first reaction was to take it off. Four constituencies, right? One. Take care of your family. Take it off your community, take care of your employees, take care of your customers. And that was the hardest part. The first 4 to 6 weeks was to figure out How do you do each of those four. Once you figured that part out or you figured out ways to get around to making sure you can take it off those you really found the next mom, you really start forgetting our out of continue to innovate Could, you know to support each of those four constituencies and people have done different things. I know it's amazing how, um, Cuba continues to operate As far as a user is concerned, they're all watching anymore. Yes, we don't have the wonderful desk, and we all get to chat and look in the eye. But the content of the messages asked powerful as what it waas a few months ago. So I'm sure this is how we're all going to figure out how to make through this new next normal >>and digital transformation kind of went from from push to pull. I mean, every conference you go to, they say, Well, look at uber, you know, look at Airbnb and it put up the examples you have to do this to, and then all of sudden the industry dragged you along. Some Curis esta is toe. How and and I guess the other point there is digital means data. We've said that many, many times. If you didn't have a digital strategy during the height of the lock down, you couldn't transact business and still many restaurants is still trying to figure this out, But so how did it affect you and your customers? >>Yeah, it's very interesting. And I we spend a lot of time with several of our customers were managing some of the largest I T organizations. We talk about very interesting phenomena that happened some better beginning of this year. About 20 years ago, we used to worry about this thing called the Digital Divide, those who have access the network and Internet and those who don't. And now there is this beta divide, the divide between organizations that know how to leverage, exploit and absolutely excellent the business using data and those adorable. I think we're seeing this effect so very clearly among organizations that unable to come back and address some of this stuff. And it's fascinating. Yes, we all have the examples off the lights off. People are doing delivery. People are doing retailing, but there are so many little things you're seeing organizations. And just the other day, he had a video from Century Days Is Central Data System, which is helping accelerate Cohen 19 research because it will get copies of the data faster than they would get access to data so that these are just much, much faster. Sometimes you know, several days to a few minutes. It's that that level of effect, it's not just down to some seven. You know, you almost think of it as nice to have, but it's must have life threatening stuff. Essential stuff or just addressing. Korea was running a very pretty in a wonderful article about this supercomputer in That's Doing an Aristo covert 19 and how it's figured out most of these symptoms they're able to figure out by just crunching a ton of data. And almost every one of those symptoms that the computer has predicted Supercomputer is predicted has being accurate. It's about data. It is absolutely about data, which is why I think this is a phenomenal time for companies. Toe Absolutely go change. Make this information about data exploration, data leverage, exploitation. And there's a ton of it all over all around us. >>Yeah, and and part of that digital transformation, the mandate is to really put data at the core. I mean, we've we've certainly seen this with the top market cap companies. They've got dated at the core, and and now, as they say it's it's become a A mandate. And, you know, there's been several things that we've clearly noticed. I mean, you saw the work from home required laptops and, you know, endpoint security and things of that. VD. I made a comeback, and certainly Cloud was there. But I've been struck by the reality of multi Cloud. I was kind of a multi cloud skeptic early on. >>Yeah, >>I said many times I thought it was more of a symptom than it was a strategy, but it's that's completely flipped. Ah, recently in r e t r surveys, we saw multi cloud popping up all over the place. I wonder what you're seeing when you talk to your customers and other CEOs. >>Yeah, So fascinating, though really is the first flower part of sometime in 2018. End of 2018 >>Go right, Yeah, >>the act if you'll go on world, which is a phenomenal way to completely change the way you think about the using object storage in the flower for two years that we saw about 20% of our business. By the end of two years, the beginning of this year, 20% of our business was built on never it in the cloud since March. So that was end of our almost ended the Q one. So now we just limit left you three in six months. We added 12 more percent of the business literally weeded in six months. What we did not do before for 18 months before that, right? Significantly more than what we did for a year and a half before that. And there are really three reasons and we see this old nor again, we have a large customer. We closed in January. Ironically, were deploying out of UK, a very large marketing organization. Got everything deployed, running the they're back up and beyond and a separate data center. And they had a practical problem of not being able to access the second sight literally in the middle of deployment. Mystere that customer, Did you see me Google Cloud? Because they were simply no way for them to continue protecting their data, being able to develop new applications with that data that simply had no access. So there was. This was the number one reason the inability for already physically access, but put their their employees at rest and have before the plow would be the infrastructure. That's number one, so that first of all, drove the reason for the cloud. And then there's a second reason there are practical reasons. And why some clerk platforms that good one working the other ones are not. So where, uh, some other more fuels. And so if I'm an organization that has that spans everything, I've got no power PC and X 86 machine A vm I got container platforms. I got Oracle. They got a C P. There is no single cloud platform that supports all my work loaders efficiently. It's available in all the agents I want. So inevitably I have to go at our different about barefoot. So that's a second practical visa. And then there's a strategic reason. No, when no customer what's really locked into anyone card back at least two. You're gonna go pear more likely? Three. So those are the reasons. And then, interestingly enough, have you were on a panel with as global Cee Io's and in addition to just the usual cloud providers of you all know and love inside the U. S. Across the world, in Europe, in Asia, there's a rise off the regional flower fire. See you take all this factor. So have you got absolute physical necessity? You got practical constraints of what can the club provided support the strategic reasons on why either Because I don't want to be locked into a part for better or because there is a rise off data nationalism that's going on, that people want to keep their data within the country bombs all of these reasons. But the foundations or why multiplier is almost becoming a de facto. It's impossible. What a decent size organization to assume. They were just different on one car ready. >>The big trend we're seeing, I wonder if you could comment. Is this this notion of the data life cycle of the data pipeline? It's a very complex situation for a lot of organizations, their data siloed. We hear that a lot. They have data scientists, data engineers, developers, data quality engineers, just a lot of different constituencies and lines of business. And it's kind of a mess. And so what they're trying to do is bring that together. So they've done that data. Scientists complain they spend all their time wrangling data, but but ultimately the ones that are succeeding to putting data at the core is, we've just been discussing are seeing amazing outcomes by being able to have a single version of the truth, have confidence in that data, create self serve for their for their lines of business and actually reduce the end and cycle times. It's driving your major monetization, whether that's cost cutting or revenue. And I'm curious as to what you're seeing. You guys do a lot of work. Heavy work in Dev ops and hard core database those air key components of that data Lifecycle. Yeah, you're seeing in that regard regarding that data pipeline. >>Yeah, it's a It's a phenomenal point if you really want to go back and exploit data within an organization. If you really want to be a data driven organization, the very first thing you have to do is break down the silos. Ironically, every organization has all the data required to make the decisions they want to. They just can't either get to it or it's so hard to make the silos. That is just not what trying to make it happen. And 10 years ago we set out on this mission rather than keep this individual silos of data. Why don't we flip it open and making it a pipeline, which looks like a data cloud where essentially anybody who's consuming it has access to it based on the governance rules based on the security rules that the operations people have said and based on the kind of format they want to see data. Not everyone even want to see the data in a database. Former, maybe you want the database for my convert CSP for my before you don't analytics And this idea of making data, the new infrastructure, this idea of having the operations people provide this new layer for data, it's finally come to roost. I mean, it's it's fascinating. I was the numbers last quarter. We just finished up. You do now. 45% of our customer base is uses activity or for reuse is the back of data for things that excellent. The business things that make the business move faster, more productive or you will survive. That was the mission. That was what we set out to do 10 years ago. We were talking to an analyst this morning, and now this is question off. You know, it looks like there's a team of backup data being reused, said Yeah, that's kind of what we've been saying for 10 years. Backup cannot be an insurance back up in order to your destination. It has to be something that you could use as an asset and that I think it's finally coming to the point with you can use back up a single source of truth only if you designed it right from the beginning. For that purpose, you cannot just lots of lots of ways to fake it. Make it try to pretend like you're doing it. But that was a trooper was off making date of the new infrastructure, making it a cloud, making it something that is truly an ask. And it's fascinating to see our businesses. You take any of our larger counts and the way they've gone about transforming not just basic backup. India. Yes, we are the world's glasses back up in most Kayla will be our solution. That's that's a starting point. But do we will be used after Devil applications 8, 10 times faster? Ron Analytics, 100 ex pastor. The more data you have, the more people who use data you have, the better this return makeups. >>You know, that is interesting to hear you talk about that because that has been the holy Grail of backup. Was toe go beyond insurance to actually create business value. And you're actually seeing some underlying trends We talked about that data pipeline in one of the areas that is the most interesting is in database, which was so boring for so many years. Ah, and you're seeing new workloads emerge. Take the data warehouse beyond your reporting. Never really lived up to its Ah, it's promise of 360 degree view. You mentioned analytics. That's really starting toe happen. Ah, and it's all about data John, for Used to say that your data is that is the new development kit. You call it the new infrastructure, and it's sort of the same same type of theme. So maybe some of the trends you're seeing in ah in database enoughto talk about that for a little bit and then pick your brains and some other tech like object storage is another one that we've really seen takeoff? >>Yeah. So I think our journey with object story began in 16 4017 as we started or Doctor Cloud platform in response to the user requirements, Uh, we did more like most companies have done and unfortunately continue to do to take the in print product. And then it's smooth under the cloud. And one of the things we saw was there was a fundamental difference off how the design points of flower engineering is all about what they're designed it for object story, that one of those one of those primitives fundamental stories, primitives that the cloud providers actually produced that we know really exploited. There was. It was used as a graveyard for data. It's a replacement for me, please, where data goes to die. And then we look at it really closely and say, Well, this is actually a massively scalable, very low cost storage, but it has some problems. It has an interface that you cannot use with traditional servers. Uh, it has some issues around not being able to read, modify right the data. So it feels like a consuming a lot of stories. So we're going to solve those problems because a good two years to come back with something on world that fundamentally creeds objects the lady like this massive use capable high performer disk? Yes, except it is ridiculously low cost and optimize the capacity. So this finger on world that patented has really become the foundation of how everything in our works without using CPU Ray, that is simply nothing at a lower PCO that if you wanted to basic backup, the, uh, more importantly, use that to do this a massive analytics and you don't know more data warehouse data leaks. It is not a good deal of Lake House aladi. All of these are still silent. All of these are people trying to take some data from somewhere put into one of the new construct and have it being controlled by somebody else. This is artist thing. It's just you just move the silos from some place to another place instead of creating a pipeline. If you want to really create a pipeline object story has been integral part of the pipeline, not a separate bucket by itself. And that's what we did. And same thing with databases, you know, most business, most of the critical business and I was on a daily basis, and the ability to find a way to leverage those. Move them on our leverage in terms of whichever format databases access. Which location or Saxes doesn't know how big it is. Lots of work has gone into trying to figure figure that one out. And we we had some very, very good partners in some of the largest customers who help take the journey with us. I'm pretty much all of the global 2000 accounts you see across the board, but an integral part of a process. >>You mentioned the word journey and triggered a thought. Is your discussion with Robbie, the CEO of of Seeing >>A. It was a customer years. >>Ah, and what he said. I liked what he said. He course he used the term journey. We all do. But he said, You know what? I kind of don't like that term because I want to inject the sense of urgency essentially what he was saying. I want speed, you know, journeys like Okay, kids get in the car, were in a drive across country. We're gonna make some stops. And so, while there's a journey, he also was was really trying to push the organization hard and he talked about culture. Ah, as some of the most difficult things and it goes like many. See, I said, Now the technology is almost the easy part. It's true when it works. Oh, I thought that was a great discussion that you had. What were some of your takeaways >>with thinking? Robbie's is very astute. Ah, I t executive was being around the block for so long and one of the fascinating things, but a asking this question about what's the biggest challenge was just gone through this a couple of times. What is the biggest challenge? Taking an organization as vulnerable as well known A C gate is. I mean, this is a data company. This is This is the heart of the Oliver Half the world's data is on seeing stuff. How are you today was, or company has been around for long in the middle of Silicon Valley and make it into ah into a fast growing transformation company that's responding to the newer challenges. And I thought he was going to come back with Well, you know, I gotta go to the abuses. I picked this technology that techno in. Surely that is exactly what I expected he would end up with. There's nothing through technology in this day and age when you can have an Elon Musk and send a card of Mars. It's not many technologies that we can really solve many covered 19 ism. Next one Do we gotta go solve? Well, frankly, he kid upon the one thing that matters to every company. It is the fundamental culture to create a biased of action. It's a fundamental culture where you have to come back and have a deliverable that moves the ball forward every day, every month, every quarter, as opposed to have this CDs off. Like you said, a journey that say's and we all know this right? People talk about, we're going to do this in face one. We're gonna do this and face to and good food release and face three nothing and what happens Invasive. Nobody gets a number feast. I think he did a great job of saying I fundamentally had to go change the culture that was my biggest take away, and this I've heard this so many times the most effective I D execs wait a transformation. It actually shows in the people that they have. It's not the technology, it's the people. And some. This history is replete with organizations that have done remarkably well, not by leveraging the heck out of the technology, but truly by leveraging the change in the people's mindset. And, of course, that at that point that leverages technology where a proper here. But Robbie's a insightful person, always such a They lied to talk them, said they like for him to have chosen us as a its information technology for him to go pull his data warehouses and completely transformed how I was doing manufacturing across the globe. >>You know, I want to have some color of what you just said because some key keep takeaways that from what you just said, ashes is You know, you're right when you look back at the history of the computer industry used to be very well known processes, but the technology was the big mystery and the and the big risk and you think about with Cove it were it not for Technology Way didn't know what was coming. We were inventing new processes literally every day, every week, every month. It's so technology was pretty well understood. It and enabled that. And when you when you think when we talked earlier about putting data at the core, it was interesting to hear Robbie. He basically said, Yeah, we had a big data team in the U. S. A big tainted TV in Europe. We actually organized around silos and and so you guys played a role you were very respectful about, you know, touting active video with him. You did ask him, You know what role you play, But it is interesting to hear and talk about how he had to address that both culturally. And of course, there's technology underneath to enable that unification of data that silo busting, if you will. And you guys played a role in that. >>Yeah, I always enjoy, um, conversation with folks who have taken a problem, identified what needs to be done and then just get it done. And its That's more fascinating than you. Of course, I video plays a small part in a lot of things, and we're proud to have played a small part in his big initiative, and that's true of know the thousands of customers we talk about. But it's such a fascinating story to have leaders who come back and make this transformation happen, and to understand how they went about making those decisions, how they identified where the problem with these are so hard. We all see them in our own life, right? We see there is a there's a problem, but sometimes it takes a wider don't understand. How do you identify them and what do you have to do and more importantly, actually do it? And so whenever use, whenever I get an opportunity with people like Robbie, I think understanding that there's a way to help, uh, we always make sure that we play our own small part, and we're privileged to be a part of those kinds of journeys. >>Well, I think what's interesting about activity on the company that you created is essentially that. We're talking about the democratisation of data, that whole data pipeline, that discussion, that we had the self service of that data to the lines of business and, you know, you guys clearly play a role there. The multi cloud discussion fits into that. I mean that these air all trends that are tail winds for companies that can that can help sort of you know, flattened the data globe. If you if you will, your final thoughts. >>Yeah, I know you said something that is so much at the heart of every idea Exactly that you're talking to, if they truly is. The fundamental asset that I finally end up with is an organization. The democratization of data. Where I do not lock this into another silo, another platform, another ploughed. Another application has to be part of my foundation design and therefore my ability to use each of this cloud platform for the services they provide. While I and they were to move the data to where I needed to be. That is so critical. So you almost start to think about the one possession and organization now has. And we talked about this with a group of CEOs. They might be some pretty soon. Not too far off, but data stolen asset. I might actually have our data mark data market, just like you. I was stopped working, but I can start to sell my data. You know, imagine a coup in 19. There's so many organization that have so much data, and many of them have contributed to this research because this is an existence of issue. But you can see this turning into a next level. So, yes, we've got activities, will move the data toe one level higher where it's become a foundation construct for the organization. The next part is gonna actually done. This is the one asset would actually monetize someone stuff. And it will be not too long when you need to talk about how there's this new exchange and what's the rate of data for this company? Was, is that company in the future trading options? Who knows is gonna be really interesting. >>Well, I think you're right on this notion of a data. Marketplaces is coming, and it's not not that far away, Blash. It's always great to talk to you. I hope next year a data driven weaken we could be face to face. But I mean, look, this has been we we've dealt with it. It's it's actually created opportunities for us toe to reinvent ourselves. So congratulations on the success that you've had and ah, and thank you for coming on the Cube. >>No, thank you for hosting us and always a big fan off Cube. You guys, you engage with you since early days, and it is fascinating to see how this company has grown. And it's probably many people don't even know how much you've grown behind the seats, technologies and culture that you created yourself. So it's hopefully one day we'll strict the table that I would be another side and asking of our transformation. Digital transformation of Cuban cell >>I would love to. I'd love to do that index again. And thank you, everybody for watching our continuous coverage of active fio data driven keeper Right there. We'll be back with our next guest right after this short break. >>Thank you.

Published Date : Sep 9 2020

SUMMARY :

Great to see you again. is you call it the next normal. There's a is a way to create this next normal, and you just figured out how to live with the environment, And then, of course, the entrepreneurial spirit kicked in and they said, Okay, we can only control what we can control really found the next mom, you really start forgetting our out of continue to innovate Could, I mean, every conference you go to, the divide between organizations that know how to leverage, I mean, you saw the work from I said many times I thought it was more of a symptom than it was a strategy, but it's that's completely End of 2018 Io's and in addition to just the usual cloud providers of you all know and love inside And I'm curious as to what you're seeing. the business move faster, more productive or you will survive. You know, that is interesting to hear you talk about that because that has been the holy Grail of backup. and the ability to find a way to leverage those. You mentioned the word journey and triggered a thought. I want speed, you know, journeys like Okay, And I thought he was going to come back with Well, you know, I gotta go to the abuses. and the big risk and you think about with Cove it were it not for Technology Way How do you identify them and what do you have to do and more importantly, I mean that these air all trends that are tail winds for companies that can that can help sort of you And it will be not too long when you need to talk But I mean, look, this has been we we've dealt with it. the seats, technologies and culture that you created yourself. I'd love to do that index again.

ENTITIES

Entity	Category	Confidence
Robbie	PERSON	0.99+
Ashutosh	PERSON	0.99+
Europe	LOCATION	0.99+
2018	DATE	0.99+
January	DATE	0.99+
Boston	LOCATION	0.99+
Asia	LOCATION	0.99+
10 years	QUANTITY	0.99+
March	DATE	0.99+
Active Eo	ORGANIZATION	0.99+
thousands	QUANTITY	0.99+
UK	LOCATION	0.99+
20%	QUANTITY	0.99+
uber	ORGANIZATION	0.99+
Elon Musk	PERSON	0.99+
last quarter	DATE	0.99+
45%	QUANTITY	0.99+
two years	QUANTITY	0.99+
Airbnb	ORGANIZATION	0.99+
three reasons	QUANTITY	0.99+
360 degree	QUANTITY	0.99+
one	QUANTITY	0.99+
Silicon Valley	LOCATION	0.99+
six months	QUANTITY	0.99+
six weeks	QUANTITY	0.99+
Three	QUANTITY	0.99+
U. S.	LOCATION	0.99+
Ash	PERSON	0.99+
next year	DATE	0.99+
Four constituencies	QUANTITY	0.99+
Oliver Half	PERSON	0.99+
2020	DATE	0.99+
one car	QUANTITY	0.99+
18 months	QUANTITY	0.99+
Oracle	ORGANIZATION	0.99+
second reason	QUANTITY	0.99+
6 weeks	QUANTITY	0.98+
each	QUANTITY	0.98+
this year	DATE	0.98+
second practical visa	QUANTITY	0.98+
10 years ago	DATE	0.98+
16 4017	DATE	0.98+
three	QUANTITY	0.97+
19	QUANTITY	0.97+
12 more percent	QUANTITY	0.97+
2000 accounts	QUANTITY	0.97+
first	QUANTITY	0.97+
Saxes	ORGANIZATION	0.97+
both	QUANTITY	0.97+
four	QUANTITY	0.97+
first thing	QUANTITY	0.97+
John	PERSON	0.97+
today	DATE	0.96+
single	QUANTITY	0.96+
About 20 years ago	DATE	0.96+
86	OTHER	0.96+
Cee Io	ORGANIZATION	0.96+
One	QUANTITY	0.95+
10 times	QUANTITY	0.95+
FiOS	ORGANIZATION	0.95+
about 20%	QUANTITY	0.95+
End of 2018	DATE	0.94+
That's Doing an Aristo covert 19	TITLE	0.94+
single source	QUANTITY	0.93+
Google	ORGANIZATION	0.93+
India	LOCATION	0.93+
Lake House	LOCATION	0.93+
first flower	QUANTITY	0.93+
second sight	QUANTITY	0.93+
8	QUANTITY	0.93+
Ash Ashutosh	PERSON	0.93+
pandemic	EVENT	0.93+
Mars	LOCATION	0.92+
first reaction	QUANTITY	0.92+
Century Days Is Central Data System	ORGANIZATION	0.9+
seven	QUANTITY	0.9+
single version	QUANTITY	0.89+
a year and a half	QUANTITY	0.88+
100 ex	QUANTITY	0.87+

Recommend Videos

Sentiment Analysis

AWS Comprehend

Search Results for lake house: