Tomer Shiran, Dremio | AWS re:Invent 2022

>>Hey everyone. Welcome back to Las Vegas. It's theCUBE live at AWS re:Invent 2022. This is our fourth day of coverage. Lisa Martin here with Paul Gillin. Paul, we started Monday night; we filmed and streamed for about three hours. We have had jam-packed days Tuesday, Wednesday, Thursday. What's your takeaway?
>>We're rounding the final turn as we head into the home stretch. This is as it has been since the beginning: a show with a lot of energy. I'm amazed, for the fourth day of a conference, how many people are still here, how active they are, and how full the sessions are. Huge crowd for the keynote this morning. You don't see that on day four at most conferences; everyone's on their way home. So people come here to learn, and they're still learning.
>>They are still learning, and we're gonna help continue that learning path. We have an alumnus back with us: Tomer Shiran joins us, the CPO and co-founder of Dremio. Tomer, it's great to have you back on the program.
>>Yeah, thanks for having me here. And thanks for keeping the best session for the fourth day.
>>Yeah, you're right. I like that. That's good mojo to come into this interview with, Tomer. So the last time I saw you was a year ago here in Vegas at re:Invent '21. We talked about the growth of data lakes and data lakehouses. We talked about the need for open data architectures as opposed to data warehouses. And the headline of the SiliconANGLE article on the interview we did with you was "Dremio predicts 2022 will be the year open data architectures replace the data warehouse." We're almost done with 2022. Has that prediction come true?
>>Yeah, I think we're seeing almost every company out there, certainly in the enterprise, adopting data lake and data lakehouse technology, embracing open source file and table formats. And so I think that's definitely happening. Of course, nothing goes away.
So, you know, data warehouses don't go away in a year, and actually don't go away ever. We still have mainframes around. But certainly the trends are all pointing in that direction.
>>Describe the data lakehouse for anybody who may not be really familiar with it, and what it really means for organizations.
>>Yeah. I think you could think of the data lakehouse as the evolution of the data lake, right? For the last decade we've had these two options, data lakes and data warehouses. Warehouses had good SQL support and good performance, but you had to spend a lot of time and effort getting data into the warehouse, you got locked into them, and they were very, very expensive. That's a big problem now. And data lakes were more open and more scalable, but had all sorts of limitations. What we've done now as an industry with the lakehouse, and especially with technologies like Apache Iceberg, is we've unlocked all the capabilities of the warehouse directly on object storage like S3. So you can insert and update and delete individual records. You can do transactions. You can do all the things you could do with a database, directly in open formats, without getting locked in, and at a much lower cost.
>>But you're still dealing with semi-structured data as opposed to structured data, and there's work that has to be done to get that into a usable form. That's where Dremio excels. What has been happening in that area? I mean, is it formats like JSON that are enabling this to happen? How are we advancing the cause of making semi-structured data usable?
>>Yeah, well, first of all, I think that's all changed. I think that was maybe true for the original data lakes, but now with the lakehouse, our bread and butter is actually structured data. It's all tables with a schema.
And you can create a table, insert records. It's really everything you can do with a data warehouse, you can now do in the lakehouse. Now, that's not to say that there aren't very advanced capabilities when it comes to JSON and nested data and sparse data. We excel in that as well. But we're really seeing the lakehouse take over the bread-and-butter data warehouse use cases.
>>You mentioned open a minute ago. Talk about why open is important and the value that it can deliver for customers.
>>Yeah, well, I think if you look back in time and you see all the challenges that companies have had with traditional data architectures, a lot of that comes from the problems with data warehouses. They're very expensive. You have to ingest the data into the data warehouse in order to query it. And then it's almost impossible to get off of these systems, right? It takes an enormous effort, tremendous cost, to get off of them. And so you're kinda locked in, and that's a big problem. You're also dependent on that one data warehouse vendor; you can only do things with that data that the warehouse vendor supports. Contrast that with data lakehouses and open architectures, where the data is stored in entirely open formats, things like Parquet files and Apache Iceberg tables. That means you can use any engine on that data. You can use a SQL query engine, you can use Spark, you can use Flink. There are a dozen different engines that you can use on that data, all at the same time. But also in the future, if you ever wanted to try something new that comes out, some new open source innovation, some new startup, you just take it and point it at the same data. So that data is now at the core, at the center of the architecture, as opposed to some vendor's logo.
>>Yeah.
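Editor's sketch (not part of the interview): the claim above is that a table format can offer inserts, row-level deletes, and atomic commits on top of storage whose files are never modified in place. The toy below shows one common way to get that behavior, copy-on-write snapshots. It is an illustration under stated assumptions, not Apache Iceberg's actual metadata layout or API: `ToyTable`, its methods, and the `data-0000` file names are all hypothetical, and a Python dict stands in for an object store such as S3.

```python
class ToyTable:
    """Toy copy-on-write table. Hypothetical; NOT Iceberg's real format or API."""

    def __init__(self):
        self.storage = {}        # file name -> tuple of rows (stands in for S3 objects)
        self.snapshots = [()]    # snapshot id == list index; each is a tuple of file names

    def _write_file(self, rows):
        # Data files are written once and never modified afterwards.
        name = f"data-{len(self.storage):04d}"
        self.storage[name] = tuple(rows)
        return name

    def _commit(self, files):
        # A commit is a single metadata append, so readers see all or nothing.
        self.snapshots.append(tuple(files))
        return len(self.snapshots) - 1

    def insert(self, rows):
        return self._commit(self.snapshots[-1] + (self._write_file(rows),))

    def delete_where(self, predicate):
        files = []
        for name in self.snapshots[-1]:
            rows = self.storage[name]
            kept = [r for r in rows if not predicate(r)]
            if len(kept) == len(rows):
                files.append(name)                     # untouched file is reused as-is
            elif kept:
                files.append(self._write_file(kept))   # rewrite only the affected file
        return self._commit(files)

    def scan(self, snapshot_id=-1):
        # Default reads the latest snapshot; an older id reads the table as it was.
        return [row for name in self.snapshots[snapshot_id] for row in self.storage[name]]


if __name__ == "__main__":
    table = ToyTable()
    table.insert([{"id": 1}, {"id": 2}])
    before_delete = table.insert([{"id": 3}])
    table.delete_where(lambda row: row["id"] == 2)
    print("current:", table.scan())
    print("before the delete:", table.scan(before_delete))
```

Because old snapshots keep pointing at the old files, reading an earlier snapshot id gives the table as it was before the delete, which is the same mechanism that makes "going back in time" possible.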
>>Amazon seems to have bought into the lakehouse concept. It had big announcements on day two about eliminating the ETL stage between RDS and Redshift. Do you see the cloud vendors as pushing this concept forward?
>>Yeah, a hundred percent. I mean, Amazon's a great partner of ours. We work with probably 10 different teams there, everything from the S3 team, the Glue team, the QuickSight team, and everything in between. And their embrace of the lakehouse architecture, the fact that they adopted Iceberg as their primary table format, I think that's exciting. As an industry, we're all coming together around standard ways to represent data, so that at the end of the day, companies have this benefit of being able to have their own data in their own S3 account in open formats, and be able to use all these different engines without losing any of the functionality that they need. All these interactions with data where, in the past, you would have had to move the data into a database or warehouse, you just don't have to do that anymore.
>>Speaking of functionality, talk about what's new this year with Dremio since we saw you last.
>>Yeah, there are a lot of new things with Dremio. We now have full Apache Iceberg support, with DML commands: you can do inserts, updates, deletes, COPY INTO. All that kind of stuff is now fully supported as a native part of the platform. We now offer two flavors of Dremio. We have Dremio Cloud, which is our SaaS version, fully hosted. You sign up with your Google or Azure account, and you're up and running in a minute. And then Dremio Software, which you can self-host, usually in the cloud, but even outside of the cloud. And then we're also very excited about this new idea of data as code. And so we've introduced a new product that's now in preview called Dremio
Arctic. And the idea there is to bring the concepts of Git or GitHub to the world of data. So things like being able to create a branch and work in isolation. If you're a data scientist, you wanna experiment on your own without impacting other people. Or you're a data engineer and you're ingesting data; you want to transform it and test it before you expose it to others. You can do that in a branch. So all these ideas that we take for granted now in the world of source code and software development, we're bringing to the world of data with Dremio Arctic. And when you think about data mesh, a lot of people are talking about data mesh now and wanting to take advantage of those concepts and ideas, thinking of data as a product. Well, when you think about data as a product, we think you have to manage it like code, and that's why we call it data as code. All those reasons that we use things like Git to build products, if we wanna think of data as a product, we need all those capabilities with data too. Also the ability to go back in time, the ability to undo mistakes, to see who changed my data and when they changed that table. All of those are part of this new catalog that we've created.
>>You talk about data as a product; that's sort of intrinsic to the data mesh concept. What's your opinion of data mesh? Is the world ready for that radically different approach to data ownership?
>>You know, we now have dozens of customers that are using Dremio to implement enterprise-wide data mesh solutions. And at the end of the day, I think it's just what most people would consider common sense, right?
In a large organization, it is very hard for a single centralized team to understand every piece of data, to manage all the data themselves, to make sure the quality is correct, to make it accessible. And so what data mesh is first and foremost about is being able to federate, or distribute, the ownership of data. The governance of the data still has to happen, right? And so that is, I think, at the heart of the data mesh: allowing different teams, different domains, to own their own data and really manage it like a product, with all the best practices that we have with that. That's super important. So we're doing a lot with data mesh: the way that Dremio Cloud has multiple projects, and the way that Dremio Arctic allows you to have multiple catalogs, and different groups can interact and share data among each other. The fact that we can connect to all these different data sources, even outside your data lake, with Redshift, Oracle, SQL Server, all the different databases that are out there, and join across different databases in addition to your data lake, that's all stuff that companies want with their data mesh.
>>What are some of your favorite customer stories where you've really helped them accelerate that data mesh and drive business value from it, so that more people in the organization have access to data and can really make those data-driven decisions that everybody wants to make?
>>I mean, there are so many of them. One of the largest tech companies in the world is creating a data mesh where you have all the different departments in the company. They were a big data warehouse user, and it kinda hit the wall, right?
The costs were so high, and the ability for people to use it just for experimentation, to try new things out, to collaborate, they couldn't do it because it was so prohibitively expensive and difficult to use. And so they said, well, we need a platform where different people can collaborate, they can experiment with the data, they can share data with others. And so at a big organization like that, it's their ability to have a centralized platform but allow different groups to manage their own data. Several of the largest banks in the world are also doing data meshes with Dremio. One of them has over a dozen different business units that are using Dremio, and that ability to have thousands of people on a platform and to be able to collaborate and share among each other, that's super important to these guys.
>>Can you contrast your approach to the market with Snowflake's? 'Cause they have some of those same concepts.
>>Snowflake's a very closed system at the end of the day, right? Closed and very expensive. I think, if I remember, I saw a quarter ago in one of their earnings reports that the average customer spends 70% more every year, right? Well, that's not sustainable. If you think about that over a decade, your cost is gonna increase 200x. Most companies are not gonna be able to swallow that, right? So first of all, companies need more cost-efficient solutions that are just more approachable. And the second thing is, we talked about the open data architecture. I think most companies now realize that if you want to build a platform for the future, you need to have the data in open formats and not be locked into one vendor, right?
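Editor's sketch (not part of the interview): the "200x in a decade" figure follows from compounding the quoted 70% annual increase over ten years. The short calculation below only checks that arithmetic; the 70% rate is the speaker's recollection of an earnings report and is not verified here, and `compound_growth` is a hypothetical helper name.

```python
def compound_growth(rate, years):
    """Return the cost multiplier after `years` of growing by `rate` per year.

    `rate` is a fraction, so 0.70 means a 70% increase each year.
    """
    return (1 + rate) ** years


if __name__ == "__main__":
    # 1.7 multiplied by itself ten times is roughly 201.6.
    print(f"{compound_growth(0.70, 10):.1f}x after 10 years")
```

Under that assumption the multiplier is about 201.6, so "200x" is a fair rounding of the compounded figure rather than an exaggeration of it.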
And so that's another important aspect. Beyond that, there's the ability to connect to all your data, even outside the lake, in your different databases, NoSQL databases, relational databases, and Dremio's semantic layer, where we can accelerate queries. Typically what happens with data warehouses and other data lake query engines is that, because you can't get the performance that you want, you end up creating lots and lots of copies of data. For every use case, you're creating a pre-joined copy of that data, a pre-aggregated version of that data. And then you have to redirect all your data.
>>You've got a governance problem.
>>It's expensive, and it's hard to secure that, 'cause permissions don't travel with the data. So you have all sorts of problems with that, right? And so what we've done, with our semantic layer that makes it easy to expose data in a logical way, and then our query acceleration technology, which we call reflections, which transparently accelerates queries, is give you sub-second response times without data copies and also without extracts into the BI tools. 'Cause if you start doing BI extracts or imports, again, you have lots of copies of data in the organization, all sorts of refresh problems, security problems. It's a nightmare, right? And just collapsing all those copies and having a simple solution where data is stored in open formats and we can give you fast access to any of that data, that's very different from what you get with a Snowflake or any of these other companies.
>>Right. That's a great explanation. I wanna ask you: early this year you announced that your Dremio Cloud service would be free forever, the basic Dremio Cloud service. How has that offer gone over? What's been the uptake on that offer?
>>Yeah, I mean, thousands of people have signed up, and I think it's a great service.
It's very, very simple. People can go on the website and try it out. We now have a test drive as well, if you want to get started with just some public sample data sets and a tutorial. We've made that increasingly easy as well. But yeah, we continue to take that approach of making it easy, democratizing these cloud data platforms and lowering the barriers to adoption.
>>How effective has it been in driving sales of the enterprise version?
>>Yeah, a lot of the business that we do, when it comes to selling, is with folks that have educated themselves, right? They've started off, they've followed some tutorials. I think generally developers prefer the first interaction to be with a product, not with a salesperson. And so that's basically the reason we did that.
>>Before we ask you the last question, can you give us a sneak peek into the product roadmap as we enter 2023? What can you share with us that we should be paying attention to where Dremio is concerned?
>>Yeah. Actually, a couple of days ago here at the conference, we had a press release with all sorts of new capabilities that we just released. And there's a lot more for the coming year. We will shortly be releasing a variety of different performance enhancements. So in the next quarter or two, we'll be probably twice as fast just in terms of raw query speed, and that's in addition to our reflections and our query acceleration. Support for all the major clouds is coming. Just a lot of capabilities in Dremio that make it easier and easier to use the platform.
>>Awesome. Tomer, thank you so much for joining us.
My last question to you is, if you had a billboard in your desired location, and it was going to really just be a mic drop about why customers should be looking at Dremio, what would that billboard say?
>>Well, Dremio is the easy and open data lakehouse. And open architectures are just a lot better, a lot more future-proof, a lot easier, and just a much safer choice for the future for companies. And so, hard to argue with those. People should take a look.
>>Exactly.
>>That wasn't the best billboard.
>>Okay, I think it's a great billboard. Awesome. And thank you so much for joining Paul and me on the program, sharing with us what's new and some of the exciting things that are coming down the pipe quite soon. We're gonna be keeping our eye on Dremio.
>>Awesome. Always happy to be here.
>>Thank you. All right. For our guest and for Paul Gillin, I'm Lisa Martin. You're watching theCUBE, the leader in live and emerging tech coverage.
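Editor's sketch (not part of the interview): the "data as code" idea discussed earlier in this conversation (create a branch, work in isolation, publish when ready, see who changed what) can be pictured as a catalog of named branches, each mapping table names to an immutable table version. The toy below is only an illustration of that idea. It is not the Dremio Arctic API; `ToyCatalog`, its methods, and the branch and table names are all hypothetical.

```python
class ToyCatalog:
    """Toy Git-like data catalog. Hypothetical; NOT the Dremio Arctic API."""

    def __init__(self):
        self.branches = {"main": {}}   # branch name -> {table name: tuple of rows}
        self.log = []                  # (branch, table, author) for every commit

    def create_branch(self, name, source="main"):
        # Copying the mapping is cheap because table versions are immutable tuples.
        self.branches[name] = dict(self.branches[source])

    def commit(self, branch, table, rows, author):
        self.branches[branch][table] = tuple(rows)
        self.log.append((branch, table, author))

    def read(self, table, branch="main"):
        return list(self.branches[branch].get(table, ()))

    def merge(self, source, target="main"):
        # Simplest possible merge policy: the source branch's versions win.
        self.branches[target].update(self.branches[source])


if __name__ == "__main__":
    catalog = ToyCatalog()
    catalog.commit("main", "orders", [("o1", 10)], author="etl")
    catalog.create_branch("experiment")
    catalog.commit("experiment", "orders", [("o1", 10), ("o2", 25)], author="data-scientist")
    print("main before merge:", catalog.read("orders"))
    catalog.merge("experiment")
    print("main after merge:", catalog.read("orders"))
    print("history:", catalog.log)
```

Readers of `main` never see the experimental rows until the merge, and the commit log answers the "who changed my data" question. A real catalog would also need conflict detection and per-commit history, which this sketch leaves out.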

Published Date : Dec 1 2022



Tomer Shiran, Dremio | AWS re:Invent 2021

>>Good morning. Welcome back to theCUBE's continuing coverage of AWS re:Invent 2021. I'm Lisa Martin. We have two live sets here. We've got over a hundred guests on the program this week with our live sets and remote sets, talking about the next decade in cloud innovation. And I'm pleased to be welcoming back one of our CUBE alumni, Tomer Shiran, the founder and CPO of Dremio, to the program. Tomer is going to be talking about why 2022 is the year open data architectures surpass the data warehouse. Tomer, welcome back to theCUBE.
>>Thanks for having me. It's great to be here.
>>It's great to be here at a live event in person, my goodness, sitting side by side with guests. Before we dig into the data lakehouse versus the data warehouse, and I want to unpack that with you, talk to me about what's going on at Dremio. You guys were on the program earlier this summer, but what are some of the things going on right now in the fall of 2021?
>>Yeah, for us, it's a big year: a lot of product news, a lot of new products, new innovation. The company's grown a lot. We're probably three times bigger than we were a year ago. So a lot of new folks on the team and many, many new customers.
>>That's good, always, new customers, especially during the last 22 months, which have been obviously incredibly challenging. But I want to unpack this, the difference between a data lake and a data lakehouse. I love the idea of a lakehouse, by the way. Talk to me about what the differences and similarities are, and how customers are benefiting.
>>Sure. Yeah. I think you could think of the lakehouse as kind of the evolution of the lake, right? We've had data lakes for a while.
Now, the transition to the cloud made them a lot more powerful, and now a lot of new capabilities coming into the world of data lakes really make that whole concept, that whole architecture, much more powerful, to the point that you really are not going to need a data warehouse anymore. And so it kind of gives you the best of both worlds: all the advantages that we had with data lakes, the flexibility to use different processing engines, to have data in your own account and in open formats, all those benefits, but also the benefits that you had with warehouses, where you could do transactions and get high performance for your BI workloads and things like that. So the lakehouse makes both of those come together and gives you the benefits of both.
>>Talk to me about it from a customer lens: what are some of the key benefits, and how does the customer go from, say, having data warehouses and data lakes to actually evolving to the lakehouse?
>>You know, data warehouses have been around forever, right? And there's been some new innovation there as we've moved to the cloud, but fundamentally they are a very closed and very proprietary architecture that gets very expensive quickly. With a data warehouse, you have to take your data and load it into the warehouse, right? Whether that's Teradata or Snowflake or any other database out there, that's what you do: you bring the data into the engine. The data lakehouse is a really different architecture. It's one where you have data as its own tier, stored in open formats, things like Parquet files and Iceberg tables. And you're basically bringing the engines to the data instead of the data to the engine.
And so now all of a sudden you can start to take advantage of all this innovation that's happening, on the same set of data, without having to copy and move it around. So whether that's Dremio for high-performance BI workloads and SQL-type analysis, Spark for batch processing and machine learning, or Flink for streaming, there are lots of different technologies that you can use on the same data, and the data stays in the customer's own account, right? So S3 effectively becomes their new data warehouse.
>>Okay. So I can imagine, during the last 22 months of this scattered work-from-anywhere environment, and we're still in it, with so much data being generated at the edge and the edge expanding, that bringing the engines to the data is probably now more timely than ever.
>>Yeah. The growth in data, you see it everywhere, right? That's the reason so many companies like ourselves are doing so well. There's so much new data, so many new use cases, and every company wants to be data-driven, right? They all want to democratize data within the organization. But you need the platforms to be able to do that. And that's very hard if you have to constantly move data around, if you have to take your data, which maybe is landing in S3, and move subsets of it into a data warehouse, and then from there move subsets of that into BI extracts, right? Tableau extracts, Power BI imports, and you have to create cubes and lots of copies within the data warehouse. There's no way you're going to be able to provide self-service and data democratization. And so it really requires a new architecture.
And that's one of the main things that we've been focused on at Dremio: really taking the lakehouse and the lake and making it not just something that data scientists use for really advanced use cases. Even your production BI workloads can actually now run on the lakehouse when you're using a SQL technology like Dremio.
>>It's really critical because, as you talked about, every company these days is a data company. If they're not, they have to be, or there's a competitor in the rear-view mirror that is going to be able to take over what they're doing. So this really is critical, especially considering another thing that we learned in the last 22 months: real-time data access is no longer a nice-to-have. It's really essential for businesses in any organization.
>>I think we see it even in our own company, right? The folks that are joining the workforce now, they learned SQL in school. They don't want a report on their desk, printed out every Monday morning. They want access to the database. How do I connect whatever tool I want, or even type SQL by hand? I want access to the data and I want to just use it. And I want the performance, of course, to be fast, because otherwise I'll get frustrated and I won't use it, which has been the status quo for a long time. And that's basically what we're solving.
>>So the lakehouse, versus a data warehouse, is better able to really facilitate data democratization across an organization?
>>Yeah. People don't talk a lot about the story before the story, right? With a data warehouse, the data never starts there. You typically first have your data in something like S3, or perhaps in other databases, and then you have to ETL it all into that warehouse. And that's a lot of work.
And typically only a small subset of the data gets ETL'd into that data warehouse. And then the user wants to query something that's not in the warehouse, and somebody from engineering has to spend a month or two responding to that ticket and wiring up some new ETL to get the data in. So it's a big problem, right? If you can have a system that can query the data directly in S3 and even join it with sources outside of that, things like your Oracle database, your SQL Server database, MongoDB, et cetera, well, now you really have the ability to expose data to your users within the company and make it very self-service. They can query any data at any time and get a fast response time. That's what they need.
>>And self-service is key there. Speaking of self-service and things that are new, I know you guys launched Dremio Cloud recently, a new SaaS offering. Talk to me about that. What's going on there?
>>Yeah, Dremio Cloud. We spent about two years working on that internally, and really the goal was to simplify how we deliver all of the benefits that we've had in our product: sub-second response times on the lake, a semantic layer, the ability to connect to multiple sources, but take away the pain of having to install and manage software. And so we did it in a way that the user doesn't have to think about versions. They don't have to think about upgrades. They don't have to monitor anything. It's basically like using Gmail, right? You log in and you get to use it. You don't have to be very sophisticated. There's not a lot of administration you have to do. It basically makes it a lot simpler.
>>And what's the adoption been like so far?
>>It's been great. It's been in limited availability, but we've been onboarding customers every week now.
Many startups, many of the world's largest companies. So that's been really exciting, actually. >>So quite a range of customers. And it sounds like Dremio has grown itself during the pandemic. We've seen acceleration of startups, of a lot of companies, of cloud adoption, of migration. How have your customer conversations changed in the last 22 months, as businesses in every industry scrambled in the beginning to survive and are now realizing that they need to modernize to thrive, to be competitive, and to have competitive advantage? >>I think I've seen a few different trends here. One is certainly that there's been a lot of acceleration of movement to the cloud, right? How different businesses have been impacted has required them to be more agile, more elastic. They don't necessarily know how much workload they're going to have at any point in time. So having that flexibility in the technology matters. With Dremio Cloud, for example, we scale infinitely: you can have one query a day, or you can have a thousand queries a second, and the system just takes care of it. And so that's really important to these companies that are being impacted in various different ways, right? You had the companies, the Pelotons and Zooms of the world, whose business was exploding. And then of course there were the travel and hospitality industries, where that went to zero all of a sudden; it's been recovering nicely since then. So that flexibility has been really important to customers. I think the other thing is they've realized that they have to leverage data, right? Because in parallel to this pandemic there has also been a real boom in technology.
And so every industry is being disrupted by new startups, whether it's the insurance industry or financial services: a lot of insurtech, fintech, different companies that are trying to take advantage of data. So if you as an enterprise are not doing that, that's a problem. >>It is a problem. It's definitely something that I think every business in every industry needs to be very acutely aware of, because from a competitive advantage perspective, there's someone in that rear-view mirror who is going to be focused on data, who has a real solid, modern data strategy, and who is going to be able to take over if a company is resting on its laurels at all. So here we are at re:Invent. I just came off of Adam Selipsky's keynote. But talk to me about the Dremio-AWS partnership. I know AWS's partner ecosystem is huge. You're one of the partners, but talk to me about what's going on with the partnership. How long have you guys been partners? What are the advantages for your customers? >>We've been very close partners with AWS for a number of years now, and it spans many different parts of AWS, starting with the engineering organization. So we have a very close relationship with the S3 team and the EC2 team; I was just having dinner last night with Kevin Miller, the GM of S3. So that's one side of things, the engineering integration. We're the first technology to integrate with AWS Lake Formation, which is Amazon's data lake security technology. So we do a lot of work together on upcoming features that Amazon is releasing. And then they've also been really helpful on the go-to-market side of things, on the sales and marketing, whether it's blogs on the Amazon blog or their sales teams actually promoting Dremio to their customers to help them be successful.
So it's really been a good partnership. >>And every time I talk to somebody from Amazon, we always talk about their customer-first focus, their customer obsession. It sounds like there's deep alignment from the technical engineering perspective and on sales and marketing. Talk to me a little bit about cultural alignment, because when you're going into customer conversations, I imagine they want to see one unified team. >>Yeah. I think Amazon does have that customer-first focus, and obviously we do as well. We have to, right? As a startup, if a customer has a problem, the whole company will jump on that problem. So that's what we call customer obsession internally. And I think that's very much what we've seen with AWS as well: the desire to make the customer successful comes before "OK, how does this affect a specific Amazon product?" Because anytime a customer is using Dremio on AWS, they're also consuming many different AWS services and they're bringing data into AWS. And so I think for both of us, it's all about how we solve customer problems and make them successful with their data in this case. >>Solving those customer problems is the whole reason that we're all here, right? Talk to me a little bit, as we have just a few more minutes here: when we hear terms like future-proof, I always want to dig in with folks like yourself, chief product officers. What does it actually mean? How do you enable businesses to create these future-proof data architectures that are going to allow them to scale and be really competitive? >>Sure. So I think many companies have experienced what's known as lock-in, right? They invest in some technology; we've seen this with databases and data warehouses, right?
You start using that and you can really never get off, and prices go up, and you find out that you're spending 10 times more, especially now with the cloud data warehouses, than you thought you were going to be spending. And at that point it becomes very difficult, right? What do you do? One of the great things about the data lake and the lakehouse architecture is that the data stays stored in the customer's own account, right? It's in their S3 buckets in open source formats, like Parquet files and Iceberg tables, and they can use many different technologies on that. So today the best technology for SQL and for powering your mission-critical BI is Dremio, but tomorrow there might be something else, right? And that company can then take that new technology, point it at the same data, and start using it, right? They don't have to go through some really crazy migration process. We see that with Teradata and Oracle, right? With the old-school vendors, that's always been a pain. And now with the newer cloud data warehouses you see a lot of complaints around that too. So with the lakehouse as it's fundamentally designed, especially if you choose open source formats like Iceberg tables, as opposed to, say, Delta Lake, you're really future-proofing yourself, right? >>Got it. Talk to me about some of the things, as we wrap up here, that attendees can learn and see and touch and feel and smell at the Dremio booth at this re:Invent. >>Yeah. I think there are a few different things they can do. They can watch a demo or play around with Dremio Cloud, and they can talk to our team about what we're doing with Apache Iceberg.
Iceberg, to me, is one of the more exciting projects in this space, because it was created by Netflix and Apple, Salesforce. AWS just announced support for Iceberg with their products, Athena and EMR. So it's really emerging as the standard table format, the way to represent data in open formats in S3. We've been behind Iceberg for a while now, and so that to us is very exciting. We're happy to chat with folks at the booth about that. Nessie is another project that we created, an open source project for really providing a Git-like experience for your data, where you have version control and branching, and trying to reinvent data engineering and data management. So that's another cool project that we can talk about at the booth. >>So lots of opportunity there for attendees to learn. Thank you, Tomer, for joining me on the program today, talking about the difference between a data warehouse, a data lake, and the lakehouse. You did a great job explaining that, Dremio Cloud, what's going on, and how you guys are deepening that partnership with AWS. We appreciate your time. >>Thank you. Thanks for having me. >>My pleasure. For Tomer Shiran, I'm Lisa Martin. You're watching theCUBE. Our coverage of AWS re:Invent continues after this.

Published Date : Nov 30 2021



Mark Lyons, Dremio | CUBE Conversation

(bright upbeat music) >> Hey everyone. Welcome to this "CUBE Conversation" featuring Dremio. I'm your host, Lisa Martin, and I'm excited today to be joined by Mark Lyons, the VP of product management at Dremio. Mark, thanks for joining us today. >> Hey Lisa, thank you for having me. Looking forward to the talk. >> Yeah. Talk to me about what's going on at Dremio. I had the chance to talk to your chief product officer, Tomer Shiran, a couple of months ago, but talk to us about what's going on. >> Yeah, I remember that, at re:Invent. It's been an exciting few months since re:Invent here at Dremio. Just in the new year we raised our Series E, and since then we ran our Subsurface event, which had over seven or eight thousand registrants and attendees. And then we announced our Dremio Cloud product as generally available, including Dremio Sonar, which is a SQL query engine, and Dremio Arctic in public preview, which is a metastore for the lakehouse. >> Great. And we're going to dig into both of those. I saw that over 400 million was raised in that Series E, raising the valuation of Dremio to 2 billion. So a lot of growth and momentum going on at the company, I'm sure. If we think about businesses in any industry, they've made large investments in data warehouses, proprietary data warehouses. Talk to me about what they've historically been able to achieve, but then what some of those bottlenecks are that they're running into. >> Yeah, for sure. My background is actually in the data warehouse space. I spent the last eight, maybe close to 10 years there, and we've seen this shift go on from the traditional enterprise data warehouse to the data lake, and the last couple of years have really been the time of the cloud data warehouse. And there's been a large amount of adoption of cloud data warehouses, but fundamentally they still come with a lot of the same challenges that have always existed with the data warehouse, which is, first of all, you have to load your data into it.
So that data's coming from lots of different sources. In many cases, it's landing first as files in a data lake repository like S3. And then there's a loading process, right? An ETL process. And those pipelines have to be maintained and stay operational. And typically, as the data warehouse life cycle of processing moves on, the scope of the data that consumers get to access gets smaller and smaller. The control of that data gets tighter and the change process gets heavier, and it goes from quick changes, adding a column or adding a field to a file, to days if not weeks for businesses to modify their data pipelines and test new scenarios, offer new features in the application, or answer new questions that the business is interested in from an analytics standpoint. So typically we see the same thing even with these cloud data warehouses: the scope of the data shrinks, and the time to get answers gets longer. And when new engines come along we see the same story. This is going on right now in the data warehouse space: there are new data warehouses coming and they say, well, we're a thousand times faster than the last data warehouse. And then it's like, okay, great, but what's the process? The process is to migrate all your data to the new data warehouse, right? And that comes with all the same baggage. Again, it's a proprietary format that you load your data into. So I think people are ready for a change from that. >> People are not only ready for a change, but every company has to become a data company these days, and access to real-time data is no longer a nice-to-have. It's absolutely essential. The ability to scale, the ability to harness the value from as much data as possible, and to do so fast is really table stakes for any organization. How is Dremio helping customers in that situation to operationalize their data? >> Yeah, so that's what I was so intrigued by and loved about Dremio when I joined three, four, five months back.
Coming from the warehouse space, when I first saw the product I was just like, oh my gosh, this is so much easier for folks. They can access a larger scope of their data faster, which, to your point, is table stakes for all organizations these days. They need to be able to analyze data sooner; sooner is better. Data has a half-life, right? It decays. The value of data decays over time. So typically the most valuable data is the newest data. That all depends on the industries we're talking about, the types of data, and the use cases, but it's basically always true that newer data is more valuable, and they need to be able to analyze as much of it as possible. The story can't be, no, we have to wait weeks or months to get a new data source. Or the story can't be that data that includes seasonality we weren't able to keep in the same location, because it's too expensive to keep it in the warehouse or whatever. So for Dremio and our customers, our story is simple: leverage the data where it is. Access data in all sorts of sources, whether it's a Postgres database or an S3 bucket, and don't move the data, don't copy the data, analyze it in place. And don't limit the scope of the data you're trying to analyze. If you have new use cases, or additional data sets that you want to add to those use cases, just bring them into S3 and you are off to the races, and you can easily analyze more data and give more power to the end user. So if there's a field that they want to calculate, a simple change like converting this miles field to kilometers, well, the end users should be empowered to just make a calculation on the data like that. That should not require an entire cycle through a data engineering team and a backlog and a ticket and pushing that to production and so forth, which in many cases it does at many organizations.
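The miles-to-kilometers case above is small enough to sketch. The following is only an illustration in plain Python, not Dremio's API: the `Trip` record, its field names, and the `with_distance_km` helper are all hypothetical. It shows the general idea of a derived field computed at read time, so that the stored data is never rewritten and no pipeline change is needed.

```python
from dataclasses import dataclass

# Exact by the international definition of the mile.
KM_PER_MILE = 1.609344


@dataclass(frozen=True)
class Trip:
    """Hypothetical source record; frozen so the source row cannot be modified."""
    trip_id: str
    distance_miles: float


def with_distance_km(trips):
    """Yield each row as a dict with an extra derived distance_km field.

    The conversion happens when the rows are read, so the underlying
    records stay exactly as they were stored.
    """
    for trip in trips:
        yield {
            "trip_id": trip.trip_id,
            "distance_miles": trip.distance_miles,
            "distance_km": round(trip.distance_miles * KM_PER_MILE, 3),
        }


# Hypothetical sample data standing in for rows read from a file or table.
trips = [Trip("a", 1.0), Trip("b", 26.2)]
rows = list(with_distance_km(trips))
```

In a SQL engine the same calculation would typically be an expression in a view or query rather than a Python function; the point of the sketch is only that the derived value lives in the consumer's query, not in a new copy of the data.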
It's a lot of effort to make new calculations on the data, derive new fields, add a new column, and so forth. So Dremio makes the data engineer's life easier and more productive. It also makes the data consumer's life much easier and happier, and they can just do their job without worrying and waiting. >> Not only can they do their job, but from a high-level business perspective, the business probably has the opportunity to be far more competitive, because it's got a bigger scope of data, as you mentioned, and access to it more widely and faster, and those are only good things in terms of- >> More use cases, more experiments, right? So what I've seen a lot is that there's no shortage of ideas of what people can do with the data and projects that might be undertaken, but no one knows exactly how valuable they will be, or whether that's something that should be funded or not. So more use cases, more experiments, try more things. If it's cheap to try these data problems and see if they're valuable to the business, then that's better for the business. Ultimately the business will be more competitive. We'll be able to try more new products, we'll be able to have better operational efficiencies, lower risk, all those things. >> Right. What about data governance? Talk to me about how the lakehouse enables that across all these disparate data volumes. >> I think this is where things get really interesting with the lakehouse concept, relative to where we used to be with a data lake, which was a parking ground for just lots of files. And that came with a lot of challenges when you just had a lot of files out there in a data lake, whether that was HDFS, right, a Hadoop data lake back in the day, or now a cloud object storage data lake.
So historically I feel like governance, access, authentication, and auditing were all extremely challenging with the data lake, but now in the modern lakehouse world all those challenges have been solved. You have everything from the front of the house, with auth and access policies and data masking, everything that you would expect, through commits and tables and transactions and inserts and updates and deletes, and auditing of that data: you're able to see who made the changes to the data, which engine, which user, when they were made, and to see the whole history of a table and not just a mess of files in a file store. So it's really come a long way. I feel like we're in the renaissance stage of the 2.0 data lakes, or lakehouses as people call them. But basically what you're seeing is a lot of functionality from the traditional warehouse all available in the lake. And warehouses had a lot of governance built in, whether that is encryption and column access policies and row access policies, so only the right user saw the right data, or some data masking, so that the Social Security number was masked out but the analyst knew it was a Social Security number. That was all there. Now that's all available on the lakehouse, and you don't need to copy data into a data warehouse just to meet those types of requirements. A huge one is also deletes, right? I feel like deletes were one of the Achilles' heels of the original data lake, when there was no governance and people were just copying data sets around and modifying data sets for whatever their analytics use case was. If someone said, "Hey, go delete this," the right to be forgotten, GDPR; now you've got California's CCPA and others all coming online. If you said, go delete this record or set of records from an original lake, I think that was impossible, probably, for many people to do with confidence, to say that I fully deleted this.
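The masking behavior mentioned above, where the Social Security number is hidden but the analyst can still tell what kind of value it is, can be sketched in a few lines. This is a generic illustration in plain Python and not how Dremio or any particular warehouse implements masking policies; the `mask_ssn` and `read_ssn` helpers and the `PRIVILEGED_ROLES` set are hypothetical names.

```python
import re

# Matches the common NNN-NN-NNNN presentation of a US Social Security number.
SSN_PATTERN = re.compile(r"^\d{3}-\d{2}-\d{4}$")

# Hypothetical set of roles allowed to see unmasked values.
PRIVILEGED_ROLES = {"compliance"}


def mask_ssn(value: str) -> str:
    """Hide all but the last four digits while keeping the SSN shape."""
    if not SSN_PATTERN.match(value):
        raise ValueError("value is not in NNN-NN-NNNN form")
    return "XXX-XX-" + value[-4:]


def read_ssn(value: str, role: str) -> str:
    """Return the raw value for privileged roles and a masked one otherwise."""
    if role in PRIVILEGED_ROLES:
        return value
    return mask_ssn(value)
```

For example, `read_ssn("123-45-6789", "analyst")` returns `"XXX-XX-6789"`, which still reads as a Social Security number, while a caller with the hypothetical `compliance` role gets the stored value back unchanged.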
Now with the Apache Iceberg table format that is stored in the lakehouse architecture, you actually have delete functionality, right? Which is a key component that warehouses have traditionally brought to the table. >> That's a huge component from a compliance perspective. You mentioned GDPR and CCPA, which is going to be CPRA in less than a year, but there are so many other data privacy regulations coming up that the ability to delete is going to be table stakes for organizations. We just have a couple of minutes left, but something that you guys launched, and I love the name, is the forever-free data lakehouse platform. That sounds great. Forever free. Talk to me about what that really means. It consists of two products, the Sonar and Arctic that you mentioned, but talk to me about this forever-free data lakehouse. >> Yeah. I feel like this is an amazing step forward in the industry. Because of the Dremio Cloud architecture, where the execution and data live in the customer's cloud account, we're able to basically say, hey, the Dremio software, the Dremio service side of this platform, is forever free for users. Now there is a paid tier, but there's a standard tier that is truly forever free. That still comes with infrastructure bills from your cloud provider, right? So if you use AWS, you still have an S3 bill for your data sets, because we're not moving them. They're staying in your Amazon account, in your S3 bucket. You still have to pay for the infrastructure, the EC2 and the compute to do the data analytics, but the actual software is free forever. And there's no one else in our space offering that; in our space, everything's a free trial. So here's your $500 of credit, come try my product. And what we're saying is different, with our unique architectural approach, and this is what I think is preferred by customers too.
We take care of all the query planning, all the engine management, all the administration of the platform, the upgrades, a fully available, zero-downtime platform. So they get all the benefits of SaaS as well as the benefits of maintaining control over their data. And because that data is staying in their account and the execution of the analytics is staying in their account, we don't incur that infrastructure bill. So we can have a forever-free tier of our platform. And we've had tremendous adoption. I think we announced this in the first week of March, so it's not even the end of March, and there are hundreds and hundreds of signups, and many users are actively on the platform now, live querying their data. >> That just kind of summarizes the momentum that Dremio is seeing. Mark, thank you so much. We're out of time, but thanks for talking to me- >> Thank you. >> About what's new at Dremio, what you guys are doing. Next time, we'll have to unpack this even more. I'm sure there's loads more we could talk about, but we appreciate that. >> Yeah, this was great. Thank you, Lisa. Thank you. >> My pleasure. For Mark Lyons, I'm Lisa Martin. Keep it right here on theCUBE, your leader in high tech hybrid event coverage. (upbeat music)

Published Date : Mar 24 2022


