Mark Lyons, Dremio | CUBE Conversation
(bright upbeat music) >> Hey everyone. Welcome to this "CUBE Conversation" featuring Dremio. I'm your host, Lisa Martin, and I'm excited today to be joined by Mark Lyons, the VP of product management at Dremio. Mark, thanks for joining us today. >> Hey Lisa, thank you for having me. Looking forward to the talk. >> Yeah. Talk to me about what's going on at Dremio. I had the chance to talk to your chief product officer, Tomer Shiran, a couple months ago, but talk to us about what's going on now. >> Yeah, I remember that at re:Invent. It's been an exciting few months since re:Invent here at Dremio. Just in the new year we raised our Series E, and since then we ran right into our Subsurface event, which had over seven to eight thousand registrants and attendees. And then we announced our Dremio Cloud product as generally available, including Dremio Sonar, which is a SQL query engine, and Dremio Arctic in public preview, which is a metadata store for the lakehouse. >> Great. And we're going to dig into both of those. I saw over 400 million raised in that Series E, raising Dremio's valuation to 2 billion, so a lot of growth and momentum going on at the company, I'm sure. If we think about businesses in any industry, they've made large investments in proprietary data warehouses. Talk to me about what they've historically been able to achieve, but then what some of the bottlenecks are that they're running into. >> Yeah, for sure. My background is actually in the data warehouse space. I've spent the last eight, maybe close to 10, years there, and we've seen this shift go on from the traditional enterprise data warehouse to the data lake, and then the last couple years have really been the time of the cloud data warehouse. There's been a large amount of adoption of cloud data warehouses, but fundamentally they still come with a lot of the same challenges that have always existed with the data warehouse. First of all, you have to load your data into it. That data's coming from lots of different sources. In many cases it's landing as files in a data lake repository like S3 first, and then there's a loading process, right? An ETL process. Those pipelines have to be maintained and stay operational. And typically, as the data warehouse lifecycle of processing moves on, the scope of the data that consumers get to access gets smaller and smaller, the control of that data gets tighter, and the change process gets heavier. It goes from quick changes, like adding a column or adding a field to a file, to days if not weeks for businesses to modify their data pipelines, test new scenarios, offer new features in the application, or answer new questions that the business is interested in from an analytics standpoint. So typically we see the same thing even with these cloud data warehouses: the scope of the data shrinks, and the time to get answers gets longer. And when new engines come along, we see the same story. This is going on right now in the data warehouse space: new data warehouses come along and say, well, we're a thousand times faster than the last data warehouse. And then it's like, okay, great, but what's the process? The process is to migrate all your data to the new data warehouse, right? And that comes with all the same baggage. Again, it's a proprietary format that you load your data into. So I think people are ready for a change from that.
>> People are not only ready for a change, but every company has to become a data company these days, and access to real-time data is no longer a nice-to-have; it's absolutely essential. The ability to scale, the ability to harness the value from as much data as possible, and to do so fast, is really table stakes for any organization. How is Dremio helping customers in that situation operationalize their data? >> Yeah, that's what I was so intrigued by and loved about Dremio when I joined three, four, five months back. Coming from the warehouse space, when I first saw the product I was just like, oh my gosh, this is so much easier for folks. They can access a larger scope of their data faster, which, to your point, is table stakes for all organizations these days. They need to be able to analyze data sooner; the sooner, the better. Data has a half-life, right? It decays. The value of data decays over time, so typically the most valuable data is the newest data. That all depends on the industry we're talking about, the types of data, and the use cases, but it's basically always true that newer data is more valuable, and they need to be able to analyze as much of it as possible. The story can't be, no, we have to wait weeks or months to get a new data source. The story can't be, you know, that data that includes seasonality couldn't be kept in the same location because it's too expensive to keep it in the warehouse, or whatever. For Dremio and our customers, our story is simple: leverage the data where it is. Access data in all sorts of sources, whether it's a Postgres database or an S3 bucket. Don't move the data, don't copy the data; analyze it in place. And don't limit the scope of the data you're trying to analyze. If you have new use cases, or additional data sets that you want to add to those use cases, just bring them into S3 and you're off to the races; you can easily analyze more data and give more power to the end user. If there's a field they want to calculate, a simple change like converting this miles field to kilometers, well, the end users should be empowered to just make a calculation on the data like that, as sketched below. It should not require an entire cycle through a data engineering team, a backlog, a ticket, pushing that to production, and so forth, which in many cases it does at many organizations. It's a lot of effort to make new calculations on the data, derive new fields, add a new column, and so forth. So Dremio makes the data engineer's life easier and more productive. It also makes the data consumer's life much easier and happier; they can just do their job without waiting and worrying.
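To make the "analyze it in place" idea concrete, here is a minimal sketch of that workflow. Dremio Sonar itself is reached through standard SQL clients; this sketch uses PySpark as a stand-in engine, and the hostnames, bucket, table, and column names are all hypothetical, not anything from the conversation.

```python
# Minimal sketch: query data where it lives, and let the end user derive a
# field (miles -> kilometers) without an ETL cycle. PySpark is a stand-in
# engine here; all connection details and names are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("query-in-place").getOrCreate()

# Read a Postgres table in place over JDBC -- no copy into a warehouse.
orders = (
    spark.read.format("jdbc")
    .option("url", "jdbc:postgresql://db.example.com:5432/sales")  # hypothetical
    .option("dbtable", "public.orders")
    .option("user", "analyst")
    .option("password", "***")  # placeholder credential
    .load()
)

# Read raw Parquet files straight out of an S3 bucket, again without moving them.
trips = spark.read.parquet("s3a://example-bucket/trips/")  # hypothetical bucket

# The end user's "simple change": a derived kilometers column, computed on
# read rather than through a data-engineering ticket.
trips_km = trips.withColumn("distance_km", F.col("distance_miles") * 1.60934)

# Join the two sources in place and aggregate.
(
    trips_km.join(orders, "trip_id")
    .groupBy("region")
    .agg(F.sum("distance_km").alias("total_km"))
    .show()
)
```

Running this as written would also need the Postgres JDBC driver and the S3 filesystem jars on the Spark classpath; the point is the shape of the workflow: there is no load step, and the derived field is a single line.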
>> Not only can they do their job, but from a high-level business perspective, the business probably has the opportunity to be far more competitive, because it's got a bigger scope of data, as you mentioned, and access to it more widely and faster, and those are only good things in terms of- >> More use cases, more experiments, right? What I've seen a lot is that there's no shortage of ideas for what people can do with the data, and of projects that might be undertaken, but no one knows exactly how valuable they will be, or whether that's something that should be funded or not. So more use cases, more experiments, try more things. If it's cheap to try these data problems and see if they're valuable to the business, then that's better for the business. Ultimately the business will be more competitive. We'll be able to try more new products, we'll be able to have better operational efficiencies, lower risk, all those things. >> Right. What about data governance? Talk to me about how the lakehouse enables that across all these disparate data volumes. >> I think this is where things get really interesting with the lakehouse concept, relative to where we used to be with a data lake, which was a parking ground for just lots of files. That came with a lot of challenges, whether the lake was HDFS, right, a Hadoop data lake back in the day, or now a cloud object storage data lake. Historically, I feel like governance, access, authentication, and auditing were all extremely challenging with the data lake, but in the modern lakehouse world, those challenges have been solved. You have everything from the front of the house, with authentication, access policies, and data masking, everything you would expect, through commits and tables and transactions and inserts and updates and deletes, and auditing of that data: being able to see who made the changes, which engine, which user, when they were made, and seeing the whole history of a table rather than just a mess of files in a file store. So it's really come a long way. I feel like we're at the renaissance stage of the 2.0 data lakes, or lakehouses as people call them. Basically, you're seeing a lot of functionality from the traditional warehouse become available in the lake. Warehouses had a lot of governance built in, whether that's encryption, column access policies, and row access policies, so only the right user saw the right data, or data masking, so that the social security number was masked out but the analyst still knew it was a social security number. That was all there, and now it's all available on the lakehouse; you don't need to copy data into a data warehouse just to meet those types of requirements. A huge one is also deletes, right? I feel like deletes were one of the Achilles' heels of the original data lake, when there was no governance and people were just copying data sets around and modifying them for whatever their analytics use case was. If someone said, "Hey, go delete this record," the right to be forgotten under GDPR, and now you've got California's CCPA and others all coming online, if you said, go delete this record or set of records from the original lake, I think that was probably impossible for many people to do with confidence, to say, I fully deleted this. Now, with the Apache Iceberg table format that stores data in the lakehouse architecture, you actually have delete functionality, right? Which is a key capability that warehouses traditionally brought to the table.
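Since this delete capability comes from the Apache Iceberg table format rather than any single vendor's engine, here is a hedged sketch of what a right-to-be-forgotten delete can look like through Spark's Iceberg integration. The catalog name, warehouse path, table, and IDs are illustrative assumptions, not a Dremio-specific API.

```python
# Hedged sketch: a GDPR-style row delete against an Apache Iceberg table via
# Spark SQL. Catalog, warehouse path, table, and ids are hypothetical.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("iceberg-rtbf-delete")
    # Iceberg's SQL extensions enable row-level DELETE/UPDATE/MERGE.
    .config("spark.sql.extensions",
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
    # Register an Iceberg catalog backed by object storage.
    .config("spark.sql.catalog.lake", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.lake.type", "hadoop")
    .config("spark.sql.catalog.lake.warehouse", "s3a://example-bucket/warehouse")
    .getOrCreate()
)

# The right-to-be-forgotten request lands as plain SQL. Iceberg rewrites or
# marks the affected data files and commits a new table snapshot, so the
# delete is transactional rather than a best-effort sweep over raw files.
spark.sql("DELETE FROM lake.crm.customers WHERE customer_id = 12345")

# The table history Mark describes (who changed what, when, and with which
# operation) is queryable through Iceberg's metadata tables.
spark.sql(
    "SELECT committed_at, snapshot_id, operation "
    "FROM lake.crm.customers.snapshots"
).show()
```

This also illustrates the auditing point from the conversation: every insert, update, and delete lands as a snapshot commit, so "who changed this table and when" becomes a query rather than forensics.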
>> That's a huge component from a compliance perspective. You mentioned GDPR and CCPA, which is going to be CPRA in less than a year, but there are so many other data privacy regulations coming up that the ability to delete data is going to be table stakes for organizations. Something that you guys launched, and we just have a couple minutes left, but you launched, I love the name, the Forever Free data lakehouse platform. That sounds great, Forever Free. Talk to me about what that really means; it consists of the two products you mentioned, Sonar and Arctic, but talk to me about this Forever Free data lakehouse. >> Yeah. I feel like this is an amazing step forward in the industry. Because of the Dremio Cloud architecture, where the execution and the data live in the customer's cloud account, we're able to basically say, hey, the Dremio software, the Dremio service side of this platform, is forever free for users. Now, there is a paid tier, but there's a standard tier that is truly forever free. That still comes with infrastructure bills from your cloud provider, right? So if you use AWS, you still have an S3 bill for your data sets, because we're not moving them; they're staying in your Amazon account, in your S3 bucket. You do still have to pay for the infrastructure, the EC2 and the compute to do the data analytics, but the actual software is free forever. And there's no one else in our space offering that; in our space, everything's a free trial. Here's your $500 of credit, come try my product. What we're saying, with our unique architectural approach, and this is what I think is preferred by customers too, is that we take care of all the query planning, all the engine management, all the administration of the platform, the upgrades: a fully available, zero-downtime platform. So they get all the benefits of SaaS as well as the benefits of maintaining control over their data. And because that data is staying in their account and the execution of the analytics is staying in their account, we don't incur that infrastructure bill, so we can have a forever free tier of our platform. And we've had tremendous adoption. I think we announced this at the beginning of March, the first week of March, so it's not even the end of March, and we've had hundreds and hundreds of signups, and many customers and users are actively on the platform now, live querying their data. >> That just kind of summarizes the momentum that Dremio is seeing. Mark, thank you so much. We're out of time, but thanks for talking to me- >> Thank you. >> About what's new at Dremio and what you guys are doing. Next time we'll have to unpack this even more; I'm sure there's loads more we could talk about, but we appreciate that. >> Yeah, this was great. Thank you, Lisa. Thank you. >> My pleasure. For Mark Lyons, I'm Lisa Martin. Keep it right here on theCUBE, your leader in high-tech hybrid event coverage. (upbeat music)