Isha Sharma, Dremio | CUBE Conversation | March 2021
>>Well, welcome to this special CUBE Conversation. I'm John Furrier with theCUBE, your host. We're here with Isha Sharma, director of product management for Dremio. We're going to talk about data, data lakes, the future of data, and how it works with cloud and the new applications. Isha, thanks for joining me. >>Thank you for having me, John. >>You guys are a cutting-edge startup; you've got a lot of good action going on. You're on the new guard, as Andy Jassy at AWS always puts it: the old-guard incumbents on one side, and you guys on the new breed, doing the new stuff around data lakes and making data accessible for customers. What is that all about? Take us through what Dremio is. >>So Dremio is the data lake service that essentially allows you to run SQL queries directly on your data lake storage, without having to make any of those copies that everybody's going on about all the time. So you're really able to get that fast time to value without the long process of putting in a request to my data team, making all of those copies, and then finally getting a very reduced scope of your data, and still having to go back to your data team every time you need a change to it. So Dremio brings you that fast time to value with a no-copy data strategy, and really provides you the flexibility to keep your data in your data lake storage as the single source of truth. >>You know, for the past 10 years we've watched, with CUBE coverage since we've been doing this program, and in the community, from the early days of Hadoop to now. We've seen the trials and tribulations of ETL and data warehousing. We've seen the starts and stops, and we've seen that the most successful formula has been store everything. And then, you know, the ease of use became a challenge.
I don't want to have to hire really high-powered engineers to manage certain kinds of clusters. Now cloud comes into the mix, and I've got on-premises storage too. The notion of a data lake became hugely popular because the phrase meant store everything, and it meant different things to different people. And since then, teams of people have been hired to be the data teams. So it's kind of new. So I've got to ask you: what is the challenge for these data teams? What do they look like? What's the psychology going on with some of the people on these teams? What problems are they solving? Because, you know, they're becoming data-full. Take us through what's going on with data teams. >>To your point, the volume and the variety of data keep growing exponentially every day; there's really no end to it, right? And companies are looking to get their hands on as much data as they possibly can. So that puts data teams in a position of: how do I provide access to as many users as easily as possible, that self-service experience for data? And data democratization, as great a concept as it is in theory, comes with its own challenges in terms of all of those copies that end up being created to provide the quote-unquote self-service experience. And then with all of these copies comes the cost to store all of them. You've just added a tremendous amount of complexity and delayed your time to value significantly.
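(A quick aside for readers: the copy-proliferation problem Isha describes, and the view-based alternative she returns to later in the conversation, can be sketched in a few lines. Everything below, the `define_view` and `query` helpers and the sample data, is a hypothetical illustration of the concept, not Dremio's actual API.)

```python
# One physical dataset; every "view" is a stored definition, not a copy.
sales = [
    {"region": "east", "amount": 120},
    {"region": "west", "amount": 80},
    {"region": "east", "amount": 45},
]

views = {}  # name -> predicate; the "semantic layer" stores definitions only

def define_view(name, predicate):
    views[name] = predicate

def query(name):
    # evaluated against the single source of truth at read time
    return [row for row in sales if views[name](row)]

# Two personalized "views" for two analysts: zero new copies of the data.
define_view("east_sales", lambda r: r["region"] == "east")
define_view("big_sales", lambda r: r["amount"] > 100)

print(len(query("east_sales")))  # 2 matching rows
print(len(query("big_sales")))   # 1 matching row
```

The point of the sketch: adding a third or thirtieth analyst view changes only the definitions dictionary, while the copy-based approach described above would duplicate the data each time.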
What's the relationship between those two? Who gives and who gets, and who drives it? Does the analyst feed the requirements to the architect, who sets up the boundaries? Can you take us through how you guys view the relationship between the data architect and the data analyst? >>Sure. So you have the data architect, the data team, that's actually responsible for providing data access at the end of the day, right? They're the people that have the data democratization requirement on them. And so they've created these copies, a tremendous number of copies. A lot of the time the data lake storage is that source of truth, but you're copying your data into a data warehouse. And then your end users, your analysts, all want different types of data; they want different views of this data. So there's a tremendous number of personalized copies that the architects end up creating. And then on top of it, there's performance: we need to get everything back in a timely manner, otherwise what's the point, right? Real-time analytics. So there are all these performance-related copies as well, whether that be aggregation tables or, you know, BI extracts and cubes, all of that fun stuff. And the architect is the one that's responsible for creating all of those. That's what they have to do to provide access to the analyst. And then, like I'm saying, when we need an update to a data set, when I discover a new data set that I need to join with an existing one, the analyst goes to the data architect and says, hey, by the way, I need this new data set; can you make this usable for me, or can you provide me access? And so then the data architect has to process that request.
And so again, coming back to all these copies that have been created, the data architect goes through a tremendous amount of work, and has to do it over and over again, to actually make the data available to the analyst. It's a cycle that goes on between the two. >>Yeah, it's an interesting dynamic. It's a power dynamic, but they're also trying to get to the innovation. I've got to ask you: some people are saying that data copies are the major obstacle to democratization. How do you respond to that? What's your view? >>They absolutely are. Data copies are the complete opposite of data democratization. There's no aspect of self-service there, which is exactly what you're looking for with data democratization. Because of those copies, how do you manage them? How do you govern them? And like I was saying, when somebody needs a new data set, or an update to one, they have to go back to that data team, and there goes your self-service. Data copies actually create a bottleneck, because it all comes back to that data team that has to keep working through the requests coming in from their analysts. So data copies and data democratization are completely at odds. >>You know, I remember talking to Dave Vellante at a CUBE event two years ago. He said infrastructure as code was the big DevOps movement, and we felt that dataops would be something similar: data as code, where you don't have to think about it. So you're getting at this idea that copies are bad because they hold back that self-service. This modern era is looking for more programmability with data. What you're teasing out here is that that's the modern architecture. Is that how you see it? How do you see a modern data architecture? >>Yeah, so the data architecture has evolved significantly over the last several years, right?
We started with traditional data warehouses and the traditional data lake with Hadoop, where storage and compute were tightly coupled. Then we moved on to cloud data warehouses, where there was a separation of compute and storage, and that provided a little more flexibility. But with the modern data architecture now, with cloud data lakes, you have this aspect of separating not only storage and compute, but also compute and data. So that creates a separate tier for data altogether. What does that look like? You have your data in your data lake storage, S3, ADLS, whatever it may be, and of course it's in an open format, right? And on top of that, thanks to technologies like Apache Iceberg and Delta Lake, there's this ability to give your files, your data, a table structure. And so that starts to bring to the data lake the capabilities that a data warehouse was providing. Thanks to these, you have the ability to do transactions, record-level mutations, versioning, things that were missing completely from a data lake architecture before. So introducing that data tier, having that separation of compute and data, really accelerates the ability to get that time to value, because you're keeping your data in the data lake storage at the end of the day. >>And it's interesting, you see all the hot companies tend to have that kind of mindset and architecture, and it's creating new opportunities, a ton of white space. So I have to ask you guys: how does Dremio fit into this? You're playing in this new wave with data, and it's growing and moving extremely fast. Again, the edge is developing, more data's coming in at the edge, and you've got hybrid and multi-cloud environments on the horizon; data in real time across multiple clouds is the next area people are focused on.
What's the role of Dremio in all this? Take us through that. >>Yeah. So Dremio provides, again, like I said, this data lake service. And we're not referring to just storage, or Hadoop, when we say data lake; we're talking about an entire solution. So you keep your data in your data lake storage, and then on top of that, with the integrations that Dremio has with Apache Iceberg and Delta Lake, we provide that data tier that I was talking about. And so you've given your data this table structure, and now you can operate on it like you would in a data warehouse. So there's really no need to move your data from the data lake to a data warehouse; again, you keep that data lake as the source of truth. And then on top of that, when we talk about copies, personalized copies, performance-related copies, like I was saying, you've created so much complexity; with Dremio you don't do that. When it comes to personalized copies, we've got the semantic layer, and that's a very key aspect of Dremio, where you can provide as many views of the data as you want without having to make any copies. So it really accelerates that data democratization story. >>So it's the no-copy data strategy; you guys are on it. No copies, keep the semantic layer, have that be horizontal across whatever environment. And can applications tap into this? How do you guys integrate into apps? If I'm an app developer, for instance, how does that work? >>Of course. So that's one of the most important use cases, in the sense that when there's an application, or even a BI client or some other tool, tapping into the data in S3 or ADLS, a lot of people see performance degradation. With Dremio, that's not the case. We've got Arrow Flight integrated into Dremio; it's a key component as well.
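(For readers: the "table structure over files" idea mentioned above, which Apache Iceberg and Delta Lake implement with real transaction logs and manifest files, can be sketched conceptually. The `TinyTable` class below is a toy, hypothetical illustration of snapshot-based versioning over immutable files, not either project's actual format or API.)

```python
import json
import tempfile
from pathlib import Path

class TinyTable:
    """A toy 'table format': metadata snapshots that point at immutable
    data files, giving plain files table-like versioning (time travel)."""

    def __init__(self, root: Path):
        self.root = root
        self.snapshots = []  # each snapshot = list of data files in the table

    def append(self, rows):
        # write a new immutable data file
        data_file = self.root / f"data-{len(self.snapshots)}.json"
        data_file.write_text(json.dumps(rows))
        # new snapshot = previous files + the new one; old snapshots stay readable
        current = list(self.snapshots[-1]) if self.snapshots else []
        self.snapshots.append(current + [data_file.name])

    def read(self, version=-1):
        # read the table as of a given snapshot (default: latest)
        rows = []
        for name in self.snapshots[version]:
            rows.extend(json.loads((self.root / name).read_text()))
        return rows

root = Path(tempfile.mkdtemp())
table = TinyTable(root)
table.append([{"id": 1, "amount": 10}])
table.append([{"id": 2, "amount": 25}])

print(len(table.read()))           # latest snapshot: 2 rows
print(len(table.read(version=0)))  # first snapshot: 1 row
```

Real table formats add atomic commits, schema evolution, and file statistics on top of this basic snapshot idea; the sketch only shows why a metadata tier over immutable files makes versioning possible at all.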
And that puts so much ease into running dashboards and analytics apps off of it, because Arrow Flight can deliver 20 times the performance that ODBC could. So coming back to the no-copy data strategy: there are no more of those local copies that you used to need to make. >>So one of the things I've got to ask you, because this comes up all the time: at the last re:Invent I noticed Amazon was banging on this hard, and Azure as well on their side. Their whole thing is, we want to take the AI environment and make it so normal people can use it and deploy machine learning. The same thing comes down into this layer you're talking about. This democratization is a huge trend, because you don't have to be a super math-PhD data scientist or an ETL data wrangler; you just want to actually work with the data any way you want. So the question I have is: that's certainly a great trend, and no one debates it, but the reality is people are storing data, almost hoarding it. Just throw it in a data lake and we'll deal with it later. How do you guys solve that problem? Because once that starts happening, do you have to hire someone super smart to dig that out or re-architect it? That seems to be the pattern, right? Throw everything into the data lake and we'll deal with it later. >>They call it the data swamp, and it's like no one knows what's going on. >>Of course. Though you don't actually want to throw everything into a data lake. There still needs to be a certain amount of structure that all of this lands in. You want it to live in one place, but still with a little bit of structure, so that Dremio and other tools are much better able to query it with fantastic performance.
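(A reader's sketch of that "little bit of structure" point: if files land in, say, date-partitioned folders, a single view can span all the partitions at query time, with no copies and no per-day views. The folder layout and `all_sales` helper below are hypothetical, purely illustrative.)

```python
import csv
import tempfile
from pathlib import Path

lake = Path(tempfile.mkdtemp())

# Daily sales files landing in date-partitioned folders,
# e.g. sales/2021-03-01/part.csv
for day, amount in [("2021-03-01", "100"), ("2021-03-02", "250"), ("2021-03-03", "75")]:
    folder = lake / "sales" / day
    folder.mkdir(parents=True)
    with open(folder / "part.csv", "w", newline="") as f:
        csv.writer(f).writerows([["day", "amount"], [day, amount]])

def all_sales(root: Path):
    """One view over every daily partition: no per-day views, no copies."""
    for part in sorted(root.glob("sales/*/part.csv")):
        with open(part, newline="") as f:
            yield from csv.DictReader(f)

rows = list(all_sales(lake))
print(len(rows))                            # 3 days' worth of rows from one view
print(sum(int(r["amount"]) for r in rows))  # 425
```

Because the lake has a predictable layout, the "view" is just a glob pattern; a swamp with no layout would force someone to rediscover where everything lives before any query could be written.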
So there's still some amount of structure that needs to happen at the data lake level. But with that semantic layer that we have with Dremio, you're creating structure for your end user. >>How would you advise someone who wants to hedge their future and not take on too much technical debt, but says, hey, you know, I do have the storage? Is there a best practice, some guardrails, around getting going? How do you advise your customers who want to get going? >>So how we advise our customers is, again: put your data in that data lake. A lot of them already have S3 in place. And getting started with Dremio is really easy; I did it for the first time and it took a matter of minutes, if not less. What you're doing with Dremio is connecting directly to that data source and then creating a semantic layer on top. So you bring together a bunch of data that's sitting in your data lake, say sales data in S3, and we give you a really streamlined way to stitch it together, however far back in time you go, and create a view on top of all of that. If you have it structured in folders, great: we provide a way to create one view on top of all of it, as opposed to having a view for every day or whatnot. And so again, that semantic layer really comes in handy when you're trying, as the architect, to provide access to this data lake. And as the user, you just interact with the data as the views are provided to you. There's a whole lot of transparency there, and it's really easy to get up and running with Dremio. >>I'm looking forward to it. I've got to finally ask: how do I get started? How do people engage with you guys? Is it a freemium? Is it a cloud service? What are the requirements? What are some of the ways people can engage and work with you guys?
>>Yeah, so you get started on our website at dremio.com. And speaking of self-service, we've got a virtual lab at dremio.com/labs that you can get started with. It gives you a product tour, and even a getting-started walkthrough that takes you through your first query, so you can see how well it works. And in addition to that, we've got a free trial of Dremio available on AWS Marketplace. >>Awesome. The marketplace is a good place to get it. So can I ask you a personal question, Isha? You're the director of product management; you get to see inside the kitchen where everyone's making the product. You've also got the customer relationships, looking at product-market fit as it evolves and customers' requirements evolve. What are some of the cool things you've seen in this space that are just interesting to you, things you either expected or that surprised you? What's the coolest thing you've seen come out of this new data environment we're living in? >>I think it's just the way things have evolved, right? It used to be data lake or data warehouse, and you picked one, or you probably had both, but you weren't taking either to its highest potential. Now you've got this coming together of both of them. I think it's been fantastic to see how technologies like Iceberg and Delta Lake are bringing those two things together. You're in your data lake, and it's great in terms of cost and storage and all of that, but now you're able to have so much flexibility in terms of some of those data warehouse capabilities. And on top of that, with technologies like Dremio, and in general this open-format concept, you're never locked in with a particular vendor or a particular format. You're not locking yourself out of a technology that you don't even know exists yet. In the past, it seemed you were always going to end up there.
You always ended up putting your data into something where it was going to be difficult to change it or get it out. But now you have so much flexibility with the open architecture that's coming. >>What's the DNA of the culture at Dremio? Obviously you've got a cutting edge; we're in a big, hot wave of data, and you're enabling a lot of value. What's it like there at Dremio? What do you guys strive for? What's the purpose? What's the DNA of the culture? >>There's a lot of excitement around getting customers to this flexibility, getting them out of things they're locked into, and really providing them with accessibility to their data, right? Making this data access, data democratization concept actually happen. Time to value is a key thing: you want to derive insights out of your data, and everybody at Dremio is super excited and charging toward that. >>Unlocking that value. That's awesome. Isha, thank you for coming on this CUBE Conversation. Great to see you. Thanks for coming on; appreciate it. She's Isha Sharma, director of product management at Dremio, here inside theCUBE. I'm John Furrier, your host. Thanks for watching.