Robert Maybin, Dremio | AWS Startup Showcase: Innovations with CloudData & CloudOps
(upbeat music) >> Welcome to today's session of the AWS Startup Showcase, featuring Dremio. I'm your host, Lisa Martin, and today we're joined by Robert Maybin, Principal Architect at Dremio. Robert is going to talk to us about democratizing your data by eliminating data copies. Robert, welcome. It's great to have you in today's session. >> Great. Thank you, Lisa. It's great to be here. >> So talk to me a little bit about why data copies, as Dremio says, are the key obstacle to data democratization. >> Oh, sure. Well, when people talk about data democratization, what they're really speaking to is the desire for people in the organization to be able to work with the enterprise's data, and discover data, in a more self-service way. And when you think about democratization, you might say, "Well, what's wrong with copies? What could be more democratic than giving everybody their own copy of the data?" But when you think about how that ties into traditional architectures and environments, there are a lot of problems that come with copies, and those are real impediments. Traditionally, in the data warehousing world, what often happens is that there are numerous sources of data coming in, in all different formats and all different structures. Before people can query them, these data sets typically have to be loaded into some sort of data warehousing tool. Maybe they land in cloud storage, but before they can be queried, somebody has to go in and reformat those data sets, transform them in ways that make them more useful and more performant. This is very, very common. Many, many organizations do it, and it makes a lot of sense, because the formats the data is sourced in are usually pretty hard to work with and very slow to query. So copying is a natural thing to do, but it comes at a real cost, right? There's tremendous complexity in having to do all of these transformations, there's a real dollar cost, and there's a lot of time involved too. So if you could take all of those middle steps out, where you're copying and transforming, then transforming again, and then, potentially, persisting very high-performance structures for fast BI queries, you could remove a lot of those impediments. >> So talk to me about... Oh, I'm sorry. Go ahead. >> Go ahead. >> I was just going to say, one of the things that is even more in demand now is the need for real-time data access. Real-time is no longer a nice-to-have, and I think what we've been through in the last year has really shown that. So given the legacy architectures, and the challenges with copies being an obstacle to true democratization, how can data teams actually get in there and solve this challenge?
>> Yeah, so, going back a little bit to the prior question, I can fill in a bit more of the detail, and that'll lead us to your point. One of the other costs that's really borne, when you have to go through and make multiple copies, is that you typically need experts in the organization: the people who are going to write the ETL scripts, or do the data architecture and design the structures that have to be performant for real-time BI queries, right? Typically these take the form of things like OLAP cubes, or big flattened data structures with all of the attributes joined in. There are a lot of different ways to get query performance, but typically it's not available directly against the source data. So there are really two ways data teams can go about this. One is to go all in on the data copy approach, and home-grow or build yourself a lot of the automation, tooling, and parts it would take to transform the data. You can build UIs for people to go in and request data, and you can automate the whole process. We've found that a number of large organizations have actually gone this route, and they've been at these projects, in some cases, for years, and they're still not completely there. So I wouldn't really recommend that approach. The real approach, and this is available today with the rise of cloud technologies, is that we can shift our thinking a bit. We can think about how to take the features and capabilities one would expect in a data warehousing environment, and bring them directly to the data. That shift in thinking requires new technology, right? Imagine having a lot of the traditional data warehousing features, like interactive speed, and the ability to build structures, or views, on top of your data, but doing that directly on the data itself, without having to transform and copy, transform and copy. That's really what we call the next generation data lake architecture: bringing those capabilities directly to the data that's on the lake. >> So leaving the data where it is. Next generation is a term, like future-ready, that's used a lot. Let's unpack that and dig into why what you're talking about is the next generation data lake architecture. >> Sure, sure. And to talk about that, the first thing we really have to discuss is a fundamental shift in technologies that's come about in the last few years. As cloud services like AWS have risen to prominence, there are capabilities available to us now that just weren't there three, four, or five years ago. What we can do now is truly separate compute and storage, connected together with really fast networking. We can provision storage, and we can provision compute, and from the perspective of the user, those two things can basically be scaled infinitely, right?
And contrast that with what we used to have to do in platforms like Hadoop, or in scale-out MPP data warehouses: not only did we lack the flexibility to scale compute and storage independently, we also didn't have the kind of networking that we have today. So it was a requirement to push the compute as close to the data as we could, which is what you get in a large Hadoop cluster. You've got nodes with compute right next to the storage, and you try to push as much work as you can onto each node before you start to transfer data to other nodes for further processing. What we've got now, with the new cloud technology, is the ability to do away with that requirement. We can have very large provisioned pools of data that can grow and grow, without the limitations of hardware nodes, and we can spin compute up and down to process them. The thing we need, though, is a query processing engine that's built for those dynamics: built so that it performs really, really well when compute and storage are decoupled. That's really the trick. Once we accept that we've got this new paradigm of separate compute, separate storage, and very fast networking, we can look for technologies that can scale out and back, and do really performant querying in that environment. Now, the very last piece of what I would call next gen data lake architecture: it's very common even today for organizations to have a data lake that contains a tremendous amount of data, but in order to do BI queries at the interactive speed people expect, they still have to take portions of the data from the lake and load them into a warehouse, and then probably build OLAP cubes, or extracts into a BI tool, from there. So the last piece in the next gen data lake architecture puzzle is: once you've got that fast query engine foundation, how do you move those interactive workloads onto that platform, so they don't have to be in a data warehouse? How do you take those data warehousing expectations and put them into a platform that can query the data directly? That's really what next generation means to us. >> So let's talk about Dremio now. I see that just in January of 2021, you closed Series D funding of $135 million, and Datanami actually dubbed Dremio a unicorn, as it's reached a $1 billion valuation. Talk to us about what Dremio is, and how you're part of this modern data architecture. >> Absolutely. Yeah. In the technology context, you can think about Dremio as solving the problem I just laid out: we're in the business of building technology that allows users to query very large data sets in a scale-out, very performant way, directly on the data where it lives. So there's no real need for data movement. And in fact, we can query not only one source of data, but multiple sources, and join those things together in the context of the same query. You may have most of your data in a data lake, but you may also have some relational sources. So there's a potent story there, in that you don't have to consolidate all of your data into one place. You don't have to load all of your data into a data warehouse or a cloud data warehouse. You can query it where it is. That's the first piece.
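To make that federation point concrete, here is a minimal sketch of the kind of single query Robert is describing: one SQL statement joining a table that lives on cloud object storage with a table in a relational database, with no data movement beforehand. The source, schema, and column names ("lake", "postgres", and so on) are hypothetical, chosen only for illustration.

```sql
-- One query spanning two sources registered in the engine:
-- "lake" (e.g., Parquet files on S3) and "postgres" (a live
-- relational database). All names here are hypothetical.
SELECT c.customer_name,
       SUM(t.amount) AS total_spend
FROM   lake.sales.transactions t       -- data lake table
JOIN   postgres.crm.customers c        -- relational source
  ON   t.customer_id = c.customer_id
WHERE  t.txn_date >= DATE '2021-01-01'
GROUP BY c.customer_name
```

The point of the pattern is that the engine plans the join across both sources at query time, so neither data set has to be copied into the other system first.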
I think the next piece Dremio provides, as we mentioned before, is an almost data warehouse-like user experience, in terms of very, very fast response times for things like BI dashboards, really interactive queries, and the ability to do the things you would normally expect to do inside a warehouse. You can create schemas, for instance; you can create layers of views and accelerations, and effectively allow users to build out virtually, in the form of views, what they would have done before with all of their various ETL pipelines to scrub, prepare, and transform the data into shape to query. And at the very end, we can selectively, in an internally managed way, accelerate certain query patterns by creating something we call reflections: an internally managed persistence of data that accelerates certain queries, but is entirely managed by Dremio. The user doesn't have to worry about setup, configuration, cleanup, maintenance, or any of that. >> So do reflections really provide a differentiator for Dremio? If you look in the market and you see competitors like Snowflake or SingleStore, for example, is this really that competitive differentiator? >> I think it's one of them. The ability to create reflections is certainly a differentiator, because it allows you to accelerate different kinds of query patterns against the same underlying source data. Rather than having to build a transformation for a user that aggregates data a certain way, persist that somewhere, and build and maintain all the machinery to do that, in Dremio it's literally a button click. You can go in and look at the data set, identify the dimensions you need to, say, aggregate by, and the measures you want to compute, and Dremio will just manage that for you. Any query that comes in against that massive detail table with a trillion rows, one with a GROUP BY in it, for instance, will just match that reflection and use it. That query can respond in less than a second, where the work that would otherwise have to happen on the backend engine might take a minute. So that's the edge piece that gives us BI acceleration without additional tools or any additional complexity for the user. >> And I assume you're talking about millisecond response times, right? You said under a second, but I'm sure milliseconds? >> Hundreds of milliseconds, typically. We're not really in the one-to-two-millisecond range, that's pretty rare (chuckles), but sub-second response times are very, very common against very large backend data sets when you use reflections.
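To illustrate the two ideas Robert just walked through, here is a hedged sketch: first a virtual view that stands in for what an ETL pipeline would once have produced, then the kind of GROUP BY query against a large detail table that an aggregate reflection, defined with matching dimensions and measures, could transparently satisfy. All table, view, and column names are hypothetical, and in practice the reflection itself is configured with a button click in Dremio's UI, as described above, rather than by anything in this snippet.

```sql
-- A virtual view in place of an ETL job: the cleanup and reshaping
-- live in the view definition, not in a persisted copy of the data.
CREATE VIEW analytics.clean_transactions AS
SELECT CAST(txn_date AS DATE) AS txn_date,
       UPPER(region)          AS region,
       amount
FROM   lake.sales.transactions
WHERE  amount IS NOT NULL;

-- A BI-style aggregate over a very large detail table. If an
-- aggregate reflection exists on the view, with region and txn_date
-- as dimensions and SUM(amount) as a measure, the engine can answer
-- this from the reflection in sub-second time instead of scanning
-- the raw data; the query text itself does not change.
SELECT region,
       txn_date,
       SUM(amount) AS daily_sales
FROM   analytics.clean_transactions
GROUP BY region, txn_date;
```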
>> Got it, and that speed and performance is absolutely table stakes today for organizations to succeed and thrive. So is what Dremio delivers a no-copy data strategy? Is that what you consider it? >> It's that, and it's actually much more than that, right? When you talk to users of the platform, there are a number of layers to Dremio, and we often get asked who our direct competitors are. It's an interesting question, because we're not just the backend high-performance query engine, and we aren't just the acceleration layer. We also have a very rich, fully-featured UI environment that allows users to log in, find data, curate data, reflect data, build their own views, et cetera. So there's really a whole suite of services built into the Dremio platform that makes it very, very easy to install Dremio on AWS, get started right away, and be querying data, building these virtual views, and adding accelerations. All of this can happen within minutes. So there's a wide spectrum of services that allows us to power a data lake in its entirety, without too many other technologies having to be involved. >> What are some of the key use cases you've seen, especially in the last year, as we've seen this rapid acceleration of digital transformation, this adoption of SaaS applications, and more and more data? What are some of the key use cases Dremio is helping customers solve? >> Sure. Yeah. There are a number of verticals, and some I'm very familiar with, because I've worked very closely with customers there. Financial services is a large one, and that would include banking, insurance, and investment, plus a lot of the large Fortune 500 companies in manufacturing, transportation, shipping, et cetera. Lately I'm most familiar with some of the transformation going on in the financial services space. What's happening there is that companies typically started with very, very large data warehouses, and for the last four or five years, maybe a little longer, they've been transitioning to an in-house data lake, typically on a Hadoop platform of some flavor, with a lot of additional services they've created to try to enable this data democratization. But these are huge efforts. Typically they're on-prem, with lots of engineers working on them full-time to build out this full spectrum of capabilities. The way Dremio impacts that is that we can come in and take the place of a lot of parts of that puzzle. We give a really rich experience to the user, allow customers to retire some of the acceleration layers they've put in to try to make BI queries fast, and get rid of a lot of the transformations, like the ETL jobs or ELT processes that have to run. So there's a really wide swath of that puzzle we can solve. And then when you look at the cloud: all of these organizations have a toe in the water, or are halfway down the path, of exploring how to take all of this on-prem data and processing and everything else, and get it into AWS, put it in the cloud. What does that architecture look like? We're ideally positioned for that story. We've got an offering that runs natively on AWS, and takes full advantage of the decoupling of compute and storage.
So we give organizations a really good path to solve some of their on-prem problems today, and then a clear path as they migrate into the cloud. >> Can you walk me through a customer example that you think really underscores what you just described, in terms of what Dremio delivers, helping customers with this migration, and enabling them to find value in volumes and volumes of data? >> Yeah, absolutely. Unfortunately, I can't mention their name, but I have worked very, very closely with a large customer in financial services, as I mentioned. One of the things they're keenly interested in: they've had a pretty large deployment that traditionally has been Hadoop-based, and they've also got several large on-prem relational data warehouses. Dremio has been able to come in and provide that BI performance piece, the very fast one-second, two-second, three-second response times people would expect from a data warehouse, but directly on the files and tables in their Hadoop cluster. That project has been going on for quite some time, and we've had success there. Where it really starts to get exciting, though, and this is just beginning, is that this customer is also investigating, and actually prototyping and building out, a lot of these functions in the AWS cloud. The nice thing we're able to offer is a consistent technology stack, consistent interfaces, and a consistent look and feel of the UI, both on-prem and in the cloud. So once they start that move, they've got a familiar place to connect to for their data and to run their queries, and that's a nice, seamless transition as they migrate. >> What about other verticals? I can imagine healthcare and government services. Are you seeing traction in those segments as well? >> Yeah, absolutely, we are. There are a number of companies in the healthcare space. One of the larger ones in the government space, which I have some exposure to, is CMS, where we did some work through a partner to implement Dremio. That was a project undertaken about a year ago: they implemented our technology as part of a larger data lake architecture, and had a good bit of success there. What's been interesting, when you talk about the funding and the valuation and the buzz going on around Dremio, is that we really have customers in so many different verticals. We've got financials and healthcare, insurance, and big commercials in manufacturing, et cetera. So we're seeing a lot of interest across a number of different verticals, and customers are buying and implementing the product in all of them. >> All right, so take us out with where customers can go, and prospects that are interested, and even investors, to find out more about this next generation data engine that is Dremio. >> Absolutely. The first thing people can do is go to our website, which is dremio.com, and they can go to dremio.com/labs. From there they can launch a self-guided product tour. That's probably the quickest way to get an overview of the product, and who we are, what we do, and what we offer. And then there's also a free trial on the AWS marketplace.
So if you want to actually try Dremio out and spin up an instance, you can get us on the marketplace. >> Do most of your customers do that, like doing a trial with a proof of concept, for example, to see, from an architecture perspective, how these technologies are synergistic? >> Absolutely. Yeah. For probably every large enterprise, there are a number of ways that customers find us. Often customers may just try the trial on the marketplace, but they may also reach out to our sales team, et cetera. It's very, very common for us to do a proof of concept, and that's not just architecture: it would cover performance requirements and things like that. So pretty much all of our largest enterprise customers go through some sort of proof of concept, and that's done with the support of our field teams. >> Excellent. Well, Robert, thanks for joining me today and sharing all about Dremio with our audience. We appreciate your time. >> Great. Thank you, Lisa. It was a pleasure. >> Likewise. For Robert Maybin, I'm Lisa Martin. Thanks for watching. (upbeat music)