Tomer Shiran, Dremio | AWS re:Invent 2022
>> Hey everyone. Welcome back to Las Vegas. It's theCUBE, live at AWS re:Invent 2022. This is our fourth day of coverage. Lisa Martin here with Paul Gillin. Paul, we started Monday night — we filmed and streamed for about three hours — and we have had jam-packed days Tuesday, Wednesday, Thursday. What's your takeaway?
>> We've rounded the final turn as we head into the home stretch. This show has had a lot of energy, as it has since the beginning. I'm amazed, for the fourth day of a conference, at how many people are still here — I am too — and at how active they are and how full the sessions are. Huge crowd for the keynote this morning. You don't see that on day four of most conferences; everyone's on their way home. So people come here to learn, and they're still...
>> Learning. They are still learning, and we're going to help continue that learning path. We have an alumni back with us: Tomer Shiran joins us, the CPO and co-founder of Dremio. Tomer, it's great to have you back on the program.
>> Yeah, thanks for having me here, and thanks for keeping the best session for the fourth day.
>> Yeah, you're right. I like that — that's good mojo to come into this interview with, Tomer. So the last time I saw you was a year ago, here in Vegas at re:Invent 2021. We talked about the growth of data lakes and data lakehouses. We talked about the need for open data architectures as opposed to data warehouses. And the headline of the SiliconANGLE article on the interview we did with you was "Dremio predicts 2022 will be the year open data architectures replace the data warehouse." We're almost done with 2022 — has that prediction come true?
>> Yeah, I think we're seeing almost every company out there, certainly in the enterprise, adopting data lake and data lakehouse technology and embracing open source file and table formats. So I think that's definitely happening. Of course, nothing goes away — data warehouses don't go away in a year, and actually don't go away ever; we still have mainframes around — but certainly the trends are all pointing in that direction.
>> Describe the data lakehouse for anybody who may not be really familiar with it, and what it really means for organizations.
>> Yeah. You can think of the data lakehouse as the evolution of the data lake. For the last decade we've had these two options, data lakes and data warehouses. Warehouses had good SQL support and good performance, but you had to spend a lot of time and effort getting data into them, you got locked in, and they're very expensive — that's a big problem. Data lakes were more open and more scalable, but had all sorts of limitations. What we've done now as an industry with the lakehouse, and especially with technologies like Apache Iceberg, is unlock the capabilities of the warehouse directly on object storage like S3. You can insert, update, and delete individual records. You can do transactions. You can do all the things you could do with a database, directly in open formats, without getting locked in, and at a much lower cost.
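To make that last point concrete, here is a rough sketch of warehouse-style DML running directly against an Iceberg table on object storage. The table and column names are hypothetical, and exact syntax varies a bit from engine to engine:

    -- An Iceberg table whose data lives as Parquet files in S3.
    CREATE TABLE lake.demo.customers (
      id     BIGINT,
      name   VARCHAR,
      region VARCHAR
    );

    INSERT INTO lake.demo.customers VALUES (1, 'Acme Corp', 'EMEA');

    -- Record-level changes, transactionally, on open files:
    UPDATE lake.demo.customers SET region = 'AMER' WHERE id = 1;
    DELETE FROM lake.demo.customers WHERE id = 1;

Under the hood, each statement commits a new table snapshot rather than rewriting the whole dataset, which is what makes this practical on object storage.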
>> But you're still dealing with semi-structured data as opposed to structured data, and there's work that has to be done to get that into a usable form. That's where Dremio excels. What has been happening in that area? I mean, is it formats like JSON that are enabling this to happen? How are we advancing the cause of making semi-structured data usable?
>> Well, I think that's all changed. That was maybe true for the original data lakes, but now, with the lakehouse, our bread and butter is actually structured data. It's all tables with a schema. You can create tables and insert records — really, everything you can do with a data warehouse you can now do in the lakehouse. That's not to say there aren't very advanced capabilities when it comes to JSON and nested data and sparse data — we excel at that as well — but we're really seeing the lakehouse take over the bread-and-butter data warehouse use cases.
>> You mentioned open a minute ago. Talk about why open is important and the value it can deliver for customers.
>> Yeah. If you look back in time at all the challenges companies have had with traditional data architectures, a lot of that comes from the problems with data warehouses. They're very expensive. You have to ingest data into the warehouse in order to query it. And then it's almost impossible to get off these systems — it takes an enormous effort and tremendous cost — so you're kind of locked in, and that's a big problem. You're also dependent on that one data warehouse vendor: you can only do things with that data that the vendor supports. Contrast that with the data lakehouse and open architectures, where the data is stored in entirely open formats — Parquet files and Apache Iceberg tables. That means you can use any engine on that data. You can use a SQL query engine, you can use Spark, you can use Flink. There's a dozen different engines you can use on it, both at the same time and in the future: if you ever want to try something new that comes out — some new open source innovation, some new startup — you just take it and point it at the same data. That data is now at the core, at the center of the architecture, as opposed to some vendor's logo.
>> Amazon seems to be bought into the lakehouse concept. It made big announcements on day two about eliminating the ETL stage between RDS and Redshift. Do you see the cloud vendors pushing this concept forward?
>> Yeah, a hundred percent. Amazon's a great partner of ours — we work with probably 10 different teams there, everything from the S3 team to the Glue team to the QuickSight team, and everything in between. Their embrace of the lakehouse architecture, and the fact that they adopted Iceberg as their primary table format — I think that's exciting. As an industry we're all coming together around standard ways to represent data, so that at the end of the day companies get the benefit of having their own data, in their own S3 account, in open formats, and being able to use all these different engines without losing any of the functionality they need. The ability to do all these interactions with data that in the past you would have had to move into a database or warehouse to do — you just don't have to do that anymore.
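As an illustration of that interchangeability, the same Iceberg table can be written by one engine and read by another with no export step in between. The catalog and table names below are made up:

    -- Spark SQL: an ETL job appends to the shared Iceberg table.
    INSERT INTO lake.sales.orders
    SELECT * FROM staging_orders;

    -- Dremio, Trino, Flink SQL, ...: any other engine queries the
    -- very same Parquet files and Iceberg metadata in S3.
    SELECT region, SUM(amount) AS total
    FROM lake.sales.orders
    GROUP BY region;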
>> Speaking of functionality, talk about what's new this year with Dremio, since we've seen you last.
>> Yeah, there are a lot of new things with Dremio. We now have full Apache Iceberg support — with DML commands you can do inserts, updates, deletes, COPY INTO; all that kind of stuff is now a fully supported, native part of the platform. We now offer two flavors of Dremio: Dremio Cloud, which is our SaaS version, fully hosted — you sign up with your Google or Azure account and you're up and running in a minute — and Dremio Software, which you can self-host, usually in the cloud, but even outside the cloud. And then we're also very excited about this new idea of data as code, and we've introduced a new product, now in preview, called Dremio Arctic. The idea there is to bring the concepts of Git and GitHub to the world of data — things like being able to create a branch and work in isolation. If you're a data scientist, you want to experiment on your own without impacting other people; if you're a data engineer ingesting data, you want to transform it and test it before you expose it to others. You can do that in a branch. All these ideas that we take for granted now in the world of source code and software development, we're bringing to the world of data with Dremio Arctic. And when you think about data mesh — a lot of people are talking about data mesh now and wanting to take advantage of those concepts and ideas, thinking of data as a product — well, if you think about data as a product, we think you have to manage it like code. That's why we call it data as code. All the reasons we use things like Git to build products apply: if we want to think of data as a product, we need those capabilities with data too — the ability to go back in time, the ability to undo mistakes, to see who changed my data and when they changed that table. All of those are part of this new catalog that we've created.
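A hedged sketch of what that Git-for-data workflow can look like in SQL. The branch, catalog, and table names are invented, and the statements approximate the Arctic/Nessie style of syntax rather than quoting it exactly:

    -- Work in isolation on a branch; production keeps reading main.
    CREATE BRANCH etl_experiment IN arctic_catalog;
    USE BRANCH etl_experiment IN arctic_catalog;

    -- Ingest and transform on the branch, validate the results there.
    INSERT INTO arctic_catalog.sales.orders
    SELECT * FROM staging.new_orders;

    -- Once tests pass, publish the changes back to main atomically.
    MERGE BRANCH etl_experiment INTO main IN arctic_catalog;

Because a branch is only a pointer into the table metadata, creating one copies no data files, no matter how large the tables are.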
>> You talk about data as a product — that's sort of intrinsic to the data mesh concept. What's your opinion of data mesh? Is the world ready for that radically different approach to data ownership?
>> You know, we now have dozens of customers using Dremio to implement enterprise-wide data mesh solutions. And at the end of the day, I think it's just what most people would consider common sense. In a large organization it's very hard for a centralized, single team to understand every piece of data, to manage all the data themselves, to make sure the quality is correct, to make it accessible. So what data mesh is first and foremost about is federating — distributing — the ownership of data. The governance of the data still has to happen; that, I think, is at the heart of data mesh. But allowing different teams, different domains, to own their own data and really manage it like a product, with all the best practices that come with that, is super important. So we're doing a lot with data mesh: the way Dremio Cloud has multiple projects, and the way Arctic lets you have multiple catalogs, so different groups can interact and share data with each other. And the fact that we can connect to all these different data sources, even outside your data lake — Redshift, Oracle, SQL Server, all the different databases out there — and join across different databases in addition to your data lake: that's all stuff companies want with their data mesh.
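For instance, a federated query might join a fact table in the lake against a dimension table that still lives in an operational database, with no ETL copy of either side. The source, schema, and column names here are hypothetical:

    SELECT o.order_id,
           o.amount,
           c.customer_name
    FROM lake.sales.orders AS o              -- Iceberg table on S3
    JOIN postgres_src.public.customers AS c  -- table left in Postgres
      ON o.customer_id = c.id;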
>> What are some of your favorite customer stories where you've really helped accelerate that data mesh and drive business value from it, so more people in the organization have access to data and can make those data-driven decisions everybody wants to make?
>> I mean, there are so many of them. One of the largest tech companies in the world is creating a data mesh across all the different departments in the company. They were a big data warehouse user and hit the wall: the costs were so high, and the ability for people to use it for experimentation, to try new things, to collaborate — they couldn't, because it was so prohibitively expensive and difficult to use. So they said, we need a platform where different people can collaborate, experiment with the data, and share data with others — a centralized platform that still allows different groups to manage their own data. Several of the largest banks in the world are also doing data meshes with Dremio. One of them has over a dozen different business units using Dremio, and that ability to have thousands of people on a platform, collaborating and sharing with each other — that's super important to these guys.
>> Can you contrast your approach to the market with Snowflake's? Because they have some of those same concepts.
>> Snowflake is a very closed system at the end of the day — closed, and very expensive. I remember seeing, a quarter ago, in one of their earnings reports, that the average customer spends 70% more every year. That's not sustainable: over a decade your cost increases 200x, and most companies aren't going to be able to swallow that. So companies need, first of all, more cost-efficient solutions that are more approachable. The second thing — we talked about the open data architecture — is that most companies now realize that if you want to build a platform for the future, you need the data in open formats, not locked into one vendor. Beyond that, there's the ability to connect to all your data, even outside the lake — your NoSQL databases, your relational databases — and Dremio's semantic layer, where we can accelerate queries. Typically, what happens with data warehouses and other data lake query engines is that because you can't get the performance you want, you end up creating lots and lots of copies of data: for every use case you're creating a pre-joined copy of that data, a pre-aggregated version of that data. And then you have to keep refreshing all those copies.
>> You've got a governance problem.
>> A governance problem, among other things. It's expensive, and it's hard to secure, because permissions don't travel with the data. So you have all sorts of problems with that, right? And so what we've done — with our semantic layer, which makes it easy to expose data in a logical way, and our query acceleration technology, which we call reflections — is transparently accelerate queries and give you sub-second response times without data copies, and without extracts into the BI tools. Because if you start doing BI extracts or imports, again you have lots of copies of data in the organization, all sorts of refresh problems, security problems — it's a nightmare. Collapsing all those copies and having a simple solution where the data is stored in open formats, and we can give you fast access to any of it — that's very different from what you get with a Snowflake or any of these other companies.
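The reflections Tomer describes — transparent, pre-computed aggregations kept in open Parquet files rather than in BI extracts — roughly correspond to a definition like the one below. The dataset and column names are invented, and the DDL is only an approximation of Dremio's reflection syntax:

    -- An aggregate reflection: the engine maintains this rollup itself
    -- and silently rewrites matching dashboard queries to use it.
    ALTER DATASET lake.sales.orders
    CREATE AGGREGATE REFLECTION orders_by_region
    USING DIMENSIONS (region, order_date)
          MEASURES (amount (SUM, COUNT));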
It's just a lot, a lot better, a lot more f a lot more future proof, a lot easier and a lot just a much safer choice for the future for, for companies. And so hard to argue with those people to take a look. Exactly. That wasn't the best. That wasn't the best, you know, billboards. >>Okay. I think it's a great billboard. Awesome. And thank you so much for joining Poly Me on the program, sharing with us what's new, what some of the exciting things are that are coming down the pipe. Quite soon we're gonna be keeping our eye Ono. >>Awesome. Always happy to be here. >>Thank you. Right. For our guest and for Paul Gillin, I'm Lisa Martin. You're watching The Cube, the leader in live and emerging tech coverage.
Mark Lyons, Dremio | AWS Startup Showcase S2 E2
(upbeat music)
>> Hello, everyone, and welcome to theCUBE's presentation of the AWS Startup Showcase: Data as Code. This is season two, episode two of the ongoing series covering the exciting startups from the AWS ecosystem. Here we're talking about operationalizing the data lake. I'm your host, John Furrier, and my guest is Mark Lyons, VP of product management at Dremio. Great to see you, Mark. Thanks for coming on.
>> Hey John, nice to see you again. Thanks for having me.
>> Yeah, we were talking before we came on camera: on this showcase we're going to spend the next 20 minutes talking about the new architectures of data lakes and how they expand and scale. But we were kind of reminiscing about the old big data days and how this really changed. There's a lot of hangover from (mumbles) kind of falling through; Cloud took over, and now we're in a new era, and the theme here is data as code. It really highlights that data is now in the developer cycles of operations. Infrastructure as code led the DevOps movement for Cloud programmable infrastructure. Now you've got data as code, which is really accelerating DataOps, MLOps, DatabaseOps, and more developer focus. So this is a big part of it. You guys at Dremio have a Cloud platform, a query engine, and data tier innovation. Take us through the positioning of Dremio right now. What's the current state of the offering?
>> Yeah, sure, happy to — and thanks for the intro to where the space is headed. I think the world is changing, and databases are changing. Today, Dremio is a full data lakehouse platform on the Cloud. We're all about keeping your data in open formats in your Cloud storage, while bringing the full functionality you would want to access the data, as well as manage it: all the functionality folks are used to, from ANSI SQL compatibility to inserts, updates, and deletes on that data, keeping the data in Parquet files in the Iceberg table format — another level of abstraction so people can access the data in a very efficient way. And going even further than that, what we announced with Dremio Arctic, which is in public preview on our Cloud platform, is a full Git-like experience for the data. So just like you said, data as code. We went through waves of source code and infrastructure as code, and now we can treat the data as code, which is amazing. You can have development branches, staging branches, ETL branches, which are separate from production. Developers can do experiments. You can make changes, test those changes before you merge back to production and let consumers see the data. Lots of innovation on the platform, super fast velocity of delivery, and lots of customers adopting it in just the first month since we announced Dremio Cloud generally available — the adoption's been amazing.
>> Yeah, and we're going to dig into a lot of the architecture, but I want to highlight the point you made about branching. This is what developers do: developers use Git and GitHub, they make branches from code, they build on top of other code — that's open source, that's what's been around for generations. Now, for the first time, we're seeing data sets being taken out of production to be worked on, coded, and tested, even doing look-backs or forward-looking analysis. This is data being programmed. This is data as code. You couldn't get any closer to data as code.
>> Yeah.
It's all done through metadata, by the way. There's no actual copying of these data sets, because in these big data systems — Cloud data lakes and so on — the tables are billions of records, trillions of records, and super wide: hundreds of columns, even thousands of columns. You have to do this all through metadata operations, so you can control which version of the data an individual is working with and which version the production systems are seeing, because these data sets are too big. You don't want to be moving them; you can't be moving them; you can't be copying them. It's all metadata, manifest files, and pointers to keep track of what's going on.
>> I think this is the most important trend we've seen in a long time. If you think about what Agile did for developers — speed, DevOps, Cloud scale — now you've got agility on the data side, where you're breaking down the old proprietary ways of doing data warehousing without killing the functionality of what data warehouses did, just doing it at much more volume. Warehouses were proprietary, not open, and they served different use cases — single applications querying the warehouse, not a lot of volume. But as you get volume, those things are inadequate. And now you've got the new open, Agile way. Is this Agile data engineering at play here?
>> Yeah, I think it totally is. It's bringing agility as far forward as possible. We're talking about making the data engineering process easier and more productive for the data engineer, which ultimately makes the consumers of that data much happier as well — and way more experiments can happen, way more use cases can be tried. If it's not a burden, and it doesn't require building a whole new pipeline and defining a schema and adding columns and data types and all this stuff, you can do a lot more with your data much faster. So it's going to be super impactful for all these businesses out there trying to be data-driven, especially when you look at data as code and branching. With a branch, you can de-risk your changes. You're not worried about messing up the production system, messing up that data, having it seen by an end user. For some businesses, data is their business, so that data goes all the way to a consumer or a third party — and then it gets really scary. There's a lot of risk if you show the wrong credit score to a consumer, or something like that. So it's really de-risking...
>> Even updating machine learning algorithms. For instance, if the data sets change, you can always be iterating on things like machine learning models or learning algorithms. This is kind of new. This is awesome, right?
>> I think it's going to change the world, because this stuff was so painful to do. The data sets had gotten so much bigger, as you know, but we were still doing it the old way, which was typically moving data around for everyone — copying data down, sampling data, moving data. And now we're basically saying, hey, don't do that anymore. We've got to stop moving the data. It doesn't make any sense.
>> So I've got to ask you, Mark: data lakes are growing in popularity. I was originally down on data lakes — I called them data swamps. I didn't think they were going to be as popular, because at that time distributed file systems like Hadoop and object stores in the Cloud were really cool. So what happened between that promise of distributed file systems and object stores, and data lakes? What made data lakes popular — what made that work, in your opinion?
>> Yeah, it really comes down to the metadata, which I already mentioned once. We went through these waves — John, you saw it: we did the EDWs, then the data lakes, then the Cloud data warehouses. I think we're at the start of a cycle back to the data lake. And it's because the data lakes this time around — with the Apache Iceberg table format, with Project (mumbles), and what Dremio's working on around metadata — aren't going to become data swamps anymore. They're actually going to be functional systems that do inserts, updates, and deletes. You can see all the commits; you can time travel them. And all the files are actually managed and optimized: you have to partition the data, you have to merge small files into larger files — by the way, this is stuff all the warehouses have done behind the scenes, all the housekeeping, but people weren't really aware of it. The data lakes the first time around didn't solve these problems, so files landing in a distributed file system did become a mess. If you just land JSON, Avro, Parquet, or CSV files in HDFS or an S3-compatible object store — it doesn't matter which — and you deal with it as schema-on-read instead of schema-on-write, you're going to have a mess. If you don't know which tool changed the files, which user deleted or updated a file, you end up with a mess really quickly. To take care of that, you have to put a table format on it — everyone's looking at Apache Iceberg or the Databricks Delta format, which is an interesting conversation, similar to the Parquet-versus-ORC file format story we saw play out. And then you track the metadata: you have the manifest files, you know which files changed when, by which engine, in which commit. And you can actually make a functional system that's not going to become a swamp.
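Mark's point about commits and time travel is directly visible in the table metadata. With Iceberg in Spark SQL, for example, every table exposes metadata tables such as snapshots; the snapshot id below is of course made up:

    -- Inspect the commit log of an Iceberg table.
    SELECT committed_at, snapshot_id, operation
    FROM db.events.snapshots;

    -- Time travel: query the table as of an earlier snapshot.
    SELECT *
    FROM db.events VERSION AS OF 1234567890123456789;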
>> Another trend extending beyond the data lake is other data sources, right? You have a lot of other data, not just in data lakes, so you have to work with that. How do you answer the question about some of the mission-critical BI dashboards out there on the latency side? A lot of people complain that these mission-critical BI dashboards aren't getting the performance they need as they add more data sources and try to do more.
>> Yeah, that's a great question. Dremio does a bunch of interesting things to bring the performance of these systems up, because at the end of the day people want to access their data really quickly — they want the response times of these dashboards to be interactive; otherwise the data's not interesting if it takes too long to get. A couple of things. First, on the data sources side, Dremio is very proficient with Parquet files in an object store, like we just talked about, but it can also access data in other relational systems — whether that's a Postgres system, a Teradata system, or an Oracle system. That's really useful if you have dimensional data, customer data — not the largest data set in the world, not the fastest-moving data set in the world, but you don't want to move it. We can query it where it resides. Bringing in new sources is definitely key to better insights — the insight is in joining sources together. And then from a query speed standpoint, there's a lot going on: everything from the Apache Arrow project — the in-memory counterpart to Parquet, so we don't serialize and deserialize the data back and forth — to what we call reflections, which is basically a re-indexing or pre-computing of the data. We leave it in Parquet format, an open format, in the customer's account, so aggregates and other things that are really popular in these dashboards are pre-computed. That gives millisecond response — lightning fast. The kind of tricks warehouses have been doing forever, right?
>> Yeah, more data coming in. And obviously the architecture — we'll get into that now — has to handle the growth. As your customers and practitioners see the volume, the variety, and the velocity of the data coming in, how are they adjusting their data strategies to respond? Again, Cloud is clearly the answer, not the data warehouse, but what are they doing? What's the strategy adjustment?
>> It's interesting. When we start talking to folks, it's sometimes a really big shift in thinking about data architectures and data strategies, because the Dremio approach is very different from what most people do today: ETL pipelines bringing stuff into a warehouse, then "oh, the warehouse is too overloaded, let's build some cubes and extracts into the next tier of tools to speed up those dashboards." Dremio has totally flipped this on its head and said: let's not do all those things. That's time-consuming, it's brittle, it breaks, and your agility and the scope of what you can do with your data actually decrease — you go from all your data and all your data sources to smaller and smaller. We call it the perimeter of doom, and a lot of people look at it and say, yeah, that kind of looks like how we're doing things today. From a Dremio perspective, it's about no copies: keep as much data in one place, in one open format, with less data movement. That's a very different approach for people — I think they don't realize how much you can accomplish that way. And your latency shrinks too: your actual latency from data created to insight is much shorter. It's not because of the query response time — that latency is mostly data movement and copying. So you really want to shrink your time to insight. It's not about getting a faster query from a few seconds down; it's about changing the architecture.
>> The data drift, as they say — interesting. I've got to ask you about the personnel side, the team side. You've got the technical side, the non-technical consumers of the data, and data science and data engineering ramping up. We mentioned earlier that data engineering being Agile is a key innovation here. As you blend the two personas of technical and non-technical people playing with data, coding with data — where are the bottlenecks in this process today? How can data teams overcome them?
>> I think we see a lot of bottlenecks in the process today: a lot of data movement, a lot of change requests. "Update this dashboard." Oh, well, that dashboard update requires an ETL pipeline update, which requires a column to be added to this warehouse. So you've got these personas, like you said — some more technical, some less technical; the data consumers, the data engineers. And the data engineers are getting totally overloaded with requests and work.
And it's not even super value-add work to the business. It's not driving big changes in their culture and insights and new use cases for data. It's churning through small changes, and it's taking too much time — days, if not weeks, for these organizations to manage small changes. Meanwhile the data consumers, the less technical folks, can't get the answers they want. They're waiting and waiting, and they don't understand why things are so challenging, why things take so much time. From a Dremio perspective, it's amazing to watch these organizations unleash their data: get the data engineers' productivity up, stop dealing with last-mile ETL and small changes to the data. Dremio says: hey, data consumers, here's a really nice GUI. You don't need to be a SQL expert — the tool will write the joins for you. You can click on a column and say, hey, I want to calculate a new field, and calculate that field. It's all done virtually, so it's not changing the physical data sets, and the data engineering team doesn't even really need to care at that point. So you get happier data consumers at the end of the day, doing things more self-service and learning about the data, and the data engineering teams can go do value-add things: re-architect the platform for the future, do POCs to test out new technologies that could support new use cases and bring them into the organization — things that really add value, instead of churning through backlogs of "can we get a column added" or "can we change this." Everyone's doing app development and A/B testing, and those developers are king; those pipelines stream all this data down, and when the JSON files change, you need agility. If you don't have that agility, you just get this endless backlog that you never...
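The "calculate a new field" workflow Mark describes can be thought of as defining a view — a virtual dataset — over the physical table, so nothing underneath is copied or re-ingested. The names below are hypothetical:

    -- A derived column defined virtually; the physical Parquet/Iceberg
    -- data underneath is never copied or modified.
    CREATE VIEW analytics.trips_enriched AS
    SELECT t.*,
           t.distance_miles * 1.60934 AS distance_km
    FROM lake.raw.trips AS t;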
>> This is data as code in action: you're committing data back into the main branch once it's been tested. That's what developers do. So this is really the next step function. I've got to put the customer hat on for a second and ask you the pessimist question. Okay, we've had data lakes; data lakes have been around; I've got query engines here and there, all over the place. What's missing? What's been missing from the architecture to fully realize the potential of a data lakehouse?
>> Yeah, I think that's a great question. Customers say exactly that, John. They say, "I've got 22 databases — you've got to be kidding me, you showed up with another database." Or, hey, let's talk about a Cloud data lake or a data lake — again? I did the data lake thing. I had a data lake, and it wasn't everything I thought it was going to be.
>> It was bad. It was a data swamp.
>> Yeah, so customers really think this way, and you say, well, what's different this time around? In the original data lake world — and I'm just going to focus on data lakes — everything was still direct-attached storage, so you had to scale your storage and compute out together, and we built these huge systems: thousands and thousands of HDFS nodes and so on. The Cloud brought separated compute and storage, but data lakes had never seen separated compute and storage until now — we went from the data lake with direct-attached storage to the Cloud data warehouse with separated compute and storage. So the Cloud architecture, getting compute and storage separated, is a huge shift in the data lake world — that agility of saying, I'm only going to apply the compute I need for this question, for this answer, right now, and not have 5,000 servers of compute sitting around for some peak moment, or 5,000 compute servers just because I have five or 50 petabytes of data that need to be stored on the disks attached to them. So I think the Cloud architecture and separating compute and storage is the first thing that's different this time around about data lakes. But more important than that is the metadata tier — the data tier — and having sufficient metadata to provide the functionality people need on the data lake, whether that's for governance and compliance, to actually be able to do a delete on your data lake, or for productivity, treating that data as code like we're talking about today: being able to time travel it, version it, branch it. And the original data lakes were getting to 50 petabytes — now think about how big these Cloud data lakes could be, even larger, and you can't move that data around. So we have to be really intelligent and really smart about the data operations: versioning all that data, knowing which engine touched the data, who made the last commit, and being able to track all of that is ultimately what's going to make this successful. Because if you don't have the governance in place these days, the data projects are going to fail.
>> Yeah, and I think separating the query layer, or SQL layer, and the data tier is another innovation you guys have. Also, it's a managed Cloud service — Dremio Cloud now — and you've got the open source angle too, which is going to open up more standardization around some of these awesome features, like the joins you mentioned; and I think you guys built on top of Parquet and some other cool things. And you've got a community developing, so you get the Cloud and community kind of coming together. It's the real world coming to light, saying, hey, I need real-world applications, not old-school theory. So what use cases do you see suited for this new way — new architecture, new community, new programmability?
>> Yeah, I see people doing all sorts of interesting things, and I'm sure that what we've introduced with Dremio Arctic and data as code is going to open up a whole new world we don't even know about today. But generally speaking, we have customers doing very interesting data-application things: building really high-performance data applications, whether that's a supply chain and manufacturing use case, a pharma or biotech use case, a banking use case — really unleashing that data right into an application. We also see a lot of traditional data analytics use cases, more in the traditional business intelligence or dashboarding vein. That stuff is totally achievable — no problems there. But I think the most interesting thing is companies figuring out how to bring that data — with the flexibility and the agility we're talking about — back into the apps, into the work streams, into the places where the business gets more value out of it, not just a dashboard that some person, or some set of people, has access to.
So even in the Dremio Cloud announcement, the press release, there was a customer in Europe called Garvis AI that does AI for supply chains. It's an intelligent application, and it shows customers transparently how it's getting to its predictions. They stood this all up in a very short period of time because it's a Cloud product — they don't have to deal with provisioning, management, upgrades. I think they had their stuff going in like 30 minutes or something — super quick, which is amazing. The data was already there, and for a lot of organizations their data's already in these Cloud storages. And if that's the case...
>> If they have data, they're a use case. This is agility. This is agility coming to the data engineering field: making data programmable, enabling data applications, DataOps for everybody, for coding...
>> For everybody, and for so many more use cases at these companies. These data engineering teams, these data platform teams — whether they're in marketing or ad tech or financial services or telco — have a list, a roadmap of use cases they're waiting to get to. And if they're drowning in the current tooling, barely keeping it alive — and by the way, John, you can't go hire 30 new data engineers tomorrow and bring on the team to get capacity — you have to innovate at the architecture level to unlock more data use cases, because you're not going to triple your team. That's not possible.
>> It's going to unlock a tsunami of value, because everyone's clogged in the system and it's painful, right?
>> Yeah.
>> They've got delays, you've got bottlenecks, you've got people complaining it's hard — scar tissue. So now I think this brings ease of use and speed to the table.
>> Yeah, that's what we're all about: making the data super easy for everyone. This should be fun and easy, not painful and hard and risky. In a lot of the old ways of doing things, there's a lot of risk: you start changing your ETL pipeline, you add a column to the table, and all of a sudden you've got potential breakage and you don't even know what's going to break.
>> Proprietary, not a lot of volume and usage, on-premises — versus open, Cloud, Agile. (John chuckles) Come on, which path? The curtain or the box — what are you going to take? It's a no-brainer.
>> Which way do you want to go?
>> Mark, thanks for coming on theCUBE. Really appreciate you being part of the AWS Startup Showcase: Data as Code. Great conversation. Data as code is going to enable the next wave of innovation and impact the future of data analytics. Thanks for coming on theCUBE.
>> Yeah, thanks John, and thanks to the AWS team — a great partnership between AWS and Dremio too. Talk to you soon.
>> Keep it right there — more action here on theCUBE as part of the showcase. Stay with us. This is theCUBE, your leader in tech coverage. I'm John Furrier, your host. Thanks for watching. (downbeat music)
Mark Lyons, Dremio | CUBE Conversation
(bright upbeat music)
>> Hey everyone. Welcome to this CUBE Conversation featuring Dremio. I'm your host, Lisa Martin, and I'm excited to be joined today by Mark Lyons, the VP of product management at Dremio. Mark, thanks for joining us today.
>> Hey Lisa, thank you for having me. Looking forward to the talk.
>> Yeah. Talk to me about what's going on at Dremio. I had the chance to talk to your chief product officer, Tomer Shiran, a couple of months ago, but talk to us about what's going on.
>> Yeah, I remember that — at re:Invent. It's been an exciting few months since re:Invent here at Dremio. Just in the new year we raised our Series E; since then we ran our Subsurface event, which had over seven, eight thousand registrants and attendees; and then we announced our Dremio Cloud product generally available, including Dremio Sonar, which is a SQL query engine, and Dremio Arctic, in public preview, which is a metastore for the lakehouse.
>> Great, and we're going to dig into both of those. I saw that over $400 million was raised in that Series E, raising Dremio's valuation to $2 billion — so a lot of growth and momentum going on at the company, I'm sure. If we think about businesses in any industry, they've made large investments in proprietary data warehouses. Talk to me about what, historically, they've been able to achieve, and then some of the bottlenecks they're running into.
>> Yeah, for sure. My background is actually in the data warehouse space — I spent the last eight, maybe close to ten years there — and we've seen this shift go on from the traditional enterprise data warehouse to the data lake, and then the last couple of years have really been the time of the cloud data warehouse. There's been a large amount of adoption of cloud data warehouses, but fundamentally they still come with a lot of the same challenges that have always existed with the data warehouse. First of all, you have to load your data into it. That data's coming from lots of different sources — in many cases it's landing in files in a data lake repository like S3 first — and then there's a loading process, an ETL process, and those pipelines have to be maintained and stay operational. And typically, as data moves through the warehouse lifecycle, the scope of the data that consumers get to access gets smaller and smaller, the control of that data gets tighter, and the change process gets heavier. It goes from quick changes — adding a column, adding a field to a file — to days, if not weeks, for businesses to modify their data pipelines, test new scenarios, offer new features in the application, or answer new questions the business is interested in from an analytics standpoint. So we see the same story even with the cloud data warehouses: the scope of the data shrinks, and the time to get answers gets longer. And when new engines come along, it's the same story — this is going on right now in the data warehouse space: a new warehouse says, "we're a thousand times faster than the last data warehouse," and then it's like, okay, great, but what's the process? The process is to migrate all your data to the new data warehouse, and that comes with all the same baggage — again, it's a proprietary format you load your data into. So I think people are ready for a change from that.
>> People are not only ready for a change — as every company has to become a data company these days, access to real-time data is no longer a nice-to-have. It's absolutely essential. The ability to scale, the ability to harness the value from as much data as possible, and to do so fast, is really table stakes for any organization. How is Dremio helping customers in that situation operationalize their data?
>> Yeah, and that's what I was so intrigued by and loved about Dremio when I joined three, four, five months back. Coming from the warehouse space, when I first saw the product I was just like, oh my gosh, this is so much easier for folks. They can access a larger scope of their data faster, which, to your point, is table stakes for all organizations these days. They need to be able to analyze data sooner — sooner is better. Data has a half-life, right? It decays. The value of data decays over time, so typically the most valuable data is the newest data. That depends on the industry, the types of data, and the use cases, but it's basically always true that newer data is more valuable, and they need to be able to analyze as much of it as possible. The story can't be, no, we have to wait weeks or months for a new data source; and the story can't be, we weren't able to keep the data that includes seasonality in the same location because it was too expensive to keep in the warehouse. So for Dremio and our customers, the story is simple: leverage the data where it is. Access data in all sorts of sources — whether it's a Postgres database or an S3 bucket — don't move the data, don't copy the data, analyze it in place. And don't limit the scope of the data you're trying to analyze: if you have new use cases, or additional data sets you want to add to those use cases, just bring them into S3 and you're off to the races, and you can easily analyze more data and give more power to the end user. If there's a field they want to calculate — a simple change, convert this miles field to kilometers — well, the end users should be empowered to just make a calculation on the data like that. That should not require an entire cycle through a data engineering team, a backlog, a ticket, and a push to production and so forth — which, at many organizations, it does. It's a lot of effort to make new calculations on the data, derive new fields, add a new column, and so forth. So Dremio makes the data engineer's life easier and more productive, and it makes the data consumer's life much easier and happier too — they can just do their job without worrying and waiting.
>> Not only can they do their job, but from a business, high-level perspective, the business probably has the opportunity to be far more competitive, because it's got a bigger scope of data, as you mentioned, and access to it more widely and faster — and those are only good things in terms of...
>> More use cases, more experiments, right? What I've seen a lot is that there's no shortage of ideas for what people can do with the data, and projects that might be undertaken, but no one knows exactly how valuable it will be — whether it's something that should be funded or not. So: more use cases, more experiments, try more things. If it's cheap to try these data projects and see if they're valuable to the business, that's better for the business. Ultimately the business will be more competitive — able to try more new products, gain operational efficiencies, lower risk, all those things.
>> Right. What about data governance? Talk to me about how the lakehouse enables governance across all these disparate data volumes.
>> I think this is where things get really interesting with the lakehouse concept, relative to where we used to be with the data lake, which was a parking ground for lots of files — and that came with a lot of challenges. When you just had a lot of files out there in a data lake — whether that was HDFS, a Hadoop data lake back in the day, or now a cloud object storage data lake — governance, access, authentication, and auditing were all extremely challenging. But in the modern lakehouse world, those challenges have been solved. You have everything from the front of the house — authentication and access policies, data masking, everything you would expect — through commits, tables, transactions, inserts, updates, and deletes, and auditing of that data: being able to see who made the changes to the data, with which engine, by which user, when they were made — the whole history of a table, not just a mess of files in a file store. It's really come a long way. I feel like we're in the renaissance stage of the 2.0 data lakes — lakehouses, as people call them. Basically, you're seeing a lot of functionality from the traditional warehouse become available on the lake. Warehouses had a lot of governance built in: encryption, column access policies, row access policies so only the right user saw the right data, and data masking, so that, say, the social security number was masked out but the analyst still knew it was a social security number. That was all there, and now it's all available on the lakehouse, so you don't need to copy data into a data warehouse just to meet those types of requirements. A huge one is also deletes. I feel like deletes were one of the Achilles' heels of the original data lake, where there was no governance and people were just copying data sets around, modifying them for whatever their analytics use case was. If someone said, "go delete this" — the right to be forgotten under GDPR, and now you've got California's CCPA and others all coming online — deleting a record or set of records from an original-style lake with confidence, being able to say "I fully deleted this," was probably impossible for most people. Now, with the Apache Iceberg table format storing the data in the lakehouse architecture, you actually have delete functionality — a key component that warehouses traditionally brought to the table.
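To ground that compliance point: against an Iceberg table, a right-to-be-forgotten request can be an ordinary DML statement, recorded as a new committed snapshot. The table, column, and policy names below are invented, and the masking statement only approximates that style of policy DDL:

    -- Record-level erasure, directly on the lakehouse table.
    DELETE FROM lake.crm.customers
    WHERE customer_id = 'c-12345';

    -- Column masking in the spirit of Mark's SSN example: analysts see
    -- a masked value, while privileged roles see the real one.
    ALTER TABLE lake.crm.customers
      MODIFY COLUMN ssn SET MASKING POLICY protect_ssn (ssn);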
>> Not only can they do their job, but from a high-level business perspective, the business probably has the opportunity to be far more competitive, because it's got a bigger scope of data, as you mentioned, and access to it more widely and faster, and those are only good things in terms of- >> More use cases, more experiments, right? What I've seen a lot is that there's no shortage of ideas of what people could do with the data, and projects that might be undertaken, but no one knows exactly how valuable that will be, whether that's something that should be funded or should not be funded. So: more use cases, more experiments, try more things. If it's cheap to try these data problems and see if they're valuable to the business, then that's better for the business. Ultimately the business will be more competitive. They'll be able to try more new products, they'll be able to have better operational efficiencies, lower risk, all those things. >> Right. What about data governance? Talk to me about how the lakehouse enables that across all these disparate data volumes. >> I think this is where things get really interesting with the lakehouse concept, relative to where we used to be with a data lake, which was a parking ground for just lots of files. And that came with a lot of challenges when you just had a lot of files out there in a data lake, whether that was HDFS — a Hadoop data lake back in the day — or now a cloud object storage data lake. So historically I feel like governance, access, authentication and auditing all were extremely challenging with the data lake, but now, in the modern lakehouse world, all those challenges have been solved. You have everything from the front of the house, with access policies and data masking, through commits and tables and transactions and inserts and updates and deletes, and auditing of that data — being able to see who made the changes to the data, which engine, which user, when they were made, and seeing the whole history of a table, not just a mess of files in a file store. So it's really come a long way. I feel like we're in the renaissance stage of the 2.0 data lakes, or lakehouses as people call them. But basically what you're seeing is a lot of functionality from the traditional warehouse, all available in the lake. And warehouses had a lot of governance built in — whether that's encryption, and column access policies and row access policies, so only the right user saw the right data, or some data masking, so that the social security number was masked out but the analyst knew it was a social security number. That was all there. Now that's all available on the lakehouse, and you don't need to copy data into a data warehouse just to meet those types of requirements. A huge one is also deletes, right? I feel like deletes were one of the Achilles' heels of the original data lake, when there was no governance and people were just copying data sets around, modifying data sets for whatever their analytics use case was. If someone said, "Hey, go delete this" — the right to be forgotten under GDPR, and now you've got California's CCPA and others all coming online — if you had to delete a record or set of records from an original lake, I think that was nearly impossible for most people to do with confidence, to say, I fully deleted this. Now, with the Apache Iceberg table format that stores data in the lakehouse architecture, you actually have delete functionality, right? Which is a key component that warehouses have traditionally brought to the table.
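As a hedged sketch of what that looks like in practice — the table and key here are hypothetical, and the engine-side mechanics of how Iceberg rewrites or tracks the affected files vary by implementation:

```sql
-- Right-to-be-forgotten delete against an Apache Iceberg table.
-- The format commits a new table snapshot; affected data files are
-- rewritten (or covered by delete files) rather than the whole lake
-- being reprocessed, and the change is auditable in the table history.
DELETE FROM lake.customers
WHERE customer_id = 'c-10482';   -- the data subject requesting erasure
```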
>> That's a huge component from a compliance perspective. You mentioned GDPR and CCPA, which is going to be CPRA in less than a year, but there are so many other data privacy regulations coming up that the ability to delete is going to be table stakes for organizations. Something that you guys launched — and we just have a couple minutes left — I love the name: the Forever Free data lakehouse platform. That sounds great, Forever Free. Talk to me about what that really means. It consists of the two products, Sonar and Arctic, that you mentioned, but talk to me about this Forever Free data lakehouse. >> Yeah, I feel like this is an amazing step forward for the industry. Because of the Dremio Cloud architecture, where the execution and data live in the customer's cloud account, we're able to basically say, hey, the Dremio software — the Dremio service side of this platform — is forever free for users. Now, there is a paid tier, but there's a standard tier that is truly forever free. That still comes with infrastructure bills from your cloud provider, right? So if you use AWS, you still have an S3 bill for your data sets, because we're not moving them — they're staying in your Amazon account, in your S3 bucket. You do still have to pay for the infrastructure, the EC2 and the compute to do the data analytics, but the actual software is free forever. And there's no one else in our space offering that. In our space, everything's a free trial: here's your $500 of credit, come try my product. What we're saying is, with our unique architectural approach — and this is what I think is preferred by customers too — we take care of all the query planning, all the engine management, all the administration of the platform, the upgrades, a fully available, zero-downtime platform. So they get all the benefits of SaaS, as well as the benefits of maintaining control over their data. And because that data is staying in their account, and the execution of the analytics is staying in their account, we don't incur that infrastructure bill, so we can have a forever free tier of our platform. And we've had tremendous adoption. I think we announced this at the beginning of March, the first week of March, and it's not even the end of March — hundreds and hundreds of signups, and many users actively on the platform now, live querying their data. >> That just kind of summarizes the momentum that Dremio is seeing. Mark, thank you so much. We're out of time, but thanks for talking to me- >> Thank you. >> About what's new at Dremio and what you guys are doing. Next time we'll have to unpack this even more. I'm sure there's loads more we could talk about, but we appreciate that. >> Yeah, this was great. Thank you, Lisa. >> My pleasure. For Mark Lyons, I'm Lisa Martin. Keep it right here on theCUBE, your leader in high-tech hybrid event coverage. (upbeat music)
Tomer Shiran, Dremio | AWS re:Invent 2021
>> Good morning. Welcome back to theCUBE's continuing coverage of AWS re:Invent 2021. I'm Lisa Martin. We have two live sets here, and we've got over a hundred guests on the program this week across our live and remote sets, talking about the next decade in cloud innovation. And I'm pleased to be welcoming back one of our CUBE alumni, Tomer Shiran, the founder and CPO of Dremio, to the program. Tomer is going to be talking about why 2022 is the year open data architectures surpass the data warehouse. Tomer, welcome back to theCUBE. >> Thanks for having me. It's great to be here. >> It's great to be here at a live event, in person, my goodness, sitting side by side with guests. Before we dig into the data lakehouse versus the data warehouse — I want to unpack that with you — talk to me a little bit about what's going on at Dremio. You guys were on the program earlier this summer, but what are some of the things going on right now, in the fall of 2021? >> Yeah, for us it's a big year of product news: a lot of new products, new innovation. The company's grown a lot — we're probably three times bigger than we were a year ago. So a lot of new folks on the team, and many, many new customers. >> That's good — always new customers, especially during the last 22 months, which have been obviously incredibly challenging. But I want to unpack this, the difference between a data lake and a data lakehouse. I love the idea of a lakehouse, by the way. Talk to me about what the differences and similarities are, and how customers are benefiting. >> Sure, yeah. I think you can think of the lakehouse as the evolution of the data lake, right? We've had data lakes for a while now; the transition to the cloud made them a lot more powerful, and now a lot of new capabilities coming into the world of data lakes really make that whole concept, that whole architecture, much more powerful — to the point that you really are not going to need a data warehouse anymore. And so it kind of gives you the best of both worlds: all the advantages that we had with data lakes — the flexibility to use different processing engines, to have data in your own account and in open formats — all those benefits, but also the benefits that you had with warehouses, where you could do transactions and get high performance for your BI workloads and things like that. So the lakehouse makes both of those come together and gives you the benefits of both. >> Talk to me about it from a customer lens perspective: what are some of the key benefits, and how does a customer go about it? Say they've got data warehouses and data lakes — how do they actually evolve to the lakehouse? >> You know, data warehouses have been around forever, right? And there's been some new innovation there as we've moved to the cloud, but fundamentally they're a very closed and very proprietary architecture that gets very expensive quickly. That's a big problem. With a data warehouse, you have to take your data and load it into the warehouse — whether that's a Teradata or Snowflake or any other database out there, that's what you do. You bring the data into the engine. The data lakehouse is a really different architecture. It's one where you have data as its own tier, stored in open formats, things like Parquet files and Iceberg tables.
And you're basically bringing the engines to the data instead of the data to the engine. So now, all of a sudden, you can start to take advantage of all this innovation that's happening on the same set of data, without having to copy and move it around. Whether that's Dremio for high-performance BI workloads and SQL types of analysis, Spark for batch processing and machine learning, Flink for streaming — there are lots of different technologies that you can use on the same data, and the data stays in the customer's own account, right? So S3 effectively becomes their new data warehouse. >> Okay. So I can imagine, during the last 22 months of this scattered work-from-home — and we're still in this work-from-anywhere environment, with so much data being generated at the edge, and the edge expanding — that bringing the engines to the data is probably now more timely than ever. >> Yeah, I think the growth in data — you see it everywhere, right? That's the reason so many companies like ourselves are doing so well. There's so much new data, so many new use cases, and every company wants to be data-driven. They all want to democratize data within the organization. But you need the platforms to be able to do that, right? And that's very hard if you have to constantly move data around — if you have to take your data, which maybe is landing in S3, but move subsets of it into a data warehouse, and then from there move subsets of that into BI extracts, right? Tableau extracts, Power BI imports — and you have to create cubes and lots of copies within the data warehouse. There's no way you're going to be able to provide self-service and data democratization that way. So it really requires a new architecture. And that's one of the main things we've been focused on at Dremio: really taking the lakehouse and the lake and making it not just something that data scientists use for really advanced use cases, but something where even your production BI workloads can actually run on the lakehouse, when you're using a SQL technology like Dremio. >> It's really critical, because as you talked about, every company these days is a data company. If they're not, they have to be, or there's a competitor in the rear-view mirror that is going to be able to take over what they're doing. So this really is critical, especially considering another thing that we learned in the last 22 months: real-time data access is no longer a nice-to-have. It's really essential for businesses in any organization. >> I think we see it even in our own company, right? The folks that are joining the workforce now, they learned SQL in school. They don't want a report on their desk, printed out every Monday morning. They want access to the database — how do I connect whatever tool I want, or even type SQL by hand? They want access to the data, and they want to just use it. And they want the performance, of course, to be fast, because otherwise they'll get frustrated and won't use it — which has been the status quo for a long time. And that's basically what we're solving. >> So the lakehouse, versus a data warehouse, is better able to really facilitate data democratization across an organization. >> Yeah.
Because there's a big — people don't talk a lot about the story before the story, right? With a data warehouse, the data never starts there. You typically first have your data in something like S3, or perhaps in other databases, and then you have to ETL it all into that warehouse. That's a lot of work, and typically only a small subset of the data gets ETL'd into the data warehouse. Then the user wants to query something that's not in the warehouse, and somebody from engineering has to spend a month or two responding to that ticket and wiring up some new ETL to get the data in. So it's a big problem, right? If you can have a system that can query the data directly in S3, and even join it with sources outside of that — things like your Oracle database, your SQL Server database, MongoDB, et cetera — well, now you really have the ability to expose data to your users within the company and make it very self-service. They can query any data at any time and get a fast response time. That's what they need. >> Self-service is key there. Speaking of self-service and things that are new: I know you guys launched Dremio Cloud recently, a new SaaS offering. Talk to me about that. What's going on there? >> Yeah, with Dremio Cloud, we spent about two years working on that internally, and really the goal was to simplify how we deliver all of the benefits that we've had in our product — sub-second response times on the lake, a semantic layer, the ability to connect to multiple sources — but take away the pain of having to install and manage software. And so we did it in a way that the user doesn't have to think about versions, they don't have to think about upgrades, they don't have to monitor anything. It's basically like running and using Gmail, right? You log in, you get to use it. You don't have to be very sophisticated, and there's not a lot of administration you have to do. It basically makes it a lot simpler. >> And what's the adoption been like so far? >> It's been great. It's been limited availability, but we've been onboarding customers every week now — many startups, and many of the world's largest companies. So that's been really exciting, actually. >> So quite a range of customers. And one of the things it sounds like is that Dremio has grown itself during the pandemic. We've seen an acceleration of startups, of a lot of companies, of cloud adoption, of migration. How have your customer conversations changed in the last 22 months, as businesses in every industry kind of scrambled in the beginning to survive, and now are realizing that they need to modernize to thrive, to be competitive and to have competitive advantage? >> I think I've seen a few different trends here. One is certainly that there's been a lot of acceleration of movement to the cloud, with how different businesses have been impacted. It's required them to be more agile, more elastic. They don't necessarily know how much workload they're going to have at any point in time.
So having that flexibility, both in terms of the technology — with Dremio Cloud we scale, for example, infinitely; you can have one query a day, or you can have a thousand queries a second, and the system just takes care of it — that's really important to these companies that are being impacted in various different ways. You had the companies, the Pelotons and Zooms of the world, whose business was exploding. And then of course the travel and hospitality industries, where that went to zero all of a sudden — it's been recovering nicely since then. So that flexibility has been really important to customers. I think the other thing is that they've realized they have to leverage data, because in parallel to this pandemic there has also been a real boom in technology. Every industry is being disrupted by new startups, whether it's the insurance industry or financial services — a lot of InsurTech, FinTech, different companies that are trying to take advantage of data. So if you, as an enterprise, are not doing that, that's a problem. >> It is a problem. It's definitely something that I think every business and every industry needs to be very acutely aware of, because from a competitive advantage perspective, there's someone in that rear-view mirror who is going to be focused on data and have a real solid, modern data strategy, and that's going to be able to take over if a company is resting on its laurels at all. So here we are at re:Invent — I just came off of Adam Selipsky's keynote. Talk to me about the Dremio-AWS partnership. I know AWS's partner ecosystem is huge, and you're one of the partners, but talk to me about what's going on with the partnership. How long have you guys been partners? What are the advantages for your customers? >> You know, we've been very close partners with AWS for a number of years now, and it spans many different parts of AWS, starting with the engineering organization. So a very close relationship with the S3 team, the EC2 team — I was just having dinner last night with Kevin Miller, the GM of S3. That's one side of things, really the engineering integration. We were the first technology to integrate with AWS Lake Formation, which is Amazon's data lake security technology, so we do a lot of work together on upcoming features that Amazon is releasing. And then they've also been really helpful on the go-to-market side of things, on the sales and marketing — whether it's blogs on the Amazon blog, or their sales teams actually promoting Dremio to their customers to help them be successful. So it's really been a good partnership. >> Every time I talk to somebody from Amazon, we always talk about their customer-first focus, their customer obsession. It sounds like there's deep alignment, from the technical engineering perspective to sales and marketing. Talk to me a little bit about cultural alignment, because when you're going into customer conversations, I imagine they want to see one unified team. >> Yeah, you know, I think Amazon does have that customer-first focus, and obviously we do as well.
And we have to, right? As a startup, if a customer has a problem, the whole company will jump on that problem. That's what we call customer obsession internally. And I think that's very much what we've seen with AWS as well: the desire to make the customer successful comes before, okay, how does this affect a specific Amazon product? Because anytime a customer is using Dremio on AWS, they're also consuming many different AWS services and bringing data into AWS. So I think for both of us, it's all about how we solve customer problems and make them successful with their data. >> Solving those customer problems is the whole reason that we're all here, right? As we have just a few more minutes here: when we hear terms like future-proof, I always want to dig in with folks like yourself, chief product officers — what does it actually mean? How do you enable businesses to create these future-proof data architectures that are going to allow them to scale and be really competitive? >> Sure. Yeah, I think many companies have experienced what's known as lock-in, right? They invest in some technology — we've seen this with databases and data warehouses — you start using it, and you can really never get off, and prices go up, and you find out you're spending 10 times more; especially now with the cloud data warehouses, 10 times more than you thought you were going to be spending. And at that point it becomes very difficult, right? What do you do? One of the great things about the data lake and the lakehouse architecture is that the data stays stored in the customer's own account, in their S3 buckets, in source formats like Parquet files and Iceberg tables, and they can use many different technologies on that. So today the best technology for SQL and powering your mission-critical BI is Dremio, but tomorrow there might be something else, and that company can take that new technology, point it at the same data, and start using it. They don't have to go through some really crazy migration process. And we've seen that with Teradata and Oracle, the old-school vendors — that's always been a pain — and now you see a lot of complaints around that with the newer cloud data warehouses. So the lakehouse is fundamentally designed — especially if you choose open-source formats like Iceberg tables, as opposed to, say, Delta — so that you're really future-proofing yourself. >> Got it. Talk to me about some of the things, as we wrap up here, that attendees can learn and see and touch and feel at the Dremio booth at this re:Invent. >> Yeah, there are a few different things: they can watch a demo or play around with Dremio Cloud, and they can talk to our team about what we're doing with Apache Iceberg. Iceberg, to me, is one of the more exciting projects in this space, because it was created by Netflix, with companies like Apple and Salesforce behind it, and AWS just announced support for Iceberg with their products, Athena and EMR.
So it's really emerging as the standard table format, the way to represent data in open formats in S3. We've been behind Iceberg now for a while, and that to us is very exciting. We're happy to chat with folks at the booth about that. Nessie is another project, an open-source project we created, really providing a good experience for your data, where you have version control and branching — kind of trying to reinvent data engineering and data management. So that's another cool project we can talk about at the booth. >> So lots of opportunity there for attendees to learn. Thank you, Tomer, for joining me on the program today, talking about the difference between a data warehouse, a data lake and the lakehouse — you did a great job explaining that — Dremio Cloud, what's going on, and how you guys are deepening that partnership with AWS. We appreciate your time. >> Thank you. Thanks for having me. >> My pleasure. For Tomer Shiran, I'm Lisa Martin. You're watching theCUBE. Our coverage of AWS re:Invent continues after this.
Robert Maybin, Dremio | AWS Startup Showcase: Innovations with CloudData & CloudOps
(upbeat music) >> Welcome to today's session of the AWS Startup Showcase, featuring Dremio. I'm your host, Lisa Martin. And today we're joined by Robert Maybin, Principal Architect at Dremio. Robert is going to talk to us about democratizing your data by eliminating data copies. Robert, welcome. It's great to have you in today's session. >> Great. Thank you, Lisa. It's great to be here. >> So talk to me a little bit about why data copies, as Dremio says, are the key obstacle to data democratization. >> Oh, sure. Well, when people talk about data democratization, what they're really speaking to is the desire for people in the organization to be able to work with the enterprise's data, discover data, really, in a more self-service way. And when you think about democratization, you might say, "Well, what's wrong with copies? What could be more democratic than giving everybody their own copy of the data?" But when you really think about that, and how it ties into traditional architectures and environments, there are a lot of problems that come with copies, and those are real impediments. Traditionally, in the data warehousing world, what often happens is that there are numerous sources of data coming in, in all different formats, all different structures. These things, typically, for people to query them, have got to be loaded into some sort of data warehousing tool. Maybe they land in cloud storage, but before they can be queried, somebody has to go in and basically reformat those data sets, transform them in ways that make them more useful and more performant. And this is very, very common — I think many, many organizations do this, and it makes a lot of sense to do it, because traditionally the formats the data is sourced in are pretty hard to work with and very slow to query. So copies are kind of a natural thing to do, but they come at a real cost, right? There's a tremendous complexity that comes with having to do all these transformations, there's a real dollar cost, and there's a lot of time involved too. So if you could take all of these middle steps out — where you're copying and transforming, and then transforming again, and then, potentially, persisting very high-performance structures for fast BI queries — you can reduce a lot of those impediments. >> So talk to me about... Oh, I'm sorry. Go ahead. >> Go ahead. >> I was just going to say, one of the things that is even more in demand now is the need for real-time data access. I think real-time is no longer a nice-to-have, and I think what we've been through in the last year has really shown that. So given the legacy architectures, and some of the challenges with copies being an obstacle to that true democratization, how can data teams actually get in there and solve this challenge?
>> Yeah, so going back a little bit to the prior question — I can fill out a little more of the detail, and that'll lead us to your point — one of the things that is also really borne as a cost, when you have to go through and make multiple copies, is that you typically need experts in the organization who are the ones who are going to write the ETL scripts, or do the data architecture and design the structures that have to be performant for real-time BI queries, right? Typically these take the form of things like OLAP cubes, or big flattened data structures with all of the attributes joined in — there are a lot of different ways you can get query performance, but typically that's not available directly against the source data. So there are really two ways data teams can go about this. One is you can really go all in on the data copy approach, and home-grow or build yourself a lot of the automation and tooling and parts it would take to basically transform the data. You can build UIs for people to go in and request data, and you can automate this whole process. And we've found that a number of large organizations have actually gone this route, and they've been at these projects for, in some cases, years, and they're still not completely there. So I wouldn't really recommend that approach. I think the real approach — and this is really available today with the rise of cloud technologies — is that we can shift our thinking a bit, right? We can think about how we take some of these features and capabilities that one would expect in a data warehousing environment, and bring them directly to the data. Now, that shift in thinking requires new technology, right? Imagine a lot of these traditional data warehousing features — like interactive speed, and the ability to build structures, or views, or things on top of your data — but done directly on the data itself, without having to transform and copy, transform and copy. That's really something we call the next generation data lake architecture: bringing those capabilities directly to the data that's on the lake. >> So, leaving the data where it is. Next generation is a term, like future-ready, that's used a lot. Let's unpack that, and dig into why what you're talking about is the next generation data lake architecture. >> Sure, sure. To talk about that, the first thing we really have to discuss is a fundamental shift in technologies that's come about in the last few years. As cloud services like AWS have risen to prominence, there are some capabilities available to us now that just weren't there three, four or five years ago. What we can do now is truly separate compute and storage, connected together with really fast networking. We can provision storage, and we can provision compute, and from the perspective of the user, those two things can basically be scaled infinitely, right?
And if you contrast that with what we used to have to do in platforms like Hadoop, or in scale-out MPP data warehouses: we didn't have the flexibility to scale compute and storage independently, and we didn't have the kind of networking that we have today. So it was a requirement to take the compute and push it as close to the data as we could, which is what you would get in a large Hadoop cluster — you've got nodes with compute right next to the storage, and you try to push as much work as you can onto each node before you start to transfer the data to other nodes for further processing. Now, what we've got with some of the new cloud technology is the ability to basically do away with that requirement. So we can have very, very large provisioned pools of data that can grow and grow, really without the limitations of nodes of hardware, and we can spin compute up and down to process it. The thing that we need, though, is a way of processing it — a query processing engine that's built for those dynamics, built so that it performs really well when compute and storage are decoupled. That's really the trick: once we're in this new paradigm of separate compute, separate storage and very fast networking, we start to look for technologies that can scale out and back and do really performant query in that environment — that's really what we're talking about. Now, the very last piece, what I would call next gen data lake architecture: it's very common even today for organizations to have a data lake that contains a tremendous amount of data, but in order to do actual BI queries at the interactive speed that people expect, they still have to take portions of the data from the lake and load them into a warehouse, and then probably from there build OLAP cubes, or extracts into a BI tool. So the last piece in the next gen data lake architecture puzzle is: once you've got that fast query engine foundation, how do you move those interactive workloads onto that platform, so they don't have to be in a data warehouse? How do you take some of those data warehousing expectations and put them into a platform that can query data directly? That's really what next generation means to us. >> So let's talk about Dremio now. I see that just in January of 2021 there was Series D funding of $135 million, and then I saw that Datanami actually called Dremio a unicorn, as it's reached a $1 billion valuation. Talk to us about what Dremio is, and how you're part of this modern data architecture. >> Absolutely. Yeah. You can think about Dremio, in the technology context, as really solving that problem that I just laid out: we're in the business of building technology that allows users to query very large data sets in a scale-out, very performant way, directly on the data where it lives. So there's no real need for data movement. And in fact, we can query not just one source of data but multiple sources of data, and join those things together in the context of the same query. So you may have most of your data in a data lake, but then you may have some relational sources.
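A hedged sketch of the kind of federated query Robert is describing — the source names here ("s3_lake", "pg_crm") stand for connections configured in the engine, and the tables and columns are hypothetical:

```sql
-- One query joining Parquet-backed data on S3 with a live relational
-- source, without first copying either side into a warehouse.
SELECT c.customer_name,
       SUM(o.amount) AS total_spend
FROM s3_lake.sales.orders AS o      -- files on S3 exposed as a table
JOIN pg_crm.public.customers AS c   -- e.g., a PostgreSQL source
  ON o.customer_id = c.customer_id
GROUP BY c.customer_name;
```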
So there's a potent story there, in that you don't have to consolidate all of your data into one place. You don't have to load all of your data into a data warehouse or a cloud data warehouse. You can query it where it is. That's the first piece. I think the next piece that Dremio provides is, as we mentioned before, almost a data warehouse-like user experience, in terms of very, very fast response times for things like BI dashboards — really interactive queries — and the ability to do things you would normally expect to do inside a warehouse. So you can create schemas, for instance; you can create layers of views and accelerations, and effectively allow users to build out virtually, in the form of views, what they would have done before with all of their various ETL pipelines to scrub and prepare and transform the data to get it in shape to query. And at the very end, what we can do is selectively, in an internally managed way, accelerate certain query patterns by creating something that we call reflections, which is an internally managed persistence of data that accelerates certain queries — but it's entirely internally managed by Dremio. The user doesn't have to worry about anything to do with setup, or configuration, or cleanup, or maintenance, or any of that. >> So do reflections really provide a differentiator for Dremio, if you look in the market and you see competitors like Snowflake or SingleStore, for example? Is this really kind of that competitive differentiator? >> I think it's one of them. The ability to create reflections is certainly a differentiator, because it allows you to basically accelerate different kinds of query patterns against the same underlying source data. So rather than having to go build a transformation for a user that potentially aggregates data a certain way, persist that somewhere, and build all the machinery to do that and maintain it — in Dremio, literally, it's a button click. You can go in and look at the dataset, identify the dimensions that you need to, say, aggregate by, and the measures that you want to compute, and Dremio will just manage that for you. Any query that comes in — maybe going after a massive detail table with a trillion rows, with a GROUP BY in it, for instance — will just match that reflection and use it. And that query can respond in less than a second, where typically the work that would have to happen on the backend engine might take a minute to process that query. So really that's the edge piece that gives us that BI acceleration, without having to use additional tools or any additional complexity for the user.
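To make that concrete, here is a hedged sketch of the kind of interactive query a reflection is meant to accelerate; the dataset and columns are hypothetical, and the reflection itself is typically defined through Dremio's UI rather than written by the analyst:

```sql
-- An aggregation over a large detail table; table and columns are made up.
SELECT region,
       sale_date,
       COUNT(*)     AS order_count,
       SUM(revenue) AS total_revenue
FROM lake.sales_detail        -- detail table, potentially billions of rows
GROUP BY region, sale_date;

-- If an aggregation reflection exists on (region, sale_date) with the
-- measures COUNT(*) and SUM(revenue), the optimizer can transparently
-- rewrite this query to read the much smaller precomputed structure,
-- turning a minutes-long scan into a sub-second response.
```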
So I think, you know, when you talk to, really, users of the platform, there are a number of layers of Dremio, and, you know, we often get asked, I get asked, you know, who are our direct competitors, right? And I think that when you think about that question, it's really interesting, because we're not just the backend high-performance query engine. We aren't just the acceleration layer, right? We also have a very rich, fully-featured UI environment, that allows users to actually log in, find data, curate data, you know, reflect data, build their own views, et cetera. So there's really a whole suite of services that are built in to the Dremio platform, that make it very, very easy to install Dremio on, you know... You know, install it on AWS, get started right away, and be querying data, kind of building these virtual views, adding accelerations. All this can happen within minutes. And so it's really interesting that there's kind of a wide spectrum of services that allow us to really power a data lake in its entirety, really, without too many other technologies that have to be involved there. >> What are some of key use cases that you've seen, especially in the last year, as we've seen this rapid acceleration of digital transformation, this adoption of SaaS applications, more and more and more data, some of those key use cases that Dremio is helping customers solve? >> Sure. Yeah. I think there's a number of verticals, and there's some that I'm very familiar with, because I've worked very closely with customers, and in financial services is a large one, you know, and that would include, you know, banking, insurance, investment, you know, a lot of the large fortune 500 companies that maybe in manufacturing, or, you know, transportation, shipping, et cetera. You know, I think lately I'm most familiar with some of the transformation that's going on in the financial services space, and what's happening there, you know, companies have typically started with very, very large data warehouses, and often for the last four or five years, maybe a little longer, they've been in this transition to building kind of an in-house data lake, typically on a Hadoop platform of some flavor, with a lot of additional services that they've created to try to enable this data democratization. But these are huge efforts. And, you know, typically these are on-prem, and, you know, lots of engineers working on these things, really, full-time, to build out this full spectrum of capabilities. The way that Dremio really impacts that is, you know, we can come in and actually take the place of a lot of parts of that puzzle. And we give a really rich experience to the user, you know, allow customers to kind of retire some of these acceleration layers that they've put in to try to make BI queries fast, get rid of a lot of the transformations, like the ETL jobs or ELT processes that have to run. So, you know, there's a really wide swath of that puzzle that we can solve. And then when you look at the cloud, because all of these organizations, they've got a toe in the water, or they're halfway down the path, of really exploring how do we take all of this on-prem data and processing and everything else, and get it into AWS, you know, put it in the cloud? What does that architecture look like? And we're ideally positioned for that story. You know, we've got an offering that runs, you know, natively on AWS, and takes full advantage of kind of the decoupling of compute and storage. 
So we give organizations a really good path to solve some of their on-prem problems today, and then give them a clear path as they migrate into cloud. >> Can you walk me through a customer example that you think really underscores what you just described — what Dremio delivers in helping customers with this migration, and being able to take advantage of and find value in volumes and volumes of data? >> Yeah, absolutely. Unfortunately I can't mention their name, but I have worked very closely with a large customer, as I mentioned, in financial services. They've had a pretty large deployment that traditionally has been Hadoop-based, and they've got several large on-prem relational data warehouses as well. And Dremio has been able to come in and actually provide that BI performance piece — basically the very fast one-second, two-second, three-second performance that people would expect from the data warehouse — but we're able to do that directly on the files and tables that are in their Hadoop cluster. That project's been going on for quite some time, and we've had success there. I think where it really starts to get exciting, though — and this is just beginning — is that this customer is also investigating, and actually prototyping and building out, a lot of these functions in the AWS cloud. And so the nice thing that we're able to offer is really a consistent technology stack, consistent interfaces, a consistent look and feel of the UI, both on-prem and in the cloud. So once they start that move, they've got a familiar place to connect to for their data and to run their queries, and that's a nice seamless transition as they migrate. >> What about other verticals? I can imagine healthcare and government services — are you seeing traction in those segments as well? >> Yeah, absolutely, we are. There are a number of companies in the healthcare space. I think one of the larger ones in the government space, which I have some exposure to, is CMS, where we did some work through a partner to implement Dremio. This was a project, I think, that was undertaken about a year ago: they implemented our technology as part of a larger data lake architecture, and had a good bit of success there. So what's been interesting — when you talk about the funding and the valuation and the kind of buzz that's going on around Dremio — is that we really have customers in so many different verticals, right? We've got certainly financials and healthcare, and insurance, and big commercials in manufacturing, et cetera. So we're seeing a lot of interest across a number of different verticals, and customers are buying and implementing the product in all those verticals, yeah. >> All right, so take us out with where customers can go — and prospects that are interested, and even investors — to find out more about this next generation data engine that is Dremio. >> Absolutely. So I think the first thing people can do is go to our website, which is dremio.com, and they can go to dremio.com/labs. From there they can launch a self-guided product tour. I think that's probably a very quick way to get an overview of the product, and who we are, what we do, what we offer. And then there's also a free trial that's actually on the AWS marketplace.
So if you want to actually try Dremio out and spin up an instance, you can get us on the marketplace. >> Do most of your customers do that — like doing a trial with a proof of concept, for example, to see, from an architecture perspective, how these technologies are synergistic? >> Absolutely, yeah. For probably every large enterprise — there are a number of ways that customers find us. Often customers may just try the trial on the marketplace, but customers may also reach out to our sales team, et cetera. It's very common for us to do a proof of concept, and that's not just architecture — it would cover performance requirements and things like that. So I think pretty much all of our very largest enterprise customers would go through some sort of a proof of concept, and that would be done with the support of our field teams. >> Excellent. Well, Robert, thanks for joining me today and sharing all about Dremio with our audience. We appreciate your time. >> Great. Thank you, Lisa. It was a pleasure. >> Likewise. For Robert Maybin, I'm Lisa Martin. Thanks for watching. (upbeat music)
Isha Sharma, Dremio | CUBE Conversation | March 2021
>> Well, welcome to this special CUBE Conversation. I'm John Furrier with theCUBE, your host. We're here with Isha Sharma, director of product management at Dremio. We're going to talk about data, data lakes, the future of data, and how it works with cloud and the new applications. Isha, thanks for joining me. >> Thank you for having me, John. >> You guys are a cutting-edge startup. You've got a lot of good action going on. You're kind of on the new guard, as Andy Jassy at AWS always talks about it — the old guard incumbents versus the new guard. You guys are the new breed; you're doing the new stuff around data lakes, and also making data accessible for customers. What is that all about? Take us through what Dremio is. >> So Dremio is the data lake service that essentially allows you to very simply run SQL queries directly on your data lake storage, without having to make any of those copies that everybody's going on about all the time. So you're really able to get that fast time to value, without having this long process of: let's put in a request to my data team, let's make all of those copies, and then finally get this very reduced scope of your data — and still have to go back to your data team every time you need a change to that. So Dremio is bringing you that fast time to value with that no-copy data strategy, and really providing you the flexibility to keep your data in your data lake storage as the single source of truth. >> You know, over the past 10 years we've watched, with CUBE coverage — since we've been doing this program and in the community, following from the early days of Hadoop to now — we've seen the trials and tribulations of ETL and data warehousing. We've seen the starts and stops, and we've seen that the most successful formula has been: store everything. And then the ease of use became a challenge. I don't want to have to hire really high-powered engineers to manage certain kinds of clusters. I've got cloud now coming into the mix, and on-premise storage. But the notion of a data lake became hugely popular, because it became a phrase that meant store everything — and it meant different things to different people. And since then, teams of people have been hired to be the data teams. So it's kind of new. So I've got to ask you: what is the challenge of these data teams? What do they look like? What's the psychology going on with some of the people on these teams? What problems are they solving? What's going on? Because, you know, they're becoming data-full. Take us through what's going on with data teams. >> To your point, the volume and variety of data is growing exponentially every day. There's really no end to it, right? And companies are looking to get their hands on as much data as they possibly can. So that means data teams are in a position of: how do I provide access to as many users as easily as possible — that self-service experience for data? And data democratization, as great a concept as it is in theory, comes with its own challenges, in terms of all of those copies that end up being created to provide the quote-unquote self-service experience. And then with all of these copies comes the cost to store all of them. You've just added a tremendous amount of complexity and delayed your time to value significantly. >> You mentioned self-service — that's one of those things that seems like a moving train.
Everyone I talk to is like, oh, self-service is the Holy Grail, we've got to get to self-service. And then you get to some self-service, and then you've got to rethink it because more stuff's changing. So I have to ask, in that capacity: you've got data architects and you've got analysts, the customers of the data. What's the relationship between those two? Who gives and who gets, and who drives it? Does the analyst feed the requirements to the architect, who sets up the boundaries? Can you take us through how you guys view the relationship between the data architect and the data analyst?

>>Sure. So you have the data architect, the data team, that's actually responsible for providing data access at the end of the day, right? They're the people that have the data democratization requirement on them. And so they've created these copies, a tremendous amount of copies. A lot of the time the data lake storage is that source of truth, but you're copying your data into a data warehouse. And then your end users, your analysts, all want different types of data; they want different views of this data. So there's a tremendous number of personalized copies that the architects end up creating. And then on top of it, there's performance. We need to get everything back in a timely manner, otherwise what's the point, right? Real-time analytics. So there are all these performance-related copies, whether that be aggregate tables or, you know, BI extracts and cubes, all of that fun stuff.

>>And so the architect is the one that's responsible for creating all of those. That's what they have to do to provide access to the analyst. And then, like I'm saying, when we need an update to a data set, or when I discover a new data set that I need to join with an existing one, the analyst goes to the data architect and says, hey, by the way, I need this new data set. Can you make this usable for me, or can you provide me access? And so then the data architect has to process that request. So again, coming back to all these copies that have been created: the data architect goes through a tremendous amount of work and has to do this over and over again to actually make the data available to the analyst. It's a cycle that goes on between the two.

>>Yeah, it's an interesting dynamic. It's a power dynamic, but it's also trying to get to the innovation. I've got to ask you: some people are saying that data copies are the major obstacle to democratization. How do you respond to that? What's your view?

>>They absolutely are. Data copies are the complete opposite of data democratization. There's no aspect of self-service there, which is exactly what you're looking to achieve with data democratization. Because of those copies: how do you manage them? How do you govern them? Like I was saying, when somebody needs a new data set, or an update to one, they have to go back to that data team, and there goes the self-service. Data copies actually create a bottleneck, because it all comes back to that data team, which has to keep working through the requests coming in from their analysts. So data copies and data democratization are completely at odds.

>>You know, I remember talking to Dave Vellante at a CUBE event two years ago. He said infrastructure as code was the big DevOps movement.
And we felt that data ops would be something similar, where data as code means you don't have to think about it. So you're kind of getting to this idea that copies are bad because they hold back that self-service. This modern era is looking for more programmability with data; kind of what you're teasing out here is that that's the modern architecture. Is that how you see it? How do you see a modern data architecture?

>>Yeah, so the data architecture has evolved significantly in the last several years, right? We started with traditional data warehouses and the traditional data lake with Hadoop, where storage and compute were tightly coupled. And then we moved on to cloud data warehouses, where there was a separation of compute and storage, and that provided a little more flexibility. But with the modern data architecture now, with cloud data lakes, you have this aspect of separating not only storage and compute, but also compute and data. So that creates a separate tier for data altogether. What does that look like? You have your data, your files, in storage: S3, ADLS, whatever it may be. And of course it's an open format, right? On top of that, thanks to technologies like Apache Iceberg and Delta Lake, there's this ability to give your files, your data, a table structure. And so that starts to bring to the data the capabilities a data warehouse was providing. Thanks to these, you have the ability to do transactions, record-level mutations, versioning: things that were missing completely from a data lake architecture before. And so introducing that data tier, having that separation of compute and data, really accelerates the ability to get that time to value, because you're keeping your data in the data lake storage at the end of the day.

>>And it's interesting, you see all the hot companies tend to have that kind of mindset and architecture, and it's creating new opportunities and a ton of white space. So I have to ask you: how does Dremio fit into this? Because you guys are playing in this new wave with data, and it's growing extremely fast. Again, the edge is developing more, data's coming in at the edge, and you've got hybrid and multi-cloud environments on the horizon. I mean, data in real time across multiple clouds is the next area people are focused on. What's the role of Dremio in all this? Take us through that.

>>Yeah. So Dremio provides, again, like I said, this data lake service, and we're not referring to just storage, or Hadoop. When we say data lake, we're talking about an entire solution. So you keep your data in your data lake storage, and then on top of that, with the integrations that Dremio has with Apache Iceberg and Delta Lake, we provide that data tier I was talking about. You've given your data this table structure, and now you can operate on it like you would in a data warehouse. So there's really no need to move your data from the data lake to a data warehouse; again, you keep that data lake as the source of truth.
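To make the table-format point concrete, here is a minimal sketch of the kind of record-level operations Sharma describes, using Spark with Apache Iceberg. The catalog name, bucket path, and schema are hypothetical, and the exact packages and configuration depend on the Spark and Iceberg versions in use:

```python
from pyspark.sql import SparkSession

# A Spark session wired to a hypothetical Iceberg catalog backed by object
# storage; the config keys follow the standard Iceberg-on-Spark setup.
spark = (
    SparkSession.builder.appName("lakehouse-sketch")
    .config("spark.sql.extensions",
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
    .config("spark.sql.catalog.lake", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.lake.type", "hadoop")
    .config("spark.sql.catalog.lake.warehouse", "s3a://example-bucket/warehouse")
    .getOrCreate()
)

# Parquet files in the lake now behave like a table with a schema...
spark.sql("""CREATE TABLE IF NOT EXISTS lake.sales
             (id BIGINT, region STRING, amount DOUBLE) USING iceberg""")
spark.sql("INSERT INTO lake.sales VALUES (1, 'west', 19.99)")

# ...including the transactions and record-level mutations a plain
# file-based data lake never supported.
spark.sql("UPDATE lake.sales SET amount = 24.99 WHERE id = 1")
spark.sql("DELETE FROM lake.sales WHERE region = 'test'")
```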
And then on top of that, when we talk about copies, personalized copies and performance-related copies: like I was saying, you've created so much complexity. With Dremio you don't do that. When it comes to personalized copies, we've got the semantic layer, and that's a very key aspect of Dremio, where you can provide as many views of the data as you want without having to make any copies. So it really accelerates that data democratization story.

>>So it's the no-copy data strategy, and Dremio, you guys are on it. Beyond no copies, you keep the semantic layer and have it be horizontal across whatever environment. Can applications tap into this? How do you guys integrate into apps, if I'm an app developer, for instance? How does that work?

>>Of course. So that's one of the most important use cases, in the sense that when an application, or even a BI client or some other tool, taps into the data in S3 or ADLS, a lot of people see performance degradation. With Dremio, that's not the case. We've got Arrow Flight integrated into Dremio; it's a key component as well. And that puts so much ease into running dashboards and running your analytics apps off of that, because Arrow Flight can deliver 20 times the performance that ODBC could (a rough sketch of the Flight client pattern appears at the end of this segment). So coming back to the no-copy data strategy: there are none of those local copies anymore that you needed to make.

>>So one of the things I've got to ask you, because this comes up all the time: at last re:Invent, I noticed Amazon was banging on this hard, and Azure as well on their side. Their whole thing is: we want to take the AI environment and make it so that normal people can use it and deploy machine learning. The same thing kind of comes down into this layer you're talking about. This democratization is a huge trend, because you don't have to be, you know, a math PhD, a data scientist, or an ETL data wrangler. You just want to actually work with the data, or play around with it any way you want. So the question I have is: that's certainly a great trend, and no one debates it, but the reality is people are storing data, almost hoarding it. Just throw it in a data lake and we'll deal with it later. How do you guys solve that problem? Because once that starts happening, do you have to hire someone super smart to dig it out, or re-architect it? That seems to be the pattern, right? Throw everything into the data lake and we'll deal with it later.

>>Called the data swamp. And it's like, no one knows what's going on.

>>Of course. Though you don't actually want to throw everything into a data lake. There still needs to be a certain amount of structure that all of this lands in. You want it to live in one place, but still have a little bit of structure, so that Dremio and others are much more able to query it with fantastic performance. So there's still some amount of structure that needs to happen at the data lake level. But with that semantic layer that we have with Dremio, you're creating structure for your end user.

>>How would you advise someone who wants to hedge their future and not take on too much technical debt, but says, hey, you know, I do have data stored?
Is there a best practice, some guardrails, around getting going? How do you advise your customers who want to get going?

>>So how we advise our customers is, again: put your data in that data lake. A lot of them already have S3 or ADLS in place. And getting started with Dremio is really easy. I did it for the first time, and it took a matter of minutes, if not less. What you're doing with Dremio is connecting directly to that data source and then creating a semantic layer on top. So you bring together a bunch of data that's sitting in your data lake, sales data, say, and we give you a really streamlined way to stitch together however far back in time you go, and create a view on top of all of that. If you have it structured in folders, great: we'll provide a way to create one view on top of all of it, as opposed to having a view for every day or whatnot. So again, that semantic layer really comes in handy when you're trying, as the architect, to provide access to this data lake. And for the user, who just interacts with the data as the views are provided to them, there's a whole lot of transparency there. It's really easy to get up and running with Dremio.

>>I'm looking forward to it. I've got to finally ask the question: how do I get started? How do people engage with you guys? Is it a freemium? Is it a cloud service? What are the requirements? What are some of the ways people can engage and work with you?

>>Yeah, so you get started on our website at dremio.com. And speaking of self-service, we've got a virtual lab at dremio.com/labs that you can get started with. It gives you a product tour and even a getting-started walkthrough that takes you through your first query, so you can see how well it works. And in addition to that, we've got a free trial of Dremio available on AWS Marketplace.

>>Awesome. The AWS Marketplace is a good place to download stuff. So, can I ask you a personal question, Isha? You're the director of product management. You get to see inside the kitchen where everyone's making the product, and you've also got the customer relationships, looking at product-market fit as it evolves and customer requirements evolve. What are some of the cool things you've seen in this space, things that you either expected or that maybe surprised you? What's the coolest thing you've seen come out of this new data environment we're living in?

>>I think it's just the way things have evolved, right? It used to be data lake or data warehouse, and you picked one. Or you probably had both, but you weren't taking either to its highest potential. Now you've got this coming together of both of them. I think it's been fantastic to see how technologies like Iceberg and Delta Lake are bringing those two things together. You're in your data lake, and it's great in terms of cost and storage and all of that, but now you're able to have so much flexibility in terms of some of those data warehouse capabilities. And on top of that, with technologies like Dremio, and just in general this open-format concept, you're never locked in with a particular vendor or a particular format. You're not locking yourself out of a technology that you don't even know exists yet. Whereas in the past, you were always going to end up there.
You always ended up putting your data in something where it was going to be difficult to change, to get it out. But now you have so much flexibility with the open architecture that's coming.

>>What's the DNA of the culture at Dremio? Obviously you've got a cutting edge, we're in a big, hot wave with data, and you're enabling a lot of value. What's it like there at Dremio? What do you guys strive for? What's the purpose? What's the DNA of the culture?

>>There's a lot of excitement about getting customers to this flexibility, getting them out of the things they're locked into, and providing them with accessibility to their data, right? It's about making this data access, data democratization concept actually happen, so that time to value is a key thing: you want to derive insights out of your data. And everybody at Dremio is super excited and charging toward that.

>>Unlocking that value. That's awesome. Isha, thank you for coming on this CUBE Conversation. Great to see you. Thanks for coming on, appreciate it. She's Isha Sharma, director of product management at Dremio, here inside theCUBE. I'm John Furrier, your host. Thanks for watching.
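A footnote on the Arrow Flight point Sharma raised: the general client pattern looks roughly like this with pyarrow. The endpoint, port, credentials, and view name are assumptions for illustration, and the exact handshake differs across Dremio versions, so treat this as a sketch rather than the product's documented API:

```python
from pyarrow import flight

# Hypothetical endpoint and credentials; 32010 is a commonly cited Arrow
# Flight port for Dremio, but check your own deployment.
client = flight.FlightClient("grpc+tcp://dremio-host:32010")
token = client.authenticate_basic_token("analyst", "analyst-password")
options = flight.FlightCallOptions(headers=[token])

# Query a semantic-layer view instead of a physical copy of the data.
# The view name here is made up for illustration.
descriptor = flight.FlightDescriptor.for_command(
    "SELECT region, SUM(amount) AS total FROM analytics.sales_view GROUP BY region")
info = client.get_flight_info(descriptor, options)

# Results stream back as Arrow record batches rather than row-at-a-time
# ODBC, which is where the performance claim comes from.
reader = client.do_get(info.endpoints[0].ticket, options)
print(reader.read_all().to_pandas())
```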
Jacques Nadeau, Dremio | Big Data SV 2018
>> Announcer: Live from San Jose, it's theCUBE, presenting Big Data Silicon Valley. Brought to you by SiliconANGLE Media and its ecosystem partners.

>>Welcome back to Big Data SV in San Jose. This is theCUBE, the leader in live tech coverage. My name is Dave Vellante, and this is day two of our wall-to-wall coverage. We've been here most of the week, and had a great event last night; about 50 or 60 of our CUBE community members were here. We had a breakfast this morning where the Wikibon research team laid out its big data forecast, the eighth big data forecast and report that we've put out, so check that out online. Jacques Nadeau is here. He is the CTO and co-founder of Dremio. Jacques, welcome to theCUBE, thanks for coming on.

>>Thanks for having me here.

>>So we were talking a little bit about what you guys do. Three-year-old company. Well, let me start: why did you co-found Dremio?

>>So, it was a very simple thing I saw. Over the last ten years or so, we saw a regression in the ability for people to get at data. You see all these really cool technologies that came out to store data: data lakes, you know, SQL systems, all these different things that make developers very agile with data. But what we were also seeing was a regression in the ability for analysts and data consumers to get at that data, because the systems weren't designed for analysts; they were designed for data producers and developers. And we said, you know what, there needs to be a way to solve this. We need to empower people to be self-sufficient again at the data consumption layer.

>>Okay, so you solved that problem how? You said you call it a self-service data platform.

>>Yeah, so, a self-service data platform, and the idea is pretty simple. It's that, no matter where the data is physically, people should be able to interact with a logical view of it. We talk about it a little like it's Google Docs for your data. So people can go into the system, see the different data sets that are available to them, collaborate around those, and create changes to those that they can then share with other people in the organization, always dealing with the logical layer. And then, behind the scenes, we have physical capabilities to interact with all the different systems we work with. But that's something that business users shouldn't have to think as much about. If you think about how people interact with data today, it's very much about copies. Every time you want to do something, typically you're going to make a copy. I want to reshape the data: I make a copy. I want to make it go faster: I make a copy. And those copies are very, very difficult for people to manage, and they end up mixing the business meaning of the data with the physical (I'm making copies to make them faster, or whatever). And so our perspective is that if you can separate the physical concerns from the logical, then business users have a much better chance of being able to do something self-service.

>>So you're essentially virtualizing my corpus of data, independent of location. Is that right? I mean--

>>It's part of what we do, yeah. So, the way we look at it, there are several different components to making something self-service. It starts with, yeah, virtualizing or abstracting away the details of the physical, right?
But then, on top of that, you expose a very user-friendly interface that allows people to catalog and understand the different things: search for the things they want to interact with, and then curate things, even if they're non-technical users, right? The goal is that, if you talk to even the large internet companies in the Valley, it's very hard to hire the amount of data engineering you'd need to satisfy all the requests of your end users of data. And so the goal of Dremio is basically to provide different tools that offer a non-technical experience for getting at the data. So that's the start of it. But then the second step is, once people can collaborate and deal with the data, you've got these huge volumes of data, right? It's big data, so how do you make that go faster? And then we have some components that deal with speed and acceleration.

>>So maybe talk about how people are leveraging this capability, this platform. What's the business impact? What have you seen there?

>>So a lot of people have this problem, which is: they have data all over the place and they're trying to figure out "How do I expose this to my end users?" And those end users might be analysts, they might be data scientists, they might be product managers trying to figure out how their product is working. What they're doing today is typically trying to build systems internally to provide these capabilities. So, for example, we're working with a large auto manufacturer. They've got a big initiative where they're trying to make the huge amounts of data they have, across all sorts of different parts of the organization, available to different data consumers. Now, of course, there are a bunch of security concerns you need to have around that, but they just want to make the data more accessible. And so they're using Dremio to catalog all the data below, expose it to the different users, apply lots of different security rules around it, and then create a bunch of reflections, which make things go faster as people interact with them (there's a rough sketch of the reflection idea at the end of this segment).

>>Well, what about the governance factor? I mean, you heard this in the Hadoop world years ago: "We're going to harden Hadoop," and really, there was no governance, and it became more and more important. How do you guys handle that? Do you partner with people? Is it up to the customer to figure that out? Do you provide that?

>>It's several different things, right? It's a complex ecosystem, so it's a combination of things. You start by partnering with different systems to make sure you integrate well with them: the different things that control credentials inside those systems, all the way down to "What are the file system permissions?" and "What are the permissions inside of something like Hive and the metastore there?" And then other systems on top of that, like Sentry or Ranger, are also exposing different credentialing, right? And so we work hard to integrate with those things. On top of that, Dremio also provides a full security model inside of the virtual space in which we work.
And so people can control the permissions, the ability to access or edit any object inside of Dremio, based on user roles and LDAP and those kinds of things. So it's multiple layers that have to be working together.

>>And tell me more about the company. So, founded three years ago, I think a couple of raises.

>>Yep.

>>Who's backing you?

>>Yeah, so we founded just under three years ago. We had great initial investors in Redpoint and Lightspeed, two great initial investors, and we raised about 15 million on that round. And then we actually just closed a B round in January of this year, and we added Norwest to the portfolio there.

>>Awesome, so you're now in the mode of... I mean, they always say software is such a capital-efficient business, but you see software companies raising 900 million dollars, and presumably that's to compete: to go to market and differentiate with your messaging and branding. Is that the phase you're in now? You've developed a product, it's technically sound, it's proven in the marketplace, and now you're scaling the go-to-market. Is that right?

>>That's exactly right. So we've had a lot of early successes, a lot of Fortune 100 companies using Dremio today. For example, we're working with TransUnion. We're working with Intel. We have a great relationship with OVH, which is the third-largest hosting company in the world. Daimler is another one. So we're working with a lot of great companies, seeing great early success with the product, and really looking to say, "Hey, we're out here." We've got a booth for the first time at Strata here, and we're letting people know about a better way, an easier way, for people to deal with data.

>>Yeah.

>>A happier way.

>>I mean, it's a crowded space, right? There are a lot of tools out there, a lot of companies. I'm interested in how you differentiate. Obviously simplification is part of that, and the breadth of your capabilities. But maybe, in your words, you could share with me how you differentiate from the competition and how you break out from the noise.

>>Yeah, you're absolutely right, it's a very crowded space. Everybody's using the same words, and that makes it very hard for people to understand what's going on. What we've found is very simple: typically, in the first meeting with a customer, within the first 10 minutes we'll demo the product. Because so many technologies are technologies, not products, and so you have to figure out how to use them, how you'd customize them for your particular use case. What we've found with our product is that by making it very, very simple, the light goes on for people in a very short amount of time. We also do things on our website so that you can see, in a couple of minutes or even less, little animations that give you a sense of what it's about. But really, it's just "Hey, this is a product": there's this light bulb that goes on, and it's great. And you figure this out over the course of working with different customers, right?
But there's this light bulb that goes on for people who are so confused by all the things going on, and if we can just sit down with them and show them the product for a few minutes, all of a sudden they're like, "Wait a minute, I can use this," right? You're frequently talking to buyers who are not the most technical parts of the organization initially, and most of the technologies they look at are very difficult to understand; they have to look to others to even understand how those would fit into their architecture. With Dremio, we have customers that have installed it and, within an hour or two, started to see real value. And that excitement happens even in the demo with most people.

>>So you kind of have this bifurcated market. Since the big data meme, everybody says they're data-driven. And you've got a bifurcated market in that you've got the companies that are data-driven, and you've got companies who say they're data-driven but really aren't. Who are your customers? Are they in both? Are they predominantly on the data-driven side, or predominantly trying to be data-driven?

>>Well, I would say that they all would say that they're data-driven.

>>Yeah, everyone. Who's going to say, "Well, we're not data-driven"?

>>Yeah, yeah. So I would say--

>>We're dead.

>>I would say that everybody has data, and they've got some places where they're using it well and other places where they feel like they're not using it as well as they should. And, I mean, the reason we exist is to make it easier for people to get value out of data. If they were getting all the value they think they could get out of data, then we probably wouldn't exist and they would be fully data-driven. So I think it's a journey, and people are responding well to us in part because we're helping them down that journey.

>>Well, the reason I asked that question is that we go to a lot of shows, and everybody likes to throw out the digital transformation buzzword and then use Uber and Airbnb as examples. But if you dig deeper, you see that data is at the core of those companies, and they're now beginning to apply machine intelligence, leveraging all this data, this data architecture, that they've built up over the last five or 10 years. And then you've got this set of companies where all the data lives in silos, and I can see you guys being able to help them. At the same time, I can see you helping the disruptors. So how do you see that, in terms of your role in affecting either digital transformations or digital disruptions?

>>Well, I'd say that in either case, we believe in a very simple thing, going back to what I said at the beginning: I see this regression in terms of data access, right? What happens is that if you have a tightly coupled system between two layers, it becomes very difficult to accommodate two different sets of needs. The change over the last 10 years was the rise of the developer as the primary person controlling data, and that brought a huge amount of great things with it, but analysis was not one of them. There are tools that try to make that better, but that's really the problem. And so our belief is very simple: a new tier needs to be introduced between the consumers and the producers of data.
And so that tier may interact with different systems, and it may be more or less complex for certain organizations, but the tier is necessary in all organizations, because the analysts shouldn't be shaken around every time the developers change how they're doing data.

>>Great. John Furrier has a saying that "data is the new development kit," you know. He said that, I don't know, eight years ago, and it's really turned out to be the case. Jacques Nadeau, thanks very much for coming on theCUBE. Really appreciate your time.

>>Yeah.

>>Great to meet you. Good luck, and keep us informed, please.

>>Yes, thanks so much for your time, I've enjoyed it.

>>You're welcome. Alright, thanks for watching everybody. This is theCUBE. We're live from Big Data SV. We'll be right back. (bright music)
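One note on the "reflections" Nadeau mentioned: conceptually they behave like materialized views that the query planner substitutes transparently. Here is a vendor-neutral sketch of that acceleration idea, using sqlite3 purely as a stand-in engine with made-up table names, and doing by hand the substitution Dremio's planner performs automatically:

```python
import sqlite3

# Stand-in engine purely for illustration; the schema is hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (region TEXT, amount REAL);
INSERT INTO sales VALUES ('west', 19.99), ('east', 5.00), ('west', 7.50);

-- The 'reflection': a precomputed aggregate maintained alongside the raw data.
CREATE TABLE sales_by_region AS
SELECT region, SUM(amount) AS total FROM sales GROUP BY region;
""")

# A BI query that would otherwise scan the raw table is answered from the
# precomputed aggregate. A planner would make this swap transparently;
# here we point at the aggregate by hand.
for region, total in conn.execute(
        "SELECT region, total FROM sales_by_region ORDER BY region"):
    print(region, total)
```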
Data Power Panel V3
(upbeat music) >> The stampede to cloud and massive VC investments have led to the emergence of a new generation of object-store-based data lakes, and with them two important trends. Actually, three important trends. First, a new category that combines data lakes and data warehouses, aka the lakehouse, has emerged as a leading contender to be the data platform of the future. This newcomer touts the ability to address data engineering, data science, and data warehouse workloads on a single shared data platform. The second major trend we've seen is that query engines and broader data fabric virtualization platforms have embraced next-gen data lakes as platforms for SQL-centric business intelligence workloads, reducing, or some even claim eliminating, the need for separate data warehouses. Pretty bold. However, cloud data warehouses have added complementary technologies to bridge the gaps with lakehouses. And the third is that many, if not most, customers embracing the so-called data fabric or data mesh architectures are looking at data lakes as a fundamental component of their strategies, and they're trying to evolve them to be more capable, hence the interest in lakehouse. But at the same time, they don't want to, or can't, abandon their data warehouse estate. As such, we see a battle royale brewing between cloud data warehouses and cloud lakehouses. Is it possible to do it all with one central cloud analytical data platform? Well, we're going to find out. My name is Dave Vellante, and welcome to the data platforms power panel on theCUBE, our next episode in a series where we gather some of the industry's top analysts to talk about one of our favorite topics: data. In today's session, we'll discuss trends, emerging options, and the trade-offs of various approaches, and we'll name names. Joining us today are Sanjeev Mohan, principal at SanjMo; Tony Baer, principal at dbInsight; and Doug Henschen, vice president and principal analyst at Constellation Research. Guys, welcome back to theCUBE. Great to see you again.

>>Thanks, Dave. Thank you.

>>Thank you.

>>So it's early June, and we're gearing up for two major conferences. There are several database conferences, but two in particular that we're very interested in: Snowflake Summit and Databricks' Data and AI Summit. Doug, let's start off with you, and then Tony and Sanjeev, if you could kindly weigh in. Where did this all start, Doug, the notion of lakehouse? And let's talk about what exactly we mean by lakehouse. Go ahead.

>>Yeah, well, you nailed it in your intro. One platform to address BI, data science, and data engineering: fewer platforms, less cost, less complexity. Very compelling. You can credit Databricks with coining the term lakehouse back in 2020, but it's really a much older idea. You can go back to Cloudera introducing its Impala database in 2012; that was a database on top of Hadoop. And indeed, by the middle of that last decade there were several SQL-on-Hadoop products and open standards like Apache Drill. At the same time, the database vendors were trying to respond to this interest in machine learning and data science, so the likes of Vertica were adding SQL extensions to support data science. But then later in that decade, with the shift to cloud and object storage, you saw the vendors shift to this whole cloud and object storage idea. So in the database camp you have Snowflake introducing Snowpark to try to address the data science needs.
They introduced that in 2020, and last year they announced support for Python. You also had Oracle and SAP jump on this lakehouse idea last year, supporting both the lake and the warehouse from a single vendor, though not necessarily quite a single platform. Google very recently also jumped on the bandwagon. And then you have the SQL engine camp, the Dremios, the Ahanas, the Starbursts, really doing two things: a fabric for distributed access to many data sources, but also very firmly planting the idea that you can have just the lake and we'll help you do the BI workloads on it. And then of course the data lake camp, with the Databricks and Clouderas providing warehouse-style deployments on top of their lake platforms.

>>Okay, thanks, Doug. I'd be remiss: those of you who know me know that I typically write my own intros. This time my colleagues fed me a lot of that material. So thank you; you guys make it easy. But Tony, give us your thoughts on this intro.

>>Right. Well, I very much agree with both of you, which may not make for the most exciting television, but it has been an evolution, just like Doug said. I mean, for instance, just to give an example: when Teradata bought Aster Data, it was initially seen as a hardware platform play. In the end, it was all those Aster functions that made a lot of big data analytics accessible to SQL. (clears throat) And so what I really see, in a simpler, functional definition, is that the data lakehouse is really an attempt by the data lake folks to make the data lake friendlier territory for the SQL folks, and also friendlier territory for all the data stewards who are concerned about the sprawl and the lack of control and governance in the data lake. So it's really a continuation of an ongoing trend. That being said, there's no action without counteraction, and of course at the other end of the spectrum we also see a lot of the data warehouses starting to add things like in-database machine learning. So they're certainly not surrendering without a fight. Again, as Doug was mentioning, this has been part of a continual blending of platforms that we've seen over the years: we first saw it in the Hadoop years with SQL on Hadoop and data warehouses starting to reach out to cloud storage, or I should say HDFS, and then, with the cloud, going cloud native and trying to break the silos down even further.

>>Now, thank you. And Sanjeev: data lakes, when we first heard about them, it was such a compelling name, and then we realized all the problems associated with them. So pick it up from there. What would you add to Doug and Tony?

>>I would say these are excellent points that Doug and Tony have brought to light. The concept of the lakehouse, to your point, Dave, was going on a long time ago, long before the term was invented. For example, Uber was trying to do a mix of Hadoop and Vertica, because what they really needed were transactional capabilities that Hadoop did not have. So they weren't calling it the lakehouse; they were using multiple technologies. But now they're able to collapse it into a single data store that we call the lakehouse. Data lakes are excellent at batch processing large volumes of data, but they don't have real-time capabilities such as change data capture, doing inserts and updates. So this is why the lakehouse has become so important: it gives us these transactional capabilities.

>>Great. So I'm interested. The name is great, lakehouse.
The concept is powerful, but I get concerned that there's a lot of marketing hype behind it. So I want to examine that a bit deeper. How mature is the concept of lakehouse? Are there practical examples that really exist in the real world that are driving business results for practitioners? Tony, maybe you could kick that off.

>>Well, put it this way. I think what's interesting is that both data lakes and data warehouses have each had to extend themselves. To believe the Databricks hype, this was just a natural extension of the data lake. In point of fact, Databricks had to go outside its core technology of Spark to make the lakehouse possible. And it's a very similar type of thing on the part of the data warehouse folks, in that they've had to go beyond SQL. In the case of Databricks, there have been a number of incremental improvements to Delta Lake to make the table format more performant, for instance. But the most dramatic change of all is in their SQL engine: they had to essentially abandon Spark SQL, because in and of itself Spark SQL is essentially a stopgap solution, and if they wanted to really address that crowd, they had to totally reinvent SQL, or at least their SQL engine. And so Databricks SQL is not Spark SQL, it is not Spark; it's basically SQL adapted to run in a Spark environment, but the underlying engine is C++. It's not Scala or anything like that. So Databricks had to take a major detour outside of its core platform to do this. So to answer your question, this is not mature. Even though the idea of blending platforms has been going on for well over a decade, I would say the current iteration is still fairly immature. And in the cloud, I could see a further evolution of this, because if you think through cloud native architecture, where you're essentially abstracting compute from data, there's no reason why, if you're dealing with the same data target, say cloud object storage, you might not apportion tasks to different compute engines. And so, for instance, let's say you're Google: you could have BigQuery perform the SQL analytics that would be associated with the data warehouse, and you could have BigQuery ML do some in-database machine learning. But at the same time, for another part of the query, which might involve, say, some deep learning, you might go out to the serverless Spark service, or Dataproc. And there's no reason why Google could not blend all of those into a coherent offering that's triggered through microservices. I just gave Google as an example; you could generalize that to all the other cloud and third-party vendors. So I think we're still very early in the game in terms of the maturity of data lakehouses.

>>Thanks, Tony. So Sanjeev, is this all hype? What are your thoughts?

>>It's not hype, but I completely agree: it's not mature yet. Lakehouses still have a lot of work to do. What I'm now starting to see is that the world is dividing into two camps. On one hand, there are people who don't want to deal with the operational aspects of vast amounts of data.
They are the ones going for BigQuery, Redshift, Snowflake, Synapse, and so on, because they want the platform to handle all the data modeling, access control, and performance enhancements. But these are trade-offs: if you go with these platforms, then you're giving up on vendor neutrality. On the other side are those who have engineering skills. They want independence; in other words, they don't want vendor lock-in. They want to transform their data for any number of use cases, especially data science and machine learning use cases. What they want is agility via open file formats, using any compute engine. So why do I say lakehouses are not mature? Well, cloud data warehouses provide you an excellent user experience; that's the main reason why Snowflake took off. If you have thousands of tables, it takes minutes to get them uploaded into your warehouse and start experimenting. Table formats resonate far more with the community than file formats. But once the cost of the cloud data warehouse goes up, organizations start exploring lakehouses. The problem is that lakehouses still need to do a lot of work on metadata. Apache Hive was a fantastic first attempt at it; even today Apache Hive is still very strong, but it's all technical metadata and it has so many restrictions. That's why we see Databricks investing in something called Unity Catalog; hopefully we'll hear more about Unity Catalog at the end of the month. But there's a second problem I just want to mention, and that is the lack of standards. All these open source vendors are running what I call ego projects. You see them on LinkedIn, constantly battling with each other, but the end user doesn't care. The end user wants a problem to be solved. They want to use Trino, Dremio, Spark from EMR, Databricks, Ahana, Dask, Flink, Athena. But the problem is that we don't have common standards.

>>Right. Thanks. So Doug, I worry sometimes. I mean, I look at the space; we've debated for years best of breed versus the full suite. You see AWS with whatever, 12-plus different data stores and different APIs and primitives. You've got Oracle putting everything into its database; it's actually done some interesting things with MySQL HeatWave, so maybe there are proof points there. But Snowflake is really good at the data warehouse, at simplifying the data warehouse, and Databricks is really good at making lakehouses actually more functional. Can one platform do it all?

>>Well, in a word, no: you can't be best of breed at all things. I think, in the upshot of that cogent analysis from Sanjeev, the vendors coming out of the database tradition excel at SQL. They're extending it into data science, but when it comes to unstructured data, data science, ML and AI, it's often a compromise. The data lake crowd, the Databricks and such, have struggled to completely displace the data warehouse when it really gets to the tough SLAs; they acknowledge there's still a role for the warehouse. Maybe you can size down the warehouse and offload some of the BI workloads, and maybe use some of these SQL engines: good for ad hoc, minimizing data movement. But really, when you get to the deep service-level requirements, the high concurrency, the high query workloads, you end up creating something that's warehouse-like.

>>Where do you guys think this market is headed? What's going to take hold? Which projects are going to fade away? You've got some things in Apache projects, like Hudi and Iceberg. Where do they fit, Sanjeev?
Do you have any thoughts on that?

>>So thank you, Dave. I feel that table formats are starting to mature; there is a lot of work being done. We will not have a single product or single platform; we'll have a mixture. I see a lot of Apache Iceberg in the news. Apache Iceberg is really innovating, and their focus is on the table format. But then Delta and Apache Hudi are doing a lot of deep engineering work. For example, how do you handle high concurrency when there are multiple writes going on? Do you version your Parquet files, and how do you do your upserts, basically? So there are different focuses. At the end of the day, the end user will decide what the right platform is, but we are going to have multiple formats living with us for a long time.

>>Doug, is Iceberg, in your view, something that's going to address some of those gaps in standards that Sanjeev was talking about earlier?

>>Yeah. Delta Lake, Hudi, Iceberg, they all address this need for consistency and scalability. Delta Lake is open technically, but in practice I don't hear about Delta Lake anywhere but Databricks, while I'm hearing a lot of buzz about Apache Iceberg. End users want an open performance standard. And most recently Google embraced Iceberg for BigLake, its stab at supporting both lakes and warehouses on one conjoined platform.

>>And Tony, of course, you remember the early days of the big data movement: MapR was the most closed, Hortonworks the most open, and Cloudera in between. There was always this kind of contest as to who's the most open. Does that matter? Are we going to see a repeat of that here?

>>I think it's spheres of influence, and Doug very much was referring to this. I would call it the MongoDB syndrome: you have, and I'm talking about MongoDB before they changed their license, an open source project, but one very much associated with MongoDB, which pretty much controlled most of the contributions and made the decisions. And I think Databricks has the same ironclad hold on Delta Lake, and the market still pretty much associates Delta Lake with Databricks' open source project. I mean, Iceberg is probably further advanced than Hudi in terms of mindshare. And so what I see that breaking down to is essentially the Databricks open source versus the everything-else open source, the community open source. So I see a very similar type of breakdown repeating itself here.

>>So by the way, Mongo has a conference next week; another data platform, so not totally relevant to this discussion. But in a sense it is, because there's been a lot of discussion on earnings calls these last couple of weeks about consumption and who's exposed. Obviously people are concerned about Snowflake's consumption model; Mongo is maybe less exposed because Atlas is prominent in the portfolio, blah, blah, blah. But I wanted to bring up the little bit of controversy that came out of the Snowflake earnings call, where the Evercore analyst asked Frank Slootman about discretionary spend. And Frank basically said, look, we're not discretionary; we are deeply operationalized. Whereas he kind of poo-pooed the lakehouse, or the data lake, et cetera, saying, oh yeah, data scientists will pull files out and play with them, that's really not our business. Do any of you have comments on that? Help us swing through that controversy. Who wants to take that one?

>>Let's put it this way.
The SQL folks are from Venus and the data scientists are from Mars, so it really comes down to that type of perception. The fact is that traditionally, analytics was very SQL-oriented, and the quants were kind of off in their corner, using SAS or using Teradata. It's really a great leveler today: Python has become arguably one of the most popular programming languages, depending on which month you're looking at the TIOBE index, and of course, as I tell the MongoDB folks, SQL is not going away. You have a large skills base out there. So basically I see this breaking down to each group having its own natural preferences on its home turf. And the fact that, say, the Python and Scala folks are using Databricks does not make them any less operational or mission-critical than the SQL folks.

>>Anybody else want to chime in on that one?

>>Yeah, I totally agree with that. Python support in Snowflake is very nascent, with all of Snowpark; for all the things outside of SQL, they're very much relying on partners, too, to make things possible and make data science possible. And it's very early days. I think the bottom line is that each of these camps is going to keep working on doing better at the things they don't do today, or are new to, but they're not going to nail it; they're not going to be best of breed on both sides. So the SQL-centric companies and shops are going to do more data science on their database-centric platforms. The data-science-driven companies might be doing more BI on their lakes with those vendors. And the companies that have highly distributed data are going to add fabrics, and maybe offload more of their BI onto those engines, like Dremio and Starburst.

>>So I've asked you this before, but I'll ask you, Sanjeev.
>> So I want to ask you about the stack, the modern data stack, if you will. We always talk about injecting machine intelligence, AI, into applications, making them more data driven. But when you look at the application development stack, it's separate; the database tends to be separate from the data and analytics stack. Do those two worlds have to come together in the modern data world? And what does that look like organizationally? >> Organizationally, and even technically, I think it is starting to happen. Microservices architecture was a first attempt to bring the application and data worlds together, but they are fundamentally different things. For example, if an application crashes, that's horrible, but Kubernetes will self-heal and bring the application back up. If a database crashes and corrupts your data, we have a huge problem. That's why they have traditionally been two different stacks. They are starting to come together, though, especially with DataOps, for instance in how we version the business logic we write. It used to be that business logic was highly embedded into our database of choice, but now we are disaggregating that using GitHub, CI/CD, the whole DevOps tool chain. So data is catching up to the way applications are built. >> We also have translytical databases; that's a little bit of what the story is with MongoDB next week, with adding more analytical capabilities. But I think companies that talk about that are always careful to couch it as operational analytics, not warehouse level workloads. So we're making progress, but I think there will long be a separate analytical data platform. >> Until data mesh takes over. (all laughing) Not opening a can of worms. >> Well, but wait, I know it's out of scope here, but wouldn't data mesh say, hey, take your best of breed, to Doug's earlier point? You can't be best of breed at everything. Wouldn't data mesh advocate: data lakes, do your data lake thing; data warehouse, do your data warehouse thing; then you're just a node on the mesh. (Tony laughs) Now you need separate data stores and you need separate teams. >> To my point. >> I think, I mean, put it this way. (laughs) Data mesh itself is a logical view of the world; the data mesh is not necessarily on the lake or on the warehouse. For me, the fear there is more in terms of the silos of governance that could happen, and the siloed views of the world, and how we redefine them. And that's why I want to go back to something Sanjeev said, which is that it's going to raise the importance of the semantic layer. Now, that opens a couple of Pandora's boxes for Snowflake: one, does Snowflake dare go into that space, or do they risk alienating their partner ecosystem, which is a key part of their whole appeal, which is best of breed? They're in kind of the same situation Informatica was in the early 2000s, when Informatica briefly flirted with analytic applications, realized that was not a good idea, and needed to double down on their core, which was data integration. The other thing, though, that raises the importance of, and this is where best of breed comes in, is the data fabric. My contention is that whether you employ data mesh practices or not, if you do employ data mesh, you need data fabric; if you deploy data fabric, you don't necessarily need to practice data mesh.
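Picking up Sanjeev's DataOps point from a moment ago, business logic disaggregated from the database looks roughly like this: a transformation function versioned in Git with a unit test that CI runs on every change. The function and test are hypothetical stand-ins for logic that once lived in a stored procedure.

```python
# transformations/clean_emails.py -- hypothetical business logic, versioned
# in Git and exercised by CI/CD rather than embedded in a database.
import re
from typing import Optional

EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

def normalize_email(raw: str) -> Optional[str]:
    """Lowercase, trim, and validate an email; None means quarantine the row."""
    candidate = raw.strip().lower()
    return candidate if EMAIL_RE.match(candidate) else None

# tests/test_clean_emails.py -- runs on every pull request.
def test_normalize_email():
    assert normalize_email("  Alice@Example.COM ") == "alice@example.com"
    assert normalize_email("not-an-email") is None
```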
Data fabric at its core, and admittedly it's a category that's still very poorly defined and evolving, is a common metadata backplane, something like what we used to talk about with master data management, except more mutable, more evolving, using, let's say, machine learning so that we don't have to predefine rules or predefine what the world looks like. So I think in the long run what this really means is that whichever way we implement, on whichever physical platform, we all need to be speaking the same metadata language. At the end of the day, regardless of whether it's a lake, a warehouse, or a lakehouse, we need common metadata. >> Doug, can I come back to something you pointed out? For those talking about bringing analytic and transaction databases together, you had talked about operationalizing those and the caution there. Educate me on MySQL HeatWave. I was surprised when Oracle put so much effort into it, and you may or may not be familiar with it, but a lot of folks have talked about it. Now, it has almost no presence in the market, no market share, but we've seen these benchmarks from Oracle. How real is that bringing together of the two worlds and eliminating ETL? >> Yeah, I have to defer on that one. That's my colleague Holger Mueller; he wrote the report on that. He's way deep on it, and I'm not going to try to mimic him. >> I wonder how real it is, or whether it's just Oracle marketing. Anybody have any thoughts on that? >> I'm pretty familiar with HeatWave. It's essentially Oracle doing, well, there's kind of a parallel with what Google's doing with AlloyDB: an operational database that will have some embedded analytics. It's also something I expect to start seeing from MongoDB. Doug and Sanjeev were referring to this before, the operational analytics that are embedded within an operational database. The idea is that the last thing you want to do with an operational database is slow it down, so you're not going to be doing very complex deep learning or anything like that, but you might be doing things like classification, and you might be doing some predictions. In other words: we've just concluded a transaction with this customer, but was it less than what we were expecting? What does that mean in terms of, is this customer likely to churn? I think we're going to be seeing a lot of that, and I think that's a lot of what MySQL HeatWave is all about. Whether Oracle has any presence in the market, well, it's still a pretty new announcement, but the other thing Oracle has to battle against (laughs) is that even though they own MySQL and run the open source project, the actual commercial implementations are associated with everybody else, and the popular perception has been that MySQL has basically been a sidelight for Oracle. So it's on Oracle's shoulders to prove they're damn serious about it.
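The embedded operational analytics Tony describes, lightweight classification in the transaction path rather than deep learning, can be pictured with a sketch like the one below. The model weights and features are made up; the point is that scoring at this level is cheap enough to run inline without slowing the operational database down.

```python
import math

# Hypothetical pre-trained logistic-regression weights for churn risk.
WEIGHTS = {"order_value_vs_expected": -1.8, "days_since_last_order": 0.04}
BIAS = -0.5

def churn_risk(order_value: float, expected_value: float, days_idle: int) -> float:
    """Score a just-completed transaction; a handful of multiplies and adds."""
    features = {
        "order_value_vs_expected": order_value / max(expected_value, 1e-9),
        "days_since_last_order": float(days_idle),
    }
    z = BIAS + sum(WEIGHTS[k] * v for k, v in features.items())
    return 1.0 / (1.0 + math.exp(-z))  # probability the customer churns

# "We just concluded a transaction, but it was less than we expected":
# flag the customer for retention instead of kicking off deep analytics.
if churn_risk(order_value=12.0, expected_value=80.0, days_idle=45) > 0.7:
    print("route to retention campaign")
```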
>> There's no coincidence that MariaDB was launched the day that Oracle acquired Sun. Sanjeev, I wonder if we could come back to a topic we discussed earlier, which is this notion of consumption. Obviously Wall Street's very concerned about it, and Snowflake dropped prices last week. I've always felt like, hey, the consumption model is the right model; I can dial it down when I need to. Of course, the street freaks out. What are your thoughts on pricing and the consumption model? What's the right model for companies, for customers? >> The consumption model is here to stay. What I would like to see, and I think it's the ideal situation, and it actually plays into the lakehouse concept, is that I have my data in some open format, maybe Parquet or CSV or JSON or Avro, and I can bring whatever engine is best for my workloads, bring it on, pay for consumption, and then shut it down. And by the way, that could be Cloudera; we don't talk about Cloudera very much. It could be that one business unit wants to use Athena, and another business unit wants to use something else, Trino let's say, or Dremio. So every business unit is working on the same data set, and that's critical, but that data set stays, maybe, in their VPC, and they bring any compute engine, pay for the use, and shut it down. Then you're getting value and you're only paying for consumption. It's not like, oops, I left a cluster running by mistake, so there have to be guardrails. The reason FinOps is so big is that it's very easy for me to run a Cartesian join in the cloud and get a $10,000 bill.
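Guardrails of the kind Sanjeev calls for can be made concrete. As one hedged example, Athena workgroups support a per-query scanned-bytes cutoff via boto3, which caps exactly the runaway Cartesian join scenario; the workgroup name and limit below are hypothetical, and other engines offer their own equivalents.

```python
import boto3

athena = boto3.client("athena", region_name="us-east-1")

# Create a workgroup whose queries are cancelled once they scan ~1 TB.
# BytesScannedCutoffPerQuery is the guardrail: a runaway Cartesian join
# gets killed instead of quietly becoming a $10,000 bill.
athena.create_work_group(
    Name="analytics-guardrailed",  # hypothetical workgroup name
    Configuration={
        "ResultConfiguration": {"OutputLocation": "s3://my-athena-results/"},
        "EnforceWorkGroupConfiguration": True,
        "BytesScannedCutoffPerQuery": 1_000_000_000_000,  # ~1 TB
    },
)
```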
>> This market looks like it's been sort of a victim of its own success in some ways. They made it so easy to spin up single node instances and multi node instances, whereas back in the day, when compute was scarce and costly, those database engines optimized every last bit so they could get as much workload as possible out of every instance. Today it's really easy to spin up a new node or a new multi node cluster, so that freedom has meant many more nodes that aren't necessarily getting full utilization. Snowflake has been doing a lot to add reporting, monitoring, and dashboards around the utilization of all the nodes and multi node instances that get spun up. Meanwhile, we're seeing some of the traditional on-prem databases that are moving into the cloud trying to offer that same freedom, and I think they're going to make the same discovery: the cost surprises will follow as they make it easy to spin up new instances. >> Yeah, a lot of money went into this market over the last decade, separating compute from storage, moving to the cloud. I'm glad you mentioned Cloudera, Sanjeev, 'cause they got it all started, the kind of big data movement. We don't talk about them that much. Sometimes I wonder if it's because when they merged Hortonworks and Cloudera, they dead-ended both platforms, but then they did invest in a more modern platform. What's the future of Cloudera? What are you seeing out there? >> Cloudera has a good product. I have to say the problem in our space is that there are way too many companies and way too much noise. We are expecting the end users to parse it out, or we're expecting analyst firms to boil it down, so I think marketing becomes a big problem. As far as technology is concerned, I think Cloudera did turn themselves around, and Tony, I know you talk to them quite frequently, they have had quite a comprehensive offering for a long time actually. They created Kudu, so they've got operational workloads covered, they have Hadoop, they have a data warehouse, they've migrated to the cloud, and they're in hybrid multi-cloud environments. A lot of cloud data warehouses are not hybrid; they're only in the cloud. >> Right. I think where Cloudera has been most successful is in the transition to the cloud and in the fact that they're giving their customers more on-ramps to it, more hybrid on-ramps, so I give them a lot of credit there. They have also been trying to position themselves as the most price friendly, in terms of putting more guardrails and governors on consumption. Part of that could be spin, but on the other hand, they don't have the same vested interest in compute cycles as, say, AWS would have with EMR. That being said, and I don't want to cast them as a legacy system, the fact is they do have a huge landed legacy on-prem and still significant potential to land and expand that to the cloud. Even though Cloudera is multifunction, it certainly has its strengths and weaknesses. Yes, Cloudera has an operational database, or an operational data store, as kind of an outgrowth of HBase, but Cloudera is still primarily known for deep analytics. Nobody's going to buy Cloudera, or Cloudera Data Platform, strictly for the operational database. They may use it as an add-on, in the same way a lot of customers have used, let's say, Teradata to do some machine learning, or Snowflake to parse through JSON. Again, it's not an indictment or anything like that, but obviously they have their strengths and their weaknesses. I think their greatest opportunity is with their existing base, because that base has a lot invested and vested, and the fact is they do have a hybrid path that a lot of the others lack. >> And of course, being on the quarterly shot clock was not a good place for Cloudera to be under the microscope, and now they at least can refactor the business accordingly. I'm glad you mentioned hybrid too. We saw Snowflake last month do a deal with Dell whereby non-native Snowflake data can access on-prem object storage from Dell, and they announced a similar thing with Pure Storage. What do you guys make of that? How significant will it be? Will customers actually do it? I think they're using either materialized views or external tables. >> There are data sovereignty and residency requirements, and there are desires to have these platforms in your own data center, and finally they capitulated. I mean, Frank Slootman is famous for being very focused, and earlier, not many months ago, they called going on-prem a distraction. But clearly there's enough demand, certainly government contracts, any company that has data residency requirements; it's a real need, so they finally addressed it. >> Yeah, I'll bet dollars to donuts there was an EBC session where some big customer said, if you don't do this, we ain't doing business with you, and that was like, okay, we'll do it. >> So Dave, I have to say, earlier on you brought up this point about how Frank Slootman was poo-pooing data science workloads. On your show, about a year or so ago, he said, we are never going on-prem. He burnt that bridge. (Tony laughs) That was on your show. >> I remember the statement exactly, because it was interesting. He said, we're never going to do the halfway house, and I think what he meant is, we're not going to bring the Snowflake architecture to run on-prem, because it defeats the elasticity of the cloud. So this was kind of a capitulation, in a way.
But I think it still preserves his original intent, sort of. I don't know. >> The point here is that every vendor will poo-poo whatever they don't have until they do have it. >> Yes. >> And then it's like, oh, we are all in, we've always been doing this, we have always supported this, and now we are doing it better than others. >> Look, it was the same type of shock wave we felt when AWS, at the last moment at one of their re:Invents, said, oh, by the way, we're going to introduce Outposts. The analyst group is typically pre-briefed about a week or two ahead under NDA, and that was not part of it; they just casually dropped it in the analyst session. You could have heard the sound of lots of analysts changing their diapers at that point. >> (laughs) I remember that. And props to Andy Jassy, who once, many times actually, told us: never say never when it comes to AWS. So guys, I know we've got to run, we've got some hard stops. Maybe you could each give us your final thoughts. Doug, start us off, and then-- >> Sure. Well, we've got the Snowflake Summit coming up. I'll be looking for customers that are really doing data science, that are really employing Python through Snowflake, through Snowpark. And then a couple weeks later we've got Databricks with their Data and AI Summit in San Francisco, and I'll be looking for customers that are really doing considerable BI workloads. Last year I did a market overview of this analytical data platform space: 14 vendors, eight of them claiming to support lakehouse from both sides of the camp. The top customer Databricks could cite was unnamed; it had 32 concurrent users doing 15,000 queries per hour. That's good, but it's not up to the most demanding BI SQL workloads, and they acknowledged that and said they need to keep working on it. Snowflake, asked for their biggest data science customer, cited Kabura: 400 terabytes, 8,500 users, 400,000 data engineering jobs per day. I took the data engineering jobs to be probably SQL centric, ETL style transformation work. So I want to see real use of Python, and how much Snowpark has grown as a way to support data science. >> Great. Tony. >> Certainly I'll be looking for similar things to what Doug is saying, but, kind of out of left field, I'm interested to see what MongoDB is going to start to say about operational analytics, 'cause they're into this conquer-the-world strategy: we can be all things to all people. Okay, if that's the case, what's going to be the case with putting in some inline analytics? What are you going to be doing with your query engine? So that's actually kind of an interesting thing we're looking for next week. >> Great. Sanjeev. >> So I'll be at MongoDB World, Snowflake, and Databricks, and since Tony brought up MongoDB, I'm very interested in seeing how even the databases are shifting tremendously. They are addressing both sides of the HTAP use case, online transactional and analytical. I'm also seeing that these databases started as, let's say in the case of MySQL HeatWave, relational, or in MongoDB's case, document, but now they've added graph, they've added time series, they've added geospatial, and they just keep adding more and more data structures, really making these databases multifunctional. So, very interesting.
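On Doug's point about watching for real Snowpark adoption, here is a hedged sketch of what Python through Snowpark looks like. The connection parameters and table are placeholders, and the DataFrame pipeline is pushed down and executed inside Snowflake rather than on the client.

```python
from snowflake.snowpark import Session
from snowflake.snowpark.functions import avg, col

# Placeholder connection parameters.
session = Session.builder.configs({
    "account": "<account>", "user": "<user>", "password": "<password>",
    "warehouse": "<warehouse>", "database": "<db>", "schema": "<schema>",
}).create()

# Lazily built and server-side executed: the "data science on a
# SQL-centric platform" pattern the panel wants to see evidence of.
orders = session.table("orders")  # hypothetical table
summary = (
    orders.filter(col("status") == "COMPLETE")
          .group_by("customer_id")
          .agg(avg(col("amount")).alias("avg_order_value"))
)
summary.show()
```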
>> It gets back to our discussion of best of breed versus all in one. And Mongo's path, or part of their strategy of course, is through developers; they're very developer focused, so we'll be looking for that. And guys, I'll be there as well. I'm hoping we maybe have some extra time on theCUBE, so please stop by and we can chat a little bit. Guys, as always, fantastic. Thank you so much, Doug, Tony, Sanjeev, and let's do this again. >> It's been a pleasure. >> All right, and thank you for watching. This is Dave Vellante for theCUBE and our excellent analysts. We'll see you next time. (upbeat music)
Rahul Pathak Opening Session | AWS Startup Showcase S2 E2
>>Hello, everyone. Welcome to theCUBE's presentation of the AWS Startup Showcase, season two, episode two. The theme is data as code, the future of analytics. I'm John Furrier, your host. We have a great lineup for you: fast growing startups, great companies, founders, and stories around data as code. And we're going to kick it off here with our opening keynote with Rahul Pathak, VP of analytics at AWS and a Cube alumni. Rahul, thank you for coming on and being the opening keynote for this awesome event. >>Yeah, it's great to see you, and it's great to be part of this event. Excited to help showcase some of the great innovation that startups are doing on top of AWS. >>Yeah. We last spoke at AWS re:Invent, and a lot's happened since, with serverless at the center of the action, and all these startups, Rockset, Dremio, Cribl, Ahana, Imply, and others, all doing great stuff. Data as code has a lot of traction, so there's still a lot of momentum going on in the marketplace. Pretty exciting. >>It's awesome. There's so much innovation happening, and the wonderful part of working with data is that the demand for services and products that help customers drive insight from data is just skyrocketing, with no sign of slowing down. So it's a great time to be in the data business. >>It's interesting to see the theme of the show getting traction, because you start to see data being treated almost like how developers write software: taking things out of branches, working on them, putting them back in; machine learning getting iterated on; more models being trained differently, with better insights and actions, all kind of working like code. And this is a whole other way people are reinventing their businesses. This has been a big, huge wave. What's your reaction to that? >>I think it's spot on. The idea of data as code, bringing some of the repeatability of processes from software development into how people build applications, is absolutely fundamental, and especially so in machine learning, where you need to think about the explainability of a model: what version of the world was it trained on? When you build a better model, you need to be able to explain and reproduce it. So I think your insights are spot on, and these ideas are showing up in all stages of the data workflow, from ingestion to analytics to ML. >>This next wave is about modernization and going to the next level with cloud scale. Thank you so much for coming on and being the keynote presenter here for this great event. I'll let you take it away: reinventing businesses with AWS analytics. Rahul, take it away. >>Okay, perfect. Well, folks, we're going to talk about reinventing your business with data. If you think about it, the first wave of reinvention was really driven by the cloud, as customers were able to really transform how they thought about technology, and that's well on its way, although if you stop and think about it, I think we're only about five to 10% of the way done in terms of IT spend being on the cloud, so there's lots of work to do there. But we're seeing another wave of reinvention, which is companies reinventing their businesses with data, really using data to transform what they're doing, to look for new opportunities, and to look for ways to operate more efficiently.
And I think the past couple of years of the pandemic have really only accelerated that trend. What we're seeing is really the survival of the most informed: folks with the best data are able to react more quickly to what's happening. We've seen customers being able to scale up if they're in, say, the delivery business, or scale down if they were in the travel business at the beginning of all of this, and then use data to find new opportunities and new ways to serve customers. So it's really foundational, and we're seeing this across the board. It's great to see the innovation that's happening to help customers make sense of all of this. Our customers are really looking at ways to put data to work: making better decisions, finding new efficiencies, and finding new opportunities to succeed and scale. When it comes to good examples of this, FINRA is a great one. You may not have heard of them, but they're the US equities regulator; they keep track of all trading that happens in equities, looking at about 250 billion records per day.
In fact, about 80% of the data being generated today, uh, is unstructured. And you want to be able to connect data that's in data lakes with data that's in purpose-built data stores, whether that's databases on AWS databases, outside SAS products, uh, as well as things like data warehouses and machine learning systems, but really connecting data as key. Uh, and then, uh, innovation, uh, how can we bring to bear? And we imagine all processes with new technologies like AI and machine learning, and AI is also key to unlocking a lot of the value that's in unstructured data. If you can figure out what's in an imagine the sentiment of audio and do that in real-time that lets you then personalize and dynamically tailor experiences, all of which are super important to getting an edge, um, in, uh, in the modern marketplace. And so at AWS, we, when we think about connecting the dots across sources of data, allowing customers to use data, lakes, databases, analytics, and machine learning, we want to provide a common catalog and governance and then use these to help drive new experiences for customers and their apps and their devices. And then this, you know, in an ideal world, we'll create a closed loop. So you create a new experience. You observe our customers interact with it, that generates more data, which is a data source that feeds into the system. >>And, uh, you know, on AWS, uh, thinking about a modern data strategy, uh, really at the core is a data lakes built on us three. And I'll talk more about that in a second. Then you've got services like Athena included, lake formation for managing that data, cataloging it and querying it in place. And then you have the ability to use the right tool for the right job. And so we're big believers in purpose-built services for data because that's where you can avoid compromising on performance functionality or scale. Uh, and then as I mentioned, unification and inter interconnecting, all of that data. So if you need to move data between these systems, uh, there's well-trodden pathways that allow you to do that, and then features built into services that enable that. >>And, um, you know, some of the core ideas that guide the work that we do, um, scalable data lakes at key, um, and you know, this is really about providing arbitrarily scalable high throughput systems. It's about open format data for future-proofing. Uh, then we talk about purpose-built systems at the best possible functionality, performance, and cost. Uh, and then from a serverless perspective, this has been another big trend for us. We announced a bunch of serverless services and reinvented the goal here is to really take away the need to manage infrastructure from customers. They can really focus about driving differentiated business value, integrated governance, and then machine learning pervasively, um, not just as an end product for data scientists, but also machine learning built into data, warehouses, visualization and a database. >>And so it's scalable data lakes. Uh, data three is really the foundation for this. One of our, um, original services that AWS really the backbone of so much of what we do, uh, really unmatched your ability, availability, and scale, a huge portfolio of analytics services, uh, both that we offer, but also that our partners and customers offer and really arbitrary skin. We've got individual customers and estimator in the expert range, many in the hundreds of petabytes. And that's just growing. 
You know, as I mentioned, we see roughly a 10 X increase in data volume every five years. So that's a exponential increase in data volumes, Uh, from a purpose-built perspective, it's the right tool for the right job, the red shift and data warehousing Athena for querying all your data. Uh, EMR is our managed sparking to do, uh, open search for log analytics and search, and then Kinesis and Amex care for CAFCA and streaming. And that's been another big trend is, uh, real time. Data has been exploding and customers wanting to make sense of that data in real time, uh, is another big deal. >>Uh, some examples of how we're able to achieve differentiated performance and purpose-built systems. So with Redshift, um, using managed storage and it's led us and since types, uh, the three X better price performance, and what's out there available to all our customers and partners in EMR, uh, with things like spark, we're able to deliver two X performance of open source with a hundred percent compatibility, uh, almost three X and Presto, uh, with on two, which is our, um, uh, new Silicon chips on AWS, better price performance, about 10 to 12% better price performance, and 20% lower costs. And then, uh, all compatible source. So drop your jobs, then have them run faster and cheaper. And that translates to customer benefits for better margins for partners, uh, from a serverless perspective, this is about simplifying operations, reducing total cost of ownership and freeing customers from the need to think about capacity management. If we invent, we, uh, announced serverless redshifts EMR, uh, serverless, uh, Kinesis and Kafka, um, and these are all game changes for customers in terms of freeing our customers and partners from having to think about infrastructure and allowing them to focus on data. >>And, um, you know, when it comes to several assumptions in analytics, we've really got a very full and complete set. So, uh, whether that's around data warehousing, big data processing streaming, or cataloging or governance or visualization, we want all of our customers to have an option to run something struggles as well as if they have specialized needs, uh, uh, instances are available as well. And so, uh, really providing a comprehensive deployment model, uh, based on the customer's use cases, uh, from a governance perspective, uh, you know, like information is about easy build and management of data lakes. Uh, and this is what enables data sharing and self service. And, um, you know, with you get very granular access controls. So rule level security, uh, simple data sharing, and you can tag data. So you can tag a group of analysts in the year when you can say those only have access to the new data that's been tagged with the new tags, and it allows you to very, scaleably provide different secure views onto the same data without having to make multiple copies, another big win for customers and partners, uh, support transactions on data lakes. >>So updates and deletes. And time-travel, uh, you know, John talked about data as code and with time travel, you can look at, um, querying on different versions of data. So that's, uh, a big enabler for those types of strategies. And with blue, you're able to connect data in multiple places. So, uh, whether that's accessing data on premises in other SAS providers or, uh, clouds, uh, as well as data that's on AWS and all of this is, uh, serverless and interconnected. 
And, um, and really it's about plugging all of your data into the AWS ecosystem and into our partner ecosystem. So this API is all available for integration as well, but then from an AML perspective, what we're really trying to do is bring machine learning closer to data. And so with our databases and warehouses and lakes and BI tools, um, you know, we've infused machine learning throughout our, by, um, the state of the art machine running that we offer through SageMaker. >>And so you've got a ML in Aurora and Neptune for broths. Uh, you can train machine learning models from SQL, directly from Redshift and a female. You can use free inference, and then QuickSight has built in forecasting built in natural language, querying all powered by machine learning, same with anomaly detection. And here are the ideas, you know, how can we up our systems get smarter at the surface, the right insights for our customers so that they don't have to always rely on smart people asking the right questions, um, and you know, uh, really it's about bringing data back together and making it available for innovation. And, uh, thank you very much. I appreciate your attention. >>Okay. Well done reinventing the business with AWS analytics rural. That was great. Thanks for walking through that. That was awesome. I have to ask you some questions on the end-to-end view of the data. That seems to be a theme serverless, uh, in there, uh, Mel integration. Um, but then you also mentioned picking the right tool for the job. So then you've got like all these things moving on, simplify it for me right now. So from a business standpoint, how do they modernize? What's the steps that the clients are taking with analytics, what's the best practice? How do they, what's the what's the high order bit here? >>Uh, so the basic hierarchy is, you know, historically legacy systems are rigid and inflexible, and they weren't really designed for the scale of modern data or the variety of it. And so what customers are finding is they're moving to the cloud. They're moving from legacy systems with punitive licensing into more flexible, more systems. And that allows them to really think about building a decoupled, scalable future proof architecture. And so you've got the ability to combine data lakes and databases and data warehouses and connect them using common KPIs and common data protection. And that sets you up to deal with arbitrary scale and arbitrary types. And it allows you to evolve as the future changes since it makes it easy to add in a new type of engine, as we invent a better one a few years from now. Uh, and then, uh, once you've kind of got your data in a cloud and interconnected in this way, you can now build complete pictures of what's going on. You can understand all your touch points with customers. You can understand your complete supply chain, and once you can build that complete picture of your business, you can start to use analytics and machine learning to find new opportunities. So, uh, think about modernizing, moving to the cloud, setting up for the future, connecting data end to end, and then figuring out how to use that to your advantage. >>I know as you mentioned, modern data strategy gives you the best of both worlds. And you've mentioned, um, briefly, I want to get a little bit more, uh, insight from you on this. You mentioned open, open formats. One of the themes that's come out of some of the interviews, these companies we're going to be hearing from today is open source. The role opens playing. 
Um, how do you see that integrating in? Because again, this is just like software, right? Open, uh, open source software, open source data. It seems to be a trend. What does open look like to you? How do you see that progressing? >>Uh, it's a great question. Uh, open operates on multiple dimensions, John, as you point out, there's open data formats. These are things like JSI and our care for analytics. This allows multiple engines tend to operate on data and it'll, it, it creates option value for customers. If you're going to data in an open format, you can use it with multiple technologies and that'll be future-proofed. You don't have to migrate your data. Now, if you're thinking about using a different technology. So that's one piece now that sort of software, um, also, um, really a big enabler for innovation and for customers. And you've got things like squat arc and Presto, which are popular. And I know some of the startups, um, you know, that we're talking about as part of the showcase and use these technologies, and this allows for really the world to contribute, to innovating and these engines and moving them forward together. And we're big believers in that we've got open source services. We contribute to open-source, we support open source projects, and that's another big part of what we do. And then there's open API is things like SQL or Python. Uh, again, uh, common ways of interacting with data that are broadly adopted. And this one, again, create standardization. It makes it easier for customers to inter-operate and be flexible. And so open is really present all the way through. And it's a big part, I think, of, uh, the present and the future. >>Yeah. It's going to be fun to watch and see how that grows. It seems to be a lot of traction there. I want to ask you about, um, the other comment I thought was cool. You had the architectural slides out there. One was data lakes built on S3, and you had a theme, the glue in lake formation kind of around S3. And then you had the constellation of, you know, Kinesis SageMaker and other things around it. And you said, you know, pick the tool for the right job. And then you had the other slide on the analytics at the center and you had Redshift and all the other, other, other services around it around serverless. So one was more about the data lake with Athena glue and lake formation. The other one's about serverless. Explain that a little bit more for me, because I'm trying to understand where that fits. I get the data lake piece. Okay. Athena glue and lake formation enables it, and then you can pick and choose what you need on the serverless side. What does analytics in the center mean? >>So the idea there is that really, we wanted to talk about the fact that if you zoom into the analytics use case within analytics, everything that we offer, uh, has a serverless option for our customers. So, um, you could look at the bucket of analytics across things like Redshift or EMR or Athena, or, um, glue and league permission. You have the option to use instances or containers, but also to just not worry about infrastructure and just think declaratively about the data that you want to. >>Oh, so basically you're saying the analytics is going serverless everywhere. Talking about volumes, you mentioned 10 X volumes. Um, what are other stats? Can you share in terms of volumes? What are people seeing velocity I've seen data warehouses can't move as fast as what we're seeing in the cloud with some of your customers and how they're using data. 
How does the volume and velocity community have any kind of other kind of insights into those numbers? >>Yeah, I mean, I think from a stats perspective, um, you know, take Redshift, for example, customers are processing. So reading and writing, um, multiple exabytes of data there across from each shift. And, uh, you know, one of the things that we've seen in, uh, as time has progressed as, as data volumes have gone up and did a tapes have exploded, uh, you've seen data warehouses get more flexible. So we've added things like the ability to put semi-structured data and arbitrary, nested data into Redshift. Uh, we've also seen the seamless integration of data warehouses and data lakes. So, um, actually Redshift was one of the first to enable a straightforward acquiring of data. That's sitting in locally and drives as well as feed and that's managed on a stream and, uh, you know, those trends will continue. I think you'll kind of continue to see this, um, need to query data wherever it lives and, um, and, uh, allow, uh, leaks and warehouses and purpose-built stores to interconnect. >>You know, one of the things I liked about your presentation was, you know, kind of had the theme of, you know, modernize, unify, innovate, um, and we've been covering a lot of companies that have been, I won't say stumbling, but like getting to the future, some go faster than others, but they all kind of get stuck in an area that seems to be the same spot. It's the silos, breaking down the silos and get in the data lakes and kind of blending that purpose built data store. And they get stuck there because they're so used to silos and their teams, and that's kind of holding back the machine learning side of it because the machine learning can't do its job if they don't have access to all the data. And that's where we're seeing machine learning kind of being this new iterative model where the models are coming in faster. And so the silo brake busting is an issue. So what's your take on this part of the equation? >>Uh, so there's a few things I plan it. So you're absolutely right. I think that transition from some old data to interconnected data is always straightforward and it operates on a number of levels. You want to have the right technology. So, um, you know, we enable things like queries that can span multiple stores. You want to have good governance, you can connect across multiple ones. Uh, then you need to be able to get data in and out of these things and blue plays that role. So there's that interconnection on the technical side, but the other piece is also, um, you know, you want to think through, um, organizationally, how do you organize, how do you define it once data when they share it? And one of the asylees for enabling that sharing and, um, think about, um, some of the processes that need to get put in place and create the right incentives in your company to enable that data sharing. And then the foundational piece is good guardrails. You know, it's, uh, it can be scary to open data up. And, uh, the key to that is to put good governance in place where you can ensure that data can be shared and distributed while remaining protected and adhering to the privacy and compliance and security regulations that you have for that. And once you can assert that level of protection, then you can set that data free. And that's when, uh, customers really start to see the benefits of connecting all of it together, >>Right? 
And then we have a batch of startups here on this episode that are doing a lot of different things. Uh, some have, you know, new lake new lakes are forming observability lakes. You have CQL innovation on the front end data, tiering innovation at the data tier side, just a ton of innovation around this new data as code. How do you see as executive at AWS? You're enabling all this, um, where's the action going? Where are the white spaces? Where are the opportunities as this architecture continues to grow, um, and get traction because of the relevance of machine learning and AI and the apps are embedding data in there now as code where's the opportunities for these startups and how can they continue to grow? >>Yeah, the, I mean, the opportunity is it's amazing, John, you know, we talked a little bit about this at the beginning, but the, there is no slow down insight for the volume of data that we're generating pretty much everything that we have, whether it's a watch or a phone or the systems that we interact with are generating data and, uh, you know, customers, uh, you know, we talk a lot about the things that'll stay the same over time. And so, you know, the data volumes will continue to go up. Customers are gonna want to keep analyzing that data to make sense of it. They're going to want to be able to do it faster and more cheaply than they were yesterday. And then we're going to want to be able to make decisions and innovate, uh, in a shorter cycle and run more experiments than they were able to do. >>And so I think as long as, and they're always going to want this data to be secure and well-protected, and so I think as long as we, and the startups that we work with can continue to push on making these things better. Can I deal with more data? Can I deal with it more cheaply? Can I make it easier to get insight? And can I maintain a super high bar in security investments in these areas will just be off. Um, because, uh, the demand side of this equation is just in a great place, given what we're seeing in terms of theater and the architect for forum. >>I also love your comment about, uh, ML integration being the last leg of the equation here or less likely the journey, but you've got that enablement of the AIP solves a lot of problems. People can see benefits from good machine learning and AI is creating opportunities. Um, and also you also have mentioned the end to end with security piece. So data and security are kind of going hand in hand these days, not just the governments and the compliance stuff we're talking about security. So machine learning integration kind of connects all of this. Um, what's it all mean for the customers, >>For customers. That means that with machine learning and really enabling themselves to use machine learning, to make sense of data, they're able to find patterns that can represent new opportunities, um, quicker than ever before. And they're able to do it, uh, dynamically. So, you know, in a prior version of the world, we'd have little bit of systems and they would be relatively rigid and then we'd have to improve them. Um, with machine learning, this can be dynamic and near real time and you can customize them. So, uh, that just represents an opportunity to deepen relationships with customers and create more value and to find more efficiency in how businesses are run. So that piece is there. Um, and you know, your ideas around, uh, data's code really come into play because machine learning needs to be repeatable and explainable. 
And that means versioning, uh, keeping track of everything that you've done from a code and data and learning and training perspective >>And data sets are updating the machine learning. You got data sets growing, they become code modules that can be reused and, uh, interrogated, um, security okay. Is a big as a big theme data, really important security is seen as one of our top use cases. Certainly now in this day and age, we're getting a lot of, a lot of breaches and hacks coming in, being defended. It brings up the open, brings up the data as code security is a good proxy for kind of where this is going. What's your what's take on that and your reaction to that. >>So I'm, I'm security. You can, we can never invest enough. And I think one of the things that we, um, you know, guide us in AWS is security, availability, durability sort of jobs, you know, 1, 2, 3, and, um, and it operates at multiple levels. You need to protect data and rest with encryption, good key management and good practices though. You need to protect data on the wire. You need to have a good sense of what data is allowed to be seen by whom. And then you need to keep track of who did what and be able to verify and come back and prove that, uh, you know, uh, only the things that were allowed to happen actually happened. And you can actually then use machine learning on top of all of this apparatus to say, uh, you know, can I detect things that are happening that shouldn't be happening in near real time so they could put a stop to them. So I don't think any of us can ever invest enough in securing and protecting my data and our systems, and it is really fundamental or adding customer trust and it's just good business. So I think it is absolutely crucial. And we think about it all the time and are always looking for ways to raise >>Well, I really appreciate you taking the time to give the keynote final word here for the folks watching a lot of these startups that are presenting, they're doing well. Business wise, they're being used by large enterprises and people buying their products and using their services for customers are implementing more and more of the hot startups products they're relevant. What's your advice to the customer out there as they go on this journey, this new data as code this new future of analytics, what's your recommendation. >>So for customers who are out there, uh, recommend you take a look at, um, what, uh, the startups on AWS are building. I think there's tremendous innovation and energy, uh, and, um, there's really great technology being built on top of a rock solid platform. And so I encourage customers thinking about it to lean forward, to think about new technology and to embrace, uh, move to the cloud suite, modernized, you know, build a single picture of our data and, and figure out how to innovate and when >>Well, thanks for coming on. Appreciate your keynote. Thanks for the insight. And thanks for the conversation. Let's hand it off to the show. Let the show begin. >>Thank you, John pleasure, as always.
Sanjeev Mohan, SanjMo & Nong Li, Okera | AWS Startup Showcase
(cheerful music) >> Hello everyone, welcome to today's session of theCUBE's presentation of AWS Startup Showcase, New Breakthroughs in DevOps, Data Analytics, Cloud Management Tools, featuring Okera from the cloud management migration track. I'm John Furrier, your host. We've got two great special guests today, Nong Li, founder and CTO of Okera, and Sanjeev Mohan, principal @SanjMo, and former research vice president of big data and advanced analytics at Gartner. He's a legend, been around the industry for a long time, seen the big data trends from the past and present, and knows the future. Got a great lineup here. Gentlemen, thank you for this, so, life in the trenches, lessons learned across compliance, cloud migration, analytics, and use cases for Fortune 1000s. Thanks for joining us. >> Thanks for having us. >> So Sanjeev, great to see you, I know you've seen this movie, I was saying that in the open, you've at Gartner seen all the visionaries, the leaders, you know everything about this space. It's changing extremely fast, and one of the big topics right out of the gate is not just innovation, we'll get to that, that's the fun part, but it's the regulatory compliance and audit piece of it. It's keeping people up at night, and frankly if not done right, slows things down. This is a big part of the showcase here, is to solve these problems. Share with us your thoughts, what's your take on this wide-ranging issue? >> So, thank you, John, for bringing this up, and I'm so happy you mentioned the fact that there's this notion that it can slow things down. Well I have to say that the old way of doing governance slowed things down, because it was very much about command and control. But the new approach to data governance is actually, in my opinion, liberating data. If you want to democratize or monetize, whatever you want to call it, you cannot do it 'til you know you can trust said data and it's governed in some ways, so data governance has actually become very interesting, and today if you want to talk about three different areas within regulatory compliance, for example, we all know about the EU GDPR, we know California has CCPA, and in fact California is now getting an even more stringent version called CPRA in a couple of years, which is more aligned to GDPR. That is the first area; we know we need to comply with that, we don't have any way out. But then, there are other areas, there is insider trading, there is how you secure the data that comes from third parties, you know, vendors, partners, suppliers, so Nong, I'd love to hand it over to you, and see if you can maybe throw some light into how our customers are handling these use cases.
So in the context of GDPR, you'll hear about things like consent management and right to be forgotten, right? I, as a customer of that retailer should say "I don't want my information used for this purpose," right? "Use it for this, but not this." And you can imagine at a very, very large scale, when you have a billion customers, managing that, all the data you've collected over time through all of your devices, all of your telemetry, really, really challenging. And they're leveraging Okera embedded into their analytics platform so they can do both, right? Their data scientists and analysts who need to do everything they're doing to power the business, not have to think about these kind of very granular customer filtering requirements that need to happen, and then they leverage us to do that. So that's kind of new, right, GDPR, relatively new stuff at this point, but we obviously also work with customers that have regulations from a long long time ago, right? So I think you also mentioned insider trading and that supply chain, so we'll talk to customers, and they want really data-driven decisions on their supply chain, everything about their production pipeline, right? They want to understand all of that, and of course that makes sense, whether you're the CFO, if you're going to make business decisions, you need that information readily available, and supply chains as we know get more and more and more complex, we have more and more integrated into manufacturing and other verticals. So that's your, you're a little bit stuck, right? You want to be data-driven on those supply chain analytics, but at the same time, knowing the details of all the supply chain across all of your dependencies exposes your internal team to very high blackout periods or insider trading concerns, right? For example, if you knew Apple was buying a bunch of something, that's maybe information that only a select few people can have, and the way that manifests into data policies, 'cause you need the ability to have very, very scalable, per employee kind of scalable data restriction policies, so they can do their job easier, right? If we talk about speeding things up, instead of a very complex process for them to get approved, and approved on SEC regulations, all that kind of stuff, you can now go give them access to the part of the supply chain that they need, and no more, and limit their exposure and the company's exposure and all of that kind of stuff. So one of our customers able to do this, getting two orders of magnitude, a 100x reduction in the policies to manage the system like that. >> When I hear you talking like that, I think the old days of "Oh yeah, regulatory, it kind of slows down innovation, got to go faster," pretty basic variables, not a lot of combination of things to check. Now with cloud, there seems to be combinations, Sanjeev, because how complicated has the regulatory compliance and audit environment gotten in the past few years, because I hear security in a supply chain, I hear insider threats, I mean these are security channels, not just compliance department G&A kind of functions. You're talking about large-scale, potentially combinations of access, distribution, I mean it seems complicated. How much more complicated is it now, just than it was a few years ago? >> So, you know the way I look at it is, I'm just mentioning these companies just as an example, when PayPal or Ebay, all these companies started, they started in California. 
Anybody who ever did business on Ebay or PayPal, guess where that data was? In the US in some data center. Today you cannot do it. Today, data residency laws are really tough, and so now these organizations have to really understand what data needs to remain where. On top of that, we now have so many regulations. You know, earlier on if you were in healthcare, you needed to be HIPAA compliant, or in banking, PCI DSS, but today, in the cloud, you really need to know, what data I have, what sensitive data I have, how do I discover it? So that data discovery becomes really important. What roles I have, so for example, let's say I work for a bank in the US, and I decide to move to Germany. Now, the old school is that a new rule will be created for me, because of German... >> John: New email address, all these new things happen, right? >> Right, exactly. So you end up with this really, a mass of rules and... And these are all static. >> Rules and tools, oh my god. >> Yeah. So Okera actually makes a lot of this dynamic, which reduces your cloud migration overhead, and Nong used some great examples, in fact, sorry if I take just a second, without mentioning any names, one of the largest banks in the world is going global in the digital space for the first time, and they're taking Okera with them. So... >> But what's the point? This is my next topic in cloud migration, I want to bring this up because, complexity, when you're in that old school kind of data center, waterfall, these old rules and tools, you have to roll this out, and it's a pain in the butt for everybody, it's a hassle, huge hassle. Cloud gives the agility, we know that, and cloud's becoming more secure, and I think now people see the on-premise, certainly things that'd be on-premises for secure things, I get that, but when you start getting into agility, and you now have cloud regions, you can start being more programmatic, so I want to get you guys' thoughts on the cloud migration, how companies who are now lifting and shifting, replatforming, what's the refactoring beyond that, because you can replatform in the cloud, and still some are kind of holding back on that. Then when you're in the cloud, the ones that are winning, the companies that are winning are the ones that are refactoring in the cloud. Doing things differently with new services. Sanjeev, you start. >> Yeah, so you know, in fact a lot of people tell me, "You know, we are just going to lift and shift into the cloud." But you're literally using cloud as a data center. You still have all the, if I may say, junk you had on-prem, you just moved it into the cloud, and now you're paying for it. In the cloud, nothing is free. Every storage, every processing, you're going to pay for it. The most successful companies are the ones that are replatforming, they are taking advantage of the platform as a service or software as a service, so that includes things like, you pay as you go, you pay for exactly the amount you use, so you scale up and scale down or scale out and scale in, pretty quickly, you know? So you're handling that demand, so without replatforming, you are not really utilizing your- >> John: It's just hosting. >> Yeah, you're just hosting. >> It's basically hosting if you're not doing anything right there. >> Right. The reason why people sometimes resist replatforming is because there's a hidden cost that we don't really talk about: PaaS adds 3x to IaaS cost.
So, some organizations that are very mature, and they have a few thousand people in the IT department, for them, they're like "No, we just want to run it in the cloud, we have the expertise, and it's cheaper for us." But in the long run, to get the most benefit, people should think of using cloud as a service. >> Nong, what's your take, because you see examples of companies, I'll just call one out, Snowflake for instance, they're essentially a data warehouse in the cloud, they refactored and they replatformed, they have a competitive advantage with the scale, so they have things that others, who are just hosting, don't have. Or even on-premise. The new model developing where there's real advantages, and how should companies think about this when they have to manage these data lakes, and they have to manage all these new access methods, but they want to maintain that operational stability and control and growth? >> Yeah, so. No? Yeah. >> There's a few topics that are all (indistinct) this topic. (indistinct) enterprises moving to the cloud, they do this maybe for some cost savings, but a ton of it is agility, right? The motor that the business can run at is just so much faster. So we'll work with companies in the context of cloud migration for data, where they might have a data warehouse they've been using for 20 years, and building policies over that time, right? And it taking a long time to get approval for access and those kinds of things made more sense, right? If it took you months to procure a physical infrastructure, get machines shipped to your data center, then this data access taking so long feels okay, right? That's kind of the same rate that everything is moving. In the cloud, you can spin up new infrastructure instantly, so you don't want approvals for getting policies, creating rules, all that stuff that Sanjeev was talking about, that being slow is a huge, huge problem. So this is a very common environment that we see where they're trying to do that kind of thing. And then, for replatforming, again, they've been building these roles and processes and policies for 20 years. What they don't want to do is take 20 years to go migrate all that stuff into the cloud, right? That's probably an experience nobody wants to repeat, and frankly for many of them, people who did it originally may or may not be involved in this kind of effort. So we work with a lot of companies like that, they have their, they want stability, they got to have the business running as normal, they got to get moving into the new infrastructure, doing it in a new way that, you know, with all the kind of lessons learned, so, as Sanjeev said, one of these big banks that we work with, that classical story of on-premise data warehousing, maybe a little bit of Hadoop, moved onto AWS, S3, Snowflake, that kind of setup, extremely intricate policies, but let's go reimagine how we can do this faster, right? What we like to talk about is, you're an organization, you need a design that, if you onboarded 1000 more data users, that's got to be way, way easier than the first 10 you onboarded, right? You got to get it to be easier over time, in a really, really significant way. >> Talk about the data authorization safety factor, because I can almost imagine all the intricacies of these different tools creates specialism amongst people who operate them. And each one might have their own little authorization nuance. Trend is not to have that siloed mentality. What's your take on clients that want to just "Hey, you know what?
I want to have the maximum agility, but I don't want to get caught in the weeds on some of these tripwires around access and authorization." >> Yeah, absolutely, I think it's real important to get the balance of it, right? Because if you are an enterprise, or if you have diverse teams, you want them to have the ability to use tools as best of breed for their purpose, right? But you don't want to have it be so that every tool has its own access and provisioning and whatever, that's definitely going to be a security risk, or at least, a lot of friction for you to get things going. So we think about that really hard, I think we've seen great success with things like SSO and Okta, right? Unifying authentication. We think there's a very, very similar thing about to happen with authorization. You want that single control plane that can integrate with all the tools, and still get the best of what you need, but it's much, much easier (indistinct). >> Okta's a great example, if people don't want to build their own thing and just go with that, same with what you guys are doing. That seems to be the dots that are connecting you, Sanjeev. The ease of use, but yet the stability factor. >> Right. Yeah, because John, today I may want to bring up a SQL editor to go into Snowflake, just as an example. Tomorrow, I may want to use the Azure Bot, you know? I may not even want to go to Snowflake, I may want to go to an underlying piece of data, or I may use Power BI, you know, for some reason, and come from the Azure side, so the point is that, unless we are able to control, in some sort of a centralized manner, we will not get that consistency. And security, you know, is all or nothing. You cannot say "Well, I secured my Snowflake, but if you come through HDFS, Hadoop, or some, you know, that is outside of my realm, or my scope," what's the point? So that is why it is really important to have a watertight way, in fact I'm using just a few examples, maybe tomorrow I decide to use a data catalog, or I use Denodo as my data virtualization and I run a query. I'm the same identity, but I'm using different tools. I may use it from home, over VPN, or I may use it from the office, so you want this kind of flexibility, all encompassed in a policy, rather than a separate rule if you do this and this, if you do that, because then you end up with literally thousands of rules.
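To put that single-control-plane idea in concrete terms, here is a minimal sketch of one attribute-based policy replacing many per-tool, per-user rules. The names and policy shape are illustrative assumptions, not Okera's actual API; the point is that every tool (SQL editor, BI dashboard, notebook) asks the same decision point, so the same identity gets the same answer from home or from the office:

```python
# A minimal sketch of attribute-based authorization behind a single
# control plane. Hypothetical names, not any vendor's real API.

# One policy per dataset, instead of one rule per user/tool/location.
POLICIES = {
    "trades_us": {"region": "US", "clearance": "insider"},
    "sales_eu": {"region": "EU"},
}

def authorize(user_attrs: dict, dataset: str) -> bool:
    """Single decision point that every tool calls before running a query.

    Because the decision is computed from the user's current attributes,
    moving from the US to Germany changes the answer without anyone
    writing a new rule.
    """
    required = POLICIES.get(dataset, {})
    return all(user_attrs.get(key) == value for key, value in required.items())

analyst = {"region": "EU", "clearance": "none"}
print(authorize(analyst, "sales_eu"))   # True
print(authorize(analyst, "trades_us"))  # False
```

With this shape, adding a new tool or a new user adds zero new rules; only new datasets add policy.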
They're great at what they do, but we want to make sure they're enabled, they do some enterprise investments, they see broader adoption much more easily. A lot of those things. >> And I can hear the sirens in the background, that's someone who's not using your platform, they need some help there. But that's the case, I mean if you don't get this right, there are some consequences, and I think one of the things I would like to bring up on next track is, to talk through with you guys is, the persona pigeonhole role, "Oh yeah, a data person, the developer, the DevOps, the SRE," you start to see now, developers and with cloud developers, and data folks, people, however they get pigeonholed, kind of blending in, okay? You got data services, you got analytics, you got data scientists, you got more democratization, all these things are being kicked around, but the notion of a developer now is a data developer, because cloud is about DevOps, data is now a big part of it, it's not just some department, it's actually blending in. Just a cultural shift, can you guys share your thoughts on this trend of data people versus developers now becoming kind of one, do you guys see this happening, and if so, how? >> So John, when I started my career, I was a DBA, and then a data architect. Today, I think you cannot have a DBA who's not a developer. That's just my opinion. Because there is so much of CICD, DevOps, that happens today, and you know, you write your code in Python, you put it in version control, you deploy using Jenkins, you roll back if there's a problem. And then, you are interacting, you're building your data to be consumed as a service. People in the past, you would have a thick client that would connect to the database over TCP/IP. Today, people don't want to connect over TCP/IP necessarily, they want to go over HTTP. And they want an API gateway in the middle. So, if you're a data architect or DBA, now you have to worry about, "I have a REST API call that's coming in, how am I going to secure that, and make sure that people are allowed to see that?" And that was just yesterday. >> Exactly. Got to build an abstraction layer. You got to build an abstraction layer. The old days, you have to worry about schema, and do all that, it was hard work back then, but now, it's much different. You got serverless, functions are going to show the way... It's happening. >> Correct, GraphQL, and semantic layer, that just blows me away because, it used to be, it was all in the database, then we took it out of the database and we put it in a BI tool. So we said, like BusinessObjects started this whole trend. So we're like "Let's put the semantic layer there," well okay, great, but that was when everything was surrounding BusinessObjects and Oracle Database, or some other database, but today what if somebody brings Power BI or Tableau or Qlik, you know? Now you don't have semantic layer access. So you cannot have it in the BI layer, so you move it down to its own layer. So now you've got a semantic layer, then where do you store your metrics? Same story repeats, you have a metrics layer, then the data scientists want to do feature engineering, where do you store your features? You have a feature store.
And before you know it, this stack has disaggregated over and over and over, and then you've got layers and layers of specialization that are happening, there's query accelerators like Dremio or Trino, so you've got your data here, which Nong is trying really hard to protect, and then you've got layers and layers and layers of abstraction, and networks are fast, so the end user gets great service, but it's a nightmare for architects to bring all these things together. >> How do you tame the complexity? What's the bottom line? >> Nong? >> Yeah, so, I think... So there's a few things you need to do, right? So, we need to re-think how we express security permissions, right? I think you guys have just maybe in passing (indistinct) talked about creating all these rules and all that kind of stuff, that's been the way we've done things forever. We've got to think about policies and mechanisms that are much more dynamic, right? You need to really think about not having to do any additional work, for the new things you add to the system. That's really, really core to solving the complexity problem, right? 'Cause that gets you those orders of magnitude reduction; the system's got to be more expressive and map to those policies. That's one. And then second, it's got to be implemented at the right layer, right, to Sanjeev's point, close to the data, and it can service all of those applications and use cases at the same time, and have that uniformity and breadth of support. So those two things have to happen. >> Love this universal data authorization vision that you guys have. Super impressive, we had a CUBE Conversation earlier with Nick Halsey, who's a veteran in the industry, and he likes it. That's a good sign, 'cause he's seen a lot of stuff, too, Sanjeev, like yourself. This is a new thing, you're seeing compliance being addressed, and with programmatic, I'm imagining there's going to be bots someday, very quickly with AI that's going to scale that up, so they kind of don't get in the innovation way, they can still get what they need, and enable innovation. You've got cloud migration, which is only going faster and faster. Nong, you mentioned speed, that's what CloudOps is all about, developers want speed, not things in days or hours, they want it in minutes and seconds. And then finally, ultimately, how's it scale up, how does it scale up for the people operating and/or programming? These are three major pieces. What happens next? Where do we go from here, what's, the customer's sitting there saying "I need help, I need trust, I need scale, I need security." >> So, I just wrote a blog, if I may diverge a bit, on data observability. And you know, so there are a lot of these little topics that are critical, DataOps is one of them, so to me data observability is really having a transparent view of, what is the state of your data in the pipeline, anywhere in the pipeline? So you know, when we talk to these large banks, these banks have like 1000, over 1000 data pipelines working every night, because they've got a hundred, 200 data sources from which they're bringing data in. Then they're doing all kinds of data integration, they have, you know, we talked about Python or Informatica, or whatever data integration, data transformation product you're using, so you're combining this data, writing it into an analytical data store, something's going to break. So, to me, data observability becomes a very critical thing, because it shows me something broke, walk me down the pipeline, so I know where it broke.
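That walk-down-the-pipeline idea can be made concrete with a small check. A minimal sketch, assuming you snapshot simple column statistics on every nightly run; the 20% tolerance and the names are illustrative assumptions, not any vendor's product:

```python
# A minimal sketch of a data observability check: compare simple column
# statistics between last night's run and tonight's run and flag drift.
import statistics

def column_stats(values: list) -> dict:
    """Snapshot cheap statistics for one column of a batch."""
    return {"mean": statistics.fmean(values), "count": len(values)}

def drifted(baseline: dict, current: dict, tolerance: float = 0.2) -> bool:
    """Flag the batch if the mean moved more than `tolerance` relative to baseline."""
    if baseline["mean"] == 0:
        return current["mean"] != 0
    return abs(current["mean"] - baseline["mean"]) / abs(baseline["mean"]) > tolerance

last_night = column_stats([100, 102, 98, 101])
tonight = column_stats([100, 160, 155, 149])

if drifted(last_night, tonight):
    print("possible data drift: walk the pipeline upstream to find the break")
```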
Maybe the data drifted. And I know Okera does a lot of work in data drift, you know? So this is... Nong, jump in any time, because I know we have use cases for that. >> Nong, before you get in there, I just want to highlight a quick point. I think you're onto something there, Sanjeev, because we've been reporting, and we believe, that data workflows is intellectual property. And has to be protected. Nong, go ahead, your thoughts, go ahead. >> Yeah, I mean, the observability thing is critically important. I would say when you want to think about what's next, I think it's really effectively bridging tools and processes and systems and teams that are focused on data production, with the data analysts, data scientists, that are focused on data consumption, right? I think bridging those two, which cover a lot of the topics we talked about, that's kind of where security almost meets, that's kind of where you got to draw it. I think for observability and pipelines and data movement, understanding that is essential. And I think broadly, on all of these topics, where all of us can be better, is if we're able to close the loop, get the feedback loop of success. So data drift is an example of the loop rarely being closed. It drifts upstream, and downstream users can take forever to figure out what's going on. And we'll have similar examples related to buy-ins, or data quality, all those kind of things, so I think that's really a problem that a lot of us should think about. How do we make sure that loop is closed as quickly as possible? >> Great insight. Quick aside, as the founder CTO, how's life going for you, you feel good? I mean, you started a company, doing great, it's not drifting, it's right in the stream, mainstream, right in the wheelhouse of where the trends are, you guys have a really crosshairs on the real issues, how you feeling, tell us a little bit about how you see the vision. >> Yeah, I obviously feel really good, I mean we started the company a little over five years ago, there are kind of a few things that we bet would happen, and I think those things were out of our control, I don't think we would've predicted GDPR security and those kind of things being as prominent as they are. Those things have really matured, probably as best as we could've hoped, so that feels awesome. Yeah, (indistinct) really expanded in these years, and it feels good. Feels like we're in the right spot. >> Yeah, it's great, data's competitive advantage, and certainly has a lot of issues. It could be a blocker if not done properly, and you're doing great work. Congratulations on your company. Sanjeev, thanks for kind of being my cohost in this segment, great to have you on, been following your work, and you continue to unpack it at your new place that you started. SanjMo, good to see your Twitter handle taking on the name of your new firm, congratulations. Thanks for coming on. >> Thank you so much, such a pleasure. >> Appreciate it. Okay, I'm John Furrier with theCUBE, you're watching today's session presentation of AWS Startup Showcase, featuring Okera, a hot startup, check 'em out, great solution, with a really great concept. Thanks for watching. (calm music)
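As a coda to the GDPR discussion earlier in the conversation, the consent-management pattern Nong described ("use my data for this purpose, but not this one") reduces to a purpose check applied once at the data layer rather than hand-coded into every analyst's query. A minimal sketch, with hypothetical names rather than Okera's actual interface:

```python
# A minimal sketch of purpose-based consent filtering at the data layer.
# Field names and the policy shape are hypothetical illustrations.
from dataclasses import dataclass, field

@dataclass
class CustomerRecord:
    customer_id: int
    country: str
    consents: set = field(default_factory=set)  # purposes the customer opted into

def rows_for_purpose(rows, purpose: str):
    """Return only records whose owners consented to this purpose.

    Centralizing the check means analysts never re-implement consent
    logic per query, and 'right to be forgotten' becomes a matter of
    dropping the record or its consents in one place.
    """
    return [row for row in rows if purpose in row.consents]

customers = [
    CustomerRecord(1, "DE", {"analytics"}),
    CustomerRecord(2, "FR", {"analytics", "marketing"}),
    CustomerRecord(3, "US", set()),  # opted out entirely
]

print([c.customer_id for c in rows_for_purpose(customers, "marketing")])  # [2]
```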
SENTIMENT ANALYSIS :
ENTITIES
Entity | Category | Confidence |
---|---|---|
Nick Halsey | PERSON | 0.99+ |
John | PERSON | 0.99+ |
John Furrier | PERSON | 0.99+ |
California | LOCATION | 0.99+ |
US | LOCATION | 0.99+ |
Nong Li | PERSON | 0.99+ |
Apple | ORGANIZATION | 0.99+ |
Germany | LOCATION | 0.99+ |
Ebay | ORGANIZATION | 0.99+ |
PayPal | ORGANIZATION | 0.99+ |
20 years | QUANTITY | 0.99+ |
Sanjeev | PERSON | 0.99+ |
Tomorrow | DATE | 0.99+ |
two | QUANTITY | 0.99+ |
GDPR | TITLE | 0.99+ |
Sanjeev Mohan | PERSON | 0.99+ |
Today | DATE | 0.99+ |
One | QUANTITY | 0.99+ |
yesterday | DATE | 0.99+ |
Snowflake | TITLE | 0.99+ |
today | DATE | 0.99+ |
Python | TITLE | 0.99+ |
Gartner | ORGANIZATION | 0.99+ |
Tableau | TITLE | 0.99+ |
first time | QUANTITY | 0.99+ |
3x | QUANTITY | 0.99+ |
both | QUANTITY | 0.99+ |
100x | QUANTITY | 0.99+ |
one | QUANTITY | 0.99+ |
Okera | ORGANIZATION | 0.99+ |
Informatica | ORGANIZATION | 0.98+ |
two orders | QUANTITY | 0.98+ |
Nong | ORGANIZATION | 0.98+ |
SanjMo | PERSON | 0.98+ |
second | QUANTITY | 0.98+ |
Power BI | TITLE | 0.98+ |
1000 | QUANTITY | 0.98+ |
tomorrow | DATE | 0.98+ |
two things | QUANTITY | 0.98+ |
Qlik | TITLE | 0.98+ |
each one | QUANTITY | 0.97+ |
thousands of rules | QUANTITY | 0.97+ |
1000 more data users | QUANTITY | 0.96+ |
first 10 | QUANTITY | 0.96+ |
Okera | PERSON | 0.96+ |
AWS | ORGANIZATION | 0.96+ |
hundred, 200 data sources | QUANTITY | 0.95+ |
HIPAA | TITLE | 0.94+ |
EU | ORGANIZATION | 0.94+ |
CCPA | TITLE | 0.94+ |
over 1000 data pipelines | QUANTITY | 0.93+ |
single | QUANTITY | 0.93+ |
first area | QUANTITY | 0.93+ |
two great special guests | QUANTITY | 0.92+ |
BusinessObjects | TITLE | 0.92+ |
Opening Keynote | AWS Startup Showcase: Innovations with CloudData and CloudOps
(upbeat music) >> Welcome to this special cloud virtual event, theCUBE on cloud. This is our continuing editorial series of the most important stories in cloud. We're going to explore the cutting-edge, most relevant technologies and companies that will impact business and society. We have special guests: Jeff Barr, Michael Liebow, Jerry Chen, Ben Haynes, Michael Skulk, Mike Feinstein from AWS, all today presenting the top startups in the AWS ecosystem. This is the AWS showcase of startups. I'm here with Dave Vellante. Dave, great to see you. >> Hey John. Great to be here. Thanks for having me. >> So awesome day today. We're going to feature 10 great companies: Amplitude, AutoGrid, BigID, Cordial, Dremio, Kong, Multicloud, Reltio, Stardog, WireWheel, companies that we've talked to. We've researched. And they're going to present today from 10 for the rest of the day. What's your thoughts? >> Well, John, a lot of these companies were just sort of, last decade, they were really in tire-kicker mode, experimentation mode. Now they're well on their way to hitting escape velocity, which is very exciting. And they're hitting tens of millions of dollars of ARR, many are planning IPOs, and it's just, it's really great to see what the cloud has enabled, and we're going to dig into that very deeply today. So I'm super excited. >> Before we jump into the keynote, (mumbles) let's bring Jeremy from AWS up on stage. Jeremy is the brains behind this program that we're doing. We're going to do this quarterly. Jeremy, great to see you, you're in the global startups program at AWS. Your job is to keep the crops growing, keep the startups going and keep the flow of innovation. Thanks for joining us. >> Yeah. Made it to startup showcase day. I'm super excited. And as you mentioned my team the global startup program team, we kind of provide white glove service for VC backed startups and help them with go to market activities. Co-selling with AWS and we've been looking for ways to highlight all the great work they're doing and partnering with you guys has been tremendous. You guys really know how to bring their stories to life. So super excited about all the partner sessions today. >> Well, I really appreciate the vision and working with Amazon, this is like truly a bar raiser from theCUBE virtual perspective; using the virtual we can get more content, more flow, and great to have you on and bring the top hot startups around data, data ops. Certainly the most important story in tech is cloud scale with data. You can't look around without seeing more innovation happening. So I really appreciate the work. Thanks for coming on. >> Yeah, and don't forget, we're making this a quarterly series. So the next one we've already been working on it. The next one is Wednesday, June 16th. So mark your calendars, but super excited to continue doing these showcases with you guys in the future. >> Thanks for coming on Jeremy. I really appreciate it. Dave, so I want to just quickly, before we get Jeff up here, Jeff Barr, who's a luminary guest for us this week, who has been in the industry, been there from the beginning of AWS, the role of data, and what's happened in cloud. And we've been watching the evolution of Amazon web services from the beginning, from the startup market to dominating the enterprise. If you look at the top 10 enterprise companies, Amazon wasn't on that list in 2010, they weren't even in the top 10. Andy Jassy's keynote at reinvent this past year
highlighted that fact; I think they were number five or four as a vendor, and that's just AWS. So interesting to see that you've been reporting and doing a lot of analysis on the role of data. What's your analysis for these startups, as businesses need to embrace the new technologies and be on the right side of history, not part of that old guard, incumbent, failed model? >> Well, I think again, if you look back on the early days of cloud, it was really about storage and networking and compute infrastructure. And then we collected all this data and now you're seeing the next generation of innovation and value. We're going to talk to Michael Liebow about this. It's really, if you look at all the value points and the levers, it's all around data, and data is going through a massive change in the way that we think about it, that we talk about it. And you hear that a lot. Obviously you talk about the volumes, the giant volumes, but there's something else going on as AWS brings the cloud to the edge. And of course it looks at the data center as just another edge device; data is getting highly decentralized. And what we're seeing is data getting into the hands of business owners and data product builders. I think we're going to see a new parlance emerge and that's where you're seeing the competitive advantage. And if you look at all the real winners these days in the marketplace, especially in digital with COVID, it all comes back to the data. And we're going to talk about that a lot today. >> One of the things that's coming up in all of our cube interviews, certainly we've seen, I mean we've had a great observation space across all the ecosystems, but the clear thing that's coming out of COVID is speed, agility, scale, and data. If you don't have that data you are going to be a non-player. And I think I heard some industry people talking about the future of how the stock market's going to work, and that if you're not truly in market with an AI or machine learning data value play you probably will be shorted on the stock market or delisted. I think people are looking at that as a table stakes competitive advantage item, where if you don't have some sort of data competitive strategy you're going to be either delisted or sold short. And that's, I don't think delisted, but the point is, this is table stakes, Dave. >> Well, I think too, I think the whole language, the lingua franca of data, is changing. We talk about data as an asset all the time, but you think about it now, what do we do with assets? We protect it, we hide it. And we kind of don't share it. But then on the other hand, everybody talks about sharing the data and that is a huge trend in the marketplace. And so I think that everybody is really starting to rethink the whole concept of data, what it is, its value and how we think about it, talk about it, share it, make it accessible, and at the same time, protect it and make it governed. And I think you're seeing computational governance and automation really hitting. Couldn't do this without the cloud. I mean, that's the bottom line. >> Well, I'm super excited to have Jeff Barr here from AWS as our special keynote guest. I've been following Jeff's career for a long, long time. He's a luminary, he's technical, he's in the industry. He's part of the community, he's been there from the beginning. AWS just celebrated its 15th birthday, and he was blogging hard. He's been a hardcore blogger. I think Jeff, you had one of the original ping services.
If I remember correctly, you were part of the web services foundation, kind of present at creation. No better guest to have. Jeff, thanks for coming up on our stage. >> John and Dave, really happy to be here. >> So I got to ask you, you've been blogging hard for the past decade or so, going hard, and your job has evolved from blogging about what's new with Amazon, a couple of building blocks, a few services, to last reinvent. You must have put out, I don't know, how many blog posts did you put out last year at every event? I mean, it must have been a zillion. >> Not quite a zillion. I think I personally wrote somewhere between 20 and 25, including quite a few that I did in the month or so run-up to reinvent, and it's always intense, but it's always really, really fun. >> So I've got to ask you, in the past couple of years, I mean I quoted Andy Jassy's keynote where we highlighted in 2010 Amazon wasn't even on the top 10 enterprise players. Now in the top five, you've seen the evolution. What is the big takeaway from your standpoint as you look at the enterprise? Going from Amazon really dominating the startup arena, startups today, you're in the cloud, you're born in the cloud. There's an advantage to that. Now enterprises are kind of being reborn in the cloud at the same time, they're building these new use cases, rejuvenating themselves and having an innovation strategy. What's your takeaway? >> So I love to work with our customers and one of the things that I hear over and over again, and especially the last year or two, is really the value that they're placing on building a workforce that has really strong cloud skills. They're investing in education. They're focusing on this neat phrase that I learned in Australia called upskilling and saying let's take our set of employees and improve their skill base. I hear companies really saying we're going to go cloud first. We're going to be cloud native. We're going to really embrace it, adopt the full set of cloud services and APIs. And I also see that they're really looking at cloud as part of often a bigger picture. They often use the phrase digital transformation; in Amazon terms we'd say they're thinking big. They're really looking beyond where they are and who they are to what they could be and what they could grow into. Really putting a lot of energy and creativity into thinking forward in that way. >> I wonder, Jeff, if you could talk about sort of how people are thinking about the future of cloud. If you look at where the spending action is, obviously you see it in cloud computing. We've seen that as the move to digital; serverless, Lambda is huge. If you look at the data it's off the charts; machine learning and AI also up there, containers and of course, automation, AWS leads in all of those. And they portend a different sort of programming model, a different way of thinking about how to deploy workloads and applications, maybe different than the early days of cloud. What's driving that generally, and I'm interested in serverless specifically. And how do you see the next several years folding out? >> Well, they always say that the future is the hardest thing to predict, but when I talk to our enterprise customers, the two really big things that I see are this focus that says we need to really, we're not simply like hosting the website or running the MRP.
I'm working with one customer in particular where they say, well, we're going to start on the factory floor all the way up to the boardroom, effectively from IOT and sensors on the factory floor to feed all the data into machine learning, so they understand that the factory is running really well, to actually doing planning and inventory maintenance, to putting it on the website to drive the analytics, to then saying, okay, well how do we know that we're building the right product mix? How do we know that we're getting it out through the right channels? How are our customers doing? So they're really saying there's so many different services available to us in the cloud and they're relatively easy and straightforward to deploy. They really don't think, as in the old days we talked about earlier, where there were these multi-year planning and deployment cycles; now it's much more straightforward. It's like let's see what we can do today. And this week and this month, and from idea to some initial results is a much, much shorter turnaround. So they can iterate a lot more quickly, which is just always known to produce better results. >> Well, Jeff, in the spirit of the 15th birthday of AWS, a lot of services have been built from the original three, I believe it was, the core building blocks, and there's been a lot of history, and it's kind of like there was a key decoupling of compute from storage, those innovations. What's the most important architectural change, if any, that has happened or been built upon those building blocks with AWS that you could share with companies out there? As many people are coming into the cloud, not just lifting and shifting and having that innovation, but really building cloud native, and now hybrid, full cloud operations, day two operations. However you want to look at it. That's a big thing. What architecturally has changed that's been innovative from those original building blocks? >> Well, I think that the basic architecture has proven to be very, very resilient. When I wrote about the 15-year birthday of Amazon S3 a couple of weeks ago, one thing that I thought was really incredible was the fact that the same APIs that you could have used 15 years ago, they all still work. The put, the get, the list, the delete, the permissions management, every last one of those were chosen with extreme care. And so they all still work. So one of the things you think about when you put APIs out there is, in Amazon terms we always talk about going through a one-way door, and a one-way door says, once you do it, you're committed for the indefinite future. And so we're very happy to do that, but we take those steps with extreme care. And so those basic building blocks, so the original S3 APIs, the original EC2 APIs and the model, all those things really worked. But now they're running at this just insane scale. One thing that blows me away: I routinely hear my colleagues talking about petabytes and exabytes, and we throw around trillions and quadrillions like they're pennies. It's kind of amazing. Sometimes when you hear the scale of requests per day or requests per month, the orders of magnitude are such that you can't map them back to reality anymore. They're simply like literally astronomical. >> If I can just jump in real quick, Dave, before you ask, Jeff, I was watching the Jeff Bezos interview in 1999 that's been going around on LinkedIn, a 60 Minutes interview. The interviewer says you are reporting that you can store a gigabyte of customer data from all their purchases.
What are you going to do with that? He basically nailed the answer. This is in '99. We're going to use that data to create... that was only a gig. >> Well, one of the things that is interesting to me, guys, is if you look again at the early days of cloud, of course I always talked about how small companies like ours, John, could now have access to information technology that only big companies could get access to. And now you've seen, we're just going to talk about it today, all these startups rise up and reach viability. But at the same time, Jeff, you've seen big companies get the aha moment on cloud, and competition drives urgency and that drives innovation. And so now you see everybody is doing cloud, it's a mandate. And so the expectation is a lot more innovation, experimentation and speed from all ends. It's really exciting to see. >> I know this sounds hackneyed and overused but it really, really still feels just like day one. We're 15 plus years into this. I still wake up every morning, like, wow, what is the coolest thing that I'm going to get to learn about and write about today? We have the most amazing customers, one of the things that is great when you're so well connected to your customers, they keep telling you about their dreams, their aspirations, their use cases. And we can just take that and say we can actually build awesome things to help you address those use cases from the ground on up, from building custom hardware things like the Nitro system, the Graviton, to the machine learning inferencing and training chips, where we have such insight into customer use cases because we have these awesome customers that we can make these incredible pieces of hardware and software to really address those use cases. >> I'm glad you brought that up. This is another big change, right? In the early days of cloud it was like, oh, Amazon, they're just using off-the-shelf components. They're not buying these big refrigerator-sized disc drives. And now you're developing all this custom silicon and vertical integration in certain aspects of your business. And that's because the workload is demanding. You've got to get more specialized in a lot of cases. >> Indeed they do. And if you watch Peter DeSantis' keynote at re-invent, he talked about the fact that we're researching ways to make better cement that actually produces less carbon dioxide. So we're now literally at the from-the-ground-up level of construction. >> Jeff, I want to get a question from the crowd here. We got (mumbles), who's a good friend of theCUBE, Clouderati from the beginning. He asked you, he wants to know if you'd like to share Amazon's edge aspirations. He says, he goes, I mean, roadmaps. I go, first of all, he's not going to talk about the roadmaps, but what can you share? I mean, obviously the edge is key. Outpost has been all in the news. You obviously see CloudOps is not a boundary. It's a distributed network. What's your response to-- >> Well, the funny thing is we don't generally have technology roadmaps inside the company. The roadmap is always listen really well to customers, not just where they are, but the customers are just so great at saying, this is where we'd like to go. And when we hear edge, the customers don't generally come to us and say edge, they say we need as low latency as possible between where the action happens within our factory floors and our own offices and where we might be able to compute, analyze, store, make decisions.
And so that's resulted in things like Outposts, where we can put Outposts in their own data center or their own field office; Wavelength, where we're working with 5G telecom providers to put computing and storage in the carrier hubs of the various 5G providers, again, with reducing latency; we've been doing things like Local Zones, where we put zones in an increasing number of cities across the country, with the goal of just reducing the average latency between the vast majority of customers and AWS resources. So instead of thinking edge, we really think in terms of how do we make sure that our customers can realize their dreams. >> Staying on the flywheel that AWS has built on: ship stuff faster, make things faster, smaller, cheaper. Great mission. I want to ask you about the working backwards document. I know it's been getting a lot of public awareness. I've been, that's all I've learned in interviewing Amazon folks. They always work backwards, they always mention the customer in all the interviews. So you've got a couple of customer references in there, check the box there for you. But working backwards has become kind of a guiding principle, almost like a Harvard Business School case study approach to management. As you guys look at this working backwards, and ex-Amazonians have written books about it now, so people can go look at it, it's a really good methodology. Take us back to how you guys work backwards from the customers, because here we're featuring 10 startups. So companies that are out there, and Andy has been preaching this to customers: you should think about working backwards because it's so fast. These companies are going into this enterprise market, your ecosystem of startups, to provide value. What things are you seeing that customers need to think about to work backwards from their customer? How do you see that? 'Cause you've been on the community side, you see the tech side; customers have to move fast and work backwards. What are the things that they need to focus on? What's your observation? >> So there's actually a brand new book called "Working Backwards," which I actually learned a lot about our own company from simply reading the book. And I think to me, a principal part of working backwards is really about humility and being able to be a great listener. So you don't walk into a customer meeting ready to just broadcast the latest and greatest that we've been working on. You walk in and say, I'm here from AWS and I simply want to learn more about who you are, what you're doing. And most importantly, what do you want to do that we're not able to help you with right now? And then once we hear those kinds of things, we don't simply write down kind of a bullet item of AWS needs to improve. It's this very active listening process. Tell me a little bit more about this challenge, and if we solve it in this way or this way, which one's a better fit for your needs. And then a typical AWS launch, we might talk to between 50 and 100 customers in depth to make sure that we have that detailed understanding of what they would like to do. We can't always meet all the needs of these customers, but the idea is let's see what is the common base that we can address first. And then once we get that first iteration out there, let's keep listening, let's keep making it better and better and better as quickly as possible. >> A lot of people might pooh-pooh that, John, but I got to tell you, you will remember this, the first time we ever met Andy Jassy face-to-face. I was in the room, you were on the speaker phone.
We were building an app on AWS at the time. And he was asking you, John, for feedback. And he was probing, and he pulled out his notebook. He was writing it down, and these weren't just superficial questions. He was like, well, why'd you do it that way? And he really wanted to dig. So this is cultural. >> Yeah. I mean, that's the classic Amazon. And the best thing about it is that you can go from a zero-stage startup to traction. And that was the premise of the cloud. Jeff, I want to get your thoughts and commentary on this, love to get your opinion. You've seen this grow from the beginning. And I remember, 'cause I've been playing with AWS since the beginning as well. And as an entrepreneur, I remember my first EC2 instance didn't even have custom domain support. It was the long URL. You've seen the startups, and now that we've been 15 years in, you see Dropbox, which was just a startup back in the day. I remember these startups that when they were coming, they were all born on Amazon, right? These big now unicorns, you were there when these guys were just developers and these gals. So what's it like, I mean, you see just the growth, like here's a couple of people with ideas rubbing nickels together, making magic happen, who knows what it's going to turn into, you've been there. What's it been like? >> It's been a really unique journey. And to me like the privilege of a lifetime, honestly. I've like, you always want to be part of something amazing and you aspire to it and you study hard and you work hard and you always think, okay, somewhere in this universe something really cool is about to happen. And if you're really, really lucky and just a million great pieces of luck like line up in series, sometimes it actually all works out and you get to be part of something like this. When it does, you don't always fully appreciate just how awesome it is from the inside, because you're just there just like feeding the machine and you are just doing your job just as fast as you possibly can. And in my case, it was listening to teams and writing blog posts about their launches and sharing them on social media, going out and speaking, you do it, you do it as quickly as possible. You're kind of running your whole life as you're doing that as well. And suddenly you just take a little step back and say, wow, we did this kind of amazing thing, but we don't tend to like relax and say, okay, we've done it at Amazon. We get to a certain point. We recognize it. And five minutes later, we're like, okay, let's do the next amazingly good thing. But it's been this just unique privilege and something that I never thought I'd be fortunate enough to be a part of. >> Well, in the last few minutes we have, Jeff, I really appreciate you taking the time to spend with us for this inaugural launch of theCUBE on cloud startup showcase. We are showcasing 10 startups here from your ecosystem. And a lot of people know AWS; for the folks that don't, you guys pride yourselves on community and ecosystem, the global startups program that Jeremy and his team are running. You guys nurture these startups. You want them to be successful. They're vectoring out into the marketplace with growth strategy, helping customers. What's your take on this ecosystem? As customers are out there listening to this, what's your advice to them? How should they engage? Why are these sets of startups so important?
And I think we're in this incredible time now where it's so easy and straightforward to get those basic resources, to get your compute, to get your storage, to get your databases, to get your machine learning, and to take that and to really focus on your customers and to build what you want. And we see this actual exponential growth. And we see these startups that find something to do. They listen to one of their customers, they build that solution, and then that feedback cycle gets started. It's really incredible. And I love to see the energy of these startups. I love to hear from them. And at any point if we've got an AWS powered startup and they build something awesome and want to share it with me, I'm all ears. I love to hear about them. Emails, Twitter mentions, whatever, I'd just love to hear about all this energy, all those great successes with our startups. >> Jeff Barr, thank you for coming on. And congratulations, please pass on to Andy Jassy, who's going to take over for Jeff Bezos. I saw the big news that he's picking a successor, an Amazonian coming back into the fold, Adam. So congratulations on that. >> I will definitely pass on your congratulations to Andy, and I worked with Adam in the past when AWS was just getting started, and really looking forward to seeing him again, welcoming him back and working with him. >> All right, Jeff Barr with AWS. Guys, check out his Twitter and all the social coordinates. He is pumping out all the resources you need to know about if you're a developer or you're an enterprise looking to go to the next level, next generation, modern infrastructure. Thanks Jeff for coming on. Really appreciate it. Our next guest, I want to bring up on stage: Michael Liebow from McKinsey, a CUBE alumni, who is a great guest, very timely in his McKinsey role, with a paper he and his colleagues put out called "Cloud's trillion dollar prize, up for grabs." Michael, thank you for coming up on stage with Dave and I. >> Hey, great to be here, John. Thank you. >> One of the things I loved about this, and why I wanted you to come on, was not only is the report awesome, and Dave has got a zillion questions he wants us to drill into. But in 2015, we wrote a story called "Andy Jassy trillion dollar baby" on Forbes, and then on Medium and Silicon Angle, where we were the first ones to profile Andy Jassy and talk about this trillion dollar term. And Dave came up with the calculation and people thought we were crazy. What are you talking about, trillion dollar opportunity? That was in 2015. You guys have put this together with a serious research report with methodology, and you left a lot on the table. I noticed in the report you didn't even have a whole section quantified. So I think a trillion is just scratching the surface; it'd be a little light, Dave. So let's dig into it. Michael, thanks for coming on.
Because a lot of people talk about the infrastructure or technical value of cloud, but that actually is a big problem, because it just scratches the surface of the potential of what cloud can mean. And we focused on the Fortune 500, so we had to box ourselves in somewhat. And so, focusing on the Fortune 500 and fast-forwarding to 2030, we put out this number that there's over a trillion dollars' worth of value. And we did a lot of analysis, using research from a variety of partners, using third-party research and primary research, in order to come up with this view. So the business value is two X the technical value of cloud. And as you just pointed out, there is a whole unlock of additional value where organizations can pioneer on some of the newest technologies. And so AWS and others are creating platforms in order to do not just machine learning and analytics and IoT, but also quantum, or mixed reality, or blockchain. And organizations, specifically around the Fortune 500, that aren't leveraging these capabilities today are going to get left behind. And that's the message we were trying to deliver: if you're not doing this, and doing this with purpose and with great execution, then others, whether it's others in your industry or upstarts who are moving into your industry, will get ahead of you, because, as you say, cloud democratizes compute. It provides these capabilities, and small companies with talent and skills can leverage them ahead of slow-moving incumbents. And I think that was the critical component. So that gives you the framework. We can deep dive based on your questions. >> Well, before we get into the deep dive, I want to ask you: we have startups being showcased here as part of this showcase, coming out of the ecosystem. They have a lot of certification from Amazon and they're secure, which is a big issue. Enterprises that you guys talk to, McKinsey speaks directly to, I call it the boardroom, CXOs, the top executives. Are they realizing the scale and timing of this agility window? I want to go through these key areas that you break out, but as startups become more relevant, do the boardrooms that are making these big decisions realize that their businesses are up for grabs? Do they realize that all this wealth is shifting? And do they see the role of startups helping them? How did you come at that and report on that piece? >> Well, in terms of the whole notion, we came up with this framework which looked at the opportunity. We talked about it in terms of three dimensions: rejuvenate, innovate, and pioneer. And so from the standpoint of a board, they're focused on not just efficiency and cost reduction, basically tied to rejuvenation, but innovation tied to analytics, tied to machine learning, tied to IoT, tied to two key attributes of cloud: speed and scale. And one of the things that we did in the paper was leverage case examples from across industries and across regions; there are 17 different case examples. My three favorites: one is Moderna. The "software of life" couldn't have delivered the vaccine as fast as they did without cloud. My second example is Goldman Sachs, which got into consumer banking and is the platform behind the Apple Card; they couldn't have done it without leveraging cloud. And the third example, particularly in the early days of the pandemic, was Zoom, which added five to six thousand servers a night in order to scale to meet the demand.
And so all three of those examples, plus the other fourteen, indicate in business terms what the potential is, and should convince boards and the C-suite, and we have some recommendations in terms of what CEOs should do in order to leverage this, to really take advantage of those capabilities. >> Michael, I think it's important to point out the approach. Sometimes it gets a little wonky on the methodology, but having done a lot of these types of studies, and observed, there are a lot of superficial studies out there. A lot of times people will go, I'll talk to a customer, what kind of ROI did you get? And boom, that's the value study. You took a different approach. You have benchmark data, you talked to a lot of companies, you obviously have a lot of financial data. You used some third-party data, you built models, you bounded it. And ultimately, when you do these things, you have to ascribe a value contribution to the cloud component, because Fortune 500 companies are going to grow even if there were no cloud. And the way you did that is, again, you talk to people, you model things; it's a very detailed study. And I think it's worth pointing out that this was not just, hey, what did you get from going to cloud, before and after. This was a very detailed deep dive, with a lot of good background work going into it. >> Yeah, we're very fortunate to have the McKinsey Global Institute, which has done extensive studies in these areas, so there was a base of knowledge that we could leverage. In fact, we looked at over 700 use cases across 19 industries in order to unpack the value that cloud contributed to those use cases. Getting down to that level of specificity really helps build it from the bottom up, and then we used cloud measures, or KPIs, that indicate the value, like how much faster you can deploy, how much faster you can develop. So these are things that help to inform the overall model. >> Yeah. Again, having done hundreds, if not thousands, of these types of things, when you start talking to people the patterns emerge. I want to ask you, there's an exhibit in here which goes right at those use cases: retail, healthcare, high-tech, oil and gas, banking, and a lot of examples. And I went through them all, and in virtually every single one of them, from a value-contribution standpoint, the unlocking of value came down to data: large data sets, document analysis, sentiment analysis, analytics. I mean, it really does come down to the data. And I wonder if you could comment on that, and on why it is that cloud has enabled that? >> Well, it goes back to scale. And I think the phrase that I would use would be data gravity, because we're talking about massive amounts of data. So as you go through those three dimensions, in terms of rejuvenation, one of the things you can do as you optimize and clarify and build better resiliency, the thing that comes into play, I think, is to have clean data, data that's available in multiple places, so that you can create an underlying platform in order to leverage the services and capabilities around building out that structure. >> And then, if I may: you had this, again I want to stress, as EBITDA. It's not revenue; it's the EBITDA potential as a result of leveraging cloud. And you listed a number of industries, and I wonder if you could comment on the patterns that you saw. I mean, it doesn't seem to be as simple as Negroponte's bits versus atoms in terms of your ability to unlock value.
What are the patterns that you saw there, and why do the ones that have so much potential sit at the top of the list? >> Well, I mean, they're ranked based on impact. So the five greatest industries, again, aligned to the Fortune 500. It's interesting when you start to unpack it that way: high-tech, oil and gas, retail, healthcare, insurance, and banking, right? Top. And so we did look at the different solutions that were in there, and tried to decipher what was fully unlocked by cloud, what was accelerated by cloud, and what was perhaps, in this timeframe, remaining on premise. And so we, step by step, expert by expert, use case by use case, deciphered how that applied across the 700. >> So how should practitioners within organizations use this data? What would you recommend, in terms of how they think about it, how they apply it to their business, how they communicate it? >> Well, I think what clearly came out was a set of best practices for organizations that were leveraging cloud and getting that kind of business return. Three things stood out: execution, experience, and excellence. Under execution, it's not just the transaction; you're not just buying cloud, you're changing your operating model. And so if the organization isn't retooling the model, the processes, and the workflows to support it, creating the roles, then they aren't going to be successful. In terms of experience, that's all about hands-on. You have to dive in, you have to start, you have to apply yourself, you have to gain that applied knowledge. If you're not gaining that experience, you're not going to move forward. And then in terms of excellence, and it was mentioned earlier by Jeff: re-skilling, up-skilling. If you're not committed to your workforce, pushing certification, pushing training, in order to really evolve your workforce and your ways of working, you're not going to leverage cloud. So those three best practices really came out on top in terms of what a mature cloud adopter looks like. >> That's awesome. Michael, thank you for coming on. Really appreciate it. Last question I have for you, as we wrap up this trillion-dollar segment, pun intended, is the cloud mindset. You mentioned partnering and scaling up. The role of the enterprise and business is to partner with the technologists, not just the technologies but the companies. Talk about this cloud-native mindset, because it's not just lift-and-shift and run apps and an IT-optimization issue. It's about innovating next-gen solutions, and you're seeing it in the public sector, you're seeing it in the commercial sector, all areas where the relationship with partners and companies, and startups in particular, this is the startup showcase, these startups are more relevant than ever as the tide is shifting to a new generation of companies. >> Yeah, so think about an engine. A lot of things have to work in order to produce the kind of results that we're talking about, to grab your more-than-fair share, or unfair share, of the trillion dollars. And so CEOs need to lead this in bold fashion. Number one, they need to craft the moonshot, or the Mars shot. They have to set that goal, that aspiration, and it has to be a stretch goal for the organization, because cloud is the only way to enable the achievement of that aspiration. That's number one. Number two, they really need a hard-headed economic case. It has to be defined in terms of what the expectation is going to be.
So it's not loose; it's very, very well defined, and in some respects time-boxed: what can we do here? Third, your organization has to move in an agile fashion, with training and DevOps. And the fourth thing, and this is where the startups come in, is the cloud platform. There has to be an underlying platform that supports those aspirations. It's not just an architecture; it's a living, breathing live service, with integrations, with standardization, with self-service, that enables this whole program. >> Awesome, Michael, thank you for coming on and sharing the McKinsey perspective. The report, "Cloud's trillion-dollar prize is up for grabs": everyone who's registered for this event will get a copy, and it's also on the website. We'll make sure everyone gets a copy. Thanks for coming, I appreciate it. >> Thank you. >> Thanks, Michael. >> Okay, Dave, big discussion there. Trillion-dollar baby. That's the cloud. That's Jassy. Now he's going to be the CEO of Amazon, and they've announced a new CEO for AWS, so Amazon's got clarity on the succession to Jassy, a trusted soldier. The ecosystem is big for Amazon. Unlike Microsoft, they have a different view, right? They have some apps, but they're cultivating as many startups and enterprises as possible in the cloud. And no better reason to change gears here and get a venture capitalist in here, a friend of theCUBE, Jerry Chen. Let's bring him up on stage. Jerry Chen, great to see you, partner at Greylock, making all the big investments. Good to see you. >> John, hey Dave, it's great to be here with you guys. Happy March. Can you see that? >> Hey Jerry, good to see you, man. >> So Jerry, this is our first inaugural AWS startup showcase. We'll be doing these quarterly, and we're going to be featuring the best of the best. You're investing in all the hot startups. We've been tracking your career from the beginning. You're a good friend of theCUBE, always got great commentary. Why are startups more important than ever before? Because in the old days, we've talked about this on theCUBE before, startups had to go through certain certifications, you've got tire-kicking, you've got to go through IT. It's like going through security at the airport: take your shoes off, take your belt off, the whole thing. I mean, all kinds of things are now different. The world has changed. What's your take? >> I think startups have always been a great vehicle for experimentation, right? It's either new technologies, new business models, or new markets. They can move faster, they experiment, and a lot of startups don't work, unfortunately, but a lot of them turn out to be multi-billion-dollar companies. I think startups are more important now because, as we come out of COVID and the economy recovers, they're a great way for individuals, engineers, companies, and different markets to try different things out. And I think startups are running multiple experiments at the same time across the globe, trying to figure out how to do things better, faster, cheaper. >> And McKinsey points out this use case of rejuvenate, which is essentially retool, pivot, get your costs down; and then the innovation piece, where there's TAM, a trillion dollars of unlocked value, and where the bulk of it is, is the innovation, the new use cases, and existing use cases made new. This is where the enterprises really have an opportunity.
Could you share your thoughts, as you invest in startups to attack these new waves, these new areas, where it may not look the same as before? What's your assessment of this kind of innovation, these new use cases? >> I think we talked last time about the change COVID brought this past year, and there's been an acceleration of things like how we work, education, medicine; all these things are going online. So I think that's very clear. The first wave of innovation is like, hey, things we didn't think could be possible, like working remotely, e-commerce everywhere, telemedicine, tele-education; that's happening. I think the second-order effect now is, okay, as enterprises realize that this is the new reality, everything is digital, everything is in the cloud, and everything's going to be more of an electronic relationship with the customers. I think we're rethinking what it means to be a business. What does it mean to be a bank? What does it mean to be a car company or an energy company? What does it mean to be a retailer? Right? So I think the rethinking is that brands are now global, brands are all online, and they now have relationships with the customers directly. So if you are a business now, you have to re-experiment or rethink your business model. If you thought you were Nike selling shoes to the retailers: like half of Nike's revenue is now digital, right, all online. So instead of selling sneakers through stores, they're now a direct-to-consumer brand. And so I think every business is going to rethink what they are. Airbnb: are they in the travel business or the experience business, right? Airlines, what business are they in? >> Yeah, theCUBE, we're direct-to-consumer, virtual, it totally opened up our business model. Dave, the cloud premise is interesting now. I mean, let's reset where we are, right? Andy Jassy always talks about the old guard, new guard. Okay, we've been there, done that, even though they still have a lot of Oracle inside AWS, which we were joking about the other day. But in this new modern era coming out of COVID, Jerry brings this up, these startups are going to be relevant and take territory in the enterprises as new things develop. What's your premise of the cloud and the AWS prospect? >> Well, so Jerry, I want to ask you. >> Jerry: Yeah. >> The other night, last Thursday, I think we were on Clubhouse. Ben Horowitz was on, and Martin Casado was laying out this sort of premise about cloud startups, saying basically at some point they're going to have to repatriate because of the Amazon vig. I mean, I'm paraphrasing, and I guess the premise was that there's this variable cost that grows as you scale. But I kind of shook my head, and I went back; you saw, I put out on Twitter a clip that we had from a couple of years ago, and I certainly didn't see it that way. Maybe I'm getting it wrong, but what's your take on that? I just don't see a Snowflake ever saying, okay, we're going to go build our own data center, or we're going to repatriate, 'cause they're going to end up like ServiceNow and have this high-cost infrastructure. What do you think? >> Yeah, look, I think Martin is an old friend from VMware and he's brilliant, and he has a lot of insights. There is some insight there: at some point, at scale, a startup can probably run things more cost-effectively in its own data center, right? But I think that's a small number of companies, not the vast majority, right?
At some point. But number two, to your point, Dave, going on-premise versus your own data center are two different things. On-premise, in a customer's environment, versus your own data center are two different worlds. So at some point, at some scale, a lot of the large SaaS companies run their own data centers, and that makes sense; Facebook and Google are at scale, they run their own data centers. Going on-premise, into a customer's environment, like a Fortune 100 bank or something like that, that's a different story. There are reasons to do that, around compliance or data gravity, Dave, but Amazon's cost, I don't think, is a legitimate reason. If price is an issue, that can be solved much faster than architectural decisions or tech stacks can, right? Once you're in the cloud, I think the thesis, the conversation we had like a year ago, is that the way you build apps is very different in the cloud from the way you built apps on-premise, right? You assume storage, networking, and compute elasticity that are independent of each other. You don't really get that in a customer's data center or their own environment, even with all the new technologies. So you can't really go from cloud back to on-premise, because the way you build your apps looks very, very different. So I would say: for sure, at some scale, run your own data center; that's why the hyperscale guys do that. On-premise for customers: data gravity, compliance, and governance are great reasons to go on-premise. But for the vast majority of startups and the vast majority of customers, the network effects you get from being in the cloud, the network effects you get from having everything in this elastic set of cloud services, I think, outweigh any of the costs. >> I couldn't agree more, and that's where the data is. The way I look at it is, your technology spend is going to be some percentage of revenue, and it's going to be generally flat over time, and you're going to have to manage it whether it's in the cloud or it's on-prem, John. >> Yeah, we had a quote on theCUBE in a conversation, Jerry, that I want to get your reaction to. The executive said, if you don't have an AI strategy built into your value proposition, you will be shorted as a stock on Wall Street. And I even went further: you'll probably be delisted, 'cause you won't be performing; a tongue-in-cheek comment. But the reality is, that's indicating that everyone has to have AI in their thing. What's your take on that? I know you've got a lot of investments in this area. As AI goes beyond fashion and becomes table stakes, where are we on that spectrum? And how does that impact business and society as it becomes a key part of the application stack? >> Yeah, I think, John, you've seen AI and machine learning go from some kind of novelty thing that a bunch of CS professors were working on years ago, to a fundamental piece of every application. So I would say the statement's sentiment is directionally correct: 20 years ago, if you didn't have a web strategy or a website as a company, your company would be shorted, right? If you didn't have an internet website, you weren't a real company. Likewise, if you don't use AI or machine learning now to power your applications, in some form or fashion, for sure you'd be at a competitive disadvantage to everyone else. And just like if you're not using software intelligently, or the cloud intelligently, your stock as a company is going to underperform the rest of the market.
And the cloud guys, and the startups that we're backing, are making AI so accessible and so easy for developers today that it's really easy to use some level of machine learning in any application. If you're not doing that, it's like not having a website in 1999. >> Yeah. So let's get into that whole operations side. What would your advice be to the enterprises that are watching, and the people who are making decisions on architecture and how they roll out their business model or value proposition? How should they look at AI and operations? I mean, a big theme is day-two operations. You've got IT service management; all these things are being disrupted. What's the operational impact of this? What's your view on that? >> So I think two things. One thing that you and Dave both talked about: operations is the key. I mean, operations is not just the guts of the business but the actual people running the business, right? And we forget that one of the values of going to cloud, one of the values of these services, is that you not only have a different technology stack, all the bits; you have a different human stack, meaning the people running your cloud, running your data center, are now effectively outsourced to Amazon, Google, or Azure, right? Which I think is a big part of the Amazon vig, as Dave put it so eloquently on Twitter, right? You're really paying for those folks to carry pagers. Now take that to the next level. Operations is human beings, people intelligently trying to figure out how my business can run better, right? And that's either accelerating revenue or decreasing costs, improving my margin. So if you want to use machine learning, I would say there are two areas to think about. One is how I think about customers, right? We both talked about the amount of data being generated around enterprises and individuals. So, intelligently use machine learning to serve my customers better. Then, number two, AI and machine learning internally: how do I run my business better, right? Can I take cost out? Can I optimize my supply chain? Can I use my warehouses more efficiently, my logistics more efficiently? So one is how do I use AI and machine learning to be more customer-oriented, and number two, how can I take cost out and be more efficient as a company, by running AI internally, from finance to ops, et cetera. >> So, Jerry, I wonder if I could ask you about a little different subject, a question on tech valuations: how coupled or decoupled are private-company valuations from the public markets? You're seeing the public markets, everybody's freaking out 'cause interest rates are going to go up, so the future value of cash flows is lower. Does that trickle quickly into the private markets? Or is it a whole different dynamic? >> If I could reliably call the private markets, Dave, I would have a different job than I do today. I think the reality is, in the long run, it doesn't matter as much, as long as you're investing early. Now, that's an easy answer to say. Yes, interest rates will probably go up, because it's hard for them to go lower, right? They're effectively almost zero to negative right now in most of the developed world. But at the end of the day, I'm not going to trade my Twilio shares or Salesforce shares for like a 1% yield bond, right?
I'm going to hold the high-growth tech stocks, because regardless of what interest rate you're giving me, 1%, 2%, 3%, I'm still going to beat that with the top tech performers: Snowflake, Twilio, HashiCorp, Elastic, a bunch of the companies out there. They're going to have a great 10-, 15-year run. And in the Greylock portfolio, the things we're investing in, I'm super bullish on, from Rockset to Chronosphere to TruEra in the AI space. I think in the long run, over the next 10 years, these things will outperform the market. That said, valuations have gone up, and they will come down; they have in the careers we've spent covering tech. So I do believe they're high now and they'll come down for sure. Will they go back up again? Definitely, right? But as long as you're betting on these macro waves, I think we'll all be good. >> Great answer as usual. Would you trade them for NFTs, Jerry? >> That $69 million Beeple piece of artwork? Look, I mean, I'm a long-term believer in kind of IP and property rights on the blockchain, right? And I'm waiting for theCUBE to mint this video as an NFT. When we do this, guys, we'll mint this video's NFT and see how much people pay for the original Dave, John, Jerry (mumbles). >> Hey, you know what? We can probably get some good bang for that. Hey, it's all about what's next, Jerry. Jerry, great to have you on. Final question, as we've got one minute left: what's your advice to the people out there that are engaging with these innovative startups? We're going to feature startups every quarter from the Amazon ecosystem; they are going to be adding value. What's the advice to the enterprises that are engaging startups, the approach, the posture? What's your advice? >> Yeah, when I talk to CIOs in large enterprises, they often are wary: hey, when do I engage a startup? For what businesses? And is it high risk or low risk? I say, just like managing any career, just like any investment you're making in a big or small company, you should have a budget or a set of projects. And I want to say to a CIO: hey, for every priority on your wish list, go use a startup, right? I mean, that would be 10 for 10 projects, 10 startups; probably too much risk for a lot of companies. But we would say to most CIOs and executives: look, there are strategic initiatives in your business that you want to accelerate, and I would take the time to invest in one or two startups each quarter, selectively, right? Use the time, focus on fewer startups, go deep with them, because they can actually be game-changers in terms of inflecting your business. And what I mean by that is: don't pick too many startups, because you can't devote the time, but don't pick zero startups, because you're going to be left behind, right? You'd be shorted as a stock by the John, Dave, and Jerry hedge fund, apparently. But pick a handful of startups in your strategic areas, in your top three things. These could really be accelerators for your career. >> I have to ask you real quick, while you're here. We've seen DevOps and the infrastructure-as-code movement go full mainstream. That's really what we're living right now, that kind of first-generation commercialization of DevOps. Now DevSecOps. What are the trends that you've seen that are different from, say, a couple of years ago, now that we're in COVID, around how apps are being built? Is it security? Is it the data integration?
What can you share as a key app-stack impact (mumbles)? >> Yeah, I think there are two things. One is, security has always been a top priority, and I think that will only continue going forward, period, right? Security for sure. That's why you said DevOps, DevSecOps; security is often overlooked, but I think it will increasingly be more important. The second thing is, as Dave mentioned earlier, just the data around customers, the data on-premise or in the cloud; there's a ton of data out there. We keep saying this over and over again, like data's the new oil, et cetera. It's evolving and changing, because the way we're using and finding data is changing, in terms of the sources of data we're using and discovering, and also the speed of data, right? Going from batch to real-time is changing, and the speed of business has changed to go faster. So I think these are all things that we're thinking about: both security, and how you use your data faster and better. >> Yeah, you were on theCUBE a number of years ago, and I remember either John or I asked you whether you thought Amazon was going to go up the stack and start developing applications, and your answer was: you know what, I think no; I think they're going to enable a new set of disruptors to come in and disrupt the SaaS world. And I think that's largely playing out. And one of the interesting things about Adam Selipsky's appointment to the CEO role: he comes from Tableau. He really helped Tableau go from that sort of old-guard model to an ARR model, and obviously executed a great exit to Salesforce. And now I see companies like Salesforce and ServiceNow and Workday as potential for your scenario to really play out. They've got, in my view anyway, outdated pricing models. You look at how Snowflake's pricing is on a consumption basis, same with Datadog, same with Stripe, and new startups seem to really be leaning into the consumption-based pricing model. So what are your thoughts on that? And maybe thoughts on Adam, and thoughts on SaaS disruption? >> I think my thesis still holds. I don't think Adam Selipsky is going to go into the app space aggressively. I think Amazon wants to enable next-generation apps, and seeing some of the new services they're doing, they're kind of deconstructing apps, right? They're deconstructing the parts of CRM or e-commerce and offering them as services. So I think you're going to see Amazon continue to say: hey, for the core parts of an app, like payments or custom prediction or some of the machine-learning things around applications that you want to bake in, they're going to turn those things into APIs and sell those services, right? So you look at things like Stripe and Twilio, which are two of the biggest companies out there: they're not apps themselves, they're the components of the app, right? Either e-commerce or messaging and communications. So I can see Amazon going down that path. I think Adam is a great choice, right? He was an early AWS exec from the early days; later, to your point, Dave, he really helped take Tableau into kind of a cloud business, acquired by Salesforce, worked there for a few years under Benioff, the guy who created quote-unquote cloud, and now he's coming home again, back to Amazon. So I think it'll be exciting to see how Adam runs the business. >> And John, I think he's the perfect choice, because he's got operations chops and he knows how to... He can help the startups disrupt. >> Yeah, and he's been a trusted soldier of Jassy's from the beginning; he knows the DNA.
He's got some outside CEO experience. I think that was key. And Jassy's not going to give up Amazon's speed, but this is his baby, right? So he's got him in charge, and he's a trusted lieutenant. >> You think? You think he's going to hold the mic? >> Yeah. We've got to go. Jerry Chen, thank you very much for coming on. Really appreciate it. Great to see you. Thanks for coming on our inaugural theCUBE on Cloud AWS startup event. Now for the 10 startups: enjoy the sessions, and at 12:30 Pacific we're going to have the closing keynote. I'm John Furrier, for Dave Vellante and our special guests; thanks for watching, and enjoy the rest of the day and the 10 startups. (upbeat music)
Wikibon | Action Item, Feb 2018
>> Hi, I'm Peter Burris. Welcome to Action Item. (electronic music) There's an enormous net new array of software technologies that are available to businesses and enterprises to tend to some new classes of problems, and that means that there's an explosion in the number of problems that people perceive could be addressed, or solved, with software approaches. The whole world of how we're going to automate things differently with artificial intelligence, and any number of other software technologies, is being brought to bear on problems in ways that we never envisioned or never thought possible. That leads ultimately to a comparable explosion in the number of approaches to how we're going to solve some of these problems. That means new tooling, new models, and any number of other structures, conventions, and artifacts that are going to have to be factored by IT organizations and professionals in the technology industry as they conceive and put forward plans and approaches to solving some of these problems. Now, George, that leads to a question. Are we going to see an ongoing, ever-expanding array of approaches, or are we going to see some new kind of steady state that starts to simplify what happens, or how enterprises conceive of the role of software in solving problems? >> Well, we've had probably four decades of packaged applications being installed and defining, really, the systems of record, which first handled the order-to-cash process and then layered around that. Once we had CRM capabilities, we had sort of the lead-to-opportunity capability added in there. But systems of record are fundamentally backward-looking; they're tracking the performance of the business. The opportunity-- >> Peter: Recording what has happened? >> Yes, recording what has happened. The opportunity we have now is to combine what the big Internet companies pioneered, with systems of engagement, where you had machine learning anticipating and influencing interactions. You can now combine those sorts of analytics with systems of record to inform and automate decisions in the form of transactions. And the question is now, how are we going to do this? Is there some way to simplify, or, if not completely standardize, can we make it so that we have at least some conventions and design patterns for how to do that? >> And David, we've been working on this problem for quite some time, but the notion of convergence has been extant in the hardware and the services, or in the systems business, for quite some time. Take us through what convergence means and how it is going to set up new ways of thinking about software. >> So there's a hardware convergence, and it's useful to define a few terms. There are converged systems; those are systems which have had some management software brought into them, and on top of that they have traditional SANs and networks. There are hyper-converged systems, which started off in the cloud and have now come to the enterprise as well, and those bring software networking, software storage, software-- >> Software-defined, so it's a virtualizing of those converged systems. >> David: Absolutely. And in the future they're also going to bring automated operational capabilities, AI on the operational side. And then there's full-stack convergence, where we start to put in the software, the application software, beginning with the database side of things and then the application itself on top of the database.
And finally there's what you are talking about, the systems of intelligence, where we can combine the systems of record, the systems of engagement, and the real-time analytics as a complete stack. >> Peter: Let's talk about this for a second, because ultimately what I think you're saying is that we've got hardware convergence in the form of converged infrastructure, hyper-converged in the form of virtualization of that, new ways of thinking about how the stack comes together, and new ways of thinking about application components. But what seems to be the common thread through all of this is data. >> David: Yes. >> So basically what we're seeing is a convergence, or a rethinking, of how software elements revolve around the data. Is that kind of the centerpiece of this? >> David: That's the centerpiece of it. And we had very serious constraints on accessing data. Those will improve with flash, but there's still a lot of room for improvement. And the architecture that we are saying is going to come forward, which really helps this a lot, is the UniGrid architecture, where we offload the networking and the storage from the processor. This is already happening in the hyperscale clouds; they're putting a lot of effort into doing this. But we're at the same time allowing any processor to access any data in a much more fluid way, and we can grow that to thousands of processors. Now, that type of architecture gives us the ability to converge the traditional systems of record, and there are a lot of them obviously, and the systems of engagement, and the real-time analytics, for the first time. >> But the focal point of that convergence is not the licensing of the software; the focal point is convergence around the data. >> The data. >> But that has some pretty significant implications when we think about how software has always been sold, how organizations that run software have been structured, and the way that funding is set up within businesses. So George, what does it mean to talk about converging software around data, from a practical standpoint, over the next few years? >> Okay, so let me take that and interpret it as converging the software around data in the context of adding intelligence to our existing application portfolio, and then the new applications that follow on. And basically, when we want to inject enough intelligence to anticipate and inform interactions, or to inform or automate transactions, we have a bunch of steps that need to get done, where we're ingesting essentially contextual or ambient information; often this is information about a user or the business process. And this data has to go through a pipeline where there's both a Design Time and a Run Time. In addition to ingesting it, you have to enrich it and make it ready for analysis. Then the analysis is essentially picking out of all that data, and calculating, the features that you plug into a machine learning model. And then that produces essentially an inference based on all that data, that says, well, this is the probable value. It sounds like it's in the weeds, but the point is, it's actually a standardized set of steps. Then the question is, do you put that all together in one product across that whole pipeline? Can one piece of infrastructure software manage that? Or do you have a bunch of pieces, each handing off to the next?
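To make George's standardized steps concrete, here is a minimal sketch of such a pipeline in Python. Everything in it is illustrative: the function names, the profile store, and the toy linear "model" are assumptions for the example, not any vendor's API.

```python
from dataclasses import dataclass

@dataclass
class Inference:
    probable_value: float
    confidence: float

def ingest(event: dict) -> dict:
    """Ingest contextual/ambient information, e.g. about the user or process."""
    return {"user_id": event["user_id"], "raw": event}

def enrich(record: dict, profile_store: dict) -> dict:
    """Enrich the record so it is ready for analysis (join reference data)."""
    record["profile"] = profile_store.get(record["user_id"], {})
    return record

def featurize(record: dict) -> list[float]:
    """Calculate the features that feed the machine learning model."""
    profile = record["profile"]
    return [profile.get("visits_30d", 0), profile.get("avg_order_value", 0.0)]

def infer(features: list[float]) -> Inference:
    """Stand-in for a trained model producing a probable value."""
    score = 0.1 * features[0] + 0.01 * features[1]  # toy linear model
    return Inference(probable_value=score, confidence=0.9)

# The same four steps, whether one product runs them all or pieces hand off.
profiles = {"u1": {"visits_30d": 12, "avg_order_value": 54.0}}
event = {"user_id": "u1", "action": "checkout"}
result = infer(featurize(enrich(ingest(event), profiles)))
```

The shape of the work is the same whether one converged product runs the whole pipeline or specialized pieces hand off to each other, which is exactly the trade-off the discussion turns to next.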
>> Peter: But let me stop you, because I want to make sure that we follow this thread. So we've argued that hardware convergence, and the ability to scale the role that data plays, or how data is used, is happening, and that opens up new opportunities to think about data. Now what we've got is that we're centering a lot of the software convergence around the use of data, through copies and other types of mechanisms for handling snapshots and whatnot, and things like UniGrid. Let's start with this: it sounds like what you're saying is that we need to think of new classes of investments in technologies that are specifically set up for handling the processing of data in a more distributed application way, right? If I've got that right, that's kind of what we mean by pipelines? >> George: Yes. >> Okay, so once we do that, once we establish those conventions, once we establish organizationally, institutionally, how that's going to work, now we take the next step of saying: are we going to default to a single set of products, or are we going to do best of breed, and what kind of convergence are we going to see there? >> And there's no-- >> First of all, have I got that right? >> Yes, but there's no right answer. And I think there are a bunch of variables that we have to play with that depend on who the customer is. For instance, the very largest and most sophisticated tech companies are more comfortable taking multiple pieces, each very specialized, and putting them together in a pipeline. >> Facebook, Yahoo, Google-- >> George: LinkedIn. >> Got it. >> George: Those guys. And the knobs that they're playing with, that everyone's playing with, are three, basically, on the software side. There's your latency budget, which is how much time you have to produce an answer; that drives the transaction or the interaction. And that itself is not just a single number, because the goal isn't to get it as short as possible. The goal is to get as much information into the analysis within the budgeted latency. >> Peter: So it's packing the latency budget with data? >> George: Yes, because the more data that goes into making the inference, the better the inference. >> Got it. >> The example that someone used, actually on Fareed Zakaria GPS, one show talked about this: if he had 300 attributes describing a person, he could know more about that person than that person did, in terms of inferring other attributes. So the point is, once you've got your latency budget, the other two knobs that you can play with are development complexity and admin complexity. And the idea is, on development complexity, there are a bunch of abstractions that you have to deal with. If it's all one product, you're going to have one data model, one address and namespace convention, one programming model, one way of persisting data, a whole bunch of things. That's simplicity, and that makes it more accessible to mainstream organizations. Similarly, let me just add, there are probably two or three times as many constructs that admins would have to deal with; so again, if you're dealing with one product, it's a huge burden off the admin, and we know they struggled with Hadoop. >> So convergence, decisions about how to enact convergence, are going to be partly or strongly influenced by those three issues: latency budget, development complexity or simplicity, and administrative complexity.
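A minimal sketch of that "packing the latency budget" idea, with invented feature sources and rough cost estimates, looks like this: the loop keeps adding feature lookups until the next one would no longer fit in the budget, rather than simply answering as fast as possible.

```python
import time

LATENCY_BUDGET_S = 0.050  # say, 50 ms to answer an interactive request

def cheap_precomputed(ctx):
    return {"visits_30d": 12}  # precomputed attribute, effectively free

def medium_lookup(ctx):
    time.sleep(0.010); return {"segment": "loyal"}

def expensive_realtime(ctx):
    time.sleep(0.100); return {"live_basket_score": 0.7}

# Feature sources ordered by value, each with a rough cost estimate.
FETCHERS = [
    (cheap_precomputed, 0.001),
    (medium_lookup, 0.010),
    (expensive_realtime, 0.100),
]

def gather_features(ctx):
    """Pack as much information as will fit into the budgeted latency."""
    deadline = time.monotonic() + LATENCY_BUDGET_S
    features = {}
    for fetch, est_cost in FETCHERS:
        if time.monotonic() + est_cost > deadline:
            continue  # this source won't fit in what's left of the budget
        features.update(fetch(ctx))
    return features

print(gather_features({"user_id": "u1"}))  # the 100 ms source gets skipped
```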
David-- >> I'd like to add one more to that, and that is the location of data, because you want to be able to look at the data that is most relevant to solving that particular problem. Now, today a lot of the data is inside the enterprise. There's a lot of data outside that, but still, you will want to combine that data, one way or another, in the best possible way. >> But isn't that a variable in the latency budget? >> David: Well, I would think it's very useful to split the latency budget, which has to do with inference mainly, from development with the machine learning. There is a development cycle with machine learning that is much longer: that is days, could be weeks, could be months. >> It would still be done in batch. >> It is, or will be, done, wait a second, it will be done in batch, it is done in batch. You need to test it and then deliver it as an inference engine to the applications that you're talking about. Now, that inference is going to be very close together; the rest of it has to be physically very close together. But the data itself is spread out, and you want to have mechanisms that can combine those data sets, move applications to those data sets, and bring those together in the best possible way. That is still a batch process, and it can run where the data is, in the cloud, locally, wherever it is. >> George: And I think you brought up a great point, which I would tend to include in the latency budget, because no matter what kind of answers you're looking for, some of the attributes are going to be precomputed, and those could be-- >> David: Absolutely. >> External data. >> David: Yes. >> And you're not going to calculate everything in real time, there's just-- >> You can't. >> Yes, you can't. >> But is the practical reality that the convergence of, so again, the argument: we've got all these new problems, all kinds of new people claiming that they know how to solve the problems, each of them choosing different classes of tools to solve the problem, an explosion across the board in the approaches, which can lead to enormous downstream integration and complexity costs. You've used the example of Cloudera, for example; some of the distro companies claim that 50-plus percent of their development budget is dedicated to just integrating these pieces. That's a non-starter for a lot of enterprises. Are we fundamentally saying that the degree of complexity, or the degree of simplicity and convergence, that's possible in software is tied to the degree of convergence in the data? >> You're honing in on something really important, give me-- >> Peter: Thank you! (laughs) >> George: Give an example of the convergence of data that you're talking about. >> Peter: I'll let David do it, because I think he's going to jump on it. >> David: Yes, so let me take an example. If you have a small business, there's no way that you want to invest yourself in any of the normal levels of machine learning and applications like that. You want to outsource that. So big software companies are going to do that for you, and they're going to do it especially for the specific business processes which are unique to that business, which give it digital differentiation of some sort or another. So for all of those types of things, software will come in from vendors, from SAP or sons of SAP, which will help you solve those problems, and there will be data brokers which are collecting the data, putting it together, and helping you with that. That seems to me the way things are going.
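David's split a moment ago, between the long batch development cycle and the tightly co-located inference engine, can be sketched as follows. This is a toy illustration under stated assumptions: the file-based artifact hand-off and the trivial statistical "model" are stand-ins for a real ML toolchain, not part of the discussion itself.

```python
# Development happens in batch (days or weeks): train, test, and then
# publish a small model artifact that the inference engine loads.
import json, statistics

def batch_train(history: list) -> dict:
    """Batch job: runs where the data lives; output is a small artifact."""
    values = [h["order_value"] for h in history]
    return {"mean": statistics.mean(values), "stdev": statistics.pstdev(values)}

def publish(model: dict, path: str = "model.json") -> None:
    with open(path, "w") as f:
        json.dump(model, f)

class InferenceEngine:
    """Runs physically close to the application; no training at request time."""
    def __init__(self, path: str = "model.json"):
        with open(path) as f:
            self.model = json.load(f)

    def score(self, order_value: float) -> float:
        stdev = self.model["stdev"] or 1.0
        return (order_value - self.model["mean"]) / stdev  # anomaly z-score

history = [{"order_value": v} for v in (40.0, 55.0, 60.0, 45.0)]
publish(batch_train(history))            # slow, periodic, near the data
print(InferenceEngine().score(120.0))    # fast, per-request, near the app
```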
In the same way, there are a lot of inference engines which will be out at the IoT level. Those will have very rapid analytics given to them; again, not built by yourself, but by companies that specialize in facial recognition, or specialize in warehouse-- >> Wait a minute, are you saying that my customers aren't special, that they require special facial recognition? (laughs) So I agree with David, but I want to come back to this notion, because-- >> David: The point I was getting at is, there's going to be lots and lots of room for software to be developed to help in specific cases. >> Peter: And large markets to sell that software into. >> Very large markets. >> Whether it's software, but increasingly also services. But I want to come back to this notion of convergence, because we talked about hardware convergence, and we're starting to talk about the practical limits on software convergence. But somewhere in between, I would argue, and I think you guys would agree, that really the catalyst for, or the thing that's going to determine, the rate of change and the degree of convergence is going to be how we deal with data. Now, you've done a lot of research on this; I'm going to put something out there, and you tell me if I'm wrong. But at the end of the day, when we start thinking about UniGrid, when we start thinking about some of these new technologies, and the ability to have single copies or single sources of data, multiple copies, in many respects what we're talking about is the virtualization of data without loss. >> David: Yes. >> Not loss of the character, the fidelity of the data, or the state of the data. Have I got that right? >> Knowing the state of the data. >> Peter: Or knowing the state of the data. >> If you take a snapshot, that's a point in time. You know what that point in time is, and you can do a lot of analytics on it, for example, and you may want to do them on a certain time of day, or whatever-- >> Peter: So is it wrong to say that we're seeing, we've moved through the virtualization of hardware, and we're now in a hyperscale or hyper-converged era, which is very powerful stuff. We're seeing this explosion in the amount of software, in the way we approach problems and whatnot. But a forcing function, something that's going to both constrain how converged that can be, but also force or catalyze some convergence, is the idea that we're moving into an era where we can start to think about virtualized data, through some of these distributed file systems-- >> David: That's right, and the metadata that goes with it. The most important thing about the data, and it's increasing much more rapidly than the data itself, is the metadata around it. But I want to just make one point on this: all data isn't useful. There's a huge amount of data that we capture that we're just going to have to throw away. The idea that we can look at every piece of data for every decision is patently false. There's a lovely example of this in fluid mechanics. >> Peter: Fluid dynamics.
But, and we've talked about this in other Action Items, there is this notion of options on data value, where the value of today's data is maybe-- >> David: Is much higher. >> Peter: Well, it's higher from a time standpoint, for the problems that we understand and are trying to solve now, but there may be future problems where we still want to ensure that we have some degree of data so that we can be better at attending to those future problems. But I want to come back to this point, because in all honesty I haven't heard anybody else talking about this, and maybe it's because I'm not listening. But this notion, again, from your research, of virtualized data inside these new architectures being a catalyst for a simplification of a lot of the sharing subsystem? >> David: It's essentially the sharing of data. So instead of the traditional way of doing it within a data center, which is: I have my systems of record, I make a copy, and it gets delivered to the data warehouse, for example. That's the way it's being done, and that is too slow; moving data is incredibly slow. So another way of doing it is to share that data, make a virtual copy of it, and technologies are allowing you to do that because the access density has gone up by thousands of times-- >> Peter: Because? >> Because. (laughs) Because of flash, because of new technologies at that level. >> Peter: High-performance interfaces, high-performance networks. >> David: All of that is now allowing things which just couldn't even be conceived before. However, there is still a constraint there. It may be a thousand times bigger, but there is still an absolute constraint on the amount of data that you can actually process. >> And that constraint is provided by latency. >> Latency. >> Peter: Speed of light. >> Speed of light, and the speed of the processors themselves. >> George: Let me add something that may help explain the virtualization of data and how it ties into the convergence, or non-convergence, of the software around it. When we're building these analytic pipelines, essentially we've disassembled what used to be a DBMS. And so out of that we've got a storage engine, we've got query optimizers, we've got data manipulation languages, which have grown into full-blown analytic languages, and a data definition language. Now, the system catalog used to be just a way to virtualize all the tables in the database and tell you where all the stuff was, the indexes and things like that. What we're seeing now, since data is spread out over so many places and products, is the emergence of a new type of catalog. Whether that's from Alation or Dremio, or on AWS it's the Glue catalog, and I think there's something equivalent coming on Azure. But the point is, those are beginning to get useful enough to be the entry point for analytic products, and maybe eventually even for transactional products to update, or at least to analyze, the data in these pipelines that we're putting together out of these components of what was a disassembled database. Now, we could be-- >> I would make a distinction there between the development of analytics and, again, the real-time use of those analytics within systems of intelligence. >> George: Yeah, but when you're using them-- >> David: There's a different problem they have to solve.
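David's "virtual copy" point above, sharing one physical dataset through a point-in-time, copy-on-write view rather than physically delivering a copy to the warehouse, might look like this toy in-memory sketch. It illustrates the concept only; it is not how any particular storage product implements it.

```python
class VirtualCopy:
    """A point-in-time, copy-on-write view of a dataset: readers share the
    base data instead of physically moving it, and writers only
    materialize the records they change."""
    def __init__(self, base: dict, snapshot_time: str):
        self.base = base                    # shared, unchanged physical data
        self.delta = {}                     # only modified records live here
        self.snapshot_time = snapshot_time  # you know what point in time this is

    def read(self, key):
        return self.delta.get(key, self.base.get(key))

    def write(self, key, value):
        self.delta[key] = value  # the base stays intact for other readers

system_of_record = {"acct:1": {"balance": 100}, "acct:2": {"balance": 250}}
analytics_copy = VirtualCopy(system_of_record, snapshot_time="2018-02-09T00:00Z")
analytics_copy.write("acct:1", {"balance": 0})  # what-if scenario in analytics
print(analytics_copy.read("acct:1"))  # {'balance': 0} in the virtual copy
print(system_of_record["acct:1"])     # {'balance': 100}, unchanged
```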
>> George: But there's a Design Time and a Run Time; there are actually four pipelines for the analytic pipeline itself: a Design Time and a Run Time, and then, for the inference engine and the modeling that goes behind it, there's also a Design Time and a Run Time. But I guess, I'm not disagreeing that you could have one converged product to manage the Run Time analytic pipeline. I'm just saying that the pieces you assemble could come from one vendor. >> Yeah, but I think David's point, and this has been true since the beginning of time (laughs), certainly predating UNIVAC, is that at the end of the day, read/write ratios and the characteristics of the data are going to have an enormous impact on the choices that you make. High write-to-read ratios almost dictate the degree of convergence, and we used to call that SMP, or, you know, scale-up database managers. And for those types of applications, with those types of workloads, it's not necessarily obvious that that's going to change. Now, we can still find ways to relax that, but you're talking about, George, the new characteristics-- >> Injecting the analytics. >> Injecting the analytics, where we're doing more reading as opposed to writing. We may still be writing into an application that has those characteristics-- >> That's a small amount of data. >> But a significant portion of the new function is associated with these new pipelines. >> Right. And what data you create is generally derived data, so you're not stepping on something that's already there. >> All right, so let me get some action items here. David, I want to start with you. What's the action item? >> David: So for me, about convergence, there are two levels of convergence. First of all, converge as much as possible and give the work to the vendor; that would be my action item. The more that you can go full stack, the more that you can get the software services from a single point, single throat to choke, single hand to shake, the more you can outsource your problems to them. >> Peter: And that has a speed implication, time to value. >> Time to value, and you don't have to do undifferentiated work. So that's the first level of convergence. And then the second level of convergence is to look hard at how you can bring additional value to your existing systems of record by putting in automation, or real-time analytics which lead to automation. That is the second one, and for me that's where the money is: automation, reduction in the number of things that people have to do. >> Peter: George, action item. >> So my action item is that you, the customer, have to evaluate your skills as much as your existing application portfolio. If more of your greenfield apps can start in the cloud, and you're not religious about open source but are more religious about the admin burden, the development burden, and your latency budget, then start focusing on the services that the cloud vendors originally created standalone but are increasingly integrating, because the customers are leading them there. And then, for those customers who have decades and decades of infrastructure and applications on-prem and need a pathway to the cloud: some of the vendors formerly known as Hadoop vendors, but for that matter any on-prem software vendor, are providing customers a way to run workloads in a hybrid environment or to migrate data across platforms. >> All right, so let me give this a final action item here. Thank you, David Floyer, George Gilbert. Neil Raden and Jim Kobielus and the rest of the Wikibon team are with customers today.
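As a concrete footnote to George's earlier point about catalogs such as AWS Glue becoming the entry point for analytics: the sketch below lists the tables a Glue Data Catalog knows about and where their underlying data lives, without moving any data. It assumes AWS credentials are configured and that a Glue database named "sales" exists; both are assumptions for the example, not something from the discussion.

```python
# Hedged sketch: discover what data exists, and where, via the catalog.
import boto3

glue = boto3.client("glue", region_name="us-east-1")

response = glue.get_tables(DatabaseName="sales")
for table in response["TableList"]:
    # The catalog virtualizes the tables: it records each table's schema
    # and storage location (e.g. an S3 path) without holding the data.
    location = table.get("StorageDescriptor", {}).get("Location", "n/a")
    print(table["Name"], "->", location)
```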
>> All right, so let me give this a final action item here. Thank you, David Floyer and George Gilbert. Neil Raden, Jim Kobielus, and the rest of the Wikibon team are with customers today. We talked today about convergence at the software level. What we've observed over the last few years is an expanding array of software technologies (AI, big data, machine learning, and so on) that are allowing enterprises to think differently about the types of problems they can solve with technology. That's leading to an explosion in the number of problems that folks are looking at, in the number of individuals participating in making those decisions and thinking those issues through, and, very importantly, in the number of vendors with piecemeal solutions, each reflecting what that vendor regards as the best approach to doing things. However, that imposes a significant burden that could have enormous implications for years, and so the question is: will we see a degree of convergence in the approach to doing software, in the form of pipelines and applications and whatnot, driven by a combination of what the hardware is capable of doing, what people's skills make possible, and, very importantly, the natural attributes of the data? We think that there will be. There will always be tension in the model as people invent new software, but one of the factors that's going to bring it all back to a degree of simplicity will be the combination of what the hardware can do, what people can do, and what the data can do. And so we believe, pretty strongly, that the issues surrounding data, whether latency or location, along with development complexity and administrative complexity, are the factors that will ultimately dictate how some of these solutions start to converge and simplify within enterprises. As we look forward, our expectation is that we're going to see enormous net new investment over the next few years in pipelines, because pipelines are a first-level set of investments in how we're going to handle data within the enterprise. In certain respects they'll look like the DBMS used to look, just disaggregated; but conceptually, administratively, and from a product and service selection standpoint, the expectation is that they themselves will have to come together so that developers can have a consistent view of the data that runs inside the enterprise. I want to thank David Floyer, and I want to thank George Gilbert. Once again, this has been Wikibon Action Item, and we look forward to seeing you on our next Action Item. (electronic music)