Breaking Analysis: Databricks faces critical strategic decisions…here’s why

>> From theCUBE Studios in Palo Alto and Boston, bringing you data-driven insights from theCUBE and ETR. This is Breaking Analysis with Dave Vellante. >> Spark became a top-level Apache project in 2014, and then shortly thereafter, burst onto the big data scene. Spark, along with the cloud, transformed and in many ways, disrupted the big data market. Databricks optimized its tech stack for Spark and took advantage of the cloud to really cleverly deliver a managed service that has become a leading AI and data platform among data scientists and data engineers. However, emerging customer data requirements are shifting in a direction that will cause modern data platform players generally and Databricks, specifically, we think, to make some key directional decisions and perhaps even reinvent themselves. Hello and welcome to this week's Wikibon theCUBE Insights, powered by ETR. In this Breaking Analysis, we're going to do a deep dive into Databricks. We'll explore its current impressive market momentum. We're going to use some ETR survey data to show that, and then we'll lay out how customer data requirements are changing and what the ideal data platform will look like in the midterm future. We'll then evaluate core elements of the Databricks portfolio against that vision, and then we'll close with some strategic decisions that we think the company faces. And to do so, we welcome in our good friend, George Gilbert, former equities analyst, market analyst, and current Principal at TechAlpha Partners. George, good to see you. Thanks for coming on. >> Good to see you, Dave. >> All right, let me set this up. We're going to start by taking a look at where Databricks sits in the market in terms of how customers perceive the company and what its momentum looks like. And this chart that we're showing here is data from ETS, the Emerging Technology Survey of private companies. The N is 1,421. 
What we did is we cut the data on three sectors, analytics, database-data warehouse, and AI/ML. The vertical axis is a measure of customer sentiment, which evaluates an IT decision maker's awareness of the firm and the likelihood of engaging and/or purchase intent. The horizontal axis shows mindshare in the dataset, and we've highlighted Databricks, which has been a consistent high performer in this survey over the last several quarters. And by the way, just as an aside, as we previously reported, OpenAI, which burst onto the scene this past quarter, leads all names, but Databricks is still prominent. You can see that ETR shows some open source tools for reference, but as far as firms go, Databricks is very impressively positioned. Now, let's see how they stack up to some mainstream cohorts in the data space, against some bigger companies and sometimes public companies. This chart shows net score on the vertical axis, which is a measure of spending momentum, and pervasiveness in the data set on the horizontal axis. You can see that chart insert in the upper right, that informs how the dots are plotted: net score against shared N. And that red dotted line at 40% indicates a highly elevated net score, anything above that we think is really, really impressive. And here we're just comparing Databricks with Snowflake, Cloudera, and Oracle. And that squiggly line leading to Databricks shows their path since 2021 by quarter. And you can see it's performing extremely well, maintaining an elevated net score in that range. Now it's comparable in the vertical axis to Snowflake, and it consistently is moving to the right and gaining share. Now, why did we choose to show Cloudera and Oracle? The reason is that Cloudera got the whole big data era started and was disrupted by Spark and, of course, the cloud and Databricks. And Oracle, in many ways, was the target of early big data players like Cloudera. Take a listen to Cloudera CEO at the time, Mike Olson. 
This is back in 2010, first year of theCUBE, play the clip. >> Look, back in the day, if you had a data problem, if you needed to run business analytics, you wrote the biggest check you could to Sun Microsystems, and you bought a great big, single box, central server, and any money that was left over, you handed to Oracle for database licenses and you installed that database on that box, and that was where you went for data. That was your temple of information. >> Okay? So Mike Olson implied that monolithic model was too expensive and inflexible, and Cloudera set out to fix that. But the best laid plans, as they say. George, what do you make of the data that we just shared? >> So where Databricks has really come up out of sort of Cloudera's tailpipe was they took big data processing, made it coherent, made it a managed service so it could run in the cloud. So it relieved customers of the operational burden. Where they're really strong and where their traditional meat and potatoes or bread and butter is, is the predictive and prescriptive analytics: building and training and serving machine learning models. They've tried to move into traditional business intelligence, the more traditional descriptive and diagnostic analytics, but they're less mature there. So what that means is, the reason you see Databricks and Snowflake kind of side by side is there are many, many accounts that have both Snowflake for business intelligence and Databricks for AI machine learning. Where Snowflake, I'm sorry, where Databricks also did really well was in core data engineering, refining the data, the old ETL process, which kind of turned into ELT, where you load the data into the analytic repository in raw form and refine it. And so people have really used both, and each is trying to get into the other. >> Yeah, absolutely. We've reported on this quite a bit. Snowflake, kind of moving into the domain of Databricks and vice versa. 
And the last bit of ETR evidence that we want to share in terms of the company's momentum comes from ETR's Round Tables. They're run by Erik Bradley, and now former Gartner analyst and George, your colleague back at Gartner, Daren Brabham. And what we're going to show here is some direct quotes of IT pros in those Round Tables. There's a data science head and a CIO as well. Just make a few call outs here, we won't spend too much time on it, but starting at the top, like all of us, we can't talk about Databricks without mentioning Snowflake. Those two get us excited. Second comment zeros in on the flexibility and the robustness of Databricks from a data warehouse perspective. And then the last point is, despite competition from cloud players, Databricks has reinvented itself a couple of times over the years. And George, we're going to lay out today a scenario that perhaps calls for Databricks to do that once again. >> Their big opportunity and their big challenge, as for every tech company, is managing a technology transition. The transition that we're talking about is something that's been bubbling up, but it's really epochal. First time in 60 years, we're moving from an application-centric view of the world to a data-centric view, because decisions are becoming more important than automating processes. So let me let you sort of develop. >> Yeah, so let's talk about that here. We're going to put up some bullets on precisely that point and the changing sort of customer environment. So you've got IT stacks shifting, as George just said, from application-centric silos to data-centric stacks where the priority is shifting from automating processes to automating decisions. You know, look at RPA and there's still a lot of automation going on, but the focus on that application centricity and the data locked into those apps, that's changing. 
Data has historically been on the outskirts in silos, but organizations, you think of Amazon, think Uber, Airbnb, they're putting data at the core, and logic is increasingly being embedded in the data instead of the reverse. In other words, today, the data's locked inside the app, which is why you need to extract that data and stick it into a data warehouse. The point, George, is we're putting forth this new vision for how data is going to be used. And you've used this Uber example to underscore the future state. Please explain. >> Okay, so this is hopefully an example everyone can relate to. The idea is first, you're automating things that are happening in the real world and decisions that make those things happen autonomously without humans in the loop all the time. So to use the Uber example on your phone, you call a car, you call a driver. Automatically, the Uber app then looks at what drivers are in the vicinity, what drivers are free, matches one, calculates an ETA to you, calculates a price, calculates an ETA to your destination, and then directs the driver once they're there. The point of this is that that cannot happen in an application-centric world very easily because all these little apps, the drivers, the riders, the routes, the fares, those call on data locked up in many different apps, but they have to sit on a layer that makes it all coherent. >> But George, so if Uber's doing this, doesn't this tech already exist? Isn't there a tech platform that does this already? >> Yes, and the mission of the entire tech industry is to build services that make it possible to compose and operate similar platforms and tools, but with the skills of mainstream developers in mainstream corporations, not the rocket scientists at Uber and Amazon. >> Okay, so we're talking about horizontally scaling across the industry, and actually giving a lot more organizations access to this technology. 
So by way of review, let's summarize the trend that's going on today in terms of the modern data stack that is propelling the likes of Databricks and Snowflake, which we just showed you in the ETR data and which is really a tailwind for them. So the trend is toward this common repository for analytic data, that could be multiple virtual data warehouses inside of Snowflake, but you're in that Snowflake environment, or Lakehouses from Databricks or multiple data lakes. And we've talked about what JP Morgan Chase is doing with the data mesh and gluing data lakes together, you've got various public clouds playing in this game, and then the data is annotated to have a common meaning. In other words, there's a semantic layer that enables applications to talk to the data elements and know that they have common and coherent meaning. So George, the good news is this approach is more effective than the legacy monolithic models that Mike Olson was talking about, so what's the problem with this in your view? >> So today's data platforms added immense value 'cause they connected the data that was previously locked up in these monolithic apps or on all these different microservices, and that supported traditional BI and AI/ML use cases. But now we want to build apps like Uber or Amazon.com, where they've got essentially an autonomously running supply chain and e-commerce app where humans only care for and feed it. But the thing is figuring out what to buy, when to buy, where to deploy it, when to ship it. We need a semantic layer on top of the data. So that, as you were saying, the data that's coming from all those apps, the different apps, is integrated, not just connected, but it means the same. And the issue is whenever you add a new layer to a stack to support new applications, there are implications for the already existing layers, like can they support the new layer and its use cases? 
So for instance, if you add a semantic layer that embeds app logic with the data rather than vice versa, which we've been talking about and that's been the case for 60 years, then the new data layer faces challenges in that the way you manage that data, the way you analyze that data, is not supported by today's tools. >> Okay, so actually Alex, bring me up that last slide if you would, I mean, you're basically saying at the bottom here, today's repositories don't really do joins at scale. The future is you're talking about hundreds or thousands or millions of data connections, and today's systems, we're talking about, I don't know, 6, 8, 10 joins and that is the fundamental problem. You're saying a new data era is coming and existing systems won't be able to handle it? >> Yeah, one way of thinking about it is that even though we call them relational databases, when we actually want to do lots of joins or when we want to analyze data from lots of different tables, we created a whole new industry for analytic databases where you sort of munge the data together into fewer tables. So you didn't have to do as many joins because the joins are difficult and slow. And when you're going to arbitrarily join thousands, hundreds of thousands or across millions of elements, you need a new type of database. We have them, they're called graph databases, but to query them, you go back to the prerelational era in terms of their usability. >> Okay, so we're going to come back to that and talk about how you get around that problem. But let's first lay out what the ideal data platform of the future we think looks like. And again, we're going to come back to use this Uber example. In this graphic that George put together, awesome. We got three layers. The application layer is where the data products reside. The example here is drivers, rides, maps, routes, ETA, et cetera. The digital version of what we were talking about in the previous slide, people, places and things. 
The next layer is the data layer, that breaks down the silos and connects the data elements through semantics and everything is coherent. And then at the bottom layer, the legacy operational systems feed that data layer. George, explain what's different here, the graph database element, you talk about the relational query capabilities, and why can't I just throw memory at solving this problem? >> Some of the graph databases do throw memory at the problem and maybe without naming names, some of them live entirely in memory. And what you're dealing with is a prerelational in-memory database system where you navigate between elements, and the issue with that is we've had SQL for 50 years, so we don't have to navigate, we can say what we want without saying how to get it. That's the core of the problem. >> Okay. So if I may, I just want to drill into this a little bit. So you're talking about the expressiveness of a graph. Alex, if you'd bring that back out, the fourth bullet, expressiveness of a graph database with the relational ease of query. Can you explain what you mean by that? >> Yeah, so graphs are great because you can describe anything with a graph, that's why they're becoming so popular. Expressive means you can represent anything easily. They're conducive to, you might say, a world where we now want like the metaverse, like with a 3D world, and I don't mean the Facebook metaverse, I mean like the business metaverse when we want to capture data about everything, but we want it in context, we want to build a set of digital twins that represent everything going on in the world. And Uber is a tiny example of that. Uber built a graph to represent all the drivers and riders and maps and routes. But what you need out of a database isn't just a way to store stuff and update stuff. You need to be able to ask questions of it, you need to be able to query it. And if you go back to prerelational days, you had to know how to find your way to the data. 
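The contrast George draws here, navigating to the data versus saying what you want, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the `rides` records and both function names are hypothetical, the navigational version stands in for a prerelational or graph-traversal style of access, and the declarative version uses the standard-library `sqlite3` module rather than any product discussed in the episode.

```python
import sqlite3

# Hypothetical toy data: (ride, driver, rider). All names are invented.
rides = [
    ("ride-1", "driver-a", "rider-x"),
    ("ride-2", "driver-b", "rider-x"),
    ("ride-3", "driver-a", "rider-y"),
]


def drivers_for_rider_navigational(rider):
    """Navigational style: the caller spells out every hop through the data."""
    rides_by_rider = {}
    for ride_id, _driver, r in rides:
        rides_by_rider.setdefault(r, []).append(ride_id)
    driver_of_ride = {ride_id: driver for ride_id, driver, _r in rides}
    found = set()
    for ride_id in rides_by_rider.get(rider, []):  # hop 1: rider -> rides
        found.add(driver_of_ride[ride_id])         # hop 2: ride -> driver
    return sorted(found)


def drivers_for_rider_declarative(rider):
    """Declarative style: state the result wanted; the engine picks the path."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE rides (ride_id TEXT, driver TEXT, rider TEXT)")
    con.executemany("INSERT INTO rides VALUES (?, ?, ?)", rides)
    rows = con.execute(
        "SELECT DISTINCT driver FROM rides WHERE rider = ? ORDER BY driver",
        (rider,),
    ).fetchall()
    con.close()
    return [row[0] for row in rows]
```

Both functions return the same drivers for a given rider. The difference is who decides how to reach the data: in the first the programmer does, in the second the query engine does.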
It's sort of like when you give directions to someone and they didn't have a GPS system and a mapping system, you had to give them turn by turn directions. Whereas when you have a GPS and a mapping system, which is like the relational thing, you just say where you want to go, and it spits out the turn by turn directions, which let's say, the car might follow or whoever you're directing would follow. But the point is, it's much easier in a relational database to say, "I just want to get these results. You figure out how to get it." The graph database, they have not taken over the world because in some ways, it's taking a 50 year leap backwards. >> Alright, got it. Okay. Let's take a look at how the current Databricks offerings map to that ideal state that we just laid out. So to do that, we put together this chart that looks at the key elements of the Databricks portfolio, the core capability, the weakness, and the threat that may loom. Start with the Delta Lake, that's the storage layer, which is great for files and tables. It's got true separation of compute and storage, I want you to double click on that George, as independent elements, but it's weaker for the type of low latency ingest that we see coming in the future. And some of the threats highlighted here. AWS could add transactional tables to S3, Iceberg adoption is picking up and could accelerate, that could disrupt Databricks. George, add some color here please? >> Okay, so this is the sort of a classic competitive forces where you want to look at, so what are customers demanding? What's competitive pressure? What are substitutes? Even what your suppliers might be pushing. Here, Delta Lake is at its core, a set of transactional tables that sit on an object store. So think of it in a database system, this is the storage engine. So since S3 has been getting stronger for 15 years, you could see a scenario where they add transactional tables. 
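The idea of "transactional tables that sit on an object store" can be made concrete with a toy sketch. Everything below is an assumption for illustration: `ObjectStore` is an in-memory stand-in and not a real S3 API, `ToyTable` is hypothetical, and the commit log shown is a simplified model of the general technique (an ordered log of file additions and removals, with a put-if-absent write deciding which writer wins), not Delta Lake's or Iceberg's actual protocol.

```python
import json


class ObjectStore:
    """In-memory stand-in for an object store (hypothetical, not a real S3 API)."""

    def __init__(self):
        self._objects = {}

    def put_if_absent(self, key, data):
        # The write succeeds only if no object with this key exists yet.
        if key in self._objects:
            return False
        self._objects[key] = data
        return True

    def get(self, key):
        return self._objects[key]

    def list(self, prefix):
        return sorted(k for k in self._objects if k.startswith(prefix))


class ToyTable:
    """A table defined by an ordered commit log of added and removed data files."""

    def __init__(self, store, name):
        self.store = store
        self.log_prefix = f"{name}/_log/"

    def snapshot(self):
        """Replay the log in order to find the data files that are currently live."""
        live = set()
        for key in self.store.list(self.log_prefix):
            entry = json.loads(self.store.get(key))
            live |= set(entry["add"])
            live -= set(entry["remove"])
        return sorted(live)

    def commit(self, version, add=(), remove=()):
        """Publish a change as log entry `version`; fails if that entry is taken."""
        key = f"{self.log_prefix}{version:08d}.json"
        body = json.dumps({"add": list(add), "remove": list(remove)})
        return self.store.put_if_absent(key, body)
```

In this sketch the data files themselves never change. A reader sees a consistent table by replaying the log, and two writers racing for the same version number cannot both succeed, which is what makes the table transactional even though the store underneath only holds objects.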
We have an open source alternative in Iceberg, which Snowflake and others support. But at the same time, Databricks has built an ecosystem out of tools, their own and others, that read and write to Delta tables, that's what makes the Delta Lake an ecosystem. So they have a catalog, the whole machine learning tool chain talks directly to the data here. That was their great advantage because in the past with Snowflake, you had to pull all the data out of the database before the machine learning tools could work with it, that was a major shortcoming. They fixed that. But the point here is that even before we get to the semantic layer, the core foundation is under threat. >> Yep. Got it. Okay. We got a lot of ground to cover. So we're going to take a look at the Spark Execution Engine next. Think of that as the refinery that runs really efficient batch processing. That's kind of what disrupted Hadoop in a large way, but it's not Python friendly and that's an issue because the data science and the data engineering crowd are moving in that direction, and/or they're using dbt. George, we had Tristan Handy on at Supercloud, really interesting discussion that you and I did. Explain why this is an issue for Databricks? >> So once the data lake was in place, what people did was they refined their data in batch, and Spark has always had streaming support and it's gotten better. The underlying storage as we've talked about is an issue. But basically they took raw data, then they refined it into tables that were like customers and products and partners. And then they refined that again into what was like gold artifacts, which might be business intelligence metrics or dashboards, which were collections of metrics. But they were running it on the Spark Execution Engine, which is a Java-based engine or it's running on a Java-based virtual machine, which means all the data scientists and the data engineers who want to work with Python are really working in sort of oil and water. 
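The refinement flow George describes, raw data refined into clean tables and then refined again into "gold" artifacts such as metrics, can be sketched as two small steps. This is plain Python for illustration only, not Spark or dbt code; the `raw_orders` records and the function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical raw records as they might land in a lake: everything is a
# string, and there are duplicates and malformed rows.
raw_orders = [
    {"order_id": "1", "customer": " Acme ", "amount": "120.50"},
    {"order_id": "1", "customer": " Acme ", "amount": "120.50"},      # duplicate
    {"order_id": "2", "customer": "globex", "amount": "80"},
    {"order_id": "3", "customer": "Acme", "amount": "not-a-number"},  # bad row
]


def refine(rows):
    """Raw -> refined: drop duplicates and bad rows, normalize names, cast types."""
    seen = set()
    refined = []
    for row in rows:
        if row["order_id"] in seen:
            continue
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # skip rows whose amount cannot be parsed
        seen.add(row["order_id"])
        refined.append({
            "order_id": int(row["order_id"]),
            "customer": row["customer"].strip().lower(),
            "amount": amount,
        })
    return refined


def revenue_by_customer(refined_rows):
    """Refined -> gold: aggregate the clean table into a business metric."""
    totals = defaultdict(float)
    for row in refined_rows:
        totals[row["customer"]] += row["amount"]
    return dict(totals)
```

Calling `revenue_by_customer(refine(raw_orders))` yields one revenue figure per customer. Tools like Spark or dbt run the same kind of staged refinement at scale, with each stage stored as a table.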
Like if you get an error in Python, you can't tell whether the problem's in Python or whether it's in Spark. There's just an impedance mismatch between the two. And then at the same time, the whole world is now gravitating towards dbt because it's a very nice and simple way to compose these data processing pipelines, and people are using either SQL in dbt or Python in dbt, and that kind of is a substitute for doing it all in Spark. So it's under threat even before we get to that semantic layer. It so happens that dbt itself is becoming the authoring environment for the semantic layer with business intelligence metrics. But again, this is the second element that's under direct substitution and competitive threat. >> Okay, let's now move down to the third element, which is Photon. Photon is Databricks' BI Lakehouse, which has integration with the Databricks tooling, which is very rich, it's newer. And it's also not well suited for high concurrency and low latency use cases, which we think are going to increasingly become the norm over time. George, the call out threat here is customers want to connect everything to a semantic layer. Explain your thinking here and why this is a potential threat to Databricks? >> Okay, so two issues here. What you were touching on, which is the high concurrency, low latency, when people are running like thousands of dashboards and data is streaming in, that's a problem because a SQL data warehouse, the query engine, something like that matures over five to 10 years. It's one of these things, the joke that Andy Jassy makes just in general, he's really talking about Azure, but there's no compression algorithm for experience. The Snowflake guys started more than five years earlier, and for a bunch of reasons, that lead is not something that Databricks can shrink. They'll always be behind. So that's why Snowflake has transactional tables now and we can get into that in another show. 
But the key point is, so near term, it's struggling to keep up with the use cases that are core to business intelligence, which is highly concurrent, lots of users doing interactive query. But then when you get to a semantic layer, that's when you need to be able to query data that might have thousands or tens of thousands or hundreds of thousands of joins. And a SQL query engine, a traditional SQL query engine, is just not built for that. That's the core problem of traditional relational databases. >> Now this is a quick aside. We always talk about Snowflake and Databricks in sort of the same context. We're not necessarily saying that Snowflake is in a position to tackle all these problems. We'll deal with that separately. So we don't mean to imply that, but we're just sort of laying out some of the things that Snowflake or rather Databricks customers, we think, need to be thinking about and having conversations with Databricks about and we hope to have them as well. We'll come back to that in terms of sort of strategic options. But finally, when we come back to the table, we have Databricks' AI/ML Tool Chain, which has been an awesome capability for the data science crowd. It's comprehensive, it's a one-stop shop solution, but the kicker here is that it's optimized for supervised model building. And the concern is that foundation models like GPT could cannibalize the current Databricks tooling, but George, can't Databricks, like other software companies, integrate foundation model capabilities into its platform? >> Okay, so the sound bite answer to that is sure, IBM 3270 terminals could call out to a graphical user interface when they're running on the XT terminal, but they're not exactly good citizens in that world. The core issue is Databricks has this wonderful end-to-end tool chain for training, deploying, monitoring, running inference on supervised models. 
But the paradigm there is the customer builds and trains and deploys each model for each feature or application. In a world of foundation models which are pre-trained and unsupervised, the entire tool chain is different. So it's not like Databricks can junk everything they've done and start over with all their engineers. They have to keep maintaining what they've done in the old world, but they have to build something new that's optimized for the new world. It's a classic technology transition and their mentality appears to be, "Oh, we'll support the new stuff from our old stuff." Which is suboptimal, and as we'll talk about, their biggest patron and the company that put them on the map, Microsoft, really stopped working on their old stuff three years ago so that they could build a new tool chain optimized for this new world. >> Yeah, and so let's sort of close with what we think the options are and decisions that Databricks has for its future architecture. They're smart people. I mean we've had Ali Ghodsi on many times, super impressive. I think they've got to be keenly aware of the limitations, what's going on with foundation models. But at any rate, here in this chart, we lay out sort of three scenarios. One is re-architect the platform by incrementally adopting new technologies. An example might be to layer a graph query engine on top of its stack. They could license key technologies like graph database, they could get aggressive on M&A and buy in relational knowledge graphs, semantic technologies, vector database technologies. George, as David Floyer always says, "A lot of ways to skin a cat." We've seen companies like, even think about EMC, maintain their relevance through M&A for many, many years. George, give us your thought on each of these strategic options? >> Okay, I find this question the most challenging 'cause remember, I used to be an equity research analyst. 
I worked for Frank Quattrone, we were one of the top tech shops in the banking industry, although this is 20 years ago. But the M&A team was the top team in the industry and everyone wanted them on their side. And I remember going to meetings with these CEOs, where Frank and the bankers would say, "You want us for your M&A work because we can do better." And they really could do better. But in software, it's not like with EMC in hardware because with hardware, it's easier to connect different boxes. With software, the whole point of a software company is to integrate and architect the components so they fit together and reinforce each other, and that makes M&A harder. You can do it, but it takes a long time to fit the pieces together. Let me give you examples. If they put a graph query engine, let's say something like TinkerPop, on top of, I don't even know if it's possible, but let's say they put it on top of Delta Lake, then you have this graph query engine talking to their storage layer, Delta Lake. But if you want to do analysis, you got to put the data in Photon, which is not really ideal for highly connected data. If you license a graph database, then most of your data is in the Delta Lake and how do you sync it with the graph database? If you do sync it, you've got data in two places, which kind of defeats the purpose of having a unified repository. I find this semantic layer option in number three actually more promising, because that's something that you can layer on top of the storage layer that you have already. You just have to figure out then how to have your query engines talk to that. What I'm trying to highlight is, it's easy as an analyst to say, "You can buy this company or license that technology." But the really hard work is making it all work together and that is where the challenge is. >> Yeah, and well look, I thank you for laying that out. We've seen it, certainly Microsoft and Oracle. 
I guess you might argue that well, Microsoft had a monopoly in its desktop software and was able to throw off cash for a decade plus while its stock was going sideways. Oracle had won the database wars and had amazing margins and cash flow to be able to do that. Databricks hasn't even gone public yet, but I want to close with some of the players to watch. Alex, if you'd bring that back up, number four here. AWS, we talked about some of their options with S3 and it's not just AWS, it's blob storage, object storage. Microsoft, as you sort of alluded to, was an early go-to-market channel for Databricks. We didn't address that really. So maybe in the closing comments we can. Google obviously, Snowflake of course, we're going to dissect their options in a future Breaking Analysis. dbt Labs, where do they fit? Bob Muglia's company, Relational.ai, why are these players to watch George, in your opinion? >> So everyone is trying to assemble and integrate the pieces that would make building data applications, data products easy. And the critical part isn't just assembling a bunch of pieces, which is traditionally what AWS did. It's a Unix ethos, which is we give you the tools, you put 'em together, 'cause you then have the maximum choice and maximum power. So what the hyperscalers are doing is they're taking their key value stores, in the case of AWS it's DynamoDB, in the case of Azure it's Cosmos DB, and each is putting a graph query engine on top of those. So they have a unified storage and graph database engine, like all the data would be collected in the key value store. Then you have a graph database, that's how they're going to be presenting a foundation for building these data apps. dbt Labs is putting a semantic layer on top of data lakes and data warehouses and as we'll talk about, I'm sure in the future, that makes it easier to swap out the underlying data platform or swap in new ones for specialized use cases. 
Snowflake, what they're doing, they're so strong in data management and with their transactional tables, what they're trying to do is take in the operational data that used to be in the province of many state stores like MongoDB and say, "If you manage that data with us, it'll be connected to your analytic data without having to send it through a pipeline." And that's hugely valuable. Relational.ai is the wildcard, 'cause what they're trying to do, it's almost like a holy grail where you're trying to take the expressiveness of connecting all your data in a graph but making it as easy to query as you've always had it in a SQL database or I should say, in a relational database. And if they do that, it's sort of like, it'll be as easy to program these data apps as a spreadsheet was compared to procedural languages, like BASIC or Pascal. Those are the implications of Relational.ai. >> Yeah, and again, we talked before, why can't you just throw this all in memory? We're talking in that example of really getting down to differences in how you lay the data out on disk in a really new database architecture, correct? >> Yes. And that's why it's not clear that you could take a data lake or even a Snowflake and put a relational knowledge graph on those. You could potentially put a graph database, but it'll be compromised because to really do what Relational.ai has done, which is the ease of relational on top of the power of graph, you actually need to change how you're storing your data on disk or even in memory. So you can't, in other words, it's not like, oh we can add graph support to Snowflake, 'cause if you did that, you'd have to change, or in your data lake, you'd have to change how the data is physically laid out. And then that would break all the tools that talk to that currently. >> What in your estimation, is the timeframe where this becomes critical for a Databricks and potentially Snowflake and others? 
I mentioned earlier midterm, are we talking three to five years here? Are we talking end of decade? What's your radar say? >> I think something surprising is going on that's going to sort of come up the tailpipe and take everyone by storm. All the hype around business intelligence metrics, which is what we used to put in our dashboards: bookings, billings, revenue, customer, those things. Those were the key artifacts that used to live in definitions in your BI tools, and dbt has basically created a standard for defining those so they live in your data pipeline, or they're defined in the data pipeline and executed in the data warehouse or data lake in a shared way, so that all tools can use them. This sounds like a digression, it's not. All this stuff about data mesh, data fabric, all that's going on is we need a semantic layer and the business intelligence metrics are defining common semantics for your data. And I think we're going to find by the end of this year, that metrics are how we annotate all our analytic data to start adding common semantics to it. And we're going to find this semantic layer, it's not three to five years off, it's going to be staring us in the face by the end of this year. >> Interesting. And of course SVB today was shut down. We're seeing serious tech headwinds, and oftentimes in these sort of downturns or flat turns, which feels like this could be going on for a while, we emerge with a lot of new players and a lot of new technology. George, we got to leave it there. Thank you to George Gilbert for excellent insights and input for today's episode. I want to thank Alex Myerson who's on production and manages the podcast, of course Ken Schiffman as well. Kristin Martin and Cheryl Knight help get the word out on social media and in our newsletters. And Rob Hof is our EIC over at Siliconangle.com, he does some great editing. Remember all these episodes, they're available as podcasts. 
Wherever you listen, all you got to do is search Breaking Analysis Podcast, we publish each week on wikibon.com and siliconangle.com, or you can email me at David.Vellante@siliconangle.com, or DM me @DVellante. Comment on our LinkedIn post, and please do check out ETR.ai, great survey data, enterprise tech focus, phenomenal. This is Dave Vellante for theCUBE Insights powered by ETR. Thanks for watching, and we'll see you next time on Breaking Analysis.

Published Date : Mar 10 2023

ENTITIES

Entity	Category	Confidence
Alex Myerson	PERSON	0.99+
David Floyer	PERSON	0.99+
Mike Olson	PERSON	0.99+
2014	DATE	0.99+
George Gilbert	PERSON	0.99+
Dave Vellante	PERSON	0.99+
George	PERSON	0.99+
Cheryl Knight	PERSON	0.99+
Ken Schiffman	PERSON	0.99+
Andy Jassy	PERSON	0.99+
Oracle	ORGANIZATION	0.99+
Amazon	ORGANIZATION	0.99+
Erik Bradley	PERSON	0.99+
Dave	PERSON	0.99+
Uber	ORGANIZATION	0.99+
thousands	QUANTITY	0.99+
Sun Microsystems	ORGANIZATION	0.99+
50 years	QUANTITY	0.99+
AWS	ORGANIZATION	0.99+
Bob Muglia	PERSON	0.99+
Gartner	ORGANIZATION	0.99+
Airbnb	ORGANIZATION	0.99+
60 years	QUANTITY	0.99+
Microsoft	ORGANIZATION	0.99+
Ali Ghodsi	PERSON	0.99+
2010	DATE	0.99+
Databricks	ORGANIZATION	0.99+
Kristin Martin	PERSON	0.99+
Rob Hof	PERSON	0.99+
three	QUANTITY	0.99+
15 years	QUANTITY	0.99+
Databricks'	ORGANIZATION	0.99+
two places	QUANTITY	0.99+
Boston	LOCATION	0.99+
Tristan Handy	PERSON	0.99+
M&A	ORGANIZATION	0.99+
Frank Quattrone	PERSON	0.99+
second element	QUANTITY	0.99+
Daren Brabham	PERSON	0.99+
TechAlpha Partners	ORGANIZATION	0.99+
third element	QUANTITY	0.99+
Snowflake	ORGANIZATION	0.99+
50 year	QUANTITY	0.99+
40%	QUANTITY	0.99+
Cloudera	ORGANIZATION	0.99+
Palo Alto	LOCATION	0.99+
five years	QUANTITY	0.99+

Breaking Analysis: Supercloud2 Explores Cloud Practitioner Realities & the Future of Data Apps

>> Narrator: From theCUBE Studios in Palo Alto and Boston bringing you data-driven insights from theCUBE and ETR. This is breaking analysis with Dave Vellante >> Enterprise tech practitioners, like most of us they want to make their lives easier so they can focus on delivering more value to their businesses. And to do so, they want to tap best of breed services in the public cloud, but at the same time connect their on-prem intellectual property to emerging applications which drive top line revenue and bottom line profits. But creating a consistent experience across clouds and on-prem estates has been an elusive capability for most organizations, forcing trade-offs and injecting friction into the system. The need to create seamless experiences is clear and the technology industry is starting to respond with platforms, architectures, and visions of what we've called the Supercloud. Hello and welcome to this week's Wikibon Cube Insights powered by ETR. In this breaking analysis we give you a preview of Supercloud 2, the second event of its kind that we've had on the topic. Yes, folks that's right Supercloud 2 is here. As of this recording, it's just about four days away 33 guests, 21 sessions, combining live discussions and fireside chats from theCUBE's Palo Alto Studio with prerecorded conversations on the future of cloud and data. You can register for free at supercloud.world. And we are super excited about the Supercloud 2 lineup of guests whereas Supercloud 22 in August, was all about refining the definition of Supercloud testing its technical feasibility and understanding various deployment models. Supercloud 2 features practitioners, technologists and analysts discussing what customers need with real-world examples of Supercloud and will expose thinking around a new breed of cross-cloud apps, data apps, if you will that change the way machines and humans interact with each other. 
Now the example we'd use: if you think about applications today, say a CRM system, sales reps, what are they doing? They're entering data into opportunities, they're choosing products, they're importing contacts, et cetera. And sure, the machine can then take all that data and spit out a forecast by rep, by region, by product, et cetera. But today's applications are largely about filling in forms and/or codifying processes. In the future, the Supercloud community sees a new breed of applications emerging where data resides on different clouds, in different data stores, databases, lakehouses, et cetera. And the machine uses AI to inspect the e-commerce system, the inventory data, supply chain information and other systems, and puts together a plan without any human intervention whatsoever. Think about a system that orchestrates people, places and things, like an Uber for business. So at Supercloud 2, you'll hear about this vision along with some of today's challenges facing practitioners. Zhamak Dehghani, the founder of Data Mesh, is a headliner. Kit Colbert also is headlining. He laid out at the first Supercloud an initial architecture for what that's going to look like. That was last August. And he's going to present his most current thinking on the topic. Veronika Durgin of Saks will be featured and talk about data sharing across clouds and, you know, what she needs in the future. One of the main highlights of Supercloud 2 is a dive into Walmart's Supercloud. Other featured practitioners include Western Union, Ionis Pharmaceuticals, Warner Media. We've got deep, deep technology dives with folks like Bob Muglia, David Flynn, Tristan Handy of dbt Labs, Nir Zuk, the founder of Palo Alto Networks, focused on security. Thomas Hazel, who's going to talk about a new type of database for Supercloud. There are several analysts, including Keith Townsend, Maribel Lopez, George Gilbert, Sanjeev Mohan, and so many more guests, we don't have time to list them all.
They're all up on supercloud.world with a full agenda, so you can check that out. Now let's take a look at some of the things that we're exploring in more detail, starting with the Walmart Cloud Native Platform, they call it WCNP. We definitely see this as a Supercloud and we dig into it with Jack Greenfield. He's the head of architecture at Walmart. Here's a quote from Jack. "WCNP is an implementation of Kubernetes for the Walmart ecosystem. We've taken Kubernetes off the shelf as open source." By the way, they do the same thing with OpenStack. "And we have integrated it with a number of foundational services that provide other aspects of our computational environment. Kubernetes off the shelf doesn't do everything." And so what Walmart chose to do, they took a do-it-yourself approach to build a Supercloud for a variety of reasons that Jack will explain, along with Walmart's so-called triplet architecture connecting on-prem, Azure and GCP. No surprise, there's no Amazon at Walmart for obvious reasons. And what they do is they create a common experience for devs across clouds. Jack is going to talk about how Walmart is evolving its Supercloud in the future. You don't want to miss that. Now, next, let's take a look at how Veronika Durgin of Saks thinks about data sharing across clouds. Data sharing, we think, is a potential killer use case for Supercloud. In fact, let's hear it in Veronika's own words. Please play the clip. >> How do we talk to each other? And more importantly, how do we data share? You know, I work with data, you know, this is what I do. So if, you know, I want to get data from a company that's using, say, Google, how do we share it in a smooth way where it doesn't have to be this crazy, I don't know, SFTP file moving? So that's where I think Supercloud comes to me in my mind, is like practical applications. How do we create that mesh, that network that we can easily share data with each other?
>> Now data mesh is a possible architectural approach that will enable more facile data sharing and the monetization of data products. You'll hear Zhamak Dehghani live in studio talking about what standards are missing to make this vision a reality across the Supercloud. Now one of the other things that we're really excited about is digging deeper into the right approach for Supercloud adoption. And we're going to share a preview of a debate that's going on right now in the community. Bob Muglia, former CEO of Snowflake and Microsoft exec, was kind enough to spend some time looking at the community's Supercloud definition and he felt that it needed to be simplified. So in near real time he came up with the following definition that we're showing here. I'll read it. "A Supercloud is a platform that provides programmatically consistent services hosted on heterogeneous cloud providers." So not only did Bob simplify the initial definition, he stressed that the Supercloud is a platform versus an architecture, implying that the platform provider, e.g. Snowflake, VMware, Databricks, Cohesity, et cetera, is responsible for determining the architecture. Now interestingly, in the shared Google doc that the working group uses to collaborate on the Supercloud definition, Dr. Nelu Mihai, who is actually building a Supercloud, responded as follows to Bob's assertion: "We need to avoid creating many Supercloud platforms with their own architectures. If we do that, then we create other proprietary clouds on top of existing ones. We need to define an architecture of how Supercloud interfaces with all other clouds. What is the information model? What is the execution model and how users will interact with Supercloud?" What does this seemingly nuanced point tell us and why does it matter?
Well, history suggests that de facto standards will emerge more quickly to resolve real world practitioner problems and catch on more quickly than consensus-based architectures and standards-based architectures. But in the long run, the latter may serve customers better. So we'll be exploring this topic in more detail in Supercloud 2, and of course we'd love to hear what you think: platform, architecture, both? Now one of the real technical gurus that we'll have in studio at Supercloud 2 is David Flynn. He's one of the people behind the movement that enabled enterprise flash adoption, that craze. And he did that with Fusion-io and he is now working on a system to enable read-write data access to any user in any application in any data center or on any cloud anywhere. So think of this company as a Supercloud enabler. Allow me to share an excerpt from a conversation David Floyer and I had with David Flynn last year. He as well gave a lot of thought to the Supercloud definition and was really helpful with an opinionated point of view. He said something to us that was, we thought, relevant. "What is the operating system for a decentralized cloud? The main two functions of an operating system or an operating environment are one, the process scheduler and two, the file system. The strongest argument for Supercloud is made when you go down to the platform layer and talk about it as an operating environment on which you can run all forms of applications." So a couple of implications here that we'll be exploring with David Flynn in studio. First, we're inferring from his comment that he's in the platform camp, where the platform owner is responsible for the architecture, and there are obviously trade-offs there and benefits, but we'll have to clarify that with him. And second, he's basically saying, you kill the concept the further you move up the stack. So the further you move up the stack, the weaker the Supercloud argument becomes, because it's just becoming SaaS.
Now this is something we're going to explore to better understand his thinking on this, but also whether the existing notion of SaaS is changing and whether or not a new breed of Supercloud apps will emerge. Which brings us to this really interesting fellow that George Gilbert and I riffed with ahead of Supercloud 2. Tristan Handy, he's the founder and CEO of dbt Labs and he has a highly opinionated and technical mind. Here's what he said, "One of the things that we still don't know how to API-ify is concepts that live inside of your data warehouse, inside of your data lake. These are core concepts that the business should be able to create applications around very easily. In fact, that's not the case because it involves a lot of data engineering pipeline and other work to make these available. So if you really want to make it easy to create these data experiences for users you need to have an ability to describe these metrics and then to turn them into APIs to make them accessible to application developers who have literally no idea how they're calculated behind the scenes and they don't need to." A lot of implications to this statement that we'll explore at Supercloud 2. Zhamak Dehghani's data mesh comes into play here with her critique of hyper-specialized data pipeline experts with little or no domain knowledge. Also the need for simplified self-service infrastructure, which Kit Colbert is likely going to touch upon. Veronika Durgin of Saks and her ideal state for data sharing, along with Harveer Singh of Western Union. They've got to deal with 200 locations around the world and data privacy issues, data sovereignty: how do you share data safely? Same with Nick Taylor of Ionis Pharmaceuticals. And not to blow your mind, but Thomas Hazel and Bob Muglia posit that to make data apps a reality across the Supercloud you have to rethink everything. You can't just let in-memory databases and caching architectures take care of everything in a brute force manner.
Rather you have to get down to really detailed levels, even things like how data is laid out on disk, i.e. flash, and think about rewriting applications for the Supercloud and the ML/AI era. All of this and more at Supercloud 2, which wouldn't be complete without some data. So we pinged our friends from ETR, Erik Bradley and Daren Brabham, to see if they had any data on Supercloud that we could tap. And so we're going to be analyzing a number of the players as well at Supercloud 2. Now, many of you are familiar with this graphic. Here we show some of the players involved in delivering or enabling Supercloud-like capabilities. On the Y axis is spending momentum and on the horizontal axis is market presence or pervasiveness in the data. So Net Score versus what they call overlap or N in the data. And the table insert shows how the dots are plotted. Now not to steal ETR's thunder, but the first point is you really can't have Supercloud without the hyperscale cloud platforms, which is shown on this graphic. But the exciting aspect of Supercloud is the opportunity to build value on top of that hyperscale infrastructure. Snowflake here continues to show strong spending velocity, as do Databricks, Hashi, Rubrik. VMware Tanzu, which we all put under the magnifying glass after the Broadcom announcements, is also showing momentum. Unfortunately, due to a scheduling conflict, we weren't able to get Red Hat on the program, but they're clearly a player here. And we've put Cohesity and Veeam on the chart as well because backup is a likely use case across clouds and on-premises. And now one other call out that we drill down on at Supercloud 2 is Cloudflare, which actually uses the term Supercloud maybe in a different way. They look at Supercloud really as, you know, serverless on steroids. And so the data brains at ETR will have more to say on this topic at Supercloud 2 along with many others. Okay, so why should you attend Supercloud 2? What's in it for me kind of thing?
So first of all, if you're a practitioner and you want to understand what the possibilities are for doing cross-cloud services, for monetizing data, how your peers are doing data sharing, how some of your peers are actually building out a Supercloud, you're going to get real world input from practitioners. If you're a technologist, you're trying to figure out various ways to solve problems around data, data sharing, cross-cloud service deployment, there's going to be a number of deep technology experts that are going to share how they're doing it. We're also going to drill down with Walmart into a practical example of Supercloud with some other examples of how practitioners are dealing with cross-cloud complexity. Some of them, by the way, have kind of thrown up their hands and are saying, Hey, we're going mono cloud. And we'll talk about the potential implications and dangers and risks of doing that. And also some of the benefits. You know, there's a question, right? Is Supercloud the same wine, new bottle, or is it truly something different that can drive substantive business value? So look, go to supercloud.world, it's January 17th at 9:00 AM Pacific. You can register for free and participate directly in the program. Okay, that's a wrap. I want to give a shout out to the Supercloud supporters. VMware has been a great partner as our anchor sponsor, ChaosSearch, Proximo, and Alura as well. For contributing to the effort I want to thank Alex Myerson, who's on production and manages the podcast. Ken Schiffman is his supporting cast as well. Kristin Martin and Cheryl Knight help get the word out on social media and in our newsletters. And Rob Hof is our editor-in-chief over at SiliconANGLE. Thank you all. Remember, these episodes are all available as podcasts. Wherever you listen, we really appreciate the support that you've given. We just saw some stats from Buzzsprout, we hit the top 25%, we're almost at 400,000 downloads last year. So really appreciate your participation.
All you got to do is search Breaking Analysis podcast and you'll find those. I publish each week on wikibon.com and siliconangle.com. Or if you want to get ahold of me you can email me directly at David.Vellante@siliconangle.com or DM me @DVellante or comment on our LinkedIn post. I want you to check out etr.ai. They've got the best survey data in the enterprise tech business. This is Dave Vellante for theCUBE Insights, powered by ETR. Thanks for watching. We'll see you next week at Supercloud 2 or next time on Breaking Analysis. (light music)

Published Date : Jan 14 2023

ENTITIES

Entity	Category	Confidence
Bob Muglia	PERSON	0.99+
Alex Myerson	PERSON	0.99+
Cheryl Knight	PERSON	0.99+
David Flynn	PERSON	0.99+
Veronica	PERSON	0.99+
Jack	PERSON	0.99+
Nelu Mihai	PERSON	0.99+
Zhamak Dehghani	PERSON	0.99+
Thomas Hazel	PERSON	0.99+
Nick Taylor	PERSON	0.99+
Dave Vellante	PERSON	0.99+
Jack Greenfield	PERSON	0.99+
Kristen Martin	PERSON	0.99+
Ken Schiffman	PERSON	0.99+
Veronica Durgin	PERSON	0.99+
Walmart	ORGANIZATION	0.99+
Rob Ho	PERSON	0.99+
Warner Media	ORGANIZATION	0.99+
Tristan Handy	PERSON	0.99+
Veronika Durgin	PERSON	0.99+
George Gilbert	PERSON	0.99+
Ionis Pharmaceutical	ORGANIZATION	0.99+
George Gilbert	PERSON	0.99+
Bob Muglia	PERSON	0.99+
David Flore	PERSON	0.99+
DBT Labs	ORGANIZATION	0.99+
Google	ORGANIZATION	0.99+
Bob	PERSON	0.99+
Palo Alto	LOCATION	0.99+
21 sessions	QUANTITY	0.99+
Darren Bramberm	PERSON	0.99+
33 guests	QUANTITY	0.99+
Nir Zuk	PERSON	0.99+
Boston	LOCATION	0.99+
Amazon	ORGANIZATION	0.99+
Harveer Singh	PERSON	0.99+
Kit Colbert	PERSON	0.99+
Databricks	ORGANIZATION	0.99+
Sanjeev Mohan	PERSON	0.99+
Supercloud 2	TITLE	0.99+
Snowflake	ORGANIZATION	0.99+
last year	DATE	0.99+
Western Union	ORGANIZATION	0.99+
Cohesity	ORGANIZATION	0.99+
Supercloud	ORGANIZATION	0.99+
200 locations	QUANTITY	0.99+
August	DATE	0.99+
Keith Townsend	PERSON	0.99+
Data Mesh	ORGANIZATION	0.99+
Palo Alto Networks	ORGANIZATION	0.99+
David.Vellante@siliconangle.com	OTHER	0.99+
next week	DATE	0.99+
both	QUANTITY	0.99+
one	QUANTITY	0.99+
second	QUANTITY	0.99+
first point	QUANTITY	0.99+
One	QUANTITY	0.99+
First	QUANTITY	0.99+
VMware	ORGANIZATION	0.98+
Silicon Angle	ORGANIZATION	0.98+
ETR	ORGANIZATION	0.98+
Eric Bradley	PERSON	0.98+
two	QUANTITY	0.98+
today	DATE	0.98+
Sachs	ORGANIZATION	0.98+
SAKS	ORGANIZATION	0.98+
Supercloud	EVENT	0.98+
last August	DATE	0.98+
each week	QUANTITY	0.98+

Bob Muglia, George Gilbert & Tristan Handy | How Supercloud will Support a new Class of Data Apps

(upbeat music) >> Hello, everybody. This is Dave Vellante. Welcome back to Supercloud2, where we're exploring the intersection of data analytics and the future of cloud. In this segment, we're going to look at how the Supercloud will support a new class of applications, not just work that runs on multiple clouds, but rather a new breed of apps that can orchestrate things in the real world. Think Uber for many types of businesses. These applications, they're not about codifying forms or business processes. They're about orchestrating people, places, and things in a business ecosystem. And I'm pleased to welcome my colleague and friend, George Gilbert, former Gartner analyst, Wikibon market analyst, former equities analyst, as my co-host. And we're thrilled to have Tristan Handy, who's the founder and CEO of dbt Labs, and Bob Muglia, who's the former President of Microsoft's Enterprise business and former CEO of Snowflake. Welcome all, gentlemen. Thank you for coming on the program. >> Good to be here. >> Thanks for having us. >> Hey, look, I'm going to start actually with the SuperCloud because both Tristan and Bob, you've read the definition. Thank you for doing that. And Bob, you have some really good input, some thoughts on maybe some of the drawbacks and how we can advance this. So what are your thoughts in reading that definition around SuperCloud? >> Well, I thought first of all that you did a very good job of laying out all of the characteristics of it and helping to define it overall. But I do think it can be tightened a bit, and I think it's helpful to do it in as short a way as possible. And so in the last day I've spent a little time thinking about how to take it and write a crisp definition. And here's my go at it. This is one day old, so gimme a break if it's going to change. And of course we have to follow the industry, and so that, and whatever the industry decides, but let's give this a try.
So in the way I think you're defining it, what I would say is a SuperCloud is a platform that provides programmatically consistent services hosted on heterogeneous cloud providers. >> Boom. Nice. Okay, great. I'm going to go back and read the script on that one and tighten that up a bit. Thank you for spending the time thinking about that. Tristan, would you add anything to that or what are your thoughts on the whole SuperCloud concept? >> So as I read through this, I fully realize that we need a word for this thing because I have experienced the inability to talk about it as well. But for many of us who have been living in the Confluent, Snowflake, you know, this world of like new infrastructure, this seems fairly uncontroversial. Like I read through this, and I'm just like, yeah, this is like the world I've been living in for years now. And I noticed that you called out Snowflake for being an example of this, but I think that there are like many folks, myself included, for whom this world like fully exists today. >> Yeah, I think that's a fair, I dunno if it's criticism, but people observe, well, what's the big deal here? It's just kind of what we're living in today. It reminds me of, you know, Tim Berners-Lee saying, well, this is what the internet was supposed to be. It was supposed to be Web 2.0, so maybe this is what multi-cloud was supposed to be. Let's turn our attention to apps. Bob first and then go to Tristan. Bob, what are data apps to you? When people talk about data products, is that what they mean? Are we talking about something more, different? What are data apps to you? >> Well, to understand data apps, it's useful to contrast them to something, and I just use the simple term people apps. I know that's a little bit awkward, but it's clear. And almost everything we work with, almost every application that we're familiar with, be it email or Salesforce or any consumer app, those are applications that are targeted at responding to people.
You know, in contrast, a data application reacts to changes in data and uses some set of analytic services to autonomously take action. So where applications that we're familiar with respond to people, data apps respond to changes in data. And they both do something, but they do it for different reasons. >> Got it. You know, George, you and I were talking about, you know, it comes back to SuperCloud, broad definition, narrow definition. Tristan, how do you see it? Do you see it the same way? Do you have a different take on data apps? >> Oh, geez. This is like a conversation that I don't know has an end. It's like been, I write a Substack, and there's like this little community of people who all write Substacks. We argue with each other about these kinds of things. Like, you know, as many different takes on this question as you can find, but the way that I think about it is that data products are atomic units of functionality that are fundamentally data driven in nature. So a data product can be as simple as an interactive dashboard that is like actually had design thinking put into it and serves a particular user group and has like actually gone through kind of a product development life cycle. And then a data app or data application is a kind of cohesive end-to-end experience that often encompasses like many different data products. So from my perspective there, this is very, very related to the way that these things are produced, the kinds of experiences that they're provided, that like data innovates every product that we've been building in, you know, software engineering for, you know, as long as there have been computers. >> You know, Zhamak Dehghani oftentimes uses the, you know, she doesn't name Spotify, but I think it's Spotify as that kind of example she uses. But I wonder if we can maybe try to take some examples.
If you take, like George, if you take a CRM system today, you're inputting leads, you got opportunities, it's driven by humans, they're really inputting the data, and then you got this system that kind of orchestrates the business process, like runs a forecast. But in this data driven future, are we talking about the app itself pulling data in and automatically looking at data from the transaction systems, the call center, the supply chain and then actually building a plan? George, is that how you see it? >> I go back to the example of Uber, may not be the most sophisticated data app that we build now, but it was like one of the first where you do have users interacting with their devices as riders trying to call a car or driver. But the app then looks at the location of all the drivers in proximity, and it matches a driver to a rider. It calculates an ETA to the rider. It calculates an ETA then to the destination, and it calculates a price. Those are all activities that are done sort of autonomously that don't require a human to type something into a form. The application is using changes in data to calculate an analytic product and then to operationalize that, to assign the driver to, you know, calculate a price. Those are, that's an example of what I would think of as a data app. And my question then I guess for Tristan is if we don't have all the pieces in place for sort of mainstream companies to build those sorts of apps easily yet, like how would we get started? What's the role of a semantic layer in making that easier for mainstream companies to build? And how do we get started, you know, say with metrics? How does that, how does that take us down that path? >> So what we've seen in the past, I dunno, decade or so, is that one of the most successful business models in infrastructure is taking hard things and rolling 'em up behind APIs. 
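As a rough illustration of the Uber example George describes above, the sketch below reacts to a new ride request (a change in data) by matching the nearest driver, estimating both ETAs, and computing a price, with no form-filling by a human. Everything here is a hypothetical simplification for illustration: the `Driver` type, the flat coordinate grid, the average speed, and the fare constants are assumptions, not how Uber or any product discussed in this episode actually works.

```python
import math
from dataclasses import dataclass

# Illustrative assumptions, not real-world values.
AVG_SPEED_KMH = 30.0
BASE_FARE = 2.0
FARE_PER_KM = 1.5


@dataclass
class Driver:
    driver_id: str
    x_km: float
    y_km: float


def distance_km(ax, ay, bx, by):
    """Straight-line distance on a flat grid measured in kilometres."""
    return math.hypot(ax - bx, ay - by)


def handle_ride_request(rider_xy, dest_xy, drivers):
    """React to a new ride request: match a driver, compute ETAs and a price."""
    if not drivers:
        raise ValueError("no drivers available")
    # Match: pick the driver closest to the rider.
    nearest = min(drivers, key=lambda d: distance_km(d.x_km, d.y_km, *rider_xy))
    pickup_km = distance_km(nearest.x_km, nearest.y_km, *rider_xy)
    trip_km = distance_km(*rider_xy, *dest_xy)
    return {
        "driver_id": nearest.driver_id,
        "pickup_eta_min": pickup_km / AVG_SPEED_KMH * 60,
        "trip_eta_min": trip_km / AVG_SPEED_KMH * 60,
        "price": round(BASE_FARE + FARE_PER_KM * trip_km, 2),
    }


if __name__ == "__main__":
    fleet = [Driver("a", 0.0, 0.0), Driver("b", 3.0, 4.0)]
    print(handle_ride_request((3.0, 3.0), (3.0, 7.0), fleet))
```

The point of the sketch is the shape of the application rather than the arithmetic: the trigger is a data event, and every output (assignment, ETAs, price) is an analytic result that gets operationalized directly.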
You take messaging, you take payments, and you all of a sudden increase the capability of kind of your median application developer. And you say, you know, previously you were spending all your time being focused on how do you accept credit cards, how do you send SMS payments, and now you can focus on your business logic, and just create the thing. One of, interestingly, one of the things that we still don't know how to API-ify is concepts that live inside of your data warehouse, inside of your data lake. These are core concepts that, you know, you would imagine that the business would be able to create applications around very easily, but in fact that's not the case. It's actually quite challenging to, and involves a lot of data engineering pipeline and all this work to make these available. And so if you really want to make it very easy to create some of these data experiences for users, you need to have an ability to describe these metrics and then to turn them into APIs to make them accessible to application developers who have literally no idea how they're calculated behind the scenes, and they don't need to. >> So how rich can that API layer grow if you start with metric definitions that you've defined? And dbt has, you know, the metric, the dimensions, the time grain, things like that, that's a well scoped sort of API that people can work within. How much can you extend that to say non-calculated business rules or governance information like data reliability rules, things like that, or even, you know, features for an AI/ML feature store. In other words, it starts, you started pragmatically, but how far can you grow? >> Bob is waiting with bated breath to answer this question. I'm, just really quickly, I think that we as a company and dbt as a product tend to be very pragmatic. We try to release the simplest possible version of a thing, get it out there, and see if people use it. But the idea that, the concept of a metric is really just a first landing pad.
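The idea of describing a metric once (its measure, its dimensions, its time grain) and exposing it behind an API can be sketched as follows. This is a minimal illustration under stated assumptions: the `Metric` class, the `query_metric` function, and the `ORDERS` sample rows are all hypothetical names invented here, and this is not dbt's actual metric syntax or API. The caller asks for "revenue by region by month" without knowing how it is calculated.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Metric:
    """A hypothetical metric definition: what to sum and how it may be sliced."""
    name: str
    measure: str        # column to aggregate
    dimensions: tuple   # columns callers are allowed to group by
    time_column: str    # column that carries the event date


# Made-up sample rows standing in for a warehouse table.
ORDERS = [
    {"order_date": date(2023, 1, 5), "region": "east", "amount": 100.0},
    {"order_date": date(2023, 1, 20), "region": "west", "amount": 50.0},
    {"order_date": date(2023, 2, 2), "region": "east", "amount": 25.0},
]

REVENUE = Metric("revenue", "amount", ("region",), "order_date")


def query_metric(metric, rows, group_by=None, grain="month"):
    """Return totals keyed by (period,) or (period, dimension value)."""
    if group_by is not None and group_by not in metric.dimensions:
        raise ValueError(f"{group_by!r} is not a dimension of {metric.name}")
    if grain not in ("month", "year"):
        raise ValueError("grain must be 'month' or 'year'")
    fmt = "%Y-%m" if grain == "month" else "%Y"
    totals = defaultdict(float)
    for row in rows:
        period = row[metric.time_column].strftime(fmt)
        key = (period, row[group_by]) if group_by else (period,)
        totals[key] += row[metric.measure]
    return dict(totals)


if __name__ == "__main__":
    print(query_metric(REVENUE, ORDERS))
    print(query_metric(REVENUE, ORDERS, group_by="region"))
```

An application developer only needs the metric name, the permitted dimensions, and the grain; the aggregation logic stays behind the function boundary, which is the encapsulation the conversation is pointing at.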
Really, there is a physical manifestation of the data and then there's a logical manifestation of the data. And what we're trying to do here is make it very easy to access the logical manifestation of the data, and metric is a way to look at that. Maybe an entity, a customer, a user is another way to look at that. And I'm sure that there will be more kind of logical structures as well. >> So, Bob, chime in on this. You know, what's your thoughts on the right architecture behind this, and how do we get there? >> Yeah, well first of all, I think one of the ways we get there is by what companies like dbt Labs and Tristan are doing, which is incrementally taking and building on the modern data stack and extending that to add a semantic layer that describes the data. Now the way I tend to think about this is a fairly major shift in the way we think about writing applications, which is today a code first approach to moving to a world that is model driven. And I think that's what the big change will be is that where today we think about data, we think about writing code, and we use that to produce APIs as Tristan said, which encapsulates those things together in some form of services that are useful for organizations. And that idea of that encapsulation is never going to go away. It's very, that concept of an API is incredibly useful and will exist well into the future. But what I think will happen is that in the next 10 years, we're going to move to a world where organizations are defining models first of their data, but then ultimately of their business process, their entire business process. Now the concept of a model driven world is a very old concept. I mean, I first started thinking about this and playing around with some early model driven tools, probably before Tristan was born in the early 1980s. And those tools didn't work because the semantics associated with executing the model were too complex to be written in anything other than a procedural language.
We're now reaching a time where that is changing, and you see it everywhere. You see it first of all in the world of machine learning and machine learning models, which are taking over more and more of what applications are doing. And I think that's an incredibly important step. And learned models are an important part of what people will do. But if you look at the world today, I will claim that we've always been modeling. Modeling has existed in computers since there have been integrated circuits and any form of computers. But what we do is what I would call implicit modeling, which means that it's the model is written on a whiteboard. It's in a bunch of Slack messages. It's on a set of napkins in conversations that happen and during Zoom. That's where the model gets defined today. It's implicit. There is one in the system. It is hard coded inside application logic that exists across many applications with humans being the glue that connects those models together. And really there is no central place you can go to understand the full attributes of the business, all of the business rules, all of the business logic, the business data. That's going to change in the next 10 years. And we'll start to have a world where we can define models about what we're doing. Now in the short run, the most important models to build are data models and to describe all of the attributes of the data and their relationships. And that's work that DBT Labs is doing. A number of other companies are doing that. We're taking steps along that way with catalogs. People are trying to build more complete ontologies associated with that. The underlying infrastructure is still super, super nascent. But what I think we'll see is this infrastructure that exists today that's building learned models in the form of machine learning programs. 
You know, some of these incredible machine learning programs in foundation models like GPT and DALL-E and all of the things that are happening in these global scale models, but also all of that needs to get applied to the domains that are appropriate for a business. And I think we'll see the infrastructure developing for that, that can take this concept of learned models and put it together with more explicitly defined models. And this is where the concept of knowledge graphs comes in and then the technology that underlies that to actually implement and execute that, which I believe are relational knowledge graphs. >> Oh, oh wow. There's a lot to unpack there. So let me ask the Columbo question, Tristan, we've been making fun of your youth. We're just, we're just jealous. Columbo, I'll explain it offline maybe. >> I watch Columbo. >> Okay. All right, good. So but today if you think about the application stack and the data stack, which is largely an analytics pipeline. They're separate. Do they, those worlds, do they have to come together in order to achieve Bob's vision? When I talk to practitioners about that, they're like, well, I don't want to complexify the application stack 'cause the data stack today is so, you know, hard to manage. But do those worlds have to come together? And you know, through that model, I guess abstraction or translation that Bob was just describing, how do you guys think about that? Who wants to take that? >> I think it's inevitable that data and AI are going to become closer together. I think that the infrastructure there has been moving in that direction for a long time. Whether you want to use the Lakehouse portmanteau or not. There's also, there's a next generation of data tech that is still in the like early stage of being developed. There's a company that I love that is essentially Cross Cloud Lambda, and it's just a wonderful abstraction for computing.
So I think that, you know, people have been predicting that these worlds are going to come together for a while. A16Z wrote a great post on this back in I think 2020, predicting this, and I've been predicting this since 2020. But what's not clear is the timeline, but I think that this is still just as inevitable as it's been. >> Who's that that does Cross Cloud? >> Let me follow up on. >> Who's that, Tristan, that does Cross Cloud Lambda? Can you name names? >> Oh, they're called Modal Labs. >> Modal Labs, yeah, of course. All right, go ahead, George. >> Let me ask about this vision of trying to put the semantics or the code that represents the business with the data. It gets us to a world that's sort of more data centric, where data's not locked inside or behind the APIs of different applications so that we don't have silos. But at the same time, Bob, I've heard you talk about building the semantics gradually on top of, into a knowledge graph that maybe grows out of a data catalog. And the vision of getting to that point, essentially the enterprise's metadata and then the semantics you're going to add onto it are really stored in something that's separate from the underlying operational and analytic data. So at the same time then why couldn't we gradually build semantics beyond the metric definitions that DBT has today? In other words, you build more and more of the semantics in some layer that DBT defines and that sits above the data management layer, but any requests for data have to go through the DBT layer. Is that a workable alternative? Or where, what type of limitations would you face? >> Well, I think that it is the way the world will evolve is to start with the modern data stack and, you know, which is operational applications going through a data pipeline into some form of data lake, data warehouse, the Lakehouse, whatever you want to call it. And then, you know, this wide variety of analytics services that are built together.
To the point that Tristan made about machine learning and data coming together, you see that in every major data cloud provider. Snowflake certainly now supports Python and Java. Databricks is of course building their data warehouse. Certainly Google, Microsoft and Amazon are doing very, very similar things in terms of building complete solutions that bring together an analytics stack that typically supports languages like Python together with the data stack and the data warehouse. I mean, all of those things are going to evolve, and they're not going to go away because that infrastructure is relatively new. It's just being deployed by companies, and it solves the problem of working with petabytes of data if you need to work with petabytes of data, and nothing will do that for a long time. What's missing is a layer that understands and can model the semantics of all of this. And if you need to, if you want to model all, if you want to talk about all the semantics of even data, you need to think about all of the relationships. You need to think about how these things connect together. And unfortunately, there really is no platform today. None of our existing platforms are ultimately sufficient for this. It was interesting, I was just talking to a customer yesterday, you know, a large financial organization that is building out these semantic layers. They're further along than many companies are. And you know, I asked what they're building it on, and you know, it's not surprising they're using a, they're using combinations of some form of search together with, you know, textual based search together with a document oriented database. In this case it was Cosmos. And that really is kind of the state of the art right now. And yet those products were not built for this. They don't really, they can't manage the complicated relationships that are required. They can't issue the queries that are required. And so a new generation of database needs to be developed. 
And fortunately, you know, that is happening. The world is developing a new set of relational algorithms that will be able to work with hundreds of different relations. If you look at a SQL database like Snowflake or BigQuery, you know, you get tens of different joins coming together, and that query is going to take a really long time. Well, fortunately, technology is evolving, and it's possible with new join algorithms, worst-case optimal join algorithms they're called, where you can join hundreds of different relations together and run semantic queries that you simply couldn't run. Now that technology is nascent, but it's really important, and I think that will be a requirement to have the semantic layer reach its full potential. In the meantime, Tristan can do a lot of great things by building on what he's got today and solve some problems that are very real. But in the long run I think we'll see a new set of databases to support these models. >> So Tristan, you got to respond to that, right? You got to, so take the example of Snowflake. We know it doesn't deal well with complex joins, but they're, they've got big aspirations. They're building an ecosystem to really solve some of these problems. Tristan, you guys are part of that ecosystem, and others, but please, your thoughts on what Bob just shared. >> Bob, I'm curious if, I would have no idea what you were talking about except that you introduced me to somebody who gave me a demo of a thing and do you not want to go there right now? >> No, I can talk about it. I mean, we can talk about it. Look, the company I've been working with is RelationalAI, and they're doing this work to actually first of all work across the industry with academics and research, you know, across many, many different, over 20 different research institutions across the world to develop this new set of algorithms. They're all fully published, just like SQL, the underlying algorithms that are used by SQL databases are.
If you look today, every single SQL database uses a similar set of relational algorithms underneath that. And those algorithms actually go back to System R and what IBM developed in the 1970s. We're just, there's an opportunity for us to build something new that allows you to take, for example, instead of taking data and grouping it together in tables, treat all data as individual relations, you know, a key and a set of values and then be able to perform purely relational operations on it. If you go back to what, to Codd, and what he wrote, he defined two things. He defined a relational calculus and relational algebra. And essentially SQL is a query language that is translated by the query processor into relational algebra. However, the calculus of SQL is not even close to the full semantics of the relational mathematics. And it's possible to have systems that can do everything and that can store all of the attributes of the data model or ultimately the business model in a form that is much more natural to work with. >> So here's like my short answer to this. I think that we're dealing in different time scales. I think that there is actually a tremendous amount of work to do in the semantic layer using the kind of technology that we have on the ground today. And I think that there's, I don't know, let's say five years of like really solid work that there is to do for the entire industry, if not more. But the wonderful thing about DBT is that it's independent of what the compute substrate is beneath it. And so if we develop new platforms, new capabilities to describe semantic models in more fine-grained detail, more procedural, then we're going to support that too. And so I'm excited about all of it.
>> Yeah, so interpreting that short answer, you're basically saying, cause Bob was just kind of pointing to you as incremental, but you're saying, yeah, okay, we're applying it for incremental use cases today, but we can accommodate a much broader set of examples in the future. Is that correct, Tristan? >> I think you're using the word incremental as if it's not good, but I think that incremental is great. We have always been about applying incremental improvement on top of what exists today, but allowing practitioners to like use different workflows to actually make use of that technology. So yeah, yeah, we are a very incremental company. We're going to continue being that way. >> Well, I think Bob was using incremental as a pejorative. I mean, I, but to your point, a lot. >> No, I don't think so. I want to stop that. No, I don't think it's pejorative at all. I think incremental, incremental is usually the most successful path. >> Yes, of course. >> In my experience. >> We agree, we agree on that. >> Having tried many, many moonshot things in my Microsoft days, I can tell you that being incremental is a good thing. And I'm a very big believer that that's the way the world's going to go. I just think that there is a need for us to build something new and that ultimately that will be the solution. Now you can argue whether it's two years, three years, five years, or 10 years, but I'd be shocked if it didn't happen in 10 years. >> Yeah, so we all agree that incremental is less disruptive. Boom, but Tristan, you're, I think I'm inferring that you believe you have the architecture to accommodate Bob's vision, and then Bob, and I'm inferring from Bob's comments that maybe you don't think that's the case, but please. >> No, no, no. I think that, so Bob, let me put words into your mouth and you tell me if you disagree, DBT is completely useless in a world where a large scale cloud data warehouse doesn't exist. 
We were not able to bring the power of Python to our users until these platforms started supporting Python. Like DBT is a layer on top of large scale computing platforms. And to the extent that those platforms extend their functionality to bring more capabilities, we will also service those capabilities. >> Let me try and bridge the two. >> Yeah, yeah, so Bob, Bob, Bob, do you concur with what Tristan just said? >> Absolutely, I mean there's nothing to argue with in what Tristan just said. >> I wanted. >> And it's what he's doing. It'll continue to, I believe he'll continue to do it, and I think it's a very good thing for the industry. You know, I'm just simply saying that on top of that, I would like to provide Tristan and all of those who are following similar paths to him with a new type of database that can actually solve these problems in a much more architected way. And when I talk about Cosmos with something like Mongo or Cosmos together with Elastic, you're using Elastic as the join engine, okay. That's the purpose of it. It becomes a poor man's join engine. And I kind of go, I know there's a better answer than that. I know there is, but that's kind of where we are state of the art right now. >> George, we got to wrap it. So give us the last word here. Go ahead, George. >> Okay, I just, I think there's a way to tie together what Tristan and Bob are both talking about, and I want them to validate it, which is for five years we're going to be adding or some number of years more and more semantics to the operational and analytic data that we have, starting with metric definitions. My question is for Bob, as DBT accumulates more and more of those semantics for different enterprises, can that layer not run on top of a relational knowledge graph? And what would we lose by not having, by having the knowledge graph store sort of the joins, all the complex relationships among the data, but having the semantics in the DBT layer? 
>> Well, I think this, okay, I think first of all that DBT will be an environment where many of these semantics are defined. The question we're asking is how are they stored and how are they processed? And what I predict will happen is that over time, as companies like DBT begin to build more and more richness into their semantic layer, they will begin to experience challenges that customers want to run queries, they want to ask questions, they want to use this for things where the underlying infrastructure becomes an obstacle. I mean, this has always happened in history, right? I mean, you see major advances in computer science when the data model changes. And I think we're on the verge of a very significant change in the way data is stored and structured, or at least metadata is stored and structured. Again, I'm not saying that anytime in the next 10 years, SQL is going to go away. In fact, more SQL will be written in the future than has been written in the past. And those platforms will mature to become the engines, the slicer dicers of data. I mean that's what they are today. They're incredibly powerful at working with large amounts of data, and that infrastructure is maturing very rapidly. What is not maturing is the infrastructure to handle all of the metadata and the semantics that that requires. And that's where I say knowledge graphs are what I believe will be the solution to that. >> But Tristan, bring us home here. It sounds like, let me posit this, that whatever happens in the future, we're going to leverage the vast system that has become cloud, that we're talking about as supercloud, sort of where data lives irrespective of physical location. We're going to have to tap that data. It's not necessarily going to be in one place, but give us your final thoughts, please. >> 100% agree. I think that the data is going to live everywhere.
It is the responsibility for both the metadata systems and the data processing engines themselves to make sure that we can join data across cloud providers, that we can join data across different physical regions and that we as practitioners are going to kind of start forgetting about details like that. And we're going to start thinking more about how we want to arrange our teams, how does the tooling that we use support our team structures? And that's when data mesh I think really starts to get very, very critical as a concept. >> Guys, great conversation. It was really awesome to have you. I can't thank you enough for spending time with us. Really appreciate it. >> Thanks a lot. >> All right. This is Dave Vellante for George Gilbert, John Furrier, and the entire Cube community. Keep it right there for more content. You're watching SuperCloud2. (upbeat music)

Published Date : Jan 4 2023



The Truth About MySQL HeatWave

>>When Oracle acquired MySQL via the Sun acquisition, nobody really thought the company would put much effort into the platform, preferring to focus all the wood behind its leading Oracle database, arrow pun intended. But two years ago, Oracle surprised many folks by announcing MySQL HeatWave, a new database as a service with a massively parallel hybrid columnar in-memory architecture that brings together transactional and analytic data in a single platform. Welcome to our latest database power panel on theCUBE. My name is Dave Vellante, and today we're gonna discuss Oracle's MySQL HeatWave with a who's who of cloud database industry analysts. Holger Mueller is with Constellation Research. Marc Staimer is the Dragon Slayer and Wikibon contributor. And Ron Westfall is with Futurum Research. Gentlemen, welcome back to theCUBE. Always a pleasure to have you on. >>Thanks for having us. Great to be here. >>So we've had a number of deep dive interviews on theCUBE with Nipun Agarwal. You guys know him? He's a senior vice president of MySQL HeatWave development at Oracle. I think you just saw him at Oracle CloudWorld, and he's come on to describe, I'll call it, the shock-and-awe feature additions to HeatWave. You know, the company's clearly putting R&D into the platform, and I think at CloudWorld we saw like the fifth major release since 2020, when they first announced MySQL HeatWave. So just listing a few, they brought in analytics, machine learning, they've got Autopilot for machine learning, which is automation, onto the basic OLTP functionality of the database. And it's been interesting to watch Oracle's converged database strategy. We've contrasted that amongst ourselves. Love to get your thoughts on Amazon's right-tool-for-the-right-job approach. >>Are they gonna have to change that? You know, Amazon's got the specialized databases, it's just, you know, both companies are doing well.
It just shows there are a lot of ways to skin a cat, 'cause you see some traction in the market in both approaches. So today we're gonna focus on the latest HeatWave announcements, and we're gonna talk about multi-cloud with a native MySQL HeatWave implementation, which is available on AWS, and MySQL HeatWave for Azure via the Oracle Microsoft interconnect. This kind of cool hybrid action that they got going. Sometimes we call it supercloud. And then we're gonna dive into MySQL HeatWave Lakehouse, which allows users to process and query data across MySQL databases, HeatWave databases, as well as object stores. So, HeatWave has been announced on AWS and Azure, they're available now, and Lakehouse I believe is in beta and I think it's coming out the second half of next year. So again, all of our guests are fresh off of Oracle CloudWorld in Las Vegas. So they got the latest scoop. Guys, I'm done talking. Let's get into it. Marc, maybe you could start us off, what's your opinion of MySQL HeatWave's competitive position? When you think about what AWS is doing, you know, Google is, you know, we heard Google Cloud Next recently, we heard about all their data innovations. You got, obviously Azure's got a big portfolio, Snowflake's doing well in the market. What's your take? >>Well, first let's look at it from the point of view that AWS is the market leader in cloud and cloud services. They own somewhere between 30 to 50% of the market, depending on who you read. And then you have Azure as number two, and after that it falls off. There's GCP, Google Cloud Platform, which is further way down the list, and then Oracle and IBM and Alibaba. So when you look at AWS and Azure and say, hey, these are the market leaders in the cloud, then you start looking at it and saying, if I am going to provide a service that competes with the service they have, if I can make it available in their cloud, it means that I can be more competitive.
And if I'm compelling, and compelling means at least twice the performance or functionality or both at half the price, I should be able to gain market share. >>And that's what Oracle's done. They've taken a superior product in MySQL HeatWave, which is faster, lower cost, does more for a lot less at the end of the day, and they make it available to the users of those clouds. You avoid this little thing called egress fees, you avoid the issue of having to migrate from one cloud to another, and suddenly you have a very compelling offer. So I look at what Oracle's doing with MySQL and it feels like, I'm gonna use a war term, a flanking maneuver to their competition. They're offering a better service on their platforms. >>All right, so thank you for that. Holger, we've seen this sort of cadence, I sort of referenced it up front a little bit, and they sat on MySQL for a decade, then all of a sudden we see this rush of announcements. Why did it take so long? And more importantly, is Oracle, are they developing the right features that cloud database customers are looking for in your view? >>Yeah, great question, but first of all, in your intro you said they added analytics, right? Analytics is kind of like a marketing buzzword. Reports can be analytics, right? The interesting thing, which they did, the first thing, they crossed the chasm between OLTP and OLAP, right? In the same database, right? So a major engineering feat, very much what customers want, and it's all about creating value for customers, which I think is the part why they go into the multi-cloud and why they add these capabilities. And they certainly, with the AI capabilities, it's kind of like getting it into an autonomous field, self-driving field, now with the lakehouse capabilities and meeting customers where they are, like Marc has talked about the egress costs in the cloud.
So that's a significant advantage, creating value for customers, and that's what at the end of the day matters. >>And I believe strongly that long term it's gonna be the ones who create better value for customers who will get more of their money. From that perspective, why did it then take them so long? I think it's a great question. I think largely, you mentioned the gentleman, Nipun, it's largely down to who leads a product. I used to build products too, so maybe I'm a little fooling myself here, but that made the difference in my view, right? So since he's been in charge, he's been building things faster than the rest of the competition in the MySQL space, which in hindsight we thought was in a hot and smoking innovation phase. It kind of like was a little self-complacent when it comes to the traditional borders of where people think things are separated, between OLTP and OLAP, or as an example, JSON support, right? Structured documents versus unstructured documents or databases, and all of that has been collapsed and brought together for building a more powerful database for customers. >>So I mean it's certainly, you know, when Oracle talks about the competitors, you know, I always say, if Oracle talks about you, you know you're doing well. So they talk a lot about AWS, talk a little bit about Snowflake, you know, sort of Google, they have partnerships with Azure, but so I'm presuming that MySQL HeatWave was really in response to what they were seeing from those big competitors. But then you had MariaDB coming out, you know, the day that Oracle acquired Sun, and launching and going after the MySQL base. So I'm interested, and we'll talk about this later, in what you guys think AWS and Google and Azure and Snowflake are gonna do and how they're gonna respond.
But before I do that, Ron, I want to ask you, you can get, you know, pretty technical and you've probably seen the benchmarks. >>I know you have. Oracle makes a big deal out of it, publishes its benchmarks, makes them transparent on GitHub. Larry Ellison talked about this in his keynote at CloudWorld. What do the benchmarks show in general? I mean, when you're new to the market, you gotta have a story like Marc was saying, you gotta be 2x, you know, the performance at half the cost, or you better be, or you're not gonna get any market share. So, and you know, oftentimes companies don't publish benchmarks when they're leading. They do it when they need to gain share. So what do you make of the benchmarks? Are there any results that were surprising to you? Have they, you know, been challenged by the competitors? Is it just a bunch of kind of desperate benchmarketing to make some noise in the market or, you know, are they real? What's your view? >>Well, from my perspective, I think they have the validity. And to your point, I believe that when it comes to competitor responses, that has not really happened. Nobody has like pulled down the information that's on GitHub and said, oh, here are our price performance results, and they counter Oracle's. In fact, I think part of the reason why that hasn't happened is that there's the risk if Oracle's coming out and saying, hey, we can deliver 17 times better query performance using our capabilities versus, say, Snowflake when it comes to, you know, the Lakehouse platform, and Snowflake turns around and says it's actually only 15 times better query performance, that's not exactly an effective maneuver. And so I think this is really to Oracle's credit, and I think it's refreshing because these differentiators are significant. We're not talking, you know, like 1.2% differences.
We're talking 17-fold differences, we're talking six-fold differences depending on, you know, where the spotlight is being shined and so forth. >>And so I think this is actually something that is actually too good to believe initially, at first blush. If I'm a cloud database decision maker, I really have to prioritize this. I really would, you know, pay a lot more attention to this. And that's why I posed the question to Oracle and others like, okay, if these differentiators are so significant, why isn't the needle moving a bit more? And it's for, you know, some of the usual reasons. One is really deep discounting coming from, you know, the other players, that's really kind of, you know, marketing 101, this is something you need to do when there's a real competitive threat to keep, you know, a customer in your own customer base. Plus there is the usual fear and uncertainty about moving from one platform to another. But I think, you know, the traction, the momentum is shifting in Oracle's favor. I think we saw that in the Q1 efforts, for example, where Oracle cloud grew 44% and that it generated, you know, 4.8 billion in revenue if I recall correctly. And so all these are demonstrating that Oracle is making, I think, many of the right moves. Publishing these figures for anybody to look at from their own perspective is something that is, I think, good for the market, and I think it's just gonna continue to pay dividends for Oracle down the horizon as, you know, competition intensifies. So if I were in, >>Dave, can I, Dave, can I interject something on what Ron just said there? >>Yeah, please go ahead. >>A couple things here. One, discounting, which is a common practice when you have a real threat, as Ron pointed out, isn't going to help much in this situation, simply because you can't discount to the point where you improve your performance, and the performance is a huge differentiator.
You may be able to get your price down, but the problem that most of them have is they don't have an integrated product service. They don't have an integrated OLTP, OLAP, ML, and data lake. Even if you cut out two of them, they don't have any of them integrated. They have multiple services that require separate integration, and that can't be overcome with discounting. And you have to pay for each one of these. And oh, by the way, as you grow, the discounts go away. So that's a minor but important detail. >>So that's a TCO question, Mark, right? And I know you look at this a lot. If I had that kind of price performance advantage, I would be pounding TCO, especially if I need two separate databases to do the job that one can do. The TCO numbers are gonna be off the chart, or maybe down the chart, which you want. Have you looked at this, and how does it compare with, you know, the big cloud guys, for example? >>I've looked at it in depth. In fact, I'm working on another TCO in this arena, but you can find it on Wikibon, in which I compared TCO for MySQL HeatWave versus Aurora plus Redshift plus ML plus Glue. I've compared it against GCP's services, Azure services, Snowflake with other services. And there's just no comparison. The TCO differences are huge. More importantly, the TCO per performance is huge. We're talking in some cases multiple orders of magnitude, but at least an order of magnitude difference. So discounting isn't gonna help you much at the end of the day. It's only going to lower your cost a little, but it doesn't improve the automation, it doesn't improve the performance, it doesn't improve the time to insight, it doesn't improve all those things that you want out of a database or multiple databases because you >>Can't discount yourself to a higher value proposition. >>So what about, I wonder, Holger, if you could chime in on the developer angle. You followed that market.
How do these innovations from HeatWave, I think you used the term developer velocity. I've heard you use that before. Yeah, I mean, look, Oracle owns Java, okay, so it's, you know, the most popular programming language in the world, blah, blah, blah. But does it have the minds and hearts of developers, and where does HeatWave fit into that equation? >>I think HeatWave is quickly gaining mindshare on the developer side, right? It's not the traditional NoSQL database which grew up. There's a traditional mistrust of Oracle among developers over what happens to open source when it gets acquired, like in the case of Oracle with Java and with MySQL, right? But we know it's not a good competitive strategy to bank on Oracle screwing up, because it hasn't worked, not on Java, not on MySQL, right? And for developers, once you get to know a technology product and you can do more with it, it becomes kind of like a Swiss Army knife and you can build more use cases, you can build more powerful applications. That's super, super important because you don't have to get certified in multiple databases. You are fast at getting things done, you achieve higher developer velocity, and the managers are happy because they don't have to license more things, send you to more trainings, have more risk of something not being delivered, right? >>So it's really the suite versus best of breed play that we see happening here, which in general was happening before already with Oracle's flagship database versus those of Amazon, as an example, right?
And now the interesting thing is, every step of the way Oracle was always a one-database company, there can be only one, and they're now generally talking about HeatWave and being a two-database company with different market spaces, but the same value proposition of integrating more things very, very quickly to have a universal database, as I call it; they call it the converged database, for all the needs of an enterprise to run certain application use cases. And that's what's attractive to developers. >>It's ironic, isn't it? I mean, you know, the rumor was that TK, Thomas Kurian, left Oracle because he wanted to put Oracle database on other clouds and other places. And maybe that was the rift. I'm sure there were other things, but Oracle clearly is now trying to expand its TAM, Ron, with HeatWave into AWS, into Azure. How do you think Oracle's gonna do? You were at CloudWorld. What was the sentiment from customers and the independent analysts? Is this just Oracle trying to screw with the competition, create a little diversion? Or is this, you know, serious business for Oracle? What do you think? >>No, I think it has legs. I think it's definitely, again, a tribute to Oracle's overall ability to differentiate not only MySQL HeatWave, but its overall portfolio. And I think the fact that they do have the alliance with Azure in place, that this is definitely demonstrating their commitment to meeting the multi-cloud needs of its customers, as well as what we pointed to in terms of the fact that they're now offering, you know, MySQL capabilities within AWS natively and that it can now outperform AWS's own offering. And I think this is all demonstrating that Oracle is, you know, not letting up, they're not resting on their laurels. Clearly we are living in a multi-cloud world, so why not just make it easier for customers to be able to use cloud databases according to their own specific needs?
And I think, you know, to Holger's point, I think that definitely aligns with being able to bring on more application developers to leverage these capabilities. >>I think one important announcement that's related to all this was the JSON relational duality capabilities, where now it's a lot easier for application developers to use a language that they're very familiar with, JSON, and not have to worry about going into relational databases to store their JSON application coding. So this is, I think, an example of the innovation that's enhancing the overall Oracle portfolio, and certainly all the work with machine learning is definitely paying dividends as well. And as a result, I see Oracle continuing to make these inroads that we pointed to. But I agree with Mark, you know, the short-term discounting is just a stall tactic. This is not denying the fact that Oracle is able to not only deliver price performance differentiators that are dramatic, but also meet a wide range of needs for customers out there that aren't just limited to price performance considerations. >>Being able to support multi-cloud according to customer needs. Being able to reach out to the application developer community and address a very specific challenge that has plagued them for many years now. So bringing it all together, yeah, I see this as just enabling Oracle to ring true with customers. The customers that were there, basically all of them, even though not all of them are going to be saying the same things, they're all basically giving positive feedback. And likewise, I think the analyst community is seeing this. It's always refreshing to be able to talk to customers directly, and at Oracle CloudWorld there was a litany of them, and so this is just a difference maker, as well as being able to talk to strategic partners. The Nvidia partnership, I think, is also a testament to Oracle's ongoing ability to, you know, make the ecosystem more user friendly for the customers out there.
>>Yeah, it's interesting when you get these all-in-one tools, you know, the Swiss Army knife, you expect that it's not able to be best of breed. That's the kind of surprising thing that I'm hearing about HeatWave. I want to talk about lakehouse, because when I think of lakehouse, I think Databricks, and to my knowledge Databricks hasn't been in the sights of Oracle yet. Maybe they're next, but Oracle claims that MySQL HeatWave Lakehouse is a breakthrough in terms of capacity and performance. Mark, what are your thoughts on that? Can you double click on Lakehouse, Oracle's claims for things like query performance and data loading? What does it mean for the market? Is Oracle really leading in the lakehouse competitive landscape? What are your thoughts? >>Well, the name of the game is, what are the problems you're solving for the customer? More importantly, are those problems urgent or important? If they're urgent, customers wanna solve 'em now. If they're important, they might get around to them. So you look at what they're doing with Lakehouse, or previous to that machine learning, or previous to that automation, or previous to that OLAP with OLTP, and they're merging all this capability together. If you look at Snowflake or Databricks, they're tackling one problem. You look at MySQL HeatWave, they're tackling multiple problems. So when you say, yeah, their queries are much better against the lakehouse in combination with other analytics, in combination with OLTP, and the fact that there are no ETLs, you're getting all this done in real time. So it's doing the query across everything in real time. >>You're solving multiple user and developer problems, you're increasing their ability to get insight faster, you're having shorter response times. So yeah, they really are solving urgent problems for customers. And by putting it where the customer lives, this is the brilliance of actually being multicloud.
And I know I'm backing up here a second, but by making it work in AWS and Azure where people already live, where they already have applications, what they're saying is, we're bringing it to you. You don't have to come to us to get these benefits, this value. Overall, I think it's a brilliant strategy. I give Nipun Agarwal huge, huge kudos for what he's doing there. So yes, what they're doing with the lakehouse is going to put Databricks and Snowflake and everyone else, for that matter, on notice. Well >>Those are the guys that, Holger, you and I have talked about this. Those are the guys that are doing sort of the best of breed. You know, they're really focused and they, you know, tend to do well, at least out of the gate. Now you got Oracle's converged philosophy, obviously with Oracle database. We've seen that; now it's kicking in gear with HeatWave. You know, this whole thing of suites versus best of breed. I mean, in the long term, you know, customers tend to migrate towards suites, but the new shiny toy tends to get the growth. How do you think this is gonna play out in cloud database? >>Well, it's the forever never-ending story, right? In software, right, suite versus best of breed, and so far in the long run suites have always won, right? And sometimes they struggle, again, because the inherent problem of suites is you build something larger, it has more complexity, and that means your cycles to get everything working together, to integrate, to test, to roll it out, certify, whatever it is, take you longer, right? And that's not the case here. The fascinating part of the effort around MySQL HeatWave is that the team is out-executing the previous best of breed, bringing something together. Now, if they can maintain that pace, that's something to be seen.
But the strategy, like what Mark was saying, bring the software to the data, is of course interesting and unique and totally un-Oracle-like in the past, right? >>Yeah. But it had to be in your database on OCI. And that's an interesting part. The interesting thing on the lakehouse side is, right, there are three key benefits of a lakehouse. The first one is better reporting and analytics, bringing more rich information together, like, make the case for SiliconANGLE, right? We want to see engagements for this video, we want to know what's happening. That's a mixed transactional, video, media use case, right? Typical lakehouse use case. The next one is to build more rich applications, transactional applications which have video and these elements in there, which are the engaging ones. And the third one, and that's where I'm a little critical and concerned, is it's really the base platform for artificial intelligence, right? To run deep learning, to run things automatically, because you have all the data in one place and can query it in one way. >>And that's where Oracle, I know that Ron talked about Nvidia for a moment, but that's where Oracle doesn't have the strongest, best story. Nonetheless, the two other main use cases of the lakehouse are very strong, very well. The only concern is four 50 terabyte sounds low. It's an arbitrary limitation. Yeah, it sounds big for the start, and it's the first word, they can make that bigger. You don't want your lakehouse to be limited in terabyte sizes or even petabyte sizes, because you want to have the certainty: I can put everything in there that I think might be relevant without knowing what questions to ask, and query those questions. >>Yeah. And you know, in the early days of no schema on write, it just became a mess. But now technology has evolved to allow us to actually get more value out of that data. Data lake, data swamp is, you know, now much more logical.
But, and I want to get in, in a moment I want to come back to how you think the competitors are gonna respond. Are they gonna have to sort of do more of a converged approach? AWS in particular? But before I do, Ron, I want to ask you a question about Autopilot, because I heard Larry Ellison's keynote and he was talking about how, you know, most security issues are human errors; with autonomy and autonomous database and things like Autopilot, we take care of that. It's like autonomous vehicles, they're gonna be safer. And I went, well, maybe, maybe someday. So Oracle really tries to emphasize this; every time you see an announcement from Oracle, they talk about new, you know, autonomous capabilities. How legit is it? Do people care? What about, you know, what's new for HeatWave Lakehouse? How much of a differentiator, Ron, do you really think Autopilot is in this cloud database space? >>Yeah, I think it will definitely enhance the overall proposition. I don't think people are gonna buy, you know, Lakehouse exclusively because of Autopilot capabilities, but when they look at the overall picture, I think it will be an added capability bonus to Oracle's benefit. And yeah, I think it's kind of one of these age-old questions: how much do you automate and what is the balance to strike? And I think we all understand with the autonomous car analogy that there are limitations to being able to use that. However, I think it's a tool that basically every organization out there needs to at least have or at least evaluate, because it goes to the point that it helps with ease of use, it helps make automation more balanced in terms of, you know, being able to test: all right, let's automate this process and see if it works well, then we can go on and switch on Autopilot for other processes.
>>And then, you know, that allows, for example, the specialists to spend more time on business use cases versus, you know, manual maintenance of the cloud database and so forth. So I think that actually is a legitimate value proposition. I think it's just gonna be on a case by case basis. Some organizations are gonna be more aggressive with putting automation throughout their processes, throughout their organization. Others are gonna be more cautious. But it's gonna be, again, something that will help the overall Oracle proposition. And something that I think will be used with caution by many organizations, but other organizations are gonna be like, hey, great, this is something that is really answering a real problem. And that is just easing the use of these databases, but also being able to better handle the automation capabilities and benefits that come with it without having, you know, a major screwup happen in the process of transitioning to more automated capabilities. >>Now, I didn't attend CloudWorld, it's just too many red eyes, you know, recently, so I passed. But one of the things I like to do at those events is talk to customers, you know, in the spirit of the truth, you know, you'd have the hallway, you know, track and talk to customers and they say, hey, you know, here's the good, the bad and the ugly. So did you guys talk to any customers, MySQL HeatWave customers, at CloudWorld? And what did you learn? I don't know, Mark, did you have any luck in having some private conversations? >>Yeah, I had quite a few private conversations. The one thing before I get to that, I want to disagree with one point Ron made. I do believe there are customers out there buying the HeatWave service, the MySQL HeatWave service, because of Autopilot.
Because Autopilot is really revolutionary in many ways for the MySQL developer, in that it auto provisions, it auto parallel loads, it auto data places, it auto shape predicts. It can tell you what machine learning models are gonna give you your best results. And candidly, I've yet to meet a DBA who didn't wanna give up pedantic tasks that are a pain in the kazoo, which they'd rather not do, as long as it was done right for them. So yes, I do think people are buying it because of Autopilot, and that's based on some of the conversations I had with customers at Oracle CloudWorld. >>In fact, it was like, yeah, that's great, yeah, we get fantastic performance, but this really makes my life easier, and I've yet to meet a DBA who didn't want to make their life easier. And it does. So yeah, I've talked to a few of them. They were excited. I asked them if they ran into any bugs, were there any difficulties in moving to it? And the answer was no in both cases. It's interesting to note, MySQL is the most popular database on the planet. Well, some will argue that it's neck and neck with SQL Server, but if you add in MariaDB and Percona, which are forks of MySQL, then yeah, by far and away it's the most popular. And as a result of that, everybody for the most part has typically a MySQL database somewhere in their organization. So this is a brilliant situation for anybody going after MySQL, but especially for HeatWave. And the customers I talked to love it. I didn't find anybody complaining about it. And >>What about the migration? We talked about TCO earlier. Does your TCO analysis include the migration cost or do you kind of conveniently leave that out or what? >>Well, when you look at migration costs, there are different kinds of migration costs. By the way, the worst job in the data center is the data migration manager. Forget it, no other job is as bad as that one. You get no attaboys for doing it right.
And then when you screw up, oh boy. So in real terms, anything that can limit data migration is a good thing. And when you look at Data Lake, that limits data migration. So if you're already a MySQL user, this is pure MySQL as far as you're concerned. It's just a simple transition from one to the other. You may wanna make sure nothing broke and all your tables are correct and your schema's okay, but it's all the same. So it's a simple migration. It's pretty much a non-event, right? When you migrate data from an OLTP to an OLAP, that's an ETL and that's gonna take time. >>But you don't have to do that with MySQL HeatWave. So that's gone. When you start talking about machine learning, again, you may have an ETL, you may not, depending on the circumstances, but again, with MySQL HeatWave, you don't, and you don't have duplicate storage; you don't have to copy it from one storage container to another to be able to be used in a different database, which by the way, ultimately adds much more cost than just the other service. So yeah, I looked at the migration, and again, the users I talked to said it was a non-event. It was literally moving from one physical machine to another. If they had a new version of MySQL running on something else and just wanted to migrate it over or just hook it up or just connect it to the data, it worked just fine. >>Okay, so it sounds like you guys feel, and we've certainly heard this, my colleague David Floyer, the semi-retired David Floyer, was always very high on HeatWave. So I think it's got some real legitimacy here coming from a standing start, but I wanna talk about the competition, how they're likely to respond. I mean, if you're AWS and you got HeatWave now in your cloud, there's some good aspects of that. The database guys might not like that, but the infrastructure guys probably love it.
Hey, more ways to sell, you know, EC2 and Graviton. But the database guys in AWS are gonna respond. They're gonna say, hey, we got Redshift, we got AQUA. What are your thoughts on not only how that's gonna resonate with customers, but I'm interested in what you guys think, I never say never about AWS, you know, are they gonna try to build, in your view, a converged OLAP and OLTP database? You know, Snowflake is taking an ecosystem approach. They've added transactional capabilities to the portfolio, so they're not standing still. What do you guys see in the competitive landscape in that regard going forward? Maybe Holger, you could start us off and anybody else who wants to can chime in. >>Happy to. You mentioned Snowflake last, we'll start there. I think Snowflake is imitating that strategy, right? That building out original data warehouse and the clouds tasking project to really proposition to have other data available there, because AI is relevant for everybody. Ultimately people keep data in the cloud for ultimately running AI. So you see the same suite kind of level strategy. It's gonna be a little harder because of the original positioning: how much would people know that you're doing other stuff? And as a former manager of developers, I just don't see the speed at the moment happening at Snowflake to become really competitive with Oracle. On the flip side, putting my Oracle hat on for a moment, back to you, Mark and Ron, right? What could Oracle still add? Because the big, big things, right, the traditional chasms in the database world, they have built everything, right? >>So I really scratched my head and gave Nipun a hard time at CloudWorld, saying, what could you be building? He was very conservative: let's get the Lakehouse thing done, it's gonna spring next year, right? And for AWS it's really hard, because the AWS value proposition is these small innovation teams, right?
They build two-pizza teams, which can be fed by two pizzas, not large teams, right? And you need large teams to build these suites with lots of functionality, to make sure they work together. They're consistent, they have the same UX on the administration side, they can be consumed the same way, they have the same API registry; I can't even stop going on about where the synergy comes into play with a suite. So it's gonna be really, really hard for them to change that. But AWS is super pragmatic. They always pride themselves that they listen to customers. If they learn from customers that suite is a proposition, I would not be surprised if AWS tries to bring things closer together. >>Yeah. Well, how about, can we talk about multicloud? Again, Oracle is very Oracle-on-Oracle, as you said before, but let's look forward, you know, half a year or a year. What do you think about Oracle's moves in multicloud in terms of what kind of penetration they're gonna have in the marketplace? You saw a lot of presentations at CloudWorld. You know, we've looked pretty closely at the Microsoft Azure deal. I think that's really interesting. I've called it a little bit of early days of a supercloud. What impact do you think this is gonna have on the marketplace? And think about it within Oracle's customer base, I have no doubt they'll do great there. But what about beyond its existing install base? What do you guys think? >>Ron, do you wanna jump on that? Go ahead. Go ahead, Ron. No, no, no, >>That's an excellent point. I think it aligns with what we've been talking about in terms of Lakehouse. I think Lakehouse will enable Oracle to pull more customers, more MySQL customers, onto the Oracle platforms. And I think we're seeing all the signs pointing toward Oracle being able to make more inroads into the overall market.
And that includes garnering customers from the leaders, in other words, because they are, you know, coming in as an innovator, an alternative to, you know, the AWS proposition, the Google Cloud proposition, they have less to lose, and as a result they can really drive the multi-cloud messaging to resonate with not only their existing customers, but also to be able to, to that question Dave's posing, actually garner customers onto their platform. And that includes naturally MySQL but also OCI and so forth. So that's how I'm seeing this playing out. I think, you know, again, Oracle's reporting is indicating that, and I think what we saw at Oracle CloudWorld is definitely validating the idea that Oracle can make more waves in the overall market in this regard. >>You know, I've floated this idea of supercloud. It's kind of tongue in cheek, but I think there is some merit to it in terms of building on top of hyperscale infrastructure and abstracting some of that complexity. And one of the things that I'm most interested in is industry clouds and Oracle's acquisition of Cerner. I was struck by Larry Ellison's keynote; it was like, I don't know, an hour and a half, and an hour and 15 minutes was focused on healthcare transformation. Well, >>So vertical, >>Right? And so, yeah, so Oracle's, you know, got some industry chops, and then you think about what they're building with not only OCI, but then you got, you know, MySQL, you can now run in dedicated regions. You got ADB on Exadata Cloud@Customer, you can put that on-prem in your data center, and you look at what the other hyperscalers are doing. I say other hyperscalers, I've always said Oracle's not really a hyperscaler, but they got a cloud so they're in the game. But you can't get, you know, BigQuery on-prem. You look at Outposts, it's very limited in terms of, you know, the database support, and again, that will evolve.
But now Oracle's got, they announced Alloy, where you can white label their cloud. So I'm interested in what you guys think about these moves, especially the industry cloud. We see, you know, Walmart is doing sort of their own cloud. You got Goldman Sachs doing a cloud. What do you guys think about that, and what role does Oracle play? Any thoughts? >>Yeah, let me jump on that for a moment. Now, especially with MySQL, by making that available in multiple clouds, what they're doing follows the philosophy they've had in the past with Cloud@Customer: taking the application and the data and putting it where the customer lives. If it's on premise, it's on premise. If it's in the cloud, it's in the cloud. By making MySQL HeatWave essentially plug compatible with any other MySQL as far as your database is concerned, and then giving you that integration with OLAP and ML and data lake and everything else, then what you've got is a compelling offering. You're making it easier for the customer to use. So when I look at the difference between MySQL and the Oracle database, MySQL is going to capture more market share for them. >>You're not gonna find a lot of new users for the Oracle database. Yeah, there are always gonna be new users, don't get me wrong, but it's not gonna be huge growth. Whereas MySQL HeatWave is probably gonna be a major growth engine for Oracle going forward. Not just in their own cloud, but in AWS and in Azure and on premise over time; eventually it'll get there. It's not there now, but it will. They're doing the right thing on that basis. They're taking the services, when you talk about multicloud, and making them available where the customer wants them, not forcing them to go where you want them, if that makes sense. And as far as where they're going in the future, I think they're gonna take a page outta what they've done with the Oracle database.
They'll add things like JSON and XML and time series and spatial. Over time they'll make it a complete converged database like they did with the Oracle database. The difference being, Oracle database will scale bigger and will have more transactions and be somewhat faster. And MySQL will be for anyone who's not on the Oracle database. They're not stupid, that's for sure. >>They've done JSON already. Right. But I give you that they could add graph and time series, right. Since eat with, Right, Right. Yeah, that's something absolutely right. That's, that's >>A sort of a logical move, right? >>Right. But let's not kid ourselves, right? I mean, it has worked in Oracle's favor, right? 10x, 20x the amount of R&D which is in the MySQL space has been poured into trying to snatch workloads away from Oracle, starting with IBM 30 years ago, 20 years ago Microsoft, and it didn't work, right? Database applications are extremely sticky. When they run, you don't want to touch SIM and grow them, right? So that doesn't mean that HeatWave is not an attractive offering, but it will be net new things, right? And what works in MySQL HeatWave's favor a little bit is it's not the massive enterprise applications which have like we the nails like, like you might be only running 30% on Oracle, but the connections and the interfaces into that are like 70, 80% of your enterprise. >>You take it out and it's like the spaghetti ball where you say, ah, no, I really don't want to do all that. Right? You don't have that massive part with the MySQL HeatWave kind of databases, which are smaller and more tactical in comparison. But still, I don't see them taking so much share. They will be growing because of an attractive value proposition. Quickly on the multi-cloud, right? I think it's not really multi-cloud if you give people the chance to run your offering on different clouds, right? You can run it there.
The multi-cloud advantage is when the uber offering comes out, which allows you to do things across those installations, right? I can migrate data, I can query data across, something like Google has done with BigQuery Omni, I can run predictive models or even make AI models in different places and distribute them, right? And Oracle is paving the road for that by being available on these clouds. But the multi-cloud capability of a database which knows it's running on different clouds, that is still yet to be built there. >>Yeah. And >>That the problem with >>That, that's the supercloud concept that I floated, and I've always said kinda Snowflake with a single global instance is sort of, you know, headed in that direction and maybe has a lead. What's the issue with that, Mark? >>Yeah, the problem with that version of multi-cloud is clouds do charge egress fees. As long as they charge egress fees to move data between clouds, it's gonna make it very difficult to do a real multi-cloud implementation. Even Snowflake, which runs multi-cloud, has to pass on the egress fees to their customer when data moves between clouds. And that's really expensive. I mean, there is one customer I talked to who is beta testing for them, the MySQL HeatWave on AWS. The only reason they didn't want to do that until it was running on AWS is the egress fees were so great to move it to OCI that they couldn't afford it. Yeah. Egress fees are the big issue but, >>But Mark, the point might be you might wanna run a remote query and only get the result set back, right, which is much tinier, which has been the answer before for low latency between the clouds, a problem which we sometimes still have but mostly don't have. Right? And I think in general these fees are coming down based on the Oracle general egress fee move, and it's very hard to justify those, right? But it's not about moving data as a multi-cloud high value use case.
It's about doing intelligent things with that data, right? Putting it into other places, replicating it, what I'm saying is the same thing you said before, running remote queries on that, analyzing it, running AI on it, running AI models on that. That's the interesting thing. Cross administered in the same way. Taking things out, making sure compliance happens. Making sure when Ron says I don't want to be American anymore, I want to be in the European cloud, that it gets migrated, right? So those are the interesting value use cases which are really, really hard for enterprises to program by hand with developers, and they would love to have them out of the box, and that's the innovation yet to come, which we have yet to see. But the first step to get there is that your software runs in multiple clouds, and that's what Oracle's doing so well with MySQL. >>Guys. Amazing. >>Go ahead. Yeah. >>Yeah. >>For example, >>Amazing amount of data knowledge and brain power in this market. Guys, I really want to thank you for coming on to theCUBE. Ron, Holger, Mark, always a pleasure to have you on. Really appreciate your time. >>Well all the last names we're very happy for Romanic last and moderator. Thanks Dave for moderating us. All right, >>We'll see you guys around. Safe travels to all and thank you for watching this power panel, The Truth About MySQL HeatWave, on theCUBE. Your leader in enterprise and emerging tech coverage.

Published Date : Nov 1 2022



Jack Andersen & Joel Minnick, Databricks | AWS Marketplace Seller Conference 2022

(upbeat music) >> Welcome back everyone to theCUBE's coverage here in Seattle, Washington, for AWS's Marketplace Seller Conference. It's the big news within the Amazon partner network, combining with marketplace, forming the Amazon partner organization. Part of a big reorg as they grow to the next level, NextGen cloud, mid-game on the chessboard. theCUBE's got it covered. I'm John Furrier, your host at theCUBE. Great guests here from Databricks. Both CUBE alumni. Jack Andersen, GM and VP of the Databricks partnership team for AWS. You handle that relationship. And Joel Minnick, vice president of product and partner marketing. You guys have the keys to the kingdom with Databricks and AWS. Thanks for joining. Good to see you again. >> Thanks for having us back. >> Yeah, John, great to be here. >> So I feel like we're at Reinvent 2013. Small event, no stage, but there's a real shift happening with procurement. Obviously it's a no brainer on the micro, you know, people should be buying online. Self-service, Cloud Scale. But Amazon's got billions being sold through their marketplace. They've reorganized their partner network. You can see kind of what's going on. They've kind of figured it out. Like let's put everything together and simplify and make it less of a website, marketplace. Merge our partner organizations, have more synergy and frictionless experiences so everyone can make more money and customers are going to be happier. >> Yeah, that's right. >> I mean, you're running the relationship. You're in the middle of it. >> Well, Amazon's mental model here is that they want the world's best ISVs to operate on AWS so that we can collaborate and co-architect on behalf of customers. And that's exactly what the APO and marketplace allow us to do, is to work with Amazon on these really, you know, unique use cases. >> You know, I interviewed Ali many times over the years. I remember many years ago, maybe six, seven years ago, we were talking. He's like, "we're all in on AWS."
Obviously now the success of Databricks, you've got multiple clouds, see that. Customers have choice. But I remember the strategy early on. It was like, we're going to be deep. So this speaks volumes to the relationship you have. Years. Jack, take us through the relationship that Databricks has with AWS from a partner perspective. Joel, and from a product perspective. Because it's not like you guys are Johnny-come-lately, new to the scene. >> Right. >> You've been there, almost present at the creation of this wave. What's the relationship and how does it relate to what's going on today? >> So most people may not know that Databricks was born on AWS. We actually did our first $100 million of revenue on Amazon. And today we're obviously available on multiple clouds. But we're very fond of our Amazon relationship. And when you look at what the APN allows us to do, you know, we're able to expand our reach and co-sell with Amazon, and marketplace broadens our reach. And so, we think of marketplace in three different aspects. We've got the marketplace private offer business, which we've been doing for a number of years. Matter of fact, we were driving well over a hundred percent year over year growth in private offers. And we have a nine figure business. So it's a very significant business. And when a customer uses a private offer, that private offer counts against their private pricing agreement with AWS. So they get pricing power against their private pricing. So it's really important it goes on their Amazon bill. In May we launched our pay as you go, on demand offering. And in five short months, we have well over a thousand subscribers. And what this does, is it really reduces the barriers to entry. It's low friction. So anybody in an enterprise or startup or public sector company can start to use Databricks on AWS, in a consumption based model, and have it go against their monthly bill. And so we see customers, you know, doing rapid experimentation, pilots, POCs.
They're really learning the value of that first use case. And then we see rapid use case expansion. And the third aspect is the consulting partner private offer, CPPO. Super important in how we involve our partner ecosystem of our consulting partners and our resellers that are able to work with Databricks on behalf of customers. >> So you got the big contracts with the private offer. You got the product market fit, kind of people iterating with data, coming in with the buyers you get. And obviously the integration piece all fitting in there. >> Exactly. >> Okay, so those are the offers, that's current, what's in marketplace today. Is that the products... What are people buying? >> Yeah. >> I mean, I guess what's the... Joel, what are people buying in the marketplace? And what does it mean for them? >> So fundamentally what they're buying is the ability to take silos out of their organization. And that is the problem that Databricks is out there to solve. Which is, when you look across your data landscape today, you've got unstructured data, you've got structured data, you've got real time streaming data. And your teams are trying to use all of this data to solve really complicated problems. And as Databricks, as the Lakehouse Company, what we're helping customers do is, how do they get into the new world? How do they move to a place where they can use all of that data across all of their teams? And so we allow them to begin to find, through the marketplace, those rapid adoption use cases where they can get rid of these data warehousing, data lake silos they've had in the past. Get their unstructured and structured data onto one data platform, an open data platform, that is no longer adherent to any proprietary formats and standards and something they can, very much, very easily, integrate into the rest of their data environment. Apply one common data governance layer on top of that.
So that from the time they ingest that data, to the time they use that data, to the time they share that data, inside and outside of their organization, they know exactly how it's flowing. They know where it came from. They know who's using it. They know who has access to it. They know how it's changing. And then with that common data platform, with that common governance solution, they're able to bring all of those use cases together. Across their real time streaming, their data engineering, their BI, their AI. All of their teams working on one set of data. And that lets them move really, really fast. And it also lets them solve challenges they just couldn't solve before. A good example of this, you know, one of the world's now largest data streaming platforms runs on Databricks with AWS. And if you think about what does it take to set that up? Well, they've got all this customer data that was historically inside of data warehouses. That they have to understand who their customers are. They have all this unstructured data, they've built their data science model, so they can do the right kinds of recommendation engines and forecasting around. And then they've got all this streaming data going back and forth between click stream data, from what the customers are doing with their platform and the recommendations they want to push back out. And if those teams were all working in individual silos, building these kinds of platforms would be extraordinarily slow and complex. But by building it on Databricks, they were able to release it in record time and have grown at a record pace to now be the number one platform. >> And this product, it's impacting product development. >> Absolutely. >> I mean, this is like the difference between lagging months of product development, to like days. >> Yes. >> Pretty much what you're getting at. >> Yes. >> So total agility. >> Mm-hmm. >> I got that.
Okay, now, I'm a customer, I want to buy in the marketplace, but you got a direct sales force up there. So how do you guys look at this? Is there channel conflict? Are there comp programs? Because one of the things I heard today on the stage from AWS's leadership, Chris was up there speaking, and Mona was, "Hey, he's a CRO conference chief revenue officer" conversation. Which means someone's getting compensated. So, if I'm the sales rep at Databricks, what's my motion to the customer? Do I get paid? Does Amazon sell it? Take us through that. Is there channel conflict? Or, how do you handle it? >> Well, I'd add to what Joel just talked about with, you know, the solution, the value of the solution. Our entire offering is available on AWS marketplace. So it's not a subset, it's the entire Databricks offering. And- >> The flagship, all the, the top stuff. >> Everything, the flagship, the complete offering. So it's not segmented. It's not a sub segment. >> Okay. >> It's, you know, you can use all of our different offerings. Now when it comes to seller compensation, we view this two different ways, right? One is that AWS is also incented, right? Versus selling a native service, to recommend Databricks for the right situation. Same thing with Databricks, our sales force wants to do the right thing for the customer, if the customer wants to use marketplace as their procurement vehicle. And that really helps customers because if you get Databricks and five other ISVs together, and let's say each ISV is spending, you're spending a million dollars. You have $5 million of spend. You put that spend through the flywheel with AWS marketplace, and then you can use that in your negotiations with AWS to get better pricing overall. So that's how we view it. >> So customers are driving this, sounds like. >> Correct. For sure. >> So they're looking at this as saying, Hey, I'm going to just get purchasing power with all my relationships.
Because it's a solution architectural market, right? >> Yeah. It makes sense. Because most customers will have a primary and secondary cloud provider. If they can consolidate, you know, multiple ISV spend through that same primary provider, you get pricing power. >> Okay, Joel, we're going to date ourselves. At least I will. So back in the old days, (group laughter) It used to be, do a Barney deal with someone, Hey, let's go to market together. You got to get paper, you do a biz dev deal. And then you got to say, okay, now let's coordinate our sales teams, a lot of moving parts. So what you're getting at here is that the alternative for Databricks, or any company is, to go find those partners and do deals, versus now Amazon is the center point for the customer. So you can still do those joint deals, but this seems to be flipping the script a little bit. >> Well, it is, but we still have VARs and consulting partners that are doing implementation work. Very valuable work, advisory work, that can actually work with marketplace through the CPPO offering. So the marketplace allows multiple ways to procure your solution. >> So it doesn't change your business structure. It just makes it more efficient. >> That's correct. >> That's a great way to say it. >> Yeah, that's great. >> Okay. So, that's it. So that just makes it more efficient. So you guys are actually incented to point customers to the marketplace. >> Yes. >> Absolutely. >> Economically. >> Economically, it's the right thing to do for the customer. It's the right thing to do for our relationship with Amazon. Especially when it comes back to co-selling, right? Because Amazon now is leaning in with ISVs and making recommendations for, you know, an ISV solution. And our teams are working backwards from those use cases, you know, to collaborate and land them. >> Yeah. I want to get that out there. Go ahead, Joel.
>> So one of the other things I might add to that too, you know, and why this is advantageous for companies like Databricks to work through the marketplace, is it makes it so much easier for customers to deploy a solution. It's, very literally, one click through the marketplace to get Databricks stood up inside of your environment. And so if you're looking at how do I help customers most rapidly adopt these solutions in the AWS cloud, the marketplace is a fantastic accelerator to that. >> You know, it's interesting. I want to bring this up and get your reaction to it because to me, I think this is the future of procurement. So from a procurement standpoint, I mean, again, dating myself, EDI back in the old days, you know, all that craziness. Now this is all the internet, basically through the console. I get the infrastructure side, you know, spin up and provision some servers, all been good. You guys have played well there in the marketplace. But now as we get into more of what I call the business apps, and they brought this up on stage. A little nuanced. Most enterprises aren't yet there of integrating tech, on the business apps, into the stack. This is where I think you guys are a use case of success where you guys have been successful with data integration. It's an integrator's dilemma, not an innovator's dilemma. So like, I want to integrate. So now I have integration points with Databricks, but I want to put an app in there. I want to provision an application, but it has to be built. It's not, you don't buy it. You build, you got to build stuff. And this is the nuance. What's your reaction to that? Am I getting this right? Or am I off because, no one's going to be buying software like they used to. They buy software to integrate it. >> Yeah, no- >> Because everything's integrated. >> I think AWS has done a great job at creating a partner ecosystem, right? To give customers the right tools for the right jobs. And those might be with third parties.
Databricks is doing the same thing with our Partner Connect program, right? We've got customer partners like Fivetran and dbt that, you know, augment and enhance our platform. And so you're looking at multi ISV architectures and all of that can be procured through the AWS marketplace. >> Yeah. It's almost like, you know, bundling and unbundling. I was talking about this with Dave Vellante about Supercloud. Which is why wouldn't a customer want the best solution in their architecture? Period. In its class. If someone's got API security or an API gateway. Well, you know, I don't want to be forced to buy something because it's part of a suite. And that's where you see things get sub optimized. Where someone dominates a category and they have, oh, you got to buy my version of this. >> Joel and I were talking, we were actually saying, what's really important about Databricks, is that customers control the data, right? You want to comment on that? >> Yeah. I was going to say, you know, what you're pushing on there, we think is extraordinarily, you know, the way the market is going to go. Is that customers want a lot of control over how they build their data stack. And everyone's unique in what tools are the right ones for them. And so one of the, you know, philosophically, I think, really strong places, Databricks and AWS have lined up, is we both take an approach that you should be able to have maximum flexibility on the platform. And as we think about the Lakehouse, one thing we've always been extremely committed to, as a company, is building the data platform on an open foundation. And we do that primarily through Delta Lake and making sure that, to Jack's point, with Databricks, the data is always in your control. And then it's always stored in a completely open format. And that is one of the things that's allowed Databricks to have the breadth of integrations that it has with all the other data tools out there.
Because you're not tied into any proprietary format, but instead are able to take advantage of all the innovation that's happening out there in the open source ecosystem. >> When you see other solutions out there that aren't as open as you guys, you guys are very open by the way, we love that too. We think that's a great strategy, but what am I foreclosing if I go with something else that's not as open? What's the customer's downside as you think about what's around the corner in the industry? Because if you believe it's going to be open, open source, which I think open source software is the software industry, and integration is a big deal. Because software's going to be plentiful. >> Sure. >> Let's face it. It's a good time to be in software business. But Cloud's booming. So what's the downside, from your Databricks perspective? You see a buyer clicking on Databricks versus that alternative. What potentially should they be nervous about, down the road, if they go with a more proprietary or locked in approach? >> Yeah. >> Well, I think the challenge with proprietary ecosystems is you become beholden to the ability of that provider to both build relationships and convince other vendors that they should invest in that format. But you're also, then, beholden to the pace at which that provider is able to innovate. >> Mm-hmm. >> And I think we've seen lots of times over history where, you know, a proprietary format may run ahead, for a while, on a lot of innovation. But as that market control begins to solidify, that desire to innovate begins to degrade. Whereas in the open formats- >> So extract rents versus innovation. (John laughs) >> Exactly. Yeah, exactly. >> I'll say it. >> But in the open world, you know, you have to continue to innovate. >> Yeah. >> And the open source world is always innovating. If you look at the last 10 to 15 years, I challenge you to find, you know, an example where the innovation in the data and AI world is not coming from open source.
And so by investing in open ecosystems, that means you are always going to be at the forefront of what is the latest. >> You know, again, not to date myself again, but you look back at the eighties and nineties, the protocol stacks were proprietary. >> Yeah. >> You know, SNA was IBM, DECnet was Digital. You know the rest. And then TCP/IP was part of the open systems interconnect. >> Mm-hmm. >> Revolutionary (indistinct) a big part of that, as well as my school did. And so like, you know, that was, but it didn't standardize the whole stack. It stopped at IP and TCP. >> Yeah. >> But that helped interoperate, that created a nice de facto. So this is a big part of this mid game. I call it the chessboard, you know, you got opening game and mid-game, then you get the end game. You're not there at the end game yet at Cloud. But Cloud- >> There's always some form of lock in, right? Andy Jassy will address it, you know, when making a decision. But if you're going to make a decision you want to reduce- You don't want to be limited, right? So I would advise a customer that there could be limitations with a proprietary architecture. And if you look at what every customer's trying to become right now, is an AI driven business, right? And so it has to do with, can you get that data out of silos? Can you organize it and secure it? And then can you work with data scientists to feed those models? >> Yeah. >> In a very consistent manner. And so the tools of tomorrow will, to Joel's point, be open and we want interoperability with those tools. >> And choice matters too. And I would say that, you know, the argument for why I think Amazon is not as locked in as maybe some other clouds, is that they have to compete directly too. Redshift competes directly with a lot of other stuff. But they can't play the bundling game because the customers are getting savvy to the fact that if you try to bundle an inferior product with something else, it may not work great at all.
And they're going to be, they're onto it. This is the- >> To Amazon's credit, by having these solutions that may compete with native services in marketplace, they are providing customers with choice, low price- >> And access to the core value. Which is the hardware- >> Exactly. >> Which is their platform. Okay. So I want to get your thoughts on something else I see emerging. This is, again, kind of a CUBE rumination moment. So on stage, Chris unpacked a lot of stuff. I mean this marketplace, they're touching a lot of hot buttons here, you know, pricing, compensation, workflows, services behind the curtain. And one of those things he mentioned was, they talk about resellers or channel partners, depending upon what you talk about. We believe, Dave and I believe on theCUBE, that the entire indirect sales channel of the industry is going to be disrupted radically. Because those players were selling hardware in the old days and software. That game is going to change. You mentioned you guys have a program, let me get your thoughts on this. We believe that once this gets set up, they can play in this game and bring their services in. Which means that the old reseller channels are going to be rewritten. They're going to be refactored with these new kinds of access. Because you've got scale, you've got money and you've got product. And you got customers coming into the marketplace. So if you're like a reseller that sold computers to data centers or software, you know, a value added reseller or VAR or business. >> You've got to evolve. >> You got to, you got to be here. >> Yes. >> Yeah. >> How are you guys working with those partners? Because you say you have a product in your marketplace there. How do I make money if I'm a reseller with Databricks, with Amazon? Take me through that use case. >> Well I'll let Joel comment, but I think it's pretty straightforward, right? Customers need expertise. They need knowhow.
When we're seeing customers do mass migrations to the cloud or Hadoop specific migrations or data transformation implementations. They need expertise from consulting and SI partners. If those consulting and SI partners happen to resell the solution as well. Well, that's another aspect of their business. But I really think it is the expertise that the partners bring to help customers get outcomes. >> Joel, channel big opportunity for Amazon to reimagine this. >> For sure. Yeah. And I think, you know, to your comment about how do resellers take advantage of that, I think what Jack was pushing on is spot on. Which is, it's becoming more and more about the expertise you bring to the table. And not just transacting the software. But now actually helping customers make the right choices. And we're seeing, you know, both SIs begin to be able to resell solutions and finding a lot of opportunity in that. >> Yeah. And I think we're seeing traditional resellers begin to move into that SI model as well. And that's going to be the evolution that this goes. >> At the end of the day, it's about services, right? >> For sure. Yeah. >> I mean... >> You've got a great service. You're going to have high gross profits. >> Yeah >> Managed service provider business is alive and well, right? Because there are a number of customers that want that type of a service. >> I think that's going to be a really hot, hot button for you guys. I think being the way you guys are open, this channel, partner services model coming in, to the fold, really kind of makes for kind of that Supercloud like experience, where you guys now have an ecosystem. And that's my next question. You guys have an ecosystem going on, within Databricks. >> For sure. >> On top of this ecosystem. How does that work? This is kind of like, hasn't been written up in business school and case studies yet. This is new. What is this? 
>> I think, you know, what it comes down to is, you're seeing ecosystems begin to evolve around the data platforms. And that's going to be one of the big, kind of, new horizons for us as we think about what drives ecosystems. It's going to be around, well, what's the data platform that I'm using? And then all the tools that have to encircle that to get my business done. And so I think there's, you know, absolutely ecosystems inside of the AWS business on all of AWS's services, across data analytics and AI. And then to your point, you are seeing ecosystems now arise around Databricks in its Lakehouse platform as well. As customers are looking at well, if I'm standing these Lakehouses up and I'm beginning to invest in this, then I need a whole set of tools that help me get that done as well. >> I mean you think about ecosystem theory, we're living a whole nother dream. And I'm not kidding. It hasn't yet been written up and for business school case studies is that, we're now in a whole nother connective tissue, ecology thing happening. Where you have dependencies and value proposition. Economics, connectedness. So you have relationships in these ecosystems. >> And I think one of the great things about the relationships with these ecosystems, is that there's a high degree of overlap. >> Yeah. >> So you're seeing that, you know, the way that the cloud business is evolving, the ecosystem partners of Databricks, are the same ecosystem partners of AWS. And so as you build these platforms out into the cloud, you're able to really take advantage of best of breed, the broadest set of solutions out there for you. >> Joel, Jack, I love it because you know what it means? The best ecosystem will win, if you keep it open. >> Sure, sure. >> You can see everything. If you're going to do it in the dark, you know, you don't know the outcome. I mean, this is really kind of what we're talking about. 
>> And John, can I just add that when I was at Amazon, we had a theory that there's buyers and builders, right? There's very innovative companies that want to build things themselves. We're seeing now that builders want to buy a platform. Right? >> Yeah. >> And so there's a platform decision being made and that ecosystem is going to evolve around the platform. >> Yeah, and I totally agree. And the word innovation gets kicked around. That's why, you know, when we had our Supercloud panel, it was called the innovator's dilemma, with a slash through it, called the integrator's dilemma. Innovation is the digital transformation. So- >> Absolutely. >> Like that becomes cliche in a way, but it really becomes more of a, are you open? Are you integrating? If APIs are connective tissue, what's automation, what's the service messages look like? I mean, a whole nother set of, kind of thinking, goes on in these new ecosystems and these new products. >> And that thinking has been born in Delta Sharing, right? So the idea that you can have a multi-cloud implementation of Databricks, and actually share data between those two different clouds, that is the next layer on top of the native cloud solution. >> Well, Databricks has done a good job of building on top of the goodness of, and the CapEx gift from AWS. But you guys have done a great job taking that, building differentiation into the product. You guys have great customer base, great growing ecosystem. And again, I think a shining example of what every enterprise is going to do. Build on top of something, operating model, get that operating model, driving revenue. >> Mm-hmm. >> Yeah. >> Whether you're Goldman Sachs or Capital One or XYZ Corporation. >> S&P Global, NASDAQ. >> Yeah. >> We've got, you know, the biggest verticals in the world are solving tough problems with Databricks.
I think we'd be remiss, because if Ali was here, he would really want to thank Amazon for all of the investments across all of the different functions, whether it's the relationship we have with our engineering and service teams, our marketing teams, you know, product development. And we're going to be at re:Invent. A big presence at re:Invent. We're looking forward to seeing you there again. >> Yeah. We'll see you guys there. Yeah. Again, good ecosystem. I love the ecosystem evolution that's happening. This NextGen Cloud is here. We're seeing this evolve, kind of new economics, new value propositions, kind of scaling up. Producing more. So you guys are doing a great job. Thanks for coming on theCUBE and taking the time. Joel, great to see you at the check. >> Thanks for having us, John. >> Okay. Cube coverage here. The world's changing as APN comes together with the marketplace for a new partner organization at Amazon Web Services. theCUBE's got it covered. This should be a very big, growing ecosystem as this continues. Billions being sold through the marketplace. And of course the buyers are happy as well. So we've got it all covered. I'm John Furrier, your host of theCUBE. Thanks for watching. (upbeat music)

Published Date : Oct 10 2022



Jack Andersen & Joel Minnick, Databricks | AWS Marketplace Seller Conference 2022

>>Welcome back, everyone, to theCUBE's coverage here in Seattle, Washington, at the AWS Marketplace Seller Conference. The big news is the Amazon Partner Network combining with Marketplace to form the Amazon Partner Organization, part of a big reorg as they grow to the next level: NextGen cloud, mid-game on the chessboard. theCUBE's got it covered. I'm John Furrier, host of theCUBE, with great guests here from Databricks, both CUBE alumni: Jack Andersen, GM and VP of the Databricks partnership team for AWS, who handles that relationship, and Joel Minnick, vice president of product and partner marketing. You guys have the keys to the kingdom with Databricks and AWS. Thanks for joining. Good to see you again. >>Thanks for having us back. >>Yeah, John, great to be here. >>So I feel like we're at re:Invent 2013: small event, no stage, but there's a real shift happening with procurement. Obviously it's a no-brainer on the micro, you know, people should be buying online, self-service, cloud scale, but Amazon's got billions being sold through their marketplace. They've reorganized their partner network. You can see kind of what's going on. They've kind of figured it out. Like, let's put everything together and simplify, make it less of a website marketplace, and merge with our partners to have more synergy and frictionless experiences, so everyone can make more money and customers are going to be happier. >>Yeah, that's right. >>I mean, you run the relationship. You're in the middle of it. >>Well, Amazon's mental model here is that they want the world's best ISVs to operate on AWS so that we can collaborate and co-architect on behalf of customers. And that's exactly what the APO and Marketplace allow us to do: work with Amazon on these really, you know, unique use cases. >>You know, I interviewed Ali many times over the years. I remember many years ago, maybe six, seven years ago, we were talking. He's like, we're all in, obviously.
Now, with the success of Databricks, you've got multiple clouds, so customers have choice, but I remember the strategy early on. It was like, we're going to be deep. So this speaks volumes to the relationship you have. Jack, take us through the relationship that Databricks has with AWS from a partner perspective, and Joel, from a product perspective, because it's not like you're a Johnny-come-lately, new to the scene, right? You've been there, almost present at the creation of this wave. What's the relationship, and how does it relate to what's going on today? >>So most people may not know that Databricks was born on AWS. We actually did our first 100 million of revenue on Amazon. And today we're obviously available on multiple clouds, but we're very fond of our Amazon relationship. And when you look at what the APN allows us to do, you know, we're able to expand our reach and co-sell with Amazon, and Marketplace broadens our reach. And so we think of Marketplace in three different aspects. We've got the Marketplace private offer business, which we've been doing for a number of years. As a matter of fact, we're driving well over a hundred percent year-over-year growth in private offers, and we have a nine-figure business. So it's a very significant business. And when a customer uses a private offer, that private offer counts against their private pricing agreement with AWS. So they get pricing power against their private pricing. So it's really important. It goes on their Amazon bill. In May, we launched our pay-as-you-go, on-demand offering. And in five short months, we have well over a thousand subscribers. And what this does is it really reduces the barriers to entry. It's low friction. So anybody in an enterprise or startup or public sector company can start to use Databricks on AWS in a consumption-based model and have it go against their monthly bill.
And so we see customers, you know, doing rapid experimentation, pilots, POCs. They're really learning the value of that first use case. And then we see rapid use case expansion. And the third aspect is consulting partner private offers, CPPO, super important in how we involve our partner ecosystem of consulting partners and resellers that are able to work with Databricks on behalf of customers. >>So you've got the big contracts with the private offer. You've got the product-market fit, kind of people iterating with data, coming in with the pay-as-you-go. And obviously the integration piece, all fitting in there. >>Exactly. Exactly. >>Okay. So those are the offers, that's what's current in Marketplace today. Joel, what are people buying in the marketplace, and what does it mean for them? >>So fundamentally what they're buying is the ability to take silos out of their organization. And that is the problem that Databricks is out there to solve, which is, when you look across your data landscape today, you've got unstructured data, you've got structured data, you've got real-time streaming data, and your teams are trying to use all of this data to solve really complicated problems. And as Databricks, as the lakehouse company, what we're helping customers do is, how do they get into the new world? How do they move to a place where they can use all of that data across all of their teams? And so we allow them to begin to find, through the marketplace, those rapid adoption use cases where they can get rid of these data warehousing and data lake silos they've had in the past, get their unstructured and structured data onto one data platform, an open data platform that is no longer adherent to any proprietary formats and standards, and something
they can very easily integrate into the rest of their data environment, and apply one common data governance layer on top of that, so that from the time they ingest that data, to the time they use that data, to the time they share that data inside and outside of their organization, they know exactly how it's flowing. They know where it came from. They know who's using it. They know who has access to it. They know how it's changing. And then with that common data platform, with that common governance solution, they're able to bring all of those use cases together across their real-time streaming, their data engineering, their BI, their AI, all of their teams working on one set of data. And that lets them move really, really fast. And it also lets them solve challenges they just couldn't solve before. A good example of this: you know, one of the world's now largest data streaming platforms runs on Databricks with AWS. And if you think about what it takes to set that up, well, they've got all this customer data that was historically inside of data warehouses, that they need in order to understand who their customers are. They have all this unstructured data they've built their data science models on, so they can do the right kinds of recommendation engines and forecasting. And then they've got all this streaming data going back and forth between clickstream data from what the customers are doing with their platform and the recommendations they want to push back out. And if those teams were all working in individual silos, building these kinds of platforms would be extraordinarily slow and complex, but by building it on Databricks, they were able to release it in record time and have grown at record pace. >>So that's a product platform that's impacting product development. >>Absolutely. >>I mean, this is like the difference between lagging months of product development to, like, days. >>Yes. >>That's pretty much what you're getting at. >>Yeah. >>So total agility.
I got that. Okay. Now, I'm a customer, I want to buy in the marketplace, but you've also got a direct sales force out there. So how do you guys look at this? Is there channel conflict? Are there comp programs? Because one of the things I heard today on the stage from AWS's leadership, Chris was up there speaking, and at one moment I was like, hey, this is a CRO conference, a chief revenue officer conversation, which means someone's getting compensated. So if I'm the sales rep at Databricks, what's my motion to the customer? Do I get paid? Does Amazon sell it? Take us through that. Is there channel conflict? >>Well, I'd add to what Joel just talked about, you know, the value of the solution: our entire offering is available on AWS Marketplace. So it's not a subset, it's the entire Databricks offering. >>The flagship, all the top... >>Everything, the flagship, the complete offering. So it's not segmented. It's not a sub-segment. You know, you can use all of our different offerings. Now, when it comes to seller compensation, we view this two different ways, right? One is that AWS is also incented, right, versus selling a native service, to recommend Databricks for the right situation. Same thing with Databricks: our sales force wants to do the right thing for the customer if the customer wants to use Marketplace as their procurement vehicle. And that really helps customers, because if you get Databricks and five other ISVs together, and let's say with each ISV you're spending a million dollars, you have $5 million of spend, you put that spend through the flywheel with AWS Marketplace, and then you can use that in your negotiations with AWS to get better pricing overall. So that's how we do it. >>So customers are driving this, it sounds like. >>Correct. For sure.
>>So they're looking at this as saying, hey, I'm going to just get purchasing power with all my relationships, because it's a solution architecture market, right? >>Yeah. It makes sense, because most customers will have a primary and secondary cloud provider. If they can consolidate, you know, multiple ISV spend through that same primary provider, they get pricing power. >>Okay, Joel, we're going to date ourselves. At least I will. So back in the old days, it used to be, do a Barney deal with someone: hey, let's go to market together. You've got to get paper, you do a biz dev deal. And then you've got to say, okay, now let's coordinate our sales teams. A lot of moving parts. So what you're getting at here is that the alternative for Databricks or any company is to go find those partners and do deals, versus now Amazon is the center point for the customer, so you can still do those joint deals. But this seems to be flipping the script a little bit. >>Well, it is, but we still have VARs and consulting partners that are doing implementation work, very valuable advisory work, that can actually work with Marketplace through the CPPO offering. So the marketplace allows multiple ways to procure your solution. >>So it doesn't change your business structure. It just makes it more efficient. >>That's correct. >>That's a great way to say it. >>Yeah, that's great. >>So that's it. It just makes it more efficient. So you guys are actually incented to point customers to the marketplace. >>Yes, absolutely. >>Economically. >>Yeah, economically it's the right thing to do for the customer. It's the right thing to do for our relationship with Amazon, especially when it comes back to co-selling, right? Because Amazon now is leaning in with ISVs and making recommendations for, you know, an ISV solution, and our teams are working backwards from those use cases, you know, to collaborate and land them. >>Yeah. I want to get that out there. Go ahead, Joel.
>>So one of the other things I might add to that too, you know, and why this is advantageous for companies like Databricks to work through the marketplace, is it makes it so much easier for customers to deploy a solution. It's very literally one click through the marketplace to get Databricks stood up inside of your environment. And so if you're looking at, how do I help customers most rapidly adopt these solutions in the AWS cloud, the marketplace is a fantastic accelerator to that. >>You know, it's interesting. I want to bring this up and get your reaction to it, because to me, I think this is the future of procurement. So from a procurement standpoint, I mean, again, dating myself, EDI back in the old days, you know, all that craziness. Now this is all the internet, basically through the console. I get the infrastructure side, you know, spin up and provision some servers, all been good. You guys have played well there in the marketplace. But now as we get into more of what I call the business apps, and they brought this up on stage, a little nuance: most enterprises aren't yet there on integrating tech on the business apps into the stack. This is where I think you guys are a use case of success, where you guys have been successful with data integration. It's an integrator's dilemma, not an innovator's dilemma. So, like, I want to integrate. So now I have integration points with Databricks, but I want to put an app in there. I want to provision an application, but it has to be built. You don't buy it. You build. You've got to build stuff. And this is the nuance. What's your reaction to that? Am I getting this right, or am I off? Because no one's going to be buying software like they used to. They buy software to integrate it. >>Yeah. >>Because everything's integrated. >>I think AWS has done a great job at creating a partner ecosystem, right, to give customers the right tools for the right jobs.
And those might be with third parties. Databricks is doing the same thing with our Partner Connect program, right? We've got partners like Fivetran and dbt that, you know, augment and enhance our platform. And so you're looking at multi-ISV architectures, and all of that can be procured through the AWS Marketplace. >>Yeah. It's almost like, you know, bundling and unbundling. I was talking about this with Dave Vellante about Supercloud, which is, why wouldn't a customer want the best solution in their architecture, period? Best in class. If someone's got API security or an API gateway, well, you know, I don't want to be forced to buy something because it's part of a suite. And that's where you see things get suboptimized, where someone dominates a category and they say, oh, you've got to buy my version of this. >>Yeah. And Joel and I were talking, we were actually saying what's really important about Databricks is that customers control the data, right? Joel, you want to comment on that? >>Yeah. I'd say that what you're pushing on there is, we think, the way the market is going to go: customers want a lot of control over how they build their data stack. And everyone's unique in what tools are the right ones for them. And so, you know, philosophically, I think one of the really strong places Databricks and AWS have lined up is we both take an approach that you should be able to have maximum flexibility on the platform. And as we think about the lakehouse, one thing we've always been extremely committed to as a company is building the data platform on an open foundation. And we do that primarily through Delta Lake, making sure that, to Jack's point, with Databricks the data is always in your control, and it's always stored in a completely open format.
And that is one of the things that's allowed Databricks to have the breadth of integrations that it has with all the other data tools out there, because you're not tied into any proprietary format, but instead are able to take advantage of all the innovation that's happening out there in the open source ecosystem. >>When you see other solutions out there that aren't as open as you guys, and you guys are very open, by the way, we love that too, we think that's a great strategy, what am I foreclosing if I go with something else that's not as open? What's the customer's downside as you think about what's around the corner in the industry? Because if you believe it's going to be open, open source, which I think open source software is the software industry, then integration is a big deal, because software's going to be plentiful. Let's face it, it's a good time to be in the software business, and cloud's booming. So what's the downside from your Databricks perspective? You see a buyer clicking on Databricks versus that alternative. What should they be nervous about down the road if they go with a more proprietary or locked-in approach? >>Well, I think the challenge with proprietary ecosystems is you become beholden to the ability of that provider to both build relationships and convince other vendors that they should invest in that format. But you're also then beholden to the pace at which that provider is able to innovate. And I think we've seen lots of times over history where, you know, a proprietary format may run ahead for a while on a lot of innovation. But as that market control begins to solidify, that desire to innovate begins to degrade, whereas in the open format... >>So extract rents versus innovation. >>Yeah, exactly. >>But I'll say, in the open world, you know, you have to continue to innovate. >>Yeah. >>And the open source world is always innovating.
If you look at the last 10 to 15 years, I challenge you to find, you know, an example where the innovation in the data and AI world is not coming from open source. And so by investing in open ecosystems, that means you are always going to be at the forefront of what is the latest. >>You know, again, not to date myself again, but you look back at the eighties and nineties, the protocol stacks were proprietary. >>Yeah. >>You know, SNA was IBM, DECnet was Digital, you know the rest. And then TCP/IP was part of the open systems interconnect, revolutionary Oly, a big part of that as well as my school did. And so, like, you know, it didn't standardize the whole stack. It stopped at IP and TCP. >>Yeah. >>But that helped interoperate, that created a nice de facto standard. So this is a big part of this mid-game. I call it the chessboard, you know, you've got the opening game and the mid-game, then you've got the end game, and we're not at the end game yet. >>There's always some form of lock-in, right? Andy Jassy will address it, you know, when making a decision. But if you're going to make a decision, you don't want to be limited, right? So I would advise a customer that there could be limitations with a proprietary architecture. And if you look at what every customer's trying to become right now, it's an AI-driven business, right? And so it has to do with, can you get that data out of silos? Can you organize it and secure it? And then can you work with data scientists to feed those models in a very consistent manner? And so the tools of tomorrow, to Joel's point, will be open, and we want interoperability with those tools. >>And choice matters too. And I would say that, you know, the argument for why I think Amazon is not as locked in as maybe some other clouds is that they have to compete directly too.
Redshift competes directly with a lot of other stuff, but they can't play the bundling game, because the customers are getting savvy to the fact that if you try to bundle an inferior product with something else, it may not work great at all. And they're onto it. >>This is to Amazon's credit: by having these solutions that may compete with native services in Marketplace, they are providing customers with choice. >>Low price, and access to the core value. >>Exactly. >>Which is the hardware, which is their platform. Okay. So I want to get you guys' thoughts on something else I see emerging. This is again kind of a CUBE rumination moment. So on stage Chris unpacked a lot of stuff. I mean, this marketplace, they're touching a lot of hot buttons here, you know, pricing, compensation, workflows, services behind the curtain. And one of the things he mentioned was resellers or channel partners, depending upon who you talk to. We believe, Dave and I believe on theCUBE, that the entire indirect sales channel of the industry is going to be disrupted radically, because those players were selling hardware in the old days, and software. That game is going to change. You know, you mentioned you guys have a program. I want to get your thoughts on this. We believe that once this gets set up, they can play in this game and bring their services in, which means that the old reseller channels are going to be rewritten. They're going to be refactored with this new kind of access, because you've got scale, you've got money, you've got product, and you've got customers coming into the marketplace. So if you're, like, a reseller that sold computers to data centers, or software, you know, a value-added reseller or VAR business... >>You've got to evolve. >>You've got to be here. >>Yes. >>How are you guys working with those partners? Because you say you have a part in your marketplace there. How do I make money?
If I'm a reseller with Databricks, with Amazon, take me through that use case. >>Well, I'll let Joel comment, but I think it's pretty straightforward, right? Customers need expertise. They need know-how. When we're seeing customers do mass migrations to the cloud, or Hadoop-specific migrations, or data transformation implementations, they need expertise from consulting and SI partners. If those consulting and SI partners happen to resell the solution as well, well, that's another aspect of their business, but I really think it is the expertise that the partners bring to help customers get outcomes. >>Joel, channel: big opportunity for Amazon to reimagine this. >>For sure. Yeah. And I think, you know, to your comment about how resellers take advantage of that, I think what Jack was pushing on is spot on, which is it's becoming more and more about the expertise you bring to the table, and not just transacting the software, but now actually helping customers make the right choices. And we're seeing, you know, SIs begin to be able to resell solutions and finding a lot of opportunity in that. >>Yeah. >>And I think we're seeing traditional resellers begin to move into that SI model as well. And that's going to be the evolution that this gets to. >>At the end of the day, it's about services, for sure. If you've got a great service, you're going to have high gross profits. >>And I think that the managed service provider business is alive and well, right? Because there are a number of customers that want that type of a service. >>I think that's going to be a really hot button for you guys. I think, being the way you guys are open, this channel partner services model coming into the fold really kind of makes for that Supercloud-like experience, where you guys now have an ecosystem. And that's my next question. You guys have an ecosystem going on within Databricks, for sure, on top of this ecosystem. How does that work?
This is kind of like, it hasn't been written up in business school case studies yet. This is new. What is this? >>I think, you know, what it comes down to is, you're seeing ecosystems begin to evolve around the data platforms. And that's going to be one of the big, kind of, new horizons for us as we think about what drives ecosystems. It's going to be around, well, what's the data platform that I'm using? And then all the tools that have to encircle that to get my business done. And so I think there's, you know, absolutely ecosystems inside of the AWS business on all of AWS's services, across data analytics and AI. And then to your point, you are seeing ecosystems now arise around Databricks and its Lakehouse platform as well, as customers are looking at, well, if I'm standing these Lakehouses up and I'm beginning to invest in this, then I need a whole set of tools that help me get that done as well. >>I mean, you think about ecosystem theory, we're living a whole nother dream. And I'm not kidding. It hasn't yet been written up for business school case studies: we're now in a whole nother connective tissue, ecology thing happening, where you have dependencies, value propositions, economics, connectedness. So you have relationships in these ecosystems. >>And I think one of the great things about the relationships within these ecosystems is that there's a high degree of overlap. >>Yeah. >>So you're seeing that, you know, the way that the cloud business is evolving, the ecosystem partners of Databricks are the same ecosystem partners of AWS. And so as you build these platforms out into the cloud, you're able to really take advantage of best of breed, the broadest set of solutions out there for you. >>Joel, Jack, I love it, because you know what it means? The best ecosystem will win, if you keep it open. >>Sure. >>You can see everything. If you're going to do it in the dark, you know, you don't know the outcome. I mean, this is really kind of what we're talking about.
>>And John, can I just add that when I was at Amazon, we had a theory that there's buyers and builders, right? There's very innovative companies that want to build things themselves. We're seeing now that builders want to buy a platform. Right? >>Yeah. >>And so there's a platform decision being made, and that ecosystem is going to evolve around the platform. >>Yeah, and I totally agree. And the word innovation gets kicked around. That's why, you know, when we had our Supercloud panel, it was called the innovator's dilemma, with a slash through it, called the integrator's dilemma. Innovation is the digital transformation. So... >>Absolutely. >>Like, that becomes cliche in a way, but it really becomes more of a, are you open? Are you integrating? If APIs are the connective tissue, what's automation, what do the service messages look like? I mean, a whole nother set of, kind of, thinking goes on in these new ecosystems and these new products. >>And that thinking has been born in Delta Sharing, right? So the idea that you can have a multi-cloud implementation of Databricks, and actually share data between those two different clouds, that is the next layer on top of the native cloud solution. >>Well, Databricks has done a good job of building on top of the goodness of, and the CapEx gift from, AWS. But you guys have done a great job taking that and building differentiation into the product. You guys have a great customer base, a great growing ecosystem. And again, I think, a shining example of what every enterprise is going to do. Build on top of something, get that operating model, get that operating model driving revenue. >>Yeah. >>Whether you're Goldman Sachs or Capital One or XYZ corporation. >>S&P Global, NASDAQ, right. We've got, you know, the biggest verticals in the world solving tough problems with Databricks.
I think we'd be remiss, because if Ali was here, he would really want to thank Amazon for all of the investments across all of the different functions, whether it's the relationship we have with our engineering and service teams, our marketing teams, you know, product development. And we're going to be at re:Invent. A big presence at re:Invent. We're looking forward to seeing you there again. >>Yeah. We'll see you guys there. Yeah. Again, good ecosystem. I love the ecosystem evolution that's happening. This NextGen cloud is here. We're seeing this evolve, kind of new economics, new value propositions, kind of scaling up, producing more. So you guys are doing a great job. Thanks for coming on theCUBE and taking the time. Joel, great to see you at the check. >>Thanks for having us, John. >>Okay. Cube coverage here. The world's changing as APN comes together with the marketplace for a new partner organization at Amazon Web Services. theCUBE's got it covered. This should be a very big, growing ecosystem as this continues. Billions being sold through the marketplace. And of course the buyers are happy as well. So we've got it all covered. I'm John Furrier, your host of theCUBE. Thanks for watching.

Published Date : Sep 21 2022

SUMMARY :

Thanks for good to see you again. Yeah, John, great to be here. Obviously it makes it's a no brainer on the micro, you know, You're in the middle of it. you know, unique use cases. So this is speaks volumes to the, the relationship you have years. And when you look at what the APN allows us to do, And so we see customers, you know, doing rapid experimentation pilots, POCs, So you got the big contracts with the private offer. And that's, that is the problem that data bricks is out there to solve, They just couldn't solve before a good example of this, you know, And if you think about what does it take to set that up? So how do you guys look at this? Well, I I'd add what Joel just talked about with, with, you know, what the solution, the value of the solution our entire offering And that really helps customers because if you get data bricks So they're looking at this as saying, you know, multiple ISV spend through that same primary provider, you get pricing And then you gotta say, okay, now let's coordinate our sales teams, a lot of moving parts. So the marketplace allows multiple ways to procure your So it doesn't change your business structure. Yeah, So you guys are actually incented to Yeah. It's the right thing to do for our relationship with Amazon, So one of the other things I might add to that too, you know, and why this is advantageous for, I get the infrastructure side, you know, spin up and provision. you know, augment and enhance our platform. you know, I don't wanna be forced to buy something because it's part of a suite and the data. And that is one of the things that's allowed data bricks to have the breadth of integrations that it has with When you see other solutions out there that aren't as open as you guys, you guys are very open by the I think the challenge with proprietary ecosystems is you become beholden to the Exactly. I'll say it in the open world, you know, you have to continue to innovate. 
I call it the chessboard, you know, you got opening game and mid game. And so it has to do with, can you get that data outta silos? And I would say that, you know, the argument for why I think Amazon Price and access to the S and access to the core value. So I wanna get you guys thought on something else. You gotta, you gotta be here. If those consulting SI partners happen to resell the solution as well. And we're seeing, you know, both SI begin to be This gets at the end of the day. I think that the managed service provider business is alive and well, right? I think being the way you guys are open this channel I think, you know, what it comes down to is you're seeing ecosystems begin to evolve around So you have relationships in And so as you build these platforms out into the cloud, you're able to really take advantage you don't know the outcome. And John, can I just add that when I was in Amazon, we had a, a theory that there's buyers and builders, That's why, you know, when we had our super cloud panel So the idea that you can have a multi-cloud implementation of data bricks, and actually share data But you guys have done a great job taking that building differentiation into the product. We're looking forward to seeing you there again. Great to see you at the check.

ENTITIES

Entity	Category	Confidence
Chris	PERSON	0.99+
Joel Minick	PERSON	0.99+
AWS	ORGANIZATION	0.99+
Amazon	ORGANIZATION	0.99+
John	PERSON	0.99+
Joel	PERSON	0.99+
Ali	PERSON	0.99+
Jack Anderson	PERSON	0.99+
Dave	PERSON	0.99+
$5 million	QUANTITY	0.99+
Jack	PERSON	0.99+
two	QUANTITY	0.99+
Goldman Sachs	ORGANIZATION	0.99+
XYZ	ORGANIZATION	0.99+
Joel Minnick	PERSON	0.99+
Jack Andersen	PERSON	0.99+
Andy jazzy	PERSON	0.99+
third aspect	QUANTITY	0.99+
John fur	PERSON	0.99+
NASDAQ	ORGANIZATION	0.99+
Barney	ORGANIZATION	0.99+
both	QUANTITY	0.99+
five short months	QUANTITY	0.99+
One	QUANTITY	0.99+
APO	ORGANIZATION	0.99+
today	DATE	0.99+
IBM	ORGANIZATION	0.99+
first 100 million	QUANTITY	0.98+
tomorrow	DATE	0.98+
one	QUANTITY	0.98+
billions	QUANTITY	0.98+
Johnny	PERSON	0.97+
Davis	PERSON	0.97+
a million dollars	QUANTITY	0.96+
Salesforce	ORGANIZATION	0.96+
data bricks	ORGANIZATION	0.95+
each ISV	QUANTITY	0.95+
Seattle, Washington	LOCATION	0.95+
two different ways	QUANTITY	0.95+
one data platform	QUANTITY	0.95+
seven years ago	DATE	0.94+

Ali Ghodsi, Databricks | Supercloud22

(light hearted music) >> Okay, welcome back to Supercloud '22. I'm John Furrier, host of theCUBE. We got Ali Ghodsi here, co-founder and CEO of Databricks. Ali, Great to see you. Thanks for spending your valuable time to come on and talk about Supercloud and the future of all the structural change that's happening in cloud computing. >> My pleasure, thanks for having me. >> Well, first of all, congratulations. We've been talking for many, many years, and I still go back to the video that we have in archive, you talking about cloud. And really, at the beginning of the big reboot, I called the post Hadoop, a revitalization of data. Congratulations, you've been cloud-first, now on multiple clouds. Congratulations to you and your team for achieving what looks like a billion dollars in annualized revenue as reported by the Wall Street Journal, so first, congratulations. >> Thank you so much, appreciate it. >> So I was talking to some young developers and I asked a random poll, what do you think about Databricks? Oh, we love those guys, they're AI and ML-native, and that's their advantage over the competition. So I pressed why. I don't think they knew why, but that's an interesting perspective. This idea of cloud native, AI/ML-native, ML Ops, this has been a big trend and it's continuing. This is a big part of how this change and this structural change is happening. How do you react to that? And how do you see Databricks evolving into this new Supercloud-like multi-cloud environment? >> Yeah, look, I think it's a continuum. It starts with having data, but they want to clean it, you know, and they want to get insights out of it. But then, eventually, you'd like to start asking questions, doing reports, maybe ask questions about what was my revenue yesterday, last week, but soon you want to start using the crystal ball, predictive technology. Okay, but what will my revenue be next week? Next quarter? Who's going to churn? 
And if you can finally automate that completely so that you can act on the predictions, right? So this credit card that got swiped, the AI thinks it's fraud, we're going to deny it. That's when you get real value. So we're trying to help all these organizations move through this data AI maturity curve, all the way to that, the prescriptive, automated AI machine learning. That's when you get real competitive advantage. And you know, we saw that with the FAANGs, right? I mean, Google wouldn't be here today if it wasn't for AI. You know, we'd be using AltaVista or something. We want to help all organizations to be able to leverage data and AI the way that the FAANGs did. >> One of the things we're looking at with supercloud, and why we call it supercloud versus other things like multi-cloud, is that today a lot of the successful companies that have started in the cloud have been successful, but have realized, and even enterprises who have gotten there by accident, and maybe have done nothing with cloud, have just some cloud projects on multiple clouds. So, people have multiple cloud operational things going on, but it hasn't necessarily been a strategy per se. It's been more of kind of a default reaction to things. But the ones that are innovating have been successful in one native cloud, because the use cases that drove that got scale, got value, and then they're making that super by bringing it on premises, putting in a modern data stack for modern application development, and kind of dealing with the things that you guys are in the middle of with Databricks. That is where the action is, and they don't want to lose the trajectory and all the economies of scale. So we're seeing another structural change where the evolutionary nature of the cloud has solved a bunch of use cases, but now other use cases are emerging on premises and at the edge that have been driven by applications, because of the developer boom that's happening. You guys are in the middle of it.
What is happening with this structural change? Are people looking for the modern data stack? Are they looking for more AI? What's your perspective on this supercloud kind of position? >> Look, it started with, are you on multiple clouds, right? So multi-cloud has been a thing. It became a thing; 70, 80% of our customers, when you ask them, they're on more than one cloud. But then they soon start realizing that, hey, you know, if I'm on multiple clouds, this data stuff is hard enough as it is. Do I want to redo it again and again with different proprietary technologies on each of the clouds? And that's when they start thinking about, let's standardize this, let's figure out a way which just works across them. That's where I think open source comes in and becomes really important. Hey, can we leverage open standards? Because then we can make it work in these different environments, so that we can actually go super, as you said. That's one. The second thing is, can we simplify it? You know, I think today the data landscape is complicated. Conceptually it's simple. You have data, which is essentially customer data that you have, maybe employee data. And you want to get some kind of insights from that. But how you do that is very complicated. You have to buy a data warehouse, hire data analysts. You have to store stuff in the Delta Lake, you know, get your data engineers. If you want a streaming real-time thing, that's another completely different set of technologies you have to buy. And then you have to stitch all these together, and you have to do it again and again on every cloud. So they just want simplification. So that's why we're big believers in this Delta Lakehouse concept, which is an open standard for simplifying this data stack and helping people to just get value out of their data in any environment. So they can do that in this sort of supercloud, as you call it.
>> You know, we've been talking about that in previous interviews, do the heavy lifting let them get the value. I have to ask you about how you see that going forward, Because if I'm a customer, I have a lot of operational challenges. Cause the developers are are kicking butt right now. We see that clearly. Open sources growing at, and continue to be great. But ops and security teams they really care about this stuff. And most companies don't want to spin up multiple ops teams to deal with different stacks. This is one big problem that I think that's leading into the multi-cloud viability. How do you guys deal with that? How do you talk to customers when they say, I want to have less complications on operations? >> Yeah, you're absolutely right. You know, it's easy for a developer to adopt all these technologies and new things are coming out all the time. The ops teams are the ones that have to make sure this works. Doing that in multiple different environments is super hard. especially when there's a proprietary stack in each environment that's different. So they just want standardization. They want open source, that's super important. We hear that all the time from them. They want open the source technologies. They believe in the communities around it. You know, they know that source code is open. So you can also see if there's issues with it. If there's security breaches, those kind of things that they can have a community around it. So they can actually leverage that. So they're the ones that are really pushing this, and we're seeing it across the board. You know, it starts first with the digital natives you know, the companies that are, but slowly it's also now percolating to the other organizations, we're hearing across the board. >> Where are we, Ali on the innovation strategies for customers? Where are they on the trajectory around how they're building out their teams? How are they looking at the open source? 
How are they extending the value proposition of Databricks, and data at scale, as they start to build out their teams and operations, because some are like kind of starting, crawl, walk, run, kind of vibe. Some are big companies, they're dealing with data all the time. Where are they in their journey? What's the core issues that they're solving? What are some of the use cases that you see that are most pressing in customer? >> Yeah, what I've seen, that's really exciting about this Delta Lakehouse concept is that we're now seeing a lot of use cases around real time. So real time fraud detection, real time stock ticker pricing, anyone that's doing trading, they want that to work real time. Lots of use cases around that. Lots of use cases around how do we in real time drive more engagement on our web assets if we're a media company, right? We have all these assets how do we get people to get engaged? Stay on our sites. Continue engaging with the material we have. Those are real time use cases. And the interesting thing is, they're real time. So, you know, it's really important that you that now you don't want to recommend someone, hey, you should go check out this restaurant if they just came from that restaurant, half an hour ago. So you want it to be real time, but B, that it's also all based on machine learning. These are a lot of this is trying to predict what you want to see, what you want to do, is it fraudulent? And that's also interesting because basically more and more machine learning is coming in. So that's super exciting to see, the combination of real time and machine learning on the Lakehouse. And finally, I would say the Lakehouse is really important for this because that's where the data is flowing in. If they have to take that data that's flowing into the lake and actually copy it into a separate warehouse, that delays the real time use cases. And then it can't hit those real time deadlines. So that's another catalyst for this Lakehouse pattern. 
>> Would that be an example of how the metrics are changing? Cause I've been looking at some people saying, well you can tell if someone's doing well there's a lot of data being transferred. And then I was saying, well, wait a minute. Data transfer costs money, right? And time. So this is interesting dynamic, in a way you don't want to have a lot of movement, right? >> Yeah, movement actually decreases for a lot of these real time use cases. 'Cause what we saw in the past was that they would run a batch processing to process all the data. So once they process all the data. But actually if you look at the things that have changed since the data that we have yesterday it's actually not that much. So if you can actually incrementally process it in real time, you can actually reduce the cost of transfers and storage and processing. So that's actually a great point. That's also one of the main things that we're seeing with the use cases, the bill shrinks and the cost goes down, and they can process less. >> Yeah, and it'd be interesting to see how those KPIs evolve into industry metrics down the road around the supercloud of evolution. I got to ask you about the open source concept of data platforms. You guys have been a pioneer in there doing great work, kind of picking the baton off where the Hadoop World left off as Dave Vellante always points out. But if working across clouds is super important. How are you guys looking at the ability to work across the different clouds with data bricks? Are you going to build that abstraction yourself? Does data sharing and model sharing kind of come into play there? How do you see this data bricks capability across the clouds? >> Yeah, I mean, let me start by saying, we just we're big fans of open source. We think that open source is a force in software. That's going to continue for, decades, hundreds of years, and it's going to slowly replace all proprietary code in its way. 
We saw that, it could do that with the most advanced technology. Windows, you know proprietary operating system, very complicated, got replaced with Linux. So open source can pretty much do anything. And what we're seeing with the Delta Lakehouse is that slowly the open source community is building a replacement for the proprietary data warehouse, Delta Lake, machine learning, real time stack in open source. And we're excited to be part of it. For us, Delta Lake is a very important project that really helps you standardize how you layout your data in the cloud. And when it comes a really important protocol called data sharing, that enables you in a open way actually for the first time ever share large data sets between organizations, but it uses an open protocol. So the great thing about that is you don't need to be a Databricks customer. You don't need to even like Databricks, you just need to use this open source project and you can now securely share data sets between organizations across clouds. And it actually does so really efficiently just one copy of the data. So you don't have to copy it if you're within the same cloud. >> So you're playing the long game on open source. >> Absolutely. I mean, this is a force it's going to be there if if you deny it, before you know it there's going to be, something like Linux, that is going to be a threat to your propriety. >> I totally agree by the way. I was just talking to somebody the other day and they're like hey, the software industry someone made the comment, the software industry, the software industry is open source. There's no more software industry, it's called open source. It's integrations that become interesting. And I was looking at integrations now is really where the action is. And we had a panel with the Clouderati we called it, the people have been around for a long time. And it was called the innovator's dilemma. And one of the comments was it's the integrator's dilemma, not the innovator's dilemma. 
And this is a big part of this piece of supercloud. Can you share your thoughts on how cloud and integration need to be tightened up to really make it super? >> Actually that's a great point. I think the beauty of this is, look the ecosystem of data today is vast, there's this picture that someone puts together every year of all the different vendors and how they relate, and it gets bigger and bigger and messy and messier. So, we see customers use all kinds of different aspects of what's existing in the ecosystem and they want it to be integrated in whatever you're selling them. And that's where I think the power of open source comes in. Open source, you get integrations that people will do without you having to push it. So us, Databricks as a vendor, we don't have to go tell people please integrate with Databricks. The open source technology that we contribute to, automatically, people are integrating with it. Delta Lake has integrations with lots of different software out there and Databricks as a company doesn't have to push that. So I think open source is also another thing that really helps with the ecosystem integrations. Many of these companies in this data space actually have employees that are full-time dedicated to make sure make sure our software works well with Spark. Make sure our software works well with Delta and they contribute back to that community. And that's the way you get this sort of ecosystem to further sort of flourish. >> Well, I really appreciate your time. And I, my final question for you is, as we're kind of unpack and and kind of shape and frame supercloud for the future, how would you see a roadmap or architecture or outcome for companies that are going to clearly be in the cloud where it's open source is going to be dominating. Integrations has got to be seamless and frictionless. Abstraction layer make things super easy and take away the complexity. What is supercloud to them? What does the outcome look like? 
How would you define a supercloud environment for an enterprise? >> Yeah, for me, it's the simplification that you get where you standardize an open source. You get your data in one place, in one format in one standardized way, and then you can get your insights from it, without having to buy lots of different idiosyncratic proprietary software from different vendors. That's different in each environment. So it's this slow standardization that's happening. And I think it's going to happen faster than we think. And I think in a couple years it's going to be a requirement that, does your software work on all these different departments? Is it based on open source? Is it using this Delta Lake house pattern? And if it's not, I think they're going to demand it. >> Yeah, I feel like we're close to some sort of defacto standard coming and you guys are a big part of it, once that clicks in, it's going to highly accelerate in the open, and I think it's going to be super valuable. Ali, thank you so much for your time, and congratulations to you and your team. Like we've been following you guys since the beginning. Remember the early days and look how far it's come. And again, you guys are really making a big difference in making a super cool environment out there. Thanks for coming on sharing. >> Thank you so much John. >> Okay, this is supercloud 22. I'm John Furrier stay with more for more coverage and more commentary after this break. (light hearted music)

Published Date : Aug 7 2022

SUMMARY :

and the future of all Congratulations to you and your team And how do you see Databricks evolving And if you can finally One of the things we're And then you have to I have to ask you about how We hear that all the time from them. What are some of the use cases that delays the real time use cases. in a way you don't want to So if you can actually incrementally I got to ask you about So you don't have to copy it So you're playing the that is going to be a And one of the comments was And that's the way you and take away the complexity. simplification that you get and congratulations to you and your team. Okay, this is supercloud 22.

ENTITIES

Entity	Category	Confidence
Ali Ghodsi	PERSON	0.99+
Dave Vellante	PERSON	0.99+
Google	ORGANIZATION	0.99+
Databricks	ORGANIZATION	0.99+
John	PERSON	0.99+
last week	DATE	0.99+
next week	DATE	0.99+
Ali	PERSON	0.99+
Next quarter	DATE	0.99+
yesterday	DATE	0.99+
John Furrier	PERSON	0.99+
Delta	ORGANIZATION	0.99+
one format	QUANTITY	0.99+
first	QUANTITY	0.99+
today	DATE	0.98+
second thing	QUANTITY	0.98+
one	QUANTITY	0.98+
Linux	TITLE	0.98+
one copy	QUANTITY	0.98+
Delta Lakehouse	ORGANIZATION	0.98+
supercloud 22	ORGANIZATION	0.98+
more than one cloud	QUANTITY	0.98+
each environment	QUANTITY	0.98+
Clouderati	ORGANIZATION	0.98+
Supercloud22	ORGANIZATION	0.98+
hundreds of years	QUANTITY	0.97+
Delta Lake	LOCATION	0.97+
one big problem	QUANTITY	0.97+
70, 80%	QUANTITY	0.97+
Windows	TITLE	0.96+
one place	QUANTITY	0.96+
first time	QUANTITY	0.96+
billion dollars	QUANTITY	0.95+
decades	QUANTITY	0.95+
Delta Lake	ORGANIZATION	0.95+
One	QUANTITY	0.94+
supercloud	ORGANIZATION	0.94+
Supercloud	ORGANIZATION	0.94+
half an hour ago	DATE	0.93+
Delta Lake	TITLE	0.92+
Lakehouse	ORGANIZATION	0.92+
Spark	TITLE	0.91+
each	QUANTITY	0.91+
a minute	QUANTITY	0.85+
one of	QUANTITY	0.73+
one native	QUANTITY	0.72+
supercloud	TITLE	0.7+
couple years	QUANTITY	0.66+
AltaVista	ORGANIZATION	0.65+
Wall Street Journal	ORGANIZATION	0.63+
theCUBE	ORGANIZATION	0.63+
Lakehouse	TITLE	0.51+
Lake	LOCATION	0.46+
Hadoop World	TITLE	0.41+
'22	EVENT	0.24+

Erik Bradley | AWS Summit New York 2022

>>Hello, everyone. Welcome to theCUBE's coverage here in New York City for AWS, Amazon Web Services, Summit 2022. I'm John Furrier, host of theCUBE, with Dave Vellante, my co-host. We are breaking it down, getting an update on the ecosystem. As GDP drops, inflation's up, gas prices are up, the enterprise continues to grow. We're seeing exceptional growth. We're here on the ground floor, live at the Summit. It's a packed house, 10,000 people. Erik Bradley's here, Chief Strategist at ETR, one of the premier enterprise research firms out there, which partners with theCUBE and powers the Breaking Analysis that Dave does. Check that out as the hottest podcast in enterprise. Erik, great to have you on theCUBE. Thanks for coming on. >>Thank you so much, John. I really appreciate the collaboration always. >>Yeah. Great stuff. Your data's amazing. ETR, folks watching, check out ETR. They have a unique formula, very accurate. We love it. It's been moving the market. Congratulations. Let's talk about the market right now. This market is booming. Enterprise is the hottest thing; consumer's kind of in the toilet. Okay, I said that. All right, back out devices and consumer, and enterprise is still growing. And by the way, this is the first downturn in the history of the world where hyperscalers are pumping on all cylinders, which means they're still powering the revolution. >>Yeah, it's true. The hyperscalers were basically this two-sun system when Microsoft and AWS first came around, and everything was orbiting around it. And we're starting to see that sun cool off a little bit, but we're talking about a gradient here, right? When we say cool off, we're not talking about a shutdown. It's still burning hot, that's for sure. And I can get into some of the macro data in a minute, if that's all right. Or do you want me to go right? No, go, go. Right. Yeah. So right now we just closed our most recent survey, and that's macro and vendor specific. We had 1200 people talk to us on the macro side.
And what we're seeing here is a cooldown in spending. We originally had about an 8.5% increase in budgets. That's cooled down to 6.5% now, but I'll say, with the doom and gloom and the headlines that we're seeing every day, 6.5% growth coming off of what we just did the last couple of years is still pretty fantastic as a backdrop. >>Okay. So you started to see, John mentioned consumer. We saw that in Snowflake's earnings, for example. We certainly saw, you know, Walmart, other retailers, the Facebooks of the world, where consumption was being dialed down. Certain Snowflake customers, they didn't mention any customers, but they were able to say, all right, we're gonna dial down consumption this quarter, hold on. So we saw some of that in Snowflake's results and other results. But at the same time, the rest of the industry is booming. But your data is showing softness within the Fortune 500 for AWS. >>Not only AWS, but the Fortune 500 across the board. Okay. So going back to that larger macro data, the biggest drop in spending that we captured is in the Fortune 500, which is surprising. But at the same time, these companies have a better purview into the economy in general. They tend to see things further in advance. And we often remember they spend a lot of money, so they don't need to play catch-up. They'll more easily be able to pump the brakes a little bit in the Fortune 500. But to your point, when we get into the AWS data, the Fortune 500 decrease seems to be hitting them a little bit more than it is Azure and GCP. >>I mean, we're still talking about a huge business, right? >>I mean, they're catching up. I mean, Amazon has been transforming from owning the developer cloud, the startup cloud, a decade ago to really putting a dent in the enterprise as being the number one cloud. And I still contend that they're number one by a long ways, but Azure is kicking ass and catching up. Okay.
You're seeing people move to Azure. You've got Charlie Bell over there, Shawn Bice, former Amazonians, Teresa Carlson, people are going over there. There's lift over at Azure. >>There certainly is. >>Are there kinks in the armor for AWS? >>There's a couple of kinks, but I think your point is really good. We need to take a second there. If you're talking about true PaaS or infrastructure as a service, true cloud compute, I think AWS still is the powerhouse. And a lot of times the data gets a little muddied, because Azure is really a hosted platform for applications, and you're not really sure where that line is drawn. And I think that's an important caveat to make. But based on the data, yes, we are seeing some kinks in the armor for AWS. >>Yes. Explain. >>So right now, first of all, a caveat: a 40% net score, which is our proprietary spending metric, across the board. So we're not raising any alarms here. It's still strong. That said, there are declines, and there are declines pretty much across the board. The only spot where we're not seeing a decline at all is in container spend. Everything else is coming down. Specifically, we're seeing it come down in data analytics, data warehousing, and ML/AI, which is a little bit of a concern, because that rate of decline is not the same with Azure. >>Okay. So I gotta ask on macro. I see the headwinds on the macro side, you pointed that out. Is there any insight into any underlying conditions that might be there on AWS, or is it just a chronic kind of situational thing? >>Right now, it seems situational, other than that correlation between their big Fortune 500, you know, audience and that being our biggest decline. The other aspect of the macro survey is we ask people, if you are planning to decline spend, how do you plan on doing it? And the number two answer is taking a look at our cloud spend and auditing it.
So they're kind of saying, all right, you know, for the last 10 years it's been drunken-sailor spend. >>I was gonna use that same line, you know. >>Cloud spend, just spend and we'll figure it out later, who cares? And right now it's time to tighten the belts a little bit. >>But this is part of the allure of cloud at some point. Yeah. You could say, I'm gonna dial it down, I'm gonna rein it in. So that's part of the reason why people go to the cloud. I wanna focus in on the data side of things, and specifically the database. Just to give some context, and correct me if I'm a little off here, but Snowflake, which is a hot company, you know, on the planet, their net score was up around 80% consistently. It's dropped down in the last, you know, quarter, last survey, to 60%. Yeah. So still highly, highly elevated, but that's relative to where Amazon is, much larger, but you're saying they're coming down to the 40% level. Is that right? >>Yeah, they are. And I remember, you know, when I first started doing this 10 years ago, AWS was at a 70%, you know, net score as well. So what's gonna happen over time is those adoptions are gonna get less and you're gonna see more flattening of spend, which ultimately is going to lower the score, because we're looking for expansion rates. We wanna see adoption and increase. And when you see flattening of spend, it starts to contract a little bit. And you're right, Snowflake also was in the stratosphere and that cooled off a little bit, but it's still, you know, very strong, and AWS is coming down. I think the reason why it's so concerning is because, A, it's within the Fortune 500, and their rate of decline is more than Azure right now. >>Well, and one of the big trends you're seeing in database is this idea of converging function.
In other words, bringing transaction and analytics together, right? At Snowflake Summit, they added the capability to handle transaction data. MongoDB, which is largely, mostly transactions, added the capability in June to bring in analytic data. You see Databricks going from data engineering and data science now getting into Snowflake's space and analytics. So you're seeing that convergence. Oracle is converging with MySQL HeatWave and their core databases. Couchbase is doing the same. MariaDB, virtually all these database companies are converging their platforms, with the exception of AWS. AWS is still the right tool for the right job. So they've got Aurora, they've got RDS, they've got, you know, DynamoDB, they've got Redshift, they've got, you know, going on and on and on. And so the question everybody's asking is, will that change? Will they start to sort of cross those swim lanes? We haven't seen it thus far. How is that affecting the data performance? >>I mean, that's fantastic analysis. I think that's why we're seeing it, because you have to be in the AWS ecosystem, and they're really not playing nicely with others in the sandbox right now. Now I will say... >>Oh, Amazon's not playing nicely? >>Well, no, no. Simply to your point, though, the other ones are actually bringing in others and consolidating other different vendor types. And they're really not. You know, if you're in AWS, you need to stay within AWS. Now I will say their tools are fantastic. So if you do stay within AWS, they have a tool for every job, they're advanced, and they're incredible. I think sometimes the complexity of their tools hurts them a little bit. 'Cause to your point earlier, AWS started as a developer-centric type of cloud. They have moved on to enterprise cloud and it's a little bit more business oriented, but their roots are still DevOps friendly. And unless you're truly trained, AWS can be a little scary.
>> So a common use case is, I'm going to be using Aurora for my transaction system and then I'm going to ETL it into Redshift, right? And now I have two data stores and I have two different sets of APIs and primitives, two different teams of skills. And so that is probably causing some friction and complexity in the customer base. Again, the question is, will they begin to expand some of those platforms to minimize some of that friction? >> Well, yeah, this is the question I wanted to ask on that point. So I've heard from people inside Amazon: don't count out Redshift, we're catching up. I think that's my word, but they were kind of saying that, right? Because Redshift is a good database, but they're adding a lot more. So you've got Snowflake's success. I think there's a little bit of a jealousy factor going on there within the Redshift team. But then you've got Azure Synapse, with the Synapse product. >> Yep. >> And then you've got BigQuery from Google. >> BigQuery. Yep. >> What's the differentiation? What are you seeing in the data for the data warehouses or the data clouds that are out there for the customers? What does the data say to us? >> Yeah, unfortunately the data's showing that they're dropping a little bit. >> Who's they? >> AWS is dropping a little bit now. Of their data products, Redshift and RDS are still the two highest of them, but they are starting to decline. Now, I think one of the great data points that we have, we just closed the survey, is we took a comparison of the legacy data. Now please forgive me for the word legacy. We're going to anger a few people, but we've got Teradata, Oracle on-prem, we've got IBM, some of those more legacy data warehouse type of names. When we look at our survey takers that have them and where their spend is going, that spend is going to Snowflake first, and then it's going to Google, and then it's going to Microsoft Azure, and AWS is actually declining in there.
So when you talk about who's taking that legacy market share, it's not AWS right now. >> So legacy goes to legacy. So Microsoft. >> So let's work through a little context, because Redshift really was the first to take the database to the cloud. And they did that by doing a one-time license deal with ParAccel, which was an on-prem database. And then they re-engineered it. They did a fantastic job, but it was still engineered for on-prem. Then along comes Snowflake a couple of years later, and true cloud native. Same thing with BigQuery. >> Yep. >> True cloud native architecture. So they get a lot of props. Now, what Amazon did, they took a page out of Snowflake, for example, separating compute from storage. Now of course, what Amazon did is actually not really completely separating like Snowflake did. They couldn't because of the architecture, so they created a tiering system where you could dial down the compute. So, little nuances like that, I understand. But at the end of the day, what we're seeing from Snowflake is the gathering of an ecosystem in this true data cloud, bringing in different data types. They got to the public markets. Databricks was not able to get to the public markets, and I think is struggling. >> And a 25 billion valuation. >> Right. And so that's going to be dialed down. Struggling somewhat from a go-to-market standpoint, where Snowflake has no troubles from a go-to-market. They are the masters at go-to-market. And so now they've got momentum. We talked to Frank Slootman at Snowflake Summit. He basically said, I'm not taking the foot off the gas, no way. Yeah, a few of our large consumer customers dialed things down, but we're going balls to the wall. >> Well, before you get into the numbers, look at the two shows. Snowflake had their summit in person in Vegas. Databricks had their show in San Francisco.
And if you compare the two shows, it's clear who's winning. Snowflake blew them away from a market standpoint. And we were at Snowflake, but we weren't at Databricks, but there was really nothing online. I heard from sources that it was like less than 3,000 people. >> Snowflake was 1,900 people in 2019, nearly 10,000, yeah, in 2020. >> It's going to be fun to track that as an odd caveat to say, okay, let's see what that growth is. Because in fairness, Databricks is a little bit younger. Snowflake's had a couple more years. So I'd be curious to see where they are. Their lakehouse paradigm is interesting. >> Yeah. And I think it's... >> And they're a product-first company, yes. Their go-to-market might be a little bit weak from our analysis, but they'll figure it out. >> CEO's pretty smart. But I think it's worth pointing out, it's like two different philosophies, right? Snowflake is, come into our data cloud. That's their proprietary environment. Think of the iPhone, right? End to end. We guarantee it's all going to work, and we're in control. Snowflake is like, hey, open source, no, I mean Databricks, open source, bring in this tool, that tool. Now you are seeing Snowflake capitulate a little bit. They announced, for instance, Apache Iceberg support at the Snowflake Summit. So they're tipping their cap to open source. But at the end of the day, they're going to market and sell the fact that it's going to run better in native Snowflake. Whereas Databricks, they're coming at it from much more of an open source mantra. So we'll see who wins. Look, you had Windows and you had Apple. >> They both won. You've got Cal and you've got Stanford. >> They both... >> I don't think it's actually there yet. I find the more interesting dynamic right now is between AWS and Snowflake. It's really a fun tit for tat, right?
I mean, AWS has S3, and then Snowflake comes right on top of it and announces R2: we're going to do one letter, one number better than you. They just seem to have this really interesting dynamic. And it is Slootman, and no one's betting against him. I mean, this guy's fantastic. And he hasn't used his war chest yet. He's still sitting on all that money that he raised. To your point that Databricks, their timing just was a little off... >> 5 billion in capital. >> And Slootman hasn't used that money yet. So what's he going to do? What can he do when he turns that on, if he finds the right... >> They're making some acquisitions. They did the Streamlit acquisition. >> Fantastic. >> The problem with Databricks is their valuation is underwater. So their recruiting and their M&A is in the toilet. They cannot make the moves because they don't have the currency until they refactor the multiple and let this market settle. I'm really nervous that they have to refactor the valuation. >> Having said that, to your point, Erik, the lakehouse architecture is definitely gaining traction. When you talk to practitioners, they're all saying, yeah, we're building data lakes, we're building lakehouses. It's a much, much smaller market than the enterprise data warehouse. But nonetheless, when you talk to practitioners that are actually doing things like self-serve data, they're building data lakes, and Databricks is right there, and is a clear leader in ML and AI, and they're ahead of Snowflake, right? >> And I was going to say, that's the thing with Databricks. You're getting that analytics and ML built into it. >> You know what's ironic? I remember talking to Matt Carroll, who's CEO of Immuta, like four or five years ago. He came into the office in Marlborough.
And we were in temporary space and we were talking about how there's this new workload emerging, which combines AWS for cloud infrastructure, Snowflake for the simple data warehouse, and Databricks for the ML and AI. And now all of a sudden you see Databricks and Snowflake going at it. To your point about the competition between AWS and Snowflake, here's what I think. I think the Redshift team doesn't like Snowflake, right? But I think the EC2 team loves it. >> Loves it. Exactly. >> So I think Snowflake is driving a lot of... >> Yeah. To John's point, there is plenty to go around. And I think I saw just the other day somebody say less than 40% of true Global 2000 organizations believe that they're at real-time data analytics right now. They're not really there yet. Think about how much runway is left and how many tools you need to get to real-time streaming use cases. It's complex. It's not easy. >> It's going to be a product-value market to me. Snowflake and Databricks, they're not going away, right? They're winning architectures in the cloud. What Databricks did with Spark took over the Hadoop market, to your point. Now that big data market's got two players, in my opinion, Snowflake and Databricks, converging. Well, Redshift is sitting there behind the curtain. They're the wild card. >> Yeah. >> They're the wild card, Dave. >> Okay. I'm going to give one more wild card, which is the edge. And that's something that, when you talk about real-time analytics and AI inferencing at the edge, there aren't a lot of database companies in a position to do that. Amazon is trying to put Outposts out there. I think it runs RDS. I don't think it runs any other database. Snowflake really doesn't have a strong edge strategy when I'm talking the far edge, the tiny edge. >> I think that's going to be HPE, or Dell's going to own the outpost market. >> I think you're right. I'll come back to that.
Couchbase is an interesting company to watch with Capella. MongoDB really doesn't have a far edge strategy at this point, but Couchbase does. And that's one to watch. They're doing some really interesting things there. >> But they have to leapfrog Mongo, in my opinion. >> Yeah. But there's a new architecture emerging at the edge and it's going to take a number of years to develop, but it could eventually, from an economic standpoint, seep back into the enterprise. Arm-based, low end. Take a look at what Couchbase is doing. >> They hired an Amazon guard system. They have to leapfrog though. They can't incrementally... >> Who's they? >> Couchbase needs to make a big move and leapfrog. >> Well, I think they're trying to. That's what Capella is all about. It was not only their version of Atlas, bringing Couchbase to the cloud, but it's also stretching it out to the edge and bringing converged database analytics. >> Real quick on the numbers. Any data on Cloudflare? >> I've been sitting here trying to get the word Cloudflare out of my mouth the whole time you guys were talking. >> Is this another one that's innovated in the ecosystem? >> So, platform, it was really simple for them early on, right? They're going to get that edge network out there and they're going to steal share from Akamai. Then they started doing exactly what Akamai did: we're going to start rolling out some security. Their security is fantastic. Maybe some practitioners are saying it's a little bit too much, because they're not focused on one thing or another, but they are doing extremely well. And now they're out there in the cloud as well. >> They've got R2, they've got an S3 competitor. >> Exactly. So when I'm listening to you guys talk about Couchbase I'm like, wow, those two would just be an absolutely fantastic combination. >> You mean Cloudflare and Couchbase? >> Yeah. >> I mean, you've got an S3 alternative, right?
You've got a Mongo alternative, basically, in my opinion. >> And you've got the edge. >> And you've got the edge network with security. Interesting dynamic. This brings up the supercloud, Dave. I want to talk about supercloud because we're seeing a trend, and we've been reporting this since last year, that basically people don't have to spend the CapEx to be cloud scale. And you're seeing Amazon enable that, but Snowflake has become a supercloud. They're on AWS. Now they're on Azure. Why not TAM expansion, expand the market? Why not get that? And then it'll be on Google next, all these marketplaces. So the emergence of this supercloud, and then the ability to make that a substrate across multiple clouds, is a strategy we're seeing. What do you think? >> Well, honestly, I'm going to be really frank here. Everything I know about the supercloud I know from this guy. So I've been following his lead on this and I'm looking forward to you guys doing that conference and that summit coming up. From a data perspective, I think what you're saying is spot on though, because those are the areas we're seeing expansion in, without a doubt. >> I think when you talk about things like supercloud and you talk about things like metaverse, there's a there there. Look, every 15 or 20 years or so this industry reinvents itself and a new disruption comes out. You've got the internet, you've got the cloud, you've got an AI and VR layer, you've got machine intelligence, you've got gaming now. There's a new matrix emerging. Supercloud, metaverse, there's something happening out there that's not just your father's SaaS or IaaS or PaaS. >> Well, no, it's also the spend too, right? So if I'm a company like, say, Capital One or Goldman Sachs, my IT spend has traditionally been massive every year. It's basically tons of CapEx. Along comes the cloud, it's an operating expense. Wait a minute, Amazon has all the CapEx.
So I'm not going to dial down my budget. I want a competitive advantage. So next thing they know, they have a supercloud by default, because they just pivoted their IT spend into new capabilities that they then can sell to the market. In fintech, it makes total sense. >> Right. They're building out a digital platform. >> That was not possible pre-cloud. >> No, it wasn't, because you weren't going to go put all that money into capital expenditure to build that out, not knowing whether or not the market was there. But the scalability, the ability to spend, reduce and be flexible with it, really changes that paradigm entirely. >> So we're looking at this market now thinking about, okay, it might be greenfield in every vertical. It might have a power law where you have a head of the long tail. That's a player like a Capital One. In insurance, it could be Liberty Mutual or MassMutual, that has so much IT and capital that they're now going to scale it into a supercloud. >> And they have data. >> And they have the data tools. >> And the tools. And they're going to bring that to their constituents and scale it using cloud. >> So that means they can then service the entire vertical as a service provider. >> And the industry cloud is becoming bigger and bigger and bigger. I mean, that's really a way that people are delivering to market. >> Remember in the early days of cloud, all the banks thought they could build their own cloud? Well, actually it's come full circle. They're like, we can actually build a cloud on top of the cloud. >> Right. And by the way, they can have a private cloud in their supercloud. >> Exactly. >> And you know, it's interesting, because we're talking about financial services and insurance, all the people we know spend money. In our macro survey, do you know the sector that's spending the most right now? It's going to shock you: energy and utilities. >> Oh yeah.
I was going to say. >> The energy and utilities industry right now is the one spending the most money that I saw, largely because they're playing catch-up. But also because they don't have these types of things for their consumers. They need the consumer app. They need to be able to do that delivery. They need to be able to do metrics. And they're the ones spending right now. >> It's an arms race, but the vector shifts to value creation. >> It just goes back to your post, when was it, 2012, the trillion dollar baby? It's a multi-trillion dollar baby. >> My Jassy post on Forbes, headline "Trillion Dollar Baby," 2012. I should add, it's happening. >> Yeah, exactly. >> Trillions of babies. Erik, great to have you on theCUBE. >> Thank you so much, guys. >> Great to bring the data. Thanks for sharing. Check out ETR if you're into the enterprise and want to know what's going on. They have a unique approach, very accurate in their survey data. They've got a great market basket of data, questions, people and community. Check it out. Thanks for coming on and sharing with us. >> Thank you guys. Always enjoy it. >> We'll be back with more coverage here on theCUBE in New York City, live at Summit 22. I'm John Furrier with Dave Vellante. We'll be right back.

Published Date : Jul 12 2022



Data Power Panel V3

(upbeat music) >> The stampede to cloud and massive VC investments have led to the emergence of a new generation of object store based data lakes. And with them two important trends, actually three important trends. First, a new category that combines data lakes and data warehouses, aka the lakehouse, has emerged as a leading contender to be the data platform of the future. And this novelty touts the ability to address data engineering, data science, and data warehouse workloads on a single shared data platform. The other major trend we've seen is query engines and broader data fabric virtualization platforms have embraced NextGen data lakes as platforms for SQL-centric business intelligence workloads, reducing, or some would even claim eliminating, the need for separate data warehouses. Pretty bold. However, cloud data warehouses have added complementary technologies to bridge the gaps with lakehouses. And the third is that many, if not most, customers that are embracing the so-called data fabric or data mesh architectures are looking at data lakes as a fundamental component of their strategies, and they're trying to evolve them to be more capable, hence the interest in lakehouse, but at the same time, they don't want to, or can't, abandon their data warehouse estate. As such, we see a battle royale brewing between cloud data warehouses and cloud lakehouses. Is it possible to do it all with one cloud-centric analytical data platform? Well, we're going to find out. My name is Dave Vellante and welcome to the data platforms power panel on theCUBE, our next episode in a series where we gather some of the industry's top analysts to talk about one of our favorite topics, data. In today's session, we'll discuss trends, emerging options, and the trade-offs of various approaches, and we'll name names. Joining us today are Sanjeev Mohan, who's the principal at SanjMo, Tony Baer, principal at dbInsight.
And Doug Henschen is the vice president and principal analyst at Constellation Research. Guys, welcome back to theCUBE. Great to see you again. >> Thanks, guys. Thank you. >> Thank you. >> So it's early June and we're gearing up for two major conferences. There are several database conferences, but two in particular that we're very interested in, Snowflake Summit and Databricks Data and AI Summit. Doug, let's start off with you, and then Tony and Sanjeev, if you could kindly weigh in. Where did this all start, Doug? The notion of lakehouse. And let's talk about what exactly we mean by lakehouse. Go ahead. >> Yeah, well you nailed it in your intro. One platform to address BI, data science, data engineering. Fewer platforms, less cost, less complexity, very compelling. You can credit Databricks for coining the term lakehouse back in 2020, but it's really a much older idea. You can go back to Cloudera introducing their Impala database in 2012. That was a database on top of Hadoop. And indeed, by the middle of that last decade, there were several SQL on Hadoop products, open standards like Apache Drill. And at the same time, the database vendors were trying to respond to this interest in machine learning and data science. So they were adding SQL extensions. The likes of Hudi and Vertica were adding SQL extensions to support the data science. But then later in that decade, with the shift to cloud and object storage, you saw the vendors shift to this whole cloud and object storage idea. So you have, in the database camp, Snowflake introducing Snowpark to try to address the data science needs. They introduced that in 2020 and last year they announced support for Python. You also had Oracle and SAP jump on this lakehouse idea last year, supporting both the lake and the warehouse. Single vendor, not necessarily quite single platform. Google very recently also jumped on the bandwagon.
And then you also mentioned the SQL engine camp, the Dremios, the Ahanas, the Starbursts, really doing two things: a fabric for distributed access to many data sources, but also very firmly planting that idea that you can just have the lake and we'll help you do the BI workloads on that. And then of course, the data lake camp, with the Databricks and Clouderas providing warehouse-style deployments on top of their lake platforms. >> Okay, thanks, Doug. I'd be remiss, those of you who know me know that I typically write my own intros. This time my colleagues fed me a lot of that material. So thank you. You guys make it easy. But Tony, give us your thoughts on this intro. >> Right. Well, I very much agree with both of you, which may not make for the most exciting television, in that it has been an evolution, just like Doug said. I mean, just to give an example, when Teradata bought Aster Data, it was initially seen as a hardware platform play. In the end, it was all those Aster functions that made a lot of big data analytics accessible to SQL. (clears throat) And so what I really see, in a simpler or functional definition, is that the data lakehouse is really an attempt by the data lake folks to make the data lake friendlier territory to the SQL folks, and also friendlier territory to all the data stewards, who are concerned about the sprawl and the lack of control and governance in the data lake. So it's really a continuation of an ongoing trend. That being said, there's no action without counteraction. And of course, at the other end of the spectrum, we also see a lot of the data warehouses starting to add things like in-database machine learning. So they're certainly not surrendering without a fight.
Again, as Doug was mentioning, this has been part of a continual blending of platforms that we've seen over the years, that we first saw in the Hadoop years with SQL on Hadoop and data warehouses starting to reach out to cloud storage, or I should say HDFS, and then with the cloud, going cloud native and therefore trying to break the silos down even further. >> Now, thank you. And Sanjeev, data lakes, when we first heard about them, it was such a compelling name, and then we realized all the problems associated with them. So pick it up from there. What would you add to Doug and Tony? >> I would say, these are excellent points that Doug and Tony have brought to light. The concept of lakehouse was going on, to your point, Dave, a long time ago, long before the term was invented. For example, Uber was trying to do a mix of Hadoop and Vertica because what they really needed were transactional capabilities that Hadoop did not have. So they weren't calling it the lakehouse, they were using multiple technologies, but now they're able to collapse it into a single data store that we call lakehouse. Data lakes are excellent at batch processing large volumes of data, but they don't have the real-time capabilities such as change data capture, doing inserts and updates. So this is why lakehouse has become so important, because they give us these transactional capabilities. >> Great. So I'm interested, the name is great, lakehouse. The concept is powerful, but I get concerned that there's a lot of marketing hype behind it. So I want to examine that a bit deeper. How mature is the concept of lakehouse? Are there practical examples that really exist in the real world that are driving business results for practitioners? Tony, maybe you could kick that off. >> Well, put it this way. I think what's interesting is that both data lakes and data warehouses have each had to extend themselves.
To believe the Databricks hype, this was just a natural extension of the data lake. In point of fact, Databricks had to go outside its core technology of Spark to make the lakehouse possible. And it's a very similar type of thing on the part of the data warehouse folks, in that they've had to go beyond SQL. In the case of Databricks, there have been a number of incremental improvements to Delta Lake, to make the table format more performant, for instance. But the other thing, I think the most dramatic change in all that, is in their SQL engine. They had to essentially pretty much abandon Spark SQL because, in and of itself, Spark SQL is essentially a stopgap solution. And if they wanted to really address that crowd, they had to totally reinvent SQL, or at least their SQL engine. And so Databricks SQL is not Spark SQL, it is not Spark. It's basically SQL that's adapted to run in a Spark environment, but the underlying engine is C++, it's not Scala or anything like that. So Databricks had to take a major detour outside of its core platform to do this. So to answer your question, this is not mature, because even though the idea of blending platforms has been going on for well over a decade, I would say that the current iteration is still fairly immature. And in the cloud, I could see a further evolution of this, because if you think through cloud native architecture, where you're essentially abstracting compute from data, there is no reason why, if you are dealing with, say, the same data targets, say cloud object storage, you might not apportion the task to different compute engines.
And so therefore, let's say you're Google, you could have BigQuery perform the types of analytics, the SQL analytics, that would be associated with the data warehouse, and you could have BigQuery ML that does some in-database machine learning, but at the same time, for another part of the query, which might involve, let's say, some deep learning, just for example, you might go out to the serverless Spark service or Dataproc. And there's no reason why Google could not blend all those into a coherent offering that's basically all triggered through microservices. And I just gave Google as an example, but you could generalize that to all the other cloud or all the other third party vendors. So I think we're still very early in the game in terms of maturity of data lakehouses. >> Thanks, Tony. So Sanjeev, is this all hype? What are your thoughts? >> It's not hype, but I completely agree. It's not mature yet. Lakehouses still have a lot of work to do. What I'm now starting to see is that the world is dividing into two camps. On one hand, there are people who don't want to deal with the operational aspects of vast amounts of data. They are the ones who are going for BigQuery, Redshift, Snowflake, Synapse, and so on, because they want the platform to handle all the data modeling, access control, performance enhancements, but there is a trade-off. If you go with these platforms, then you are giving up on vendor neutrality. On the other side are those who have engineering skills. They want the independence. In other words, they don't want vendor lock-in. They want to transform their data into any number of use cases, especially data science and machine learning use cases. What they want is agility via open file formats using any compute engine. So why do I say lakehouses are not mature? Well, cloud data warehouses provide you an excellent user experience. That is the main reason why Snowflake took off.
If you have thousands of tables, it takes minutes to get them started, uploaded into your warehouse and start experimentation. Table formats are far more resonating with the community than file formats. But once the cost of the cloud data warehouse goes up, then organizations start exploring lakehouses. But the problem is lakehouses still need to do a lot of work on metadata. Apache Hive was a fantastic first attempt at it. Even today Apache Hive is still very strong, but it's all technical metadata and it has so many different restrictions. That's why we see Databricks is investing into something called Unity Catalog. Hopefully we'll hear more about Unity Catalog at the end of the month. But there's a second problem I just want to mention, and that is lack of standards. All these open source vendors, they're running what I call ego projects. You see on LinkedIn, they're constantly battling with each other, but the end user doesn't care. The end user wants a problem to be solved. They want to use Trino, Dremio, Spark from EMR, Databricks, Ahana, Dask, Flink, Athena. But the problem is that we don't have common standards. >> Right. Thanks. So Doug, I worry sometimes. I mean, I look at the space, we've debated for years, best of breed versus the full suite. You see AWS with whatever, 12-plus different data stores and different APIs and primitives. You got Oracle putting everything into its database. It's actually done some interesting things with MySQL HeatWave, so maybe there's proof points there, but Snowflake really good at data warehouse, simplifying data warehouse. Databricks, really good at making lakehouses actually more functional. Can one platform do it all? >> Well, in a word, you can't be best of breed at all things. I think the upshot of, and cogent analysis from Sanjeev there, the database, the vendors coming out of the database tradition, they excel at the SQL. 
They're extending it into data science, but when it comes to unstructured data, data science, ML, AI, it's often a compromise. The data lake crowd, the Databricks and such, they've struggled to completely displace the data warehouse. When it really gets to the tough SLAs, they acknowledge that there's still a role for the warehouse. Maybe you can size down the warehouse and offload some of the BI workloads, and maybe some of these SQL engines are good for ad hoc, minimizing data movement. But really when you get to the deep service-level requirements, the high concurrency, the high query workloads, you end up creating something that's warehouse-like. >> Where do you guys think this market is headed? What's going to take hold? Which projects are going to fade away? You got some things in Apache projects like Hudi and Iceberg, where do they fit, Sanjeev? Do you have any thoughts on that? >> So thank you, Dave. So I feel that table formats are starting to mature. There is a lot of work that's being done. We will not have a single product or single platform. We'll have a mixture. So I see a lot of Apache Iceberg in the news. Apache Iceberg is really innovating. Their focus is on a table format, but then Delta and Apache Hudi are doing a lot of deep engineering work. For example, how do you handle high concurrency when there are multiple writes going on? Do you version your Parquet files or how do you do your upserts basically? So different focus. At the end of the day, the end user will decide what is the right platform, but we are going to have multiple formats living with us for a long time. >> Doug, is Iceberg, in your view, something that's going to address some of those gaps in standards that Sanjeev was talking about earlier? >> Yeah, Delta Lake, Hudi, Iceberg, they all address this need for consistency and scalability. Delta Lake is open technically, but open for access. I don't hear about Delta Lake in any world but Databricks; I'm hearing a lot of buzz about Apache Iceberg. 
End users want an open, performant standard. And most recently Google embraced Iceberg for its recent BigLake, their stab at supporting both lakes and warehouses on one conjoined platform. >> And Tony, of course, you remember the early days of the sort of big data movement. You had MapR, which was the most closed. You had Hortonworks, the most open. You had Cloudera in between. There was always this kind of contest as to who's the most open. Does that matter? Are we going to see a repeat of that here? >> I think it's spheres of influence, I think, and Doug very much was kind of referring to this. I would call it kind of like the MongoDB syndrome, which is that you have... and I'm talking about MongoDB before they changed their license, an open source project, but very much associated with MongoDB, which basically pretty much controlled most of the contributions and made the decisions. And I think Databricks has the same ironclad hold on Delta Lake, but still the market pretty much associates Delta Lake as the Databricks open source project. I mean, Iceberg is probably further advanced than Hudi in terms of mind share. And so what I see that breaking down to is essentially, basically the Databricks open source versus the everything else open source, the community open source. So I see it's a very similar type of breakdown that I see repeating itself here. >> So by the way, Mongo has a conference next week, another data platform that's kind of not really relevant to this discussion totally. But in a sense it is because there's a lot of discussion on earnings calls these last couple of weeks about consumption and who's exposed; obviously people are concerned about Snowflake's consumption model. Mongo is maybe less exposed because Atlas is prominent in the portfolio, blah, blah, blah. But I wanted to bring up the little bit of controversy that we saw come out of the Snowflake earnings call, where the Evercore analyst asked Frank Slootman about discretionary spend. 
And Frank basically said, look, we're not discretionary. We are deeply operationalized. Whereas he kind of poo-pooed the lakehouse or the data lake, et cetera, saying, oh yeah, data scientists will pull files out and play with them. That's really not our business. Do any of you have comments on that? Help us swing through that controversy. Who wants to take that one? >> Let's put it this way. The SQL folks are from Venus and the data scientists are from Mars. So I mean, it really comes down to that type of perception. The fact is that, traditionally with analytics, it was very SQL oriented and basically the quants were kind of off in their corner, where they're using SAS or where they're using Teradata. It's really a great leveler today, which is that, I mean, basically Python has become arguably one of the most popular programming languages, depending on what month you're looking at the TIOBE index. And of course, obviously SQL is, as I tell the MongoDB folks, SQL is not going away. You have a large skills base out there. And so basically I see this breaking down to essentially, you're going to have each group that's going to have its own natural preferences for its home turf. And the fact that basically, let's say the Python and Scala folks are using Databricks does not make them any less operational or mission critical than the SQL folks. >> Anybody else want to chime in on that one? >> Yeah, I totally agree with that. Python support in Snowflake is very nascent with all of Snowpark, all of the things outside of SQL; they're very much relying on partners to make things possible and make data science possible. And it's very early days. I think the bottom line, what we're going to see is each of these camps is going to keep working on doing better at the thing that they don't do today, or they're new to, but they're not going to nail it. They're not going to be best of breed on both sides. 
So the SQL-centric companies and shops are going to do more data science on their database-centric platform. The data science-driven companies might be doing more BI on their lakes with those vendors, and the companies that have highly distributed data, they're going to add fabrics, and maybe offload more of their BI onto those engines, like Dremio and Starburst. >> So I've asked you this before, but I'll ask you, Sanjeev. 'Cause Snowflake and Databricks are such great examples 'cause you have the data engineering crowd trying to go into data warehousing and you have the data warehousing guys trying to go into the lake territory. Snowflake has $5 billion on the balance sheet and I've asked you before, I ask you again, doesn't there have to be a semantic layer between these two worlds? Does Snowflake go out and do M&A and maybe buy an AtScale or a Datameer? Or is that just sort of a band-aid? What are your thoughts on that, Sanjeev? >> I think semantic layer is the metadata. The business metadata is extremely important. At the end of the day, the business folks, they'd rather go to the business metadata than have to figure out, for example, like let's say, I want to update somebody's email address and we have a lot of overhead with data residency laws and all that. I want my platform to give me the business metadata so I can write my business logic without having to worry about which database, which location. So having that semantic layer is extremely important. In fact, now we are taking it to the next level. Now we are saying that it's not just a semantic layer, it's all my KPIs, all my calculations. So how can I make those calculations independent of the compute engine, independent of the BI tool and make them fungible? So more disaggregation of the stack, but it gives us more best of breed products that the customers have to worry about. >> So I want to ask you about the stack, the modern data stack, if you will. 
And we always talk about injecting machine intelligence, AI into applications, making them more data driven. But when you look at the application development stack, it's separate; the database tends to be separate from the data and analytics stack. Do those two worlds have to come together in the modern data world? And what does that look like organizationally? >> So organizationally, even technically, I think it is starting to happen. Microservices architecture was a first attempt to bring the application and the data world together, but they are fundamentally different things. For example, if an application crashes, that's horrible, but Kubernetes will self-heal and it'll bring the application back up. But if a database crashes and corrupts your data, we have a huge problem. So that's why they have traditionally been two different stacks. They are starting to come together, especially with DataOps, for instance, versioning of the way we write business logic. It used to be that business logic was highly embedded into our database of choice, but now we are disaggregating that using GitHub, CI/CD, the whole DevOps tool chain. So data is catching up to the way applications are. >> We also have databases, the trans-analytical databases; that's a little bit of what the story is with MongoDB next week with adding more analytical capabilities. But I think companies that talk about that are always careful to couch it as operational analytics, not the warehouse-level workloads. So we're making progress, but I think there's always going to be, or there will long be, a separate analytical data platform. >> Until data mesh takes over. (all laughing) Not opening a can of worms. >> Well, but wait, I know it's out of scope here, but wouldn't data mesh say, hey, do take your best of breed, to Doug's earlier point. 
You can't be best of breed at everything; wouldn't data mesh advocate, data lakes, do your data lake thing, data warehouse, do your data warehouse thing, then you're just a node on the mesh. (Tony laughs) Now you need separate data stores and you need separate teams. >> To my point. >> I think, I mean, put it this way. (laughs) Data mesh itself is a logical view of the world. The data mesh is not necessarily on the lake or on the warehouse. I think for me, the fear there is more in terms of the silos of governance that could happen and the siloed views of the world, how we redefine. And that's why I want to go back to something that Sanjeev said, which is that it's going to be raising the importance of the semantic layer. Now, that opens a couple of Pandora's boxes here, which is one, does Snowflake dare go into that space or do they risk basically alienating their partner ecosystem, which is a key part of their whole appeal, which is best of breed. They're kind of in the same situation that Informatica was in the early 2000s, when Informatica briefly flirted with analytic applications and realized that was not a good idea, and needed to double down on their core, which was data integration. The other thing though, that raises the importance of, and this is where the best of breed comes in, is the data fabric. My contention is that, whether you employ data mesh practice or not, if you do employ data mesh, you need data fabric. If you deploy data fabric, you don't necessarily need to practice data mesh. 
But data fabric at its core, and admittedly it's a category that's still very poorly defined and evolving, but at its core, we're talking about a common metadata backplane, something that we used to talk about with master data management. This would be something that would be more, what I would say, basically mutable, that would be more evolving, basically using, let's say, machine learning so that we don't have to predefine rules or predefine what the world looks like. But so I think in the long run, what this really means is that whichever way we implement, on whichever physical platform we implement, we need to all be speaking the same metadata language. And I think at the end of the day, regardless of whether it's a lake, warehouse or a lakehouse, we need common metadata. >> Doug, can I come back to something you pointed out? Those talking about bringing analytic and transaction databases together, you had talked about operationalizing those and the caution there. Educate me on MySQL HeatWave. I was surprised when Oracle put so much effort into that, and you may or may not be familiar with it, but a lot of folks have talked about that. Now it's gotten nowhere in the market, it has no market share, but we've seen a lot of these benchmarks from Oracle. How real is that, bringing together those two worlds and eliminating ETL? >> Yeah, I have to defer on that one. That's my colleague, Holger Mueller. He wrote the report on that. He's way deep on it and I'm not going to mock him. >> I wonder if that is something, how real that is or if it's just Oracle marketing, anybody have any thoughts on that? >> I'm pretty familiar with HeatWave. It's essentially Oracle doing what, I mean, there's kind of a parallel with what Google's doing with AlloyDB. It's an operational database that will have some embedded analytics. And it's also something which I expect to start seeing with MongoDB. 
And I think basically, Doug and Sanjeev were kind of referring to this before, about basically kind of like the operational analytics that are basically embedded within an operational database. The idea here is that the last thing you want to do with an operational database is slow it down. So you're not going to be doing very complex deep learning or anything like that, but you might be doing things like classification, you might be doing some predictives. In other words, we've just concluded a transaction with this customer, but was it less than what we were expecting? What does that mean in terms of, is this customer likely to churn? I think we're going to be seeing a lot of that. And I think that's a lot of what MySQL HeatWave is all about. Whether Oracle has any presence in the market, now, it's still a pretty new announcement, but the other thing that kind of goes against Oracle, (laughs) that they had to battle against, is that even though they own MySQL and run the open source project, in terms of the actual commercial implementation it's associated with everybody else. And the popular perception has been that MySQL has been basically kind of like a sidelight for Oracle. And so it's on Oracle's shoulders to prove that they're damn serious about it. >> There's no coincidence that MariaDB was launched the day that Oracle acquired Sun. Sanjeev, I wonder if we could come back to a topic that we discussed earlier, which is this notion of consumption; obviously Wall Street's very concerned about it. Snowflake dropped prices last week. I've always felt like, hey, the consumption model is the right model. I can dial it down when I need to; of course, the street freaks out. What are your thoughts on just pricing, the consumption model? What's the right model for companies, for customers? >> Consumption model is here to stay. 
What I would like to see, and I think is an ideal situation and actually plays into the lakehouse concept, is that I have my data in some open format, maybe it's Parquet or CSV or JSON, Avro, and I can bring whatever engine is the best engine for my workloads, bring it on, pay for consumption, and then shut it down. And by the way, that could be Cloudera. We don't talk about Cloudera very much, but it could be one business unit wants to use Athena. Another business unit wants to use some other, Trino let's say, or Dremio. So every business unit is working on the same data set, see, that's critical, but that data set is maybe in their VPC and they bring any compute engine, you pay for the use, shut it down. Then you're getting value and you're only paying for consumption. It's not like, I left a cluster running by mistake, so there have to be guardrails. The reason FinOps is so big is because it's very easy for me to run a Cartesian join in the cloud and get a $10,000 bill. >> This looks like it's been sort of a victim of its own success in some ways; they made it so easy to spin up single node instances, multi-node instances. And back in the day when compute was scarce and costly, those database engines optimized every last bit so they could get as much workload as possible out of every instance. Today, it's really easy to spin up a new node, a new multi-node cluster. So that freedom has meant many more nodes that aren't necessarily getting that utilization. So Snowflake has been doing a lot to add reporting, monitoring, dashboards around the utilization of all the nodes and multi-node instances that have spun up. And meanwhile, we're seeing some of the traditional on-prem databases that are moving into the cloud, trying to offer that freedom. And I think they're going to have that same discovery, that the cost surprises are going to follow as they make it easy to spin up new instances. 
>> Yeah, a lot of money went into this market over the last decade, separating compute from storage, moving to the cloud. I'm glad you mentioned Cloudera, Sanjeev, 'cause they got it all started, the kind of big data movement. We don't talk about them that much. Sometimes I wonder if it's because when they merged Hortonworks and Cloudera, they dead-ended both platforms, but then they did invest in a more modern platform. But what's the future of Cloudera? What are you seeing out there? >> Cloudera has a good product. I have to say the problem in our space is that there are way too many companies, there's way too much noise. We are expecting the end users to parse it out or we're expecting analyst firms to boil it down. So I think marketing becomes a big problem. As far as technology is concerned, I think Cloudera did turn themselves around, and Tony, I know you talk to them quite frequently. I think they have had quite a comprehensive offering for a long time actually. They've created Kudu, so they got operational, they have Hadoop, they have an operational data warehouse, they've migrated to the cloud. They are in hybrid multi-cloud environments. A lot of cloud data warehouses are not hybrid. They're only in the cloud. >> Right. I think what Cloudera has done most successfully has been in the transition to the cloud and the fact that they're giving their customers more on-ramps to it, more hybrid on-ramps. So I give them a lot of credit there. They've also been trying to position themselves as being the most price friendly, in terms of, we will put more guardrails and governors on it. I mean, part of that could be spin. But on the other hand, they don't have the same vested interest in compute cycles as, say, AWS would have with EMR. That being said, yes, Cloudera, I think its most powerful appeal, and it almost sounds in a way, I don't want to cast them as a legacy system. 
But the fact is they do have a huge landed legacy on-prem and still significant potential to land and expand that to the cloud. That being said, even though Cloudera is multifunction, I think it certainly has its strengths and weaknesses. And the fact is that yes, Cloudera has an operational database or an operational data store, kind of like the outgrowth of HBase, but Cloudera is still primarily known for the deep analytics. The operational database, nobody's going to buy Cloudera or Cloudera Data Platform strictly for the operational database. They may use it as an add-on, just in the same way that a lot of customers have used, let's say, Teradata basically to do some machine learning or, let's say, Snowflake to parse through JSON. Again, it's not an indictment or anything like that, but the fact is obviously they do have their strengths and their weaknesses. I think their greatest opportunity is with their existing base because that base has a lot invested and vested. And the fact is they do have a hybrid path that a lot of the others lack. >> And of course being on the quarterly shot clock was not a good place to be, under the microscope, for Cloudera, and now they at least can refactor the business accordingly. I'm glad you mentioned hybrid too. We saw Snowflake last month did a deal with Dell whereby non-native Snowflake data could access on-prem object store from Dell. They announced a similar thing with Pure Storage. What do you guys make of that? Is that just... How significant will that be? Will customers actually do that? I think they're using either materialized views or external tables. >> There are data rated and residency requirements. There are desires to have these platforms in your own data center. 
And finally they capitulated. I mean, Frank Slootman is famous for saying to be very focused, and earlier, not many months ago, they called going on-prem a distraction, but clearly there's enough demand, and certainly government contracts, any company that has data residency requirements, it's a real need. So they finally addressed it. >> Yeah, I'll bet dollars to donuts, there was an EBC session and some big customer said, if you don't do this, we ain't doing business with you. And that was like, okay, we'll do it. >> So Dave, I have to say, earlier on you had brought up this point, how Frank Slootman was poo-pooing data science workloads. On your show, about a year or so ago, he said, we are never going to go on-prem. He burnt that bridge. (Tony laughs) That was on your show. >> I remember exactly the statement because it was interesting. He said, we're never going to do the halfway house. And I think what he meant is we're not going to bring the Snowflake architecture to run on-prem because it defeats the elasticity of the cloud. So this was kind of a capitulation in a way. But I think it still preserves his original intent sort of, I don't know. >> The point here is that every vendor will poo-poo whatever they don't have until they do have it. >> Yes. >> And then it'd be like, oh, we are all in, we've always been doing this. We have always supported this and now we are doing it better than others. >> Look, it was the same type of shock wave that we felt basically when AWS at the last moment at one of their re:Invents said, oh, by the way, we're going to introduce Outposts. And the analyst group is typically pre-briefed about a week or two ahead under NDA and that was not part of it. And when they dropped, they just casually dropped that in the analyst session. It's like, you could have heard the sound of lots of analysts changing their diapers at that point. >> (laughs) I remember that. 
And props to Andy Jassy who once, many times actually, told us, never say never when it comes to AWS. So guys, I know we got to run. We got some hard stops. Maybe you could each give us your final thoughts, Doug, start us off and then-- >> Sure. Well, we've got the Snowflake Summit coming up. I'll be looking for customers that are really doing data science, that are really employing Python through Snowflake, through Snowpark. And then a couple weeks later, we've got Databricks with their Data and AI Summit in San Francisco. I'll be looking for customers that are really doing considerable BI workloads. Last year I did a market overview of this analytical data platform space, 14 vendors, eight of them claim to support lakehouse, both sides of the camp. Databricks, their top customer that they could cite was unnamed. It had 32 concurrent users doing 15,000 queries per hour. That's good but it's not up to the most demanding BI SQL workloads. And they acknowledged that and said they need to keep working on that. Snowflake, asked for their biggest data science customer, cited Kabura, 400 terabytes, 8,500 users, 400,000 data engineering jobs per day. I took the data engineering jobs to be probably SQL-centric, ETL-style transformation work. So I want to see the real use of Python, how much Snowpark has grown as a way to support data science. >> Great. Tony. >> Actually of all things. And certainly, I'll also be looking for similar things to what Doug is saying, but I think sort of like, kind of out of left field, I'm interested to see what MongoDB is going to start to say about operational analytics, 'cause I mean, they're into this conquer the world strategy. We can be all things to all people. Okay, if that's the case, what's going to be the case with basically putting in some inline analytics, what are you going to be doing with your query engine? So that's actually kind of an interesting thing we're looking for next week. >> Great. Sanjeev. 
>> So I'll be at MongoDB World, Snowflake and Databricks and very interested in seeing, but since Tony brought up MongoDB, I see that even the databases are shifting tremendously. They are addressing both sides of the HTAP use case, online transactional and analytical. I'm also seeing that these databases started, let's say in the case of MySQL HeatWave, as relational or in MongoDB as document, but now they've added graph, they've added time series, they've added geospatial and they just keep adding more and more data structures and really making these databases multifunctional. So very interesting. >> It gets back to our discussion of best of breed versus all in one. And it's likely Mongo's path or part of their strategy, of course, is through developers. They're very developer focused. So we'll be looking for that. And guys, I'll be there as well. I'm hoping that we maybe have some extra time on theCUBE, so please stop by and we can maybe chat a little bit. Guys, as always, fantastic. Thank you so much, Doug, Tony, Sanjeev, and let's do this again. >> It's been a pleasure. >> All right and thank you for watching. This is Dave Vellante for theCUBE and the excellent analysts. We'll see you next time. (upbeat music)

Published Date : Jun 2 2022



AWS Heroes Panel | AWS Startup Showcase S2 E2 | Data as Code

>>Hi, everyone. Welcome to theCUBE's presentation of the AWS Startup Showcase. The theme of this episode is data as code, and this is season two, episode two of the ongoing series covering exciting startups from the ecosystem in cloud and the future of data analytics. I'm your host, John Furrier. We've got a great featured panel here with AWS Heroes: Lynn Langit, the CEO of Lynn Langit Consulting, Peter Hanssens, founder of Cloud Shuttle, and Alex DeBrie, principal of DeBrie Advisory. Great to see all of you here and, uh, remotely and look forward to seeing you in person at the next re:Invent or other event. >>Thanks for having us. >>So Lynn, you're doing a lot of work in healthcare, Peter, you're in the middle of all the action with data as code, Alex, you're in deep on the databases. We've got a good roundup of, of topics here ranging from healthcare to getting under the hood on databases. So Alex, we'll start with you. What are you working on right now? What trends do you see in the database space? >>Yeah, sure. So I do, uh, I do a lot of consulting work working with different people and, you know, often with, with DynamoDB or, or just general serverless technology type stuff. Um, if you want to talk about trends that I'm seeing right now, I would say the trend I'm seeing is a lot more serverless native databases or cloud native databases where you're seeing these cool databases come out that really take advantage of, uh, this new cloud environment, right? Where you have scalability, you have the elasticity of the cloud. So you're not having, you know, instance-based environments anymore. You're paying for capacity, you're paying for throughput. You're able to scale up and down. You're not managing individual instances. 
So there's a lot of cool stuff that we're seeing with this new generation of infrastructure, and in particular databases taking advantage of this new cloud world. >>And you're really deep into the database side in terms of cloud-native impact. Diversity of database types and when to use certain databases, that's also a big deal. >>Yeah, absolutely. I totally agree. I love seeing the different types of databases, and AWS has this whole purpose-built database strategy. I think that makes a lot of sense. I wouldn't go too far with it. I would think more about purpose-built categories: specialize in an OLTP database within your organization, whether that's DynamoDB or DocumentDB or a relational database like Aurora. But then also choose some sort of analytics database, whether it's Druid or Redshift or Athena. And then if you have some specialized needs, say you want to show some real-time stuff to your users, check out Rockset. If you want to do some graph analytics or fraud detection, check out TigerGraph. A lot of cool stuff that we're seeing from the Startup Showcase here. >>Looking forward to unpacking that. Lynn, you've been in a lot of healthcare action with cloud ops, and the pandemic pushed hard on everybody. What are you working on? >>Yeah, it's all COVID data all the time. Before the pandemic, I was supporting research groups for cancer genomics, which I still do, but what's impactful is the explosive data volumes. There's big data and then there's genomic data. I've worked with clients that have broken data centers, broken public cloud provider data centers, because of the daily volume they're putting in. So there's this volume aspect. And then there's collaboration, particularly around COVID research because of the pandemic.
And so you have this explosive volume, and you have this need for computational complexity. And that means cloud. The challenge is, you know, put the pedal to the metal. So you've got all these bioinformatics researchers that are used to a single machine. Suddenly they have to deal with distributed compute. So it's a wild time to be in this space. >>What's the big change that you've seen with the pandemic, in cloud genomics specifically? >>The amount of data that is being put into the public cloud. Previously, people would have their data on their local capacity, and then they would publish their paper, and the data may or may not become available for reproducing the research. To accelerate drug discovery and even variant identification, the data sets are being pushed to public cloud repositories, which brings a whole new set of concerns. You're not only dealing with the volume and cost, but also security. Federated security is non-trivial and not well understood by this domain. So there's so much work available here. >>Awesome. Peter, you're doing a lot with the data-as-a-platform kind of view and platform engineering. Data as code is something that's being kicked around. What are you working on, and how does platform engineering change as data becomes so much more prevalent in its value proposition? >>Yeah. So I'm the founder of Cloud Shuttle, and we built this consultancy all around the challenges that a lot of companies have with getting their data sorted, getting it organized, and getting it ready for other use cases, such as analytics, machine learning, AI workloads, and the like.
So typically a platform engineering team will look after the organization of a company's infrastructure, making sure that it's coherent across the company. A data platform engineering team is doing something similar, in the sense that they're making sure that data teams have a solid foundation to build upon and that everything's quite predictable. What that enables is faster velocity and the ability to use data as code as a way of specifying and onboarding data, translating it, transforming it out into its specific domains, and then on to data products. >>I have to ask you while you're here. There's a big trend around data meshes right now. We've had a lot of coverage on theCUBE. What are the practical ways that people are using data mesh? First of all, is it relevant, and how are people looking at this data mesh conversation? >>I think it becomes more and more relevant the bigger the organization that you're dealing with. Oftentimes in the enterprise, you've got projects with timelines of five to 10 years, often outlasting technology life cycles. The technology that you're building on is probably irrelevant by the time that you complete it. And what we're seeing is that data engineering teams, and data teams more broadly, are this organizational bottleneck. Data mesh is all about breaking down that bottleneck and decentralizing the work, shifting that work back onto development teams, who oftentimes have more of the context than a centralized data engineering team. And we're seeing a lot of velocity increases as a result of that. >>It's interesting. There are so many different aspects of how data is changing the world. Lynn talks about the volume with the cloud and genomics. We're hearing about data engineering at a platform level. You're talking about slicing and dicing and real-time information. You mentioned Rockset, Alex.
So I'd like to ask each of you to answer this next question, which is: how have team dynamics changed with data engineering? Because every single company's impacted. So if you're a researcher, Lynn, and you're pumping more data into the cloud, that's got a little bit of data engineering to it. Do they even understand that? Is that impacting them? So how has data changed the responsibilities or roles in this new emerging area of data engineering, or whatever you want to call it? Lynn, we'll start with you. How do you see this impact? >>Well, DevOps becomes DataOps and MLOps, and this is a whole emergent area of work. It starts with an understanding of container technologies, which in verticals like FinTech is a given, right? But in bioinformatics, building an appropriately optimized Docker container is something I'm still working on with customers now, because they have the concept that a Docker container is just a virtual machine, which obviously it isn't, or shouldn't be. So you have, again, as I mentioned previously, this humongous skill gap. Concepts like D, which are prevalent in ad tech and FinTech, are not available yet for most of my customers. So those are the things that I'm building. The whole ops space is a wide-open area, and really it's a question of practicality. I have a lot of experience with data lakes, and with containerizing and using the data lake platform. But a lot of my customers are going to move to interim PaaS-based solutions. If they're using Spark, for example, they might choose to use a managed Spark solution as an interim step up to the cloud before they build their own containers, because the amount of knowledge needed to do that effectively is non-trivial. >>Peter, you mentioned data lakes and onboarding data into lakehouse architectures, for instance, something that you're familiar with.
This is not obvious to some verticals, obvious to others. How do you see this data engineering impact from a personnel standpoint, and then ultimately in how things get built? >>Are you directing that to me? >>Peter. >>Yeah. So I think, first and foremost, the workload that data engineering teams are dealing with is ever increasing. Usually there's a 10x ratio of software engineers to data engineers within a business, and usually double the number of analysts to data engineers again. And so they're fighting an ever-increasing backlog of tasks to do and tickets to churn through. What we're seeing is that data engineering teams are becoming data platform engineering teams, where they're building capability instead of constantly spinning the hamster wheel, if you will. With that in mind, when onboarding data into a lakehouse architecture or a data lake, where data engineering teams are getting wins is in developing a very good baseline of structure: the categorization, the data tagging, whether this data belongs to a particular domain, whether it contains some PII data, for instance, and then the security aspects, and also the mechanisms with which to do the data transformations. >>Alex, on the database side, those are known personas in an enterprise, the database team, but now the scale is so big, and there's so much going on in databases. How does data engineering impact organizations from your standpoint? >>Yeah, absolutely. I think gone are the days where you have a single relational database that is serving operational queries for your users and can also serve analytics queries for your internal teams. It's now split up into those purpose-built databases, like we've said.
But now you've got two different teams managing it, and they're designing their data models for different things. So OLTP might have a more denormalized model, something that works for very fast operations and is optimized for that. But now you need to suck that data out and get it elsewhere so that your PM or your business analyst or whoever can crunch through some of it, and now it needs to be in a more normalized format. How do you bridge that gap? That's a tough one. I think you need to build empathy on each side for what the other side is doing, and build the tools to say, hey, OLTP team, this is going to help you if we know what users are actually doing, and if you can get the data into the right format, then we can analyze it on the back end. >>So I think building empathy across those teams is helpful. >>Lynn, I'd love to come back to you. You mentioned health, and informatics is coming back. But it's interesting, you know, I look at the database world and you look at the solutions that are out there. A lot of companies that build data solutions don't have a data problem. They're not swimming in a lot of data. But then you look at the field that you're working in right now, with genomics and health and quantum, and they're dealing with data all the time. So you have people who deal with a lot of data all the time are breaking through New Zealand. People who don't have that experience are now becoming data-full, right? So it's either a first-time problem, or they've always been swimming in a ton of data. So it's more of, what's the new playbook? And then, wow, I've never had to deal with a lot of data before. What's your take? >>It's interesting, because, you know, bioinformatics hires grad students.
So grad students use their R scripts with their file on their laptop. And so to get those folks to understand distributed, container-based computing is, like I said, a non-trivial problem. What's been really interesting with the money pouring into COVID research is that when I first started, some of the workflows would take literally 500 hours, and that was just okay. Coming out of FinTech, I was blown away. FinTech is like, could that please take a millisecond rather than a second? Right. And so what has now happened, which makes it even more fun to work in this domain, is that the research dollars have really gone up because of the pandemic. And so there's this blending of people like me, with more of a big data background, coming into bioinformatics and working side by side. >>So it's this interesting sort of translation, because you have the whole taxonomy of bioinformatics, with genomics and sequencers and all the weird file types that you get. And then you have the whole taxonomy of DevOps and DataOps, containers and Kubernetes and all that. And you're trying to get that into pipelines that can actually be efficient, given the constraints. Of course, on the tech side, we always want to make it super optimized. I had a customer where we got it down from 500 hours to minutes, but they wanted to stay with the PaaS solution because it was easier for them. Going from 500 hours to five hours was good enough, but the techies want to get it down to five minutes. >>We've seen this movie before with DevOps, edge and operations, and IoT: the convergence of cultures. Now you have data, and then old-school operations kind of coming up. So this kind of supports the thesis that data as code is the next infrastructure as code. What's the reaction there for you guys?
What do you think about that? What does data as code mean? If infrastructure as code was cloud and DevOps, what is data as code? What does that mean? >>I could take it if you like. I think data teams have long been this bottleneck within the organization, and there's this dark matter of untapped energy and potential waiting to be unleashed. Data teams, with the advent of open source projects like dbt, have been slowly embracing software development lifecycle practices, and they're really seeing a big, steep increase in their velocity. This is only going to increase and improve as we see data teams embrace data as code. I think the future is bright for data, so I'm very excited. >>Lynn, Peter, reaction? I mean, agility: data as code is a developer concept, the CI/CD pipeline. You mentioned it, new operational workflows coming into traditional operations. Reaction? >>Yeah. I mean, I think Peter's right on there. I'd say some of those tools we're seeing come in from software, like dbt, are basically giving you that infrastructure as code, but applied to the data realm. Also, there have been a few Git-for-data type things. Pachyderm, I believe, is one, and there are a few other ones where you bring that in. And you also see a lot of immutability concepts flowing into the data realm. So just seeing some of those software engineering concepts come over to the data world has been pretty interesting. >>Well, literally just versioning datasets and the identification of what's in a data set and what's not in a data set. Some of this is around ethical AI as well, which is a whole area that has come out of research groups, mostly AI research groups, but is being applied to medical data, and needs to be, obviously. So this metadata and versioning around data sets is really, I think, a very of-the-moment area.
>>Yeah, I think you guys are bringing up a really good direction that's happening in data. It's something that you've seen on the software side with open source and DevOps, and it's now coming to data: the supply chain challenges. We've been talking about it here on theCUBE and in this episode. We've seen the Ukraine war, and some open source malware hitting datasets. Is data secure? What is that going to look like? So you're starting to get into, what's the supply chain? Is it verified data sets? If data sets have to be managed, a whole other level of data supply chain comes up. What do you guys think about that? >>I'll jump in. Oh, sorry. I'll jump in again. I think that some of the compliance requirements around financial data are going to be applied to other types of data, probably health data. So immutability and reproducibility that are legally required. Also, some of the privacy requirements that originated in Europe with GDPR are going to be replicated for more and more types of data. And again, I'm always going to speak for health, but there are other types as well, coming out of personal devices and that kind of stuff. So I think this idea of data as code goes down to versioning and controlling. That's a real succinct way to say it. We didn't used to think about that. We just put it in our relational database and we were good to go. But versioning and controlling in the global ecosystem is where I'm focusing my efforts. >>It brings up a good question. If data is going to be part of the development process, it has to be addressable, which means horizontally scalable. That means it has to be accessible and open. How do you make that work and not foreclose it with a lot of restrictions?
>>I think the use of data catalogs and appropriate tagging and categorization. Everyone's heard of the term data swamp, and I think that came about because everyone saw, oh wow, S3, infinite storage, we can just throw whatever in there for as long as we want. And at times, with the proliferation of S3 buckets and the like, we've seen security perhaps not maintained as well as it could have been. I think that's where data platform engineering teams have really come to the fore, creating a governed set of buckets with Lake Formation on top. But that's also where we need to see a lot more work with appropriate tags and the automatic publishing of metadata into data catalogs, so that folks can easily search and address particular data sets and also control the access. For instance, if you've got some PII data, perhaps only your marketing folks should be looking at email addresses and the like, not your finance folks. So I think there's a lot to be leveraged there in Lake Formation and other solutions. >>Alex, let's back up and talk about what's in it for the customer, right? Let's zoom back. The reality is, I've just got to make sure my data is secure, always on, and not going to be hackable. And I've just got to get my data available and deliver performance. So then I've got to start thinking about, okay, how do I intersect it? So what should teams be thinking about right now as they look at all their data options or databases across their enterprise? >>Yeah, it's a good question. I think Peter made some good points there, and you can think of history as ebbing and flowing between centralization and decentralization a lot of times.
When storage was expensive, data was going to be centralized and maintained by the people that are in charge of it. But then when S3 comes along, it really decreases storage costs. Now we can do a lot more experiments on it. We can store a lot more of our data, keep it around, and do different things with it. Now we've got regulations again, so we've got to be more realistic about keeping that data secure and making sure we're doing the right things with it. So we're probably going to go through a period of centralization as we work out some of this tooling around tagging and ethical AI that both Peter and Lynn were talking about here, and maybe that gets us into that next world of decentralization again. But I think that ebb and flow is going to be natural in response to the problems of the other extreme. >>Where are we in the market right now from a progress standpoint? Because data lakes don't want to be data swamps. You're seeing Lake Formation as a data architecture, as an example. Where are we with customers? What are they doing right now? Where would you put them on the progress bar of evolution towards the nirvana of having this data sovereignty and this data-as-code environment? Are they just now in the data lake, storing everything, real-time and historical? >>Well, I can jump in there. SQL on files is the driver. When Amazon got Athena, that really drove a lot of the customers to realistically look at data lake technologies. But data warehouses are not going away, and the integration between the two is not seamless. Now, we are partners with AWS, but we don't work for them, so we can tell you the truth here.
There's work to it, but for my customers it really upped the ante around data lakes, because Athena and technologies like that, the serverless SQL queries and the familiar query libraries, really drove a movement away from either OLTP or OLAP, more expensive, more cumbersome structures. >>But they still need that OLTP. If they have high-latency issues, they want to be low latency. Can they have the best of both worlds? That's the question. >>I would say we're getting closer. The technology is always going to be moving forward, and then we'll just move the goalposts again in terms of what we're asking from it. But I think with the technology that's out there, you can do really well. I work in the DynamoDB world, so you can get really great low latency, single-digit-millisecond OLTP response times, on that. I think some of the analytics stuff has been a problem with that. There are different solutions out there where you can export DynamoDB to S3 and then do SQL on your files with Athena, like Lynn's talking about. Or now you see Rockset, a partner here, that'll just ingest your DynamoDB data and pick up all those changes. So if you're making a lot of changes to your data in DynamoDB, it's going to be reflected in Rockset, and then you can do analytics queries, complex filters, different things like that. So I think we continue to push the envelope, and then we move the goalposts again. But I think we're in a lot better place than we were a few years ago, for sure. >>Where do you guys see this going relative to the next level, if data as code becomes that next agile, software-defined environment with open source?
With all of these new tools, with serverless things happening, with data lakes built in with nice architectures alongside data warehouses, where does it go next? What happens next? If this becomes an agile environment, what's the impact? >>Well, I don't want to be so dominant, but I feel strongly, so I'm going to jump in here. Now, for my most computationally intensive workloads, I'm using GPUs. I'm bursting to GPU for TensorFlow neural networks. So I've been doing quite a bit of exploration around Amazon Braket for QPUs, and it's early. It's a specialty. It's not for everybody, and the learning curve, again, is pretty daunting, but there are some use cases out there. I got ahold of a paper where some people used a QCNN, a quantum convolutional neural network, for lung cancer images from COVID patients, and the QPU algorithm pipeline performed more accurately and faster. So I think bursting to quantum is something to pay attention to. >>Awesome. Peter, what's your take on what's next? >>Well, that was absolutely fascinating from Lynn, but I think there's also some more low-level, low-hanging fruit available in the data stack. There are still a lot of challenges around transformation, getting our data from raw landed data into business domains, and that speaks to a lot of what data mesh is all about. I think we need to somehow make that a little more frictionless, because that's really where the labor-intensive work is. That's kind of dominating data engineering teams, and it's where we're trying to push that workload back onto software engineering teams. >>Alex, we'll give you the final word. What's the impact? What's the next step? What's it look like in the future?
>>Yeah, for sure. I mean, I've never had the breaking-a-data-center problem that Lynn's had, or the bursting-to-quantum problem, for sure. But if you're in the pool I swim in, of terabytes of data and below, I think it's a good time. Just like we were talking about with DevOps, allowing software engineers to handle more of the operations stuff, I think the same thing can happen with data, where software engineering teams can handle not just their code, not just deploying and operating it, but also thinking about their data around the code. That doesn't mean you won't have people assist you within your organization or that you won't have some specialists in there, but I think pushing more stuff onto the individual development teams, where they have ownership of it and they're thinking about it through the whole life cycle, I'm pretty bullish on that. And I think that's an exciting development. >>With that shift left, what's left is security. What does that mean to... >>We've shifted so much stuff left that now the things that were at the end are back at the end again. But at least we can think about that stuff early in the process, which is good. >>Great conversation. Very provocative, very realistic, and great impact on the future. Data as code is real. The developers, I do believe, will have a great operational role, and the data stack concept and its impact on things like quantum, it's all kind of lining up nicely. And it's a great opportunity to be in this field from a science and policy standpoint. Data engineering is legit. It's going to continue to grow, and thanks for unpacking that here on theCUBE. Appreciate it. Okay. Great panel, the AWS Heroes. They work with AWS and the ecosystem independently out there.
They're in the trenches on the front lines, cracking the code here with data as code. This is season two, episode two of the ongoing series of the AWS Startup Showcase. I'm John Furrier, your host. Thanks for watching.

Published Date : Apr 5 2022


ENTITIES

Entity	Category	Confidence
Lynn	PERSON	0.99+
Peter	PERSON	0.99+
Amazon	ORGANIZATION	0.99+
AWS	ORGANIZATION	0.99+
Europe	LOCATION	0.99+
New Zealand	LOCATION	0.99+
Peter Hanson	PERSON	0.99+
five hours	QUANTITY	0.99+
500 hours	QUANTITY	0.99+
five	QUANTITY	0.99+
Alex	PERSON	0.99+
two	QUANTITY	0.99+
Alice	PERSON	0.99+
each side	QUANTITY	0.99+
Lynn Peter	PERSON	0.99+
each	QUANTITY	0.99+
Athena Lakeland	ORGANIZATION	0.99+
five minutes	QUANTITY	0.99+
John	PERSON	0.99+
pandemic	EVENT	0.98+
FinTech	ORGANIZATION	0.98+
GDPR	TITLE	0.98+
first	QUANTITY	0.98+
both	QUANTITY	0.98+
both worlds	QUANTITY	0.97+
single machine	QUANTITY	0.96+
10 years	QUANTITY	0.96+
first time	QUANTITY	0.96+
10 X	QUANTITY	0.96+
CICB	ORGANIZATION	0.94+
single	QUANTITY	0.94+
John furry	PERSON	0.93+
Lynn blankets	PERSON	0.93+
80	QUANTITY	0.91+
Lindbergh Lega consulting	ORGANIZATION	0.9+
LLTP	ORGANIZATION	0.89+
one	QUANTITY	0.87+
two different teams	QUANTITY	0.87+
terabytes	QUANTITY	0.86+
S3	TITLE	0.81+
COVID	ORGANIZATION	0.79+
Alex	TITLE	0.78+
Lakehouse	ORGANIZATION	0.77+
few years ago	DATE	0.77+
a millisecond	QUANTITY	0.77+
single digit	QUANTITY	0.76+
D AWS	ORGANIZATION	0.76+
Startup Showcase S2 E2	EVENT	0.73+
a second	QUANTITY	0.73+
Kubernetes	TITLE	0.72+
Athena	ORGANIZATION	0.71+
season two	QUANTITY	0.7+
SQL	TITLE	0.69+
OLTB	ORGANIZATION	0.69+
Redshift	ORGANIZATION	0.69+
CNN	ORGANIZATION	0.68+
Cedar	ORGANIZATION	0.66+
Hugh	PERSON	0.66+
dynamo	ORGANIZATION	0.65+
episode	QUANTITY	0.63+
Q	ORGANIZATION	0.63+
episode two	OTHER	0.6+
Maine	LOCATION	0.6+

Wen Phan, Ahana & Satyam Krishna, Blinkit & Akshay Agarwal, Blinkit | AWS Startup Showcase S2 E2

(gentle music) >> Welcome everyone to theCUBE's presentation of the AWS Startup Showcase. The theme is Data as Code: The Future of Enterprise Data and Analytics. This is season two, episode two of the ongoing series covering the exciting startups in the AWS ecosystem around data analytics and cloud computing. I'm your host, John Furrier. Today we're joined by great guests here. Three guests. Wen Phan, who's a Director of Product Management at Ahana, Satyam Krishna, Engineering Manager at Blinkit, and we have Akshay Agarwal, Senior Engineer at Blinkit as well. We're going to get into the relationship there. Let's get into it. We're going to talk about how Blinkit's using an open data lakehouse with Presto on AWS. Gentlemen, thanks for joining us. >> Thanks for having us. >> So we're going to get into the deep dive on the open data lake, but I want to just quickly get your thoughts on what it is for the folks out there. Set the table. What is the open data lakehouse? Why is it important? What's in it for the customers? Why are we seeing adoption around this? Because this is a big story. >> Sure. Yeah, the open data lakehouse is really being able to run a gamut of analytics, whether it be BI, SQL, machine learning, or data science, on top of the data lake, which is based on inexpensive, low-cost, scalable storage. And more importantly, it's also on top of open formats. And this, to the end customer, really offers a tremendous range of flexibility. They can run a bunch of use cases on the same storage with great price performance. >> You guys have any other thoughts? What's your reaction to the lakehouse? What is your experience with it? What's going on with Blinkit? >> No, I think for us also, it has been the primary driver of how as a company we have completely shifted our delivery model from us delivering in one day to someone who is delivering in 10 minutes, right?
And a lot of this was made possible by having this kind of architecture in place, which helps us to be more open-source, more... where the tools are open-source, we have an open table format which helps us be very modular in nature, meaning we can pick solutions which work best for us, right? And that is the kind of architecture that we want to be in. >> Awesome. Wen, you know last time we chatted with Ahana, we had a great conversation around Presto, data. The theme of this episode is Data as Code, which is interesting because in all the conversations in these episodes all around developers, which administrators are turning into developers, there's a developer vibe with data. And with open source, it's software. Now you've got data taking a similar trajectory as how software development was with code, but the people running data, they're not developers, they're administrators, they're operators. Now they're turning into DataOps. So it's kind of a similar vibe going on with branches and taking stuff out of and putting it back in, and testing it. Datasets becoming much more stable, iterating on machine learning algorithms. This is a movement. What's your guys' reaction before we get into the relationships here with you guys. But, what's your reaction to this Data as Code movement? >> Yeah, so I think the folks at Blinkit are doing a great job there. I mean, they have a pretty compact data engineering team and they have some pretty stringent SLAs, as well as in terms of time to value and reliability. And what that ultimately translates for them is not only flexibility but reliability. So they've done some very fantastic work on a lot of automation, a lot of integration with code, and their data pipelines. And I'm sure they can give the details on that. >> Yes. Satyam and Akshay, you guys are software engineers, but this is becoming a whole other paradigm where the frontline coding or data engineering is implementing the operations as well. 
It's kind of like DevOps for data. >> For sure. Right. And I think whenever you're working, even as a software engineer, the understanding of business is equally important. You cannot be working on something and be away from business, right? And that's where, like I mentioned earlier, when we realized that we have to completely move our stack and start giving analytics at 10 minutes, right. Because when you're delivering in 10 minutes, your leaders want to take decisions in real time. That means you need to move with them. You need to move with business. And when you do that, the kind of flexibility these software tools give is what enables the businesses at the end of the day. >> Awesome. This is the really kind of like, is there going to be a book called agile data warehouses? I don't think so. >> I think so. (laughing) >> The agile cloud data. This is cool. So let's get into what you guys do. What is Blinkit up to? What do you guys do? Can you take a minute to explain the company and your product? >> Sure. I'll take that. So Blinkit is India's biggest 10 minute delivery platform. It pioneered the delivery model in the country with over 10 million Indians shopping on our platform, ranging from everything: grocery staples, vegetables, emergency services, electronics, and much more, right. It currently delivers over 200,000 orders every day, and is in a hurry to bring the future of commerce to everyone in India. >> What's the relationship with Ahana and Blinkit? Wen, what's the tie in? >> Yeah, so Blinkit had a pretty well formed stack. They needed a little bit more flexibility and control. They thought a managed service was the way to go. And here at Ahana, we provide a SaaS managed service for Presto. So they engaged us and they evaluated our offering. And more importantly, we're able to partner. As an early stage startup, we really rely on very strong partners with great use cases that are willing to collaborate. 
And the folks at Blinkit have been really great in helping us push our product, develop our product. And we've been very happy about the value that we've been able to deliver to them as well. >> Okay. So let's unpack the open data lakehouse. What is it? What's under the covers? Let's get into it. >> Sure. So if we bring up a slide. Like I said before, it's really a paradigm on being able to run a gamut of analytics on top of the open data lake. So what does that mean? How did it come about? So on the left hand side of the slide, we are coming out of this world where for the last several decades, the primary workhorse for SQL based processing and reporting and dashboarding use cases was really the data warehouse. And what we're seeing is a shift due to the trends in inexpensive scalable storage, cloud storage. The proliferation of open formats to facilitate using this storage to get certain amounts of reliability and performance, and the adoption of frameworks that can operate on top of this cloud data lake. So while here at Ahana, we're primarily focused on SQL workloads and Presto, this architecture really allows for other types of frameworks. And you see the ML and AI side. And, to Satyam's point earlier, it offers a great amount of flexibility and modularity for many use cases in the cloud. So really, that's really the lakehouse, and people like it for the performance, the openness, and the price performance. >> How's the open-source side of it playing in? It's kind of open formats. What is the open-source angle on this because there's a lot of different approaches. I'm hearing open formats. You know, you have data stores which are a big part of seeing that. You got SQL, you mentioned SQL. There's a mishmash of opportunities. Is it all coexisting? Is it one tool to rule the world or is it interchangeable? What's the open-source angle? >> There's multiple angles and I'll definitely let Satyam add to what I'm saying. 
This was definitely a big piece for Blinkit. So on one hand, you have the open formats. And what really the open formats enable is multiple compute engines to work on that data. And that's very huge. 'Cause it's open, you're not locked in. I think the other part of open that is important and I think it was important to Blinkit was the governance around that. So in particular Presto is governed by the Linux Foundation. And so, as a customer of open-source technology, they want some assurances for things like how's it governed? Is the license going to change? So there's that aspect of openness that I think is very important. >> Yeah. Blinkit, what's the data strategy here with lakehouse and you guys? Why are you adopting this type of architecture? >> So adding to what... Yeah, I think adding to what Wen said, right. When we are thinking in terms of all these open stacks, you have got these open table formats, everything which is deployed over cloud, the primary reason there is modularity. It's as simple as that, right. You can plug and play so many different table formats from one thing to another based on the use case that you're trying to serve, so that you get the most value out of data. Right? I'll give you a very simple example. So for us we use... not even use one single table format. It's not that one thing solves for everything, right? We use both Hudi and Iceberg to solve for different use cases. One is good for when you're working for a certain data site. Iceberg works well when you're in the SQL kind of interface, right. Hudi's still trying to reach there. It's going to go there very soon. So having the ability to plug and play different formats based on the use case helps you to grow faster, helps you to take decisions faster because now you're not stuck on one thing. They will have to implement it. Right. So I think that's what is great about this data lake strategy. Keeping yourself cost effective. Yeah, please. 
>> So the enablement is basically use case driven. You don't have to be rearchitecting for use cases. You can simply plug and play based on what you need for the use case. >> Yeah. You can... and again, you can focus on your business use case. You can figure out what your business users need and not worry about these things because that's where Presto comes in, helps you stitch that data together with multiple data formats, gives you the performance that you need and it works out the best there. And that's something that you don't get with a traditional warehouse these days. Right? The kind of thing that we need, you don't get that. >> I do want to add. This is just to riff on what Satyam said. I think it's pretty interesting. So, it really allowed him to take the best-of-breed of what he was seeing in the community, right? So in the case of table formats, you've got Delta, you've got Hudi, you've got Iceberg, and they all have got their own roadmap and it's kind of organic how these different communities want to evolve, and I think that's great, but you have these end consumers like Blinkit who have different, maybe overlapping, use cases, and they're not forced to pick one. When you have an open architecture, they can really put together best-of-breed. And as these projects evolve, they can continue to monitor it and then make decisions and continue to remain agile based on the landscape and how it's evolving. >> So the agility is a key point. Flexibility and agility, and time to value with your data. >> Yeah. >> All right. Wen, I got to get into why Presto is important here. Where does that fit in? Why is Presto important? >> Yeah. For me, it all comes down to the use cases and the needs. And reporting and dashboarding is not going to go away anytime soon. It's a very common use case. Many of our customers like Blinkit come to us for that use case. 
The difference now is today, people want to do that particular use case on top of the modern data lake, on top of scalable, inexpensive, low cost storage. Right? In addition to that, there's a need for this low latency interactive ability to engage with the data. This often arises when you need to do things on an ad hoc basis or you're in the developmental phase of building things up. So if that's what your need is. And latency's important and getting your arms around the problems, very important. You have a certain SLA, I need to deliver something. That puts some requirements on the technology. And Presto is ideal for that use case. It's distributed, it's scalable, it's in memory. And so it's able to really provide that. I think the other benefit for Presto and why we're betting on Presto is it works well on the data lakes, but you have to think about how are these organizations maturing with this technology. So it's not necessarily an all or nothing. You have organizations that have maybe the data lake and it's augmented with other analytical data stores like Snowflake or Redshift. So Presto also... a core aspect is its ability to federate or connect and query across different data sources. So this can be a permanent thing. This could also be a transitionary thing. We have some customers that are moving and slowly shifting their data portfolio from maybe all data warehouse into 80% data lake. But it gives that optionality, it gives that ability to transition over a timeframe. But for all those reasons, the latency, the scalability, the federation, is why Presto for this particular use case. >> And you can connect with other databases. It can be purpose built database, could be whatever. Right? >> Sure. Yes, yes. Presto has a very pluggable architecture. >> Okay. Here's the question for the Blinkit team. Why did you choose Presto and what led you to Ahana? 
>> So I'll take this. Where Presto sits well is in how it is designed. Like basically, Presto decouples your storage from the compute. Basically like, people can use any storage and Presto just works as a query engine for them. So basically, it has a concept of connectors where you can connect with real-time databases like Pinot or Druid, along with your warehouses like Redshift, along with your data lake that's based on Hudi or Iceberg. So it's like a very broad landscape that you can use with Presto. And consumers like the analysts don't need to learn different SQL dialects or different paradigms of querying for different sources. They just need to learn a single source. And, they get a single place to consume from. They get a single destination to write to also. So, it's a homogeneous architecture, which allows you to put in a central security layer which Presto integrates with. So it's also based on open architecture, that's Apache engine. And it has also certain innovative features that you can see based on caching, which reduces a lot of the cost. And since you have further decoupled your storage from the compute, you can further reduce your cost, because now the biggest part of a traditional warehouse is storage. And the cost goes massively upwards with the amount of data that you've added. Like basically, each time that you add more data, you require more storage, and warehouses ask you to write the data in their own format. Over here since we have decoupled that, the storage cost has gone down. It's literally that your cost that you are writing, and you just pay for the compute, and you can scale in and scale out based on the requirements. If you have high traffic, you scale out. If you have low traffic, you scale in. So all those. >> So huge cost savings. >> Yeah. >> Yeah. Cost effectiveness, for sure. >> Cost effectiveness and you get a very good price value out of it. 
Like for each query, you can estimate what's the cost for you based on that tracking and all those things. >> I mean, if you think about the other classic iceberg and what's under the water you don't know, it's the hidden cost. You think about the tooling, right, and also, the time it takes to do stuff. So if you have flexibility on choice, when we were riffing on this last time we chatted with you guys and you brought it up earlier around, you can have the open formats to have different use cases in different tools or different platforms to work on it. Redshift, you can use Redshift here, or use something over there. You don't have to get locked in. >> Absolutely. >> Satyam & Akshay: Yeah. >> Lock-in is a huge problem. How do you guys see that 'cause sounds like here there's not a lot of lock-in. You got the open formats, and you got choice. >> Yeah. So you get the best of both worlds. Like you get with Ahana or with Presto, you can get the best of both worlds. Since it's cloud native, you can easily deploy your clusters within like five minutes. Your cluster is up, you can start working on it. You can deploy multiple clusters for multiple teams. You get also flexibility of adding new connectors since it's open and further it's also much more secure since it's cloud native. So basically, you can control your security endpoints very well. So all those things come together with this architecture. So you can definitely go more on the lakehouse architecture than warehousing when you want to deliver data value faster. And basically, you get much higher value out of your data in a sorted template. >> So Satyam, it sounds like the old warehousing was like the application person, not a lot of usage, old, a lot of latency. Okay. Here and there. But now you got more speed to deploy clusters, scale up scale down. Application developers are as everyone. It's not one person. It's not one group. It's whenever you want. So, you got speed. 
You got more diversity in the data opportunities, and you're coding. >> Yeah. I think data warehouses are a way to start for every organization who is getting into data. I think data warehousing is still a solution and will be a solution for a lot of teams which are still getting into data. But as soon as you start scaling, as you start seeing the cost going up, as you start seeing the number of use cases adding up, having an open format definitely helps. So, I would say that's where we are also heading and that's how our journey started with Presto as well, why we even thought about Ahana, right. >> (John chuckles) >> So, like you mentioned, one of the things that happened was as we were moving to the lakehouse and the open table format, I think Ahana is one of the first ones in the market to have Hudi as a first class citizen completely supported with all the things which are not even present at the time of... even with Presto, right. So we see Ahana working behind the scenes, improving even some of the things already over the open-source ecosystem. And that's where we get the most value out of Ahana as well. >> This is the convergence of open-source magic and commercialization. Wen, because you think about Data as Code, reminds me, I hear, "Data warehouse, it's not going to go away." But you got cloud scale or scale. It reminds me of the old, "Oh yeah, I have a data center." Well, here comes the cloud. So, doesn't really kill the data center, although Amazon would say that the data center's going to be eliminated. No, you just use it for whatever you need it for. You use it for specific use cases, but everyone, all the action goes to the cloud for scale. The same thing's happening with data, and look at the open-source community. It's kind of coming together. Data as Code is coming together. >> Yeah, absolutely. >> Absolutely. >> I do want to again connect another dot in terms of cost and that. 
You know, we've been talking a little bit about price performance, but there's an implicit cost, and I think this was also very important to Blinkit, and also why we're offering a managed service. So one piece of it. And it really revolves around the people, right? So outside of the technology, the performance. One thing that Akshay brought up and it's another important piece that I should have highlighted a little bit more is, Presto exposes the ability to interact with your data in a widely adopted way, which is basically ANSI SQL. So the ability for your practitioners to use this technology is huge. That's just regular Presto. In terms of a managed service, the guys at Blinkit are a great high performing team, but they have to be very efficient with their time and what they manage. And what we're trying to do is provide leverage for them. So take a lot of the heavy lifting away, but at the same time, figuring out the right things to expose so that they have that same flexibility. And that's been the balancing point that we've been trying to balance at Ahana, but that goes back to cost. What's my total cost of ownership? And that doesn't include just the actual query processing time, but the ability for the organization to go ahead and absorb the solution. And what does it cost in terms of the people involved? >> Yeah. Great conversation. I mean, this brings up the question of back in the data center, the cloud days, you had the concept of an SRE, which is now popular, site reliability engineer. One person does all the clusters and manages all the scale. Is the data engineer the new SRE for data? Are we seeing a similar trajectory? Just want to get your reaction. What do you guys think? >> Yes, so I would say, definitely. It depends on the teams and the sizes of that. We are a high performing team so each automation takes bits on the pieces of the architecture, like where they want to invest in. 
And it comes out with the value of the engineer's time and basically like how much they can invest in, how much they need to configure the architecture, and how much time it'll take to get to market. So basically like, this is what I would also highlight as an engineer. I found Ahana like the... I would say as a Presto in a cloud native environment, I think it's the one in the market that seamlessly scales in and scales out. And further, with a team like ours, I would say our team size is like three to four engineers, managing clusters day in day out, configuring, tuning and all those things takes a lot of time. And Ahana came in and takes it off our plate and hands us a solution which works out of the box. So that's where this comes in. Ahana is also based on the open-source community. >> So the engineer's time is so valuable. >> Yeah. >> My take on it really in terms of the data engineering being the SRE. I think that can work, it depends on the actual person, and we definitely try to make the process as easy as possible. I think in Blinkit's case, you guys are... They are data platform owners, but they definitely are aware of the pipelines. >> John: Yeah. >> So they have very intimate knowledge of what data engineers do, but I think in their case, you guys, you're managing a ton of systems. So it's not just even Presto. They have a ton of systems and surfacing that interface so they can cater to all the data engineers across their data systems, I think is the big need for them. I know you guys want to chime in. I mean, we've seen the architecture and things like that. I think you guys did an amazing job there. >> So, and adding to Wen's point, right. Like I generally think what DevOps is to the tech team, I think, is what the data engineer or the data team is to the data organization, right? 
Like they play a very similar role, that you have to act as a guardrail to ensure that everyone has access to the data so the democratizing and everything is there, but that has to also come with security, right? And when you do that, there are (indistinct) a lot of points where someone can interact with data. We have... And again, there's a mix and match of open-source tools that works well, as well. And there are some paid tools as well. So for us like for visualization, we use Redash for our ad hoc analysis. And we use Tableau as well whenever we want to give a very concise reporting. We have Jupyter notebooks in place and we have EMRs as well. So we always have a mixed bag of things where people can interact with data. And most of our time is spent in acting as that guardrail to ensure that everyone should have access to data, but it shouldn't be exploited, right. And I think that's where we spend most of our time. >> Yeah. And I think the time is valuable, but to your point about the democratization aspect of it, there seems to be a bigger step function value that you're enabling and needs to be talked out. The 10x engineer, it's more like 50x, right? If you get it done right, the enablement downstream at the scale that we're seeing with this new trend is significant. It's not just, oh yeah, visualization and get some data quicker, there's actually real advantages on a multiple with that engineering. So, and we saw that with DevOps, right? Like, you do this right and then magic happens on the edges. So, yeah, it's interesting. You guys, congratulations. Great environment. Thanks for sharing the insight Blinkit. Wen, great to see you. Ahana again with Presto, congratulations. The open-source meets data engineering. Thanks so much. >> Thanks, John. >> Appreciate it. >> Okay. >> Thanks John. >> Thanks. >> Thanks for having us. >> This is season two, episode two of our ongoing series. This one is Data as Code. This is theCUBE. I'm John Furrier. 
Thanks for watching. (gentle music)

Published Date : Apr 1 2022

Mark Lyons, Dremio | CUBE Conversation

(bright upbeat music) >> Hey everyone. Welcome to this "CUBE Conversation" featuring Dremio. I'm your host, Lisa Martin. And I'm excited today to be joined by Mark Lyons, the VP of product management at Dremio. Mark, thanks for joining us today. >> Hey Lisa, thank you for having me. Looking forward to the talk. >> Yeah. Talk to me about what's going on at Dremio. I had the chance to talk to your chief product officer Tomer Shiran a couple months ago but talk to us about what's going on. >> Yeah, I remember that at re:Invent. It's been an exciting few months since re:Invent here at Dremio and just in the new year we raised our Series E. Since then we ran our Subsurface event, which had over seven, 8,000 registrants and attendees. And then we announced our Dremio cloud product generally available, including Dremio Sonar, which is a SQL query engine, and Dremio Arctic in public preview, which is a metastore for the lakehouse. >> Great. And we're going to dig into both of those. I saw that over 400 million raised in that Series E, raising the valuation of Dremio to 2 billion. So a lot of growth and momentum going on at the company I'm sure. If we think about businesses in any industry, they've made large investments in data warehouses, proprietary data warehouses. Talk to me about historically what they've been able to achieve, but then what some of those bottlenecks are that they're running into. >> Yeah, for sure. My background is actually in the data warehouse space. I spent over the last eight, maybe close to 10 years there and we've seen this shift go on from the traditional enterprise data warehouse to the data lake, and the last couple years has really been the time of the cloud data warehouse. And there's been a large amount of adoption of cloud data warehouses, but fundamentally they still come with a lot of the same challenges that have always existed with the data warehouse, which is first of all you have to load your data into it. 
So that data's coming from lots of different sources. In many cases, it's landing in files in a data lake repository like S3 first. And then there's a loading process, right? An ETL process. And those pipelines have to be maintained and stay operational. And typically as the data warehouse life cycle of processing moves on, the scope of the data that consumers get to access gets smaller and smaller. The control of that data gets tighter and the change process gets heavier, and it goes from quick changes of adding a column or adding a field to a file to days if not weeks for businesses to modify their data pipelines and test new scenarios, offer new features in the application or answer new questions that the business is interested in, you know, from an analytics standpoint. So typically we see the same thing even with these cloud data warehouses, the scope of the data shrinks, the time to get answers gets longer. And when new engines come along, it's the same story we see, and this is going on right now in the data warehouse space, there's new data warehouses that are coming and they say, well we're a thousand times faster than the last data warehouse. And then it's like, okay, great. But what's the process? The process is to migrate all your data to the new data warehouse, right? And that comes with all the same baggage. Again, it's a proprietary format that you load your data into. So I think people are ready for a change from that. >> People are not only ready for a change, but as every company has to become a data company these days and access to real time data is no longer a nice to have. It's absolutely essential. The ability to scale, the ability to harness the value from as much data as possible and to do so fast is really table stakes for any organization. How is Dremio helping customers in that situation to operationalize their data? >> Yeah, so that's what I was so intrigued by and loved about Dremio when I joined three, four, five months back. 
Coming from the warehouse space, when I first saw the product I was just like, oh my gosh, this is so much easier for folks. They can access a larger scope of their data faster, which to your point, like is table stakes for all organizations these days, they need to be able to analyze data sooner. Sooner is better. Data has a half-life, right? Like it decays. The value of data decays over time. So typically the most valuable data is the newest data. And that all depends on what industries we're talking about, the types of data and the use cases, but it's always basically true that newer data is more valuable and they need to be able to analyze as much of it as possible. The story can't be, no, we have to wait weeks or months to get a new data source, or the story can't be, you know, that data that includes seasonality, you know, we weren't able to keep in the same location because it's too expensive to keep it in the warehouse or whatever. So for Dremio and our customers our story is simple, it's leverage the data where it is, so access data in all sorts of sources, whether it's a Postgres database or an S3 bucket, and don't move the data, don't copy the data, analyze it in place. And don't limit the scope of the data you're trying to analyze. If you have new use cases, you have additional data sets that you want to add to those use cases, just bring them into S3 and you are off to the races and you can easily analyze more data and give more power to the end user. So if there's a field that they want to calculate, a simple change, convert this miles field to kilometers, well, the end users should be empowered to just make a calculation on the data like that. That should not require an entire cycle through a data engineering team and a backlog and a ticket and pushing that to production and so forth, which in many cases it does at many organizations. 
It's a lot of effort to make new calculations on the data or derive new fields, add a new column and so forth. So Dremio makes the data engineer's life easier and more productive. It also makes the data consumer's life much easier and happier, and they can just do their job without worrying and waiting. >> Not only can they do their job but from a business, a high level perspective, the business probably has the opportunity to be far more competitive because it's got a bigger scope of data, as you mentioned, access to it more widely, faster, and those are only good things in terms of- >> More use cases, more experiments, right? So what I've seen a lot is like there's no shortage of ideas of what people can do with the data. And projects that might be able to be undertaken but no one knows exactly how valuable that will be. Whether that's something that should be funded or should not be funded. So like more use cases, more experiments, try more things. Like if it's cheap to try these data problems and see if it's valuable to the business then that's better for the business. Ultimately the business will be more competitive. We'll be able to try more new products, we'll be able to have better operational kind of efficiencies, lower risk, all those things. >> Right. What about data governance? Talk to me about how the Lakehouse enables that across all these disparate data volumes. >> I think this is where things get really interesting with the Lakehouse concept relative to where we used to be with a data lake, which was a parking ground for just lots of files. And that came with a lot of challenges when you just had a lot of files out there in a data lake, whether that was HDFS, right. A Hadoop data lake back in the day or now a cloud object storage data lake. 
So historically I feel like governance, access, authentication, and auditing were all extremely challenging with the data lake, but now in the modern lakehouse world, all those challenges have been solved. You have everything from the front of the house with access policies and data masking, everything that you would expect, through commits and tables and transactions and inserts and updates and deletes, and auditing of that data, being able to see who made the changes to the data, which engine, which user, when they were made, and seeing the whole history of a table and not just a mess of files in a file store. So it's really come a long way. I feel like we're in the renaissance stage of the 2.0 data lakes, or lakehouses as people call them. But basically what you're seeing is a lot of functionality from the traditional warehouse, all available in the lake. And warehouses had a lot of governance built in, whether that is encryption and column access policies and row access policies, so only the right user saw the right data, or some data masking, so that the Social Security number was masked out but the analyst knew it was a Social Security number. That was all there. Now that's all available on the lakehouse and you don't need to copy data into a data warehouse just to meet those types of requirements. A huge one is also deletes, right? I feel like deletes were one of the Achilles' heels of the original data lake, when there was no governance and people were just copying data sets around and modifying data sets for whatever their analytics use case was. If someone said, "Hey, go delete this. Right to be forgotten, GDPR." Now you've got California's CCPA and others all coming online. If you said, go delete this, you know, this record or set of records from an original lake, I think that was impossible, probably, for many people to do with confidence, to say that I fully deleted this.
Now with the Apache Iceberg table format that's used in the lakehouse architecture, you actually have delete functionality, right? Which is a key component that warehouses have traditionally brought to the table. >> That's a huge component from a compliance perspective. You mentioned GDPR and CCPA, which is going to be CPRA in less than a year, but there are so many other data privacy regulations coming up that the ability to delete is going to be table stakes for organizations. Something that you guys launched, and we just have a couple of minutes left, but you launched, I love the name, the Forever Free data lakehouse platform. That sounds great. Forever Free. Talk to me about what that really means. It consists of two products, Sonar and Arctic, that you mentioned, but talk to me about this Forever Free data lakehouse. >> Yeah. I feel like this is an amazing step forward in the industry. Because of the Dremio Cloud architecture, where the execution and data live in the customer's cloud account, we're able to basically say, hey, the Dremio software, the Dremio service side of this platform, is forever free for users. Now, there is a paid tier, but there's a standard tier that is truly forever free. That still comes with infrastructure bills from your cloud provider, right? So if you use AWS, you still have an S3 bill for your data sets, because we're not moving them. They're staying in your Amazon account, in your S3 bucket. You still have to pay for, right, the infrastructure, the EC2 and the compute to do the data analytics, but the actual software is free forever. And there's no one else in our space offering that. In our space, everything's a free trial. So here's your $500 of credit, come try my product. And what we're saying is, with our unique architectural approach, and this is what I think is preferred by customers too.
You know, we take care of all the query planning, all the engine management, all the administration of the platform, the upgrades, a fully available, zero-downtime platform. So they get all the benefits of SaaS as well as the benefits of maintaining control over their data. And because that data is staying in their account and the execution of the analytics is staying in their account, we don't incur that infrastructure bill. So we can have a forever free tier of our platform. And we've had tremendous adoption. I think we announced this at the beginning of March, the first week of March. So it's not even the end of March. Hundreds and hundreds of signups, and many users are actively on the platform now, live-querying their data. >> That just kind of summarizes the momentum that Dremio is seeing. Mark, thank you so much. We're out of time, but thanks for talking to me- >> Thank you. >> About what's new at Dremio, what you guys are doing. Next time, we'll have to unpack this even more. I'm sure there's loads more we could talk about, but we appreciate that. >> Yeah, this was great. Thank you, Lisa. Thank you. >> My pleasure. For Mark Lyons, I'm Lisa Martin. Keep it right here on theCUBE, your leader in high-tech hybrid event coverage. (upbeat music)
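
Editor's sketch: the miles-to-kilometers example from the conversation above describes a derived field that is computed when the data is read, so the stored data is neither moved nor copied. The following is a minimal Python illustration of that idea only, not Dremio's API. The function name `with_kilometers`, the field names, and the sample `trips` rows are all hypothetical; in a SQL engine the same calculation would typically be a computed column in a view.

```python
# Hypothetical illustration of a derived field computed at read time.
# Nothing here is a Dremio API; all names are assumptions for the sketch.

MILES_TO_KM = 1.609344  # exact, by definition of the international mile


def with_kilometers(rows, miles_field="miles", km_field="kilometers"):
    """Yield a copy of each row with a derived kilometers field.

    The source rows are left untouched, mirroring the "analyze in place,
    don't copy the data set" idea: only the derived value is new.
    """
    for row in rows:
        derived = dict(row)  # shallow copy so the source row is not modified
        derived[km_field] = round(row[miles_field] * MILES_TO_KM, 3)
        yield derived


# Hypothetical source data standing in for a table in object storage.
trips = [
    {"trip_id": 1, "miles": 10.0},
    {"trip_id": 2, "miles": 26.2},
]

converted = list(with_kilometers(trips))
for row in converted:
    print(row)
```

The design choice worth noting is that the conversion lives in the read path, so an analyst can add or change it without a change to the stored data or a new pipeline.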

Published Date : Mar 24 2022

SUMMARY :

the VP of product management at Dremio. Looking forward to the top. I had the chance to talk to and just in the new year of Dremio to 2 billion. the time to get answers gets longer. and to do so fast is and pushing that to Ultimately the business Talk to me about how the Lakehouse enables and auditing of that data able to see, that the ability to delete that and the compute to do the data analytics Just kind of summarizes the momentum but we appreciate that. Yeah, this was great. your leader in high tech

ENTITIES

Entity	Category	Confidence
Mark Lyons	PERSON	0.99+
Lisa Martin	PERSON	0.99+
$500	QUANTITY	0.99+
Lisa	PERSON	0.99+
2 billion	QUANTITY	0.99+
Mark	PERSON	0.99+
Dremio	ORGANIZATION	0.99+
Amazon	ORGANIZATION	0.99+
Tomer Shiran	PERSON	0.99+
Hundreds	QUANTITY	0.99+
AWS	ORGANIZATION	0.99+
less than a year	QUANTITY	0.99+
GDPR	TITLE	0.99+
both	QUANTITY	0.99+
end of March	DATE	0.99+
today	DATE	0.99+
over 400 million	QUANTITY	0.98+
over seven, 8,000 registrants	QUANTITY	0.98+
first	QUANTITY	0.97+
Sonar	ORGANIZATION	0.97+
Arctic	ORGANIZATION	0.97+
Apache	ORGANIZATION	0.96+
two products	QUANTITY	0.96+
S3	TITLE	0.95+
Dremio Arctic	ORGANIZATION	0.94+
EC2	TITLE	0.94+
Lakehouse	ORGANIZATION	0.94+
CCPA	TITLE	0.94+
couple months ago	DATE	0.93+
re:Invent	EVENT	0.87+
five months back	DATE	0.86+
last couple years	DATE	0.86+
three	DATE	0.84+
one	QUANTITY	0.84+
couple minutes	QUANTITY	0.82+
March first week of March	DATE	0.82+
hundreds	QUANTITY	0.81+
10 years	QUANTITY	0.76+
four	DATE	0.76+
Forever	TITLE	0.76+
beginning	DATE	0.73+
SQL	TITLE	0.72+
2.0 data	QUANTITY	0.71+
Series	EVENT	0.68+
Sonar	COMMERCIAL_ITEM	0.67+
E	OTHER	0.64+
Series E	EVENT	0.64+
Free	ORGANIZATION	0.63+
Californias	LOCATION	0.59+
signups	QUANTITY	0.57+
Conversation	EVENT	0.56+
year	EVENT	0.53+
thousand	QUANTITY	0.48+
eight	DATE	0.46+
CPRA	ORGANIZATION	0.42+
CCPA	ORGANIZATION	0.34+

Rik Tamm-Daniels, Informatica | AWS re:Invent 2021

>> Hey everyone. Welcome back to theCUBE, live in Las Vegas. Lisa Martin with Dave Nicholson; we are covering AWS re:Invent 2021. This was probably one of the most important and largest hybrid tech events this year, with AWS and its enormous ecosystem of partners. We're going to be talking with a hundred guests in the next couple of days. We started a couple of days ago, and it's really about the innovation that's going to be going on in the cloud and tech in the next decade. We're pleased to welcome Rik Tamm-Daniels as our next guest, VP of strategic ecosystems at Informatica. Rik, welcome to the program. >> Thank you for having me. It's a pleasure to be back. >> Isn't it nice to be back in person? Oh, it's amazing. All these conversations you just can't replicate by video conferencing. >> Absolutely. Great to reconnect with folks I haven't seen in a few years as well. >> Absolutely. That's been the sentiment, I think one of the sentiments that we've heard the last three days. One of the things thematically that we've also been hearing about, in between the plethora of AWS announcements, typical re:Invent, is that every company has to become a data company: public sector, private sector, small business, large business. Talk to us about how Informatica and AWS are helping companies become data companies so that they don't get left behind. >> One of the biggest things that we're hearing at re:Invent is that customers are really concerned with data fragmentation, data silos, and access to trusted data, and how they get that information to really effect data-led transformation. In fact, we did a survey earlier in the year of chief data officers, and we found that almost 80% of organizations had 50% or more of their data in hybrid or multi-cloud environments. Also, 79% are looking to leverage more than 100 data sources, and 30% are looking to leverage more than 1,000 data sources.
So at Informatica, with our Intelligent Data Management Cloud, we're really focused on enabling customers to bring together the data assets, no matter where they live or what format they're in: on-premises, cloud, multi-cloud, bringing that all together. >> Well, we saw this massive scatter 22 months ago now, right? Of everyone, and the edge exploded and data exploded and volumes and data sources exploded. It's hard for organizations to get their heads around that, that the data is going to be living in all these different places. You talked about a lot of customers in every industry being hybrid and multi-cloud, based on strategy, based on acquisition, but to get their arms around that data and to be able to actually extract value from it fast is going to be the difference between those businesses that succeed and those that don't. >> Absolutely. And our partnership with AWS is a long-standing partnership, and we're very much focused on addressing the challenges you're talking about. In fact, earlier this year we announced our cloud-first, cloud-native data governance and data catalog on AWS, which is really focused on creating that central point of trusted data access and visibility for the organization. And just today, we had an announcement about how we're really accelerating data democratization for AWS Lake Formation. >> We talk about data democratization often. What does that mean to you? What does that mean to Informatica? How do you deliver that to customers so that they can extract as much value as they can? >> Yeah. So a great question. And really that whole data management journey is a big piece of this. So it starts with data discovery. How do I even begin to find my data assets? How do I get them from where they are to where they need to go in the cloud? How do I make sure they're clean, they're ready to use? I trust them.
I understand where they came from. And so the solution that we announced today is really focused on how we provide business users with a self-service way of getting access to data lake data sitting in Amazon S3 with Lake Formation governance, but doing it in a way that doesn't create an undue burden on those business users around data compliance and data policies. And so what we've done is we've brought our business-user-friendly self-service experience in Axon Data Marketplace together with AWS Lake Formation. >> So Informatica has had a long history in the data world. I think of terms like MDM and ETL. >> Yes. >> Where does Informatica's history dovetail with the present day in terms of cloud? The concept of extract, transform, load, I think that's what ETL stood for, too many TLAs running around. Where does that play in today's world? Are you focused separately on cloud from the on-premises data center, or do you link the two? >> Yeah, so we focus on addressing data management no matter where the data lives: on-premises, cloud, multi-cloud. Our Intelligent Data Management Cloud platform is the industry's first end-to-end, cloud-native, as-a-service data management platform that delivers all those capabilities I mentioned before to customers. So we can manage all those workloads that are distributed from a single cloud-based, as-a-service data management platform. >> So the platform is as a service in the cloud, but you could be managing data assets that are in traditional on-premises data centers and the like? >> Absolutely. >> Okay. >> So congratulations on the IPO. Of course, we can't talk to Informatica without mentioning that. I imagine the momentum is probably pretty great right about now. When I think of AWS, I always think of momentum.
I mean, the volume of announcements, but also when I think about AWS, I think about their absolute focus on the customer, that working-backwards approach. From a partnership perspective, is there alignment there? I imagine, like I said, with the IPO, a lot of momentum right now, probably a lot of excitement. Is Informatica also as focused and customer obsessed as AWS is? >> Yeah. So, first of all, thank you so much for the congratulations. We had a very successful IPO last month. And in fact, just yesterday, our CEO, Amit Walia, presented our Q3 results, which showcase the continued growth of our subscription revenue and cloud revenue. In fact, our cloud revenue grew 44% year over year, which is really reflective of our big shift to being a cloud-first company and also the success of our Intelligent Data Management Cloud platform. That platform, as I mentioned, spans all those aspects of data management, and we're delivering that for more than 5,000 customers globally. And just from an adoption perspective, we process about 23 trillion transactions a month for customers in our cloud platform, and that's doubling every six to 12 months. So it's an incredible amount of adoption. Some of the biggest enterprises in the world, like Unilever and Sanofi, folks like that, are using the cloud as their preferred data management platform of choice. >> Well, you know, of course, congratulations are in order for the IPO, but also really on what you just mentioned, the trajectory of where Informatica is going, because Informatica wasn't born yesterday. And we shouldn't overlook the fact that there are challenges associated with moving from the world as it exists on premises, for still 80% of IT spend at least, navigating that transition, going from private to public, getting the right kind of investment where people realize that cloud is a significant barrier to entry for a lot of companies.
I think, you know, you have a lot of folks cheering for you as you navigate this transition. >> Well, one thing I would say is, yes, we have been in the business of data for a long time, but we've also been in the business of cloud quite a long time. This is the 10th re:Invent. This is also the ten-year anniversary of the Informatica and AWS partnership, right? So we've been working in the cloud with AWS for that long, innovating on all of these different core services. From that perspective, I think we're doing a tremendous amount of innovation together, with solutions like the one we talked about for Lake Formation, but we also announced today a couple of key programs that we partnered with AWS on, around modernization and migration. So that's a big area of focus as well: how do we help customers modernize and take advantage of all the great services that AWS offers? So we announced our membership in what's called the workload migration program and also the data-led migrations program, which is part of the public sector focus at AWS as well. >> The modernization perspective was talked about a lot by Adam yesterday. And we've talked about it a lot today. Every organization needs to modernize, even some of those younger ones that you think, oh, aren't they already fairly modern? But where are your customer conversations happening from a modernization perspective? Is that elevated up to the C-suite, that we've got to modernize our organization, get a better handle on our data, be able to use it, protect it, secure it, so that we can be competitive and deliver outstanding customer experiences? >> What happens is the pain points that the legacy infrastructure has from the business perspective really do elevate the conversation to the C-suite. They're looking at it and saying, hey, especially with the pandemic, right? We have to transform our business. We have to have data. We have to have trust in data. How do we do that? And we're not going to get there
We have to have trust in data. How do we do that? And we're not going to get there >>On rigid on-premise infrastructure. We need to be in a cloud native footprint. And so we've been focused on helping customers get to those cloud native end points, but also to a truly cloud native data management, we talked about earlier can manage all those different workloads, right? From a single that SAS serverless type experience. Right? What have been some of the interesting conversations that you've had here? Again, we are in person yep. Fresh off the IPO, lots of announcements coming out. You guys made announcements today. What's been the sentiment from the, those customers and partners that you've talked about. >>Well, I'll give you guys actually a little sneak preview of another announcement we have coming tomorrow, uh, with our friends at Databricks. So we, uh, we are announcing a data, data democratization solution with Databricks accelerating some of the same, you know, addressing some of the same challenges we were talking about here, but in the data breaks in the Lakehouse environment. Um, so, so, but around that, and I had a great conversation with some partners here, some of the global system integrators, and they're just so happy to see that, right, because a lot of the infrastructure that's around data lakes are lake formation. It's pretty technical it's for a technical audience. And, and Informatica has really been focused on how do we expand the base of users that are able to tap into data and that's through no code experiences, right? It's through visual experiences. And we bring that tightly coupled together with the performance and the power and scale of platforms like Databricks and the AWS Redshift and S3, it's really transformative for customers. >>What are some of the things that here we are wrapping up the 10th, re-invent almost as tomorrow, but also wrapping up the end of 2021. 
There's obviously a lot of momentum with Informatica right now from a partnership perspective, and you just gave us some breaking news. Thank you. We always love that. What are some of the things that you're looking forward to in 2022 that you think are really going to help Informatica customers be incredibly competitive, utilizing data in the cloud and on-prem to their maximum? >> Well, I think as we go into the next year, data complexity and data fragmentation are going to continue to grow. It's exploding out there. And one of the key components of our platform, the IDMC platform, is what we call CLAIRE, which is the industry's first metadata-driven AI engine. What we've done is we've taken the intelligence of machine learning and AI and brought that to the business of data management. And we truly believe that the way customers are going to tame that data, address those problems, and continue to scale and keep up is by leveraging the power of AI in a cloud-native, cloud-first data management platform. >> Excellent. Rik, thank you so much for joining us today. Again, congratulations on last month's Informatica IPO, and on a great, solid, strong, deep partnership with AWS. We thank you for your insights, and best of luck next year. >> Awesome. Thank you so much. Pleasure being here. >> Our pleasure to have you. For my co-host Dave Nicholson, I'm Lisa Martin. You're watching theCUBE, the global leader in live tech coverage.

Published Date : Dec 2 2021

SUMMARY :

We started a couple of days ago and about really the innovation that's going to be It's a, it's a pleasure to be back. Isn't it nice to be back in person? that every company has to become a data company, public sector, private sector, But one of the biggest things that we're hearing at reinvent is that customers are really concerned with data, fast is going to be the difference between those businesses that succeed and those And just today, we had an announcement about how we're bringing data democratization And so the solution that we announced today So Informatica has had a long history in the data world. So we focus on, uh, addressing data management, uh, when, no matter where the data lives. The platform is, is as a service in the cloud, but you could be managing data assets that are So congratulations on the IPO. And that's doubling every six to 12 months. that cloud is a significant barrier to entry, uh, but we also announced today a couple of key programs that we partnered with AWS around, our organization get better handle of our data, be able to use it more protected, secure it so that we can really do elevate the conversation to the C-suite. What have been some of the interesting conversations that you've had here? some of the same, you know, addressing some of the same challenges we were talking about here, but in the data breaks in the Lakehouse environment. What are some of the things that here we are wrapping up the 10th, and brought that to the business of data management. We thank you for your insights and best of luck next year. Thank you so much. Pleasure to have you for my co-host David Nicholson, I'm Martin.

ENTITIES

Entity	Category	Confidence
David Nicholson	PERSON	0.99+
AWS	ORGANIZATION	0.99+
Informatica	ORGANIZATION	0.99+
Dave Nicholson	PERSON	0.99+
Rick	PERSON	0.99+
Unilever	ORGANIZATION	0.99+
44%	QUANTITY	0.99+
Sanofi	ORGANIZATION	0.99+
Databricks	ORGANIZATION	0.99+
80%	QUANTITY	0.99+
Lisa Martin	PERSON	0.99+
2022	DATE	0.99+
yesterday	DATE	0.99+
Las Vegas	LOCATION	0.99+
50%	QUANTITY	0.99+
Martin	PERSON	0.99+
next year	DATE	0.99+
tomorrow	DATE	0.99+
two	QUANTITY	0.99+
Adam	PERSON	0.99+
first	QUANTITY	0.99+
Amazon	ORGANIZATION	0.99+
more than 1000 data sources	QUANTITY	0.99+
more than 100 data sources	QUANTITY	0.99+
today	DATE	0.99+
79%	QUANTITY	0.99+
last month	DATE	0.99+
last month	DATE	0.99+
more than 5,000 customers	QUANTITY	0.99+
Rick Tam Daniel	PERSON	0.99+
Rik Tamm-Daniels	PERSON	0.99+
Wailea	ORGANIZATION	0.99+
ten-year	QUANTITY	0.99+
22 months ago	DATE	0.98+
12 months	QUANTITY	0.98+
30%	QUANTITY	0.98+
first company	QUANTITY	0.98+
this year	DATE	0.97+
earlier this year	DATE	0.97+
2021	DATE	0.97+
one	QUANTITY	0.97+
Informatica AWS	ORGANIZATION	0.96+
next decade	DATE	0.96+
end of 2021	DATE	0.95+
up to 80	QUANTITY	0.95+
almost 80%	QUANTITY	0.94+
about 23 trillion transactions a month	QUANTITY	0.91+
next couple of days	DATE	0.88+
single	QUANTITY	0.88+

Greg Rokita, Edmunds.com & Joel Minnick, Databricks | AWS re:Invent 2021

>> Welcome back to theCUBE's coverage of AWS re:Invent 2021, the industry's most important hybrid event. Very few hybrid events, of course, in the last two years. And theCUBE is excited to be here. This is our ninth year covering AWS re:Invent, and this is the 10th re:Invent. We're here with Joel Minnick, who is the vice president of product and partner marketing at smoking hot company Databricks, and Greg Rokita, who is executive director of technology at Edmunds. If you're buying a car or leasing a car, you've got to go to Edmunds. We're going to talk about busting data silos. Guys, great to see you again. >> Welcome. Welcome. Glad to be here. >> All right. So Joel, what the heck is a lakehouse? This is all over the place. Everybody's talking about lakehouse. What is it? >> Well, in a nutshell, a lakehouse is the ability to have one unified platform to handle all of your traditional analytics workloads, so your BI and reporting, traditionally the workloads that you would have for your data warehouse, on the same platform as the workloads that you would have for data science and machine learning. If you think about the way that most organizations have built their infrastructure in the cloud today, generally customers will land all their data in a data lake, and a data lake is fantastic because it's low cost, it's open, and it's able to handle lots of different kinds of data. But the challenge that data lakes have is that they don't necessarily scale very well. It's very hard to govern data in a data lake. It's very hard to manage that data in a data lake. >> And so what happens is that customers then move the data out of a data lake into downstream systems, and what they tend to move it into are data warehouses, to handle those traditional reporting kinds of workloads that they have.
And they do that because data warehouses are really great at scale and at performance. The challenge, though, is that data warehouses really only work for structured data. And regardless of what kind of data warehouse you adopt, all data warehouse platforms today are built on some kind of proprietary format. So once you've put that data into the data warehouse, that is kind of what you're locked into. The promise of the data lakehouse was to say, look, what if we could strip away all of that complexity and having to move data back and forth between all these different systems, and keep the data exactly where it is today, and where it is today is in the data lake. >> And then being able to apply a transaction layer on top of that. In the Databricks case, we do that through an open source technology called Delta Lake. What Delta Lake allows us to do is, when you need it, apply that performance, that reliability, that quality, that scale that you would expect out of a data warehouse directly on your data lake. And if I can do that, then what I'm able to do now is operate from one single source of truth that handles all of my analytics workloads, both my traditional analytics workloads and my data science and machine learning workloads, and have all of those workloads on one common platform. It means that now my infrastructure gets much, much simpler, and I'm therefore able to operate at much lower cost and get things to production much, much faster. >> But I'm also now able to leverage open source in a much bigger way, being that the lakehouse is inherently built on an open platform. So I'm no longer locked into any kind of data format.
And finally, probably one of the most lasting benefits of a lakehouse is that all the roles that have to touch my data, from my data engineers to my data analysts to my data scientists, are all working on the same data, which means that the collaboration that has to happen to go answer really hard problems with data, I'm now able to do much, much more easily, because those silos that traditionally exist inside of my environment no longer have to be there. And so that is the promise of the lakehouse: to have one single source of truth, one unified platform for all of my data. >> Okay, great. Thank you for that very cogent description of what a lakehouse is. Now I want to hear from the customer to see, okay, is what he just said true? So actually, let me ask you this, Greg, because the other problem that you didn't mention about the data lake is that with no schema on write, it gets messy, and Databricks, I think, correct me if I'm wrong, has begun to solve that problem through a series of tooling and AI. That's what Delta Lake does. It's a managed service. Everybody thought you were going to be like the Cloudera of Spark, and it was a brilliant move to create a managed service. And it's worked great. Now everybody has a managed service. So can you paint a picture at Edmunds as to what you're doing? Maybe take us through your journey from the early days of Hadoop, a data lake, oh, that sounds good, throw it in there. Paint a picture as to how you guys are using data, and then tie it into what Joel just said. >> As Joel said, it simplifies the architecture quite a bit. In a modern enterprise, you have to deal with a variety of different data sources: structured, semi-structured, and unstructured in the form of images and videos. And with Delta Lake and the lakehouse, you can have one system that handles all those data sources.
So what that does is basically remove the issue of multiple systems that you have to administer. It lowers the cost, and it provides consistency. If you have multiple systems that deal with data, the issue always arises as to which data has to be loaded into which system, and then you have issues with consistency. Once you have issues with consistency, business users and analysts will stop trusting your data. So it was very critical for us to unify the system of data handling in one place. >> Additionally, you have massive scalability. I went to the talk from Apple, saying that they can process two years' worth of data instead of just two days. At Edmunds, we have this use case of backfilling the data. Often we change the logic, and then we need to reprocess massive amounts of data. With the lakehouse, we can reprocess months' worth of data in a matter of minutes or hours. Additionally, the data lakehouse is based on open standards like Parquet, which allowed us to basically hook up open source and third-party tools on top of the Delta lakehouse. For example, Amundsen; we use Amundsen for data discovery. And finally, the lakehouse approach allows people with different skill sets to work on the same source data. We have analysts, we have data engineers, we have statisticians and data scientists using their own programming languages, but working on the same core data sets without worrying about duplicating data and consistency issues between the teams. >> So what are the primary use cases where you're using the lakehouse and Delta? >> We have several use cases. One of the more interesting and important use cases is vehicle pricing. You have used Edmunds.
So, you know, you go to our website and you use it to research vehicles, but it turns out that pricing, and knowing whether you're getting a good or bad deal, is critical for our business. So with the lakehouse, we were able to develop a data pipeline that ingests the transactions, curates the transactions, cleans them, and then feeds that curated feed into the machine learning model that is also deployed on the lakehouse. So you have one system that handles this huge complexity. And as you know, it's very hard to find unicorns that know all those technologies, but because we have the flexibility of using Scala, Java, Python, and SQL, we have different people working on different parts of that pipeline on the same system and on the same data. So having the lakehouse really enabled us to be very agile and allowed us to deploy new sources easily when they arrived and fine-tune the model to decrease the error rates for the price prediction. That process is ongoing, and it's a very agile process that takes advantage of the different skill sets of different people on one system. >> Because, you know, you guys democratized car buying, well, at least the data around car buying, because as a consumer now, I know what they're paying and I can go in. Of course, they changed their algorithms as well. I mean, the dealers got really smart, and then they got kickbacks from the manufacturer. So you had to get smarter. So it's a moving target, I guess. >> Right. The pricing is actually very complex. I don't have time to explain it to you, but especially in this crazy inflationary market, where used car prices are like 38% higher year over year and new car prices are like 10% higher, and they're changing rapidly, having a very responsive pricing model is extremely critical. I don't know if you're familiar with Zillow.
I mean, they almost went out of business because they mispriced their, uh, their houses. So if you own their stock, you probably understand it firsthand, but, you know, >>No, but it's true, because my lease came up in the middle of the pandemic and I went to Edmunds to say, what's this car worth? It was worth like $7,000 more than the buyout cost, the residual value. I said, I'm taking it, can't pass up that deal. And so you have to be flexible. You're saying the premise, though, is that open source technology and Delta Lake and the lakehouse enabled that flexibility. >>Yes, we are able to ingest new transactions daily, recalculate our model within less than an hour, and deploy the new model with new pricing, you know, almost in real time. So, uh, in this environment, it's very critical that you keep up to date and ingest the latest transactions as the prices change and recalculate your model that predicts the future prices. >>How do the business lines inside of Edmunds interact with the data teams? You mentioned data engineers, data scientists, analysts. How do the business people get access to their data? >>Originally, we only had a core team that was using the lakehouse, but because the usage was so powerful and easy, we were able to democratize it across our units. So other teams within software engineering picked it up, and then analysts picked it up. And then even business users started using the dashboarding and seeing, you know, how the prices changed over time and seeing other metrics within the, >>What did that do for data quality? Because I feel like if I'm a business person, I might have context on the data that an analyst might not have, if they're part of a team that's servicing all these lines of business. Did you find that the collaboration affected data quality? >>The biggest thing for us was the fact that we don't have multiple systems now. So you don't have to load the data.
Whenever you have to load the data from one system to another, there is always a lag. There's always a delay. There is always a problematic job that didn't do the copy correctly. And the quality is uncertain. You don't know which system tells you the truth. Now we just have one layer of data. Whether you do reports, whether you do data processing, or whether you do modeling, they all read the same data. And the second thing is the dashboarding capabilities. For people who were not very technical, before we could only use Tableau, and Tableau is not the easiest thing to use if you're not technical. Now they can use it. So anyone can see how our pricing data looks, whether you're an executive, whether you're an analyst, or a casual business user. >>But hey, so many questions, you guys are gonna have to come back. I'm gonna run out of time, but you now allow a consumer to buy a car directly. Yes. Right? So that's a new service that you launched. I presume that required new data. >>We give consumers offers. Yes. And that offer, >>You offered to buy my lease. >>Exactly. And that offer leverages the pricing that we develop on top of the lakehouse. So the most important thing is accurately giving you a very good offer price, right? So if we give you a price that's not so good, you're going to go somewhere else. If we give you a price that's too high, we're going to go bankrupt like Zillow did, right? >>So to enable that, you're working off the same dataset. Yes. Did you have to spin up a, did you have to inject new data? Was there a new data source that you were working on? >>Once we curate the data sources and once we clean them, we feed them directly to the model. And all of those components are running on the lakehouse, whether you're curating the data, cleaning it, or running the model. The nice thing about the lakehouse is that machine learning is a first-class citizen.
If you use something like Snowflake, I'm not going to slam Snowflake here, but you >>Have two different use cases. You have >>To, you have to load it into a different system later. You have to load it into a different system. So, like, good luck doing machine learning on Snowflake. Right. >>Whereas Databricks, that's kind of your raison d'etre. >>So what are you, you're a data engineer? I feel like I should be a salesman or something. Yeah. I'm not saying that just because, you know, I was told to. Like, I'm saying it because that's our use case. >>Your use case. So question for each of you: what business results did you see when you went from kind of pre-lakehouse to post-lakehouse? Are there any metrics you can share? And then I wonder, Joel, if you could share sort of more broadly what you're seeing across your customer base. But Greg, what can you tell us? >>Well, uh, before the lakehouse, we had two different systems. We had one for processing, which was still Databricks, and the second one for serving, and we iterated over Netezza or Redshift. But we figured that maintaining two different systems and loading data from one to the other was a huge overhead: administration, security, costs. Um, plus the fact that you had consistency issues. So the fact that you can have one system, um, with, uh, centralized data solves all those issues. You only have to have one security mechanism, one administrative mechanism, and you don't have to load the data from one system to the other. You don't have to make compromises. >>And scale is not a problem because of the cloud? >>Because you can spin up clusters at will for different use cases. So your clusters are independent. You have processing clusters that are not affecting your serving clusters. So, um, in the past, if you were running serving, say, on Netezza or Redshift, if you were doing heavy processing, your reports would be affected, but now all those clusters are separated.
So >>A data consumer can take that data from the producer independently, >>Using its own cluster. Okay. >>Yeah. I'll give you the final word, Joel. I know it's been, I said, you guys got to come back. What have you seen broadly? >>Yeah. Well, I mean, I think Greg's point about scale is an interesting one. So if you look across the entire Databricks platform, the platform is launching 9 million VMs every day. Um, and we're in total processing over nine exabytes a month. So in terms of just how much data the platform is able to have flow through it, uh, and still maintain extremely high performance, it is bar none out there. And then, if you look at just kind of the macro environment of what's happening out there, you know, I think what's been most exciting to watch is what customers are experiencing on the traditional data warehouse kinds of workloads, because I think that's where the promise of the lakehouse really comes into its own: saying, yes, I can run these traditional data warehousing workloads that require high concurrency, high scale, and high performance directly on my data lake. >>And, uh, I think probably the two most salient data points to raise up there are, uh, first, just last month, Databricks announced it set the world record for the, uh, TPC-DS 100 terabyte benchmark. So that is a place where, with Databricks and the lakehouse architecture, that benchmark is built to measure data warehouse performance, and the lakehouse beat data warehouses at their own game in terms of overall performance. And then what that means from a price performance standpoint is that customers on Databricks right now are able to enjoy that level of performance at 12X better price performance than what cloud data warehouses provide. So not only are we jumping on this extremely high scale and performance, but we're able to do it much, much more efficiently.
>>We're gonna need a whole nother segment to talk about benchmarking, guys. Thanks so much, really interesting session, and thank you and best of luck to both. Enjoy the show. Thank you for having us. Very welcome. Okay. Keep it right there, everybody. You're watching theCUBE, the leader in high-tech coverage, at AWS re:Invent 2021.
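
The pricing pipeline Greg describes in the interview (ingest transactions, curate them, clean them, feed the curated data to a model, and recalculate daily) can be sketched in a few lines. This is a minimal illustration in plain Python under stated assumptions, not Edmunds' or Databricks' implementation: the `Transaction` fields, the function names, the sample rows, and the average-price "model" are all hypothetical stand-ins chosen for the example.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional


@dataclass(frozen=True)
class Transaction:
    # Hypothetical record shape; the interview does not describe the real schema.
    vin: str
    model: str
    price: Optional[float]


def ingest(raw_rows):
    """Parse raw dict rows into Transaction records."""
    return [Transaction(r["vin"], r["model"], r.get("price")) for r in raw_rows]


def curate(transactions):
    """Keep one record per VIN; a later row replaces an earlier one."""
    latest = {}
    for t in transactions:
        latest[t.vin] = t
    return list(latest.values())


def clean(transactions):
    """Drop rows with a missing or non-positive price."""
    return [t for t in transactions if t.price is not None and t.price > 0]


def fit_price_model(transactions):
    """Toy stand-in for the ML model: average price per vehicle model."""
    by_model = {}
    for t in transactions:
        by_model.setdefault(t.model, []).append(t.price)
    return {name: mean(prices) for name, prices in by_model.items()}


def daily_refresh(raw_rows):
    """Run every stage against the same data, as a single-system pipeline would."""
    return fit_price_model(clean(curate(ingest(raw_rows))))


# Made-up sample data for illustration only.
RAW_ROWS = [
    {"vin": "A1", "model": "sedan", "price": 20000.0},
    {"vin": "A1", "model": "sedan", "price": 21000.0},  # later correction for A1
    {"vin": "B2", "model": "sedan", "price": 19000.0},
    {"vin": "C3", "model": "truck", "price": None},     # removed by clean()
    {"vin": "D4", "model": "truck", "price": 30000.0},
]

if __name__ == "__main__":
    print(daily_refresh(RAW_ROWS))  # {'sedan': 20000.0, 'truck': 30000.0}
```

The point of the sketch is only the shape: each stage reads the output of the previous one with no copy into a second system, which is the property Greg credits for the consistency and fast daily recalculation.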

Published Date : Nov 30 2021


ENTITIES

Entity	Category	Confidence
Joel	PERSON	0.99+
Greg	PERSON	0.99+
Joel Minnick	PERSON	0.99+
$7,000	QUANTITY	0.99+
Greg Rokita	PERSON	0.99+
38%	QUANTITY	0.99+
two days	QUANTITY	0.99+
10%	QUANTITY	0.99+
Java	TITLE	0.99+
Databricks	ORGANIZATION	0.99+
two years	QUANTITY	0.99+
one system	QUANTITY	0.99+
one	QUANTITY	0.99+
Scala	TITLE	0.99+
apple	ORGANIZATION	0.99+
Python	TITLE	0.99+
SQL	TITLE	0.99+
ninth year	QUANTITY	0.99+
last month	DATE	0.99+
lake house	ORGANIZATION	0.99+
two different systems	QUANTITY	0.99+
Tableau	TITLE	0.99+
2021	DATE	0.99+
9 million VMs	QUANTITY	0.99+
second thing	QUANTITY	0.99+
less than an hour	QUANTITY	0.99+
Lakehouse	ORGANIZATION	0.98+
12 X	QUANTITY	0.98+
Delta	ORGANIZATION	0.98+
Delta lake house	ORGANIZATION	0.98+
one layer	QUANTITY	0.98+
one common platform	QUANTITY	0.98+
both	QUANTITY	0.97+
AWS	ORGANIZATION	0.97+
Zillow	ORGANIZATION	0.97+
Brittany Britain	PERSON	0.97+
Edmunds.com	ORGANIZATION	0.97+
two different system	QUANTITY	0.97+
Edmonds	ORGANIZATION	0.97+
over nine exabytes a month	QUANTITY	0.97+
today	DATE	0.96+
Lakehouse Delta	ORGANIZATION	0.96+
Delta lake	ORGANIZATION	0.95+
Trisha	PERSON	0.95+
data lake	ORGANIZATION	0.94+
Mattson	ORGANIZATION	0.92+
second segment	QUANTITY	0.92+
each	QUANTITY	0.92+
Matson	ORGANIZATION	0.91+
two most salient data points	QUANTITY	0.9+
Edmonds	LOCATION	0.89+
100 terabyte	QUANTITY	0.87+
one single source	QUANTITY	0.86+
first class	QUANTITY	0.85+
Nateeza	TITLE	0.85+
one security	QUANTITY	0.85+
Redshift	TITLE	0.84+


Search Results for Lakehouse: