Breaking Analysis: Databricks faces critical strategic decisions…here’s why

>> From theCUBE Studios in Palo Alto and Boston, bringing you data-driven insights from theCUBE and ETR. This is Breaking Analysis with Dave Vellante. >> Spark became a top level Apache project in 2014, and then shortly thereafter, burst onto the big data scene. Spark, along with the cloud, transformed and in many ways, disrupted the big data market. Databricks optimized its tech stack for Spark and took advantage of the cloud to really cleverly deliver a managed service that has become a leading AI and data platform among data scientists and data engineers. However, emerging customer data requirements are shifting into a direction that will cause modern data platform players generally and Databricks, specifically, we think, to make some key directional decisions and perhaps even reinvent themselves. Hello and welcome to this week's Wikibon theCUBE Insights, powered by ETR. In this Breaking Analysis, we're going to do a deep dive into Databricks. We'll explore its current impressive market momentum. We're going to use some ETR survey data to show that, and then we'll lay out how customer data requirements are changing and what the ideal data platform will look like in the midterm future. We'll then evaluate core elements of the Databricks portfolio against that vision, and then we'll close with some strategic decisions that we think the company faces. And to do so, we welcome in our good friend, George Gilbert, former equities analyst, market analyst, and current Principal at TechAlpha Partners. George, good to see you. Thanks for coming on. >> Good to see you, Dave. >> All right, let me set this up. We're going to start by taking a look at where Databricks sits in the market in terms of how customers perceive the company and what its momentum looks like. And this chart that we're showing here is data from ETS, the emerging technology survey of private companies. The N is 1,421. 
What we did is we cut the data on three sectors, analytics, database-data warehouse, and AI/ML. The vertical axis is a measure of customer sentiment, which evaluates an IT decision maker's awareness of the firm and the likelihood of engaging and/or purchase intent. The horizontal axis shows mindshare in the dataset, and we've highlighted Databricks, which has been a consistent high performer in this survey over the last several quarters. And by the way, just as an aside, as we previously reported, OpenAI, which burst onto the scene this past quarter, leads all names, but Databricks is still prominent. You can see that ETR shows some open source tools for reference, but as far as firms go, Databricks is very impressively positioned. Now, let's see how they stack up to some mainstream cohorts in the data space, against some bigger companies and sometimes public companies. This chart shows net score on the vertical axis, which is a measure of spending momentum, and pervasiveness in the data set on the horizontal axis. You can see that chart inset in the upper right that informs how the dots are plotted, net score against shared N. And that red dotted line at 40% indicates a highly elevated net score, anything above that we think is really, really impressive. And here we're just comparing Databricks with Snowflake, Cloudera, and Oracle. And that squiggly line leading to Databricks shows their path since 2021 by quarter. And you can see it's performing extremely well, maintaining an elevated net score in that range. Now it's comparable in the vertical axis to Snowflake, and it consistently is moving to the right and gaining share. Now, why did we choose to show Cloudera and Oracle? The reason is that Cloudera got the whole big data era started and was disrupted by Spark and, of course, the cloud, Spark and Databricks. And Oracle, in many ways, was the target of early big data players like Cloudera. Take a listen to Cloudera CEO at the time, Mike Olson. 
This is back in 2010, first year of theCUBE, play the clip. >> Look, back in the day, if you had a data problem, if you needed to run business analytics, you wrote the biggest check you could to Sun Microsystems, and you bought a great big, single box, central server, and any money that was left over, you handed to Oracle for database licenses and you installed that database on that box, and that was where you went for data. That was your temple of information. >> Okay? So Mike Olson implied that monolithic model was too expensive and inflexible, and Cloudera set out to fix that. But the best laid plans, as they say, George, what do you make of the data that we just shared? >> So where Databricks has really come up out of sort of Cloudera's tailpipe was they took big data processing, made it coherent, made it a managed service so it could run in the cloud. So it relieved customers of the operational burden. Where they're really strong and where their traditional meat and potatoes or bread and butter is the predictive and prescriptive analytics, that is, building and training and serving machine learning models. They've tried to move into traditional business intelligence, the more traditional descriptive and diagnostic analytics, but they're less mature there. So what that means is, the reason you see Databricks and Snowflake kind of side by side is there are many, many accounts that have both Snowflake for business intelligence, Databricks for AI machine learning. Where Snowflake, I'm sorry, where Databricks also did really well was in core data engineering, refining the data, the old ETL process, which kind of turned into ELT, where you load it into the analytic repository in raw form and refine it. And so people have really used both, and each is trying to get into the other. >> Yeah, absolutely. We've reported on this quite a bit. Snowflake, kind of moving into the domain of Databricks and vice versa. 
And the last bit of ETR evidence that we want to share in terms of the company's momentum comes from ETR's Round Tables. They're run by Erik Bradley, and now former Gartner analyst and George, your colleague back at Gartner, Daren Brabham. And what we're going to show here is some direct quotes of IT pros in those Round Tables. There's a data science head and a CIO as well. Just make a few call outs here, we won't spend too much time on it, but starting at the top, like all of us, we can't talk about Databricks without mentioning Snowflake. Those two get us excited. Second comment zeros in on the flexibility and the robustness of Databricks from a data warehouse perspective. And then the last point is, despite competition from cloud players, Databricks has reinvented itself a couple of times over the years. And George, we're going to lay out today a scenario that perhaps calls for Databricks to do that once again. >> Their big opportunity and their big challenge, like for every tech company, is managing a technology transition. The transition that we're talking about is something that's been bubbling up, but it's really epochal. First time in 60 years, we're moving from an application-centric view of the world to a data-centric view, because decisions are becoming more important than automating processes. So let me let you sort of develop. >> Yeah, so let's talk about that here. We're going to put up some bullets on precisely that point and the changing sort of customer environment. So you got IT stacks are shifting, as George just said, from application centric silos to data centric stacks where the priority is shifting from automating processes to automating decisions. You know, look at RPA and there's still a lot of automation going on, but from the focus of that application centricity and the data locked into those apps, that's changing. 
Data has historically been on the outskirts in silos, but organizations, you think of Amazon, think Uber, Airbnb, they're putting data at the core, and logic is increasingly being embedded in the data instead of the reverse. In other words, today, the data's locked inside the app, which is why you need to extract that data and stick it into a data warehouse. The point, George, is we're putting forth this new vision for how data is going to be used. And you've used this Uber example to underscore the future state. Please explain. >> Okay, so this is hopefully an example everyone can relate to. The idea is first, you're automating things that are happening in the real world and decisions that make those things happen autonomously without humans in the loop all the time. So to use the Uber example on your phone, you call a car, you call a driver. Automatically, the Uber app then looks at what drivers are in the vicinity, what drivers are free, matches one, calculates an ETA to you, calculates a price, calculates an ETA to your destination, and then directs the driver once they're there. The point of this is that that cannot happen in an application-centric world very easily because all these little apps, the drivers, the riders, the routes, the fares, those call on data locked up in many different apps, but they have to sit on a layer that makes it all coherent. >> But George, so if Uber's doing this, doesn't this tech already exist? Isn't there a tech platform that does this already? >> Yes, and the mission of the entire tech industry is to build services that make it possible to compose and operate similar platforms and tools, but with the skills of mainstream developers in mainstream corporations, not the rocket scientists at Uber and Amazon. >> Okay, so we're talking about horizontally scaling across the industry, and actually giving a lot more organizations access to this technology. 
So by way of review, let's summarize the trend that's going on today in terms of the modern data stack that is propelling the likes of Databricks and Snowflake, which we just showed you in the ETR data and is really a tailwind for them. So the trend is toward this common repository for analytic data, that could be multiple virtual data warehouses inside of Snowflake, but you're in that Snowflake environment or Lakehouses from Databricks or multiple data lakes. And we've talked about what JP Morgan Chase is doing with the data mesh and gluing data lakes together, you've got various public clouds playing in this game, and then the data is annotated to have a common meaning. In other words, there's a semantic layer that enables applications to talk to the data elements and know that they have common and coherent meaning. So George, the good news is this approach is more effective than the legacy monolithic models that Mike Olson was talking about, so what's the problem with this in your view? >> So today's data platforms added immense value 'cause they connected the data that was previously locked up in these monolithic apps or on all these different microservices, and that supported traditional BI and AI/ML use cases. But now, if we want to build apps like Uber or Amazon.com, where they've got essentially an autonomously running supply chain and e-commerce app where humans only care for and feed it, but the thing itself is figuring out what to buy, when to buy, where to deploy it, when to ship it, we need a semantic layer on top of the data. So that, as you were saying, the data that's coming from all those different apps is integrated, not just connected, and it means the same thing. And the issue is whenever you add a new layer to a stack to support new applications, there are implications for the already existing layers, like can they support the new layer and its use cases? 
So for instance, if you add a semantic layer that embeds app logic with the data rather than vice versa, which we've been talking about and that's been the case for 60 years, then the new data layer faces the challenge that the way you manage that data, the way you analyze that data, is not supported by today's tools. >> Okay, so actually Alex, bring me up that last slide if you would, I mean, you're basically saying at the bottom here, today's repositories don't really do joins at scale. The future is you're talking about hundreds or thousands or millions of data connections, and today's systems, we're talking about, I don't know, 6, 8, 10 joins and that is the fundamental problem. You're saying a new data era is coming and existing systems won't be able to handle it? >> Yeah, one way of thinking about it is that even though we call them relational databases, when we actually want to do lots of joins or when we want to analyze data from lots of different tables, we created a whole new industry for analytic databases where you sort of munge the data together into fewer tables. So you didn't have to do as many joins because the joins are difficult and slow. And when you're going to arbitrarily join thousands, hundreds of thousands or across millions of elements, you need a new type of database. We have them, they're called graph databases, but to query them, you go back to the prerelational era in terms of their usability. >> Okay, so we're going to come back to that and talk about how you get around that problem. But let's first lay out what the ideal data platform of the future we think looks like. And again, we're going to come back to use this Uber example. In this graphic that George put together, awesome. We got three layers. The application layer is where the data products reside. The example here is drivers, rides, maps, routes, ETA, et cetera. The digital version of what we were talking about in the previous slide, people, places and things. 
The next layer is the data layer, that breaks down the silos and connects the data elements through semantics and everything is coherent. And then at the bottom layer, the legacy operational systems feed that data layer. George, explain what's different here, the graph database element, you talk about the relational query capabilities, and why can't I just throw memory at solving this problem? >> Some of the graph databases do throw memory at the problem and maybe without naming names, some of them live entirely in memory. And what you're dealing with is a prerelational in-memory database system where you navigate between elements, and the issue with that is we've had SQL for 50 years, so we don't have to navigate, we can say what we want without saying how to get it. That's the core of the problem. >> Okay. So if I may, I just want to drill into this a little bit. So you're talking about the expressiveness of a graph. Alex, if you'd bring that back out, the fourth bullet, expressiveness of a graph database with the relational ease of query. Can you explain what you mean by that? >> Yeah, so graphs are great because you can describe anything with a graph, that's why they're becoming so popular. Expressive means you can represent anything easily. They're conducive to, you might say, in a world where we now want like the metaverse, like with a 3D world, and I don't mean the Facebook metaverse, I mean like the business metaverse when we want to capture data about everything, but we want it in context, we want to build a set of digital twins that represent everything going on in the world. And Uber is a tiny example of that. Uber built a graph to represent all the drivers and riders and maps and routes. But what you need out of a database isn't just a way to store stuff and update stuff. You need to be able to ask questions of it, you need to be able to query it. And if you go back to prerelational days, you had to know how to find your way to the data. 
It's sort of like when you give directions to someone and they didn't have a GPS system and a mapping system, you had to give them turn by turn directions. Whereas when you have a GPS and a mapping system, which is like the relational thing, you just say where you want to go, and it spits out the turn by turn directions, which let's say, the car might follow or whoever you're directing would follow. But the point is, it's much easier in a relational database to say, "I just want to get these results. You figure out how to get it." The graph database, they have not taken over the world because in some ways, it's taking a 50 year leap backwards. >> Alright, got it. Okay. Let's take a look at how the current Databricks offerings map to that ideal state that we just laid out. So to do that, we put together this chart that looks at the key elements of the Databricks portfolio, the core capability, the weakness, and the threat that may loom. Start with the Delta Lake, that's the storage layer, which is great for files and tables. It's got true separation of compute and storage, I want you to double click on that George, as independent elements, but it's weaker for the type of low latency ingest that we see coming in the future. And some of the threats highlighted here. AWS could add transactional tables to S3, Iceberg adoption is picking up and could accelerate, that could disrupt Databricks. George, add some color here please? >> Okay, so this is the sort of a classic competitive forces where you want to look at, so what are customers demanding? What's competitive pressure? What are substitutes? Even what your suppliers might be pushing. Here, Delta Lake is at its core, a set of transactional tables that sit on an object store. So think of it in a database system, this is the storage engine. So since S3 has been getting stronger for 15 years, you could see a scenario where they add transactional tables. 
We have an open source alternative in Iceberg, which Snowflake and others support. But at the same time, Databricks has built an ecosystem out of tools, their own and others, that read and write to Delta tables, that's what makes the Delta Lake an ecosystem. So they have a catalog, the whole machine learning tool chain talks directly to the data here. That was their great advantage because in the past with Snowflake, you had to pull all the data out of the database before the machine learning tools could work with it, that was a major shortcoming. They fixed that. But the point here is that even before we get to the semantic layer, the core foundation is under threat. >> Yep. Got it. Okay. We got a lot of ground to cover. So we're going to take a look at the Spark Execution Engine next. Think of that as the refinery that runs really efficient batch processing. That's kind of what disrupted Hadoop in a large way, but it's not Python friendly and that's an issue because the data science and the data engineering crowd are moving in that direction, and/or they're using dbt. George, we had Tristan Handy on at Supercloud, really interesting discussion that you and I did. Explain why this is an issue for Databricks? >> So once the data lake was in place, what people did was they refined their data in batch, and Spark has always had streaming support and it's gotten better. The underlying storage as we've talked about is an issue. But basically they took raw data, then they refined it into tables that were like customers and products and partners. And then they refined that again into what was like gold artifacts, which might be business intelligence metrics or dashboards, which were collections of metrics. But they were running it on the Spark Execution Engine, which it's a Java-based engine or it's running on a Java-based virtual machine, which means all the data scientists and the data engineers who want to work with Python are really working in sort of oil and water. 
Like if you get an error in Python, you can't tell whether the problem's in Python or whether it's in Spark. There's just an impedance mismatch between the two. And then at the same time, the whole world is now gravitating towards dbt because it's a very nice and simple way to compose these data processing pipelines, and people are using either SQL in dbt or Python in dbt, and that kind of is a substitute for doing it all in Spark. So it's under threat even before we get to that semantic layer. It so happens that dbt itself is becoming the authoring environment for the semantic layer with business intelligence metrics. But that's again, this is the second element that's under direct substitution and competitive threat. >> Okay, let's now move down to the third element, which is Photon. Photon is Databricks' BI Lakehouse, which has integration with the Databricks tooling, which is very rich, it's newer. And it's also not well suited for high concurrency and low latency use cases, which we think are going to increasingly become the norm over time. George, the call out threat here is customers want to connect everything to a semantic layer. Explain your thinking here and why this is a potential threat to Databricks? >> Okay, so two issues here. What you were touching on, which is the high concurrency, low latency, when people are running like thousands of dashboards and data is streaming in, that's a problem because a SQL data warehouse, the query engine, something like that matures over five to 10 years. It's one of these things, the joke that Andy Jassy makes just in general, he's really talking about Azure, but there's no compression algorithm for experience. The Snowflake guys started more than five years earlier, and for a bunch of reasons, that lead is not something that Databricks can shrink. They'll always be behind. So that's why Snowflake has transactional tables now and we can get into that in another show. 
But the key point is, so near term, it's struggling to keep up with the use cases that are core to business intelligence, which is highly concurrent, lots of users doing interactive query. But then when you get to a semantic layer, that's when you need to be able to query data that might have thousands or tens of thousands or hundreds of thousands of joins. And a SQL query engine, a traditional SQL query engine, is just not built for that. That's the core problem of traditional relational databases. >> Now this is a quick aside. We always talk about Snowflake and Databricks in sort of the same context. We're not necessarily saying that Snowflake is in a position to tackle all these problems. We'll deal with that separately. So we don't mean to imply that, but we're just sort of laying out some of the things that Snowflake or rather Databricks customers we think, need to be thinking about and having conversations with Databricks about and we hope to have them as well. We'll come back to that in terms of sort of strategic options. But finally, when we come back to the table, we have Databricks' AI/ML Tool Chain, which has been an awesome capability for the data science crowd. It's comprehensive, it's a one-stop shop solution, but the kicker here is that it's optimized for supervised model building. And the concern is that foundational models like GPT could cannibalize the current Databricks tooling, but George, can't Databricks, like other software companies, integrate foundation model capabilities into its platform? >> Okay, so the sound bite answer to that is sure, IBM 3270 terminals could call out to a graphical user interface when they're running on the XT terminal, but they're not exactly good citizens in that world. The core issue is Databricks has this wonderful end-to-end tool chain for training, deploying, monitoring, running inference on supervised models. 
But the paradigm there is the customer builds and trains and deploys each model for each feature or application. In a world of foundation models which are pre-trained and unsupervised, the entire tool chain is different. So it's not like Databricks can junk everything they've done and start over with all their engineers. They have to keep maintaining what they've done in the old world, but they have to build something new that's optimized for the new world. It's a classic technology transition and their mentality appears to be, "Oh, we'll support the new stuff from our old stuff." Which is suboptimal, and as we'll talk about, their biggest patron and the company that put them on the map, Microsoft, really stopped working on their old stuff three years ago so that they could build a new tool chain optimized for this new world. >> Yeah, and so let's sort of close with what we think the options are and decisions that Databricks has for its future architecture. They're smart people. I mean we've had Ali Ghodsi on many times, super impressive. I think they've got to be keenly aware of the limitations, what's going on with foundation models. But at any rate, here in this chart, we lay out sort of three scenarios. One is re-architect the platform by incrementally adopting new technologies. An example might be to layer a graph query engine on top of its stack. They could license key technologies like graph database, they could get aggressive on M&A and buy in relational knowledge graphs, semantic technologies, vector database technologies. George, as David Floyer always says, "A lot of ways to skin a cat." We've seen companies like, even think about EMC, maintain their relevance through M&A for many, many years. George, give us your thought on each of these strategic options? >> Okay, I find this question the most challenging 'cause remember, I used to be an equity research analyst. 
I worked for Frank Quattrone, we were one of the top tech shops in the banking industry, although this is 20 years ago. But the M&A team was the top team in the industry and everyone wanted them on their side. And I remember going to meetings with these CEOs, where Frank and the bankers would say, "You want us for your M&A work because we can do better." And they really could do better. But in software, it's not like with EMC in hardware because with hardware, it's easier to connect different boxes. With software, the whole point of a software company is to integrate and architect the components so they fit together and reinforce each other, and that makes M&A harder. You can do it, but it takes a long time to fit the pieces together. Let me give you examples. If they put a graph query engine, let's say something like TinkerPop, on top of, I don't even know if it's possible, but let's say they put it on top of Delta Lake, then you have this graph query engine talking to their storage layer, Delta Lake. But if you want to do analysis, you got to put the data in Photon, which is not really ideal for highly connected data. If you license a graph database, then most of your data is in the Delta Lake and how do you sync it with the graph database? If you do sync it, you've got data in two places, which kind of defeats the purpose of having a unified repository. I find this semantic layer option in number three actually more promising, because that's something that you can layer on top of the storage layer that you have already. You just have to figure out then how to have your query engines talk to that. What I'm trying to highlight is, it's easy as an analyst to say, "You can buy this company or license that technology." But the really hard work is making it all work together and that is where the challenge is. >> Yeah, and well look, I thank you for laying that out. We've seen it, certainly Microsoft and Oracle. 
I guess you might argue that well, Microsoft had a monopoly in its desktop software and was able to throw off cash for a decade plus while its stock was going sideways. Oracle had won the database wars and had amazing margins and cash flow to be able to do that. Databricks hasn't even gone public yet, but I want to close with some of the players to watch. Alex, if you'd bring that back up, number four here. AWS, we talked about some of their options with S3 and it's not just AWS, it's blob storage, object storage. Microsoft, as you sort of alluded to, was an early go-to market channel for Databricks. We didn't address that really. So maybe in the closing comments we can. Google obviously, Snowflake of course, we're going to dissect their options in future Breaking Analysis. dbt Labs, where do they fit? Bob Muglia's company, Relational.ai, why are these players to watch, George, in your opinion? >> So everyone is trying to assemble and integrate the pieces that would make building data applications, data products easy. And the critical part isn't just assembling a bunch of pieces, which is traditionally what AWS did. It's a Unix ethos, which is we give you the tools, you put 'em together, 'cause you then have the maximum choice and maximum power. So what the hyperscalers are doing is they're taking their key value stores, in the case of AWS it's DynamoDB, in the case of Azure it's Cosmos DB, and each is putting a graph query engine on top of those. So they have a unified storage and graph database engine, like all the data would be collected in the key value store. Then you have a graph database, that's how they're going to be presenting a foundation for building these data apps. dbt Labs is putting a semantic layer on top of data lakes and data warehouses and as we'll talk about, I'm sure in the future, that makes it easier to swap out the underlying data platform or swap in new ones for specialized use cases. 
Snowflake, what they're doing, they're so strong in data management and with their transactional tables, what they're trying to do is take in the operational data that used to be in the province of many state stores like MongoDB and say, "If you manage that data with us, it'll be connected to your analytic data without having to send it through a pipeline." And that's hugely valuable. Relational.ai is the wildcard, 'cause what they're trying to do, it's almost like a holy grail where you're trying to take the expressiveness of connecting all your data in a graph but making it as easy to query as you've always had it in a SQL database or I should say, in a relational database. And if they do that, it's sort of like, it'll be as easy to program these data apps as a spreadsheet was compared to procedural languages, like BASIC or Pascal. That's the implications of Relational.ai. >> Yeah, and again, we talked before, why can't you just throw this all in memory? We're talking in that example of really getting down to differences in how you lay the data out on disk in really, new database architecture, correct? >> Yes. And that's why it's not clear that you could take a data lake or even a Snowflake and why you can't put a relational knowledge graph on those. You could potentially put a graph database, but it'll be compromised because to really do what Relational.ai has done, which is the ease of Relational on top of the power of graph, you actually need to change how you're storing your data on disk or even in memory. So you can't, in other words, it's not like, oh we can add graph support to Snowflake, 'cause if you did that, you'd have to change, or in your data lake, you'd have to change how the data is physically laid out. And then that would break all the tools that talk to that currently. >> What in your estimation, is the timeframe where this becomes critical for a Databricks and potentially Snowflake and others? 
I mentioned earlier midterm, are we talking three to five years here? Are we talking end of decade? What's your radar say? >> I think something surprising is going on that's going to sort of come up the tailpipe and take everyone by storm. All the hype around business intelligence metrics, which are what we used to put in our dashboards, bookings, billings, revenue, customer, those things, those were the key artifacts that used to live in definitions in your BI tools, and dbt has basically created a standard for defining those so they live in your data pipeline or they're defined in their data pipeline and executed in the data warehouse or data lake in a shared way, so that all tools can use them. This sounds like a digression, it's not. All this stuff about data mesh, data fabric, all that's going on is we need a semantic layer and the business intelligence metrics are defining common semantics for your data. And I think we're going to find by the end of this year, that metrics are how we annotate all our analytic data to start adding common semantics to it. And we're going to find this semantic layer, it's not three to five years off, it's going to be staring us in the face by the end of this year. >> Interesting. And of course SVB today was shut down. We're seeing serious tech headwinds, and oftentimes in these sort of downturns or flat turns, which feels like this could be going on for a while, we emerge with a lot of new players and a lot of new technology. George, we got to leave it there. Thank you to George Gilbert for excellent insights and input for today's episode. I want to thank Alex Myerson who's on production and manages the podcast, of course Ken Schiffman as well. Kristin Martin and Cheryl Knight help get the word out on social media and in our newsletters. And Rob Hof is our EIC over at Siliconangle.com, he does some great editing. Remember all these episodes, they're available as podcasts. 
Wherever you listen, all you got to do is search Breaking Analysis Podcast, we publish each week on wikibon.com and siliconangle.com, or you can email me at David.Vellante@siliconangle.com, or DM me @DVellante. Comment on our LinkedIn post, and please do check out ETR.ai, great survey data, enterprise tech focus, phenomenal. This is Dave Vellante for theCUBE Insights powered by ETR. Thanks for watching, and we'll see you next time on Breaking Analysis.
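
Editor's sketch: George's point about the semantic layer is that a metric such as revenue or bookings is defined once, next to the data, and every tool asks for it by name instead of re-deriving it. The following is a minimal illustration of that idea only. It assumes a hypothetical `Metric` registry and a toy `orders` table in SQLite; it is not dbt's actual syntax or API.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class Metric:
    """Hypothetical shared metric definition (not dbt's real format)."""
    name: str
    expression: str  # SQL aggregate that defines the metric
    table: str       # table the metric is computed from


# One shared place where the business definitions live.
METRICS = {
    "revenue": Metric("revenue", "SUM(amount)", "orders"),
    "bookings": Metric("bookings", "COUNT(*)", "orders"),
}


def compile_metric(name, group_by=None):
    """Turn a metric name into SQL; every tool gets the same definition.

    group_by is interpolated directly, so this sketch assumes trusted input.
    """
    m = METRICS[name]
    if group_by:
        return (
            f"SELECT {group_by}, {m.expression} AS {m.name} "
            f"FROM {m.table} GROUP BY {group_by} ORDER BY {group_by}"
        )
    return f"SELECT {m.expression} AS {m.name} FROM {m.table}"


def demo():
    """Two 'tools' (a total and a per-region view) reuse one definition."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (region TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [("east", 100.0), ("west", 250.0), ("east", 50.0)],
    )
    total = conn.execute(compile_metric("revenue")).fetchone()[0]
    by_region = conn.execute(compile_metric("revenue", "region")).fetchall()
    conn.close()
    return total, by_region
```

Because both queries are compiled from the same entry in `METRICS`, a change to the definition of revenue reaches every consumer at once, which is the "common semantics" argument made in the conversation.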

Published Date : Mar 10 2023


ENTITIES

Entity	Category	Confidence
Alex Myerson	PERSON	0.99+
David Floyer	PERSON	0.99+
Mike Olson	PERSON	0.99+
2014	DATE	0.99+
George Gilbert	PERSON	0.99+
Dave Vellante	PERSON	0.99+
George	PERSON	0.99+
Cheryl Knight	PERSON	0.99+
Ken Schiffman	PERSON	0.99+
Andy Jassy	PERSON	0.99+
Oracle	ORGANIZATION	0.99+
Amazon	ORGANIZATION	0.99+
Erik Bradley	PERSON	0.99+
Dave	PERSON	0.99+
Uber	ORGANIZATION	0.99+
thousands	QUANTITY	0.99+
Sun Microsystems	ORGANIZATION	0.99+
50 years	QUANTITY	0.99+
AWS	ORGANIZATION	0.99+
Bob Muglia	PERSON	0.99+
Gartner	ORGANIZATION	0.99+
Airbnb	ORGANIZATION	0.99+
60 years	QUANTITY	0.99+
Microsoft	ORGANIZATION	0.99+
Ali Ghodsi	PERSON	0.99+
2010	DATE	0.99+
Databricks	ORGANIZATION	0.99+
Kristin Martin	PERSON	0.99+
Rob Hof	PERSON	0.99+
three	QUANTITY	0.99+
15 years	QUANTITY	0.99+
Databricks'	ORGANIZATION	0.99+
two places	QUANTITY	0.99+
Boston	LOCATION	0.99+
Tristan Handy	PERSON	0.99+
M&A	ORGANIZATION	0.99+
Frank Quattrone	PERSON	0.99+
second element	QUANTITY	0.99+
Daren Brabham	PERSON	0.99+
TechAlpha Partners	ORGANIZATION	0.99+
third element	QUANTITY	0.99+
Snowflake	ORGANIZATION	0.99+
50 year	QUANTITY	0.99+
40%	QUANTITY	0.99+
Cloudera	ORGANIZATION	0.99+
Palo Alto	LOCATION	0.99+
five years	QUANTITY	0.99+

Google's PoV on Confidential Computing

>> Welcome Nelly and Patricia, great to have you. >> Great to be here. >> Thank you so much for having us. >> You're very welcome. Nelly, why don't you start, and then Patricia you can weigh in. Just tell the audience a little bit about each of your roles at Google Cloud. >> So I'll start. I'm owning a lot of interesting activities in Google, and again, security, or infrastructure security, is what I usually own. We're talking about encryption, end-to-end encryption, and confidential computing is a part of the portfolio. An additional area that I contribute to, together with my team, for Google and our customers is the secure software supply chain, because you need to trust your software. Is it operating in your confidential environment? Having an end-to-end story about whether you can believe that your software and your environment are doing what you expect, that's my role. >> Got it, okay. Patricia? >> Well, I am a technical director in the office of the CTO, OCTO for short, in Google Cloud. And we are a global team. We include former CTOs like myself and senior technologists from large corporations, institutions, and a lot of successful startups as well. And we have two main goals. First, we work side by side with some of our largest, most strategic customers and we help them solve complex engineering and technical problems. And second, we advise Google and Google Cloud engineering and product management on emerging trends in technologies to guide the trajectory of our business. We are a unique group, I think, because we have created this collaborative culture with our customers. And within OCTO I spend a lot of time collaborating with customers and the industry at large on technologies that can address privacy, security, and sovereignty of data in general. >> Excellent, thank you for that, both of you. Let's get into it. So Nelly, what is confidential computing from Google's perspective? How do you define it? >> Confidential computing is a tool.
And it's one of the tools in our toolbox. And confidential computing is a way we help our customers complete this very interesting end-to-end lifecycle of their data. When customers bring their data to the Cloud and want to protect it, they protect it as they ingest it into the Cloud, and they protect it at rest when they store data in the Cloud. But what was missing for many, many years was the ability for us to continue protecting the data and workloads of our customers while they are running them. And again, because data is not brought to the Cloud to sit in a huge graveyard, we need to ensure that this data is actually indexed, that there are some insights derived and drawn from this data. You have to process this data, and confidential computing is here to help. Now we have end-to-end protection of our customers' data when they bring their workloads and data to the Cloud, thanks to confidential computing. >> Thank you for that. Okay, we're going to get into the architecture a bit, but before we do, Patricia, why do you think this topic of confidential computing is such an important technology? Can you explain, do you think it's transformative for customers and if so, why? >> Yeah, I would maybe like to use one thought, one way, one intuition behind why confidential computing matters. Because at the end of the day it reduces more and more the customer's trust boundaries and the attack surface. It's about reducing that periphery, the boundary, in which the customer needs to mind about trust and safety. And in a way it is a natural progression: you're using encryption to secure and protect data in the same way that we are encrypting data in transit and at rest. Now we are also encrypting data while in use. And among other benefits, I would say one of the most transformative ones is that organizations will be able to collaborate with each other and retain the confidentiality of the data. And that is across industries.
Even though it's highly focused on, I wouldn't say highly focused, but very beneficial for highly regulated industries, it applies to all industries. And if you look at financing, for example, where bankers are trying to detect fraud, and specifically double financing, where a customer is actually trying to get financing on an asset, let's say a boat or a house, and then goes to another bank and gets another financing on that asset. Now bankers would be able to collaborate and detect fraud while preserving confidentiality and privacy of the data. >> Interesting, and I want to understand that a little bit more, but I'm going to push you a little bit on this, Nelly, if I can, because there's a narrative out there that says confidential computing is a marketing ploy. I talked about this upfront, by Cloud providers that are just trying to placate people that are scared of the Cloud. And I'm presuming you don't agree with that, but I'd like you to weigh in here. The argument is confidential computing is just memory encryption, it doesn't address many other problems, it is overhyped by Cloud providers. What do you say to that line of thinking? >> I absolutely disagree, as you can imagine. It's a crazy statement. But most importantly, we're mixing multiple concepts, I guess. And exactly as Patricia said, we need to look at the end-to-end story, not just the mechanism of how confidential computing is trying to execute and protect customers' data, and why it's so critically important. Because what confidential computing was able to do, in addition to isolating our tenants in the multi-tenant environments the Cloud offers, is to offer additional, stronger isolation. We call it cryptographic isolation. It's why customers will have more trust in other customers, the tenants that are running on the same host, but also in us, because they don't need to worry about threats and malicious attempts to penetrate the environment.
So what confidential computing is helping us offer our customers is stronger isolation between tenants in this multi-tenant environment, but also, incredibly important, stronger isolation of our customers, our tenants, from us. We're also writing code. We're also software providers. We will also make mistakes or have some zero days, sometimes introduced by us, sometimes introduced by our adversaries. But what I'm trying to say is that by creating this cryptographic layer of isolation between us and our tenants, and amongst those tenants, we're really providing meaningful security to our customers and eliminating some of the worries that they have running in multi-tenant spaces, or even collaborating together on this very sensitive data, knowing that this particular protection is available to them. >> Okay, thank you, appreciate that. And, you know, I think malicious code is often a threat model missed in these narratives. You know, operator access, yeah, maybe I trust my Cloud provider, but if I can fence off your access, even better, I'll sleep better at night. Separating the code from the data, everybody, Arm, Intel, AMD, Nvidia, others, they're all doing it. I wonder, Nelly, if we could stay with you and bring up the slide on the architecture. What's architecturally different with confidential computing versus how operating systems and VMs have worked traditionally? We're showing a slide here with some VMs, maybe you could take us through that. >> Absolutely, and Dave, the whole idea for Google, and the industry's way of dealing with confidential computing, is to ensure that three main properties are actually preserved. Customers don't need to change the code. They can operate in those VMs exactly as they would with normal non-confidential VMs. But to give them this opportunity of lift and shift, of not changing their apps, and performing with very, very, very low latency and scale as any Cloud can, is something that Google actually pioneered in confidential computing.
I think we need to open up and explain how this magic was actually done. And as I said, the whole entire system has to change to be able to provide this magic. I would start with this concept of the root of trust, where we ensure that this machine, the whole entire host, has an integrity guarantee, meaning nobody is changing my code at the lowest level of the system. We introduced this in 2017; it's called Titan. Those are our specific ASICs, a specific system on every single motherboard that we have, that ensures that your low-level firmware, your actual system code, your kernel, the most powerful system, is actually properly configured and not changed, not tampered with. We do it for everybody, confidential computing included. But for confidential computing, what we have to change is that we bring in AMD and, again, future silicon vendors, and we have to trust their firmware, their way of dealing with our confidential environments. And that's why we have an obligation to validate the integrity not only of our software and our firmware but also of the firmware and software of our vendors, the silicon vendors. So when we are booting this machine, as you can see, we validate that the integrity of all of this system is in place. It means nobody is touching, nobody is changing, nobody is modifying it. But then we have this concept of the secure processor. It's a special ASIC-based, specific thing that generates a key for every single VM that our customers will run, or every single node in Kubernetes, or every single worker thread in our Spark capability. We offer all of that, and those keys are not available to us. They're the best keys ever in the encryption space. Because when we are talking about encryption, the first question that I'm receiving all the time is, where's the key, who will have access to the key? Because if you have access to the key, then it doesn't matter if you encrypt it enough.
But this is why confidential computing is quite such a revolutionary technology: we, as Cloud providers, don't have access to the keys. They're sitting in the hardware and they're fed to the memory controller. And it means that when hypervisors, which also know about these wonderful things, say, I need to get access to the memory of this particular VM, they cannot decrypt the data; they don't have access to the key. Because those keys are random, ephemeral, and per VM, but most importantly they are in hardware and not exportable. And it means now you will be able to have this very interesting rule that other customers, or Cloud providers, will not be able to get access to your memory. And what we do, again, as you can see, is that our customers don't need to change their applications. Their VMs are running exactly as they should run. And whatever you're running in the VM, you actually see your memory in the clear; it's not encrypted. But God forbid somebody tries to read it from outside of my confidential box. No, no, no, no, no, you will not be able to do it. Then you'll see ciphertext. And that's exactly what the combination of these multiple hardware pieces and software pieces has to do. So the OS is also modified, and the OS is modified in such a way as to provide integrity. It means even the OS that you're running in your VM is not modifiable, and you as a customer can verify that. But the most interesting thing, I guess, is how to ensure the super performance of this environment, because you can imagine, Dave, that encryption means additional performance cost, additional time, additional latency. So we were able to mitigate all of that by providing an incredibly interesting capability in the OS itself. So our customers will get no changes needed, fantastic performance, and the scale they would expect from Cloud providers like Google. >> Okay, thank you. Excellent, appreciate that explanation.
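
Editor's sketch: the property Nelly describes, a random, ephemeral, per-VM key that lives only in hardware so the guest sees plaintext while the host sees ciphertext, can be illustrated with a toy model. Everything below is hypothetical: `ToyMemoryController` stands in for the secure processor and memory controller, and the SHA-256 keystream is a teaching stand-in, not the memory encryption that real silicon uses.

```python
import hashlib
import secrets


def _keystream(key, length):
    """Toy keystream (SHA-256 in counter mode). NOT real memory encryption."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


class ToyMemoryController:
    """Hypothetical stand-in for hardware that holds the per-VM keys."""

    def __init__(self):
        self._keys = {}  # vm_id -> key; no method ever returns a key
        self._dram = {}  # vm_id -> bytes as they sit in physical memory

    def launch_vm(self, vm_id):
        # A fresh random key per VM, generated at launch, never exported.
        self._keys[vm_id] = secrets.token_bytes(32)

    def guest_write(self, vm_id, data):
        # Writes from inside the VM are encrypted on the way to memory.
        ks = _keystream(self._keys[vm_id], len(data))
        self._dram[vm_id] = bytes(a ^ b for a, b in zip(data, ks))

    def guest_read(self, vm_id):
        # Inside the VM, the data appears in the clear.
        blob = self._dram[vm_id]
        ks = _keystream(self._keys[vm_id], len(blob))
        return bytes(a ^ b for a, b in zip(blob, ks))

    def host_read(self, vm_id):
        # A hypervisor or operator reading memory sees only ciphertext.
        return self._dram[vm_id]

    def destroy_vm(self, vm_id):
        # Ephemeral: the key disappears together with the VM.
        del self._keys[vm_id]
```

The design point the sketch tries to capture is that no code path hands the key to the caller: the host can read the bytes in memory, but only the path used by the guest ever applies the key.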
So you know, again, the narrative on this is, well, you've already given me guarantees as a Cloud provider that you don't have access to my data, but this gives another level of assurance. Key management, as they say, is key. Now humans aren't managing the keys, the machines are managing them. So Patricia, my question to you is, compared with, you know, the pre-confidential computing days, what are the sort of new guarantees that these hardware-based technologies are going to provide to customers? >> So if I am a customer, I am saying I now have a full guarantee of confidentiality and integrity of the data and of the code. So if you look at code and data confidentiality, the customer cares and wants to know whether their systems are protected from outside or unauthorized access. And we covered with Nelly that they are. Confidential computing actually ensures that the application and data internals remain secret, right? The code is actually looking at the data; only the memory is decrypting the data, with a key that is ephemeral, per VM, and generated on demand. Then you have the second point, where you have code and data integrity, and now customers want to know whether their data was corrupted, tampered with, or impacted by outside actors. And what confidential computing ensures is that application internals are not tampered with. So the application, the workload as we call it, that is processing the data has also not been tampered with and preserves integrity. I would also say that this is all verifiable. So you have attestation, and this attestation actually generates a log trail, and the log trail provides a proof that integrity was preserved. And I think it also offers a guarantee of what we call sealing, this idea that the secrets have been preserved and not tampered with. Confidentiality and integrity of code and data. >> Got it, okay, thank you.
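
Editor's sketch: Patricia's "this is all verifiable" refers to attestation, where the platform measures the workload and signs that measurement so a remote party can check it. The following is a simplified illustration under loud assumptions: a shared HMAC key (`HARDWARE_KEY`) stands in for a hardware-rooted signing key, whereas real attestation schemes use asymmetric signatures and vendor certificate chains, and the function names are hypothetical.

```python
import hashlib
import hmac
import secrets

# Assumption: a symmetric key stands in for the hardware-rooted signing key.
HARDWARE_KEY = secrets.token_bytes(32)


def measure(workload):
    """Measurement = hash of the workload image that was actually loaded."""
    return hashlib.sha256(workload).hexdigest()


def attest(workload, nonce):
    """Produce a signed report binding the measurement to a fresh nonce."""
    measurement = measure(workload)
    signature = hmac.new(
        HARDWARE_KEY, measurement.encode() + nonce, hashlib.sha256
    ).hexdigest()
    return {
        "measurement": measurement,
        "nonce": nonce.hex(),
        "signature": signature,
    }


def verify(report, expected_measurement, nonce):
    """Accept only a fresh, correctly signed report for the expected code."""
    expected_signature = hmac.new(
        HARDWARE_KEY, report["measurement"].encode() + nonce, hashlib.sha256
    ).hexdigest()
    return (
        hmac.compare_digest(report["signature"], expected_signature)
        and report["nonce"] == nonce.hex()
        and hmac.compare_digest(report["measurement"], expected_measurement)
    )
```

The nonce is there so that an old report cannot be replayed; a verifier that kept each accepted report would end up with the kind of log trail mentioned in the conversation.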
You know, Nelly, you mentioned, I think I heard you say that for the applications, it's transparent, you don't have to change the application, it just comes for free essentially. And we showed some various parts of the stack before. I'm curious as to what's affected, but really more importantly, what is specifically Google's value add? You know, how do partners participate in this? The ecosystem, or maybe said another way, how does Google ensure the compatibility of confidential computing with existing systems and applications? >> A fantastic question, by the way. And it's a very difficult and definitely complicated world, because to be able to provide these guarantees, a lot of work was actually done by the community. Google very much operates in the open. So again, for our operating system, we are working with the operating system repositories and OS vendors to ensure that all the capabilities that we need are part of their kernels, are part of their releases, and are available for customers to understand and even explore, if they have fun exploring a lot of code. We have also modified, together with our silicon vendors, the kernel, the host kernel, to support this capability, and it means working with this community to ensure that all of those patches are there. We also worked with every single silicon vendor, as you've seen, and that's where I feel that Google probably contributed quite a bit in this role. We moved our industry, our community, our vendors to understand the value of easy-to-use confidential computing, of removing barriers. And now, I don't know if you noticed, Intel is following the lead and also announcing Trust Domain Extensions, a very similar architecture, and no surprise, it's again a lot of work done with our partners to convince them, work with them, and make this capability available. The same with Arm. This year, actually last year, Arm announced its future design for confidential computing. It's called Confidential Compute Architecture.
And it's also influenced very heavily by similar ideas from Google and the industry overall. So there's a lot of work that we are doing in confidential computing consortiums. For example, simply to mention one, to ensure interop, as you mentioned, between the different confidential environments of Cloud providers. We want to ensure that they can attest to each other. Because when you're communicating with different environments, you need to trust them. And if it's running on different Cloud providers, you need to ensure that you can trust your receiver when you are sharing your sensitive data, workloads, or secrets with them. So we are coming together as a community, and we have this attestation, the community-based systems that we want to build and influence, and we work with Arm and every other Cloud provider to ensure that they can interop. And it means it doesn't matter where confidential workloads will be hosted, they can exchange the data in a way that is secure, verifiable, and controlled by customers. And to do it, we need to continue what we are doing: working in the open, again, and contributing our ideas and the ideas of our partners to this role, for confidential computing to become what we see it has to become. It has to become a utility. It doesn't need to be so special, and that's what we want it to become. >> Thank you for that explanation. Let's talk about data sovereignty, because when you think about data sharing, you think about data sharing across, you know, the ecosystem and different regions, and then of course data sovereignty comes up. Typically public policy lags, you know, the technology industry, and sometimes that is problematic. I know, you know, there's a lot of discussion about exceptions, but Patricia, we have a graphic on data sovereignty. I'm interested in how confidential computing ensures that data sovereignty and privacy edicts are adhered to, even if they're out of alignment, maybe, with the pace of technology.
One of the frequent examples is, you know, when you delete data, can you actually prove the data is deleted with a hundred percent certainty? You've got to prove that, and a lot of other issues. So looking at this slide, maybe you could take us through your thinking on data sovereignty. >> Perfect, so for us, data sovereignty is only one of the three pillars of digital sovereignty. And I don't want to give the impression that confidential computing addresses it all. That's why we want to step back and say, hey, digital sovereignty includes data sovereignty, where we are giving you full control and ownership of the location, encryption, and access to your data. Operational sovereignty, where the goal is to give our Google Cloud customers full visibility and control over the provider's operations, right? So if there are any updates on hardware, software, stack, any operations, there is full transparency, full visibility. And then the third pillar is around software sovereignty, where the customer wants to ensure that they can run their workloads without dependency on the provider's software. It is often referred to as survivability: that you can actually survive if you are untethered from the Cloud and that you can use open source. Now let's take a deep dive on data sovereignty, which by the way is one of my favorite topics. We typically focus on saying, hey, we need to care about data residency. We care where the data resides, because where the data is at rest or in processing, it typically abides by the regulations of the jurisdiction where the data resides. And others say, hey, let's focus on data protection. We want to ensure the confidentiality, integrity, and availability of the data, and confidential computing is at the heart of that data protection. But there is yet another element that people typically don't talk about when talking about data sovereignty, which is the element of user control.
And here, Dave, it is about what happens to the data when I give you access to my data. And this reminds me of security two decades ago, even a decade ago, where we started the security movement by putting in firewall protections and login accesses. But once you were in, you were able to do everything you wanted with the data. An insider had access to all the infrastructure, the data, and the code. And that's similar, because with data sovereignty we care about where the data resides and who is operating on the data. But the moment that the data is being processed, I need to trust that the processing of the data will abide by user control, by the policies that I put in place for how my data is going to be used. And if you look at a lot of the regulation today and a lot of the initiatives around the International Data Spaces Association, IDSA, and Gaia-X, there is a movement of saying the two parties, the provider of the data and the receiver of the data, are going to agree on a contract that describes what my data can be used for. The challenge is to ensure that once the data crosses boundaries, the data will be used for the purposes that were intended and specified in the contract. And this is the exciting part, if you actually bring together confidential computing with policy enforcement. Now the policy enforcement can guarantee that the data is only processed within the confines of a confidential computing environment, that the workload is cryptographically verified to be the workload that was meant to process the data, and that the data will only be used when abiding by the confidentiality, integrity, and safety of the confidential computing environment. And that's why we believe confidential computing is one necessary and essential technology that will allow us to ensure data sovereignty, especially when it comes to user control. >> Thank you for that.
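
Editor's sketch: the combination Patricia describes, a data-usage contract enforced against an attested environment, reduces to a gate that releases data only when the verified claims satisfy the contract. The example below is a minimal illustration, assuming the claims have already been cryptographically verified elsewhere. `DataContract`, `AttestedClaims`, and `may_release` are hypothetical names, not part of any IDSA, Gaia-X, or Google API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataContract:
    """What the data provider agreed the data may be used for."""
    allowed_measurements: frozenset  # hashes of approved workloads
    allowed_purposes: frozenset      # e.g. {"fraud-detection"}
    require_confidential_env: bool = True


@dataclass(frozen=True)
class AttestedClaims:
    """Claims taken from an attestation report (assumed already verified)."""
    measurement: str
    confidential_env: bool
    purpose: str


def may_release(contract, claims):
    """Release data only if every condition of the contract is met."""
    if contract.require_confidential_env and not claims.confidential_env:
        return False  # processing would happen outside a confidential environment
    if claims.measurement not in contract.allowed_measurements:
        return False  # not the workload that was meant to process the data
    return claims.purpose in contract.allowed_purposes
```

In the double-financing example from earlier in the conversation, each bank could publish such a contract and hand over records only to a workload whose attested measurement and stated purpose match it.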
I mean it was a deep dive, I mean brief, but really detailed, so I appreciate that, especially the verification of the enforcement. Last question. I met you two because, as part of my year-end prediction post, you guys sent in some predictions, and I wasn't able to get to them in the predictions post. So I'm thrilled that you were able to make the time to come on the program. How widespread do you think the adoption of confidential computing will be in '23, and what does the maturity curve look like, you know, this decade, in your opinion? Maybe each of you could give us a brief answer. >> So my prediction is that in five to seven years, as I said, it'll become a utility. It'll become like TLS. Again, 10 years ago we couldn't believe that websites would have certificates and that we would support encrypted traffic. Now we do, and it's become ubiquitous. It's exactly where confidential computing is heading, and I don't know if we are there yet. It'll take a few years of maturity for us, but we'll do that. >> Thank you, and Patricia, what's your prediction? >> I would double down on that and say, hey, in the future, in the very near future, you will not be able to afford not having it. I believe as digital sovereignty becomes ever more top of mind with sovereign states, and also for multinational organizations and for organizations that want to collaborate with each other, confidential computing will become the norm. It'll become the default, if I may say, mode of operation. I like to compare it to this: today it is inconceivable, if we talk to the young technologists, to think that at some point in history, and I happened to be alive then, we had data at rest that was not encrypted, data in transit that was not encrypted. And I think that it will be inconceivable at some point in the near future to have unencrypted data while in use. >> You know, and plus, I think the beauty of this industry is that because there's so much competition, this essentially comes for free.
I want to thank you both for spending some time on Breaking Analysis. There's so much more we could cover. I hope you'll come back to share the progress that you're making in this area and we can double click on some of these topics. Really appreciate your time. >> Anytime. >> Thank you so much.

Published Date : Feb 10 2023


ENTITIES

Entity	Category	Confidence
Nelly	PERSON	0.99+
Patricia	PERSON	0.99+
International Data Space Association	ORGANIZATION	0.99+
Dave	PERSON	0.99+
Google	ORGANIZATION	0.99+
IDSA	ORGANIZATION	0.99+
last year	DATE	0.99+
2017	DATE	0.99+
two parties	QUANTITY	0.99+
one	QUANTITY	0.99+
two	QUANTITY	0.99+
second point	QUANTITY	0.99+
First	QUANTITY	0.99+
ARM	ORGANIZATION	0.99+
first question	QUANTITY	0.99+
five	QUANTITY	0.99+
both	QUANTITY	0.99+
Intel	ORGANIZATION	0.99+
two decades ago	DATE	0.99+
Asics	ORGANIZATION	0.99+
second	QUANTITY	0.99+
Gaia X	ORGANIZATION	0.99+
One	QUANTITY	0.99+
each	QUANTITY	0.98+
seven years	QUANTITY	0.98+
OCTO	ORGANIZATION	0.98+
one thought	QUANTITY	0.98+
a decade ago	DATE	0.98+
this year	DATE	0.98+
10 years ago	DATE	0.98+
Invidia	ORGANIZATION	0.98+
'23	DATE	0.98+
today	DATE	0.98+
Cloud	TITLE	0.98+
three pillars	QUANTITY	0.97+
one way	QUANTITY	0.97+
hundred percent	QUANTITY	0.97+
zero days	QUANTITY	0.97+
three main property	QUANTITY	0.95+
third pillar	QUANTITY	0.95+
two main goals	QUANTITY	0.95+
CTO	ORGANIZATION	0.93+
Nell	PERSON	0.9+
Kubernetes	TITLE	0.89+
every single VM	QUANTITY	0.86+
Nelly	ORGANIZATION	0.83+
Google Cloud	TITLE	0.82+
every single worker	QUANTITY	0.77+
every single node	QUANTITY	0.74+
AM	ORGANIZATION	0.73+
double	QUANTITY	0.71+
single motherboard	QUANTITY	0.68+
single silicon	QUANTITY	0.57+
Spark	TITLE	0.53+
kernel	TITLE	0.53+
inch	QUANTITY	0.48+

Analyst Predictions 2023: The Future of Data Management

(upbeat music) >> Hello, this is Dave Vellante with theCUBE, and one of the most gratifying aspects of my role as a host of "theCUBE TV" is I get to cover a wide range of topics. And quite often, we're able to bring to our program a level of expertise that allows us to more deeply explore and unpack some of the topics that we cover throughout the year. And one of our favorite topics, of course, is data. Now, in 2021, after being in isolation for the better part of two years, a group of industry analysts met up at AWS re:Invent and started a collaboration to look at the trends in data and predict what some likely outcomes will be for the coming year. And it resulted in a very popular session that we had last year focused on the future of data management. And I'm very excited and pleased to tell you that the 2023 edition of that predictions episode is back, and with me are five outstanding market analysts: Sanjeev Mohan of SanjMo, Tony Baer of dbInsight, Carl Olofson from IDC, Dave Menninger from Ventana Research, and Doug Henschen, VP and Principal Analyst at Constellation Research. Now, what is it that we're calling you, guys? A data pack like the rat pack? No, no, no, no, that's not it. It's the data crowd, the data crowd, and the crowd includes some of the best minds in the data analyst community. They'll discuss how data management is evolving and what listeners should prepare for in 2023. Guys, welcome back. Great to see you. >> Good to be here. >> Thank you. >> Thanks, Dave. (Tony and Dave faintly speak) >> All right, before we get into 2023 predictions, we thought it'd be good to do a look back at how we did in 2022 and give a transparent assessment of those predictions. So, let's get right into it. We're going to bring these up here, the predictions from 2022, they're color-coded red, yellow, and green to signify the degree of accuracy. And I'm pleased to report there's no red. Well, maybe some of you will want to debate that grading system.
But as always, we want to be open, so you can decide for yourselves. So, we're going to ask each analyst to review their 2022 prediction and explain their rating and what evidence they have that led them to their conclusion. So, Sanjeev, please kick it off. Your prediction was data governance becomes key. I know that's going to knock you guys over, but elaborate, because you had more detail when you double click on that. >> Yeah, absolutely. Thank you so much, Dave, for having us on the show today. And we self-graded ourselves. I could have very easily made my prediction from last year green, but I mentioned why I left it as yellow. I totally fully believe that data governance was in a renaissance in 2022. And why do I say that? You have to look no further than AWS launching its own data catalog called DataZone. Before that, mid-year, we saw Unity Catalog from Databricks went GA. So, overall, I saw there was tremendous movement. When you see these big players launching a new data catalog, you know that they want to be in this space. And this space is highly critical to everything that I feel we will talk about in today's call. Also, if you look at established players, I spoke at Collibra's conference, data.world, work closely with Alation, Informatica, a bunch of other companies, they all added tremendous new capabilities. So, it did become key. The reason I left it as yellow is because I had made a prediction that Collibra would go IPO, and it did not. And I don't think anyone is going IPO right now. The market is really, really down, the funding in VC IPO market. But other than that, data governance had a banner year in 2022. >> Yeah. Well, thank you for that. And of course, you saw data clean rooms being announced at AWS re:Invent, so more evidence. And I like how the fact that you included in your predictions some things that were binary, so you dinged yourself there. So, good job. Okay, Tony Baer, you're up next. Data mesh hits reality check. 
As you see here, you've given yourself a bright green thumbs up. (Tony laughing) Okay. Let's hear why you feel that was the case. What do you mean by reality check? >> Okay. Thanks, Dave, for having us back again. This is something I just wrote about and just tried to get away from, and this is a topic that just won't go away. I did speak with a number of folks, early adopters and non-adopters during the year. And I did find that it pretty much validated what I was expecting, which was that this has now become a front burner issue. And if I had any doubt in my mind, the evidence I would point to is what was originally intended to be a throwaway post on LinkedIn, which I just quickly scribbled down the night before leaving for re:Invent. I was packing at the time, and for some reason, I was doing a Google search on data mesh. And I happened to have tripped across this ridiculous article, I will not say where, because it doesn't deserve any publicity, about the eight (Dave laughing) best data mesh software companies of 2022. (Tony laughing) One of my predictions was that you'd see data mesh washing. And I just quickly hopped on that, wrote maybe three sentences in about a couple of minutes saying this is hogwash, essentially. (laughs) And that just reun... And then, I left for re:Invent. And the next night, when I got into my Vegas hotel room, I clicked on my computer. I saw 15,000 hits on that post, which was the most hits of any single post I put up all year. And the responses were wildly pro and con. So, it pretty much validates my expectation in that data mesh really did hit a lot more scrutiny over this past year. >> Yeah, thank you for that. I remember that article. I remember rolling my eyes when I saw it, and then I recently, (Tony laughing) I talked to Walmart and they actually invoked Martin Fowler and they said that they're working through their data mesh. 
So, it really takes a lot of thought, and it really, as we've talked about, is really as much an organizational construct. You're not buying data mesh >> Bingo. >> to your point. Okay. Thank you, Tony. Carl Olofson, here we go. You've graded yourself a yellow on the prediction that graph databases take off. Please elaborate. >> Yeah, sure. So, I realized in looking at the prediction that it seemed to imply that graph databases could be a major factor in the data world in 2022, which obviously didn't become the case. It was an error on my part in that I should have said it in the right context. It's really a three to five-year time period in which graph databases will really become significant, because they still need accepted methodologies that can be applied in a business context as well as proper tools in order for people to be able to use them seriously. But I stand by the idea that it is taking off, because for one thing, Neo4j, which is the leading independent graph database provider, had a very good year. And also, we're seeing interesting developments in terms of things like AWS with Neptune and with Oracle providing graph support in Oracle database this past year. Those things are, as I said, growing gradually. There are other companies like TigerGraph and so forth, that deserve watching as well. But as far as becoming mainstream, it's going to be a few years before we get all the elements together to make that happen. Like any new technology, you have to create an environment in which ordinary people without a whole ton of technical training can actually apply the technology to solve business problems. >> Yeah, thank you for that. These specialized databases, graph databases, time series databases, you see them embedded into mainstream data platforms, but there's a place for these specialized databases. I would suspect we're going to see new types of databases emerge with all this cloud sprawl that we have and maybe to the edge. 
>> Well, part of it is that it's not as specialized as you might think. You can apply graphs to a great many workloads and use cases. It's just that people have yet to fully explore and discover what those are. >> Yeah. >> And so, it's going to be a process. (laughs) >> All right, Dave Menninger, streaming data permeates the landscape. You gave yourself a yellow. Why? >> Well, I couldn't think of an appropriate combination of yellow and green. Maybe I should have used chartreuse, (Dave laughing) but I was probably a little hard on myself making it yellow. This is another type of specialized data processing, like Carl was talking about with graph databases: stream processing. And nearly every data platform offers streaming capabilities now. Often, it's based on Kafka. If you look at Confluent, their revenues have grown at more than 50%, continue to grow at more than 50% a year. They're expected to do more than half a billion dollars in revenue this year. But the thing that hasn't happened yet, and to be honest, I didn't necessarily expect it to happen in one year, is that streaming hasn't become the default way in which we deal with data. It's still a sidecar to data at rest. And I do expect that we'll continue to see streaming become more and more mainstream. I do expect perhaps in the five-year timeframe that we will first deal with data as streaming and then at rest, but the worlds are starting to merge. And we even see some vendors bringing products to market, such as K2View, Hazelcast, and RisingWave Labs. So, in addition to all those core data platform vendors adding these capabilities, there are new vendors approaching this market as well. >> I like the tough grading system, and it's not trivial. And when you talk to practitioners doing this stuff, there's still some complications in the data pipeline. And so, but I think, you're right, it probably was a yellow plus. Doug Henschen, data lakehouses will emerge as dominant. 
When you talk to people about lakehouses, practitioners, they all use that term. They certainly use the term data lake, but now, they're using lakehouse more and more. What are your thoughts here? Why the green? What's your evidence there? >> Well, I think, I was accurate. I spoke about it specifically as something that vendors would be pursuing. And we saw yet more lakehouse advocacy in 2022. Google introduced its BigLake service alongside BigQuery. Salesforce introduced Genie, which is really a lakehouse architecture. And it was a safe prediction to say vendors are going to be pursuing this in that AWS, Cloudera, Databricks, Microsoft, Oracle, SAP, Salesforce now, IBM, all advocate this idea of a single platform for all of your data. Now, the trend was also supported in 2022, in that we saw a big embrace of Apache Iceberg. That's a structured table format. It's used with these lakehouse platforms. It's open, so it ensures portability and it also ensures performance. And that's a structured table that helps with the warehouse side performance. But among those announcements, Snowflake, Google, Cloudera, SAP, Salesforce, IBM, all embraced Iceberg. But keep in mind, again, I'm talking about this as something that vendors are pursuing as their approach. So, they're advocating to end users. It's very cutting edge. I'd say the top, leading edge, 5% of companies have really embraced the lakehouse. I think, we're now seeing the fast followers, the next 20 to 25% of firms embracing this idea and embracing a lakehouse architecture. I recall Christian Kleinerman at the big Snowflake event last summer, making the announcement about Iceberg, and he asked for a show of hands: any of you in the audience at the keynote, have you heard of Iceberg? And just a smattering of hands went up. So, the vendors are ahead of the curve. They're pushing this trend, and we're now seeing a little bit more mainstream uptake. >> Good. Doug, I was there. 
It was you, me, and I think, two other hands were up. That was just humorous. (Doug laughing) All right, well, so I liked the fact that we had some yellow and some green. When you think about these things, there's the prediction itself. Did it come true or not? There are the sub predictions that you guys make, and of course, the degree of difficulty. So, thank you for that open assessment. All right, let's get into the 2023 predictions. Let's bring up the predictions. Sanjeev, you're going first. You've got a prediction around unified metadata. What's the prediction, please? >> So, my prediction is that the metadata space is currently a mess. It needs to get unified. There are too many use cases of metadata, which are being addressed by disparate systems. For example, data quality has become really big in the last couple of years, data observability, the whole catalog space. Actually, people don't like to use the word data catalog anymore, because data catalog sounds like it's a catalog, a museum, if you will, of metadata that you go and admire. So, what I'm saying is that in 2023, we will see that metadata will become the driving force behind things like data ops, things like orchestration of tasks using metadata, not rules. Not saying that if this fails, then do this, if this succeeds, go do that. But it's like getting to the metadata level, and then making a decision as to what to orchestrate, what to automate, how to do data quality checks, data observability. So, this space is starting to gel, and I see there'll be more maturation in the metadata space. Even security and privacy, some of these topics, which are handled separately. And I'm just talking about data security and data privacy. I'm not talking about infrastructure security. These also need to merge into a unified metadata management piece with some knowledge graph, semantic layer on top, so you can do analytics on it. So, it's no longer something that sits on the side, limited in its scope. 
It is actually the very engine, the very glue that is going to connect data producers and consumers. >> Great. Thank you for that. Doug. Doug Henschen, any thoughts on what Sanjeev just said? Do you agree? Do you disagree? >> Well, I agree with many aspects of what he says. I think, there's a huge opportunity for consolidation and streamlining of these aspects of governance. Last year, Sanjeev, you said something like, we'll see more people using catalogs than BI. And I have to disagree. I don't think this is a category that's headed for mainstream adoption. It's a behind the scenes activity for the wonky few, or better yet, companies want machine learning and automation to take care of these messy details. We've seen these waves of management technologies, some of the latest being data observability and customer data platforms, but they failed to sweep away all the earlier investments in data quality and master data management. So, yes, I hope the latest tech offers glimmers that there's going to be a better, cleaner way of addressing these things. But to my mind, the business leaders, including the CIO, only want to spend as much time and effort and money and resources on these sorts of things as it takes to avoid getting breached, ending up in headlines, getting fired or going to jail. So, vendors, bring on the ML and AI smarts and the automation of these sorts of activities. >> So, if I may say something, the reason why we have this dichotomy between data catalog and the BI vendors is because data catalogs are very soon not going to be standalone products, in my opinion. They're going to get embedded. So, when you use a BI tool, you'll actually use the catalog to find out what it is that you want to do, whether you are looking for data or you're looking for an existing dashboard. So, the catalog becomes embedded into the BI tool. >> Hey, Dave Menninger, sometimes you have some data in your back pocket. Do you have any stats (chuckles) on this topic? 
>> No, I'm glad you asked, because I'm going to... Now, data catalogs are something that's interesting. Sanjeev made a statement that data catalogs are falling out of favor. I don't care what you call them. They're valuable to organizations. Our research shows that organizations that have adequate data catalog technologies are three times more likely to express satisfaction with their analytics for just the reasons that Sanjeev was talking about. You can find what you want, you know you're getting the right information, you know whether or not it's trusted. So, those are good things. So, we expect to see the capabilities, whether it's embedded or separate. We expect to see those capabilities continue to permeate the market. >> And a lot of those catalogs are driven now by machine learning and things. So, they're learning from those patterns of usage by people when people use the data. (airy laughs) >> All right. Okay. Thank you, guys. All right. Let's move on to the next one. Tony Baer, let's bring up the predictions. You got something in here about the modern data stack. We need to rethink it. Is the modern data stack getting long in the tooth? Is it not so modern anymore? >> I think, in a way, it's got almost too modern. It's gotten too, I don't know if it's being long in the tooth, but it is getting long. The modern data stack, it's traditionally been defined as basically you have the data platform, which would be the operational database and the data warehouse. And in between, you have all the tools that are necessary to essentially get that data from the operational realm or the streaming realm for that matter into basically the data warehouse, or as we might be seeing more and more, the data lakehouse. And I think, what's important here is that, or I think, we have seen a lot of progress, and this would be in the cloud, is with the SaaS services. 
And especially you see that in the modern data stack, which is like all these players, not just the MongoDBs or the Oracles or the Amazons have their database platforms. You see the Informaticas, the Fivetrans, and all the other players there have their own SaaS services. And within those SaaS services, you get a certain degree of simplicity, which is it takes all the housekeeping off the shoulders of the customers. That's a good thing. The problem is that what we're getting to unfortunately is what I would call lots of islands of simplicity, which means that it leaves it (Dave laughing) to the customer to have to integrate or put all that stuff together. It's a complex tool chain. And so, what we really need to think about here is, we have too many pieces. And going back to the discussion of catalogs, it's like we have so many catalogs out there, which one do we use? 'Cause chances are most organizations do not rely on a single catalog at this point. What I'm calling on all the data providers or all the SaaS service providers to do is to literally get it together and essentially make this modern data stack less of a stack, make it more of a blending of an end-to-end solution. And that can come in a number of different ways. Part of it is where data platform providers have been adding services that are adjacent. And there's some very good examples of this. We've seen progress over the past year or so. For instance, MongoDB integrating search. It's a very common, I guess, sort of tool that the applications that are developed on MongoDB use, so MongoDB then built it into the database rather than requiring an extra Elasticsearch or OpenSearch stack. Amazon just... AWS just did the zero-ETL, which is a first step towards simplifying the process of going from Aurora to Redshift. You've seen the same thing with Google, BigQuery integrating basically streaming pipelines. And you're seeing also a lot of movement in in-database machine learning. 
So, there's some good moves in this direction. I expect to see more of this this year. Part of it's from basically the SaaS platforms adding some functionality. But I also see more importantly, because you're never going to get... This is like asking your data team and your developers, herding cats to standardize on the same tool. In most organizations, that is not going to happen. So, take a look at the most popular combinations of tools and start to come up with some pre-built integrations and pre-built orchestrations, and offer some promotional pricing, maybe not quite two-for-one, but in other words, get two services for the price of one and a half. I see a lot of potential for this. And to me, if the cloud was to simplify things, this is the next logical step and I expect to see more of this here. >> Yeah, and you see in Oracle, MySQL HeatWave, yet another example of eliminating that ETL. Carl Olofson, today, if you think about the data stack and the application stack, they're largely separate. Do you have any thoughts on how that's going to play out? Does that play into this prediction? What do you think? >> Well, I think, that the... I really like Tony's phrase, islands of simplification. It really says (Tony chuckles) what's going on here, which is that all these different vendors you ask about, about how these stacks work. All these different vendors have their own stack vision. And you can... One application group is going to use one, and another application group is going to use another. And some people will say, let's go to, like you go to an Informatica conference and they say, we should be the center of your universe, but you can't connect everything in your universe to Informatica, so you need to use other things. So, the challenge is how do we make those things work together? As Tony has said, and I totally agree, we're never going to get to the point where people standardize on one organizing system. 
So, the alternative is to have metadata that can be shared amongst those systems and protocols that allow those systems to coordinate their operations. This is standard stuff. It's not easy. But the motive for the vendors is that they can become more active critical players in the enterprise. And of course, the motive for the customer is that things will run better and more completely. So, I've been looking at this in terms of two kinds of metadata. One is the meaning metadata, which says what data can be put together. The other is the operational metadata, which says basically where did it come from? Who created it? What's its current state? What's the security level? Et cetera, et cetera, et cetera. The good news is the operational stuff can actually be done automatically, whereas the meaning stuff requires some human intervention. And as we've already heard from, was it Doug, I think, people are disinclined to put a lot of definition into meaning metadata. So, that may be the harder one, but coordination is key. This problem has been with us forever, but with the addition of new data sources, with streaming data, with data in different formats, the whole thing has been like what a customer of mine used to say, "I understand your product can make my system run faster, but right now I just feel I'm putting my problems on roller skates. (chuckles) I don't need that to accelerate what's already not working." >> Excellent. Okay, Carl, let's stay with you. I remember in the early days of the big data movement, Hadoop movement, NoSQL was the big thing. And I remember Amr Awadallah said to us in theCUBE that SQL is the killer app for big data. So, your prediction here, if we bring that up, is SQL is back. Please elaborate. >> Yeah. So, of course, some people would say, well, it never left. 
Actually, that's probably closer to true, but in the perception of the marketplace, there's been all this noise about alternative ways of storing, retrieving data, whether it's in key value stores or document databases and so forth. We're getting a lot of messaging that for a while had persuaded people that, oh, we're not going to do analytics in SQL anymore. We're going to use Spark for everything, except that only a handful of people know how to use Spark. Oh, well, that's a problem. Well, how about, and for ordinary conventional business analytics, Spark is like an over-engineered solution to the problem. SQL works just great. What's happened in the past couple years, and what's going to continue to happen is that SQL is insinuating itself into everything we're seeing. We're seeing all the major data lake providers offering SQL support, whether it's Databricks or... And of course, Snowflake is loving this, because that is what they do, and their success certainly points to the success of SQL, even MongoDB. And we were all, I think, at the MongoDB conference where on one day, we hear SQL is dead. They're not teaching SQL in schools anymore, and this kind of thing. And then, a couple days later at the same conference, they announced we're adding a new analytic capability based on SQL. But didn't you just say SQL is dead? So, the reality is that SQL is better understood than most other methods of retrieving and finding data in a data collection, no matter whether it happens to be relational or non-relational. And even in systems that are very non-relational, such as graph and document databases, their query languages are being built or extended to resemble SQL, because SQL is something people understand. >> Now, you remember when we were in high school and you had to take the... You're debating in the class and you were forced to take one side and defend it. 
So, I was at a Vertica conference one time up on stage with Curt Monash, and I had to take the NoSQL, the world is changing, paradigm shift side. And so just to be controversial, I said to him, Curt Monash, I said, who really needs ACID compliance anyway? Tony Baer. And so, (chuckles) of course, his head exploded, but what are your thoughts (guests laughing) on all this? >> Well, my first thought is congratulations, Dave, for surviving being up on stage with Curt Monash. >> Amen. (group laughing) >> I definitely would concur with Carl. We actually are definitely seeing a SQL renaissance and if there's any proof of the pudding here, I see lakehouse as being icing on the cake. As Doug had predicted last year, now, (clears throat) for the record, I think, Doug was about a year ahead of time in his predictions, in that this year is really the year that I see (clears throat) the lakehouse ecosystems really firming up. You saw the first shots last year. But anyway, on this, data lakes will not go away. I've actually, I'm on the home stretch of doing a market, a landscape on the lakehouse. And lakehouse will not replace data lakes in terms of that. There is the need for those data scientists who do know Python, who know Spark, to go in there and basically do their thing without all the restrictions or the constraints of a pre-built, pre-designed table structure. I get that. Same thing for developing models. But on the other hand, there is huge need. Basically, (clears throat) maybe MongoDB was saying that we're not teaching SQL anymore. Well, maybe we have an oversupply of SQL developers. Well, I'm being facetious there, but there is a huge skills base in SQL. Analytics have been built on SQL. Then came the lakehouse, and why this really helps to fuel a SQL revival is that the core need in the data lake, what brought on the lakehouse, was not so much SQL, it was a need for ACID. And what was the best way to do it? It was through a relational table structure. 
So, the whole idea of ACID in the lakehouse was not to turn it into a transaction database, but to make the data trusted, secure, and more granularly governed, where you could govern down to column and row level, which you really could not do in a data lake or a file system. So, while lakehouse can be queried in a manner, you can go in there with Python or whatever, it's built on a relational table structure. And so, for that end, for those types of data lakes, it becomes the end state. You cannot bypass that table structure as I learned the hard way during my research. So, the bottom line I'd say here is that lakehouse is proof that we're starting to see the revenge of the SQL nerds. (Dave chuckles) >> Excellent. Okay, let's bring back up the predictions. Dave Menninger, this one's really thought-provoking and interesting. We're hearing things like data as code, new data applications, machines actually generating plans with no human involvement. And your prediction is the definition of data is expanding. What do you mean by that? >> So, I think, for too long, we've thought about data as the, I would say, facts that we collect, the readings off of devices and things like that, but data on its own is really insufficient. Organizations need to manipulate that data and examine derivatives of the data to really understand what's happening in their organization, why it has happened, and to project what might happen in the future. And my comment is that these data derivatives need to be supported and managed just like the data needs to be managed. We can't treat this as entirely separate. Think about all the governance discussions we've had. Think about the metadata discussions we've had. If you separate these things, now you've got more moving parts. We're talking about simplicity and simplifying the stack. So, if these things are treated separately, it creates much more complexity. 
I also think it creates a little bit of a myopic view on the part of the IT organizations that are acquiring these technologies. They need to think more broadly. So, for instance, metrics. Metric stores are becoming a much more common part of the tooling that's part of a data platform. Similarly, feature stores are gaining traction. So, those are designed to promote the reuse and consistency across the AI and ML initiatives, the elements that are used in developing an AI or ML model. And let me go back to metrics and just clarify what I mean by that. So, any type of formula involving the data points. I'm distinguishing metrics from features that are used in AI and ML models. And the data platforms themselves are increasingly managing the models as an element of data. So, just like figuring out how to calculate a metric. Well, if you're going to have the features associated with an AI and ML model, you probably need to be managing the model that's associated with those features. The other element where I see expansion is around external data. Organizations for decades have been focused on the data that they generate within their own organization. We see more and more of these platforms acquiring and publishing data to external third-party sources, whether they're within some sort of a partner ecosystem or whether it's a commercial distribution of that information. And our research shows that when organizations use external data, they derive even more benefits from the various analyses that they're conducting. And the last great frontier in my opinion on this expanding world of data is the world of driver-based planning. Very few of the major data platform providers provide these capabilities today. These are the types of things you would do in a spreadsheet. And we all know the issues associated with spreadsheets. They're hard to govern, they're error-prone. 
And so, if we can take that type of analysis, collecting the occupancy of a rental property, the projected rise in rental rates, the fluctuations perhaps in occupancy, the interest rates associated with financing that property, we can project forward. And that's a very common thing to do. What the income might look like from that property, the expenses, we can plan and purchase things appropriately. So, I think, we need this broader purview and I'm beginning to see some of those things happen. And the evidence today I would say, is more focused around the metric stores and the feature stores, starting to see vendors offer those capabilities. And we're starting to see the ML ops elements of managing the AI and ML models find their way closer to the data platforms as well. >> Very interesting. When I hear metrics, I think of KPIs, I think of data apps, orchestrate people and places and things to optimize around a set of KPIs. It sounds like a metadata challenge more... Somebody once predicted they'll have more metadata than data. Carl, what are your thoughts on this prediction? >> Yeah, I think that what Dave is describing as data derivatives is in a way, another word for what I was calling operational metadata, which is not about the data itself, but how it's used, where it came from, what the rules are governing it, and that kind of thing. If you have a rich enough set of those things, then not only can you do a model of how well your vacation property rental may do in terms of income, but also how well your application that's measuring that is doing for you. In other words, how many times have I used it, how much data have I used and what is the relationship between the data that I've used and the benefits that I've derived from using it? Well, we don't have ways of doing that. What's interesting to me is that folks in the content world are way ahead of us here, because they have always tracked their content using these kinds of attributes. 
Where did it come from? When was it created, when was it modified? Who modified it? And so on and so forth. We need to do more of that with the structured data that we have, so that we can track how it's used. And also, it tells us how well we're doing with it. Is it really benefiting us? Are we being efficient? Are there improvements in processes that we need to consider? Because maybe data gets created and then it isn't used or it gets used, but it gets altered in some way that actually misleads people. (laughs) So, we need the mechanisms to be able to do that. So, I would say that that's... And I'd say that it's true that we need that stuff. I think, that starting to expand is probably the right way to put it. It's going to be expanding for some time. I think, we're still a distance from having all that stuff really working together. >> Maybe we should say it's gestating. (Dave and Carl laughing) >> Sorry, if I may- >> Sanjeev, yeah, I was going to say this... Sanjeev, please comment. This sounds to me like it supports Zhamak Dehghani's principles, but please. >> Absolutely. So, whether we call it data mesh or not, I'm not getting into that conversation, (Dave chuckles) but data (audio breaking) (Tony laughing) everything that I'm hearing what Dave is saying, Carl, this is the year when data products will start to take off. I'm not saying they'll become mainstream. They may take a couple of years to become so, but this is data products, all this thing about vacation rentals and how is it doing, that data is coming from different sources. I'm packaging it into our data product. And to Carl's point, there's a whole operational metadata associated with it. The idea is for organizations to see things like developer productivity, how many releases am I doing of this? What data products are most popular? 
I'm actually right now in the process of formulating this concept that just like we had data catalogs, we are very soon going to be requiring a data products catalog. So, I can discover these data products. I'm not just creating data products left, right, and center. I need to know, do they already exist? What is the usage? If no one is using a data product, maybe I want to retire it and save cost. But this is a data product. Now, there's an associated thing that is also getting debated quite a bit called data contracts. And a data contract to me is literally just formalization of all these aspects of a product. How do you use it? What is the SLA on it, what is the quality that I am prescribing? So, data product, in my opinion, shifts the conversation to the consumers or to the business people. Up to this point, Dave, when you're talking about data, all of data discovery and curation is very data producer-centric. So, I think, we'll see a shift more into the consumer space. >> Yeah. Dave, can I just jump in there just very quickly there, which is that what Sanjeev has been saying there, this is really central to what Zhamak has been talking about. It's basically about making, one, data products are about the lifecycle management of data. Metadata is just elemental to that. And essentially, one of the things that she calls for is making data products discoverable. That's exactly what Sanjeev was talking about. >> By the way, did everyone just notice how Sanjeev just snuck in another prediction there? So, we've got- >> Yeah. (group laughing) >> But you- >> Can we also say that he snuck in, I think, the term that we'll remember today, which is metadata museums. >> Yeah, but- >> Yeah. >> And also comment to, Tony, to your last year's prediction, you're really talking about it's not something that you're going to buy from a vendor. >> No. >> It's very specific >> Mm-hmm. >> to an organization, their own data product. So, touche on that one. Okay, last prediction. 
Let's bring them up. Doug Henschen, BI analytics is headed to embedding. What does that mean? >> Well, we all know that conventional BI dashboarding reporting is really commoditized from a vendor perspective. It never enjoyed truly mainstream adoption. Always that 25% of employees are really using these things. I'm seeing rising interest in embedding concise analytics at the point of decision or better still, using analytics as triggers for automation and workflows, and not even necessitating human interaction with visualizations, for example, if we have confidence in the analytics. So, leading companies are pushing for next generation applications, part of this low-code, no-code movement we've seen. And they want to build that decision support right into the app. So, the analytic is right there. Leading enterprise apps vendors, Salesforce, SAP, Microsoft, Oracle, they're all building smart apps with the analytics predictions, even recommendations built into these applications. And I think, the progressive BI analytics vendors are supporting this idea of driving insight to action, not necessarily necessitating humans interacting with it if there's confidence. So, we want prediction, we want embedding, we want automation. This low-code, no-code development movement is very important to bringing the analytics to where people are doing their work. We got to move beyond the, what I call swivel chair integration, between where people do their work and going off to separate reports and dashboards, and having to interpret and analyze before you can go back and do take action. >> And Dave Menninger, today, if you want, analytics or you want to absorb what's happening in the business, you typically got to go ask an expert, and then wait. So, what are your thoughts on Doug's prediction? >> I'm in total agreement with Doug. I'm going to say that collectively... So, how did we get here? I'm going to say collectively as an industry, we made a mistake. 
We made BI and analytics separate from the operational systems. Now, okay, it wasn't really a mistake. We were limited by the technology available at the time. Decades ago, we had to separate these two systems, so that the analytics didn't impact the operations. You don't want the operations preventing you from being able to do a transaction. But we've gone beyond that now. We can bring these two systems and worlds together and organizations recognize that need to change. As Doug said, the majority of the workforce and the majority of organizations doesn't have access to analytics. That's wrong. (chuckles) We've got to change that. And one of the ways that's going to change is with embedded analytics. 2/3 of organizations recognize that embedded analytics are important and it even ranks higher in importance than AI and ML in those organizations. So, it's interesting. This is a really important topic to the organizations that are consuming these technologies. The good news is it works. Organizations that have embraced embedded analytics are more comfortable with self-service than those that have not, as opposed to turning somebody loose, in the wild with the data. They're given a guided path to the data. And the research shows that 65% of organizations that have adopted embedded analytics are comfortable with self-service compared with just 40% of organizations that are turning people loose in an ad hoc way with the data. So, totally behind Doug's predictions. >> Can I just break in with something here, a comment on what Dave said about what Doug said, which (laughs) is that I totally agree with what you said about embedded analytics. And at IDC, we made a prediction in our future intelligence, future of intelligence service three years ago that this was going to happen. And the thing that we're waiting for is for developers to build... You have to write the applications to work that way. It just doesn't happen automagically. 
Developers have to write applications that reference analytic data and apply it while they're running. And that could involve simple things like complex queries against the live data, which is through something that I've been calling analytic transaction processing. Or it could be through something more sophisticated that involves AI operations as Doug has been suggesting, where the result is enacted pretty much automatically unless the scores are too low and you need to have a human being look at it. So, I think that that is definitely something we've been watching for. I'm not sure how soon it will come, because it seems to take a long time for people to change their thinking. But I think, as Dave was saying, once they do and they apply these principles in their application development, the rewards are great. >> Yeah, this is very much, I would say, very consistent with what we were talking about, I was talking about before, about basically rethinking the modern data stack and going into more of an end-to-end solution. I think, that what we're talking about clearly here is operational analytics. There'll still be a need for your data scientists to go offline just in their data lakes to do all that very exploratory and that deep modeling. But clearly, it just makes sense to bring operational analytics into where people work into their workspace and further flatten that modern data stack. >> But with all this metadata and all this intelligence, we're talking about injecting AI into applications, it does seem like we're entering a new era of not only data, but new era of apps. Today, most applications are about filling forms out or codifying processes and require a human input. 
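Carl's description of results being "enacted pretty much automatically unless the scores are too low" can be sketched as a simple gate. This is only an illustration: the function name `route_decision` and the 0.8 default threshold are assumptions, not anything stated on the panel.

```python
def route_decision(score: float, threshold: float = 0.8) -> str:
    """Route one model-scored decision.

    Returns "auto" when the score is at or above the threshold, so the
    result can be enacted without a person, and "human_review" otherwise.
    The 0.8 default is an arbitrary illustrative value.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0.0 and 1.0")
    return "auto" if score >= threshold else "human_review"
```

In practice the threshold would presumably be tuned per decision type, since the cost of a wrong automatic action differs from case to case.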
And it seems like there's enough data now and enough intelligence in the system that the system can actually pull data from, whether it's the transaction system, e-commerce, the supply chain, ERP, and actually do something with that data without human involvement, present it to humans. Do you guys see this as a new frontier? >> I think, that's certainly- >> Very much so, but it's going to take a while, as Carl said. You have to design it, you have to get the prediction into the system, and the analytics at the point of decision has to be relevant to that decision point. >> And I also recall basically a lot of the ERP vendors back like 10 years ago, were promising that. And the fact that we're still looking at the promises shows just how difficult, how much of a challenge it is to get to what Doug's saying. >> One element that could be applied in this case is (indistinct) architecture. If applications are developed that are event-driven rather than following the script or sequence that some programmer or designer had preconceived, then you'll have much more flexible applications. You can inject decisions at various points using this technology much more easily. It's a completely different way of writing applications. And it actually involves a lot more data, which is why we should all like it. (laughs) But in the end (Tony laughing) it's more stable, it's easier to manage, easier to maintain, and it's actually more efficient, which is the result of an MIT study from about 10 years ago, and still, we are not seeing this come to fruition in most business applications. >> And do you think it's going to require a new type of data platform database? Today, data's all far-flung. We see that's all over the clouds and at the edge. Today, you cache- >> We need a super cloud. >> You cache that data, you're throwing it into memory. I mentioned, MySQL HeatWave. 
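The event-driven point above, that decisions can be injected at various points without rewriting a preconceived sequence, might look something like the following minimal sketch. The `EventBus` class, the event name, and both handlers are hypothetical and exist only to show the shape of the idea.

```python
from collections import defaultdict


class EventBus:
    """Tiny in-process publish/subscribe bus (illustrative only)."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def subscribe(self, event_type, handler):
        """Register a handler to run whenever event_type is published."""
        self._handlers[event_type].append(handler)

    def publish(self, event_type, payload):
        """Run every handler for event_type and collect their results."""
        return [handler(payload) for handler in self._handlers[event_type]]


def reserve_stock(order):
    """Original business step: reserve inventory for the order."""
    return f"reserved:{order['order_id']}"


def flag_large_order(order):
    """Decision injected later: send large orders to review."""
    verdict = "review" if order["total"] > 1000 else "ok"
    return f"{verdict}:{order['order_id']}"
```

Here `flag_large_order` can be subscribed to the same event as `reserve_stock` after the fact, and the code that publishes the event does not change.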
There are other examples where it's a brute force approach, but maybe we need new ways of laying data out on disk and new database architectures, and just when we thought we had it all figured out. >> Well, without referring to disk, which to my mind, is almost like talking about cave painting. I think, that (Dave laughing) all the things that have been mentioned by all of us today are elements of what I'm talking about. In other words, the whole improvement of the data mesh, the improvement of metadata across the board and improvement of the ability to track data and judge its freshness the way we judge the freshness of a melon or something like that, to determine whether we can still use it. Is it still good? That kind of thing. Bringing together data from multiple sources dynamically and in real time requires all the things we've been talking about. All the predictions that we've talked about today add up to elements that can make this happen. >> Well, guys, it's always tremendous to get these wonderful minds together and get your insights, and I love how it shapes the outcome here of the predictions, and let's see how we did. We're going to leave it there. I want to thank Sanjeev, Tony, Carl, David, and Doug. Really appreciate the collaboration and thought that you guys put into these sessions. Really, thank you. >> Thank you. >> Thanks, Dave. >> Thank you for having us. >> Thanks. >> Thank you. >> All right, this is Dave Vellante for theCUBE, signing off for now. Follow these guys on social media. Look for coverage on siliconangle.com, theCUBE.net. Thank you for watching. (upbeat music)

Published Date : Jan 11 2023

ENTITIES

Entity	Category	Confidence
Dave	PERSON	0.99+
Doug Henschen	PERSON	0.99+
Dave Menninger	PERSON	0.99+
Doug	PERSON	0.99+
Carl	PERSON	0.99+
Carl Olofson	PERSON	0.99+
Tony Baer	PERSON	0.99+
Tony	PERSON	0.99+
Dave Vellante	PERSON	0.99+
Collibra	ORGANIZATION	0.99+
Curt Monash	PERSON	0.99+
Sanjeev Mohan	PERSON	0.99+
Christian Kleinerman	PERSON	0.99+
Walmart	ORGANIZATION	0.99+
Microsoft	ORGANIZATION	0.99+
AWS	ORGANIZATION	0.99+
Sanjeev	PERSON	0.99+
Constellation Research	ORGANIZATION	0.99+
IBM	ORGANIZATION	0.99+
Ventana Research	ORGANIZATION	0.99+
2022	DATE	0.99+
Hazelcast	ORGANIZATION	0.99+
Oracle	ORGANIZATION	0.99+
25%	QUANTITY	0.99+
2021	DATE	0.99+
last year	DATE	0.99+
65%	QUANTITY	0.99+
Google	ORGANIZATION	0.99+
today	DATE	0.99+
five-year	QUANTITY	0.99+
TigerGraph	ORGANIZATION	0.99+
Databricks	ORGANIZATION	0.99+
two services	QUANTITY	0.99+
Amazon	ORGANIZATION	0.99+
David	PERSON	0.99+
RisingWave Labs	ORGANIZATION	0.99+

Breaking Analysis: CIOs in a holding pattern but ready to strike at monetization

>> From theCUBE Studios in Palo Alto and Boston, bringing you data-driven insights from theCUBE and ETR. This is "Breaking Analysis" with Dave Vellante. >> Recent conversations with IT decision makers show a stark contrast between exiting 2023 versus the mindset when we were leaving 2022. CIOs are generally funding new initiatives by pushing off or cutting lower priority items, while security efforts are still being funded. Those that enable business initiatives that generate revenue are taking priority over cleaning up legacy technical debt. The bottom line is, for the moment, at least, the mindset is not cut everything, rather, it's put a pause on cleaning up legacy hairballs and fund monetization. Hello, and welcome to this week's Wikibon Cube Insights powered by ETR. In this breaking analysis, we tap recent discussions from two primary sources, year-end ETR roundtables with IT decision makers, and CUBE conversations with data, cloud, and IT architecture practitioners. The sources of data for this breaking analysis come from the following areas. Eric Bradley's recent ETR year-end panel featured a financial services DevOps and SRE manager, a CSO in a large hospitality firm, a director of IT for a big tech company, the head of IT infrastructure for a financial firm, and a CTO for a global travel enterprise, and for our upcoming Supercloud2 conference on January 17th, which you can register for free, by the way, at supercloud.world, we've had CUBE conversations with data and cloud practitioners, specifically, heads of data in retail and financial services, a cloud architect at a biotech firm, the director of cloud and data at a large media firm, and the director of engineering at a financial services company. Now we've curated commentary from these sources and now we share them with you today as anecdotal evidence supporting what we've been reporting on in the marketplace for these last couple of quarters. 
On this program, we've likened the economy to the slingshot effect when you're driving, when you're cruising along at full speed on the highway, and suddenly you see red brake lights up ahead, so, you tap your own brakes and then you speed up again, and traffic is moving along at full speed, so, you think nothing of it, and then, all of a sudden, the same thing happens. You slow down to a crawl and you start wondering, "What the heck is happening?" And you become a lot more cautious about the rate of acceleration when you start moving again. Well, that's the trend in IT spend right now. Back in June, we reported that despite the macro headwinds, CIOs were still expecting 6% to 7% spending growth for 2022. Now that was down from 8%, which we reported at the beginning of 2022. That was before Ukraine, and Fed tightening, but given those two factors, you know that that seemed pretty robust, but throughout the fall, we began reporting consistently declining expectations where CIOs are now saying Q4 will come in at around 3% growth relative to last year, and they're expecting, or should we say hoping that it pops back up in 2023 to 4% to 5%. The recent ETR panelists, when they heard this, are saying based on their businesses and discussions with their peers, they could see low single digit growth for 2023, so, 1%, 2%, 3%, so, this sort of slingshotting, or sometimes we call it a seesaw economy, has caught everyone off guard. Amazon is a good example of this, and there are others, but Amazon entered the pandemic with around 800,000 employees. It doubled that workforce during the pandemic. Now, right before Thanksgiving in 2022, Amazon announced that it was laying off 10,000 employees, and, Jassy, the CEO of Amazon, just last week announced that number is now going to grow to 18,000. Now look, this is a rounding error at Amazon from a headcount standpoint and their headcount remains far above 2019 levels. 
Its stock price, however, does not and it's back down to 2019 levels. The point is that visibility is very poor right now and it's reflected in that uncertainty. We've seen a lot of layoffs, obviously, the stock market's choppy, et cetera. Now importantly, not everything is on hold, and this downturn is different from previous tech pullbacks in that the speed at which new initiatives can be rolled out is much greater thanks to the cloud, and if you can show a fast return, you're going to get funding. Organizations are pausing on the cleanup of technical debt, unless it's driving fast business value. They're holding off on modernization projects. Those business enablement initiatives are still getting funded. CIOs are finding the money by consolidating redundant vendors, and they're stealing from other pockets of budget, so, it's not surprising that cybersecurity remains the number one technology priority in 2023. We've been reporting that for quite some time now. It's specifically cloud, cloud native security container and API security. That's where all the action is, because there's still holes to plug from that forced march to digital that occurred during COVID. Cloud migration, kind of showing here on number two on this chart, still a high priority, while optimizing cloud spend is definitely a strategy that organizations are taking to cut costs. It's behind consolidating redundant vendors by a long shot. There's very little evidence that cloud repatriation, i.e., moving workloads back on prem is a major cost cutting trend. The data just doesn't show it. What is a trend is getting more real time with analytics, so, companies can do faster and more accurate customer targeting, and they're really prioritizing that, obviously, in this down economy. Real time, we sometimes lose it, what's real time? Real time, we sometimes define as before you lose the customer. 
Now in the hiring front, customers tell us they're still having a hard time finding qualified site reliability engineers, SREs, Kubernetes expertise, and deep analytics pros. These job markets remain very tight. Let's stay with security for just a moment. We said many times that, prior to COVID, zero trust was this undefined buzzword, and the joke, of course, is, if you ask three people, "What is zero trust?" You're going to get three different answers, but the truth is that virtually every security company that was resisting taking a position on zero trust in an attempt to avoid... They didn't want to get caught up in the buzzword vortex, but they're now really being forced to go there by CISOs, so, there are some good quotes here on cyber that we want to share that came out of the recent conversations that we cited up front. The first one, "Zero trust is the highest ROI, because it enables business transformation." In other words, if I can have good security, I can move fast, it's not a blocker anymore. Second quote here, "ZTA," zero trust architecture, "Is more than securing the perimeter. It encompasses strong authentication and multiple identity layers. It requires taking a software approach to security instead of a hardware focus." The next one, "I'd love to have a security data lake that I could apply to asset management, vulnerability management, incident management, incident response, and all aspects for my security team. I see huge promise in that space," and the last one, I see NLP, natural language processing, as the foundation for email security, so, instead of searching for IP addresses, you can now read emails at light speed and identify phishing threats, so, look at, this is a small snapshot of the mindset around security, but I'll add, when you talk to the likes of CrowdStrike, and Zscaler, and Okta, and Palo Alto Networks, and many other security firms, they're listening to these narratives around zero trust. 
I'm confident they're working hard on skating to this puck, if you will. A good example is this idea of a security data lake and using analytics to improve security. We're hearing a lot about that. We're hearing architectures, there's acquisitions in that regard, and so, that's becoming real, and there are many other examples, because data is at the heart of digital business. This is the next area that we want to talk about. It's obvious that data, as a topic, gets a lot of mind share amongst practitioners, but getting data right is still really hard. It's a challenge for most organizations to get ROI and expected return out of data. Most companies still put data at the periphery of their businesses. It's not at the core. Data lives within silos or different business units, different clouds, it's on-prem, and increasingly it's at the edge, and it seems like the problem is getting worse before it gets better, so, here are some instructive comments from our recent conversations. The first one, "We're publishing events onto Kafka, having those events be processed by Dataproc." Dataproc is a Google managed service to run Hadoop, and Spark, and Flink, and Presto, and a bunch of other open source tools. "We're putting them into the appropriate storage models within Google, and then normalize the data into BigQuery, and only then can you take advantage of tools like ThoughtSpot," so, here's a company like ThoughtSpot, and they're all about simplifying data, democratizing data, but to get there, you have to go through some pretty complex processes, so, this is a good example. All right, another comment. "In order to use Google's AI tools, we have to put the data into BigQuery. They haven't integrated in the way AWS and Snowflake have with SageMaker. 
Moving the data is too expensive, time consuming, and risky," so, I'll just say this, sharing data is a killer super cloud use case, and firms like Snowflake are on top of it, but it's still not pretty across clouds, and Google's posture seems to be, "We're going to let our database product competitiveness drive the strategy first, and the ecosystem is going to take a backseat." Now, in a way, I get it, owning the database is critical, and Google doesn't want to capitulate on that front. Look, BigQuery is really good and competitive, but you can't help but roll your eyes when a CEO stands up, and look, I'm not calling out Thomas Kurian, every CEO does this, and talks about how important their customers are, and they'll do whatever is right by the customer, so, look, I'm telling you, I'm rolling my eyes on that. Now let me also comment, AWS has figured this out. They're killing it in database. If you take Redshift for example, it's still growing, as is Aurora, really fast growing services and other data stores, but AWS realizes it can make more money in the long-term partnering with the Snowflakes and Databricks of the world, and other ecosystem vendors versus sub-optimizing their relationships with partners and customers in order to sell more of their own homegrown tools. I get it. It's hard not to feature your own product. IBM chose OS/2 over Windows, and tried for years to popularize it. It failed. Lotus, go way back to Lotus 1-2-3, they refused to run on Windows when it first came out. They were running on DEC VAX. Many of you young people in the United States have never even heard of DEC VAX. IBM wanted to run everything only in its cloud, the same with Oracle, originally. VMware, as you might recall, tried to build its own cloud, but, eventually, when the market speaks and reveals what seems to be obvious to analysts, years before, the vendors come around, they face reality, and they stop wasting money, fighting a losing battle. 
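To make the practitioner's event flow quoted earlier a bit more concrete (events published, processed, then normalized into warehouse tables), here is a minimal sketch in plain Python. It deliberately uses no Kafka, Dataproc, or BigQuery client calls; the stage functions and the event fields (`id`, `customer`, `order`, `amount_cents`) are all hypothetical stand-ins.

```python
import json


def process(raw_event: str) -> dict:
    """Stand-in for the processing stage: parse one JSON-encoded event."""
    return json.loads(raw_event)


def normalize(event: dict) -> dict:
    """Flatten a nested event into one flat row for a warehouse table."""
    return {
        "event_id": event["id"],
        "customer_id": event["customer"]["id"],
        "amount_usd": event["order"]["amount_cents"] / 100,
    }


def run_pipeline(raw_events: list[str]) -> list[dict]:
    """Process and normalize each published event, in order."""
    return [normalize(process(raw)) for raw in raw_events]
```

Even in this toy form, the analytics tool at the end only sees data after two earlier stages have run, which is the complexity the quote is pointing at.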
"The trend is your friend," as the saying goes. All right, last pull quote on data, "The hardest part is transformations, moving traditional Informatica, Teradata, or Oracle infrastructure to something more modern and real time, and that's why people still run apps in COBOL. In IT, we rarely get rid of stuff, rather we add on another coat of paint until the wood rots out or the roof is going to cave in. All right, the last key finding we want to highlight is going to bring us back to the cloud repatriation myth. Followers of this program know it's a real sore spot with us. We've heard the stories about repatriation, we've read the thoughtful articles from VCs on the subject, we've been whispered to by vendors that you should investigate this trend. It's really happening, but the data simply doesn't support it. Here's the question that was posed to these practitioners. If you had unlimited budget and the economy miraculously flipped, what initiatives would you tackle first? Where would you really lean into? The first answer, "I'd rip out legacy on-prem infrastructure and move to the cloud even faster," so, the thing here is, look, maybe renting infrastructure is more expensive than owning, maybe, but if I can optimize my rental with better utilization, turn off compute, use things like serverless, get on a steeper and higher performance over time, and lower cost Silicon curve with things like Graviton, tap best of breed tools in AI, and other areas that make my business more competitive. Move faster, fail faster, experiment more quickly, and cheaply, what's that worth? Even the most hard-o CFOs understand the business benefits far outweigh the possible added cost per gigabyte, and, again, I stress "possible." Okay, other interesting comments from practitioners. "I'd hire 50 more data engineers and accelerate our real-time data capabilities to better target customers." Real-time is becoming a thing. 
AI is being injected into data and apps to make faster decisions, perhaps, with less or even no human involvement. That's on the rise. Next quote, "I'd like to focus on resolving the concerns around cloud data compliance," so, again, despite the risks of data being spread out in different clouds, organizations realize cloud is a given, and they want to find ways to make it work better, not move away from it. The same thing in the next one, "I would automate the data analytics pipeline and focus on a safer way to share data across the states without moving it," and, finally, "The way I'm addressing complexity is to standardize on a single cloud." MonoCloud is actually a thing. We're hearing this more and more. Yes, my company has multiple clouds, but in my group, we've standardized on a single cloud to simplify things, and this is a somewhat dangerous trend, because it's creating even more silos and it's an opportunity that needs to be addressed, and that's why we've been talking so much about supercloud is a cross-cloud, unifying, architectural framework, or, perhaps, it's a platform. In fact, that's a question that we will be exploring later this month at Supercloud2 live from our Palo Alto Studios. Is supercloud an architecture or is it a platform? And in this program, we're featuring technologists, analysts, practitioners to explore the intersection between data and cloud and the future of cloud computing, so, you don't want to miss this opportunity. Go to supercloud.world. You can register for free and participate in the event directly. All right, thanks for listening. That's a wrap. I'd like to thank Alex Myerson, who's on production and manages our podcast, Ken Schiffman as well, Kristen Martin and Cheryl Knight, they helped get the word out on social media, and in our newsletters, and Rob Hof is our editor-in-chief over at siliconangle.com. He does some great editing. Thank you, all. Remember, all these episodes are available as podcasts wherever you listen. 
All you've got to do is search "breaking analysis podcasts." I publish each week on wikibon.com and siliconangle.com where you can email me directly at david.vellante@siliconangle.com or DM me, @dvellante, or comment on our LinkedIn posts. By all means, check out etr.ai. They get the best survey data in the enterprise tech business. We'll be doing our annual predictions post in a few weeks, once the data comes out from the January survey. This is Dave Vellante for theCUBE Insights powered by ETR. Thanks for watching, everybody, and we'll see you next time on "Breaking Analysis." (upbeat music)

Published Date : Jan 7 2023

ENTITIES

Entity	Category	Confidence
Alex Myerson	PERSON	0.99+
AWS	ORGANIZATION	0.99+
Ken Schiffman	PERSON	0.99+
Dave Vellante	PERSON	0.99+
Amazon	ORGANIZATION	0.99+
Jassy	PERSON	0.99+
Cheryl Knight	PERSON	0.99+
Eric Bradley	PERSON	0.99+
Rob Hof	PERSON	0.99+
Okta	ORGANIZATION	0.99+
Kristen Martin	PERSON	0.99+
Zscaler	ORGANIZATION	0.99+
Google	ORGANIZATION	0.99+
Thomas Kurian	PERSON	0.99+
6%	QUANTITY	0.99+
IBM	ORGANIZATION	0.99+
2023	DATE	0.99+
18,000	QUANTITY	0.99+
Palo Alto Networks	ORGANIZATION	0.99+
10,000 employees	QUANTITY	0.99+
CrowdStrike	ORGANIZATION	0.99+
January	DATE	0.99+
2022	DATE	0.99+
January 17th	DATE	0.99+
Boston	LOCATION	0.99+
Lotus 1	TITLE	0.99+
2019	DATE	0.99+
June	DATE	0.99+
8%	QUANTITY	0.99+
United States	LOCATION	0.99+
david.vellante@siliconangle.com	OTHER	0.99+
Snowflakes	ORGANIZATION	0.99+
Palo Alto	LOCATION	0.99+
Lotus	TITLE	0.99+
two factors	QUANTITY	0.99+
Oracle	ORGANIZATION	0.99+
Dataproc	ORGANIZATION	0.99+
three people	QUANTITY	0.99+
last week	DATE	0.99+
Supercloud2	EVENT	0.99+
Teradata	ORGANIZATION	0.99+
1%	QUANTITY	0.99+
3	TITLE	0.99+
Windows	TITLE	0.99+
5%	QUANTITY	0.99+
3%	QUANTITY	0.99+
BigQuery	TITLE	0.99+
Second quote	QUANTITY	0.99+
4%	QUANTITY	0.99+
DEC VAX	TITLE	0.99+
Thanksgiving	EVENT	0.98+
OS/2	TITLE	0.98+
7%	QUANTITY	0.98+
last year	DATE	0.98+
two primary sources	QUANTITY	0.98+
each week	QUANTITY	0.98+
Informatica	ORGANIZATION	0.98+
pandemic	EVENT	0.98+
first one	QUANTITY	0.98+
siliconangle.com	OTHER	0.97+
first answer	QUANTITY	0.97+
2%	QUANTITY	0.97+
around 800,000 employees	QUANTITY	0.97+
50 more data engineers	QUANTITY	0.97+
zero trust	QUANTITY	0.97+
Snowflake	ORGANIZATION	0.96+
single cloud	QUANTITY	0.96+
2	TITLE	0.96+
today	DATE	0.95+
ETR	ORGANIZATION	0.95+
single cloud	QUANTITY	0.95+
LinkedIn	ORGANIZATION	0.94+
later this month	DATE	0.94+

Tomer Shiran, Dremio | AWS re:Invent 2022

>>Hey everyone. Welcome back to Las Vegas. It's theCUBE live at AWS re:Invent 2022. This is our fourth day of coverage. Lisa Martin here with Paul Gillin. Paul, we started Monday night, we filmed and streamed for about three hours. We have had jam-packed days, Tuesday, Wednesday, Thursday. What's your takeaway? >>We're rounding the final turn as we head into the home stretch. Yeah. This is as it has been since the beginning, this show with a lot of energy. I'm amazed for the fourth day of a conference, how many people are still here. I am too. And how active they are and how full the sessions are. Huge crowd for the keynote this morning. You don't see that on day four of most conferences. Everyone's on their way home. So people come here to learn and they're still >>Learning. They are still learning. And we're gonna help continue that learning path. We have an alumni back with us, Tomer Shiran joins us, the CPO and co-founder of Dremio. Tomer, it's great to have you back on the program. >>Yeah, thanks for having me here. And thanks for keeping the best session for the fourth day. >>Yeah, you're right. I like that. That's a good mojo to come into this interview with Tomer. So last year, last time I saw you was a year ago here in Vegas at re:Invent 2021. We talked about the growth of data lakes and the data lakehouses. We talked about the need for open data architectures as opposed to data warehouses. And the headline of the SiliconANGLE article on the interview we did with you was, Dremio Predicts 2022 will be the year open data architectures replace the data warehouse. We're almost done with 2022. Has that prediction come true? >>Yeah, I think we're seeing almost every company out there, certainly in the enterprise, adopting data lake, data lakehouse technology, embracing open source kind of file and table formats. And so I think that's definitely happening. Of course, nothing goes away. 
So, you know, data warehouses don't go away in a year and actually don't go away ever. We still have mainframes around, but certainly the trends are all pointing in that direction. >>Describe the data lakehouse for anybody who may not be really familiar with that and what it really means for organizations. >>Yeah. I think you could think of the data lakehouse as the evolution of the data lake, right? And so, you know, for the last decade we've had kind of these two options, data lakes and data warehouses and, you know, warehouses having good SQL support and good performance. But you had to spend a lot of time and effort getting data into the warehouse. You got locked into them, very, very expensive. That's a big problem now. And data lakes, you know, more open, more scalable, but had all sorts of kind of limitations. And what we've done now as an industry with the lakehouse, and especially with, you know, technologies like Apache Iceberg, is we've unlocked all the capabilities of the warehouse directly on object storage like S3. So you can insert and update and delete individual records. You can do transactions, you can do all the things you could do with a database directly in kind of open formats without getting locked in at a much lower cost. >>But you're still dealing with semi-structured data as opposed to structured data. And there's work that has to be done to get that into a usable form. That's where Dremio excels. What has been happening in that area to make, I mean, is it formats like JSON that are enabling this to happen? How are we advancing the cause of making semi-structured data usable? Yeah, >>Well, I think first of all, you know, I think that's all changed. I think that was maybe true for the original data lakes, but now with the lakehouse, you know, our bread and butter is actually structured data. It's all tables with the schema. 
And, you know, you can, you know, create tables, insert records. You know, it's really everything you can do with a data warehouse you can now do in the lakehouse. Now, that's not to say that there aren't like very advanced capabilities when it comes to, you know, JSON and nested data and kind of sparse data. You know, we excel in that as well. But we're really seeing kind of the lakehouse take over the bread and butter data warehouse use cases. >>You mentioned open a minute ago. Talk about why open is important and the value that it can deliver for customers. >>Yeah, well, I think if you look back in time and you see all the challenges that companies have had with kind of traditional data architectures, right? A lot of that comes from the problems with data warehouses. The fact that they are, you know, they're very expensive. The data is, you have to ingest it into the data warehouse in order to query it. And then it's almost impossible to get off of these systems, right? It takes an enormous effort, tremendous cost to get off of them. And so you're kinda locked in and that's a big problem, right? You also, you're dependent on that one data warehouse vendor, right? You can only do things with that data that the warehouse vendor supports. And if you contrast that to data lakehouse and open architectures where the data is stored in entirely open formats. So things like Parquet files and Apache Iceberg tables, that means you can use any engine on that data. You can use a SQL query engine, you can use Spark, you can use Flink. You know, there's a dozen different engines that you can use on that, both at the same time. But also in the future, if you ever wanted to try something new that comes out, some new open source innovation, some new startup, you just take it and point it at the same data. So that data's now at the core, at the center of the architecture as opposed to some, you know, vendor's logo. >>Yeah. 
>>Amazon seems to be bought into the lakehouse concept. It has big announcements on day two about eliminating the ETL stage between RDS and Redshift. Do you see the cloud vendors as pushing this concept forward? >>Yeah, a hundred percent. I mean, Amazon's a great, great partner of ours. We work with, you know, probably 10 different teams there. Everything from, you know, the S3 team, the Glue team, the QuickSight team, you know, everything in between. And, you know, their embracement of the lakehouse architecture, the fact that they adopted Iceberg as their primary table format. I think that's exciting as an industry. We're all coming together around standard ways to represent data so that at the end of the day, companies have this benefit of being able to, you know, have their own data in their own S3 account in open formats and be able to use all these different engines without losing any of the functionality that they need, right? The ability to do all these interactions with data that maybe in the past you would have to move the data into a database or warehouse in order to do, you just don't have to do that anymore. >>Speaking of functionality, talk about what's new this year with Dremio since we've seen you last. >>Yeah, there's a lot of new things with Dremio. So yeah, we now have full Apache Iceberg support, you know, with DML commands, you can do inserts, updates, deletes, you know, copy into, all that kind of stuff is now, you know, a fully supported native part of the platform. We now offer kind of two flavors of Dremio. We have, you know, Dremio Cloud, which is our SaaS version, fully hosted. You sign up with your Google or, you know, Azure account and you're up and running in a minute. And then Dremio Software, which you can self host usually in the cloud, but even outside of the cloud. And then we're also very excited about this new idea of data as code. 
And so we've introduced a new product that's now in preview called Dremio Arctic. And the idea there is to bring the concepts of Git or GitHub to the world of data. So things like being able to create a branch and work in isolation. If you're a data scientist, you wanna experiment on your own without impacting other people, or you're a data engineer and you're ingesting data, you want to transform it and test it before you expose it to others. You can do that in a branch. So all these ideas that, you know, we take for granted now in the world of source code and software development, we're bringing to the world of data with Arctic. And when you think about data mesh, a lot of people talking about data mesh now and wanting to kind of take advantage of those concepts and ideas, you know, thinking of data as a product. Well, when you think about data as a product, we think you have to manage it like code, right? And that's why we call it data as code, right? All those reasons that we use things like Git to build products, you know, if we wanna think of data as a product, we need all those capabilities also with data. You know, also the ability to go back in time. The ability to undo mistakes, to see who changed my data and when did they change that table. All of those are part of this new catalog that we've created. >>You talk about data as a product; that's sort of intrinsic to the data mesh concept. What's your opinion of data mesh? Is the world ready for that radically different approach to data ownership? >>You know, we now have dozens of our customers that are using Dremio to implement enterprise-wide kind of data mesh solutions. And at the end of the day, I think it's just, you know, what most people would consider common sense, right? 
In a large organization, it is very hard for a centralized single team to understand every piece of data, to manage all the data themselves, to, you know, make sure the quality is correct, to make it accessible. And so what data mesh is first and foremost about is being able to kind of federate, or distribute, the ownership of data. The governance of the data still has to happen, right? And so that is, I think, at the heart of the data mesh, but thinking of data as kind of allowing different teams, different domains to own their own data, to really manage it like a product with all the best practices that we have with that, is super important. So we're doing a lot with data mesh, you know, the way that Dremio Cloud has multiple projects and the way that Arctic allows you to have multiple catalogs and different groups can kind of interact and share data among each other. You know, the fact that we can connect to all these different data sources, even outside your data lake, you know, with Redshift, Oracle, SQL Server, you know, all the different databases that are out there, and join across different databases in addition to your data lake, that's all stuff that companies want with their data mesh. >>What are some of your favorite customer stories where you've really helped them accelerate that data mesh and drive business value from it so that more people in the organization kind of have access to data so they can really make those data driven decisions that everybody wants to make? >>I mean, there's so many of them, but, you know, one of the largest tech companies in the world creating a data mesh where you have all the different departments in the company that, you know, they were a big data warehouse user and it kinda hit the wall, right? 
The costs were so high and the ability for people to kind of use it for just experimentation, to try new things out, to collaborate, they couldn't do it because it was so prohibitively expensive and difficult to use. And so they said, well, we need a platform where different people can collaborate, they can experiment with the data, they can share data with others. And so at a big organization like that, their ability to kind of have a centralized platform but allow different groups to manage their own data. You know, several of the largest banks in the world are also doing data meshes with Dremio. You know, one of them has over a dozen different business units that are using Dremio, and that ability to have thousands of people on a platform and to be able to collaborate and share among each other, that's super important to these guys. >>Can you contrast your approach to the market to Snowflake's? 'Cause they have some of those same concepts. >>Snowflake's a very closed system at the end of the day, right? Closed and very expensive. Right? I think, if I remember seeing, you know, a quarter ago in one of their earnings reports that the average customer spends 70% more every year, right? Well, that's not sustainable. If you think about that in a decade, your cost is gonna increase 200x. Most companies are not gonna be able to swallow that, right? So companies need, first of all, they need more cost efficient solutions that are, you know, just more approachable, right? And the second thing is, you know, we talked about the open data architecture. I think most companies now realize that if you want to build a platform for the future, you need to have the data in open formats and not be locked into one vendor, right? 
And so that's kind of another important aspect. Beyond that is the ability to connect to all your data, even outside the lake, to your different databases, NoSQL databases, relational databases, and Dremio's semantic layer where we can accelerate queries. And so typically what happens with data warehouses and other data lake query engines is that because you can't get the performance that you want, you end up creating lots and lots of copies of data. For every use case, you're creating a, you know, a pre-joined copy of that data, a pre-aggregated version of that data. And you know, then you have to redirect all your data. >>You've got a governance problem, individual things. It's expensive. It's expensive, it's hard to secure that 'cause permissions don't travel with the data. So you have all sorts of problems with that, right? And so what we've done because of our semantic layer that makes it easy to kind of expose data in a logical way. And then our query acceleration technology, which we call reflections, which transparently accelerates queries and gives you subsecond response times without data copies and also without extracts into the BI tools. 'Cause if you start doing BI extracts or imports, again, you have lots of copies of data in the organization, all sorts of refresh problems, security problems, it's a nightmare, right? And just collapsing all those copies and having a simple solution where data's stored in open formats and we can give you fast access to any of that data, that's very different from what you get with like a Snowflake or any of these other companies. >>Right. That's a great explanation. I wanna ask you, early this year you announced that your Dremio Cloud service would be free forever, the basic Dremio Cloud service. How has that offer gone over? What's been the uptake on that offer? >>Yeah, I mean it is, and thousands of people have signed up and I think it's a great service. 
It's, you know, it's very, very simple. People can go on the website, try it out. We now have a test drive as well. If you want to get started with just some public sample data sets and like a tutorial, we've made that increasingly easy as well. But yeah, we continue to, you know, take that approach of, you know, making it easy, democratizing these kind of cloud data platforms and kinda lowering the barriers to adoption. >>How effective has it been in driving sales of the enterprise version? >>Yeah, a lot of business that we do, like when it comes to selling, is, you know, folks that, you know, have educated themselves, right? They've started off, they've followed some tutorials. I think generally developers, they prefer the first interaction to be with a product, not with a salesperson. And so that's basically the reason we did that. >>Before we ask you the last question, I wanna just, can you give us a sneak peek into the product roadmap as we enter 2023? What can you share with us that we should be paying attention to where Dremio is concerned? >>Yeah. You know, actually a couple days ago here at the conference, we had a press release with all sorts of new capabilities that we just released. And there's a lot more for the coming year. You know, we will shortly be releasing a variety of different performance enhancements. So in the next quarter or two, we'll be, you know, probably twice as fast just in terms of raw query speed, you know, that's in addition to our reflections and our query acceleration. You know, support for all the major clouds is coming. You know, just a lot of capabilities in Dremio that make it easier and easier to use the platform. >>Awesome. Tomer, thank you so much for joining us. 
My last question to you is, if you had a billboard in your desired location and it was going to really just be like a mic drop about why customers should be looking at Dremio, what would that billboard say? >>Well, Dremio is the easy and open data lakehouse and, you know, open architectures. It's just a lot better, a lot more future proof, a lot easier and just a much safer choice for the future for companies. And so hard to argue with those people to take a look. Exactly. That wasn't the best. That wasn't the best, you know, billboard. >>Okay. I think it's a great billboard. Awesome. And thank you so much for joining Paul and me on the program, sharing with us what's new, what some of the exciting things are that are coming down the pipe. Quite soon we're gonna be keeping our eye on Dremio. >>Awesome. Always happy to be here. >>Thank you. All right. For our guest and for Paul Gillin, I'm Lisa Martin. You're watching theCUBE, the leader in live and emerging tech coverage.

Published Date : Dec 1 2022


Jed Dougherty, Dataiku | AWS re:Invent 2022

(bright music) >> Welcome back to Vegas, guys and girls. We're pleased that you're watching theCUBE. We know you've been with us. This is our fourth day. We know you've been with us since day one. Why wouldn't you be? Lisa Martin, here. As I mentioned, day four of theCUBE's coverage of AWS re:Invent. There are north of 55,000 people that have been at this event this week. We're hearing hundreds of thousands online. It really feels like old times, which is awesome. We're pleased to welcome back a gentleman from Dataiku who's actually new to theCUBE but Dataiku is not. Jed Dougherty is here, the VP of Platform Strategy. Thanks for joining me today, Jed. >> Oh, I'm so happy to be here. >> Talk a little bit, for anybody that isn't familiar with Dataiku, tell the audience a little bit about the technology, what you guys do. >> Dataiku is an end-to-end data science machine learning platform. We take everything from data ingestion, pipelining of that data, bringing it all together, something that's useful for building models, deploying those models and then managing your ML ops workflow. So, really all the way across. And we sit on top of, basically, tons of different AWS stack as well as lots of the partners that are here today. >> Okay, got it. >> Snowflake, Databricks, all that. >> Got it, so one of the things that, it was funny, I think it was Adam's keynote Tuesday morning. I didn't time it, I watched it, but one of my guests said to me earlier this week that Adam spent exactly 52 minutes talking about data. >> Yeah. >> 52 minutes. Obviously, we can't come to an event like this without talking about data. Every company these days has to be a data company. Whether it's my grocery store or a retailer, a hospital, and so- >> Jed: It is the lifeblood of every modern company. >> It is, but you have to be able to access it. 
You have to be able to harness it, access it, derive insights from it, and be able to act on that faster than the competitors that are waiting, like, right back here. One of the things Adam Selipsky talked about with our boss, John Furrier, who's the co-CEO of theCUBE, they had a sit-down about a week before re:Invent. John always gets a preview of the show and Adam said, you know, he thinks the role of data analyst is going to go away. Or at least the term, because with data democratization that needs to happen. Putting data in the hands of all the business users, that every business user, whether you're in technology or marketing or ops or finance, it's going to have to analyze data to do their jobs. >> Could not agree more. >> Are you hearing that from customers? >> 100% >> Yeah. >> I was just at the CTO Summit of Bank of America two weeks ago out in California, and they told, their CTO had a statistic, 60,000 technologists in Bank of America, all asking data-type questions. You can have the best team of data scientists in the world, and they do. They have some of the best data scientists in the world there. And this team of data scientists could answer any one of the questions that those 60,000 people might have but they can't answer all of them, right? You need those people to be able to answer their own questions. I don't know if the term data analysts are going away. I think, yeah, everybody's just going to have to become a bit more of one. Just like how Excel taught everybody how to use the spreadsheet, in the future, in the next five, 10 years, the democratization of AI means that tools like Dataiku and other data science tools are going to teach everybody how to analyze data. >> Talk about Dataiku as a facilitator of that, of that democratization. Giving, like the citizen technologist who might be in finance, the ability to do that. >> So, a lot of data science tools are aimed at your hardcore coder, right? 
Somebody who wants to be sitting at a notebook writing (indistinct) or something like that and running models on some big fancy Spark server. Dataiku is still going to be running models on some big fancy Spark server but we're really obfuscating the challenge of writing code away from the user. So we target low code, no code, and high code users all working together in a collaborative platform. So we really do, we believe that there is always going to be a place for data scientists. That role is not going away. You will always need hardcore coders to take on those moonshot very challenging topics. But for every day AI, anybody should be able to do this and it should be open to anybody. >> Right. >> Jed: Really aim to facilitate that. >> I would love to hear some feedback, you know, this is day four of the show as I was saying, and day four is packed. I mean, this is energy-level-wise, guys, it is the same as it was when we started here on Friday night. But I'd love to hear, Jed, from your perspective some of the customer conversations that you've had, what are some of the challenges? They're coming to you saying, "Jed, Dataiku, help us eradicate these challenges so we can transform our business." >> What I'm hearing from customers and partners and AWS here is, over and over, we don't want to buy tools anymore. We want to buy solutions. We want a vertical solution that's pre-built for our industry. And we want it to be, not necessarily click and run out of the box, but we want a template that we can build off of quickly. And I've heard that customers are also looking to understand how tools can be packaged together. You got how many booths are here? 1000 booths? >> Yes, easily. >> You have 1000 different products being talked about, right behind us. 
Customers need to know which of these products are friends with each other and how they fit together so that they are making sure that when they purchase a set, a suite of tools to do their jobs, it's all going to work naturally together. So, being able, I think this is a really vital concept for GSIs as well. GSIs need to understand how to package sets of tools together to deliver a full solution to clients. People don't want to be, you know, I think 10 years ago, five years ago, AWS was in the business of selling servers in the cloud. But basically what you do is, you would buy an EC2 instance and you install whatever software you wanted on it. I don't know that they're in that business still but customers don't want to buy servers from AWS anymore. They want to buy solutions. >> Right. >> Rent, whatever. >> Yeah. (chuckles) >> That is the big repeated message that I've heard here. >> So you brought up a good point that there are probably 1000 booths here. You could be here every day and not get to see everything that's going on. Plus this show was going on across the strip. We're only getting a fraction of the people that are here. But with that said, to your point, there are so many tools out there. Customers are looking for solutions. One of the things that we say about theCUBE is, we extract the signal from the noise. How does Dataiku get past the noise? How do you get up the stack to really impact customers so they understand the value that you're delivering? >> I think that Data science and ML sound like a very complicated topic but our value prop is relatively simple. And we appeal both to your end users who are excited to learn about how data science works and how they can leverage these tools in their day-to-day jobs, as well as appealing to IT. IT, right now, at major organizations they want to be able to build a full stack that makes sense. And the big choices they're making right now are around infrastructure. Where am I going to run my compute? 
So, they're choosing between Snowflake or Databricks or a native AWS compute solution, right? And so they make this big choice around compute and then they realize, "Oh, how many of our users across our organization are actually able to leverage this big compute choice?" Oh, maybe 100, maybe 200. That's not incredibly useful for what we've just decided to completely stand behind. Dataiku, all of a sudden, opens that up to 1000s of users across your organization. So it makes IT feel empowered by being able to help more people. And it makes users feel empowered by being able to use a great tool and start answering their own questions. >> And where are your customer conversations these days? As we look at AI and ML, emerging technologies, so many customers and companies, knowing we have to go in this direction. We have to have AI to speed the business. Are you seeing more of the conversations are still in IT or are they actually going up the stack? >> (chuckles) It's a great question. When you're going into large organizations, there's two sales motions, right? There's convincing the business users that this is a great thing and then convincing IT that it's not going to be too painful. You always have to go to both places. IT doesn't want to take on a boondoggler, or there's an albatross, I don't remember the word, but, something that they're going to have to deal with for the next 10 years and then eventually dismantle and pull apart. I think a lot of IT got very scared about big data platforms and solutions because of Hadoop. To be honest, Hadoop was incredibly powerful but maybe not as mature of technology as IT would've liked it to be. From a maintenance and administration standpoint. So yes, you will always have to sell to IT and help IT feel comfortable with the platform. But no, the conversations that I want to have are the use case conversations with a Chief Data Officer, Chief Revenue Officer, Chief Marketing Officer. 
That's who I really want to convince that this is going to be a worthwhile opportunity. >> And what are some of the key, sorry. What are some of the key use cases that Dataiku is tackling in the market these days? >> So we work a lot. Two of the biggest organizations, or verticals, that I work with personally are finance and pharmaceuticals. In finance, we are closely embedded with wealth management organizations. So, a lot of that is around customer retention, churn, relatively obvious, simple concepts but ones where it's worth a lot of money. In pharma, we work both on the supply side. So, doing supply chain optimization, ensuring the right drugs get to the right places at the right time. As well as on the business and marketing side. So, ensuring that your ad spend is correctly distributed across different advertising platforms. >> So if you're working with a financial organization, I want to understand from a consumer, from the end user's perspective, although obviously this technology impacts the end user who's trying to do a transaction. What's in it for me? And I don't know as the end user that Dataiku is under the hood. >> You'd never know. >> Which is good. I shouldn't have to worry about the technology. >> Jed: You shouldn't have to worry about that at all. >> What's in it for the end user customer? What are they gaining from this? >> So, from a very end user perspective, if you think about when you logged onto maybe your Bank of America, your Chase app, five or 10 years ago, maybe you didn't even have it on your phone five years ago. Or when you logged into your account online. We do 95% of our banking online right now, right? I go into a physical location, what? I don't know, once every six months or something? Get a cashier's check? I don't know. 
The experience that you're getting and the amount of information you're getting back about your spending habits, where your money is going, what your credit score is, all of these things are being driven by these big data organizations inside the banks. Also, any type, this is a little creepier, but any type of promotional emails or the types of things that you get feedback on when you use your credit card and the offers that you get through that, are all being personalized to you through the information that these banks are collecting about your spending habits. >> Yeah, but we want that as a consumer, we want the personalized. >> Yeah, of course. We want it to be magic slash not creepy. (laughs) >> Right, I want them to recommend the best card for me. >> Right. >> The next best thing. >> It's good for me, it's good for them. >> Don't serve me up something that I've already bought. That always bugs me when I'm like, I already bought that. >> I get that all the time. I'm like, yeah, I have that card already. It's in my wallet. Why are you telling me? >> We only have a couple of minutes left Jed, but talk to me about from a platform strategy perspective, what's next for Dataiku and AWS? >> So we are making a matrix transition right now and it's core to our platform. For a long time, the way that we've installed Dataiku is, we help our customers install it on their AWS account so it runs inside their tenant. This is very comfortable for, for example, large banking clients, pharma clients that have personally identifiable information, all that kind of thing. They own everything. However, as we were talking about before, we're really moving from providing a tool to providing solutions. And part of that is obviously a move to SaaS. So two years ago we released a SaaS offering. We've been expanding it more and more to, this year, we want to be pushing SaaS first. So Dataiku online should be the first option when new customers move on. And that is a huge platform shift. 
It means making sure that we have the right security in place. It means making sure that we have the right scaling in place, that we have 24-7 support. All this has been a big challenge. A big fascinating challenge, actually, to put together. >> Awesome. Last question for you. Say you get a brand new DeLorean, I hear they're coming back, and you want to put, you really, really want to put a bumper sticker on it, 'cause why not? And it's about Dataiku and it's like a sizzle reel kind of thing. >> A sizzle reel, alright. >> Yeah. What does it say? >> Extraordinary people, everyday AI. >> Wow. Drop the mic, Jed. That was awesome. Thank you so much for coming on the program. We really appreciate the update on Dataiku. What you guys are doing for customers, your specialization and solutions for verticals. Awesome stuff, we'll have to have you back. >> Thank you so much. >> Alright, my pleasure. >> Bye-Bye. >> For my guest, I'm Lisa Martin. You're watching theCUBE, the leader in live enterprise and emerging tech coverage. (bright music)

Published Date : Dec 1 2022


ENTITIES

Entity	Category	Confidence
Adam	PERSON	0.99+
Lisa Martin	PERSON	0.99+
Jed Dougherty	PERSON	0.99+
Adam Selipsky	PERSON	0.99+
John Furrier	PERSON	0.99+
AWS	ORGANIZATION	0.99+
95%	QUANTITY	0.99+
California	LOCATION	0.99+
Jed	PERSON	0.99+
1000 booths	QUANTITY	0.99+
Friday night	DATE	0.99+
John	PERSON	0.99+
100%	QUANTITY	0.99+
fourth day	QUANTITY	0.99+
Two	QUANTITY	0.99+
first option	QUANTITY	0.99+
Tuesday morning	DATE	0.99+
Excel	TITLE	0.99+
60,000 people	QUANTITY	0.99+
Bank of America	ORGANIZATION	0.99+
Databricks	ORGANIZATION	0.99+
two years ago	DATE	0.99+
this year	DATE	0.99+
100	QUANTITY	0.99+
today	DATE	0.99+
52 minutes	QUANTITY	0.99+
60,000 technologists	QUANTITY	0.99+
10 years ago	DATE	0.99+
both	QUANTITY	0.99+
One	QUANTITY	0.99+
five	DATE	0.99+
Dataiku	ORGANIZATION	0.99+
52 minutes	QUANTITY	0.98+
five years ago	DATE	0.98+
200	QUANTITY	0.98+
two sales	QUANTITY	0.98+
one	QUANTITY	0.98+
earlier this week	DATE	0.98+
Snowflake	ORGANIZATION	0.98+
Vegas	LOCATION	0.98+
1000 different products	QUANTITY	0.97+
this week	DATE	0.97+
both places	QUANTITY	0.97+
Hadoop	TITLE	0.97+
CTO Summit	EVENT	0.97+
two weeks ago	DATE	0.96+
hundreds of thousands	QUANTITY	0.96+
theCUBE	ORGANIZATION	0.95+
Bank of America	LOCATION	0.94+
Bank of America	EVENT	0.93+
Dataiku	TITLE	0.92+
day one	QUANTITY	0.91+
Spark	TITLE	0.9+
day four	QUANTITY	0.89+
first	QUANTITY	0.88+
EC two	TITLE	0.88+
Dataiku	PERSON	0.86+
a week	DATE	0.83+
Chase	TITLE	0.83+
one of my guests	QUANTITY	0.83+
CTO	ORGANIZATION	0.81+

ML & AI Keynote Analysis | AWS re:Invent 2022

>>Hey, welcome back everyone. Day three of AWS re:Invent 2022. I'm John Furrier with Dave Vellante, co-hosts of theCUBE. Dave, 10 years for us; "the leader in high tech coverage" is our slogan. Now 10 years of re:Invent. We've been to every single one except the original, which we would've come to if Amazon had actually marketed the event, but they didn't. It was more of a customer event. This is day three. It's the machine learning and AI keynote; Swami's up there. A lot of announcements. We're gonna break this down. We've got Andy Thurai here, Vice President and Principal Analyst at Constellation Research. Andy, great to see you. You've been on theCUBE before, one of our analysts bringing the analysis and commentary to the keynote. This is your wheelhouse: AI. What do you think about Swami up there? I mean, he's awesome. We love him. He's a big fan of theCUBE and we're fans of him, but he had 13 announcements. >>A lot. >>A lot. >>So, well, first of all, thanks for having me here, and I'm glad to have both of you on the same show attacking me. I'm just kidding. But some of the announcements are really sort of game changers, and some of them are like, meh, you know, just plugging the holes in what they have, and there were a lot of golf claps today. You could also have noticed that when he was making the announcements, you know, from the difference in clapping volume you could tell which ones were better, right? But some of the announcements are really, really good. You know, particularly, one thing we talked about was that Microsoft got out in front, you know, having OpenAI in there doing the large language models. And then they were going after that, you know, having the transformer available to them. And Amazon was a little bit weak in that area; they don't have a large language model.
So, you know, they are taking a different route, saying that, you know what, I'll help you train the large language model by yourself, customized models. So I can provide the necessary instances. I can provide the instance volume, memory, the whole thing. Yeah. So you can train the model by yourself without depending on them, kind of thing. >>So Dave and Andy, I wanna get your thoughts, 'cause first of all, we've been following Amazon's deep bench on the infrastructure and PaaS. They've been doing a lot of machine learning and AI, a lot of data. It just seems that the sentiment is that there are other competitors doing a good job too. Like Google, Dave. And I've heard folks in the hallway, even here, ex-Amazonians, saying, hey, they train their models on Google, then they bring them up on SageMaker 'cause it's a better interface. So you've got Google making a play for being that data cloud. Microsoft's obviously putting in a great kind of package to make it turnkey. How do they really stand versus the competition, guys? >>Good question. So, you know, each of them has its own uniqueness and the variation they take to the field, right? So for example, if you were to look at it, Microsoft is known for the industry-related things they've been going after, you know, industry verticals and whatnot. So that's one of the things I looked for here. You know, they had this Omics announcement, particularly aimed at the healthcare genomics space. That's a huge space for HPC-related AI/ML applications. And they have put a lot of things together here in SageMaker and in their models, saying, you know, how do you use this to do things like that? Like, for example, drug discovery, genomics analysis, cancer treatment, the whole thing, right? That's huge volumes of data to deal with. So they're going into that healthcare area. Google has taken a different route. I mean, they want to make everything simple.
All I have to do is call an API, give it what I need, and then get it done. But Amazon wants to go to a much deeper level, saying, you know what, I wanna provide everything you need. You can customize the whole thing for what you need. >>So to me, the big picture here is, and Swami referenced it, hey, we are a data company. He talked about books and how that informed them as to, you know, what books to place front and center. Here's the big picture, in my view: companies need to put data at the core of their business, and they haven't. They've generally put humans at the core of their business, and data and now machine learning are on the outside, at the periphery. Amazon, Google, Microsoft, Facebook have put data at their core. So the question is, how do incumbent companies do it? And you mentioned some: Toyota, Capital One, Bristol Myers Squibb. I don't know, are those data companies? You know, we'll see, but the challenge is most companies don't have the resources, as you well know, Andy, to actually implement what Google and Facebook and others have. >>So how are they gonna do that? Well, they're gonna buy it, right? So are they gonna build it with tools, which is kind of, like you said, the Amazon approach, or are they gonna buy it from Microsoft and Google? I pulled some ETR data to say, okay, who are the top companies that are showing up in terms of spending? Who's spending with whom? AWS number one, Microsoft number two, Google number three, Databricks number four, just in terms of, you know, presence. And then it falls down to DataRobot, Anaconda, Dataiku. Oracle popped up, actually, 'cause they're embedding a lot of AI into their products, and of course IBM, and then a lot of smaller companies. But do companies, customers generally, have the resources to do what it takes to implement AI into applications and into workflows? >>So a couple of things on that.
One is, I mean, it's no surprise that the top three are the hyperscalers, because they all want to bring business to them to run the specific workloads. The next biggest workloads, as he was saying in his keynote, are two things. One is the AI/ML workloads, and the other one is the heavy unstructured workloads that he was talking about. 80%, 90% of the data that's coming in is unstructured. So how do you analyze that? Such as the geospatial data he was talking about. The volumes of data you need to analyze, the deep neural nets you have to use, only hyperscalers can do it, right? So it's no wonder all of them are on top. On the data side, one of the things they announced, which not many people paid attention to, was the zero-ETL that they talked about. >>What that does is a little bit of a game-changing moment, in the sense that, for example, if you were to train on the data and the data is distributed everywhere, you have to bring it all together and integrate it to do that. It's a lot of work doing the ETL. So by taking Amazon Aurora and Redshift and combining them with zero or no ETL, and then having Apache Spark applications run on top, analytical applications, ML workloads, that's huge. So you don't have to move the data around; use the data where it is. >>I think you said it, they're basically filling holes, right? Yeah. They created this, you know, suite of tools, let's call it. You might say it's a mess. It's not a mess, because they're really powerful, but they're not well integrated, and now they're starting to tape the seams, as I say. >>Well yeah, it's a great point. And I would double down and say, look, I think that boring is good. You know, we had that phase in the Kubernetes hype cycle where it got boring, and that was kind of like, boring is good. Boring means we're getting better, we're invisible.
That's infrastructure that's in the weeds, that's in-between-the-toes details. It's the stuff that, you know, people have to get done. So, you know, you look at their 40 new data sources with Data Wrangler, 50 new AppFlow connectors, Redshift auto-copy. This is boring. Good, important shit, Dave. The governance, you gotta get it, and the governance is gonna be key. So to me, this may not jump off the page. Adam's keynote also felt a little bit like, we gotta get these gaps filled in a good way. So I think that's a very positive sign. >>Now, going back to the bigger picture, I think the real question is, can there be another independent data cloud? And that's, to me, what I tried to get at in my story, and your Breaking Analysis kind of hit a home run on this: there's an interesting opportunity for an independent data cloud. Meaning something that isn't AWS, that isn't Google, that isn't one of the big three, that could sit in. And so let me give you an example. I had a conversation last night with a bunch of ex-Amazonian engineering teams that left. The conversation was interesting, Dave. They were like, well, Databricks and Snowflake are basically batch, okay, not transactional. And you look at Aerospike, I can see their booth here. Transactional databases are hot right now. Streaming data is different. Confluent is different than Databricks. Is Databricks good at hosting? >>No, Amazon's better. So you start to see these kinds of questions come up where, you know, Databricks is great, but maybe not good for this, that, and the other thing. So you start to see the formation of swim lanes, or visibility into where people might sit in the ecosystem, but what came out was transactional, yep, and batch, the relationship there, and streaming real time versus, you know, the transactional data. So you're starting to see these new things emerge. Andy, what's your take on this? You're following this closely.
This seems to be the alpha nerd conversation, and it all points to who's gonna have the best data cloud, or data superclouds, as I call it. What's your take? >>Yes, the data cloud is important as well. But also the computation that goes on top of it too, right? Because when the data is unstructured data, and it's that huge an amount of data, it's going to be hard to do that with low, you know, compute power. But going back to your data point, the training of the AI/ML models requires the batch data, right? That's when you need all the historical data to train your models. And then after that, when you do inference, that's where you need the streaming real-time data available to you too, so you can make an inference. One of the things they also announced, which is somewhat interesting, is, you saw that they have like 700 different instances geared towards every single workload. And some of them run very specifically on Amazon's new chips, the Inferentia2 and the Trainium Trn1 chips. So you not only have specific instances, but they also run on a high-powered chip. And then if you have the data to support that, both for training as well as for inference, there's the efficiency. Again, those numbers have to be proven. They claim that it could be anywhere between 40 and 60% faster. >>Well, so a couple things. You're definitely right. I mean, Snowflake started out as a data warehouse that was simpler, and it wasn't architected, you know, in its first wave, to do real-time inference, which is not, you know, how could they. The second point is Snowflake's two or three years ahead when it comes to governance and data sharing. I mean, Amazon's doing what it always does. It's copying, you know, it's customer driven. 'Cause they probably walk into an account and the customer says, hey look, here's what Snowflake's doing for us. This stuff's kicking ass. And they go, oh, that's a good idea, let's do that too.
You saw that with separating compute from storage, which is their tiering. You saw it today with extending data sharing, Redshift data sharing. So how do Snowflake and Databricks approach this? They deal with the ecosystem. They bring in ecosystem partners, they bring in open source tooling, and that's how they compete. I think there's unquestionably an opportunity for a data cloud. >>Yeah, I think the supercloud conversation, and then, you know, sky cloud with the Berkeley paper and other folks talking about this, it's kind of a pre-multi-cloud era. I mean, that's what I would call it right now. We're kind of in the pre era of multi-cloud, which by the way is not even defined yet. I think people use that term, Dave, to say, you know, some sort of magical thing is happening. Yeah, people have multiple clouds. They end up there by default, not by design, as Dell likes to say, right? And they gotta deal with it. So it's more that they're inheriting multiple cloud environments. It's not necessarily what they want in that situation. So to me, that is a big, big issue. >>Yeah, I mean, again, going back to your Snowflake and Databricks point, they're data companies. So that's how they made their mark in the market, saying that, you know, I do all those things, therefore I have to have your data, because it's seamless data. And Amazon is catching up with that with a lot of the announcements they made. How much traction it's gonna get, you know, it's too early to say. >>Yeah, I mean, to me there's no doubt about it, Dave. I think what Swami is doing, if Amazon can corner the market on out-of-the-box ML and AI capabilities so that it's easier for people, that's gonna be the telltale sign at the end of the day. Can they fill in the gaps? Again, boring is good. The competition, I don't know, I mean, I'm not following the competition. Andy, this is a real question mark for me. I don't know where they stand.
Are they more comprehensive? Are they deeper? Do they have deeper services? I mean, obviously it shows in all the different, you know, capabilities. Where does Amazon stand? What's the process? >>So, particularly when it comes to the models, they're going at it from a different angle: you know, I will help you create the models. We talked about the zero-ETL and the whole data piece. We'll get the data sources in, we'll create the model, we'll move the whole model. We are talking about the MLOps teams here, right? And they have the whole functionality that they've built in over the years. So essentially they want to become the platform: when you come in, I'm the only platform you would use, from model training to deployment to inference, to model versioning to management, the whole thing, and that's the angle they're trying to take. So it's a one-stop platform. >>What about this idea of technical debt? Adrian Cockcroft was on yesterday. John, I know you talked to him as well. He said, look, Amazon's Legos. You wanna buy a toy for Christmas, you can go out and buy a toy, or do you wanna build one? If you buy a toy, in a couple of years it could break, and what are you gonna do? You're gonna throw it out. But if part of your Lego needs to be extended, you extend it. So, you know, George Gilbert was saying, well, there's a lot of technical debt. Adrian was countering that. Does Amazon have technical debt, or is that Lego blocks analogy the right one? >>Well, I talked to him about the debt, and one of the things we talked about was, what do you optimize for, EC2 APIs or Kubernetes APIs? It depends on what team you're on. If you're on the runtime team, you're gonna optimize for Kubernetes, but EC2 is the resource you want to use. So the idea of 15 years of technical debt, I don't believe that. I think the APIs are still hardened.
The issue that he brings up that I think is relevant is that it's an "and" situation, not an "or". You can have the bag of Legos, which is the primitives, and build a durable application platform, monitor it, customize it, work with it, build it. It's harder, but the outcome is durability and sustainability. Or you can have a toy, with those Legos glued together for you, that you get to play with, but it'll break over time. Then you gotta replace it. So there's gonna be a toy business and there's gonna be a Legos business. Make your own. >>So who are the toys in AI? >>Well, out of >>The box, and who's Legos? >>So you're asking about what toys Amazon is building? >>Or, yeah, I mean, Amazon clearly is Lego blocks. >>If people are gonna have out of the box, >>What about Google? What about Microsoft? Are they basically building more toys, more solutions? >>So Google is more of, you know, a building-solutions angle, like, you know, I give you an API kind of thing. But if it comes to vertical industry solutions, Microsoft is ahead, right? Because they have had years of industry experience. I mean, there are other smaller clouds trying to do that too, IBM being an example. But, you know, now they are starting to go after the specific industry use cases. They've thought that through, for example, you know, the medical one we talked about, right? So they want to build the health lake, the secure health lake that they're trying to build, which will be HIPAA compliant and provide for all the European regulations, the whole nine yards, and it'll help you, you know, personalize things as you need as well. For example, you know, if you go for a certain treatment, it could analyze you based on your genome profile, saying that, you know, the treatment for this particular person has to be individualized this way, but doing that requires enormous power, right?
So if you do applications like that, you could bring in a lot of them, whether healthcare, finance, or what have you, and make it easy for them to use. >>What's the biggest mistake customers make when it comes to machine intelligence, AI, machine learning? >>So many things, right? I could start with even the model. Basically, when you build a model, you should be able to figure out how long that model is effective. Because as good as it is to create a model and go to the business and do things the right way, there are people who leave the model in place much longer than it's needed, and it's hurting the business more than it's helping. You know, it could be things like that. Or you are not building it responsibly, or related things. You have bias in your model. There are so many issues. I don't know if I can pinpoint one, but there are many, many issues. Responsible AI, ethical AI. >>All right, well, we'll leave it there. You're watching theCUBE, the leader in high tech coverage, here at day three of re:Invent. I'm John Furrier with Dave Vellante. Andy joined us here for the critical analysis, breaking down the commentary. We'll be right back with more coverage after this short break.

Published Date : Nov 30 2022


ENTITIES

Entity	Category	Confidence
Jeff	PERSON	0.99+
George Gilbert	PERSON	0.99+
Amazon	ORGANIZATION	0.99+
Microsoft	ORGANIZATION	0.99+
Adrian	PERSON	0.99+
Dave	PERSON	0.99+
Andy	PERSON	0.99+
Google	ORGANIZATION	0.99+
IBM	ORGANIZATION	0.99+
Facebook	ORGANIZATION	0.99+
Adrian Carro	PERSON	0.99+
Dave Volante	PERSON	0.99+
Andy Thra	PERSON	0.99+
90%	QUANTITY	0.99+
15 years	QUANTITY	0.99+
John	PERSON	0.99+
Adam	PERSON	0.99+
13 announcements	QUANTITY	0.99+
Lego	ORGANIZATION	0.99+
John Farmer	PERSON	0.99+
Dave Ante	PERSON	0.99+
two	QUANTITY	0.99+
10 years	QUANTITY	0.99+
AWS	ORGANIZATION	0.99+
Dell	ORGANIZATION	0.99+
Legos	ORGANIZATION	0.99+
Bristol Myers Squibb	ORGANIZATION	0.99+
Oracle	ORGANIZATION	0.99+
Constellation Research	ORGANIZATION	0.99+
One	QUANTITY	0.99+
Christmas	EVENT	0.99+
second point	QUANTITY	0.99+
yesterday	DATE	0.99+
Anaconda	ORGANIZATION	0.99+
today	DATE	0.99+
Berkeley Paper	ORGANIZATION	0.99+
one	QUANTITY	0.99+
eight	QUANTITY	0.98+
700 different instances	QUANTITY	0.98+
three years	QUANTITY	0.98+
Swami	PERSON	0.98+
Aerospike	ORGANIZATION	0.98+
both	QUANTITY	0.98+
Snowflake	ORGANIZATION	0.98+
two things	QUANTITY	0.98+
60%	QUANTITY	0.98+

Evan Kaplan, InfluxData | AWS re:invent 2022

>>Hey everyone. Welcome to Las Vegas. theCUBE is here, live at the Venetian Expo Center for AWS re:Invent 2022. Amazing attendance. This is day one of our coverage. Lisa Martin here with Dave Vellante. Dave, it's great to see so many people back. We've been having great conversations already. We have wall-to-wall coverage for the next three and a half days. When we talk to companies and customers, every company has to be a data company. And one of the things I think we learned in the pandemic is that access to real-time data and real-time analytics is no longer a nice-to-have; it's a differentiator and a competitive... >>All about data. I mean, you know, I love the topic, and it's got so many dimensions and such texture. Can't get enough of data. >>I know. We have a great guest joining us. One of our alumni is back: Evan Kaplan, the CEO of InfluxData. Evan, thank you so much for joining us. Welcome back to theCUBE. >>Thanks for having me. It's great to be here. >>So here we are, day one. I was telling you before we went live, we're nice and fresh hosts. Talk to us about what's new at Influx since the last time we saw you at re:Invent. >>That's great. So first of all, we should acknowledge what's going on here. This is pretty exciting. Yeah, it does really feel like, I know there was a show last year, but this feels like the first post-Covid show. A lot of energy, a lot of attention, despite a difficult economy. In terms of, you know, you guys were commenting in the lead-in about big data. I think, you know, if we were to talk about big data five, six years ago, what would we be talking about? We'd have been talking about Hadoop, we'd be talking about Cloudera, we'd be talking about Hortonworks, we'd be talking about big data lakes, data stores. I think what's happened is this interesting dynamic of, let's call it, if you will, the secularization of data, in which it breaks into different fields, almost a taxonomy.
You've got this set of search data, you've got this observability data, you've got graph data, you've got document data, and now you have time series data. >>And what you're seeing in the market is this incredible capability by developers, and a mostly open source dynamic driving this, this incredible capability of developers to assemble data platforms that aren't unicellular, that aren't just built on Hadoop or Oracle or Postgres or MySQL, but in fact represent different data types. So for us, what we care about is time series. We care about anything that happens in time, where time can be the primary measurement, which, if you think about it, is a huge proportion of real data. 'Cause when you think about what drives AI, you think about what happened, what happened, what happened, what happened, what's going to happen. That's the functional thing. But what happened is always defined by a period, a measurement, a time. And so what's new for us is we've developed this new open source engine called IOx. It's basically a refresh of the whole database, a columnar database that uses Apache Arrow, Parquet, and DataFusion, and turns it into a super powerful real-time analytics platform. It was already pretty real time before, but it's increasingly so now, and it adds SQL capability and infinite cardinality. And so it handles bigger data sets, but importantly, not just bigger but faster, faster data. So that's primarily what we're talking about at the show. >>So how does that affect where you can play in the marketplace? I mean, how does it affect your total available market? >>Great question. >>Your customer opportunities. >>I think it's really an interesting market, in that you've got all of these different approaches to databases. Whether you take data warehouses from Snowflake or, arguably, Databricks also. And you take these individual database companies like Mongo, Influx, Neo4j, Elastic, and people like that.
I think the commonality you see across the field is that many of them, if not all of them, are based on some sort of open source dynamic. So I think that is an intractable trend that will continue on. But in terms of the broader database market, our total available market, lots of these things are coming together in interesting ways. And so the wave that we wanna ride, because it's all big data and it's all increasingly fast data and it's all machine learning and AI, is really around that measurement issue, that instrumentation: the idea that if you're gonna build any sophisticated system, it starts with instrumentation, and the journey is defined by instrumentation. So we view ourselves as that instrumentation tooling for understanding complex systems. >>And how... >>I have to do a quick follow-up. Why did you say arguably Databricks? I mean, open source ethos? >>Well, I was saying arguably Databricks 'cause of Spark. I mean, it's a great company and it's based on Spark, but there's quite a gap between Spark and what Databricks is today. And in some ways Databricks, from the outside looking in, looks a lot like Snowflake to me, looks a lot like a really sophisticated data warehouse with a lot of post-processing capabilities. >>And with an open source... >>Less than a core database. >>Yeah. Right, right, right. Yeah, I totally agree. Okay, thank you for that. >>That was not arguably, like, they're not a good company or... >>No, no. They've got great momentum and I'm just curious. >>Absolutely. >>You know, so. >>So talk a little bit about IOx and what it is enabling you guys to achieve from a competitive advantage perspective. The key differentiators, give us that scoop. >>So if you think about it, our old storage engine was called TSM, also open sourced, right?
And IOx is open sourced, and the old storage engine was really built around these time series measurements, particularly metrics, lots of metrics, and handling those at scale and making it super easy for developers to use. But our old data engine only supported either a custom graphical UI that you'd build yourself on top of it, or a dashboarding tool like Grafana or Chronograf or things like that. With IOx, two or three innovations were important. One is that we'll now support things like Tableau and Microsoft BI, and so you're taking that same data that was available for instrumentation and now you're using it for business intelligence also. So that became super important, and it kind of answers your question about the expanded market: it expands the market. The second thing is, when you're dealing with time series data, you're dealing with this concept of cardinality, which is, and I don't know if you're familiar with it, the idea that it's a multiplication of measurements in a table. And so the more measurements you want, and the more series you have, you get this really expanding, exponential set that can choke a database off. And we've designed IOx to handle what we call infinite cardinality, where you don't even have to think about that from a design point of view. And then lastly, query performance is just dramatically better. And so it's pretty exciting. >>So with unlimited cardinality, basically you could identify relationships between data in different databases. Is that right? >>Within the same database, but different measurements, different tables, yeah. Right. So you could say, I wanna look at the way the noise levels perform in this room according to 400 different locations on 25 different days, over seven months of the year. And each one is a measurement. Each one adds to cardinality.
And you can say, I want to search on Tuesdays in December, what the noise level is at 2:21 PM, and you get a very quick response. That kind of instrumentation is critical to smarter systems. >>How are you able to process that data at a performance level that doesn't bring the database to its knees? What's the secret sauce behind that? >>It's a columnar database. It's built on Parquet and Apache Arrow. But it's enough to say, without a much longer conversation, it's an architecture that's really built for pulling that kind of data. If you know the data is time series and you're looking for a time measurement, you already have the ability to optimize pretty dramatically. >>So it's that purpose-built aspect of it. It's the >>Purpose-built aspect. You couldn't take Postgres and do the same >>Thing. Right? Because a lot of vendors say, oh yeah, we have time series now. Yeah. Right. So yeah. Yeah. Right. >>And they >>Do. Yeah. But >>It's not. The founding of the company came because Paul Dix was working on Wall Street building time series databases on HBase, on MySQL, on other platforms, and realized, every time we do it, we have to rewrite the code. We build a bunch of application logic to handle all these. We're talking about, we have customers that are adding hundreds of millions to billions of points a second. So you're talking about an ingest level, you know, you think about all those data points, you're talking about an ingest level that databases just aren't designed for. Right? And so it's not just us, our competitors also build good time series databases. And so the category is really emergent. Yeah, >>Sure. Talk about a favorite customer story that you think really articulates the value of what Influx is doing, especially with IOx. >>Yeah, sure.
And I love this story because, you know, Tesla may not be in favor because of the latest Elon Musk escapades, but we've had about a four-year relationship with Tesla where they built their Powerwall technology around recording that, seeing your device, seeing the stuff, seeing the charging on your car. It's all captured in Influx databases that are reporting from Powerwalls and mega Powerpacks all over the world. And they report to a central place at Tesla's headquarters and it reports out to your phone and so you can see it. And what's really cool about this to me is I've got two Tesla cars and I've got Tesla solar roof tiles. So I watch this data all the time. So it's a great customer story. And actually if you go on our website, you can see I did an hour interview with the engineer that designed the system, because the system is super impressive and I just think it's really cool. Plus, you know, it's all the good green stuff that we really appreciate, supporting sustainability, right? Yeah. >>Right, right. Talk about, from a what's-in-it-for-me-as-a-customer perspective, what you guys have done, the change to IOx. What are some of the key features of it and the key values in it for customers like Tesla, like other industry customers as well? >>Well, so it's relatively new. It just arrived in our cloud product. So Tesla's not using it today. We have a first set of customers starting to use it. It's in open source. So it's a very popular project in the open source world. But the key issues are really the stuff that we've kind of covered here, which is a broad SQL environment. So all those SQL developers, the same people who code against Snowflake's data warehouse or Databricks or Postgres, can now code against Influx, which opens up the BI market. It's the cardinality, it's the performance. It's really an architecture. It's the next gen.
We've been doing this for six years. It's the next generation of everything we've seen about how you make time series super performant. And that's only relevant because more and more things are becoming real time as we develop smarter and smarter systems. The journey is pretty clear. You instrument the system, you let it run, you watch for anomalies, you correct those anomalies, you re-instrument the system. You do that 4 billion times, you have a self-driving car. You do that 55 times, you have a better podcast that is handling its audio better, right? So everything is on that journey of getting smarter and smarter. So >>You guys are the big committers to IOx, right? Yes. And talk about how you support and develop the surrounding developer community, how you get that flywheel effect going. >>First, I mean, it's actually really kind of, let's call it, more art than science. Yeah. First of all, you come up with an architecture that really resonates for developers. And Paul Dix, our founder, really is a developer's developer. And so he started talking about this in the community, about an architecture that uses Apache Parquet, which is, you know, now becoming the standard for file formats, that uses Apache Arrow for directing queries and things like that, and uses DataFusion, and said what this thing needs is a columnar database that sits behind all of this stuff and integrates it. And he started talking about it two years ago and then he started publishing IOx commits in GitHub. And slowly, over time, on Hacker News and elsewhere, other people go, oh yeah, this is fundamentally right. >>It addresses the problems that people have with things like ClickHouse or plain databases or Coast and they go, okay, this is the right architecture at the right time.
Not different than original Influx, not different than what Elastic hit on, not different than what Confluent with Kafka hit on. And over time you build an audience of people who are committed to understanding this kind of stuff and they become committers and they become the core. Yeah. And you build out from it. And so, super. And so we chose to have an MIT open source license. Yeah. It's not some secondary license. Competitors can use it, and competitors can use it against us. Yeah. >>One of the things I know that InfluxData talks about is the time to awesome, which I love, but what does that mean? What is the time to awesome for a developer? >>It comes from that original story where Paul would have to write six months of application logic and stuff to build a time series based application. And so Paul's notion was, and this was based on the original Mongo, which was very successful because it was very easy to use relative to most databases. So Paul developed this commitment, this idea that I quickly joined on, which was, hey, it should be relatively quick for a developer to build something of import to solve a problem. It should be able to happen very quickly. So it's got a schemaless background so you don't have to know the schema beforehand. It does some things that make it really easy to feel powerful as a developer quickly. And if you think about that journey, if you feel powerful with a tool quickly, then you'll go deeper and deeper and deeper and pretty soon you're taking that tool with you wherever you go. It becomes the tool of choice as you go to that next job or you go to that next application. And so that's a fundamental way we think about it. To be honest with you, we haven't always delivered perfectly on that. It's generally in our DNA. So we do pretty well, but I always feel like we can do better. >>So if you were to put a bumper sticker on one of your Teslas about InfluxData, what would it >>Say?
By the way, I'm not rich. It just happened to be that we have two Teslas and we have for a while, we just committed to that. So ask the question again. Sorry. >>Bumper sticker on InfluxData. What would it say? How would I >>Understand it? It would be time to awesome. It would be that phrase, time to awesome. Right. >>Love that. >>Yeah, I'd love it. >>Excellent. Time to awesome. Evan, thank you so much for joining Dave and me on the program. >>It's really fun. Great thing. >>Evan, great to have you on, great to have you back talking about what you guys are doing and helping organizations like Tesla and others really transform their businesses, which is all about business transformation these days. We appreciate your insights. >>That's great. Thank you. >>For our guest and Dave Vellante, I'm Lisa Martin. You're watching theCUBE, the leader in emerging and enterprise tech coverage. We'll be right back with our next guest.
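
The cardinality point made in the interview above (400 locations, 25 days, seven months, where "each one adds to cardinality") can be sketched as simple multiplication. This is only an illustration under stated assumptions: the dimension names are hypothetical labels, the function is not part of any InfluxDB or IOx API, and the product is an upper bound that assumes every combination of values actually occurs.

```python
from math import prod


def series_cardinality(distinct_values_per_dimension: dict[str, int]) -> int:
    """Return the upper bound on distinct series: the product of the
    number of distinct values in each dimension."""
    return prod(distinct_values_per_dimension.values())


# Figures quoted in the interview; the dimension names are hypothetical.
noise_survey = {"location": 400, "day": 25, "month": 7}

total_series = series_cardinality(noise_survey)
print(total_series)  # 400 * 25 * 7 = 70000
```

Adding one more dimension multiplies the total again, which is why the speaker describes the set as expanding quickly enough to "choke a database off".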

Published Date : Nov 29 2022

ENTITIES

Entity	Category	Confidence
David	PERSON	0.99+
Lisa Martin	PERSON	0.99+
Evan Kaplan	PERSON	0.99+
six months	QUANTITY	0.99+
Evan	PERSON	0.99+
Tesla	ORGANIZATION	0.99+
Influx Data	ORGANIZATION	0.99+
Paul	PERSON	0.99+
55 times	QUANTITY	0.99+
two	QUANTITY	0.99+
2:21 PM	DATE	0.99+
Las Vegas	LOCATION	0.99+
Dave Ante	PERSON	0.99+
Paul Dicks	PERSON	0.99+
six years	QUANTITY	0.99+
last year	DATE	0.99+
hundreds of millions	QUANTITY	0.99+
Mongo Influx	ORGANIZATION	0.99+
4 billion times	QUANTITY	0.99+
Two	QUANTITY	0.99+
December	DATE	0.99+
Microsoft	ORGANIZATION	0.99+
Influxed	ORGANIZATION	0.99+
AWS	ORGANIZATION	0.99+
Hortonworks	ORGANIZATION	0.99+
Influx	ORGANIZATION	0.99+
IOx	TITLE	0.99+
MySQL	TITLE	0.99+
three	QUANTITY	0.99+
Tuesdays	DATE	0.99+
each one	QUANTITY	0.98+
400 different locations	QUANTITY	0.98+
25 different days	QUANTITY	0.98+
first set	QUANTITY	0.98+
an hour	QUANTITY	0.98+
First	QUANTITY	0.98+
six years ago	DATE	0.98+
The Cube	TITLE	0.98+
One	QUANTITY	0.98+
Neo Forge	ORGANIZATION	0.98+
second thing	QUANTITY	0.98+
Each one	QUANTITY	0.98+
Paul Ds	PERSON	0.97+
IOx	ORGANIZATION	0.97+
today	DATE	0.97+
Teslas	ORGANIZATION	0.97+
MIT	ORGANIZATION	0.96+
Postgres	ORGANIZATION	0.96+
over seven months	QUANTITY	0.96+
one	QUANTITY	0.96+
five	DATE	0.96+
Venetian Expo Center	LOCATION	0.95+
Big Data Lakes	ORGANIZATION	0.95+
Cloudera	ORGANIZATION	0.94+
Columbia	LOCATION	0.94+
InfluxData	ORGANIZATION	0.94+
Wall Street	LOCATION	0.93+
SQL	TITLE	0.92+
Elastic	TITLE	0.92+
Data Bricks	ORGANIZATION	0.92+
Hacker News	TITLE	0.92+
two years ago	DATE	0.91+
Oracle	ORGANIZATION	0.91+
AWS Reinvent 2022	EVENT	0.91+
Elon Musker	PERSON	0.9+
Snowflake	ORGANIZATION	0.9+
Reinvent	ORGANIZATION	0.89+
billions of points a second	QUANTITY	0.89+
four year	QUANTITY	0.88+
Chronograph	TITLE	0.88+
Confluent	TITLE	0.87+
Spark	TITLE	0.86+
Apache	ORGANIZATION	0.86+
Snowflake	TITLE	0.85+
Grafana	TITLE	0.85+
GitHub	ORGANIZATION	0.84+

Subbu Iyer

>> And it'll be the fastest 15 minutes of your day from there. >> In three- >> We go Lisa. >> Wait. >> Yes >> Wait, wait, wait. I'm sorry I didn't pin the right speed. >> Yap, no, no rush. >> There we go. >> The beauty of not being live. >> I think, in the background. >> Fantastic, you all ready to go there, Lisa? >> Yeah. >> We are speeding around the horn and we are coming to you in five, four, three, two. >> Hey everyone, welcome to theCUBE's coverage of AWS re:Invent 2022. Lisa Martin here with you with Subbu Iyer, one of our alumni, who's now the CEO of Aerospike. Subbu, great to have you on the program. Thank you for joining us. >> Great as always to be on theCUBE. Lisa, good to meet you. >> So, you know, every company these days has got to be a data company, whether it's a retailer, a manufacturer, a grocer, an automotive company. But for a lot of companies, data is underutilized yet a huge asset that is value added. Why do you think companies are struggling so much to make data a value added asset? >> Well, you know, we see this across the board. When I talk to customers and prospects there is a desire from the business and from IT, actually, to leverage data to really fuel newer applications, newer services, newer business lines if you will, for companies. I think the struggle is, one, the plethora of data that is created. Surveys say that over the next three years, by 2025, data is going to be, you know, around 175 zettabytes, right? A hundred and seventy-five zettabytes of data is going to be created. And that's really a growth of north of 30% year over year. But the more important and the interesting thing is the real time component of that data is actually growing at, you know, 35% CAGR. And what enterprises desire is decisions that are made in real time or near real time. And a lot of the challenges that do exist today is that either the infrastructure that enterprises have in place was never built to actually manipulate data in real time.
The second is really the ability to actually put something in place which can handle spikes yet be cost efficient to fuel. So you can build for really peak loads, but then it's very expensive to operate that particular service at normal loads. So how do you build something which actually works for you for both uses, so to speak? And the last point that we see out there is even if you're able to, you know, bring all that data, you don't have the processing capability to run through that data. So as a result, most enterprises struggle with capturing the data, making decisions from it in real time and really operating it at the cost point that they need to operate it at. >> You know, you bring up a great point with respect to real time data access. And I think one of the things that we've learned the last couple of years is that access to real time data is not a nice-to-have anymore. It's business critical for organizations in any industry. Talk about that as one of the challenges that organizations are facing. >> Yeah, when we started Aerospike, right? When the company started, it started with the premise that data is going to grow, number one, exponentially. Two, when applications open up to the internet there's going to be a flood of users and demands on those applications. And that was true primarily when we started the company in the ad tech vertical. So ad tech was the first vertical where there was a lot of data both on the supply side and the demand side from an inventory of ads that were available. And on the other hand, they had like microseconds or milliseconds in which they could make a decision on which ad to put in front of you and me so that we would click or engage with that particular ad. But over the last three to five years what we've seen is, as digitization has actually permeated every industry out there, the need to harness data in real time is pretty much present in every industry.
Whether that's retail, whether that's financial services, telecommunications, e-commerce, gaming and entertainment. Every industry has a desire. One, the innovative companies, the small companies rather, are innovating at a pace and standing up new businesses to compete with the larger companies in each of these verticals. And the larger companies don't want to be left behind. So they're standing up their own competing services or getting into new lines of business that really harness and are driven by real time data. So there are compelling pressures: one, you know, customer experience is paramount and we as customers expect answers in, you know, an instant, in real time. And on the other hand, the way they make decisions is based on a large data set because, you know, larger data sets actually propel better decisions. So there's competing pressures here which essentially drive the need, one from a business perspective, two from a customer perspective, to harness all of this data in real time. So that's what's driving an incessant need to actually make decisions in real or near real time. >> You know, I think one of the things that's been in short supply over the last couple of years is patience. We do expect as consumers, whether we're in our business lives or our personal lives, that we're going to be given information and data that's relevant, that's personal, to help us make those real time decisions. So having access to real time data is really business critical for organizations across any industry. Talk about some of the main capabilities that modern data applications and data platforms need to have. What are some of the key capabilities of a modern data platform that need to be delivered to meet demanding customer expectations? >> So, you know, going back to your initial question, Lisa, around why is data really a high value but underutilized or under-leveraged asset?
One of the reasons we see is a lot of the data platforms that, you know, some of these applications were built on have been around for a decade plus. And they were never built for the needs of today, which is really driving a lot of data and driving insight in real time from a lot of data. So there are four major capabilities that we see that are essential ingredients of any modern data platform. One is really the ability to, you know, operate at unlimited scale. So what we mean by that is really the ability to scale from gigabytes to even petabytes without any degradation in performance or latency or throughput. The second is really, you know, predictable performance. So can you actually deliver predictable performance as your data size grows or your throughput grows or your concurrent users on that application or service grow? It's really easy to build an application that operates at low scale or low throughput or low concurrency, but performance usually starts degrading as you start scaling one of these attributes. The third thing is the ability to operate an always-on, globally resilient application. And that requires a really robust data platform that can be up on a five-nines basis globally, can support global distribution because a lot of these applications have global users. And the last point goes back to my first answer, which is, can you operate all of this at a cost point which is not prohibitive but makes sense from a TCO perspective? 'Cause a lot of times what we see is people make choices of data platforms and, ironically, as their service or applications become more successful and more users join their journey, the revenue starts going up, the user base starts going up, but the cost basis starts crossing over the revenue and they're losing money on the service, ironically as the service becomes more popular. So really unlimited scale, predictable performance, always on, on a globally resilient basis, and low TCO.
These are the four essential capabilities of any modern data platform. >> So then talk to me, with those as the four main core functionalities of a modern data platform, how does Aerospike deliver that? >> So we were built, as I said, from day one to operate at unlimited scale and deliver predictable performance. And then over the years as we worked with customers we built this incredible high availability capability which helps us deliver the always-on, you know, operations. So we have customers who have been on the platform 10 years with no downtime, for example, right? So we are talking about an amazing continuum of high availability that we provide for customers who operate these, you know, globally resilient services. The key to our innovation here is what we call the hybrid memory architecture. So, you know, going a little bit technically deep here, essentially what we built out in our architecture is the ability on each node or each server to treat a bank of SSDs or solid-state devices as essentially extended memory. So you're getting memory performance but you're accessing these SSDs. You're not paying memory prices but you're getting memory performance. As a result of that you can attach a lot more data to each node or each server in a distributed cluster. And when you kind of scale that across basically a distributed cluster you can do with Aerospike the same things at 60 to 80% lower server count. And as a result 60 to 80% lower TCO compared to some of the other options that are available in the market. Then basically, as I said, that's the key kind of starting point to the innovation. We layer on capabilities like, you know, replication, change data notification, you know, synchronous and asynchronous replication. The ability to actually stretch a single cluster across multiple regions.
So for example, if you're operating a global service you can have a single Aerospike cluster with one node in San Francisco, one node in New York, another one in London, and this would be basically seamlessly operating. So that, you know, this is strongly consistent. Very few NoSQL data platforms are strongly consistent, or if they are strongly consistent they will actually suffer performance degradation. And what strongly consistent means is, you know, all your data is always available, it's guaranteed to be available, there is no data lost at any time. So in this configuration that I talked about, if the node in London goes down your application still continues to operate, right? Your users see no kind of downtime and, you know, when London comes up it rejoins the cluster and everything is back to kind of the way it was before, you know, London left the cluster, so to speak. So the ability to do this globally resilient, highly available kind of model is really, really powerful. A lot of our customers actually use that kind of a scenario and we offer other deployment scenarios from a higher availability perspective. So everything starts with HMA or Hybrid Memory Architecture and then we start building a lot of these other capabilities around the platform. And then over the years what our customers have guided us to do is, as they're putting together a modern kind of data infrastructure, we don't live in a silo. So Aerospike gets deployed with other technologies like streaming technologies or analytics technologies. So we built connectors into Kafka, Pulsar, so that as you're ingesting data from a variety of data sources you can ingest them at very high ingest speeds and store them persistently into Aerospike. Once the data is in Aerospike you can actually run Spark jobs across that data in a multi-threaded parallel fashion to get real insight from that data at really high throughput and high speed.
>> High throughput, high speed, incredibly important especially as today's landscape is increasingly distributed. Data centers, multiple public clouds, Edge, IoT devices, the workforce embracing more and more hybrid these days. How are you helping customers to extract more value from data while also lowering costs? Go into some customer examples 'cause I know you have some great ones. >> Yeah, you know, I think we have built an amazing set of customers and customers actually use us for some really mission critical applications. So, you know, before I get into specific customer examples let me talk to you about some of the use cases which we see out there. We see a lot of Aerospike being used in fraud detection. We see us being used in recommendations engines. We get used in customer data profiles, or customer profiles, Customer 360 stores, you know, multiplayer gaming and entertainment. These are kind of the repeated use cases. Digital payments: we power most of the digital payment systems across the globe. From a specific example perspective, the first one I would love to talk about is PayPal. So if you use PayPal today, then you know when you're actually paying somebody your transaction is, you know, being sent through Aerospike to really decide whether this is a fraudulent transaction or not. And when you do that, you know, you and I as customers are not going to wait around for 10 seconds for PayPal to say yay or nay. We expect, you know, the decision to be made in an instant. So we are powering that fraud detection engine at PayPal, for every transaction that goes through PayPal. Before us, you know, PayPal was missing out on about 2% of their SLAs, which was essentially millions of dollars which they were losing because, you know, they were letting transactions go through and taking the risk that it's not a fraudulent transaction.
With Aerospike they can now actually get a much better SLA and the data set on which they compute the fraud score has gone up by you know, several factors. So by 30X if you will. So not only has the data size that is powering the fraud engine actually gone up 30X with Aerospike but they're actually making decisions in an instant for, you know, 99.95% of their transactions. So that's- >> And that's what we expect as consumers, right? We want to know that there's fraud detection on the swipe regardless of who we're interacting with. >> Yes, and so that's a really powerful use case and you know, it's a great customer success story. The other one I would talk about is really Wayfair, right, from retail and you know from e-commerce. So everybody knows Wayfair global leader in really in online home furnishings and they use us to power their recommendations engine. And you know it's basically if you're purchasing this, people who bought this also bought these five other things, so on and so forth. They have actually seen their cart size at checkout go up by up to 30%, as a result of actually powering their recommendations engine through Aerospike. And they were able to do this by reducing the server count by 9X. So on one ninth of the servers that were there before Aerospike, they're now powering their recommendations engine and seeing cart size checkout go up by 30%. Really, really powerful in terms of the business outcome and what we are able to, you know, drive at Wayfair. >> Hugely powerful as a business outcome. And that's also what the consumer wants. The consumer is expecting these days to have a very personalized relevant experience that's going to show me if I bought this show me something else that's related to that. We have this expectation that needs to be really fueled by technology. >> Exactly, and you know, another great example you asked about you know, customer stories, Adobe. Who doesn't know Adobe, you know. 
They're on a mission to deliver the best customer experience that they can. And they're talking about, you know, great Customer 360 experience at scale and they're modernizing their entire edge compute infrastructure to support this with Aerospike. Going to Aerospike, basically what they have seen is their throughput go up by 70%, their cost has been reduced by 3X. So essentially doing it at one third of the cost while their annual data growth continues at, you know, about north of 30%. So not only is their data growing, they're able to actually reduce their cost to deliver this great customer experience to one third and continue to deliver great Customer 360 experience at scale. Really, really powerful example of how you deliver Customer 360 in a world which is dynamic and, you know, on a data set which is constantly growing at north of 30% in this case. >> Those are three great examples, PayPal, Wayfair, Adobe, talking about, especially with Wayfair when you talk about increasing their cart checkout sizes but also with Adobe increasing throughput by over 70%. I'm looking at my notes here. While data is growing at 32%, that's something that every organization has to contend with: data growth is continuing to scale and scale and scale. >> Yap, I'll give you a fun one here. So, you know, you may not have heard about this company. It's called Dream11 and it's a company based out of India, but it's a very, you know, it's a fun story because it's the world's largest fantasy sports platform. And you know, India is a nation which is cricket crazy. So you know, when they have their premier league going on, there's millions of users logged onto the Dream11 platform building their fantasy league teams and, you know, playing on that particular platform. It has a hundred million users, a hundred million plus users on the platform, 5.5 million concurrent users, and they have been growing at 30%.
So they are considered an amazing success story in terms of what they have accomplished and the way they have architected their platform to operate at scale. And all of that is really powered by Aerospike. Think about that: they're able to deliver all of this and support a hundred million users, 5.5 million concurrent users, all with, you know, 99 plus percent of their transactions completing in less than one millisecond. Just incredible success story. Not a brand that is, you know, world renowned but at least, you know, from what we see out there it's an amazing success story of operating at scale. >> Amazing success story, huge business outcomes. Last question for you as we're almost out of time: talk a little bit about Aerospike and AWS, the partnership, Graviton2, better together. What are you guys doing together there? >> Great partnership. AWS has multiple layers in terms of partnerships. So, you know, we engage with AWS at the executive level. They plan out, really, the rollout of new instances in partnership with us, making sure that, you know, those instance types work well for us. And then we just released support for Aerospike on the Graviton platform and we just announced a benchmark of Aerospike running on Graviton on AWS. And what we see out there is, with the benchmark, a 1.6X improvement in price performance. And, you know, about an 18% increase in throughput while maintaining a 27% reduction in cost, you know, on Graviton. So this is an amazing story from a price performance perspective, performance per watt for greater energy efficiencies, which basically a lot of our customers are starting to kind of talk to us about leveraging to further meet their sustainability targets. So great story from Aerospike and AWS, not just from a partnership perspective on a technology and an executive level, but also in terms of what joint outcomes we are able to deliver for our customers. >> And it sounds like a great sustainability story.
I wish we had more time so we could talk about this, but thank you so much for talking about the main capabilities of a modern data platform, what's needed, why, and how you guys are delivering that. We appreciate your insights and appreciate your time. >> Thank you very much. I mean, if folks are at re:Invent next week or this week, come on and see us at our booth. We are in the data analytics pavilion and you can find us pretty easily. Would love to talk to you. >> Perfect, we'll send them there. Subbu Iyer, thank you so much for joining me on the program today. We appreciate your insights. >> Thank you Lisa. >> I'm Lisa Martin, you're watching theCUBE's coverage of AWS re:Invent 2022. Thanks for watching.

Published Date : Nov 25 2022

SUMMARY :

the fastest 15 minutes I'm sorry I didn't pin the right speed. and we are coming to you in Subbu, great to have you on the program. Great as always to be on So, you know, every company these days And a lot of the challenges that access to real time data to put in front of you and I and data platforms need to have. One of the reasons we see is So the ability to do How are you helping customers let me talk to you about fraud detection on the swipe and you know, it's a great We have this expectation that needs to be Exactly, and you know, with Wayfair when you talk So you know, when they have What are you guys doing together there? And you know about 18% and how you guys are delivering that. and you can find us pretty easily. for joining me on the program today. of AWS re:Invent 2022.

ENTITIES

Entity	Category	Confidence
AWS	ORGANIZATION	0.99+
Lisa Martin	PERSON	0.99+
60	QUANTITY	0.99+
London	LOCATION	0.99+
Lisa	PERSON	0.99+
PayPal	ORGANIZATION	0.99+
New York	LOCATION	0.99+
15 minutes	QUANTITY	0.99+
3X	QUANTITY	0.99+
2025	DATE	0.99+
Wayfair	ORGANIZATION	0.99+
35%	QUANTITY	0.99+
Adobe	ORGANIZATION	0.99+
30%	QUANTITY	0.99+
99.95%	QUANTITY	0.99+
10 seconds	QUANTITY	0.99+
San Francisco	LOCATION	0.99+
30X	QUANTITY	0.99+
70%	QUANTITY	0.99+
32%	QUANTITY	0.99+
27%	QUANTITY	0.99+
1.6X	QUANTITY	0.99+
each server	QUANTITY	0.99+
two	QUANTITY	0.99+
one	QUANTITY	0.99+
One	QUANTITY	0.99+
Aerospike	ORGANIZATION	0.99+
millions of dollars	QUANTITY	0.99+
India	LOCATION	0.99+
Subbu	PERSON	0.99+
9X	QUANTITY	0.99+
five	QUANTITY	0.99+
99 plus percent	QUANTITY	0.99+
first answer	QUANTITY	0.99+
third thing	QUANTITY	0.99+
less than one millisecond	QUANTITY	0.99+
10 years	QUANTITY	0.99+
this week	DATE	0.99+
Subbu Iyer	PERSON	0.99+
one third	QUANTITY	0.99+
millions of users	QUANTITY	0.99+
over 70%	QUANTITY	0.98+
both users	QUANTITY	0.98+
Dream11	ORGANIZATION	0.98+
80%	QUANTITY	0.98+
today	DATE	0.98+
Graviton	TITLE	0.98+
each node	QUANTITY	0.98+
second	QUANTITY	0.98+
both	QUANTITY	0.98+
three	QUANTITY	0.98+
four	QUANTITY	0.98+
Two	QUANTITY	0.98+
one node	QUANTITY	0.98+
hundred million users	QUANTITY	0.98+
first vertical	QUANTITY	0.97+
about 2%	QUANTITY	0.97+
Aerospike	TITLE	0.97+
single cluster	QUANTITY	0.96+

Ali Ghodsi, Databricks | Cube Conversation Partner Exclusive

(outro music) >> Hey, I'm John Furrier, here with an exclusive interview with Ali Ghodsi, who's the CEO of Databricks. Ali, great to see you. Preview for re:Invent. We're going to launch this story, exclusive Databricks material, prior to the keynotes and after the keynotes at re:Invent. So great to see you. You know, you've been a partner of AWS for a very, very long time. I think five years ago, I think I first interviewed you, you were one of the first to publicly declare that this was a place to build a company on and not just host an application, but refactor capabilities to create, essentially, a platform in the cloud, on the cloud. Not just an ISV (Independent Software Vendor), kind of an old term. We're talking about real platform-like capability to change the game. Can you talk about your experience as an AWS partner? >> Yeah, look, so we started in 2013. I swiped my personal credit card on AWS and some of my co-founders did the same. And we started building. And we were excited because we just thought this is a much better way to launch a company, because you can get much faster time to market and launch your thing, and you can get the end users much quicker access to the thing you're building. So we didn't really talk to anyone at AWS, we just swiped a credit card. And eventually they told us, "Hey, do you want to buy extra support?" "You're asking a lot of advanced questions from us." "Maybe you want to buy our advanced support." And we said, no, no, no, no. We're very advanced ourselves, we know what we're doing. We're not going to buy any advanced support. So, you know, we just built this, you know, startup from nothing on AWS without even talking to anyone there. So at some point, I think around 2017, they suddenly saw this company with maybe a hundred million ARR pop up on their radar, and it's driving massive amounts of compute, massive amounts of data.
And it took a little bit in the beginning just for us to get to know each other because, as I said, it's like we were not on their radar and we weren't really looking, we were just doing our thing. And then over the years the partnership has deepened and deepened and deepened, and then with, you know, Andy (indistinct) really leaning into the partnership, he mentioned us at re:Invent. And then we sort of figured out a way to really integrate the two services, the Databricks platform with AWS. And today it's an amazing partnership. You know, we're directly connected with the general managers for the services. We're connected at the CEO level, you know, the sellers get compensated for pushing Databricks, we have multiple offerings on their marketplace. We have a native offering on AWS. You know, we're prominently always sort of marketed and, you know, we're aligned also vision wise in what we're trying to do. So yeah, we've come a very, very long way. >> Do you consider yourself a SaaS app or an ISV, or do you see yourself more as a platform company because you have customers? How would you categorize your category as a company? >> Well, it's a data platform, right? And actually the strategy of Databricks is to take what's otherwise five, six services in the industry or five, six different startups, but do them as part of one data platform that's integrated. So in one word, the strategy of Databricks is "unification." We call it the data lake house. But really the idea behind the data lake house is that of unification, or in more words it's, "The whole is greater than the sum of its parts." So you could actually go and buy five, six services out there or actually use five, six services from the cloud vendors, stitch it together and it kind of resembles Databricks. Our power is in doing those integrated, together, in a way in which it's really, really easy and simple to use for end users. So yeah, we're a data platform.
I wouldn't, you know, ISV, that's an old term, you know, Independent Software Vendor. You know, I think, you know, we have actually a whole slew of ISVs on top of Databricks that integrate with our platform. And you know, in our marketplace as well as in our Partner Connect, we host those ISVs that then, you know, work on top of the data that we have in the Databricks data lake house. >> You know, I think your journey has been great to document and watch from the beginning. I got to give you guys credit over there and props, congratulations. But I think you're the poster child as a company for what we see enterprises doing now. So go back in time when you guys swiped a credit card, you didn't need attending technical support because you guys had brains, you were refactoring, rethinking. It wasn't just banging out software, you were doing some complex things. It wasn't like it was just write some software hosted on a server. It was really a lot more. And as a result your business is worth billions of dollars. I think 38 billion or something like that, big numbers, big numbers of great revenue growth as well, billions in revenue. You have customers, you have an ecosystem, you have data applications on top of Databricks. So in a way you're a cloud on top of the cloud. So is there a cloud on top of the cloud? So you have ISVs, Amazon has ISVs. Can you take us through what this means at this point in history, because this seems to be an advanced version of the benefits of platforming and refactoring, leveraging, say, AWS. >> Yeah, so look, when we started, there was really only one game in town. It was AWS. So it was one cloud. And the strategy of the company then was, well, Amazon had this beautiful set of services that they're building bottom up, they have storage, compute, networking, and then they have databases and so on. But it's a lot of services. So let us not directly compete with AWS and try to take out one of their services.
Let's not do that because frankly we can't. We were not of that size. They had the scale, they had the size and they were the only cloud vendor in town. So our strategy instead was, let's do something else. Let's not compete directly with, say, a particular service they're building, let's take a different strategy. What if we had a unified holistic data platform, where it's just one integrated service end to end. So think of it as Microsoft Office, which contains PowerPoint, and Word, and Excel and even Access, if you want to use it. What if we build that? And AWS has this really amazing knack for releasing things, you know, services, lots of them, every re:Invent. And they're sort of a DevOps person's dream, and you can stitch these together, and you know, you have to be technical. How do we elevate that and make it simpler and integrate it? That was our original strategy and it resonated with a segment of the market. And the reason it worked with AWS, so that we wouldn't butt heads with AWS, was because we weren't a direct replacement for this service or for that service, we were taking a different approach. And AWS, because credit goes to them, they're so customer obsessed, they would actually do what's right for the customer. So if the customer said we want this unified thing, their sellers would actually say, okay, so then you should use Databricks. So they truly are customer obsessed in that way. And I really mean it, John. Things have changed over the years. They're not the only cloud anymore. You know, Azure is real, GCP is real, there's also Alibaba. And now over 70% of our customers are on more than one cloud. So now what we hear from them is, not only do we want a simplified, unified thing, but we want it also to work across the clouds. Because those of them that are seriously considering multiple clouds, they don't want to use a service on cloud one and then use a similar service on cloud two. But it's a little bit different.
And now they have to do twice the work to make it work. You know, John, it's hard enough as it is, like it's this data stuff and analytics. It's not a walk in the park, you know. It's not that you hire an administrator in the back office that clicks a button and it's just, now you're a data-driven, digitally transformed company. It's hard. If you now have to do it again on the second cloud with a different set of services and then again on a third cloud with a different set of services, that's very, very costly. So the strategy then has changed to, how do we take that unified simple approach and make it also the same and standardized across the clouds, but then also integrate it as far down as we can on each of the clouds, so that you're not giving up any of the benefits that the particular cloud has. >> Yeah, I think one of the things that we see, and I want to get your reaction to this, is this rise of the super cloud, as we call it. I think you were involved in the Sky paper that I saw. Your position paper came out after we had introduced Super Cloud, which is great. Congratulations to the Berkeley team, wearing the hat here. But you guys are, I think, a driver of this because you're creating the need for these things. You're saying, okay, we went on one cloud with AWS and you didn't hide that. And now you're publicly saying there's other clouds too, increased TAM for your business. And customers have multiple clouds in their infrastructure for the best of breed that they have. Okay, get that. But there's still a challenge around the innovation, growth that's still around the corner. We still have a supply chain problem, we still have skill gaps. You know, you guys are unique at Databricks, as are other big examples of super clouds that are developing. Enterprises don't have the Databricks kind of talent. They need turnkey solutions. So Adam and the team at Amazon are promoting, you know, more solution oriented approaches higher up on the stack.
You're starting to see kind of like, I won't say templates, but you know, almost like application specific, headless, low code, no code capability to accelerate clients who are wanting to write code for the modern era. Right, so this kind of, and then now you, as you guys pointed out with these common services, you're pushing the envelope. So you're saying, hey, I need to compete, I don't want to go to my customers and have them have a staff for this cloud and this cloud and this cloud, because they don't have the staff. Or if they do, they're very unique. So what's your reaction? Because this kind of shows your leadership as a partner of AWS and the clouds, but also highlights, I think, what's coming. But you share your reaction. >> Yeah, look, first of all, you know, I wish I could take credit for this but I can't, because it's really the customers that have decided to go on multiple clouds. You know, it's not Databricks that, you know, pushed this, or some other vendor, you know, Snowflake or someone, who pushed this and now enterprises listened to us and they picked two clouds. That's not how it happened. The enterprises picked two clouds or three clouds themselves, and we can get into why, but they did that. So this largely just happened in the market. We as data platforms responded to what they're then saying, which is they're saying, "I don't want to redo this again on the other cloud." So I think the writing is on the wall. I think it's super obvious what's going to happen next. They will say, "Any service I'm using, it better work exactly the same on all the clouds." You know, that's what's going to happen. So in the next five years, every enterprise will say, "I'm going to use the service, but you better make sure that this service works equally well on all of the clouds." And obviously the multicloud vendors like us are there to do that.
But I actually think that what you're going to see happening is that you're going to see the cloud vendors changing the existing services that they have to make them work on the other clouds. That's what's going to happen, I think. >> Yeah, and I think I would add that, first of all, I agree with you. I think that's going to be a forcing function. Because I think you're driving it. You guys are, in a way, just one actor driving this because you're on the front end of this, and there are others and there will be people following. But I think to me, I'm a cloud vendor, I got to differentiate. If I'm Adam Selipsky, I got to say, "Hey, I got to differentiate." So I don't want to get stuck in the middle, so to speak. Am I just going to innovate on the hardware, AKA infrastructure, or am I going to innovate at the higher level services? So what we're talking about here is the tale of two clouds within Amazon, for instance. So do I innovate on the silicon and get low level into the physics and squeeze performance out of the hardware and infrastructure? Or do I focus on ease of use at the top of the stack for the developers? So again, there's a tale of two clouds here. So I got to ask you, how do they differentiate? Number one, and number two, I never heard a developer ever say, "I want to run my app or workload on the slower cloud." So I mean, you know, back when we had PCs you wanted to go, "I want the fastest processor." So again, you can have common level services, but where is that performance differentiation with the cloud? What do the clouds do in your opinion? >> Yeah, look, I think it's pretty clear. I think that this is, you know, no surprise. Probably 70% or so of the revenue is in the lower infrastructure layers, compute, storage, networking. And they have to win that. They have to be competitive there. As you said, you can say, oh you know, I guess my CPUs are slower than the other cloud, but who cares?
I have amazing other services which only work on my cloud by the way, right? That's not going to be a winning recipe. So I think all three are laser focused on, we're going to have specialized hardware and the nuts and bolts of the infrastructure, we can do it better than the other clouds for sure. And you can see lots of innovation happening there, right? The Graviton chips, you know, we see huge price performance benefits in those chips. I mean it's real, right? It's basically a 20, 30% free lunch. You know, why wouldn't you go for it there? There's no downside. You know, there's no "gotcha" or no catch. But we see Azure doing the same thing now, they're also building their own chips, and we know that Google builds specialized machine learning chips, TPUs, Tensor Processing Units. So they're laser focused on that. I don't think they can give that up or focus on higher levels instead if they had to pick bets. And I think actually in the next few years, most of us have to be more deliberate and calculated in the picks we do. I think in the last five years, most of us have said, "We'll do all of it." You know. >> Well you made a good bet with Spark, you know, Hadoop was a pretty obvious trend, everyone jumped on that bandwagon, and you guys made a big bet on Spark. Look what happened with you guys. So again, I love this betting kind of concept because as the world matures, growth slows down and shifts, and with that next wave of value coming in, AKA customers, they're going to integrate with a new ecosystem. A new kind of partner network for AWS and the other clouds. But with AWS they're going to need to nurture the next Databricks. They're going to need to still provide that SaaS, ISV like experience for, you know, a basic software hosting or some application.
But I've got to get your thoughts on this idea of multiple clouds because if I'm a developer, the old days was, old days, within our decade, full stack developer- >> It was two years ago, yeah (John laughing) >> This is a decade ago, full stack, and then the cloud came in, you kind of had the half stack and then you would do some things. It seems like the clouds are trying to say, we want to be the full stack or not. Or is it still going to be, you know, I'm an application like a PC and a Mac, I'm going to write the same application for both hardware. I mean what's your take on this? Are they trying to do full stack and you see them more like- >> Absolutely. I mean look, of course they're going, they have, I mean they have over 300, I think Amazon has over 300 services, right? That's not just compute, storage, networking, it's the whole stack, right? But my key point is, I think they have to nail the core infrastructure, storage, compute, networking, because the three clouds that are there competing, they're formidable companies with formidable balance sheets and it doesn't look like any of them is going to throw in the towel and say, we give up. So I think it's going to intensify. And given that they have 70% of revenue on that infrastructure layer, I think, if they have to pick their bets, I think they'll focus it on that infrastructure layer. I think the layer above, where they're also placing bets, they're doing that, the full stack, right? But there I think the demand will be, can you make that work on the other clouds? And therein lies an innovator's dilemma, because if I make it work on the other clouds, then I'm foregoing that 70% revenue of the infrastructure. I'm not getting it. The other cloud vendor is going to get it. So should I do that or not? Second, is the other cloud vendor going to be welcoming of me making my service work on their cloud if I am a competing cloud, right? And what kind of terms of service are they giving me?
And am I going to really invest in doing that? And I think right now, you know, the vast, vast, vast majority of the services only work on the one cloud that, you know, it's built on. It doesn't work on others, but this will shift. >> Yeah, I think the innovator's dilemma is also a very good point. And I'd also add, it's an integrator's dilemma too, because now you talk about integration across services. So I believe that the super cloud movement's going to happen before Sky. And what I mean by that is, what you guys did and what other companies are doing by representing advanced, I call it platform engineering, refactoring an existing market really fast, time to value and CAPEX is, I mean capital, market cap is going to be really fast. I think there's going to be an opportunity for those to emerge that's going to set the table for global multicloud ultimately in the future. So I think you're going to start to see the same pattern of what you guys did: get in, leverage the hell out of it, use it, not in the way just to host, but to refactor and take down territory of markets. So number one, and then ultimately you get into, okay, I want to run some SLA across services, then there's a little bit more complication. I think that's where you guys put that beautiful paper out on Sky Computing. Okay, that makes sense. Now if you go to today's market, okay, I'm betting on Amazon because they're the best, this is the best cloud win scenario, not the most robust cloud. So if I'm a developer, I want the best. How do you look at their bet when it comes to data? Because now they've got machine learning, Swami's got a big keynote on Wednesday, I'm expecting to see a lot of AI and machine learning. I'm expecting to hear an end to end data story. This is what you do, so as a major partner, how do you view the moves Amazon's making and the bets they're making with data and machine learning and AI? >> First I want to lift off my hat to AWS for being customer obsessed.
So I know that if a customer wants Databricks, I know that AWS and their sellers will actually help us get that customer to deploy Databricks. Now which of the services is the customer going to pick? Are they going to pick ours or the end to end, what Swami is going to present on stage? Right? So that's the question we're getting. But I wanted to start by just saying, they're customer obsessed. So I think they're going to do the right thing for the customer and I see the evidence of it again and again and again. So kudos to them. They're amazing at this actually. Ultimately our bet is, customers want this to be simple, integrated, okay? So yes, there are hundreds of services that together give you the end to end experience, and they're very customizable, that AWS gives you. But if you want just something simply integrated that also works across the clouds, then I think there's a special place for Databricks. And I think the lake house approach that we have, which is a completely integrated one, we integrate data lakes with data warehouses, integrate workflows with machine learning, with real time processing, all these in one platform. I think there's going to be tailwinds, because I think the most important thing that's going to happen in the next few years is that every customer is going to now be obsessed, given the recession and the environment we're in, with: How do I cut my costs? How do I cut my costs? And we learn this from the customers, they're adopting the lake house because they're thinking, instead of using five vendors or three vendors, I can simplify it down to one with you and I can cut my cost. So I think that's going to be one of the main drivers of why people bet on the lake house, because it helps them lower their TCO (Total Cost of Ownership). And it's as simple as that. Like I have three things right now. If I can get the same job done of those three with one, I'd rather do that.
And by the way, if it's three or four across two clouds and I can just use one and it just works across two clouds, I'm going to do that. Because my boss is telling me I need to cut my budget. >> (indistinct) (John laughing) >> Yeah, and I'd rather not do layoffs, and they're asking me to do more. How can I get smaller budgets, not lay people off and do more? I have to cut, I have to optimize. What's happened in the last five, six years is there's been a huge sprawl of services and startups, you know, you know most of them, all these startups, all of them, all the activity, all the VC investments. Well, those companies sold their software, right? Even if a startup didn't make it big, you know, they still sold their software to some vendors. So the ecosystem is now full of lots and lots and lots and lots of different software. And right now people are looking, how do I consolidate, how do I simplify, how do I cut my costs? >> And you guys have a great solution. You're also an arms dealer and an innovator. So I have to ask this question, because you're a professor of the industry as well as at Berkeley, you've seen a lot of the historical innovations. If you look at the moment we're in right now with the recession, okay, we had COVID, okay, it changed how people work, you know, people working at home, provisioning VLAN, all that (indistinct) infrastructure, okay, yeah, technology and cloud health. But we're in a recession. This is the first recession where Amazon and the other clouds, mainly Amazon Web Services, are a major economic piece in the puzzle. So they were never around before, even 2008, they were too small. They're now a major economic enabler, player, they're serving startups, enterprises, they have super clouds like you guys. They're a force, and their customers are cutting back but they can also get faster. So agility is now part of the equation in the economic recovery. And I want to get your thoughts because you just brought that up.
Customers can actually use the cloud and Databricks to actually get out of the recovery because no one's going to say, stop making profit or make more profit. So yeah, cut costs, be more efficient, but agility's also like, let's drive more revenue. So in this digital transformation, if you take this to conclusion, every company transforms, their company is the app. So their revenue is tied directly to their technology deployment. What's your reaction and comment to that, because this is a new historical moment where cloud and scale and data actually could be configured in a way to actually change the nature of a business in such a short time. And with the recession looming, no one's got time to wait. >> Yeah, absolutely. Look, the secular tailwind in the market is that of, you know, 10 years ago it was software is eating the world, now it's AI's going to eat all of software. So more and more we're going to have, wherever you have software, which is everywhere now because it's eaten the world, it's going to be eaten up by AI and data. You know, AI doesn't exist without data so they're synonymous. You can't do machine learning if you don't have data. So yeah, you're going to see that everywhere, and that automation will help people simplify things and cut down the costs and automate more things. And in the cloud you can also do that by changing your CAPEX to OPEX. So instead of, I invest, you know, 10 million into a data center that I buy, I'm going to have headcount to manage the software. Why don't we change this to OPEX? And then they are going to optimize it. They want to lower the TCO because, okay, it's in the cloud, but I do want the costs to be much lower than what they were in the previous years. Last five years, nobody cared. Who cares? You know what it costs. You know, there's a brave new world out there. Now it's like, no, it has to be efficient. So I think they're going to optimize it.
And I think this lake house approach, which is an integration of the lakes and the warehouse, allows you to rationalize the two and simplify them. It allows you to basically rationalize away the data warehouse. So I think much faster we're going to see the, why do I need the data warehouse? If I can get the same thing done with the lake house for a fraction of the cost, that's what's going to happen. I think there's going to be focus on that simplification. But I agree with you. Ultimately everyone knows, everybody's a software company. Every company out there is a software company, and in the next 10 years, all of them are also going to be AI companies. So that is going to continue. >> (indistinct), dev's going to stop. And right sizing right now is a key economic forcing function. Final question for you, and I really appreciate you taking the time. This year at re:Invent, what's the bumper sticker in your mind around what's the most important industry dynamic, power dynamic, ecosystem dynamic that people should pay attention to as we move from the brave new world of, okay, I see cloud, cloud operations, I need to really make it structurally change my business. How do I, what's the most important story? What's the bumper sticker in your mind for re:Invent? >> Bumper sticker? Lake house 24. (John laughing) >> That's data (indistinct) bumper sticker. What's the- >> (indistinct) in the market. No, no, no, no. You know, AWS talks about, you know, all of their services becoming a lake house because they want the center of gravity to be S3, their lake. And they want all the services to directly work on that, so that's a lake house. We see Microsoft with Synapse, you know, the modern intelligent data platform. Same thing there. We're going to see the same thing, we're already seeing it on GCP with Big Lake and so on. So I actually think it's the, how do I reduce my costs, and the lake house integrates those two.
So that's one of the main ways you can rationalize and simplify. You get it in the lake house, where the name itself is a (indistinct) of two things, right? Lake house, "lake" gives you the AI, "house" gives you the database, data warehouse. So you get your AI and you get your data warehousing in one place at the lower cost. So for me, the bumper sticker is lake house, you know, 24. >> All right. Awesome, Ali, well thanks for the exclusive interview. Appreciate it and great to see you. Congratulations on your success and I know you guys are going to be fine. >> Awesome. Thank you John. It's always a pleasure. >> Always great to chat with you again. >> Likewise. >> You guys are a great team. We're big fans of what you guys have done. We think you're an example of what we call "super cloud," which is getting the hype up, and again your paper speaks to some of the innovation, which I agree with by the way. I think that approach of not forcing standards is really smart. And I think that's absolutely correct, that having the market still innovate is going to be key. Standards with- >> Yeah, I love it. We're big fans too, you know, you're doing awesome work. We'd love to continue the partnership. >> So, great, great Ali, thanks. >> Take care (outro music)

Published Date : Nov 23 2022

SUMMARY :

after the keynotes prior to the keynotes and you know, we're because you have customers. I wouldn't, you know, I got to give you guys credit over there So if the customer said we So Adam and the team at So in the next five years, But I think to me, I'm a cloud vendor, and calculated in the picks we do. But I go to get your thoughts on this idea Or is it still going to be, you know, And I think right now we, you know, So I believe that the super cloud I can simplify it down to one with you and startups, you know, and the other cloud, And in the cloud you can also do that I need to really make it lake house 24. That's data (indistinct) of the gravity to be S3, and I know you guys are going to be fine. It's always a pleasure. We're big fans of what you guys have done. We're big fans too, you know,

ENTITIES

Entity	Category	Confidence
Amazon	ORGANIZATION	0.99+
John	PERSON	0.99+
Ali Ghodsi	PERSON	0.99+
Adam	PERSON	0.99+
AWS	ORGANIZATION	0.99+
2013	DATE	0.99+
Google	ORGANIZATION	0.99+
Alibaba	ORGANIZATION	0.99+
2008	DATE	0.99+
five vendors	QUANTITY	0.99+
Adam Selipsky	PERSON	0.99+
five	QUANTITY	0.99+
John Furrier	PERSON	0.99+
Ali	PERSON	0.99+
Databricks	ORGANIZATION	0.99+
three vendors	QUANTITY	0.99+
70%	QUANTITY	0.99+
Wednesday	DATE	0.99+
Excel	TITLE	0.99+
38 billion	QUANTITY	0.99+
four	QUANTITY	0.99+
Amazon Web Services	ORGANIZATION	0.99+
Word	TITLE	0.99+
three	QUANTITY	0.99+
two clouds	QUANTITY	0.99+
Andy	PERSON	0.99+
three clouds	QUANTITY	0.99+
10 million	QUANTITY	0.99+
PowerPoint	TITLE	0.99+
one	QUANTITY	0.99+
two	QUANTITY	0.99+
twice	QUANTITY	0.99+
Second	QUANTITY	0.99+
over 300 services	QUANTITY	0.99+
one game	QUANTITY	0.99+
second cloud	QUANTITY	0.99+
Snowflake	ORGANIZATION	0.99+
Sky	ORGANIZATION	0.99+
one word	QUANTITY	0.99+
OPEX	ORGANIZATION	0.99+
two things	QUANTITY	0.98+
two years ago	DATE	0.98+
Access	TITLE	0.98+
over 300	QUANTITY	0.98+
six years	QUANTITY	0.98+
over 70%	QUANTITY	0.98+
five years ago	DATE	0.98+

Ali Ghodsi, Databricks | AWS Partner Exclusive

Published Date : Nov 23 2022

Justin Emerson, Pure Storage | SuperComputing 22

(soft music) >> Hello, fellow hardware nerds and welcome back to Dallas, Texas where we're reporting live from Supercomputing 2022. My name is Savannah Peterson, joined by John Furrier on my left. >> Looking good today. >> Thank you, John, so are you. It's been a great show so far. >> We've had more hosts, more guests coming than ever before. >> I know. >> Amazing, super- >> We've got a whole thing going on. >> It's been a super computing performance. >> It, wow. And, we'll see how many times we can say super on this segment. Speaking of super things, I am in a very unique position right now. I am flanked on both sides by people who have been doing content on theCUBE for 12 years. Yes, you heard me right, our next guest was on theCUBE 12 years ago, the third event, was that right, John? >> Man: First ever VMworld. >> Yeah, the first ever VMworld, third event theCUBE ever did. We are about to have a lot of fun. Please join me in welcoming Justin Emerson of Pure Storage. Justin, welcome back. >> It's a pleasure to be here. It's been too long, you never call, you don't write. (Savannah laughs) >> Great to see you. >> Yeah, likewise. >> How fun is this? Has the set evolved? Is everything looking good? >> I mean, I can barely remember what happened last week, so. (everyone laughs) >> Well, I remember a lot's changed since that VMworld. You know, Paul Maritz was the CEO if you remember at that time. His actual vision actually happened but not the way, for VMware, but the industry, the cloud, he called the software mainframe. We were kind of riffing- >> It was quite the decade. >> Unbelievable where we are now, how we got here, but not where we're going to be. And you're with Pure Storage now which we've been, as you know, covering as well. Where's the connection into the supercomputing? Obviously storage performance, big part of this show. >> Right, right. >> What's the take? >> Well, I think, first of all it's great to be back at events in person. 
We were talking before we went on, and it's been so great to be back at live events now. It's been such a drought over the last several years, but yeah, yeah. So I'm very glad that we're doing in person events again. For Pure, this is an incredibly important show. You know, the product that I work with, with FlashBlade is you know, one of our key areas is specifically in this high performance computing, AI machine learning kind of space. And so we're really glad to be here. We've met a lot of customers, met a lot of other folks, had a lot of really great conversations. So it's been a really great show for me. And also just seeing all the really amazing stuff that's around here, I mean, if you want to find, you know, see what all the most cutting edge data center stuff that's going to be coming down the pipe, this is the place to do it. >> So one of the big themes of the show for us and probably, well, big theme of your life, is balancing power efficiency. You have a product in this category, DirectFlash. Can you tell us a little bit more about that? >> Yeah, so Pure as a storage company, right, what do we do differently from everybody else? And if I had to pick one thing, right, I would talk about, it's, you know, as the name implies, we're an all, we're purely flash, we're an all flash company. We've always been, don't plan to be anything else. And part of that innovation with DirectFlash is the idea of rather than treating a solid state disk as like a hard drive, right? Treat it as it actually is, treat it like who it really is and that's a very different kind of thing. And so DirectFlash is all about bringing native Flash interfaces to our product portfolio. And what's really exciting for me as a FlashBlade person, is now that's also part of our FlashBlade S portfolio, which just launched in June. And so the benefits of that are myriad. 
But, you know, talking about efficiency, the biggest difference is that, you know, we can use like 90% less DRAM in our drives, which you know, everything uses, everything that you put in a drive uses power, it adds cost and all those things and so that really gives us an efficiency edge over everybody else and at a show like this, where, I mean, you walk the aisles and there's people doing liquid cooling and so much immersion stuff, and the reason they're doing that is because power is just increasing everywhere, right? So if you can figure out how do we use less power in some areas means you can shift that budget to other places. So if you can talk to a customer and say, well, if I could shrink your power budget for storage by two thirds or even, save you two-thirds of power, how many more accelerators, how many more CPUs, how much more work could you actually get done? So really exciting. >> I mean, less power consumption, more power and compute. >> Right. >> Kind of power center. So talk about the AI implications, where the use cases are. What are you seeing here? A lot of simulations, a lot of students, again, dorm room to the boardroom we've been saying here on theCUBE this is a great broad area, where's the action in the ML and the AI for you guys? >> So I think, not necessarily storage related but I think that right now there's this enormous explosion of custom silicon around AI machine learning which I as a, you said welcome hardware nerds at the beginning and I was like, ah, my people. >> We're all here, we're all here in Dallas. >> So wonderful. You know, as a hardware nerd we're talking about conferences, right? Who has ever attended Hot Chips and there's so much really amazing engineering work going on in the silicon space. It's probably the most exciting time for, CPU and accelerator, just innovation in, since the days before x86 was the de facto standard, right? And you could go out and buy a different workstation with 16 different ISAs. 
That's really the most exciting thing, I walked past so many different places where you know, our booth is right next to Habana Labs with their Gaudi accelerator, and they're doing this cute thing with one of the AI image generators in their booth, which is really cute. >> Woman: We're going to have to go check that out. >> Yeah, but that to me is like one of the more exciting things around like innovation at a, especially at a show like this where it's all about how do we move forward, the state of the art. >> What's different now than just a few years ago in terms of what's opening up the creativity for people to look at things that they could do with some of the scale that's different now. >> Yeah well, I mean, every time the state of the art moves forward what it means is, is that the entry level gets better, right? So if the high end is going faster, that means that the mid-range is going faster, and that means the entry level is going faster. So every time it pushes the boundary forward, it's a rising tide that floats all boats. And so now, the kind of stuff that's possible to do, if you're a student in a dorm room or if you're an enterprise, the world of the possible just keeps expanding dramatically and expanding almost, you know, geometrically like the amount of data that we are, that we have, as a storage guy, I was coming back to data but the amount of data that we have and the amount of compute that we have, and it's not just about the raw compute, but also the advances in all sorts of other things in terms of algorithms and transfer learning and all these other things. There's so much amazing work going on in this area and it's just kind of this Cambrian explosion of innovation in the area. >> I love that you touched on the user experience for the community, no matter the level that you're at. >> Yeah. >> And I, it's been something that's come up a lot here. 
Everyone wants to do more faster, always, but it's not just that, it's about making the experience and the point of entry into this industry more approachable and digestible for folks who may not be familiar, I mean we have every end of the ecosystem here, on the show floor, where does Pure Storage sit in the whole game? >> Right, so as a storage company, right? What AI is all about is deriving insights from data, right? And so everyone remembers that magazine cover data's the new oil, right? And it's kind of like, okay, so what do you do with it? Well, how do you derive value from all of that data? And AI machine learning and all of this supercomputing stuff is about how do we take all this data? How do we innovate with it? And so if you want data to innovate with, you need storage. And so, you know, our philosophy is that how do we make the best storage platforms that we can using the best technology for our customers that enable them to do really amazing things with AI machine learning and we've got different products, but, you know at the show here, what we're specifically showing off is our new FlashBlade S product, which, you know, I know we've had Pure folks on theCUBE before talking about FlashBlade, but for viewers out there, FlashBlade is our scale out unstructured data platform and AI and machine learning and supercomputing is all about unstructured data. It's about sensor data, it's about imaging, it's about, you know, photogrammetry, all this other kinds of amazing stuff. But, you got to land all that somewhere. You got to process that all somewhere. And so really high performance, high throughput, highly scalable storage solutions are really essential. It's an enabler for all of the amazing other kinds of engineering work that goes on at a place like Supercomputing. >> It's interesting you mentioned data's oil. 
Remember in 2010, that year, our first year of theCUBE, Hadoop World, Hadoop just started to come on the scene, which became, you know kind of went away and, but now you got, Spark and Databricks and Snowflake- >> Justin: And it didn't go away, it just changed, right? >> It just got refactored and right-sized, I think for what the people wanted it to be easy to use but there's more data coming. How is data driving innovation as you bring, as people see clearly the more data's coming? How is data driving innovation as you guys look at your products, your roadmap and your customer base? How is data driving innovation for your customers? >> Well, I think every customer who has been, you know collecting all of this data, right? Is trying to figure out, now what do I do with it? And a lot of times people collect data and then it will end up on, you know, lower slower tiers and then suddenly they want to do something with it. And it's like, well now what do I do, right? And so there's all these people that are reevaluating you know, we, when we developed FlashBlade we sort of made this bet that unstructured data was going to become the new tier one data. It used to be that we thought unstructured data, it was emails and home directories and all that stuff the kind of stuff that you didn't really need a really good DR plan on. It's like, ah, we could, now of course, as soon as email goes down, you realize how important email is. But, the perspectives that people had on- >> Yeah, exactly. (all laughing) >> The perspectives that people had on unstructured data and its value to the business was very different and so now- >> Good bet, by the way. >> Yeah, thank you. So now unstructured data is considered, you know, where companies are going to derive their value from. So it's whether they use the data that they have to build better products whether it's they use the data they have to develop you know, improvements in processes. All those kinds of things are data driven. 
And so all of the new big advancements in industry and in business are all about how do I derive insights from data? And so machine learning and AI has something to do with that, but also, you know, it all comes back to having data that's available. And so, we're working very hard on building platforms that customers can use to enable all of this really- >> Yeah, it's interesting, Savannah, you know, the top three areas we're covering for reinventing all the hyperscale events is data. How does it drive innovation and then specialized solutions to make customers lives easier? >> Yeah. >> It's become a big category. How do you compose stuff and then obviously compute, more and more compute and services to make the performance goes. So those seem to be the three hot areas. So, okay, data's the new oil refineries. You've got good solutions. What specialized solutions do you see coming out because once people have all this data, they might have either large scale, maybe some edge use cases. Do you see specialized solutions emerging? I mean, obviously it's got DPU emerging which is great, but like, do you see anything else coming out at that people are- >> Like from a hardware standpoint. >> Or from a customer standpoint, making the customer's lives easier? So, I got a lot of data flowing in. >> Yeah. >> It's never stopping, it keeps powering in. >> Yeah. >> Are there things coming out that makes their life easier? Have you seen anything coming out? >> Yeah, I think where we are as an industry right now with all of this new technology is, we're really in this phase of the standards aren't quite there yet. Everybody is sort of like figuring out what works and what doesn't. You know, there was this big revolution in sort of software development, right? Where moving towards agile development and all that kind of stuff, right? The way people build software change fundamentally this is kind of like another wave like that. 
I like to tell people that AI and machine learning is just a different way of writing software. What is the output of a training scenario, right? It's a model and a model is just code. And so I think that as all of these different, parts of the business figure out how do we leverage these technologies, what it is, is it's a different way of writing software and it's not necessarily going to replace traditional software development, but it's going to augment it, it's going to let you do other interesting things and so, where are things going? I think we're going to continue to start coalescing around what are the right ways to do things. Right now we talk about, you know, MLOps and how development and the frameworks and all of this innovation. There's so much innovation, which means that the industry is moving so quickly that it's hard to settle on things like standards and, or at least best practices you know, at the very least. And if the best practices are changing every three months, are they really best practices, right? So I think, right, I think that as we progress and coalesce around kind of what are the right ways to do things that's really going to make customers' lives easier. Because, you know, today, if you're a software developer you know, we build a lot of software at Pure Storage right? And if you have people and developers who are familiar with how the process, how the factory functions, then their skills become portable and it becomes easier to onboard people and AI is still nothing like that right now. It's just so, so fast moving and it's so- >> Wild West kind of. >> It's not standardized. It's not industrialized, right? And so the next big frontier in all of this amazing stuff is how do we industrialize this and really make it easy to implement for organizations? >> Oil refineries, Industrial Revolution. I mean, it's on that same trajectory. >> Yeah. >> Yeah, absolutely. >> Or industrial revolution. 
(John laughs) >> Well, we've talked a lot about the chaos and sort of we are very much at this early stage stepping way back and this can be your personal not Pure Storage opinion if you want. >> Okay. >> What in HPC or AI/ML, I guess it all falls under the same umbrella, has you most excited? >> Ooh. >> So I feel like you're someone who sees a lot of different things. You've got a lot of customers, you're out talking to people. >> I think that there is a lot of advancement in the area of natural language processing and I think that, you know, we're starting to take things just like natural language processing and then turning them into vision processing and all these other, you know, I think the, the most exciting thing for me about AI is that there are a lot of people who are looking to use these kinds of technologies to make technology more inclusive. And so- >> I love it. >> You know the ability for us to do things like automate captioning or the ability to automate descriptive, audio descriptions of video streams or things like that. I think that those are really, I think they're really great in terms of bringing the benefits of technology to more people in an automated way because the challenge has always been bandwidth of how much a human can do. And because they were so difficult to automate and what AI's really allowing us to do is build systems whether that's text to speech or whether that's translation, or whether that's captioning or all these other things. I think the way that AI interfaces with humans is really the most interesting part. And I think the benefits that it can bring there because there's a lot of talk about all of the things that it does that people don't like or that they, that people are concerned about. But I think it's important to think about all the really great things that maybe don't necessarily personally impact you, but to the person who's not sighted or to the person who you know is hearing impaired. 
You know, that's an enormously valuable thing. And the fact that those are becoming easier to do they're becoming better, the quality is getting better. I think those are really important for everybody. >> I love that you brought that up. I think it's a really important note to close on and you know, there's always the kind of terminator, dark side that we obsess over but that's actually not the truth. I mean, when we think about even just captioning it's a tool we use on theCUBE. It's, you know, we see it on our Instagram stories and everything else that opens the door for so many more people to be able to learn. >> Right? >> And the more we all learn, like you said the water level rises together and everything is magical. Justin, it has been a pleasure to have you on board. Last question, any more bourbon tasting today? >> Not that I'm aware of, but if you want to come by I'm sure we can find something somewhere. (all laughing) >> That's the spirit, that is the spirit of an innovator right there. Justin, thank you so much for joining us from Pure Storage. John Furrier, always a pleasure to interview with you. >> I'm glad I can contribute. >> Hey, hey, that's the understatement of the century. >> It's good to be back. >> Yeah. >> Hopefully I'll see you guys in, I'll see you guys in 2034. >> No. (all laughing) No, you've got the Pure Accelerate conference. We'll be there. >> That's right. >> We'll be there. >> Yeah, we have our Pure Accelerate conference next year and- >> Great. >> Yeah. >> I love that, I mean, feel free to, you know, hype that. That's awesome. >> Great company, great runs, stayed true to the mission from day one, all Flash, continue to innovate, congratulations. >> Yep, thank you so much, it's a pleasure being here. >> It's a fun ride, you are a joy to talk to and it's clear you're just as excited as we are about hardware, so thanks a lot Justin. >> My pleasure. 
>> And thank all of you for tuning in to this wonderfully nerdy hardware edition of theCUBE live from Dallas, Texas, where we're at, Supercomputing, my name's Savannah Peterson and I hope you have a wonderful night. (soft music)

Published Date : Nov 16 2022

SUMMARY :

and welcome back to Dallas Texas It's been a great show so far. We've had more hosts, more It's been a super the third event, was that right, John? Yeah, the first ever VM World, It's been too long, you I mean, I can barely remember for VMware, but the industry, the cloud, as you know, covering as well. and it's been so great to So one of the big the biggest difference is that, you know, I mean, less power consumption, in the ML and the AI for you guys? nerds at the beginning all here in Dallas. places where you know, have to go check that out. Yeah, but that to me is like one of for people to look at and the amount of of compute that we have, I love that you touched and the point of entry It's an enabler for all of the amazing but now you got, Spark and as you guys look at your products, the kind of stuff that Yeah, exactly. And so all of the new big advancements Savannah, you know, but like, do you see a hardware standpoint. the customer's lives easier? It's never stopping, it's going to let you do And so the next big frontier I mean, it's on that same trajectory. (John laughs) a lot about the chaos You've got a lot of customers, and I think that, you know, or to the person who you and you know, there's always And the more we all but if you want to come by that is the spirit of an Hey, hey, that's the Hopefully I'll see you guys We'll be there. free to, you know, hype that. all Flash, continue to Yep, thank you so much, It's a fun ride, you and I hope you have a wonderful night.

ENTITIES

Entity	Category	Confidence
Paul Maritz	PERSON	0.99+
Justin	PERSON	0.99+
Justin Emerson	PERSON	0.99+
John	PERSON	0.99+
Savannah Peterson	PERSON	0.99+
Savannah	PERSON	0.99+
Dallas	LOCATION	0.99+
June	DATE	0.99+
John Furrier	PERSON	0.99+
12 years	QUANTITY	0.99+
2010	DATE	0.99+
Kay Green	PERSON	0.99+
Dallas, Texas	LOCATION	0.99+
third event	QUANTITY	0.99+
Dallas Texas	LOCATION	0.99+
last week	DATE	0.99+
12 years ago	DATE	0.99+
two-thirds	QUANTITY	0.99+
First	QUANTITY	0.98+
VM World	EVENT	0.98+
first	QUANTITY	0.98+
two thirds	QUANTITY	0.98+
Habana Labs	ORGANIZATION	0.98+
Pure Accelerate	EVENT	0.98+
next year	DATE	0.98+
today	DATE	0.98+
both sides	QUANTITY	0.98+
Pure Storage	ORGANIZATION	0.97+
first year	QUANTITY	0.97+
16 different ISAs	QUANTITY	0.96+
FlashBlade	TITLE	0.96+
three hot areas	QUANTITY	0.94+
three	QUANTITY	0.94+
Snowflake	ORGANIZATION	0.93+
one	QUANTITY	0.93+
2034	DATE	0.93+
one thing	QUANTITY	0.93+
Supercomputing	ORGANIZATION	0.9+
90% less	QUANTITY	0.89+
theCUBE	ORGANIZATION	0.86+
agile	TITLE	0.84+
VM world	EVENT	0.84+
few years ago	DATE	0.81+
day one	QUANTITY	0.81+
Hadoop World	ORGANIZATION	0.8+
VMware	ORGANIZATION	0.79+
Instagram	ORGANIZATION	0.78+
Spark and	ORGANIZATION	0.77+
Hadoop	ORGANIZATION	0.74+
years	DATE	0.73+
last	DATE	0.73+
three months	QUANTITY	0.69+
FlashBlade	ORGANIZATION	0.68+
Direct Flash	TITLE	0.67+
year	DATE	0.65+
tier one	QUANTITY	0.58+
Supercomputing	TITLE	0.58+
Direct	TITLE	0.56+
Flash	ORGANIZATION	0.55+
86	TITLE	0.55+
aces	QUANTITY	0.55+
Pure	ORGANIZATION	0.51+
Databricks	ORGANIZATION	0.5+
2022	ORGANIZATION	0.5+
X	EVENT	0.45+

Felix Van de Maele, Collibra, Data Citizens 22

(upbeat techno music) >> Collibra is a company that was founded in 2008 right before the so-called modern big data era kicked into high gear. The company was one of the first to focus its business on data governance. Now, historically, data governance and data quality initiatives, they were back office functions, and they were largely confined to regulated industries that had to comply with public policy mandates. But as the cloud went mainstream the tech giants showed us how valuable data could become, and the value proposition for data quality and trust, it evolved from primarily a compliance driven issue, to becoming a linchpin of competitive advantage. But, data in the decade of the 2010s was largely about getting the technology to work. You had these highly centralized technical teams that were formed and they had hyper-specialized skills, to develop data architectures and processes, to serve the myriad data needs of organizations. And it resulted in a lot of frustration, with data initiatives for most organizations, that didn't have the resources of the cloud guys and the social media giants, to really attack their data problems and turn data into gold. This is why today, for example, there's quite a bit of momentum to re-thinking monolithic data architectures. You see, you hear about initiatives like Data Mesh and the idea of data as a product. They're gaining traction as a way to better serve the data needs of decentralized business users. You hear a lot about data democratization. So these decentralization efforts around data, they're great, but they create a new set of problems. Specifically, how do you deliver, like a self-service infrastructure to business users and domain experts? Now the cloud is definitely helping with that but also, how do you automate governance? This becomes especially tricky as protecting data privacy has become more and more important. 
In other words, while it's enticing to experiment, and run fast and loose with data initiatives, kind of like the Wild West, to find new veins of gold, it has to be done responsibly. As such, the idea of data governance has had to evolve to become more automated and intelligent. Governance and data lineage is still fundamental to ensuring trust as data moves like water through an organization. No one is going to use data that is untrusted. Metadata has become increasingly important for data discovery and data classification. As data flows through an organization, the ability to continuously check for data flaws and automating that data quality, they become a functional requirement of any modern data management platform. And finally, data privacy has become a critical adjacency to cyber security. So you can see how data governance has evolved into a much richer set of capabilities than it was 10 or 15 years ago. Hello and welcome to theCUBE's coverage of Data Citizens made possible by Collibra, a leader in so-called data intelligence and the host of Data Citizens 2022, which is taking place in San Diego. My name is Dave Vellante and I'm one of the hosts of our program which is running in parallel to Data Citizens. Now at theCUBE we like to say we extract the signal from the noise, and over the next couple of days we're going to feature some of the themes from the keynote speakers at Data Citizens, and we'll hear from several of the executives. Felix Van de Maele, who is the co-founder and CEO of Collibra, will join us. Along with one of the other founders of Collibra, Stan Christiaens, who's going to join my colleague Lisa Martin. I'm going to also sit down with Laura Sellers, she's the Chief Product Officer at Collibra. We'll talk about some of the announcements and innovations they're making at the event, and then we'll dig in further to data quality with Kirk Haslbeck. He's the Vice President of Data Quality at Collibra. 
He's an amazingly smart dude who founded OwlDQ, a company that he sold to Collibra last year. Now, many companies, they didn't make it through the Hadoop era, you know they missed the industry waves and they became driftwood. Collibra, on the other hand, has evolved its business, they've leveraged the cloud, expanded its product portfolio and leaned in heavily to some major partnerships with cloud providers as well as receiving a strategic investment from Snowflake, earlier this year. So, it's a really interesting story that we're thrilled to be sharing with you. Thanks for watching and I hope you enjoy the program. (upbeat rock music) Last year theCUBE covered Data Citizens, Collibra's customer event, and the premise that we put forth prior to that event was that despite all the innovation that's gone on over the last decade or more with data, you know starting with the Hadoop movement, we had Data lakes, we had Spark, the ascendancy of programming languages like Python, the introduction of frameworks like TensorFlow, the rise of AI, Low Code, No Code, et cetera. Businesses still find it's too difficult to get more value from their data initiatives, and we said at the time, you know maybe it's time to rethink data innovation. While a lot of the effort has been focused on, you know, more efficiently storing and processing data, perhaps more energy needs to go into thinking about the people and the process side of the equation. Meaning, making it easier for domain experts to both gain insights from data, trust the data, and begin to use that data in new ways, fueling data products, monetization, and insights. Data Citizens 2022 is back and we're pleased to have Felix Van de Maele who is the founder and CEO of Collibra. He's on theCUBE. We're excited to have you Felix. Good to see you again. >> Likewise Dave. Thanks for having me again. >> You bet. 
All right, we're going to get the update from Felix on the current data landscape, how he sees it, why data intelligence is more important now than ever, and get current on what Collibra has been up to over the past year, and what's changed since Data Citizens 2021, and we may even touch on some of the product news. So Felix, we're living in a very different world today with businesses and consumers. They're struggling with things like supply chains, uncertain economic trends and we're not just snapping back to the 2010s, that's clear, and that's really true as well in the world of data. So what's different in your mind, in the data landscape of the 2020s, from the previous decade, and what challenges does that bring for your customers? >> Yeah, absolutely, and I think you said it well, Dave, in the intro, that rising complexity and fragmentation, in the broader data landscape, that hasn't gotten any better over the last couple of years. When we talk to our customers, that level of fragmentation, the complexity, how do we find data that we can trust, that we know we can use, has only gotten more difficult. So that trend that's continuing, I think what is changing is that trend has become much more acute. Well, the other thing we've seen over the last couple of years is that the level of scrutiny that organizations are under, with respect to data, as data becomes more mission critical, as data becomes more impactful and important, the level of scrutiny with respect to privacy, security, regulatory compliance, is only increasing as well. Which again, is really difficult in this environment of continuous innovation, continuous change, continuous growing complexity, and fragmentation. So, it's become much more acute. And to your earlier point, we do live in a different world and the past couple of years we could probably just kind of brute force it, right? We could focus on the top line, there was enough kind of investments to be had. 
I think nowadays organizations are in a very different environment where there's much more focus on cost control, productivity, efficiency, how do we truly get the value from that data? So again, I think it's just another incentive for organizations to now truly look at data and to scale with data, not just from a technology and infrastructure perspective, but how do we actually scale data from an organizational perspective, right? You said it, the people and process, how do we do that at scale? And that's only becoming much more important, and we do believe that the economic environment that we find ourselves in today is going to be a catalyst for organizations to really take that more seriously, if you will, than they maybe have in the past. >> You know, I don't know when you guys founded Collibra, if you had a sense as to how complicated it was going to get, but you've been on a mission to really address these problems from the beginning. How would you describe your mission and what are you doing to address these challenges? >> Yeah, absolutely. We started Collibra in 2008. So, in some sense, in the last kind of financial crisis, and that was really the start of Collibra, where we found product market fit, working with large financial institutions to help them cope with the increasing compliance requirements that they were faced with because of the financial crisis. And kind of here we are again, in a very different environment of course, almost 15 years later, but data is only becoming more important. But our mission to deliver trusted data for every user, every use case and across every source, frankly, has only become more important.
So, while it has been an incredible journey over the last 14, 15 years, I think we're still relatively early in our mission to, again, be able to provide everyone, and that's why we call it Data Citizens, we truly believe that everyone in the organization should be able to use trusted data in an easy manner. That mission is only becoming more important, more relevant. We definitely have a lot more work ahead of us because we're still relatively early in that journey. >> Well that's interesting, because you know, in my observation it takes 7 to 10 years to actually build a company, and then the fact that you're still in the early days is kind of interesting. I mean, Collibra's had a good 12 months or so since we last spoke at Data Citizens. Give us the latest update on your business. What do people need to know about your current momentum? >> Yeah, absolutely. Again, there's a lot of tailwind. Organizations are only maturing their data practices, and we've seen that kind of transform or influence a lot of our business growth, with broader adoption of the platform. We work at some of the largest organizations in the world, whether it's Adobe, Heineken, Bank of America and many more. We have now over 600 enterprise customers, all industry leaders, in every single vertical. So it's really exciting to see that and continue to partner with those organizations. On the partnership side, again, a lot of momentum in the market with some of the cloud partners like Google, Amazon, Snowflake, Databricks, and others, right? As those kind of new modern data infrastructures, modern data architectures, are definitely all moving to the cloud. A great opportunity for us, our partners, and of course our customers, to help them kind of transition to the cloud even faster. And so we see a lot of excitement and momentum there.
We did an acquisition about 18 months ago around data quality and data observability, which we believe is an enormous opportunity. Of course data quality isn't new, but I think there's a lot of reasons why we're so excited about quality and observability now. One is around leveraging AI and machine learning, again to drive more automation. And a second is that those data pipelines that are now being created in the cloud, in these modern data architectures, they've become mission critical. They've become real time. And so monitoring, observing those data pipelines continuously has become absolutely critical, so we're really excited about that as well. And on the organizational side, I'm sure you've heard the term data mesh, something that's gaining a lot of momentum, rightfully so. It's really the type of governance that we always believed in. Federated, focused on domains, giving a lot of ownership to different teams. I think that's the way to scale data organizations, and so that aligns really well with our vision, and from a product perspective, we've seen a lot of momentum with our customers there as well. >> Yeah, you know, a couple things there. I mean, the acquisition of OwlDQ, you know, Kirk Haslbeck and their team. It's interesting, you know, the whole data quality thing used to be this back office function, really confined to highly regulated industries. It's come to the front office, it's top of mind for Chief Data Officers. Data mesh, you mentioned, you guys are a connective tissue for all these different nodes on the data mesh. That's key. And of course we see you at all the shows. You're a critical part of many ecosystems and you're developing your own ecosystem. So, let's chat a little bit about the products.
We're going to go deeper into products later on, at Data Citizens 22, but we know you're debuting some new innovations, you know, whether it's, you know, under the covers in security, sort of making data more accessible for people, or just dealing with workflows and processes, as you talked about earlier. Tell us a little bit about what you're introducing. >> Yeah, absolutely. We're super excited, a ton of innovation. And if we think about the big theme, like I said, we're still relatively early in this journey towards that mission of data intelligence, that really bold and compelling mission. Some customers are just starting on that journey. We want to make it as easy as possible for organizations to actually get started, because we know it's important that they do. And for organizations and customers that have been with us for some time, there's still a tremendous amount of opportunity to kind of expand the platform further. And again, to make it easier to accomplish that mission and vision around the Data Citizen, that everyone has access to trustworthy data in a very easy way. So that's really the theme of a lot of the innovation that we're driving, a lot of kind of ease of adoption, ease of use, but also then, how do we make sure that, as Collibra becomes this kind of mission critical enterprise platform, from a security, performance, architecture, scale, and supportability perspective, we're truly able to deliver that kind of enterprise mission critical platform. And so that's the big theme. From an innovation perspective, from a product perspective, a lot of new innovation that we're really excited about. A couple of highlights. One is around data marketplace. Again, a lot of our customers have plans in that direction. How do we make it easy? How do we make available a true kind of shopping experience?
So that anybody in the organization can, in a very easy search-first way, find the right data product, find the right dataset, that they can then consume. Usage analytics, how do we help organizations drive adoption? Tell them where they're working really well and where they have opportunities. Homepages, again, to make things easy for anyone in your organization to kind of get started with Collibra. You mentioned Workflow Designer. Again, we have a very powerful enterprise platform, and one of our key differentiators is the ability to really drive a lot of automation through workflows. And now we've provided a new low-code, no-code kind of workflow designer experience, so customers can really take it to the next level. There's also a new product, Collibra Protect, which, in partnership with Snowflake, which has been a strategic investor in Collibra, is focused on how do we make access governance easier? How are we able to make sure that, as you move to the cloud, things like access management and masking around sensitive data, PII data, are managed in a much more effective way. Really excited about that product. There's more around data quality. Again, how do we get that deployed as easily, and quickly, and widely as we can? Moving that to the cloud has been a big part of our strategy. So, we launched our data quality cloud product, as well as making use of the native compute capabilities in platforms like Snowflake, Databricks, Google, Amazon, and others. And so we are beta-ing a capability that we call pushdown, so we're actually pushing down the compute and data quality monitoring into the underlying platform, which again, from a scale, performance and ease of use perspective, is going to make a massive difference. And then more broadly, we talked a little bit about the ecosystem. Again, integrations, we talk about being able to connect to every source.
Integrations are absolutely critical, and we're really excited to deliver new integrations with Snowflake, Azure and Google Cloud Storage as well. So that's a lot coming out, the team has been at work really hard, and we are really excited about what we're bringing to market. >> Yeah, a lot going on there. I wonder if you could give us your closing thoughts. I mean, you talked about, you know, the marketplace. You know, you think about data mesh, you think of data as product, one of the key principles, you think about monetization. This is really different than what we've been used to in data, where just getting the technology to work has been so hard. So, how do you see sort of the future, and, you know, give us your closing thoughts please? >> Yeah, absolutely. And I think we're really at a pivotal moment, and I think you said it well. We all know the constraints and the challenges with data, how to actually do data at scale. And while we've seen a ton of innovation on the infrastructure side, we fundamentally believe that just getting a faster database is important, but it's not going to fully solve the challenges and truly kind of deliver on the opportunity. And that's why now is really the time to deliver this data intelligence vision, this data intelligence platform. We are still early, making it as easy as we can, as our mission. And so I'm really, really excited to see how the market is going to evolve over the next few quarters and years. I think the trend is clearly there. We talked about data mesh, this kind of federated approach, a focus on data products. It's just another signal that we believe a lot of organizations are now at the point where they understand the need to go beyond just the technology.
And really, really think about how to actually scale data as a business function, just like we've done with IT, with HR, with sales and marketing, with finance. That's how we need to think about data. I think now is the time, given the economic environment that we are in, much more focus on control, much more focus on productivity, efficiency, and now is the time we need to look beyond just the technology and infrastructure to think of how to scale data, how to manage data at scale. >> Yeah, it's a new era. The next 10 years of data won't be like the last, as I always say. Felix, thanks so much. Good luck in San Diego. I know you're going to crush it out there. >> Thank you, Dave. >> Yeah, it's a great spot for an in-person event, and of course the content post-event is going to be available at collibra.com, and you can of course catch theCUBE coverage at theCUBE.net and all the news at siliconangle.com. This is Dave Vellante for theCUBE, your leader in enterprise and emerging tech coverage. (upbeat techno music)

Published Date : Nov 2 2022


ENTITIES

Entity	Category	Confidence
Dave Vellante	PERSON	0.99+
Lisa Martin	PERSON	0.99+
Heineken	ORGANIZATION	0.99+
Adobe	ORGANIZATION	0.99+
Felix Van de Maele	PERSON	0.99+
Amazon	ORGANIZATION	0.99+
Google	ORGANIZATION	0.99+
Laura Sellers	PERSON	0.99+
Collibra	ORGANIZATION	0.99+
2008	DATE	0.99+
Felix	PERSON	0.99+
San Diego	LOCATION	0.99+
Stan Christiaens	PERSON	0.99+
Dave	PERSON	0.99+
Bank of America	ORGANIZATION	0.99+
7	QUANTITY	0.99+
Snowflake	ORGANIZATION	0.99+
2020s	DATE	0.99+
last year	DATE	0.99+
2010s	DATE	0.99+
Data Breaks	ORGANIZATION	0.99+
Python	TITLE	0.99+
Last year	DATE	0.99+
12 months	QUANTITY	0.99+
siliconangle.com	OTHER	0.99+
one	QUANTITY	0.99+
Data Citizens	ORGANIZATION	0.99+
Databricks	ORGANIZATION	0.99+
Owl DQ	ORGANIZATION	0.98+
10	DATE	0.98+
OwlDQ	ORGANIZATION	0.98+
Kirk Haslbeck	PERSON	0.98+
10 years	QUANTITY	0.98+
One	QUANTITY	0.98+
Spark	TITLE	0.98+
today	DATE	0.98+
first	QUANTITY	0.97+
Data Citizens	EVENT	0.97+
earlier this year	DATE	0.96+
Tensorflow	TITLE	0.96+
Data Citizens 22	ORGANIZATION	0.95+
both	QUANTITY	0.94+
theCUBE	ORGANIZATION	0.94+
15 years ago	DATE	0.93+
over 600 enterprise customers	QUANTITY	0.91+
past couple of years	DATE	0.91+
about 18 months ago	DATE	0.9+
collibra.com	OTHER	0.89+
Data citizens 2021	ORGANIZATION	0.88+
Data Citizens 2022	EVENT	0.86+
almost 15 years later	DATE	0.85+
West	LOCATION	0.85+
Azure	TITLE	0.84+
first way	QUANTITY	0.83+
Vice President	PERSON	0.83+
last couple of years	DATE	0.8+

Murli Thirumale, Portworx by Pure Storage | KubeCon + CloudNativeCon NA 2022

>>Good afternoon and welcome back to Detroit, Lisa Martin here with John Furrier. We are live, day two of our coverage of KubeCon + CloudNativeCon North America. John, we've had great conversations. Yeah. All day yesterday. Half a day today. So far we're talking all things, well, not all things Kubernetes, so much more than that. We also have to talk about storage and data management solutions for Kubernetes projects, cuz that's obviously critical. >>Yeah, I mean the big trend here is Kubernetes going mainstream, and it has been for a while. The adoption is crossing over, it's crossing the chasm, and with that you're seeing security concerns. You're seeing gaps being filled. But enterprise grade is really the story. It's going enterprise, that's managed services, that's professional services, that's basically making things work at scale. This next segment hits that part and we are gonna talk about it at great length >>With one of our alumni. Murli Thirumale is back, VP and GM of Portworx by Pure Storage. Great to have you back, Murli. >>Yeah, absolutely. Delightful to be here. >>So I was looking on the website, number one in Kubernetes storage. Three years in a row. Yep. Awesome. What's Portworx doing here at KubeCon? >>Well, I'll tell you, our engineering crew has been so productive and hard at work that I almost can't decide what to kind of tell you. But what I thought I would do is kind of tell you that we are in the forefront of two major trends in the world of Kubernetes. Right? And the two trends that I see: one is as a service, so that's trend number one. So it's not software eating the world anymore. That's old, old news. It's as a service unifying the world. The world wants easy. We all are, you know, subscribers to things like Netflix. We've been using Salesforce or other HR functions. Everything is as a service.
And in the world of Kubernetes, it's a sign of that maturity that John was talking about as a platform that now as a service is the big trend. >>And so headline number one, if you will, is that Portworx is leading in the data management world for Kubernetes by going all in on easy, on as a service. So everything we do, we are servicifying it, right? So if you think about this, most of the people who are consuming Kubernetes are people who are building platforms for their dev users. And dev users want self-service. That's one of the advantages of Kubernetes. And the more it is servicized and made as a service, the more ready to consume it is. And so we are announcing at the show that we have, you know, the basic Kubernetes data management as a service, HA/DR as a service. We have backup as a service and we have database as a service. So these are the three major components of data. And all of those are being made available as a service. And in fact, we're offering and announcing at the show our backup as a service freemium version, where you can get free forever a terabyte of, you know, stuff to do for Kubernetes. >>Congratulations on the announcement. Totally in line with what the market wants. Developers want self-service, they also want simplicity, and by the way, they'll leave if they don't like the service. Correct. So you know that. Before we get into some more specifics, I want, yeah, to ask you on the industry and some of the point solutions you have. It's been two years since the acquisition by Pure Storage. Can you just give an update on how it's gone? Obviously with as a service, you guys are hitting all your marks, developers love it. Storage is a big part of the game right now, as well as these environments. Yeah. What's the update two years post acquisition? You had a great offering. Stay right in >>Portworx. Yeah.
So look, John, you're a veteran of the industry and have seen lots of acquisitions, right? And I've been acquired twice before myself. So, you know, there's always best practices and poor practices in terms of acquisitions, and I'm, you know, really delighted to say I think this acquisition has had some of the best practices. Let me just name a couple of them, right? One of them is just cultural fit, right? Cultural fit is great. Entrepreneurs, anybody, it's not just entrepreneurs, everybody loves to work in a place they enjoy working, with people that they, you know, thrive with when they interact. And so the cultural fit with Pure is fantastic. The other one is that the strategic intent that Pure had when they acquired us is still true. And so that goes a long way, you know, in terms of an investment profile, in terms of the ability to kind of leverage assets within the company. So Pure had kind of disrupted the world of storage using flash, and they wanted to disrupt higher up the stack using Kubernetes. And that's kind of been our role inside their strategy. And it's still true. >>So culture, strategic intent. Yeah. Product market fit as well. You weren't just an asset or customer acquisition, where they then let the founders go on to their next thing. You are part of their growth play. >>Absolutely. Right. The beauty of the kind of product market fit is, let's talk about the market: we have always been focused on the Global 2000, and that is at the heart of, you know, Pure's 10,000-strong customer base, right? They have a very strong presence in the Global 2000. And we allow them to kind of go to those same folks with the offering. >>So, servicifying everything that you do: what's in it for me as a business, whether I'm a financial services organization, I'm a hospital, I'm a retailer, what's in it for me >>As a customer? Yeah. So what's in it for me is two things.
It's speed and ease of use, which in a way are related. But, you know, one is when something is provided as a service, it's much more consumable. It's instantly ready. It's like instant oatmeal, right? You just add hot water and it's there. Yep. So the world of IT has moved from owning large data centers, right? That used to be like 25 years ago, and running those data centers better than everybody else, to let me just consume a data center in the form of a cloud, right? So, servicifying, the cloud part of the data center. Now people are saying, well, I expect that for software and services, and I don't want it just from the public cloud, I want it from my own IT department. >>This is old news. And so the big news here is how fast Kubernetes has kind of moved everything. You know, you take a lot of these changes, and Kubernetes is a poster child for things happening faster than the last wave. And in the last couple of years I would say that the as a service model has really kind of thrived in the world of Kubernetes. And developers want to be able to get it fast. And the second thing is they want to be able to operate it fast. Self-service is the other benefit. Yeah. So speed and self-service are both benefits of >>This. Yeah. And the thing that's come up clearly in theCUBE, this is gonna be part of the headlines, we'll probably end up getting a lot of highlights from it, I'm telling my team to make a note of this, is that developers are gonna be the business. If you take digital transformation to its conclusion, they're not a department that serves the business, they are the business. Exactly. That means they have to be more productive. So developer productivity has been the top story. Yes. Security as a service, all these things. These are examples to make developers more productive.
But one of the things that came up, and I wanna get your reaction to it, is that when you have disruption, and with the storage vision, you know what disruption means. Cuz there's been a whole discussion around disruptive operations. When storage goes down, you have backup, DR and failover. If there's a disruption, that changes the nature of invisible infrastructure, and developers want invisible infrastructure. That's the future steady state. So if there's a disruption in storage >>Yeah. It >>Can't affect the productivity and the tool chains and the workflows of developers. Yep. Right? So how do you guys look at that? Cuz you're a critical component. Storage as a service is a huge thing. Yeah. Storage has to work seamlessly. And let's keep the developers out of the weeds. >>John, I think what you put your finger on is another huge trend in the world of Kubernetes. We're at KubeCon, after all, which is really where all the leading practitioners come and where the leading vendors are. So here's the second trend that we are leading, and actually I think it's happening not just with us, but with other folks in the industry. And that is, you know, the world of DevOps. Like, DevOps has been such a catchphrase for all of us in the industry for the last five years. And it's been a combination of cultural change as well as technology change. Here's what the latest is in the world of DevOps. DevOps is now crystallized. It's not some kind of mysterious art form that you read about how people are practicing. DevOps is broken into two things now. >>There is the platform part. So DevOps is now a bunch of platforms. And the other part of DevOps is a bunch of practices. So a little bit on both of these. The platforms: in the world of K8s there's only three platforms, right? There's the orchestration platforms, the, you know, EKS, the OpenShifts of the world and so on. There are the data management platforms, people like Portworx.
And the third is security platforms, right? You know, Palo Alto Networks, others, Aqua, are all in this. So these are the three platforms, and there are platform engineering teams now at many of our largest customers, some of the largest banks, the largest service providers, and they're all operating as a K8s platform engineering team. And then now developers, to your point, developers are in the practice of being able to use these platforms to launch new services. So the actual IT ops, the ops are run by developers now, and they can do it on these platforms. And the platform engineering team provides that as an ease of use, and they're there to troubleshoot when problems happen. So the idea of DevOps as an ops practice and a platform is the newest thing. And Portworx and Pure Storage are leading in the world of data management platforms there. >>Talk about a customer example that you think really articulates the value that Portworx and Pure Storage deliver from a data management perspective. >>Yeah, so there's so many examples. One of the longest running examples we have is a very, very large service provider that, you know, you all know and probably use, and they have been using us in the cable set-top box, or cable box, business. They get streams of data from cable boxes all over the world. They collect it all in a centralized, large kind of thing and run Elasticsearch and analytics on it. Now, they couldn't keep up with this at the scale and the depth, right? The speed of activity and the distributed nature of the activity. The only way to solve this was to use something like Kubernetes, managed with Spark, bringing all the data in to deep, deep, deep silos of storage, which are all running not even on a SAN, but on kind of, you know, very deep terabytes and terabytes of storage. So all of this is orchestrated with the help of Portworx, and there's a platform engineering team.
We are building that platform for them with some of these other components, which allows them to kind of do analytics and make some changes in real time. Huge kind of setup for >>That. Yeah. Well, you guys have the right architecture. I love the vision. I love what you guys are doing. I think this is right in line with Pure's. They've always been disruptors. I remember when we first interviewed the CEO when they started. Yep. They stayed on path. They didn't waver. EMC was the big player. They ended up taking their lunch and dinner as well, and they beat 'em in the marketplace. But now you've got this traction here. So I have to ask you, how's the business, what do the results look like? You're the GM of the cloud native business unit of a storage company that's transformed and transforming. >>Yeah, you know, it's interesting, we just hit the two year anniversary, right, John? And so what we did was just kind of step back and, hey, you know, we're running so hard, you just take a step back. And we've tripled the business in the two years since the acquisition, and we were growing before that. So, you know, that's quite a feat. And we've tripled the number of people, and the amount of engineering investments we have and the number of go to market investments have been awesome. So business is going really well, I will say. But I think, you know, we're watching the market closely. You know, as a former CEO, you have to kind of learn to read the tea leaves when you invest. And I think, you know, what I would say is we're proceeding with caution in the next two quarters. I view business transformation as not a cancelable activity. So that's the good news, right? Our customers are large, it's, >>It's >>Right. All they're gonna do is say, hey, they're gonna put their hand, their hand was always going right on the dial.
Now they're kind of putting their hand on the dial going, hey, what is happening? But my own sense of this is that people will continue to invest through it. The question is at what level? And I also think that this is a six month kind of watch, where we watch where we put the dial. So Q4 and Q1, I think, are kind of, you know, where we have to kind of watch the market signs. But I have the highest confidence. >>What does your gut tell you? You're an entrepreneur. >>My gut says that we'll go through a little bit of a cautious investment period in the next six months. And after that I think we're gonna be back full in the crazy growth that we've always been in. We're gonna grow, by the way, in the next, I think >>It's core style. I think I'm more bullish. I think there's gonna be some, you know, weeding out of some overinvestment pre C or pre bubble. But I think tech's gonna continue to grow. I don't see it stopping. >>Yeah. And the investment is gonna be on these core platforms. See, back to the platform story, it's gonna be in these core platforms and on unifying everything. Let's consume it better, rather than let's go kind of experiment with a whole bunch of things all over the map, right? So you'll see less experimentation and more kind of, let's harvest some of the investments we've made in the last couple of years. >>And actually be able to enable companies in any industry to truly be data companies. Because, absolutely, we talked about as a service, we all have these expectations that any service we want, we can get it. Yes. There's no delay, because patience is gone. Yeah. Since the pandemic. >>So it is kind of, you know, tightening up the screws on what they've built. They're, you know, adding some polish to it, adding some more capability, like I said, a combination of harvesting and new investing. It's a combination, I think, that we're gonna see. >>Yeah.
What are some of the things that you're looking forward to? You talked about some of the growth things in the investment, but as we round out Q4 and head into a new year, what are you excited about? >>Yeah, so you know, I mentioned our as a service kind of platform. The Global 2000 for us has been a set of customers who we co-create stuff with. And so one of the other set of things that we are very excited about and announcing is, because we're deployed at scale, we have upgraded our backend. So we now have the ability to go to a million IOPS and more for the right backends. And so Kubernetes is an add-on which will not slow down your core base infrastructure. The second thing that we have is added a bunch of capability on the disaster recovery and business continuity front. You know, we always had like metro-distance DR. We had long-distance DR. We've added a near-sync DR. So now we can provide disaster recovery and business continuity for metro distances, across continents and across the planet. Right? That's kind of a major change that we've done. The third thing is we've added the capability for file, block and object. So now by adding object, we're really a complete solution. So it is really that maturity of the business, yeah, that you start seeing as enterprises move to embracing a platform approach, deploying it much more widely. You talked about the early majority. Yeah. Right. And so what they require is more enterprise class capability, and those are all the things that we've been adding and we're really looking forward to it. >>Well, it sounds like tremendous evolution and maturation of Portworx in the two years since it's been with Pure Storage. You talked about the cultural alignment, great stuff that you're achieving. Congratulations on that. Yeah. Great stuff >>Ahead and having fun. Let's not forget that, life's too short not to. It is, right. >>You're right. Thank you.
We will definitely, as always on theCUBE, keep our eyes on this space. Murli, it's been great to have you back on the program. Thank you for joining John and me. >>Thank you so much. It's a pleasure. >>For our guest and John Furrier, Lisa Martin here, live in Detroit with theCUBE at KubeCon + CloudNativeCon '22. We'll be back after a short break.

Published Date : Oct 28 2022

SUMMARY :

So far we're talking all things, Well, not all things Kubernetes so much more than that. crossing over, it's crossing the CADs and with that you're seeing security concerns. Great to have you back really? Yeah, absolutely. So I was looking on the website, number one in Kubernetes storage. And in the world of Kubernetes, it's a sign of that maturity that and made as a service, the more ready to consume it is. Storage are big part of the game right now as well as these environments. And so the cultural fit with, with Pure is fantastic. You were, you weren't just an asset for customers that is at the heart of, you know, purest 10,000 strong customer base, So satisfying everything that you do. So satisfying the cloud part of the data center. And in the last couple of years I would say that So developer productivity has been the top story. And let's keep the developers out of the weeds. So here's the second trend that we are leading and, There's the orchestration platforms, the, you know, eks, Talk about a customer example that you think really articulates the value that Port Works and Pure Storage delivers we have is a very, very large service provider that, you know, you all know I love the vision. And so what we did was just kind of like step back and hey, you know, But I have the highest confidence. We're gonna grow by the way, in the next think I think there's gonna be some, you know, weeding out of some overinvestment experimentation and more kind of, let's harvest some of the investments we've made in the last couple From the pandemic. So it is kind of, you know, tightening up the screws on what they've the growth things in the investment, but as we round out Q4 and head into a new year, what are you excited about? of capability in the disaster recovery business continuity front, you know, You talked about the cultural alignment, great stuff that you're achieving. It is right. it's been great to have you back on the program. Thank you so much. 
For our guests and John Furrier, Lisa Martin here live in Detroit with the cube about Coan Cloud

ENTITIES

Entity	Category	Confidence
John Furrier	PERSON	0.99+
John	PERSON	0.99+
Lisa Martin	PERSON	0.99+
Detroit	LOCATION	0.99+
Molly	PERSON	0.99+
Murli Thirumale	PERSON	0.99+
six month	QUANTITY	0.99+
twice	QUANTITY	0.99+
DevOps	TITLE	0.99+
yesterday	DATE	0.99+
two things	QUANTITY	0.99+
EMC	ORGANIZATION	0.99+
two	QUANTITY	0.99+
Palo Alto Networks	ORGANIZATION	0.99+
One	QUANTITY	0.99+
Three years	QUANTITY	0.99+
both	QUANTITY	0.99+
10,000	QUANTITY	0.99+
second trend	QUANTITY	0.99+
three platforms	QUANTITY	0.99+
Pure	ORGANIZATION	0.99+
Half a day	QUANTITY	0.99+
Cube Con	ORGANIZATION	0.98+
third	QUANTITY	0.98+
one	QUANTITY	0.98+
Pure Storage	ORGANIZATION	0.98+
first	QUANTITY	0.98+
second thing	QUANTITY	0.98+
third thing	QUANTITY	0.98+
global two k	ORGANIZATION	0.98+
25 years ago	DATE	0.97+
two years	QUANTITY	0.97+
Netflix	ORGANIZATION	0.97+
Second thing	QUANTITY	0.96+
global two k.	ORGANIZATION	0.96+
Aqua	ORGANIZATION	0.96+
two years	DATE	0.96+
two things	QUANTITY	0.96+
Kubernetes	TITLE	0.96+
Port Work's Peer Storage	ORGANIZATION	0.95+
Meley	PERSON	0.95+
two trends	QUANTITY	0.95+
GM	ORGANIZATION	0.94+
CloudNativeCon	EVENT	0.94+
today	DATE	0.93+
Pures	ORGANIZATION	0.93+
Spark	TITLE	0.93+
last five years	DATE	0.92+
three major components	QUANTITY	0.92+
both benefits	QUANTITY	0.92+
Port Works	ORGANIZATION	0.91+
Coan Cloud Native Con	EVENT	0.91+
pandemic	EVENT	0.89+
Con	EVENT	0.89+
22	DATE	0.89+
day two	QUANTITY	0.87+
next six months	DATE	0.87+
two year anniversary	QUANTITY	0.87+
Mur	PERSON	0.86+
Q4	DATE	0.85+
Heco	ORGANIZATION	0.85+
q1	DATE	0.84+
last couple of years	DATE	0.83+
million IOPS	QUANTITY	0.82+

Murli Thirumale, Portworx by Pure Storage | KubeCon + CloudNativeCon NA 2022

>>Good afternoon and welcome back to Detroit. Lisa Martin here with John Furrier. We are live, day two of our coverage of KubeCon + CloudNativeCon North America. John, we've had great conversations all day yesterday and half a day today. So far we're talking all things, well, not all things Kubernetes, so much more than that. We also have to talk about storage and data management solutions for Kubernetes projects, 'cause that's obviously critical. >>Yeah, I mean, the big trend here is Kubernetes going mainstream, and it has been for a while. The adoption is crossing over, it's crossing the chasm, and with that you're seeing security concerns. You're seeing gaps being filled. But enterprise grade is really the story. It's going enterprise: that's managed services, that's professional services, that's basically making things work at scale. This next segment hits that part, and we're gonna talk about it at great length. >>With one of our alumni. Murli Thirumale is back, VP and GM of Portworx by Pure Storage. Great to have you back, Murli. >>Yeah, absolutely. Delightful to be here. >>So I was looking on the website: number one in Kubernetes storage, three years in a row. >>Yep. >>Awesome. What's Portworx doing here at KubeCon? >>Well, I'll tell you, our engineering crew has been so productive and hard at work that I almost can't decide what to tell you. But what I thought I would do is tell you that we are in the forefront of two major trends in the world of K8s. Right? And the two trends that I see: one is as a service, so that's trend number one. So it's not software eating the world anymore. That's old, old, old news. It's as a service unifying the world. The world wants easy. We all are, you know, subscribers to things like Netflix. We've been using Salesforce or other HR functions. Everything is as a service. 
And in the world of Kubernetes, it's a sign of that maturity that John was talking about as a platform that now as a service is the big trend. And so headline number one, if you will, is that Portworx is leading in the data management world for Kubernetes by going all in on easy, on as a service. So everything we do, we are SaaS-ifying it, right? So if you think about this, most of the people who are consuming Kubernetes are people who are building platforms for their dev users, and their users want self-service. That's one of the advantages of Kubernetes. And the more it is service-ized and made as a service, the more ready to consume it is. And so we are announcing at the show that we have, you know, the basic Kubernetes data management as a service, HA/DR as a service. We have backup as a service, and we have database as a service. So these are the three major components of data, and all of those are being made available as a service. And in fact, we're offering and announcing at the show our backup-as-a-service freemium version, where you can get a terabyte of, you know, stuff to do for Kubernetes, free forever. >>Congratulations on the announcement. Totally in line with what the market wants. Developers want self-serve, and they also want simplicity. By the way, they'll leave if they don't like the service. >>Correct. >>So, you know that. Before we get into some more specifics, I want to ask you about the industry and some of the point solutions you have. It's been two years since the acquisition by Pure Storage. Can you just give an update on how it's gone? Obviously, as a service, you guys are hitting all your marks, and developers love it. Storage is a big part of the game right now in these environments. What's the update, two years post-acquisition? You had a great offering in Portworx. >>Yeah. 
So look, John, you're a veteran of the industry and have seen lots of acquisitions, right? And I've been acquired twice before myself. So, you know, there are always best practices and poor practices in terms of acquisitions, and I'm really delighted to say I think this acquisition has had some of the best practices. Let me just name a couple of them, right? One of them is just cultural fit. Cultural fit is great. It's not just entrepreneurs. Everybody loves to work in a place they enjoy, with people that they thrive with when they interact. And so the cultural fit with Pure is fantastic. The other one is that the strategic intent that Pure had when they acquired us is still true. And so that goes a long way, you know, in terms of an investment profile, in terms of the ability to leverage assets within the company. So Pure had disrupted the world of storage using flash, and they wanted to disrupt higher up the stack using Kubernetes. And that's been our role inside their strategy. And it's still true. >>So culture, strategic intent, product-market fit as well. You weren't just an asset bought for customers, or an acquisition where they then let the founders go on to their next thing. You are part of their growth play. >>Absolutely. Right. The beauty of the product-market fit, and let's talk about the market, is that we have always been focused on the Global 2K, and that is at the heart of, you know, Pure's 10,000-strong customer base, right? They have a very strong presence in the Global 2K. And we allow them to go to those same folks with the offering. >>So, SaaS-ifying everything that you do: what's in it for me as a business? Whether I'm a financial services organization, a hospital, or a retailer, what's in it for me as a customer? >>Yeah. So the what's in it for me is two things. 
It's speed and ease of use, which in a way are related. But, you know, one is that when something is provided as a service, it's much more consumable. It's instantly ready. It's like instant oatmeal, right? You just add hot water and it's there. So the world of IT has moved from owning large data centers, right, which used to be the model 25 years ago, and running those data centers better than everybody else, to "let me just consume a data center in the form of a cloud," right? So, SaaS-ifying the cloud part of the data center. Now people are saying, "Well, I expect that for software and services, and I don't want it just from the public cloud, I want it from my own IT department." This is old news. And so the big news here is how fast Kubernetes has moved everything. You know, you take a lot of these changes; Kubernetes is a poster child for things happening faster than the last wave. And in the last couple of years I would say that the as-a-service model has really thrived in the world of Kubernetes. Developers want to be able to get it fast. And the second thing is they wanna be able to operate it fast. Self-service is the other benefit. So speed and self-service are both benefits of this. >>Yeah. And the thing that's come up clearly on theCUBE, and this is gonna be part of the headlines, we'll probably end up getting a lot of highlights from this, I'm telling my team to make a note of it, is that developers are gonna be the business. If you take digital transformation to its conclusion, they're not a department that serves the business, they are the business. >>Exactly. >>That means they have to be more productive. So developer productivity has been the top story. Security as a service, all these things, these are examples of making developers more productive. But one of the things that came up, and I wanna get your reaction to this, 
is that when you have disruption, and in storage you know what disruption means, 'cause there's been a whole discussion around disruptive operations. When storage goes down, you have backup, DR, and failover. If there's a disruption, that changes the nature of invisible infrastructure. Developers want invisible infrastructure. That's the future steady state. So if there's a disruption in storage, it can't affect the productivity and the toolchains and the workflows of developers. Right? So how do you guys look at that? 'Cause you're a critical component. Storage as a service is a huge thing. Storage has to work seamlessly. And let's keep the developers out of the weeds. >>John, I think what you put your finger on is another huge trend in the world of Kubernetes, where we're at KubeCon after all, which is really where all the leading practitioners come and where the leading vendors are. So here's the second trend that we are leading, and actually I think it's happening not just with us, but with other folks in the industry. And that is, you know, the world of DevOps. DevOps has been such a catchphrase for all of us in the industry over the last five years. And it's been a combination of cultural change as well as technology change. Here's what the latest is in the world of DevOps. DevOps is now crystallized. It's not some kind of mysterious art form that you read about. How people are practicing DevOps is now broken into two things. There is the platform part. So DevOps is now a bunch of platforms. And the other part of DevOps is a bunch of practices. So a little bit on both of these. On the platforms, in the world of K8s there are only three platforms, right? There are the orchestration platforms, you know, the EKSs, the OpenShifts of the world, and so on. There are the data management platforms, people like Portworx. And the third is security platforms, right? 
You know, Palo Alto Networks, Aqua, and others are all in this. So these are the three platforms, and there are platform engineering teams now at many of our largest customers. Some of the largest banks, the largest service providers, they're all operating a K8s platform engineering team. And now developers, to your point, are in the practice of being able to use these platforms to launch new services. So the actual IT ops are run by developers now, and they can do it on these platforms. And the platform engineering team provides that with ease of use, and they're there to troubleshoot when problems happen. So the idea of DevOps as an ops practice and a platform is the newest thing. And Portworx and Pure Storage are leading in the world of data management platforms there. >>Talk about a customer example that you think really articulates the value that Portworx and Pure Storage deliver from a data management perspective. >>Yeah, so there are so many examples. One of the longest-running examples we have is a very, very large service provider that, you know, you all know and probably use, and they have been using us in the cable set-top box, or cable box, business. They get streams of data from cable boxes all over the world. They collect it all in one large centralized place and run elastic search and analytics on it. Now, what they found is that they couldn't keep up with this at the scale and the depth, right? The speed of activity and the distributed nature of the activity. The only way to solve this was to use something like Kubernetes, managed with Spark, bringing all the data into deep, deep, deep silos of storage, which are all running not even on a SAN, but on, you know, very deep terabytes and terabytes of storage. So all of this is orchestrated with the help of Portworx, and there's a platform engineering team. 
We are building that platform for them, with some of these other components, which allows them to do analytics and make some changes in real time. A huge setup for that. >>Yeah. Well, you guys have the right architecture. I love the vision. I love what you guys are doing. I think this is right in line with Pure. They've always been disruptors. I remember when we first interviewed the CEO when they started. They stayed on path. They didn't waver. EMC was the big player. Pure ended up taking their lunch and dinner as well, and beat them in the marketplace. But now you've got this traction here. So I have to ask you, how's the business? What do the results look like? You're GM of a cloud-native business unit of a storage company that's transformed and transforming. >>Yeah, you know, it's interesting, we just hit the two-year anniversary, right, John? And so what we did was just step back. You know, we're running so hard, you just take a step back. And we've tripled the business in the two years since the acquisition, compared with the two years before, and we were growing even then. So, you know, that's quite a feat. And we've tripled the number of people. The amount of engineering investment and the number of go-to-market investments have been awesome. So business is going really well, I will say. But I think, you know, we're watching the market closely. You know, as a former CEO, you have to learn to read the tea leaves when you invest. And what I would say is we're proceeding with caution in the next two quarters. I view business transformation as not a cancelable activity. So that's the good news, right? Our customers are large. >>It's never gonna stop. >>Right, never gonna stop. All they're gonna do is put their hand on the dial. Their hand was always going right on the dial. 
Now they're putting their hand on the dial going, "Hey, what is happening?" But my own sense of this is that people will continue to invest through it. The question is at what level. And I also think that this is a six-month kind of watch, to watch where we put the dial. So Q4 and Q1, I think, are, you know, when we watch the market signs. But I have the highest confidence. >>What does your gut tell you? You're an entrepreneur. >>My gut says that we'll go through a little bit of a cautious investment period in the next six months. And after that I think we're gonna be back full in the crazy growth that we've always had. We're gonna grow, by the way, in the next... >>I think it's corn style. I'm more bullish. I think there's gonna be some, you know, weeding out of some overinvestment, pre-COVID or pre-bubble. But I think tech's gonna continue to grow. I don't see it stopping. >>Yeah. And the investment is gonna be in these core platforms. See, back to the platform story, it's gonna be in these lower platforms and in unifying everything. "Let's consume it better," rather than "let's go experiment with a whole bunch of things all over the map," right? So you'll see less experimentation and more "let's harvest some of the investments we've made in the last couple of years." >>And actually be able to enable companies in the industry to truly be data companies. >>Absolutely. >>Because, as we talked about with as a service, we all have these expectations that any service we want, we can get it. There's no delay, because patience has gone since the pandemic. >>So it is, you know, tightening up the screws on what they've built, adding some polish to it, adding some more capability. Like I said, a combination of harvesting and new investing is, I think, what we're gonna see. >>Yeah. 
What are some of the things that you're looking forward to? You talked about some of the growth and the investment, but as we round out Q4 and head into a new year, what are you excited about? >>Yeah, so, you know, I mentioned our as-a-service platform. The Global 2K for us has been a set of customers who we co-create stuff with. And so one of the other things that we are very excited about and announcing is that, because we're deployed at scale, we have upgraded our backend. So we now have the ability to go to a million IOPS and more, for the right backends. And so Kubernetes is an add-on, which will not slow down your core base infrastructure. The second thing is that we have added a bunch of capability on the disaster recovery and business continuity front. You know, we always had metro-distance DR. We had long-distance DR. We've added near-sync DR. So now we can provide disaster recovery and business continuity for metro distances, across continents, and across the planet. Right? That's kind of a major change that we've made. The third thing is we've added the capability for file, block, and object. So now by adding object, we're really a complete solution. So it is really that maturity of the business that you start seeing as enterprises move to embracing a platform approach, deploying it much more widely. You talked about the early majority. Right. And so what they require is more enterprise-class capability, and those are all the things that we've been adding, and we're really looking forward to it. >>Well, it sounds like tremendous evolution and maturation of Portworx in the two years since it's been with Pure Storage. You talked about the cultural alignment, great stuff that you are achieving. Congratulations on that. >>Great stuff ahead, and having fun. Let's not forget that. Life's too short. >>It is. You're right. Thank you. 
We will definitely, as always on theCUBE, keep our eyes on this space. Murli, it's been great to have you back on the program. Thank you for joining, John. >>Great. Thank you so much. It's a pleasure. >>For our guest and John Furrier, I'm Lisa Martin, here live in Detroit with theCUBE at KubeCon + CloudNativeCon '22. We'll be back after a short break.

Published Date : Oct 27 2022

SUMMARY :

So far we're talking all things, Well, not all things Kubernetes so much more than that. crossing over, it's crossing the CADs and with that you're seeing security concerns. Great to have you back really? Delightful to So I was looking on the website, number one in Kubernetes storage. And in the world of Kubernetes, it's a sign of that maturity that and made as a service, the more ready to consume it is. Storage a big part of the game right now as well as these environments. And so the cultural You were, you weren't just an asset for customers that is at the heart of, you know, purest 10,000 strong customer base, So satisfying everything that you do. So satisfying the cloud part of the data center. And in the last couple of years I would say that disruption and, and the storage vision, you know what disruption it means. And let's keep the developers out So here's the second trend that we are leading and, And the platform engineering team provide that as an ease of use and they're there to troubleshoot Talk about a customer example that you think really articulates the value that Port Works and Pure Storage The speed of, of activity and the distributed nature of the activity. I love the vision. And so what we did was just kind of like step back and hey to, you know, But I have the highest confidence. full in the crazy growth that we've always been. I think it's gonna be some, you know, weeding out of some overinvestment, experimentation and more kind of, let's harvest some of the investments we've made in the last couple in the industry to truly be data companies because absolutely. So it is kind of, you know, tightening up the screws on what they've the growth things in the investment, but as we round out Q4 and head into a new year, what are you excited about? of capability in the disaster recovery business continuity front, you know, You talked about the cultural alignment, Great stuff that you are achieving. 
Let's not forget that that's too life's too short to do. it's been great to have you back on the program. Thank you so much. For our guests and John Furrier, Lisa Martin here live in Detroit with the cube about Cob Con Cloud

ENTITIES

Entity	Category	Confidence
John	PERSON	0.99+
John Furrier	PERSON	0.99+
Lisa Martin	PERSON	0.99+
Detroit	LOCATION	0.99+
twice	QUANTITY	0.99+
Molly	PERSON	0.99+
One	QUANTITY	0.99+
six month	QUANTITY	0.99+
two	QUANTITY	0.99+
yesterday	DATE	0.99+
DevOps	TITLE	0.99+
two things	QUANTITY	0.99+
Three years	QUANTITY	0.99+
Palo Alto Networks	ORGANIZATION	0.99+
Port Work	ORGANIZATION	0.99+
Murli Thirumale	PERSON	0.99+
10,000	QUANTITY	0.99+
second trend	QUANTITY	0.99+
Pure Storage	ORGANIZATION	0.99+
Coworks	ORGANIZATION	0.99+
both	QUANTITY	0.99+
third	QUANTITY	0.99+
Pure	ORGANIZATION	0.99+
EMC	ORGANIZATION	0.98+
two years	QUANTITY	0.98+
third thing	QUANTITY	0.98+
one	QUANTITY	0.98+
three platforms	QUANTITY	0.98+
Half a day	QUANTITY	0.98+
Netflix	ORGANIZATION	0.98+
first	QUANTITY	0.98+
second thing	QUANTITY	0.98+
global two k	ORGANIZATION	0.97+
Kubernetes	TITLE	0.97+
25 years ago	DATE	0.97+
pandemic	EVENT	0.97+
global two k.	ORGANIZATION	0.96+
Spark	TITLE	0.96+
two trends	QUANTITY	0.96+
Second thing	QUANTITY	0.95+
two things	QUANTITY	0.94+
Port Works	ORGANIZATION	0.94+
Aqua	ORGANIZATION	0.94+
three major components	QUANTITY	0.93+
last five years	DATE	0.92+
both benefits	QUANTITY	0.92+
Pures	ORGANIZATION	0.91+
Con North America	ORGANIZATION	0.9+
Con Cloud	ORGANIZATION	0.9+
Con	EVENT	0.89+
two years	DATE	0.89+
22	DATE	0.89+
two K	QUANTITY	0.88+
day two	QUANTITY	0.88+
two year anniversary	QUANTITY	0.87+
Coan Cloud Native	ORGANIZATION	0.85+
two major trends	QUANTITY	0.84+
today	DATE	0.84+
last couple of years	DATE	0.82+
Mur. Meley	PERSON	0.82+
GM	ORGANIZATION	0.82+
q1	DATE	0.79+
Kubernetes	ORGANIZATION	0.79+
a terabyte	QUANTITY	0.78+
next six months	DATE	0.77+

Felix Van de Maele, Collibra | Data Citizens '22

(upbeat music) >> Last year, the Cube covered Data Citizens, Collibra's customer event. And the premise that we put forth prior to that event was that despite all the innovation that's gone on over the last decade or more with data, you know, starting with the Hadoop movement. We had data lakes, we had Spark, the ascendancy of programming languages like Python, the introduction of frameworks like TensorFlow, the rise of AI, low code, no code, et cetera. Businesses still find it's too difficult to get more value from their data initiatives. And we said at the time, you know, maybe it's time to rethink data innovation. While a lot of the effort has been focused on more efficiently storing and processing data, perhaps more energy needs to go into thinking about the people and the process side of the equation, meaning making it easier for domain experts to both gain insights from data, trust the data, and begin to use that data in new ways, fueling data products, monetization, and insights. Data Citizens 2022 is back, and we're pleased to have Felix Van de Maele, who is the founder and CEO of Collibra. He's on the Cube. We're excited to have you, Felix. Good to see you again. >> Likewise Dave. Thanks for having me again. >> You bet. All right, we're going to get the update from Felix on the current data landscape, how he sees it, why data intelligence is more important now than ever, and get current on what Collibra has been up to over the past year, and what's changed since Data Citizens 2021. And we may even touch on some of the product news. So Felix, we're living in a very different world today with businesses and consumers. They're struggling with things like supply chains, uncertain economic trends, and we're not just snapping back to the 2010s. That's clear. And that's really true, as well, in the world of data. So what's different in your mind in the data landscape of the 2020s from the previous decade, and what challenges does that bring for your customers? 
>> Yeah, absolutely. And I think you said it well, Dave, in the intro that rising complexity and fragmentation in the broader data landscape that hasn't gotten any better over the last couple of years. When we talk to our customers, that level of fragmentation, the complexity, how do we find data that we can trust, that we know we can use, has only gotten kind of more difficult. So that trend is continuing. I think what is changing is that trend has become much more acute. Well, the other thing we've seen over the last couple of years is that the level of scrutiny that organizations are under with respect to data, as data becomes more mission critical, as data becomes more impactful and important, the level of scrutiny with respect to privacy, security, regulatory compliance, is only increasing as well. Which again, is really difficult in this environment of continuous innovation, continuous change, continuous growing complexity and fragmentation. So it's become much more acute. And to your earlier point, we do live in a different world, and the past couple of years, we could probably just kind of brute force it, right? We could focus on the top line. There was enough kind of investments to be had. I think nowadays organizations are focused, or are in a very different environment where there's much more focus on cost control, productivity, efficiency. How do we truly get value from that data? So again, I think it's just another incentive for organizations to now truly look at that data and to scale that data, not just from a technology and infrastructure perspective, but how do we actually scale data from an organizational perspective, right? Like you said, the people and process, how do we do that at scale? And that's only becoming much more important. And we do believe that the economic environment that we find ourselves in today is going to be a catalyst for organizations to really take that more seriously if you will than they maybe have in the past. 
>> You know, I don't know when you guys founded Collibra, if you had a sense as to how complicated it was going to get, but you've been on a mission to really address these problems from the beginning. How would you describe your mission, and what are you doing to address these challenges? >> Yeah, absolutely. We started Collibra in 2008. So in some sense in the last kind of financial crisis. And that was really the start of Collibra, where we found product market fit working with large financial institutions to help them cope with the increasing compliance requirements that they were faced with because of the financial crisis, and kind of here we are again in a very different environment of course, almost 15 years later. But data is only becoming more important. And our mission to deliver trusted data for every user, every use case, and across every source, frankly has only become more important. So while it's been an incredible journey over the last 14, 15 years, I think we're still relatively early in our mission to, again, be able to provide everyone, and that's why we call it Data Citizens. We truly believe that everyone in the organization should be able to use trusted data in an easy, easy manner. That mission is only becoming more important, more relevant. We definitely have a lot more work ahead of us because we're still relatively early in that journey. >> Well, that's interesting because, you know, in my observation, it takes seven to 10 years to actually build a company, and then the fact that you're still in the early days is kind of interesting. I mean, Collibra's had a good 12 months or so since we last spoke at Data Citizens. Give us the latest update on your business. What do people need to know about your current momentum? >> Yeah, absolutely. 
Again, there are a lot of tailwinds. Organizations are only maturing their data practices, and we've seen that influence a lot of our business growth, with broader adoption of the platform. We work at some of the largest organizations in the world, whether it's Adobe, Heineken, Bank of America, and many more. We have now over 600 enterprise customers, all industry leaders and in every single vertical. So it's really exciting to see that and continue to partner with those organizations. On the partnership side, again, a lot of momentum in the market with some of the cloud partners like Google, Amazon, Snowflake, Databricks, and others, right? Those new modern data infrastructures, modern data architectures, are definitely all moving to the cloud. That's a great opportunity for us, our partners, and of course our customers, to help them transition to the cloud even faster. And so we see a lot of excitement and momentum there. We did an acquisition about 18 months ago around data quality, data observability, which we believe is an enormous opportunity. Of course data quality isn't new, but I think there's a lot of reasons why we're so excited about quality and observability now. One is around leveraging AI, machine learning, again to drive more automation. And the second is that those data pipelines that are now being created in the cloud, in these modern data architectures, they've become mission critical. They've become real time. And so monitoring, observing those data pipelines continuously has become absolutely critical. So we're really excited about that as well. And on the organizational side, I'm sure you've heard the term data mesh, something that's gaining a lot of momentum, rightfully so. It's really the type of governance that we always believed in. Federated, focused on domains, giving a lot of ownership to different teams. 
I think that's the way to scale the data organizations, and so that aligns really well with our vision, and from a product perspective, we've seen a lot of momentum with our customers there as well. >> Yeah, you know, a couple things there. I mean, the acquisition of OwlDQ, you know, Kirk Haslbeck and their team, it's interesting, you know, the whole data quality used to be this back office function and really confined to highly regulated industries. It's come to the front office, it's top of mind for chief data officers, data mesh, you mentioned. You guys are a connective tissue for all these different nodes on the data mesh. That's key. And of course we see you at all the shows. You're a critical part of many ecosystems, and you're developing your own ecosystem. So let's chat a little bit about the products. We're going to go deeper into products later on at Data Citizens '22, but we know you're debuting some new innovations, you know, whether it's, you know, the under the covers in security, sort of making data more accessible for people, just dealing with workflows and processes as you talked about earlier. Tell us a little bit about what you're introducing. >> Yeah, absolutely. We're super excited, a ton of innovation. And if we think about the big theme, and like I said, we're still relatively early in this journey towards kind of that mission of data intelligence, that really bold and compelling mission. Either customers are just starting on that journey, and we want to make it as easy as possible for the organization to actually get started, because we know that's important that they do. And for our organization and customers that have been with us for some time, there's still a tremendous amount of opportunity to kind of expand the platform further. And again, to make it easier for, really to accomplish that mission and vision around that data citizen that everyone has access to trustworthy data in a very easy, easy way. 
So that's really the theme of a lot of the innovation that we're driving, a lot of kind of ease of adoption, ease of use, but also then, how do we make sure that as Collibra becomes this kind of mission critical enterprise platform, from a security, performance, architecture, scale, and supportability perspective, that we're truly able to deliver that kind of an enterprise mission critical platform. And so that's the big theme. From an innovation perspective, from a product perspective, a lot of new innovation that we're really excited about. A couple of highlights. One is around data marketplace. Again, a lot of our customers have plans in that direction. How do we make it easy? How do we make available a true kind of shopping experience so that anybody in your organization can, in a very easy search first way, find the right data product, find the right data set that they can then consume. Usage analytics: how do we help organizations drive adoption, tell them where they're working really well, and where they have opportunities. Home pages, again, to make things easy for people, for anyone in your organization, to kind of get started with Collibra. You mentioned workflow designer, again, we have a very powerful enterprise platform. One of our key differentiators is the ability to really drive a lot of automation through workflows. And now we provided a new low code, no code, kind of workflow designer experience. So really customers can take it to the next level. There's a lot more new product around Collibra Protect, which is in partnership with Snowflake, which has been a strategic investor in Collibra, focused on how do we make access governance easier? How are we able to make sure that as you move to the cloud, things like access management, masking around sensitive data, PII data, is managed in a much more effective way. Really excited about that product. There's more around data quality. Again, how do we get that deployed as easily and quickly and widely as we can? 
Moving that to the cloud has been a big part of our strategy. So we launched our data quality cloud product as well as making use of those native compute capabilities in platforms like Snowflake, Databricks, Google, Amazon, and others. And so we are bettering a capability that we call push down. So we're actually pushing down the compute and data quality, the monitoring, into the underlying platform, which again, from a scale, performance, and ease of use perspective is going to make a massive difference. And then more broadly, we talked a little bit about the ecosystem. Again, integrations that we talk about, being able to connect to every source. Integrations are absolutely critical, and we're really excited to deliver new integrations with Snowflake, Azure, and Google Cloud Storage as well. So there's a lot coming out. The team has been at work really hard, and we are really, really excited about what we're bringing to market. >> Yeah, a lot going on there. I wonder if you could give us your closing thoughts. I mean, you talked about the marketplace, you know, you think about data mesh, you think of data as product, one of the key principles. You think about monetization. This is really different than what we've been used to in data, which is just getting the technology to work has been so hard, so how do you see sort of the future? And, you know, give us your closing thoughts please. >> Yeah, absolutely. And I think we're really at this pivotal moment, and I think you said it well. We all know the constraint and the challenges with data, how to actually do data at scale. And while we've seen a ton of innovation on the infrastructure side, we fundamentally believe that just getting a faster database is important, but it's not going to fully solve the challenges and truly kind of deliver on the opportunity. And that's why now is really the time to deliver this data intelligence vision, the data intelligence platform. 
We are still early, making it as easy as we can. That's kind of our mission. And so I'm really, really excited to see how the markets are going to evolve over the next few quarters and years. I think the trend is clearly there, when we talk about data mesh, this kind of federated approach, focus on data products is just another signal that we believe that a lot of our organizations are now at the time, they understand the need to go beyond just the technology, how to really, really think about how to actually scale data as a business function, just like we've done with IT, with HR, with sales and marketing, with finance. That's how we need to think about data. I think now's the time given the economic environment that we are in, much more focus on control, much more focus on productivity, efficiency, and now's the time we need to look beyond just the technology and infrastructure to think of how to scale data, how to manage data at scale. >> Yeah, it's a new era. The next 10 years of data won't be like the last, as I always say. Felix, thanks so much, and good luck in San Diego. I know you're going to crush it out there. >> Thank you Dave. >> Yeah, it's a great spot for an in person event, and of course, the content post event is going to be available at collibra.com, and you can of course catch the Cube coverage at thecube.net, and all the news at siliconangle.com. This is Dave Vellante for the Cube, your leader in enterprise and emerging tech coverage. (light music)

Published Date : Oct 24 2022


ENTITIES

Entity	Category	Confidence
Adobe	ORGANIZATION	0.99+
Heineken	ORGANIZATION	0.99+
Amazon	ORGANIZATION	0.99+
Google	ORGANIZATION	0.99+
Collibra	ORGANIZATION	0.99+
San Diego	LOCATION	0.99+
Dave	PERSON	0.99+
Felix Van de Maele	PERSON	0.99+
Dave Vellante	PERSON	0.99+
Snowflake	ORGANIZATION	0.99+
seven	QUANTITY	0.99+
2008	DATE	0.99+
Felix	PERSON	0.99+
Bank of America	ORGANIZATION	0.99+
2020s	DATE	0.99+
Databricks	ORGANIZATION	0.99+
Python	TITLE	0.99+
2010s	DATE	0.99+
Last year	DATE	0.99+
thecube.net	OTHER	0.99+
Data Citizens	ORGANIZATION	0.99+
12 months	QUANTITY	0.99+
second	QUANTITY	0.99+
siliconangle.com	OTHER	0.99+
One	QUANTITY	0.99+
10 years	QUANTITY	0.99+
OwlDQ	ORGANIZATION	0.98+
Spark	TITLE	0.98+
TensorFlow	TITLE	0.97+
Data Citizens	EVENT	0.97+
today	DATE	0.97+
Kirk Haslbeck	PERSON	0.96+
over 600 enterprise customers	QUANTITY	0.96+
both	QUANTITY	0.96+
Collibra Protect	ORGANIZATION	0.96+
first way	QUANTITY	0.94+
one	QUANTITY	0.93+
last decade	DATE	0.93+
past couple of years	DATE	0.93+
collibra.com	OTHER	0.92+
15 years	QUANTITY	0.88+
about 18 months ago	DATE	0.87+
last couple of years	DATE	0.87+
last couple of years	DATE	0.83+
almost 15 years later	DATE	0.82+
Data	ORGANIZATION	0.81+
previous decade	DATE	0.76+
Data Citizens 2021	ORGANIZATION	0.73+
next 10 years	DATE	0.69+
quarters	DATE	0.67+
last	DATE	0.66+
Data Citizens 2022	ORGANIZATION	0.63+
Google Cloud	ORGANIZATION	0.63+
past year	DATE	0.62+
Storage	TITLE	0.6+
Azure	ORGANIZATION	0.59+
next	DATE	0.58+
case	QUANTITY	0.58+
Cube	ORGANIZATION	0.53+
single vertical	QUANTITY	0.53+
14	QUANTITY	0.46+
Cube	COMMERCIAL_ITEM	0.45+

Ali Ghodsi, Databricks | Supercloud22

(light hearted music) >> Okay, welcome back to Supercloud '22. I'm John Furrier, host of theCUBE. We got Ali Ghodsi here, co-founder and CEO of Databricks. Ali, Great to see you. Thanks for spending your valuable time to come on and talk about Supercloud and the future of all the structural change that's happening in cloud computing. >> My pleasure, thanks for having me. >> Well, first of all, congratulations. We've been talking for many, many years, and I still go back to the video that we have in archive, you talking about cloud. And really, at the beginning of the big reboot, I called the post Hadoop, a revitalization of data. Congratulations, you've been cloud-first, now on multiple clouds. Congratulations to you and your team for achieving what looks like a billion dollars in annualized revenue as reported by the Wall Street Journal, so first, congratulations. >> Thank you so much, appreciate it. >> So I was talking to some young developers and I asked a random poll, what do you think about Databricks? Oh, we love those guys, they're AI and ML-native, and that's their advantage over the competition. So I pressed why. I don't think they knew why, but that's an interesting perspective. This idea of cloud native, AI/ML-native, ML Ops, this has been a big trend and it's continuing. This is a big part of how this change and this structural change is happening. How do you react to that? And how do you see Databricks evolving into this new Supercloud-like multi-cloud environment? >> Yeah, look, I think it's a continuum. It starts with having data, but they want to clean it, you know, and they want to get insights out of it. But then, eventually, you'd like to start asking questions, doing reports, maybe ask questions about what was my revenue yesterday, last week, but soon you want to start using the crystal ball, predictive technology. Okay, but what will my revenue be next week? Next quarter? Who's going to churn? 
And if you can finally automate that completely so that you can act on the predictions, right? So this credit card that got swiped, the AI thinks it's fraud, we're going to deny it. That's when you get real value. So we're trying to help all these organizations move through this data AI maturity curve, all the way to that, the prescriptive, automated AI machine learning. That's when you get real competitive advantage. And you know, we saw that with the FAANGs, right? I mean, Google wouldn't be here today if it wasn't for AI. You know, we'd be using AltaVista or something. We want to help all organizations to be able to leverage data and AI the way that the FAANGs did. >> One of the things we're looking at with supercloud and why we call it supercloud versus other things like multi-cloud is that today a lot of the successful companies that have started in the cloud have been successful, but have realized and even enterprises who have gotten by accident, and maybe have done nothing with cloud have just some cloud projects on multiple clouds. So, people have multiple cloud operational things going on but it hasn't necessarily been a strategy per se. It's been more of kind of a default reaction to things but the ones that are innovating have been successful in one native cloud because the use cases that drove that got scale got value, and then they're making that super by bringing it on premise, putting in a modern data stack, for the modern application development, and kind of dealing with the things that you guys are in the middle of with Databricks is that, that is where the action is, and they don't want to go, lose the trajectory in all the economies of scale. So we're seeing another structural change where the evolutionary nature of the cloud has solved a bunch of use cases, but now other use cases are emerging that's on premises and edge that have been driven by applications because of the developer boom, that's happening. You guys are in the middle of it. 
What is happening with this structural change? Are people looking for the modern data stack? Are they looking for more AI? What's your perspective on this supercloud kind of position? >> Look, it started with, are you on multiple clouds, right? So multi-cloud has been a thing. It became a thing 70, 80% of our customers when you ask them, they're on more than one cloud. But then they soon start realizing that, hey, you know, if I'm on multiple clouds, this data stuff is hard enough as it is. Do I want to redo it again and again with different proprietary technologies, on each of the clouds. And that's when I started thinking about let's standardize this, let's figure out a way which just works across them. That's where I think open source comes in, becomes really important. Hey, can we leverage open standards because then we can make it work in these different environments, as we said so that we can actually go super, as you said, that's one. The second thing is, can we simplify it? You know, and I think today, the data landscape is complicated. Conceptually it's simple. You have data which is essentially customer data that you have, maybe employee data. And you want to get some kind of insights from that. But how you do that is very complicated. You have to buy data warehouse, hire data analysts. You have to buy, store stuff in the Delta Lake you know, get your data engineers. If you want streaming real time thing that's another complete different set of technologies you have to buy. And then you have to stitch all these together, and you have to do it again and again on every cloud. So they just want simplification. So that's why we're big believers in this Delta Lakehouse concept. Which is an open standard to simplify this data stack and help people to just get value out of their data in any environment. So they can do that in this sort of supercloud as you call it. 
>> You know, we've been talking about that in previous interviews, do the heavy lifting let them get the value. I have to ask you about how you see that going forward, Because if I'm a customer, I have a lot of operational challenges. Cause the developers are kicking butt right now. We see that clearly. Open source is growing, and continues to be great. But ops and security teams they really care about this stuff. And most companies don't want to spin up multiple ops teams to deal with different stacks. This is one big problem that I think that's leading into the multi-cloud viability. How do you guys deal with that? How do you talk to customers when they say, I want to have less complications on operations? >> Yeah, you're absolutely right. You know, it's easy for a developer to adopt all these technologies and new things are coming out all the time. The ops teams are the ones that have to make sure this works. Doing that in multiple different environments is super hard, especially when there's a proprietary stack in each environment that's different. So they just want standardization. They want open source, that's super important. We hear that all the time from them. They want open source technologies. They believe in the communities around it. You know, they know that source code is open. So you can also see if there's issues with it. If there's security breaches, those kind of things that they can have a community around it. So they can actually leverage that. So they're the ones that are really pushing this, and we're seeing it across the board. You know, it starts first with the digital natives you know, the companies that are, but slowly it's also now percolating to the other organizations, we're hearing across the board. >> Where are we, Ali on the innovation strategies for customers? Where are they on the trajectory around how they're building out their teams? How are they looking at the open source? 
How are they extending the value proposition of Databricks, and data at scale, as they start to build out their teams and operations, because some are like kind of starting, crawl, walk, run, kind of vibe. Some are big companies, they're dealing with data all the time. Where are they in their journey? What's the core issues that they're solving? What are some of the use cases that you see that are most pressing for customers? >> Yeah, what I've seen, that's really exciting about this Delta Lakehouse concept is that we're now seeing a lot of use cases around real time. So real time fraud detection, real time stock ticker pricing, anyone that's doing trading, they want that to work real time. Lots of use cases around that. Lots of use cases around how do we in real time drive more engagement on our web assets if we're a media company, right? We have all these assets how do we get people to get engaged? Stay on our sites. Continue engaging with the material we have. Those are real time use cases. And the interesting thing is, A, they're real time. So, you know, it's really important that, you know, you don't want to recommend someone, hey, you should go check out this restaurant if they just came from that restaurant half an hour ago. So you want it to be real time, but B, it's also all based on machine learning. A lot of this is trying to predict what you want to see, what you want to do, is it fraudulent? And that's also interesting because basically more and more machine learning is coming in. So that's super exciting to see, the combination of real time and machine learning on the Lakehouse. And finally, I would say the Lakehouse is really important for this because that's where the data is flowing in. If they have to take that data that's flowing into the lake and actually copy it into a separate warehouse, that delays the real time use cases. And then it can't hit those real time deadlines. So that's another catalyst for this Lakehouse pattern. 
>> Would that be an example of how the metrics are changing? Cause I've been looking at some people saying, well you can tell if someone's doing well there's a lot of data being transferred. And then I was saying, well, wait a minute. Data transfer costs money, right? And time. So this is interesting dynamic, in a way you don't want to have a lot of movement, right? >> Yeah, movement actually decreases for a lot of these real time use cases. 'Cause what we saw in the past was that they would run a batch processing to process all the data. So once they process all the data. But actually if you look at the things that have changed since the data that we had yesterday it's actually not that much. So if you can actually incrementally process it in real time, you can actually reduce the cost of transfers and storage and processing. So that's actually a great point. That's also one of the main things that we're seeing with the use cases, the bill shrinks and the cost goes down, and they can process less. >> Yeah, and it'd be interesting to see how those KPIs evolve into industry metrics down the road around the supercloud of evolution. I got to ask you about the open source concept of data platforms. You guys have been a pioneer in there doing great work, kind of picking the baton off where the Hadoop World left off as Dave Vellante always points out. But if working across clouds is super important. How are you guys looking at the ability to work across the different clouds with Databricks? Are you going to build that abstraction yourself? Does data sharing and model sharing kind of come into play there? How do you see this Databricks capability across the clouds? >> Yeah, I mean, let me start by saying, we're just big fans of open source. We think that open source is a force in software. That's going to continue for decades, hundreds of years, and it's going to slowly replace all proprietary code in its way. 
We saw that it could do that with the most advanced technology. Windows, you know proprietary operating system, very complicated, got replaced with Linux. So open source can pretty much do anything. And what we're seeing with the Delta Lakehouse is that slowly the open source community is building a replacement for the proprietary data warehouse, Delta Lake, machine learning, real time stack in open source. And we're excited to be part of it. For us, Delta Lake is a very important project that really helps you standardize how you lay out your data in the cloud. And with it comes a really important protocol called Delta Sharing, that enables you in an open way actually for the first time ever share large data sets between organizations, but it uses an open protocol. So the great thing about that is you don't need to be a Databricks customer. You don't need to even like Databricks, you just need to use this open source project and you can now securely share data sets between organizations across clouds. And it actually does so really efficiently just one copy of the data. So you don't have to copy it if you're within the same cloud. >> So you're playing the long game on open source. >> Absolutely. I mean, this is a force it's going to be there if you deny it, before you know it there's going to be, something like Linux, that is going to be a threat to your propriety. >> I totally agree by the way. I was just talking to somebody the other day and they're like hey, someone made the comment, the software industry is open source. There's no more software industry, it's called open source. It's integrations that become interesting. And I was looking at integrations now is really where the action is. And we had a panel with the Clouderati we called it, the people have been around for a long time. And it was called the innovator's dilemma. And one of the comments was it's the integrator's dilemma, not the innovator's dilemma. 
And this is a big part of this piece of supercloud. Can you share your thoughts on how cloud and integration need to be tightened up to really make it super? >> Actually that's a great point. I think the beauty of this is, look the ecosystem of data today is vast, there's this picture that someone puts together every year of all the different vendors and how they relate, and it gets bigger and bigger and messier and messier. So, we see customers use all kinds of different aspects of what's existing in the ecosystem and they want it to be integrated in whatever you're selling them. And that's where I think the power of open source comes in. Open source, you get integrations that people will do without you having to push it. So us, Databricks as a vendor, we don't have to go tell people please integrate with Databricks. The open source technology that we contribute to, automatically, people are integrating with it. Delta Lake has integrations with lots of different software out there and Databricks as a company doesn't have to push that. So I think open source is also another thing that really helps with the ecosystem integrations. Many of these companies in this data space actually have employees that are full-time dedicated to making sure our software works well with Spark. Make sure our software works well with Delta and they contribute back to that community. And that's the way you get this sort of ecosystem to further sort of flourish. >> Well, I really appreciate your time. And my final question for you is, as we kind of unpack and kind of shape and frame supercloud for the future, how would you see a roadmap or architecture or outcome for companies that are going to clearly be in the cloud where open source is going to be dominating. Integrations have got to be seamless and frictionless. Abstraction layer make things super easy and take away the complexity. What is supercloud to them? What does the outcome look like? 
How would you define a supercloud environment for an enterprise? >> Yeah, for me, it's the simplification that you get where you standardize on open source. You get your data in one place, in one format in one standardized way, and then you can get your insights from it, without having to buy lots of different idiosyncratic proprietary software from different vendors. That's different in each environment. So it's this slow standardization that's happening. And I think it's going to happen faster than we think. And I think in a couple years it's going to be a requirement that, does your software work on all these different departments? Is it based on open source? Is it using this Delta Lakehouse pattern? And if it's not, I think they're going to demand it. >> Yeah, I feel like we're close to some sort of de facto standard coming and you guys are a big part of it, once that clicks in, it's going to highly accelerate in the open, and I think it's going to be super valuable. Ali, thank you so much for your time, and congratulations to you and your team. Like we've been following you guys since the beginning. Remember the early days and look how far it's come. And again, you guys are really making a big difference in making a super cool environment out there. Thanks for coming on and sharing. >> Thank you so much John. >> Okay, this is Supercloud '22. I'm John Furrier, stay with us for more coverage and more commentary after this break. (light hearted music)

Published Date : Aug 7 2022


ENTITIES

Entity	Category	Confidence
Ali Ghodsi	PERSON	0.99+
Dave Vellante	PERSON	0.99+
Google	ORGANIZATION	0.99+
Databricks	ORGANIZATION	0.99+
John	PERSON	0.99+
last week	DATE	0.99+
next week	DATE	0.99+
Ali	PERSON	0.99+
Next quarter	DATE	0.99+
yesterday	DATE	0.99+
John Furrier	PERSON	0.99+
Delta	ORGANIZATION	0.99+
one format	QUANTITY	0.99+
first	QUANTITY	0.99+
today	DATE	0.98+
second thing	QUANTITY	0.98+
one	QUANTITY	0.98+
Linux	TITLE	0.98+
one copy	QUANTITY	0.98+
Delta Lakehouse	ORGANIZATION	0.98+
supercloud 22	ORGANIZATION	0.98+
more than one cloud	QUANTITY	0.98+
each environment	QUANTITY	0.98+
Clouderati	ORGANIZATION	0.98+
Supercloud22	ORGANIZATION	0.98+
hundreds of years	QUANTITY	0.97+
Delta Lake	LOCATION	0.97+
one big problem	QUANTITY	0.97+
70, 80%	QUANTITY	0.97+
Windows	TITLE	0.96+
one place	QUANTITY	0.96+
first time	QUANTITY	0.96+
billion dollars	QUANTITY	0.95+
decades	QUANTITY	0.95+
Delta Lake	ORGANIZATION	0.95+
One	QUANTITY	0.94+
supercloud	ORGANIZATION	0.94+
Supercloud	ORGANIZATION	0.94+
half an hour ago	DATE	0.93+
Delta Lake	TITLE	0.92+
Lakehouse	ORGANIZATION	0.92+
Spark	TITLE	0.91+
each	QUANTITY	0.91+
a minute	QUANTITY	0.85+
one of	QUANTITY	0.73+
one native	QUANTITY	0.72+
supercloud	TITLE	0.7+
couple years	QUANTITY	0.66+
AltaVista	ORGANIZATION	0.65+
Wall Street Journal	ORGANIZATION	0.63+
theCUBE	ORGANIZATION	0.63+
Lakehouse	TITLE	0.51+
Lake	LOCATION	0.46+
Hadoop World	TITLE	0.41+
'22	EVENT	0.24+

Evolution of Data Lakes

(light music) >> Kevin Miller joins us. He's the Vice President and General Manager of Amazon S3. And we're going to discuss the evolution of data lakes. Hey Kevin. >> Hey Dave. Great to be here. >> Yeah, let's riff on this a little bit. Why is S3 so popular for data lakes? How have data lakes on S3 changed and evolved? >> Well, I think a lot of the core benefits of S3 really play directly into what customers are looking for when they're building a data lake, right? They're looking for low cost storage, some place that they can put shared data sets and make it very easy for other teams and businesses to access a set of data as well as have all the management around it. Knowing that the data's secure, is durable, it's protected. And so all of the capability that S3 provides out of the box, is just a really good fit for what customers need out of a data lake storage provider. >> And it's really the simple form. I remember when Schema on Read hit, and people were like, oh great, we can just shove all our stuff into a data lake. And then of course the old bromide, it became a data swamp. But the industry has evolved, hasn't it? It has new tools, machine intelligence and AI, and machine learning have really helped a lot. Talk about how that's changed from the old days if you will, where it was just kind of this mess and you really couldn't do much with it. And why today we're able to get so much more out of data lakes. >> Yeah. I think that original use of data lakes centered a lot around analytics and sort of Hadoop or Spark type applications. And that continues to be a big driver. But I think that one is that we're continuing to expand the kinds of applications. Like you mentioned, machine learning, or other kinds of intelligence, those applications are increasing as things that customers want to do around these shared data sets. And being able to pretty easily sort of dynamically combine data sets together and use that to drive more insight. 
I think that you're absolutely right. You know, if it's left unstructured or left without any kind of governance you can quickly develop a lot of unusable data. And so I think what we're seeing in the evolution is customers putting more of a governance structure in place around it, really trying to understand and catalog the data sets they have. And I think that's going to continue. That's something that we're seeing pretty actively develop right now in terms of knowing what data I have, knowing the essence of metadata around it. As far as how frequently is this data being updated? When is it updated? What are the rules around when I can access it and so forth. As well as around data lake access control, making it very easy to grant an end user, a specific end user, access to certain data sets knowing that they can then audit and really know exactly who has access to what data in that data lake. So you're seeing a lot of that governance type structure come around while not taking away the essence of having a simple, low cost, scalable way to store and then access data from a number of applications. So that's all now starting to really come together, I see. >> I think this is a really important point you're making because I see organizations rethinking their data architecture and their data organizations to really put data in the hands of the lines of business, those with domain expertise and self-service is becoming really important. I see a lot of organizations say, 'Hey we're going to give the lines of business their own data lakes that they can spin up' but, they have to be governed in a federated fashion. I know you guys use this term lake house. How do these things fit together? >> Well, Dave, I think you're absolutely right. I think that what a lot of organizations, what I see a lot of organizations doing is evolving to a point where they want as minimal layers between someone who owns a business outcome. 
Whether it's a top level revenue generation line or bottom level cost line, they want to connect the people who are closest to the business problem with the applications and the technology that they can use to solve it. And a big part of that then is the data and the data sets that are available. So I think where it needs to come together and where it is coming together is around making it very easy to federate, to know what data sources I have, to know what the rules are around accessing it, to remove as much of the friction as we can around just the basics of provisioning access. Knowing that this set of people is allowed to access it. And how do they access it. Just as much as possible removing that, so that it's not weeks between when I have an idea and when I can build an application to process that data. Ideally it's within an hour, I have an idea, I can spin up a notebook. I can pull in the data sets I need. Train an ML algorithm or build some analytics function and then start to see some results and see is this really working or not? And then of course sort of scale it up from there in a seamless fashion. So I think that a lot of the essence of AWS that we've built over the years is really starting to come together. And where we are continuing to make it simpler for customers is all around that federation and the simplicity of provisioning access to the data. >> And share that data across a massive global network. Kevin Miller, thanks so much for coming on theCUBE and talking about data lakes. >> Yeah. Thanks for having me, Dave. >> You're welcome. And thank you for watching. This is Dave Vellante for theCUBE. (light music)

Published Date : Aug 1 2022


Haseeb Budhani, Rafay & Kevin Coleman, AWS | AWS Summit New York 2022

(gentle music) (upbeat music) (crowd chattering) >> Welcome back to The City That Never Sleeps. Lisa Martin and John Furrier in New York City for AWS Summit '22 with about 10 to 12,000 of our friends. And we've got two more friends joining us here today. We're going to be talking with Haseeb Budhani, one of our alumni, co-founder and CEO of Rafay Systems, and Kevin Coleman, senior manager for Go-to Market for EKS at AWS. Guys, thank you so much for joining us today. >> Thank you very much for having us. Excited to be here. >> Isn't it great to be back at an in-person event with 10, 12,000 people? >> Yes. There are a lot of people here. This is packed. >> A lot of energy here. So, Haseeb, we've got to start with you. Your T-shirt says it all. Don't hate k8s. (Kevin giggles) Talk to us about some of the trends, from a Kubernetes perspective, that you're seeing, and then Kevin will give your follow-up. >> Yeah. >> Yeah, absolutely. So, I think the biggest trend I'm seeing on the enterprise side is that enterprises are forming platform organizations to make Kubernetes a practice across the enterprise. So it used to be that a BU would say, "I need Kubernetes. I have some DevOps engineers, let me just do this myself." And the next one would do the same, and then next one would do the same. And that's not practical, long term, for an enterprise. And this is now becoming a consolidated effort, which is, I think it's great. It speaks to the power of Kubernetes, because it's becoming so important to the enterprise. But that also puts a pressure because what the platform team has to solve for now is they have to find this fine line between automation and governance, right? I mean, the developers, you know, they don't really care about governance. Just give me stuff, I need to compute, I'm going to go. But then the platform organization has to think about, how is this going to play for the enterprise across the board? 
So that combination of automation and governance is where we are finding, frankly, a lot of success in making enterprise platform team successful. I think, that's a really new thing to me. It's something that's changed in the last six months, I would say, in the industry. I don't know if, Kevin, if you agree with that or not, but that's what I'm seeing. >> Yeah, definitely agree with that. We see a ton of customers in EKS who are building these new platforms using Kubernetes. The term that we hear a lot of customers use is standardization. So they've got various ways that they're deploying applications, whether it's on-prem or in the cloud and region. And they're really trying to standardize the way they deploy applications. And Kubernetes is really that compute substrate that they're standardizing on. >> Kevin, talk about the relationship with Rafay Systems that you have and why you're here together. And two, second part of that question, why is EKS kicking ass so much? (Haseeb and Kevin laughing) All right, go ahead. First one, your relationship. Second one, EKS is doing pretty well. >> Yep, yep, yep. (Lisa laughing) So yeah, we work closely with Rafay, Rafay, excuse me. A lot of joint customer wins with Haseeb and Co, so they're doing great work with EKS customers and, yeah, love the partnership there. In terms of why EKS is doing so well, a number of reasons, I think. Number one, EKS is vanilla, upstream, open-source Kubernetes. So customers want to use that open-source technology, that open-source Kubernetes, and they come to AWS to get it in a managed offering, right? Kubernetes isn't the easiest thing to self-manage. And so customers, you know, back before EKS launched, they were banging down the door at AWS for us to have a managed Kubernetes offering. And, you know, we launched EKS and there's been a ton of customer adoption since then. 
>> You know, Lisa, when we, theCUBE 12 years, now everyone knows we started in 2010, we used to cover a show called OpenStack. >> I remember that. >> OpenStack Summit. >> What's that now? >> And at the time, at that time, Kubernetes wasn't there. So theCUBE was present at creation. We've been to every KubeCon ever, CNCF then took it over. So we've been watching it from the beginning. >> Right. >> And it reminds me of the same trend we saw with MapReduce and Hadoop. Very big promise, everyone loved it, but it was hard, very difficult. In Hadoop's case, big data, it ended up becoming a data lake. Now you got Spark, or Snowflake, and Databricks, and Redshift. Here, Kubernetes has not yet been taken over. But, instead, it's being abstracted away and/or managed services are emerging. 'Cause general enterprises can't hire enough Kubernetes people. >> Yep. >> There's not that many out there yet. So there's the training issue. But there's been the rise of managed services. >> Yep. >> Can you guys comment on what your thoughts are relative to that trend of hard to use, abstracting away the complexity, and, specifically, the managed services? >> Yeah, absolutely. You want to go? >> Yeah, absolutely. I think, look, it's important to not kid ourselves. It is hard. (John laughs) But that doesn't mean it's not practical, right. When Kubernetes is done well, it's a thing of beauty. I mean, we have enough customers at scale, like, you know, it's like a, forget a hockey stick, it's a straight line up, because they just are moving so fast when they have the right platform in place. I think that the mistake that many of us make, and I've made this mistake when we started this company, was trivializing the platform aspect of Kubernetes, right. And a lot of my customers, you know, when they start, they kind of feel like, well, this is not that hard. I can bring this up and running. I just need two people. It'll be fine.
And it's hard to hire, but then, I need two, then I need two more, then I need two, it's a lot, right. I think, the one thing I keep telling, like, when I talk to analysts, I say, "Look, somebody needs to write a book that says, 'Yes, it's hard, but, yes, it can be done, and here's how.'" Let's just be open about what it takes to get there, right. And, I mean, you mentioned OpenStack. I think the beauty of Kubernetes is that because it's such an open system, right, even with the managed offering, companies like Rafay can build really productive businesses on top of this Kubernetes platform because it's an open system. I think that is something that was not true with OpenStack. I've spent time with OpenStack also, I remember how it is. >> Well, Amazon had a lot to do with stalling the momentum of OpenStack, but your point about difficulty. Hadoop was always difficult to maintain and hire against. There were no managed services and no one saw that value of big data yet. Here with Kubernetes, people are living a problem called, I'm scaling up. >> Yep. >> And so it sounds like it's a foundational challenge. The ongoing stuff sounds easier or manageable. >> Once you have the right tooling. >> Is that true? >> Yeah, no, I mean, once you have the right tooling, it's great. I think, look, I mean, you and I have talked about this before, I mean, the thesis behind Rafay is that, you know, there's like 8, 12 things that need to be done right for Kubernetes to work well, right. And my whole thesis was, I don't want my customer to buy 10, 12, 15 products. I want them to buy one platform, right. And I truly believe that, in our market, similar to what vCenter, like what VMware's vCenter did for VMs, I want to do that for Kubernetes, right. And the reason why I say that is because, see, vCenter is not about hypervisors, right?
vCenter is about hypervisor, access, networking, storage, all of the things, like multitenancy, all the things that you need to run an enterprise-grade VM environment. What is that equivalent for the Kubernetes world, right? So what we are doing at Rafay is truly building a vCenter, but for Kubernetes, like a kCenter. I've tried getting the domain. I couldn't get it. (Kevin laughs) >> Well, after the Broadcom view, you don't know what's going to happen. >> Ehh. (John laughs) >> I won't go there! >> Yeah. Yeah, let's not go there today. >> Kevin, EKS, I've heard people say to me, "Love EKS. Just add serverless, that's a home run." There's been a relationship with EKS and some of the other Amazon tools. Can you comment on what you're seeing as the most popular interactions among the services at AWS? >> Yeah, and was your comment there, add serverless? >> Add serverless with EKS at the edge- >> Yeah. >> and things are kind of interesting. >> I mean, so, one of the serverless offerings we have today is actually Fargate. So you can use Fargate, which is our serverless compute offering, or one of our serverless compute offerings, with EKS. And so customers love that. Effectively, they get the beauty of EKS and the Kubernetes API but they don't have to manage nodes. So that's, you know, a good amount of adoption with Fargate as well. But then, we also have other ways that they can manage their nodes. We have managed node groups as well, in addition to self-managed nodes also. So there's a variety of options that customers can use from a compute perspective with EKS. And you'll continue to see us evolve the portfolio as well. >> Can you share, Haseeb, can you share a customer example, a joint customer example that you think really articulates the value of what Rafay and AWS are doing together? >> Yeah, absolutely. In fact, we announced a customer very recently on this very show, which is MoneyGram, which is a joint AWS and Rafay customer.
Look, we have enough, you know, the thing about these massive customers is that, you know, not everybody's going to give us their logo to use. >> Right. >> But MoneyGram has been a Rafay plus EKS customer for a very, very long time. You know, at this point, I think we've earned their trust, and they've allowed us to, kind of say this publicly. But there's enough of these financial services companies who have, you know, standardized on EKS. So it's EKS first, Rafay second, right. They standardized on EKS. And then they looked around and said, "Who can help me platform EKS across my enterprise?" And we've been very lucky. We have some very large financial services, some very large healthcare companies now, who, A, EKS, B, Rafay. I'm not just saying that because my friend Kevin's here, (Lisa laughs) it's actually true. Look, EKS is a brilliant platform. It scales so well, right. I mean, people try it out, relative to other platforms, and it's just a no-brainer, it just scales. You want to build a big enterprise on the backs of a Kubernetes platform. And I'm not saying that's because I'm biased. Like EKS is really, really good. There's a reason why so many companies are choosing it over many other options in the market. >> You're doing a great job of articulating why the theme (Kevin laughs) of the New York City Summit is scale anything. >> Oh, yeah. >> There you go. >> Oh, yeah. >> I did not even know that but I'm speaking the language, right? >> You are. (John laughs) >> Yeah, absolutely. >> One of the things that we're seeing, also, I want to get your thoughts on, guys, is the app modernization trend, right? >> Yep. >> Because unlike other standards that were hard, that didn't have any benefit downstream 'cause they were too hard to get to, here, Kubernetes is feeding into real app for app developer pressure. They got to get cloud-native apps out. It's fairly new in the mainstream enterprise and a lot of hyperscalers have experience. 
So I'm going to ask you guys, what is the key thing that you're enabling with Kubernetes in the cloud-native apps? What is the key value? >> Yeah. >> I think, there's a bifurcation happening in the market. One is the Kubernetes Engine market, which is like EKS, AKS, GKE, right. And then there's the, you know, what, back in the day, we used to call operations and management, right. So the OAM layer for Kubernetes is where there's need, right. People are learning, right. Because, as you said before, the skill isn't there, you know, there's not enough talent available to the market. And that's the opportunity we're seeing. Because to solve for the standardization, the governance, and automation that we talked about earlier, you know, you have to solve for, okay, how do I manage my network? How do I manage my service mesh? How do I do chargebacks? What's my, you know, policy around actual Kubernetes policies? What's my blueprinting strategy? How do I do add-on management? How do I do pipelines for updates of add-ons? How do I upgrade my clusters? And we're not done yet, there's a longer list, right? This is a lot, right? >> Yeah. >> And this is what happens, right. It's just a lot. And really, the companies who understand that plethora of problems that need to be solved and build easy-to-use solutions that enterprises can consume with the right governance automation, I think they're going to be very, very successful here. >> Yeah. >> Because this is a train, right? I mean, this is happening whether, it's not us, it's happening, right? Enterprises are going to keep doing this. >> And open-source is a big driver in all of this. >> Absolutely. >> Absolutely. >> And I'll tag onto that. I mean, you talked about platform engineering earlier. Part of the point of building these platforms on top of Kubernetes is giving developers an easier way to get applications into the cloud. 
So building unique developer experiences that really make it easy for you, as a software developer, to take the code from your laptop, get it out to production as quickly as possible. The question is- >> So is that what you mean, does that tie to your point earlier about that vertical, straight-up value once you've set it up, right? >> Yep. >> Because it's taking the burden off the developers for stopping their productivity. >> Absolutely. >> To go check in, is it configured properly? Is the supply chain software going to be there? Who's managing the services? Who's orchestrating the nodes? >> Yep. >> Is that automated, is that where you guys see the value? >> That's a lot of what we see, yeah. In terms of how these companies are building these platforms, is taking all the component pieces that Haseeb was talking about and really putting it into a cohesive whole. And then, you, as a software developer, you don't have to worry about configuring all of those things. You don't have to worry about security policy, governance, how your app is going to be exposed to the internet. >> It sounds like infrastructure as code. >> (laughs) Yeah. >> Come on, like. >> (laughs) Infrastructure as code is a big piece of it, for sure, for sure. >> Yeah, look, infrastructure as code actually- >> Infrastructure, sec as code too, the security. >> Yeah. >> Huge. >> Well, it all goes together. Like, we talk about developer self-service, right? The way we enable developer self-service is by teaching developers, here's a snippet of code that you write and you check it in and your infrastructure will just magically be created. >> Yep. >> But not automatically. It's going to go through a check, like a check through the platform team. These are the workflows that if you get them right, developers don't care, right. All developers want is, I want compute. But then all these 20 things need to happen in the back.
That's what, if you nail it, right, I mean, I keep trying to kind of pitch the company, I don't want to do that today. But if you nail that, >> I'll give you a plug at the end. >> you have a good story. >> But I got to, I just have a tangent question 'cause you reminded me. There's two types of developers that have emerged, right. You have the software developer that wants infrastructure as code. I just want to write my code, I don't want to stop. I want to build in shift-left for security, shift-right for data. All that's in there. >> Right. >> I'm coding away, I love coding. Then you've got the under-the-hood person. >> Yes. >> I've been to the engines. >> Certainly. >> So that's more of an SRE, data engineer, I'm wiring services together. >> Yeah. >> A lot of people are like, they don't know who they are yet. They're in college or they're transforming from an IT job. They're trying to figure out who they are. So question is, how do you tell a person that's watching, like, who am I? Like, should I be just coding? But I love the tech. Would you guys have any advice there? >> You know, I don't know if I have any guidance in terms of telling people who they are. (all laughing) I mean, I think about it in terms of a spectrum and this is what we hear from customers, is some customers want to shift as much responsibility onto the software teams to manage their infrastructure as well. And then some want to shift it all the way over to the very centralized model. And, you know, we see everything in between as well with our EKS customer base. But, yeah, I'm not sure if I have any direct guidance for people. >> Let's see, any wisdom? >> Aside from experiment. >> If you're coding more, you're a coder. If you like to play with the hardware, >> Yeah. >> or the gears. >> Look, I think it's really important for managers to understand that developers, yes, they have a job, you have to write code, right. But they also want to learn new things. It's only fair, right. >> Oh, yeah.
>> So what we see is, developers want to learn. And we enable for them to understand Kubernetes in small pieces, like small steps, right. And that is really, really important because if we completely abstract things away, like Kubernetes, from them, it's not good for them, right. It's good for their careers also, right. It's good for them to learn these things. This is going to be with us for the next 15, 20 years. Everybody should learn it. But I want to learn it because I want to learn, not because this is part of my job, and that's the distinction, right. I don't want this to become my job because I want, I want to write my code. >> Do what you love. If you're more attracted to understanding how automation works, and robotics, or making things scale, you might be under-the-hood. >> Yeah. >> Yeah, look under the hood all day long. But then, in terms of, like, who keeps the lights on for the cluster, for example. >> All right, see- >> That's the job. >> He makes a lot of value. Now you know who you are. Ask these guys. (Lisa laughing) Congratulations on your success on EKS 2. >> Yeah, thank you. >> Quick, give a plug for the company. I know you guys are growing. I want to give you a minute to share to the audience a plug that's going to be, what are you guys doing? You're hiring? How many employees? Funding? Customer new wins? Take a minute to give a plug. >> Absolutely. And look, I come see, John, I think, every show you guys are doing a summit or a KubeCon, I'm here. (John laughing) And every time we come, we talk about new customers. Look, platform teams at enterprises seem to love Rafay because it helps them build that, well, Kubernetes platform that we've talked about on the show today. 
I think, many large enterprises on the financial service side, healthcare side, digital native side seem to have recognized that running Kubernetes at scale, or even starting with Kubernetes in the early days, getting it right with the right standards, that takes time, that takes effort. And that's where Rafay is a great partner. We provide a great SaaS offering, which you can have up and running very, very quickly. Of course, we love EKS. We work with our friends at AWS. But also works with Azure, we have enough customers in Azure. It also runs in Google. We have enough customers at Google. And it runs on-premises with OpenShift or with EKS A, right, whichever option you want to take. But in terms of that standardization and governance and automation for your developers to move fast, there's no better product in the market right now when it comes to Kubernetes platforms than Rafay. >> Kevin, while we're here, why don't you plug EKS too, come on. >> Yeah, absolutely, why not? (group laughing) So yes, of course. EKS is AWS's managed Kubernetes offering. It's the largest managed Kubernetes service in the world. We help customers who want to adopt Kubernetes and adopt it wherever they want to run Kubernetes, whether it's in region or whether it's on the edge with EKS A or running Kubernetes on Outposts and the evolving portfolio of EKS services as well. We see customers running extremely high-scale Kubernetes clusters, excuse me, and we're here to support them as well. So yeah, that's the managed Kubernetes offering. >> And I'll give the plug for theCUBE, we'll be at KubeCon in Detroit this year. (Lisa laughing) Lisa, look, we're giving a plug to everybody. Come on. >> We're plugging everybody. Well, as we get to plugs, I think, Haseeb, you have a book to write, I think, on Kubernetes. And I think you're wearing the title. >> Well, I do have a book to write, but I'm one of those people who does everything at the very end, so I will never get it right. 
(group laughing) So if you want to work on it with me, I have some great ideas. >> Ghostwriter. >> Sure! >> But I'm lazy. (Kevin chuckles) >> Ooh. >> So we got to figure something out. >> Somehow I doubt you're lazy. (group laughs) >> No entrepreneur's lazy, I know that. >> Right? >> You're being humble. >> He is. So Haseeb, Kevin, thank you so much for joining John and me today, >> Thank you. >> talking about what you guys are doing at Rafay with EKS, the power, why you shouldn't hate k8s. We appreciate your insights and your time. >> Thank you as well. >> Yeah, thank you very much for having us. >> Our pleasure. >> Thank you. >> We appreciate it. With John Furrier, I'm Lisa Martin. You're watching theCUBE live from New York City at the AWS NYC Summit. John and I will be right back with our next guest, so stick around. (upbeat music) (gentle music)

Published Date : Jul 14 2022


Data Power Panel V3

(upbeat music) >> The stampede to cloud and massive VC investments have led to the emergence of a new generation of object store based data lakes. And with them two important trends, actually three important trends. First, a new category that combines data lakes and data warehouses, aka the lakehouse, has emerged as a leading contender to be the data platform of the future. And this novelty touts the ability to address data engineering, data science, and data warehouse workloads on a single shared data platform. The other major trend we've seen is query engines and broader data fabric virtualization platforms have embraced NextGen data lakes as platforms for SQL centric business intelligence workloads, reducing, or some would even claim eliminating, the need for separate data warehouses. Pretty bold. However, cloud data warehouses have added complementary technologies to bridge the gaps with lakehouses. And the third is that many, if not most, customers are embracing the so-called data fabric or data mesh architectures. They're looking at data lakes as a fundamental component of their strategies, and they're trying to evolve them to be more capable, hence the interest in lakehouse, but at the same time, they don't want to, or can't, abandon their data warehouse estate. As such we see a battle royale is brewing between cloud data warehouses and cloud lakehouses. Is it possible to do it all with one cloud-centric analytical data platform? Well, we're going to find out. My name is Dave Vellante and welcome to the data platforms power panel on theCUBE. Our next episode in a series where we gather some of the industry's top analysts to talk about one of our favorite topics, data. In today's session, we'll discuss trends, emerging options, and the trade-offs of various approaches, and we'll name names. Joining us today are Sanjeev Mohan, who's the principal at SanjMo, Tony Baer, principal at dbInsight.
And Doug Henschen is the vice president and principal analyst at Constellation Research. Guys, welcome back to theCUBE. Great to see you again. >> Thanks guys. Thank you. >> Thank you. >> So it's early June and we're gearing up with two major conferences, there's several database conferences, but two in particular that we're very interested in, Snowflake Summit and Databricks Data and AI Summit. Doug, let's start off with you and then Tony and Sanjeev, if you could kindly weigh in. Where did this all start, Doug? The notion of lakehouse. And let's talk about what exactly we mean by lakehouse. Go ahead. >> Yeah, well you nailed it in your intro. One platform to address BI, data science, data engineering, fewer platforms, less cost, less complexity, very compelling. You can credit Databricks for coining the term lakehouse back in 2020, but it's really a much older idea. You can go back to Cloudera introducing their Impala database in 2012. That was a database on top of Hadoop. And indeed in that last decade, by the middle of that last decade, there were several SQL on Hadoop products, open standards like Apache Drill. And at the same time, the database vendors were trying to respond to this interest in machine learning and data science. So they were adding SQL extensions, the likes of Hudi and Vertica were adding SQL extensions to support the data science. But then later in that decade with the shift to cloud and object storage, you saw the vendors shift to this whole cloud and object storage idea. So you have in the database camp Snowflake introduced Snowpark to try to address the data science needs. They introduced that in 2020 and last year they announced support for Python. You also had Oracle and SAP jump on this lakehouse idea last year, supporting both the lake and warehouse, single vendor, not necessarily quite single platform. Google very recently also jumped on the bandwagon.
And then you also mentioned the SQL engine camp, the Dremios, the Ahanas, the Starbursts, really doing two things, a fabric for distributed access to many data sources, but also very firmly planting that idea that you can just have the lake and we'll help you do the BI workloads on that. And then of course, the data lake camp with the Databricks and Clouderas providing warehouse style deployments on top of their lake platforms. >> Okay, thanks, Doug. I'd be remiss, those of you who know me know that I typically write my own intros. This time my colleagues fed me a lot of that material. So thank you. You guys make it easy. But Tony, give us your thoughts on this intro. >> Right. Well, I very much agree with both of you, which may not make for the most exciting television, in terms of that it has been an evolution just like Doug said. I mean, for instance, just to give an example, when Teradata bought Aster Data, it was initially seen as a hardware platform play. In the end, it was basically, it was all those Aster functions that made a lot of sort of big data analytics accessible to SQL. (clears throat) And so what I really see, just in a simpler definition or functional definition, the data lakehouse is really an attempt by the data lake folks to make the data lake friendlier territory to the SQL folks, and also to get into friendly territory to all the data stewards, who are basically concerned about the sprawl and the lack of control in governance in the data lake. So it's really kind of a continuation of an ongoing trend. That being said, there's no action without counteraction. And of course, at the other end of the spectrum, we also see a lot of the data warehouses starting to add things like in-database machine learning. So they're certainly not surrendering without a fight.
Again, as Doug was mentioning, this has been part of a continual blending of platforms that we've seen over the years, that we first saw in the Hadoop years with SQL on Hadoop and data warehouses starting to reach out to cloud storage, or I should say HDFS, and then with the cloud, going cloud native and therefore trying to break the silos down even further. >> Now, thank you. And Sanjeev, data lakes, when we first heard about them, it was such a compelling name, and then we realized all the problems associated with them. So pick it up from there. What would you add to Doug and Tony? >> I would say, these are excellent points that Doug and Tony have brought to light. The concept of lakehouse was going on, to your point, Dave, a long time ago, long before the term was invented. For example, Uber was trying to do a mix of Hadoop and Vertica because what they really needed were transactional capabilities that Hadoop did not have. So they weren't calling it the lakehouse, they were using multiple technologies, but now they're able to collapse it into a single data store that we call lakehouse. Data lakes are excellent at batch processing large volumes of data, but they don't have the real-time capabilities such as change data capture, doing inserts and updates. So this is why lakehouses have become so important, because they give us these transactional capabilities. >> Great. So I'm interested, the name is great, lakehouse. The concept is powerful, but I get concerned that there's a lot of marketing hype behind it. So I want to examine that a bit deeper. How mature is the concept of lakehouse? Are there practical examples that really exist in the real world that are driving business results for practitioners? Tony, maybe you could kick that off. >> Well, put it this way. I think what's interesting is that both data lakes and data warehouses each had to extend themselves. 
To believe the Databricks hype, this was just a natural extension of the data lake. In point of fact, Databricks had to go outside its core technology of Spark to make the lakehouse possible. And it's a very similar type of thing on the part of the data warehouse folks, in that they've had to go beyond SQL. In the case of Databricks, there have been a number of incremental improvements to Delta Lake, to basically make the table format more performant, for instance. But the other thing, I think the most dramatic change in all that, is in their SQL engine, and they had to essentially pretty much abandon Spark SQL because really, in and of itself, Spark SQL is essentially a stopgap solution. And if they wanted to really address that crowd, they had to totally reinvent SQL or at least their SQL engine. And so Databricks SQL is not Spark SQL, it is not Spark, it's basically SQL that's adapted to run in a Spark environment, but the underlying engine is C++, it's not Scala or anything like that. So Databricks had to take a major detour outside of its core platform to do this. So to answer your question, this is not mature, because even though the idea of blending platforms has been going on for well over a decade, I would say that the current iteration is still fairly immature. And in the cloud, I could see a further evolution of this, because if you think through cloud native architecture where you're essentially abstracting compute from data, there is no reason why, if let's say you are dealing with the same basic data targets, say cloud object storage, you might not apportion the task to different compute engines. 
And so therefore you could have, for instance, let's say you're Google, you could have BigQuery perform basically the types of analytics, the SQL analytics that would be associated with the data warehouse, and you could have BigQuery ML that does some in-database machine learning, but at the same time for another part of the query, which might involve, let's say, some deep learning, just for example, you might go out to, let's say, the serverless Spark service or Dataproc. And there's no reason why Google could not blend all those into a coherent offering that's basically all triggered through microservices. And I just gave Google as an example, but you could generalize that to all the other cloud or all the other third-party vendors. So I think we're still very early in the game in terms of maturity of data lakehouses. >> Thanks, Tony. So Sanjeev, is this all hype? What are your thoughts? >> It's not hype, but I completely agree. It's not mature yet. Lakehouses still have a lot of work to do, so what I'm now starting to see is that the world is dividing into two camps. On one hand, there are people who don't want to deal with the operational aspects of vast amounts of data. They are the ones who are going for BigQuery, Redshift, Snowflake, Synapse, and so on because they want the platform to handle all the data modeling, access control, performance enhancements, but there is a trade-off. If you go with these platforms, then you are giving up on vendor neutrality. On the other side are those who have engineering skills. They want the independence. In other words, they don't want vendor lock-in. They want to transform their data into any number of use cases, especially data science, machine learning use cases. What they want is agility via open file formats using any compute engine. So why do I say lakehouses are not mature? Well, cloud data warehouses provide you an excellent user experience. That is the main reason why Snowflake took off. 
If you have thousands of tables, it takes minutes to get them started, uploaded into your warehouse and start experimentation. Table formats are far more resonating with the community than file formats. But once the cost of the cloud data warehouse goes up, then the organizations start exploring lakehouses. But the problem is lakehouses still need to do a lot of work on metadata. Apache Hive was a fantastic first attempt at it. Even today Apache Hive is still very strong, but it's all technical metadata and it has so many different restrictions. That's why we see Databricks investing in something called Unity Catalog. Hopefully we'll hear more about Unity Catalog at the end of the month. But there's a second problem I just want to mention, and that is lack of standards. All these open source vendors, they're running what I call ego projects. You see on LinkedIn, they're constantly battling with each other, but the end user doesn't care. The end user wants a problem to be solved. They want to use Trino, Dremio, Spark from EMR, Databricks, Ahana, Dask, Flink, Athena. But the problem is that we don't have common standards. >> Right. Thanks. So Doug, I worry sometimes. I mean, I look at the space, we've debated for years, best of breed versus the full suite. You see AWS with whatever, 12-plus different data stores and different APIs and primitives. You've got Oracle putting everything into its database. It's actually done some interesting things with MySQL HeatWave, so maybe there are proof points there, but Snowflake is really good at data warehouse, simplifying data warehouse. Databricks, really good at making lakehouses actually more functional. Can one platform do it all? >> Well, in a word, you can't be best of breed at all things. I think the upshot of that, and cogent analysis from Sanjeev there, is that the vendors coming out of the database tradition, they excel at the SQL. 
They're extending it into data science, but when it comes to unstructured data, data science, ML, AI, it's often a compromise. The data lake crowd, the Databricks and such, they've struggled to completely displace the data warehouse. When it really gets to the tough SLAs, they acknowledge that there's still a role for the warehouse. Maybe you can size down the warehouse and offload some of the BI workloads, and maybe some of these SQL engines are good for ad hoc, to minimize data movement. But really when you get to the deep service-level requirements, the high concurrency, the high query workloads, you end up creating something that's warehouse-like. >> Where do you guys think this market is headed? What's going to take hold? Which projects are going to fade away? You've got some things in Apache projects like Hudi and Iceberg, where do they fit, Sanjeev? Do you have any thoughts on that? >> So thank you, Dave. So I feel that table formats are starting to mature. There is a lot of work that's being done. We will not have a single product or single platform. We'll have a mixture. So I see a lot of Apache Iceberg in the news. Apache Iceberg is really innovating. Their focus is on a table format, but then Delta and Apache Hudi are doing a lot of deep engineering work. For example, how do you handle high concurrency when there are multiple writes going on? Do you version your Parquet files, or how do you do your upserts, basically? So different focus. At the end of the day, the end user will decide what is the right platform, but we are going to have multiple formats living with us for a long time. >> Doug, is Iceberg in your view something that's going to address some of those gaps in standards that Sanjeev was talking about earlier? >> Yeah, Delta Lake, Hudi, Iceberg, they all address this need for consistency and scalability. Delta Lake is open technically, open for access, but I don't hear about Delta Lake in any world but Databricks. I'm hearing a lot of buzz about Apache Iceberg. 
End users want an open, performant standard. And most recently Google embraced Iceberg for its recent BigLake, their stab at supporting both lakes and warehouses on one conjoined platform. >> And Tony, of course, you remember the early days of the sort of big data movement. You had MapR, which was the most closed. You had Hortonworks, the most open. You had Cloudera in between. There was always this kind of contest as to who's the most open. Does that matter? Are we going to see a repeat of that here? >> I think it's spheres of influence, I think, and Doug very much was kind of referring to this. I would call it kind of like the MongoDB syndrome, which is that you have... and I'm talking about MongoDB before they changed their license, an open source project, but very much associated with MongoDB, which basically pretty much controlled most of the contributions and made the decisions. And I think Databricks has the same ironclad hold on Delta Lake, and the market pretty much associates Delta Lake as the Databricks open source project. I mean, Iceberg is probably further advanced than Hudi in terms of mind share. And so what I see that breaking down to is essentially, basically the Databricks open source versus the everything else open source, the community open source. So I see it's a very similar type of breakdown that I see repeating itself here. >> So by the way, Mongo has a conference next week, another data platform that is kind of not really relevant to this discussion totally. But in a sense it is, because there's been a lot of discussion on earnings calls these last couple of weeks about consumption and who's exposed. Obviously people are concerned about Snowflake's consumption model. Mongo is maybe less exposed because Atlas is prominent in the portfolio, blah, blah, blah. But I wanted to bring up the little bit of controversy that we saw come out of the Snowflake earnings call, where the Evercore analyst asked Frank Slootman about discretionary spend. 
And Frank basically said, look, we're not discretionary. We are deeply operationalized. Whereas he kind of poo-pooed the lakehouse or the data lake, et cetera, saying, oh yeah, data scientists will pull files out and play with them. That's really not our business. Do any of you have comments on that? Help us swing through that controversy. Who wants to take that one? >> Let's put it this way. The SQL folks are from Venus and the data scientists are from Mars. So I mean, it really comes down to that type of perception. The fact is that, traditionally with analytics, it was very SQL oriented and basically the quants were kind of off in their corner, where they were using SAS or where they were using Teradata. There's really a great leveler today, which is that, I mean, basic Python has become arguably one of the most popular programming languages, depending on what month you're looking at in the TIOBE index. And of course, obviously SQL is, as I tell the MongoDB folks, SQL is not going away. You have a large skills base out there. And so basically I see this breaking down to essentially, you're going to have each group that's going to have its own natural preferences for its home turf. And the fact that, let's say, the Python and Scala folks are using Databricks does not make them any less operational or mission critical than the SQL folks. >> Anybody else want to chime in on that one? >> Yeah, I totally agree with that. Python support in Snowflake is very nascent. With all of Snowpark, all of the things outside of SQL, they're very much relying on partners to make things possible and make data science possible. And it's very early days. I think the bottom line, what we're going to see is each of these camps is going to keep working on doing better at the thing that they don't do today, or they're new to, but they're not going to nail it. They're not going to be best of breed on both sides. 
So the SQL-centric companies and shops are going to do more data science on their database-centric platform. The data science-driven companies might be doing more BI on their lakes with those vendors, and the companies that have highly distributed data, they're going to add fabrics, and maybe offload more of their BI onto those engines, like Dremio and Starburst. >> So I've asked you this before, but I'll ask you, Sanjeev. 'Cause Snowflake and Databricks are such great examples, 'cause you have the data engineering crowd trying to go into data warehousing and you have the data warehousing guys trying to go into the lake territory. Snowflake has $5 billion on the balance sheet and I've asked you before, I ask you again, doesn't there have to be a semantic layer between these two worlds? Does Snowflake go out and do M&A and maybe buy AtScale or a Datameer? Or is that just sort of a band-aid? What are your thoughts on that, Sanjeev? >> I think the semantic layer is the metadata. The business metadata is extremely important. At the end of the day, the business folks, they'd rather go to the business metadata than have to figure out, for example, like let's say, I want to update somebody's email address and we have a lot of overhead with data residency laws and all that. I want my platform to give me the business metadata so I can write my business logic without having to worry about which database, which location. So having that semantic layer is extremely important. In fact, now we are taking it to the next level. Now we are saying that it's not just a semantic layer, it's all my KPIs, all my calculations. So how can I make those calculations independent of the compute engine, independent of the BI tool and make them fungible. So more disaggregation of the stack, but it gives us more best of breed products that the customers have to worry about. >> So I want to ask you about the stack, the modern data stack, if you will. 
And we always talk about injecting machine intelligence, AI into applications, making them more data driven. But when you look at the application development stack, it's separate, the database tends to be separate from the data and analytics stack. Do those two worlds have to come together in the modern data world? And what does that look like organizationally? >> So organizationally, even technically, I think it is starting to happen. Microservices architecture was a first attempt to bring the application and the data world together, but they are fundamentally different things. For example, if an application crashes, that's horrible, but Kubernetes will self-heal and it'll bring the application back up. But if a database crashes and corrupts your data, we have a huge problem. So that's why they have traditionally been two different stacks. They are starting to come together, especially with DataOps, for instance, versioning of the way we write business logic. It used to be that business logic was highly embedded into our database of choice, but now we are disaggregating that using GitHub, CI/CD, the whole DevOps toolchain. So data is catching up to the way applications are. >> We also have databases, those translytical databases, that's a little bit of what the story is with MongoDB next week with adding more analytical capabilities. But I think companies that talk about that are always careful to couch it as operational analytics, not the warehouse-level workloads. So we're making progress, but I think there's always going to be, or there will long be, a separate analytical data platform. >> Until data mesh takes over. (all laughing) Not opening a can of worms. >> Well, but wait, I know it's out of scope here, but wouldn't data mesh say, hey, do take your best of breed to Doug's earlier point. 
You can't be best of breed at everything, wouldn't data mesh advocate, data lakes do your data lake thing, data warehouse, do your data warehouse thing, then you're just a node on the mesh. (Tony laughs) Now you need separate data stores and you need separate teams. >> To my point. >> I think, I mean, put it this way. (laughs) Data mesh itself is a logical view of the world. The data mesh is not necessarily on the lake or on the warehouse. I think for me, the fear there is more in terms of the silos of governance that could happen and the siloed views of the world, how we redefine. And that's why I want to go back to something Sanjeev said, which is that it's going to be raising the importance of the semantic layer. Now that opens a couple of Pandora's boxes here, which is one, does Snowflake dare go into that space or do they risk basically alienating their partner ecosystem, which is a key part of their whole appeal, which is best of breed. They're in kind of the same situation that Informatica was in the early 2000s, when Informatica briefly flirted with analytic applications and realized that was not a good idea, and needed to double down on their core, which was data integration. The other thing, though, that raises the importance of, and this is where the best of breed comes in, is the data fabric. My contention is that whether you employ data mesh practice or not, if you do employ data mesh, you need data fabric. If you deploy data fabric, you don't necessarily need to practice data mesh. 
But data fabric at its core, and admittedly it's a category that's still very poorly defined and evolving, but at its core, we're talking about a common metadata backplane, something that we used to talk about with master data management. This would be something that would be more, what I would say basically, mutable, that would be more evolving, basically using, let's say, machine learning so that we don't have to predefine rules or predefine what the world looks like. But so I think in the long run, what this really means is that whichever way we implement, on whichever physical platform we implement, we all need to be speaking the same metadata language. And I think at the end of the day, regardless of whether it's a lake, warehouse or a lakehouse, we need common metadata. >> Doug, can I come back to something you pointed out? Those talking about bringing analytic and transaction databases together, you had talked about operationalizing those and the caution there. Educate me on MySQL HeatWave. I was surprised when Oracle put so much effort into that, and you may or may not be familiar with it, but a lot of folks have talked about that. Now it's gone nowhere in the market, it has no market share, but we've seen a lot of these benchmarks from Oracle. How real is that, bringing together those two worlds and eliminating ETL? >> Yeah, I have to defer on that one. That's my colleague, Holger Mueller. He wrote the report on that. He's way deep on it and I'm not going to mock him. >> I wonder if that is something, how real that is or if it's just Oracle marketing, anybody have any thoughts on that? >> I'm pretty familiar with HeatWave. It's essentially Oracle doing what, I mean, there's kind of a parallel with what Google's doing with AlloyDB. It's an operational database that will have some embedded analytics. And it's also something which I expect to start seeing with MongoDB. 
And I think basically, Doug and Sanjeev were kind of referring to this before, about basically kind of like the operational analytics that are embedded within an operational database. The idea here is that the last thing you want to do with an operational database is slow it down. So you're not going to be doing very complex deep learning or anything like that, but you might be doing things like classification, you might be doing some predictives. In other words, we've just concluded a transaction with this customer, but was it less than what we were expecting? What does that mean in terms of, is this customer likely to churn? I think we're going to be seeing a lot of that. And I think that's a lot of what MySQL HeatWave is all about. Whether Oracle has any presence in the market now, it's still a pretty new announcement, but the other thing that kind of goes against Oracle, (laughs) that they have to battle against, is that even though they own MySQL and run the open source project, in terms of the actual commercial implementation it's associated with everybody else. And the popular perception has been that MySQL has been basically kind of like a sidelight for Oracle. And so it's on Oracle's shoulders to prove that they're damn serious about it. >> There's no coincidence that MariaDB was launched the day that Oracle acquired Sun. Sanjeev, I wonder if we could come back to a topic that we discussed earlier, which is this notion of consumption, obviously Wall Street's very concerned about it. Snowflake dropped prices last week. I've always felt like, hey, the consumption model is the right model. I can dial it down when I need to, and of course, the street freaks out. What are your thoughts on just pricing, the consumption model? What's the right model for companies, for customers? >> Consumption model is here to stay. 
What I would like to see, and I think is an ideal situation and actually plays into the lakehouse concept, is that I have my data in some open format, maybe it's Parquet or CSV or JSON, Avro, and I can bring whatever engine is the best engine for my workloads, bring it on, pay for consumption, and then shut it down. And by the way, that could be Cloudera. We don't talk about Cloudera very much, but it could be one business unit wants to use Athena. Another business unit wants to use some other, Trino let's say, or Dremio. So every business unit is working on the same data set, see, that's critical, but that data set is maybe in their VPC and they bring any compute engine, you pay for the use, shut it down. Then you're getting value and you're only paying for consumption. It's not like, I left a cluster running by mistake, so there have to be guardrails. The reason FinOps is so big is because it's very easy for me to run a Cartesian join in the cloud and get a $10,000 bill. >> This looks like it's been sort of a victim of its own success in some ways, they made it so easy to spin up single node instances, multi-node instances. And back in the day when compute was scarce and costly, those database engines optimized every last bit so they could get as much workload as possible out of every instance. Today, it's really easy to spin up a new node, a new multi-node cluster. So that freedom has meant many more nodes that aren't necessarily getting that utilization. So Snowflake has been doing a lot to add reporting, monitoring, dashboards around the utilization of all the nodes and multi-node instances that have spun up. And meanwhile, we're seeing some of the traditional on-prem databases that are moving into the cloud trying to offer that freedom. And I think they're going to have that same discovery, that the cost surprises are going to follow as they make it easy to spin up new instances. 
>> Yeah, a lot of money went into this market over the last decade, separating compute from storage, moving to the cloud. I'm glad you mentioned Cloudera, Sanjeev, 'cause they got it all started, the kind of big data movement. We don't talk about them that much. Sometimes I wonder if it's because when they merged Hortonworks and Cloudera, they dead-ended both platforms, but then they did invest in a more modern platform. But what's the future of Cloudera? What are you seeing out there? >> Cloudera has a good product. I have to say the problem in our space is that there are way too many companies, there's way too much noise. We are expecting the end users to parse it out or we're expecting analyst firms to boil it down. So I think marketing becomes a big problem. As far as technology is concerned, I think Cloudera did turn themselves around and Tony, I know you talk to them quite frequently. I think they have had quite a comprehensive offering for a long time actually. They've created Kudu, so they got operational, they have Hadoop, they have an operational data warehouse, they've migrated to the cloud. They are in hybrid multi-cloud environments. A lot of cloud data warehouses are not hybrid. They're only in the cloud. >> Right. I think what Cloudera has done most successfully has been in the transition to the cloud and the fact that they're giving their customers more on-ramps to it, more hybrid on-ramps. So I give them a lot of credit there. They've also been trying to position themselves as being the most price friendly, in that they will put more guardrails and governors on it. I mean, part of that could be spin. But on the other hand, they don't have the same vested interest in compute cycles as, say, AWS would have with EMR. That being said, yes, Cloudera does it, I think its most powerful appeal, and it almost sounds in a way, I don't want to cast them as a legacy system. 
But the fact is they do have a huge landed legacy on-prem and still significant potential to land and expand that to the cloud. That being said, even though Cloudera is multifunction, I think it certainly has its strengths and weaknesses. And the fact is that yes, Cloudera has an operational database or an operational data store, kind of like the outgrowth of HBase, but Cloudera is still primarily known for the deep analytics. The operational database, nobody's going to buy Cloudera or Cloudera Data Platform strictly for the operational database. They may use it as an add-on, just in the same way that a lot of customers have used, let's say, Teradata basically to do some machine learning or, let's say, Snowflake to parse through JSON. Again, it's not an indictment or anything like that, but the fact is obviously they do have their strengths and their weaknesses. I think their greatest opportunity is with their existing base because that base has a lot invested and vested. And the fact is they do have a hybrid path that a lot of the others lack. >> And of course being on the quarterly shot clock was not a good place to be, under the microscope, for Cloudera, and now they at least can refactor the business accordingly. I'm glad you mentioned hybrid too. We saw Snowflake last month do a deal with Dell whereby non-native Snowflake data could access on-prem object store from Dell. They announced a similar thing with Pure Storage. What do you guys make of that? Is that just... How significant will that be? Will customers actually do that? I think they're using either materialized views or external tables. >> There are data rated and residency requirements. There are desires to have these platforms in your own data center. 
And finally they capitulated, I mean, Frank Slootman is famous for saying to be very focused, and earlier, not many months ago, they called going on-prem a distraction, but clearly there's enough demand, and certainly government contracts, any company that has data residency requirements, it's a real need. So they finally addressed it. >> Yeah, I'll bet dollars to donuts, there was an EBC session and some big customer said, if you don't do this, we ain't doing business with you. And that was like, okay, we'll do it. >> So Dave, I have to say, earlier on you had brought up this point, how Frank Slootman was poo-pooing data science workloads. On your show, about a year or so ago, he said, we are never going to go on-prem. He burnt that bridge. (Tony laughs) That was on your show. >> I remember exactly the statement because it was interesting. He said, we're never going to do the halfway house. And I think what he meant is we're not going to bring the Snowflake architecture to run on-prem because it defeats the elasticity of the cloud. So this was kind of a capitulation in a way. But I think it still preserves his original intent sort of, I don't know. >> The point here is that every vendor will poo-poo whatever they don't have until they do have it. >> Yes. >> And then it'd be like, oh, we are all in, we've always been doing this. We have always supported this and now we are doing it better than others. >> Look, it was the same type of shock wave that we felt basically when AWS, at the last moment at one of their re:Invents, said, oh, by the way, we're going to introduce Outposts. And the analyst group is typically pre-briefed about a week or two ahead under NDA and that was not part of it. And when they dropped, they just casually dropped that in the analyst session. It's like, you could have heard the sound of lots of analysts changing their diapers at that point. >> (laughs) I remember that. 
And props to Andy Jassy who once, many times actually, told us, never say never when it comes to AWS. So guys, I know we've got to run. We've got some hard stops. Maybe you could each give us your final thoughts, Doug start us off and then-- >> Sure. Well, we've got the Snowflake Summit coming up. I'll be looking for customers that are really doing data science, that are really employing Python through Snowflake, through Snowpark. And then a couple weeks later, we've got Databricks with their Data and AI Summit in San Francisco. I'll be looking for customers that are really doing considerable BI workloads. Last year I did a market overview of this analytical data platform space, 14 vendors, eight of them claim to support lakehouse, both sides of the camp. For Databricks, their top customer that they could cite was unnamed. It had 32 concurrent users doing 15,000 queries per hour. That's good but it's not up to the most demanding BI SQL workloads. And they acknowledged that and said they need to keep working on that. Snowflake, asked for their biggest data science customer, cited Kabura, 400 terabytes, 8,500 users, 400,000 data engineering jobs per day. I took the data engineering jobs to be probably SQL-centric, ETL-style transformation work. So I want to see the real use of Python, how much Snowpark has grown as a way to support data science. >> Great. Tony. >> Actually of all things. And certainly, I'll also be looking for similar things to what Doug is saying, but I think, sort of out of left field, I'm interested to see what MongoDB is going to start to say about operational analytics, 'cause I mean, they're into this conquer the world strategy. We can be all things to all people. Okay, if that's the case, what's going to be the case with basically putting in some inline analytics, what are you going to be doing with your query engine? So that's actually kind of an interesting thing we're looking for next week. >> Great. Sanjeev. 
>> So I'll be at MongoDB World, Snowflake and Databricks and very interested in seeing, but since Tony brought up MongoDB, I see that even the databases are shifting tremendously. They are addressing the HTAP use case, both online transactional and analytical. I'm also seeing that these databases started in, let's say in case of MySQL HeatWave, as relational or in MongoDB as document, but now they've added graph, they've added time series, they've added geospatial and they just keep adding more and more data structures and really making these databases multifunctional. So very interesting. >> It gets back to our discussion of best of breed, versus all in one. And it's likely Mongo's path or part of their strategy of course, is through developers. They're very developer focused. So we'll be looking for that. And guys, I'll be there as well. I'm hoping that we maybe have some extra time on theCUBE, so please stop by and we can maybe chat a little bit. Guys as always, fantastic. Thank you so much, Doug, Tony, Sanjeev, and let's do this again. >> It's been a pleasure. >> All right and thank you for watching. This is Dave Vellante for theCUBE and the excellent analysts. We'll see you next time. (upbeat music)

Published Date : Jun 2 2022

ENTITIES

Entity	Category	Confidence
Doug	PERSON	0.99+
Dave Vellante	PERSON	0.99+
Dave	PERSON	0.99+
Tony	PERSON	0.99+
Uber	ORGANIZATION	0.99+
Frank	PERSON	0.99+
Frank Slootman	PERSON	0.99+
Tony Baer	PERSON	0.99+
Mars	LOCATION	0.99+
Doug Henschen	PERSON	0.99+
2020	DATE	0.99+
AWS	ORGANIZATION	0.99+
Venus	LOCATION	0.99+
Oracle	ORGANIZATION	0.99+
2012	DATE	0.99+
Databricks	ORGANIZATION	0.99+
Dell	ORGANIZATION	0.99+
Hortonworks	ORGANIZATION	0.99+
Holger Mueller	PERSON	0.99+
Andy Jassy	PERSON	0.99+
last year	DATE	0.99+
$5 billion	QUANTITY	0.99+
$10,000	QUANTITY	0.99+
14 vendors	QUANTITY	0.99+
Last year	DATE	0.99+
last week	DATE	0.99+
San Francisco	LOCATION	0.99+
SanjMo	ORGANIZATION	0.99+
Google	ORGANIZATION	0.99+
8,500 users	QUANTITY	0.99+
Sanjeev	PERSON	0.99+
Informatica	ORGANIZATION	0.99+
32 concurrent users	QUANTITY	0.99+
two	QUANTITY	0.99+
Constellation Research	ORGANIZATION	0.99+
Mongo	ORGANIZATION	0.99+
Sanjeev Mohan	PERSON	0.99+
Ahana	ORGANIZATION	0.99+
DaaS	ORGANIZATION	0.99+
EMR	ORGANIZATION	0.99+
32	QUANTITY	0.99+
Atlas	ORGANIZATION	0.99+
Delta	ORGANIZATION	0.99+
Snowflake	ORGANIZATION	0.99+
Python	TITLE	0.99+
each	QUANTITY	0.99+
Athena	ORGANIZATION	0.99+
next week	DATE	0.99+

Wrap with Stu Miniman | Red Hat Summit 2022

(bright music) >> Okay, we're back in theCUBE. We said we were signing off for the night, but during the hallway track, we ran into old friend Stu Miniman who is the Director of Market Insights at Red Hat. Stu, friend of theCUBE, done thousands of CUBE interviews. >> Dave, it's great to be here. Thanks for pulling me on, you and I hosted Red Hat Summit before. It's great to see Paul here. I was actually, I was talking to some of the Red Hatters walking around Boston. It's great to have an event here. Boston's got a strong presence and I understand, I think it was either the first or second year, they had it over... What's the building they're tearing down right down the road here? Was that the World Trade Center? I think that's where they actually held it, the first time they were here. We hosted theCUBE >> So they moved up. >> at the Hynes Convention Center. We did theCUBE for summit at the BCEC next door. And of course, with the pandemic being what it was, we're a little smaller, nice intimate event here. It's great to be able to roam the hall, see a whole bunch of people and lots watching online. >> It's great, it's around the same size as those, remember those Vertica Big Data events that we used to have here. And I like that you were commenting on the theater in the round this morning for the keynotes, that was good. And the keynotes being compressed, I think, is real value for the attendees, you know? 'Cause people come to these events, they want to see each other, you know? They want to... It's like the band getting back together. And so when you're stuck in the keynote room, it's like, "Oh, it's okay, it's time to go." >> I don't know that any of us used to sitting at home where I could just click to another tab or pause it or run for, do something for the family, or a quick bio break. It's the three-hour keynote I hope has been retired.
>> But it's an interesting point though, that the virtual event really is driving the physical and this, the way Red Hat marketed this event was very much around the virtual attendee. Physical was almost an afterthought, so. >> Right, this is invite-only for in-person. So you're absolutely right. It's optimizing the things that are being streamed, the online audience is the big audience. And we're just happy to be in here to clap and do some things, see around what you're doing. >> Wonderful to see that becoming the norm. >> I think, like, virtual, Stu, you know this well, when virtual first came in, nobody had a clue what they were doing. It was really hard. They tried different things, they tried to take the physical and just jam it into the virtual. That didn't work, they tried doing fun things. They would bring in a famous person or a comedian. And that kind of worked, I guess, but everybody showed up for that and then left. And I think they're trying to figure out what this hybrid thing is. I've seen it both ways. I've seen situations like this, where they're really sensitive to the virtual. I've seen others where that's the FOMO of the physical, people want physical. So, yeah, I think it depends. I mean, re:Invent last year was heavy physical. >> Yeah, with 15,000 people there. >> Pretty long keynotes, you know? So maybe Amazon can get away with it, but I think most companies aren't going to be able to. So what is the market telling you? What are these insights? >> So Dave just talking about Amazon, obviously, the world I live in is cloud and that discussion of cloud, the journey that customers are going on is where we're spending a lot of the discussions. So, it was great to hear in the keynote, talked about our deep partnerships with the cloud providers and what we're doing to help people with, you like to call it super cloud, some call it hybrid, or multi-cloud... >> New name. (crosstalk) Meta-Cloud, come on.
>> All right, you know if Che's my executive, so it's wonderful. >> Love it. >> But we'll see, if I could put on my VR goggles and that will help me move things. But I love like the partnership announcement with General Motors today because not every company has the needs of software-driven electric vehicles all over the place. But the technology that we build for them actually has ramifications everywhere. We're working to take Kubernetes and make it smaller over time. So things that we do at the edge benefit the cloud, benefit what we do in the data center, it's that advancement of science and technology just lifts all boats. >> So what's your take on all this? The EV and software on wheels. I mean, Tesla obviously has a huge lead. It's kind of like the Amazon of vehicles, right? It's sort of inspired a whole new wave of innovation. Now you've got every automobile manufacturer kind of going after that. Is the future of vehicles something you've followed or something you have an opinion on, Stu? >> Absolutely. It's driving innovation in some ways, the way DOS drove innovation on the desktop, if you remember the 64K DOS limit, for years, that was... The software developers came up with some amazing ways to work within that 64K limit. Then when it was gone, we got bloatware, but it actually does enforce a level of discipline on you to try to figure out how to make software run better, run more efficiently. And that has upstream impacts on the enterprise products. >> Well, right. So following your analogy, you talk about the enablement to the desktop, Linux was a huge influence on allowing the individual person to write code and write software, and what's happening in the EV, it's a software platform. All of these innovations that we're seeing across industries, it's how is software transforming things. We go back to Marc Andreessen's, software's eating the world; open source is the way that software is developed. Who's at the intersection of all those?
We think we have a nice part to play in that. I loved that. Dave, I don't know if you caught it at the end of the keynote, Matt Hicks basically said, "Our mission isn't just to write enterprise software. Our mission is based off of open source because open source unlocks innovation for the world." And that's one of the things that drew me to Red Hat, it's not just tech in good places, but allowing underrepresented, different countries to participate in what's happening with software. And we can all move that ball forward. >> Well, can we declare victory for open source because it's not just open source products, but everything that's developed today, whether proprietary or open, has open source in it? >> Paul, I agree. Open source is the development model, period, today. Are there some places that there's proprietary? Absolutely. But I had a discussion with Deepak Singh who's been on theCUBE many times. He said like, our default is, we start with open source code. I mean, even Amazon when you start talking about that. >> I said this, the $70 billion business on open source. >> Exactly. >> Necessarily give it back, but that say, Hey, this is... All's fair in tech and more. >> It is interesting how the managed service model has sort of rescued open source, open source companies, that were trying to do the Red Hat model. No one's ever really successfully duplicated the Red Hat model. A lot of companies were floundering and failing. And then the managed service option came along. And so now they're all cloud service providers. >> So the only thing I'd say is that there are some other peers we have in the industry that are built off open source, they're doing okay. The recent example, GitLab and HashiCorp, both went public. Hashi is doing some managed services, but it's not the majority of their product. Look at a company like Mongo, they've heavily pivoted toward the managed service. It is where we see the largest growth in our area.
The products that we have again with Amazon, with Microsoft, huge growth, lots of interest. It's one of the things I spend most of my time talking on. >> I think Databricks is another interesting example 'cause Cloudera was the now company and they had the sort of open core, and then they had the proprietary piece, and that obviously didn't work. Databricks when they developed Spark out of Berkeley, everybody thought they were going to do kind of a similar model. Instead, they went all in on managed services. And it's really worked well, I think they were ahead of that curve and you're seeing it now is it's what customers want. >> Well, I mean, Dave, you cover the database market pretty heavily. How many different open source database options are there today? And that's one of the things we're solving. When you look at what is Red Hat doing in the cloud? Okay, I've got lots of databases. Well, we have something called, it's Red Hat OpenShift Database Access, which is, from a developer standpoint, I don't want to have to think about, I've got six different databases, which one, where's the repository? How does all that happen? We give that consistency, it's tied into OpenShift, so it can help abstract some of those pieces. We've got the same with Kafka streaming and we've got APIs. So it's frameworks and enablers to help bridge that gap between the complexity that's out there, in the cloud and for the developer tool chain. >> That's a really important role you guys play though because you had this proliferation, you mentioned Mongo. So many others, Presto and Starburst, et cetera, so many other open source options out there now. And companies, developers want to work with multiple databases within the same application. And you have a role in making that easy. >> Yeah, so and that is, if you talk about the question I get all the time is, what's next for Kubernetes? Dave, you and I did a preview for KubeCon and it's automation and simplicity that we need to be.
It's not enough to just say, "Hey, we've got APIs." It's like Dave, we used to say, "We've got standards? Great." Everybody's implementation was a little bit different. So we have API sprawl today. So it's building that ecosystem. You've been talking to a number of our partners. We are very active in the community and trying to do things that can lift up the community, help the developers, help that cloud native ecosystem, help our customers move faster. >> Yeah, APIs are better than scripts, but they got to be managed, right? So, and that's really what you guys are doing that's different. You're not trying to own everything, right? It's sort of antithetical to how billions and trillions are made in the IT industry. >> I remember a few years ago we talked here, and you look at the size that Red Hat is. And the question is, could Red Hat have monetized more if the model was a little different? It's like, well maybe, but that's not the why. I love that they actually had Simon Sinek come in and work with Red Hat and that open unlocks the world. Like that's the core, it's the why. When I joined, they're like, here's a book of Red Hat, you can get it online and that why of what we do, so we never have to think of how do we get there. We did an acquisition in the security space a year ago, StackRox, took us a year, it's open source. Stackrox.io, it's community driven, open source project there because we could have said, "Oh, well, yeah, it's kind of open source and there's pieces that are open source," but we want it to be fully open source. You just talked to Gunnar about how RHEL 9 is based off CentOS Stream, and now developing out in the open with that model, so. >> Well, you were always a big fan of Whitehurst's culture book, right? It makes a difference. >> The Open Organization, and right, Red Hat? That culture is special. It's definitely interesting. So first of all, most companies are built with the hierarchy in mind.
Had a friend of mine that when he joined Red Hat, he's like, I don't understand, it's almost like you have like lots of individual contractors, all doing their things 'cause Red Hat works on thousands of projects. But I remember talking to Rackspace years ago when OpenStack was a thing and they're like, "How do you figure out what to work on?" "Oh, well we hired great people and they work on what's important to them." And I'm like, "That doesn't sound like a business." And he is like, "Well, we struggle sometimes to that balance." Red Hat has found that balance because we work on a lot of different projects and there are people inside Red Hat that are, you know, they care more about the project than they do the business, but there's the overall view as to where we participate and where we productize because we're not creating IP because it's all an open source. So it's the monetizations, the relationships we have our customers, the ecosystems that we build. And so that is special. And I'll tell you that my line has been Red Hat on the inside is even more Red Hat. The debates and the discussions are brutal. I mean, technical people tearing things apart, questioning things and you can't be thin skinned. And the other thing is, what's great is new people. I've talked to so many people that started at Red Hat as interns and will stay for seven, eight years. And they come there and they have as much of a seat at the table, and when I talk to new people, your job, is if you don't understand something or you think we might be able to do it differently, you better speak up because we want your opinion and we'll take that, everybody takes that into consideration. It's not like, does the decision go all the way up to this executive? And it's like, no, it's done more at the team. >> The cultural contrast between that and your parent, IBM, couldn't be more dramatic. 
And we talked earlier with Paul Cormier about has IBM really walked the walk when it comes to leaving Red Hat alone. Naturally he said, "Yes." Well what's your perspective. >> Yeah, are there some big blue people across the street or something I heard that did this event, but look, do we interact with IBM? Of course. One of the reasons that IBM and IBM Services, both products and services should be able to help get us breadth in the marketplace. There are times that we go arm and arm into customer meetings and there are times that customers tell us, "I like Red Hat, I don't like IBM." And there's other ones that have been like, "Well, I'm a long time IBM, I'm not sure about Red Hat." And we have to be able to meet all of those customers where they are. But from my standpoint, I've got a Red Hat badge, I've got a Red Hat email, I've got Red Hat benefits. So we are fiercely independent. And you know, Paul, we've done blogs and there's lots of articles been written is, Red Hat will stay Red Hat. I didn't happen to catch Arvin I know was on CNBC today and talking at their event, but I'm sure Red Hat got mentioned, but... >> Well, he talks about Red Hat all time. >> But in his call he's talking backwards. >> It's interesting that he's not here, greeting this audience, right? It's again, almost by design, right? >> But maybe that's supposed to be... >> Hundreds of yards away. >> And one of the questions being in the cloud group is I'm not out pitching IBM Cloud, you know? If a customer comes to me and asks about, we have a deep partnership and IBM will be happy to tell you about our integrations, as opposed to, I'm happy to go into a deep discussion of what we're doing with Google, Amazon, and Microsoft. So that's how we do it. It's very different Dave, from you and I watch really closely the VMware-EMC, VMware-Dell, and how that relationship. This one is different. 
We are owned by IBM, but we mostly... does IBM fund initiatives and have certain strategic things that are done? Absolutely. But we maintain Red Hat. >> But there are similarities. I mean, the VMware crowd didn't want to talk about EMC, but they had to, they were kind of forced to. Whereas, you're not being forced to. >> And then once Dell came in there, it was joint product development. >> I always thought a spin-in would've been the more effective. Of course, Michael Dell and Egon wouldn't have gotten their $40 billion out. But I think a spin-in was more natural based on where they were going. And it would've been, I think, a more dominant position in the marketplace. They would've had more software, but again, financially it wouldn't have made as much sense, but that whole dynamic is different. I mean, but people said they were going to look at VMware as a model and it's been largely different because remember, VMware of course was a separate company, now is a fully separate company. Red Hat was integrated, we thought, okay, are they going to get blue washed? We're watching and watching, and watching, you had said, well, if the Red Hat culture isn't permeating IBM, then it's a failure. And I don't know if that's happening, but it's definitely... >> I think a long time for that. >> It's definitely been preserved. >> I mean, Dave, I know I read one article at the beginning of the year is, can Arvin make IBM, Microsoft Junior? Follow the same turnaround that Satya Nadella drove over there. IBM, I think, is making some progress, I mean, I read and watch what you and the team are all writing about it. And I'll withhold judgment on IBM. Obviously, there's certain financial things that we'd love to see IBM succeed. We worry about our business. We do our thing and IBM shares our results and they've been solid, so. >> Microsoft had such massive cash flow that even Ballmer couldn't screw it up. Well, I mean, this is true, right?
I mean, you think about how irrelevant Microsoft was in the conversation during his tenure and yet they never got really... They maintained a position so that when Nadella came in, they were able to reascend and now are becoming that dominant player. I mean, IBM just doesn't have that cash flow and that luxury, but I mean, if he pulls it off, he'll be the CEO of the decade. >> You mentioned partners earlier, big concern when the acquisition was first announced, was that the Dells and the HPs and the such wouldn't want to work with Red Hat anymore, you've sort of been here through that transition. Is that an issue? >> Not that I've seen, no. I mean, the hardware suppliers, the ISVs, the GSIs are all very important. It was great to see, I think you had Accenture on theCUBE today, obviously very important partner as we go to the cloud. IBM's another important partner, not only for IBM Cloud, but IBM Services, deep partnership with Azure and AWS. So those partners and from a technology standpoint, the cloud native ecosystem, we talked about, it's not just a Red Hat product. I constantly have to talk about, look, we have a lot of pieces, but your developers are going to have other tools that they're going to use and the security space. There is no such thing as a silver bullet. So I've been having some great conversations here already this week with some of our partners that are helping us to round out that whole solution, help our customers because it has to be, it's an ecosystem. And we're one of the drivers to help that move forward. >> Well, I mean, we were at Dell Tech World last week, and there's a lot of talk about DevSecOps and DevOps and Dell being more developer friendly. Obviously they got a long way to go, but you can't take that posture and not have a relationship with Red Hat. If all you got is Pivotal and VMware, and Tanzu
>> How could you not, how could you have any credibility if you're just like, Oh, Pivotal, Pivotal, Pivotal, Tansu, Tansu. Tansu is doing its thing. And they smart strategy. >> VMware is also a partner of ours, but that we would hope that with VMware being independent, that does open the door for us to do more with them. >> Yeah, because you guys have had a weird relationship with them, under ownership of EMC and then Dell, right? And then the whole IBM thing. But it's just a different world now. Ecosystems are forming and reforming, and Dell's building out its own cloud and it's got to have... Look at Amazon, I wrote about this. I said, "Can you envision the day where Dell actually offers competitive products in its suite, in its service offering?" I mean, it's hard to see, they're not there yet. They're not even close. And they have this high say/do ratio, or really it's a low say/do, they say high say/do, but look at what they did with Nutanix. You look over- (chuckles) would tell if it's the Cisco relationship. So it's got to get better at that. And it will, I really do believe. That's new thinking and same thing with HPE. And, I don't know about Lenovo that not as much of an ecosystem play, but certainly Dell and HPE. >> Absolutely. Michael Dell would always love to poke at HPE and HP really went very far down the path of their own products. They went away from their services organization that used to be more like IBM, that would offer lots of different offerings and very much, it was HP Invent. Well, if we didn't invent it, you're not getting it from us. So Dell, we'll see, as you said, the ecosystems are definitely forming, converging and going in lots of different directions. >> But your position is, Hey, we're here, we're here to help. >> Yeah, we're here. We have customers, one of the best proof points I have is the solution that we have with Amazon. 
Amazon wouldn't do the engineering work to make us a native offering if they didn't have the customer demand because Amazon's driven off of data. So they came to us, they worked with us. It's a lot of work to be able to make that happen, but you want to make it frictionless for customers so that they can adopt that. That's a long path. >> All right, so evening event, there's a customer event this evening upstairs in the lobby. Microsoft is having a little shindig, and there's a lot of customer dinners going on. So Stu, we'll see you out there tonight. >> All right, thank you. >> Were watching a brewing somewhere. >> Keynotes tomorrow, a lot of good sessions and enablement, and yeah, it's great to be in person to be able to bump into some people, meet some people and, Hey, I'm still a year and a half in still meeting a lot of my peers in person for the first time. >> Yeah, and that's kind of weird, isn't it? Imagine. And then we kick off tomorrow at 10:00 AM. Actually, Stephanie Chiras is coming on. There she is in the background. She's always a great guest and maybe do a little kickoff and have some fun tomorrow. So this is Dave Vellante for Stu Miniman, Paul Gillin, who's my co-host. You're watching theCUBE's coverage of Red Hat Summit 2022. We'll see you tomorrow. (bright music)

Published Date : May 11 2022

ENTITIES

Entity	Category	Confidence
Google	ORGANIZATION	0.99+
Paul	PERSON	0.99+
Dave Vellante	PERSON	0.99+
Amazon	ORGANIZATION	0.99+
IBM	ORGANIZATION	0.99+
Stu Miniman	PERSON	0.99+
General Motors	ORGANIZATION	0.99+
Paul Gillin	PERSON	0.99+
Dave	PERSON	0.99+
seven	QUANTITY	0.99+
Microsoft	ORGANIZATION	0.99+
Stephanie Chiras	PERSON	0.99+
HP	ORGANIZATION	0.99+
Matt Hicks	PERSON	0.99+
Dell	ORGANIZATION	0.99+
Gunnar	PERSON	0.99+
Paul Cormier	PERSON	0.99+
Deepak Singh	PERSON	0.99+
$40 billion	QUANTITY	0.99+
Boston	LOCATION	0.99+
Databricks	ORGANIZATION	0.99+
Berkeley	LOCATION	0.99+
AWS	ORGANIZATION	0.99+
Satya Nadella	PERSON	0.99+
HPE	ORGANIZATION	0.99+
$70 billion	QUANTITY	0.99+
Cisco	ORGANIZATION	0.99+
tomorrow	DATE	0.99+
Simon Sinek	PERSON	0.99+
Stu	PERSON	0.99+
last week	DATE	0.99+
Hashicorp	ORGANIZATION	0.99+
GitLab	ORGANIZATION	0.99+
Dells	ORGANIZATION	0.99+
Lenovo	ORGANIZATION	0.99+
Tesla	ORGANIZATION	0.99+
Red Hat	ORGANIZATION	0.99+
Mongo	ORGANIZATION	0.99+
EMC	ORGANIZATION	0.99+
15,000 people	QUANTITY	0.99+
Red Hat	TITLE	0.99+
Michael Dell	PERSON	0.99+
64K	QUANTITY	0.99+
last year	DATE	0.99+
Arvin	PERSON	0.99+
VMware	ORGANIZATION	0.99+
Red Hat	ORGANIZATION	0.99+

Steven Mih, Ahana & Girish Baliga, Uber | CUBE Conversation

(bright music) >> Hey everyone, welcome to this CUBE conversation featuring Ahana, I'm your host Lisa Martin. I've got two guests here with me today. Steven Mih joins us, the Presto Foundation governing board member, co-founder and CEO of Ahana, and Girish Baliga Presto Foundation governing board chair and senior engineering manager at Uber. Guys thanks for joining us. >> Thanks for having us. >> Thanks for having us. >> So Steven we're going to dig into and unpack Presto in the next few minutes or so, but Steven let's go ahead and start with you. Talk to us about some of the challenges with the open data lake house market. What are some of those key challenges that organizations are facing? >> Yeah, just pulling up the slide you know, what we see is that many organizations are dealing with a lot more data and very different data types and putting that all into, traditionally as the data warehouse, which has been the workhorse for BI and analytics traditionally, it becomes very, very expensive, and there's a lot of lock in associated with that. And so what's happening is that people are putting the data semistructured and unstructured data for example, in cloud data lakes or other data lakes, and they find that they can query directly with a SQL query engine like Presto. And that lets you have a much more approach to dealing with getting insights out of your data. And that's what this is all about, and that's why companies are moving to a modern architecture. Girish maybe you can share some of your thoughts on how Uber uses Presto for this. >> Yeah, at Uber we use Presto in our internal deployments. So at Uber we have our own data centers, we store data locally in our data centers, but we have made the conscious choice to go with an open data stack. Our entire data stack is built around open source technologies like Hadoop, Hive, Spark and Presto. 
And so Presto is an invaluable engine that is able to connect to all these different storage and data formats and allow us to have a single entry point for our users, to run their SQL queries and get insights rather quickly compared to some of the other engines that we have at Uber. >> So let's talk a little bit about Presto so that the audience gets a good overview of that. Steven starting with you, you talked about the challenges of the traditional data warehouse application. Talk to us about why Presto was founded, the open, the project, give us that background information if you will. >> Absolutely, so Presto was originally developed out of the biggest hyperscaler out there which is Facebook, now known as Meta. And they donated that project to the, and open sourced it and donated it to the Linux Foundation. And so Presto is a SQL query engine, it's a distributed SQL query engine, that runs directly on open data lakes, so you can put your data into open formats like Parquet or ORC, and get insights directly from that at a very good price-performance ratio. The Presto Foundation, of which Girish and I are part, we're all working together as a consortium of companies that all want to see Presto continue to get bigger and bigger. Kind of like Kubernetes has a, has an organization called CNCF, Presto has Presto Foundation all under the umbrella of the Linux Foundation. And so there's a lot of exciting things that are coming on the roadmap that make Presto very unique. You know, RaptorX is a multilevel caching system that's been fantastic, Aria optimizations are another area, we at Ahana have developed some security features, donating the integrations with Apache Ranger and that's the type of things that we do to help the community. But maybe Girish can talk about some of the exciting items on the roadmap that you're looking forward to. >> Absolutely, I think from Uber's point of view just the sheer scale of data and our volume of query traffic.
So we run about half a million Presto queries a day, right? And we have thousands of machines in our Presto deployments. So at that scale in addition to functionality you really want a system that can handle traffic reliably, that can scale, and that is backed by a strong community which guarantees that if you pull in the new version of Presto, you won't break anything, right? So all of those things are very important to us. So I think that's where we are relying on our partners particularly folks like Facebook and Twitter and Ahana to build and maintain this ecosystem that gives us those guarantees. So that is on the reliability front, but on the roadmap side we are also excited to see where Presto is extending. So in addition to the projects that Steven talked about, we are also looking at things like Presto on Spark, right? So take the Presto SQL and run it as a Spark job for instance, or running Presto on real-time analytics applications something that we built and contributed from Uber side. So we are all taking it in very different directions, we all have different use cases to support, and that's the exciting thing about the foundation. That it allows us all to work together to get Presto to a bigger and better and more flexible engine. >> You guys mentioned Facebook and I saw on the slide I think Twitter as well. Talk to me about some of the organizations that are leveraging the Presto engine and some of the business benefits. I think Steve you talked about insights, Steven obviously being able to get insights from data is critical for every business these days. >> Yeah, a major, major use case is finding the ad hoc and interactive queries, and being able to drive insights from doing so. And so, as I mentioned there's so much data that's being generated and stored, and to be able to query that data in place, at a, with very, very high performance, meaning that you can get answers back in seconds of time.
That lets you have the interactive ability to drill into data and innovate your business. And so this is fantastic because it's been developed at hyperscalers like Uber that allow you to have open source technology, pick that up, and just download it right from prestodb.io, and then start to run with this and join the community. I think from an open source perspective this project under the governance of Linux Foundation gives you the confidence that it's fully transparent and you'll never see any licensing changes by the Linux Foundation charter. And therefore that means the technology remains free forever without later on limitations occurring, which then would perhaps favor commercialization of any one vendor. That's not the case. So maybe Girish your thoughts on how we've been able to attract industry giants to collaborate, to innovate further, and your thoughts on that. >> Yeah, so one of the interesting things I've seen in the space is that there is a bifurcation of companies in this ecosystem. So there are these large internet scale companies like Facebook, and Uber, and Twitter, which basically want to use something like Presto for their internal use cases. And then there is the second set of companies, enterprise companies like Ahana which basically wanted to take Presto and provide it as a service for other companies to use as an alternative to things like Snowflake and other systems right? So, and the foundation is a great place for both sets of companies to come together and work. The internet scale companies bring in the scale, the reliability, the different kind of ways in which you can challenge the system, optimize it, and so forth, and then companies like Ahana bring in the flexibility and the extensibility. So you can work with different clouds, different storage formats, different engines, and I think it's a great partnership that we can see happening primarily through the foundational spaces.
Which you would be hard pressed to find in a single vendor or a, you know, a single-source system that is there on the market today. >> How long ago was the Presto Foundation initiated? >> It's been over three years now and it's been going strong, we're over a dozen members and it's open to everyone. And it's all governed like the Linux Foundation so we use best practices from that and you can just check it out at prestodb.io where you can get the software, or you can hear about how to join the foundation. So it includes members like Intel, and HPE as well, and we're really excited for new members to come, and contribute in and participate. >> Sounds like you've got good momentum there in the foundation. Steven talk a little bit about the last two years. Have you seen the acceleration in use cases in the number of users as we've been in such an interesting environment where the need for real-time insights is essential for every business initially a few couple of years ago to survive but now to be, to really thrive, is it, have you seen the acceleration in Presto in that timeframe? >> Absolutely, we see there's acceleration of being more data-driven and especially moving to cloud and having more data in the cloud, we think that innovation is happening, digital innovation is happening very fast and Presto is a major enabler of that, again, being able to get, drive insights from the data this is not just your typical business data, it's now getting into really clickstream data, knowing about how customers are operating today, Uber is a great example of all the different types of innovations they can drive, whether it be, you know, knowing in real time what's happening with rides, or offering you a subscription for special deals to use the service more. 
So, you know, Ahana we really love Presto, and we provide a SaaS managed service of the open source and provide free trials, and help people get up to speed that may not have the same type of skills as Uber or Facebook does. And we work with all companies in that way. >> Think about the consumers these days, we're very demanding, right? When I think one of the things that was in short supply during the last two years was patience. And if I think of Uber as a great example, I want to know if I'm asking for a ride I want to know exactly in real time what's coming for me? Where is it now? How many more minutes is it going to take? I mean, that need to fulfill real-time insights is critical across every industry but have you seen anything in the last couple years that's been more leading edge, like e-commerce or retail for example? I'm just curious. >> Girish you want to take that one or? >> Yeah, sure. So I can speak from the Uber point of view. So real-time insights has really exploded as an area, particularly as you mentioned with this just-in-time economy, right? Just to talk about it a little bit from Uber side, so some of the insights that you mentioned about when is your ride coming, and things of that nature, right? Look at it from the driver's point of view who are, now we have Uber Eats, so look at it from the restaurant manager's point of view, right? They also want to know how is their business coming? How many customer orders are coming for instance? What is the conversion rate? And so forth, right? And today these are all insights that are powered by a system which has Presto as a front-end interface at Uber. And these queries run like, you have like tens of thousands of queries every single second, and the queries run in like a second and so forth. So you are really talking about production systems running on top of Presto, production serving systems.
So coming to other use cases like eCommerce, we definitely have seen some of that uptake happen as well, so in the broader community for instance, we have companies like Stripe, and other folks who are also using this stack which is very similar to us based on another open source technology called Pinot, using Presto as an interface. And so we are seeing this whole open data lakehouse move from just being, you know, about interactive analytics to driving all different kinds of analytics. Having anything to do with data and insights in this space. >> Yeah, sounds like the evolution has been kind of on a rocket ship the last couple years. Steven, one more time we're out of time, but can you mention that URL where folks can go to learn more? >> Yeah, prestodb.io and that's the Presto Foundation. And you know, just want to say that we'll be sharing the use case at the Startup Showcase coming up with theCUBE. We're excited about that and really welcome everyone to join the community, it's a real vibrant, expanding community and look forward to seeing you online. >> Sounds great guys. Thank you so much for sharing with us what Presto Foundation is doing, all of the things that it is catalyzing, great stuff, we look forward to hearing that customer use case, thanks for your time. >> Thank you. >> Thanks Lisa, thank you. >> Thanks everyone. >> For Steven and Girish, I'm Lisa Martin, you're watching theCUBE the leader in live tech coverage. (bright music)
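
Editor's illustration (not part of the interview): earlier in the conversation, Presto is described as a single SQL entry point over data kept in different storage and data formats. The Python sketch below is only a toy analogy of that idea and does not use Presto or any Presto API. The sample data, the two "connector" functions, and the name `total_fare_by_city` are all hypothetical, chosen purely for illustration.

```python
import csv
import io
import json
from collections import defaultdict

# Hypothetical sample data held in two different formats.
CSV_RIDES = "city,fare\nSF,12.5\nNYC,20.0\nSF,7.5\n"
JSONL_RIDES = '{"city": "NYC", "fare": 10.0}\n{"city": "SF", "fare": 5.0}\n'


def read_csv(text):
    """Toy 'connector' for CSV text: yields rows as dicts with a float fare."""
    for row in csv.DictReader(io.StringIO(text)):
        yield {"city": row["city"], "fare": float(row["fare"])}


def read_jsonl(text):
    """Toy 'connector' for JSON Lines text: yields rows as dicts."""
    for line in text.splitlines():
        if line.strip():
            yield json.loads(line)


def total_fare_by_city(*sources):
    """One query over every source, roughly:
    SELECT city, SUM(fare) FROM all_sources GROUP BY city
    """
    totals = defaultdict(float)
    for source in sources:
        for row in source:
            totals[row["city"]] += row["fare"]
    return dict(totals)


result = total_fare_by_city(read_csv(CSV_RIDES), read_jsonl(JSONL_RIDES))
print(result)
```

The point of the sketch is only the shape: each format gets its own reader, and the query logic is written once against a common row representation. A real engine adds distribution, optimization, and caching, which are not modeled here.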

Published Date : Mar 24 2022


ENTITIES

Entity	Category	Confidence
Lisa Martin	PERSON	0.99+
Steven	PERSON	0.99+
Steve	PERSON	0.99+
Girish	PERSON	0.99+
Lisa	PERSON	0.99+
Uber	ORGANIZATION	0.99+
Steven Mih	PERSON	0.99+
Presto Foundation	ORGANIZATION	0.99+
Facebook	ORGANIZATION	0.99+
Ahana	ORGANIZATION	0.99+
Linux Foundation	ORGANIZATION	0.99+
CNCF	ORGANIZATION	0.99+
Twitter	ORGANIZATION	0.99+
Intel	ORGANIZATION	0.99+
two guests	QUANTITY	0.99+
HPE	ORGANIZATION	0.99+
Presto	ORGANIZATION	0.99+
second set	QUANTITY	0.99+
both sets	QUANTITY	0.99+
over three years	QUANTITY	0.99+
Ahana	PERSON	0.98+
Kubernetes	ORGANIZATION	0.98+
Spark	TITLE	0.97+
Girish Baliga	PERSON	0.97+
about half a million	QUANTITY	0.97+
today	DATE	0.97+
over a dozen members	QUANTITY	0.96+
one	QUANTITY	0.96+
Presto	TITLE	0.96+
SQL	TITLE	0.95+
single	QUANTITY	0.95+
thousands of machines	QUANTITY	0.94+
every single second	QUANTITY	0.93+
Girish Baliga Presto Foundation	ORGANIZATION	0.92+
prestodb.io	OTHER	0.91+
last couple years	DATE	0.9+
4K	OTHER	0.89+
Startup Showcase	EVENT	0.88+
one vendor	QUANTITY	0.88+

Fangjin Yang, Imply.io | CUBE Conversation

(bright upbeat music) >> Welcome, everyone, to this CUBE Conversation featuring Imply. I'm your host, Lisa Martin. Today, we are excited to be joined by FJ Yang, the co-founder and CEO of Imply. FJ, thanks so much for joining us today. >> Lisa, thank you so much for having me. >> Tell me a little bit about yourself and about Imply. >> Yeah, absolutely. So, I started Imply a couple years ago and before starting the company, I was a technologist. So, I was a software engineer and software developer primarily specializing in distributed systems. And one of the projects I worked on, ultimately became kind of the centerpiece behind Imply. Imply, as a company, is a database company. What we do is we provide developers a powerful tool in order to help them build various types of data analytic applications. We're also an open source company, where the company develops a popular open source project called Apache Druid. >> Got it, so database as a service for modern analytics applications. You're also one of the original authors of Apache Druid. Talk to me, gimme a timeline, Druid's 10-year history or so. What's the big picture? What's been the market evolution that you've seen? >> Yeah, absolutely. So, I moved out to Silicon Valley basically to try and work at a startup, 'cause I was enamored with startups and I thought they were the coolest thing ever. So, at one point, I basically joined the smallest startup I could find. It was a startup called Metamarkets, which actually doesn't exist anymore, it was ultimately acquired by Snapchat a couple years ago. But, I was one of the first employees there. And what we were trying to do at the time, was we were trying to build an analytics application, a user-facing application where people could slice and dice various types of data. At the time, the data sets we were working with were like online advertising, digital advertising data sets which were very large and complex.
And, we really struggled to find a database that could basically power the kind of interactive and user experience that we know we want to provide our end customers. So, what ended up happening was we decided to build our own database and we were a three or five-person shop when we decided to build our own database, and that was Druid. And over time, we saw many other types of companies actually struggle with a similar set of problems, albeit with very different types of use cases and very different types of data sets. And, the Druid community kind of grew and evolved from that. And in my work in engaging with the community, what I saw was a market opportunity and a market gap and that's where Imply formed. >> Let's double click on that. You talked about why you built Druid, the problem you were looking to solve. But, talk to me about the role that Imply has. >> Right. So, Imply is a commercial company. What we do is we build kind of an end-to-end enterprise product around Druid as the core engine. Imply provides deployment, it provides management, it provides security, and it also provides visualization and monitoring pieces around Druid as a core engine. What we aim to do at Imply is really enable developers to build various types of data applications with only the click of a few buttons and interacting with a simple set of APIs. So, the goal is, if you're a developer, you don't have to think about managing the database yourself, you don't have to think about the operational complexity of the database, but instead, what you do is just work with APIs and build your application. >> So, then what gives Druid its superpower? What makes Druid Druid? >> Yeah, so, Druid, the easiest way to think about it, is it's a really fast calculator and it's a very fast calculator for a whole lot of data. So, when you have a whole lot of data and you want to crunch numbers very, very quickly, Druid is very good at doing that.
And, people always ask me this question, which is, what makes Druid special? And I always struggle with it, because it's never just one thing, it's actually layers, upon layers, upon layers of engineering. You start with fundamentals of how you maximally optimize the resources of any hardware. So, how do you maximize storage? How do you maximize compute? And then, there's a lot of optimizations around how do you store the data? How do you access that data in a very fast way once it's stored in order to run computations very quickly? So, unfortunately, there's no silver bullet about Druid, but maybe I can summarize in this way. Druid, it's like a search system, and a data warehouse, and a time series database all mixed together. And, that architecture enables it to be very, very quick. And unfortunately, if you don't know what some of the components I'm talking about are, it's hard to describe where the secret sauce is (chuckling). >> Sometimes you want to keep that secret sauce secret. Talk to me about the overall data space, as we see these days, every company is a data company or if it's not, it needs to be to be successful. Where does Druid fit in the overall data space? Give us that picture of where it fits. >> Yeah, absolutely. So, it's pretty interesting that you see now in the public markets as well as the private markets, some of the hottest unicorns out there are actually data companies. And, I think what people are understanding now for the first time, is just how vast and complex the data space is and also how large the market is as well. So for sure, there's many different components and pieces in the data space, and they oftentimes come together to form what's known as a data stack. So, data stack is basically kind of an architecture that has various systems and each of these systems are designed to do a certain set of things very, very well.
For example, a company that recently went public is a company called Confluent, which mostly catered towards data transport, so getting data from one place to another. They're built around an open source engine called Apache Kafka. Databricks is another mega unicorn that's going to go public pretty soon. And they're built around an open source project called Spark, which is mainly used for data processing. Where we sit is on the data query side. So, what that means is we're a system in which people can store data and then access that data very, very quickly. And there's other systems that do that, but where our bread and butter is, is we're building some sort of application, where you have end users that are clicking buttons in order to get access to data, we're a platform that enables the best end user experience. We return queries very, very quickly with a consistent SLA, we immediately visualize data as soon as it's made available, and then we can support many, many, many concurrent end users to access the system at the same time. >> So, real time. One of the things I think that we learned during the pandemic, one of the many things is that access to real time data, it's no longer a nice to have, it is table stakes for, as I said, every company, these days is a data company. So with how you describe it, how should people think of Druid versus a data warehouse? >> Yeah. So, that's a great question. And obviously, data warehouses have been around since the 70s. In the B2B space, they're among the largest players that kind of exist in enterprise software. So, it's only natural that when you come up with sort of a new analytics database, that people compare it with what they already know, which is data warehouse. So, a lot of how we think about why we're different than data warehouse goes back to how I answered the previous question, and that we're focused right now, really, on powering different types of data applications. 
Data applications are UIs in which people are really accessing and getting insights from data by clicking buttons versus writing more complex SQL queries. And when you click buttons and you get access to data, what you want in terms of an end user experience, is you want answers to questions to come back almost immediately. So you don't want to click a button and then see a spinning dial that goes on for minutes and minutes before an answer comes back. You basically want results to come back immediately. You want that experience no matter what types of queries that you're issuing or how many people are issuing those queries. If you have thousands, if not tens of thousands of people that are trying to access data at the exact same time, you want to give a consistent user experience like Google, which is one of my favorite products. There're millions of people that use Google, and ask questions and they get their answers back immediately. So we try to provide that same experience, but instead of a generic search engine, what we're doing is we're providing a system that basically answers questions on data and users get a very interactive and fast experience when asking questions. And that's something that I think is very different than what data warehouses are primarily specialized in. Data warehouses are really designed to be systems in which people write very large complex SQL queries that might take minutes or hours sometimes to run. But the experience of using a data warehouse to power an application is not a great one. >> So, I'm just curious, FJ, in the last couple of years, with, as I mentioned before the access to real time data no longer a nice to have, but it's something business critical for so many industries, did you see any industries in particular in the recent years that were really prime candidates for what Druid can deliver? >> Yeah, that's a great question.
And you can imagine that the industries that really heavily rely on fast decision making are the ones that are earliest to adopt technologies like this. So, in the security space, and the observability space, as well as working with networking and various forms of backend kind of metrics data, this system has been very popular and it's been popular because people need to triage (indistinct) as they occur, they need to resolve problems, and they also need immediate visibility, as well as very fast queries on data. Another space is online advertising. Online advertising, nowadays is almost entirely programmatic and digital. So, response times are critical in order to make decisions. And that's where Druid was actually born. It was born for advertising before it kind of went everywhere else. We're seeing it more in fraud protection, fraud prevention as well as fraud diagnostics nowadays. We're seeing it in retail as well, which is pretty interesting. And, the goal, of course, is I believe every industry and every vertical needs the capabilities that we provide. So hopefully, we see a whole lot more use cases in the near future. >> Right, it's absolutely horizontal these days. So, 10-year history, you've got a community of thousands, what's the future of Druid? What do you see when you open the crystal ball and look now down the 12 months, 18 months road? >> Yeah. So, I think as a technologist, your goal as the technologist, at least for me, is to try and create technology that has as much applicability as possible and solves problems for as many people as possible. That's always the way I think about it. So, I want to do good engineering and I want to build good systems. And I think what the hallmark of a really good system is you can solve all different types of problems and condense all these different problems, actually into the same set of models and the same set of principles. 
And, a thing that makes me most excited about Druid is the many, many different industries that it's found value and the many different use cases it's found value. So, if I were to give a 30,000-foot roadmap, that's what we're trying to do with the next generation of Druid. We're actually doing a pretty major engine upgrade right now, and a pretty major overhaul of the entire system. And the goal of that is to take all the learnings that we've had over the last decade and to create something new that can solve an expanded set of problems that we've heard from the community and from other places as well. >> Excellent. FJ, exciting work that you've done the last 10 years. Congratulations on that. Looking forward to the roadmap that you talked about. Thanks for sharing what Druid is, the Imply connection, and all the different use cases where it applies. We appreciate your insights. >> Appreciate you having me on the show. Thank you very much. >> My pleasure. For FJ Yang, I'm Lisa Martin. You're watching this CUBE Conversation, the leader in live tech enterprise coverage. (bright upbeat music)
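
Editor's illustration (not part of the interview): FJ describes the original use case as letting people "slice and dice" large event data sets and get answers back immediately, and likens Druid to a fast calculator with a time series flavor. The Python sketch below is a toy version of one such query, counting clicks per hour with an optional filter on a dimension. It does not use Druid or any Druid API; the event tuples and the name `clicks_per_hour` are hypothetical and exist only for this example.

```python
from collections import Counter
from datetime import datetime

# Hypothetical advertising-style events: (timestamp, country, event type).
EVENTS = [
    ("2022-03-23T10:05:00", "US", "click"),
    ("2022-03-23T10:40:00", "US", "impression"),
    ("2022-03-23T11:15:00", "DE", "click"),
    ("2022-03-23T11:20:00", "US", "click"),
]


def clicks_per_hour(events, country=None):
    """Count click events per hour bucket, optionally filtered by country."""
    counts = Counter()
    for timestamp, event_country, kind in events:
        if kind != "click":
            continue
        if country is not None and event_country != country:
            continue
        # Truncate the timestamp to the start of its hour (the time "slice").
        hour = datetime.fromisoformat(timestamp).replace(minute=0, second=0)
        counts[hour.isoformat()] += 1
    return dict(counts)


all_clicks = clicks_per_hour(EVENTS)
us_clicks = clicks_per_hour(EVENTS, country="US")
print(all_clicks)
print(us_clicks)
```

This scans every event on each call, which is exactly what does not scale. The engineering FJ alludes to (storage layout, indexing, optimized compute) is about answering the same kind of question quickly over very large data and many concurrent users, and none of that is modeled here.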

Published Date : Mar 23 2022


ENTITIES

Entity	Category	Confidence
Lisa Martin	PERSON	0.99+
thousands	QUANTITY	0.99+
Silicon Valley	LOCATION	0.99+
Lisa	PERSON	0.99+
Snapchat	ORGANIZATION	0.99+
10-year	QUANTITY	0.99+
18 months	QUANTITY	0.99+
FJ Yang	PERSON	0.99+
three	QUANTITY	0.99+
Imply	ORGANIZATION	0.99+
Confluent	ORGANIZATION	0.99+
12 months	QUANTITY	0.99+
30,000 foot	QUANTITY	0.99+
Druid	TITLE	0.99+
each	QUANTITY	0.99+
one	QUANTITY	0.99+
Fangjin Yang	PERSON	0.99+
first time	QUANTITY	0.98+
Today	DATE	0.98+
Google	ORGANIZATION	0.98+
today	DATE	0.98+
millions of people	QUANTITY	0.98+
One	QUANTITY	0.98+
Imply.io	ORGANIZATION	0.97+
Metamarkets	ORGANIZATION	0.96+
five-person	QUANTITY	0.96+
first employees	QUANTITY	0.94+
tens of thousands of people	QUANTITY	0.94+
pandemic	EVENT	0.94+
last couple of years	DATE	0.91+
FJ	PERSON	0.91+
70s	DATE	0.89+
one thing	QUANTITY	0.89+
Databricks	ORGANIZATION	0.88+
one point	QUANTITY	0.87+
Druid	PERSON	0.84+
couple years ago	DATE	0.81+
last decade	DATE	0.75+
Apache Druid	ORGANIZATION	0.73+
Conversation	EVENT	0.73+
Apache	ORGANIZATION	0.72+
last 10 years	DATE	0.72+
double	QUANTITY	0.69+
Spark	TITLE	0.66+
my favorite products	QUANTITY	0.62+
CUBE Conversation	TITLE	0.58+
minutes	QUANTITY	0.54+
minute	QUANTITY	0.51+
Kafka	TITLE	0.41+
CUBE Conversation	EVENT	0.31+

Analyst Predictions 2022: The Future of Data Management

[Music] >> In the 2010s, organizations became keenly aware that data would become the key ingredient in driving competitive advantage, differentiation, and growth. But to this day, putting data to work remains a difficult challenge for many, if not most, organizations. Now, as the cloud matures, it has become a game changer for data practitioners by making cheap storage and massive processing power readily accessible. We've also seen better tooling in the form of data workflows, streaming, machine intelligence, AI, developer tools, security, observability, automation, new databases, and the like. These innovations, they accelerate data proficiency, but at the same time, they add complexity for practitioners. Data lakes, data hubs, data warehouses, data marts, data fabrics, data meshes, data catalogs, data oceans are forming, they're evolving and exploding onto the scene. So in an effort to bring perspective to the sea of optionality, we've brought together the brightest minds in the data analyst community to discuss how data management is morphing and what practitioners should expect in 2022 and beyond. Hello everyone, my name is Dave Vellante with theCUBE, and I'd like to welcome you to a special CUBE presentation, Analyst Predictions 2022: The Future of Data Management. We've gathered six of the best analysts in data and data management who are going to present and discuss their top predictions and trends for 2022 in the first half of this decade. Let me introduce our six power panelists. Sanjeev Mohan is former Gartner analyst and principal at SanjMo. Tony Baer is principal at dbInsight. Carl Olofson is well-known research vice president with IDC. Dave Menninger is senior vice president and research director at Ventana Research. Brad Shimmin, chief analyst at AI platforms, analytics and data management at Omdia. And Doug Henschen, vice president and principal analyst at Constellation Research. Gentlemen, welcome to the program, and thanks for coming on theCUBE today. >> Great to be here. >> Thank you. >> All right, here's the
format we're going to use. I, as moderator, am going to call on each analyst separately, who then will deliver their prediction or mega trend, and then in the interest of time management and pace, two analysts will have the opportunity to comment. If we have more time, we'll elongate it, but let's get started right away. Sanjeev Mohan, please kick it off. You want to talk about governance, go ahead, sir. >> Thank you, Dave. I believe that data governance, which we've been talking about for many years, is now not only going to be mainstream, it's going to be table stakes. And all the things that you mentioned, you know, with data oceans, data lakes, lakehouses, data fabric, meshes, the common glue is metadata. If we don't understand what data we have and we are governing it, there is no way we can manage it. So we saw Informatica went public last year after a hiatus of six years. I'm predicting that this year we see some more companies go public. My bet is on Collibra most likely, and maybe Alation we'll see go public this year. I'm also predicting that the scope of data governance is going to expand beyond just data. It's not just data and reports. We are going to see more transformations like Spark jobs, Python, even Airflow. We're going to see more of streaming data, so from Kafka Schema Registry, for example. We will see AI models become part of this whole governance suite. So the governance suite is going to be very comprehensive, very detailed lineage, impact analysis, and then even expand into data quality. We already seen that happen with some of the tools where they are buying these smaller companies and bringing in data quality monitoring and integrating it with metadata management, data catalogs, also data access governance. So what we are going to see is that once the data governance platforms become the key entry point into these modern architectures, I'm predicting that the usage, the number of users of a data catalog is going to exceed that of a BI tool. That will take time, and we
already seen that trajectory. Right now if you look at BI tools, I would say there are 100 users to a BI tool to one data catalog, and I see that evening out over a period of time. And at some point data catalogs will really become, you know, the main way for us to access data. Data catalog will help us visualize data, but if we want to do more in-depth analysis, it'll be the jumping-off point into the BI tool, the data science tool, and that is the journey I see for the data governance products. >> Excellent, thank you. Some comments, maybe Doug, a lot of things to weigh in on there, maybe you could comment. >> Yeah, Sanjeev, I think you're spot on a lot of the trends. The one disagreement, I think it's really still far from mainstream. As you say, we've been talking about this for years. It's like God, motherhood, apple pie, everyone agrees it's important, but too few organizations are really practicing good governance because it's hard and because the incentives have been lacking. I think one thing that deserves mention in this context is ESG mandates and guidelines. These are environmental, social and governance regs and guidelines. We've seen the environmental regs and guidelines imposed in industries, particularly the carbon intensive industries. We've seen the social mandates, particularly diversity, imposed on suppliers by companies that are leading on this topic. We've seen governance guidelines now being imposed by banks and investors. So these ESGs are presenting new carrots and sticks, and it's going to demand more solid data, it's going to demand more detailed reporting and solid reporting, tighter governance. But we're still far from mainstream adoption. We have a lot of, you know, best of breed niche players in the space. I think the signs that it's going to be more mainstream are starting with things like Azure Purview, Google Dataplex. The big cloud platform players seem to be upping the ante and starting to address governance.
>> Excellent, thank you, Doug. Brad, I wonder if you could chime in as well. >> Yeah, I would love to be a believer in data catalogs, but to Doug's point, I think that it's going to take some more pressure for that to happen. I recall metadata being something every enterprise thought they were going to get under control when we were working on service oriented architecture back in the 90s, and that didn't happen quite the way we anticipated. And to Sanjeev's point, it's because it is really complex and really difficult to do. My hope is that, you know, we won't sort of, how do we put this, fade out into this nebulous nebula of domain catalogs that are specific to individual use cases, like Purview for getting data quality right, or like data governance and cyber security, and instead we have some tooling that can actually be adaptive to gather metadata to create something I know is important to you, Sanjeev, and that is this idea of observability. If you can get enough metadata without moving your data around, but understanding the entirety of a system that's running on this data, you can do a lot to help with the governance that Doug is talking about. >> So I just want to add that, you know, data governance, like many other initiatives, did not succeed. Even AI went into an AI winter, but that's a different topic. But a lot of these things did not succeed because, to your point, the incentives were not there. I remember when Sarbanes-Oxley had come into the scene, if a bank did not do Sarbanes-Oxley, they were very happy to pay a million dollar fine. That was like, you know, pocket change for them instead of doing the right thing. But I think the stakes are much higher now. With GDPR, the floodgates opened. Now, you know, California has CCPA, but even CCPA is being outdated with CPRA, which is much more GDPR-like. So we are very rapidly entering a space where pretty much every major country in the world is coming up with its own compliance regulatory requirements.
data residency is becoming really important and and i i think we are going to reach a stage where uh it won't be optional anymore so whether we like it or not and i think the reason data catalogs were not successful in the past is because we did not have the right focus on adoption we were focused on features and these features were disconnected very hard for business to adopt these are built by it people for it departments to to take a look at technical metadata not business metadata today the tables have turned cdos are driving this uh initiative uh regulatory compliances are beating down hard so i think the time might be right yeah so guys we have to move on here and uh but there's some some real meat on the bone here sanjeev i like the fact that you called out collibra and alation so we can look back a year from now and say okay he made the call he stuck it and then the ratio of bi tools to data catalogs that's another sort of measurement that we can we can take even though some skepticism there that's something that we can watch and i wonder if someday if we'll have more metadata than data but i want to move to tony baer you want to talk about data mesh and speaking you know coming off of governance i mean wow you know the whole concept of data mesh is decentralized data and then governance becomes you know a nightmare there but take it away tony we'll put it this way um data mesh you know the the idea at least as proposed by thoughtworks um you know basically was unleashed a couple years ago and the press has been almost uniformly almost uncritical um a good reason for that is for all the problems that basically that sanjeev and doug and brad were just you know were just speaking about which is that we have all this data out there and we don't know what to do about it um now that's not a new problem that was a problem we had enterprise data warehouses it was a problem when we had our hadoop data clusters it's even more of a problem now the data's 
out in the cloud where the data is not only your data like is not only s3 it's all over the place and it's also including streaming which i know we'll be talking about later so the data mesh was a response to that the idea of that we need to debate you know who are the folks that really know best about governance is the domain experts so it was basically data mesh was an architectural pattern and a process my prediction for this year is that data mesh is going to hit cold hard reality because if you if you do a google search um basically the the published work the articles and databases have been largely you know pretty uncritical um so far you know that you know basically lauding it as being a very revolutionary new idea i don't think it's that revolutionary because we've talked about ideas like this brad and i you and i met years ago when we were talking about soa and decentralizing all of this was at the application level now we're talking about at the data level and now we have microservices so there's this thought of oh if we manage if we're apps in cloud native through microservices why don't we think of data in the same way um my sense this year is that you know this and this has been a very active search if you look at google search trends is that now companies are going to you know enterprises are going to look at this seriously and as they look at seriously it's going to attract its first real hard scrutiny it's going to attract its first backlash that's not necessarily a bad thing it means that it's being taken seriously um the reason why i think that that uh that it will you'll start to see basically the cold hard light of day shine on data mesh is that it's still a work in progress you know this idea is basically a couple years old and there's still some pretty major gaps um the biggest gap is in is in the area of federated governance now federated governance itself is not a new issue uh federated governance position we're trying to figure out 
like how can we basically strike the balance between getting let's say you know between basically consistent enterprise policy consistent enterprise governance but yet the groups that understand the data know how to basically you know that you know how do we basically sort of balance the two there's a huge there's a huge gap there in practice and knowledge um also to a lesser extent there's a technology gap which is basically in the self-service technologies that will help teams essentially govern data you know basically through the full life cycle from developed from selecting the data from you know building the other pipelines from determining your access control determining looking at quality looking at basically whether data is fresh or whether or not it's trending off course so my prediction is that it will really receive the first harsh scrutiny this year you are going to see some organization enterprises declare premature victory when they've uh when they build some federated query implementations you're going to see vendors start to data mesh wash their products anybody in the data management space they're going to say that whether it's basically a pipelining tool whether it's basically elt whether it's a catalog um or a federated query tool they're all going to be like you know basically promoting the fact of how they support this hopefully nobody is going to call themselves a data mesh tool because data mesh is not a technology we're going to see one other thing come out of this and this harks back to the metadata that sanjeev was talking about and the catalogs that he was talking about which is that there's going to be a new focus on a renewed focus on metadata and i think that's going to spur interest in data fabrics now data fabrics are pretty vaguely defined but if we just take the most elemental definition which is a common metadata back plane i think that if anybody is going to get serious about data mesh they need to look at a data fabric 
because we all at the end of the day need to speak you know need to read from the same sheet of music so thank you tony dave dave menninger i mean one of the things that people like about data mesh is it pretty crisply articulates some of the flaws in today's organizational approaches to data what are your thoughts on this well i think we have to start by defining data mesh right the the term is already getting corrupted right tony said it's going to see the cold hard uh light of day and there's a problem right now that there are a number of overlapping terms that are similar but not identical so we've got data virtualization data fabric excuse me for a second sorry about that data virtualization data fabric uh uh data federation right uh so i i think that it's not really clear what each vendor means by these terms i see data mesh and data fabric becoming quite popular i've i've interpreted data mesh as referring primarily to the governance aspects as originally you know intended and specified but that's not the way i see vendors using it i see vendors using it much more to mean data fabric and data virtualization so i'm going to comment on the group of those things i think the group of those things is going to happen they're going to happen they're going to become more robust our research suggests that a quarter of organizations are already using virtualized access to their data lakes and another half so a total of three quarters will eventually be accessing their data lakes using some sort of virtualized access again whether you define it as mesh or fabric or virtualization isn't really the point here but this notion that there are different elements of data metadata and governance within an organization that all need to be managed collectively the interesting thing is when you look at the satisfaction rates of those organizations using virtualization versus those that are not it's almost double 68 percent of organizations i'm i'm sorry um 79 percent of organizations that were using 
virtualized access express satisfaction with their access to the data lake only 39 percent expressed satisfaction if they weren't using virtualized access so thank you uh dave uh sanjeev we just got about a couple minutes on this topic but i know you're speaking or maybe you've spoken already on a panel with zhamak dehghani who sort of invented the concept governance obviously is a big sticking point but what are your thoughts on this you are mute so my message to zhamak and uh and to the community is uh as opposed to what dave said let's not define it we spent the whole year defining it there are four principles domain product data infrastructure and governance let's take it to the next level i get a lot of questions on what is the difference between data fabric and data mesh and i'm like i can't compare the two because data mesh is a business concept data fabric is a data integration pattern how do you define how do you compare the two you have to bring data mesh level down so to tony's point i'm on a warpath in 2022 to take it down to what does a data product look like how do we handle shared data across domains and govern it and i think we are going to see more of that in 2022 is operationalization of data mesh i think we could have a whole hour on this topic couldn't we uh maybe we should do that uh but let's go to let's move to carl so carl you're a database guy you've been around that that block for a while now you want to talk about graph databases bring it on oh yeah okay thanks so i regard graph database as basically the next truly revolutionary database management technology i'm looking forward to for the graph database market which of course we haven't defined yet so obviously i have a little wiggle room in what i'm about to say but that this market will grow by about 600 percent over the next 10 years now 10 years is a long time but over the next five years we expect to see gradual growth as people start to learn how to use it problem isn't that it's used the 
problem is not that it's not useful is that people don't know how to use it so let me explain before i go any further what a graph database is because some of the folks on the call may not may not know what it is a graph database organizes data according to a mathematical structure called a graph a graph has elements called nodes and edges so a data element drops into a node the nodes are connected by edges the edges connect one node to another node combinations of edges create structures that you can analyze to determine how things are related in some cases the nodes and edges can have properties attached to them which add additional informative material that makes it richer that's called a property graph okay there are two principal use cases for graph databases there's there's semantic graphs which are used to break down human language text uh into the semantic structures then you can search it organize it and and and answer complicated questions a lot of ai is aimed at semantic graphs another kind is the property graph that i just mentioned which has a dazzling number of use cases i want to just point out is as i talk about this people are probably wondering well we have relational databases isn't that good enough okay so a relational database defines it uses um it supports what i call definitional relationships that means you define the relationships in a fixed structure the data drops into that structure there's a value foreign key value that relates one table to another and that value is fixed you don't change it if you change it the database becomes unstable it's not clear what you're looking at in a graph database the system is designed to handle change so that it can reflect the true state of the things that it's being used to track so um let me just give you some examples of use cases for this um they include uh entity resolution data lineage uh um social media analysis customer 360 fraud prevention there's cyber security there's strong supply 
chain is a big one actually there's explainable ai and this is going to become important too because a lot of people are adopting ai but they want a system after the fact to say how did the ai system come to that conclusion how did it make that recommendation right now we don't have really good ways of tracking that okay machine machine learning in general um social network i already mentioned that and then we've got oh gosh we've got data governance data compliance risk management we've got recommendation we've got personalization anti-money laundering that's another big one identity and access management network and i.t operations is already becoming a key one where you actually have mapped out your operation your your you know whatever it is your data center and you you can track what's going on as things happen there root cause analysis fraud detection is a huge one a number of major credit card companies use graph databases for fraud detection risk analysis tracking and tracing churn analysis next best action what-if analysis impact analysis entity resolution and i would add one other thing or just a few other things to this list metadata management so sanjeev here you go this is your engine okay because i was in metadata management for quite a while in my past life and one of the things i found was that none of the data management technologies that were available to us could efficiently handle metadata because of the kinds of structures that result from it but graphs can okay graphs can do things like say this term in this context means this but in that context it means that okay things like that and in fact uh logistics management supply chain it also because it handles recursive relationships by recursive relationships i mean objects that own other objects that are of the same type you can do things like bill of materials you know so like parts explosion you can do an hr analysis who reports to whom how many levels up the chain and that kind of thing you 
can do that with relational databases but yes it takes a lot of programming in fact you can do almost any of these things with relational databases but the problem is you have to program it it's not it's not supported in the database and whenever you have to program something that means you can't trace it you can't define it you can't publish it in terms of its functionality and it's really really hard to maintain over time so carl thank you i wonder if we could bring brad in i mean brad i'm sitting there wondering okay is this incremental to the market is it disruptive and replaceable what are your thoughts on this space it's already disrupted the market i mean like carl said go to any bank and ask them are you using graph databases to do to get fraud detection under control and they'll say absolutely that's the only way to solve this problem and it is frankly um and it's the only way to solve a lot of the problems that carl mentioned and that is i think its achilles heel in some ways because you know it's like finding the best way to cross the seven bridges of konigsberg you know it's always going to kind of be tied to those use cases because it's really special and it's really unique and because it's special and it's unique uh it it still unfortunately kind of stands apart from the rest of the community that's building let's say ai outcomes as the great great example here the graph databases and ai as carl mentioned are like chocolate and peanut butter but technologically they don't know how to talk to one another they're completely different um and you know it's you can't just stand up sql and query them you've got to to learn um yeah what is that carl sparql or uh sparql uh yeah thank you uh to actually get to the data in there and if you're gonna scale that data that graph database especially a property graph if you're gonna do something really complex like try to understand uh you know all of the metadata in your organization you might just end 
up with you know a graph database winter like we had the ai winter simply because you run out of performance to make the thing happen so i i think it's already disrupted but we we need to like treat it like a first-class citizen in in the data analytics and ai community we need to bring it into the fold we need to equip it with the tools it needs to do the magic it does and to do it not just for specialized use cases but for everything because i i'm with carl i i think it's absolutely revolutionary so i had also identified the principal achilles heel of the technology which is scaling now when these when these things get large and complex enough that they spill over what a single server can handle you start to have difficulties because the relationships span things that have to be resolved over a network and then you get network latency and that slows the system down so that's still a problem to be solved sanjeev any quick thoughts on this i mean i think metadata on the on the on the word cloud is going to be the the largest font uh but what are your thoughts here i want to like step away so people don't you know associate me with only meta data so i want to talk about something a little bit slightly different uh dbengines.com has done an amazing job i think almost everyone knows that they chronicle all the major databases that are in use today in january of 2022 there are 381 databases on its ranked list of databases the largest category is rdbms the second largest category is actually divided into two property graphs and rdf graphs these two together make up the second largest number of databases so talking about achilles heels here this is a problem the problem is that there's so many graph databases to choose from they come in different shapes and forms uh to brad's point there's so many query languages in rdbms is sql end of the story here we've got cypher we've got gremlin we've got gql and then your proprietary languages so i think there's a 
lot of disparity in this space but excellent all excellent points sanjeev i must say and that is a problem the languages need to be sorted and standardized and it needs people need to have a road map as to what they can do with it because as you say you can do so many things and so many of those things are unrelated that you sort of say well what do we use this for i'm reminded of the saying i learned a bunch of years ago when somebody said that the digital computer is the only tool man has ever devised that has no particular purpose all right guys we gotta we gotta move on to dave uh menninger uh we've heard about streaming uh your prediction is in that realm so please take it away sure so i like to say that historical databases are to become a thing of the past but i don't mean that they're going to go away that's not my point i mean we need historical databases but streaming data is going to become the default way in which we operate with data so in the next say three to five years i would expect the data platforms and and we're using the term data platforms to represent the evolution of databases and data lakes that the data platforms will incorporate these streaming capabilities we're going to process data as it streams into an organization and then it's going to roll off into historical databases so historical databases don't go away but they become a thing of the past they store the data that occurred previously and as data is occurring we're going to be processing it we're going to be analyzing we're going to be acting on it i mean we we only ever ended up with historical databases because we were limited by the technology that was available to us data doesn't occur in batches but we processed it in batches because that was the best we could do and it wasn't bad and we've continued to improve and we've improved and we've improved but streaming data today is still the exception it's not the rule right there's there are projects within organizations that deal 
with streaming data but it's not the default way in which we deal with data yet and so that that's my prediction is that this is going to change we're going to have um streaming data be the default way in which we deal with data and and how you label it what you call it you know maybe these databases and data platforms just evolve to be able to handle it but we're going to deal with data in a different way and our research shows that already about half of the participants in our analytics and data benchmark research are using streaming data you know another third are planning to use streaming technologies so that gets us to about eight out of ten organizations need to use this technology that doesn't mean they have to use it throughout the whole organization but but it's pretty widespread in its use today and has continued to grow if you think about the consumerization of i.t we've all been conditioned to expect immediate access to information immediate responsiveness you know we want to know if an uh item is on the shelf at our local retail store and we can go in and pick it up right now you know that's the world we live in and that's spilling over into the enterprise i.t world where we have to provide those same types of capabilities um so that's my prediction historical databases become a thing of the past streaming data becomes the default way in which we we operate with data all right thank you david well so what what say you uh carl a guy who's followed historical databases for a long time well one thing actually every database is historical because as soon as you put data in it it's now history it's no longer it no longer reflects the present state of things but even if that history is only a millisecond old it's still history but um i would say i mean i know you're trying to be a little bit provocative in saying this dave because you know as well as i do that people still need to do their taxes they still need to do accounting they still need to run 
general ledger programs and things like that that all involves historical data that's not going to go away unless you want to go to jail so you're going to have to deal with that but as far as the leading edge functionality i'm totally with you on that and i'm just you know i'm just kind of wondering um if this chain if this requires a change in the way that we perceive applications in order to truly be manifested and rethinking the way applications work um saying that uh an application should respond instantly as soon as the state of things changes what do you say about that i i think that's true i think we do have to think about things differently that's you know it's not the way we design systems in the past uh we're seeing more and more systems designed that way but again it's not the default and and agree 100 percent with you that we do need historical databases you know that that's clear and even some of those historical databases will be used in conjunction with the streaming data right so absolutely i mean you know let's take the data warehouse example where you're using the data warehouse as context and the streaming data as the present you're saying here's a sequence of things that's happening right now have we seen that sequence before and where what what does that pattern look like in past situations and can we learn from that so tony baer i wonder if you could comment i mean if you when you think about you know real-time inferencing at the edge for instance which is something that a lot of people talk about um a lot of what we're discussing here in this segment looks like it's got great potential what are your thoughts yeah well i mean i think you nailed it right you know you hit it right on the head there which is that i think a key what i'm seeing is that essentially and basically i'm going to split this one down the middle is i don't see that basically streaming is the default what i see is streaming and basically and transaction databases um and 
analytics data you know data warehouses data lakes whatever are converging and what allows us technically to converge is cloud native architecture where you can basically distribute things so you could have you can have a node here that's doing the real-time processing that's also doing it and this is what your leads in we're maybe doing some of that real-time predictive analytics to take a look at well look we're looking at this customer journey what's happening with you know you know with with what the customer is doing right now and this is correlated with what other customers are doing so what i so the thing is that in the cloud you can basically partition this and because of basically you know the speed of the infrastructure um that you can basically bring these together and or and so and kind of orchestrate them sort of loosely coupled manner the other part is that the use cases are demanding and this is part that goes back to what dave is saying is that you know when you look at customer 360 when you look at let's say smart you know smart utility grids when you look at any type of operational problem it has a real-time component and it has a historical component and having predictives and so like you know you know my sense here is that there that technically we can bring this together through the cloud and i think the use case is that is that we we can apply some some real-time sort of you know predictive analytics on these streams and feed this into the transactions so that when we make a decision in terms of what to do as a result of a transaction we have this real time you know input sanjeev did you have a comment yeah i was just going to say that to this point you know we have to think of streaming very different because in the historical databases we used to bring the data and store the data and then we used to run rules on top uh aggregations and all but in case of streaming the mindset changes because the rules normally the inference all of that is 
fixed but the data is constantly changing so it's a completely reverse way of thinking of uh and building applications on top of that so dave menninger there seemed to be some disagreement about the default or now what kind of time frame are you are you thinking about is this end of decade it becomes the default what would you pin i i think around you know between between five to ten years i think this becomes the reality um i think you know it'll be more and more common between now and then but it becomes the default and i also want sanjeev at some point maybe in one of our subsequent conversations we need to talk about governing streaming data because that's a whole other set of challenges we've also talked about it rather in a two dimensions historical and streaming and there's lots of low latency micro batch sub second that's not quite streaming but in many cases it's fast enough and we're seeing a lot of adoption of near real time not quite real time as uh good enough for most for many applications because nobody's really taking the hardware dimension of this information like how do we that'll just happen carl so near real time maybe before you lose the customer however you define that right okay um let's move on to brad brad you want to talk about automation ai uh the the the pipeline people feel like hey we can just automate everything what's your prediction yeah uh i'm i'm an ai aficionado so apologies in advance for that but uh you know um i i think that um we've been seeing automation at play within ai for some time now and it's helped us do do a lot of things for especially for practitioners that are building ai outcomes in the enterprise uh it's it's helped them to fill skills gaps it's helped them to speed development and it's helped them to to actually make ai better uh because it you know in some ways provides some swim lanes and and for example with technologies like automl and can auto document and create that sort of transparency that that 
we talked about a little bit earlier um but i i think it's there's an interesting kind of convergence happening with this idea of automation um and and that is that uh we've had the automation that started happening for practitioners it's it's trying to move outside of the traditional bounds of things like i'm just trying to get my features i'm just trying to pick the right algorithm i'm just trying to build the right model uh and it's expanding across that full life cycle of building an ai outcome to start at the very beginning of data and to then continue on to the end which is this continuous delivery and continuous uh automation of of that outcome to make sure it's right and it hasn't drifted and stuff like that and because of that because it's become kind of powerful we're starting to to actually see this weird thing happen where the practitioners are starting to converge with the users and that is to say that okay if i'm in tableau right now i can stand up salesforce einstein discovery and it will automatically create a nice predictive algorithm for me um given the data that i that i pull in um but what's starting to happen and we're seeing this from the the the companies that create business software so salesforce oracle sap and others is that they're starting to actually use these same ideas and a lot of deep learning to to basically stand up these out of the box flip a switch and you've got an ai outcome at the ready for business users and um i i'm very much you know i think that that's that's the way that it's going to go and what it means is that ai is is slowly disappearing uh and i don't think that's a bad thing i think if anything what we're going to see in 2022 and maybe into 2023 is this sort of rush to to put this idea of disappearing ai into practice and have as many of these solutions in the enterprise as possible you can see like for example sap is going to roll out this quarter this thing called adaptive recommendation services which which 
basically is a cold start ai outcome that can work across a whole bunch of different vertical markets and use cases it's just a recommendation engine for whatever you need it to do in the line of business so basically you're you're an sap user you look up to turn on your software one day and you're a sales professional let's say and suddenly you have a recommendation for customer churn it's going that's great well i i don't know i i think that's terrifying in some ways i think it is the future that ai is going to disappear like that but i am absolutely terrified of it because um i i think that what it what it really does is it calls attention to a lot of the issues that we already see around ai um specific to this idea of what what we like to call at omdia responsible ai which is you know how do you build an ai outcome that is free of bias that is inclusive that is fair that is safe that is secure that it's auditable etc etc etc etc that takes some a lot of work to do and so if you imagine a customer that that's just a salesforce customer let's say and they're turning on einstein discovery within their sales software you need some guidance to make sure that when you flip that switch that the outcome you're going to get is correct and that's that's going to take some work and so i think we're going to see this let's roll this out and suddenly there's going to be a lot of a lot of problems a lot of pushback uh that we're going to see and some of that's going to come from gdpr and others that sanjeev was mentioning earlier a lot of it's going to come from internal csr requirements within companies that are saying hey hey whoa hold up we can't do this all at once let's take the slow route let's make ai automated in a smart way and that's going to take time yeah so a couple predictions there that i heard i mean ai essentially disappears it becomes invisible maybe if i can restate that and then if if i understand it correctly brad you're saying there's a backlash in 
the near term people can say oh slow down let's automate what we can those attributes that you talked about are non trivial to achieve is that why you're a bit of a skeptic yeah i think that we don't have any sort of standards that companies can look to and understand and we certainly within these companies especially those that haven't already stood up an internal data science team they don't have the knowledge to understand what that when they flip that switch for an automated ai outcome that it's it's gonna do what they think it's gonna do and so we need some sort of standard standard methodology and practice best practices that every company that's going to consume this invisible ai can make use of and one of the things that you know is sort of started that google kicked off a few years back that's picking up some momentum and the companies i just mentioned are starting to use it is this idea of model cards where at least you have some transparency about what these things are doing you know so like for the sap example we know for example that it's convolutional neural network with a long short-term memory model that it's using we know that it only works on roman english uh and therefore me as a consumer can say oh well i know that i need to do this internationally so i should not just turn this on today great thank you carl can you add anything any context here yeah we've talked about some of the things brad mentioned here at idc in the our future of intelligence group regarding in particular the moral and legal implications of having a fully automated you know ai uh driven system uh because we already know and we've seen that ai systems are biased by the data that they get right so if if they get data that pushes them in a certain direction i think there was a story last week about an hr system that was uh that was recommending promotions for white people over black people because in the past um you know white people were promoted and and more productive than 
black people, but it had no context as to why, which is, you know, because black people were being historically discriminated against. But the system doesn't know that. So, you know, you have to be aware of that. And I think that at the very least there should be controls when a decision has either a moral or a legal implication. When you really need a human judgment, it could lay out the options for you, but a person actually needs to authorize that action. And I also think that we always will have to be vigilant regarding the kind of data we use to train our systems, to make sure that it doesn't introduce unintended biases. And to some extent they always will, so we'll always be chasing after them. >> Absolutely, Carl. Yeah, I think that what you have to bear in mind as a consumer of AI is that it is a reflection of us, and we are a very flawed species. And so if you look at all the really fantastic, magical-looking supermodels we see, like GPT-3, and 4 that's coming out, they're xenophobic and hateful, because the data they're built upon, and the algorithms, and the people that build them, are us. So AI is a reflection of us. We need to keep that in mind. >> Yeah, the AI's biased because humans are biased. All right, great. Okay, let's move on. Doug Henschen, you know, a lot of people said that data lake, that term's not going to live on, but it appears to have some legs here. You want to talk about lake house, bring it on. >> Yes, I do. My prediction is that lake house, and this idea of a combined data warehouse and data lake platform, is going to emerge as the dominant data management offering. I say offering; that doesn't mean it's going to be the dominant thing that organizations have out there, but it's going to be the predominant vendor offering in 2022.
Now, heading into 2021, we already had Cloudera, Databricks, Microsoft, and Snowflake as proponents. In 2021, SAP, Oracle, and several of these fabric, virtualization, and mesh vendors joined the bandwagon. The promise is that you have one platform that manages your structured, unstructured, and semi-structured information, and it addresses both the BI and analytics needs and the data science needs. The real promise there is simplicity and lower cost. But I think end users have to answer a few questions. The first is, does your organization really have a center of data gravity, or is the data highly distributed? Multiple data warehouses, multiple data lakes, on-premises, cloud. If it's very distributed and, you know, you have difficulty consolidating, and that's not really a goal for you, then maybe that single platform is unrealistic and not likely to add value for you. You know, also the fabric and virtualization vendors, the mesh idea, that's where, if you have this highly distributed situation, that might be a better path forward. The second question: if you are looking at one of these lake house offerings, you are looking at consolidating, simplifying, bringing together to a single platform, you have to make sure that it meets both the warehouse need and the data lake need. So you have vendors like Databricks, and Microsoft with Azure Synapse, new really to the data warehouse space, and they're having to prove that these data warehouse capabilities on their platforms can meet the scaling requirements, can meet the user and query concurrency requirements, meet those tight SLAs. And then on the other hand, you have Oracle, SAP, Snowflake, the data warehouse folks, coming into the data science world, and they have to prove that they can manage the unstructured information and meet the needs of the data scientists. I'm seeing a lot of the lake house offerings from the warehouse crowd managing that unstructured information in columns and rows, and some of these vendors, Snowflake in particular, are
really relying on partners for the data science needs. So you've really got to look at a lake house offering and make sure that it meets both the warehouse and the data lake requirement. >> Well, thank you, Doug. Well, Tony, if those two worlds are going to come together, as Doug was saying, the analytics and the data science world, does there need to be some kind of semantic layer in between? I don't know. Weigh in on this topic if you would. >> Oh, didn't we talk about data fabrics before? Common metadata layer. Actually, I'm almost tempted to say let's declare victory and go home, in that this has actually been going on for a while. I actually agree with, you know, much of what Doug is saying there. I remember as far back as, I think it was like 2014, I was doing a study, you know, I was still at Ovum, the predecessor of Omdia, looking at all these specialized databases that were coming up and seeing that, you know, there's overlap at the edges. But yet there was still going to be a reason at the time that you would have, let's say, a document database for JSON, you'd have a relational database for transactions and for data warehouse, and you had basically something at that time that resembled Hadoop for what we'd consider a data lake. Fast forward, and the thing is, what I was saying at the time is that you're seeing basically a blurring, you know, sort of blending at the edges. That's what I was saying about five or six years ago. And the lake house is essentially, you know, the current manifestation of that idea. There is a dichotomy in terms of, you know, it's the old argument: do we centralize this all in a single place, or do we virtualize? And I think it's always going to be a yin and yang. There's never going to be a single silver bullet. I do see that there are also going to be questions, and these are points that Doug raised, you know,
what do you need for your performance characteristics? Do you need, for instance, high concurrency? Do you need the ability to do some very sophisticated joins? Or is your requirement more to be able to distribute your processing as far as possible, to essentially do a kind of brute-force approach? All these approaches are valid based on the use case. I just see that essentially the lake house is the culmination of, it's just a relatively new term introduced by Databricks a couple of years ago. This is the culmination of basically what's been a long-time trend. And what we see in the cloud is that we start seeing data warehouses, as a checkbox item, say, "Hey, we can basically source data in cloud storage, in S3, Azure Blob Store, you know, whatever, as long as it's in certain formats like Parquet or CSV or something like that." I see that as becoming kind of a checkbox item. So to that extent, I think that the lake house, depending on how you define it, is already reality. And in some cases maybe new terminology, but not a whole heck of a lot new under the sun. >> Yeah. And Dave Menninger, I mean, a lot of this, thank you Tony, but a lot of this is going to come down to, you know, vendor marketing, right? Some people try to co-opt the term. We talked about data mesh washing. What are your thoughts on this? >> Yeah, so I used the term data platform earlier, and part of the reason I use that term is that it's more vendor neutral. We've tried to sort of stay out of the vendor terminology patenting world, right? Whether the term lake house is what sticks or not, the concept is certainly going to stick. And we have some data to back it up. About a quarter of organizations that are using data lakes today already incorporate data warehouse functionality into it, so they
consider their data lake and data warehouse one and the same. About a quarter of organizations, a little less, but about a quarter of organizations, feed the data lake from the data warehouse, and about a quarter of organizations feed the data warehouse from the data lake. So it's pretty obvious that three quarters of organizations need to bring this stuff together, right? The need is there, the need is apparent. The technology is going to continue to converge. I like to talk about, you know, you've got data lakes over here at one end, and I'm not going to talk about why people thought data lakes were a bad idea, because they thought you just throw stuff in a server and you ignore it, right? That's not what a data lake is. So you've got data lake people over here, and you've got database people over here, data warehouse people over here. Database vendors are adding data lake capabilities, and data lake vendors are adding data warehouse capabilities. So it's obvious that they're going to meet in the middle. I mean, I think it's like Tony says, I think we should declare victory and go home. >> And so, just a follow-up on that. So are you saying the specialized lake and the specialized warehouse, do they go away? I mean, Tony, data mesh practitioners would say, or advocates would say, well, they could all live as just a node on the mesh. But based on what Dave just said, are we going to see those all morph together? >> Well, number one, as I was saying before, there's always going to be this sort of, you know, centrifugal force or this tug of war between do we centralize the data, do we virtualize? And the fact is, I don't think there's ever going to be any single answer. I think in terms of data mesh, data mesh has nothing to do with how you physically implement the data. You could have a data mesh basically on a data warehouse. It's just that, you know, the difference being that if we use the same, you know, physical data
store, but everybody's logically, you know, basically governing it differently. A data mesh is basically not a technology, it's a process, it's a governance process. So essentially, as I was saying before, this is basically the culmination of a long-time trend. We're essentially seeing a lot of blurring, but there are going to be cases where, for instance, if I need, let's say, (indistinct), or I need high concurrency or something like that, there are certain things that I'm not going to be able to efficiently get out of a data lake, where I'm basically doing a system where I'm just doing really brute-force, very fast file scanning and that type of thing. So I think there always will be some delineations, but I would agree with Dave and with Doug that we are seeing basically a confluence of requirements, that we need to essentially have, you know, the abilities of a data lake and a data warehouse. These need to come together. >> So I think what we're likely to see is organizations look for a converged platform that can handle both sides for their center of data gravity. The mesh and the fabric vendors, the fabric virtualization vendors, they're all on board with the idea of this converged platform, and they're saying, "Hey, we'll handle all the edge cases of the stuff that isn't in that center of data gravity, that is off distributed in a cloud or at a remote location." So you can have that single platform for the center of your data, and then bring in virtualization, mesh, what have you, for reaching out to the distributed data. >> Bingo. As they basically said, people are happy when they virtualize data. >> I think yes, at this point. But to Dave Menninger's point, you know, they are converging. Snowflake has introduced support for unstructured data, so now we are literally splitting hairs here. Now what Databricks is saying is that, aha, but it's
easier to go from data lake to data warehouse than it is from data warehouse to data lake. So I think we're getting into semantics, but we've already seen these two converge. >> So it takes something like AWS, who's got what, 15 data stores. Are they going to have 15 converged data stores? That's going to be interesting to watch. All right guys, I'm going to go down the list and do one word each, and each of the analysts, if you would, just add a very brief sort of course correction for me. So Sanjeev, I mean, governance is going to be, maybe it's the dog that wags the tail now. I mean, it's coming to the fore, all this ransomware stuff, and we really didn't talk much about security. But what's the one word in your prediction that you would leave us with on governance? >> It's going to be mainstream. >> Mainstream, okay. Tony Baer, mesh washing is what I wrote down. That's what we're going to see in 2022, a little reality check. You want to add to that? >> Reality check is, I hope that no vendor, you know, jumps the shark and calls their offering a data mesh product. >> Yeah, let's hope that doesn't happen. If they do, we're going to call them out. Carl, I mean, graph databases, thank you for sharing some, you know, high-growth metrics. I know it's early days, but magic is what I took away from that. It's the magic database. >> Yeah, I've said this to people too. I kind of look at it as a Swiss Army knife of data, because you can pretty much do anything you want with it. It doesn't mean you should. I mean, that's definitely the case, that if you're, you know, managing things that are in a fixed schematic relationship, probably a relational database is a better choice. There are, you know, times when the document database is a better choice. It can handle those things, but it may not be the best choice for that use case. But for a great many, especially the new emerging use cases I listed, it's the best choice. >> Thank you.
And Dave Menninger, thank you, by the way, for bringing the data in. I like how you supported all your comments with some data points. But streaming data becomes the sort of default paradigm, if you will. What would you add? >> Yeah, I would say think fast, right? That's the world we live in. You've got to think fast. >> Fast, love it. And Brad Shimmin, I love it. I mean, on the one hand I was saying, okay, great, I'm afraid I might get disrupted by one of these internet giants who are AI experts, so I'm going to be able to buy instead of build AI. But then again, you know, I've got some real issues. There's a potential backlash there. So give us your bumper sticker. >> Yeah, I would say, going with Dave, think fast and also think slow, to talk about the book that everyone talks about. I would say really that this is all about trust. Trust in the idea of automation and of a transparent, invisible AI across the enterprise, but verify. Verify before you do anything. >> And then Doug Henschen, I mean, look, I think the trend is your friend here on this prediction, with lake house really becoming dominant. I liked the way you set up that notion of, you know, the data warehouse folks coming at it from the analytics perspective, but then you've got the data science worlds coming together. I still feel as though there's this piece in the middle that we're missing, but your final thoughts, we'll give you the last word. >> Well, I think the idea of consolidation and simplification always prevails. That's why the appeal of a single platform is going to be there. We've already seen that with, you know, Hadoop platforms moving toward cloud, moving toward object storage, and object storage becoming really the common storage point, whether it's a lake or a warehouse. And that second point, I think ESG mandates are going to come in alongside GDPR and things like that to up the ante for good governance. >> Yeah, thank you for calling that out. Okay folks, hey, that's all the
time that we have here. Your experience and depth of understanding on these key issues in data and data management were really on point, and they were on display today. I want to thank you for your contributions. Really appreciate your time. >> Enjoyed it. >> Thank you. >> Now, in addition to this video, we're going to be making available transcripts of the discussion. We're going to do clips of this as well, and we're going to put them out on social media. I'll write this up and publish the discussion on wikibon.com and siliconangle.com. No doubt several of the analysts on the panel will take the opportunity to publish written content, social commentary, or both. I want to thank the power panelists, and thanks for watching this special Cube presentation. This is Dave Vellante. Be well, and we'll see you next time.

Published Date : Jan 8 2022

ENTITIES

Entity	Category	Confidence
381 databases	QUANTITY	0.99+
2014	DATE	0.99+
2022	DATE	0.99+
2021	DATE	0.99+
january of 2022	DATE	0.99+
100 users	QUANTITY	0.99+
jamal dagani	PERSON	0.99+
last week	DATE	0.99+
dave meninger	PERSON	0.99+
sanji	PERSON	0.99+
second question	QUANTITY	0.99+
15 converged data stores	QUANTITY	0.99+
dave vellante	PERSON	0.99+
microsoft	ORGANIZATION	0.99+
three	QUANTITY	0.99+
sanjeev	PERSON	0.99+
2023	DATE	0.99+
15 data stores	QUANTITY	0.99+
siliconangle.com	OTHER	0.99+
last year	DATE	0.99+
sanjeev mohan	PERSON	0.99+
six	QUANTITY	0.99+
two	QUANTITY	0.99+
carl	PERSON	0.99+
tony	PERSON	0.99+
carl olufsen	PERSON	0.99+
six years	QUANTITY	0.99+
david	PERSON	0.99+
carlos specter	PERSON	0.98+
both sides	QUANTITY	0.98+
2010s	DATE	0.98+
first backlash	QUANTITY	0.98+
five years	QUANTITY	0.98+
today	DATE	0.98+
dave	PERSON	0.98+
each	QUANTITY	0.98+
three quarters	QUANTITY	0.98+
first	QUANTITY	0.98+
single platform	QUANTITY	0.98+
lake house	ORGANIZATION	0.98+
both	QUANTITY	0.98+
this year	DATE	0.98+
doug	PERSON	0.97+
one word	QUANTITY	0.97+
this year	DATE	0.97+
wikibon.com	OTHER	0.97+
one platform	QUANTITY	0.97+
39	QUANTITY	0.97+
about 600 percent	QUANTITY	0.97+
two analysts	QUANTITY	0.97+
ten years	QUANTITY	0.97+
single platform	QUANTITY	0.96+
five	QUANTITY	0.96+
one	QUANTITY	0.96+
three quarters	QUANTITY	0.96+
california	LOCATION	0.96+
google	ORGANIZATION	0.96+
single	QUANTITY	0.95+

Predictions 2022: Top Analysts See the Future of Data

(bright music) >> In the 2010s, organizations became keenly aware that data would become the key ingredient to driving competitive advantage, differentiation, and growth. But to this day, putting data to work remains a difficult challenge for many, if not most organizations. Now, as the cloud matures, it has become a game changer for data practitioners by making cheap storage and massive processing power readily accessible. We've also seen better tooling in the form of data workflows, streaming, machine intelligence, AI, developer tools, security, observability, automation, new databases and the like. These innovations accelerate data proficiency, but at the same time, they add complexity for practitioners. Data lakes, data hubs, data warehouses, data marts, data fabrics, data meshes, data catalogs, data oceans are forming, they're evolving and exploding onto the scene. So in an effort to bring perspective to the sea of optionality, we've brought together the brightest minds in the data analyst community to discuss how data management is morphing and what practitioners should expect in 2022 and beyond. Hello everyone, my name is Dave Vellante with theCUBE, and I'd like to welcome you to a special Cube presentation, Analyst Predictions 2022: The Future of Data Management. We've gathered six of the best analysts in data and data management who are going to present and discuss their top predictions and trends for 2022 and the first half of this decade. Let me introduce our six power panelists. Sanjeev Mohan is a former Gartner Analyst and Principal at SanjMo. Tony Baer is Principal at dbInsight, Carl Olofson is a well-known Research Vice President with IDC, Dave Menninger is Senior Vice President and Research Director at Ventana Research, Brad Shimmin is Chief Analyst, AI Platforms, Analytics and Data Management at Omdia, and Doug Henschen is Vice President and Principal Analyst at Constellation Research.
Gentlemen, welcome to the program and thanks for coming on theCUBE today. >> Great to be here. >> Thank you. >> All right, here's the format we're going to use. I, as moderator, am going to call on each analyst separately, who will then deliver their prediction or mega trend, and then in the interest of time management and pace, two analysts will have the opportunity to comment. If we have more time, we'll elongate it, but let's get started right away. Sanjeev Mohan, please kick it off. You want to talk about governance, go ahead sir. >> Thank you Dave. I believe that data governance, which we've been talking about for many years, is now not only going to be mainstream, it's going to be table stakes. And all the things that you mentioned, you know, the data oceans, data lakes, lake houses, data fabrics, meshes, the common glue is metadata. If we don't understand what data we have and we are governing it, there is no way we can manage it. So we saw Informatica go public last year after a hiatus of six years. I'm predicting that this year we see some more companies go public. My bet is on Collibra, most likely, and maybe Alation we'll see go public this year. I'm also predicting that the scope of data governance is going to expand beyond just data. It's not just data and reports. We are going to see more transformations like Spark jobs, Python, even Airflow. We're going to see more streaming data, so from Kafka Schema Registry, for example. We will see AI models become part of this whole governance suite. So the governance suite is going to be very comprehensive, with very detailed lineage, impact analysis, and then even expand into data quality. We've already seen that happen with some of the tools, where they are buying these smaller companies and bringing in data quality monitoring and integrating it with metadata management, data catalogs, also data access governance.
So what we are going to see is that once the data governance platforms become the key entry point into these modern architectures, I'm predicting that the usage, the number of users of a data catalog, is going to exceed that of a BI tool. That will take time, and we've already seen that trajectory. Right now, if you look at BI tools, I would say there are a hundred users of a BI tool to one of a data catalog. And I see that evening out over a period of time, and at some point data catalogs will really become the main way for us to access data. The data catalog will help us visualize data, but if we want to do more in-depth analysis, it'll be the jumping-off point into the BI tool, the data science tool, and that is the journey I see for the data governance products. >> Excellent, thank you. Some comments. Maybe Doug, a lot of things to weigh in on there, maybe you can comment. >> Yeah, Sanjeev, I think you're spot on with a lot of the trends. The one disagreement: I think it's really still far from mainstream. As you say, we've been talking about this for years. It's like God, motherhood, apple pie, everyone agrees it's important, but too few organizations are really practicing good governance because it's hard and because the incentives have been lacking. I think one thing that deserves mention in this context is ESG mandates and guidelines. These are environmental, social and governance regs and guidelines. We've seen the environmental regs and guidelines imposed on industries, particularly the carbon-intensive industries. We've seen the social mandates, particularly diversity, imposed on suppliers by companies that are leading on this topic. We've seen governance guidelines now being imposed by banks on investors. So these ESGs are presenting new carrots and sticks, and it's going to demand more solid data. It's going to demand more detailed reporting and solid reporting, tighter governance. But we're still far from mainstream adoption.
We have a lot of, you know, best-of-breed niche players in the space. I think the signs that it's going to be more mainstream are starting with things like Azure Purview, Google Dataplex. The big cloud platform players seem to be upping the ante and starting to address governance. >> Excellent, thank you Doug. Brad, I wonder if you could chime in as well. >> Yeah, I would love to be a believer in data catalogs. But to Doug's point, I think that it's going to take some more pressure for that to happen. I recall metadata being something every enterprise thought they were going to get under control when we were working on service-oriented architecture back in the nineties, and that didn't happen quite the way we anticipated. And so to Sanjeev's point, it's because it is really complex and really difficult to do. My hope is that, you know, we won't sort of, how do I put this? Fade out into this nebula of domain catalogs that are specific to individual use cases, like Purview for getting data quality right, or like data governance and cybersecurity. And instead we have some tooling that can actually be adaptive to gather metadata to create something I know is important to you, Sanjeev, and that is this idea of observability. If you can get enough metadata without moving your data around, but understanding the entirety of a system that's running on this data, you can do a lot to help with the governance that Doug is talking about. >> So I just want to add that data governance, like many other initiatives, did not succeed. Even AI went into an AI winter, but that's a different topic. But a lot of these things did not succeed because, to your point, the incentives were not there. I remember when Sarbanes-Oxley had come onto the scene, if a bank did not do Sarbanes-Oxley, they were very happy to pay a million-dollar fine. That was like, you know, pocket change for them instead of doing the right thing. But I think the stakes are much higher now.
With GDPR, the floodgates opened. Now, you know, California has CCPA, but even CCPA is being outdated by CPRA, which is much more GDPR-like. So we are very rapidly entering a space where pretty much every major country in the world is coming up with its own regulatory compliance requirements, and data residency is becoming really important. And I think we are going to reach a stage where it won't be optional anymore, whether we like it or not. And I think the reason data catalogs were not successful in the past is because we did not have the right focus on adoption. We were focused on features, and these features were disconnected, very hard for business to adopt. These were built by IT people for IT departments to take a look at technical metadata, not business metadata. Today the tables have turned. CDOs are driving this initiative, regulatory compliances are beating down hard, so I think the time might be right. >> Yeah, so guys, we have to move on here. But there's some real meat on the bone here, Sanjeev. I like the fact that you called out Collibra and Alation, so we can look back a year from now and say, okay, he made the call, he stuck it. And then the ratio of BI tools to data catalogs, that's another sort of measurement that we can take, even though with some skepticism there, that's something that we can watch. And I wonder if someday we'll have more metadata than data. But I want to move to Tony Baer. You want to talk about data mesh, and speaking, you know, coming off of governance. I mean, wow, you know, the whole concept of data mesh is decentralized data, and then governance becomes, you know, a nightmare there, but take it away, Tony. >> Well, put it this way: data mesh, you know, the idea at least as proposed by ThoughtWorks, you know, basically it was at least a couple of years ago, and the press has been almost uniformly uncritical.
A good reason for that is all the problems that basically Sanjeev and Doug and Brad were just speaking about, which is that we have all this data out there and we don't know what to do about it. Now, that's not a new problem. That was a problem we had in enterprise data warehouses, it was a problem when we had Hadoop data clusters, and it's even more of a problem now that data is out in the cloud, where the data is not only in your data lake, not only in S3, it's all over the place. And it's also including streaming, which I know we'll be talking about later. So the data mesh was a response to that, the idea that we need to (indistinct), you know, who are the folks that really know best about governance? It's the domain experts. So basically data mesh was an architectural pattern and a process. My prediction for this year is that data mesh is going to hit cold, hard reality. Because if you do a Google search, basically the published work, the articles on data mesh, have been largely, you know, pretty uncritical so far. Basically lauding it as being a very revolutionary new idea. I don't think it's that revolutionary, because we've talked about ideas like this. Brad, now, you and I met years ago when we were talking about SOA and decentralizing all of this, but it was at the application level. Now we're talking about it at the data level. And now we have microservices. So there's this thought of, if we're deconstructing apps in cloud native into microservices, why don't we think of data in the same way? My sense this year, you know, this has been a very active search if you look at Google search trends, is that now companies, like enterprises, are going to look at this seriously. And as they look at it seriously, it's going to attract its first real hard scrutiny, it's going to attract its first backlash. That's not necessarily a bad thing. It means that it's being taken seriously.
The reason why I think that you'll start to see basically the cold, hard light of day shine on data mesh is that it's still a work in progress. You know, this idea is basically a couple of years old and there are still some pretty major gaps. The biggest gap is in the area of federated governance. Now, federated governance itself is not a new issue. With federated governance, we started figuring out, like, how can we basically strike the balance between, let's say, consistent enterprise policy, consistent enterprise governance, and yet the groups that understand the data. You know, how do we basically sort of balance the two? There's a huge gap there in practice and knowledge. Also, to a lesser extent, there's a technology gap, which is basically in the self-service technologies that will help teams essentially govern data, you know, basically through the full life cycle, from selecting the data, from, you know, building the pipelines, from determining your access control, looking at quality, looking at basically whether the data is fresh or whether it's trending off course. So my prediction is that it will receive its first harsh scrutiny this year. You are going to see some organizations and enterprises declare premature victory when they build some federated query implementations. You're going to see vendors start to data-mesh-wash their products. Anybody in the data management space, whether it's basically a pipelining tool, whether it's basically ELT, whether it's a catalog or a federated query tool, they're all going to be, you know, basically promoting the fact of how they support this. Hopefully nobody's going to call themselves a data mesh tool, because data mesh is not a technology. We're going to see one other thing come out of this.
And this harks back to the metadata that Sanjeev was talking about, and the catalog just as he was talking about. Which is that there's going to be a renewed focus on metadata. And I think that's going to spur interest in data fabrics. Now, data fabrics are pretty vaguely defined, but if we just take the most elemental definition, which is a common metadata backplane, I think that if anybody is going to get serious about data mesh, they need to look at the data fabric, because we all, at the end of the day, need to read from the same sheet of music. >> So thank you Tony. Dave Menninger, I mean, one of the things that people like about data mesh is it pretty crisply articulates some of the flaws in today's organizational approaches to data. What are your thoughts on this? >> Well, I think we have to start by defining data mesh, right? The term is already getting corrupted, right? Tony said it's going to see the cold, hard light of day. And there's a problem right now that there are a number of overlapping terms that are similar but not identical. So we've got data virtualization, data fabric, excuse me for a second. (clears throat) Sorry about that. Data virtualization, data fabric, data federation, right? So I think that it's not really clear what each vendor means by these terms. I see data mesh and data fabric becoming quite popular. I've interpreted data mesh as referring primarily to the governance aspects, as originally intended and specified. But that's not the way I see vendors using it. I see vendors using it much more to mean data fabric and data virtualization. So I'm going to comment on the group of those things. I think the group of those things is going to happen. They're going to happen, they're going to become more robust.
Our research suggests that a quarter of organizations are already using virtualized access to their data lakes and another half, so a total of three quarters, will eventually be accessing their data lakes using some sort of virtualized access. Again, whether you define it as mesh or fabric or virtualization isn't really the point here. The point is this notion that there are different elements of data, metadata and governance within an organization that all need to be managed collectively. The interesting thing is when you look at the satisfaction rates of those organizations using virtualization versus those that are not, it's almost double, 68% of organizations, I'm sorry, 79% of organizations that were using virtualized access express satisfaction with their access to the data lake. Only 39% express satisfaction if they weren't using virtualized access. >> Oh thank you Dave. Sanjeev, we've just got about a couple of minutes on this topic, but I know you're speaking or maybe you've already spoken on a panel with (indistinct) who sort of invented the concept. Governance obviously is a big sticking point, but what are your thoughts on this? You're on mute. (panelist chuckling) >> So my message to (indistinct) and to the community is, as opposed to what they said, let's not define it. We spent a whole year defining it, there are four principles, domain, product, data infrastructure, and governance. Let's take it to the next level. I get a lot of questions on what is the difference between data fabric and data mesh? And I'm like, I can't compare the two because data mesh is a business concept, data fabric is a data integration pattern. How do you compare the two? You have to bring data mesh a level down. So to Tony's point, I'm on a warpath in 2022 to take it down to what does a data product look like? How do we handle shared data across domains and governance? And I think we are going to see more of that in 2022, which is the "operationalization" of data mesh.
>> I think we could have a whole hour on this topic, couldn't we? Maybe we should do that. But let's corner. Let's move to Carl. So Carl, you're a database guy, you've been around that block for a while now, you want to talk about graph databases, bring it on. >> Oh yeah. Okay thanks. So I regard graph database as basically the next truly revolutionary database management technology. I'm looking forward for the graph database market, which of course we haven't defined yet. So obviously I have a little wiggle room in what I'm about to say. But this market will grow by about 600% over the next 10 years. Now, 10 years is a long time. But over the next five years, we expect to see gradual growth as people start to learn how to use it. The problem is not that it's not useful, it's that people don't know how to use it. So let me explain before I go any further what a graph database is because some of the folks on the call may not know what it is. A graph database organizes data according to a mathematical structure called a graph. The graph has elements called nodes and edges. So a data element drops into a node, the nodes are connected by edges, the edges connect one node to another node. Combinations of edges create structures that you can analyze to determine how things are related. In some cases, the nodes and edges can have properties attached to them which add additional informative material that makes it richer, that's called a property graph. There are two principal use cases for graph databases. There's semantic graphs, which are used to break down human language texts into the semantic structures. Then you can search it, organize it and answer complicated questions. A lot of AI is aimed at semantic graphs. Another kind is the property graph that I just mentioned, which has a dazzling number of use cases. I want to just point out as I talk about this, people are probably wondering, well, we have relational databases, isn't that good enough?
So a relational database defines... It supports what I call definitional relationships. That means you define the relationships in a fixed structure. The database drops into that structure, there's a value, foreign key value, that relates one table to another and that value is fixed. You don't change it. If you change it, the database becomes unstable, it's not clear what you're looking at. In a graph database, the system is designed to handle change so that it can reflect the true state of the things that it's being used to track. So let me just give you some examples of use cases for this. They include entity resolution, data lineage, social media analysis, Customer 360, fraud prevention. There's cybersecurity, there's supply chain, which is a big one actually. There is explainable AI and this is going to become important too because a lot of people are adopting AI. But they want a system after the fact to say, how did the AI system come to that conclusion? How did it make that recommendation? Right now we don't have really good ways of tracking that. Machine learning in general, social network, I already mentioned that. And then we've got, oh gosh, we've got data governance, data compliance, risk management. We've got recommendation, we've got personalization, anti money laundering, that's another big one, identity and access management, network and IT operations is already becoming a key one where you actually have mapped out your operation, you know, whatever it is, your data center and you can track what's going on as things happen there, root cause analysis, fraud detection is a huge one. A number of major credit card companies use graph databases for fraud detection, risk analysis, tracking and tracing, churn analysis, next best action, what if analysis, impact analysis, entity resolution and I would add one other thing or just a few other things to this list, metadata management. So Sanjeev, here you go, this is your engine.
Because I was in metadata management for quite a while in my past life. And one of the things I found was that none of the data management technologies that were available to us could efficiently handle metadata because of the kinds of structures that result from it, but graphs can, okay? Graphs can do things like say, this term in this context means this, but in that context, it means that, okay? Things like that. And in fact, logistics management, supply chain. And also because it handles recursive relationships, by recursive relationships I mean objects that own other objects that are of the same type. You can do things like bill of materials, you know, so like parts explosion. Or you can do an HR analysis, who reports to whom, how many levels up the chain and that kind of thing. You can do that with relational databases, but it takes a lot of programming. In fact, you can do almost any of these things with relational databases, but the problem is, you have to program it. It's not supported in the database. And whenever you have to program something, that means you can't trace it, you can't define it. You can't publish it in terms of its functionality and it's really, really hard to maintain over time. >> Carl, thank you. I wonder if we could bring Brad in, I mean. Brad, I'm sitting here wondering, okay, is this incremental to the market? Is it disruptive and replacement? What are your thoughts on this space? >> It's already disrupted the market. I mean, like Carl said, go to any bank and ask them are you using graph databases to get fraud detection under control? And they'll say, absolutely, that's the only way to solve this problem. And it is frankly. And it's the only way to solve a lot of the problems that Carl mentioned. And that is, I think, its Achilles' heel in some ways. Because, you know, it's like finding the best way to cross the seven bridges of Koenigsberg.
You know, it's always going to kind of be tied to those use cases because it's really special and it's really unique and because it's special and it's unique, it still unfortunately kind of stands apart from the rest of the community that's building, let's say AI outcomes, as a great example here. Graph databases and AI, as Carl mentioned, are like chocolate and peanut butter. But technologically, they don't know how to talk to one another, they're completely different. And you know, you can't just stand up SQL and query them. You've got to learn, what is it, Carl? SPARQL. Yeah, thank you, to actually get to the data in there. And if you're going to scale that data, that graph database, especially a property graph, if you're going to do something really complex, like try to understand, you know, all of the metadata in your organization, you might just end up with, you know, a graph database winter like we had the AI winter simply because you run out of performance to make the thing happen. So, I think it's already disrupted, but we need to like treat it like a first-class citizen in the data analytics and AI community. We need to bring it into the fold. We need to equip it with the tools it needs to do the magic it does and to do it not just for specialized use cases, but for everything. 'Cause I'm with Carl. I think it's absolutely revolutionary. >> Brad identified the principal Achilles' heel of the technology, which is scaling. When these things get large and complex enough that they spill over what a single server can handle, you start to have difficulties because the relationships span things that have to be resolved over a network and then you get network latency and that slows the system down. So that's still a problem to be solved. >> Sanjeev, any quick thoughts on this? I mean, I think metadata on the word cloud is going to be the largest font, but what are your thoughts here?
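Carl's description of a property graph (nodes and edges that both carry properties) and of recursive relationships such as "who reports to whom" and parts explosion can be illustrated with a small in-memory sketch. This is only an illustration under stated assumptions: the class `PropertyGraph`, its methods, the edge labels `REPORTS_TO` and `CONTAINS`, and the sample people and parts are all hypothetical names invented here, not the API of any graph database product, and the parts traversal assumes the graph has no cycles.

```python
from collections import defaultdict


class PropertyGraph:
    """Hypothetical toy property graph: nodes and edges both carry properties."""

    def __init__(self):
        self.nodes = {}                     # node id -> dict of properties
        self.out_edges = defaultdict(list)  # node id -> [(label, target, props)]

    def add_node(self, node_id, **props):
        self.nodes[node_id] = props

    def add_edge(self, source, label, target, **props):
        self.out_edges[source].append((label, target, props))

    def chain(self, start, label):
        """Follow one labeled edge at a time, e.g. up a reporting chain."""
        path, current, seen = [], start, {start}
        while True:
            targets = [t for (l, t, _) in self.out_edges[current] if l == label]
            if not targets or targets[0] in seen:
                return path
            current = targets[0]
            seen.add(current)
            path.append(current)

    def explode(self, root, label):
        """Parts explosion: total quantity of each component under root.

        Assumes the structure is acyclic (a part never contains itself).
        """
        totals = defaultdict(int)

        def walk(node, multiplier):
            for (l, target, props) in self.out_edges[node]:
                if l == label:
                    qty = multiplier * props.get("qty", 1)
                    totals[target] += qty
                    walk(target, qty)

        walk(root, 1)
        return dict(totals)


# Made-up sample data: a three-level reporting line and a small bill of materials.
g = PropertyGraph()
g.add_node("ana", title="CEO")
g.add_node("bo", title="VP")
g.add_node("cy", title="Engineer")
g.add_edge("cy", "REPORTS_TO", "bo")
g.add_edge("bo", "REPORTS_TO", "ana")

g.add_edge("bike", "CONTAINS", "wheel", qty=2)
g.add_edge("bike", "CONTAINS", "frame", qty=1)
g.add_edge("wheel", "CONTAINS", "spoke", qty=32)

reporting_chain = g.chain("cy", "REPORTS_TO")  # managers above "cy", nearest first
parts = g.explode("bike", "CONTAINS")          # quantities multiplied down the tree
print(reporting_chain, parts)
```

The relationships here are data that can be added or changed at any time, and the recursive walk is a few lines over the edges. A production graph database would expose the same idea through its own query language rather than hand-written traversal code.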
>> I want to (indistinct) So people don't associate me with only metadata, so I want to talk about something slightly different. dbengines.com has done an amazing job. I think almost everyone knows that they chronicle all the major databases that are in use today. In January of 2022, there are 381 databases on a ranked list of databases. The largest category is RDBMS. The second largest category is actually divided into two, property graphs and RDF graphs. These two together make up the second largest number of databases. So talking about Achilles' heel, this is a problem. The problem is that there's so many graph databases to choose from. They come in different shapes and forms. To Brad's point, there's so many query languages. In RDBMS, it's SQL, we know the story, but here we've got Cypher, we've got Gremlin, we've got GQL and then there are proprietary languages. So I think there's a lot of disparity in this space. >> Well, excellent. All excellent points, Sanjeev, if I must say. And that is a problem, that the languages need to be sorted and standardized. People need to have a roadmap as to what they can do with it. Because as you say, you can do so many things. And so many of those things are unrelated that you sort of say, well, what do we use this for? And I'm reminded of the saying I learned a bunch of years ago. And somebody said that the digital computer is the only tool man has ever devised that has no particular purpose. (panelists chuckle) >> All right guys, we got to move on to Dave Menninger. We've heard about streaming. Your prediction is in that realm, so please take it away. >> Sure. So I like to say that historical databases are going to become a thing of the past. By that I don't mean that they're going to go away, that's not my point. I mean, we need historical databases, but streaming data is going to become the default way in which we operate with data.
So in the next say three to five years, I would expect that data platforms, and we're using the term data platforms to represent the evolution of databases and data lakes, that the data platforms will incorporate these streaming capabilities. We're going to process data as it streams into an organization and then it's going to roll off into a historical database. So historical databases don't go away, but they become a thing of the past. They store the data that occurred previously. And as data is occurring, we're going to be processing it, we're going to be analyzing it, we're going to be acting on it. I mean we only ever ended up with historical databases because we were limited by the technology that was available to us. Data doesn't occur in batches. But we processed it in batches because that was the best we could do. And it wasn't bad and we've continued to improve and we've improved and we've improved. But streaming data today is still the exception. It's not the rule, right? There are projects within organizations that deal with streaming data. But it's not the default way in which we deal with data yet. And so that's my prediction, that this is going to change, we're going to have streaming data be the default way in which we deal with data, however you label it and whatever you call it. You know, maybe these databases and data platforms just evolve to be able to handle it. But we're going to deal with data in a different way. And our research shows that already, about half of the participants in our analytics and data benchmark research are using streaming data. You know, another third are planning to use streaming technologies. So that gets us to about eight out of 10 organizations that need to use this technology. And that doesn't mean they have to use it throughout the whole organization, but it's pretty widespread in its use today and has continued to grow.
If you think about the consumerization of IT, we've all been conditioned to expect immediate access to information, immediate responsiveness. You know, we want to know if an item is on the shelf at our local retail store and we can go in and pick it up right now. You know, that's the world we live in and that's spilling over into the enterprise IT world. We have to provide those same types of capabilities. So that's my prediction, historical databases become a thing of the past, streaming data becomes the default way in which we operate with data. >> All right thank you David. Well, so what say you, Carl, the guy who has followed historical databases for a long time? >> Well, one thing actually, every database is historical because as soon as you put data in it, it's now history. It no longer reflects the present state of things. Even if that history is only a millisecond old, it's still history. But I would say, I mean, I know you're trying to be a little bit provocative in saying this Dave 'cause you know, as well as I do, that people still need to do their taxes, they still need to do accounting, they still need to run general ledger programs and things like that. That all involves historical data. That's not going to go away unless you want to go to jail. So you're going to have to deal with that. But as far as the leading edge functionality, I'm totally with you on that. And I'm just, you know, I'm just kind of wondering if this requires a change in the way that we perceive applications in order to truly be manifested and rethinking the way applications work. Saying that an application should respond instantly, as soon as the state of things changes. What do you say about that? >> I think that's true. I think we do have to think about things differently. It's not the way we designed systems in the past. We're seeing more and more systems designed that way. But again, it's not the default.
And I agree 100% with you that we do need historical databases you know, that's clear. And even some of those historical databases will be used in conjunction with the streaming data, right? >> Absolutely. I mean, you know, let's take the data warehouse example where you're using the data warehouse as its context and the streaming data as the present and you're saying, here's the sequence of things that's happening right now. Have we seen that sequence before? And where? What does that pattern look like in past situations? And can we learn from that? >> So Tony Baer, I wonder if you could comment? I mean, when you think about, you know, real time inferencing at the edge, for instance, which is something that a lot of people talk about, a lot of what we're discussing here in this segment, it looks like it's got a great potential. What are your thoughts? >> Yeah, I mean, I think you nailed it right. You know, you hit it right on the head there. Which is that, what I'm seeing is that essentially. Then based on I'm going to split this one down the middle is that I don't see that basically streaming is the default. What I see is streaming and basically and transaction databases and analytics data, you know, data warehouses, data lakes whatever are converging. And what allows us technically to converge is cloud native architecture, where you can basically distribute things. So you can have a node here that's doing the real-time processing, that's also doing... And this is where it leads in or maybe doing some of that real time predictive analytics to take a look at, well look, we're looking at this customer journey what's happening with what the customer is doing right now and this is correlated with what other customers are doing. So the thing is that in the cloud, you can basically partition this and because of basically the speed of the infrastructure then you can basically bring these together and kind of orchestrate them sort of a loosely coupled manner. 
The other part is what the use cases are demanding, and part of this goes back to what Dave is saying. Is that, you know, when you look at Customer 360, when you look at let's say Smart Utility products, when you look at any type of operational problem, it has a real time component and it has an historical component. And having predictive and so like, you know, my sense here is that technically we can bring this together through the cloud. And I think the use case is that we can apply some real time sort of predictive analytics on these streams and feed this into the transactions so that when we make a decision in terms of what to do as a result of a transaction, we have this real-time input. >> Sanjeev, did you have a comment? >> Yeah, I was just going to say that to Dave's point, you know, we have to think of streaming very differently because in the historical databases, we used to bring the data and store the data and then we used to run rules on top, aggregations and all. But in case of streaming, the mindset changes because the rules, normally the inference, all of that is fixed, but the data is constantly changing. So it's a completely reversed way of thinking and building applications on top of that. >> So Dave Menninger, there seems to be some disagreement about the default. What kind of timeframe are you thinking about? Is this end of decade it becomes the default? What would you pin? >> I think around, you know, between five to 10 years, I think this becomes the reality. >> I think it's... >> It'll be more and more common between now and then, but it becomes the default. And I also want Sanjeev at some point, maybe in one of our subsequent conversations, we need to talk about governing streaming data. 'Cause that's a whole nother set of challenges.
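Sanjeev's point that in streaming the rule stays fixed while the data keeps changing can be sketched in a few lines. The example below is a minimal illustration only, not any vendor's streaming API: the function `rolling_alerts`, the event fields `id` and `amount`, the window size and the threshold are all assumptions made up for this sketch.

```python
from collections import deque


def rolling_alerts(events, window=3, threshold=100.0):
    """Apply one fixed rule to each event as it arrives.

    The rule (hypothetical): flag an event when the mean amount of the
    last `window` events exceeds `threshold`. Nothing is stored first and
    queried later; the rule runs while the data flows past.
    """
    recent = deque(maxlen=window)  # only the current window is kept in memory
    for event in events:
        recent.append(event["amount"])
        mean = sum(recent) / len(recent)
        if mean > threshold:
            yield {"id": event["id"], "rolling_mean": mean}


# Made-up events standing in for a live feed; any iterable or generator works.
feed = [
    {"id": 1, "amount": 50.0},
    {"id": 2, "amount": 80.0},
    {"id": 3, "amount": 200.0},
    {"id": 4, "amount": 150.0},
    {"id": 5, "amount": 20.0},
]

alerts = list(rolling_alerts(feed))
print([a["id"] for a in alerts])
```

In the batch mindset the same calculation would be an aggregation run after the data has landed in a table. Here the events could just as well be rolled off into a historical store once the rule has seen them, which is the hand-off between streaming and historical databases that the panel describes.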
>> We've also talked about it rather in two dimensions, historical and streaming, and there's lots of low latency, micro batch, sub-second, that's not quite streaming, but in many cases it's fast enough and we're seeing a lot of adoption of near real time, not quite real-time, as good enough for many applications. (indistinct cross talk from panelists) >> Because nobody's really taking the hardware dimension (mumbles). >> That'll just happen, Carl. (panelists laughing) >> So near real time. But maybe before you lose the customer, however we define that, right? Okay, let's move on to Brad. Brad, you want to talk about automation, AI, the pipeline, people feel like, hey, we can just automate everything. What's your prediction? >> Yeah, I'm an AI aficionado so apologies in advance for that. But, you know, I think that we've been seeing automation play within AI for some time now. And it's helped us do a lot of things especially for practitioners that are building AI outcomes in the enterprise. It's helped them to fill skills gaps, it's helped them to speed development and it's helped them to actually make AI better. 'Cause it, you know, in some ways provides some swim lanes and for example, with technologies like AutoML can auto document and create that sort of transparency that we talked about a little bit earlier. But I think there's an interesting kind of convergence happening with this idea of automation. And that is that we've had the automation that started happening for practitioners, it's trying to move outside of the traditional bounds of things like I'm just trying to get my features, I'm just trying to pick the right algorithm, I'm just trying to build the right model and it's expanding across that full life cycle, building an AI outcome, to start at the very beginning of data and to then continue on to the end, which is this continuous delivery and continuous automation of that outcome to make sure it's right and it hasn't drifted and stuff like that.
And because of that, because it's become kind of powerful, we're starting to actually see this weird thing happen where the practitioners are starting to converge with the users. And that is to say that, okay, if I'm in Tableau right now, I can stand up Salesforce Einstein Discovery, and it will automatically create a nice predictive algorithm for me given the data that I pull in. But what's starting to happen and we're seeing this from the companies that create business software, so Salesforce, Oracle, SAP, and others is that they're starting to actually use these same ideals and a lot of deep learning (chuckles) to basically stand up these out of the box flip-a-switch, and you've got an AI outcome at the ready for business users. And I am very much, you know, I think that's the way that it's going to go and what it means is that AI is slowly disappearing. And I don't think that's a bad thing. I think if anything, what we're going to see in 2022 and maybe into 2023 is this sort of rush to put this idea of disappearing AI into practice and have as many of these solutions in the enterprise as possible. You can see, like for example, SAP is going to roll out this quarter, this thing called adaptive recommendation services, which basically is a cold start AI outcome that can work across a whole bunch of different vertical markets and use cases. It's just a recommendation engine for whatever you needed to do in the line of business. So basically, you're an SAP user, you look up to turn on your software one day, you're a sales professional let's say, and suddenly you have a recommendation for customer churn. Boom! It's going, that's great. Well, I don't know, I think that's terrifying. 
In some ways I think it is the future that AI is going to disappear like that, but I'm absolutely terrified of it because I think that what it really does is it calls attention to a lot of the issues that we already see around AI, specific to this idea of what we like to call at Omdia, responsible AI. Which is, you know, how do you build an AI outcome that is free of bias, that is inclusive, that is fair, that is safe, that is secure, that is auditable, et cetera, et cetera, et cetera, et cetera. It takes a lot of work to do. And so if you imagine a customer that's just a Salesforce customer let's say, and they're turning on Einstein Discovery within their sales software, you need some guidance to make sure that when you flip that switch, that the outcome you're going to get is correct. And that's going to take some work. And so, I think we're going to see this move, let's roll this out, and suddenly there's going to be a lot of problems, a lot of pushback that we're going to see. And some of that's going to come from GDPR and others that Sanjeev was mentioning earlier. A lot of it is going to come from internal CSR requirements within companies that are saying, "Hey, hey, whoa, hold up, we can't do this all at once. "Let's take the slow route, "let's make AI automated in a smart way." And that's going to take time. >> Yeah, so a couple of predictions there that I heard. AI simply disappears, it becomes invisible. Maybe if I can restate that. And then if I understand it correctly, Brad, you're saying there's a backlash in the near term. People will say, oh, slow down. Let's automate what we can. Those attributes that you talked about are non trivial to achieve, is that why you're a bit of a skeptic? >> Yeah. I think that we don't have any sort of standards that companies can look to and understand.
And we certainly, within these companies, especially those that haven't already stood up an internal data science team, they don't have the knowledge to understand, when they flip that switch for an automated AI outcome, that it's going to do what they think it's going to do. And so we need some sort of standard methodology and practice, best practices that every company that's going to consume this invisible AI can make use of. And one of the things that, you know, sort of started, that Google kicked off a few years back, that's picking up some momentum and the companies I just mentioned are starting to use it, is this idea of model cards where at least you have some transparency about what these things are doing. You know, so like for the SAP example, we know, for example, if it's a convolutional neural network with a long short-term memory model that it's using, we know that it only works on Roman English and therefore me as a consumer can say, "Oh, well I know that I need to do this internationally. "So I should not just turn this on today." >> Thank you. Carl, could you add anything, any context here? >> Yeah, we've talked about some of the things Brad mentioned here at IDC and our future of intelligence group regarding, in particular, the moral and legal implications of having a fully automated, you know, AI driven system. Because we already know, and we've seen that AI systems are biased by the data that they get, right? So if they get data that pushes them in a certain direction, I think there was a story last week about an HR system that was recommending promotions for White people over Black people, because in the past, you know, White people were promoted and more productive than Black people, but it had no context as to why, which is, you know, because Black people were being historically discriminated against, but the system doesn't know that. So, you know, you have to be aware of that.
And I think that at the very least, there should be controls when a decision has either a moral or legal implication. When you really need a human judgment, it could lay out the options for you. But a person actually needs to authorize that action. And I also think that we always will have to be vigilant regarding the kind of data we use to train our systems to make sure that it doesn't introduce unintended biases. To some extent, they always will. So we'll always be chasing after them. But that's (indistinct). >> Absolutely Carl, yeah. I think that what you have to bear in mind as a consumer of AI is that it is a reflection of us and we are a very flawed species. And so if you look at all of the really fantastic, magical looking supermodels we see like GPT-3 and four, that's coming out, they're xenophobic and hateful because the data they're built upon, the algorithms, and the people that build them are us. So AI is a reflection of us. We need to keep that in mind. >> Yeah, the AI is biased 'cause humans are biased. All right, great. All right let's move on. Doug, you mentioned, you know, a lot of people said that data lake, that term is not going to live on, but there seem to be some lakes here. You want to talk about lake house, bring it on. >> Yes, I do. My prediction is that lake house and this idea of a combined data warehouse and data lake platform is going to emerge as the dominant data management offering. I say offering, that doesn't mean it's going to be the dominant thing that organizations have out there, but it's going to be the predominant vendor offering in 2022. Now heading into 2021, we already had Cloudera, Databricks, Microsoft, Snowflake as proponents, in 2021, SAP, Oracle, and several of these fabric virtualization/mesh vendors joined the bandwagon. The promise is that you have one platform that manages your structured, unstructured and semi-structured information.
And it addresses both the BI analytics needs and the data science needs. The real promise there is simplicity and lower cost. But I think end users have to answer a few questions. The first is, does your organization really have a center of data gravity or is the data highly distributed? Multiple data warehouses, multiple data lakes, on premises, cloud. If it's very distributed and you'd have difficulty consolidating and that's not really a goal for you, then maybe that single platform is unrealistic and not likely to add value to you. You know, also the fabric and virtualization vendors, the mesh idea, that's where if you have this highly distributed situation, that might be a better path forward. The second question, if you are looking at one of these lake house offerings, you are looking at consolidating, simplifying, bringing together to a single platform. You have to make sure that it meets both the warehouse need and the data lake need. So you have vendors like Databricks, Microsoft with Azure Synapse. New really to the data warehouse space and they're having to prove that these data warehouse capabilities on their platforms can meet the scaling requirements, can meet the user and query concurrency requirements. Meet those tight SLAs. And then on the other hand, you have the Oracle, SAP, Snowflake, the data warehouse folks coming into the data science world, and they have to prove that they can manage the unstructured information and meet the needs of the data scientists. I'm seeing a lot of the lake house offerings from the warehouse crowd, managing that unstructured information in columns and rows. And some of these vendors, Snowflake in particular, are really relying on partners for the data science needs. So you really got to look at a lake house offering and make sure that it meets both the warehouse and the data lake requirement. >> Thank you Doug.
Well Tony, if those two worlds are going to come together, as Doug was saying, the analytics and the data science world, does there need to be some kind of semantic layer in between? I don't know. Where are you on this topic? >> (chuckles) Oh, didn't we talk about data fabrics before? Common metadata layer (chuckles). Actually, I'm almost tempted to say let's declare victory and go home. And that this has actually been going on for a while. I actually agree with, you know, much of what Doug is saying there. Which is that, I mean I remember as far back as I think it was like 2014, I was doing a study. I was still at Ovum, (indistinct) Omdia, looking at all these specialized databases that were coming up and seeing that, you know, there's overlap at the edges. But yet, there was still going to be a reason at the time that you would have, let's say a document database for JSON, you'd have a relational database for transactions and for data warehouse and you had basically something at that time that resembled Hadoop for what we consider your data lake. Fast forward and the thing is what I was seeing at the time is that you were seeing them sort of blending at the edges. That was, say, about five to six years ago. And the lake house is essentially the current manifestation of that idea. There is a dichotomy in terms of, you know, it's the old argument, do we centralize this all, you know, in a single place or do we virtualize? And I think it's always going to be a union, yeah, and there's never going to be a single silver bullet. I do see that there are also going to be questions and these are points that Doug raised. That, you know, what do you need for your performance there, or for your query performance characteristics? Do you need for instance high concurrency?
You need the ability to do some very sophisticated joins, or is your requirement more to be able to distribute your processing, you know, as far as possible, to essentially do a kind of a brute force approach. All these approaches are valid based on the use case. I just see that essentially the lake house is the culmination of... it's nothing new. It's a relatively new term introduced by Databricks a couple of years ago. This is the culmination of basically what's been a long time trend. And what we see in the cloud is that as we start seeing data warehouses, as a checkbox item, say, "Hey, we can basically source data in cloud storage, in S3, Azure Blob Store, you know, whatever, as long as it's in certain formats, like, you know, Parquet or CSV or something like that." I see that as becoming kind of a checkbox item. So to that extent, I think that the lake house, depending on how you define it, is already reality. And in some cases, maybe new terminology, but not a whole heck of a lot new under the sun. >> Yeah. And Dave Menninger, I mean a lot of these, thank you Tony, but a lot of this is going to come down to, you know, vendor marketing, right? Some people just kind of co-opt the term, we talked about you know, data mesh washing, what are your thoughts on this? (laughing) >> Yeah, so I used the term data platform earlier. And part of the reason I use that term is that it's more vendor neutral. We've tried to sort of stay out of the vendor terminology patenting world, right? Whether the term lake house is what sticks or not, the concept is certainly going to stick. And we have some data to back it up. About a quarter of organizations that are using data lakes today, already incorporate data warehouse functionality into it. 
So they consider their data lake and data warehouse one and the same. About a quarter of organizations, a little less, but about a quarter of organizations feed the data lake from the data warehouse, and about a quarter of organizations feed the data warehouse from the data lake. So it's pretty obvious that three quarters of organizations need to bring this stuff together, right? The need is there, the need is apparent. The technology is going to continue to converge. I like to talk about it, you know, you've got data lakes over here at one end, and I'm not going to talk about why people thought data lakes were a bad idea because they thought you just throw stuff in a server and you ignore it, right? That's not what a data lake is. So you've got data lake people over here and you've got database people over here, data warehouse people over here, database vendors are adding data lake capabilities and data lake vendors are adding data warehouse capabilities. So it's obvious that they're going to meet in the middle. I mean, I think it's like Tony says, I think we should declare victory and go home. >> As hell. So just a follow-up on that, so are you saying the specialized lake and the specialized warehouse, do they go away? I mean, Tony, data mesh practitioners, or advocates, would say, well, they could all live. It's just a node on the mesh. But based on what Dave just said, are we going to see those all morph together? >> Well, number one, as I was saying before, there's always going to be this sort of, you know, centrifugal force or this tug of war between do we centralize the data, do we virtualize? And the fact is I don't think that there's ever going to be any single answer. I think in terms of data mesh, data mesh has nothing to do with how you physically implement the data. You could have a data mesh basically on a data warehouse. 
It's just that, you know, the difference being is that if we use the same physical data store, but everybody's logically you know, basically governing it differently, you know? Data mesh, in essence, is not a technology, it's processes, it's governance process. So essentially, you know, I basically see that, you know, as I was saying before that this is basically the culmination of a long time trend. We're essentially seeing a lot of blurring, but there are going to be cases where, for instance, if I need, let's say like, upserts, I need like high concurrency or something like that. There are certain things that I'm not going to be able to efficiently get out of a data lake. And, you know, I'm doing a system where I'm just doing really brute forcing very fast file scanning and that type of thing. So I think there always will be some delineations, but I would agree with Dave and with Doug, that we are seeing basically a confluence of requirements that we need to essentially have basically either the element, you know, the ability of a data lake and the data warehouse, these need to come together, so I think. >> I think what we're likely to see is organizations look for a converged platform that can handle both sides for their center of data gravity. The mesh and the fabric virtualization vendors, they're all on board with the idea of this converged platform and they're saying, "Hey, we'll handle all the edge cases of the stuff that isn't in that center of data gravity but that is off distributed in a cloud or at a remote location." So you can have that single platform for the center of your data and then bring in virtualization, mesh, what have you, for reaching out to the distributed data. >> As Dave basically said, people are happy when they virtualize data. >> I think we have at this point, but to Dave Menninger's point, they are converging, Snowflake has introduced support for unstructured data. So obviously, we're literally splitting hairs here. 
Now what Databricks is saying is that, "Aha, but it's easier to go from data lake to data warehouse than it is from databases to data lake." So I think we're getting into semantics, but we're already seeing these two converge. >> So take somebody like AWS has got what? 15 data stores. Are they going to 15 converged data stores? This is going to be interesting to watch. All right, guys, I'm going to go down the list and do, like, one word each, and you guys, each of the analysts, if you would, just add a very brief sort of course correction for me. So Sanjeev, I mean, governance is going to be... Maybe it's the dog that wags the tail now. I mean, it's coming to the fore, all this ransomware stuff. We really didn't talk much about security, but what's the one word in your prediction that you would leave us with on governance? >> It's going to be mainstream. >> Mainstream. Okay. Tony Baer, mesh washing is what I wrote down. That's what we're going to see in 2022, a little reality check, you want to add to that? >> Reality check, 'cause I hope that no vendor jumps the shark and claims they're offering a data mesh product. >> Yeah, let's hope that doesn't happen. If they do, we're going to call them out. Carl, I mean, graph databases, thank you for sharing some high growth metrics. I know it's early days, but magic is what I took away from that, so magic database. >> Yeah, I would actually, I've said this to people too. I kind of look at it as a Swiss Army knife of data because you can pretty much do anything you want with it. That doesn't mean you should. I mean, there's definitely the case that if you're managing things that are in fixed schematic relationship, probably a relational database is a better choice. There are times when the document database is a better choice. It can handle those things, but maybe not. It may not be the best choice for that use case. 
But for a great many, especially with the new emerging use cases I listed, it's the best choice. >> Thank you. And Dave Menninger, thank you by the way, for bringing the data in, I like how you supported all your comments with some data points. But streaming data becomes the sort of default paradigm, if you will, what would you add? >> Yeah, I would say think fast, right? That's the world we live in, you got to think fast. >> Think fast, love it. And Brad Shimmin, love it. I mean, on the one hand I was saying, okay, great. I'm afraid I might get disrupted by one of these internet giants who are AI experts. I'm going to be able to buy instead of build AI. But then again, you know, I've got some real issues. There's a potential backlash there. So give us your bumper sticker. >> I would say, going with Dave, think fast and also think slow, to talk about the book that everyone talks about. I would say really that this is all about trust, trust in the idea of automation and a transparent and visible AI across the enterprise. And verify, verify before you do anything. >> And then Doug Henschen, I mean, I think the trend is your friend here on this prediction with lake house really becoming dominant. I liked the way you set up that notion of, you know, the data warehouse folks coming at it from the analytics perspective and then you get the data science worlds coming together. I still feel as though there's this piece in the middle that we're missing, but your final thoughts, we'll give you the (indistinct). >> I think the idea of consolidation and simplification always prevails. That's why the appeal of a single platform is going to be there. We've already seen that with, you know, Hadoop platforms and moving toward cloud, moving toward object storage, and object storage becoming really the common storage point for whether it's a lake or a warehouse. 
And that second point, I think ESG mandates are going to come in alongside GDPR and things like that to up the ante for good governance. >> Yeah, thank you for calling that out. Okay folks, hey that's all the time that we have here. Your experience and depth of understanding on these key issues on data and data management were really on point and they were on display today. I want to thank you for your contributions. Really appreciate your time. >> Enjoyed it. >> Thank you. >> Thanks for having me. >> In addition to this video, we're going to be making available transcripts of the discussion. We're going to do clips of this as well, and we're going to put them out on social media. I'll write this up and publish the discussion on wikibon.com and siliconangle.com. No doubt, several of the analysts on the panel will take the opportunity to publish written content, social commentary or both. I want to thank the power panelists and thanks for watching this special CUBE presentation. This is Dave Vellante, be well and we'll see you next time. (bright music)

Published Date : Jan 7 2022
