Analyst Predictions 2023: The Future of Data Management
(upbeat music) >> Hello, this is Dave Valente with theCUBE, and one of the most gratifying aspects of my role as a host of "theCUBE TV" is I get to cover a wide range of topics. And quite often, we're able to bring to our program a level of expertise that allows us to more deeply explore and unpack some of the topics that we cover throughout the year. And one of our favorite topics, of course, is data. Now, in 2021, after being in isolation for the better part of two years, a group of industry analysts met up at AWS re:Invent and started a collaboration to look at the trends in data and predict what some likely outcomes will be for the coming year. And it resulted in a very popular session that we had last year focused on the future of data management. And I'm very excited and pleased to tell you that the 2023 edition of that predictions episode is back, and with me are five outstanding market analyst, Sanjeev Mohan of SanjMo, Tony Baer of dbInsight, Carl Olofson from IDC, Dave Menninger from Ventana Research, and Doug Henschen, VP and Principal Analyst at Constellation Research. Now, what is it that we're calling you, guys? A data pack like the rat pack? No, no, no, no, that's not it. It's the data crowd, the data crowd, and the crowd includes some of the best minds in the data analyst community. They'll discuss how data management is evolving and what listeners should prepare for in 2023. Guys, welcome back. Great to see you. >> Good to be here. >> Thank you. >> Thanks, Dave. (Tony and Dave faintly speaks) >> All right, before we get into 2023 predictions, we thought it'd be good to do a look back at how we did in 2022 and give a transparent assessment of those predictions. So, let's get right into it. We're going to bring these up here, the predictions from 2022, they're color-coded red, yellow, and green to signify the degree of accuracy. And I'm pleased to report there's no red. Well, maybe some of you will want to debate that grading system. But as always, we want to be open, so you can decide for yourselves. So, we're going to ask each analyst to review their 2022 prediction and explain their rating and what evidence they have that led them to their conclusion. So, Sanjeev, please kick it off. Your prediction was data governance becomes key. I know that's going to knock you guys over, but elaborate, because you had more detail when you double click on that. >> Yeah, absolutely. Thank you so much, Dave, for having us on the show today. And we self-graded ourselves. I could have very easily made my prediction from last year green, but I mentioned why I left it as yellow. I totally fully believe that data governance was in a renaissance in 2022. And why do I say that? You have to look no further than AWS launching its own data catalog called DataZone. Before that, mid-year, we saw Unity Catalog from Databricks went GA. So, overall, I saw there was tremendous movement. When you see these big players launching a new data catalog, you know that they want to be in this space. And this space is highly critical to everything that I feel we will talk about in today's call. Also, if you look at established players, I spoke at Collibra's conference, data.world, work closely with Alation, Informatica, a bunch of other companies, they all added tremendous new capabilities. So, it did become key. The reason I left it as yellow is because I had made a prediction that Collibra would go IPO, and it did not. And I don't think anyone is going IPO right now. 
The market is really, really down, the funding in VC IPO market. But other than that, data governance had a banner year in 2022. >> Yeah. Well, thank you for that. And of course, you saw data clean rooms being announced at AWS re:Invent, so more evidence. And I like how the fact that you included in your predictions some things that were binary, so you dinged yourself there. So, good job. Okay, Tony Baer, you're up next. Data mesh hits reality check. As you see here, you've given yourself a bright green thumbs up. (Tony laughing) Okay. Let's hear why you feel that was the case. What do you mean by reality check? >> Okay. Thanks, Dave, for having us back again. This is something I just wrote and just tried to get away from, and this just a topic just won't go away. I did speak with a number of folks, early adopters and non-adopters during the year. And I did find that basically that it pretty much validated what I was expecting, which was that there was a lot more, this has now become a front burner issue. And if I had any doubt in my mind, the evidence I would point to is what was originally intended to be a throwaway post on LinkedIn, which I just quickly scribbled down the night before leaving for re:Invent. I was packing at the time, and for some reason, I was doing Google search on data mesh. And I happened to have tripped across this ridiculous article, I will not say where, because it doesn't deserve any publicity, about the eight (Dave laughing) best data mesh software companies of 2022. (Tony laughing) One of my predictions was that you'd see data mesh washing. And I just quickly just hopped on that maybe three sentences and wrote it at about a couple minutes saying this is hogwash, essentially. (laughs) And that just reun... And then, I left for re:Invent. And the next night, when I got into my Vegas hotel room, I clicked on my computer. I saw a 15,000 hits on that post, which was the most hits of any single post I put all year. And the responses were wildly pro and con. So, it pretty much validates my expectation in that data mesh really did hit a lot more scrutiny over this past year. >> Yeah, thank you for that. I remember that article. I remember rolling my eyes when I saw it, and then I recently, (Tony laughing) I talked to Walmart and they actually invoked Martin Fowler and they said that they're working through their data mesh. So, it takes a really lot of thought, and it really, as we've talked about, is really as much an organizational construct. You're not buying data mesh >> Bingo. >> to your point. Okay. Thank you, Tony. Carl Olofson, here we go. You've graded yourself a yellow in the prediction of graph databases. Take off. Please elaborate. >> Yeah, sure. So, I realized in looking at the prediction that it seemed to imply that graph databases could be a major factor in the data world in 2022, which obviously didn't become the case. It was an error on my part in that I should have said it in the right context. It's really a three to five-year time period that graph databases will really become significant, because they still need accepted methodologies that can be applied in a business context as well as proper tools in order for people to be able to use them seriously. But I stand by the idea that it is taking off, because for one thing, Neo4j, which is the leading independent graph database provider, had a very good year. 
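As a rough illustration of the kind of question a graph database answers naturally, here is a minimal sketch using the Neo4j Python driver; the connection URI, credentials, and the Customer/Product schema are made-up assumptions for the example, not a reference to any system discussed here.

```python
# Minimal sketch: querying a property graph with Cypher via the Neo4j Python driver.
# The URI, credentials, and schema (Customer/Product nodes, BOUGHT edges) are illustrative assumptions.
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

# "Products bought by customers who bought what my customer bought" -- a multi-hop
# traversal that is a single pattern match in Cypher, but several self-joins in SQL.
query = """
MATCH (c:Customer {id: $customer_id})-[:BOUGHT]->(p:Product)<-[:BOUGHT]-(other:Customer)
MATCH (other)-[:BOUGHT]->(rec:Product)
WHERE NOT (c)-[:BOUGHT]->(rec)
RETURN rec.name AS recommendation, count(*) AS strength
ORDER BY strength DESC
LIMIT 5
"""

with driver.session() as session:
    for record in session.run(query, customer_id="c-42"):
        print(record["recommendation"], record["strength"])

driver.close()
```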
And also, we're seeing interesting developments in terms of things like AWS with Neptune and with Oracle providing graph support in Oracle database this past year. Those things are, as I said, growing gradually. There are other companies like TigerGraph and so forth, that deserve watching as well. But as far as becoming mainstream, it's going to be a few years before we get all the elements together to make that happen. Like any new technology, you have to create an environment in which ordinary people without a whole ton of technical training can actually apply the technology to solve business problems. >> Yeah, thank you for that. These specialized databases, graph databases, time series databases, you see them embedded into mainstream data platforms, but there's a place for these specialized databases, I would suspect we're going to see new types of databases emerge with all this cloud sprawl that we have and maybe to the edge. >> Well, part of it is that it's not as specialized as you might think it. You can apply graphs to great many workloads and use cases. It's just that people have yet to fully explore and discover what those are. >> Yeah. >> And so, it's going to be a process. (laughs) >> All right, Dave Menninger, streaming data permeates the landscape. You gave yourself a yellow. Why? >> Well, I couldn't think of a appropriate combination of yellow and green. Maybe I should have used chartreuse, (Dave laughing) but I was probably a little hard on myself making it yellow. This is another type of specialized data processing like Carl was talking about graph databases is a stream processing, and nearly every data platform offers streaming capabilities now. Often, it's based on Kafka. If you look at Confluent, their revenues have grown at more than 50%, continue to grow at more than 50% a year. They're expected to do more than half a billion dollars in revenue this year. But the thing that hasn't happened yet, and to be honest, they didn't necessarily expect it to happen in one year, is that streaming hasn't become the default way in which we deal with data. It's still a sidecar to data at rest. And I do expect that we'll continue to see streaming become more and more mainstream. I do expect perhaps in the five-year timeframe that we will first deal with data as streaming and then at rest, but the worlds are starting to merge. And we even see some vendors bringing products to market, such as K2View, Hazelcast, and RisingWave Labs. So, in addition to all those core data platform vendors adding these capabilities, there are new vendors approaching this market as well. >> I like the tough grading system, and it's not trivial. And when you talk to practitioners doing this stuff, there's still some complications in the data pipeline. And so, but I think, you're right, it probably was a yellow plus. Doug Henschen, data lakehouses will emerge as dominant. When you talk to people about lakehouses, practitioners, they all use that term. They certainly use the term data lake, but now, they're using lakehouse more and more. What's your thoughts on here? Why the green? What's your evidence there? >> Well, I think, I was accurate. I spoke about it specifically as something that vendors would be pursuing. And we saw yet more lakehouse advocacy in 2022. Google introduced its BigLake service alongside BigQuery. Salesforce introduced Genie, which is really a lakehouse architecture. 
And it was a safe prediction to say vendors are going to be pursuing this in that AWS, Cloudera, Databricks, Microsoft, Oracle, SAP, Salesforce now, IBM, all advocate this idea of a single platform for all of your data. Now, the trend was also supported in 2023, in that we saw a big embrace of Apache Iceberg in 2022. That's a structured table format. It's used with these lakehouse platforms. It's open, so it ensures portability and it also ensures performance. And that's a structured table that helps with the warehouse side performance. But among those announcements, Snowflake, Google, Cloudera, SAP, Salesforce, IBM, all embraced Iceberg. But keep in mind, again, I'm talking about this as something that vendors are pursuing as their approach. So, they're advocating to end users. It's very cutting edge. I'd say the top, leading edge, 5% of companies have really embraced the lakehouse. I think, we're now seeing the fast followers, the next 20 to 25% of firms embracing this idea and embracing a lakehouse architecture. I recall Christian Kleinerman at the big Snowflake event last summer, making the announcement about Iceberg, and he asked for a show of hands for any of you in the audience at the keynote, have you heard of Iceberg? And just a smattering of hands went up. So, the vendors are ahead of the curve. They're pushing this trend, and we're now seeing a little bit more mainstream uptake. >> Good. Doug, I was there. It was you, me, and I think, two other hands were up. That was just humorous. (Doug laughing) All right, well, so I liked the fact that we had some yellow and some green. When you think about these things, there's the prediction itself. Did it come true or not? There are the sub predictions that you guys make, and of course, the degree of difficulty. So, thank you for that open assessment. All right, let's get into the 2023 predictions. Let's bring up the predictions. Sanjeev, you're going first. You've got a prediction around unified metadata. What's the prediction, please? >> So, my prediction is that the metadata space is currently a mess. It needs to get unified. There are too many use cases of metadata, which are being addressed by disparate systems. For example, data quality has become really big in the last couple of years, data observability, the whole catalog space is actually, people don't like to use the word data catalog anymore, because data catalog sounds like it's a catalog, a museum, if you may, of metadata that you go and admire. So, what I'm saying is that in 2023, we will see that metadata will become the driving force behind things like data ops, things like orchestration of tasks using metadata, not rules. Not saying that if this fails, then do this, if this succeeds, go do that. But it's like getting to the metadata level, and then making a decision as to what to orchestrate, what to automate, how to do data quality check, data observability. So, this space is starting to gel, and I see there'll be more maturation in the metadata space. Even security privacy, some of these topics, which are handled separately. And I'm just talking about data security and data privacy. I'm not talking about infrastructure security. These also need to merge into a unified metadata management piece with some knowledge graph, semantic layer on top, so you can do analytics on it. So, it's no longer something that sits on the side, it's limited in its scope. It is actually the very engine, the very glue that is going to connect data producers and consumers. >> Great.
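To make the idea of orchestrating on metadata rather than hard-coded rules concrete, here is a small, self-contained Python sketch in which the next pipeline steps are derived from metadata attributes such as freshness and quality scores; the attribute names and thresholds are illustrative assumptions, not any particular product's model.

```python
# Illustrative sketch only: driving orchestration decisions from metadata attributes
# (freshness, quality score, sensitivity) rather than fixed if-this-fails-do-that rules.
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class DatasetMetadata:
    name: str
    last_refreshed: datetime
    quality_score: float      # 0.0 - 1.0, e.g. from a profiling or observability tool
    contains_pii: bool

def plan_tasks(meta: DatasetMetadata) -> list[str]:
    """Derive the next pipeline steps from metadata instead of hard-coded branches."""
    tasks = []
    if datetime.utcnow() - meta.last_refreshed > timedelta(hours=24):
        tasks.append(f"refresh:{meta.name}")
    if meta.quality_score < 0.9:
        tasks.append(f"run_quality_checks:{meta.name}")
    if meta.contains_pii:
        tasks.append(f"apply_masking_policy:{meta.name}")
    tasks.append(f"publish_to_catalog:{meta.name}")
    return tasks

meta = DatasetMetadata("orders", datetime.utcnow() - timedelta(hours=30), 0.82, True)
print(plan_tasks(meta))
# ['refresh:orders', 'run_quality_checks:orders', 'apply_masking_policy:orders', 'publish_to_catalog:orders']
```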
Thank you for that. Doug. Doug Henschen, any thoughts on what Sanjeev just said? Do you agree? Do you disagree? >> Well, I agree with many aspects of what he says. I think, there's a huge opportunity for consolidation and streamlining of these aspects of governance. Last year, Sanjeev, you said something like, we'll see more people using catalogs than BI. And I have to disagree. I don't think this is a category that's headed for mainstream adoption. It's a behind the scenes activity for the wonky few, or better yet, companies want machine learning and automation to take care of these messy details. We've seen these waves of management technologies, some of the latest data observability, customer data platform, but they failed to sweep away all the earlier investments in data quality and master data management. So, yes, I hope the latest tech offers glimmers that there's going to be a better, cleaner way of addressing these things. But to my mind, the business leaders, including the CIO, only want to spend as much time and effort and money and resources on these sorts of things to avoid getting breached, ending up in headlines, getting fired or going to jail. So, vendors bring on the ML and AI smarts and the automation of these sorts of activities. >> So, if I may say something, the reason why we have this dichotomy between data catalog and the BI vendors is because data catalogs are very soon, not going to be standalone products, in my opinion. They're going to get embedded. So, when you use a BI tool, you'll actually use the catalog to find out what is it that you want to do, whether you are looking for data or you're looking for an existing dashboard. So, the catalog becomes embedded into the BI tool. >> Hey, Dave Menninger, sometimes you have some data in your back pocket. Do you have any stats (chuckles) on this topic? >> No, I'm glad you asked, because I'm going to... Now, data catalogs are something that's interesting. Sanjeev made a statement that data catalogs are falling out of favor. I don't care what you call them. They're valuable to organizations. Our research shows that organizations that have adequate data catalog technologies are three times more likely to express satisfaction with their analytics for just the reasons that Sanjeev was talking about. You can find what you want, you know you're getting the right information, you know whether or not it's trusted. So, those are good things. So, we expect to see the capabilities, whether it's embedded or separate. We expect to see those capabilities continue to permeate the market. >> And a lot of those catalogs are driven now by machine learning and things. So, they're learning from those patterns of usage by people when people use the data. (airy laughs) >> All right. Okay. Thank you, guys. All right. Let's move on to the next one. Tony Baer, let's bring up the predictions. You got something in here about the modern data stack. We need to rethink it. Is the modern data stack getting long in the tooth? Is it not so modern anymore? >> I think, in a way, it's got almost too modern. It's gotten too, I don't know if it's being long in the tooth, but it is getting long. The modern data stack, it's traditionally been defined as basically you have the data platform, which would be the operational database and the data warehouse.
And in between, you have all the tools that are necessary to essentially get that data from the operational realm or the streaming realm for that matter into basically the data warehouse, or as we might be seeing more and more, the data lakehouse. And I think, what's important here is that, or I think, we have seen a lot of progress, and this would be in the cloud, is with the SaaS services. And especially you see that in the modern data stack, which is like all these players, not just the MongoDBs or the Oracles or the Amazons have their database platforms. You see they have the Informaticas, and all the other players there, the Fivetrans, have their own SaaS services. And within those SaaS services, you get a certain degree of simplicity, which is it takes all the housekeeping off the shoulders of the customers. That's a good thing. The problem is that what we're getting to unfortunately is what I would call lots of islands of simplicity, which means that it leaves it (Dave laughing) to the customer to have to integrate or put all that stuff together. It's a complex tool chain. And so, what we really need to think about here, we have too many pieces. And going back to the discussion of catalogs, it's like we have so many catalogs out there, which one do we use? 'Cause chances are most organizations do not rely on a single catalog at this point. What I'm calling on all the data providers or all the SaaS service providers, is to literally get it together and essentially make this modern data stack less of a stack, make it more of a blending of an end-to-end solution. And that can come in a number of different ways. Part of it is that the data platform providers have been adding services that are adjacent. And there's some very good examples of this. We've seen progress over the past year or so. For instance, MongoDB integrating search. It's a very common, I guess, sort of tool that basically, that the applications that are developed on MongoDB use, so MongoDB then built it into the database rather than requiring an extra Elasticsearch or OpenSearch stack. Amazon just... AWS just did the zero-ETL, which is a first step towards simplifying the process from going from Aurora to Redshift. You've seen same thing with Google, BigQuery integrating basically streaming pipelines. And you're seeing also a lot of movement in database machine learning. So, there's some good moves in this direction. I expect to see more of this this year. Part of it's from basically the SaaS platform is adding some functionality. But I also see more importantly, because you're never going to get... This is like asking your data team and your developers, herding cats to standardize on the same tool. In most organizations, that is not going to happen. So, take a look at the most popular combinations of tools and start to come up with some pre-built integrations and pre-built orchestrations, and offer some promotional pricing, maybe not quite two for one, but in other words, get two products for the price of two services or for the price of one and a half. I see a lot of potential for this. And it's to me, if the call was to simplify things, this is the next logical step and I expect to see more of this here. >> Yeah, and you see in Oracle, MySQL HeatWave, yet another example of eliminating that ETL. Carl Olofson, today, if you think about the data stack and the application stack, they're largely separate. Do you have any thoughts on how that's going to play out? Does that play into this prediction? What do you think?
>> Well, I think, that the... I really like Tony's phrase, islands of simplification. It really says (Tony chuckles) what's going on here, which is that all these different vendors you ask about, about how these stacks work. All these different vendors have their own stack vision. And you can... One application group is going to use one, and another application group is going to use another. And some people will say, let's go to, like you go to a Informatica conference and they say, we should be the center of your universe, but you can't connect everything in your universe to Informatica, so you need to use other things. So, the challenge is how do we make those things work together? As Tony has said, and I totally agree, we're never going to get to the point where people standardize on one organizing system. So, the alternative is to have metadata that can be shared amongst those systems and protocols that allow those systems to coordinate their operations. This is standard stuff. It's not easy. But the motive for the vendors is that they can become more active critical players in the enterprise. And of course, the motive for the customer is that things will run better and more completely. So, I've been looking at this in terms of two kinds of metadata. One is the meaning metadata, which says what data can be put together. The other is the operational metadata, which says basically where did it come from? Who created it? What's its current state? What's the security level? Et cetera, et cetera, et cetera. The good news is the operational stuff can actually be done automatically, whereas the meaning stuff requires some human intervention. And as we've already heard from, was it Doug, I think, people are disinclined to put a lot of definition into meaning metadata. So, that may be the harder one, but coordination is key. This problem has been with us forever, but with the addition of new data sources, with streaming data with data in different formats, the whole thing has, it's been like what a customer of mine used to say, "I understand your product can make my system run faster, but right now I just feel I'm putting my problems on roller skates. (chuckles) I don't need that to accelerate what's already not working." >> Excellent. Okay, Carl, let's stay with you. I remember in the early days of the big data movement, Hadoop movement, NoSQL was the big thing. And I remember Amr Awadallah said to us in theCUBE that SQL is the killer app for big data. So, your prediction here, if we bring that up is SQL is back. Please elaborate. >> Yeah. So, of course, some people would say, well, it never left. Actually, that's probably closer to true, but in the perception of the marketplace, there's been all this noise about alternative ways of storing, retrieving data, whether it's in key value stores or document databases and so forth. We're getting a lot of messaging that for a while had persuaded people that, oh, we're not going to do analytics in SQL anymore. We're going to use Spark for everything, except that only a handful of people know how to use Spark. Oh, well, that's a problem. Well, how about, and for ordinary conventional business analytics, Spark is like an over-engineered solution to the problem. SQL works just great. What's happened in the past couple years, and what's going to continue to happen is that SQL is insinuating itself into everything we're seeing. We're seeing all the major data lake providers offering SQL support, whether it's Databricks or... 
And of course, Snowflake is loving this, because that is what they do, and their success certainly points to the success of SQL, even MongoDB. And we were all, I think, at the MongoDB conference where on one day, we hear SQL is dead. They're not teaching SQL in schools anymore, and this kind of thing. And then, a couple days later at the same conference, they announced we're adding a new analytic capability based on SQL. But didn't you just say SQL is dead? So, the reality is that SQL is better understood than most other methods of certainly of retrieving and finding data in a data collection, no matter whether it happens to be relational or non-relational. And even in systems that are very non-relational, such as graph and document databases, their query languages are being built or extended to resemble SQL, because SQL is something people understand. >> Now, you remember when we were in high school and you had to take the... Your debating class and you were forced to take one side and defend it. So, I was at a Vertica conference one time up on stage with Curt Monash, and I had to take the NoSQL, the world is changing paradigm shift. And so just to be controversial, I said to him, Curt Monash, I said, who really needs ACID compliance anyway? Tony Baer. And so, (chuckles) of course, his head exploded, but what are your thoughts (guests laughing) on all this? >> Well, my first thought is congratulations, Dave, for surviving being up on stage with Curt Monash. >> Amen. (group laughing) >> I definitely would concur with Carl. We actually are definitely seeing a SQL renaissance and if there's any proof of the pudding here, I see lakehouse as being icing on the cake. As Doug had predicted last year, now, (clears throat) for the record, I think, Doug was about a year ahead of time in his predictions that this year is really the year that I see (clears throat) the lakehouse ecosystems really firming up. You saw the first shots last year. But anyway, on this, data lakes will not go away. I've actually, I'm on the home stretch of doing a market, a landscape on the lakehouse. And lakehouse will not replace data lakes in terms of that. There is the need for those, data scientists who do know Python, who know Spark, to go in there and basically do their thing without all the restrictions or the constraints of a pre-built, pre-designed table structure. I get that. Same thing for developing models. But on the other hand, there is huge need. Basically, (clears throat) maybe MongoDB was saying that we're not teaching SQL anymore. Well, maybe we have an oversupply of SQL developers. Well, I'm being facetious there, but there is a huge skills base in SQL. Analytics have been built on SQL. The thing with lakehouse, and why this really helps to fuel a SQL revival, is that the core need in the data lake, what brought on the lakehouse was not so much SQL, it was a need for ACID. And what was the best way to do it? It was through a relational table structure. So, the whole idea of ACID in the lakehouse was not to turn it into a transaction database, but to make the data trusted, secure, and more granularly governed, where you could govern down to column and row level, which you really could not do in a data lake or a file system. So, while lakehouse can be queried in a manner, you can go in there with Python or whatever, it's built on a relational table structure. And so, for that end, for those types of data lakes, it becomes the end state.
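A brief sketch of what that ACID, relational table structure looks like in practice: standard SQL DML against an Apache Iceberg table issued through PySpark. The catalog configuration, package version, and table schema below are illustrative assumptions; the point is only that inserts, updates, and deletes run transactionally on open files rather than inside a proprietary warehouse.

```python
# Illustrative sketch: ACID DML on an Apache Iceberg table through Spark SQL.
# Catalog name, warehouse path, package version, and schema are assumptions for the example.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("iceberg-dml-sketch")
    # Assumes the Iceberg Spark runtime is available; the version here is only illustrative.
    .config("spark.jars.packages", "org.apache.iceberg:iceberg-spark-runtime-3.3_2.12:1.1.0")
    .config("spark.sql.extensions", "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
    .config("spark.sql.catalog.demo", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.demo.type", "hadoop")
    .config("spark.sql.catalog.demo.warehouse", "/tmp/iceberg-warehouse")
    .getOrCreate()
)

spark.sql("CREATE NAMESPACE IF NOT EXISTS demo.db")
spark.sql("CREATE TABLE IF NOT EXISTS demo.db.orders (id BIGINT, status STRING, amount DOUBLE) USING iceberg")
spark.sql("INSERT INTO demo.db.orders VALUES (1, 'open', 120.0), (2, 'open', 75.5)")

# Row-level updates and deletes: the 'warehouse' operations that early data lakes could not do.
spark.sql("UPDATE demo.db.orders SET status = 'shipped' WHERE id = 1")
spark.sql("DELETE FROM demo.db.orders WHERE amount < 100")

spark.sql("SELECT * FROM demo.db.orders").show()
```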
You cannot bypass that table structure as I learned the hard way during my research. So, the bottom line I'd say here is that lakehouse is proof that we're starting to see the revenge of the SQL nerds. (Dave chuckles) >> Excellent. Okay, let's bring up back up the predictions. Dave Menninger, this one's really thought-provoking and interesting. We're hearing things like data as code, new data applications, machines actually generating plans with no human involvement. And your prediction is the definition of data is expanding. What do you mean by that? >> So, I think, for too long, we've thought about data as the, I would say facts that we collect the readings off of devices and things like that, but data on its own is really insufficient. Organizations need to manipulate that data and examine derivatives of the data to really understand what's happening in their organization, why has it happened, and to project what might happen in the future. And my comment is that these data derivatives need to be supported and managed just like the data needs to be managed. We can't treat this as entirely separate. Think about all the governance discussions we've had. Think about the metadata discussions we've had. If you separate these things, now you've got more moving parts. We're talking about simplicity and simplifying the stack. So, if these things are treated separately, it creates much more complexity. I also think it creates a little bit of a myopic view on the part of the IT organizations that are acquiring these technologies. They need to think more broadly. So, for instance, metrics. Metric stores are becoming much more common part of the tooling that's part of a data platform. Similarly, feature stores are gaining traction. So, those are designed to promote the reuse and consistency across the AI and ML initiatives. The elements that are used in developing an AI or ML model. And let me go back to metrics and just clarify what I mean by that. So, any type of formula involving the data points. I'm distinguishing metrics from features that are used in AI and ML models. And the data platforms themselves are increasingly managing the models as an element of data. So, just like figuring out how to calculate a metric. Well, if you're going to have the features associated with an AI and ML model, you probably need to be managing the model that's associated with those features. The other element where I see expansion is around external data. Organizations for decades have been focused on the data that they generate within their own organization. We see more and more of these platforms acquiring and publishing data to external third-party sources, whether they're within some sort of a partner ecosystem or whether it's a commercial distribution of that information. And our research shows that when organizations use external data, they derive even more benefits from the various analyses that they're conducting. And the last great frontier in my opinion on this expanding world of data is the world of driver-based planning. Very few of the major data platform providers provide these capabilities today. These are the types of things you would do in a spreadsheet. And we all know the issues associated with spreadsheets. They're hard to govern, they're error-prone. 
And so, if we can take that type of analysis, collecting the occupancy of a rental property, the projected rise in rental rates, the fluctuations perhaps in occupancy, the interest rates associated with financing that property, we can project forward. And that's a very common thing to do. What the income might look like from that property income, the expenses, we can plan and purchase things appropriately. So, I think, we need this broader purview and I'm beginning to see some of those things happen. And the evidence today I would say, is more focused around the metric stores and the feature stores starting to see vendors offer those capabilities. And we're starting to see the ML ops elements of managing the AI and ML models find their way closer to the data platforms as well. >> Very interesting. When I hear metrics, I think of KPIs, I think of data apps, orchestrate people and places and things to optimize around a set of KPIs. It sounds like a metadata challenge more... Somebody once predicted they'll have more metadata than data. Carl, what are your thoughts on this prediction? >> Yeah, I think that what Dave is describing as data derivatives is in a way, another word for what I was calling operational metadata, which not about the data itself, but how it's used, where it came from, what the rules are governing it, and that kind of thing. If you have a rich enough set of those things, then not only can you do a model of how well your vacation property rental may do in terms of income, but also how well your application that's measuring that is doing for you. In other words, how many times have I used it, how much data have I used and what is the relationship between the data that I've used and the benefits that I've derived from using it? Well, we don't have ways of doing that. What's interesting to me is that folks in the content world are way ahead of us here, because they have always tracked their content using these kinds of attributes. Where did it come from? When was it created, when was it modified? Who modified it? And so on and so forth. We need to do more of that with the structure data that we have, so that we can track what it's used. And also, it tells us how well we're doing with it. Is it really benefiting us? Are we being efficient? Are there improvements in processes that we need to consider? Because maybe data gets created and then it isn't used or it gets used, but it gets altered in some way that actually misleads people. (laughs) So, we need the mechanisms to be able to do that. So, I would say that that's... And I'd say that it's true that we need that stuff. I think, that starting to expand is probably the right way to put it. It's going to be expanding for some time. I think, we're still a distance from having all that stuff really working together. >> Maybe we should say it's gestating. (Dave and Carl laughing) >> Sorry, if I may- >> Sanjeev, yeah, I was going to say this... Sanjeev, please comment. This sounds to me like it supports Zhamak Dehghani's principles, but please. >> Absolutely. So, whether we call it data mesh or not, I'm not getting into that conversation, (Dave chuckles) but data (audio breaking) (Tony laughing) everything that I'm hearing what Dave is saying, Carl, this is the year when data products will start to take off. I'm not saying they'll become mainstream. They may take a couple of years to become so, but this is data products, all this thing about vacation rentals and how is it doing, that data is coming from different sources. 
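Taking the rental-property example above literally, here is the kind of driver-based projection that usually lives in a spreadsheet, written as a small, self-contained Python sketch; every driver value (occupancy, rate growth, interest rate) is a made-up assumption for illustration.

```python
# Illustrative sketch: a driver-based plan for a rental property, the sort of model
# usually kept in a spreadsheet. All driver values below are made-up assumptions.
def project_rental_income(years: int,
                          monthly_rate: float,
                          occupancy: float,
                          rate_growth: float,
                          annual_expenses: float,
                          loan_balance: float,
                          interest_rate: float):
    rows = []
    for year in range(1, years + 1):
        revenue = monthly_rate * 12 * occupancy
        interest = loan_balance * interest_rate   # loan balance held flat for simplicity
        net = revenue - annual_expenses - interest
        rows.append({"year": year, "revenue": round(revenue, 2),
                     "interest": round(interest, 2), "net_income": round(net, 2)})
        monthly_rate *= (1 + rate_growth)         # rents drift up with the assumed growth driver
    return rows

plan = project_rental_income(years=3, monthly_rate=2500, occupancy=0.85,
                             rate_growth=0.04, annual_expenses=9000,
                             loan_balance=300000, interest_rate=0.065)
for row in plan:
    print(row)
```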
I'm packaging it into our data product. And to Carl's point, there's a whole operational metadata associated with it. The idea is for organizations to see things like developer productivity, how many releases am I doing of this? What data products are most popular? I'm actually right now in the process of formulating this concept that just like we had data catalogs, we are very soon going to be requiring a data products catalog. So, I can discover these data products. I'm not just creating data products left, right, and center. I need to know, do they already exist? What is the usage? If no one is using a data product, maybe I want to retire it and save cost. But this is a data product. Now, there's an associated thing that is also getting debated quite a bit called data contracts. And a data contract to me is literally just formalization of all these aspects of a product. How do you use it? What is the SLA on it, what is the quality that I am prescribing? So, data product, in my opinion, shifts the conversation to the consumers or to the business people. Up to this point when, Dave, you're talking about data and all of data discovery and curation is very data producer-centric. So, I think, we'll see a shift more into the consumer space. >> Yeah. Dave, can I just jump in there just very quickly there, which is that what Sanjeev has been saying there, this is really central to what Zhamak has been talking about. It's basically about making, one, data products are about the lifecycle management of data. Metadata is just elemental to that. And essentially, one of the things that she calls for is making data products discoverable. That's exactly what Sanjeev was talking about. >> By the way, did everyone just notice how Sanjeev just snuck in another prediction there? So, we've got- >> Yeah. (group laughing) >> But you- >> Can we also say that he snuck in, I think, the term that we'll remember today, which is metadata museums. >> Yeah, but- >> Yeah. >> And also comment to, Tony, to your last year's prediction, you're really talking about it's not something that you're going to buy from a vendor. >> No. >> It's very specific >> Mm-hmm. >> to an organization, their own data product. So, touche on that one. Okay, last prediction. Let's bring them up. Doug Henschen, BI analytics is headed to embedding. What does that mean? >> Well, we all know that conventional BI dashboarding reporting is really commoditized from a vendor perspective. It never enjoyed truly mainstream adoption. Always that 25% of employees are really using these things. I'm seeing rising interest in embedding concise analytics at the point of decision or better still, using analytics as triggers for automation and workflows, and not even necessitating human interaction with visualizations, for example, if we have confidence in the analytics. So, leading companies are pushing for next generation applications, part of this low-code, no-code movement we've seen. And they want to build that decision support right into the app. So, the analytic is right there. Leading enterprise apps vendors, Salesforce, SAP, Microsoft, Oracle, they're all building smart apps with the analytics predictions, even recommendations built into these applications. And I think, the progressive BI analytics vendors are supporting this idea of driving insight to action, not necessarily necessitating humans interacting with it if there's confidence. So, we want prediction, we want embedding, we want automation.
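To ground the data-contract idea Sanjeev raises above, here is a minimal sketch of what such a formalization might contain (owner, schema, SLA, and quality expectations), expressed as a plain Python structure; the field names and values are illustrative assumptions rather than any formal standard.

```python
# Illustrative sketch: a data contract as a plain, versionable structure that travels
# with a data product. Field names and values are assumptions, not a formal standard.
from dataclasses import dataclass, field

@dataclass
class DataContract:
    product: str
    owner: str                       # the producing domain team
    schema: dict                     # column -> type
    freshness_sla_hours: int         # how stale the data is allowed to get
    quality_checks: list = field(default_factory=list)
    allowed_uses: list = field(default_factory=list)

orders_contract = DataContract(
    product="orders_daily",
    owner="commerce-domain",
    schema={"order_id": "bigint", "customer_id": "bigint", "amount": "double", "order_ts": "timestamp"},
    freshness_sla_hours=24,
    quality_checks=["order_id is unique", "amount >= 0", "order_ts not null"],
    allowed_uses=["reporting", "forecasting"],
)

def violates_freshness(hours_since_refresh: float, contract: DataContract) -> bool:
    return hours_since_refresh > contract.freshness_sla_hours

print(violates_freshness(30, orders_contract))  # True -> page the producing team, not the consumer
```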
This low-code, no-code development movement is very important to bringing the analytics to where people are doing their work. We got to move beyond the, what I call swivel chair integration, between where people do their work and going off to separate reports and dashboards, and having to interpret and analyze before you can go back and do take action. >> And Dave Menninger, today, if you want, analytics or you want to absorb what's happening in the business, you typically got to go ask an expert, and then wait. So, what are your thoughts on Doug's prediction? >> I'm in total agreement with Doug. I'm going to say that collectively... So, how did we get here? I'm going to say collectively as an industry, we made a mistake. We made BI and analytics separate from the operational systems. Now, okay, it wasn't really a mistake. We were limited by the technology available at the time. Decades ago, we had to separate these two systems, so that the analytics didn't impact the operations. You don't want the operations preventing you from being able to do a transaction. But we've gone beyond that now. We can bring these two systems and worlds together and organizations recognize that need to change. As Doug said, the majority of the workforce and the majority of organizations doesn't have access to analytics. That's wrong. (chuckles) We've got to change that. And one of the ways that's going to change is with embedded analytics. 2/3 of organizations recognize that embedded analytics are important and it even ranks higher in importance than AI and ML in those organizations. So, it's interesting. This is a really important topic to the organizations that are consuming these technologies. The good news is it works. Organizations that have embraced embedded analytics are more comfortable with self-service than those that have not, as opposed to turning somebody loose, in the wild with the data. They're given a guided path to the data. And the research shows that 65% of organizations that have adopted embedded analytics are comfortable with self-service compared with just 40% of organizations that are turning people loose in an ad hoc way with the data. So, totally behind Doug's predictions. >> Can I just break in with something here, a comment on what Dave said about what Doug said, which (laughs) is that I totally agree with what you said about embedded analytics. And at IDC, we made a prediction in our future intelligence, future of intelligence service three years ago that this was going to happen. And the thing that we're waiting for is for developers to build... You have to write the applications to work that way. It just doesn't happen automagically. Developers have to write applications that reference analytic data and apply it while they're running. And that could involve simple things like complex queries against the live data, which is through something that I've been calling analytic transaction processing. Or it could be through something more sophisticated that involves AI operations as Doug has been suggesting, where the result is enacted pretty much automatically unless the scores are too low and you need to have a human being look at it. So, I think that that is definitely something we've been watching for. I'm not sure how soon it will come, because it seems to take a long time for people to change their thinking. But I think, as Dave was saying, once they do and they apply these principles in their application development, the rewards are great. 
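A compact sketch of the pattern Doug and Carl describe: an analytic score computed at the point of decision, acted on automatically when confidence is high and routed to a person when it is not. The scoring logic and thresholds below are illustrative assumptions standing in for a real model or live analytic query.

```python
# Illustrative sketch: embedding an analytic decision inside an operational flow.
# High-confidence scores trigger automation; lower-confidence ones fall back to a human.
def churn_risk(customer: dict) -> float:
    """Stand-in for a real model or analytic query run against live data."""
    score = 0.0
    if customer["days_since_last_order"] > 60:
        score += 0.5
    if customer["support_tickets_30d"] >= 3:
        score += 0.3
    if customer["monthly_spend"] < 50:
        score += 0.2
    return min(score, 1.0)

def handle_order_event(customer: dict) -> str:
    risk = churn_risk(customer)
    if risk >= 0.8:
        return "auto: apply retention offer"          # analytics as a trigger for automation
    if risk >= 0.5:
        return "queue: route to account manager"      # lower confidence -> human in the loop
    return "no action"

print(handle_order_event({"days_since_last_order": 75, "support_tickets_30d": 4, "monthly_spend": 30}))
```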
>> Yeah, this is very much, I would say, very consistent with what we were talking about, I was talking about before, about basically rethinking the modern data stack and going into more of an end-to-end solution solution. I think, that what we're talking about clearly here is operational analytics. There'll still be a need for your data scientists to go offline just in their data lakes to do all that very exploratory and that deep modeling. But clearly, it just makes sense to bring operational analytics into where people work into their workspace and further flatten that modern data stack. >> But with all this metadata and all this intelligence, we're talking about injecting AI into applications, it does seem like we're entering a new era of not only data, but new era of apps. Today, most applications are about filling forms out or codifying processes and require a human input. And it seems like there's enough data now and enough intelligence in the system that the system can actually pull data from, whether it's the transaction system, e-commerce, the supply chain, ERP, and actually do something with that data without human involvement, present it to humans. Do you guys see this as a new frontier? >> I think, that's certainly- >> Very much so, but it's going to take a while, as Carl said. You have to design it, you have to get the prediction into the system, you have to get the analytics at the point of decision has to be relevant to that decision point. >> And I also recall basically a lot of the ERP vendors back like 10 years ago, we're promising that. And the fact that we're still looking at the promises shows just how difficult, how much of a challenge it is to get to what Doug's saying. >> One element that could be applied in this case is (indistinct) architecture. If applications are developed that are event-driven rather than following the script or sequence that some programmer or designer had preconceived, then you'll have much more flexible applications. You can inject decisions at various points using this technology much more easily. It's a completely different way of writing applications. And it actually involves a lot more data, which is why we should all like it. (laughs) But in the end (Tony laughing) it's more stable, it's easier to manage, easier to maintain, and it's actually more efficient, which is the result of an MIT study from about 10 years ago, and still, we are not seeing this come to fruition in most business applications. >> And do you think it's going to require a new type of data platform database? Today, data's all far-flung. We see that's all over the clouds and at the edge. Today, you cache- >> We need a super cloud. >> You cache that data, you're throwing into memory. I mentioned, MySQL heat wave. There are other examples where it's a brute force approach, but maybe we need new ways of laying data out on disk and new database architectures, and just when we thought we had it all figured out. >> Well, without referring to disk, which to my mind, is almost like talking about cave painting. I think, that (Dave laughing) all the things that have been mentioned by all of us today are elements of what I'm talking about. In other words, the whole improvement of the data mesh, the improvement of metadata across the board and improvement of the ability to track data and judge its freshness the way we judge the freshness of a melon or something like that, to determine whether we can still use it. Is it still good? That kind of thing. 
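Carl's point about event-driven applications, sketched minimally: instead of following a preconceived sequence, handlers subscribe to events, and a new decision point can be injected by registering another handler. This is plain Python with no framework; the event and handler names are illustrative assumptions.

```python
# Illustrative sketch: an event-driven structure where behavior is added by registering
# handlers for events, rather than editing a preconceived sequence of steps.
from collections import defaultdict

handlers = defaultdict(list)

def on(event_name):
    def register(fn):
        handlers[event_name].append(fn)
        return fn
    return register

def emit(event_name, payload):
    for fn in handlers[event_name]:
        fn(payload)

@on("order_placed")
def reserve_inventory(order):
    print(f"reserve inventory for order {order['id']}")

@on("order_placed")
def score_for_fraud(order):
    # A decision point injected without touching the original flow.
    flag = order["amount"] > 5000
    print(f"fraud review needed: {flag}")

emit("order_placed", {"id": 101, "amount": 7200})
```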
Bringing together data from multiple sources dynamically and real-time requires all the things we've been talking about. All the predictions that we've talked about today add up to elements that can make this happen. >> Well, guys, it's always tremendous to get these wonderful minds together and get your insights, and I love how it shapes the outcome here of the predictions, and let's see how we did. We're going to leave it there. I want to thank Sanjeev, Tony, Carl, David, and Doug. Really appreciate the collaboration and thought that you guys put into these sessions. Really, thank you. >> Thank you. >> Thanks, Dave. >> Thank you for having us. >> Thanks. >> Thank you. >> All right, this is Dave Valente for theCUBE, signing off for now. Follow these guys on social media. Look for coverage on siliconangle.com, theCUBE.net. Thank you for watching. (upbeat music)
Tomer Shiran, Dremio | AWS re:Invent 2022
>> Hey everyone. Welcome back to Las Vegas. It's theCUBE live at AWS re:Invent 2022. This is our fourth day of coverage. Lisa Martin here with Paul Gillen. Paul, we started Monday night, we filmed and streamed for about three hours. We have had jam-packed days, Tuesday, Wednesday, Thursday. What's your takeaway? >> We're rounding the final turn as we, as we head into the home stretch. Yeah. This is as it has been since the beginning, this show with a lot of energy. I'm amazed for the fourth day of a conference, how many people are still here. I am too. And how, and how active they are and how full the sessions are. Huge crowd for the keynote this morning. You don't see that at most of the day four conferences. Everyone's on their way home. So, so people come here to learn and they're, and they're still >> Learning. They are still learning. And we're gonna help continue that learning path. We have an alumni back with us, Tomer joins us, the CPO and co-founder of Dremio. Tomer, it's great to have you back on the program. >> Yeah, thanks for, for having me here. And thanks for keeping the, the best session for the fourth day. >> Yeah, you're right. I like that. That's a good mojo to come into this interview with Tomer. So last year, last time I saw you was a year ago here in Vegas at re:Invent 21. We talked about the growth of data lakes and the data lake houses. We talked about the need for open data architectures as opposed to data warehouses. And the headline of the Silicon Angle's article on the interview we did with you was, Dremio Predicts 2022 will be the year open data architectures replace the data warehouse. We're almost done with 2022. Has that prediction come true? >> Yeah, I think, I think we're seeing almost every company out there, certainly in the enterprise, adopting data lake, data lakehouse technology, embracing open source kind of file and table formats. And, and so I think that's definitely happening. Of course, nothing goes away. So, you know, data warehouses don't go away in, in a year and actually don't go away ever. We still have mainframes around, but certainly the trends are, are all pointing in that direction. >> Describe the data lakehouse for anybody who may not be really familiar with that and, and what it's, what it really means for organizations. >> Yeah. I think you could think of the data lakehouse as the evolution of the data lake, right? And so, you know, for, for, you know, the last decade we've had kind of these two options, data lakes and data warehouses and, you know, warehouses, you know, having good SQL support, but, and good performance. But you had to spend a lot of time and effort getting data into the warehouse. You got locked into them, very, very expensive. That's a big problem now. And data lakes, you know, more open, more scalable, but had all sorts of kind of limitations. And what we've done now as an industry with the Lake House, and especially with, you know, technologies like Apache Iceberg, is we've unlocked all the capabilities of the warehouse directly on object storage like S3. So you can insert and update and delete individual records. You can do transactions, you can do all the things you could do with a, a database directly in kind of open formats without getting locked in at a much lower cost. >> But you're still dealing with semi-structured data as opposed to structured data. And there's, there's work that has to be done to get that into a usable form. That's where Dremio excels.
What, what has been happening in that area to, to make, I mean, is it formats like JSON that are, are enabling this to happen? How, how are we advancing the cause of making semi-structured data usable? Yeah, >> Well, I think first of all, you know, I think that's all changed. I think that was maybe true for the original data lakes, but now with the Lake house, you know, our bread and butter is actually structured data. It's all, it's all tables with the schema. And, you know, you can, you know, create tables, insert records. You know, it's, it's, it's really everything you can do with a data warehouse you can now do in the lakehouse. Now, that's not to say that there aren't like very advanced capabilities when it comes to, you know, JSON and nested data and kind of sparse data. You know, we excel in that as well. But we're really seeing kind of the lakehouse take over the, the bread and butter data warehouse use cases. >> You mentioned open a minute ago. Talk about why it's, why open is important and the value that it can deliver for customers. >> Yeah, well, I think if you look back in time and you see all the challenges that companies have had with kind of traditional data architectures, right? The, the, the, a lot of that comes from the, the, the problems with data warehouses. The fact that they are, you know, they're very expensive. The data is, you have to ingest it into the data warehouse in order to query it. And then it's almost impossible to get off of these systems, right? It takes an enormous effort, tremendous cost to get off of them. And so you're kinda locked in and that's a big problem, right? You also, you're dependent on that one data warehouse vendor, right? You can only do things with that data that the warehouse vendor supports. And if you contrast that to data lakehouse and open architectures where the data is stored in entirely open formats.
The ability to do all these interactions with data that maybe in the past you would have to move the data into a database or, or warehouse in order to do, you just don't have to do that anymore. Speaking >>Of functionality, talk about what's new this year with drio since we've seen you last. >>Yeah, there's a lot of, a lot of new things with, with Drio. So yeah, we now have full Apache iceberg support, you know, with DML commands, you can do inserts, updates, deletes, you know, copy into all, all that kind of stuff is now, you know, fully supported native part of the platform. We, we now offer kind of two flavors of dr. We have, you know, Dr. Cloud, which is our SaaS version fully hosted. You sign up with your Google or, you know, Azure account and, and, and you're up in, you're up and running in, in, in a minute. And then dral software, which you can self host usually in the cloud, but even, even even outside of the cloud. And then we're also very excited about this new idea of data as code. And so we've introduced a new product that's now in preview called Dr. >>Arctic. And the idea there is to bring the concepts of GI or GitHub to the world of data. So things like being able to create a branch and work in isolation. If you're a data scientist, you wanna experiment on your own without impacting other people, or you're a data engineer and you're ingesting data, you want to transform it and test it before you expose it to others. You can do that in a branch. So all these ideas that, you know, we take for granted now in the world of source code and software development, we're bringing to the world of data with Jamar. And when you think about data mesh, a lot of people talking about data mesh now and wanting to kind of take advantage of, of those concepts and ideas, you know, thinking of data as a product. Well, when you think about data as a product, we think you have to manage it like code, right? You have to, and that's why we call it data as code, right? The, all those reasons that we use things like GI have to build products, you know, if we wanna think of data as a product, we need all those capabilities also with data. You know, also the ability to go back in time. The ability to undo mistakes, to see who changed my data and when did they change that table. All of those are, are part of this, this new catalog that we've created. >>Are you talk about data as a product that's sort of intrinsic to the data mesh concept. Are you, what's your opinion of data mesh? Is the, is the world ready for that radically different approach to data ownership? >>You know, we are now in dozens of, dozens of our customers that are using drio for to implement enterprise-wide kind of data mesh solutions. And at the end of the day, I think it's just, you know, what most people would consider common sense, right? In a large organization, it is very hard for a centralized single team to understand every piece of data, to manage all the data themselves, to, you know, make sure the quality is correct to make it accessible. And so what data mesh is first and foremost about is being able to kind of federate the, or distribute the, the ownership of data, the governance of the data still has to happen, right? And so that is, I think at the heart of the data mesh, but thinking of data as kind of allowing different teams, different domains to own their own data to really manage it like a product with all the best practices that that we have with that super important. 
>>We're doing a lot with data mesh. The way that Dremio Cloud has multiple projects, and the way that Arctic allows you to have multiple catalogs so that different groups can interact and share data among each other; the fact that we can connect to all these different data sources, even outside your data lake, with Redshift, Oracle, SQL Server, all the different databases that are out there, and join across those databases in addition to your data lake: that's all stuff that companies want with their data mesh. >>What are some of your favorite customer stories where you've really helped them accelerate that data mesh and drive business value from it, so that more people in the organization have access to data and can really make those data-driven decisions that everybody wants to make? >>There are so many of them. One of the largest tech companies in the world is creating a data mesh where all the different departments in the company are involved. They were a big data warehouse user and kind of hit the wall: the costs were so high, and the ability for people to use it just for experimentation, to try new things out, to collaborate, wasn't there, because it was so prohibitively expensive and difficult to use. And so they said, we need a platform where different people can collaborate, where they can experiment with the data and share data with others. At a big organization like that, it's the ability to have a centralized platform but allow different groups to manage their own data. Several of the largest banks in the world are also doing data meshes with Dremio; one of them has over a dozen different business units that are using Dremio. And that ability to have thousands of people on a platform and to be able to collaborate and share among each other, that's super important to these guys. >>Can you contrast your approach to the market with Snowflake's? Because they have some of those same concepts. >>Snowflake's a very closed system at the end of the day. Closed, and very expensive. I remember seeing, a quarter ago in one of their earnings reports, that the average customer spends 70% more every year. Well, that's not sustainable. If you think about that over a decade, your cost is gonna increase 200x; most companies are not gonna be able to swallow that. So companies need, first of all, more cost-efficient solutions that are just more approachable. And the second thing is what we talked about with the open data architecture: I think most companies now realize that if you want to build a platform for the future, you need to have the data in open formats and not be locked into one vendor. And another important aspect beyond that is the ability to connect to all your data, even outside the lake, to your different databases, NoSQL databases, relational databases, and Dremio's semantic layer, where we can accelerate queries. Typically, what happens with data warehouses and other data lake query engines is that because you can't get the performance that you want, you end up creating lots and lots of copies of data. For every use case you're creating a pre-joined copy of that data, a pre-aggregated version of that data.
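For what it's worth, the compounding arithmetic behind the "200x in a decade" remark above checks out, taking the quoted 70%-per-year figure as given rather than independently verified here:

$$ 1.70^{10} \approx 201.6 \quad\Longrightarrow\quad \text{roughly } 200\times \text{ the starting cost after ten years.} $$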
And then you have to redirect all your queries and tools at those copies. >>You've got a governance problem. >>A governance problem, among other things. It's expensive, and it's hard to secure, because permissions don't travel with the data. So you have all sorts of problems with that. And so what we've done, because of our semantic layer, is make it easy to expose data in a logical way, and then our query acceleration technology, which we call reflections, transparently accelerates queries and gives you subsecond response times without data copies, and also without extracts into the BI tools. Because if you start doing BI extracts or imports, again you have lots of copies of data in the organization, all sorts of refresh problems, security problems; it's a nightmare. Just collapsing all those copies and having a simple solution where data is stored in open formats, and where we can give you fast access to any of that data, is very different from what you get with a Snowflake or any of these other companies. >>Right, that's a great explanation. I wanna ask you: early this year you announced that your basic Dremio Cloud service would be free forever. How has that offer gone over? What's been the uptake? >>Thousands of people have signed up, and I think it's a great service. It's very, very simple: people can go on the website and try it out. We now have a test drive as well, if you want to get started with some public sample data sets and a tutorial; we've made that increasingly easy too. But yeah, we continue to take that approach of making it easy, democratizing these cloud data platforms and lowering the barriers to adoption. >>How effective has it been in driving sales of the enterprise version? >>A lot of the business that we do, when it comes to selling, is with folks that have educated themselves. They've started off, they've followed some tutorials. I think developers generally prefer the first interaction to be with a product, not with a salesperson. And that's basically the reason we did that. >>Before we ask you the last question, can you give us a sneak peek into the product roadmap as we enter 2023? What can you share with us that we should be paying attention to where Dremio is concerned? >>Yeah, actually a couple of days ago, here at the conference, we had a press release with all sorts of new capabilities that we just released, and there's a lot more for the coming year. We will shortly be releasing a variety of different performance enhancements, so in the next quarter or two we'll probably be twice as fast just in terms of raw query speed, and that's in addition to our reflections and query acceleration. Support for all the major clouds is coming. Just a lot of capabilities in Dremio that make it easier and easier to use the platform. >>Awesome. Tomer, thank you so much for joining us. My last question to you is: if you had a billboard in your desired location, and it was going to be a mic drop about why customers should be looking at Dremio, what would that billboard say? >>Well, Dremio is the easy and open data lakehouse. And open architectures...
It's just a lot better, a lot more future-proof, a lot easier, and just a much safer choice for the future for companies. So it's hard to argue with that; people should take a look. That wasn't the best billboard, I know. >>Okay, I think it's a great billboard. Awesome. And thank you so much for joining Paul and me on the program, sharing with us what's new and some of the exciting things that are coming down the pipe. We're gonna be keeping our eye on Dremio. >>Awesome. Always happy to be here. >>Thank you. For our guest and for Paul Gillin, I'm Lisa Martin. You're watching theCUBE, the leader in live and emerging tech coverage.
Breaking Analysis: CEO Nuggets from Microsoft Ignite & Google Cloud Next
>> From theCUBE Studios in Palo Alto and Boston, bringing you data-driven insights from theCUBE and ETR, this is Breaking Analysis with Dave Vellante. >> This past week we saw two of the Big 3 cloud providers present the latest update on their respective cloud visions, their business progress, their announcements and innovations. The content at these events had many overlapping themes, including modern cloud infrastructure at global scale, applying advanced machine intelligence, AKA AI, end-to-end data platforms, collaboration software. They talked a lot about the future of work automation. And they gave us a little taste, each company of the Metaverse Web 3.0 and much more. Despite these striking similarities, the differences between these two cloud platforms and that of AWS remains significant. With Microsoft leveraging its massive application software footprint to dominate virtually all markets and Google doing everything in its power to keep up with the frenetic pace of today's cloud innovation, which was set into motion a decade and a half ago by AWS. Hello and welcome to this week's Wikibon CUBE Insights, powered by ETR. In this Breaking Analysis, we unpack the immense amount of content presented by the CEOs of Microsoft and Google Cloud at Microsoft Ignite and Google Cloud Next. We'll also quantify with ETR survey data the relative position of these two cloud giants in four key sectors: cloud IaaS, BI analytics, data platforms and collaboration software. Now one thing was clear this past week, hybrid events are the thing. Google Cloud Next took place live over a 24-hour period in six cities around the world, with the main gathering in New York City. Microsoft Ignite, which normally is attended by 30,000 people, had a smaller event in Seattle, in person with a virtual audience around the world. AWS re:Invent, of course, is much different. Yes, there's a virtual component at re:Invent, but it's all about a big live audience gathering the week after Thanksgiving, in the first week of December in Las Vegas. Regardless, Satya Nadella keynote address was prerecorded. It was highly produced and substantive. It was visionary, energetic with a strong message that Azure was a platform to allow customers to build their digital businesses. Doing more with less, which was a key theme of his. Nadella covered a lot of ground, starting with infrastructure from the compute, highlighting a collaboration with Arm-based, Ampere processors. New block storage, 60 regions, 175,000 miles of fiber cables around the world. He presented a meaningful multi-cloud message with Azure Arc to support on-prem and edge workloads, as well as of course the public cloud. And talked about confidential computing at the infrastructure level, a theme we hear from all cloud vendors. He then went deeper into the end-to-end data platform that Microsoft is building from the core data stores to analytics, to governance and the myriad tooling Microsoft offers. AI was next with a big focus on automation, AI, training models. He showed demos of machines coding and fixing code and machines automatically creating designs for creative workers and how Power Automate, Microsoft's RPA tooling, would combine with Microsoft Syntex to understand documents and provide standard ways for organizations to communicate with those documents. There was of course a big focus on Azure as developer cloud platform with GitHub Copilot as a linchpin using AI to assist coders in low-code and no-code innovations that are coming down the pipe. 
And another giant theme was a workforce transformation and how Microsoft is using its heritage and collaboration and productivity software to move beyond what Nadella called productivity paranoia, i.e., are remote workers doing their jobs? In a world where collaboration is built into intelligent workflows, and he even showed a glimpse of the future with AI-powered avatars and partnerships with Meta and Cisco with Teams of all firms. And finally, security with a bevy of tools from identity, endpoint, governance, et cetera, stressing a suite of tools from a single provider, i.e., Microsoft. So a couple points here. One, Microsoft is following in the footsteps of AWS with silicon advancements and didn't really emphasize that trend much except for the Ampere announcement. But it's building out cloud infrastructure at a massive scale, there is no debate about that. Its plan on data is to try and provide a somewhat more abstracted and simplified solutions, which differs a little bit from AWS's approach of the right database tool, for example, for the right job. Microsoft's automation play appears to provide simple individual productivity tools, kind of a ground up approach and make it really easy for users to drive these bottoms up initiatives. We heard from UiPath that forward five last month, a little bit of a different approach of horizontal automation, end-to-end across platforms. So quite a different play there. Microsoft's angle on workforce transformation is visionary and will continue to solidify in our view its dominant position with Teams and Microsoft 365, and it will drive cloud infrastructure consumption by default. On security as well as a cloud player, it has to have world-class security, and Azure does. There's not a lot of debate about that, but the knock on Microsoft is Patch Tuesday becomes Hack Wednesday because Microsoft releases so many patches, it's got so much Swiss cheese in its legacy estate and patching frequently, it becomes a roadmap and a trigger for hackers. Hey, patch Tuesday, these are all the exploits that you can go after so you can act before the patches are implemented. And so it's really become a problem for users. As well Microsoft is competing with many of the best-of-breed platforms like CrowdStrike and Okta, which have market momentum and appear to be more attractive horizontal plays for customers outside of just the Microsoft cloud. But again, it's Microsoft. They make it easy and very inexpensive to adopt. Now, despite the outstanding presentation by Satya Nadella, there are a couple of statements that should raise eyebrows. Here are two of them. First, as he said, Azure is the only cloud that supports all organizations and all workloads from enterprises to startups, to highly regulated industries. I had a conversation with Sarbjeet Johal about this, to make sure I wasn't just missing something and we were both surprised, somewhat, by this claim. I mean most certainly AWS supports more certifications for example, and we would think it has a reasonable case to dispute that claim. And the other statement, Nadella made, Azure is the only cloud provider enabling highly regulated industries to bring their most sensitive applications to the cloud. Now, reasonable people can debate whether AWS is there yet, but very clearly Oracle and IBM would have something to say about that statement. Now maybe it's not just, would say, "Oh, they're not real clouds, you know, they're just going to hosting in the cloud if you will." 
But still, when it comes to mission-critical applications, you would think Oracle is really the the leader there. Oh, and Satya also mentioned the claim that the Edge browser, the Microsoft Edge browser, no questions asked, he said, is the best browser for business. And we could see some people having some questions about that. Like isn't Edge based on Chrome? Anyway, so we just had to question these statements and challenge Microsoft to defend them because to us it's a little bit of BS and makes one wonder what else in such as awesome keynote and it was awesome, it was hyperbole. Okay, moving on to Google Cloud Next. The keynote started with Sundar Pichai doing a virtual session, he was remote, stressing the importance of Google Cloud. He mentioned that Google Cloud from its Q2 earnings was on a $25-billion annual run rate. What he didn't mention is that it's also on a 3.6 billion annual operating loss run rate based on its first half performance. Just saying. And we'll dig into that issue a little bit more later in this episode. He also stressed that the investments that Google has made to support its core business and search, like its global network of 22 subsea cables to support things like, YouTube video, great performance obviously that we all rely on, those innovations there. Innovations in BigQuery to support its search business and its threat analysis that it's always had and its AI, it's always been an AI-first company, he's stressed, that they're all leveraged by the Google Cloud Platform, GCP. This is all true by the way. Google has absolutely awesome tech and the talk, as well as his talk, Pichai, but also Kurian's was forward thinking and laid out a vision of the future. But it didn't address in our view, and I talked to Sarbjeet Johal about this as well, today's challenges to the degree that Microsoft did and we expect AWS will at re:Invent this year, it was more out there, more forward thinking, what's possible in the future, somewhat less about today's problem, so I think it's resonates less with today's enterprise players. Thomas Kurian then took over from Sundar Pichai and did a really good job of highlighting customers, and I think he has to, right? He has to say, "Look, we are in this game. We have customers, 9 out of the top 10 media firms use Google Cloud. 8 out of the top 10 manufacturers. 9 out of the top 10 retailers. Same for telecom, same for healthcare. 8 out of the top 10 retail banks." He and Sundar specifically referenced a number of companies, customers, including Avery Dennison, Groupe Renault, H&M, John Hopkins, Prudential, Minna Bank out of Japan, ANZ bank and many, many others during the session. So you know, they had some proof points and you got to give 'em props for that. Now like Microsoft, Google talked about infrastructure, they referenced training processors and regions and compute optionality and storage and how new workloads were emerging, particularly data-driven workloads in AI that required new infrastructure. He explicitly highlighted partnerships within Nvidia and Intel. I didn't see anything on Arm, which somewhat surprised me 'cause I believe Google's working on that or at least has come following in AWS's suit if you will, but maybe that's why they're not mentioning it or maybe I got to do more research there, but let's park that for a minute. But again, as we've extensively discussed in Breaking Analysis in our view when it comes to compute, AWS via its Annapurna acquisition is well ahead of the pack in this area. 
Arm is making its way into the enterprise, but all three companies are heavily investing in infrastructure, which is great news for customers and the ecosystem. We'll come back to that. Data and AI go hand in hand, and there was no shortage of data talk. Google didn't mention Snowflake or Databricks specifically, but it did mention, by the way, it mentioned Mongo a couple of times, but it did mention Google's, quote, Open Data cloud. Now maybe Google has used that term before, but Snowflake has been marketing the data cloud concept for a couple of years now. So that struck as a shot across the bow to one of its partners and obviously competitor, Snowflake. At BigQuery is a main centerpiece of Google's data strategy. Kurian talked about how they can take any data from any source in any format from any cloud provider with BigQuery Omni and aggregate and understand it. And with the support of Apache Iceberg and Delta and Hudi coming in the future and its open Data Cloud Alliance, they talked a lot about that. So without specifically mentioning Snowflake or Databricks, Kurian co-opted a lot of messaging from these two players, such as life and tech. Kurian also talked about Google Workspace and how it's now at 8 million users up from 6 million just two years ago. There's a lot of discussion on developer optionality and several details on tools supported and the open mantra of Google. And finally on security, Google brought out Kevin Mandian, he's a CUBE alum, extremely impressive individual who's CEO of Mandiant, a leading security service provider and consultancy that Google recently acquired for around 5.3 billion. They talked about moving from a shared responsibility model to a shared fate model, which is again, it's kind of a shot across AWS's bow, kind of shared responsibility model. It's unclear that Google will pay the same penalty if a customer doesn't live up to its portion of the shared responsibility, but we can probably assume that the customer is still going to bear the brunt of the pain, nonetheless. Mandiant is really interesting because it's a services play and Google has stated that it is not a services company, it's going to give partners in the channel plenty of room to play. So we'll see what it does with Mandiant. But Mandiant is a very strong enterprise capability and in the single most important area security. So interesting acquisition by Google. Now as well, unlike Microsoft, Google is not competing with security leaders like Okta and CrowdStrike. Rather, it's partnering aggressively with those firms and prominently putting them forth. All right. Let's get into the ETR survey data and see how Microsoft and Google are positioned in four key markets that we've mentioned before, IaaS, BI analytics, database data platforms and collaboration software. First, let's look at the IaaS cloud. ETR is just about to release its October survey, so I cannot share the that data yet. I can only show July data, but we're going to give you some directional hints throughout this conversation. This chart shows net score or spending momentum on the vertical axis and overlap or presence in the data, i.e., how pervasive the platform is. That's on the horizontal axis. And we've inserted the Wikibon estimates of IaaS revenue for the companies, the Big 3. Actually the Big 4, we included Alibaba. So a couple of points in this somewhat busy data chart. First, Microsoft and AWS as always are dominant on both axes. The red dotted line there at 40% on the vertical axis. 
That represents a highly elevated spending velocity and all of the Big 3 are above the line. Now at the same time, GCP is well behind the two leaders on the horizontal axis and you can see that in the table insert as well in our revenue estimates. Now why is Azure bigger in the ETR survey when AWS is larger according to the Wikibon revenue estimates? And the answer is because Microsoft with products like 365 and Teams will often be considered by respondents in the survey as cloud by customers, so they fit into that ETR category. But in the insert data we're stripping out applications and SaaS from Microsoft and Google and we're only isolating on IaaS. The other point is when you take a look at the early October returns, you see downward pressure as signified by those dotted arrows on every name. The only exception was Dell, or Dell and IBM, which showing slightly improved momentum. So the survey data generally confirms what we know that AWS and Azure have a massive lead and strong momentum in the marketplace. But the real story is below the line. Unlike Google Cloud, which is on pace to lose well over 3 billion on an operating basis this year, AWS's operating profit is around $20 billion annually. Microsoft's Intelligent Cloud generated more than $30 billion in operating income last fiscal year. Let that sink in for a moment. Now again, that's not to say Google doesn't have traction, it does and Kurian gave some nice proof points and customer examples in his keynote presentation, but the data underscores the lead that Microsoft and AWS have on Google in cloud. And here's a breakdown of ETR's proprietary net score methodology, that vertical axis that we showed you in the previous chart. It asks customers, are you adopting the platform new? That's that lime green. Are you spending 6% or more? That's the forest green. Is you're spending flat? That's the gray. Is you're spending down 6% or worse? That's the pinkest color. Or are you replacing the platform, defecting? That's the bright red. You subtract the reds from the greens and you get a net score. Now one caveat here, which actually is really favorable from Microsoft, the Microsoft data that we're showing here is across the entire Microsoft portfolio. The other point is, this is July data, we'll have an update for you once ETR releases its October results. But we're talking about meaningful samples here, the ends. 620 for AWS over a thousand from Microsoft in more than 450 respondents in the survey for Google. So the real tell is replacements, that bright red. There is virtually no churn for AWS and Microsoft, but Google's churn is 5x, those two in the survey. Now 5% churn is not high, but you'd like to see three things for Google given it's smaller size. One is less churn, two is much, much higher adoption rates in the lime green. Three is a higher percentage of those spending more, the forest green. And four is a lower percentage of those spending less. And none of these conditions really applies here for Google. GCP is still not growing fast enough in our opinion, and doesn't have nearly the traction of the two leaders and that shows up in the survey data. All right, let's look at the next sector, BI analytics. Here we have that same XY dimension. Again, Microsoft dominating the picture. AWS very strong also in both axes. Tableau, very popular and respectable of course acquired by Salesforce on the vertical axis, still looking pretty good there. And again on the horizontal axis, big presence there for Tableau. 
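As an aside, the net score arithmetic described a moment ago reduces to a one-line calculation. A small sketch with made-up respondent percentages (not actual ETR figures):

```python
def net_score(adopting_new, spending_more, flat, spending_less, replacing):
    """ETR-style net score in percentage points: greens minus reds.
    Inputs are shares of respondents; the 'flat' share is ignored by design."""
    assert abs(sum([adopting_new, spending_more, flat, spending_less, replacing]) - 100) < 1e-9
    return (adopting_new + spending_more) - (spending_less + replacing)

# Hypothetical mix: 10% new adoptions, 45% spending more, 35% flat,
# 8% spending less, 2% replacing -> net score of 45, above the 40% line.
print(net_score(10, 45, 35, 8, 2))
```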
And Google with Looker and its other platforms is also respectable, but it again, has some work to do. Now notice Streamlit, that's a recent Snowflake acquisition. It's strong in the vertical axis and because of Snowflake's go-to-market (indistinct), it's likely going to move to the right overtime. Grafana is also prominent in the Y axis, but a glimpse at the most recent survey data shows them slightly declining while Looker actually improves a bit. As does Cloudera, which we'll move up slightly. Again, Microsoft just blows you away, doesn't it? All right, now let's get into database and data platform. Same X Y dimensions, but now database and data warehouse. Snowflake as usual takes the top spot on the vertical axis and it is actually keeps moving to the right as well with again, Microsoft and AWS is dominant in the market, as is Oracle on the X axis, albeit it's got less spending velocity, but of course it's the database king. Google is well behind on the X axis but solidly above the 40% line on the vertical axis. Note that virtually all platforms will see pressure in the next survey due to the macro environment. Microsoft might even dip below the 40% line for the first time in a while. Lastly, let's look at the collaboration and productivity software market. This is such an important area for both Microsoft and Google. And just look at Microsoft with 365 and Teams up into the right. I mean just so impressive in ubiquitous. And we've highlighted Google. It's in the pack. It certainly is a nice base with 174 N, which I can tell you that N will rise in the next survey, which is an indication that more people are adopting. But given the investment and the tech behind it and all the AI and Google's resources, you'd really like to see Google in this space above the 40% line, given the importance of this market, of this collaboration area to Google's success and the degree to which they emphasize it in their pitch. And look, this brings up something that we've talked about before on Breaking Analysis. Google doesn't have a tech problem. This is a go-to-market and marketing challenge that Google faces and it's up against two go-to-market champs and Microsoft and AWS. And Google doesn't have the enterprise sales culture. It's trying, it's making progress, but it's like that racehorse that has all the potential in the world, but it's just missing some kind of key ingredient to put it over at the top. It's always coming in third, (chuckles) but we're watching and Google's obviously, making some investments as we shared with earlier. All right. Some final thoughts on what we learned this week and in this research: customers and partners should be thrilled that both Microsoft and Google along with AWS are spending so much money on innovation and building out global platforms. This is a gift to the industry and we should be thankful frankly because it's good for business, it's good for competitiveness and future innovation as a platform that can be built upon. Now we didn't talk much about multi-cloud, we haven't even mentioned supercloud, but both Microsoft and Google have a story that resonates with customers in cross cloud capabilities, unlike AWS at this time. But we never say never when it comes to AWS. They sometimes and oftentimes surprise you. One of the other things that Sarbjeet Johal and John Furrier and I have discussed is that each of the Big 3 is positioning to their respective strengths. AWS is the best IaaS. 
Microsoft is building out the kind of, quote, we-make-it-easy-for-you cloud, and Google is trying to be the open data cloud with its open-source chops and excellent tech. And that puts added pressure on Snowflake, doesn't it? You know, Thomas Kurian made some comments according to CRN, something to the effect that, we are the only company that can do the data cloud thing across clouds, which again, if I'm being honest is not really accurate. Now I haven't clarified these statements with Google and often things get misquoted, but there's little question that, as AWS has done in the past with Redshift, Google is taking a page out of Snowflake, Databricks as well. A big difference in the Big 3 is that AWS doesn't have this big emphasis on the up-the-stack collaboration software that both Microsoft and Google have, and that for Microsoft and Google will drive captive IaaS consumption. AWS obviously does some of that in database, a lot of that in database, but ISVs that compete with Microsoft and Google should have a greater affinity, one would think, to AWS for competitive reasons. and the same thing could be said in security, we would think because, as I mentioned before, Microsoft competes very directly with CrowdStrike and Okta and others. One of the big thing that Sarbjeet mentioned that I want to call out here, I'd love to have your opinion. AWS specifically, but also Microsoft with Azure have successfully created what Sarbjeet calls brand distance. AWS from the Amazon Retail, and even though AWS all the time talks about Amazon X and Amazon Y is in their product portfolio, but you don't really consider it part of the retail organization 'cause it's not. Azure, same thing, has created its own identity. And it seems that Google still struggles to do that. It's still very highly linked to the sort of core of Google. Now, maybe that's by design, but for enterprise customers, there's still some potential confusion with Google, what's its intentions? How long will they continue to lose money and invest? Are they going to pull the plug like they do on so many other tools? So you know, maybe some rethinking of the marketing there and the positioning. Now we didn't talk much about ecosystem, but it's vital for any cloud player, and Google again has some work to do relative to the leaders. Which brings us to supercloud. The ecosystem and end customers are now in a position this decade to digitally transform. And we're talking here about building out their own clouds, not by putting in and building data centers and installing racks of servers and storage devices, no. Rather to build value on top of the hyperscaler gift that has been presented. And that is a mega trend that we're watching closely in theCUBE community. While there's debate about the supercloud name and so forth, there little question in our minds that the next decade of cloud will not be like the last. All right, we're going to leave it there today. Many thanks to Sarbjeet Johal, and my business partner, John Furrier, for their input to today's episode. Thanks to Alex Myerson who's on production and manages the podcast and Ken Schiffman as well. Kristen Martin and Cheryl Knight helped get the word out on social media and in our newsletters. And Rob Hof is our editor in chief over at SiliconANGLE, who does some wonderful editing. And check out SiliconANGLE, a lot of coverage on Google Cloud Next and Microsoft Ignite. Remember, all these episodes are available as podcast wherever you listen. Just search Breaking Analysis podcast. 
I publish each week on wikibon.com and siliconangle.com. And you can always get in touch with me via email, david.vellante@siliconangle.com or you can DM me at dvellante or comment on my LinkedIn posts. And please do check out etr.ai, the best survey data in the enterprise tech business. This is Dave Vellante for the CUBE Insights, powered by ETR. Thanks for watching and we'll see you next time on Breaking Analysis. (gentle music)
Lie 2, An Open Source Based Platform Cannot Give You Performance and Control | Starburst
>>We're back with Justin Borgman of Starburst and Richard Jarvis of EMIS Health. Okay, we're gonna get into lie number two, and that is this: an open source based platform cannot give you the performance and control that you can get with a proprietary system. Is that a lie? Justin, the enterprise data warehouse has been pretty dominant and has evolved and matured. Its stack has matured over the years. Why is it not the default platform for data? >>Yeah, well, I think that's become a lie over time. If we go back 10 or 12 years, to the advent of the first data lakes built around Hadoop, that probably was true: you couldn't get the performance you needed to run fast, interactive SQL queries in a data lake. Now, a lot has changed in 10 or 12 years. I remember in the very early days people would say you'll never get performance because you need to store data in a columnar format. And then columnar formats were introduced to the data lake; you have Parquet and ORC files, and Avro, that were created to ultimately deliver performance out of it. So, okay, we got largely over the performance hurdle. More recently people will say, well, you don't have the ability to do updates and deletes like a traditional data warehouse. And now we've got the creation of new table formats, like Iceberg and Delta and Hudi, that do allow for updates and deletes. So I think the data lake has continued to mature. And I remember a quote from Curt Monash many years ago where he said it takes six or seven years to build a functional database. I think that's right, and now we've had almost a decade go by. So these technologies have matured to deliver very, very close to the same level of performance and functionality as cloud data warehouses. So I think the reality is that it's become a lie, and now we have giant hyperscale internet companies that don't have a traditional data warehouse at all: they do all of their analytics in a data lake. So I think we've proven that it's very much possible today. >>Thank you for that. And so Richard, talk about your perspective as a practitioner in terms of what open brings you versus closed. Open is a moving target; I remember Unix used to be "open systems," so it's an evolving spectrum. But from your perspective, what does open give you that you can't get from a proprietary system, or what are you fearful of in a proprietary system? >>I suppose for me, open buys us the ability to be unsure about the future, because one thing that's always true about technology is that it evolves in a direction slightly different to what people expect, and what you don't want is to have backed yourself into a corner that then prevents you from innovating. So if you have chosen a technology and stored trillions of records in it, and suddenly a new way of processing or machine learning comes out, you want to be able to take advantage of it; your competitive edge might depend upon it. And so for us, we acknowledge that we don't have perfect vision of what the future might be, and so by backing open storage technologies we can apply a number of different technologies to the processing of that data. And that gives us the ability to remain relevant and innovate on our data storage.
And we have bought our way out of any performance concerns, because we can use cloud-scale infrastructure to scale up and scale down as we need. And so we don't have the concern that we don't have enough hardware today to process what we want to achieve; we can just scale up when we need it and scale back down. So open source has really allowed us to stay at the cutting edge. >>So Justin, let me play devil's advocate here a little bit. I've talked to Zhamak about this, and obviously her vision is that the data mesh is open source: open source tooling, not proprietary. You're not gonna buy a data mesh; you're gonna build it with open source tooling, and vendors like you are gonna support it. But come back to today: you can get to market with a proprietary solution faster. I'm gonna make that statement; you tell me if it's a lie. And then you can say, okay, we support Apache Iceberg, we're gonna support open source tooling. Take a company like VMware, not really in the data business, but the way they embraced Kubernetes and every new open source thing that comes along, they say, we do that too. Why can't proprietary systems do that and be as effective? >>Yeah, well, I think at least within the data landscape, saying that you can access open data formats like Iceberg or others is a bit disingenuous, because really what you're selling to your customer is a certain degree of performance, a certain SLA, and those cloud data warehouses that reach beyond their own proprietary storage drop all the performance that they were able to provide. It reminds me, again, of going back 10 or 12 years ago, when everybody had a connector to Hadoop and thought that was the solution. But the reality was, a connector was not the same as running workloads in Hadoop back then. And I think similarly, being able to connect to an external table that lives in an open data format, you're not going to give it the performance that your customers are accustomed to, and at the end of the day they're always going to be incentivized to get that data ingested into the data warehouse, because that's where they have control. And the bottom line is that the database industry has really been built around vendor lock-in from the start. How many people love Oracle today? But they're our customers nonetheless. I think lock-in is part of this industry, and that's really what we're trying to change with open data formats. >>Well, it's interesting; it reminds me of when I see the gas station price. I drive up and then I see, oh, that's the cash price; on a credit card I gotta pay 20 cents more. But okay. So the argument then, let me come back to you, Justin: what's wrong with saying, hey, we support open data formats, but you're gonna get better performance if you keep it in our closed system? Are you saying that long term that's gonna come back and bite you? You mentioned Oracle, you mentioned Teradata; by implication, you're saying that's where Snowflake customers are headed. >>Yeah, absolutely. I think this is a movie that we've all seen before.
At least those of us who've been in the industry long enough have seen this movie play out a couple of times. So I do think that's the future. And I loved what Richard said; I actually wrote it down because I thought it was an amazing quote: it buys us the ability to be unsure of the future. That pretty much says it all. The future is unknowable, and the reality is that using open data formats, you remain interoperable with any technology you want to utilize. If you want to use Spark to train a machine learning model and you want to use Starburst to query via SQL, that's totally cool; they can both work off the same exact data sets. By contrast, if you're focused on a proprietary model, then you're locked into that model. I think the same applies to data sharing, to data products, to a wide variety of aspects of the data landscape that a proprietary approach closes off and locks you into. >>So I would say this, Richard, and I'd love to get your thoughts on it. I talk to a lot of Oracle customers, not as many Teradata customers, but a lot of Oracle customers, and they'll admit, yeah, they're jamming us on price and the license cost, but we do get value out of it. And so my question to you, Richard, is: do the, let's call them data warehouse systems or proprietary systems, deliver a greater ROI sooner? And is that the allure that customers are attracted to, or can open platforms deliver as fast an ROI? >>I think the answer to that is, it can depend a bit. It depends on your business's skill set. So we are lucky that we have a number of proprietary teams that work in databases that provide our operational data capability, and we have teams of analytics and big data experts who can work with open data sets and open data formats. And so for those different teams, they can get to an ROI more quickly with different technologies. For the business, though, we can't do better for our operational data stores than proprietary databases today: we can back off very tight SLAs to them, and we can demonstrate reliability from millions of hours of those databases being run at enterprise scale. But for an analytics workload, and increasingly our business is growing in that direction, we can't do better than open data formats with cloud-based, data mesh type technologies. And so it's not a simple answer; one will not always be the right answer for our business. We definitely have times when proprietary databases provide a capability that we couldn't easily represent or replicate with open technologies. >>Yeah. Richard, let's stay with you. You mentioned some things before that strike me. The Databricks-Snowflake thing is always a lot of fun for analysts like me. You've got Databricks coming at it, and Richard, you mentioned you have a lot of rockstar data engineers, Databricks coming at it from a data engineering heritage; you've got Snowflake coming at it from an analytics heritage. Those two worlds are colliding. People like Sanjeev Mohan have said, you know what, I think it's actually harder to play in the data engineering world; that is, it's easier for the data engineering world to go into the analytics world than the reverse. But thinking about up-and-coming engineers and developers preparing for this future of data engineering and data analytics, how should they be thinking about the future?
What's your advice to those young people? >>So I think I'd probably fall back on general programming skill sets. The advice that I saw years ago was that if you have open source technologies, the Pythons and Javas, on your CV, you command a 20% pay hike over people who can only do proprietary programming languages. And I think that's true of data technologies as well. And from a business point of view, that makes sense: I'd rather spend the money that I save on proprietary licenses on better engineers, because they can provide more value to the business and innovate us beyond our competitors. So my advice to people who are starting here, or trying to build teams to capitalize on data assets, is to begin with open-license, free capabilities, because they're very cheap to experiment with, and they generate a lot of interest from people who want to join you as a business. And you can make them very successful early doors with your analytics journey. >>It's interesting. Again, analysts like myself do a lot of TCO work, and have over the last 20-plus years, and in the world of Oracle, normally it's the staff that's the biggest nut in total cost of ownership. Not with Oracle: it's the license cost that is by far the biggest component in the pie. All right, Justin, help us close out this segment. We've been talking about this sort of data mesh, open versus closed, Snowflake, Databricks. Where does Starburst, as this engine for the data lake, the data lakehouse, the data warehouse, fit in this world? >>Yeah. So our view on how the future ultimately unfolds is that we think data lakes will be a natural center of gravity, for a lot of the reasons that we described: open data formats and the lowest total cost of ownership, because you get to choose the cheapest storage available to you. Maybe that's S3 or Azure Data Lake Storage or Google Cloud Storage, or maybe it's on-prem object storage that you bought at a really good price. So ultimately, storing a lot of data in a data lake makes a lot of sense. But I think what makes our perspective unique is that we still don't think you're gonna get everything there either. We think that centralization of all your data assets is just an impossible endeavor, and so you want to be able to access data that lives outside of the lake as well. So we kind of think of the lake as maybe the biggest place by volume in terms of how much data you have, but to have comprehensive analytics and to truly understand your business holistically, you need to be able to go access other data sources as well. And so that's the role that we want to play: to be a single point of access for our customers, provide the right level of fine-grained access controls so that the right people have access to the right data, and ultimately make it easy to discover and consume via the creation of data products as well. >>Great. Okay, thanks guys. Right after this quick break, we're gonna be back to debate whether the cloud data model that we see emerging, the so-called modern data stack, is really modern, or is it the same wine in a new bottle. When it comes to data architectures, you're watching theCUBE, the leader in enterprise and emerging tech coverage.
Starburst The Data Lies FULL V2b
>>In 2011, early Facebook employee and Cloudera co-founder Jeff Hammerbacher famously said, the best minds of my generation are thinking about how to get people to click on ads, and that sucks. Let's face it: more than a decade later, organizations continue to be frustrated with how difficult it is to get value from data and build a truly agile, data-driven enterprise. What does that even mean, you ask? Well, it means that everyone in the organization has the data they need when they need it, in a context that's relevant, to advance the mission of the organization. Now, that could mean cutting costs, it could mean increasing profits, driving productivity, saving lives, accelerating drug discovery, making better diagnoses, solving supply chain problems, predicting weather disasters, simplifying processes, and thousands of other examples where data can completely transform people's lives, beyond manipulating internet users to behave a certain way. We've heard the prognostications about the possibilities of data before, and in fairness we've made progress. But the hard truth is that the original promises of master data management, enterprise data warehouses, data marts, data hubs, and yes, even data lakes were broken and left us wanting more. Welcome to "The Data Doesn't Lie... Or Does It?", a series of conversations produced by theCUBE and made possible by Starburst Data. >>I'm your host, Dave Vellante, and joining me today are three industry experts. Justin Borgman is the co-founder and CEO of Starburst, Richard Jarvis is the CTO at EMIS Health, and Teresa Tung is cloud-first technologist at Accenture. Today we're gonna have a candid discussion that will expose the unfulfilled and, yes, broken promises of a data past. We'll expose data lies: big lies, little lies, white lies, and hidden truths. And we'll challenge age-old data conventions and bust some data myths. We're debating questions like: is the demise of a single source of truth inevitable? Will the data warehouse ever have feature parity with the data lake, or vice versa? Is the so-called modern data stack simply centralization in the cloud, aka the old guard's model in new cloud clothes? How can organizations rethink their data architectures and regimes to realize the true promises of data? And can and will an open ecosystem deliver on those promises in our lifetimes? We're spanning much of the Western world today: Richard is in the UK, Teresa is on the West Coast, and Justin is in Massachusetts with me; I'm in theCUBE studios about 30 miles outside of Boston. Folks, welcome to the program, thanks for coming on. Thanks for having us. You're very welcome. Let's get right into it. Now here's the first lie: the most effective data architecture is one that is centralized, with a team of data specialists serving various lines of business. What do you think, Justin? >>Yeah, definitely a lie. My first startup was a company called Hadapt, which was an early SQL engine for Hadoop that was acquired by Teradata. And when I got to Teradata, which of course is the pioneer of that central enterprise data warehouse model, one of the things that I found fascinating was that not one of their customers had actually lived up to that vision of centralizing all of their data into one place. They all had data silos. They all had data in different systems. They had data on prem, data in the cloud. Those companies were acquiring other companies and inheriting their data architectures.
So, you know, despite being the industry leader for 40 years, not one of their customers truly had everything in one place. So I think definitely history has proven that to be a lie. >>So Richard, from a practitioner's point of view, what are your thoughts? I mean, there's a lot of pressure to cut cost, keep things centralized, serve the business as best as possible from that standpoint. What does your experience show? >>Yeah, I mean, I think I would echo Justin's experience, really, that we as a business have grown up through acquisition, through storing data in different places, sometimes to do information governance in different ways, to store data in a platform that's close to data experts, people who really understand healthcare data from pharmacies or from doctors. And so, although if you were starting from a greenfield site and you were building something brand new, you might be able to centralize all the data and all of the tooling and teams in one place, the reality is that businesses just don't grow up like that. And it's just really impossible to get that academic perfection of storing everything in one place. >>You know, Teresa, I feel like Sarbanes-Oxley kinda saved the data warehouse, right? You actually did have to have a single version of the truth for certain financial data. But really for some of those other use cases I mentioned, I do feel like the industry has kinda let us down. What's your take on this? Where does it make sense to have that sort of centralized approach, versus where does it make sense to maybe decentralize? >>I think you gotta have centralized governance, right? So from the central team, for things like Sarbanes-Oxley, for things like security, for certainly very core data sets, having a centralized set of roles and responsibilities to really QA, right, to serve as a design authority for your entire data estate, just like you might with security. But how it's implemented has to be distributed, otherwise you're not gonna be able to scale, right? So being able to have different parts of the business really make the right data investments for their needs. And then ultimately you're gonna collaborate with your partners, so partners that are not within the company, right, external partners. We're gonna see a lot more data sharing and model creation. And so you're definitely going to be decentralized. >>So, you know, Justin, you guys, geez, I think it was about a year ago, had a session on data mesh. It was a great program. You invited Zhamak Dehghani, of course, she's the creator of the data mesh. And one of her fundamental premises is that you've got this hyper-specialized team that you've gotta go through if you want anything, but at the same time, these individuals actually become a bottleneck, even though they're some of the most talented people in the organization. So I guess a question for you, Richard: how do you deal with that? Do you organize so that there are a few sort of rock stars that, you know, build cubes and the like, or have you had any success in sort of decentralizing that data model with your constituencies? >>Yeah. So we absolutely have got rockstar data scientists and data guardians, if you like, people who understand what it means to use this data, particularly as the data that we use at EMIS is very private, it's healthcare information. 
And some of the rules and regulations around using the data are very complex and strict. So we have to have people who understand the usage of the data, then people who understand how to build models, how to process the data effectively. And you can think of them like consultants to the wider business, because a pharmacist might not understand how to structure a SQL query, but they do understand how they want to process medication information to improve patient lives. And so that becomes a consulting-type experience from a set of rock stars to help a more decentralized business who needs to understand the data and to generate some valuable output. >>Justin, what do you say to a customer or prospect that says, look, Justin, I've got a centralized team and that's the most cost-effective way to serve the business; otherwise I've got duplication. What do you say to that? >>Well, I would argue it's probably not the most cost effective, and the reason is really twofold. I think, first of all, when you are deploying an enterprise data warehouse model, the data warehouse itself is very expensive, generally speaking. And so you're putting all of your most valuable data in the hands of one vendor who now has tremendous leverage over you, you know, for many, many years to come. I think that's the story at Oracle or Teradata or other proprietary database systems. But the other aspect, I think, is that the reality is those central data warehouse teams, as much as they are experts in the technology, they don't necessarily understand the data itself. And this is one of the core tenets of data mesh that Zhamak writes about, this idea that the domain owners actually know the data the best. >>And so by, you know, not only acknowledging that data is generally decentralized, and to your earlier point about Sarbanes-Oxley maybe saving the data warehouse, I would argue maybe GDPR and data sovereignty will destroy it, because data has to be decentralized for those laws to be complied with. But I think the reality is, you know, the data mesh model basically says data's decentralized, and we're gonna turn that into an asset rather than a liability. And we're gonna turn that into an asset by empowering the people that know the data the best to participate in the process of, you know, curating and creating data products for consumption. So I think when you think about it that way, you're going to get higher quality data and faster time to insight, which is ultimately going to drive more revenue for your business and reduce costs. So I think that's the way I see the two models comparing and contrasting. >>So do you think the demise of the data warehouse is inevitable? I mean, you know, Teresa, you work with a lot of clients. They're not just gonna rip and replace their existing infrastructure. Maybe they're gonna build on top of it, but what does that mean? Does that mean the EDW just becomes, you know, less and less valuable over time, or is it maybe just isolated to specific use cases? What's your take on that? >>Listen, I still would love all my data within a data warehouse. Would love it mastered, would love it owned by a central team, right? I think that's still what I would love to have. That's just not the reality, right? The investment to actually migrate and keep that up to date, I would say it's a losing battle. Like, we've been trying to do it for a long time. 
Nobody has the budgets, and then data changes, right? There's gonna be a new technology that's gonna emerge that we're gonna wanna tap into. There's going to be not enough investment to bring all the legacy, but still very useful, systems into that centralized view. So you keep the data warehouse. I think it's a very, very valuable, very high-performance tool for what it's there for, but you could have this, you know, new mesh layer that still takes advantage of the things I mentioned: the data products in the systems that are meaningful today, and the data products that actually might span a number of systems, maybe either the source systems for the domains that know it best, or the consumer-based systems and products that need to be packaged in a way that'll be really meaningful for that end user, right? Each of those are useful for a different part of the business, and making sure that the mesh actually allows you to use all of them. >>So, Richard, let me ask you. Take Zhamak's principles: you've got, you know, domain ownership and data as product. Okay, great. Sounds good. But it creates what I would argue are two challenges. Self-serve infrastructure, let's park that for a second. And then in your industry, one of the most regulated, most sensitive: computational governance. How do you automate and ensure federated governance in that mesh model that Teresa was just talking about? >>Well, it absolutely depends on some of the tooling and processes that you put in place around those tools to centralize the security and the governance of the data. And I think, although a data warehouse makes that very simple, cause it's a single tool, it's not impossible with some of the data mesh technologies that are available. And so what we've done at EMIS is we have a single security layer that sits on top of our data mesh, which means that no matter which user is accessing which data source, we go through a well-audited, well-understood security layer. That means that we know exactly who's got access to which data fields, which data tables. And then everything that they do is audited in a very kind of standard way, regardless of the underlying data storage technology. So for me, although storing the data in one place might not be possible, understanding where your source of truth is and securing that in a common way is still a valuable approach, and you can do it without having to bring all that data into a single bucket so that it's all in one place. And so having done that, and investing quite heavily in making that possible, has paid dividends in terms of giving wider access to the platform and ensuring that only data that's available under GDPR and other regulations is being used by the data users. >>Yeah. So Justin, I mean, we always talk about data democratization, and you know, up until recently there really hasn't been line of sight as to how to get there. But do you have anything to add to this? Because you're essentially taking, you know, analytic queries to data that's dispersed all over. How are you seeing your customers handle this challenge? >>Yeah. I mean, I think data products is a really interesting aspect of the answer to that. It allows you to, again, leverage the data domain owners, the people who know the data the best, to create, you know, data as a product ultimately to be consumed. 
And we try to represent that in our product as effectively an almost eCommerce-like experience, where you go and discover and look for the data products that have been created in your organization, and then you can start to consume them as you'd like. And so really trying to build on that notion of, you know, data democratization and self-service, and making it very easy to discover and start to use with whatever BI tool you may like, or even just running, you know, SQL queries yourself. >>Okay, guys, grab a sip of water. After this short break, we'll be back to debate whether proprietary or open platforms are the best path to the future of data excellence. Keep it right there. >>Your company has more data than ever, and more people trying to understand it, but there's a problem. Your data is stored across multiple systems. It's hard to access, and that delays analytics and ultimately decisions. The old method of moving all of your data into a single source of truth is slow and definitely not built for the volume of data we have today or where we are headed. While your data engineers spend over half their time moving data, your analysts and data scientists are left waiting, feeling frustrated, unproductive, and unable to move the needle for your business. But what if you could spend less time moving or copying data? What if your data consumers could analyze all your data quickly? >>Starburst helps your teams run fast queries on any data source. We help you create a single point of access to your data, no matter where it's stored. And we support high concurrency. We solve for speed and scale, whether it's fast SQL queries on your data lake or faster queries across multiple data sets. Starburst helps your teams run analytics anywhere. You can't afford to wait for data to be available. Your team has questions that need answers now. With Starburst, the wait is over. You'll have faster access to data with enterprise-level security, easy connectivity, and 24/7 support from experts. Organizations like Zalando, Comcast, and FINRA rely on Starburst to move their businesses forward. Contact our Trino experts to get started. >>We're back with Justin Borgman of Starburst and Richard Jarvis of EMIS Health. Okay, we're gonna get to lie number two, and that is this: an open source-based platform cannot give you the performance and control that you can get with a proprietary system. Is that a lie? Justin, the enterprise data warehouse has been pretty dominant and has evolved and matured. Its stack has matured over the years. Why is it not the default platform for data? >>Yeah, well, I think that's become a lie over time. So I think, you know, if we go back 10 or 12 years ago, with the advent of the first data lakes really around Hadoop, that probably was true, that you couldn't get the performance that you needed to run fast, interactive SQL queries in a data lake. Now, a lot's changed in 10 or 12 years. I remember in the very early days, people would say you'll never get performance because you need to be columnar, you need to store data in a column format. And then, you know, column formats were introduced to data lakes; you have Parquet, ORC, and Avro file formats that were created to ultimately deliver performance out of that. So, okay, we got, you know, largely over the performance hurdle. More recently, people will say, well, you don't have the ability to do updates and deletes like a traditional data warehouse. 
>>And now we've got the creation of new data formats, again, like Iceberg and Delta and Hudi, that do allow for updates and deletes. So I think the data lake has continued to mature. And I remember a quote from, you know, Curt Monash many years ago where he said, you know, it takes six or seven years to build a functional database. I think that's right. And now we've had almost a decade go by. So, you know, these technologies have matured to really deliver very, very close to the same level of performance and functionality of cloud data warehouses. So I think the reality is that's become a lie, and now we have giant hyperscale internet companies that, you know, don't have the traditional data warehouse at all. They do all of their analytics in a data lake. So I think we've proven that it's very much possible today. >>Thank you for that. And so Richard, talk about your perspective as a practitioner in terms of what open brings you versus closed. I mean, look, open is a moving target. I remember Unix used to be "open systems," so it is an evolving, you know, spectrum. But from your perspective, what does open give you that you can't get from a proprietary system, or what are you fearful of in a proprietary system? >>I suppose for me, open buys us the ability to be unsure about the future, because one thing that's always true about technology is it evolves in a direction slightly different to what people expect. And what you don't want is to end up having backed yourself into a corner that then prevents you from innovating. So if you have chosen a technology and you've stored trillions of records in that technology, and suddenly a new way of processing or machine learning comes out, you wanna be able to take advantage, and your competitive edge might depend upon it. And so I suppose for us, we acknowledge that we don't have perfect vision of what the future might be. And so by backing open storage technologies, we can apply a number of different technologies to the processing of that data. And that gives us the ability to remain relevant and innovate on our data storage. And we have bought our way out of any performance concerns, because we can use cloud-scale infrastructure to scale up and scale down as we need. And so we don't have the concern that we don't have enough hardware today to process what we want to achieve. We can just scale up when we need it and scale back down. So open source has really allowed us to maintain being at the cutting edge. >>So Justin, let me play devil's advocate here a little bit. I've talked to Zhamak about this, and you know, obviously her vision is that the data mesh is open source, it's open source tooling, and it's not proprietary. You know, you're not gonna buy a data mesh; you're gonna build it with open source tooling, and vendors like you are gonna support it. But to come back to sort of today, you can get to market with a proprietary solution faster. I'm gonna make that statement, you tell me if it's a lie. And then you can say, okay, we support Apache Iceberg, we're gonna support open source tooling. Take a company like VMware, not really in the data business, but the way they embraced Kubernetes and, you know, every new open source thing that comes along, they say, we do that too. Why can't proprietary systems do that and be as effective? 
>>Yeah, well, I think at least within the data landscape, saying that you can access open data formats like Iceberg or others is a bit disingenuous, because really what you're selling to your customer is a certain degree of performance, a certain SLA, and you know, those cloud data warehouses that reach beyond their own proprietary storage drop all the performance that they were able to provide. So it reminds me, kind of, again, of going back 10 or 12 years ago, when everybody had a connector to Hadoop and they thought that was the solution, right? But the reality was, you know, a connector was not the same as running workloads in Hadoop back then. And I think similarly, you know, being able to connect to an external table that lives in an open data format, you're not going to give it the performance that your customers are accustomed to. And at the end of the day, they're always going to be predisposed, they're always going to be incentivized, to get that data ingested into the data warehouse, cuz that's where they have control. And you know, the bottom line is the database industry has really been built around vendor lock-in, I mean, from the start. How many people love Oracle today? But they're customers nonetheless. I think, you know, lock-in is part of this industry, and I think that's really what we're trying to change with open data formats. >>Well, that's interesting. It reminds me of when I see the gas prices as I drive up: oh, that's the cash price; with a credit card I gotta pay 20 cents more. But okay. So the argument then, let me come back to you, Justin. What's wrong with saying, hey, we support open data formats, but yeah, you're gonna get better performance if you keep it in our closed system? Are you saying that long term that's gonna come back and bite you? Cuz you're gonna end up... you mentioned Oracle, you mentioned Teradata. Yeah, by implication, you're saying that's where Snowflake customers are headed. >>Yeah, absolutely. I think this is a movie that, you know, we've all seen before, at least those of us who've been in the industry long enough to see this movie play over a couple times. So I do think that's the future. And I think, you know, I loved what Richard said. I actually wrote it down, cause I thought it was an amazing quote. He said it buys us the ability to be unsure of the future. That pretty much says it all. The future is unknowable, and the reality is, using open data formats, you remain interoperable with any technology you want to utilize. If you want to use Spark to train a machine learning model and you want to use Starburst to query via SQL, that's totally cool. They can both work off the same exact, you know, data sets. By contrast, if you're, you know, focused on a proprietary model, then you're kind of locked in again to that model. I think the same applies to data sharing, to data products, to a wide variety of aspects of the data landscape, where a proprietary approach kind of closes you in and locks you in. >>So I would say this, Richard, I'd love to get your thoughts on it. Cause I talk to a lot of Oracle customers, not as many Teradata customers, but a lot of Oracle customers, and they, you know, they'll admit, yeah, you know, they're jamming us on price and the license cost, but we do get value out of it. 
And so my question to you, Richard, is: do the, let's call it data warehouse systems, or the proprietary systems, are they gonna deliver a greater ROI sooner? And is that the allure that customers, you know, are attracted to, or can open platforms deliver ROI as fast? >>I think the answer to that is it can depend a bit. It depends on your business's skillset. So we are lucky that we have a number of proprietary teams that work in databases that provide our operational data capability, and we have teams of analytics and big data experts who can work with open data sets and open data formats. And so for those different teams, they can get to an ROI more quickly with different technologies. For the business, though, we can't do better for our operational data stores than proprietary databases. Today we can back off very tight SLAs to them; we can demonstrate reliability from millions of hours of those databases being run at enterprise scale. But for an analytics workload, where increasingly our business is growing in that direction, we can't do better than open data formats with cloud-based, data mesh-type technologies. And so it's not a simple answer that one will always be the right answer for our business. We definitely have times when proprietary databases provide a capability that we couldn't easily represent or replicate with open technologies. >>Yeah. Richard, stay with you. You mentioned, you know, some things before that strike me. You know, the Databricks-Snowflake thing is a lot of fun for analysts like me. You've got Databricks coming at it, Richard, you mentioned you have a lot of rockstar data engineers, Databricks coming at it from a data engineering heritage, and you get Snowflake coming at it from an analytics heritage. Those two worlds are colliding. People like Sanjeev Mohan have said, you know what, I think it's actually harder to play in the data engineering world, i.e. it's easier for the data engineering world to go into the analytics world than the reverse. But thinking about up-and-coming engineers and developers preparing for this future of data engineering and data analytics, how should they be thinking about the future? What's your advice to those young people? >>So I think I'd probably fall back on general programming skill sets. So the advice that I saw years ago was, if you have open source technologies, the Pythons and Javas, on your CV, you command a 20% pay hike over people who can only do proprietary programming languages. And I think that's true of data technologies as well. And from a business point of view, that makes sense. I'd rather spend the money that I save on proprietary licenses on better engineers, because they can provide more value to the business and can innovate us beyond our competitors. So my advice to people who are starting here, or trying to build teams to capitalize on data assets, is begin with open, license-free capabilities, because they're very cheap to experiment with, they generate a lot of interest from people who want to join you as a business, and you can make them very successful early doors with your analytics journey. >>It's interesting. Again, analysts like myself, we do a lot of TCO work and have over the last 20-plus years. And in the world of Oracle, you know, normally it's the staff that's the biggest nut in total cost of ownership. Not in Oracle. There, the license cost is by far the biggest component of the pie. 
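The thread running through this segment is that open table formats have closed the gap with proprietary warehouses, including row-level updates and deletes. As a concrete, hedged illustration of that claim, here is a minimal Python sketch using the open-source trino client; the coordinator host, the "iceberg" catalog, and the claims.prescriptions table are all hypothetical names, and it assumes a Trino or Starburst cluster whose Iceberg connector permits DML.

```python
# A hedged sketch: warehouse-style row-level DML against an open-format
# (Iceberg) table, issued through the open-source trino Python client.
# Every host, catalog, schema, and table name here is an illustrative
# assumption, not a description of a real deployment.
from trino.dbapi import connect

conn = connect(
    host="trino.example.internal",  # assumption: a reachable Trino/Starburst coordinator
    port=8080,
    user="analyst",
    catalog="iceberg",              # assumption: an Iceberg catalog configured on the cluster
    schema="claims",
)
cur = conn.cursor()

# Row-level update: the kind of operation early data lakes could not do.
cur.execute(
    "UPDATE prescriptions SET status = 'REVIEWED' "
    "WHERE review_date < DATE '2022-01-01'"
)
cur.fetchall()  # depending on client version, results may need to be consumed to complete

# Row-level delete, e.g. honoring an erasure request directly in the lake.
cur.execute("DELETE FROM prescriptions WHERE patient_id = 'P-1042'")
cur.fetchall()

# Ordinary read query to confirm the change.
cur.execute("SELECT count(*) FROM prescriptions WHERE status = 'REVIEWED'")
print(cur.fetchone()[0])

cur.close()
conn.close()
```

The point is not the client plumbing but that the statements themselves are ordinary warehouse-style SQL issued against open-format files in object storage.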
All right, Justin, help us close out this segment. We've been talking about this sort of data mesh, open versus closed, Snowflake, Databricks. Where does Starburst, sort of as this engine for the data lake, the data lakehouse, the data warehouse, fit in this world? >>Yeah. So our view on how the future ultimately unfolds is we think that data lakes will be a natural center of gravity for a lot of the reasons that we described: open data formats, lowest total cost of ownership, because you get to choose the cheapest storage available to you. Maybe that's S3 or Azure Data Lake Storage, or Google Cloud Storage, or maybe it's on-prem object storage that you bought at a really good price. So ultimately, storing a lot of data in a data lake makes a lot of sense. But I think what makes our perspective unique is we still don't think you're gonna get everything there either. We think that basically centralization of all your data assets is just an impossible endeavor, and so you wanna be able to access data that lives outside of the lake as well. So we kind of think of the lake as maybe the biggest place by volume in terms of how much data you have, but to have comprehensive analytics and to truly understand your business and understand it holistically, you need to be able to go access other data sources as well. And so that's the role that we wanna play: to be a single point of access for our customers, provide the right level of fine-grained access controls so that the right people have access to the right data, and ultimately make it easy to discover and consume via, you know, the creation of data products as well. >>Great. Okay. Thanks, guys. Right after this quick break, we're gonna be back to debate whether the cloud data model that we see emerging and the so-called modern data stack is really modern, or is it the same wine in a new bottle? When it comes to data architectures, you're watching theCUBE, the leader in enterprise and emerging tech coverage. >>Your data is capable of producing incredible results, but data consumers are often left in the dark without fast access to the data they need. Starburst makes your data visible from wherever it lives. Your company is acquiring more data in more places, more rapidly than ever. To rely solely on a data centralization strategy, whether it's in a lake or a warehouse, is unrealistic. A single source of truth approach is no longer viable, but disconnected data silos are often left untapped. We need a new approach, one that embraces distributed data, one that enables fast and secure access to any of your data from anywhere. With Starburst, you'll have the fastest query engine for the data lake that allows you to connect and analyze your disparate data sources no matter where they live. Starburst provides the foundational technology required for you to build towards the vision of a decentralized data mesh. Starburst Enterprise and Starburst Galaxy offer enterprise-ready connectivity, interoperability, and security features for multiple regions, multiple clouds, and ever-changing global regulatory requirements. The data is yours. And with Starburst, you can perform analytics anywhere in your world. >>Okay. We're back with Justin Borgman, CEO of Starburst; Richard Jarvis, the CTO of EMIS Health; and Teresa Tung, the cloud first technologist from Accenture. We're on to lie number three, and that is the claim that today's modern data stack is actually modern. So I guess the lie is that it's not modern. Justin, what do you say? 
>>Yeah. I mean, I think new isn't modern, right? I think it's the new data stack, it's the cloud data stack, but that doesn't necessarily mean it's modern. I think a lot of the components actually are exactly the same as what we've had for 40 years. Rather than Teradata, you have Snowflake; rather than Informatica, you have Fivetran. So it's the same general stack, just, you know, a cloud version of it. And I think a lot of the challenges that plagued us for 40 years still remain. >>So lemme come back to you, Justin. Okay, but there are differences, right? I mean, you can scale, you can throw resources at the problem, you can separate compute from storage. You really, you know, there's a lot of money being thrown at that by venture capitalists and Snowflake; you mentioned its competitors. So that's different. Is that not at least an aspect of modern: dial it up, dial it down? So what do you say to that? >>Well, it is, it's certainly taking, you know, what the cloud offers and taking advantage of that. But it's important to note that the cloud data warehouses out there are really just separating their compute from their storage. So it's allowing them to scale up and down, but your data is still stored in a proprietary format. You're still locked in. You still have to ingest the data to get it even prepared for analysis. So a lot of the same sort of structural constraints that exist with the old enterprise data warehouse model on-prem still exist, just, yes, a little bit more elastic now because the cloud offers that. >>So Teresa, let me go to you, cuz you have cloud first in your title. So what say you to this conversation? >>Well, even the cloud providers are looking towards more of a cloud continuum, right? So the centralized cloud as we know it, maybe data lake, data warehouse in the central place, that's not even how the cloud providers are looking at it. They have new query services; every provider has one that really expands those queries to be beyond a single location. And if we look at a lot of where the future goes, right, that's gonna very much follow the same thing. There's gonna be more edge, there's gonna be more on-premise, because of data sovereignty, data gravity, because you're working with different parts of the business that have already made major cloud investments in different cloud providers, right? So there's a lot of reasons why the modern, I guess, the next modern generation of the data stack needs to be much more federated. >>Okay. So Richard, how do you deal with this? You've obviously got, you know, the technical debt, the existing infrastructure, it's on the books, you don't wanna just throw it out. A lot of conversation about modernizing applications, which a lot of times is, you know, a microservices layer on top of legacy apps. How do you think about the modern data stack? >>Well, I think probably the first thing to say is that the stack really has to include the processes and people around the data as well. It's all well and good changing the technology, but if you don't modernize how people use that technology, then you're not going to be able to scale, because just cuz you can scale CPU and storage doesn't mean you can get more people to use your data to generate more value for the business. And so what we've been looking at is really changing in a way that's very much aligned to data products and data mesh. 
How do you enable more people to consume the service and have the stack respond in a way that keeps costs low? Because that's important for our customers consuming this data, but it also allows people to occasionally run enormous queries and then tick along with smaller ones when required. And it's a good job we did, because during COVID, all of a sudden we had enormous pressures on our data platform to answer really important, life-threatening queries. And if we couldn't scale both our data stack and our teams, we wouldn't have been able to answer those as quickly as we did. So I think the stack needs to support a scalable business, not just the technology itself. >>Well, thank you for that. So Justin, let's try to break down what the critical aspects are of the modern data stack. So you think about the past, you know, five, seven years: cloud obviously has given a different pricing model, de-risked experimentation, and, you know, we talked about the ability to scale up and scale down. But I'm taking away that that's not enough, based on what Richard just said. The modern data stack has to serve the business and enable the business to build data products. I buy that. I'm a big fan of the data mesh concepts, even though we're early days. So what are the critical aspects if you had to think about, you know, maybe putting some guardrails and definitions around the modern data stack? What does that look like? What are some of the attributes and principles there? >>Of how it should look, or how... >>Yeah. What it should be. >>Yeah. Well, I think, you know, Teresa mentioned this in a previous segment: the data warehouse is not necessarily going to disappear. It just becomes one node, one element of the overall data mesh. And I certainly agree with that. So by no means are we suggesting that, you know, Snowflake or Redshift or whatever cloud data warehouse you may be using is going to disappear, but it's not going to become the end-all be-all. It's not the central single source of truth. And I think that's the paradigm shift that needs to occur. And I think it's also worth noting that those who were the early adopters of the modern data stack were primarily digital-native, born-in-the-cloud young companies who had the benefit of idealism. They had the benefit of starting with a clean slate. That does not reflect the vast majority of enterprises. >>And even those companies, as they grow up and mature out of that ideal state, they go buy a business. Now they've got something on another cloud provider that has a different data stack, and they have to deal with that heterogeneity. That is just change, and change is a part of life. And so I think there is an element here that is almost philosophical: do you believe in an absolute ideal where I can just fit everything into one place, or do I believe in reality? And I think the far more pragmatic approach is really what data mesh represents. So to answer your question directly, I think it's adding, you know, the ability to access data that lives outside of the data warehouse, maybe living in open data formats in a data lake, or accessing operational systems as well. Maybe you want to directly access data that lives in an Oracle database or a Mongo database or what have you. So creating that flexibility to really future-proof yourself from the inevitable change that you will encounter over time. >>So thank you. 
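Justin's argument about reaching data that lives outside the warehouse is easiest to see as a federated query. The sketch below, again hedged, joins a table in a hypothetical data-lake catalog to a table in a hypothetical operational-database catalog through one Trino/Starburst endpoint; every catalog, schema, table, and column name is an assumption for illustration only.

```python
# A hedged sketch of a federated join: lake data joined to an operational
# database in one SQL statement, through a single Trino/Starburst endpoint.
# Catalog, schema, table, and column names are illustrative assumptions.
from trino.dbapi import connect

conn = connect(
    host="trino.example.internal",  # assumption: a reachable coordinator
    port=8080,
    user="analyst",
)
cur = conn.cursor()

cur.execute("""
    SELECT o.region,
           count(*)      AS orders,
           sum(o.amount) AS revenue
    FROM lake.sales.orders AS o          -- e.g. Parquet/Iceberg files in object storage
    JOIN postgres.crm.customers AS c     -- e.g. an operational PostgreSQL system
      ON o.customer_id = c.customer_id
    WHERE c.segment = 'enterprise'
    GROUP BY o.region
    ORDER BY revenue DESC
""")

for region, orders, revenue in cur.fetchall():
    print(region, orders, revenue)

cur.close()
conn.close()
```

Design-wise, this is the "single point of access" idea: the engine pushes work down to each source where it can, and the analyst never copies data between systems to ask the question.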
So there, based on what Justin just said, my takeaway is that it's inclusive: whether it's a data mart, a data hub, a data lake, or a data warehouse, it's just a node on the mesh. Okay, I get that. Does that include on-prem data? Obviously it has to. What are you seeing in terms of the ability to take that data mesh concept on-prem? I mean, most implementations I've seen of data mesh frankly really aren't, you know, adhering to the philosophy. Maybe it's a data lake and maybe it's using Glue. You look at what JPMC is doing, HelloFresh, a lot of stuff happening on the AWS cloud in that, you know, closed stack, if you will. What's the answer to that, Teresa? >>I mean, I think it's a killer use case for data mesh: the fact that you have valuable data sources on-prem, and yet you still wanna modernize and take the best of cloud. Cloud is still, like we mentioned, there's a lot of great reasons for it, around the economics and the ability to tap into the innovation that the cloud providers are giving around data and AI architecture. It's an easy button. So the mesh allows you to have the best of both worlds. You can start using the data products on-prem, or in the existing systems that are working already; it's meaningful for the business. At the same time, you can modernize the ones that make business sense, because they need better performance, they need, you know, something that is cheaper, or maybe just to tap into better analytics to get better insights, right? So you're gonna be able to stretch and really have the best of both worlds. That, again, going back to Richard's point, is meaningful for the business. Not everything has to have that one-size-fits-all set of tools. >>Okay. Thank you. So Richard, you know, talking about data as product, I wonder if you could give us your perspective here. What are the advantages of treating data as a product? What role do data products have in the modern data stack? We talk about monetizing data; what are your thoughts on data products? >>So for us, one of the most important data products that we've been creating is taking data that is healthcare data across a wide variety of different settings, so information about patients' demographics, about their treatment, about their medications and so on, and taking that into a standards format that can be utilized by a wide variety of different researchers. Because misinterpreting that data, or having the data not presented in the way that the user is expecting, means that you generate the wrong insight. And in any business, that's clearly not a desirable outcome, but when that insight is so critical, as it might be in healthcare or some security settings, you really have to have gone to the trouble of understanding the data, presenting it in a format that everyone can clearly agree on, and then letting people consume it in a very structured, managed way, even if that data comes from a variety of different sources in the first place. And so our data product journey has really begun by standardizing data across a number of different silos through the data mesh, so we can present it out, both internally and, through the right governance, externally to researchers. >>So that data product, through whatever APIs, is accessible, it's discoverable, but it's obviously gotta be governed as well. You mentioned you appropriately provide it internally, yeah, but also, you know, to external folks as well. 
So you've architected that capability today? >>We have, and because the data is standard, it can generate value much more quickly, and we can be sure of the security and value that it's providing, because the data product isn't just about formatting the data into the correct tables. It's understanding what it means to redact the data, or to remove certain rows from it, or to interpret what a date actually means. Is it the start of the contract, or the start of the treatment, or the date of birth of a patient? These things can be lost in the data storage without having the proper product management around the data to say, in a very clear business context, what does this data mean, and what does it mean to process this data for a particular use case? >>Yeah, it makes sense. It's got the context. If the domains own the data, you gotta cut through a lot of the centralized teams, the technical teams, that are data agnostic; they don't really have that context. All right, lastly Justin, how does Starburst fit into this modern data stack? Bring us home. >>Yeah. So I think for us, it's really providing our customers with, you know, the flexibility to operate and analyze data that lives in a wide variety of different systems, ultimately giving them that optionality. You know, optionality provides the ability to reduce costs, store more in a data lake rather than a data warehouse. It provides the ability for the fastest time to insight, to access the data directly where it lives. And ultimately, with this concept of data products that we've now, you know, incorporated into our offering as well, you can really create and curate, you know, data as a product to be shared and consumed. So we're trying to help enable the data mesh, you know, model, and make that an appropriate complement to, you know, the modern data stack that people have today. >>Excellent. Hey, I wanna thank Justin, Teresa, and Richard for joining us today. You guys are great. I'm a big believer in the data mesh concept, and I think, you know, we're seeing the future of data architecture. So thank you. Now, remember, all these conversations are gonna be available on theCUBE.net for on-demand viewing. You can also go to starburst.io. They have some great content on the website, they host some really thought-provoking interviews, and they have awesome resources, lots of data mesh conversations over there, and really good stuff in the resource section. So check that out. Thanks for watching "The Data Doesn't Lie... Or Does It?", made possible by Starburst Data. This is Dave Valante for theCUBE, and we'll see you next time. >>The explosion of data sources has forced organizations to modernize their systems and architecture and come to terms with the fact that one size does not fit all for data management today. Your teams are constantly moving and copying data, which takes time and, in some cases, means double paying for compute resources. Instead, what if you could access all your data anywhere, using the BI tools and SQL skills your users already have? And what if this also included enterprise security and fast performance? With Starburst Enterprise, you can provide your data consumers with a single point of secure access to all of your data, no matter where it lives. With features like strict, fine-grained access control, end-to-end data encryption, and data masking, Starburst meets the security standards of the largest companies. 
Starburst Enterprise can easily be deployed anywhere and managed with insights, where data teams holistically view their clusters' operation and query execution, so they can reach meaningful business decisions faster. All this with the support of the largest team of Trino experts in the world, delivering fully tested, stable releases, and available to support you 24/7 to unlock the value in all of your data. You need a solution that easily fits with what you have today and can adapt to your architecture tomorrow. Starburst Enterprise gives you the fastest path from big data to better decisions, cuz your team can't afford to wait. Trino was created to empower analytics anywhere, and Starburst Enterprise was created to give you the enterprise-grade performance, connectivity, security, management, and support your company needs. Organizations like Zalando, Comcast, and FINRA rely on Starburst to move their businesses forward. Contact us to get started.
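To ground the "single point of secure access" claim in the promo above, here is one last hedged sketch: a single connection enumerating whatever catalogs and tables the platform exposes. The host, port, user, and the "lake" catalog are illustrative assumptions, not a real deployment.

```python
# A hedged sketch of discovery through one endpoint: list the catalogs the
# engine exposes, then peek at tables in one of them. Names are assumptions.
from trino.dbapi import connect

conn = connect(host="starburst.example.internal", port=8080, user="analyst")
cur = conn.cursor()

cur.execute("SHOW CATALOGS")
catalogs = [row[0] for row in cur.fetchall()]
print("catalogs reachable through one endpoint:", catalogs)

# Browse a (hypothetical) lake catalog via the standard information_schema.
cur.execute("SELECT table_schema, table_name FROM lake.information_schema.tables LIMIT 10")
for schema, table in cur.fetchall():
    print(f"{schema}.{table}")

cur.close()
conn.close()
```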
Starburst The Data Lies FULL V1
>>In 2011, early Facebook employee and Cloudera co-founder Jeff Ocker famously said the best minds of my generation are thinking about how to get people to click on ads. And that sucks. Let's face it more than a decade later organizations continue to be frustrated with how difficult it is to get value from data and build a truly agile data-driven enterprise. What does that even mean? You ask? Well, it means that everyone in the organization has the data they need when they need it. In a context that's relevant to advance the mission of an organization. Now that could mean cutting cost could mean increasing profits, driving productivity, saving lives, accelerating drug discovery, making better diagnoses, solving, supply chain problems, predicting weather disasters, simplifying processes, and thousands of other examples where data can completely transform people's lives beyond manipulating internet users to behave a certain way. We've heard the prognostications about the possibilities of data before and in fairness we've made progress, but the hard truth is the original promises of master data management, enterprise data, warehouses, data marts, data hubs, and yes, even data lakes were broken and left us wanting from more welcome to the data doesn't lie, or doesn't a series of conversations produced by the cube and made possible by Starburst data. >>I'm your host, Dave Lanta and joining me today are three industry experts. Justin Borgman is this co-founder and CEO of Starburst. Richard Jarvis is the CTO at EMI health and Theresa tongue is cloud first technologist at Accenture. Today we're gonna have a candid discussion that will expose the unfulfilled and yes, broken promises of a data past we'll expose data lies, big lies, little lies, white lies, and hidden truths. And we'll challenge, age old data conventions and bust some data myths. We're debating questions like is the demise of a single source of truth. Inevitable will the data warehouse ever have featured parody with the data lake or vice versa is the so-called modern data stack, simply centralization in the cloud, AKA the old guards model in new cloud close. How can organizations rethink their data architectures and regimes to realize the true promises of data can and will and open ecosystem deliver on these promises in our lifetimes, we're spanning much of the Western world today. Richard is in the UK. Teresa is on the west coast and Justin is in Massachusetts with me. I'm in the cube studios about 30 miles outside of Boston folks. Welcome to the program. Thanks for coming on. Thanks for having us. Let's get right into it. You're very welcome. Now here's the first lie. The most effective data architecture is one that is centralized with a team of data specialists serving various lines of business. What do you think Justin? >>Yeah, definitely a lie. My first startup was a company called hit adapt, which was an early SQL engine for hit that was acquired by Teradata. And when I got to Teradata, of course, Teradata is the pioneer of that central enterprise data warehouse model. One of the things that I found fascinating was that not one of their customers had actually lived up to that vision of centralizing all of their data into one place. They all had data silos. They all had data in different systems. They had data on prem data in the cloud. You know, those companies were acquiring other companies and inheriting their data architecture. 
So, you know, despite being the industry leader for 40 years, not one of their customers truly had everything in one place. So I think definitely history has proven that to be a lie. >>So Richard, from a practitioner's point of view, you know, what, what are your thoughts? I mean, there, there's a lot of pressure to cut cost, keep things centralized, you know, serve the business as best as possible from that standpoint. What, what is your experience show? >>Yeah, I mean, I think I would echo Justin's experience really that we, as a business have grown up through acquisition, through storing data in different places sometimes to do information governance in different ways to store data in, in a platform that's close to data experts, people who really understand healthcare data from pharmacies or from, from doctors. And so, although if you were starting from a Greenfield site and you were building something brand new, you might be able to centralize all the data and all of the tooling and teams in one place. The reality is that that businesses just don't grow up like that. And, and it's just really impossible to get that academic perfection of, of storing everything in one place. >>Y you know, Theresa, I feel like Sarbanes Oxley kinda saved the data warehouse, you know, right. You actually did have to have a single version of the truth for certain financial data, but really for those, some of those other use cases, I, I mentioned, I, I do feel like the industry has kinda let us down. What's your take on this? Where does it make sense to have that sort of centralized approach versus where does it make sense to maybe decentralized? >>I, I think you gotta have centralized governance, right? So from the central team, for things like star Oxley, for things like security for certainly very core data sets, having a centralized set of roles, responsibilities to really QA, right. To serve as a design authority for your entire data estate, just like you might with security, but how it's implemented has to be distributed. Otherwise you're not gonna be able to scale. Right? So being able to have different parts of the business really make the right data investments for their needs. And then ultimately you're gonna collaborate with your partners. So partners that are not within the company, right. External partners, we're gonna see a lot more data sharing and model creation. And so you're definitely going to be decentralized. >>So, you know, Justin, you guys last, geez, I think it was about a year ago, had a session on, on data mesh. It was a great program. You invited Jamma, Dani, of course, she's the creator of the data mesh. And her one of our fundamental premises is that you've got this hyper specialized team that you've gotta go through. And if you want anything, but at the same time, these, these individuals actually become a bottleneck, even though they're some of the most talented people in the organization. So I guess question for you, Richard, how do you deal with that? Do you, do you organize so that there are a few sort of rock stars that, that, you know, build cubes and, and the like, and, and, and, or have you had any success in sort of decentralizing with, you know, your, your constituencies, that data model? >>Yeah. So, so we absolutely have got rockstar, data scientists and data guardians. If you like people who understand what it means to use this data, particularly as the data that we use at emos is very private it's healthcare information. 
And some of the, the rules and regulations around using the data are very complex and, and strict. So we have to have people who understand the usage of the data, then people who understand how to build models, how to process the data effectively. And you can think of them like consultants to the wider business, because a pharmacist might not understand how to structure a SQL query, but they do understand how they want to process medication information to improve patient lives. And so that becomes a, a consulting type experience from a, a set of rock stars to help a, a more decentralized business who needs to, to understand the data and to generate some valuable output. >>Justin, what do you say to a, to a customer or prospect that says, look, Justin, I'm gonna, I got a centralized team and that's the most cost effective way to serve the business. Otherwise I got, I got duplication. What do you say to that? >>Well, I, I would argue it's probably not the most cost effective and, and the reason being really twofold. I think, first of all, when you are deploying a enterprise data warehouse model, the, the data warehouse itself is very expensive, generally speaking. And so you're putting all of your most valuable data in the hands of one vendor who now has tremendous leverage over you, you know, for many, many years to come. I think that's the story at Oracle or Terra data or other proprietary database systems. But the other aspect I think is that the reality is those central data warehouse teams is as much as they are experts in the technology. They don't necessarily understand the data itself. And this is one of the core tenants of data mash that that jam writes about is this idea of the domain owners actually know the data the best. >>And so by, you know, not only acknowledging that data is generally decentralized and to your earlier point about SAR, brain Oxley, maybe saving the data warehouse, I would argue maybe GDPR and data sovereignty will destroy it because data has to be decentralized for, for those laws to be compliant. But I think the reality is, you know, the data mesh model basically says, data's decentralized, and we're gonna turn that into an asset rather than a liability. And we're gonna turn that into an asset by empowering the people that know the data, the best to participate in the process of, you know, curating and creating data products for, for consumption. So I think when you think about it, that way, you're going to get higher quality data and faster time to insight, which is ultimately going to drive more revenue for your business and reduce costs. So I think that that's the way I see the two, the two models comparing and contrasting. >>So do you think the demise of the data warehouse is inevitable? I mean, I mean, you know, there Theresa you work with a lot of clients, they're not just gonna rip and replace their existing infrastructure. Maybe they're gonna build on top of it, but what does that mean? Does that mean the E D w just becomes, you know, less and less valuable over time, or it's maybe just isolated to specific use cases. What's your take on that? >>Listen, I still would love all my data within a data warehouse would love it. Mastered would love it owned by essential team. Right? I think that's still what I would love to have. That's just not the reality, right? The investment to actually migrate and keep that up to date. I would say it's a losing battle. Like we've been trying to do it for a long time. 
Nobody has the budgets, and then data changes, right? There's going to be a new technology that emerges that we're going to want to tap into, and there's not going to be enough investment to bring all the legacy — but still very useful — systems into that centralized view. So you keep the data warehouse; it's a very valuable, very high-performance tool for what it's there for. But you can have this new mesh layer that still takes advantage of the things I mentioned: the data products in the systems that are meaningful today, and the data products that might actually span a number of systems — maybe the source systems for the domains that know them best, or the consumer-facing systems and products that need to be packaged in a way that's really meaningful for that end user. Each of those is useful for a different part of the business, and the mesh allows you to use all of them. >> So, Richard, let me ask you. Take Zhamak's principles — domain ownership and data as product. Okay, great, sounds good. But it creates what I would argue are two challenges. Self-serve infrastructure — let's park that for a second. And then, in your industry, one of the most highly regulated and most sensitive: computational governance. How do you automate and ensure federated governance in that mesh model Teresa was just talking about? >> Well, it absolutely depends on some of the tooling and processes that you put in place around those tools to centralize the security and the governance of the data. Although a data warehouse makes that very simple, because it's a single tool, it's not impossible with some of the data mesh technologies that are available. What we've done at EMIS is build a single security layer that sits on top of our data mesh, which means that no matter which user is accessing which data source, we go through a well-audited, well-understood security layer. That means we know exactly who has access to which data fields and which data tables, and everything they do is audited in a very standard way, regardless of the underlying data storage technology. So for me, although storing the data in one place might not be possible, understanding where your source of truth is and securing it in a common way is still a valuable approach — and you can do it without having to bring all that data into a single bucket so that it's all in one place. Having done that, and having invested quite heavily in making it possible, has paid dividends in terms of giving wider access to the platform and ensuring that only data that's permitted under GDPR and other regulations is being used by the data users.
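To make Richard's description a bit more concrete, here is a rough sketch of that kind of single, audited security layer in front of a distributed data estate. Everything in it is hypothetical — the policy store, the field-level rules, and the audit sink are invented stand-ins, not EMIS's actual implementation — but it shows the shape of the idea: one gate that every consumer passes through, with field-level checks and a uniform audit trail, regardless of which underlying store holds the data.

```python
# Hypothetical sketch of a centralized policy/audit gate in front of a
# federated query engine. All names and rules are illustrative stand-ins.
import datetime
import json

# user -> {table: set of columns that user may read}
POLICIES = {
    "pharmacist_01": {"prescriptions": {"medication", "dose", "dispensed_at"}},
    "researcher_02": {"prescriptions": {"medication", "dispensed_at"}},  # no dose
}

def authorize(user: str, table: str, columns: list[str]) -> bool:
    """Field-level check against the central policy store."""
    allowed = POLICIES.get(user, {}).get(table, set())
    return set(columns) <= allowed

def audit(user: str, table: str, columns: list[str], allowed: bool) -> None:
    """Append a uniform audit record, independent of the backing data store."""
    record = {
        "ts": datetime.datetime.utcnow().isoformat(),
        "user": user,
        "table": table,
        "columns": columns,
        "allowed": allowed,
    }
    with open("access_audit.log", "a") as f:
        f.write(json.dumps(record) + "\n")

def guarded_query(user: str, table: str, columns: list[str], run_query):
    """The single gate every consumer goes through, whatever engine sits behind it."""
    ok = authorize(user, table, columns)
    audit(user, table, columns, ok)
    if not ok:
        raise PermissionError(f"{user} may not read {columns} from {table}")
    return run_query(f"SELECT {', '.join(columns)} FROM {table}")
```

In practice this role is usually played by the access-control and audit features of the query engine or a dedicated policy tool rather than hand-rolled code, but the point is the same: one enforcement and audit path over many storage technologies.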
>> So, Justin — we always talk about data democratization, and until recently there really hasn't been a line of sight as to how to get there. Do you have anything to add to this? Because you're essentially doing analytic queries with data that's dispersed all over the place — how are you seeing your customers handle this challenge? >> Yeah, I think data products are a really interesting aspect of the answer to that. It allows you to, again, leverage the data domain owners — the people who know the data best — to create data as a product, ultimately to be consumed. And we try to represent that in our product as effectively an almost e-commerce-like experience, where you go and discover and look for the data products that have been created in your organization, and then you can start to consume them as you'd like. So we're really trying to build on that notion of data democratization and self-service, making it very easy to discover and start to use with whatever BI tool you like — or even just running SQL queries yourself. >> Okay, guys, grab a sip of water. After this short break, we'll be back to debate whether proprietary or open platforms are the best path to the future of data excellence. Keep it right there. >> Your company has more data than ever, and more people trying to understand it — but there's a problem. Your data is stored across multiple systems. It's hard to access, and that delays analytics and, ultimately, decisions. The old method of moving all of your data into a single source of truth is slow and definitely not built for the volume of data we have today, or where we are headed. While your data engineers spend over half their time moving data, your analysts and data scientists are left waiting — frustrated, unproductive, and unable to move the needle for your business. But what if you could spend less time moving or copying data? What if your data consumers could analyze all your data quickly? >> Starburst helps your teams run fast queries on any data source. We help you create a single point of access to your data, no matter where it's stored, and we support high concurrency. We solve for speed and scale, whether it's fast SQL queries on your data lake or faster queries across multiple data sets. Starburst helps your teams run analytics anywhere. You can't afford to wait for data to be available; your team has questions that need answers now. With Starburst, the wait is over. You'll have faster access to data with enterprise-level security, easy connectivity, and 24/7 support from experts. Organizations like Zalando, Comcast, and FINRA rely on Starburst to move their businesses forward. Contact our Trino experts to get started. >> We're back with Justin Borgman of Starburst and Richard Jarvis of EMIS Health. Okay, we're going to get to lie number two, and that is this: an open-source-based platform cannot give you the performance and control that you can get with a proprietary system. Is that a lie? Justin, the enterprise data warehouse has been pretty dominant and has evolved; its stack has matured over the years. Why is it not the default platform for data? >> Yeah, well, I think that's become a lie over time. If we go back 10 or 12 years, to the advent of the first data lakes built around Hadoop, it probably was true that you couldn't get the performance you needed to run fast, interactive SQL queries in a data lake. Now, a lot has changed in 10 or 12 years. I remember in the very early days people would say you'd never get performance because you need to store data in a columnar format. And then columnar formats were introduced to data lakes — Parquet, ORC, and Avro files were created to ultimately deliver that performance. So, okay, we got largely over the performance hurdle. More recently, people will say you don't have the ability to do updates and deletes like a traditional data warehouse. But now we've got new table formats — Iceberg and Delta and Hudi — that do allow for updates and deletes. So I think the data lake has continued to mature. I remember a quote from Curt Monash many years ago, where he said it takes six or seven years to build a functional database, and I think that's right — and now we've had almost a decade go by. So these technologies have matured to deliver very close to the same level of performance and functionality as cloud data warehouses. And now we have giant hyperscale internet companies that don't have a traditional data warehouse at all; they do all of their analytics in a data lake. So I think we've proven that it's very much possible today.
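Justin's point about modern table formats closing the updates-and-deletes gap is easy to illustrate. The sketch below is hypothetical — the host, catalog, schema, and table names are invented — and it assumes a Trino-style engine with an Iceberg connector recent enough to support row-level UPDATE and DELETE, plus the Trino Python client; check your own versions before leaning on it. But the workflow itself is exactly what was once said to be impossible on a data lake.

```python
# Hypothetical: row-level changes on an Iceberg table sitting in a data lake,
# issued through the Trino Python client. All names are stand-ins.
import trino  # assumes the trino-python-client package is installed

conn = trino.dbapi.connect(
    host="trino.example.internal", port=8080,
    user="analyst", catalog="iceberg", schema="claims",
)
cur = conn.cursor()

# Correct a mis-keyed unit -- an UPDATE straight against lake storage.
cur.execute("""
    UPDATE prescriptions
    SET dose_unit = 'mg'
    WHERE dose_unit = 'milligram'
""")

# Honor an erasure request -- a DELETE, with no warehouse reload needed.
cur.execute("""
    DELETE FROM prescriptions
    WHERE patient_id = 'P-12345'
""")
```

The table format (Iceberg here; Delta and Hudi make the same move) handles the snapshots and file rewrites underneath, which is what finally made this workable on object storage.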
>> Thank you for that. And so, Richard, talk about your perspective as a practitioner in terms of what open brings you versus closed — and I realize "open" is a moving target; I remember when Unix was "open systems," so it's an evolving spectrum. But from your perspective, what does open give you that you can't get from a proprietary system, or what are you fearful of in a proprietary system? >> I suppose, for me, open buys us the ability to be unsure about the future, because one thing that's always true about technology is that it evolves in a direction slightly different from what people expect. And what you don't want is to have backed yourself into a corner that then prevents you from innovating. So if you've chosen a technology and you've stored trillions of records in it, and suddenly a new way of processing or machine learning comes out, you want to be able to take advantage of it — your competitive edge might depend on it. And so, for us, we acknowledge that we don't have perfect vision of what the future might be, and by backing open storage technologies we can apply a number of different technologies to the processing of that data. That gives us the ability to remain relevant and to innovate on our data storage. And we have bought our way out of any performance concerns, because we can use cloud-scale infrastructure to scale up and scale down as we need. So we don't have the concern that we don't have enough hardware today to process what we want to achieve — we can just scale up when we need it and scale back down. So open source has really allowed us to stay at the cutting edge. >> So, Justin, let me play devil's advocate here a little bit. I've talked to Zhamak about this, and obviously her vision is that the data mesh is open source — open-source tooling. You're not going to buy a data mesh; you're going to build it with open-source tooling, and vendors like you are going to support it. But to come back to today: you can get to market with a proprietary solution faster. I'm going to make that statement — you tell me if it's a lie — and then you can say, okay, we support Apache Iceberg, we're going to support open-source tooling. Take a company like VMware — not really in the data business — but look at the way they embraced Kubernetes, and every new open-source thing that comes along, they say, "We do that too." Why can't proprietary systems do that and be as effective?
>> Yeah, well, I think, at least within the data landscape, saying that you can access open data formats like Iceberg or others is a bit disingenuous, because really what you're selling to your customer is a certain degree of performance, a certain SLA, and those cloud data warehouses that reach beyond their own proprietary storage drop all the performance they were able to provide. It reminds me of going back 10 or 12 years, when everybody had a connector to Hadoop and thought that was the solution — but the reality was that having a connector was not the same as running workloads in Hadoop back then. And I think, similarly, being able to connect to an external table that lives in an open data format — you're not going to give it the performance that your customers are accustomed to. And at the end of the day, they're always going to be predisposed, always incentivized, to get that data ingested into the data warehouse, because that's where they have control. And the bottom line is that the database industry has really been built around vendor lock-in, from the start. How many people love Oracle today, but are customers nonetheless? I think lock-in is part of this industry, and that's really what we're trying to change with open data formats. >> Well, that's interesting. It reminds me of when I drive up and see the gas prices: that's the cash price; with a credit card I've got to pay 20 cents more. But okay — so the argument then, and let me come back to you, Justin: what's wrong with saying, "Hey, we support open data formats, but you're going to get better performance if you keep it in our closed system"? Are you saying that, long term, that's going to come back and bite you? You mentioned Oracle, you mentioned Teradata — by implication, you're saying that's where Snowflake customers are headed. >> Yeah, absolutely. I think this is a movie we've all seen before — at least those of us who've been in the industry long enough to see it play a couple of times. So I do think that's the future. And I loved what Richard said — I actually wrote it down, because I thought it was an amazing quote: it buys us the ability to be unsure of the future. That pretty much says it all. The future is unknowable, and the reality is that by using open data formats you remain interoperable with any technology you want to utilize. If you want to use Spark to train a machine learning model and you want to use Starburst to query via SQL, that's totally cool — they can both work off the same exact data sets. By contrast, if you're focused on a proprietary model, then you're kind of locked into that model. I think the same applies to data sharing, to data products, to a wide variety of aspects of the data landscape: a proprietary approach kind of closes you in and locks you in. >> So I would say this — Richard, I'd love to get your thoughts on it. I talk to a lot of Oracle customers — not as many Teradata customers, but a lot of Oracle customers — and they'll admit, "Yeah, they're jamming us on price and the license cost, but we do get value out of it."
And so my question to you, Richard, is: do the — let's call them data warehouse systems, or the proprietary systems — deliver a greater ROI sooner? Is that the allure that customers are attracted to, or can open platforms deliver as fast an ROI? >> I think the answer is that it can depend a bit — it depends on your business's skill set. We are lucky that we have a number of proprietary teams that work in the databases that provide our operational data capability, and we have teams of analytics and big data experts who can work with open data sets and open data formats. And so, for those different teams, they can get to an ROI more quickly with different technologies. For the business, though, we can't do better for our operational data stores than proprietary databases today: we can back off very tight SLAs to them, and we can demonstrate reliability from millions of hours of those databases being run at enterprise scale. But for analytics workloads — and increasingly our business is growing in that direction — we can't do better than open data formats with cloud-based, data-mesh-type technologies. So it's not a simple answer, and one will not always be the right answer for our business. We definitely have times when proprietary databases provide a capability that we couldn't easily represent or replicate with open technologies. >> Richard, let's stay with you. You mentioned some things before that strike me. The Databricks-Snowflake thing is a lot of fun for analysts like me: you've got Databricks coming at it from a data engineering heritage — and Richard, you mentioned you have a lot of rockstar data engineers — and you've got Snowflake coming at it from an analytics heritage. Those two worlds are colliding. People like Sanjeev Mohan have said they think it's actually harder to play in the data engineering world — in other words, it's easier for the data engineering world to move into the analytics world than the reverse. So, thinking about up-and-coming engineers and developers preparing for this future of data engineering and data analytics, how should they be thinking about the future? What's your advice to those young people? >> So I think I'd probably fall back on general programming skill sets. The advice I saw years ago was that if you have open-source technologies — the Pythons and Javas — on your CV, you command a 20% pay hike over people who can only work in proprietary programming languages, and I think that's true of data technologies as well. From a business point of view, that makes sense: I'd rather spend the money I save on proprietary licenses on better engineers, because they can provide more value to the business and innovate us beyond our competitors. So my advice to people who are starting out here, or trying to build teams to capitalize on data assets, is to begin with open, license-free capabilities, because they're very cheap to experiment with, they generate a lot of interest from people who want to join you as a business, and you can make them very successful early doors on your analytics journey. >> It's interesting. Again, analysts like myself do a lot of TCO work, and have over the last 20-plus years. In the world of Oracle, normally it's the staff that's the biggest nut in total cost of ownership — but not with Oracle: there, the license cost is by far the biggest component of the pie.
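Richard's advice about open, license-free skills and Justin's earlier remark that Spark and Starburst can work off the same exact data sets point at the same property: an open table format is not tied to any one engine. The sketch below is hypothetical — the catalog, schema, and table names are invented, and it assumes both an Iceberg-enabled Spark session and a Trino-style engine with an Iceberg connector are already configured against the same tables — but it shows two very different tools reading one copy of the data.

```python
# Hypothetical: the same open-format table read by two different engines.
# Catalog/table names are invented; configuration of the Iceberg catalog
# for Spark and of the Trino connector is assumed, not shown.

# Engine 1: PySpark, e.g. to assemble features for a model.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("features").getOrCreate()
rx = spark.table("lake.claims.prescriptions")   # Iceberg table via a configured catalog
features = rx.groupBy("patient_id").count()

# Engine 2: a Trino-style SQL engine, for ad hoc analytics on the same data.
import trino  # assumes the trino-python-client package

cur = trino.dbapi.connect(
    host="trino.example.internal", port=8080,
    user="analyst", catalog="iceberg", schema="claims",
).cursor()
cur.execute("SELECT medication, count(*) FROM prescriptions GROUP BY medication")
print(cur.fetchall())
```

No ingestion step sits between the two; whichever engine wins the next benchmark can be pointed at the same files.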
All right, Justin, help us close out this segment. We've been talking about data mesh, open versus closed, Snowflake, Databricks. Where does Starburst — as this engine for the data lake, the data lakehouse, the data warehouse — fit into this world? >> Yeah. So our view on how the future ultimately unfolds is that we think data lakes will be a natural center of gravity, for a lot of the reasons that we described: open data formats, and the lowest total cost of ownership, because you get to choose the cheapest storage available to you — maybe that's S3 or Azure Data Lake Storage or Google Cloud Storage, or maybe it's on-prem object storage that you bought at a really good price. So ultimately, storing a lot of data in a data lake makes a lot of sense. But I think what makes our perspective unique is that we still don't think you're going to get everything there either. We think that centralization of all your data assets is just an impossible endeavor, and so you want to be able to access data that lives outside of the lake as well. We kind of think of the lake as maybe the biggest place by volume in terms of how much data you have, but to have comprehensive analytics — to truly understand your business and understand it holistically — you need to be able to go access other data sources as well. And so that's the role we want to play: to be a single point of access for our customers, to provide the right level of fine-grained access controls so that the right people have access to the right data, and ultimately to make it easy to discover and consume via the creation of data products as well. >> Great. Okay, thanks, guys. Right after this quick break, we're going to be back to debate whether the cloud data model that we see emerging — the so-called modern data stack — is really modern, or is it the same wine in a new bottle when it comes to data architectures. You're watching theCUBE, the leader in enterprise and emerging tech coverage.
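A small sketch helps pin down what Justin means by a single point of access over data that is mostly, but not entirely, in the lake. It is hypothetical — the catalogs, schemas, tables, and host are invented, and it assumes an engine with both an Iceberg connector and a PostgreSQL connector already configured — but the shape is the point: one query, one entry point, with part of the data on object storage and part still in an operational database.

```python
# Hypothetical: one SQL entry point spanning the lake and an operational store.
# All names are stand-ins; connector configuration is assumed, not shown.
import trino  # assumes the trino-python-client package

cur = trino.dbapi.connect(
    host="trino.example.internal", port=8080, user="analyst"
).cursor()

cur.execute("""
    SELECT ph.region,
           count(*)            AS prescriptions_dispensed,
           avg(f.days_to_fill) AS avg_days_to_fill
    FROM iceberg.claims.prescriptions AS rx      -- lives in the data lake
    JOIN postgresql.ops.pharmacies    AS ph      -- lives in an operational database
      ON rx.pharmacy_id = ph.id
    JOIN postgresql.ops.fulfillment   AS f
      ON rx.claim_id = f.claim_id
    GROUP BY ph.region
""")
for row in cur.fetchall():
    print(row)
```

The fine-grained access controls Justin mentions would sit on top of exactly this path, deciding per user which catalogs, tables, and columns a query like this may touch.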
>> Your data is capable of producing incredible results, but data consumers are often left in the dark without fast access to the data they need. Starburst makes your data visible from wherever it lives. Your company is acquiring more data, in more places, more rapidly than ever. Relying solely on a data centralization strategy — whether it's in a lake or a warehouse — is unrealistic. A single-source-of-truth approach is no longer viable, and disconnected data silos are often left untapped. We need a new approach: one that embraces distributed data, one that enables fast and secure access to any of your data from anywhere. With Starburst, you'll have the fastest query engine for the data lake, one that allows you to connect and analyze your disparate data sources no matter where they live. Starburst provides the foundational technology required for you to build toward the vision of a decentralized data mesh. Starburst Enterprise and Starburst Galaxy offer enterprise-ready connectivity, interoperability, and security features for multiple regions, multiple clouds, and ever-changing global regulatory requirements. The data is yours, and with Starburst you can perform analytics anywhere in your world. >> Okay, we're back with Justin Borgman, CEO of Starburst; Richard Jarvis, CTO of EMIS Health; and Teresa Tung, cloud-first technologist at Accenture. We're on to lie number three, and that is the claim that today's modern data stack is actually modern — so I guess the lie is that it's not modern. Justin, what do you say? >> Yeah, I mean, I think new isn't modern, right? It's the new data stack, it's the cloud data stack, but that doesn't necessarily mean it's modern. A lot of the components are actually exactly the same as what we've had for 40 years: rather than Teradata you have Snowflake, rather than Informatica you have Fivetran. So it's the same general stack, just a cloud version of it, and a lot of the challenges that have plagued us for 40 years still remain. >> Let me come back to you, Justin — but there are differences, right? You can scale, you can throw resources at the problem, you can separate compute from storage. There's a lot of money being thrown at that by venture capitalists, and by Snowflake and, as you mentioned, its competitors. So that's different, is it not — at least an aspect of modern? Dial it up, dial it down. What do you say to that? >> Well, it is — it's certainly taking what the cloud offers and taking advantage of that. But it's important to note that the cloud data warehouses out there are really just separating their compute from their storage. That allows them to scale up and down, but your data is still stored in a proprietary format; you're still locked in; you still have to ingest the data to even get it prepared for analysis. So a lot of the same structural constraints that existed with the old on-prem enterprise data warehouse model still exist — just, yes, a little bit more elastic now, because the cloud offers that. >> So, Teresa, let me go to you, because you have "cloud first" in your title. What say you to this conversation? >> Well, even the cloud providers are looking toward more of a cloud continuum, right? The centralized cloud as we know it — maybe a data lake or data warehouse in one central place — that's not even how the cloud providers are looking at it. They have new query services; every provider has one that really expands those queries beyond a single location. And if we look at where the future goes, it's going to very much follow the same thing. There's going to be more edge, there's going to be more on-premise because of data sovereignty and data gravity, because you're working with different parts of the business that have already made major investments in different cloud providers. So there are a lot of reasons why the next modern generation of the data stack needs to be much more federated. >> Okay. So, Richard, how do you deal with this? You've obviously got the technical debt; the existing infrastructure is on the books, and you don't want to just throw it out. There's a lot of conversation about modernizing applications, which a lot of times means a microservices layer on top of legacy apps. How do you think about the modern data stack? >> Well, I think probably the first thing to say is that the stack really has to include the processes and people around the data as well. It's all well and good changing the technology, but if you don't modernize how people use that technology, you're not going to be able to scale — because just because you can scale CPU and storage doesn't mean you can get more people to use your data to generate more value for the business. And so what we've been looking at is really changing that, very much aligned to data products and data mesh.
How do you enable more people to consume the service, and have the stack respond in a way that keeps costs low? Because that's important for our customers consuming this data, but it also allows people to occasionally run enormous queries and then tick along with smaller ones when required. And it's a good job we did, because during COVID, all of a sudden we had enormous pressure on our data platform to answer really important, life-threatening queries. If we couldn't have scaled both our data stack and our teams, we wouldn't have been able to answer those as quickly as we did. So I think the stack needs to support a scalable business, not just the technology itself. >> Well, thank you for that. So, Justin, let's try to break down what the critical aspects of the modern data stack are. Think about the past five to seven years: cloud has obviously given us a different pricing model and de-risked experimentation, and we talked about the ability to scale up and scale down. But I'm taking away that that's not enough, based on what Richard just said — the modern data stack has to serve the business and enable the business to build data products. I buy that; I'm a big fan of the data mesh concepts, even though we're in the early days. So what are the critical aspects, if you had to put some guardrails and definitions around the modern data stack? What does that look like? What are some of the attributes and principles there? >> Of how it should look, or how— >> Yeah — what it should be. >> Yeah. Well, Teresa mentioned this in a previous segment: the data warehouse is not necessarily going to disappear; it just becomes one node, one element, of the overall data mesh. And I certainly agree with that. So by no means are we suggesting that Snowflake or Redshift or whatever cloud data warehouse you may be using is going to disappear, but it's not going to become the end-all, be-all; it's not the central single source of truth. And I think that's the paradigm shift that needs to occur. I think it's also worth noting that those who were the early adopters of the modern data stack were primarily digital-native, born-in-the-cloud young companies who had the benefit of idealism — the benefit of starting with a clean slate — and that does not reflect the vast majority of enterprises. And even those companies, as they grow up and mature out of that ideal state, go buy a business; now they've got something on another cloud provider that has a different data stack, and they have to deal with that heterogeneity. That is just change, and change is a part of life. So I think there's an element here that is almost philosophical: do you believe in an absolute ideal, where I can just fit everything into one place, or do I believe in reality? And I think the far more pragmatic approach is really what data mesh represents. So to answer your question directly, I think it's adding the ability to access data that lives outside of the data warehouse — maybe living in open data formats in a data lake, or in operational systems as well. Maybe you want to directly access data that lives in an Oracle database or a Mongo database, or what have you. So: creating that flexibility to really future-proof yourself from the inevitable change that you will encounter over time. >> So, thank you.
So, based on what Justin just said, my takeaway is that it's inclusive: whether it's a data mart, a data hub, a data lake, or a data warehouse, it's just a node on the mesh. Okay, I get that. Does that include on-prem data? Obviously it has to. What are you seeing in terms of the ability to take that data mesh concept on-prem? I mean, most implementations of data mesh I've seen frankly really aren't adhering to the philosophy — maybe it's a data lake and maybe it's using Glue. You look at what JPMC is doing, or HelloFresh — a lot of stuff happening on the AWS cloud in that closed stack, if you will. What's the answer to that, Teresa? >> I mean, I think it's a killer case for data mesh: the fact that you have valuable data sources on-prem, and yet you still want to modernize and take the best of cloud. Cloud is still, like we mentioned — there are a lot of great reasons for it, around the economics and the ability to tap into the innovation the cloud providers are delivering around data and AI architecture; it's an easy button. So the mesh allows you to have the best of both worlds. You can start using the data products on-prem, or in the existing systems that are working already and are meaningful for the business. At the same time, you can modernize the ones that make business sense, because they need better performance, or something that's cheaper, or maybe you just tap into better analytics to get better insights. So you're going to be able to stretch and really have the best of both worlds in a way that — again, going back to Richard's point — is meaningful to the business. Not everything has to have that one-size-fits-all set of tools. >> Okay, thank you. So, Richard — talking about data as product, I wonder if you could give us your perspective here. What are the advantages of treating data as a product? What role do data products have in the modern data stack? We talk about monetizing data — what are your thoughts on data products? >> So, for us, one of the most important data products we've been creating is taking healthcare data from a wide variety of different settings — information about patients' demographics, about their treatment, about their medications, and so on — and getting it into a standards-based format that can be utilized by a wide variety of different researchers. Because misinterpreting that data, or having the data not presented in the way that the user is expecting, means that you generate the wrong insight — and in any business that's clearly not a desirable outcome, but when that insight is as critical as it might be in healthcare or in some security settings, you really have to have gone to the trouble of understanding the data, presenting it in a format that everyone can clearly agree on, and then letting people consume it in a very structured, managed way, even if that data comes from a variety of different sources in the first place. And so our data product journey has really begun by standardizing data across a number of different silos through the data mesh, so we can present it out — both internally and, through the right governance, externally to researchers.
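Richard's point is that a data product is more than correctly formatted tables: it pins down meaning, redaction rules, and who may consume it. A toy descriptor makes that concrete. Everything below is hypothetical — the field names, redaction flags, and consumer groups are invented for illustration, not EMIS's actual product definitions.

```python
# Hypothetical sketch of a data product "contract": schema plus meaning,
# redaction, and allowed consumers. All names and rules are invented.
from dataclasses import dataclass, field

@dataclass
class FieldSpec:
    name: str
    dtype: str
    meaning: str                      # the business meaning, not just a column name
    redact_for_research: bool = False

@dataclass
class DataProduct:
    name: str
    owner_domain: str                 # the team that knows this data best
    fields: list[FieldSpec] = field(default_factory=list)
    allowed_consumers: list[str] = field(default_factory=list)

prescriptions_product = DataProduct(
    name="standardised_prescriptions",
    owner_domain="pharmacy",
    fields=[
        FieldSpec("patient_id", "string", "pseudonymised patient key"),
        FieldSpec("date_of_birth", "date", "patient's date of birth", redact_for_research=True),
        FieldSpec("dispensed_at", "date", "date the medication was dispensed, not the prescription date"),
        FieldSpec("medication", "string", "standardised medication code"),
    ],
    allowed_consumers=["internal_analytics", "approved_researchers"],
)
```

Whether this lives as code, as catalog metadata, or in a governance tool matters less than the fact that the owning domain, not a central technical team, writes it down.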
>> So that data product, through whatever APIs, is accessible and discoverable — but it obviously has to be governed as well. You mentioned you appropriately provide it internally, but also to external folks. So you've architected that capability today? >> We have. And because the data is standardized, it can generate value much more quickly, and we can be sure of the security and the value it's providing — because the data product isn't just about formatting the data into the correct tables; it's understanding what it means to redact the data, or to remove certain rows from it, or to interpret what a date actually means. Is it the start of the contract, or the start of the treatment, or the date of birth of a patient? These things can be lost in the data storage without proper product management around the data to say, in a very clear business context, what this data means and what it means to process this data for a particular use case. >> Yeah, that makes sense — it's got the context. If the domains own the data, you cut through a lot of the centralized teams, the technical teams, that are data-agnostic and don't really have that context. All right, let's end with you, Justin: how does Starburst fit into this modern data stack? Bring us home. >> Yeah. So I think, for us, it's really about providing our customers with the flexibility to operate on and analyze data that lives in a wide variety of different systems — ultimately giving them that optionality. And optionality provides the ability to reduce costs, to store more in a data lake rather than a data warehouse, and to get the fastest time to insight by accessing the data directly where it lives. And ultimately, with this concept of data products that we've now incorporated into our offering as well, you can really create and curate data as a product to be shared and consumed. So we're trying to help enable the data mesh model and make it an appropriate complement to the modern data stack that people have today. >> Excellent. Hey, I want to thank Justin, Teresa, and Richard for joining us today. You guys are great. I'm a big believer in the data mesh concept, and I think we're seeing the future of data architecture. So thank you. Now, remember, all these conversations are going to be available on theCUBE.net for on-demand viewing. You can also go to starburst.io — they have some great content on the website, they host some really thought-provoking interviews, and they have awesome resources, with lots of data mesh conversations and really good stuff in the resource section. So check that out. Thanks for watching "The Data Doesn't Lie... Or Does It?", made possible by Starburst Data. This is Dave Vellante for theCUBE, and we'll see you next time. >> The explosion of data sources has forced organizations to modernize their systems and architecture, and to come to terms with the fact that one size does not fit all for data management today. Your teams are constantly moving and copying data, which takes time and, in some cases, means double-paying for compute resources. Instead, what if you could access all your data anywhere, using the BI tools and SQL skills your users already have — and what if this also included enterprise security and fast performance? With Starburst Enterprise, you can provide your data consumers with a single point of secure access to all of your data, no matter where it lives. With features like strict, fine-grained access control, end-to-end data encryption, and data masking, Starburst meets the security standards of the largest companies.
Starburst Enterprise can easily be deployed anywhere and managed with built-in insights that let data teams holistically view their clusters' operation and query execution, so they can reach meaningful business decisions faster — all this with the support of the largest team of Trino experts in the world, delivering fully tested, stable releases and available to support you 24/7 to unlock the value in all of your data. You need a solution that easily fits with what you have today and can adapt to your architecture tomorrow. Starburst Enterprise gives you the fastest path from big data to better decisions, because your team can't afford to wait. Trino was created to empower analytics anywhere, and Starburst Enterprise was created to give you the enterprise-grade performance, connectivity, security, management, and support your company needs. Organizations like Zalando, Comcast, and FINRA rely on Starburst to move their businesses forward. Contact us to get started.
Ed Walsh, ChaosSearch | AWS re:Inforce 2022
(upbeat music) >> Welcome back to Boston, everybody. This is the birthplace of theCUBE. In 2010, May of 2010 at EMC World, right in this very venue, John Furrier called it the chowder and lobster post. I'm Dave Vellante. We're here at RE:INFORCE 2022, Ed Walsh, CEO of ChaosSearch. Doing a drive by Ed. Thanks so much for stopping in. You're going to help me wrap up in our final editorial segment. >> Looking forward to it. >> I really appreciate it. >> Thank you for including me. >> How about that? 2010. >> That's amazing. It was really in this-- >> Really in this building. Yeah, we had to sort of bury our way in, tunnel our way into the Blogger Lounge. We did four days. >> Weekends, yeah. >> It was epic. It was really epic. But I'm glad they're back in Boston. AWS was going to do June in Houston. >> Okay. >> Which would've been awful. >> Yeah, yeah. No, this is perfect. >> Yeah. Thank God they came back. You saw Boston in summer is great. I know it's been hot, And of course you and I are from this area. >> Yeah. >> So how you been? What's going on? I mean, it's a little crazy out there. The stock market's going crazy. >> Sure. >> Having the tech lash, what are you seeing? >> So it's an interesting time. So I ran a company in 2008. So we've been through this before. By the way, the world's not ending, we'll get through this. But it is an interesting conversation as an investor, but also even the customers. There's some hesitation but you have to basically have the right value prop, otherwise things are going to get sold. So we are seeing longer sales cycles. But it's nothing that you can't overcome. But it has to be something not nice to have, has to be a need to have. But I think we all get through it. And then there is some, on the VC side, it's now buckle down, let's figure out what to do which is always a challenge for startup plans. >> In pre 2000 you, maybe you weren't a CEO but you were definitely an executive. And so now it's different and a lot of younger people haven't seen this. You've got interest rates now rising. Okay, we've seen that before but it looks like you've got inflation, you got interest rates rising. >> Yep. >> The consumer spending patterns are changing. You had 6$, $7 gas at one point. So you have these weird crosscurrents, >> Yup. >> And people are thinking, "Okay post-September now, maybe because of the recession, the Fed won't have to keep raising interest rates and tightening. But I don't know what to root for. It's like half full, half empty. (Ed laughing) >> But we haven't been in an environment with high inflation. At least not in my career. >> Right. Right. >> I mean, I got into 92, like that was long gone, right?. >> Yeah. >> So it is a interesting regime change that we're going to have to deal with, but there's a lot of analogies between 2008 and now that you still have to work through too, right?. So, anyway, I don't think the world's ending. I do think you have to run a tight shop. So I think the grow all costs is gone. I do think discipline's back in which, for most of us, discipline never left, right?. So, to me that's the name of the game. >> What do you tell just generally, I mean you've been the CEO of a lot of private companies. And of course one of the things that you do to retain people and attract people is you give 'em stock and it's great and everybody's excited. >> Yeah. >> I'm sure they're excited cause you guys are a rocket ship. 
But so what's the message now that, Okay the market's down, valuations are down, the trees don't grow to the moon, we all know that. But what are you telling your people? What's their reaction? How do you keep 'em motivated? >> So like anything, you want over communicate during these times. So I actually over communicate, you get all these you know, the Sequoia decks, 2008 and the recent... >> (chuckles) Rest in peace good times, that one right? >> I literally share it. Why? It's like, Hey, this is what's going on in the real world. It's going to affect us. It has almost nothing to do with us specifically, but it will affect us. Now we can't not pay attention to it. It does change how you're going to raise money, so you got to make sure you have the right runway to be there. So it does change what you do, but I think you over communicate. So that's what I've been doing and I think it's more like a student of the game, so I try to share it, and I say some appreciate it others, I'm just saying, this is normal, we'll get through this and this is what happened in 2008 and trust me, once the market hits bottom, give it another month afterwards. Then everyone says, oh, the bottom's in and we're back to business. Valuations don't go immediately back up, but right now, no one knows where the bottom is and that's where kind of the world's ending type of things. >> Well, it's interesting because you talked about, I said rest in peace good times >> Yeah >> that was the Sequoia deck, and the message was tighten up. Okay, and I'm not saying you shouldn't tighten up now, but the difference is, there was this period of two years of easy money and even before that, it was pretty easy money. >> Yeah. >> And so companies are well capitalized, they have runway so it's like, okay, I was talking to Frank Slootman about this now of course there are public companies, like we're not taking the foot off the gas. We're inherently profitable, >> Yeah. >> we're growing like crazy, we're going for it. You know? So that's a little bit of a different dynamic. There's a lot of good runway out there, isn't there? >> But also you look at the different companies that were either born or were able to power through those environments are actually better off. You come out stronger in a more dominant position. So Frank, listen, if you see what Frank's done, it's been unbelievable to watch his career, right?. In fact, he was at Data Domain, I was Avamar so, but look at what he's done since, he's crushed it. Right? >> Yeah. >> So for him to say, Hey, I'm going to literally hit the gas and keep going. I think that's the right thing for Snowflake and a right thing for a lot of people. But for people in different roles, I literally say that you have to take it seriously. What you can't be is, well, Frank's in a different situation. What is it...? How many billion does he have in the bank? So it's... >> He's over a billion, you know, over a billion. Well, you're on your way Ed. >> No, no, no, it's good. (Dave chuckles) Okay, I want to ask you about this concept that we've sort of we coined this term called Supercloud. >> Sure. >> You could think of it as the next generation of multi-cloud. The basic premises that multi-cloud was largely a symptom of multi-vendor. Okay. I've done some M&A, I've got some Shadow IT, spinning up, you know, Shadow clouds, projects. But it really wasn't a strategy to have a continuum across clouds. 
And now we're starting to see ecosystems really build, you know, you've used the term before, standing on the shoulders of giants, you've used that a lot. >> Yep. >> And so we're seeing that. Jerry Chen wrote a seminal piece on Castles in The Cloud, so we coined this term SuperCloud to connote this abstraction layer that hides the underlying complexities and primitives of the individual clouds and then adds value on top of it and can adjudicate and manage, irrespective of physical location, Supercloud. >> Yeah. >> Okay. What do you think about that concept?. How does it maybe relate to some of the things that you're seeing in the industry? >> So, standing on shoulders of giants, right? So I always like to do hard tech either at big company, small companies. So we're probably your definition of a Supercloud. We had a big vision, how to literally solve the core challenge of analytics at scale. How are you going to do that? You're not going to build on your own. So literally we're leveraging the primitives, everything you can get out of the Amazon cloud, everything get out of Google cloud. In fact, we're even looking at what it can get out of this Snowflake cloud, and how do we abstract that out, add value to it? That's where all our patents are. But it becomes a simplified approach. The customers don't care. Well, they care where their data is. But they don't care how you got there, they just want to know the end result. So you simplify, but you gain the advantages. One thing's interesting is, in this particular company, ChaosSearch, people try to always say, at some point the sales cycle they say, no way, hold on, no way that can be fast no way, or whatever the different issue. And initially we used to try to explain our technology, and I would say 60% was explaining the public, cloud capabilities and then how we, harvest those I guess, make them better add value on top and what you're able to get is something you couldn't get from the public clouds themselves and then how we did that across public clouds and then extracted it. So if you think about that like, it's the Shoulders of giants. But what we now do, literally to avoid that conversation because it became a lengthy conversation. So, how do you have a platform for analytics that you can't possibly overwhelm for ingest. All your messy data, no pipelines. Well, you leverage things like S3 and EC2, and you do the different security things. You can go to environments say, you can't possibly overrun me, I could not say that. If I didn't literally build on the shoulders giants of all these public clouds. But the value. So if you're going to do hard tech as a startup, you're going to build, you're going to be the principles of Supercloud. Maybe they're not the same size of Supercloud just looking at Snowflake, but basically, you're going to leverage all that, you abstract it out and that's where you're able to have a lot of values at that. >> So let me ask you, so I don't know if there's a strict definition of Supercloud, We sort of put it out to the community and said, help us define it. So you got to span multiple clouds. It's not just running in each cloud. There's a metadata layer that kind of understands where you're pulling data from. Like you said you can pull data from Snowflake, it sounds like we're not running on Snowflake, correct? >> No, complimentary to them in their different customers. >> Yeah. Okay. >> They want to build on top of a data platform, data apps. >> Right. And of course they're going cross cloud. >> Right. 
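Ed's point about standing on S3 and EC2 rather than rebuilding them is easiest to see from the customer's side of the wall: ingest is just writing objects into a bucket you already own, and the analytics layer indexes from there. The sketch below is a minimal illustration using boto3; the bucket name, key prefix, and event fields are hypothetical, and the indexing service that picks the objects up afterward is assumed rather than shown.

```python
# Sketch of the "ingest" side when the analytics platform reads straight from
# your object storage: landing raw events in S3 is effectively the whole pipeline.
# Bucket name, key prefix, and event shape are hypothetical.
import json
import gzip
import datetime
import boto3

s3 = boto3.client("s3")

events = [
    {"ts": "2022-07-26T12:00:01Z", "level": "ERROR", "service": "checkout", "msg": "timeout"},
    {"ts": "2022-07-26T12:00:02Z", "level": "INFO",  "service": "checkout", "msg": "retry ok"},
]

# Compress and write one batch object; downstream indexing works off the bucket.
body = gzip.compress("\n".join(json.dumps(e) for e in events).encode("utf-8"))
key = f"raw-logs/checkout/{datetime.datetime.utcnow():%Y/%m/%d/%H%M%S}.json.gz"

s3.put_object(
    Bucket="acme-security-logs",   # hypothetical landing bucket
    Key=key,
    Body=body,
    ContentType="application/json",
    ContentEncoding="gzip",
)
```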
>> Is there a PaaS layer in there? We've said there's probably a Super PaaS layer. You're probably not doing that, but you're allowing people to bring their own, bring your own PaaS sort of thing maybe. >> So we're a little bit different but basically we publish open APIs. We don't have a user interface. We say, keep the user interface. Again, we're solving the challenge of analytics at scale, we're not trying to retrain your analytics, either analysts or your DevOps or your SOV or your Secop team. They use the tools they already use. Elastic search APIs, SQL APIs. So really they program, they build applications on top of us, Equifax is a good example. Case said it coming out later on this week, after 18 months in production but, basically they're building, we provide the abstraction layer, the quote, I'm going to kill it, Jeff Tincher, who owns all of SREs worldwide, said to the effect of, Hey I'm able to rethink what I do for my data pipelines. But then he also talked about how, that he really doesn't have to worry about the data he puts in it. We deal with that. And he just has to, just query on the other side. That simplicity. We couldn't have done that without that. So anyway, what I like about the definition is, if you were going to do something harder in the world, why would you try to rebuild what Amazon, Google and Azure or Snowflake did? You're going to add things on top. We can still do intellectual property. We're still doing patents. So five grand patents all in this. But literally the abstraction layer is the simplification. The end users do not want to know that complexity, even though they ask the questions. >> And I think too, the other attribute is it's ecosystem enablement. Whereas I think, >> Absolutely >> in general, in the Multicloud 1.0 era, the ecosystem wasn't thinking about, okay, how do I build on top and abstract that. So maybe it is Multicloud 2.0, We chose to use Supercloud. So I'm wondering, we're at the security conference, >> RE: INFORCE is there a security Supercloud? Maybe Snyk has the developer Supercloud or maybe Okta has the identity Supercloud. I think CrowdStrike maybe not. Cause CrowdStrike competes with Microsoft. So maybe, because Microsoft, what's interesting, Merritt Bear was just saying, look, we don't show up in the spending data for security because we're not charging for most of our security. We're not trying to make a big business. So that's kind of interesting, but is there a potential for the security Supercloud? >> So, I think so. But also, I'll give you one thing I talked to, just today, at least three different conversations where everyone wants to log data. It's a little bit specific to us, but basically they want to do the security data lake. The idea of, and Snowflake talks about this too. But the idea of putting all the data in one repository and then how do you abstract out and get value from it? Maybe not the perfect, but it becomes simple to do but hard to get value out. So the different players are going to do that. That's what we do. We're able to, once you land it in your S3 or it doesn't matter, cloud of choice, simple storage, we allow you to get after that data, but we take the primitives and hide them from you. And all you do is query the data and we're spinning up stateless computer to go after it. So then if I look around the floor. There's going to be a bunch of these players. I don't think, why would someone in this floor try to recreate what Amazon or Google or Azure had. They're going to build on top of it. 
And now the key thing is, do you leave it in standard? And now we're open APIs. People are building on top of my open APIs or do you try to put 'em in a walled garden? And they're in, now your Supercloud. Our belief is, part of it is, it needs to be open access and let you go after it. >> Well. And build your applications on top of it openly. >> They come back to snowflake. That's what Snowflake's doing. And they're basically saying, Hey come into our proprietary environment. And the benefit is, and I think both can win. There's a big market. >> I agree. But I think the benefit of Snowflake's is, okay, we're going to have federated governance, we're going to have data sharing, you're going to have access to all the ecosystem players. >> Yep. >> And as everything's going to be controlled and you know what you're getting. The flip side of that is, Databricks is the other end >> Yeah. >> of that spectrum, which is no, no, you got to be open. >> Yeah. >> So what's going to happen, well what's happening clearly, is Snowflake's saying, okay we've got Snowpark. we're going to allow Python, we're going to have an Apache Iceberg. We're going to have open source tooling that you can access. By the way, it's not going to be as good as our waled garden where the flip side of that is you get Databricks coming at it from a data science and data engineering perspective. And there's a lot of gaps in between, aren't there? >> And I think they both win. Like for instance, so we didn't do Snowpark integration. But we work with people building data apps on top of Snowflake or data bricks. And what we do is, we can add value to that, or what we've done, again, using all the Supercloud stuff we're done. But we deal with the unstructured data, the four V's coming at you. You can't pipeline that to save. So we actually could be additive. As they're trying to do like a security data cloud inside of Snowflake or do the same thing in Databricks. That's where we can play. Now, we play with them at the application level that they get some data from them and some data for us. But I believe there's a partnership there that will do it inside their environment. To us they're just another large scaler environment that my customers want to get after data. And they want me to abstract it out and give value. >> So it's another repository to you. >> Yeah. >> Okay. So I think Snowflake recently added support for unstructured data. You chose not to do Snowpark because why? >> Well, so the way they're doing the unstructured data is not bad. It's JSON data. Basically, This is the dilemma. Everyone wants their application developers to be flexible, move fast, securely but just productivity. So you get, give 'em flexibility. The problem with that is analytics on the end want to be structured to be performant. And this is where Snowflake, they have to somehow get that raw data. And it's changing every day because you just let the developers do what they want now, in some structured base, but do what you need to do your business fast and securely. So it completely destroys. So they have large customers trying to do big integrations for this messy data. And it doesn't quite work, cause you literally just can't make the pipelines work. So that's where we're complimentary do it. So now, the particular integration wasn't, we need a little bit deeper integration to do that. So we're integrating, actually, at the data app layer. But we could, see us and I don't, listen. I think Snowflake's a good actor. 
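The "open APIs" in ChaosSearch's case are Elasticsearch- and SQL-compatible endpoints, so existing tooling can query what has been indexed out of object storage without a proprietary SDK. The sketch below shows what the consuming side of an Elasticsearch-compatible search API generally looks like; the endpoint URL, index name, and credentials are hypothetical placeholders rather than the product's actual scheme.

```python
# Sketch: querying an Elasticsearch-compatible search endpoint with plain HTTP.
# Endpoint, index name, and auth are hypothetical; the point is that standard
# Elasticsearch query DSL is the contract, not a vendor-specific client library.
import requests

ENDPOINT = "https://search.example.internal"   # hypothetical API endpoint
INDEX = "checkout-logs"                        # hypothetical logical index

query = {
    "size": 10,
    "query": {
        "bool": {
            "must": [
                {"match": {"level": "ERROR"}},
                {"range": {"ts": {"gte": "now-15m"}}},
            ]
        }
    },
}

resp = requests.post(
    f"{ENDPOINT}/{INDEX}/_search",
    json=query,
    auth=("api-key-id", "api-key-secret"),     # placeholder credentials
    timeout=30,
)
resp.raise_for_status()
for hit in resp.json()["hits"]["hits"]:
    print(hit["_source"]["ts"], hit["_source"]["msg"])
```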
They're trying to figure out what's best for the customers. And I think we just participate in that. >> Yeah. And I think they're trying to figure out >> Yeah. >> how to grow their ecosystem. Because they know they can't do it all, in fact, >> And we solve the key thing, they just can't do certain things. And we do that well. Yeah, I have SQL but that's where it ends. >> Yeah. >> I do the messy data and how to play with them. >> And when you talk to one of their founders, anyway, Benoit, he comes on the cube and he's like, we start with simple. >> Yeah. >> It reminds me of the guy's some Pure Storage, that guy Coz, he's always like, no, if it starts to get too complicated. So that's why they said all right, we're not going to start out trying to figure out how to do complex joins and workload management. And they turn that into a feature. So like you say, I think both can win. It's a big market. >> I think it's a good model. And I love to see Frank, you know, move. >> Yeah. I forgot So you AVMAR... >> In the day. >> You guys used to hate each other, right? >> No, no, no >> No. I mean, it's all good. >> But the thing is, look what he's done. Like I wouldn't bet against Frank. I think it's a good message. You can see clients trying to do it. Same thing with Databricks, same thing with BigQuery. We get a lot of same dynamic in BigQuery. It's good for a lot of things, but it's not everything you need to do. And there's ways for the ecosystem to play together. >> Well, what's interesting about BigQuery is, it is truly cloud native, as is Snowflake. You know, whereas Amazon Redshift was sort of Parexel, it's cobbled together now. It's great engineering, but BigQuery gets a lot of high marks. But again, there's limitations to everything. That's why companies like yours can exist. >> And that's why.. so back to the Supercloud. It allows me as a company to participate in that because I'm leveraging all the underlying pieces. Which we couldn't be doing what we're doing now, without leveraging the Supercloud concepts right, so... >> Ed, I really appreciate you coming by, help me wrap up today in RE:INFORCE. Always a pleasure seeing you, my friend. >> Thank you. >> All right. Okay, this is a wrap on day one. We'll be back tomorrow. I'll be solo. John Furrier had to fly out but we'll be following what he's doing. This is RE:INFORCE 2022. You're watching theCUBE. I'll see you tomorrow.
Breaking Analysis: Answering the top 10 questions about SuperCloud
>> From the theCUBE studios in Palo Alto in Boston, bringing you data driven insights from theCUBE and ETR. This is "Breaking Analysis" with Dave Vellante. >> Welcome to this week's Wikibon, theCUBE's insights powered by ETR. As we exited the isolation economy last year, supercloud is a term that we introduced to describe something new that was happening in the world of cloud. In this Breaking Analysis, we address the 10 most frequently asked questions we get around supercloud. Okay, let's review these frequently asked questions on supercloud that we're going to try to answer today. Look at an industry that's full of hype and buzzwords. Why the hell does anyone need a new term? Aren't hyperscalers building out superclouds? We'll try to answer why the term supercloud connotes something different from hyperscale clouds. And we'll talk about the problems that superclouds solve specifically. And we'll further define the critical aspects of a supercloud architecture. We often get asked, isn't this just multi-cloud? Well, we don't think so, and we'll explain why in this Breaking Analysis. Now in an earlier episode, we introduced the notion of super PaaS. Well, isn't a plain vanilla PaaS already a super PaaS? Again, we don't think so, and we'll explain why. Who will actually build and who are the players currently building superclouds? What workloads and services will run on superclouds? And 8-A or number nine, what are some examples that we can share of supercloud? And finally, we'll answer what you can expect next from us on supercloud? Okay, let's get started. Why do we need another buzzword? Well, late last year, ahead of re:Invent, we were inspired by a post from Jerry Chen called "Castles in the Cloud." Now in that blog post, he introduced the idea that there were sub-markets emerging in cloud that presented opportunities for investors and entrepreneurs that the cloud wasn't going to suck the hyperscalers. Weren't going to suck all the value out of the industry. And so we introduced this notion of supercloud to describe what we saw as a value layer emerging above the hyperscalers CAPEX gift, we sometimes call it. Now it turns out, that we weren't the only ones using the term as both Cornell and MIT have used the phrase in somewhat similar, but different contexts. The point is something new was happening in the AWS and other ecosystems. It was more than IaaS and PaaS, and wasn't just SaaS running in the cloud. It was a new architecture that integrates infrastructure, platform and software as services to solve new problems that the cloud vendors in our view, weren't addressing by themselves. It seemed to us that the ecosystem was pursuing opportunities across clouds that went beyond conventional implementations of multi-cloud. And we felt there was a structural change going on at the industry level, the supercloud, metaphorically was highlighting. So that's the background on why we felt a new catch phrase was warranted, love it or hate it. It's memorable and it's what we chose. Now to that last point about structural industry transformation. Andy Rappaport is sometimes and often credited with identifying the shift from the vertically integrated IBM mainframe era to the fragmented PC microprocesor-based era in his HBR article in 1991. In fact, it was David Moschella, who at the time was an IDC Analyst who first introduced the concept in 1987, four years before Rappaport's article was published. 
Moschella saw that it was clear that Intel, Microsoft, Seagate and others would replace the system vendors, and put that forth in a graphic that looked similar to the first two on this chart. We don't have to review the shift from IBM as the center of the industry to Wintel, that's well understood. What isn't as well known or accepted is what Moschella put out in his 2018 book called "Seeing Digital" which introduced the idea of "The Matrix" that's shown on the right hand side of this chart. Moschella posited that new services were emerging built on top of the internet and hyperscale clouds that would integrate other innovations and would define the next era of computing. He used the term Matrix because the conceptual depiction included not only horizontal technology rose like the cloud and the internet, but for the first time included connected industry verticals, the columns in this chart. Moschella pointed out that whereas historically, industry verticals had a closed value chain or stack and ecosystem of R&D, and production, and manufacturing, and distribution. And if you were in that industry, the expertise within that vertical generally stayed within that vertical and was critical to success. But because of digital and data, for the first time, companies were able to traverse industries, jump across industries and compete because data enabled them to do that. Examples, Amazon and content, payments, groceries, Apple, and payments, and content, and so forth. There are many examples. Data was now this unifying enabler and this marked a change in the structure of the technology landscape. And supercloud is meant to imply more than running in hyperscale clouds, rather it's the combination of multiple technologies enabled by CloudScale with new industry participants from those verticals, financial services and healthcare, manufacturing, energy, media, and virtually all in any industry. Kind of an extension of every company is a software company. Basically, every company now has the opportunity to build their own cloud or supercloud. And we'll come back to that. Let's first address what's different about superclouds relative to hyperscale clouds? You know, this one's pretty straightforward and obvious, I think. Hyperscale clouds, they're walled gardens where they want your data in their cloud and they want to keep you there. Sure, every cloud player realizes that not all data will go to their particular cloud so they're meeting customers where their data lives with initiatives like Amazon Outposts and Azure Arc, and Google Anthos. But at the end of the day, the more homogeneous they can make their environments, the better control, security, cost, and performance they can deliver. The more complex the environment, the more difficult it is to deliver on their brand promises. And of course, the lesser margin that's left for them to capture. Will the hyperscalers get more serious about cross-cloud services? Maybe, but they have plenty of work to do within their own clouds and within enabling their own ecosystems. They had a long way to go a lot of runway. So let's talk about specifically, what problems superclouds solve? We've all seen the stats from IDC or Gartner, or whomever the customers on average use more than one cloud. You know, two clouds, three clouds, five clouds, 20 clouds. And we know these clouds operate in disconnected silos for the most part. 
And that's a problem because each cloud requires different skills because the development environment is different as is the operating environment. They have different APIs, different primitives, and different management tools that are optimized for each respective hyperscale cloud. Their functions and value props don't extend to their competitors' clouds for the most part. Why would they? As a result, there's friction when moving between different clouds. It's hard to share data, it's hard to move work. It's hard to secure and govern data. It's hard to enforce organizational edicts and policies across these clouds, and on-prem. Supercloud is an architecture designed to create a single environment that enables management of workloads and data across clouds in an effort to take out complexity, accelerate application development, streamline operations and share data safely, irrespective of location. It's pretty straightforward, but non-trivial, which is why I always ask a company's CEO and executives if stock buybacks and dividends will yield as much return as building out superclouds that solve really specific and hard problems, and create differential value. Okay, let's dig a bit more into the architectural aspects of supercloud. In other words, what are the salient attributes of supercloud? So first and foremost, a supercloud runs a set of specific services designed to solve a unique problem and it can do so in more than one cloud. Superclouds leverage the underlying cloud native tooling of a hyperscale cloud, but they're optimized for a specific objective that aligns with the problem that they're trying to solve. For example, supercloud might be optimized for lowest cost or lowest latency, or sharing data, or governing, or securing that data, or higher performance for networking, for example. But the point is, the collection of services that is being delivered is focused on a unique value proposition that is not being delivered by the hyperscalers across clouds. A supercloud abstracts the underlying and siloed primitives of the native PaaS layer from the hyperscale cloud and then using its own specific platform as a service tooling, creates a common experience across clouds for developers and users. And it does so in a most efficient manner, meaning it has the metadata knowledge and management capabilities that can optimize for latency, bandwidth, or recovery, or data sovereignty, or whatever unique value that supercloud is delivering for the specific use case in their domain. And a supercloud comprises a super PaaS capability that allows ecosystem partners through APIs to add incremental value on top of the supercloud platform to fill gaps, accelerate features, and of course innovate. The services can be infrastructure-related, they could be application services, they could be data services, security services, user services, et cetera, designed and packaged to bring unique value to customers. Again, that hyperscalers are not delivering across clouds or on-premises. Okay, so another common question we get is, isn't that just multi-cloud? And what we'd say to that is yes, but no. You can call it multi-cloud 2.0, if you want, if you want to use it, it's kind of a commonly used rubric. But as Dell's Chuck Whitten proclaimed at Dell Technologies World this year, multi-cloud by design, is different than multi-cloud by default. 
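Before turning to the multi-cloud question, one way to make the "abstracts the underlying and siloed primitives" attribute concrete is a thin storage interface: each hyperscaler has its own SDK and object model, and the supercloud layer hides that behind a single call. The sketch below uses the public AWS, Google Cloud, and Azure Python SDKs; it is not anyone's product code, the bucket and container names are hypothetical, and auth, caching, and the metadata-driven placement that makes a real supercloud interesting are all omitted.

```python
# Minimal sketch of a cross-cloud "read this object" abstraction.
# Uses the public SDKs (boto3, google-cloud-storage, azure-storage-blob);
# bucket/container names and credentials are hypothetical placeholders.
import boto3
from google.cloud import storage as gcs
from azure.storage.blob import BlobServiceClient


def read_object(cloud: str, container: str, key: str) -> bytes:
    """Fetch an object's bytes regardless of which cloud it lives in."""
    if cloud == "aws":
        resp = boto3.client("s3").get_object(Bucket=container, Key=key)
        return resp["Body"].read()
    if cloud == "gcp":
        bucket = gcs.Client().bucket(container)
        return bucket.blob(key).download_as_bytes()
    if cloud == "azure":
        svc = BlobServiceClient.from_connection_string("<connection-string>")
        blob = svc.get_blob_client(container=container, blob=key)
        return blob.download_blob().readall()
    raise ValueError(f"unknown cloud: {cloud}")


# The caller no longer cares where the data physically sits:
payload = read_object("aws", "acme-landing-zone", "events/2022/07/11.json.gz")
```

With that picture in mind, back to Chuck Whitten's distinction between multi-cloud by design and multi-cloud by default.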
Meaning to date, multi-cloud has largely been a symptom of what we've called multi-vendor or of M&A, you buy a company and they happen to use Google Cloud, and so you bring it in. And when you look at most so-called, multi-cloud implementations, you see things like an on-prem stack, which is wrapped in a container and hosted on a specific cloud or increasingly a technology vendor has done the work of building a cloud native version of their stack and running it on a specific cloud. But historically, it's been a unique experience within each cloud with virtually no connection between the cloud silos. Supercloud sets out to build incremental value across clouds and above hyperscale CAPEX that goes beyond cloud compatibility within each cloud. So if you want to call it multi-cloud 2.0, that's fine, but we chose to call it supercloud. Okay, so at this point you may be asking, well isn't PaaS already a version of supercloud? And again, we would say no, that supercloud and its corresponding superPaaS layer which is a prerequisite, gives the freedom to store, process and manage, and secure, and connect islands of data across a continuum with a common experience across clouds. And the services offered are specific to that supercloud and will vary by each offering. Your OpenShift, for example, can be used to construct a superPaaS, but in and of itself, isn't a superPaaS, it's generic. A superPaaS might be developed to support, for instance, ultra low latency database work. It would unlikely again, taking the OpenShift example, it's unlikely that off-the-shelf OpenShift would be used to develop such a low latency superPaaS layer for ultra low latency database work. The point is supercloud and its inherent superPaaS will be optimized to solve specific problems like that low latency example for distributed databases or fast backup and recovery for data protection, and ransomware, or data sharing, or data governance. Highly specific use cases that the supercloud is designed to solve for. Okay, another question we often get is who has a supercloud today and who's building a supercloud, and who are the contenders? Well, most companies that consider themselves cloud players will, we believe, be building or are building superclouds. Here's a common ETR graphic that we like to show with Net Score or spending momentum on the Y axis and overlap or pervasiveness in the ETR surveys on the X axis. And we've randomly chosen a number of players that we think are in the supercloud mix, and we've included the hyperscalers because they are enablers. Now remember, this is a spectrum of maturity it's a maturity model and we've added some of those industry players that we see building superclouds like CapitalOne, Goldman Sachs, Walmart. This is in deference to Moschella's observation around The Matrix and the industry structural changes that are going on. This goes back to every company, being a software company and rather than pattern match an outdated SaaS model, we see new industry structures emerging where software and data, and tools, specific to an industry will lead the next wave of innovation and bring in new value that traditional technology companies aren't going to solve, and the hyperscalers aren't going to solve. You know, we've talked a lot about Snowflake's data cloud as an example of supercloud. After being at Snowflake Summit, we're more convinced than ever that they're headed in this direction. VMware is clearly going after cross-cloud services you know, perhaps creating a new category. 
Basically, every large company we see either pursuing supercloud initiatives or thinking about it. Dell showed project Alpine at Dell Tech World, that's a supercloud. Snowflake introducing a new application development capability based on their superPaaS, our term of course, they don't use the phrase. Mongo, Couchbase, Nutanix, Pure Storage, Veeam, CrowdStrike, Okta, Zscaler. Yeah, all of those guys. Yes, Cisco and HPE. Even though on theCUBE at HPE Discover, Fidelma Russo said on theCUBE, she wasn't a fan of cloaking mechanisms, but then we talked to HPE's Head of Storage Services, Omer Asad is clearly headed in the direction that we would consider supercloud. Again, those cross-cloud services, of course, their emphasis is connecting as well on-prem. That single experience, which traditionally has not existed with multi-cloud or hybrid. And we're seeing the emergence of companies, smaller companies like Aviatrix and Starburst, and Clumio and others that are building versions of superclouds that solve for a specific problem for their customers. Even ISVs like Adobe, ADP, we've talked to UiPath. They seem to be looking at new ways to go beyond the SaaS model and add value within their cloud ecosystem specifically, around data as part of their and their customers digital transformations. So yeah, pretty much every tech vendor with any size or momentum and new industry players are coming out of hiding, and competing. Building superclouds that look a lot like Moschella's Matrix, with machine intelligence and blockchains, and virtual realities, and gaming, all enabled by the internet and hyperscale cloud CAPEX. So it's moving fast and it's the future in our opinion. So don't get too caught up in the past or you'll be left behind. Okay, what about examples? We've given a number in the past, but let's try to be a little bit more specific. Here are a few we've selected and we're going to answer the two questions in one section here. What workloads and services will run in superclouds and what are some examples? Let's start with analytics. Our favorite example is Snowflake, it's one of the furthest along with its data cloud, in our view. It's a supercloud optimized for data sharing and governance, query performance, and security, and ecosystem enablement. When you do things inside of that data cloud, what we call a super data cloud. Again, our term, not theirs. You can do things that you could not do in a single cloud. You can't do this with Redshift, You can't do this with SQL server and they're bringing new data types now with merging analytics or at least accommodate analytics and transaction type data, and bringing open source tooling with things like Apache Iceberg. And so it ticks the boxes we laid out earlier. I would say that a company like Databricks is also in that mix doing it, coming at it from a data science perspective, trying to create that consistent experience for data scientists and data engineering across clouds. Converge databases, running transaction and analytic workloads is another example. Take a look at what Couchbase is doing with Capella and how it's enabling stretching the cloud to the edge with ARM-based platforms and optimizing for low latency across clouds, and even out to the edge. Document database workloads, look at MongoDB, a very developer-friendly platform that with the Atlas is moving toward a supercloud model running document databases very, very efficiently. How about general purpose workloads? This is where VMware comes into to play. 
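To ground the Snowflake analytics example above, the consumer side of data sharing is the part that copies and pipelines struggle to replicate: a share granted by another account simply shows up as a database you query in place, while governance stays with the provider. The sketch below uses the public snowflake-connector-python package; the account identifier, warehouse, and the PARTNER_SALES database and ORDERS table are hypothetical, and the share is assumed to have already been granted and mounted by an administrator.

```python
# Sketch: querying a dataset shared into your Snowflake account -- no copy,
# no pipeline. All names are hypothetical; the share ("PARTNER_SALES") is
# assumed to already be mounted as a database by an admin.
import snowflake.connector

conn = snowflake.connector.connect(
    account="acme-xy12345",          # hypothetical account identifier
    user="ANALYST",
    password="<secret>",             # use key-pair or SSO auth in practice
    warehouse="REPORTING_WH",
)

cur = conn.cursor()
cur.execute("""
    SELECT region, SUM(amount) AS revenue
    FROM PARTNER_SALES.PUBLIC.ORDERS     -- provider's data, queried in place
    WHERE order_date >= DATEADD(day, -30, CURRENT_DATE())
    GROUP BY region
    ORDER BY revenue DESC
""")
for region, revenue in cur.fetchall():
    print(region, revenue)
```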
Very clearly, there's a need to create a common operating environment across clouds and on-prem, and out to the edge. And I say VMware is hard at work on that. Managing and moving workloads, and balancing workloads, and being able to recover very quickly across clouds for everyday applications. Network routing, take a look at what Aviatrix is doing across clouds, industry workloads. We see CapitalOne, it announced its cost optimization platform for Snowflake, piggybacking on Snowflake supercloud or super data cloud. And in our view, it's very clearly going to go after other markets is going to test it out with Snowflake, running, optimizing on AWS and it's going to expand to other clouds as Snowflake's business and those other clouds grows. Walmart working with Microsoft to create an on-premed Azure experience that's seamless. Yes, that counts, on-prem counts. If you can create that seamless and continuous experience, identical experience from on-prem to a hyperscale cloud, we would include that as a supercloud. You know, we've written about what Goldman is doing. Again, connecting its on-prem data and software tooling, and other capabilities to AWS for scale. And we can bet dollars to donuts that Oracle will be building a supercloud in healthcare with its Cerner acquisition. Supercloud is everywhere you look. So I'm sorry, naysayers it's happening all around us. So what's next? Well, with all the industry buzz and debate about the future, John Furrier and I, have decided to host an event in Palo Alto, we're motivated and inspired to further this conversation. And we welcome all points of view, positive, negative, multi-cloud, supercloud, hypercloud, all welcome. So theCUBE on Supercloud is coming on August 9th, out of our Palo Alto studios, we'll be running a live program on the topic. We've reached out to a number of industry participants, VMware, Snowflake, Confluent, Sky High Security, Gee Rittenhouse's new company, HashiCorp, CloudFlare. We've hit up Red Hat and we expect many of these folks will be in our studios on August 9th. And we've invited a number of industry participants as well that we're excited to have on. From industry, from financial services, from healthcare, from retail, we're inviting analysts, thought leaders, investors. We're going to have more detail in the coming weeks, but for now, if you're interested, please reach out to me or John with how you think you can advance the discussion and we'll see if we can fit you in. So mark your calendars, stay tuned for more information. Okay, that's it for today. Thanks to Alex Myerson who handles production and manages the podcast for Breaking Analysis. And I want to thank Kristen Martin and Cheryl Knight, they help get the word out on social and in our newsletters. And Rob Hof is our editor in chief over at SiliconANGLE, who does a lot of editing and appreciate you posting on SiliconANGLE, Rob. Thanks to all of you. Remember, all these episodes are available as podcasts wherever you listen. All you got to do is search Breaking Analysis podcast. It publish each week on wikibon.com and siliconangle.com. You can email me directly at david.vellante@siliconangle.com or DM me @DVellante, or comment on my LinkedIn post. And please do check out ETR.ai for the best survey data. And the enterprise tech business will be at AWS NYC Summit next Tuesday, July 12th. So if you're there, please do stop by and say hello to theCUBE, it's at the Javits Center. This is Dave Vellante for theCUBE insights powered by ETR. Thanks for watching. 
And we'll see you next time on "Breaking Analysis." (bright music)
Breaking Analysis: Answering the top 10 questions about supercloud
>> From theCUBE Studios in Palo Alto and Boston, bringing you data-driven insights from theCUBE and ETR. This is "Breaking Analysis" with Dave Vallante. >> Welcome to this week's Wikibon CUBE Insights powered by ETR. As we exited the isolation economy last year, Supercloud is a term that we introduced to describe something new that was happening in the world of cloud. In this "Breaking Analysis," we address the 10 most frequently asked questions we get around Supercloud. Okay, let's review these frequently asked questions on Supercloud that we're going to try to answer today. Look at an industry that's full of hype and buzzwords. Why the hell does anyone need a new term? Aren't hyperscalers building out Superclouds? We'll try to answer why the term Supercloud connotes something different from hyperscale clouds. And we'll talk about the problems that Superclouds solve specifically, and we'll further define the critical aspects of a Supercloud architecture. We often get asked, "Isn't this just multi-cloud?" Well, we don't think so, and we'll explain why in this "Breaking Analysis." Now, in an earlier episode, we introduced the notion of super PaaS. Well, isn't a plain vanilla PaaS already a super PaaS? Again, we don't think so, and we'll explain why. Who will actually build and who are the players currently building Superclouds? What workloads and services will run on Superclouds? And eight A or number nine, what are some examples that we can share of Supercloud? And finally, we'll answer what you can expect next from us on Supercloud. Okay, let's get started. Why do we need another buzzword? Well, late last year ahead of re:Invent, we were inspired by a post from Jerry Chen called castles in the cloud. Now, in that blog post, he introduced the idea that there were submarkets emerging in cloud that presented opportunities for investors and entrepreneurs. That the cloud wasn't going to suck the hyperscalers, weren't going to suck all the value out of the industry. And so we introduced this notion of Supercloud to describe what we saw as a value layer emerging above the hyperscalers CAPEX gift, we sometimes call it. Now, it turns out that we weren't the only ones using the term, as both Cornell and MIT, have used the phrase in somewhat similar, but different contexts. The point is, something new was happening in the AWS and other ecosystems. It was more than IS and PaaS, and wasn't just SaaS running in the cloud. It was a new architecture that integrates infrastructure, platform and software as services, to solve new problems that the cloud vendors, in our view, weren't addressing by themselves. It seemed to us that the ecosystem was pursuing opportunities across clouds that went beyond conventional implementations of multi-cloud. And we felt there was a structural change going on at the industry level. The Supercloud metaphorically was highlighting. So that's the background on why we felt a new catch phrase was warranted. Love it or hate it, it's memorable and it's what we chose. Now, to that last point about structural industry transformation. Andy Rapaport is sometimes and often credited with identifying the shift from the vertically integrated IBM mainframe era to the fragmented PC microprocesor based era in his HBR article in 1991. In fact, it was David Moschella, who at the time was an IDC analyst who first introduced the concept in 1987, four years before Rapaport's article was published. 
Moschella saw that it was clear that Intel, Microsoft, Seagate and others would replace the system vendors and put that forth in a graphic that looked similar to the first two on this chart. We don't have to review the shift from IBM as the center of the industry to Wintel. That's well understood. What isn't as well known or accepted is what Moschella put out in his 2018 book called "Seeing Digital" which introduced the idea of the matrix that's shown on the right hand side of this chart. Moschella posited that new services were emerging, built on top of the internet and hyperscale clouds that would integrate other innovations and would define the next era of computing. He used the term matrix, because the conceptual depiction included, not only horizontal technology rows, like the cloud and the internet, but for the first time included connected industry verticals, the columns in this chart. Moschella pointed out that, whereas historically, industry verticals had a closed value chain or stack and ecosystem of R&D and production and manufacturing and distribution. And if you were in that industry, the expertise within that vertical generally stayed within that vertical and was critical to success. But because of digital and data, for the first time, companies were able to traverse industries jump across industries and compete because data enabled them to do that. Examples, Amazon and content, payments, groceries, Apple and payments, and content and so forth. There are many examples. Data was now this unifying enabler and this marked a change in the structure of the technology landscape. And Supercloud is meant to imply more than running in hyperscale clouds. Rather, it's the combination of multiple technologies, enabled by cloud scale with new industry participants from those verticals; financial services, and healthcare, and manufacturing, energy, media, and virtually all and any industry. Kind of an extension of every company is a software company. Basically, every company now has the opportunity to build their own cloud or Supercloud. And we'll come back to that. Let's first address what's different about Superclouds relative to hyperscale clouds. Now, this one's pretty straightforward and obvious, I think. Hyperscale clouds, they're walled gardens where they want your data in their cloud and they want to keep you there. Sure, every cloud player realizes that not all data will go to their particular cloud. So they're meeting customers where their data lives with initiatives like Amazon Outposts and Azure Arc and Google Antos. But at the end of the day, the more homogeneous they can make their environments, the better control, security, costs, and performance they can deliver. The more complex the environment, the more difficult it is to deliver on their brand promises. And, of course, the less margin that's left for them to capture. Will the hyperscalers get more serious about cross cloud services? Maybe, but they have plenty of work to do within their own clouds and within enabling their own ecosystems. They have a long way to go, a lot of runway. So let's talk about specifically, what problems Superclouds solve. We've all seen the stats from IDC or Gartner or whomever, that customers on average use more than one cloud, two clouds, three clouds, five clouds, 20 clouds. And we know these clouds operate in disconnected silos for the most part. 
And that's a problem, because each cloud requires different skills, because the development environment is different as is the operating environment. They have different APIs, different primitives, and different management tools that are optimized for each respective hyperscale cloud. Their functions and value props don't extend to their competitors' clouds for the most part. Why would they? As a result, there's friction when moving between different clouds. It's hard to share data. It's hard to move work. It's hard to secure and govern data. It's hard to enforce organizational edicts and policies across these clouds and on-prem. Supercloud is an architecture designed to create a single environment that enables management of workloads and data across clouds in an effort to take out complexity, accelerate application development, streamline operations, and share data safely, irrespective of location. It's pretty straightforward, but non-trivial, which is why I always ask a company's CEO and executives if stock buybacks and dividends will yield as much return as building out Superclouds that solve really specific and hard problems and create differential value. Okay, let's dig a bit more into the architectural aspects of Supercloud. In other words, what are the salient attributes of Supercloud? So, first and foremost, a Supercloud runs a set of specific services designed to solve a unique problem, and it can do so in more than one cloud. Superclouds leverage the underlying cloud native tooling of a hyperscale cloud, but they're optimized for a specific objective that aligns with the problem that they're trying to solve. For example, Supercloud might be optimized for lowest cost or lowest latency or sharing data or governing or securing that data or higher performance for networking, for example. But the point is, the collection of services that is being delivered is focused on a unique value proposition that is not being delivered by the hyperscalers across clouds. A Supercloud abstracts the underlying and siloed primitives of the native PaaS layer from the hyperscale cloud, and then using its own specific platform as a service tooling, creates a common experience across clouds for developers and users. And it does so in the most efficient manner, meaning it has the metadata knowledge and management capabilities that can optimize for latency, bandwidth, or recovery or data sovereignty, or whatever unique value that Supercloud is delivering for the specific use case in their domain. And a Supercloud comprises a super PaaS capability that allows ecosystem partners through APIs to add incremental value on top of the Supercloud platform to fill gaps, accelerate features, and of course, innovate. The services can be infrastructure related, they could be application services, they could be data services, security services, user services, et cetera, designed and packaged to bring unique value to customers. Again, that hyperscalers are not delivering across clouds or on premises. Okay, so another common question we get is, "Isn't that just multi-cloud?" And what we'd say to that is yeah, "Yes, but no." You can call it multi-cloud 2.0, if you want. If you want to use, it's kind of a commonly used rubric. But as Dell's Chuck Whitten proclaimed at Dell Technologies World this year, multi-cloud, by design, is different than multi-cloud by default. Meaning, to date, multi-cloud has largely been a symptom of what we've called multi-vendor or of M&A. You buy a company and they happen to use Google cloud. 
And so you bring it in. And when you look at most so-called multi-cloud implementations, you see things like an on-prem stack, which is wrapped in a container and hosted on a specific cloud. Or increasingly, a technology vendor has done the work of building a cloud native version of their stack and running it on a specific cloud. But historically, it's been a unique experience within each cloud, with virtually no connection between the cloud silos. Supercloud sets out to build incremental value across clouds and above hyperscale CAPEX that goes beyond cloud compatibility within each cloud. So, if you want to call it multi-cloud 2.0, that's fine, but we chose to call it Supercloud. Okay, so at this point you may be asking, "Well isn't PaaS already a version of Supercloud?" And again, we would say, "No." Supercloud and its corresponding super PaaS layer, which is a prerequisite, give the freedom to store, process, manage, secure and connect islands of data across a continuum with a common experience across clouds. And the services offered are specific to that Supercloud and will vary by each offering. OpenShift, for example, can be used to construct a super PaaS, but in and of itself, isn't a super PaaS, it's generic. A super PaaS might be developed to support, for instance, ultra low latency database work. Again, taking the OpenShift example, it's unlikely that off-the-shelf OpenShift would be used to develop such a low latency super PaaS layer for ultra low latency database work. The point is, Supercloud and its inherent super PaaS will be optimized to solve specific problems like that low latency example for distributed databases, or fast backup and recovery for data protection and ransomware, or data sharing or data governance. Highly specific use cases that the Supercloud is designed to solve for. Okay, another question we often get is, "Who has a Supercloud today and who's building a Supercloud and who are the contenders?" Well, most companies that consider themselves cloud players will, we believe, be building or are building Superclouds. Here's a common ETR graphic that we like to show with net score or spending momentum on the Y axis, and overlap or pervasiveness in the ETR surveys on the X axis. And we've randomly chosen a number of players that we think are in the Supercloud mix. And we've included the hyperscalers because they are enablers. Now, remember, this is a spectrum of maturity. It's a maturity model. And we've added some of those industry players that we see building Superclouds like Capital One, Goldman Sachs, Walmart. This is in deference to Moschella's observation around the matrix and the industry structural changes that are going on. This goes back to every company being a software company. And rather than pattern match an outdated SaaS model, we see new industry structures emerging where software and data and tools specific to an industry will lead the next wave of innovation and bring in new value that traditional technology companies aren't going to solve. And the hyperscalers aren't going to solve. We've talked a lot about Snowflake's data cloud as an example of Supercloud. After being at Snowflake Summit, we're more convinced than ever that they're headed in this direction. VMware is clearly going after cross cloud services, perhaps creating a new category. Basically, every large company we see is either pursuing Supercloud initiatives or thinking about it. Dell showed Project Alpine at Dell Tech World. That's a Supercloud.
Snowflake is introducing a new application development capability based on their super PaaS, our term, of course. They don't use the phrase. Mongo, Couchbase, Nutanix, Pure Storage, Veeam, CrowdStrike, Okta, Zscaler. Yeah, all of those guys. Yes, Cisco and HPE. Even though at HPE Discover, Fidelma Russo said on theCUBE that she wasn't a fan of cloaking mechanisms. (Dave laughing) But then we talked to HPE's head of storage services, Omer Asad, and he's clearly headed in the direction that we would consider Supercloud. Again, those cross cloud services, of course, their emphasis is connecting on-prem as well. That single experience, which traditionally has not existed with multi-cloud or hybrid. And we're seeing the emergence of smaller companies like Aviatrix and Starburst and Clumio and others that are building versions of Superclouds that solve for a specific problem for their customers. Even ISVs like Adobe, ADP, we've talked to UiPath. They seem to be looking at new ways to go beyond the SaaS model and add value within their cloud ecosystem, specifically around data as part of their and their customers' digital transformations. So yeah, pretty much every tech vendor with any size or momentum, and new industry players are coming out of hiding and competing, building Superclouds that look a lot like Moschella's matrix, with machine intelligence and blockchains and virtual realities and gaming, all enabled by the internet and hyperscale cloud CAPEX. So it's moving fast and it's the future in our opinion. So don't get too caught up in the past or you'll be left behind. Okay, what about examples? We've given a number in the past but let's try to be a little bit more specific. Here are a few we've selected and we're going to answer the two questions in one section here. What workloads and services will run in Superclouds and what are some examples? Let's start with analytics. Our favorite example is Snowflake. It's one of the furthest along with its data cloud, in our view. It's a Supercloud optimized for data sharing and governance, and query performance, and security, and ecosystem enablement. When you do things inside of that data cloud, what we call a super data cloud, again, our term, not theirs, you can do things that you could not do in a single cloud. You can't do this with Redshift. You can't do this with SQL Server. And they're bringing in new data types now, merging analytics, or at least accommodating analytics and transaction-type data, and bringing open source tooling with things like Apache Iceberg. And so, it ticks the boxes we laid out earlier. I would say that a company like Databricks is also in that mix, doing it, coming at it from a data science perspective, trying to create that consistent experience for data scientists and data engineering across clouds. Converged databases, running transaction and analytic workloads, are another example. Take a look at what Couchbase is doing with Capella and how it's enabling stretching the cloud to the edge with Arm-based platforms and optimizing for low latency across clouds, and even out to the edge. Document database workloads, look at MongoDB, a very developer friendly platform where Atlas is moving toward a Supercloud model, running document databases very, very efficiently. How about general purpose workloads? This is where VMware comes into play. Very clearly, there's a need to create a common operating environment across clouds and on-prem and out to the edge.
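To give a rough sense of what we mean by abstracting the underlying and siloed primitives to create that common experience, here's a toy sketch of a single object-storage interface fronting two hyperscalers. To be clear, this is purely illustrative; it is not how VMware, Snowflake, or any vendor named here actually builds its platform, the class and method names are invented, and a real super PaaS goes far beyond storage into metadata, governance, latency and cost optimization.

```python
# Toy illustration only: one interface over two clouds' object stores.
# Class names and buckets are made up; a real Supercloud does far more than this.
from abc import ABC, abstractmethod

class ObjectStore(ABC):
    """One 'common experience' regardless of which cloud holds the bytes."""

    @abstractmethod
    def put(self, key: str, data: bytes) -> None: ...

    @abstractmethod
    def get(self, key: str) -> bytes: ...

class S3Store(ObjectStore):
    def __init__(self, bucket: str):
        import boto3  # assumes boto3 is installed and AWS credentials are configured
        self._s3 = boto3.client("s3")
        self._bucket = bucket

    def put(self, key: str, data: bytes) -> None:
        self._s3.put_object(Bucket=self._bucket, Key=key, Body=data)

    def get(self, key: str) -> bytes:
        return self._s3.get_object(Bucket=self._bucket, Key=key)["Body"].read()

class GCSStore(ObjectStore):
    def __init__(self, bucket: str):
        from google.cloud import storage  # assumes google-cloud-storage is installed
        self._bucket = storage.Client().bucket(bucket)

    def put(self, key: str, data: bytes) -> None:
        self._bucket.blob(key).upload_from_string(data)

    def get(self, key: str) -> bytes:
        return self._bucket.blob(key).download_as_bytes()

# Application code sees one API; which hyperscaler sits underneath is a detail.
def copy_between_clouds(src: ObjectStore, dst: ObjectStore, key: str) -> None:
    dst.put(key, src.get(key))
```

The design point is simply that the application programs against the common layer, not against each cloud's native primitives, which is the essence of the single-experience argument made above.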
And VMware is hard at work on exactly that, managing and moving workloads and balancing workloads, and being able to recover very quickly across clouds for everyday applications. Network routing, take a look at what Aviatrix is doing across clouds. Industry workloads, we see Capital One. It announced its cost optimization platform for Snowflake, piggybacking on Snowflake's Supercloud or super data cloud. And in our view, it's very clearly going to go after other markets. It's going to test it out with Snowflake, optimizing on AWS, and it's going to expand to other clouds as Snowflake's business and those other clouds grow. Walmart is working with Microsoft to create an on-prem Azure experience that's seamless. Yes, that counts, on-prem counts. If you can create that seamless and continuous experience, identical experience from on-prem to a hyperscale cloud, we would include that as a Supercloud. We've written about what Goldman is doing. Again, connecting its on-prem data and software tooling, and other capabilities to AWS for scale. And you can bet dollars to donuts that Oracle will be building a Supercloud in healthcare with its Cerner acquisition. Supercloud is everywhere you look. So I'm sorry, naysayers, it's happening all around us. So what's next? Well, with all the industry buzz and debate about the future, John Furrier and I have decided to host an event in Palo Alto. We're motivated and inspired to further this conversation. And we welcome all points of view, positive, negative, multi-cloud, Supercloud, HyperCloud, all welcome. So theCUBE on Supercloud is coming on August 9th out of our Palo Alto studios. We'll be running a live program on the topic. We've reached out to a number of industry participants; VMware, Snowflake, Confluent, Skyhigh Security, Gee Rittenhouse's new company, HashiCorp, CloudFlare. We've hit up Red Hat and we expect many of these folks will be in our studios on August 9th. And we've invited a number of industry participants as well that we're excited to have on. From industry, from financial services, from healthcare, from retail, we're inviting analysts, thought leaders, investors. We're going to have more detail in the coming weeks, but for now, if you're interested, please reach out to me or John with how you think you can advance the discussion, and we'll see if we can fit you in. So mark your calendars, stay tuned for more information. Okay, that's it for today. Thanks to Alex Myerson who handles production and manages the podcast for "Breaking Analysis." And I want to thank Kristen Martin and Cheryl Knight. They help get the word out on social and in our newsletters. And Rob Hof is our editor in chief over at SiliconANGLE, who does a lot of editing, and appreciate you posting on SiliconANGLE, Rob. Thanks to all of you. Remember, all these episodes are available as podcasts wherever you listen. All you got to do is search "Breaking Analysis podcast." I publish each week on wikibon.com and siliconangle.com. Or you can email me directly at david.vellante@siliconangle.com. Or DM me @DVellante, or comment on my LinkedIn post. And please, do check out etr.ai for the best survey data in the enterprise tech business. We'll be at the AWS NYC Summit next Tuesday, July 12th. So if you're there, please do stop by and say hello to theCUBE. It's at the Javits Center. This is Dave Vellante for theCUBE Insights, powered by ETR. Thanks for watching. And we'll see you next time on "Breaking Analysis." (slow music)
SUMMARY :
This is "Breaking Analysis" stretching the cloud to the edge
SENTIMENT ANALYSIS :
ENTITIES
Entity | Category | Confidence |
---|---|---|
Alex Myerson | PERSON | 0.99+ |
Seagate | ORGANIZATION | 0.99+ |
1987 | DATE | 0.99+ |
Dave Vallante | PERSON | 0.99+ |
Microsoft | ORGANIZATION | 0.99+ |
Walmart | ORGANIZATION | 0.99+ |
1991 | DATE | 0.99+ |
Andy Rapaport | PERSON | 0.99+ |
Jerry Chen | PERSON | 0.99+ |
Moschella | PERSON | 0.99+ |
Oracle | ORGANIZATION | 0.99+ |
Cheryl Knight | PERSON | 0.99+ |
David Moschella | PERSON | 0.99+ |
Rob Hof | PERSON | 0.99+ |
Palo Alto | LOCATION | 0.99+ |
August 9th | DATE | 0.99+ |
Intel | ORGANIZATION | 0.99+ |
Cisco | ORGANIZATION | 0.99+ |
HPE | ORGANIZATION | 0.99+ |
Chuck Whitten | PERSON | 0.99+ |
IBM | ORGANIZATION | 0.99+ |
Goldman Sachs | ORGANIZATION | 0.99+ |
Amazon | ORGANIZATION | 0.99+ |
Fidelma Russo | PERSON | 0.99+ |
20 clouds | QUANTITY | 0.99+ |
AWS | ORGANIZATION | 0.99+ |
Wintel | ORGANIZATION | 0.99+ |
Databricks | ORGANIZATION | 0.99+ |
two questions | QUANTITY | 0.99+ |
Dell | ORGANIZATION | 0.99+ |
John Furrier | PERSON | 0.99+ |
2018 | DATE | 0.99+ |
Apple | ORGANIZATION | 0.99+ |
John | PERSON | 0.99+ |
Boston | LOCATION | 0.99+ |
Aviatrix | ORGANIZATION | 0.99+ |
Starburst | ORGANIZATION | 0.99+ |
Confluent | ORGANIZATION | 0.99+ |
five clouds | QUANTITY | 0.99+ |
Clumio | ORGANIZATION | 0.99+ |
Couchbase | ORGANIZATION | 0.99+ |
first time | QUANTITY | 0.99+ |
Nutanix | ORGANIZATION | 0.99+ |
Moschella | ORGANIZATION | 0.99+ |
Skyhigh Security | ORGANIZATION | 0.99+ |
MIT | ORGANIZATION | 0.99+ |
HashiCorp | ORGANIZATION | 0.99+ |
last year | DATE | 0.99+ |
Rob | PERSON | 0.99+ |
two clouds | QUANTITY | 0.99+ |
three clouds | QUANTITY | 0.99+ |
david.vellante@siliconangle.com | OTHER | 0.99+ |
first two | QUANTITY | 0.99+ |
Kristen Martin | PERSON | 0.99+ |
Mongo | ORGANIZATION | 0.99+ |
Gartner | ORGANIZATION | 0.99+ |
CrowdStrike | ORGANIZATION | 0.99+ |
Okta | ORGANIZATION | 0.99+ |
Pure Storage | ORGANIZATION | 0.99+ |
Omer Asad | PERSON | 0.99+ |
Capital One | ORGANIZATION | 0.99+ |
each cloud | QUANTITY | 0.99+ |
Snowflake | ORGANIZATION | 0.99+ |
Veeam | ORGANIZATION | 0.99+ |
OpenShift | TITLE | 0.99+ |
10 most frequently asked questions | QUANTITY | 0.99+ |
Rapaport | PERSON | 0.99+ |
SiliconANGLE | ORGANIZATION | 0.99+ |
CloudFlare | ORGANIZATION | 0.99+ |
one section | QUANTITY | 0.99+ |
Seeing Digital | TITLE | 0.99+ |
VMware | ORGANIZATION | 0.99+ |
IDC | ORGANIZATION | 0.99+ |
Zscaler | ORGANIZATION | 0.99+ |
each week | QUANTITY | 0.99+ |
Javits Center | LOCATION | 0.99+ |
late last year | DATE | 0.98+ |
first | QUANTITY | 0.98+ |
Adobe | ORGANIZATION | 0.98+ |
more than one cloud | QUANTITY | 0.98+ |
each offering | QUANTITY | 0.98+ |
Breaking Analysis: Snowflake Summit 2022...All About Apps & Monetization
>> From theCUBE studios in Palo Alto and Boston, bringing you data driven insights from theCUBE and ETR. This is "Breaking Analysis" with Dave Vellante. >> Snowflake Summit 2022 underscored that the ecosystem excitement which was once forming around Hadoop is being reborn, escalated and coalescing around Snowflake's data cloud. What was once seen as a simpler cloud data warehouse and good marketing with the data cloud is evolving rapidly with new workloads, a vertical industry focus, data applications, monetization, and more. The question is, will the promise of data be fulfilled this time around, or is it same wine, new bottle? Hello, and welcome to this week's Wikibon CUBE Insights powered by ETR. In this "Breaking Analysis," we'll talk about the event, the announcements that Snowflake made that are of greatest interest, the major themes of the show, what was hype and what was real, the competition, and some concerns that remain in many parts of the ecosystem and pockets of customers. First let's look at the overall event. It was held at Caesars Forum. Not my favorite venue, but I'll tell you it was packed. Fire marshal full, as we sometimes say. Nearly 10,000 people attended the event. Here's Snowflake's CMO Denise Persson on theCUBE describing how this event has evolved. >> Yeah, two, three years ago, we were about 1800 people at a Hilton in San Francisco. We had about 40 partners attending. This week we're close to 10,000 attendees here. Almost 10,000 people online as well, and over 200 partners here on the show floor. >> Now, those numbers from 2019 remind me of the early days of Hadoop World, which was put on by Cloudera, but then Cloudera handed off the event to O'Reilly, as this article that we've inserted, if you bring back that slide, would say. The headline almost got it right. Hadoop World was a failure, but it didn't have to be. Snowflake has filled the void created by O'Reilly when it first killed Hadoop World, and killed the name and then killed Strata. Now, ironically, the momentum and excitement from Hadoop's early days, it probably could have stayed with Cloudera, but the beginning of the end was when they gave the conference over to O'Reilly. We can't imagine Frank Slootman handing the keys to the kingdom to a third party. Serious business was done at this event. I'm talking substantive deals. Salespeople from the host, the sponsors and the ecosystems that support these events, they love physical. They really don't like virtual because physical belly to belly means relationship building, pipeline, and deals. And that was blatantly obvious at this show. And in fairness, that's true of all theCUBE events that we've done this year, but this one was more vibrant because of its attendance and the action in the ecosystem. Ecosystem is a hallmark of a cloud company, and that's what Snowflake is. We asked Frank Slootman on theCUBE, was this ecosystem evolution by design or did Snowflake just kind of stumble into it? Here's what he said.
Then it becomes application development, and then it becomes, hey, we're developing whole businesses on this platform. So similar to what happened to Facebook in many ways. >> So it sounds like it was maybe a little bit of both. The Facebook analogy is interesting because Facebook is a walled garden, as is Snowflake, but when you come into that garden, you have assurances that things are going to work in a very specific way because a set of standards and protocols is being enforced by a steward, i.e. Snowflake. This means things run better inside of Snowflake than if you try to do all the integration yourself. Now, maybe over time, an open source version of that will come out, but if you wait for that, you're going to be left behind. That said, Snowflake has made moves to make its platform more accommodating to open source tooling in many of its announcements this week. Now, I'm not going to do a deep dive on the announcements. Matt Sulkins from Monte Carlo wrote a decent summary of the keynotes and a number of analysts like Sanjeev Mohan, Tony Baer and others are posting some deeper analysis on these innovations, and so we'll point to those. I'll say a few things though. Unistore extends the type of data that can live in the Snowflake data cloud. It's enabled by a new feature called hybrid tables, a new table type in Snowflake. One of the big knocks against Snowflake was that it couldn't handle transaction data. Several database companies are creating this notion of a hybrid where both analytic and transactional workloads can live in the same data store. Oracle's doing this for example, with MySQL HeatWave, and there are many others. We saw Mongo earlier this month add an analytics capability to its transaction system. Mongo also added SQL, which was kind of interesting. Here's what Constellation Research analyst Doug Henschen said about Snowflake's moves into transaction data. Play the clip. >> Well with Unistore, they're reaching out and trying to bring transactional data in. Hey, don't limit this to analytical information and there's other ways to do that like CDC and streaming but they're very closely tying that again to that marketplace, with the idea of bring your data over here and you can monetize it. Don't just leave it in that transactional database. So another reach to a broader play across a big community that they're building. >> And you're also seeing Snowflake expand its workload types in its unique way and, through Snowpark and its Streamlit acquisition, enabling Python so that native apps can be built in the data cloud and benefit from all the structure and features that Snowflake has built in. Hence that Facebook analogy, or maybe the App Store, the Apple App Store, as I propose as well. Python support also widens the aperture for machine intelligence workloads. We asked Snowflake senior VP of product, Christian Kleinerman, which announcements he thought were the most impactful. And despite the who's-your-favorite-child nature of the question, he did answer. Here's what he said. >> I think the native applications is the one that looks like, eh, I don't know about it on the surface, but it has the biggest potential to change everything. That could create an entire ecosystem of solutions, within a company or across companies, that I don't know that we know what's possible. >> Snowflake also announced support for Apache Iceberg, which is a new open table format standard that's emerging.
So you're seeing Snowflake respond to these concerns about its lack of openness, and they're building optionality into their cloud. They also showed some cost optimization tools, both from Snowflake itself and from the ecosystem, notably Capital One, which launched a software business on top of Snowflake focused on optimizing cost and eventually the rollout of data management capabilities, and all kinds of features that Snowflake announced at the show around governance, cross cloud, what we call supercloud, a new security workload, and they reemphasized their ability to read non-native on-prem data into Snowflake through partnerships with Dell and Pure, and a lot more. Let's hear from some of the analysts that came on theCUBE this week at Snowflake Summit to see what they said about the announcements and their takeaways from the event. This is Dave Menninger, Sanjeev Mohan, and Tony Baer, roll the clip. >> Our research shows that the majority of organizations, the majority of people do not have access to analytics. And so a couple of the things they've announced I think address those or help to address those issues very directly. So Snowpark and support for Python and other languages is a way for organizations to embed analytics into different business processes. And so I think that'll be really beneficial to try and get analytics into more people's hands. And I also think that the native applications as part of the marketplace is another way to get applications into people's hands rather than just analytical tools. Because most people in the organization are not analysts. They're doing some line of business function. They're HR managers, they're marketing people, they're sales people, they're finance people, right? They're not sitting there mucking around in the data, they're doing a job and they need analytics in that job. >> Primarily, I think it is to counter this whole notion that once you move data into Snowflake, it's a proprietary format. So I think that's how it started, but it's hugely beneficial to the customers, to the users, because now if you have a large amount of data in Parquet files you can leave it on S3, but then, using the Apache Iceberg table format in Snowflake, you get all the benefits of Snowflake's optimizer. So for example, you get the micro partitioning, you get the metadata. And in a single query, you can join, you can do a select from a Snowflake table union a select from an Iceberg table, and you can do stored procedures, user defined functions. So I think what they've done is extremely interesting. Iceberg by itself still does not have multi-table transactional capabilities. So if I'm running a workload, I might be touching 10 different tables. So if I use Apache Iceberg in a raw format, they don't have it, but Snowflake does. So the way I see it is Snowflake is adding more and more capabilities right into the database. So for example, they've gone ahead and added security and privacy. So you can now create policies and do even cell level masking, dynamic masking, but most organizations have more than Snowflake. So what we are starting to see all around here is that there's a whole series of data catalog companies, a bunch of companies that are doing dynamic data masking, security and governance, data observability, which is not a space Snowflake has gone into. So there's a whole ecosystem of companies that is mushrooming. Although, you know, they're using the native capabilities of Snowflake, but they are at a level higher.
So if you have a data lake and a cloud data warehouse and you have other relational databases, you can run these cross platform capabilities in that layer. So that way, you know, Snowflake's done a great job of enabling that ecosystem. >> I think it's like the last mile, essentially. In other words, it's like, okay, you have folks that are very comfortable with Tableau, but you do have developers who don't want to have to shell out to a separate tool. And so this is where Snowflake is essentially working to address that constituency. To Sanjeev's point, and I think this kind of plays into it, what makes this different from the Hadoop era is the fact that all these capabilities, you know, a lot of vendors are taking it very seriously to make them native. Now, obviously Snowflake acquired Streamlit. So we can expect that the Streamlit capabilities are going to be native. >> I want to share a little bit about the higher level thinking at Snowflake, here's a chart from Frank Slootman's keynote. It's his version of the modern data stack, if you will. Now, Snowflake of course, was built on the public cloud. If there were no AWS, there would be no Snowflake. Now, they're all about bringing data and live data and expanding the types of data, including structured, we just heard about that, unstructured, geospatial, and the list is going to continue on and on. Eventually I think it's going to bleed into the edge, if we can figure out what to do with that edge data. Executing on new workloads is a big deal. They started with data sharing and they recently added security, and they've essentially created a PaaS layer. We call it a SuperPaaS layer, if you will, to attract application developers. Snowflake has a developer-focused event coming up in November and they've extended the marketplace with 1300 native apps listings. And at the top, that's the holy grail, monetization. We always talk about building data products and we saw a lot of that at this event, very, very impressive and unique. Now here's the thing. There's a lot of talk in the press, on Wall Street and in the broader community about consumption-based pricing and concerns over Snowflake's visibility and its forecast and how analytics may be discretionary. But if you're a company building apps in Snowflake and monetizing like Capital One intends to do, and you're now selling in the marketplace, that is not discretionary, unless of course your costs are greater than your revenue for that service, in which case it's going to fail anyway. But the point is we're entering a new era where data apps and data products are beginning to be built, and Snowflake is attempting to make the data cloud the de facto place where you're going to build them. In our view they're well ahead in that journey. Okay, let's talk about some of the bigger themes that we heard at the event. Bringing apps to the data instead of moving the data to the apps, this was a constant refrain and one that certainly makes sense from a physics point of view. But having a single source of data that is discoverable, sharable and governed, with increasingly robust ecosystem options, it doesn't have to be moved. Sometimes it may have to be moved if you're going across regions, but that's unique and a differentiator for Snowflake in our view. I mean, I've yet to see a data ecosystem that is as rich and growing as fast as the Snowflake ecosystem.
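To make that "bring apps to the data" theme, and Sanjeev's Iceberg example above, a little more tangible, here's a minimal sketch of the pattern he describes: one statement that touches a native Snowflake table and an Iceberg-format table whose Parquet files stay in S3. This is our own illustrative sketch, not Snowflake's documented example; the account, credentials, and table names are hypothetical.

```python
# Hypothetical sketch only: account, credentials, and table names are made up.
import snowflake.connector

conn = snowflake.connector.connect(
    account="myorg-myaccount",
    user="analyst",
    password="********",
    warehouse="ANALYTICS_WH",
    database="SALES",
    schema="PUBLIC",
)

# ORDERS is assumed to be a native Snowflake table; CLICKSTREAM_ICEBERG is
# assumed to be a table in Apache Iceberg format whose Parquet files remain in S3.
query = """
    SELECT o.customer_id, SUM(o.amount) AS revenue
    FROM orders o
    JOIN clickstream_iceberg c
      ON c.customer_id = o.customer_id
    GROUP BY o.customer_id
    ORDER BY revenue DESC
    LIMIT 10
"""

cur = conn.cursor()
try:
    cur.execute(query)
    for customer_id, revenue in cur.fetchall():
        print(customer_id, revenue)
finally:
    cur.close()
    conn.close()
```

The specific syntax isn't the point; the point is that the optimizer, micro-partitioning and metadata benefits Sanjeev mentions travel to the query rather than requiring the data to be copied first.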
Monetization, we talked about that, industry clouds, financial services, healthcare, retail, and media, all front and center at the event. My understanding is that Frank Slootman was a major force behind this shift, this development and go-to-market focus on verticals. It's really an attempt, and he talked about this in his keynote, to align with the customer mission, ultimately align with their objectives, which, not surprisingly, increasingly involve monetizing with data as a differentiating ingredient. We heard a ton about data mesh, there were numerous presentations about the topic. And I'll say this, if you map the seven pillars Snowflake talks about, Benoit Dageville talked about this in his keynote, into Zhamak Dehghani's data mesh framework and the four principles, they align better than most of the data mesh washing that I've seen. The seven pillars: all data, all workloads, global architecture, self-managed, programmable, marketplace and governance. Those are the seven pillars that he talked about in his keynote. All data, well, maybe with hybrid tables that becomes more of a reality. Global architecture means the data is globally distributed. It's not necessarily physically in one place. Self-managed is key. Self-service infrastructure is one of Zhamak's four principles. And then inherent governance. Zhamak talks about computational, what I'll call automated, governance, built in. And with all the talk about monetization, that aligns with the second principle, which is data as product. So while it's not a pure hit, and to its credit, by the way, Snowflake doesn't use data mesh in its messaging anymore. But by the way, its customers do, several customers talked about it. Geico, JPMC, and a number of other customers and partners are using the term and using it pretty closely to the concepts put forth by Zhamak Dehghani. But back to the point, they essentially, Snowflake that is, is building a proprietary system that substantially addresses some, if not many, of the goals of data mesh. Okay, back to the list, supercloud, that's our term. We saw lots of examples of clouds on top of clouds that are architected to span multiple clouds, not just run on individual clouds as separate services. And this includes Snowflake's data cloud itself, but a number of ecosystem partners that are headed in a very similar direction. Snowflake still talks about data sharing, but now it uses the term collaboration in its high level messaging, which is I think smart. Data sharing is kind of a geeky term. And also this is an attempt by Snowflake to differentiate from everyone else that's saying, hey, we do data sharing too. And finally Snowflake doesn't say data marketplace anymore. It's now marketplace, accounting for its application market. Okay, let's take a quick look at the competitive landscape via this ETR X-Y graph. The vertical axis measures net score, or spending momentum, and the x-axis is penetration, pervasiveness in the data set. That's what ETR calls overlap. Snowflake continues to lead on the vertical axis. They guided conservatively last quarter, remember, so even though that lofty height is well down from its earlier levels, I wouldn't be surprised if it ticks down again a bit in the July survey, which will be in the field shortly. Databricks is a key competitor, obviously, with strong spending momentum, as you can see. We didn't draw it here, but we usually draw that 40% line or red line at 40%, anything above that is considered elevated.
So you can see Databricks is quite elevated. But it doesn't have the market presence of Snowflake. It didn't get to IPO during the bubble and it doesn't have nearly as deep and capable go-to-market machinery. Now, they're getting better and they're getting some attention in the market, nonetheless. But Databricks is a private company, so naturally more people are aware of Snowflake. Some analysts, Tony Baer in particular, believe Mongo and Snowflake are on a bit of a collision course long term. I actually can see his point. You know, I mean, they're both platforms, they're both about data. It's a long ways off, but you can see them sort of on a similar path. They talk about kind of similar aspirations and visions, even though they're in quite different markets today, but they're definitely participating in a similar TAM. The cloud players are probably the biggest, or definitely the biggest, partners and probably the biggest competitors to Snowflake. And then there's always Oracle. Doesn't have the spending velocity of the others, but it's got strong market presence. It owns a cloud and it knows a thing about data, and it definitely is a go-to market machine. Okay, we're going to end on some of the things that we heard in the ecosystem. 'Cause look, we've heard before how a particular technology, enterprise data warehouse, data hubs, MDM, data lakes, Hadoop, et cetera, was going to solve all of our data problems, and of course they didn't. And in fact, sometimes they created more problems that allowed vendors to push more incremental technology to solve the problems that they created. Like tools and platforms to clean up the no-schema-on-write nature of data lakes or data swamps. But here are some of the things that I heard firsthand from some customers and partners. First thing is, they said to me that they're having a hard time keeping up sometimes with the pace of Snowflake. It reminds me of AWS in the 2014, 2015 timeframe. You remember that fire hose of announcements, which causes increased complexity for customers and partners. I talked to several customers that said, well, yeah, this is all well and good, but I still need skilled people to understand all these tools that I'm integrating in the ecosystem, the catalogs, the machine learning, observability. A number of customers said, I just can't use one governance tool, I need multiple governance tools and a lot of other technologies as well, and they're concerned that that's going to drive up their cost and their complexity. I heard other concerns from the ecosystem that it used to be sort of clear as to where they could add value, you know, when Snowflake was just a better data warehouse. But to point number one, they're either concerned that they'll be left behind or they're concerned that they'll be subsumed. Look, I mean, just like we tell AWS customers and partners, you got to move fast, you got to keep innovating. If you don't, you're going to be left behind. If you're a customer, you're going to be left behind by your competitor, or if you're a partner, somebody else is going to get there, or AWS is going to solve the problem for you. Okay, and there were a number of skeptical practitioners, really thoughtful and experienced data pros, that suggested that they've seen this movie before. Hence, the same wine, new bottle. Well, this time around I certainly hope not, given all the energy and investment that is going into this ecosystem. And the fact is Snowflake is unquestionably making it easier to put data to work.
They built on AWS so you didn't have to worry about provisioning, compute and storage and networking and scaling. Snowflake is optimizing its platform to take advantage of things like Graviton so you don't have to, and they're doing some of their own optimization tools. The ecosystem is building optimization tools so that's all good. And firm belief is the less expensive it is, the more data will get brought into the data cloud. And they're building a data platform on which their ecosystem can build and run data applications, aka data products without having to worry about all the hard work that needs to get done to make data discoverable, shareable, and governed. And unlike the last 10 years, you don't have to be a keeper and integrate all the animals in the Hadoop zoo. Okay, that's it for today, thanks for watching. Thanks to my colleague, Stephanie Chan who helps research "Breaking Analysis" topics. Sometimes Alex Myerson is on production and manages the podcasts. Kristin Martin and Cheryl Knight help get the word out on social and in our newsletters, and Rob Hof is our editor in chief over at Silicon, and Hailey does some wonderful editing, thanks to all. Remember, all these episodes are available as podcasts wherever you listen. All you got to do is search Breaking Analysis Podcasts. I publish each week on wikibon.com and siliconangle.com and you can email me at David.Vellante@siliconangle.com or DM me @DVellante. If you got something interesting, I'll respond. If you don't, I'm sorry I won't. Or comment on my LinkedIn post. Please check out etr.ai for the best survey data in the enterprise tech business. This is Dave Vellante for theCUBE Insights powered by ETR. Thanks for watching, and we'll see you next time. (upbeat music)
SUMMARY :
Snowflake Summit 2022 showed the ecosystem excitement once forming around Hadoop coalescing around Snowflake's data cloud, with new workloads, a vertical industry focus, native apps and monetization front and center.
SENTIMENT ANALYSIS :
ENTITIES
Entity | Category | Confidence |
---|---|---|
Frank Slootman | PERSON | 0.99+ |
Doug Henschen | PERSON | 0.99+ |
Stephanie Chan | PERSON | 0.99+ |
Christian Kleinerman | PERSON | 0.99+ |
AWS | ORGANIZATION | 0.99+ |
Dave Vellante | PERSON | 0.99+ |
Rob Hof | PERSON | 0.99+ |
Benoit Dageville | PERSON | 0.99+ |
2014 | DATE | 0.99+ |
Matt Sulkins | PERSON | 0.99+ |
JPMC | ORGANIZATION | 0.99+ |
2019 | DATE | 0.99+ |
Cheryl Knight | PERSON | 0.99+ |
Palo Alto | LOCATION | 0.99+ |
Denise Persson | PERSON | 0.99+ |
Alex Myerson | PERSON | 0.99+ |
Tony Baer | PERSON | 0.99+ |
Dave Menninger | PERSON | 0.99+ |
Dell | ORGANIZATION | 0.99+ |
July | DATE | 0.99+ |
Geico | ORGANIZATION | 0.99+ |
November | DATE | 0.99+ |
Snowflake | TITLE | 0.99+ |
40% | QUANTITY | 0.99+ |
Oracle | ORGANIZATION | 0.99+ |
App Store | TITLE | 0.99+ |
Capital One | ORGANIZATION | 0.99+ |
second principle | QUANTITY | 0.99+ |
Sanjeev Mohan | PERSON | 0.99+ |
Snowflake | ORGANIZATION | 0.99+ |
1300 native apps | QUANTITY | 0.99+ |
David.Vellante@siliconangle.com | OTHER | 0.99+ |
Kristin Martin | PERSON | 0.99+ |
Mongo | ORGANIZATION | 0.99+ |
Databricks | ORGANIZATION | 0.99+ |
Snowflake Summit 2022 | EVENT | 0.99+ |
First | QUANTITY | 0.99+ |
two | DATE | 0.99+ |
Python | TITLE | 0.99+ |
10 different tables | QUANTITY | 0.99+ |
ORGANIZATION | 0.99+ | |
ETR | ORGANIZATION | 0.99+ |
both | QUANTITY | 0.99+ |
Snowflake | EVENT | 0.98+ |
one place | QUANTITY | 0.98+ |
each week | QUANTITY | 0.98+ |
O'Reilly | ORGANIZATION | 0.98+ |
This week | DATE | 0.98+ |
Hadoop World | EVENT | 0.98+ |
this week | DATE | 0.98+ |
Pure | ORGANIZATION | 0.98+ |
about 40 partners | QUANTITY | 0.98+ |
theCUBE | ORGANIZATION | 0.98+ |
last quarter | DATE | 0.98+ |
One | QUANTITY | 0.98+ |
S3 | TITLE | 0.97+ |
Hadoop | LOCATION | 0.97+ |
single | QUANTITY | 0.97+ |
Caesars Forum | LOCATION | 0.97+ |
Iceberg | TITLE | 0.97+ |
single source | QUANTITY | 0.97+ |
Silicon | ORGANIZATION | 0.97+ |
Nearly 10,000 people | QUANTITY | 0.97+ |
Apache Iceberg | ORGANIZATION | 0.97+ |
theCUBE Insights | Snowflake Summit 2022
(upbeat music) >> Hey everyone, welcome back to theCUBE's three day coverage of Snowflake Summit 22. Lisa Martin here with Dave Vellante. We have been here as I said for three days. Dave, we have had an amazing three days. The energy, the momentum, the number of people still here speaks volumes for- >> Yeah, I was just saying, you look back, theCUBE, when it started, early days was a big part of the Hadoop ecosystem. You know Cloudera kind of got it started, the whole big data movement, it was awesome energy, and that whole ecosystem has been, I think, just hoovered into the Snowflake ecosystem. They've taken over as the data company, the data cloud, I mean, that was Cloudera, it could have been Cloudera, and now they didn't, they missed it, it was a variety of factors, but Snowflake has nailed it. And now it's theirs to lose. Benoit talked about that on our previous segment, how he knew that technically Hadoop was too complex, and was going to fail, and they didn't know it was going to do this. They were going to turn their company into what we see here. But the event itself, Lisa, is almost 10,000 people, the right people, people are doing business, we've had a number of people tell us that they're booking deals. That's why people come to face-to-face shows, right? That's the criticism of virtual. It takes too long to close business. Salespeople want to be belly-to-belly. And this is a belly-to belly-show. >> It absolutely is. When you and I were trying to get into the keynote on Tuesday, we finally got in standing room only, multiple overflow rooms, and we're even hearing that, so this is day four of the summit for them, there are still queues to get into breakout sessions. The momentum, but the appetite for this flywheel, and what they're creating, but also they're involving this massively growing ecosystem in its evolution. It's that synergy was really very much heard, and echoed throughout pretty much all of our segments the last couple days. >> Yeah, it was amazing actually. So we like to go, we want to be in the front row in the keynotes, we're taking notes, we always do that. Sometimes we listen remotely, but when you listen remotely, you miss some things. When you're there, you can see the executives, you can feel their energy, you can chit chat to them on the side, be seen, whatever. And it was crazy, we couldn't get in. So we had to do our thing, and sneak our way in, and "Hey, we're media." "Oh yeah, come on in." And then no, they were taking us to a breakout room. We had to sneak in a side door, got like the last two seats, and wow, I'm glad we were in there because it gave us a better sense. When you're in the remote watching rooms you just can't get a sense of the energy. That's why I like to be there, I know you do too. And then to your point about ecosystem. So we've said many times that what Snowflake is developing is what we call supercloud. It's not just a SaaS, it's not just a cloud database, it's a new layer that they're creating. And so what are the attributes of that layer? Well, it hides the underlying complexity of the underlying primitives of the cloud. We've said that ad nauseam, and it adds new value on top. Well, what's that value that they're adding? Well, they're adding value of being able to share data, collaborate, have data that's governed, and secure, globally. And now the other hallmark of a cloud company is ecosystem. And so they're building that ecosystem much more rapidly than we saw at ServiceNow, which is Slootman's previous company. 
And the key to me is they've launched an application development platform, essentially a super PaaS, so that you can develop applications on top of the data cloud. And we're hearing tons about monetization. Duh, you could actually make money with data. You can package data into data products, and data services, or feed data products and services, and actually sell that in a cloud, in a supercloud. That's exactly what's happening here. So that's critical. I think my one question mark if I had to lay one out, is the other hallmark of a cloud is startup, startups come into that cloud. And I think we're seeing that, maybe not at the pace that AWS did, it's a little different. Snowflake are, they're whale hunters. They're after big companies. But it looks to me like they're relying on the ecosystem to be the startup innovators. That's the important thing about cloud, cloud brings scale. It definitely brings lower cost 'cause you're eliminating all this undifferentiated labor, but it also brings innovation through startups. So unlike AWS, who sold the startups directly, and startups built businesses on AWS, and by paying AWS, it's a little bit indirect, but it's actually happening where startups in the ecosystem are building products on the data cloud, and that ultimately is going to drive value for customers, and money for Snowflake, and ultimately AWS, and Google, and Azure. The other thing I would say is the criticism or concern that the cost of goods sold for cloud are going to be so high that it's going to force people to come back on-prem. I think it's a step in the wrong direction. I think cloud, and the cloud operating model is here to stay. I think it's going to be very difficult to replicate that on-prem. I don't think you can do cloud without cloud, and we'll see what the edge brings. >> Curious what your thoughts are. We were just at Dell technologies world a month or so ago when the big announcement, the Snowflake partnership there, cloud native companies recognizing, ah, there's still a lot of data that lives on-prem. Given that, and everything that we've heard the last couple of days, what are your thoughts around that and their partnerships there? >> So Dell is, I think finally, now maybe they weren't publicly talking like this, but certainly their marketing was defensive. But in the last year or so, Dell has really embraced cloud, not just the cloud operating model, Dell has said, "Look, we can build value on top of all these hyperscalers." And we saw some examples at Dell Tech World of them stepping their toe into supercloud. Project Alpine is an example, and there are others. And then of course the Snowflake deal, where Snowflake and Dell got together, I asked Frank Slootman how that deal came about. And 'cause I said, "Did the customer get you into a headlock?" 'Cause I presume that was the case. Customer said, "You got to do this or we're not going to do business with you." He said, "Well, no, not really. Michael and I had a chat, and that's how it started." Which was my other scenario, and that's exactly what happened I guess. The point being that those worlds are coming together. And so what it means for Dell is as they embrace cloud, as they develop supercloud capabilities, they're going to do a lot of business. Dell for sure knows how to sell, they know how to execute. What I would be doing if I were Dell, is I would be trying to substantially replicate what's happening in the cloud on-prem with on-prem data. 
So what happens with that Snowflake deal is, it's read-only data, you read the data into the cloud, the compute is in the cloud. And I should've asked Thierry this, I mean Benoit. Can there be an architecture on-prem? We've seen Vertica has one, it's called Vertica Eon, where you separate compute from storage. It doesn't have unlimited elasticity, but you can grow compute and storage independently, and have a lot more. With Dell doing APEX on demand, it's cloudlike, they could begin to develop a little mini data cloud, or a big data cloud, within on-prem that connects to the public cloud. So what Snowflake is missing, a big part of their TAM that they're missing, is the on-prem. The Dell and Pure deals are forays into that, but this on-prem is massive, and Dell is the on-prem poster child. So I think again what it means for them is they've got to continue to embrace it, they got to do more in software, more in data management, they got to push on APEX. And I'd say the same thing for HPE. I think they're both well behind this in terms of ecosystems. I mean they're not even close. But they have to start, and they got to start somewhere, and they've got resources to make it happen. >> You said in your breaking analysis that you published just a few days ago before the event that Snowflake plans to create a de facto standard in data platforms. And that's what we heard from our guests on this program, and in your mainstage session with Frank Slootman. Do you still think that? >> I do. I think it more than I believed it coming in. And the reason I called it that is because I am a super fan of Zhamak Dehghani and her data mesh. And what her vision is, it's kind of the Immaculate Conception, where she wants everything to be open, open standards, and those don't exist today. And I think she perfectly realizes the practicality: de facto standards are going to get to market, and add value, sooner than open standards. Now open standards over time, and I'll come back to that, may occur, but it's clear to me what Snowflake is creating is the de facto standard for data platforms, the data cloud, the supercloud. And what's most impressive, or I think really important, is they're layering applications now on top of that. The metric to me, and I don't know if we can even count this, but VMware used to use it. For every dollar spent on a VMware license, $15 was spent in the ecosystem. It started at 1 to 1.5, 1 to 2, 1 to 10, 1 to 15, I think it went up to 1 to 30 at the max. I don't know how they counted that, but it's countable. Reasonable people can make estimates like that. And I think as the ecosystem grows, what Snowflake's doing is it's in many respects modeling the cloud, what the cloud has. Cloud has ecosystems, we talked about startups, and the cloud also has optionality. And optionality means open source. So what you saw with Apache Iceberg is we're going to extend to open technologies. What you saw with hybrid tables is we're going to extend to new workloads like transactions. The other thing about Snowflake that's really impressive is you're seeing the vertical focus. Financial services, healthcare, retail, media and entertainment. It's very rare for a company at this tenure, they're only 10 years old, to really start going vertical with their go-to-market, and building expertise around that. I think what's going to happen is the GSIs are going to come in, they love to eat at the trough, the trough here is maybe not big enough for them yet, but it will be.
And they're going to start to align with the GSIs, and they're going to do really well within those industries, connecting people, collaborating with data. But I think it's a killer strategy, but they're executing on it. >> Right, and we heard a lot of great customer stories from all of those four verticals that you talked about, and then some, that that direction and that pivot from a customer perspective, from a sales and marketing perspective is all aligned. And that was kind of one of the themes as well that Frank talked about in his keynote is mission alignment, mission alignment with customers, but also with the ecosystem. And I feel that I heard that with every customer conversation, with every partner conversation, and Snowflake conversation that we had over the last I think 36 segments, Dave. >> Yeah, I mean, yeah, it's the power of many versus the resources of one. And even though Snowflake tell you they have $5 billion in cash, and assets on the balance sheet, and that's fine, that's nothing compared to what an ecosystem has. And Amazon's part of that ecosystem. Azure is part of that ecosystem. Google is part of that ecosystem. Those companies have huge resources, and Snowflake it seems has figured out how to tap those resources, and build value on top of it. To me they're doing a better job than a lot of the cloud databases out there. They don't necessarily have a better database, in fact, I could argue that their database is less functional. And I would argue that actually in many cases. Their database is less functional if you just want a database. But if you want a data cloud, and an ecosystem, and develop applications on top of that, and to be able to monetize, that's unique, and that is a moat that they're building that is highly differentiable, and being able to do that relatively easily. I mean, I think they overstate the simplicity with which that is being done. We talked to some customers who said, he didn't say same wine, new bottle. I did ask him that, about Hadoop complexity. And he said, "No, it's not that bad." But you still got to put this stuff together. And I think in the early parts of a market that are immature, people get really excited because it's so much easier than what was previous. So my other question is, okay, what's somebody working on now, that's looking at what Snowflake's doing and saying, I can improve on that. And what's going to be really interesting to see is, can they improve on it in a way, and can they raise enough capital such that they can disrupt, or is Snowflake going to keep staying paranoid, 'cause they got good leaders, and keep executing? And then I think the other wild card is edge. Snowflake doesn't really have an edge strategy right now. I think they will develop one. >> Through the ecosystem? >> And I don't think they're missing the boat, and they'll do it through the ecosystem, exactly. I don't think they're missing the boat, I think they're just like, "Well, we don't know what to do today." It's all distributed data, and it's ephemeral, and nobody's storing the data. You know anything that comes back to the cloud, we get. But new architectures are emerging on the edge that are going to bring new economics. There's new silicon, you see what's happening with Apple, and the M1, the M1 Ultra, and the new systems that they've just developed. What Tesla is doing with custom silicon, and amazing things, and programmability of the arm model. So it's early days, but semiconductors are the mainspring of innovation in this industry. 
Without chips, you got nothing. And when you get innovations in silicon, it drives innovations in software, because developers go, "Wow, I can do that now?" I can do things in parallel, I can do things faster, I can do things more simply, and programmable at scale. So that's happening. And that's going to bring a new set of economics that, the premise is, will eventually bleed into the data center. It will, it always does. And I guess the other thing is every 15 years or so, the world gets disrupted, the tech world. We're about 15, 16 years in now to the cloud. So at this point, everybody's like, "Wow this is insurmountable, this is all we'll ever see. Everything that's ever been invented, this is the model of the future." We know that's not the case. I don't know how it's going to get disrupted, but I think edge is going to be part of that. It could be public policy. Governments could come in and take big tech on, seems like Lina Khan wants to do that. So that's what makes this industry so fun. >> Never a dull moment, Dave. This has been a great three days hosting this show with you. We've uncovered a lot. Your breaking analysis was great to get me prepared for the show. If you haven't seen it, check it out on siliconangle.com. Thanks, Dave, I appreciate all of your insights. >> Thank you, Lisa, it's been a pleasure working with you. >> Always good to work with you. >> Awesome, great job. >> Likewise. Great job to the team. >> Yes, thank you to our awesome production team. They've kept us going for three days. >> Yes, and the team back, Kristin, and Cheryl, and everybody back at the office. >> Exactly, it takes a village. For Dave Vellante, I am Lisa Martin. We are wrappin' up three days of wall-to-wall coverage at Snowflake Summit 22 from Vegas. Thanks for watching guys, we'll see you soon. (upbeat music)
SUMMARY :
Dave Vellante and Lisa Martin wrap up three days of coverage at Snowflake Summit 22, on the ecosystem, supercloud, the competitive landscape, and what comes next at the edge.
Benoit Dageville, Snowflake | Snowflake Summit 2022
(upbeat music) >> Welcome back everyone, theCUBE's three days of wall to wall coverage of Snowflake Summit '22 is coming to an end, but Dave Vellante and I, Lisa Martin, are so pleased to have our final guest as none other than the co-founder and president of products at Snowflake, Benoit Dageville. Benoit, thank you so much for joining us on the program. Welcome. >> Thank you. Thank you, thank you. >> So this is day four, 'cause you guys started on Monday. This is Thursday. The amount of people that are still here speaks volumes. We've had close to 10,000 people here. >> Yeah. >> Could you ever have imagined back in the day, 10 years ago, that it would come to something like this in such a short period of time? >> Absolutely not. And I always say if I had imagined that, I might not have started Snowflake, right. This is somehow scary. I mean, and yeah, it's huge. And you can feel the excitement of everyone. It is like mind boggling, and the fact that so many people are still there after four days is great. >> Your keynote on Tuesday was fantastic. Your energy was off the charts. It was standing room only. There were overflow rooms. Like we just mentioned, a lot of people are still here. Talk about the evolution of Snowflake, this week's announcements and what it means for the future of the data cloud. >> Yeah, so evolution, I mean, I will start with the evolution. It's true that what we have announced this week is not where we started, necessarily. So we started really very quickly with big data combined with data warehouse as one thing. We saw that the world was moving into fragmented, siloed data, and we thought with Thierry, we are going to combine big data and data warehouse in one system for the cloud, with this elasticity and this service simplicity. So simplicity, amazing elasticity, which is this multi workload architecture that I was explaining during the keynotes, and really extreme simplicity with the service. Then we realized that there is one other attribute in the cloud, which is unique, which doesn't exist on-premise, which is collaboration. How you can connect different tenants of the platform together. And Google showed that with Google Docs. I always say, to me, it was amazing that you could share a document and have direct access to a document that you didn't produce, and you can collaborate on this document. So we wanted to do the same thing for data, and this is where we created the data cloud and the marketplace where you can have all these data sets available. And really the next evolution I would say is really about applications that are (indistinct) by that data, but are way simpler to use for all the tenants of the data cloud. And this is the way you can share expertise also, including ML models. Everyone talks about ML and the democratization of ML. How are you going to democratize ML? It's not necessarily by making training super easy, such that everyone can train their ML for themselves. It's by having very specialized applications where data and ML is at the core, which are shared through the marketplace and can be leveraged by many tenants of this marketplace that have no necessary knowledge about building these ML models. So that's where, yeah. >> When you and Thierry started the company, I go back to the improbable rise of Kubernetes and there were other more sophisticated container management systems back then, but they chose to focus on simplicity. And you've told me before, that was our main tenet. We are not going to worry about all the complex database stuff.
You knew how to do that, but you chose not to. So my question is, did you envision solving those complex problems over time yourselves or through an ecosystem? Was this by design or did you... As you started to get into it, say let's not even try to go there let's partner to go there. >> Yeah, I mean, it's both. It's a combination of both. Snowflake, the simplicity of the platform is really important because if our partners are struggling to put their solution and build solution on top of Snowflake they will not build it. So it's very important that number one, our platform is really easy to use from day one. And that really has to be built inside the platform. You cannot build simplicity on top. You cannot have a complex solution and all of a sudden realize that, oh, this is complex. I need to build another layer on top of it to make it simpler, that will not work. So it had to be built from day one, but you're right. What is going to be Snowflake? I always say in 10 years from now, we just turn 10 years old or we are going to turn 10 years old in few months. Actually a few months, yes. >> Right. >> So for the next 10 years I really believe that most of Snowflake will not be built by Snowflake. And that's the power of the partners and these applications. When you are going to say I'm using Snowflake, actually, probably you are not going to use directly code developed by Snowflake. That code will leverage our platform, but you will use a solution that has been built on top of Snowflake. And this is the way we are going to decouple, the effort of Snowflake and multiply it. >> It's an interesting balance, isn't it? When I think of what you did with Apache Iceberg, if I use Iceberg and I'm not going to get as much functionality, but I may want that openness, but I'm going to get more functionality inside of the data cloud. And I don't know, but if you know the answer to what's going to happen. >> No, that's a super good question. So to explain what we did with Apache Iceberg, and the fact that now it's a native format for us. So everything that you can do with our internal formats, you can do it with Apache Iceberg, including security, defining masking, data masking all the governors that we have, fine grain security aspects, the replications you can define you can use (indistinct) on top of... >> But there's a but, right? But if I do that with native Snowflake tools, I'm going to get an even greater advantage, am I not? >> Yes. So that's what I'm saying. So that's why we embraced Iceberg, because I think we can bring all the benefit of Snowflake to people who have decided to use Iceberg, I mean open formats. Iceberg is a table format. So and why it was important because people had massive investments in open source in Hadoop. And we had a lot of companies saying, we love Snowflake. We want to be a Snowflake customer, but we cannot really migrate all our data. I mean, it will be really costly. And we have a lot of tools that need access, direct access. So this is why we created Iceberg because we can really... I mean, we really think that we can bring the benefit of Snowflake to this data. >> Gives customers optionality. Okay. I use this term super cloud. You don't use the term, but that's okay. And I get a lot of heat for it. But to me, what you're doing is quite a bit different than multicloud because you're creating that abstraction layer. You're bringing value above it. My question to you is, the most of the heat I get is, oh, that's just SaaS. Are you just SaaS? >> No. 
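As a rough illustration of the external-object idea Benoit describes earlier in this exchange, the sketch below uses the Python connector to project Parquet files sitting in a data lake as a table and query them in place. The account, stage, bucket, and column names are invented, and since the native Iceberg tables he mentions were still in preview at the time, this shows the long-standing external-table path rather than the Iceberg feature as it shipped.

```python
import snowflake.connector

conn = snowflake.connector.connect(
    account="my_org-my_account",   # hypothetical account identifier
    user="analyst",
    password="...",
    warehouse="ANALYTICS_WH",
    database="LAKE_DB",
    schema="PUBLIC",
)
cur = conn.cursor()

# Point a stage at the existing lake files; no data is copied or moved.
cur.execute("""
    CREATE STAGE IF NOT EXISTS lake_stage
      URL = 's3://example-bucket/events/'
      STORAGE_INTEGRATION = lake_int   -- assumes an integration already exists
""")

# Project a table over the files; each Parquet row arrives in the VALUE column.
cur.execute("""
    CREATE EXTERNAL TABLE IF NOT EXISTS events_ext
      WITH LOCATION = @lake_stage
      AUTO_REFRESH = FALSE
      FILE_FORMAT = (TYPE = PARQUET)
""")

# Query it with the same SQL, and the same governance, as a native table.
cur.execute("""
    SELECT value:customer_id::STRING AS customer_id, COUNT(*) AS events
    FROM events_ext
    GROUP BY 1
    ORDER BY events DESC
    LIMIT 10
""")
print(cur.fetchall())
```

The trade-off the two of them are circling is exactly this: the files stay open and readable by other engines, while queries routed through Snowflake pick up its security and performance machinery.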
I mean, no, absolutely not. I mean, you're right we are a super cloud. I mean it's a much better word than saying we are multicloud. Multicloud is often viewed as oh, I have my system and now I can run this system in the different cloud providers. Snowflake is different. We have one single platform for the world, which happens to have some regions are AWS region, some regions are Azure, some regions are GCP, Google and we merge them together. We have this Snowgrid technology that connects all our regions together so that we have really one platform for the world. And that's very important because when you talk about connections of data and expertise applications you want to have global reach, right. It doesn't exist. We are not siloed by region of the world, right? You have a lot of companies which are multinational that have presence everywhere. And you want to have this global reach. The world is not a independent set of regions and countries, right. And that's the realization. So we had to create this global platform for our customers. >> And now you have people building clouds on top of your data cloud, well that to me is the next signal. In your keynote, you talked about seven pillars, all data, all workloads, global architecture, self-managed, programmable, marketplace, governance, which ones are the most important? >> All of them. It's like when you have kids, you don't want to pick and say, this one is my preferred one, so they are really important. All of them, as I said without data, there is no Snowflake, right? So all data is so important that we can reach every data, wherever it is. And Iceberg is a part of that, but all workload is really important because you don't want to put your data in one platform, if you cannot run all your workloads and workloads are much broader than just data warehousing, there is data engineering, data science, ML engineering, (indistinct) all these workloads applications. So that's critical. Programmable is where we are moving, right. We want to be the place where data applications are built. And we think we have a lot of advantages because data application needs to use many workloads at once, right? It's not that that application will do only data warehousing, they need to store their states, they need to use this new workload that we define, which is Unistore. They need to do data engineering because they need to get data, right. They have to save this data. So they need to combine many workload and if they have to stitch this workload, because the platform was not designed as one single product where everything is consistent and works together, that you have to stitch, it's complicated for this application to make it work. So Snowflake is we believe an ideal platform to run these data applications. So all workloads, programmable, obviously, so that you can program. And programmable has two aspects, which is big part of our announcement. Is both data programmability, which is running Python against petabyte, terabytes of data at scale and doing it scale out. So that's what we call data programmability. So both Java, Python and (indistinct), but also running applications like UI. And we had this acquisition of Streamlit. Streamlit now has been fully integrated in Snowflake. We announced that such that not only you can have this data programmability, but you can expose your data through this nice UIs, interactive UI to business users potentially. So it goes all the way there. Global is super important. 
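A sketch of the programmability pillar Benoit just described, using Snowpark for Python, which was in preview around this event. The connection parameters, table, and column names are invented; the point is that the dataframe operations are compiled to SQL and executed scale-out inside Snowflake's governed engine, so the data is never pulled out to the client.

```python
from snowflake.snowpark import Session
from snowflake.snowpark.functions import col, sum as sum_

connection_parameters = {
    "account": "my_org-my_account",  # hypothetical
    "user": "data_engineer",
    "password": "...",
    "warehouse": "ETL_WH",
    "database": "SALES_DB",
    "schema": "PUBLIC",
}
session = Session.builder.configs(connection_parameters).create()

# Lazily evaluated dataframe; nothing runs until an action is called.
orders = session.table("ORDERS")
daily_revenue = (
    orders.filter(col("STATUS") == "SHIPPED")
          .group_by("ORDER_DATE")
          .agg(sum_("AMOUNT").alias("REVENUE"))
          .sort(col("ORDER_DATE"))
)

# The aggregation runs on the warehouse; only the results come back.
daily_revenue.show(10)

# Results can be written back without ever leaving the platform.
daily_revenue.write.save_as_table("DAILY_REVENUE", mode="overwrite")
```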
As we say, we want to be one platform for the world. And of course, as I said, the last pillar, which is somehow critical for us, because we are cloud, we need to have governance. We need to have security of our data. And why it took us so long to do Python is not because it's out to run Python, right? Everyone can run Python it's because we had to secure it. And I talk about it creating this amazing sandboxing technology, such that when you include third party libraries and third party codes, you are guaranteed that this third party code will not reach to infiltrate your data, right. We control the environment that Snowflake provides. >> Can you share us some of the feedback from the customer? You probably had many customer conversations over the last four days. >> Look at that smile. (interviewer laughing) (Lisa laughing) >> Actually not because I was so busy everywhere. Unfortunately, I didn't speak to many customers. Saying that, I had everyone stopping me and talking about what they heard and yeah, there is a huge excitement about all of this. >> What's been the feedback around the theme of the event? The world of data collaboration. Data collaboration is so critical as every company these days must be a data company to compete, to win. What's been from just some of the feedback that you've had customers really embracing data collaboration, what Snowflake is enabling. >> Yeah. I mean, almost every company which is using Snowflake, is collaborating with data. You have heard, the number of stable edges that we have, and there is a real need for that because your data alone... You cannot make sense of your data if it is just alone. It needs to be connected with other data. You haven't not generated. So all data, when you say the first pillar of Snowflake is all data is not only about your data, but is about all the data that's created around you. That puts perspective on your own data. And that's critical and it's so painful to get. I mean, even your data is difficult to have access to your data, but imagine data that you didn't produce. And so yes, so the data collaboration is critical, and then now we expanded it to application and expertise, sharing models, for example, That's going to have a huge impact. >> All data includes now transaction data, right? >> Yes. >> That's a big part of the announcements that you guys made. >> Yeah. So and that's the motivation for that was really, if we want to run application, full application, we announced native applications, which are fully executed and run inside the (indistinct) data cloud, right. They need all the services that application need and in particular managing their states. And so we created Unistore, which is a new workload, which allows you to combine transactional data, which are generated by this application. And at the same time being able to do analytics directly on this data. So we call it Hybrid Table because it has this hybrid aspect. You can do both transactional access to this data and at the same time analytic here without having data pipeline and moving data and transforming it from the transactional system to the analytical system, right. Snowflake is one system. Again, in the spirit of simplifying everything, this is the Snowflake (indistinct). >> I can ask the same question I ask at first, (indistinct) when was the aha moment that you and Thierry had that said, this is not just a better data warehouse, it's actually more than that. 
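To make the Unistore point above concrete, here is a minimal sketch of a hybrid table taking an application's transactional writes and the analytical reads on the same object, with no pipeline in between. Hybrid tables were announced at this summit and were still in private preview, so the DDL is illustrative, and the account and table names are made up.

```python
import snowflake.connector

conn = snowflake.connector.connect(
    account="my_org-my_account",  # hypothetical
    user="app_service",
    password="...",
    warehouse="APP_WH",
    database="APP_DB",
    schema="PUBLIC",
)
cur = conn.cursor()

# Transactional side: row-oriented storage with an enforced primary key.
cur.execute("""
    CREATE HYBRID TABLE IF NOT EXISTS orders (
        order_id    INT PRIMARY KEY,
        customer_id INT,
        status      STRING,
        amount      NUMBER(10, 2)
    )
""")

# The application writes single rows as ordinary transactions...
cur.execute("INSERT INTO orders (order_id, customer_id, status, amount) "
            "VALUES (1001, 42, 'OPEN', 99.50)")
cur.execute("UPDATE orders SET status = 'SHIPPED' WHERE order_id = 1001")

# ...and analytics runs directly against the same table, no ETL hop.
cur.execute("""
    SELECT status, COUNT(*) AS orders, SUM(amount) AS revenue
    FROM orders
    GROUP BY status
""")
print(cur.fetchall())
```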
You probably didn't call it a data cloud until later on, but did you know that from the beginning or was that something you kind of stumbled into? >> No. So as I said, we founded Snowflake in 2012 and Thierry and I, we locked in my apartment and we were doing the blueprint of Snowflake and trying to find what is the revolution with the cloud for this data warehouse system and analytical system, both big data and data warehouse. And the aha moment was but of course cloud, okay. What is cloud? It's elasticity, it's service and later collaboration. So in the elasticity aspect, when you ask database people, what is elasticity, they will tell you, oh, you have a cluster of nodes. Like if it is Oracle, it would be a (indistinct) cluster. And the elasticities that you can add one node, two node to this cluster without having too much impact on the existing workload, because you need to shuffle data, right. It's hard and doing it online, right, that's elasticity. If you can do that, you are elastic. We thought that that was not very interesting to do that. What is interesting with elasticity is to plug new workloads. You can plug a workload like that and that workload is running without having any impact on other workloads, which are running on the platform. So elasticity for us was having dedicated computer resources to workloads. And these computer resources could start and be part as soon as the workload starts and will shut down when the workload finishes and they will be sized exactly for the demand of that workload. And we thought the aha moment was, okay if we can do that, now we can run a workload with, let's say 10X more computer resources than what you would have used or 100X more. Okay, let's say 100X more because we paralyzed things. Now this workload can run 100X faster, right? That's assuming we do a good job in the scale, which is our IP. And if we can do that, now the computer resources that you have used, you have used them for 100 times less. So you have used 100 times more resources because you have more nodes, but because you go fast, you use them for less time, right? So if you multiply the two it's constant. So you can run and accelerate workload dramatically 10X, 100X for the same price. Even if we are not better in efficiency than competition, just having that was the magic, right? >> You know how Google founders originally had trouble raising money because who needs another search engine? Did you get from original, like when you started going to raise money, Amazon's got a database, so who needs another cloud database? Did you get that early on or was it just obvious Speiser and companies as well. >> Speiser is a little bit on the crazy side and ambitious and so Speiser is Speiser. And of course he had no doubt, but even him was saying Benoit, Thierry, Hadoop, right. Everyone is saying Hadoop is going to be the revolution. And you guys are betting actually against Hadoop because we told Speiser, Hadoop is a bad system, it's going to fail, but at the time everyone was so bullish about Hadoop, everyone was implementing Hadoop that it didn't look like it was going to fail and we were probably wrong. So there was a lot of skepticism about not leveraging Hadoop and not being an Hadoop. Okay, something being on top of Hadoop. That was number one. There was no cloud warehouse at the time we started. Redshift was not started. It was the pioneer somewhere when Snowflake was founded. So creating a data warehouse in the cloud sounded crazy to people. 
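Benoit's elasticity argument, dedicated compute per workload plus the "same cost, 100X faster" arithmetic, can be sketched as follows; the warehouse names and sizes are invented for illustration.

```python
import snowflake.connector

conn = snowflake.connector.connect(account="my_org-my_account", user="admin", password="...")
cur = conn.cursor()

# Dedicated compute per workload, so they never contend with each other and
# each one suspends when its workload finishes.
cur.execute("""
    CREATE WAREHOUSE IF NOT EXISTS BI_WH
      WAREHOUSE_SIZE = 'XSMALL' AUTO_SUSPEND = 60 AUTO_RESUME = TRUE
""")
cur.execute("""
    CREATE WAREHOUSE IF NOT EXISTS ETL_WH
      WAREHOUSE_SIZE = 'XLARGE' AUTO_SUSPEND = 60 AUTO_RESUME = TRUE
""")

# The "same price, 100X faster" intuition, in round numbers:
# cost scales with nodes * hours, so 1 node for 100 hours costs about the
# same as 100 nodes for 1 hour -- provided the work parallelizes well,
# which is the part Benoit credits to Snowflake's engine.
small_and_slow = 1 * 100   # node-hours
big_and_fast = 100 * 1     # node-hours
assert small_and_slow == big_and_fast
```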
How am I going to move my data over there? And security, and what about security, the cloud is not secure. So that was another... >> So you guys predated that ParAccel move by... >> Yes. >> Okay, so that's interesting. And I thought when Redshift... I mean, Amazon announced Redshift, I was sure that Mike Speiser would come and say, guys, it's too bad, but they beat you guys and they built something, and actually it was the reverse. Mike Speiser was super excited, and so it was interesting to me. >> Wow, that's amazing. 'Cause John Furrier and I, we were early with theCUBE. When theCUBE started, it was like the beginning of Hadoop. And so we brought theCUBE to, I think it was the second Hadoop World, and we were rubbing nickels together at the time. And I was so excited to bring compute to storage, and it made so much sense. But I remember, and I won't say who it was, but an early Hadoop committer told me this is going to fail. And I'm like, what? And he started going, HBase is crap, and all this stuff. And I was sad because I was so excited, but it turned out that you had the same (indistinct). >> Because of complexity. Okay, Hadoop failed for two reasons. One is because they decided that, oh, a lot of this database thing, you don't need transactions, you don't need SQL, you don't necessarily, you don't need to go fast. It'll be batch; no real-time interaction with data, no one needs that. >> Cheap storage. >> So a lot of compromise on very important technology. And at the same time, extreme complexity, and the complexity for me was where I knew that it was going to fail big time, and we bet Snowflake on the failure of Hadoop indeed. >> And there was no cloud early on in Hadoop. >> And there was no cloud too. >> And that was what killed it. That was like... >> You're right. And the model that Hadoop had for data didn't work on block storage. Block storage is not as efficient as HDFS. So that was also another factor. >> Do you ever sit back and think about... So you think about how much money has poured in to separating compute from storage and cloud databases, and you started it all. (interviewer laughing) >> Yeah. No, this is... >> Pretty amazing. >> Yeah. >> Right, so that's good. That means that you're onto a good idea, but a lot of people get confused that again, they think that you're a cloud data warehouse and you're not, I mean, you're much more than that. >> Yeah, I hate that. I have to say, because from day one we were not a cloud data warehouse. As I said, it was all about combining the big data, massive amounts of unstructured data, petabytes stored as files. Okay, that's very important, stored as files, where it's very easy to drop data in the system without... Very low cost to combine with data warehouse, full multi-statement transactions. When people tell you today, oh, now we are a data warehouse, they don't have multi-statement transactions, right. So we had, from day one, multi-statement transactions and really efficient SQL. You could run your dashboard. So combining these two worlds was, I think, the crazy thing, that's the crazy innovation that Snowflake did initially. >> Yeah. >> And I know it's really easy to build a data warehouse somewhere, because if you don't think about big data, petabytes, extremely structured data, you remove a lot of complexity. >> This is why, Lisa, when you get excited about technology, but you always have to have a, somebody who really deeply understands technology to stink test it, all right, so awesome. Thank you for sharing that story. >> Yeah. >> Fantastic.
So over 5,900 customers now. I saw over 500 in the Forbes G2K, over almost 10,000 people here this year. If we think back to 2019, there was about what? Less than 2000 people. >> Yeah. >> What do you think is going to happen next year? >> I don't know. I don't like to think about next year. I mean, I always say, Snowflake is so exciting to me because it is like a TV show, right. Where you wait the next season and we have one season every year. So I'm really excited to know what is going to happen next year. And I don't want to project what I think will happen, but all these movements to the Snowflake being the platform for data application. I want to see what people are going to build on our platform. I mean, that's the excitement. >> Season 11 coming up. >> Yes. Season 11. Yes. >> No binge watching here. Benoit, it's been a pleasure to have you on the program. >> Thank you. >> Congratulations on incredible success, the momentum, the energy is contagious. We love it. (Benoit laughing) >> Thank you so much. >> Thank you. >> Bye bye. >> For Benoit Dageville and Dave Vellante, I'm Lisa Martin. You're watching theCUBE's coverage of Snowflake Summit '22. Dave and I will be right back with a wrap. (upbeat music)
Frank Slootman, Snowflake | Snowflake Summit 2022
>>Hi, everybody. Welcome back to Caesars in Las Vegas. My name is Dave ante. We're here with the chairman and CEO of snowflake, Frank Luman. Good to see you again, Frank. Thanks for coming on. Yeah, >>You, you as well, Dave. Good to be with you. >>No, it's, it's awesome to be, obviously everybody's excited to be back. You mentioned that in your, in your keynote, the most amazing thing to me is the progression of what we're seeing here in the ecosystem and of your data cloud. Um, you wrote a book, the rise of the data cloud, and it was very cogent. You talked about network effects, but now you've executed on that. I call it the super cloud. You have AWS, you know, I use that term, AWS. You're building on top of that. And now you have customers building on top of your cloud. So there's these layers of value that's unique in the industry. Was this by design >>Or, well, you know, when you, uh, are a data clouding, you have data, people wanna do things, you know, with that data, they don't want to just, you know, run data operations, populate dashboards, you know, run reports pretty soon. They want to build applications and after they build applications, they wanna build businesses on it. So it goes on and on and on. So it, it drives your development to enable more and more functionality on that data cloud. Didn't start out that way. You know, we were very, very much focused on data operations, then it becomes application development and then it becomes, Hey, we're developing whole businesses on this platform. So similar to what happened to Facebook in many, in many ways, you know, >>There was some confusion I think, and there still is in the community of, particularly on wall street, about your quarter, your con the consumption model I loved on the earnings call. One of the analysts asked Mike, you know, do you ever consider going to a subscription model? And Mike got cut him off, then let finish. No, that would really defeat the purpose. Um, and so there's also a narrative around, well, maybe snowflake, consumption's easier to dial down. Maybe it's more discretionary, but I, I, I would say this, that if you're building apps on top of snowflake and you're actually monetizing, which is a big theme here, now, your revenue is aligned, you know, with those cloud costs. And so unless you're selling it for more, you know, than it costs more than, than you're selling it for, you're gonna dial that up. And that is the future of, I see this ecosystem in your company. Is that, is that fair? You buy that. >>Yeah, it, it is fair. Obviously the public cloud runs on a consumption model. So, you know, you start looking all the layers of the stack, um, you know, snowflake, you know, we have to be a consumption model because we run on top of other people's, uh, consumption models. Otherwise you don't have alignment. I mean, we have conversations, uh, with people that build on snowflake, um, you know, they have trouble, you know, with their financial model because they're not running a consumption model. So it's like square pack around hole. So we all have to align ourselves. So that's when they pay a dollar, you know, a portion goes to, let's say, AWS portion goes to the snowflake of that dollar. And the portion goes to whatever the uplift is, application value, data value, whatever it is to that goes on top of that. So the whole dollar, you know, gets allocated depending on whose value at it. Um, we're talking about. >>Yeah, but you sell value. Um, so you're not a SaaS company. 
Uh, at least I don't look at you that that way I I've always felt like the SAS pricing model is flawed because it's not aligned with customers. Right. If you, if you get stuck with orphaned licenses too bad, you know, pay us. >>Yeah. We're, we're, we're obviously a SaaS model in the sense that it is software as a service, but it's not a SaaS model in the sense that we don't sell use rights. Right. And that's the big difference. I mean, when you buy, you know, so many users from, you know, Salesforce and ServiceNow or whoever you have just purchased the right, you know, for so many users to use that software for this period of time, and the revenue gets recognized, you know, radically, you know, one month at a time, the same amount. Now we're not that different because we still do a contract the exact same way as SA vendor does it, but we don't recognize the revenue radically. We recognize the revenue based on the consumption, but over the term of the contract, we recognize the entire amount. It just is not neatly organized in these monthly buckets. >>You know? So what happens if they underspend one quarter, they have to catch up by the end of the, the term, is that how it works or is that a negotiation or it's >>The, the, the spending is a totally, totally separate from the consumption itself, you know, because you know how they pay for the contract. Let's say they do a three year contract. Um, you know, they, they will probably pay for that, you know, on an annual basis, you know, that three year contract. Um, but it's how they recognize their expenses for snowflake and how we recognize the revenue is based on what they actually consume. But it's not like you're on demand where you can just decide to not use it. And then I don't have any cost, but over the three year period, you know, all of that, you know, uh, needs to get consumed or they expire. And that's the same way with Amazon. If I don't consume what I buy from Amazon, I still gotta pay for it. You know, so, >>Well, you're right. Well, I guess you could buy by the drink, but it's way, way more expensive and nobody really correct. Does that, so, yep. Okay. Phase one, better simpler, you know, cloud enterprise data warehouse, phase two, you introduced the, the data cloud and, and now we're seeing the rise of the data cloud. What, what does phase three look like >>Now? Phase, phase three is all about applications. Um, and we've just learned, uh, you know, from the beginning that people were trying to do this, but we weren't instrumental at all to do it. So people would ODBC, you know, JDBC drivers just uses as database, right? So the entire application would happen outside, you know, snowflake, we're just a database. You connect to the database, you know, you read or right data, you know, you do data, data manipulations. And then the application, uh, processing all happens outside of snowflake. Now there's issues with that because we start to exfil trade data, meaning that we started to take data out of snowflake and, and put it, uh, in other places. Now there's risk for that. There's operational risk, there's governance, exposure, security issues, you know, all this kind of stuff. And the other problem is, you know, data gets Reed. >>It proliferates. And then, you know, data science tests are like, well, I, I need that data to stay in one place. That's the whole idea behind the data cloud. You know, we have very big infrastructure clouds. 
We have very big application clouds and then data, you know, sort of became the victim there and became more proliferated and more segment. And it's ever been. So all we do is just send data to the work all day. And we said, no, we're gonna enable the work to get to the data. And the data that stays in more in place, we don't have latency issue. We don't have data quality issues. We don't have lineage issues. So, you know, people have responded very, very well to the data cloud idea, like, yeah, you know, as an enterprise or an institution, you know, I'm the epicenter of my own data cloud because it's not just my own data. >>It's also my ecosystem. It's the people that I have data networking relationships with, you know, for example, you know, take, you know, uh, an investment bank, you know, in, in, in, in New York city, they send data to fidelity. They send data to BlackRock. They send data to, you know, bank of New York, all the regulatory clearing houses, all on and on and on, you know, every night they're running thousands, tens of thousands, you know, of jobs pushing that data, you know, out there. It just, and they they're all on snowflake already. So it doesn't have to be this way. Right. So, >>Yes. So I, I asked the guys before, you know, last week, Hey, what, what would you ask Frank? Now? You might remember you came on, uh, our program during COVID and I was asking you how you're dealing with it, turn off the news. And it was, that was cool. And I asked you at the time, you know, were you ever, you go on Preem and you said, look, I'll never say never, but it defeats the purpose. And you said, we're not gonna do a halfway house. Actually, you were more declarative. We're not doing a halfway house, one foot in one foot out. And then the guy said, well, what about that Dell deal? And that pure deal that you just did. And I, I think I know the answer, but I want to hear from you did a customer come to you and say, get you in the headlock and say, you gotta do this. >>Or it did happen that way. Uh, it, uh, it started with a conversation, um, you know, via with, uh, with Michael Dell. Um, it was supposed to be just a friendly chat, you know, Hey, how's it going? And I mean, obviously Dell is the owner of data, the main, or our first company, you know? Um, but it's, it, wasn't easy for, for Dell and snowflake to have a conversation because they're the epitome of the on-premise company and we're the epitome of a cloud company. And it's like, how, what do we have in common here? Right. What can we talk about? But, you know, Michael's a very smart, uh, engaging guy, you know, always looking for, for opportunity. And of course they decided we're gonna hook up our CTOs, our product teams and, you know, explore, you know, somebody's, uh, ideas and, you know, yeah. We had some, you know, starts and restarts and all of that because it's just naturally, you know, uh, not an easy thing to conceive of, but, you know, in the end it was like, you know what? >>It makes a lot of sense. You know, we can virtualize, you know, Dell object storage, you know, as if it's, you know, an S three storage, you know, from Amazon and then, you know, snowflake in its analytical processing. We'll just reference that data because to us, it just looks like a file that's sitting on, on S3. And we have, we have such a thing it's called an external table, right. That's, that's how we basically, it projects, you know, a snowflake, uh, semantic and structural model, you know, on an external object. 
And we process against it exactly the same way as if it was an internal, uh, table. So we just extended that, um, you know, with, um, with our storage partners, like Dell and pure storage, um, for it to happen, you know, across a network to an on-prem place. So it's very elegant and it, it, um, it becomes an, an enterprise architecture rather than just a cloud architecture. And I'm, I just don't know what will come of it. And, but I've already talked to customers who have to have data on premises just can't go anywhere because they process against it, you know, where it originates, but there are analytical processes that wanna reference attributes of that data. Well, this is what we'll do that. >>Yeah. I'm, it is interesting. I'm gonna ask Dell if I were them, I'd be talking to you about, Hey, I'm gonna try to separate compute from storage on prem and maybe do some of the, the work there. I don't even know if it's technically feasible. It's, I'll ask OI. But, um, but, but, but to me, that's an example of your extending your ecosystem. Um, so you're talking now about applications and that's an example of increasing your Tam. I don't know if you ever get to the edge, you know, we'll see, we're not quite quite there yet, but, um, but as you've said before, there's no lack of market for you. >>Yeah. I mean, obviously snowflake it it's, it's Genesis was reinventing database management in, in a cloud computing environment, which is so different from a, a machine environment or a cluster environment. So that's why, you know, we're, we're, we're not a, a fit for a machine centric, uh, environment sort of defeats the purpose of, you know, how we were built. We, we are truly a native solution. Most products, uh, in the clouds are actually not cloud native. You know, they, they originated the machine environments and you still see that, you know, almost everything you see in the cloud by the way is not cloud native, our generation of applications. They only run the cloud. They can only run the cloud. They are cloud native. They don't know anything else, >>You know? Yeah, you're right. A lot of companies would just wrap something in wrap their stack in Kubernetes and throw it into the cloud and say, we're in the cloud too. And you basically get, you just shifted. It >>Didn't make sense. Oh. They throw it in the container and run it. Right. Yeah. >>So, okay. That's cool. But what does that get you that doesn't change your operational model? Um, so coming back to software development and what you're doing in, in that regard, it seems one of the things we said about Supercloud is in order to have a Supercloud, you gotta have an ecosystem, you gotta have optionality. Hence you're doing things like Apache iceberg, you know, you said today, well, we're not sure where it's gonna go, but we offering options. Uh, but, but my, my question is, um, as it pertains to software developments specifically, how do you, so one of the things we said, sorry, I've lost my train there. One of the things we said is you have to have a super PAs in order to have a super cloud ecosystem, PAs layer. That's essentially what you've introduced here. Is it not a platform for our application development? >>Yeah. I mean, what happens today? I mean, how do you enable a developer, you know, on snowflake, without the developer, you know, reading the, the files out of snowflake, you know, processing, you know, against that data, wherever they are, and then putting the results set, God knows where, right. And that's what happens today. 
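A heavily simplified sketch of the Dell and Pure arrangement Frank describes above: on-prem object storage exposed through an S3-compatible endpoint, with Snowflake projecting an external table over it across the network. The 's3compat://' URL scheme and the endpoint and credential parameters below are assumptions for illustration, not a description of those integrations as they actually shipped, and every name is invented.

```python
import snowflake.connector

conn = snowflake.connector.connect(account="my_org-my_account", user="admin", password="...")
cur = conn.cursor()

# Hypothetical stage over an on-prem, S3-compatible object store.
cur.execute("""
    CREATE STAGE IF NOT EXISTS onprem_stage
      URL = 's3compat://finance-bucket/exports/'
      ENDPOINT = 'objectstore.datacenter.example.com'
      CREDENTIALS = (AWS_KEY_ID = '...' AWS_SECRET_KEY = '...')
""")

# The analytical side references the data where it already lives; nothing
# is migrated into the cloud.
cur.execute("""
    CREATE EXTERNAL TABLE IF NOT EXISTS finance_ext
      WITH LOCATION = @onprem_stage
      FILE_FORMAT = (TYPE = PARQUET)
""")
cur.execute("SELECT COUNT(*) FROM finance_ext")
print(cur.fetchone())
```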
It's the wild west it's completely UN uncovered, right? And that's the reason why lots of enterprises will not allow Python anything anywhere near, you know, their enterprise data. We just know that, uh, we also know it from streamlet, um, or the acquisition, um, large acquisition that we made this year because they said, look, you know, we're, we have a lot of demand, you know, uh, in the Python community, but that's the wild west. That's not the enterprise grade high trust, uh, you know, corporate environment. They are strictly segregated, uh, today. >>Now do some, do these, do these things sometimes dribble up in the enterprise? Yes, they do. And it's actually intolerable the risk that enterprises, you know, take, you know, with things being UN uncovered. I mean the whole snowflake strategy and promises that you're in snowflake, it is a, an absolute enterprise grade environment experience. And it's really hard to do. It takes enormous investment. Uh, but that is what you buy from us. Just having Python is not particularly hard. You know, we can do that in a week. This has taken us years to get it to this level, you know, of, of, you know, governance, security and, and, you know, having all the risks around exfiltration and so on, really understood and dealt with. That's also why these things run in private previews and public previews for so long because we have to squeeze out, you know, everything that may not have been, you know, understood or foreseen, you know, >>So there are trade offs of, of going into this snowflake cloud, you get all this great functionality. Some people might think it's a walled garden. How, how would you respond to that? >>Yeah. And it's true when you have a, you know, a snowflake object, like a snowflake, uh, table only snowflake, you know, runs that table. And, um, you know, that, that is, you know, it's very high function. It's very sort of analogous to what apple did, you know, they have very high functioning, but you do have to accept the fact that it's, that it's not, uh, you know, other, other things in apple cannot, you know, get that these objects. So this is the reason why we introduce an open file format, you know, like, like iceberg, uh, because what iceberg effectively does is it allows any tool, uh, you know, to access that particular object. We do it in such a way that a lot of the functionality of snowflake, you know, will address the iceberg format, which is great because it's, you're gonna get much more function out of our, you know, iceberg implementation than you would get from iceberg on its own. So we do it in a very high value addeds, uh, you know, manner, but other tools can still access the same object in a read to write, uh, manner. So it, it really sort of delivers the original, uh, promise of the data lake, which is just like, Hey, I have all these objects tools come and go. I can use what I want. Um, so you get, you get the best of both worlds for the most part. >>Have you reminds me a little bit of VMware? I mean, VMware's a software mainframe, it's just better than >>Doing >>It on your own. Yep. Um, one of the other hallmarks of a cloud company, and you guys clearly are a cloud company is startups and innovation. Um, now of course you see that in, in the, in the ecosystem, uh, and maybe that's the answer to my question, but you guys are kind of whale hunters, <laugh> your customers are, tend to be bigger. Uh, is the, is the innovation now the extension of that, the ecosystem is that by design. 
>>Oh, um, you know, we have a enormous, uh, ISV following and, um, we're gonna have a whole separate conference like this, by the way, just for, yeah. >>For developers. I hope you guys will up there too. Yeah. Um, you know, the, the reason that, that the ISV strategy is very important for, you know, for, for, for, for many reasons, but, you know, ISVs are the people that are really going to unlock a lot of the value and a lot of the promise of data, right? Because you, you can never do that on your own. And the problem has been that for ISVs, it is so expensive and so difficult to build a product that can be used because the entire enterprise platform infrastructure needs to be built by somebody, you know, I mean, are you really gonna run infrastructure, database, operations, security, compliance, scalability, economics. How do you do that as a software company where really you only have your, your domain expertise that you want to deliver on a platform. You don't wanna do all these things. >>First of all, you don't know how to do it, how to do it well. Um, so it is much easier, much faster when there is already platform to actually build done in the world of clout that just doesn't, you know, exist. And then beyond that, you know, okay, fine building. It is sort of step one. Now I gotta sell it. I gotta market it. So how do I do that? Well, in the snowflake community, you have already market <laugh>, there's thousands and thousands of customers that are also on self lake. Okay. So their, their ability to consume that service that you just built, you know, they can search it, they can try it, they can test it and decide whether they want to consume it. And then, you know, we can monetize it. So all they have to do is cash the check. So the net effecti of it is we drastically lowered the barriers to entry into the world, you know, of software, you know, two men or two women in a dog, and a handful of files can build something that then can be sold, sort of to, for software developers. >>I wrote a piece 2012 after the first reinvent. And I, you know, and I, and I put a big gorilla on the front page and I said, how do you compete with Amazon gorilla? And then one of my answers was you build data ecosystems and you verticalize, and that's, that's what you're doing >>Here. Yeah. There certain verticals that are farther along than others, uh, obviously, but for example, in financial, uh, which is our largest vertical, I mean, the, the data ecosystem is really developing hardcore now. And that's, that's because they so rely on those relationships between all the big financial institutions and entities, regulatory, you know, clearing houses, investment bankers, uh, retail banks, all this kind of stuff. Um, so they're like, it becomes a no brainer. The network affects kick in so strongly because they're like, well, this is really the only way to do it. I mean, if you and I work in different companies and we do, and we want to create a secure, compliant data network and connection between us, I mean, it would take forever to get our lawyers to agree that yeah, it's okay. <laugh> right now, it's like a matter of minutes to set it up. If we're both on snowflake, >>It's like procurement, do they, do you have an MSA yeah. Check? And it just sail right through versus back and forth and endless negotiations >>Today. Data networking is becoming core ecosystem in the world of computing. You know, >>I mean, you talked about the network effects in rise of the data cloud and correct. 
Again, you know, you, weren't the first to come up with that notion, but you are applying it here. Um, I wanna switch topics a little bit. I, when I read your press releases, I laugh every time. Cause this says no HQ, Bozeman. And so where, where do you, I think I know where you land on, on hybrid work and remote work, but what are your thoughts on that? You, you see Elon the other day said you can't work for us unless you come to the office. Where, where do you stand? >>Yeah. Well, the, well, the, the first aspect is, uh, we really wanted to, uh, separate from the idea of a headquarters location, because I feel it's very antiquated. You know, we have many different hubs. There's not one place in the world where all the important people are and where we make all the important positions, that whole way of thinking, uh, you know, it is obsolete. I mean, I am where I need to be. And it it's many different places. It's not like I, I sit in this incredible place, you know, and that's, you know, that's where I sit and everybody comes to me. No, we are constantly moving around and we have engineering hubs. You know, we have your regional, uh, you know, headquarters for, for sales. Obviously we have in Malaysia, we have in Europe, you know? And, um, so I wanted to get rid of this headquarters designation. >>And, you know, the, the, the other issue obviously is that, you know, we were obviously in California, but you know, California is, is no longer, uh, the dominant place of where we are resident. I mean, 40% of our engineering people are now in be Washington. You know, we have hundreds of people in Poland where people, you know, we are gonna have very stressed location in Toronto. Um, yeah. Obviously our customers are, are everywhere, right? So this idea that, you know, everything is happening in, in one state is just, um, you know, not, not correct. So we wanted to go to no headquarters. Of course the SCC doesn't let you do that. Um, because they want, they want you to have a street address where the government can send you a mail and then it becomes, the question is, well, what's an acceptable location. Well, it has to be a place where the CEO and the CFO have residency by hooker, by crook. >>That happened to be in Bozeman Montana because Mike and I are both, it was not by design. We just did that because we were, uh, required to, you know, you know, comply with government, uh, requirements, which of course we do, but that's why it, it says what it says now on, on the topic of, you know, where did we work? Um, we are super situational about it. It's not like, Hey, um, you know, everybody in the office or, or everybody is remote, we're not categorical about it. Depends on the function, depends on the location. Um, but everybody is tethered to an office. Okay. In words, everybody has a relationship with an office. There's, there's almost nobody, there are a few exceptions of people that are completely remote. Uh, but you know, if you get hired on with snowflake, you will always have an office affiliation and you can be called into the office by your manager. But for purpose, you know, a meeting, a training, an event, you don't get called in just to hang out. And like, the office is no longer your home away from home. Right. And we're now into hotel, right? So you don't have a fixed place, you know? So >>You talked in your keynote a lot about last question. I let you go customer alignment, obviously a big deal. 
I have been watching, you know, we go to a lot of events, you'll see a technology company tell a story, you know, about their widget or whatever it was their box. And then you'll see an outcome and you look at it and you shake your head and say, well, that the difference between this and that is the square root of zero, right. When you talk about customer alignment today, we're talking about monetizing data. Um, so that's a whole different conversation. Um, and I, I wonder if you could sort of close on how that's different. Um, I mean, at ServiceNow, you transformed it. You know, I get that, you know, data, the domain was okay, tape, blow it out, but this is a, feels like a whole new vector or wave of growth. >>Yeah. You know, monetizing, uh, data becomes sort of a, you know, a byproduct of having a data cloud you all of a sudden, you know, become aware of the fact that, Hey, Hey, I have data and be that data might actually be quite valuable to parties. And then C you know, it's really easy to then, you know, uh, sell that and, and monetize that. Cause if it was hard, forget it, you know, I don't have time for it. Right. But if it's relatively, if it's compliant, it's relatively effortless, it's pure profit. Um, I just want to reference one attribute, two attributes of what you have, by the way, you know, uh, hedge funds have been into this sort of thing, you know, for a long time, because they procure data from hundreds and hundreds of sources, right. Because they're, they are the original data scientists. >>Um, but the, the bigger thing with data is that a lot of, you know, digital transformation is, is, is finally becoming real. You know, for years it was arm waving and conceptual and abstract, but it's becoming real. I mean, how do we, how do we run a supply chain? You know, how do we run, you know, healthcare, um, all these things are become are, and how do we run cyber security? They're being redefined as data problems and data challenges. And they have data solutions. So that's right. Data strategies are insanely important because, you know, if, if the solution is through data, then you need to have, you know, a data strategy, you know, and in our world, that means you have a data cloud and you have all the enablement that allows you to do that. But, you know, hospitals, you know, are, are saying, you know, data science is gonna have a bigger impact on healthcare than life science, you know, in the coming, whatever, you know, 10, 20 years, how do you enable that? >>Right. I, I have conversations with, with, with hospital executives are like, I got generations of data, you know, clinical diagnostic, demographic, genomic. And then I, I am envisioning these predictive outcomes over here. I wanna be able to predict, you know, once somebody's gonna get what disease and you know, what I have to do about it, um, how do I do that? <laugh> right. The day you go from, uh, you know, I have a lot of data too. I have these outcomes and then do me a miracle in the middle, in the middle of somewhere. Well, that's where we come in. We're gonna organize ourselves and then unpack thats, you know, and then we, we work, we through training models, you know, we can start delivering some of these insights, but the, the promise is extraordinary. We can change whole industries like pharma and, and, and healthcare. Um, you know, 30 effects of data, the economics will change. And you know, the societal outcomes, you know, um, quality of life disease, longevity of life is quite extraordinary. 
Supply chain management. That's all around us right >>Now. Well, there's a lot of, you know, high growth companies that were kind of COVID companies, valuations shot up. And now they're trying to figure out what to do. You've been pretty clear because of what you just talked about, the opportunities enormous. You're not slowing down, you're amping it up, you know, pun intended. So Frank Luman, thanks so much for coming on the cube. Really appreciate your time. >>My pleasure. >>All right. And thank you for watching. Keep it right there for more coverage from the snowflake summit, 2022, you're watching the cube.
Rosemary Hua, Snowflake & Patrick Kelly, 84 51 | Snowflake Summit 2022
>>Hey everyone. Welcome back to the Cube's coverage of snowflake summit. 22 live from Las Vegas. We're at Caesar's forum, Lisa Martin, with Dave ante. We've been having some great conversations over the last day and a half. This guy just came from main stage interviewing the CEO, Franks Lubin himself, who joins us after our next guest here, we're gonna be talking customers and successes with snowflake Rosemary Hua joins us the global head of retail at snowflake and Patrick Kelly, the VP of product management at their customer 84 51. Welcome to the program guys. >>Thank you. It's nice to be here. So >>Patrick, 84 51. Talk to us about the business, give the audience an overview of what you guys are doing. And then we'll talk about how you're working with snowflake. >>Yeah, absolutely. Thank you both for, uh, the opportunity to be here. So 84 51 is a retail data science insights and media company. And really what that means is that we, we partner with our, uh, parent company Kroger, as well as consumer packaged goods or brands and brokers and agencies, really to understand shoppers and create relevant, personalized, and valuable experiences for shoppers in source and grocery stores. >>That relevance is key. We all expect that these days, I think the last couple of years as everyone's patience has been wearing. Yeah, very thin. I'm not, I'm not convinced it's gonna come back either, but we expect that brands are gonna interact with us and offer us the next best offer. That's actually relevant and personalized to us. How does AB 4 51 achieve that? >>Yeah, it's a great question. And you're right. That expectation is only growing. Um, and it takes data analytics, data science and all of these capabilities in order to deliver it on that promise, uh, you know, big, a big part of the relationship that retailers and brands have with consumers is about a value exchange. And it's, again, it's about that expectation that brands and retailers need to be able to meet the ever-changing needs of consumers. Uh, whether that be introducing new brands or offering the right price points or promotions or ensuring you meet them where they are, whether it be online, which has obviously been catalyzed by, um, the pandemic over the last two years or in store. So a deep understanding of, of the customer, which is founded in data and the appropriate analytics and science, and then the collaboration back with the retailers and, and the brands so that you can bring that experience to life. Again, that could be a price point on the, on the shelf, um, or it could be a personalized email or, um, website interaction that delivers the right experience for the co for the consumer. So they can see that value and really build loyalty >>In the right time in real time. That's >>One of the most Marrit I'm in real time. That's right. One goes, Mary, I love the concept of the, the actual platform of the retail data cloud. Yes. It's so unique for a technology company. Snowflake's a technology company, you see services companies do it all the time, but yeah, but to actually transform what was considered a data warehouse in the cloud to a platform for data, I call it super cloud. Yeah. Tell us how this came about, um, how you were able to actually develop this and where you are in that journey. >>Yeah, absolutely. It's been a big focus on data sharing. We saw that that's how our customers are interacting with each other is using our data sharing functionality to really bring that ecosystem to life. 
So that's retailers sharing with their consumer products companies selling through those retailers. And then of course the data service companies that are kind of helping both sides and that data sharing functionality is the kind of under fabric for the data cloud, where we bring in partners. We bring in customers and we bring in tech solutions to the table. Um, and customers can use the data cloud, not only with the powered by partners that we have, but also the data marketplace, getting that data in real time and making some business value out of that data. So that's really the big focus of snowflake is investing in industry to realize the business value >>And talk about ecosystem and how important that is, where, where you leave off and the ecosystem picks up and how that's evolving. >>Absolutely. And I'm sure you can join in on this, but, um, definitely that collaboration between retailers and CPGs, right? I mean, retailers have that rich first party customer data. They see all those transactions, they see when people are shopping and then the brands really need that first party data to figure out what their, how their customers are interacting with their brand. And so that collaborative nature that makes up the ecosystem. And of course, you've got the tech partners in the middle that are kind of providing enrich data assets as well. You guys at 84 51 are a huge part of that ecosystem being, you know, one of the key retailers in, in the United States. Um, have you been seeing that as well with your brands? Yeah, >>Absolutely. I mean data and data science has always been core to the identity of 84 51. Um, and historically a lot of the interaction that we have with brands were through report web based applications, right. And it's a really great seamless way to, to deliver insights to non-technical users. But as the entire market has really started to invest in data and data science and technology and capabilities, you know, we, we launched a collaborative cloud last year and it was really an opportunity for us to reimagine what that experience would look like and to ensure that we are meeting the evolving needs of the industry. And as Rosemary pointed out, you know, data sharing is, is table stakes, right? It's a capability that you don't wanna have to think about. You wanna be thinking about the strategic initiatives, the science that you're gonna create in order to drive action and personalize experiences. So what we've found at 84 51 is really investing in our collaborative cloud, um, and working with leading technology providers like snowflake to make that seamless has been, you know, the, the, the UN unlock to ensure that data and data science can be a competitive advantage for our clients and partners, not just, you know, the retailer in 84 51 >>Is the collaborative cloud built on snowflake. >>Yeah. So the collaborative cloud is really about, um, ensuring that data sharing through snowflake is done seamlessly. So we've really, we've invited our clients and partners to build their own science on 84 51 S first party data asset through Kroger. And our, our data is represents 60 million households, half of the United States, 2 billion transactions annually, the robustness of that data asset. And it's it's it's analysis ready is so impactful to the investment that brands can make in their own data science efforts, because brands wanna invest in data science, not to do data work, not to do cleaning and Muning and, and merging and, and standardizing. They wanna do analysis. 
That's gonna impact the strategies and ultimately the shopper's lives. So again, we're able to leverage the capabilities of snowflake to ensure data sharing is not part of our day to day conversation. Data sharing is something we can take for granted so that we can talk about the shopper and our strategies. >>So this is why I call it super cloud. So Jerry Chen wrote an article of castles in the cloud. And in there he said, he called it sub clouds. And I'm like, no, it's, uh, by the way, great article. Jerry's brilliant. But so you got AWS, you built on top of AWS. That's right. You got the snowflake data called you're building on top of that. And I was sitting at the table and my kid goes, this is super, I'm like, ah, super clouds. So I didn't really even coin it, but, and then I realized somebody else had use it before, but that is different. It's new, it's around data. It's around vertical industries. Yes. Um, I, I get a lot of heat for that term, but I feel like this look around this industry, everybody's doing that that's that is digital transformation. That's don't you see that with your customers? >>Absolutely. I mean, there's a lot of different industry trends where you can't use your own historical first party data to figure out what customers are doing. I mean, with COVID customers are behaving totally differently than they used to. And you can't use your historical data to predict out of stocks or how the customer's gonna be interacting with your brand anymore. And you need that third party macroeconomic data. You need that third party COVID data or foot traffic data to enrich what your businesses are doing. And so, yes, it, it is a super cloud. And I think the big differentiator is that we are cloud agnostic, meaning that, like you said, you can take the technology for granted. You don't have to worry about where the other person has their tech stack. It's all the same experience on the snowflake super cloud as he put it. So, >>So Patrick, talk about the, the, the impact that you have been able to have during COVID. I mean, everybody had supply chain issues, but, you know, if you took, if you took away the machine learning and the data science that you are initiating, would life have been harder? Do you have data on that? You know, the, the, what if we didn't have this capability during the >>Challenges? No, it's, it's a fantastic question. And I'll actually build on the example that Rosemary, um, offered around COVID and better understanding COVID. So, um, in the past, you know, when we talk about data sharing data collaboration, it's basically wasn't possible, right? What's your tech stack, what's mine. How do we share data? I don't wanna send you my data without go releasing governance. It was a non-starter and, you know, through technology like snowflake, as we launched the collaborative cloud, we actually had a pilot client start right at the beginning of 2020. Um, we, we had, you know, speced out it onto use cases that really impactful for their, for their organization. But of course, what happened is, uh, a pandemic hit us and it became the biggest question, CEO executive team, all the way down is what is happening, what is happening in our stores? >>How are shoppers behaving and what, what that client of ours came to realize is while we, we actually, we have access to the E 4 51 collaborative cloud. We can see half of America's behavior last week down to the basket transaction UPC level. Let's get going. 
So again, the conversation wasn't about, you know, what data sources, how do we scramble? How do we get it together? What technologies, how do we collaborate? It was immediately focused on building the analysis to better understand that. And, and the outcomes that drove actually were all the way from manufacturing impact to marketing, to merchandising, because that brand was able to figure out, Hey, our top selling products, they're, they're not on the shelves. What are shoppers doing? Are they going to a, another brand? Are they not buying it all together? Are they going to a different size? Are they staying within our product portfolio? Are they going to a competitor? And those insights drove everything again from what do we need to manufacture more to, how do we need to communicate and incent our, our, our shoppers, our, our loyal shoppers also what's happening to our non loyals. Are they looking for an, you know, an alternative that a need that we can serve that level of, of shopper and customer understanding going all the way up to a strategic initiatives is something that is enabled through the Supercloud >><laugh>. How do you facilitate privacy as we're seeing this proliferation of privacy legislation? Yeah. I think there's now 22 states that have individual, and California's changing to CPR a at the beginning of yes, January 23. How do you balance that need that ability to share data? Yeah. Equitably fast, quickly, but also balance consumer privacy requirements. >>I mean, I could take a stab first. I mean, at snowflake, right, there is no better place to share your data that in a governed way than with snowflake data sharing, because then you can see and understand how the other side is using your data. Whereas in traditional methods, using an API or using an FTP server, you wouldn't be able to actually see how the other side is using your data. But in addition to that, we have the clean room where you can actually join on that underlying PII data without exposing it, because you can share functions securely on, on both sides. So I think there is no better place to do it than here at snowflake. Um, and because we deeply understand those policies, I think we are kind of keeping up with the times trying to get in front of things so that our data sharing capabilities stay up to date. When you have to expunge records, identify records with CCPA and, and GDPR and, and all the rest that are coming. Um, and so, so, I mean, I think especially with 84 50 ones, um, you know, collaborative cloud also building on top of the clean room, um, in, in further road in the further roadmap, I think, uh, you're gonna see some of that privacy compliant, data sharing, coming to play as well. You >>Know, what's interesting, Patrick is we were just in that session with the Frank Q and a, and he was very candid about when he was talking about, uh, Apache, uh, I'm sorry. Apache iceberg. Yeah. Yes. And he, he basically flat out said, look, you know, you gotta put it into the snowflake data cloud. It's, it's better there, but people might, you know, want to put it outside, not get locked in, et cetera. But what I'm, I'm listening to you saying it's so much easier for you today that could evolve something open source. And, and how do you think about that in terms of placing your bets? >>Yeah, it, it's a great question and really to go back to privacy, um, as a total topic, I mean, you're right. It's extremely relevant topic. It's, it's, you know, very ever changing right now at 84 51. 
Privacy is, is first it's the foundation. Um, it it's table stakes and that's from a policy that's from a governance, it's from a technology capability standpoint. And it's part of our, our culture because, um, it, it, because it has to be, uh, and, and so when we, when we think about, you know, the products that we're gonna build, how we want to implement, it's, it's a requirement that we leverage technologies that enable us to secure the governance and ensure that we're privacy compliant. Um, the customer data asset that we have is, is, you know, is extremely valuable as we've talked about in this interview, it's also responsibility. And we take that very, very seriously. And so, you know, Dave, back to your question about, you know, decisions to go, you know, open source or leverage for technologies. So there's always a balance. You know, we, we love to push the, the bounds of innovation and, and we wanna be on the forefront of data, sharing data, science, collaboration for this industry. But at the same time, we balance that with making sure that our technology partners are the right ones, because we are not willing to compromise our governance and our fir and our, our privacy, uh, priorities. >>That's gonna be interesting to see how that evolves. And I, I loved that. Frank was so candid about it. I think the key for any cloud player, including a super cloud is you gotta have an ecosystem without an ecosystem. Forget it. And you see a lot of companies. I mean, we were at Dell tech world. They're kind of, they're at the beginnings of that, but the ecosystems, nothing like this, right. Which is amazing, nothing against, against Dell, they're just kind of getting started and you have to be open. You have to have optionality. Yep. You know, so I, I don't know if we'll see the day where they're including data, bricks, data lakes inside of the snowflake cloud. That will be amazing. <laugh> but you know, you never say never in the world of cloud, >>Do you stranger things, Rosemary and Patrick, thank you so much for joining us talking about what 84 51 is doing powered by snowflake and also the rise of the snowflake retail cloud and what that's doing. We'll have to have you back on to hear what's going on as I'm sure the adoption will continue to increase. Absolutely. Thank you so much to both for having us, our pleasure. You appreciate this for our guests. I'm Lisa Martin. He's Dave ante stick around Dave will be back with Frankman CEO of snowflake. Next. You won't wanna miss it.
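One more aside before the next segment, on the clean room idea Rosemary raises above. Snowflake's clean rooms rest on governed shares and secure functions; the toy sketch below is not that implementation, it only illustrates the underlying matching idea, where two parties compare salted hashes of an identifier so overlap can be measured without either side handing over raw PII. All names and records here are made up.

```python
# Toy illustration of the clean-room matching idea: join on protected
# identifiers rather than raw PII. This is NOT Snowflake's actual clean room
# implementation (which uses governed shares and secure functions); it only
# shows why hashed identifiers let two parties measure overlap safely.
import hashlib

SHARED_SALT = "agreed-upon-secret"   # assumption: both parties agree on a salt

def tokenize(email: str) -> str:
    """Return a salted hash so raw emails never leave either party."""
    return hashlib.sha256((SHARED_SALT + email.lower().strip()).encode()).hexdigest()

# Party A: a retailer's loyalty records (fabricated sample data).
retailer_shoppers = {tokenize(e) for e in ["ann@example.com", "bo@example.com"]}

# Party B: a brand's campaign audience (fabricated sample data).
brand_audience = {tokenize(e) for e in ["bo@example.com", "cy@example.com"]}

# The only thing either side exchanges is tokens, never emails.
overlap = retailer_shoppers & brand_audience
print(f"matched shoppers: {len(overlap)}")   # -> matched shoppers: 1
```

The value Rosemary points to is that the sharing platform can enforce this kind of protection while still letting the data owner see how the other side uses the data.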
Data Power Panel V3
(upbeat music) >> The stampede to cloud and massive VC investments has led to the emergence of a new generation of object store based data lakes. And with them two important trends, actually three important trends. First, a new category that combines data lakes and data warehouses aka the lakehouse is emerged as a leading contender to be the data platform of the future. And this novelty touts the ability to address data engineering, data science, and data warehouse workloads on a single shared data platform. The other major trend we've seen is query engines and broader data fabric virtualization platforms have embraced NextGen data lakes as platforms for SQL centric business intelligence workloads, reducing, or somebody even claim eliminating the need for separate data warehouses. Pretty bold. However, cloud data warehouses have added complimentary technologies to bridge the gaps with lakehouses. And the third is many, if not most customers that are embracing the so-called data fabric or data mesh architectures. They're looking at data lakes as a fundamental component of their strategies, and they're trying to evolve them to be more capable, hence the interest in lakehouse, but at the same time, they don't want to, or can't abandon their data warehouse estate. As such we see a battle royale is brewing between cloud data warehouses and cloud lakehouses. Is it possible to do it all with one cloud center analytical data platform? Well, we're going to find out. My name is Dave Vellante and welcome to the data platform's power panel on theCUBE. Our next episode in a series where we gather some of the industry's top analysts to talk about one of our favorite topics, data. In today's session, we'll discuss trends, emerging options, and the trade offs of various approaches and we'll name names. Joining us today are Sanjeev Mohan, who's the principal at SanjMo, Tony Baers, principal at dbInsight. And Doug Henschen is the vice president and principal analyst at Constellation Research. Guys, welcome back to theCUBE. Great to see you again. >> Thank guys. Thank you. >> Thank you. >> So it's early June and we're gearing up with two major conferences, there's several database conferences, but two in particular that were very interested in, Snowflake Summit and Databricks Data and AI Summit. Doug let's start off with you and then Tony and Sanjeev, if you could kindly weigh in. Where did this all start, Doug? The notion of lakehouse. And let's talk about what exactly we mean by lakehouse. Go ahead. >> Yeah, well you nailed it in your intro. One platform to address BI data science, data engineering, fewer platforms, less cost, less complexity, very compelling. You can credit Databricks for coining the term lakehouse back in 2020, but it's really a much older idea. You can go back to Cloudera introducing their Impala database in 2012. That was a database on top of Hadoop. And indeed in that last decade, by the middle of that last decade, there were several SQL on Hadoop products, open standards like Apache Drill. And at the same time, the database vendors were trying to respond to this interest in machine learning and the data science. So they were adding SQL extensions, the likes Hudi and Vertical we're adding SQL extensions to support the data science. But then later in that decade with the shift to cloud and object storage, you saw the vendor shift to this whole cloud, and object storage idea. So you have in the database camp Snowflake introduce Snowpark to try to address the data science needs. 
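As a brief aside on what Snowpark means in practice: it exposes a dataframe-style API whose operations compile to SQL and execute inside Snowflake, rather than in a separate Spark cluster. The sketch below is illustrative only; the connection parameters and table are hypothetical.

```python
# Hedged sketch of Snowpark for Python: dataframe-style transformations that
# run inside Snowflake's engine. Connection parameters and the table name are
# placeholders, not a real deployment.
# Requires: pip install snowflake-snowpark-python
from snowflake.snowpark import Session
from snowflake.snowpark.functions import col, avg

session = Session.builder.configs({
    "account": "my_account",
    "user": "data_scientist",
    "password": "********",
    "warehouse": "DS_WH",
    "database": "SALES",
    "schema": "PUBLIC",
}).create()

orders = session.table("ORDERS")             # lazy reference, nothing runs yet
summary = (
    orders
    .filter(col("ORDER_STATUS") == "COMPLETE")
    .group_by("REGION")
    .agg(avg(col("ORDER_TOTAL")).alias("AVG_ORDER_TOTAL"))
)
summary.show()                               # executes as SQL in the warehouse
session.close()
```

Doug's larger point holds either way: the warehouse camp keeps reaching toward data science workloads without asking anyone to move data out first.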
They introduced that in 2020 and last year they announced support for Python. You also had Oracle, SAP jumped on this lakehouse idea last year, supporting both the lake and warehouse single vendor, not necessarily quite single platform. Google very recently also jumped on the bandwagon. And then you also mentioned, the SQL engine camp, the Dremios, the Ahanas, the Starbursts, really doing two things, a fabric for distributed access to many data sources, but also very firmly planning that idea that you can just have the lake and we'll help you do the BI workloads on that. And then of course, the data lake camp with the Databricks and Clouderas providing a warehouse style deployments on top of their lake platforms. >> Okay, thanks, Doug. I'd be remiss those of you who me know that I typically write my own intros. This time my colleagues fed me a lot of that material. So thank you. You guys make it easy. But Tony, give us your thoughts on this intro. >> Right. Well, I very much agree with both of you, which may not make for the most exciting television in terms of that it has been an evolution just like Doug said. I mean, for instance, just to give an example when Teradata bought AfterData was initially seen as a hardware platform play. In the end, it was basically, it was all those after functions that made a lot of sort of big data analytics accessible to SQL. (clears throat) And so what I really see just in a more simpler definition or functional definition, the data lakehouse is really an attempt by the data lake folks to make the data lake friendlier territory to the SQL folks, and also to get into friendly territory, to all the data stewards, who are basically concerned about the sprawl and the lack of control in governance in the data lake. So it's really kind of a continuing of an ongoing trend that being said, there's no action without counter action. And of course, at the other end of the spectrum, we also see a lot of the data warehouses starting to edit things like in database machine learning. So they're certainly not surrendering without a fight. Again, as Doug was mentioning, this has been part of a continual blending of platforms that we've seen over the years that we first saw in the Hadoop years with SQL on Hadoop and data warehouses starting to reach out to cloud storage or should say the HDFS and then with the cloud then going cloud native and therefore trying to break the silos down even further. >> Now, thank you. And Sanjeev, data lakes, when we first heard about them, there were such a compelling name, and then we realized all the problems associated with them. So pick it up from there. What would you add to Doug and Tony? >> I would say, these are excellent points that Doug and Tony have brought to light. The concept of lakehouse was going on to your point, Dave, a long time ago, long before the tone was invented. For example, in Uber, Uber was trying to do a mix of Hadoop and Vertical because what they really needed were transactional capabilities that Hadoop did not have. So they weren't calling it the lakehouse, they were using multiple technologies, but now they're able to collapse it into a single data store that we call lakehouse. Data lakes, excellent at batch processing large volumes of data, but they don't have the real time capabilities such as change data capture, doing inserts and updates. So this is why lakehouse has become so important because they give us these transactional capabilities. >> Great. So I'm interested, the name is great, lakehouse. 
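To ground the "transactional capabilities on a lake" point before the discussion moves on: table formats such as Delta Lake, Hudi, and Iceberg layer ACID semantics over files in object storage, so an update arriving from change data capture becomes a single MERGE instead of a rewrite-the-partition batch job. A minimal Delta Lake sketch follows; the storage path and schema are hypothetical.

```python
# Hedged sketch: an upsert (MERGE) against a Delta Lake table with PySpark.
# The lake path and schema are made up for illustration.
# Requires: pip install pyspark delta-spark
from delta import configure_spark_with_delta_pip
from delta.tables import DeltaTable
from pyspark.sql import SparkSession

builder = (
    SparkSession.builder.appName("lakehouse-upsert")
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
)
spark = configure_spark_with_delta_pip(builder).getOrCreate()

# Incoming change records (for example, from change data capture).
updates = spark.createDataFrame(
    [(1, "shipped"), (42, "new")], ["order_id", "status"]
)

target = DeltaTable.forPath(spark, "s3://my-lake/orders")   # hypothetical path
(
    target.alias("t")
    .merge(updates.alias("u"), "t.order_id = u.order_id")
    .whenMatchedUpdateAll()      # existing rows pick up the new status
    .whenNotMatchedInsertAll()   # brand-new orders are appended
    .execute()
)
```

Hudi and Iceberg express the same idea through their own APIs, which is exactly the table format competition the panel returns to below.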
The concept is powerful, but I get concerned that it's a lot of marketing hype behind it. So I want to examine that a bit deeper. How mature is the concept of lakehouse? Are there practical examples that really exist in the real world that are driving business results for practitioners? Tony, maybe you could kick that off. >> Well, put it this way. I think what's interesting is that both data lakes and data warehouse that each had to extend themselves. To believe the Databricks hype it's that this was just a natural extension of the data lake. In point of fact, Databricks had to go outside its core technology of Spark to make the lakehouse possible. And it's a very similar type of thing on the part with data warehouse folks, in terms of that they've had to go beyond SQL, In the case of Databricks. There have been a number of incremental improvements to Delta lake, to basically make the table format more performative, for instance. But the other thing, I think the most dramatic change in all that is in their SQL engine and they had to essentially pretty much abandon Spark SQL because it really, in off itself Spark SQL is essentially stop gap solution. And if they wanted to really address that crowd, they had to totally reinvent SQL or at least their SQL engine. And so Databricks SQL is not Spark SQL, it is not Spark, it's basically SQL that it's adapted to run in a Spark environment, but the underlying engine is C++, it's not scale or anything like that. So Databricks had to take a major detour outside of its core platform to do this. So to answer your question, this is not mature because these are all basically kind of, even though the idea of blending platforms has been going on for well over a decade, I would say that the current iteration is still fairly immature. And in the cloud, I could see a further evolution of this because if you think through cloud native architecture where you're essentially abstracting compute from data, there is no reason why, if let's say you are dealing with say, the same basically data targets say cloud storage, cloud object storage that you might not apportion the task to different compute engines. And so therefore you could have, for instance, let's say you're Google, you could have BigQuery, perform basically the types of the analytics, the SQL analytics that would be associated with the data warehouse and you could have BigQuery ML that does some in database machine learning, but at the same time for another part of the query, which might involve, let's say some deep learning, just for example, you might go out to let's say the serverless spark service or the data proc. And there's no reason why Google could not blend all those into a coherent offering that's basically all triggered through microservices. And I just gave Google as an example, if you could generalize that with all the other cloud or all the other third party vendors. So I think we're still very early in the game in terms of maturity of data lakehouses. >> Thanks, Tony. So Sanjeev, is this all hype? What are your thoughts? >> It's not hype, but completely agree. It's not mature yet. Lakehouses have still a lot of work to do, so what I'm now starting to see is that the world is dividing into two camps. On one hand, there are people who don't want to deal with the operational aspects of vast amounts of data. 
They are the ones who are going for BigQuery, Redshift, Snowflake, Synapse, and so on because they want the platform to handle all the data modeling, access control, performance enhancements, but these are trade off. If you go with these platforms, then you are giving up on vendor neutrality. On the other side are those who have engineering skills. They want the independence. In other words, they don't want vendor lock in. They want to transform their data into any number of use cases, especially data science, machine learning use case. What they want is agility via open file formats using any compute engine. So why do I say lakehouses are not mature? Well, cloud data warehouses they provide you an excellent user experience. That is the main reason why Snowflake took off. If you have thousands of cables, it takes minutes to get them started, uploaded into your warehouse and start experimentation. Table formats are far more resonating with the community than file formats. But once the cost goes up of cloud data warehouse, then the organization start exploring lakehouses. But the problem is lakehouses still need to do a lot of work on metadata. Apache Hive was a fantastic first attempt at it. Even today Apache Hive is still very strong, but it's all technical metadata and it has so many different restrictions. That's why we see Databricks is investing into something called Unity Catalog. Hopefully we'll hear more about Unity Catalog at the end of the month. But there's a second problem. I just want to mention, and that is lack of standards. All these open source vendors, they're running, what I call ego projects. You see on LinkedIn, they're constantly battling with each other, but end user doesn't care. End user wants a problem to be solved. They want to use Trino, Dremio, Spark from EMR, Databricks, Ahana, DaaS, Frink, Athena. But the problem is that we don't have common standards. >> Right. Thanks. So Doug, I worry sometimes. I mean, I look at the space, we've debated for years, best of breed versus the full suite. You see AWS with whatever, 12 different plus data stores and different APIs and primitives. You got Oracle putting everything into its database. It's actually done some interesting things with MySQL HeatWave, so maybe there's proof points there, but Snowflake really good at data warehouse, simplifying data warehouse. Databricks, really good at making lakehouses actually more functional. Can one platform do it all? >> Well in a word, I can't be best at breed at all things. I think the upshot of and cogen analysis from Sanjeev there, the database, the vendors coming out of the database tradition, they excel at the SQL. They're extending it into data science, but when it comes to unstructured data, data science, ML AI often a compromise, the data lake crowd, the Databricks and such. They've struggled to completely displace the data warehouse when it really gets to the tough SLAs, they acknowledge that there's still a role for the warehouse. Maybe you can size down the warehouse and offload some of the BI workloads and maybe and some of these SQL engines, good for ad hoc, minimize data movement. But really when you get to the deep service level, a requirement, the high concurrency, the high query workloads, you end up creating something that's warehouse like. >> Where do you guys think this market is headed? What's going to take hold? Which projects are going to fade away? You got some things in Apache projects like Hudi and Iceberg, where do they fit Sanjeev? 
Do you have any thoughts on that? >> So thank you, Dave. So I feel that table formats are starting to mature. There is a lot of work that's being done. We will not have a single product or single platform. We'll have a mixture. So I see a lot of Apache Iceberg in the news. Apache Iceberg is really innovating. Their focus is on a table format, but then Delta and Apache Hudi are doing a lot of deep engineering work. For example, how do you handle high concurrency when there are multiple rights going on? Do you version your Parquet files or how do you do your upcerts basically? So different focus, at the end of the day, the end user will decide what is the right platform, but we are going to have multiple formats living with us for a long time. >> Doug is Iceberg in your view, something that's going to address some of those gaps in standards that Sanjeev was talking about earlier? >> Yeah, Delta lake, Hudi, Iceberg, they all address this need for consistency and scalability, Delta lake open technically, but open for access. I don't hear about Delta lakes in any worlds, but Databricks, hearing a lot of buzz about Apache Iceberg. End users want an open performance standard. And most recently Google embraced Iceberg for its recent a big lake, their stab at having supporting both lakes and warehouses on one conjoined platform. >> And Tony, of course, you remember the early days of the sort of big data movement you had MapR was the most closed. You had Horton works the most open. You had Cloudera in between. There was always this kind of contest as to who's the most open. Does that matter? Are we going to see a repeat of that here? >> I think it's spheres of influence, I think, and Doug very much was kind of referring to this. I would call it kind of like the MongoDB syndrome, which is that you have... and I'm talking about MongoDB before they changed their license, open source project, but very much associated with MongoDB, which basically, pretty much controlled most of the contributions made decisions. And I think Databricks has the same iron cloud hold on Delta lake, but still the market is pretty much associated Delta lake as the Databricks, open source project. I mean, Iceberg is probably further advanced than Hudi in terms of mind share. And so what I see that's breaking down to is essentially, basically the Databricks open source versus the everything else open source, the community open source. So I see it's a very similar type of breakdown that I see repeating itself here. >> So by the way, Mongo has a conference next week, another data platform is kind of not really relevant to this discussion totally. But in the sense it is because there's a lot of discussion on earnings calls these last couple of weeks about consumption and who's exposed, obviously people are concerned about Snowflake's consumption model. Mongo is maybe less exposed because Atlas is prominent in the portfolio, blah, blah, blah. But I wanted to bring up the little bit of controversy that we saw come out of the Snowflake earnings call, where the ever core analyst asked Frank Klutman about discretionary spend. And Frank basically said, look, we're not discretionary. We are deeply operationalized. Whereas he kind of poo-pooed the lakehouse or the data lake, et cetera, saying, oh yeah, data scientists will pull files out and play with them. That's really not our business. Do any of you have comments on that? Help us swing through that controversy. Who wants to take that one? >> Let's put it this way. 
The SQL folks are from Venus and the data scientists are from Mars. So it means it really comes down to it, sort that type of perception. The fact is, is that, traditionally with analytics, it was very SQL oriented and that basically the quants were kind of off in their corner, where they're using SaaS or where they're using Teradata. It's really a great leveler today, which is that, I mean basic Python it's become arguably one of the most popular programming languages, depending on what month you're looking at, at the title index. And of course, obviously SQL is, as I tell the MongoDB folks, SQL is not going away. You have a large skills base out there. And so basically I see this breaking down to essentially, you're going to have each group that's going to have its own natural preferences for its home turf. And the fact that basically, let's say the Python and scale of folks are using Databricks does not make them any less operational or machine critical than the SQL folks. >> Anybody else want to chime in on that one? >> Yeah, I totally agree with that. Python support in Snowflake is very nascent with all of Snowpark, all of the things outside of SQL, they're very much relying on partners too and make things possible and make data science possible. And it's very early days. I think the bottom line, what we're going to see is each of these camps is going to keep working on doing better at the thing that they don't do today, or they're new to, but they're not going to nail it. They're not going to be best of breed on both sides. So the SQL centric companies and shops are going to do more data science on their database centric platform. That data science driven companies might be doing more BI on their leagues with those vendors and the companies that have highly distributed data, they're going to add fabrics, and maybe offload more of their BI onto those engines, like Dremio and Starburst. >> So I've asked you this before, but I'll ask you Sanjeev. 'Cause Snowflake and Databricks are such great examples 'cause you have the data engineering crowd trying to go into data warehousing and you have the data warehousing guys trying to go into the lake territory. Snowflake has $5 billion in the balance sheet and I've asked you before, I ask you again, doesn't there has to be a semantic layer between these two worlds? Does Snowflake go out and do M&A and maybe buy ad scale or a data mirror? Or is that just sort of a bandaid? What are your thoughts on that Sanjeev? >> I think semantic layer is the metadata. The business metadata is extremely important. At the end of the day, the business folks, they'd rather go to the business metadata than have to figure out, for example, like let's say, I want to update somebody's email address and we have a lot of overhead with data residency laws and all that. I want my platform to give me the business metadata so I can write my business logic without having to worry about which database, which location. So having that semantic layer is extremely important. In fact, now we are taking it to the next level. Now we are saying that it's not just a semantic layer, it's all my KPIs, all my calculations. So how can I make those calculations independent of the compute engine, independent of the BI tool and make them fungible. So more disaggregation of the stack, but it gives us more best of breed products that the customers have to worry about. >> So I want to ask you about the stack, the modern data stack, if you will. 
And we always talk about injecting machine intelligence, AI into applications, making them more data driven. But when you look at the application development stack, it's separate, the database is tends to be separate from the data and analytics stack. Do those two worlds have to come together in the modern data world? And what does that look like organizationally? >> So organizationally even technically I think it is starting to happen. Microservices architecture was a first attempt to bring the application and the data world together, but they are fundamentally different things. For example, if an application crashes, that's horrible, but Kubernetes will self heal and it'll bring the application back up. But if a database crashes and corrupts your data, we have a huge problem. So that's why they have traditionally been two different stacks. They are starting to come together, especially with data ops, for instance, versioning of the way we write business logic. It used to be, a business logic was highly embedded into our database of choice, but now we are disaggregating that using GitHub, CICD the whole DevOps tool chain. So data is catching up to the way applications are. >> We also have databases, that trans analytical databases that's a little bit of what the story is with MongoDB next week with adding more analytical capabilities. But I think companies that talk about that are always careful to couch it as operational analytics, not the warehouse level workloads. So we're making progress, but I think there's always going to be, or there will long be a separate analytical data platform. >> Until data mesh takes over. (all laughing) Not opening a can of worms. >> Well, but wait, I know it's out of scope here, but wouldn't data mesh say, hey, do take your best of breed to Doug's earlier point. You can't be best of breed at everything, wouldn't data mesh advocate, data lakes do your data lake thing, data warehouse, do your data lake, then you're just a node on the mesh. (Tony laughs) Now you need separate data stores and you need separate teams. >> To my point. >> I think, I mean, put it this way. (laughs) Data mesh itself is a logical view of the world. The data mesh is not necessarily on the lake or on the warehouse. I think for me, the fear there is more in terms of, the silos of governance that could happen and the silo views of the world, how we redefine. And that's why and I want to go back to something what Sanjeev said, which is that it's going to be raising the importance of the semantic layer. Now does Snowflake that opens a couple of Pandora's boxes here, which is one, does Snowflake dare go into that space or do they risk basically alienating basically their partner ecosystem, which is a key part of their whole appeal, which is best of breed. They're kind of the same situation that Informatica was where in the early 2000s, when Informatica briefly flirted with analytic applications and realized that was not a good idea, need to redouble down on their core, which was data integration. The other thing though, that raises the importance of and this is where the best of breed comes in, is the data fabric. My contention is that and whether you use employee data mesh practice or not, if you do employee data mesh, you need data fabric. If you deploy data fabric, you don't necessarily need to practice data mesh. 
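A short aside to ground Sanjeev's earlier point about making KPI definitions fungible: the heart of a semantic layer is that a calculation is declared once, independent of any engine or BI tool, and compiled into whatever SQL is needed. The toy sketch below is not any particular product, just an illustration of that separation.

```python
# Toy illustration of an engine-independent metric definition. Real semantic
# layers are far richer; this only shows the split between the business
# definition and the SQL that eventually runs it.
from dataclasses import dataclass, field

@dataclass
class Metric:
    name: str
    expression: str                 # the business calculation, defined once
    source_table: str
    dimensions: list = field(default_factory=list)

    def to_sql(self, dialect: str = "ansi") -> str:
        """Compile the one definition into SQL for whichever engine asks."""
        dims = ", ".join(self.dimensions)
        select_dims = f"{dims}, " if dims else ""
        group_by = f" GROUP BY {dims}" if dims else ""
        # Dialect-specific quoting and functions would be handled here.
        return (f"SELECT {select_dims}{self.expression} AS {self.name} "
                f"FROM {self.source_table}{group_by}")

net_revenue = Metric(
    name="net_revenue",
    expression="SUM(gross_amount - discounts - returns)",
    source_table="sales.orders",
    dimensions=["region", "fiscal_quarter"],
)

print(net_revenue.to_sql())          # same definition, any engine or BI tool
```

Products such as dbt metrics or LookML carry this much further, but the division of labor is the same: business logic kept in one governed place, execution wherever the data lives.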
But data fabric at its core and admittedly it's a category that's still very poorly defined and evolving, but at its core, we're talking about a common meta data back plane, something that we used to talk about with master data management, this would be something that would be more what I would say basically, mutable, that would be more evolving, basically using, let's say, machine learning to kind of, so that we don't have to predefine rules or predefine what the world looks like. But so I think in the long run, what this really means is that whichever way we implement on whichever physical platform we implement, we need to all be speaking the same metadata language. And I think at the end of the day, regardless of whether it's a lake, warehouse or a lakehouse, we need common metadata. >> Doug, can I come back to something you pointed out? That those talking about bringing analytic and transaction databases together, you had talked about operationalizing those and the caution there. Educate me on MySQL HeatWave. I was surprised when Oracle put so much effort in that, and you may or may not be familiar with it, but a lot of folks have talked about that. Now it's got nowhere in the market, that no market share, but a lot of we've seen these benchmarks from Oracle. How real is that bringing together those two worlds and eliminating ETL? >> Yeah, I have to defer on that one. That's my colleague, Holger Mueller. He wrote the report on that. He's way deep on it and I'm not going to mock him. >> I wonder if that is something, how real that is or if it's just Oracle marketing, anybody have any thoughts on that? >> I'm pretty familiar with HeatWave. It's essentially Oracle doing what, I mean, there's kind of a parallel with what Google's doing with AlloyDB. It's an operational database that will have some embedded analytics. And it's also something which I expect to start seeing with MongoDB. And I think basically, Doug and Sanjeev were kind of referring to this before about basically kind of like the operational analytics, that are basically embedded within an operational database. The idea here is that the last thing you want to do with an operational database is slow it down. So you're not going to be doing very complex deep learning or anything like that, but you might be doing things like classification, you might be doing some predictives. In other words, we've just concluded a transaction with this customer, but was it less than what we were expecting? What does that mean in terms of, is this customer likely to turn? I think we're going to be seeing a lot of that. And I think that's what a lot of what MySQL HeatWave is all about. Whether Oracle has any presence in the market now it's still a pretty new announcement, but the other thing that kind of goes against Oracle, (laughs) that they had to battle against is that even though they own MySQL and run the open source project, everybody else, in terms of the actual commercial implementation it's associated with everybody else. And the popular perception has been that MySQL has been basically kind of like a sidelight for Oracle. And so it's on Oracles shoulders to prove that they're damn serious about it. >> There's no coincidence that MariaDB was launched the day that Oracle acquired Sun. Sanjeev, I wonder if we could come back to a topic that we discussed earlier, which is this notion of consumption, obviously Wall Street's very concerned about it. Snowflake dropped prices last week. 
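Since consumption pricing comes up next, one concrete example of the guardrails the panel calls for: Snowflake resource monitors can cap credit burn and suspend a warehouse before a runaway query turns into a surprise bill. The sketch below is abbreviated, the names are placeholders, and the exact options should be checked against current documentation.

```python
# Hedged sketch: putting a spending guardrail on a Snowflake warehouse with a
# resource monitor. Names are placeholders; option syntax is abbreviated.
import snowflake.connector

conn = snowflake.connector.connect(
    account="my_account", user="admin", password="********", role="ACCOUNTADMIN"
)
cur = conn.cursor()

# Cap the warehouse at 100 credits per month, warn at 80%, suspend at 100%.
cur.execute("""
    CREATE OR REPLACE RESOURCE MONITOR analytics_budget
      WITH CREDIT_QUOTA = 100
      FREQUENCY = MONTHLY
      START_TIMESTAMP = IMMEDIATELY
      TRIGGERS ON 80 PERCENT DO NOTIFY
               ON 100 PERCENT DO SUSPEND
""")
cur.execute("ALTER WAREHOUSE analytics_wh SET RESOURCE_MONITOR = analytics_budget")
conn.close()
```

Equivalent guardrails exist elsewhere (budgets, quotas, per-query byte limits), and the FinOps discussion that follows is largely about making someone responsible for setting them.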
I've always felt like, hey, the consumption model is the right model. I can dial it down in when I need to, of course, the street freaks out. What are your thoughts on just pricing, the consumption model? What's the right model for companies, for customers? >> Consumption model is here to stay. What I would like to see, and I think is an ideal situation and actually plays into the lakehouse concept is that, I have my data in some open format, maybe it's Parquet or CSV or JSON, Avro, and I can bring whatever engine is the best engine for my workloads, bring it on, pay for consumption, and then shut it down. And by the way, that could be Cloudera. We don't talk about Cloudera very much, but it could be one business unit wants to use Athena. Another business unit wants to use some other Trino let's say or Dremio. So every business unit is working on the same data set, see that's critical, but that data set is maybe in their VPC and they bring any compute engine, you pay for the use, shut it down. That then you're getting value and you're only paying for consumption. It's not like, I left a cluster running by mistake, so there have to be guardrails. The reason FinOps is so big is because it's very easy for me to run a Cartesian joint in the cloud and get a $10,000 bill. >> This looks like it's been a sort of a victim of its own success in some ways, they made it so easy to spin up single note instances, multi note instances. And back in the day when compute was scarce and costly, those database engines optimized every last bit so they could get as much workload as possible out of every instance. Today, it's really easy to spin up a new node, a new multi node cluster. So that freedom has meant many more nodes that aren't necessarily getting that utilization. So Snowflake has been doing a lot to add reporting, monitoring, dashboards around the utilization of all the nodes and multi node instances that have spun up. And meanwhile, we're seeing some of the traditional on-prem databases that are moving into the cloud, trying to offer that freedom. And I think they're going to have that same discovery that the cost surprises are going to follow as they make it easy to spin up new instances. >> Yeah, a lot of money went into this market over the last decade, separating compute from storage, moving to the cloud. I'm glad you mentioned Cloudera Sanjeev, 'cause they got it all started, the kind of big data movement. We don't talk about them that much. Sometimes I wonder if it's because when they merged Hortonworks and Cloudera, they dead ended both platforms, but then they did invest in a more modern platform. But what's the future of Cloudera? What are you seeing out there? >> Cloudera has a good product. I have to say the problem in our space is that there're way too many companies, there's way too much noise. We are expecting the end users to parse it out or we expecting analyst firms to boil it down. So I think marketing becomes a big problem. As far as technology is concerned, I think Cloudera did turn their selves around and Tony, I know you, you talked to them quite frequently. I think they have quite a comprehensive offering for a long time actually. They've created Kudu, so they got operational, they have Hadoop, they have an operational data warehouse, they're migrated to the cloud. They are in hybrid multi-cloud environment. Lot of cloud data warehouses are not hybrid. They're only in the cloud. >> Right. 
I think what Cloudera has done the most successful has been in the transition to the cloud and the fact that they're giving their customers more OnRamps to it, more hybrid OnRamps. So I give them a lot of credit there. They're also have been trying to position themselves as being the most price friendly in terms of that we will put more guardrails and governors on it. I mean, part of that could be spin. But on the other hand, they don't have the same vested interest in compute cycles as say, AWS would have with EMR. That being said, yes, Cloudera does it, I think its most powerful appeal so of that, it almost sounds in a way, I don't want to cast them as a legacy system. But the fact is they do have a huge landed legacy on-prem and still significant potential to land and expand that to the cloud. That being said, even though Cloudera is multifunction, I think it certainly has its strengths and weaknesses. And the fact this is that yes, Cloudera has an operational database or an operational data store with a kind of like the outgrowth of age base, but Cloudera is still based, primarily known for the deep analytics, the operational database nobody's going to buy Cloudera or Cloudera data platform strictly for the operational database. They may use it as an add-on, just in the same way that a lot of customers have used let's say Teradata basically to do some machine learning or let's say, Snowflake to parse through JSON. Again, it's not an indictment or anything like that, but the fact is obviously they do have their strengths and their weaknesses. I think their greatest opportunity is with their existing base because that base has a lot invested and vested. And the fact is they do have a hybrid path that a lot of the others lack. >> And of course being on the quarterly shock clock was not a good place to be under the microscope for Cloudera and now they at least can refactor the business accordingly. I'm glad you mentioned hybrid too. We saw Snowflake last month, did a deal with Dell whereby non-native Snowflake data could access on-prem object store from Dell. They announced a similar thing with pure storage. What do you guys make of that? Is that just... How significant will that be? Will customers actually do that? I think they're using either materialized views or extended tables. >> There are data rated and residency requirements. There are desires to have these platforms in your own data center. And finally they capitulated, I mean, Frank Klutman is famous for saying to be very focused and earlier, not many months ago, they called the going on-prem as a distraction, but clearly there's enough demand and certainly government contracts any company that has data residency requirements, it's a real need. So they finally addressed it. >> Yeah, I'll bet dollars to donuts, there was an EBC session and some big customer said, if you don't do this, we ain't doing business with you. And that was like, okay, we'll do it. >> So Dave, I have to say, earlier on you had brought this point, how Frank Klutman was poo-pooing data science workloads. On your show, about a year or so ago, he said, we are never going to on-prem. He burnt that bridge. (Tony laughs) That was on your show. >> I remember exactly the statement because it was interesting. He said, we're never going to do the halfway house. And I think what he meant is we're not going to bring the Snowflake architecture to run on-prem because it defeats the elasticity of the cloud. So this was kind of a capitulation in a way. 
But I think it still preserves his original intent, sort of, I don't know. >> The point here is that every vendor will poo-poo whatever they don't have until they do have it. >> Yes. >> And then it'll be like, oh, we're all in, we've always been doing this, we have always supported this, and now we're doing it better than others. >> Look, it was the same type of shock wave that we felt when AWS, at the last moment at one of their re:Invents, said, oh, by the way, we're going to introduce Outposts. The analyst group is typically pre-briefed about a week or two ahead under NDA, and that was not part of it. They just casually dropped that in the analyst session. It's like, you could have heard the sound of lots of analysts changing their diapers at that point. >> (laughs) I remember that. And props to Andy Jassy, who once, many times actually, told us, never say never when it comes to AWS. So guys, I know we've got to run, we've got some hard stops. Maybe you could each give us your final thoughts. Doug, start us off and then-- >> Sure. Well, we've got the Snowflake Summit coming up. I'll be looking for customers that are really doing data science, that are really employing Python through Snowflake, through Snowpark. And then a couple weeks later, we've got Databricks with their Data and AI Summit in San Francisco. I'll be looking for customers that are really doing considerable BI workloads. Last year I did a market overview of this analytical data platform space, 14 vendors, eight of them claiming to support lakehouse, both sides of the camp. The top Databricks customer that they could cite was unnamed; it had 32 concurrent users doing 15,000 queries per hour. That's good, but it's not up to the most demanding BI SQL workloads, and they acknowledged that and said they need to keep working on that. Snowflake, asked for their biggest data science customer, cited Kabura: 400 terabytes, 8,500 users, 400,000 data engineering jobs per day. I took the data engineering jobs to be probably SQL-centric, ETL-style transformation work. So I want to see the real use of Python, how much Snowpark has grown as a way to support data science. >> Great. Tony. >> Actually, of all things, I'll certainly also be looking for similar things to what Doug is saying, but kind of out of left field, I'm interested to see what MongoDB is going to start to say about operational analytics, 'cause they're into this conquer-the-world strategy: we can be all things to all people. Okay, if that's the case, what's going to be the case with putting in some inline analytics? What are you going to be doing with your query engine? So that's actually kind of an interesting thing we're looking for next week. >> Great. Sanjeev. >> So I'll be at MongoDB World, Snowflake, and Databricks, and very interested in seeing, but since Tony brought up MongoDB, I see that even the databases are shifting tremendously. They are addressing both the HTAP use case, online transactional and analytical. I'm also seeing that these databases started out, let's say in the case of MySQL HeatWave as relational, or in MongoDB as document, but now they've added graph, they've added time series, they've added geospatial, and they just keep adding more and more data structures, really making these databases multifunctional. So very interesting. >> It gets back to our discussion of best of breed versus all in one. 
And of course it's likely Mongo's path, or part of their strategy, is through developers. They're very developer-focused, so we'll be looking for that. And guys, I'll be there as well. I'm hoping we maybe have some extra time on theCUBE, so please stop by and we can chat a little bit. Guys, as always, fantastic. Thank you so much, Doug, Tony, Sanjeev, and let's do this again. >> It's been a pleasure. >> All right, and thank you for watching. This is Dave Vellante for theCUBE and the excellent analysts. We'll see you next time. (upbeat music)
Isha Sharma, Dremio | CUBE Conversation | March 2021
>> Well, welcome to this special CUBE Conversation. I'm John Furrier with theCUBE, your host. We're here with Isha Sharma, director of product management for Dremio. We're going to talk about data, data lakes, the future of data, and how it works with cloud and the new applications. Isha, thanks for joining me. >> Thank you for having me, John. >> You guys are a cutting-edge startup. You've got a lot of good action going on. You're kind of the new guard, as Andy Jassy at AWS always puts it, versus the old guard incumbents. You're the new breed, doing the new stuff around data lakes and also making data accessible for customers. What is that all about? Take us through it: what is Dremio? >> So Dremio is the data lake service that essentially allows you to very simply run SQL queries directly on your data lake storage, without having to make any of those copies that everybody's going on about all the time. So you're really able to get that fast time to value without this long process of: let's put in a request to my data team, let's make all of those copies, and then finally get this very reduced scope of your data, and still have to go back to your data team every time you need a change to it. So Dremio is bringing you that fast time to value with a no-copy data strategy, and really providing you the flexibility to keep your data in your data lake storage as the single source of truth. >> You know, for the past 10 years we've watched, with theCUBE's coverage since we've been doing this program and in the community, from the early days of Hadoop to now, and we've seen the trials and tribulations of ETL and data warehousing. We've seen the starts and stops, and we've seen that the most successful formula has been to store everything. And then, you know, ease of use became a challenge: I don't want to have to hire really high-powered engineers to manage certain kinds of clusters. Now cloud comes into the mix, I've got on-premises storage, but the notion of a data lake became hugely popular because it became a phrase that meant store everything, and it meant different things to different people. And since then, teams of people have been hired to be the data teams. So it's kind of new. So I've got to ask you, what is the challenge of these data teams? What do they look like? What's the psychology going on with some of the people on these teams? What problems are they solving? Because, you know, they're becoming data-full. Take us through what's going on with data teams. >> To your point, the volume and the variety of data are growing exponentially every day. There's really no end to it, right? And companies are looking to get their hands on as much data as they possibly can. So that puts data teams in a position of: how do I provide access to as many users as easily as possible, that self-service experience for data? And data democratization, as much of a great concept as it is in theory, comes with its own challenges, in terms of all of those copies that end up being created to provide the quote-unquote self-service experience. And then with all of these copies comes the cost to store all of them. So you've just added a tremendous amount of complexity and delayed your time to value significantly. >> You mentioned self-service. It's one of those things that seems like a moving train. 
Everyone I talked to is like, oh, self-service is the holy grail, we've got to get to self-service. And then you get to some self-service, and then you've got to rethink it because more stuff keeps changing. So I have to ask, in that capacity you've got data architects and you've got analysts, the customers of the data. What's the relationship between those two? Who gives and who gets, who drives it, who leans in? Does the analyst feed the requirements to the architect, who sets up the boundaries? How is that relationship? Can you take us through how you guys view the relationship between the data architect and the data analyst? >> Sure. So you have the data architect, the data team, that's actually responsible for providing data access at the end of the day, right? They're the people that have the data democratization requirement on them. And so they've created these copies, a tremendous amount of copies. A lot of the time the data lake storage is that source of truth, but you're copying your data into a data warehouse. And then what happens is, your end users, your analysts, they all want different types of data, they want different views of this data. So there's a tremendous amount of personalized copies that the architects end up creating. And then on top of it, there's performance. We need to get everything back in a timely manner, otherwise what's the point, right? Real-time analytics. So there are all these performance-related copies, whether that be aggregate tables or, you know, BI extracts and cubes, all of that fun stuff. And so the architect is the one that's responsible for creating all of those. That's what they have to do to provide access to the analyst. And then, like I'm saying, when we need an update to that data set, when I discover that I have a new data set that I need to join with an existing one, I have the analyst go to the data architect and say, hey, by the way, I need this new data set, can you make this usable for me, or can you provide me access? And so then the data architect has to process that request. And so again, coming back to all these copies that have been created, the data architect goes through a tremendous amount of work and has to do this over and over again to actually make the data available to the analyst. It's a cycle that goes on between the two. >> Yeah, it's an interesting dynamic. It's a power dynamic, but also trying to get to the innovation. I've got to ask you, some people are saying that data copies are the major obstacle to democratization. How do you respond to that? What's your view? >> They absolutely are. Data copies are the complete opposite of data democratization. There's no aspect of self-service there, which is exactly what you're looking for with data democratization. Because of those copies, how do you manage them? How do you govern them? And like I was saying, when somebody needs a new data set or an update to one, they have to go back to that data team, and there goes that self-service. Data copies actually create a bottleneck, because it all comes back to that data team that has to keep working through the requests coming in from their analysts. So data copies and data democratization really are at odds. >> You know, I remember talking to Dave Vellante at a CUBE event two years ago, and he said infrastructure as code was the big DevOps movement. 
And we felt that DataOps would be something similar, data as code, where you don't have to think about it. So you're kind of getting to this idea that copies are bad because they hold back that self-service, and this modern era is looking for more programmability with data. Kind of what you're teasing out here is that that's the modern architecture. Is that how you see it? How do you see a modern data architecture? >> Yeah, so the data architecture has evolved significantly in the last several years, right? We started with traditional data warehouses and the traditional data lake with Hadoop, where storage and compute were totally tightly coupled. Then we moved on to cloud data warehouses, where there was a separation of compute and storage, and that provided a little more flexibility. But with the modern data architecture now, with cloud data lakes, you have this aspect of separating not only storage and compute, but also compute and data. So that creates a separate tier for data altogether. What does that look like? You have your data, your files, in storage, S3, ADLS, whatever it may be. And of course it's an open format, right? And on top of that, thanks to technologies like Apache Iceberg and Delta Lake, there's this ability to give your files, your data, a table structure. And so that starts to bring the capabilities that a data warehouse was providing to the data lake. Thanks to these, you have the ability to do transactions, record-level mutations, versioning, things that were missing completely from a data lake architecture before. And so introducing that data tier, having that separation of compute and data, really accelerates the ability to get that time to value, because you're keeping your data in the data lake storage at the end of the day. >> And it's interesting, you see all the hot companies tend to have that kind of mindset and architecture, and it's creating new opportunities and a ton of white space. So I have to ask you, how does Dremio fit into this? Because you guys are playing in this new wave with data, and it's growing extremely fast, it's moving fast. You've got, again, the edge developing, data coming in at the edge. You've got hybrid and multi-cloud environments on the horizon, this ultimate multi-cloud, and data in real time across multiple clouds is the next kind of area people are focused on. What's the role of Dremio in all this? Take us through that. >> Yeah. So Dremio provides, again, like I said, this data lake service, and we're not referring to just storage or Hadoop. When we say data lake, we're talking about an entire solution. So you keep your data in your data lake storage, and then on top of that, with the integrations that Dremio has with Apache Iceberg and Delta Lake, we do provide that data tier that I was talking about. And so you've given your data this table structure, and now you can operate on it like you would in a data warehouse. So there's really no need to move your data from a data lake to a data warehouse, again keeping that data lake as the source of truth. 
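As a rough illustration of the table-structure point made above, here is a sketch using the open-source deltalake Python package (the delta-rs bindings), which is one of several ways to get that behavior; it is not Dremio's own API, and the path and column names are hypothetical. The idea is that plain files on lake storage pick up transactional commits and versioning, with record-level updates and deletes handled the same way through the table's transaction log.

```python
import pandas as pd
from deltalake import DeltaTable, write_deltalake  # delta-rs bindings

PATH = "./lake/customers"   # hypothetical table location; could be s3:// or abfss:// in practice

# Initial load: plain Parquet files underneath, plus a transaction log that
# gives them table semantics (schema, atomic commits).
write_deltalake(PATH, pd.DataFrame({"id": [1, 2], "tier": ["gold", "silver"]}))

# A second, independent commit: an append rather than a file-level overwrite.
write_deltalake(PATH, pd.DataFrame({"id": [3], "tier": ["bronze"]}), mode="append")

table = DeltaTable(PATH)
print(table.version())      # latest version of the table
print(table.to_pandas())    # read the current snapshot

# Versioning / time travel: read the table as of the first commit.
print(DeltaTable(PATH, version=0).to_pandas())
```

Any engine that understands the table format can then query the same files in place, which is the "separate data tier" idea: the data stays in lake storage while compute comes and goes.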
And then on top of that, when we talk about copies, personalized copies, performance-related copies, like I was saying, you've created so much complexity. With Dremio, you don't do that. When it comes to personalized copies, we've got the semantic layer, and that's a very key aspect of Dremio, where you can provide as many views of the data as you want without having to make any copies. So it really accelerates that data democratization story. >> So it's the no-copy strategy. Dremio, you guys are on it: no copies, keep the semantic layer, have that be horizontal across whatever environment. And can applications tap into this? How do you guys integrate into apps, if I'm an app developer, for instance? How does that work? >> Of course. That's one of the most important use cases, in the sense that when there's an application, or even a BI client or some other tool, tapping into the data in S3 or ADLS, a lot of people see performance degradation. With Dremio, that's typically not the case. We've got Arrow Flight integrated into Dremio, and it's a key component as well. That brings so much ease to running dashboards off of that, running your analytics apps off of that, because Arrow Flight can deliver 20 times the performance that ODBC could. So coming back to the no-copy data strategy, there are none of those local copies anymore that you needed to make. 
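To illustrate the Arrow Flight point, here is a hedged sketch of a generic pyarrow.flight client; the endpoint, port, SQL text, and the absence of authentication are placeholders rather than Dremio's documented connection details, which vary by deployment. What matters is that results stream back as columnar Arrow record batches instead of being serialized row by row the way an ODBC driver would.

```python
from pyarrow import flight

# Placeholder endpoint and query; a real deployment's host, port, TLS, and
# authentication flow would differ and are not specified here.
client = flight.FlightClient("grpc+tcp://dremio.example.internal:32010")

descriptor = flight.FlightDescriptor.for_command(
    b"SELECT region, SUM(amount) AS revenue FROM sales GROUP BY region"
)

# Ask the server how to fetch the result, then stream it as Arrow record batches.
info = client.get_flight_info(descriptor)
reader = client.do_get(info.endpoints[0].ticket)

table = reader.read_all()   # a columnar pyarrow.Table, no row-by-row decode
df = table.to_pandas()      # hand straight to a dashboard or analytics app
print(df.head())
```

Because the wire format is already Arrow, the client skips the transpose-and-convert work that row-oriented protocols impose, which is where the claimed speedup for BI and app workloads comes from.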
Is there a best practice on kind of some guard rails around getting going, how do you, how do you advise your customers who want to get it going? >>So how we advise our customers is again, plugin put your, put your data in that data Lake. A lot of them already have three TLS in place. And getting started with Bermeo is really easy. I would say I did it for the first time and it took a matter of minutes if not less. And so what you're doing with Dremio is connecting data directly to that data source and then creating a semantic layer on top. So you bring together a bunch of data. That's sitting in your data Lake, you know, if that sales data and Sophia, and we give you a really streamlined way to say together, the, you know, last, however, we go back in time, create a view on top of all of that. If you have that structured in folders as great, we will provide you a way to create one view on top of all of that, as opposed to having a view for every day or whatnot. And so again, that semantic layer really comes in handy when you're trying to, as the architect provide access to this data Lake. And then as the user who just, just interacts with the data as, as the views are provided to them, there's really, uh, there's a whole lot of transparency there, and it's really easy to get up and running with drumming. >>I'm looking forward to it. I got to finally ask the question is how do I get started? How do people engage with you guys? Is it, is it a freemium? Is it a cloud service? What's the requirements? What are some of the ways that people can engage and work with you guys? >>Yeah, so we get started, uh, on our website at dot com. And speaking of self-service, we've got a virtual lab at dremio.com/labs that you can get started with that gives you a product tour and even gives you a getting started, walk through the tissue through your first query so that you can see how well it works. And in addition to that, we've got a free trial of Dremio available on AWS marketplace. >>Awesome. Net marketplace is a good place to download stuff. So can I ask you a personal question, Isha? Um, you're the director of product management. You get to see inside the kitchen where everyone's making the, making the product. You also got the customer relationships out there looking at product market fit, as it evolves, customer's requirements evolve. What's some of the cool things that you've seen in this space. That's just interesting to you that either you kind of expected or maybe some surprises, what's the coolest thing you've seen come out of this new data environment we're living in. >>I think just the ability to the way things have evolved, right? It used to be data Lake or data warehouse, and you pick one, you probably have both, but you're not like reaching either to their highest potential. Now you've got, this is coming together of both of them. I think it's been fantastic to see how you've got technology is like a iceberg and Delta Lake and bringing those two things together. And you know, you're in your data Lake and it's great in terms of cost and storage and all of that. But now you're able to have so much flexibility in terms of some of those data warehouse capabilities. And on top of that with technologies like Dremio, and just in general, this open format concept, you're, you're never locked in with a particular vendor with a particular format. You're not locking yourself out of a technology that you don't even know exists yet. And thinking in the past, you were always going to end up there. 
You always ended up putting your data in something where it was going to be difficult to change it, to get it out. But now you have so much flexibility with the open architecture that's coming. What's the DNA like of the >>Culture at Treme. And obviously you've got a cutting edge. We're in a big, hot wave data. You're enabling a lot of value. Uh, what's the, what's it like there at Jemena? What do you guys strive for? What's the purpose? What's the, what's the DNA of the culture. >>There's a lot of excitement in terms of getting customers to this flexibility, to get them out of things they're locked into really in providing them with accessibility to their data, right? This data access data democratization concept to make that actually happen so that, you know, time to value is a key thing. You want to derive insights out of your, out of your data. And everybody, I drove you in super excited and charging towards that, >>Unlocking that value. That's awesome. Aisha, thank you for coming on the cube conversation. Great to see you. Thanks for coming on. Appreciate it. He's just Sharma director of product management. Dremio here inside the cube. I'm John for your host. Thanks for watching.