Analyst Predictions 2023: The Future of Data Management
(upbeat music) >> Hello, this is Dave Valente with theCUBE, and one of the most gratifying aspects of my role as a host of "theCUBE TV" is I get to cover a wide range of topics. And quite often, we're able to bring to our program a level of expertise that allows us to more deeply explore and unpack some of the topics that we cover throughout the year. And one of our favorite topics, of course, is data. Now, in 2021, after being in isolation for the better part of two years, a group of industry analysts met up at AWS re:Invent and started a collaboration to look at the trends in data and predict what some likely outcomes will be for the coming year. And it resulted in a very popular session that we had last year focused on the future of data management. And I'm very excited and pleased to tell you that the 2023 edition of that predictions episode is back, and with me are five outstanding market analyst, Sanjeev Mohan of SanjMo, Tony Baer of dbInsight, Carl Olofson from IDC, Dave Menninger from Ventana Research, and Doug Henschen, VP and Principal Analyst at Constellation Research. Now, what is it that we're calling you, guys? A data pack like the rat pack? No, no, no, no, that's not it. It's the data crowd, the data crowd, and the crowd includes some of the best minds in the data analyst community. They'll discuss how data management is evolving and what listeners should prepare for in 2023. Guys, welcome back. Great to see you. >> Good to be here. >> Thank you. >> Thanks, Dave. (Tony and Dave faintly speaks) >> All right, before we get into 2023 predictions, we thought it'd be good to do a look back at how we did in 2022 and give a transparent assessment of those predictions. So, let's get right into it. We're going to bring these up here, the predictions from 2022, they're color-coded red, yellow, and green to signify the degree of accuracy. And I'm pleased to report there's no red. Well, maybe some of you will want to debate that grading system. But as always, we want to be open, so you can decide for yourselves. So, we're going to ask each analyst to review their 2022 prediction and explain their rating and what evidence they have that led them to their conclusion. So, Sanjeev, please kick it off. Your prediction was data governance becomes key. I know that's going to knock you guys over, but elaborate, because you had more detail when you double click on that. >> Yeah, absolutely. Thank you so much, Dave, for having us on the show today. And we self-graded ourselves. I could have very easily made my prediction from last year green, but I mentioned why I left it as yellow. I totally fully believe that data governance was in a renaissance in 2022. And why do I say that? You have to look no further than AWS launching its own data catalog called DataZone. Before that, mid-year, we saw Unity Catalog from Databricks went GA. So, overall, I saw there was tremendous movement. When you see these big players launching a new data catalog, you know that they want to be in this space. And this space is highly critical to everything that I feel we will talk about in today's call. Also, if you look at established players, I spoke at Collibra's conference, data.world, work closely with Alation, Informatica, a bunch of other companies, they all added tremendous new capabilities. So, it did become key. The reason I left it as yellow is because I had made a prediction that Collibra would go IPO, and it did not. And I don't think anyone is going IPO right now. The market is really, really down, the funding in VC IPO market. But other than that, data governance had a banner year in 2022. >> Yeah. Well, thank you for that. And of course, you saw data clean rooms being announced at AWS re:Invent, so more evidence. And I like how the fact that you included in your predictions some things that were binary, so you dinged yourself there. So, good job. Okay, Tony Baer, you're up next. Data mesh hits reality check. As you see here, you've given yourself a bright green thumbs up. (Tony laughing) Okay. Let's hear why you feel that was the case. What do you mean by reality check? >> Okay. Thanks, Dave, for having us back again. This is something I just wrote and just tried to get away from, and this just a topic just won't go away. I did speak with a number of folks, early adopters and non-adopters during the year. And I did find that basically that it pretty much validated what I was expecting, which was that there was a lot more, this has now become a front burner issue. And if I had any doubt in my mind, the evidence I would point to is what was originally intended to be a throwaway post on LinkedIn, which I just quickly scribbled down the night before leaving for re:Invent. I was packing at the time, and for some reason, I was doing Google search on data mesh. And I happened to have tripped across this ridiculous article, I will not say where, because it doesn't deserve any publicity, about the eight (Dave laughing) best data mesh software companies of 2022. (Tony laughing) One of my predictions was that you'd see data mesh washing. And I just quickly just hopped on that maybe three sentences and wrote it at about a couple minutes saying this is hogwash, essentially. (laughs) And that just reun... And then, I left for re:Invent. And the next night, when I got into my Vegas hotel room, I clicked on my computer. I saw a 15,000 hits on that post, which was the most hits of any single post I put all year. And the responses were wildly pro and con. So, it pretty much validates my expectation in that data mesh really did hit a lot more scrutiny over this past year. >> Yeah, thank you for that. I remember that article. I remember rolling my eyes when I saw it, and then I recently, (Tony laughing) I talked to Walmart and they actually invoked Martin Fowler and they said that they're working through their data mesh. So, it takes a really lot of thought, and it really, as we've talked about, is really as much an organizational construct. You're not buying data mesh >> Bingo. >> to your point. Okay. Thank you, Tony. Carl Olofson, here we go. You've graded yourself a yellow in the prediction of graph databases. Take off. Please elaborate. >> Yeah, sure. So, I realized in looking at the prediction that it seemed to imply that graph databases could be a major factor in the data world in 2022, which obviously didn't become the case. It was an error on my part in that I should have said it in the right context. It's really a three to five-year time period that graph databases will really become significant, because they still need accepted methodologies that can be applied in a business context as well as proper tools in order for people to be able to use them seriously. But I stand by the idea that it is taking off, because for one thing, Neo4j, which is the leading independent graph database provider, had a very good year. And also, we're seeing interesting developments in terms of things like AWS with Neptune and with Oracle providing graph support in Oracle database this past year. Those things are, as I said, growing gradually. There are other companies like TigerGraph and so forth, that deserve watching as well. But as far as becoming mainstream, it's going to be a few years before we get all the elements together to make that happen. Like any new technology, you have to create an environment in which ordinary people without a whole ton of technical training can actually apply the technology to solve business problems. >> Yeah, thank you for that. These specialized databases, graph databases, time series databases, you see them embedded into mainstream data platforms, but there's a place for these specialized databases, I would suspect we're going to see new types of databases emerge with all this cloud sprawl that we have and maybe to the edge. >> Well, part of it is that it's not as specialized as you might think it. You can apply graphs to great many workloads and use cases. It's just that people have yet to fully explore and discover what those are. >> Yeah. >> And so, it's going to be a process. (laughs) >> All right, Dave Menninger, streaming data permeates the landscape. You gave yourself a yellow. Why? >> Well, I couldn't think of a appropriate combination of yellow and green. Maybe I should have used chartreuse, (Dave laughing) but I was probably a little hard on myself making it yellow. This is another type of specialized data processing like Carl was talking about graph databases is a stream processing, and nearly every data platform offers streaming capabilities now. Often, it's based on Kafka. If you look at Confluent, their revenues have grown at more than 50%, continue to grow at more than 50% a year. They're expected to do more than half a billion dollars in revenue this year. But the thing that hasn't happened yet, and to be honest, they didn't necessarily expect it to happen in one year, is that streaming hasn't become the default way in which we deal with data. It's still a sidecar to data at rest. And I do expect that we'll continue to see streaming become more and more mainstream. I do expect perhaps in the five-year timeframe that we will first deal with data as streaming and then at rest, but the worlds are starting to merge. And we even see some vendors bringing products to market, such as K2View, Hazelcast, and RisingWave Labs. So, in addition to all those core data platform vendors adding these capabilities, there are new vendors approaching this market as well. >> I like the tough grading system, and it's not trivial. And when you talk to practitioners doing this stuff, there's still some complications in the data pipeline. And so, but I think, you're right, it probably was a yellow plus. Doug Henschen, data lakehouses will emerge as dominant. When you talk to people about lakehouses, practitioners, they all use that term. They certainly use the term data lake, but now, they're using lakehouse more and more. What's your thoughts on here? Why the green? What's your evidence there? >> Well, I think, I was accurate. I spoke about it specifically as something that vendors would be pursuing. And we saw yet more lakehouse advocacy in 2022. Google introduced its BigLake service alongside BigQuery. Salesforce introduced Genie, which is really a lakehouse architecture. And it was a safe prediction to say vendors are going to be pursuing this in that AWS, Cloudera, Databricks, Microsoft, Oracle, SAP, Salesforce now, IBM, all advocate this idea of a single platform for all of your data. Now, the trend was also supported in 2023, in that we saw a big embrace of Apache Iceberg in 2022. That's a structured table format. It's used with these lakehouse platforms. It's open, so it ensures portability and it also ensures performance. And that's a structured table that helps with the warehouse side performance. But among those announcements, Snowflake, Google, Cloud Era, SAP, Salesforce, IBM, all embraced Iceberg. But keep in mind, again, I'm talking about this as something that vendors are pursuing as their approach. So, they're advocating end users. It's very cutting edge. I'd say the top, leading edge, 5% of of companies have really embraced the lakehouse. I think, we're now seeing the fast followers, the next 20 to 25% of firms embracing this idea and embracing a lakehouse architecture. I recall Christian Kleinerman at the big Snowflake event last summer, making the announcement about Iceberg, and he asked for a show of hands for any of you in the audience at the keynote, have you heard of Iceberg? And just a smattering of hands went up. So, the vendors are ahead of the curve. They're pushing this trend, and we're now seeing a little bit more mainstream uptake. >> Good. Doug, I was there. It was you, me, and I think, two other hands were up. That was just humorous. (Doug laughing) All right, well, so I liked the fact that we had some yellow and some green. When you think about these things, there's the prediction itself. Did it come true or not? There are the sub predictions that you guys make, and of course, the degree of difficulty. So, thank you for that open assessment. All right, let's get into the 2023 predictions. Let's bring up the predictions. Sanjeev, you're going first. You've got a prediction around unified metadata. What's the prediction, please? >> So, my prediction is that metadata space is currently a mess. It needs to get unified. There are too many use cases of metadata, which are being addressed by disparate systems. For example, data quality has become really big in the last couple of years, data observability, the whole catalog space is actually, people don't like to use the word data catalog anymore, because data catalog sounds like it's a catalog, a museum, if you may, of metadata that you go and admire. So, what I'm saying is that in 2023, we will see that metadata will become the driving force behind things like data ops, things like orchestration of tasks using metadata, not rules. Not saying that if this fails, then do this, if this succeeds, go do that. But it's like getting to the metadata level, and then making a decision as to what to orchestrate, what to automate, how to do data quality check, data observability. So, this space is starting to gel, and I see there'll be more maturation in the metadata space. Even security privacy, some of these topics, which are handled separately. And I'm just talking about data security and data privacy. I'm not talking about infrastructure security. These also need to merge into a unified metadata management piece with some knowledge graph, semantic layer on top, so you can do analytics on it. So, it's no longer something that sits on the side, it's limited in its scope. It is actually the very engine, the very glue that is going to connect data producers and consumers. >> Great. Thank you for that. Doug. Doug Henschen, any thoughts on what Sanjeev just said? Do you agree? Do you disagree? >> Well, I agree with many aspects of what he says. I think, there's a huge opportunity for consolidation and streamlining of these as aspects of governance. Last year, Sanjeev, you said something like, we'll see more people using catalogs than BI. And I have to disagree. I don't think this is a category that's headed for mainstream adoption. It's a behind the scenes activity for the wonky few, or better yet, companies want machine learning and automation to take care of these messy details. We've seen these waves of management technologies, some of the latest data observability, customer data platform, but they failed to sweep away all the earlier investments in data quality and master data management. So, yes, I hope the latest tech offers, glimmers that there's going to be a better, cleaner way of addressing these things. But to my mind, the business leaders, including the CIO, only want to spend as much time and effort and money and resources on these sorts of things to avoid getting breached, ending up in headlines, getting fired or going to jail. So, vendors bring on the ML and AI smarts and the automation of these sorts of activities. >> So, if I may say something, the reason why we have this dichotomy between data catalog and the BI vendors is because data catalogs are very soon, not going to be standalone products, in my opinion. They're going to get embedded. So, when you use a BI tool, you'll actually use the catalog to find out what is it that you want to do, whether you are looking for data or you're looking for an existing dashboard. So, the catalog becomes embedded into the BI tool. >> Hey, Dave Menninger, sometimes you have some data in your back pocket. Do you have any stats (chuckles) on this topic? >> No, I'm glad you asked, because I'm going to... Now, data catalogs are something that's interesting. Sanjeev made a statement that data catalogs are falling out of favor. I don't care what you call them. They're valuable to organizations. Our research shows that organizations that have adequate data catalog technologies are three times more likely to express satisfaction with their analytics for just the reasons that Sanjeev was talking about. You can find what you want, you know you're getting the right information, you know whether or not it's trusted. So, those are good things. So, we expect to see the capabilities, whether it's embedded or separate. We expect to see those capabilities continue to permeate the market. >> And a lot of those catalogs are driven now by machine learning and things. So, they're learning from those patterns of usage by people when people use the data. (airy laughs) >> All right. Okay. Thank you, guys. All right. Let's move on to the next one. Tony Bear, let's bring up the predictions. You got something in here about the modern data stack. We need to rethink it. Is the modern data stack getting long at the tooth? Is it not so modern anymore? >> I think, in a way, it's got almost too modern. It's gotten too, I don't know if it's being long in the tooth, but it is getting long. The modern data stack, it's traditionally been defined as basically you have the data platform, which would be the operational database and the data warehouse. And in between, you have all the tools that are necessary to essentially get that data from the operational realm or the streaming realm for that matter into basically the data warehouse, or as we might be seeing more and more, the data lakehouse. And I think, what's important here is that, or I think, we have seen a lot of progress, and this would be in the cloud, is with the SaaS services. And especially you see that in the modern data stack, which is like all these players, not just the MongoDBs or the Oracles or the Amazons have their database platforms. You see they have the Informatica's, and all the other players there in Fivetrans have their own SaaS services. And within those SaaS services, you get a certain degree of simplicity, which is it takes all the housekeeping off the shoulders of the customers. That's a good thing. The problem is that what we're getting to unfortunately is what I would call lots of islands of simplicity, which means that it leads it (Dave laughing) to the customer to have to integrate or put all that stuff together. It's a complex tool chain. And so, what we really need to think about here, we have too many pieces. And going back to the discussion of catalogs, it's like we have so many catalogs out there, which one do we use? 'Cause chances are of most organizations do not rely on a single catalog at this point. What I'm calling on all the data providers or all the SaaS service providers, is to literally get it together and essentially make this modern data stack less of a stack, make it more of a blending of an end-to-end solution. And that can come in a number of different ways. Part of it is that we're data platform providers have been adding services that are adjacent. And there's some very good examples of this. We've seen progress over the past year or so. For instance, MongoDB integrating search. It's a very common, I guess, sort of tool that basically, that the applications that are developed on MongoDB use, so MongoDB then built it into the database rather than requiring an extra elastic search or open search stack. Amazon just... AWS just did the zero-ETL, which is a first step towards simplifying the process from going from Aurora to Redshift. You've seen same thing with Google, BigQuery integrating basically streaming pipelines. And you're seeing also a lot of movement in database machine learning. So, there's some good moves in this direction. I expect to see more than this year. Part of it's from basically the SaaS platform is adding some functionality. But I also see more importantly, because you're never going to get... This is like asking your data team and your developers, herding cats to standardizing the same tool. In most organizations, that is not going to happen. So, take a look at the most popular combinations of tools and start to come up with some pre-built integrations and pre-built orchestrations, and offer some promotional pricing, maybe not quite two for, but in other words, get two products for the price of two services or for the price of one and a half. I see a lot of potential for this. And it's to me, if the class was to simplify things, this is the next logical step and I expect to see more of this here. >> Yeah, and you see in Oracle, MySQL heat wave, yet another example of eliminating that ETL. Carl Olofson, today, if you think about the data stack and the application stack, they're largely separate. Do you have any thoughts on how that's going to play out? Does that play into this prediction? What do you think? >> Well, I think, that the... I really like Tony's phrase, islands of simplification. It really says (Tony chuckles) what's going on here, which is that all these different vendors you ask about, about how these stacks work. All these different vendors have their own stack vision. And you can... One application group is going to use one, and another application group is going to use another. And some people will say, let's go to, like you go to a Informatica conference and they say, we should be the center of your universe, but you can't connect everything in your universe to Informatica, so you need to use other things. So, the challenge is how do we make those things work together? As Tony has said, and I totally agree, we're never going to get to the point where people standardize on one organizing system. So, the alternative is to have metadata that can be shared amongst those systems and protocols that allow those systems to coordinate their operations. This is standard stuff. It's not easy. But the motive for the vendors is that they can become more active critical players in the enterprise. And of course, the motive for the customer is that things will run better and more completely. So, I've been looking at this in terms of two kinds of metadata. One is the meaning metadata, which says what data can be put together. The other is the operational metadata, which says basically where did it come from? Who created it? What's its current state? What's the security level? Et cetera, et cetera, et cetera. The good news is the operational stuff can actually be done automatically, whereas the meaning stuff requires some human intervention. And as we've already heard from, was it Doug, I think, people are disinclined to put a lot of definition into meaning metadata. So, that may be the harder one, but coordination is key. This problem has been with us forever, but with the addition of new data sources, with streaming data with data in different formats, the whole thing has, it's been like what a customer of mine used to say, "I understand your product can make my system run faster, but right now I just feel I'm putting my problems on roller skates. (chuckles) I don't need that to accelerate what's already not working." >> Excellent. Okay, Carl, let's stay with you. I remember in the early days of the big data movement, Hadoop movement, NoSQL was the big thing. And I remember Amr Awadallah said to us in theCUBE that SQL is the killer app for big data. So, your prediction here, if we bring that up is SQL is back. Please elaborate. >> Yeah. So, of course, some people would say, well, it never left. Actually, that's probably closer to true, but in the perception of the marketplace, there's been all this noise about alternative ways of storing, retrieving data, whether it's in key value stores or document databases and so forth. We're getting a lot of messaging that for a while had persuaded people that, oh, we're not going to do analytics in SQL anymore. We're going to use Spark for everything, except that only a handful of people know how to use Spark. Oh, well, that's a problem. Well, how about, and for ordinary conventional business analytics, Spark is like an over-engineered solution to the problem. SQL works just great. What's happened in the past couple years, and what's going to continue to happen is that SQL is insinuating itself into everything we're seeing. We're seeing all the major data lake providers offering SQL support, whether it's Databricks or... And of course, Snowflake is loving this, because that is what they do, and their success is certainly points to the success of SQL, even MongoDB. And we were all, I think, at the MongoDB conference where on one day, we hear SQL is dead. They're not teaching SQL in schools anymore, and this kind of thing. And then, a couple days later at the same conference, they announced we're adding a new analytic capability-based on SQL. But didn't you just say SQL is dead? So, the reality is that SQL is better understood than most other methods of certainly of retrieving and finding data in a data collection, no matter whether it happens to be relational or non-relational. And even in systems that are very non-relational, such as graph and document databases, their query languages are being built or extended to resemble SQL, because SQL is something people understand. >> Now, you remember when we were in high school and you had had to take the... Your debating in the class and you were forced to take one side and defend it. So, I was was at a Vertica conference one time up on stage with Curt Monash, and I had to take the NoSQL, the world is changing paradigm shift. And so just to be controversial, I said to him, Curt Monash, I said, who really needs acid compliance anyway? Tony Baer. And so, (chuckles) of course, his head exploded, but what are your thoughts (guests laughing) on all this? >> Well, my first thought is congratulations, Dave, for surviving being up on stage with Curt Monash. >> Amen. (group laughing) >> I definitely would concur with Carl. We actually are definitely seeing a SQL renaissance and if there's any proof of the pudding here, I see lakehouse is being icing on the cake. As Doug had predicted last year, now, (clears throat) for the record, I think, Doug was about a year ahead of time in his predictions that this year is really the year that I see (clears throat) the lakehouse ecosystems really firming up. You saw the first shots last year. But anyway, on this, data lakes will not go away. I've actually, I'm on the home stretch of doing a market, a landscape on the lakehouse. And lakehouse will not replace data lakes in terms of that. There is the need for those, data scientists who do know Python, who knows Spark, to go in there and basically do their thing without all the restrictions or the constraints of a pre-built, pre-designed table structure. I get that. Same thing for developing models. But on the other hand, there is huge need. Basically, (clears throat) maybe MongoDB was saying that we're not teaching SQL anymore. Well, maybe we have an oversupply of SQL developers. Well, I'm being facetious there, but there is a huge skills based in SQL. Analytics have been built on SQL. They came with lakehouse and why this really helps to fuel a SQL revival is that the core need in the data lake, what brought on the lakehouse was not so much SQL, it was a need for acid. And what was the best way to do it? It was through a relational table structure. So, the whole idea of acid in the lakehouse was not to turn it into a transaction database, but to make the data trusted, secure, and more granularly governed, where you could govern down to column and row level, which you really could not do in a data lake or a file system. So, while lakehouse can be queried in a manner, you can go in there with Python or whatever, it's built on a relational table structure. And so, for that end, for those types of data lakes, it becomes the end state. You cannot bypass that table structure as I learned the hard way during my research. So, the bottom line I'd say here is that lakehouse is proof that we're starting to see the revenge of the SQL nerds. (Dave chuckles) >> Excellent. Okay, let's bring up back up the predictions. Dave Menninger, this one's really thought-provoking and interesting. We're hearing things like data as code, new data applications, machines actually generating plans with no human involvement. And your prediction is the definition of data is expanding. What do you mean by that? >> So, I think, for too long, we've thought about data as the, I would say facts that we collect the readings off of devices and things like that, but data on its own is really insufficient. Organizations need to manipulate that data and examine derivatives of the data to really understand what's happening in their organization, why has it happened, and to project what might happen in the future. And my comment is that these data derivatives need to be supported and managed just like the data needs to be managed. We can't treat this as entirely separate. Think about all the governance discussions we've had. Think about the metadata discussions we've had. If you separate these things, now you've got more moving parts. We're talking about simplicity and simplifying the stack. So, if these things are treated separately, it creates much more complexity. I also think it creates a little bit of a myopic view on the part of the IT organizations that are acquiring these technologies. They need to think more broadly. So, for instance, metrics. Metric stores are becoming much more common part of the tooling that's part of a data platform. Similarly, feature stores are gaining traction. So, those are designed to promote the reuse and consistency across the AI and ML initiatives. The elements that are used in developing an AI or ML model. And let me go back to metrics and just clarify what I mean by that. So, any type of formula involving the data points. I'm distinguishing metrics from features that are used in AI and ML models. And the data platforms themselves are increasingly managing the models as an element of data. So, just like figuring out how to calculate a metric. Well, if you're going to have the features associated with an AI and ML model, you probably need to be managing the model that's associated with those features. The other element where I see expansion is around external data. Organizations for decades have been focused on the data that they generate within their own organization. We see more and more of these platforms acquiring and publishing data to external third-party sources, whether they're within some sort of a partner ecosystem or whether it's a commercial distribution of that information. And our research shows that when organizations use external data, they derive even more benefits from the various analyses that they're conducting. And the last great frontier in my opinion on this expanding world of data is the world of driver-based planning. Very few of the major data platform providers provide these capabilities today. These are the types of things you would do in a spreadsheet. And we all know the issues associated with spreadsheets. They're hard to govern, they're error-prone. And so, if we can take that type of analysis, collecting the occupancy of a rental property, the projected rise in rental rates, the fluctuations perhaps in occupancy, the interest rates associated with financing that property, we can project forward. And that's a very common thing to do. What the income might look like from that property income, the expenses, we can plan and purchase things appropriately. So, I think, we need this broader purview and I'm beginning to see some of those things happen. And the evidence today I would say, is more focused around the metric stores and the feature stores starting to see vendors offer those capabilities. And we're starting to see the ML ops elements of managing the AI and ML models find their way closer to the data platforms as well. >> Very interesting. When I hear metrics, I think of KPIs, I think of data apps, orchestrate people and places and things to optimize around a set of KPIs. It sounds like a metadata challenge more... Somebody once predicted they'll have more metadata than data. Carl, what are your thoughts on this prediction? >> Yeah, I think that what Dave is describing as data derivatives is in a way, another word for what I was calling operational metadata, which not about the data itself, but how it's used, where it came from, what the rules are governing it, and that kind of thing. If you have a rich enough set of those things, then not only can you do a model of how well your vacation property rental may do in terms of income, but also how well your application that's measuring that is doing for you. In other words, how many times have I used it, how much data have I used and what is the relationship between the data that I've used and the benefits that I've derived from using it? Well, we don't have ways of doing that. What's interesting to me is that folks in the content world are way ahead of us here, because they have always tracked their content using these kinds of attributes. Where did it come from? When was it created, when was it modified? Who modified it? And so on and so forth. We need to do more of that with the structure data that we have, so that we can track what it's used. And also, it tells us how well we're doing with it. Is it really benefiting us? Are we being efficient? Are there improvements in processes that we need to consider? Because maybe data gets created and then it isn't used or it gets used, but it gets altered in some way that actually misleads people. (laughs) So, we need the mechanisms to be able to do that. So, I would say that that's... And I'd say that it's true that we need that stuff. I think, that starting to expand is probably the right way to put it. It's going to be expanding for some time. I think, we're still a distance from having all that stuff really working together. >> Maybe we should say it's gestating. (Dave and Carl laughing) >> Sorry, if I may- >> Sanjeev, yeah, I was going to say this... Sanjeev, please comment. This sounds to me like it supports Zhamak Dehghani's principles, but please. >> Absolutely. So, whether we call it data mesh or not, I'm not getting into that conversation, (Dave chuckles) but data (audio breaking) (Tony laughing) everything that I'm hearing what Dave is saying, Carl, this is the year when data products will start to take off. I'm not saying they'll become mainstream. They may take a couple of years to become so, but this is data products, all this thing about vacation rentals and how is it doing, that data is coming from different sources. I'm packaging it into our data product. And to Carl's point, there's a whole operational metadata associated with it. The idea is for organizations to see things like developer productivity, how many releases am I doing of this? What data products are most popular? I'm actually in right now in the process of formulating this concept that just like we had data catalogs, we are very soon going to be requiring data products catalog. So, I can discover these data products. I'm not just creating data products left, right, and center. I need to know, do they already exist? What is the usage? If no one is using a data product, maybe I want to retire and save cost. But this is a data product. Now, there's a associated thing that is also getting debated quite a bit called data contracts. And a data contract to me is literally just formalization of all these aspects of a product. How do you use it? What is the SLA on it, what is the quality that I am prescribing? So, data product, in my opinion, shifts the conversation to the consumers or to the business people. Up to this point when, Dave, you're talking about data and all of data discovery curation is a very data producer-centric. So, I think, we'll see a shift more into the consumer space. >> Yeah. Dave, can I just jump in there just very quickly there, which is that what Sanjeev has been saying there, this is really central to what Zhamak has been talking about. It's basically about making, one, data products are about the lifecycle management of data. Metadata is just elemental to that. And essentially, one of the things that she calls for is making data products discoverable. That's exactly what Sanjeev was talking about. >> By the way, did everyone just no notice how Sanjeev just snuck in another prediction there? So, we've got- >> Yeah. (group laughing) >> But you- >> Can we also say that he snuck in, I think, the term that we'll remember today, which is metadata museums. >> Yeah, but- >> Yeah. >> And also comment to, Tony, to your last year's prediction, you're really talking about it's not something that you're going to buy from a vendor. >> No. >> It's very specific >> Mm-hmm. >> to an organization, their own data product. So, touche on that one. Okay, last prediction. Let's bring them up. Doug Henschen, BI analytics is headed to embedding. What does that mean? >> Well, we all know that conventional BI dashboarding reporting is really commoditized from a vendor perspective. It never enjoyed truly mainstream adoption. Always that 25% of employees are really using these things. I'm seeing rising interest in embedding concise analytics at the point of decision or better still, using analytics as triggers for automation and workflows, and not even necessitating human interaction with visualizations, for example, if we have confidence in the analytics. So, leading companies are pushing for next generation applications, part of this low-code, no-code movement we've seen. And they want to build that decision support right into the app. So, the analytic is right there. Leading enterprise apps vendors, Salesforce, SAP, Microsoft, Oracle, they're all building smart apps with the analytics predictions, even recommendations built into these applications. And I think, the progressive BI analytics vendors are supporting this idea of driving insight to action, not necessarily necessitating humans interacting with it if there's confidence. So, we want prediction, we want embedding, we want automation. This low-code, no-code development movement is very important to bringing the analytics to where people are doing their work. We got to move beyond the, what I call swivel chair integration, between where people do their work and going off to separate reports and dashboards, and having to interpret and analyze before you can go back and do take action. >> And Dave Menninger, today, if you want, analytics or you want to absorb what's happening in the business, you typically got to go ask an expert, and then wait. So, what are your thoughts on Doug's prediction? >> I'm in total agreement with Doug. I'm going to say that collectively... So, how did we get here? I'm going to say collectively as an industry, we made a mistake. We made BI and analytics separate from the operational systems. Now, okay, it wasn't really a mistake. We were limited by the technology available at the time. Decades ago, we had to separate these two systems, so that the analytics didn't impact the operations. You don't want the operations preventing you from being able to do a transaction. But we've gone beyond that now. We can bring these two systems and worlds together and organizations recognize that need to change. As Doug said, the majority of the workforce and the majority of organizations doesn't have access to analytics. That's wrong. (chuckles) We've got to change that. And one of the ways that's going to change is with embedded analytics. 2/3 of organizations recognize that embedded analytics are important and it even ranks higher in importance than AI and ML in those organizations. So, it's interesting. This is a really important topic to the organizations that are consuming these technologies. The good news is it works. Organizations that have embraced embedded analytics are more comfortable with self-service than those that have not, as opposed to turning somebody loose, in the wild with the data. They're given a guided path to the data. And the research shows that 65% of organizations that have adopted embedded analytics are comfortable with self-service compared with just 40% of organizations that are turning people loose in an ad hoc way with the data. So, totally behind Doug's predictions. >> Can I just break in with something here, a comment on what Dave said about what Doug said, which (laughs) is that I totally agree with what you said about embedded analytics. And at IDC, we made a prediction in our future intelligence, future of intelligence service three years ago that this was going to happen. And the thing that we're waiting for is for developers to build... You have to write the applications to work that way. It just doesn't happen automagically. Developers have to write applications that reference analytic data and apply it while they're running. And that could involve simple things like complex queries against the live data, which is through something that I've been calling analytic transaction processing. Or it could be through something more sophisticated that involves AI operations as Doug has been suggesting, where the result is enacted pretty much automatically unless the scores are too low and you need to have a human being look at it. So, I think that that is definitely something we've been watching for. I'm not sure how soon it will come, because it seems to take a long time for people to change their thinking. But I think, as Dave was saying, once they do and they apply these principles in their application development, the rewards are great. >> Yeah, this is very much, I would say, very consistent with what we were talking about, I was talking about before, about basically rethinking the modern data stack and going into more of an end-to-end solution solution. I think, that what we're talking about clearly here is operational analytics. There'll still be a need for your data scientists to go offline just in their data lakes to do all that very exploratory and that deep modeling. But clearly, it just makes sense to bring operational analytics into where people work into their workspace and further flatten that modern data stack. >> But with all this metadata and all this intelligence, we're talking about injecting AI into applications, it does seem like we're entering a new era of not only data, but new era of apps. Today, most applications are about filling forms out or codifying processes and require a human input. And it seems like there's enough data now and enough intelligence in the system that the system can actually pull data from, whether it's the transaction system, e-commerce, the supply chain, ERP, and actually do something with that data without human involvement, present it to humans. Do you guys see this as a new frontier? >> I think, that's certainly- >> Very much so, but it's going to take a while, as Carl said. You have to design it, you have to get the prediction into the system, you have to get the analytics at the point of decision has to be relevant to that decision point. >> And I also recall basically a lot of the ERP vendors back like 10 years ago, we're promising that. And the fact that we're still looking at the promises shows just how difficult, how much of a challenge it is to get to what Doug's saying. >> One element that could be applied in this case is (indistinct) architecture. If applications are developed that are event-driven rather than following the script or sequence that some programmer or designer had preconceived, then you'll have much more flexible applications. You can inject decisions at various points using this technology much more easily. It's a completely different way of writing applications. And it actually involves a lot more data, which is why we should all like it. (laughs) But in the end (Tony laughing) it's more stable, it's easier to manage, easier to maintain, and it's actually more efficient, which is the result of an MIT study from about 10 years ago, and still, we are not seeing this come to fruition in most business applications. >> And do you think it's going to require a new type of data platform database? Today, data's all far-flung. We see that's all over the clouds and at the edge. Today, you cache- >> We need a super cloud. >> You cache that data, you're throwing into memory. I mentioned, MySQL heat wave. There are other examples where it's a brute force approach, but maybe we need new ways of laying data out on disk and new database architectures, and just when we thought we had it all figured out. >> Well, without referring to disk, which to my mind, is almost like talking about cave painting. I think, that (Dave laughing) all the things that have been mentioned by all of us today are elements of what I'm talking about. In other words, the whole improvement of the data mesh, the improvement of metadata across the board and improvement of the ability to track data and judge its freshness the way we judge the freshness of a melon or something like that, to determine whether we can still use it. Is it still good? That kind of thing. Bringing together data from multiple sources dynamically and real-time requires all the things we've been talking about. All the predictions that we've talked about today add up to elements that can make this happen. >> Well, guys, it's always tremendous to get these wonderful minds together and get your insights, and I love how it shapes the outcome here of the predictions, and let's see how we did. We're going to leave it there. I want to thank Sanjeev, Tony, Carl, David, and Doug. Really appreciate the collaboration and thought that you guys put into these sessions. Really, thank you. >> Thank you. >> Thanks, Dave. >> Thank you for having us. >> Thanks. >> Thank you. >> All right, this is Dave Valente for theCUBE, signing off for now. Follow these guys on social media. Look for coverage on siliconangle.com, theCUBE.net. Thank you for watching. (upbeat music)
SUMMARY :
and pleased to tell you (Tony and Dave faintly speaks) that led them to their conclusion. down, the funding in VC IPO market. And I like how the fact And I happened to have tripped across I talked to Walmart in the prediction of graph databases. But I stand by the idea and maybe to the edge. You can apply graphs to great And so, it's going to streaming data permeates the landscape. and to be honest, I like the tough grading the next 20 to 25% of and of course, the degree of difficulty. that sits on the side, Thank you for that. And I have to disagree. So, the catalog becomes Do you have any stats for just the reasons that And a lot of those catalogs about the modern data stack. and more, the data lakehouse. and the application stack, So, the alternative is to have metadata that SQL is the killer app for big data. but in the perception of the marketplace, and I had to take the NoSQL, being up on stage with Curt Monash. (group laughing) is that the core need in the data lake, And your prediction is the and examine derivatives of the data to optimize around a set of KPIs. that folks in the content world (Dave and Carl laughing) going to say this... shifts the conversation to the consumers And essentially, one of the things (group laughing) the term that we'll remember today, to your last year's prediction, is headed to embedding. and going off to separate happening in the business, so that the analytics didn't And the thing that we're waiting for and that deep modeling. that the system can of decision has to be relevant And the fact that we're But in the end We see that's all over the You cache that data, and improvement of the and I love how it shapes the outcome here Thank you for watching.
SENTIMENT ANALYSIS :
ENTITIES
Entity | Category | Confidence |
---|---|---|
Dave | PERSON | 0.99+ |
Doug Henschen | PERSON | 0.99+ |
Dave Menninger | PERSON | 0.99+ |
Doug | PERSON | 0.99+ |
Carl | PERSON | 0.99+ |
Carl Olofson | PERSON | 0.99+ |
Dave Menninger | PERSON | 0.99+ |
Tony Baer | PERSON | 0.99+ |
Tony | PERSON | 0.99+ |
Dave Valente | PERSON | 0.99+ |
Collibra | ORGANIZATION | 0.99+ |
Curt Monash | PERSON | 0.99+ |
Sanjeev Mohan | PERSON | 0.99+ |
Christian Kleinerman | PERSON | 0.99+ |
Dave Valente | PERSON | 0.99+ |
Walmart | ORGANIZATION | 0.99+ |
Microsoft | ORGANIZATION | 0.99+ |
AWS | ORGANIZATION | 0.99+ |
Sanjeev | PERSON | 0.99+ |
Constellation Research | ORGANIZATION | 0.99+ |
IBM | ORGANIZATION | 0.99+ |
Ventana Research | ORGANIZATION | 0.99+ |
2022 | DATE | 0.99+ |
Hazelcast | ORGANIZATION | 0.99+ |
Oracle | ORGANIZATION | 0.99+ |
Tony Bear | PERSON | 0.99+ |
25% | QUANTITY | 0.99+ |
2021 | DATE | 0.99+ |
last year | DATE | 0.99+ |
65% | QUANTITY | 0.99+ |
ORGANIZATION | 0.99+ | |
today | DATE | 0.99+ |
five-year | QUANTITY | 0.99+ |
TigerGraph | ORGANIZATION | 0.99+ |
Databricks | ORGANIZATION | 0.99+ |
two services | QUANTITY | 0.99+ |
Amazon | ORGANIZATION | 0.99+ |
David | PERSON | 0.99+ |
RisingWave Labs | ORGANIZATION | 0.99+ |
Lie 2, An Open Source Based Platform Cannot Give You Performance and Control | Starburst
>>We're back with Jess Borgman of Starburst and Richard Jarvis of EVAs health. Okay. We're gonna get into lie. Number two, and that is this an open source based platform cannot give you the performance and control that you can get with a proprietary system. Is that a lie? Justin, the enterprise data warehouse has been pretty dominant and has evolved and matured. Its stack has mature over the years. Why is it not the default platform for data? >>Yeah, well, I think that's become a lie over time. So I, I think, you know, if we go back 10 or 12 years ago with the advent of the first data lake really around Hudu, that probably was true that you couldn't get the performance that you needed to run fast, interactive, SQL queries in a data lake. Now a lot's changed in 10 or 12 years. I remember in the very early days, people would say, you'll, you'll never get performance because you need to be column. You need to store data in a column format. And then, you know, column formats were introduced to, to data lake. You have Parque ORC file in aro that were created to ultimately deliver performance out of that. So, okay. We got, you know, largely over the performance hurdle, you know, more recently people will say, well, you don't have the ability to do updates and deletes like a traditional data warehouse. >>And now we've got the creation of new data formats, again, like iceberg and Delta and hoote that do allow for updates and delete. So I think the data lake has continued to mature. And I remember a quote from, you know, Kurt Monash many years ago where he said, you know, it takes six or seven years to build a functional database. I think that's that's right. And now we've had almost a decade go by. So, you know, these technologies have matured to really deliver very, very close to the same level performance and functionality of, of cloud data warehouses. So I think the, the reality is that's become a lie and now we have large giant hyperscale internet companies that, you know, don't have the traditional data warehouse at all. They do all of their analytics in a data lake. So I think we've, we've proven that it's very much possible today. >>Thank you for that. And so Richard, talk about your perspective as a practitioner in terms of what open brings you versus, I mean, the clothes is it's open as a moving target. I remember Unix used to be open systems and so it's, it is an evolving, you know, spectrum, but, but from your perspective, what does open give you that you can't get from a proprietary system where you are fearful of in a proprietary system? >>I, I suppose for me open buys us the ability to be unsure about the future, because one thing that's always true about technology is it evolves in a, a direction, slightly different to what people expect and what you don't want to end up done is backed itself into a corner that then prevents it from innovating. So if you have chosen the technology and you've stored trillions of records in that technology and suddenly a new way of processing or machine learning comes out, you wanna be able to take advantage your competitive edge might depend upon it. And so I suppose for us, we acknowledge that we don't have perfect vision of what the future might be. And so by backing open storage technologies, we can apply a number of different technologies to the processing of that data. And that gives us the ability to remain relevant, innovate on our data storage. And we have bought our way out of the, any performance concerns because we can use cloud scale infrastructure to scale up and scale down as we need. And so we don't have the concerns that we don't have enough hardware today to process what we want to do, want to achieve. We can just scale up when we need it and scale back down. So open source has really allowed us to maintain the being at the cutting edge. >>So Jess, let me play devil's advocate here a little bit, and I've talked to JAK about this and you know, obviously her vision is there's an open source that, that data mesh is open source, an open source tooling, and it's not a proprietary, you know, you're not gonna buy a data mesh. You're gonna build it with, with open source toolings and, and vendors like you are gonna support it, but come back to sort of today, you can get to market with a proprietary solution faster. I'm gonna make that statement. You tell me if it's a lie and then you can say, okay, we support Apache iceberg. We're gonna support open source tooling, take a company like VMware, not really in the data business, but how, the way they embraced Kubernetes and, and you know, every new open source thing that comes along, they say, we do that too. Why can't proprietary systems do that and be as effective? >>Yeah, well I think at least with the, within the data landscape saying that you can access open data formats like iceberg or, or others is, is a bit dis disingenuous because really what you're selling to your customer is a certain degree of performance, a certain SLA, and you know, those cloud data warehouses that can reach beyond their own proprietary storage drop all the performance that they were able to provide. So it is, it reminds me kind of, of, again, going back 10 or 12 years ago when everybody had a connector to hit and that they thought that was the solution, right? But the reality was, you know, a connector was not the same as running workloads in hit back then. And I think similarly, you know, being able to connect to an external table that lives in an open data format, you know, you're, you're not going to give it the performance that your customers are accustomed to. And at the end of the day, they're always going to be predisposed. They're always going to be incentivized to get that data ingested into the data warehouse, cuz that's where they have control. And you know, the bottom line is the database industry has really been built around vendor lockin. I mean, from the start, how, how many people love Oracle today, but our customers, nonetheless, I think, you know, lockin is, is, is part of this industry. And I think that's really what we're trying to change with open data formats. >>Well, it's interesting remind of when I, you know, I see the, the gas price, the TSR gas price I, I drive up and then I say, oh, that's the cash price credit card. I gotta pay 20 cents more, but okay. But so the, the argument then, so let me, let me come back to you, Justin. So what's wrong with saying, Hey, we support open data formats, but yeah, you're gonna get better performance if you, if you, you keep it into our closed system, are you saying that long term that's gonna come back and bite you cuz you're gonna end up, you mentioned Oracle, you mentioned Teradata. Yeah. That's by, by implication, you're saying that's where snowflake customers are headed. >>Yeah, absolutely. I think this is a movie that, you know, we've all seen before. At least those of us who've been in the industry long enough to, to see this movie play over a couple times. So I do think that's the future. And I think, you know, I loved what Richard said. I actually wrote it down. Cause I thought it was an amazing quote. He said, it buys us the ability to be unsure of the future. That that pretty much says it all the, the future is unknowable and the reality is using open data formats. You remain interoperable with any technology you want to utilize. If you want to use spark to train a machine learning model and you wanna use Starbust to query via sequel, that's totally cool. They can both work off the same exact, you know, data, data sets by contrast, if you're, you know, focused on a proprietary model, then you're kind of locked in again to that model. I think the same applies to data, sharing to data products, to a wide variety of, of aspects of the data landscape that a proprietary approach kind of closes you and, and locks you in. >>So I, I would say this Richard, I'd love to get your thoughts on it. Cause I talked to a lot of Oracle customers, not as many te data customers there, but, but a lot of Oracle customers and they, you know, they'll admit yeah, you know, the Jammin us on price and the license cost, but we do get value out of it. And so my question to you, Richard, is, is do the, let's call it data warehouse systems or the proprietary systems. Are they gonna deliver a greater ROI sooner? And is that in allure of, of that customers, you know, are attracted to, or can open platforms deliver as fast an ROI? >>I think the answer to that is it can depend a bit. It depends on your business's skillset. So we are lucky that we have a number of proprietary teams that work in databases that provide our operational data capability. And we have teams of analytics and big data experts who can work with open data sets and open data formats. And so for those different teams, they can get to an ROI more quickly with different technologies for the business though, we can't do better for our operational data stores than proprietary databases. Today we can back off very tight SLAs to them. We can demonstrate reliability from millions of hours of those databases being run at enterprise scale, but for an analytics workload where increasing our business is growing in that direction, we can't do better than open data formats with cloud-based data mesh type technologies. And so it's not a simple answer. That one will always be the right answer for our business. We definitely have times when proprietary databases provide a capability that we couldn't easily represent or replicate with open technologies. >>Yeah. Richard, stay with you. You mentioned, you know, you know, some things before that, that strike me, you know, the data brick snowflake, you know, thing is always a lot of fun for analysts like me. You've got data bricks coming at it. Richard, you mentioned you have a lot of rockstar, data engineers, data bricks coming at it from a data engineering heritage. You get snowflake coming at it from an analytics heritage. Those two worlds are, are colliding people like PJI Mohan said, you know what? I think it's actually harder to play in the data engineering. So IE, it's easier to for data engineering world to go into the analytics world versus the reverse, but thinking about up and coming engineers and developers preparing for this future of data engineering and data analytics, how, how should they be thinking about the future? What, what's your advice to those young people? >>So I think I'd probably fall back on general programming skill sets. So the advice that I saw years ago was if you have open source technologies, the pythons and Javas on your CV, you command a 20% pay, hike over people who can only do proprietary programming languages. And I think that's true of data technologies as well. And from a business point of view, that makes sense. I'd rather spend the money that I save on proprietary licenses on better engineers, because they can provide more value to the business that can innovate us beyond our competitors. So I think I would my advice to people who are starting here or trying to build teams to capitalize on data assets is begin with open license, free capabilities because they're very cheap to experiment with. And they generate a lot of interest from people who want to join you as a business. And you can make them very successful early, early doors with, with your analytics journey. >>It's interesting. Again, analysts like myself, we do a lot of TCO work and have over the last 20 plus years and in the world of Oracle, you know, normally it's the staff, that's the biggest nut in total cost of ownership, not an Oracle. It's the it's the license cost is by far the biggest component in the, in the blame pie. All right, Justin, help us close out this segment. We've been talking about this sort of data mesh open, closed snowflake data bricks. Where does Starburst sort of as this engine for the data lake data lake house, the data warehouse, it, it fit in this, in this world. >>Yeah. So our view on how the future ultimately unfolds is we think that data lakes will be a natural center of gravity for a lot of the reasons that we described open data formats, lowest total cost of ownership, because you get to choose the cheapest storage available to you. Maybe that's S3 or Azure data lake storage or Google cloud storage, or maybe it's on-prem object storage that you bought at a, at a really good price. So ultimately storing a lot of data in a data lake makes a lot of sense, but I think what makes our perspective unique is we still don't think you're gonna get everything there either. We think that basically centralization of all your data assets is just an impossible endeavor. And so you wanna be able to access data that lives outside of the lake as well. So we kind of think of the lake as maybe the biggest place by volume in terms of how much data you have, but to, to have comprehensive analytics and to truly understand your business and understanding holistically, you need to be able to go access other data sources as well. And so that's the role that we wanna play is to be a single point of access for our customers, provide the right level of fine grained access controls so that the right people have access to the right data and ultimately make it easy to discover and consume via, you know, the creation of data products as well. >>Great. Okay. Thanks guys. Right after this quick break, we're gonna be back to debate whether the cloud data model that we see emerging and the so-called modern data stack is really modern or is it the same wine new bottle when it comes to data architectures, you're watching the cube, the leader in enterprise and emerging tech coverage.
SUMMARY :
give you the performance and control that you can get with a proprietary We got, you know, largely over the performance hurdle, you know, more recently people will say, And I remember a quote from, you know, Kurt Monash many years ago where he said, you know, it is an evolving, you know, spectrum, but, but from your perspective, in a, a direction, slightly different to what people expect and what you don't want to end up So Jess, let me play devil's advocate here a little bit, and I've talked to JAK about this and you know, And I think similarly, you know, being able to connect to an external table that lives in an open data format, Well, it's interesting remind of when I, you know, I see the, the gas price, the TSR gas price And I think, you know, I loved what Richard said. you know, the Jammin us on price and the license cost, but we do get value out And so for those different teams, they can get to an you know, the data brick snowflake, you know, thing is always a lot of fun for analysts like me. So the advice that I saw years ago was if you have open source technologies, years and in the world of Oracle, you know, normally it's the staff, to discover and consume via, you know, the creation of data products as well. data model that we see emerging and the so-called modern data stack is
SENTIMENT ANALYSIS :
ENTITIES
Entity | Category | Confidence |
---|---|---|
Jess Borgman | PERSON | 0.99+ |
Richard | PERSON | 0.99+ |
20 cents | QUANTITY | 0.99+ |
six | QUANTITY | 0.99+ |
Justin | PERSON | 0.99+ |
Richard Jarvis | PERSON | 0.99+ |
Oracle | ORGANIZATION | 0.99+ |
Kurt Monash | PERSON | 0.99+ |
20% | QUANTITY | 0.99+ |
Jess | PERSON | 0.99+ |
pythons | TITLE | 0.99+ |
seven years | QUANTITY | 0.99+ |
Today | DATE | 0.99+ |
Javas | TITLE | 0.99+ |
Teradata | ORGANIZATION | 0.99+ |
VMware | ORGANIZATION | 0.98+ |
millions | QUANTITY | 0.98+ |
EVAs | ORGANIZATION | 0.98+ |
JAK | PERSON | 0.98+ |
Starburst | ORGANIZATION | 0.98+ |
both | QUANTITY | 0.97+ |
10 | DATE | 0.97+ |
12 years ago | DATE | 0.97+ |
Starbust | TITLE | 0.96+ |
today | DATE | 0.95+ |
Apache iceberg | ORGANIZATION | 0.94+ |
ORGANIZATION | 0.93+ | |
12 years | QUANTITY | 0.92+ |
single point | QUANTITY | 0.92+ |
two worlds | QUANTITY | 0.92+ |
10 | QUANTITY | 0.91+ |
Hudu | LOCATION | 0.91+ |
Unix | TITLE | 0.9+ |
one thing | QUANTITY | 0.87+ |
trillions of records | QUANTITY | 0.83+ |
first data lake | QUANTITY | 0.82+ |
Starburst | TITLE | 0.8+ |
PJI | ORGANIZATION | 0.79+ |
years ago | DATE | 0.76+ |
IE | TITLE | 0.75+ |
Lie 2 | TITLE | 0.72+ |
many years ago | DATE | 0.72+ |
over a couple times | QUANTITY | 0.7+ |
TCO | ORGANIZATION | 0.7+ |
Parque | ORGANIZATION | 0.67+ |
Number two | QUANTITY | 0.64+ |
Kubernetes | ORGANIZATION | 0.59+ |
a decade | QUANTITY | 0.58+ |
plus years | DATE | 0.57+ |
Azure | TITLE | 0.57+ |
S3 | TITLE | 0.55+ |
Delta | TITLE | 0.54+ |
20 | QUANTITY | 0.49+ |
last | DATE | 0.48+ |
Mohan | PERSON | 0.44+ |
ORC | ORGANIZATION | 0.27+ |
Starburst The Data Lies FULL V2b
>>In 2011, early Facebook employee and Cloudera co-founder Jeff Ocker famously said the best minds of my generation are thinking about how to get people to click on ads. And that sucks. Let's face it more than a decade later organizations continue to be frustrated with how difficult it is to get value from data and build a truly agile data-driven enterprise. What does that even mean? You ask? Well, it means that everyone in the organization has the data they need when they need it. In a context that's relevant to advance the mission of an organization. Now that could mean cutting cost could mean increasing profits, driving productivity, saving lives, accelerating drug discovery, making better diagnoses, solving, supply chain problems, predicting weather disasters, simplifying processes, and thousands of other examples where data can completely transform people's lives beyond manipulating internet users to behave a certain way. We've heard the prognostications about the possibilities of data before and in fairness we've made progress, but the hard truth is the original promises of master data management, enterprise data, warehouses, data marts, data hubs, and yes, even data lakes were broken and left us wanting from more welcome to the data doesn't lie, or doesn't a series of conversations produced by the cube and made possible by Starburst data. >>I'm your host, Dave Lanta and joining me today are three industry experts. Justin Borgman is this co-founder and CEO of Starburst. Richard Jarvis is the CTO at EMI health and Theresa tongue is cloud first technologist at Accenture. Today we're gonna have a candid discussion that will expose the unfulfilled and yes, broken promises of a data past we'll expose data lies, big lies, little lies, white lies, and hidden truths. And we'll challenge, age old data conventions and bust some data myths. We're debating questions like is the demise of a single source of truth. Inevitable will the data warehouse ever have featured parody with the data lake or vice versa is the so-called modern data stack, simply centralization in the cloud, AKA the old guards model in new cloud close. How can organizations rethink their data architectures and regimes to realize the true promises of data can and will and open ecosystem deliver on these promises in our lifetimes, we're spanning much of the Western world today. Richard is in the UK. Teresa is on the west coast and Justin is in Massachusetts with me. I'm in the cube studios about 30 miles outside of Boston folks. Welcome to the program. Thanks for coming on. Thanks for having us. Let's get right into it. You're very welcome. Now here's the first lie. The most effective data architecture is one that is centralized with a team of data specialists serving various lines of business. What do you think Justin? >>Yeah, definitely a lie. My first startup was a company called hit adapt, which was an early SQL engine for hit that was acquired by Teradata. And when I got to Teradata, of course, Teradata is the pioneer of that central enterprise data warehouse model. One of the things that I found fascinating was that not one of their customers had actually lived up to that vision of centralizing all of their data into one place. They all had data silos. They all had data in different systems. They had data on prem data in the cloud. You know, those companies were acquiring other companies and inheriting their data architecture. So, you know, despite being the industry leader for 40 years, not one of their customers truly had everything in one place. So I think definitely history has proven that to be a lie. >>So Richard, from a practitioner's point of view, you know, what, what are your thoughts? I mean, there, there's a lot of pressure to cut cost, keep things centralized, you know, serve the business as best as possible from that standpoint. What, what is your experience show? >>Yeah, I mean, I think I would echo Justin's experience really that we, as a business have grown up through acquisition, through storing data in different places sometimes to do information governance in different ways to store data in, in a platform that's close to data experts, people who really understand healthcare data from pharmacies or from, from doctors. And so, although if you were starting from a Greenfield site and you were building something brand new, you might be able to centralize all the data and all of the tooling and teams in one place. The reality is that that businesses just don't grow up like that. And, and it's just really impossible to get that academic perfection of, of storing everything in one place. >>Y you know, Theresa, I feel like Sarbanes Oxley kinda saved the data warehouse, you know, right. You actually did have to have a single version of the truth for certain financial data, but really for those, some of those other use cases, I, I mentioned, I, I do feel like the industry has kinda let us down. What's your take on this? Where does it make sense to have that sort of centralized approach versus where does it make sense to maybe decentralized? >>I, I think you gotta have centralized governance, right? So from the central team, for things like star Oxley, for things like security for certainly very core data sets, having a centralized set of roles, responsibilities to really QA, right. To serve as a design authority for your entire data estate, just like you might with security, but how it's implemented has to be distributed. Otherwise you're not gonna be able to scale. Right? So being able to have different parts of the business really make the right data investments for their needs. And then ultimately you're gonna collaborate with your partners. So partners that are not within the company, right. External partners, we're gonna see a lot more data sharing and model creation. And so you're definitely going to be decentralized. >>So, you know, Justin, you guys last, geez, I think it was about a year ago, had a session on, on data mesh. It was a great program. You invited Jamma, Dani, of course, she's the creator of the data mesh. And her one of our fundamental premises is that you've got this hyper specialized team that you've gotta go through. And if you want anything, but at the same time, these, these individuals actually become a bottleneck, even though they're some of the most talented people in the organization. So I guess question for you, Richard, how do you deal with that? Do you, do you organize so that there are a few sort of rock stars that, that, you know, build cubes and, and the like, and, and, and, or have you had any success in sort of decentralizing with, you know, your, your constituencies, that data model? >>Yeah. So, so we absolutely have got rockstar, data scientists and data guardians. If you like people who understand what it means to use this data, particularly as the data that we use at emos is very private it's healthcare information. And some of the, the rules and regulations around using the data are very complex and, and strict. So we have to have people who understand the usage of the data, then people who understand how to build models, how to process the data effectively. And you can think of them like consultants to the wider business, because a pharmacist might not understand how to structure a SQL query, but they do understand how they want to process medication information to improve patient lives. And so that becomes a, a consulting type experience from a, a set of rock stars to help a, a more decentralized business who needs to, to understand the data and to generate some valuable output. >>Justin, what do you say to a, to a customer or prospect that says, look, Justin, I'm gonna, I got a centralized team and that's the most cost effective way to serve the business. Otherwise I got, I got duplication. What do you say to that? >>Well, I, I would argue it's probably not the most cost effective and, and the reason being really twofold. I think, first of all, when you are deploying a enterprise data warehouse model, the, the data warehouse itself is very expensive, generally speaking. And so you're putting all of your most valuable data in the hands of one vendor who now has tremendous leverage over you, you know, for many, many years to come. I think that's the story at Oracle or Terra data or other proprietary database systems. But the other aspect I think is that the reality is those central data warehouse teams is as much as they are experts in the technology. They don't necessarily understand the data itself. And this is one of the core tenants of data mash that that jam writes about is this idea of the domain owners actually know the data the best. >>And so by, you know, not only acknowledging that data is generally decentralized and to your earlier point about SAR, brain Oxley, maybe saving the data warehouse, I would argue maybe GDPR and data sovereignty will destroy it because data has to be decentralized for, for those laws to be compliant. But I think the reality is, you know, the data mesh model basically says, data's decentralized, and we're gonna turn that into an asset rather than a liability. And we're gonna turn that into an asset by empowering the people that know the data, the best to participate in the process of, you know, curating and creating data products for, for consumption. So I think when you think about it, that way, you're going to get higher quality data and faster time to insight, which is ultimately going to drive more revenue for your business and reduce costs. So I think that that's the way I see the two, the two models comparing and contrasting. >>So do you think the demise of the data warehouse is inevitable? I mean, I mean, you know, there Theresa you work with a lot of clients, they're not just gonna rip and replace their existing infrastructure. Maybe they're gonna build on top of it, but what does that mean? Does that mean the E D w just becomes, you know, less and less valuable over time, or it's maybe just isolated to specific use cases. What's your take on that? >>Listen, I still would love all my data within a data warehouse would love it. Mastered would love it owned by essential team. Right? I think that's still what I would love to have. That's just not the reality, right? The investment to actually migrate and keep that up to date. I would say it's a losing battle. Like we've been trying to do it for a long time. Nobody has the budgets and then data changes, right? There's gonna be a new technology. That's gonna emerge that we're gonna wanna tap into. There's going to be not enough investment to bring all the legacy, but still very useful systems into that centralized view. So you keep the data warehouse. I think it's a very, very valuable, very high performance tool for what it's there for, but you could have this, you know, new mesh layer that still takes advantage of the things. I mentioned, the data products in the systems that are meaningful today and the data products that actually might span a number of systems, maybe either those that either source systems for the domains that know it best, or the consumer based systems and products that need to be packaged in a way that be really meaningful for that end user, right? Each of those are useful for a different part of the business and making sure that the mesh actually allows you to use all of them. >>So, Richard, let me ask you, you take, take Gemma's principles back to those. You got to, you know, domain ownership and, and, and data as product. Okay, great. Sounds good. But it creates what I would argue are two, you know, challenges, self-serve infrastructure let's park that for a second. And then in your industry, the one of the high, most regulated, most sensitive computational governance, how do you automate and ensure federated governance in that mesh model that Theresa was just talking about? >>Well, it absolutely depends on some of the tooling and processes that you put in place around those tools to be, to centralize the security and the governance of the data. And I think, although a data warehouse makes that very simple, cause it's a single tool, it's not impossible with some of the data mesh technologies that are available. And so what we've done at emus is we have a single security layer that sits on top of our data match, which means that no matter which user is accessing, which data source, we go through a well audited well understood security layer. That means that we know exactly who's got access to which data field, which data tables. And then everything that they do is, is audited in a very kind of standard way, regardless of the underlying data storage technology. So for me, although storing the data in one place might not be possible understanding where your source of truth is and securing that in a common way is still a valuable approach and you can do it without having to bring all that data into a single bucket so that it's all in one place. And, and so having done that and investing quite heavily in making that possible has paid dividends in terms of giving wider access to the platform and ensuring that only data that's available under GDPR and other regulations is being used by, by the data users. >>Yeah. So Justin, I mean, Democrat, we always talk about data democratization and you know, up until recently, they really haven't been line of sight as to how to get there. But do you have anything to add to this because you're essentially taking, you know, do an analytic queries and with data that's all dispersed all over the, how are you seeing your customers handle this, this challenge? >>Yeah. I mean, I think data products is a really interesting aspect of the answer to that. It allows you to, again, leverage the data domain owners, people know the data, the best to, to create, you know, data as a product ultimately to be consumed. And we try to represent that in our product as effectively a almost eCommerce like experience where you go and discover and look for the data products that have been created in your organization. And then you can start to consume them as, as you'd like. And so really trying to build on that notion of, you know, data democratization and self-service, and making it very easy to discover and, and start to use with whatever BI tool you, you may like, or even just running, you know, SQL queries yourself, >>Okay. G guys grab a sip of water. After this short break, we'll be back to debate whether proprietary or open platforms are the best path to the future of data excellence, keep it right there. >>Your company has more data than ever, and more people trying to understand it, but there's a problem. Your data is stored across multiple systems. It's hard to access and that delays analytics and ultimately decisions. The old method of moving all of your data into a single source of truth is slow and definitely not built for the volume of data we have today or where we are headed while your data engineers spent over half their time, moving data, your analysts and data scientists are left, waiting, feeling frustrated, unproductive, and unable to move the needle for your business. But what if you could spend less time moving or copying data? What if your data consumers could analyze all your data quickly? >>Starburst helps your teams run fast queries on any data source. We help you create a single point of access to your data, no matter where it's stored. And we support high concurrency, we solve for speed and scale, whether it's fast, SQL queries on your data lake or faster queries across multiple data sets, Starburst helps your teams run analytics anywhere you can't afford to wait for data to be available. Your team has questions that need answers. Now with Starburst, the wait is over. You'll have faster access to data with enterprise level security, easy connectivity, and 24 7 support from experts, organizations like Zolando Comcast and FINRA rely on Starburst to move their businesses forward. Contact our Trino experts to get started. >>We're back with Jess Borgman of Starburst and Richard Jarvis of EVAs health. Okay, we're gonna get to lie. Number two, and that is this an open source based platform cannot give you the performance and control that you can get with a proprietary system. Is that a lie? Justin, the enterprise data warehouse has been pretty dominant and has evolved and matured. Its stack has mature over the years. Why is it not the default platform for data? >>Yeah, well, I think that's become a lie over time. So I, I think, you know, if we go back 10 or 12 years ago with the advent of the first data lake really around Hudu, that probably was true that you couldn't get the performance that you needed to run fast, interactive, SQL queries in a data lake. Now a lot's changed in 10 or 12 years. I remember in the very early days, people would say, you you'll never get performance because you need to be column there. You need to store data in a column format. And then, you know, column formats we're introduced to, to data apes, you have Parque ORC file in aro that were created to ultimately deliver performance out of that. So, okay. We got, you know, largely over the performance hurdle, you know, more recently people will say, well, you don't have the ability to do updates and deletes like a traditional data warehouse. >>And now we've got the creation of new data formats, again like iceberg and Delta and Hodi that do allow for updates and delete. So I think the data lake has continued to mature. And I remember a, a quote from, you know, Kurt Monash many years ago where he said, you know, know it takes six or seven years to build a functional database. I think that's that's right. And now we've had almost a decade go by. So, you know, these technologies have matured to really deliver very, very close to the same level performance and functionality of, of cloud data warehouses. So I think the, the reality is that's become a line and now we have large giant hyperscale internet companies that, you know, don't have the traditional data warehouse at all. They do all of their analytics in a data lake. So I think we've, we've proven that it's very much possible today. >>Thank you for that. And so Richard, talk about your perspective as a practitioner in terms of what open brings you versus, I mean, look closed is it's open as a moving target. I remember Unix used to be open systems and so it's, it is an evolving, you know, spectrum, but, but from your perspective, what does open give you that you can't get from a proprietary system where you are fearful of in a proprietary system? >>I, I suppose for me open buys us the ability to be unsure about the future, because one thing that's always true about technology is it evolves in a, a direction, slightly different to what people expect. And what you don't want to end up is done is backed itself into a corner that then prevents it from innovating. So if you have chosen a technology and you've stored trillions of records in that technology and suddenly a new way of processing or machine learning comes out, you wanna be able to take advantage and your competitive edge might depend upon it. And so I suppose for us, we acknowledge that we don't have perfect vision of what the future might be. And so by backing open storage technologies, we can apply a number of different technologies to the processing of that data. And that gives us the ability to remain relevant, innovate on our data storage. And we have bought our way out of the, any performance concerns because we can use cloud scale infrastructure to scale up and scale down as we need. And so we don't have the concerns that we don't have enough hardware today to process what we want to do, want to achieve. We can just scale up when we need it and scale back down. So open source has really allowed us to maintain the being at the cutting edge. >>So Jess, let me play devil's advocate here a little bit, and I've talked to Shaak about this and you know, obviously her vision is there's an open source that, that the data meshes open source, an open source tooling, and it's not a proprietary, you know, you're not gonna buy a data mesh. You're gonna build it with, with open source toolings and, and vendors like you are gonna support it, but to come back to sort of today, you can get to market with a proprietary solution faster. I'm gonna make that statement. You tell me if it's a lie and then you can say, okay, we support Apache iceberg. We're gonna support open source tooling, take a company like VMware, not really in the data business, but how, the way they embraced Kubernetes and, and you know, every new open source thing that comes along, they say, we do that too. Why can't proprietary systems do that and be as effective? >>Yeah, well, I think at least with the, within the data landscape saying that you can access open data formats like iceberg or, or others is, is a bit dis disingenuous because really what you're selling to your customer is a certain degree of performance, a certain SLA, and you know, those cloud data warehouses that can reach beyond their own proprietary storage drop all the performance that they were able to provide. So it is, it reminds me kind of, of, again, going back 10 or 12 years ago when everybody had a connector to Haddo and that they thought that was the solution, right? But the reality was, you know, a connector was not the same as running workloads in Haddo back then. And I think similarly, you know, being able to connect to an external table that lives in an open data format, you know, you're, you're not going to give it the performance that your customers are accustomed to. And at the end of the day, they're always going to be predisposed. They're always going to be incentivized to get that data ingested into the data warehouse, cuz that's where they have control. And you know, the bottom line is the database industry has really been built around vendor lockin. I mean, from the start, how, how many people love Oracle today, but our customers, nonetheless, I think, you know, lockin is, is, is part of this industry. And I think that's really what we're trying to change with open data formats. >>Well, that's interesting reminded when I, you know, I see the, the gas price, the tees or gas price I, I drive up and then I say, oh, that's the cash price credit card. I gotta pay 20 cents more, but okay. But so the, the argument then, so let me, let me come back to you, Justin. So what's wrong with saying, Hey, we support open data formats, but yeah, you're gonna get better performance if you, if you keep it into our closed system, are you saying that long term that's gonna come back and bite you cuz you're gonna end up, you mentioned Oracle, you mentioned Teradata. Yeah. That's by, by implication, you're saying that's where snowflake customers are headed. >>Yeah, absolutely. I think this is a movie that, you know, we've all seen before. At least those of us who've been in the industry long enough to, to see this movie play over a couple times. So I do think that's the future. And I think, you know, I loved what Richard said. I actually wrote it down. Cause I thought it was an amazing quote. He said, it buys us the ability to be unsure of the future. Th that that pretty much says it all the, the future is unknowable and the reality is using open data formats. You remain interoperable with any technology you want to utilize. If you want to use spark to train a machine learning model and you want to use Starbust to query via sequel, that's totally cool. They can both work off the same exact, you know, data, data sets by contrast, if you're, you know, focused on a proprietary model, then you're kind of locked in again to that model. I think the same applies to data, sharing to data products, to a wide variety of, of aspects of the data landscape that a proprietary approach kind of closes you in and locks you in. >>So I, I would say this Richard, I'd love to get your thoughts on it. Cause I talked to a lot of Oracle customers, not as many te data customers, but, but a lot of Oracle customers and they, you know, they'll admit, yeah, you know, they're jamming us on price and the license cost they give, but we do get value out of it. And so my question to you, Richard, is, is do the, let's call it data warehouse systems or the proprietary systems. Are they gonna deliver a greater ROI sooner? And is that in allure of, of that customers, you know, are attracted to, or can open platforms deliver as fast in ROI? >>I think the answer to that is it can depend a bit. It depends on your businesses skillset. So we are lucky that we have a number of proprietary teams that work in databases that provide our operational data capability. And we have teams of analytics and big data experts who can work with open data sets and open data formats. And so for those different teams, they can get to an ROI more quickly with different technologies for the business though, we can't do better for our operational data stores than proprietary databases. Today we can back off very tight SLAs to them. We can demonstrate reliability from millions of hours of those databases being run at enterprise scale, but for an analytics workload where increasing our business is growing in that direction, we can't do better than open data formats with cloud-based data mesh type technologies. And so it's not a simple answer. That one will always be the right answer for our business. We definitely have times when proprietary databases provide a capability that we couldn't easily represent or replicate with open technologies. >>Yeah. Richard, stay with you. You mentioned, you know, you know, some things before that, that strike me, you know, the data brick snowflake, you know, thing is, oh, is a lot of fun for analysts like me. You've got data bricks coming at it. Richard, you mentioned you have a lot of rockstar, data engineers, data bricks coming at it from a data engineering heritage. You get snowflake coming at it from an analytics heritage. Those two worlds are, are colliding people like PJI Mohan said, you know what? I think it's actually harder to play in the data engineering. So I E it's easier to for data engineering world to go into the analytics world versus the reverse, but thinking about up and coming engineers and developers preparing for this future of data engineering and data analytics, how, how should they be thinking about the future? What, what's your advice to those young people? >>So I think I'd probably fall back on general programming skill sets. So the advice that I saw years ago was if you have open source technologies, the pythons and Javas on your CV, you commander 20% pay, hike over people who can only do proprietary programming languages. And I think that's true of data technologies as well. And from a business point of view, that makes sense. I'd rather spend the money that I save on proprietary licenses on better engineers, because they can provide more value to the business that can innovate us beyond our competitors. So I think I would my advice to people who are starting here or trying to build teams to capitalize on data assets is begin with open license, free capabilities, because they're very cheap to experiment with. And they generate a lot of interest from people who want to join you as a business. And you can make them very successful early, early doors with, with your analytics journey. >>It's interesting. Again, analysts like myself, we do a lot of TCO work and have over the last 20 plus years. And in world of Oracle, you know, normally it's the staff, that's the biggest nut in total cost of ownership, not an Oracle. It's the it's the license cost is by far the biggest component in the, in the blame pie. All right, Justin, help us close out this segment. We've been talking about this sort of data mesh open, closed snowflake data bricks. Where does Starburst sort of as this engine for the data lake data lake house, the data warehouse fit in this, in this world? >>Yeah. So our view on how the future ultimately unfolds is we think that data lakes will be a natural center of gravity for a lot of the reasons that we described open data formats, lowest total cost of ownership, because you get to choose the cheapest storage available to you. Maybe that's S3 or Azure data lake storage, or Google cloud storage, or maybe it's on-prem object storage that you bought at a, at a really good price. So ultimately storing a lot of data in a deal lake makes a lot of sense, but I think what makes our perspective unique is we still don't think you're gonna get everything there either. We think that basically centralization of all your data assets is just an impossible endeavor. And so you wanna be able to access data that lives outside of the lake as well. So we kind of think of the lake as maybe the biggest place by volume in terms of how much data you have, but to, to have comprehensive analytics and to truly understand your business and understand it holistically, you need to be able to go access other data sources as well. And so that's the role that we wanna play is to be a single point of access for our customers, provide the right level of fine grained access controls so that the right people have access to the right data and ultimately make it easy to discover and consume via, you know, the creation of data products as well. >>Great. Okay. Thanks guys. Right after this quick break, we're gonna be back to debate whether the cloud data model that we see emerging and the so-called modern data stack is really modern, or is it the same wine new bottle? When it comes to data architectures, you're watching the cube, the leader in enterprise and emerging tech coverage. >>Your data is capable of producing incredible results, but data consumers are often left in the dark without fast access to the data they need. Starers makes your data visible from wherever it lives. Your company is acquiring more data in more places, more rapidly than ever to rely solely on a data centralization strategy. Whether it's in a lake or a warehouse is unrealistic. A single source of truth approach is no longer viable, but disconnected data silos are often left untapped. We need a new approach. One that embraces distributed data. One that enables fast and secure access to any of your data from anywhere with Starburst, you'll have the fastest query engine for the data lake that allows you to connect and analyze your disparate data sources no matter where they live Starburst provides the foundational technology required for you to build towards the vision of a decentralized data mesh Starburst enterprise and Starburst galaxy offer enterprise ready, connectivity, interoperability, and security features for multiple regions, multiple clouds and everchanging global regulatory requirements. The data is yours. And with Starburst, you can perform analytics anywhere in light of your world. >>Okay. We're back with Justin Boardman. CEO of Starbust Richard Jarvis is the CTO of EMI health and Theresa tongue is the cloud first technologist from Accenture. We're on July number three. And that is the claim that today's modern data stack is actually modern. So I guess that's the lie it's it is it's is that it's not modern. Justin, what do you say? >>Yeah. I mean, I think new isn't modern, right? I think it's the, it's the new data stack. It's the cloud data stack, but that doesn't necessarily mean it's modern. I think a lot of the components actually are exactly the same as what we've had for 40 years, rather than Terra data. You have snowflake rather than Informatica you have five trend. So it's the same general stack, just, you know, a cloud version of it. And I think a lot of the challenges that it plagued us for 40 years still maintain. >>So lemme come back to you just, but okay. But, but there are differences, right? I mean, you can scale, you can throw resources at the problem. You can separate compute from storage. You really, you know, there's a lot of money being thrown at that by venture capitalists and snowflake, you mentioned it's competitors. So that's different. Is it not, is that not at least an aspect of, of modern dial it up, dial it down. So what, what do you say to that? >>Well, it, it is, it's certainly taking, you know, what the cloud offers and taking advantage of that, but it's important to note that the cloud data warehouses out there are really just separating their compute from their storage. So it's allowing them to scale up and down, but your data still stored in a proprietary format. You're still locked in. You still have to ingest the data to get it even prepared for analysis. So a lot of the same sort of structural constraints that exist with the old enterprise data warehouse model OnPrem still exist just yes, a little bit more elastic now because the cloud offers that. >>So Theresa, let me go to you cuz you have cloud first in your, in your, your title. So what's what say you to this conversation? >>Well, even the cloud providers are looking towards more of a cloud continuum, right? So the centralized cloud, as we know it, maybe data lake data warehouse in the central place, that's not even how the cloud providers are looking at it. They have news query services. Every provider has one that really expands those queries to be beyond a single location. And if we look at a lot of where our, the future goes, right, that that's gonna very much fall the same thing. There was gonna be more edge. There's gonna be more on premise because of data sovereignty, data gravity, because you're working with different parts of the business that have already made major cloud investments in different cloud providers. Right? So there's a lot of reasons why the modern, I guess, the next modern generation of the data staff needs to be much more federated. >>Okay. So Richard, how do you deal with this? You you've obviously got, you know, the technical debt, the existing infrastructure it's on the books. You don't wanna just throw it out. A lot of, lot of conversation about modernizing applications, which a lot of times is a, you know, a microservices layer on top of leg legacy apps. How do you think about the modern data stack? >>Well, I think probably the first thing to say is that the stack really has to include the processes and people around the data as well is all well and good changing the technology. But if you don't modernize how people use that technology, then you're not going to be able to, to scale because just cuz you can scale CPU and storage doesn't mean you can get more people to use your data, to generate you more, more value for the business. And so what we've been looking at is really changing in very much aligned to data products and, and data mesh. How do you enable more people to consume the service and have the stack respond in a way that keeps costs low? Because that's important for our customers consuming this data, but also allows people to occasionally run enormous queries and then tick along with smaller ones when required. And it's a good job we did because during COVID all of a sudden we had enormous pressures on our data platform to answer really important life threatening queries. And if we couldn't scale both our data stack and our teams, we wouldn't have been able to answer those as quickly as we had. So I think the stack needs to support a scalable business, not just the technology itself. >>Well thank you for that. So Justin let's, let's try to break down what the critical aspects are of the modern data stack. So you think about the past, you know, five, seven years cloud obviously has given a different pricing model. De-risked experimentation, you know that we talked about the ability to scale up scale down, but it's, I'm, I'm taking away that that's not enough based on what Richard just said. The modern data stack has to serve the business and enable the business to build data products. I, I buy that. I'm a big fan of the data mesh concepts, even though we're early days. So what are the critical aspects if you had to think about, you know, paying, maybe putting some guardrails and definitions around the modern data stack, what does that look like? What are some of the attributes and, and principles there >>Of, of how it should look like or, or how >>It's yeah. What it should be. >>Yeah. Yeah. Well, I think, you know, in, in Theresa mentioned this in, in a previous segment about the data warehouse is not necessarily going to disappear. It just becomes one node, one element of the overall data mesh. And I, I certainly agree with that. So by no means, are we suggesting that, you know, snowflake or Redshift or whatever cloud data warehouse you may be using is going to disappear, but it's, it's not going to become the end all be all. It's not the, the central single source of truth. And I think that's the paradigm shift that needs to occur. And I think it's also worth noting that those who were the early adopters of the modern data stack were primarily digital, native born in the cloud young companies who had the benefit of, of idealism. They had the benefit of it was starting with a clean slate that does not reflect the vast majority of enterprises. >>And even those companies, as they grow up mature out of that ideal state, they go buy a business. Now they've got something on another cloud provider that has a different data stack and they have to deal with that heterogeneity that is just change and change is a part of life. And so I think there is an element here that is almost philosophical. It's like, do you believe in an absolute ideal where I can just fit everything into one place or do I believe in reality? And I think the far more pragmatic approach is really what data mesh represents. So to answer your question directly, I think it's adding, you know, the ability to access data that lives outside of the data warehouse, maybe living in open data formats in a data lake or accessing operational systems as well. Maybe you want to directly access data that lives in an Oracle database or a Mongo database or, or what have you. So creating that flexibility to really Futureproof yourself from the inevitable change that you will, you won't encounter over time. >>So thank you. So there, based on what Justin just said, I, my takeaway there is it's inclusive, whether it's a data Mar data hub, data lake data warehouse, it's a, just a node on the mesh. Okay. I get that. Does that include there on Preem data? O obviously it has to, what are you seeing in terms of the ability to, to take that data mesh concept on Preem? I mean, most implementations I've seen in data mesh, frankly really aren't, you know, adhering to the philosophy. They're maybe, maybe it's data lake and maybe it's using glue. You look at what JPMC is doing. Hello, fresh, a lot of stuff happening on the AWS cloud in that, you know, closed stack, if you will. What's the answer to that Theresa? >>I mean, I, I think it's a killer case for data. Me, the fact that you have valuable data sources, OnPrem, and then yet you still wanna modernize and take the best of cloud cloud is still, like we mentioned, there's a lot of great reasons for it around the economics and the way ability to tap into the innovation that the cloud providers are giving around data and AI architecture. It's an easy button. So the mesh allows you to have the best of both worlds. You can start using the data products on-prem or in the existing systems that are working already. It's meaningful for the business. At the same time, you can modernize the ones that make business sense because it needs better performance. It needs, you know, something that is, is cheaper or, or maybe just tap into better analytics to get better insights, right? So you're gonna be able to stretch and really have the best of both worlds. That, again, going back to Richard's point, that is meaningful by the business. Not everything has to have that one size fits all set a tool. >>Okay. Thank you. So Richard, you know, talking about data as product, wonder if we could give us your perspectives here, what are the advantages of treating data as a product? What, what role do data products have in the modern data stack? We talk about monetizing data. What are your thoughts on data products? >>So for us, one of the most important data products that we've been creating is taking data that is healthcare data across a wide variety of different settings. So information about patients' demographics about their, their treatment, about their medications and so on, and taking that into a standards format that can be utilized by a wide variety of different researchers because misinterpreting that data or having the data not presented in the way that the user is expecting means that you generate the wrong insight. And in any business, that's clearly not a desirable outcome, but when that insight is so critical, as it might be in healthcare or some security settings, you really have to have gone to the trouble of understanding the data, presenting it in a format that everyone can clearly agree on. And then letting people consume in a very structured, managed way, even if that data comes from a variety of different sources in, in, in the first place. And so our data product journey has really begun by standardizing data across a number of different silos through the data mesh. So we can present out both internally and through the right governance externally to, to researchers. >>So that data product through whatever APIs is, is accessible, it's discoverable, but it's obviously gotta be governed as well. You mentioned you, you appropriately provided to internally. Yeah. But also, you know, external folks as well. So the, so you've, you've architected that capability today >>We have, and because the data is standard, it can generate value much more quickly and we can be sure of the security and, and, and value that that's providing because the data product isn't just about formatting the data into the correct tables, it's understanding what it means to redact the data or to remove certain rows from it or to interpret what a date actually means. Is it the start of the contract or the start of the treatment or the date of birth of a patient? These things can be lost in the data storage without having the proper product management around the data to say in a very clear business context, what does this data mean? And what does it mean to process this data for a particular use case? >>Yeah, it makes sense. It's got the context. If the, if the domains own the data, you, you gotta cut through a lot of the, the, the centralized teams, the technical teams that, that data agnostic, they don't really have that context. All right. Let's send Justin, how does Starburst fit into this modern data stack? Bring us home. >>Yeah. So I think for us, it's really providing our customers with, you know, the flexibility to operate and analyze data that lives in a wide variety of different systems. Ultimately giving them that optionality, you know, and optionality provides the ability to reduce costs, store more in a data lake rather than data warehouse. It provides the ability for the fastest time to insight to access the data directly where it lives. And ultimately with this concept of data products that we've now, you know, incorporated into our offering as well, you can really create and, and curate, you know, data as a product to be shared and consumed. So we're trying to help enable the data mesh, you know, model and make that an appropriate compliment to, you know, the, the, the modern data stack that people have today. >>Excellent. Hey, I wanna thank Justin Theresa and Richard for joining us today. You guys are great. I big believers in the, in the data mesh concept, and I think, you know, we're seeing the future of data architecture. So thank you. Now, remember, all these conversations are gonna be available on the cube.net for on-demand viewing. You can also go to starburst.io. They have some great content on the website and they host some really thought provoking interviews and, and, and they have awesome resources, lots of data mesh conversations over there, and really good stuff in, in the resource section. So check that out. Thanks for watching the data doesn't lie or does it made possible by Starburst data? This is Dave Valante for the cube, and we'll see you next time. >>The explosion of data sources has forced organizations to modernize their systems and architecture and come to terms with one size does not fit all for data management today. Your teams are constantly moving and copying data, which requires time management. And in some cases, double paying for compute resources. Instead, what if you could access all your data anywhere using the BI tools and SQL skills your users already have. And what if this also included enterprise security and fast performance with Starburst enterprise, you can provide your data consumers with a single point of secure access to all of your data, no matter where it lives with features like strict, fine grained, access control, end to end data encryption and data masking Starburst meets the security standards of the largest companies. Starburst enterprise can easily be deployed anywhere and managed with insights where data teams holistically view their clusters operation and query execution. So they can reach meaningful business decisions faster, all this with the support of the largest team of Trino experts in the world, delivering fully tested stable releases and available to support you 24 7 to unlock the value in all of your data. You need a solution that easily fits with what you have today and can adapt to your architecture. Tomorrow. Starbust enterprise gives you the fastest path from big data to better decisions, cuz your team can't afford to wait. Trino was created to empower analytics anywhere and Starburst enterprise was created to give you the enterprise grade performance, connectivity, security management, and support your company needs organizations like Zolando Comcast and FINRA rely on Starburst to move their businesses forward. Contact us to get started.
SUMMARY :
famously said the best minds of my generation are thinking about how to get people to the data warehouse ever have featured parody with the data lake or vice versa is So, you know, despite being the industry leader for 40 years, not one of their customers truly had So Richard, from a practitioner's point of view, you know, what, what are your thoughts? although if you were starting from a Greenfield site and you were building something brand new, Y you know, Theresa, I feel like Sarbanes Oxley kinda saved the data warehouse, I, I think you gotta have centralized governance, right? So, you know, Justin, you guys last, geez, I think it was about a year ago, had a session on, And you can think of them Justin, what do you say to a, to a customer or prospect that says, look, Justin, I'm gonna, you know, for many, many years to come. But I think the reality is, you know, the data mesh model basically says, I mean, you know, there Theresa you work with a lot of clients, they're not just gonna rip and replace their existing that the mesh actually allows you to use all of them. But it creates what I would argue are two, you know, Well, it absolutely depends on some of the tooling and processes that you put in place around those do an analytic queries and with data that's all dispersed all over the, how are you seeing your the best to, to create, you know, data as a product ultimately to be consumed. open platforms are the best path to the future of data But what if you could spend less you create a single point of access to your data, no matter where it's stored. give you the performance and control that you can get with a proprietary system. I remember in the very early days, people would say, you you'll never get performance because And I remember a, a quote from, you know, Kurt Monash many years ago where he said, you know, know it takes six or seven it is an evolving, you know, spectrum, but, but from your perspective, And what you don't want to end up So Jess, let me play devil's advocate here a little bit, and I've talked to Shaak about this and you know, And I think similarly, you know, being able to connect to an external table that lives in an open data format, Well, that's interesting reminded when I, you know, I see the, the gas price, And I think, you know, I loved what Richard said. not as many te data customers, but, but a lot of Oracle customers and they, you know, And so for those different teams, they can get to an ROI more quickly with different technologies that strike me, you know, the data brick snowflake, you know, thing is, oh, is a lot of fun for analysts So the advice that I saw years ago was if you have open source technologies, And in world of Oracle, you know, normally it's the staff, easy to discover and consume via, you know, the creation of data products as well. really modern, or is it the same wine new bottle? And with Starburst, you can perform analytics anywhere in light of your world. And that is the claim that today's So it's the same general stack, just, you know, a cloud version of it. So lemme come back to you just, but okay. So a lot of the same sort of structural constraints that exist with So Theresa, let me go to you cuz you have cloud first in your, in your, the data staff needs to be much more federated. you know, a microservices layer on top of leg legacy apps. So I think the stack needs to support a scalable So you think about the past, you know, five, seven years cloud obviously has given What it should be. And I think that's the paradigm shift that needs to occur. data that lives outside of the data warehouse, maybe living in open data formats in a data lake seen in data mesh, frankly really aren't, you know, adhering to So the mesh allows you to have the best of both worlds. So Richard, you know, talking about data as product, wonder if we could give us your perspectives is expecting means that you generate the wrong insight. But also, you know, around the data to say in a very clear business context, It's got the context. And ultimately with this concept of data products that we've now, you know, incorporated into our offering as well, This is Dave Valante for the cube, and we'll see you next time. You need a solution that easily fits with what you have today and can adapt
SENTIMENT ANALYSIS :
ENTITIES
Entity | Category | Confidence |
---|---|---|
Richard | PERSON | 0.99+ |
Dave Lanta | PERSON | 0.99+ |
Jess Borgman | PERSON | 0.99+ |
Justin | PERSON | 0.99+ |
Theresa | PERSON | 0.99+ |
Justin Borgman | PERSON | 0.99+ |
Teresa | PERSON | 0.99+ |
Jeff Ocker | PERSON | 0.99+ |
Richard Jarvis | PERSON | 0.99+ |
Dave Valante | PERSON | 0.99+ |
Justin Boardman | PERSON | 0.99+ |
six | QUANTITY | 0.99+ |
Dani | PERSON | 0.99+ |
Massachusetts | LOCATION | 0.99+ |
20 cents | QUANTITY | 0.99+ |
Teradata | ORGANIZATION | 0.99+ |
Oracle | ORGANIZATION | 0.99+ |
Jamma | PERSON | 0.99+ |
UK | LOCATION | 0.99+ |
FINRA | ORGANIZATION | 0.99+ |
40 years | QUANTITY | 0.99+ |
Kurt Monash | PERSON | 0.99+ |
20% | QUANTITY | 0.99+ |
two | QUANTITY | 0.99+ |
five | QUANTITY | 0.99+ |
Jess | PERSON | 0.99+ |
2011 | DATE | 0.99+ |
Starburst | ORGANIZATION | 0.99+ |
10 | QUANTITY | 0.99+ |
Accenture | ORGANIZATION | 0.99+ |
seven years | QUANTITY | 0.99+ |
thousands | QUANTITY | 0.99+ |
pythons | TITLE | 0.99+ |
Boston | LOCATION | 0.99+ |
GDPR | TITLE | 0.99+ |
Today | DATE | 0.99+ |
two models | QUANTITY | 0.99+ |
Zolando Comcast | ORGANIZATION | 0.99+ |
Gemma | PERSON | 0.99+ |
Starbust | ORGANIZATION | 0.99+ |
JPMC | ORGANIZATION | 0.99+ |
ORGANIZATION | 0.99+ | |
Javas | TITLE | 0.99+ |
today | DATE | 0.99+ |
AWS | ORGANIZATION | 0.99+ |
millions | QUANTITY | 0.99+ |
first lie | QUANTITY | 0.99+ |
10 | DATE | 0.99+ |
12 years | QUANTITY | 0.99+ |
one place | QUANTITY | 0.99+ |
Tomorrow | DATE | 0.99+ |
Starburst The Data Lies FULL V1
>>In 2011, early Facebook employee and Cloudera co-founder Jeff Ocker famously said the best minds of my generation are thinking about how to get people to click on ads. And that sucks. Let's face it more than a decade later organizations continue to be frustrated with how difficult it is to get value from data and build a truly agile data-driven enterprise. What does that even mean? You ask? Well, it means that everyone in the organization has the data they need when they need it. In a context that's relevant to advance the mission of an organization. Now that could mean cutting cost could mean increasing profits, driving productivity, saving lives, accelerating drug discovery, making better diagnoses, solving, supply chain problems, predicting weather disasters, simplifying processes, and thousands of other examples where data can completely transform people's lives beyond manipulating internet users to behave a certain way. We've heard the prognostications about the possibilities of data before and in fairness we've made progress, but the hard truth is the original promises of master data management, enterprise data, warehouses, data marts, data hubs, and yes, even data lakes were broken and left us wanting from more welcome to the data doesn't lie, or doesn't a series of conversations produced by the cube and made possible by Starburst data. >>I'm your host, Dave Lanta and joining me today are three industry experts. Justin Borgman is this co-founder and CEO of Starburst. Richard Jarvis is the CTO at EMI health and Theresa tongue is cloud first technologist at Accenture. Today we're gonna have a candid discussion that will expose the unfulfilled and yes, broken promises of a data past we'll expose data lies, big lies, little lies, white lies, and hidden truths. And we'll challenge, age old data conventions and bust some data myths. We're debating questions like is the demise of a single source of truth. Inevitable will the data warehouse ever have featured parody with the data lake or vice versa is the so-called modern data stack, simply centralization in the cloud, AKA the old guards model in new cloud close. How can organizations rethink their data architectures and regimes to realize the true promises of data can and will and open ecosystem deliver on these promises in our lifetimes, we're spanning much of the Western world today. Richard is in the UK. Teresa is on the west coast and Justin is in Massachusetts with me. I'm in the cube studios about 30 miles outside of Boston folks. Welcome to the program. Thanks for coming on. Thanks for having us. Let's get right into it. You're very welcome. Now here's the first lie. The most effective data architecture is one that is centralized with a team of data specialists serving various lines of business. What do you think Justin? >>Yeah, definitely a lie. My first startup was a company called hit adapt, which was an early SQL engine for hit that was acquired by Teradata. And when I got to Teradata, of course, Teradata is the pioneer of that central enterprise data warehouse model. One of the things that I found fascinating was that not one of their customers had actually lived up to that vision of centralizing all of their data into one place. They all had data silos. They all had data in different systems. They had data on prem data in the cloud. You know, those companies were acquiring other companies and inheriting their data architecture. So, you know, despite being the industry leader for 40 years, not one of their customers truly had everything in one place. So I think definitely history has proven that to be a lie. >>So Richard, from a practitioner's point of view, you know, what, what are your thoughts? I mean, there, there's a lot of pressure to cut cost, keep things centralized, you know, serve the business as best as possible from that standpoint. What, what is your experience show? >>Yeah, I mean, I think I would echo Justin's experience really that we, as a business have grown up through acquisition, through storing data in different places sometimes to do information governance in different ways to store data in, in a platform that's close to data experts, people who really understand healthcare data from pharmacies or from, from doctors. And so, although if you were starting from a Greenfield site and you were building something brand new, you might be able to centralize all the data and all of the tooling and teams in one place. The reality is that that businesses just don't grow up like that. And, and it's just really impossible to get that academic perfection of, of storing everything in one place. >>Y you know, Theresa, I feel like Sarbanes Oxley kinda saved the data warehouse, you know, right. You actually did have to have a single version of the truth for certain financial data, but really for those, some of those other use cases, I, I mentioned, I, I do feel like the industry has kinda let us down. What's your take on this? Where does it make sense to have that sort of centralized approach versus where does it make sense to maybe decentralized? >>I, I think you gotta have centralized governance, right? So from the central team, for things like star Oxley, for things like security for certainly very core data sets, having a centralized set of roles, responsibilities to really QA, right. To serve as a design authority for your entire data estate, just like you might with security, but how it's implemented has to be distributed. Otherwise you're not gonna be able to scale. Right? So being able to have different parts of the business really make the right data investments for their needs. And then ultimately you're gonna collaborate with your partners. So partners that are not within the company, right. External partners, we're gonna see a lot more data sharing and model creation. And so you're definitely going to be decentralized. >>So, you know, Justin, you guys last, geez, I think it was about a year ago, had a session on, on data mesh. It was a great program. You invited Jamma, Dani, of course, she's the creator of the data mesh. And her one of our fundamental premises is that you've got this hyper specialized team that you've gotta go through. And if you want anything, but at the same time, these, these individuals actually become a bottleneck, even though they're some of the most talented people in the organization. So I guess question for you, Richard, how do you deal with that? Do you, do you organize so that there are a few sort of rock stars that, that, you know, build cubes and, and the like, and, and, and, or have you had any success in sort of decentralizing with, you know, your, your constituencies, that data model? >>Yeah. So, so we absolutely have got rockstar, data scientists and data guardians. If you like people who understand what it means to use this data, particularly as the data that we use at emos is very private it's healthcare information. And some of the, the rules and regulations around using the data are very complex and, and strict. So we have to have people who understand the usage of the data, then people who understand how to build models, how to process the data effectively. And you can think of them like consultants to the wider business, because a pharmacist might not understand how to structure a SQL query, but they do understand how they want to process medication information to improve patient lives. And so that becomes a, a consulting type experience from a, a set of rock stars to help a, a more decentralized business who needs to, to understand the data and to generate some valuable output. >>Justin, what do you say to a, to a customer or prospect that says, look, Justin, I'm gonna, I got a centralized team and that's the most cost effective way to serve the business. Otherwise I got, I got duplication. What do you say to that? >>Well, I, I would argue it's probably not the most cost effective and, and the reason being really twofold. I think, first of all, when you are deploying a enterprise data warehouse model, the, the data warehouse itself is very expensive, generally speaking. And so you're putting all of your most valuable data in the hands of one vendor who now has tremendous leverage over you, you know, for many, many years to come. I think that's the story at Oracle or Terra data or other proprietary database systems. But the other aspect I think is that the reality is those central data warehouse teams is as much as they are experts in the technology. They don't necessarily understand the data itself. And this is one of the core tenants of data mash that that jam writes about is this idea of the domain owners actually know the data the best. >>And so by, you know, not only acknowledging that data is generally decentralized and to your earlier point about SAR, brain Oxley, maybe saving the data warehouse, I would argue maybe GDPR and data sovereignty will destroy it because data has to be decentralized for, for those laws to be compliant. But I think the reality is, you know, the data mesh model basically says, data's decentralized, and we're gonna turn that into an asset rather than a liability. And we're gonna turn that into an asset by empowering the people that know the data, the best to participate in the process of, you know, curating and creating data products for, for consumption. So I think when you think about it, that way, you're going to get higher quality data and faster time to insight, which is ultimately going to drive more revenue for your business and reduce costs. So I think that that's the way I see the two, the two models comparing and contrasting. >>So do you think the demise of the data warehouse is inevitable? I mean, I mean, you know, there Theresa you work with a lot of clients, they're not just gonna rip and replace their existing infrastructure. Maybe they're gonna build on top of it, but what does that mean? Does that mean the E D w just becomes, you know, less and less valuable over time, or it's maybe just isolated to specific use cases. What's your take on that? >>Listen, I still would love all my data within a data warehouse would love it. Mastered would love it owned by essential team. Right? I think that's still what I would love to have. That's just not the reality, right? The investment to actually migrate and keep that up to date. I would say it's a losing battle. Like we've been trying to do it for a long time. Nobody has the budgets and then data changes, right? There's gonna be a new technology. That's gonna emerge that we're gonna wanna tap into. There's going to be not enough investment to bring all the legacy, but still very useful systems into that centralized view. So you keep the data warehouse. I think it's a very, very valuable, very high performance tool for what it's there for, but you could have this, you know, new mesh layer that still takes advantage of the things. I mentioned, the data products in the systems that are meaningful today and the data products that actually might span a number of systems, maybe either those that either source systems for the domains that know it best, or the consumer based systems and products that need to be packaged in a way that be really meaningful for that end user, right? Each of those are useful for a different part of the business and making sure that the mesh actually allows you to use all of them. >>So, Richard, let me ask you, you take, take Gemma's principles back to those. You got to, you know, domain ownership and, and, and data as product. Okay, great. Sounds good. But it creates what I would argue are two, you know, challenges, self-serve infrastructure let's park that for a second. And then in your industry, the one of the high, most regulated, most sensitive computational governance, how do you automate and ensure federated governance in that mesh model that Theresa was just talking about? >>Well, it absolutely depends on some of the tooling and processes that you put in place around those tools to be, to centralize the security and the governance of the data. And I think, although a data warehouse makes that very simple, cause it's a single tool, it's not impossible with some of the data mesh technologies that are available. And so what we've done at emus is we have a single security layer that sits on top of our data match, which means that no matter which user is accessing, which data source, we go through a well audited well understood security layer. That means that we know exactly who's got access to which data field, which data tables. And then everything that they do is, is audited in a very kind of standard way, regardless of the underlying data storage technology. So for me, although storing the data in one place might not be possible understanding where your source of truth is and securing that in a common way is still a valuable approach and you can do it without having to bring all that data into a single bucket so that it's all in one place. And, and so having done that and investing quite heavily in making that possible has paid dividends in terms of giving wider access to the platform and ensuring that only data that's available under GDPR and other regulations is being used by, by the data users. >>Yeah. So Justin, I mean, Democrat, we always talk about data democratization and you know, up until recently, they really haven't been line of sight as to how to get there. But do you have anything to add to this because you're essentially taking, you know, do an analytic queries and with data that's all dispersed all over the, how are you seeing your customers handle this, this challenge? >>Yeah. I mean, I think data products is a really interesting aspect of the answer to that. It allows you to, again, leverage the data domain owners, people know the data, the best to, to create, you know, data as a product ultimately to be consumed. And we try to represent that in our product as effectively a almost eCommerce like experience where you go and discover and look for the data products that have been created in your organization. And then you can start to consume them as, as you'd like. And so really trying to build on that notion of, you know, data democratization and self-service, and making it very easy to discover and, and start to use with whatever BI tool you, you may like, or even just running, you know, SQL queries yourself, >>Okay. G guys grab a sip of water. After this short break, we'll be back to debate whether proprietary or open platforms are the best path to the future of data excellence, keep it right there. >>Your company has more data than ever, and more people trying to understand it, but there's a problem. Your data is stored across multiple systems. It's hard to access and that delays analytics and ultimately decisions. The old method of moving all of your data into a single source of truth is slow and definitely not built for the volume of data we have today or where we are headed while your data engineers spent over half their time, moving data, your analysts and data scientists are left, waiting, feeling frustrated, unproductive, and unable to move the needle for your business. But what if you could spend less time moving or copying data? What if your data consumers could analyze all your data quickly? >>Starburst helps your teams run fast queries on any data source. We help you create a single point of access to your data, no matter where it's stored. And we support high concurrency, we solve for speed and scale, whether it's fast, SQL queries on your data lake or faster queries across multiple data sets, Starburst helps your teams run analytics anywhere you can't afford to wait for data to be available. Your team has questions that need answers. Now with Starburst, the wait is over. You'll have faster access to data with enterprise level security, easy connectivity, and 24 7 support from experts, organizations like Zolando Comcast and FINRA rely on Starburst to move their businesses forward. Contact our Trino experts to get started. >>We're back with Jess Borgman of Starburst and Richard Jarvis of EVAs health. Okay, we're gonna get to lie. Number two, and that is this an open source based platform cannot give you the performance and control that you can get with a proprietary system. Is that a lie? Justin, the enterprise data warehouse has been pretty dominant and has evolved and matured. Its stack has mature over the years. Why is it not the default platform for data? >>Yeah, well, I think that's become a lie over time. So I, I think, you know, if we go back 10 or 12 years ago with the advent of the first data lake really around Hudu, that probably was true that you couldn't get the performance that you needed to run fast, interactive, SQL queries in a data lake. Now a lot's changed in 10 or 12 years. I remember in the very early days, people would say, you you'll never get performance because you need to be column there. You need to store data in a column format. And then, you know, column formats we're introduced to, to data apes, you have Parque ORC file in aro that were created to ultimately deliver performance out of that. So, okay. We got, you know, largely over the performance hurdle, you know, more recently people will say, well, you don't have the ability to do updates and deletes like a traditional data warehouse. >>And now we've got the creation of new data formats, again like iceberg and Delta and Hodi that do allow for updates and delete. So I think the data lake has continued to mature. And I remember a, a quote from, you know, Kurt Monash many years ago where he said, you know, know it takes six or seven years to build a functional database. I think that's that's right. And now we've had almost a decade go by. So, you know, these technologies have matured to really deliver very, very close to the same level performance and functionality of, of cloud data warehouses. So I think the, the reality is that's become a line and now we have large giant hyperscale internet companies that, you know, don't have the traditional data warehouse at all. They do all of their analytics in a data lake. So I think we've, we've proven that it's very much possible today. >>Thank you for that. And so Richard, talk about your perspective as a practitioner in terms of what open brings you versus, I mean, look closed is it's open as a moving target. I remember Unix used to be open systems and so it's, it is an evolving, you know, spectrum, but, but from your perspective, what does open give you that you can't get from a proprietary system where you are fearful of in a proprietary system? >>I, I suppose for me open buys us the ability to be unsure about the future, because one thing that's always true about technology is it evolves in a, a direction, slightly different to what people expect. And what you don't want to end up is done is backed itself into a corner that then prevents it from innovating. So if you have chosen a technology and you've stored trillions of records in that technology and suddenly a new way of processing or machine learning comes out, you wanna be able to take advantage and your competitive edge might depend upon it. And so I suppose for us, we acknowledge that we don't have perfect vision of what the future might be. And so by backing open storage technologies, we can apply a number of different technologies to the processing of that data. And that gives us the ability to remain relevant, innovate on our data storage. And we have bought our way out of the, any performance concerns because we can use cloud scale infrastructure to scale up and scale down as we need. And so we don't have the concerns that we don't have enough hardware today to process what we want to do, want to achieve. We can just scale up when we need it and scale back down. So open source has really allowed us to maintain the being at the cutting edge. >>So Jess, let me play devil's advocate here a little bit, and I've talked to Shaak about this and you know, obviously her vision is there's an open source that, that the data meshes open source, an open source tooling, and it's not a proprietary, you know, you're not gonna buy a data mesh. You're gonna build it with, with open source toolings and, and vendors like you are gonna support it, but to come back to sort of today, you can get to market with a proprietary solution faster. I'm gonna make that statement. You tell me if it's a lie and then you can say, okay, we support Apache iceberg. We're gonna support open source tooling, take a company like VMware, not really in the data business, but how, the way they embraced Kubernetes and, and you know, every new open source thing that comes along, they say, we do that too. Why can't proprietary systems do that and be as effective? >>Yeah, well, I think at least with the, within the data landscape saying that you can access open data formats like iceberg or, or others is, is a bit dis disingenuous because really what you're selling to your customer is a certain degree of performance, a certain SLA, and you know, those cloud data warehouses that can reach beyond their own proprietary storage drop all the performance that they were able to provide. So it is, it reminds me kind of, of, again, going back 10 or 12 years ago when everybody had a connector to Haddo and that they thought that was the solution, right? But the reality was, you know, a connector was not the same as running workloads in Haddo back then. And I think similarly, you know, being able to connect to an external table that lives in an open data format, you know, you're, you're not going to give it the performance that your customers are accustomed to. And at the end of the day, they're always going to be predisposed. They're always going to be incentivized to get that data ingested into the data warehouse, cuz that's where they have control. And you know, the bottom line is the database industry has really been built around vendor lockin. I mean, from the start, how, how many people love Oracle today, but our customers, nonetheless, I think, you know, lockin is, is, is part of this industry. And I think that's really what we're trying to change with open data formats. >>Well, that's interesting reminded when I, you know, I see the, the gas price, the tees or gas price I, I drive up and then I say, oh, that's the cash price credit card. I gotta pay 20 cents more, but okay. But so the, the argument then, so let me, let me come back to you, Justin. So what's wrong with saying, Hey, we support open data formats, but yeah, you're gonna get better performance if you, if you keep it into our closed system, are you saying that long term that's gonna come back and bite you cuz you're gonna end up, you mentioned Oracle, you mentioned Teradata. Yeah. That's by, by implication, you're saying that's where snowflake customers are headed. >>Yeah, absolutely. I think this is a movie that, you know, we've all seen before. At least those of us who've been in the industry long enough to, to see this movie play over a couple times. So I do think that's the future. And I think, you know, I loved what Richard said. I actually wrote it down. Cause I thought it was an amazing quote. He said, it buys us the ability to be unsure of the future. Th that that pretty much says it all the, the future is unknowable and the reality is using open data formats. You remain interoperable with any technology you want to utilize. If you want to use spark to train a machine learning model and you want to use Starbust to query via sequel, that's totally cool. They can both work off the same exact, you know, data, data sets by contrast, if you're, you know, focused on a proprietary model, then you're kind of locked in again to that model. I think the same applies to data, sharing to data products, to a wide variety of, of aspects of the data landscape that a proprietary approach kind of closes you in and locks you in. >>So I, I would say this Richard, I'd love to get your thoughts on it. Cause I talked to a lot of Oracle customers, not as many te data customers, but, but a lot of Oracle customers and they, you know, they'll admit, yeah, you know, they're jamming us on price and the license cost they give, but we do get value out of it. And so my question to you, Richard, is, is do the, let's call it data warehouse systems or the proprietary systems. Are they gonna deliver a greater ROI sooner? And is that in allure of, of that customers, you know, are attracted to, or can open platforms deliver as fast in ROI? >>I think the answer to that is it can depend a bit. It depends on your businesses skillset. So we are lucky that we have a number of proprietary teams that work in databases that provide our operational data capability. And we have teams of analytics and big data experts who can work with open data sets and open data formats. And so for those different teams, they can get to an ROI more quickly with different technologies for the business though, we can't do better for our operational data stores than proprietary databases. Today we can back off very tight SLAs to them. We can demonstrate reliability from millions of hours of those databases being run at enterprise scale, but for an analytics workload where increasing our business is growing in that direction, we can't do better than open data formats with cloud-based data mesh type technologies. And so it's not a simple answer. That one will always be the right answer for our business. We definitely have times when proprietary databases provide a capability that we couldn't easily represent or replicate with open technologies. >>Yeah. Richard, stay with you. You mentioned, you know, you know, some things before that, that strike me, you know, the data brick snowflake, you know, thing is, oh, is a lot of fun for analysts like me. You've got data bricks coming at it. Richard, you mentioned you have a lot of rockstar, data engineers, data bricks coming at it from a data engineering heritage. You get snowflake coming at it from an analytics heritage. Those two worlds are, are colliding people like PJI Mohan said, you know what? I think it's actually harder to play in the data engineering. So I E it's easier to for data engineering world to go into the analytics world versus the reverse, but thinking about up and coming engineers and developers preparing for this future of data engineering and data analytics, how, how should they be thinking about the future? What, what's your advice to those young people? >>So I think I'd probably fall back on general programming skill sets. So the advice that I saw years ago was if you have open source technologies, the pythons and Javas on your CV, you commander 20% pay, hike over people who can only do proprietary programming languages. And I think that's true of data technologies as well. And from a business point of view, that makes sense. I'd rather spend the money that I save on proprietary licenses on better engineers, because they can provide more value to the business that can innovate us beyond our competitors. So I think I would my advice to people who are starting here or trying to build teams to capitalize on data assets is begin with open license, free capabilities, because they're very cheap to experiment with. And they generate a lot of interest from people who want to join you as a business. And you can make them very successful early, early doors with, with your analytics journey. >>It's interesting. Again, analysts like myself, we do a lot of TCO work and have over the last 20 plus years. And in world of Oracle, you know, normally it's the staff, that's the biggest nut in total cost of ownership, not an Oracle. It's the it's the license cost is by far the biggest component in the, in the blame pie. All right, Justin, help us close out this segment. We've been talking about this sort of data mesh open, closed snowflake data bricks. Where does Starburst sort of as this engine for the data lake data lake house, the data warehouse fit in this, in this world? >>Yeah. So our view on how the future ultimately unfolds is we think that data lakes will be a natural center of gravity for a lot of the reasons that we described open data formats, lowest total cost of ownership, because you get to choose the cheapest storage available to you. Maybe that's S3 or Azure data lake storage, or Google cloud storage, or maybe it's on-prem object storage that you bought at a, at a really good price. So ultimately storing a lot of data in a deal lake makes a lot of sense, but I think what makes our perspective unique is we still don't think you're gonna get everything there either. We think that basically centralization of all your data assets is just an impossible endeavor. And so you wanna be able to access data that lives outside of the lake as well. So we kind of think of the lake as maybe the biggest place by volume in terms of how much data you have, but to, to have comprehensive analytics and to truly understand your business and understand it holistically, you need to be able to go access other data sources as well. And so that's the role that we wanna play is to be a single point of access for our customers, provide the right level of fine grained access controls so that the right people have access to the right data and ultimately make it easy to discover and consume via, you know, the creation of data products as well. >>Great. Okay. Thanks guys. Right after this quick break, we're gonna be back to debate whether the cloud data model that we see emerging and the so-called modern data stack is really modern, or is it the same wine new bottle? When it comes to data architectures, you're watching the cube, the leader in enterprise and emerging tech coverage. >>Your data is capable of producing incredible results, but data consumers are often left in the dark without fast access to the data they need. Starers makes your data visible from wherever it lives. Your company is acquiring more data in more places, more rapidly than ever to rely solely on a data centralization strategy. Whether it's in a lake or a warehouse is unrealistic. A single source of truth approach is no longer viable, but disconnected data silos are often left untapped. We need a new approach. One that embraces distributed data. One that enables fast and secure access to any of your data from anywhere with Starburst, you'll have the fastest query engine for the data lake that allows you to connect and analyze your disparate data sources no matter where they live Starburst provides the foundational technology required for you to build towards the vision of a decentralized data mesh Starburst enterprise and Starburst galaxy offer enterprise ready, connectivity, interoperability, and security features for multiple regions, multiple clouds and everchanging global regulatory requirements. The data is yours. And with Starburst, you can perform analytics anywhere in light of your world. >>Okay. We're back with Justin Boardman. CEO of Starbust Richard Jarvis is the CTO of EMI health and Theresa tongue is the cloud first technologist from Accenture. We're on July number three. And that is the claim that today's modern data stack is actually modern. So I guess that's the lie it's it is it's is that it's not modern. Justin, what do you say? >>Yeah. I mean, I think new isn't modern, right? I think it's the, it's the new data stack. It's the cloud data stack, but that doesn't necessarily mean it's modern. I think a lot of the components actually are exactly the same as what we've had for 40 years, rather than Terra data. You have snowflake rather than Informatica you have five trend. So it's the same general stack, just, you know, a cloud version of it. And I think a lot of the challenges that it plagued us for 40 years still maintain. >>So lemme come back to you just, but okay. But, but there are differences, right? I mean, you can scale, you can throw resources at the problem. You can separate compute from storage. You really, you know, there's a lot of money being thrown at that by venture capitalists and snowflake, you mentioned it's competitors. So that's different. Is it not, is that not at least an aspect of, of modern dial it up, dial it down. So what, what do you say to that? >>Well, it, it is, it's certainly taking, you know, what the cloud offers and taking advantage of that, but it's important to note that the cloud data warehouses out there are really just separating their compute from their storage. So it's allowing them to scale up and down, but your data still stored in a proprietary format. You're still locked in. You still have to ingest the data to get it even prepared for analysis. So a lot of the same sort of structural constraints that exist with the old enterprise data warehouse model OnPrem still exist just yes, a little bit more elastic now because the cloud offers that. >>So Theresa, let me go to you cuz you have cloud first in your, in your, your title. So what's what say you to this conversation? >>Well, even the cloud providers are looking towards more of a cloud continuum, right? So the centralized cloud, as we know it, maybe data lake data warehouse in the central place, that's not even how the cloud providers are looking at it. They have news query services. Every provider has one that really expands those queries to be beyond a single location. And if we look at a lot of where our, the future goes, right, that that's gonna very much fall the same thing. There was gonna be more edge. There's gonna be more on premise because of data sovereignty, data gravity, because you're working with different parts of the business that have already made major cloud investments in different cloud providers. Right? So there's a lot of reasons why the modern, I guess, the next modern generation of the data staff needs to be much more federated. >>Okay. So Richard, how do you deal with this? You you've obviously got, you know, the technical debt, the existing infrastructure it's on the books. You don't wanna just throw it out. A lot of, lot of conversation about modernizing applications, which a lot of times is a, you know, a microservices layer on top of leg legacy apps. How do you think about the modern data stack? >>Well, I think probably the first thing to say is that the stack really has to include the processes and people around the data as well is all well and good changing the technology. But if you don't modernize how people use that technology, then you're not going to be able to, to scale because just cuz you can scale CPU and storage doesn't mean you can get more people to use your data, to generate you more, more value for the business. And so what we've been looking at is really changing in very much aligned to data products and, and data mesh. How do you enable more people to consume the service and have the stack respond in a way that keeps costs low? Because that's important for our customers consuming this data, but also allows people to occasionally run enormous queries and then tick along with smaller ones when required. And it's a good job we did because during COVID all of a sudden we had enormous pressures on our data platform to answer really important life threatening queries. And if we couldn't scale both our data stack and our teams, we wouldn't have been able to answer those as quickly as we had. So I think the stack needs to support a scalable business, not just the technology itself. >>Well thank you for that. So Justin let's, let's try to break down what the critical aspects are of the modern data stack. So you think about the past, you know, five, seven years cloud obviously has given a different pricing model. De-risked experimentation, you know that we talked about the ability to scale up scale down, but it's, I'm, I'm taking away that that's not enough based on what Richard just said. The modern data stack has to serve the business and enable the business to build data products. I, I buy that. I'm a big fan of the data mesh concepts, even though we're early days. So what are the critical aspects if you had to think about, you know, paying, maybe putting some guardrails and definitions around the modern data stack, what does that look like? What are some of the attributes and, and principles there >>Of, of how it should look like or, or how >>It's yeah. What it should be. >>Yeah. Yeah. Well, I think, you know, in, in Theresa mentioned this in, in a previous segment about the data warehouse is not necessarily going to disappear. It just becomes one node, one element of the overall data mesh. And I, I certainly agree with that. So by no means, are we suggesting that, you know, snowflake or Redshift or whatever cloud data warehouse you may be using is going to disappear, but it's, it's not going to become the end all be all. It's not the, the central single source of truth. And I think that's the paradigm shift that needs to occur. And I think it's also worth noting that those who were the early adopters of the modern data stack were primarily digital, native born in the cloud young companies who had the benefit of, of idealism. They had the benefit of it was starting with a clean slate that does not reflect the vast majority of enterprises. >>And even those companies, as they grow up mature out of that ideal state, they go buy a business. Now they've got something on another cloud provider that has a different data stack and they have to deal with that heterogeneity that is just change and change is a part of life. And so I think there is an element here that is almost philosophical. It's like, do you believe in an absolute ideal where I can just fit everything into one place or do I believe in reality? And I think the far more pragmatic approach is really what data mesh represents. So to answer your question directly, I think it's adding, you know, the ability to access data that lives outside of the data warehouse, maybe living in open data formats in a data lake or accessing operational systems as well. Maybe you want to directly access data that lives in an Oracle database or a Mongo database or, or what have you. So creating that flexibility to really Futureproof yourself from the inevitable change that you will, you won't encounter over time. >>So thank you. So there, based on what Justin just said, I, my takeaway there is it's inclusive, whether it's a data Mar data hub, data lake data warehouse, it's a, just a node on the mesh. Okay. I get that. Does that include there on Preem data? O obviously it has to, what are you seeing in terms of the ability to, to take that data mesh concept on Preem? I mean, most implementations I've seen in data mesh, frankly really aren't, you know, adhering to the philosophy. They're maybe, maybe it's data lake and maybe it's using glue. You look at what JPMC is doing. Hello, fresh, a lot of stuff happening on the AWS cloud in that, you know, closed stack, if you will. What's the answer to that Theresa? >>I mean, I, I think it's a killer case for data. Me, the fact that you have valuable data sources, OnPrem, and then yet you still wanna modernize and take the best of cloud cloud is still, like we mentioned, there's a lot of great reasons for it around the economics and the way ability to tap into the innovation that the cloud providers are giving around data and AI architecture. It's an easy button. So the mesh allows you to have the best of both worlds. You can start using the data products on-prem or in the existing systems that are working already. It's meaningful for the business. At the same time, you can modernize the ones that make business sense because it needs better performance. It needs, you know, something that is, is cheaper or, or maybe just tap into better analytics to get better insights, right? So you're gonna be able to stretch and really have the best of both worlds. That, again, going back to Richard's point, that is meaningful by the business. Not everything has to have that one size fits all set a tool. >>Okay. Thank you. So Richard, you know, talking about data as product, wonder if we could give us your perspectives here, what are the advantages of treating data as a product? What, what role do data products have in the modern data stack? We talk about monetizing data. What are your thoughts on data products? >>So for us, one of the most important data products that we've been creating is taking data that is healthcare data across a wide variety of different settings. So information about patients' demographics about their, their treatment, about their medications and so on, and taking that into a standards format that can be utilized by a wide variety of different researchers because misinterpreting that data or having the data not presented in the way that the user is expecting means that you generate the wrong insight. And in any business, that's clearly not a desirable outcome, but when that insight is so critical, as it might be in healthcare or some security settings, you really have to have gone to the trouble of understanding the data, presenting it in a format that everyone can clearly agree on. And then letting people consume in a very structured, managed way, even if that data comes from a variety of different sources in, in, in the first place. And so our data product journey has really begun by standardizing data across a number of different silos through the data mesh. So we can present out both internally and through the right governance externally to, to researchers. >>So that data product through whatever APIs is, is accessible, it's discoverable, but it's obviously gotta be governed as well. You mentioned you, you appropriately provided to internally. Yeah. But also, you know, external folks as well. So the, so you've, you've architected that capability today >>We have, and because the data is standard, it can generate value much more quickly and we can be sure of the security and, and, and value that that's providing because the data product isn't just about formatting the data into the correct tables, it's understanding what it means to redact the data or to remove certain rows from it or to interpret what a date actually means. Is it the start of the contract or the start of the treatment or the date of birth of a patient? These things can be lost in the data storage without having the proper product management around the data to say in a very clear business context, what does this data mean? And what does it mean to process this data for a particular use case? >>Yeah, it makes sense. It's got the context. If the, if the domains own the data, you, you gotta cut through a lot of the, the, the centralized teams, the technical teams that, that data agnostic, they don't really have that context. All right. Let's send Justin, how does Starburst fit into this modern data stack? Bring us home. >>Yeah. So I think for us, it's really providing our customers with, you know, the flexibility to operate and analyze data that lives in a wide variety of different systems. Ultimately giving them that optionality, you know, and optionality provides the ability to reduce costs, store more in a data lake rather than data warehouse. It provides the ability for the fastest time to insight to access the data directly where it lives. And ultimately with this concept of data products that we've now, you know, incorporated into our offering as well, you can really create and, and curate, you know, data as a product to be shared and consumed. So we're trying to help enable the data mesh, you know, model and make that an appropriate compliment to, you know, the, the, the modern data stack that people have today. >>Excellent. Hey, I wanna thank Justin Theresa and Richard for joining us today. You guys are great. I big believers in the, in the data mesh concept, and I think, you know, we're seeing the future of data architecture. So thank you. Now, remember, all these conversations are gonna be available on the cube.net for on-demand viewing. You can also go to starburst.io. They have some great content on the website and they host some really thought provoking interviews and, and, and they have awesome resources, lots of data mesh conversations over there, and really good stuff in, in the resource section. So check that out. Thanks for watching the data doesn't lie or does it made possible by Starburst data? This is Dave Valante for the cube, and we'll see you next time. >>The explosion of data sources has forced organizations to modernize their systems and architecture and come to terms with one size does not fit all for data management today. Your teams are constantly moving and copying data, which requires time management. And in some cases, double paying for compute resources. Instead, what if you could access all your data anywhere using the BI tools and SQL skills your users already have. And what if this also included enterprise security and fast performance with Starburst enterprise, you can provide your data consumers with a single point of secure access to all of your data, no matter where it lives with features like strict, fine grained, access control, end to end data encryption and data masking Starburst meets the security standards of the largest companies. Starburst enterprise can easily be deployed anywhere and managed with insights where data teams holistically view their clusters operation and query execution. So they can reach meaningful business decisions faster, all this with the support of the largest team of Trino experts in the world, delivering fully tested stable releases and available to support you 24 7 to unlock the value in all of your data. You need a solution that easily fits with what you have today and can adapt to your architecture. Tomorrow. Starbust enterprise gives you the fastest path from big data to better decisions, cuz your team can't afford to wait. Trino was created to empower analytics anywhere and Starburst enterprise was created to give you the enterprise grade performance, connectivity, security management, and support your company needs organizations like Zolando Comcast and FINRA rely on Starburst to move their businesses forward. Contact us to get started.
SUMMARY :
famously said the best minds of my generation are thinking about how to get people to the data warehouse ever have featured parody with the data lake or vice versa is So, you know, despite being the industry leader for 40 years, not one of their customers truly had So Richard, from a practitioner's point of view, you know, what, what are your thoughts? although if you were starting from a Greenfield site and you were building something brand new, Y you know, Theresa, I feel like Sarbanes Oxley kinda saved the data warehouse, I, I think you gotta have centralized governance, right? So, you know, Justin, you guys last, geez, I think it was about a year ago, had a session on, And you can think of them Justin, what do you say to a, to a customer or prospect that says, look, Justin, I'm gonna, you know, for many, many years to come. But I think the reality is, you know, the data mesh model basically says, I mean, you know, there Theresa you work with a lot of clients, they're not just gonna rip and replace their existing that the mesh actually allows you to use all of them. But it creates what I would argue are two, you know, Well, it absolutely depends on some of the tooling and processes that you put in place around those do an analytic queries and with data that's all dispersed all over the, how are you seeing your the best to, to create, you know, data as a product ultimately to be consumed. open platforms are the best path to the future of data But what if you could spend less you create a single point of access to your data, no matter where it's stored. give you the performance and control that you can get with a proprietary system. I remember in the very early days, people would say, you you'll never get performance because And I remember a, a quote from, you know, Kurt Monash many years ago where he said, you know, know it takes six or seven it is an evolving, you know, spectrum, but, but from your perspective, And what you don't want to end up So Jess, let me play devil's advocate here a little bit, and I've talked to Shaak about this and you know, And I think similarly, you know, being able to connect to an external table that lives in an open data format, Well, that's interesting reminded when I, you know, I see the, the gas price, And I think, you know, I loved what Richard said. not as many te data customers, but, but a lot of Oracle customers and they, you know, And so for those different teams, they can get to an ROI more quickly with different technologies that strike me, you know, the data brick snowflake, you know, thing is, oh, is a lot of fun for analysts So the advice that I saw years ago was if you have open source technologies, And in world of Oracle, you know, normally it's the staff, easy to discover and consume via, you know, the creation of data products as well. really modern, or is it the same wine new bottle? And with Starburst, you can perform analytics anywhere in light of your world. And that is the claim that today's So it's the same general stack, just, you know, a cloud version of it. So lemme come back to you just, but okay. So a lot of the same sort of structural constraints that exist with So Theresa, let me go to you cuz you have cloud first in your, in your, the data staff needs to be much more federated. you know, a microservices layer on top of leg legacy apps. So I think the stack needs to support a scalable So you think about the past, you know, five, seven years cloud obviously has given What it should be. And I think that's the paradigm shift that needs to occur. data that lives outside of the data warehouse, maybe living in open data formats in a data lake seen in data mesh, frankly really aren't, you know, adhering to So the mesh allows you to have the best of both worlds. So Richard, you know, talking about data as product, wonder if we could give us your perspectives is expecting means that you generate the wrong insight. But also, you know, around the data to say in a very clear business context, It's got the context. And ultimately with this concept of data products that we've now, you know, incorporated into our offering as well, This is Dave Valante for the cube, and we'll see you next time. You need a solution that easily fits with what you have today and can adapt
SENTIMENT ANALYSIS :
ENTITIES
Entity | Category | Confidence |
---|---|---|
Richard | PERSON | 0.99+ |
Dave Lanta | PERSON | 0.99+ |
Jess Borgman | PERSON | 0.99+ |
Justin | PERSON | 0.99+ |
Theresa | PERSON | 0.99+ |
Justin Borgman | PERSON | 0.99+ |
Teresa | PERSON | 0.99+ |
Jeff Ocker | PERSON | 0.99+ |
Richard Jarvis | PERSON | 0.99+ |
Dave Valante | PERSON | 0.99+ |
Justin Boardman | PERSON | 0.99+ |
six | QUANTITY | 0.99+ |
Dani | PERSON | 0.99+ |
Massachusetts | LOCATION | 0.99+ |
20 cents | QUANTITY | 0.99+ |
Teradata | ORGANIZATION | 0.99+ |
Oracle | ORGANIZATION | 0.99+ |
Jamma | PERSON | 0.99+ |
UK | LOCATION | 0.99+ |
FINRA | ORGANIZATION | 0.99+ |
40 years | QUANTITY | 0.99+ |
Kurt Monash | PERSON | 0.99+ |
20% | QUANTITY | 0.99+ |
two | QUANTITY | 0.99+ |
five | QUANTITY | 0.99+ |
Jess | PERSON | 0.99+ |
2011 | DATE | 0.99+ |
Starburst | ORGANIZATION | 0.99+ |
10 | QUANTITY | 0.99+ |
Accenture | ORGANIZATION | 0.99+ |
seven years | QUANTITY | 0.99+ |
thousands | QUANTITY | 0.99+ |
pythons | TITLE | 0.99+ |
Boston | LOCATION | 0.99+ |
GDPR | TITLE | 0.99+ |
Today | DATE | 0.99+ |
two models | QUANTITY | 0.99+ |
Zolando Comcast | ORGANIZATION | 0.99+ |
Gemma | PERSON | 0.99+ |
Starbust | ORGANIZATION | 0.99+ |
JPMC | ORGANIZATION | 0.99+ |
ORGANIZATION | 0.99+ | |
Javas | TITLE | 0.99+ |
today | DATE | 0.99+ |
AWS | ORGANIZATION | 0.99+ |
millions | QUANTITY | 0.99+ |
first lie | QUANTITY | 0.99+ |
10 | DATE | 0.99+ |
12 years | QUANTITY | 0.99+ |
one place | QUANTITY | 0.99+ |
Tomorrow | DATE | 0.99+ |
Starburst Panel Q2
>>We're back with Jess Borgman of Starburst and Richard Jarvis of emus health. Okay. We're gonna get into lie. Number two, and that is this an open source based platform cannot give you the performance and control that you can get with a proprietary system. Is that a lie? Justin, the enterprise data warehouse has been pretty dominant and has evolved and matured. Its stack has mature over the years. Why is it not the default platform for data? >>Yeah, well, I think that's become a lie over time. So I, I think, you know, if we go back 10 or 12 years ago with the advent of the first data lake really around Hudu, that probably was true that you couldn't get the performance that you needed to run fast, interactive, SQL queries in a data lake. Now a lot's changed in 10 or 12 years. I remember in the very early days, people would say, you'll, you'll never get performance because you need to be column. You need to store data in a column format. And then, you know, column formats were introduced to, to data lakes. You have Parque ORC file in aro that were created to ultimately deliver performance out of that. So, okay. We got, you know, largely over the performance hurdle, you know, more recently people will say, well, you don't have the ability to do updates and deletes like a traditional data warehouse. >>And now we've got the creation of new data formats, again like iceberg and Delta and DY that do allow for updates and delete. So I think the data lake has continued to mature. And I remember a, a quote from, you know, Kurt Monash many years ago where he said, you know, it takes six or seven years to build a functional database. I think that's that's right. And now we've had almost a decade go by. So, you know, these technologies have matured to really deliver very, very close to the same level performance and functionality of, of cloud data warehouses. So I think the, the reality is that's become a lie and now we have large giant hyperscale internet companies that, you know, don't have the traditional data warehouse at all. They do all of their analytics in a data lake. So I think we've, we've proven that it's very much possible today. >>Thank you for that. And so Richard, talk about your perspective as a practitioner in terms of what open brings you versus, I mean, the closed is it's open as a moving target. I remember Unix used to be open systems and so it's, it is an evolving, you know, spectrum, but, but from your perspective, what does open give you that you can't get from a proprietary system where you are fearful of in a proprietary system? >>I, I suppose for me open buys us the ability to be unsure about the future, because one thing that's always true about technology is it evolves in a, a direction, slightly different to what people expect. And what you don't want to end up is done is backed itself into a corner that then prevents it from innovating. So if you have chosen the technology and you've stored trillions of records in that technology and suddenly a new way of processing or machine learning comes out, you wanna be able to take advantage and your competitive edge might depend upon it. And so I suppose for us, we acknowledge that we don't have perfect vision of what the future might be. And so by backing open storage technologies, we can apply a number of different technologies to the processing of that data. And that gives us the ability to remain relevant, innovate on our data storage. And we have bought our way out of the, any performance concerns because we can use cloud scale infrastructure to scale up and scale down as we need. And so we don't have the concerns that we don't have enough hardware today to process what we want to do, but want to achieve. We can just scale up when we need it and scale back down. So open source has really allowed us to maintain the being at the cutting edge. >>So Justin, let me play devil's advocate here a little bit, and I've talked to JAK about this and you know, obviously her vision is there's an open source that, that data mesh is open source, an open source tooling, and it's not a proprietary, you know, you're not gonna buy a data mesh. You're gonna build it with, with open source toolings and, and vendors like you are gonna support it, but come back to sort of today, you can get to market with a proprietary solution faster. I'm gonna make that statement. You tell me if it's a lie and then you can say, okay, we support Apache iceberg. We're gonna support open source tooling, take a company like VMware, not really in the data business, but how, the way they embraced Kubernetes and, and you know, every new open source thing that comes along, they say, we do that too. Why can't proprietary systems do that and be as effective? >>Yeah, well, I think at least with the, within the data landscape saying that you can access open data formats like iceberg or, or others is, is a bit dis disingenuous because really what you're selling to your customer is a certain degree of performance, a certain SLA, and you know, those cloud data warehouses that can reach beyond their own proprietary storage drop all the performance that they were able to provide. So it is, it reminds me kind of, of, again, going back 10 or 12 years ago when everybody had a connector to hit and that they thought that was the solution, right? But the reality was, you know, a connector was not the same as running workloads in had back then. And I think, think similarly, you know, being able to connect to an external table that lives in an open data format, you know, you're, you're not going to give it the performance that your customers are accustomed to. And at the end of the day, they're always going to be predisposed. They're always going to be incentivized to get that data ingested into the data warehouse, cuz that's where they have control. And you know, the bottom line is the database industry has really been built around vendor lockin. I mean, from the start, how, how many people love Oracle today, but our customers, nonetheless, I think, you know, lockin is, is, is part of this industry. And I think that's really what we're trying to change with open data formats. >>Well, it's interesting reminded when I, you know, I see the, the gas price, the TSR gas price I, I drive up and then I say, oh, that's the cash price credit card. I gotta pay 20 cents more, but okay. But so the, the argument then, so let me, let me come back to you, Justin. So what's wrong with saying, Hey, we support open data formats, but yeah, you're gonna get better performance if you, if you, you keep it into our closed system, are you saying that long term that's gonna come back and bite you cuz you're gonna end up. You mentioned Oracle, you mentioned Teradata. Yeah. That's by, by implication, you're saying that's where snowflake customers are headed. >>Yeah, absolutely. I think this is a movie that, you know, we've all seen before. At least those of us who've been in the industry long enough to, to see this movie play over a couple times. So I do think that's the future. And I think, you know, I loved what Richard said. I actually wrote it down cause I thought it was amazing quote. He said, it buys us the ability to be unsure of the future. That that pretty much says it all the, the future is unknowable and the reality is using open data formats. You remain interoperable with any technology you want to utilize. If you want to use smart to train a machine learning model and you wanna use Starbust to query be a sequel, that's totally cool. They can both work off the same exact, you know, data, data sets by contrast, if you're, you know, focused on a proprietary model, then you're kind of locked in again to that model. I think the same applies to data, sharing to data products, to a wide variety of, of aspects of the data landscape that a proprietary approach kind of closes you and, and locks you in. >>So I would say this Richard, I'd love to get your thoughts on it. Cause I talked to a lot of Oracle customers, not as many te data customers, but, but a lot of Oracle customers and they, you know, they'll admit yeah, you know, they Jimin some price and the license cost they give, but we do get value out of it. And so my question to you, Richard, is, is do the, let's call it data warehouse systems or the proprietary systems. Are they gonna deliver a greater ROI sooner? And is that in allure of, of that customers, you know, are attracted to, or can open platforms deliver as fast an ROI? >>I think the answer to that is it can depend a bit. It depends on your business's skillset. So we are lucky that we have a number of proprietary teams that work in databases that provide our operational data capability. And we have teams of analytics and big data experts who can work with open data sets and open data formats. And so for those different teams, they can get to an ROI more quickly with different technologies for the business though, we can't do better for our operational data stores than proprietary databases. Today we can back off very tight SLAs to them. We can demonstrate reliability from millions of hours of those databases being run enterprise scale, but for an analytics workload where increasing our business is growing in that direction, we can't do better than open data formats with cloud based data mesh type technologies. And so it's not a simple answer. That one will always be the right answer for our business. We definitely have times when proprietary databases provide a capability that we couldn't easily represent or replicate with open technologies. >>Yeah. Richard, stay with you. You mentioned, you know, you know, some things before that, that strike me, you know, the data brick snowflake, you know, thing is a lot of fun for analysts like me. You've got data bricks coming at it. Richard, you mentioned you have a lot of rockstar, data engineers, data bricks coming at it from a data engineering heritage. You get snowflake coming at it from an analytics heritage. Those two worlds are, are colliding people like P Sanji Mohan said, you know what? I think it's actually harder to play in the data engineering. So I E it's easier to for data engineering world to go into the analytics world versus the reverse, but thinking about up and coming engineers and developers preparing for this future of data engineering and data analytics, how, how should they be thinking about the future? What, what's your advice to those young people? >>So I think I'd probably fall back on general programming skill sets. So the advice that I saw years ago was if you have open source technologies, the pythons and Javas on your CV, you command a 20% pay, hike over people who can only do proprietary programming languages. And I think that's true of data technologies as well. And from a business point of view, that makes sense. I'd rather spend the money that I save on proprietary licenses on better engineers, because they can provide more value to the business that can innovate us beyond our competitors. So I think I would my advice to people who are starting here or trying to build teams to capitalize on data assets is begin with open license, free capabilities, because they're very cheap to experiment with. And they generate a lot of interest from people who want to join you as a business. And you can make them very successful early, early doors with, with your analytics journey. >>It's interesting. Again, analysts like myself, we do a lot of TCO work and have over the last 20 plus years and in the world of Oracle, you know, normally it's the staff, that's the biggest nut in total cost of ownership, not an Oracle. It's the it's the license cost is by far the biggest component in the, in the blame pie. All right, Justin, help us close out this segment. We've been talking about this sort of data mesh open, closed snowflake data bricks. Where does Starburst sort of as this engine for the data lake data lake house, the data warehouse, it fit in this, in this world. >>Yeah. So our view on how the future ultimately unfolds is we think that data lakes will be a natural center of gravity for a lot of the reasons that we described open data formats, lowest total cost of ownership, because you get to choose the cheapest storage available to you. Maybe that's S3 or Azure data lake storage, or Google cloud storage, or maybe it's on-prem object storage that you bought at a, at a really good price. So ultimately storing a lot of data in a data lake makes a lot of sense, but I think what makes our perspective unique is we still don't think you're gonna get everything there either. We think that basically centralization of all your data assets is just an impossible endeavor. And so you wanna be able to access data that lives outside of the lake as well. So we kind of think of the lake as maybe the biggest place by volume in terms of how much data you have, but to, to have comprehensive analytics and to truly understand your business and understand it holistically, you need to be able to go access other data sources as well. And so that's the role that we wanna play is to be a single point of access for our customers, provide the right level of fine grained access control so that the right people have access to the right data and ultimately make it easy to discover and consume via, you know, the creation of data products as well. >>Great. Okay. Thanks guys. Right after this quick break, we're gonna be back to debate whether the cloud data model that we see emerging and the so-called modern data stack is really modern, or is it the same wine new bottle when it comes to data architectures, you're watching the cube, the leader in enterprise and emerging tech coverage.
SUMMARY :
cannot give you the performance and control that you can get with We got, you know, largely over the performance hurdle, you know, more recently people will say, And I remember a, a quote from, you know, Kurt Monash many years ago where he said, you know, open systems and so it's, it is an evolving, you know, spectrum, And what you don't want to end up So Justin, let me play devil's advocate here a little bit, and I've talked to JAK about this and you know, And I think, think similarly, you know, being able to connect to an external table that lives in an open data Well, it's interesting reminded when I, you know, I see the, the gas price, And I think, you know, I loved what Richard said. not as many te data customers, but, but a lot of Oracle customers and they, you know, I think the answer to that is it can depend a bit. that strike me, you know, the data brick snowflake, you know, thing is a lot of fun for analysts So the advice that I saw years ago was if you have open source technologies, years and in the world of Oracle, you know, normally it's the staff, it easy to discover and consume via, you know, the creation of data products as well. data model that we see emerging and the so-called modern data stack
SENTIMENT ANALYSIS :
ENTITIES
Entity | Category | Confidence |
---|---|---|
Richard | PERSON | 0.99+ |
Jess Borgman | PERSON | 0.99+ |
Justin | PERSON | 0.99+ |
six | QUANTITY | 0.99+ |
Oracle | ORGANIZATION | 0.99+ |
Richard Jarvis | PERSON | 0.99+ |
20 cents | QUANTITY | 0.99+ |
20% | QUANTITY | 0.99+ |
Kurt Monash | PERSON | 0.99+ |
P Sanji Mohan | PERSON | 0.99+ |
Today | DATE | 0.99+ |
seven years | QUANTITY | 0.99+ |
pythons | TITLE | 0.99+ |
Teradata | ORGANIZATION | 0.99+ |
JAK | PERSON | 0.99+ |
Javas | TITLE | 0.99+ |
10 | DATE | 0.99+ |
today | DATE | 0.98+ |
Starbust | TITLE | 0.98+ |
Starburst | ORGANIZATION | 0.97+ |
VMware | ORGANIZATION | 0.97+ |
both | QUANTITY | 0.97+ |
12 years ago | DATE | 0.96+ |
single point | QUANTITY | 0.96+ |
millions of hours | QUANTITY | 0.95+ |
10 | QUANTITY | 0.93+ |
Unix | TITLE | 0.92+ |
12 years | QUANTITY | 0.92+ |
ORGANIZATION | 0.9+ | |
two worlds | QUANTITY | 0.9+ |
DY | ORGANIZATION | 0.87+ |
first data lake | QUANTITY | 0.86+ |
Hudu | LOCATION | 0.85+ |
trillions | QUANTITY | 0.85+ |
one thing | QUANTITY | 0.83+ |
many years ago | DATE | 0.79+ |
Apache iceberg | ORGANIZATION | 0.79+ |
over a couple times | QUANTITY | 0.77+ |
emus health | ORGANIZATION | 0.75+ |
Jimin | PERSON | 0.73+ |
Starburst | TITLE | 0.73+ |
years ago | DATE | 0.72+ |
Azure | TITLE | 0.7+ |
Kubernetes | ORGANIZATION | 0.67+ |
TCO | ORGANIZATION | 0.64+ |
S3 | TITLE | 0.62+ |
Delta | ORGANIZATION | 0.6+ |
plus years | DATE | 0.59+ |
Number two | QUANTITY | 0.58+ |
a decade | QUANTITY | 0.56+ |
iceberg | TITLE | 0.47+ |
Parque | ORGANIZATION | 0.47+ |
last | DATE | 0.47+ |
20 | QUANTITY | 0.46+ |
Q2 | QUANTITY | 0.31+ |
ORC | ORGANIZATION | 0.27+ |
Manish Devgan, Hazelcast | Kubecon + Cloudnativecon Europe 2022
>>The cube presents, Coon and cloud native con Europe, 2022. Brought to you by red hat, the cloud native computing foundation and its ecosystem partners. >>Welcome to Licia Spain and cube con cloud native con 2022 Europe. I'm Keith Townsend, along with Paul Gillon senior editor, enterprise architecture for Silicon angle. We're gonna talk to some amazing folks. Day two coverage of Q con cloud native con Paul. We did the wrap up yesterday. Great. A great back and forth about what en Rico about yesterday's, uh, session. What are you looking for to today? >>I'm looking for, uh, to understand better, uh, how Kubernetes is being put into production, the types of applications that are being built on top of it. Yesterday, we talked a lot about infrastructure today. I think we're gonna talk a little bit more about applications, including with our first guest. >>Yeah, I was speaking our first guest. We have ish Degan CPO chief product officer at Hazelcast Hazelcast has been on the program before, but you, this is your first time in the queue, correct? >>It, it is Keith. Yeah. Well, >>Welcome to been Cuban. So we're talking data, which is always a fascinating topic. Containers are, have been known for not being supportive of stateful applications. At least you shouldn't hold the traditional thought. You shouldn't hold stateful data in containers. Tell me about the relationship between Hazel cast and containers we're at Cuan. >>Yeah, so a little bit about, uh, Hazelcast. We are a real time data platform and, uh, we are not a database, but a data platform because we basically allow, uh, data at rest as well as data in motion. So you can imagine that if you're writing an application, you can basically query and join a data coming in events, as well as data, which might have been persisted. So you can do both stream processing as well as, you know, low latency data access. And, and this platform of course, is supported on all the clouds. And we kind of delegate the orchestration of this kind of scale out system to Kubernetes. Um, and you know, that provides a resiliency and many things which go along with that. >>So you say you don't, you're not a database platform. What are you used for to manage the data? >>So we are, uh, we are memory first. So we are, you know, we started with low latency applications, but then we realized that real time has really become a business term. It's it's more of a business SLA mm-hmm, <affirmative>, it's really the, we see the opportunity, the punctuated change, which is happening in the market today is about real time data access to real time. I mean, there are real time applications. Our customers are building around real time offers, um, realtime thread detection. I mean, just imagine, you know, one of our customers like B and P par bars, they have, they basically originate a loan while the customer is banking. So you are in an ATM machine and you swipe your card and you are asking for, you know, taking 50 euros out. And at that point they can actually originate a custom loan offer based on your existing balance you're existing request and your credit score in that moment. So that's a value moment for them and they actually saw 400% loan origination go up because of that, because nobody's gonna be thinking about a credit, uh, line of credit after they're done banking. So it's in that value moment and we allow basically our data platform allows you to have fast access to data and also process incoming streams. So not before they get stored, but as they're coming in. >>So if I'm a developer and cuon is definitely a conference for developer and I, I come to the booth and I hear <inaudible>, that's the end value. I, I hear what I can do with my application. I guess the question is, how do I get there? I mean, uh, if it's not a database, how do I make a call from a container to, from my microservice to Hazel cath? Like, do I think of this as a, uh, a CNI or, or C CSI? How do I access >>PA care? Yeah. So, so we, uh, you know, we are, our server is actually built in Java. So a lot of the application which get written on top of the data platform are basically accessing through Java APIs. Or as you have a.net shop, you can actually use.net API. So we are basically an API first platform and SQL is basically the polyglot way of accessing data, both streaming data, as well as it store data. So most of the application developers, a lot of it is run done in microservices, and they're doing these fast get inputs for data. So they, they have a key, they want to get to a customer, they give a customer ID. And the beauty is that, um, while they're processing the events, they can actually enrich it because you need contextual information as well. So going back to the ATM example, you know, at that event happened, somebody swiped the card and ask for 50 euros, and now you want more information like credit score information, all that needs to be combined in that, in that value moment. >>So we allow you to do those joins and, you know, the contextual information is very important. So you see a lot of streaming platform out there, which just do streaming, but if you're an application developer, like you asked, you have to basically do call out to a streaming platform to get, um, to do streaming analytics and then do another call to get the context of that. You know, what is the credit score for this customer? But whereas in our case, because the data platform supports both streaming as well as data at rest, you can do that in one call and, you know, you don't want to have the operational complexity to stand out. Two different scale out servers is, is, is, is humongous, right? I mean, you want to build your business application. So, >>So you are querying data streaming data and data rest yes. In the same query >>Yes. In the same query. And we are memory first. So what happens is that we store a lot of the hot data in memory. So we have a scale out Ram based server. So that's where you get the low latency from. In fact, last year we did a benchmark. We were able to process a billion events a second, uh, with 99% of the latency under 30 milliseconds. So that kind of processing and that kind of power is, and, and the most important thing is determinism. I mean, you know, there's a lot of, um, if you look at real time, what real time is, is about this predictable latency at scale, because ultimately your, your adhering to a business SLA is not about milliseconds or microsecond. It's what your business needs. If your business needs that you need to deny or, uh, approve a credit credit card transaction in 50 milliseconds, that's your business SLA, and you need that predictability for every transaction. >>So talk to us about how how's this packaged in consumed. Cause I'm hearing a, a bunch of server Ram I'm hearing numbers that we're trying to adapt away from at this conference. We don't wanna see the onlay. We just want to use it. >>Yeah. So, so we kind of take a bit that, that complexity of managing this scale out, um, uh, uh, cluster, which actually utilizes Rams from each server. And then, you know, if you, you can configure it so that the hard set of data is in Ram, but the data, which is, you know, not so hard can actually go into a tiered storage model. So we are memory first. So, but what you are doing is you're doing simple, it's an API. So you do basically a crud, right? You create records, you read them through SQL. So for you, it's, it's, it's kind of like how you access that database. And we also provide you, you know, real time is also a journey. I mean, a lot of customers, you know, you don't want to rip their existing system and deploy another kind of scale out platform. Right? So we, we see a lot of these use cases where they have a database and we can sit in between the database, a system of record and the application. So we are kind of in between there. So that's, that's the journey you can take to real time. >>How does Kubernetes, uh, containers and Kubernetes change the game for real time analytics? >>Yeah. So, uh, Kubernetes does change it because what's hap first of all, we service most of the operational workloads. So it's, it's more on the, a lot of our customers. We have most, most of the big banks credit card companies in financial services and retail. Those are the two big sectors for us. And first of all, you know, a lot of these operational workloads are moving to the cloud and with move to the cloud, they're actually taking their existing applications and, and moving to, you know, one of the providers and to kind of orchestrate this scale out platform, which does auto scaling, that's where the benefit comes from mm-hmm <affirmative>. And it also gives them the freedom of choice. So, you know, the Kubernetes is, you know, a standard which goes across cloud providers. So that gives them the benefit that they can actually take their application. And if they want, they can actually move it to a different, a different cloud provider because we take away the orchestration complexity, you know, in that abstraction layer. >>So what happens when I need to go really fast? I mean, I, I, I need, uh, I'm looking at bare metal and I'm looking at really scaling a, a, a homogeneous application in a single data center set of data centers. Is there a bare metal play here? >>Yes. There, there, there are some very, very, uh, like if you want microsecond latency, mm-hmm, <affirmative>, um, you know, we have customers who actually store two to four terabytes in Ram and, and they can actually stand up. Um, you know, again, it depends on what kind of deployment you want. You can either scale up or scale out, scaling up is expensive, you know, because those boxes are not cheap, but if you have a requirement like that, where there is sub millisecond or microphone latency requirement, you could actually store the entire data set. I mean, a lot of the operational data sets are under four terabytes. So it's not uncommon that you could actually take the entire operational transactional data set, actually move, move that to a pure Ram. But, uh, I think now we, we also see that these operational workloads are also, there's a need for analytics to be done on top as well. >>I mean, we, going back to the example I gave you, so this, this, uh, customer is not only doing stream crossing, they're also influencing a machine learning algorithm in that same, in the same kind of cycle in the life cycle. So they might have trained a machine learning or algorithm on a data lake somewhere, but once they're ready, they're actually influencing the ML algorithm in our kind of life cycle right there. So, you know, that that really brings analytics and transactions kind of together because after all transactions are where the real, you know, insights are. >>Yeah. I'm, I'm struggling a little bit with this, with these two different use cases where I have transactional basically a transactional database or transactional data platform alongside a analytics platform. Those are two, like they're two different things. I have a, you know, I, I have spinning rust for one, and then I have memory and, and MBME for another. Uh, and that requires tuning requires DBAs. It requires a lot of overhead, there seems to be some type of secret sauce going on here. >>Yeah. Yeah. So, I mean, you know, we, we basically say that if you are, if you have a business case where you want to make a decision, you know, you, the only chance to succeed is where you are not making a decision tomorrow based on today's data. Right? I mean, the only way to act on that data is today. So the act is a keyword here. We actually let you generate a realtime offer. We, we let you do credit card fraud detection. In that moment, the analytics is about knowing less about acting on it. Right? Most of our applications are machine critical. They're acting on real time. I think when you talk about like the data lakes there, there's actually a real time there as well, but it's about knowing, and we believe that the operational side is where, you know, that value moment is there, you know, what good is, is to know about something tomorrow, you know, if something wrong happened, I mean, it, yeah, so there's a latency squeeze there as well, but we are on, on more on the kind of transaction and operational side. >>I gotcha. Yeah. So help me understand, like integrations. A lot of the, the, when I think of transactions, I'm thinking of SAP, Oracle, where the process is done, or some legacy banking or not legacy or new modern banking app, how does the data get from one platform to a, to Hazel cast so I can make those >>Decisions? Yeah. So we have, uh, this, the streaming engine, we have has a whole bunch of connectors to a lot of data sources. So in fact, most of our use cases already have data sources underneath there, their databases there's KA connectors, you know, joining us because if you look at it, events is, are comprised of transactions. So something, a customer did, uh, a credit card swipe, right. And also events events could be machine or IOT. So it's really unique connectivity and data ingestion before you can process that. So we have, uh, a whole suite of connectors to kind of bring data in, in our platform. >>We've been talking a lot, these last couple of days about, uh, about the edge and about moving processing capability closer to the edge. How do you enable that? >>Yeah. So edge is actually very, very relevant because of what's happening is that, um, you know, if you, if you look at like a edge deployment use case, um, you know, we have a use case where data is being pushed from these different edge devices to cloud data warehouse. Right. But just imagine that you want to be filtering data at the, at, at where it is being originated from, and you wanna push only relevant data to, to maybe a central data lake where you might want to do, you know, train your machine learning models. Mm-hmm <affirmative> so that at the edge, we are actually able to process that data. So Hazel cast will allow you to actually write a data pipeline and do stream processing so that you might want to just push, you know, a part or a subset of data, which applies by the rules. Uh, so there's, there's a big, um, uh, I think edge is, you know, there's a lot of data being generated and you don't want like garbage and garbage out there's there's, there is there's filtration done at the edge. So that only the relevant data lands in a data, data lake or something like that. >>Well, Monash, we really appreciate you stopping by realtime data is an exciting area of coverage for the queue overall from Valencia Spain, I'm Keith Townsend, along with Paul Gillon, and you're watching the queue, the leader in high tech coverage.
SUMMARY :
Brought to you by red hat, What are you looking for to today? the types of applications that are being built on top of it. product officer at Hazelcast Hazelcast has been on the program before, It, it is Keith. At least you shouldn't hold the traditional thought. So you can imagine that if you're writing an application, So you say you don't, you're not a database platform. So we are, you know, we started with low So if I'm a developer and cuon is definitely a conference for developer So a lot of the application which get written on top of the data platform are basically accessing through Java So we allow you to do those joins and, you know, the contextual information is very important. So you are querying data streaming data and data rest yes. I mean, you know, So talk to us about how how's this packaged in consumed. I mean, a lot of customers, you know, you don't want to rip their existing system and deploy another a different cloud provider because we take away the orchestration complexity, you know, So what happens when I need to go really fast? So it's not uncommon that you could after all transactions are where the real, you know, insights are. I have a, you know, I, I have spinning rust for one, you know, that value moment is there, you know, what good is, is to know about something tomorrow, not legacy or new modern banking app, how does the data get from one platform to a, you know, joining us because if you look at it, events is, are comprised of transactions. How do you enable that? um, you know, if you, if you look at like a edge deployment use Well, Monash, we really appreciate you stopping by realtime data is an
SENTIMENT ANALYSIS :
ENTITIES
Entity | Category | Confidence |
---|---|---|
Keith Townsend | PERSON | 0.99+ |
Paul Gillon | PERSON | 0.99+ |
99% | QUANTITY | 0.99+ |
400% | QUANTITY | 0.99+ |
two | QUANTITY | 0.99+ |
last year | DATE | 0.99+ |
Hazel cast | ORGANIZATION | 0.99+ |
Java | TITLE | 0.99+ |
Hazelcast | ORGANIZATION | 0.99+ |
50 milliseconds | QUANTITY | 0.99+ |
50 euros | QUANTITY | 0.99+ |
Keith | PERSON | 0.99+ |
Manish Devgan | PERSON | 0.99+ |
yesterday | DATE | 0.99+ |
today | DATE | 0.99+ |
Yesterday | DATE | 0.99+ |
Oracle | ORGANIZATION | 0.99+ |
tomorrow | DATE | 0.99+ |
first guest | QUANTITY | 0.99+ |
first time | QUANTITY | 0.99+ |
Valencia Spain | LOCATION | 0.99+ |
50 euros | QUANTITY | 0.99+ |
SQL | TITLE | 0.99+ |
one call | QUANTITY | 0.99+ |
four terabytes | QUANTITY | 0.98+ |
both | QUANTITY | 0.98+ |
one | QUANTITY | 0.98+ |
each server | QUANTITY | 0.98+ |
one platform | QUANTITY | 0.98+ |
SAP | ORGANIZATION | 0.98+ |
first | QUANTITY | 0.97+ |
under 30 milliseconds | QUANTITY | 0.97+ |
first platform | QUANTITY | 0.97+ |
a billion events | QUANTITY | 0.95+ |
Coon | ORGANIZATION | 0.94+ |
2022 | DATE | 0.94+ |
single | QUANTITY | 0.94+ |
two different things | QUANTITY | 0.94+ |
Kubecon | ORGANIZATION | 0.93+ |
Cloudnativecon | ORGANIZATION | 0.93+ |
two different use cases | QUANTITY | 0.92+ |
Day two | QUANTITY | 0.92+ |
two big sectors | QUANTITY | 0.91+ |
red hat | ORGANIZATION | 0.87+ |
Europe | LOCATION | 0.84+ |
use.net | OTHER | 0.83+ |
under four terabytes | QUANTITY | 0.82+ |
Two different scale | QUANTITY | 0.78+ |
Kubernetes | ORGANIZATION | 0.75+ |
a second | QUANTITY | 0.72+ |
Kubernetes | TITLE | 0.71+ |
cube con cloud native con | ORGANIZATION | 0.7+ |
cloud native con | ORGANIZATION | 0.67+ |
Degan | PERSON | 0.66+ |
Silicon | LOCATION | 0.63+ |
Licia Spain | ORGANIZATION | 0.62+ |
Hazel cath | ORGANIZATION | 0.61+ |
con cloud native con | ORGANIZATION | 0.58+ |
Rico | LOCATION | 0.57+ |
Cuban | OTHER | 0.56+ |
Monash | ORGANIZATION | 0.55+ |
Hazel | TITLE | 0.53+ |
Cuan | LOCATION | 0.53+ |
foundation | ORGANIZATION | 0.52+ |
Q | EVENT | 0.51+ |
last couple | DATE | 0.5+ |
CNI | TITLE | 0.46+ |
C | TITLE | 0.45+ |
Paul | PERSON | 0.44+ |
2022 | EVENT | 0.33+ |
Video Exclusive: Oracle Lures MongoDB Devs With New API for ADB
(upbeat music) >> Oracle continues to pursue a multi-mode converged database strategy. The premise of this all in one approach is to make life easier for practitioners and developers. And the most recent example is the Oracle database API for MongoDB, which was announced today. Now, Oracle, they're not the first to come out with a MongoDB compatible API, but Oracle hopes to use its autonomous database as a differentiator and further build a moat around OCI, Oracle Cloud Infrastructure. And with us to talk about Oracle's MongoDB compatible API is Gerald Venzl, who's a distinguished Product Manager at Oracle. Gerald was a guest along with Maria Colgan on the CUBE a while back, and we talked about Oracle's converge database and the kind of Swiss army knife strategy, I called it, of databases. This is dramatically different. It's an approach that we see at the opposite end of the the spectrum, for instance, from AWS, who, for example, goes after the world of developers with a different database for every use case. So, kind of picking up from there, Gerald, I wonder if you could talk about how this new MongoDB API adds to your converged model and the whole strategy there. Where does it fit? >> Yeah, thank you very much, Dave and, by the way, thanks for having me on the CUBE again. A pleasure to be here. So, essentially the MongoDB API to build the compatibility that we used with this API is a continuation of the converge database story, as you said before. Which is essentially bringing the many features of the many single purpose databases that people often like and use, together into one technology so that everybody can benefit from it. So as such, this is just a continuation that we have from so many other APIs or standards that we support. Since a long time, we already, of course to SQL because we are relational database from the get go. Also other standard like GraphQL, Sparkle, et cetera that we have. And the MongoDB API, is now essentially just the next step forward to give the developers this API that they've gotten to love and use. >> I wonder if you could talk about from the developer angle, what do they get out of it? Obviously you're appealing to the Mongo developers out there, but you've got this Mongo compatible API you're pouting the autonomous database on OCI. Why aren't they just going to use MongoDB Atlas on whatever cloud, Azure or AWS or Google Cloud platform? >> That's a very good question. We believe that the majority of developers want to just worry about their application, writing the application, and not so much about the database backend that they're using. And especially in cloud with cloud services, the reason why developers choose these services is so that they don't have to manage them. Now, autonomous database brings many topnotch advanced capabilities to database cloud services. We firmly believe that autonomous database is essentially the next generation of cloud services with all the self-driving features built in, and MongoDB developers writing applications against the MongoDB API, should not have to hold out on these capabilities either. It's like no developer likes to tune the database. No developer likes to take a downtime when they have to rescale their database to accommodate a bigger workload. And this is really where we see the benefit here, so for the developer, ideally nothing will change. You have MongoDB compatible API so they can keep on using their tools. They can build the applications the way that they do, but the benefit from the best cloud database service out there not having to worry about any of these package things anymore, that even MongoDB Atlas has a lot of shortcomings still today, as we find. >> Of cos, this is always a moving target The technology business, that's why we love it. So everybody's moving fast and investing and shaking and jiving. But, I want to ask you about, well, by the way, that's so you're hiding the underlying complexity, That's really the big takeaway there. So that's you huge for developers. But take, I was talking before about, the Amazon's approach, right tool for the right job. You got document DB, you got Microsoft with Cosmos, they compete with Mongo and they've been doing so for some time. How does Oracle's API for Mongo different from those offerings and how you going to attract their users to your JSON offering. >> So, you know, for first of all we have to kind of separate slightly document DB and AWS and Cosmos DB in Azure, they have slightly different approaches there. Document DB essentially is, a document store owned by and built by AWS, nothing different to Mongo DB, it's a head to head comparison. It's like use my document store versus the other document store. So you don't get any of the benefits of a converge database. If you ever want to do a different data model, run analytics over, etc. You still have to use the many other services that AWS provides you to. You cannot all do it into one database. Now Cosmos DB it's more in interesting because they claim to be a multi-model database. And I say claim because what we understand as multi-model database is different to what they understand as multimodel database. And also one of the reasons why we start differentiating with converge database. So what we mean is you should be able to regardless what data format you want to store in the database leverage all the functionality of the database over that data format, with no trade offs. Cosmos DB when you look at it, it essentially gives you mode of operation. When you connect as the application or the user, you have to decide at connection time, how you want, how this database should be treated. Should it be a document store? Should it be a graph store? Should it be a relational store? Once you make that choice, you are locked into that. As long as you establish that connection. So it's like, if you say, I want a document store, all you get is a document store. There's no way for you to crossly analyze with the relational data sitting in the same service. There's no for you to break these boundaries. If you ever want to add some graph data and graph analytics, you essentially have to disconnect and now treat it as a graph store. So you get multiple data models in it, but really you still get, one trick pony the moment you connect to it that you have to choose to. And that is where we see a huge differentiation again with our converge database, because we essentially say, look, one database cloud service on Oracle cloud, where it allows you to do anything, if you wish to do so. You can start as a document store if you wish to do so. If you want to write some SQL queries on top, you can do so. If you want to add some graph data, you can do so. But there's no way for you to have to rewrite your application, use different libraries and frameworks now to connect et cetera, et cetera. >> Got it. Thank you for that. Do you have any data when you talk to customers? Like I'm interested in the diversity of deployments, like for instance, how many customers are using more than one data model? Do for instance, do JSON users need support for other data types or are they happy to stay kind of in their own little sandbox? Do you have any data on that? >> So what we see from the majority of our customers, there is no such thing as one data model fits everything. So, and it's like, there again we have to differentiate the developer that builds a certain microservice, that makes happy to stay in the JSON world or relational world, or the company that's trying to derive value from the data. So it's like the relational model has not gone away since 40 years of it existence. It's still kicking strong. It's still really good at what it does. The JSON data model is really good in what it does. The graph model is really good at what it does. But all these models have been built for different purposes. Try to do graph analytics on relational or JSON data. It's like, it's really tricky, but that's why you use a graph model to begin with. Try to shield yourself from the organization of the data, how it's structured, that's really easy in the relational world, not so much when you get into a document store world. And so what we see about our customers is like as they accumulate more data, is they have many different applications to run their enterprises. The question always comes back, as we have predicted since about six, seven years now, where they say, hey, we have all this different data and different data formats. We want to bring it all together, analyze it together, get value out of the data together. We have seen a whole trend of big data emerge and disappear to answer the question and didn't quite do the trick. And we are basically now back to where we were in the early 2000's when XML databases have faded away, because everybody just allowed you to store XML in the database. >> Got it. So let's make this real for people. So maybe you could give us some examples. You got this new API from Mongo, you have your multi model database. How, take a, paint a picture of how customers are going to benefit in real world use cases. How does it kind of change the customer's world before and after if you will? >> Yeah, absolutely. So, you know the API essentially we are going to use it to accept before, you know, make the lives of the developers easier, but also of course to assist our customers with migrations from Mongo DB over to Oracle Autonomous Database. One customer that we have, for example, that would've benefited of the API several a couple of years ago, two, three years ago, it's one of the largest logistics company on the planet. They track every package that is being sent in JSON documents. So every track package is entries resembled in a JSON document, and they very early on came in with the next question of like, hey, we track all these packages and document in JSON documents. It will be really nice to know actually which packages are stuck, or anywhere where we have to intervene. It's like, can we do this? Can we analyze just how many packages get stuck, didn't get delivered on, the end of a day or whatever. And they found this struggle with this question a lot, they found this was really tricky to do back then, in that case in MongoDB. So they actually approached Oracle, they came over, they migrated over and they rewrote their applications to accommodate that. And there are happy JSON users in Oracle database, but if we were having this API already for them then they wouldn't have had to rewrite their applications or would we often see like worry about the rewriting the application later on. Usually migration use cases, we want to get kind of the migration done, get the data over be running, and then worry about everything else. So this would be one where they would've greatly benefited to shorten this migration time window. If we had already demo the Mongo API back then or this compatibility layer. >> That's a good use case. I mean, it's, one of the most prominent and painful, so anything you could do to help that is key. I remember like the early days of big data, NoSQL, of course was the big thing. There was a lot of confusion. No, people thought was none or not only SQL, which is kind of the more widely accepted interpretation today. But really, it's talking about data that's stored in a non-relational format. So, some people, again they thought that SQL was going to fade away, some people probably still believe that. And, we saw the rise of NoSQL and document databases, but if I understand it correctly, a premise for your Mongo DB API is you really see SQL as a main contributor over Mongo DB's document collections for analytics for example. Can you make, add some color here? Are you seeing, what are you seeing in terms of resurgence of SQL or the momentum in SQL? Has it ever really waned? What's your take? >> Yeah, no, it's a very good point. So I think there as well, we see to some extent history repeating itself from, this all has been tried beforehand with object databases, XML database, et cetera. But if we stay with the NoSQL databases, I think it speaks at length that every NoSQL database that as you write for the sensor you started with NoSQL, and then while actually we always meant, not only SQL, everybody has introduced a SQL like engine or interface. The last two actually join this family is MongoDB. Now they have just recently introduced a SQL compatibility for the aggregation pipelines, something where you can put in a SQL statement and that essentially will then work with aggregation pipeline. So they all acknowledge that SQL is powerful, for us this was always clear. SQL is a declarative language. Some argue it's the only true 4GL language out there. You don't have to code how to get the data, but you just ask the question and the rest is done for you. And, we think that as we, basically, has SQL ever diminished as you said before, if you look out there? SQL has always been a demand. Look at the various developer surveys, etc. The various top skills that are asked for SQL has never gone away. Everybody loves and likes and you wants to use SQL. And so, yeah, we don't think this has ever been, going away. It has maybe just been, put in the shadow by some hypes. But again, we had the same discussion in the 2000's with XML databases, with the same discussions in the 90's with object databases. And we have just frankly, all forgotten about it. >> I love when you guys come on and and let me do my thing and I can pretty much ask any question I want, because, I got to say, when Oracle starts talking about another company I know that company's doing well. So I like, I see Mongo in the marketplace and I love that you guys are calling it out and making some moves there. So here's the thing, you guys have a large install base and that can be an advantage, but it can also be a weight in your shoulder. These specialized cloud databases they don't have that legacy. So they can just kind of move freely about, less friction. Now, all the cloud database services they're going to have more and more automation. I mean, I think that's pretty clear and inevitable. And most if not all of the database vendors they're going to provide support for these kind of converged data models. However they choose to do that. They might do it through the ecosystem, like what Snowflake's trying to do, or bring it in the house themselves, like a watch maker that brings an in-house movement, if you will. But it's like death and taxes, you can't avoid it. It's got to happen. That's what customers want. So with all that being said, how do you see the capabilities that you have today with automation and converge capabilities, How do you see that, that playing out? What's, do you think it gives you enough of an advantage? And obviously it's an advantage, but is it enough of an advantage over the specialized cloud database vendors, where there's clearly a lot of momentum today? >> I mean, honestly yes, absolutely. I mean, we are with some of these databases 20 years ahead. And I give you concrete examples. It's like Oracle had transaction support asset transactions since forever. NoSQL players all said, oh, we don't need assets transactions, base transactions is fine. Yada, yada, yada. Mongo DB started introducing some transaction support. It comes with some limits, cannot be longer than 60 seconds, cannot touch more than a thousand documents as well, et cetera. They still will have to do some catching up there. I mean, it took us a while to get there, let's be honest. Glad We have been around for a long time. Same thing, now that happened with version five, is like we started some simple version of multi version concurrency control that comes along with asset transactions. The interesting part here is like, we've introduced this also an Oracle five, which was somewhere in the 80's before I even started using Oracle Database. So there's a lot of catching up to do. And then you look at the cloud services as well, there's actually certain, a lot of things that we kind of gotten take, we've kind of, we Oracle people have taken for granted and we kind of keep forgetting. For example, our elastic scale, you want to add one CPU, you add one CPU. Should you take downtime for that? Absolutely not. It's like, this is ridiculous. Why would you, you cannot take it downtime in a 24/7 backend system that runs the world. Take any of our customers. If you look at most of these cloud services or you want to reshape, you want to scale your cloud service, that's fine. It's just the VM under the covers, we just shut everything down, give you a VM with more CPU, and you boot it up again, downtown right there. So it's like, there's a lot of these things where we go like, well, we solved this frankly decades ago, that these cloud vendors will run into. And just to add one more point here, so it's like one thing that we see with all these migrations happening is exactly in that field. It's like people essentially started building on whether it's Mongo DB or other of these NoSQL databases or cloud databases. And eventually as these systems grow, as they ask more difficult questions, their use cases expand, they find shortcomings. Whether it's the scalability, whether it's the security aspects, the functionalities that we have, and this is essentially what drives them back to Oracle. And this is why we see essentially this popularity now of pendulum swimming towards our direction again, where people actually happily come over back and they come over to us, to get their workloads enterprise grade if you like. >> Well, It's true. I mean, I just reported on this recently, the momentum that you guys have in cloud because it is, 'cause you got the best mission critical database. You're all about maps. I got to tell you a quick story. I was at a vertical conference one time, I was on stage with Kurt Monash. I don't know if you know Kurt, but he knows this space really well. He's probably forgot and more about database than I'll ever know. But, and I was kind of busting his chops. He was talking about asset transactions. I'm like, well with NoSQL, who needs asset transactions, just to poke him. And he was like, "Are you out of your mind?" And, and he said, look it's everybody is going to head in this direction. It turned out, it's true. So I got to give him props for that. And so, my last question, if you had a message for, let's say there's a skeptical developer out there that's using Mongo DB and Atlas, what would you say to them? >> I would say go try it for yourself. If you don't believe us, we have an always free cloud tier out there. You just go to oracle.com/cloud/free. You sign up for an always free tier, spin up an autonomous database, go try it for yourself. See what's actually possible today. Don't just follow your trends on Hackernews and use a set study here or there. Go try it for yourself and see what's capable of >> All right, Gerald. Hey, thanks for coming into my firing line today. I really appreciate your time. >> Thank you for having me again. >> Good luck with the announcement. You're very welcome, and thank you for watching this CUBE conversation. This is Dave Vellante, We'll see you next time. (gentle music)
SUMMARY :
the first to come out the next step forward to I wonder if you could talk is so that they don't have to manage them. and how you going to attract their users the moment you connect to it you talk to customers? So it's like the relational So maybe you could give us some examples. to accept before, you know, make API is you really see SQL that as you write for the and I love that you And I give you concrete examples. the momentum that you guys have in cloud If you don't believe us, I really appreciate your time. and thank you for watching
SENTIMENT ANALYSIS :
ENTITIES
Entity | Category | Confidence |
---|---|---|
Dave Vellante | PERSON | 0.99+ |
Dave | PERSON | 0.99+ |
Maria Colgan | PERSON | 0.99+ |
Gerald Venzl | PERSON | 0.99+ |
Amazon | ORGANIZATION | 0.99+ |
Microsoft | ORGANIZATION | 0.99+ |
Oracle | ORGANIZATION | 0.99+ |
AWS | ORGANIZATION | 0.99+ |
Gerald | PERSON | 0.99+ |
Kurt | PERSON | 0.99+ |
NoSQL | TITLE | 0.99+ |
MongoDB | TITLE | 0.99+ |
JSON | TITLE | 0.99+ |
SQL | TITLE | 0.99+ |
MongoDB Atlas | TITLE | 0.99+ |
40 years | QUANTITY | 0.99+ |
Mongo | ORGANIZATION | 0.99+ |
one | QUANTITY | 0.99+ |
One customer | QUANTITY | 0.99+ |
oracle.com/cloud/free | OTHER | 0.98+ |
first | QUANTITY | 0.98+ |
Kurt Monash | PERSON | 0.98+ |
more than a thousand documents | QUANTITY | 0.98+ |
today | DATE | 0.98+ |
one time | QUANTITY | 0.97+ |
two | DATE | 0.97+ |
one database | QUANTITY | 0.97+ |
more than one data model | QUANTITY | 0.97+ |
one thing | QUANTITY | 0.97+ |
90's | DATE | 0.97+ |
one technology | QUANTITY | 0.96+ |
20 years | QUANTITY | 0.96+ |
80's | DATE | 0.96+ |
one more point | QUANTITY | 0.95+ |
decades ago | DATE | 0.95+ |
one data model | QUANTITY | 0.95+ |
Azure | TITLE | 0.94+ |
three years ago | DATE | 0.93+ |
seven years | QUANTITY | 0.93+ |
version five | OTHER | 0.92+ |
one approach | QUANTITY | 0.92+ |
Manish Gupta, Redis Labs | Spark Summit East 2017
>> Announcer: Live from Boston, Massachusetts, it's theCUBE, covering Spark Summit East 2017. Brought to you by Databricks. Now, here are your hosts Dave Vellante and George Gilbert. >> Welcome back to snowy Boston, everybody. This is theCUBE, the leader in live tech coverage. We're here at Spark Summit East, hashtag SparkSummit. Manish Gupta is here, he's the CMO at Redis Labs. Manish, welcome to theCUBE. >> Thank you, good to be here. >> So, you know, 10 years ago you say you're in the database business and everybody would yawn. Now you're the life of the party. >> Yeah, the world has changed. I think the party has lots and lots of players. We are happy to be on the top of that heap. >> It is a crowded space, so how does Redis Labs differentiate? >> Redis Labs is the company behind the massively popular open source Redis, and Redis became popular because of its performance primarily, and then simplicity. Developers could very easily run up an instance of Redis, solve some very hairy problems, and time to market was a big issue for them. Redis Enterprise took that forward and enabled it to be mission critical, ready for the largest workloads, ready for things that the enterprises need in a highly distributed clustered environment. So they have resilience and they benefit from the performance of Redis. >> And your claim to fame, as you say, is that top-gun performance, you guys will talk about some of the benchmarks later. We're talking about use cases like fraud detection, as example. Obviously ad serving would be another one. But add some color to that if you would. >> Redis is whatever you need to make real time real, Redis plays a very important role. It is able to deliver millions of operations per second with sub-millisecond latency, and that's the hallmark. With data structures that comprise Redis, you can solve the problems in a way, and the reason you can get that performance is because the data structures take some very complex issues and simplify the operation. Depending on the use case, you could use one of the data structures, you can mix and match the data structures, so that's the power of a Redis. We're used for ITO, for machine learning, for metering of billing and telecommunications environment, for personalization, for ad serving with companies like Groupon and others, and the list goes on and on. >> Yeah, you've got a big list on your website of all your customers, so you can check that out. Let's get the business model piece out of the way. Everybody's always fascinated. Okay, you got open source, how do you make money? How does Redis make money? >> Yeah, you know, we believe strategically fostering the growth of open source is foundational in our business model, and we invest heavily both R&D and marketing to do that. On top of that, to enable enterprise success and deployment of Redis, we have the mission critical, highly available Redis Enterprise offerings. Our monetization is entirely based on the Redis Enterprise platform, which takes advantage of the data structures and performance of core Redis, but layers on top management and the capabilities that make things like auto-recovery, auto-sorting, management much, much easier for the enterprise. We make that available in four deployment models. The enterprise can select us as Redis cloud, which runs on a public infrastructure on any of the four major platforms. We also allow for the enterprise to select a VPC environment in their own private clouds. They can also get software and self-manage that, or get our software and we can manage it for them. Four deployment options are the modalities in other ways where the enterprise customers help us monetize. >> When you said four major platforms, you meant cloud platforms? >> That's right. AWS, >> So, AWS, Azure >> Azure, Google, and IBM. >> Is IBM software, got there in the fourth, alright. >> That's right, all four. >> Go to the whip IBM. Go ahead, George. >> Along the lines of the business model, and we were sort of starting to talk about this earlier offline, you're just one component in building an application, and there's always this challenge of, well, I can manage my component better than anyone else, but it's got to fit with a bunch of other vendors' components. How do you make that seamless to the customer so that it's not defaulting over to a cloud vendor who has to build all the components themselves to make it work together? >> Certainly, you know, database is an integral part of your stack, of your application stack, but it is a stack, so there are other components. Redis and Redis Labs has a very, very large ecosystem within which we operate. We work closely with others for interfaces, for connectors, for interoperability, and that's a sustained environment that we invest in on a continuous basis. >> How do handle application consistency? A lot of in the no-SQL world, even in the AWS world, you hear about eventual consistency, but in the real-time world, there's a need for more rigorous, what's your philosophy there, how do you approach that? >> I think that's an issue that many no-SQL vendors have not been able to crack. Redis Labs has been at the forefront of that. We are taking an approach, and we are offering what we call tuneable consistency. Depending on the economics and the business model and the use case, the needs of consistency vary. In some cases, you do need immediate consistency. In other cases, you don't ever need consistency. And to give that flexibility to the customer is very important, so we've taken the approach where you can go from loose consistency to what we call strong eventual consistency. That approach is based on a fairly well trusted architecture and approach called CRDT, Conflict-free Replication Data Type. That approach allows us to, regardless of what the cluster magnitude or the distribution looks like geographically, we can deliver strong eventual consistency which meets the needs of majority of the customers. >> What are you seeing in terms of, you know, also in that a discussion about acid properties, and how many workloads really need acid properties. What are seeing now as you get more cloud native workloads and more no-SQL oriented workloads in terms of the requirement for those acid properties? >> First of all, we truly believe and agree that not all environments required acid support. Having said that, to be a truly credible database, you must support acid, and we do. Redis is acid-compli, supports acid, and Redis Labs certainly supports that. >> I remember on a stage once with Curt Monash, I'm sure you know Curt, right? Very famous database person. And he basically had a similar answer. But you would say that increasingly there are workloads that, the growth workloads don't necessarily require that, is that fair statement? >> That's a fair statement I would say. >> Dave: Great, good. >> There's a trade-off, though, when you talked about strong eventual consistency, potentially you have to wait for, presumably, a quorum of the partitions, I'm getting really technical here, but in other words, you've got a copy of the data here-- >> Dave: Good CMO question. (laughing) >> But your value proposition to the customers, we get this stuff done fast, but if you have to wait for a couple other servers to make sure that they've got the update, that can slow things way down. How does that trade-off work? >> I think that's part of the power of our architecture. We have a nothing shared, single proxy architecture where all of the replication, the disaster recovery, and the consistency management of the back end is handled by the proxy, and we ensure that the performance is not degraded when you are working through the consistency challenges, and that's where significant amount of IP is in the development of that proxy. >> I'll take that as a, let's go into it even more offline. >> Manish: Sounds good. >> And I have some other CMO questions, if I may. A lot of young companies like yours, especially in open source world, when they go to get the word out, they rely on their community, their open source community, and that's the core, and that makes a lot of sense, it's their peeps. As you become, grow more into enterprise grade apps and workloads, how do you extend beyond that? What is Redis Labs doing to sort of reach that C-Suite, are you even trying to reach that C-Suite up level to messaging? How do you as a CMO deal with those challenges? >> Maybe I'll begin by talking about our personas that matter to us in the ecosystem. The enterprise level, the architects, the developers, are the primary target, which we try to influence in early part of the decision cycle, it's at the architectural level. The ultimate teams that manage, run, and operate the infrastructure is certainly the DevOps, or the operations teams, and we spend time there. All along for some of the enterprise engagements, CIOs, chief data officers, and CTOs tend to play a very important role in the decisions and the selection process, and so, we do influence and interact with the C-Suite quite heavily. What the power of the open source gives us is that groundswell of love for Redis. Literally you can walk around a developer environment, such as the Spark Summit here, and you'll find people wearing Redis Geek shirts. And we get emails from Kazakhstan and strange, places from all over the world where we don't necessarily have salesforce, and requesting t-shirts, "send us stickers." Because people love Redis, and the word of mouth, that ground level love for the technology enables the decisions to be so much easier and smoother. We're not convincing, it's not a philosophical battle anymore. It's simply about the use case and the solution where Redis Enterprise fits or doesn't fit. >> Okay, so it really is that core developer community that are your advocates, and they're able to internally sell to the C-Suite. A lot of times the C-Suite, not the CTO so much, but certainly the CIO, CDO are like, "Yeah, yeah, they're geekin' out on some new hot thing. "What's the business impact?" Do you get that question a lot, and how do address it? >> I think then you get to some of the very basic tools, ROI calculators and the value proposition. For the C-level, the message is very simple. We are the least risky bet. We are the best long-term proposition, and we are the best cost answer for their implementation. Particularly as the needs are increasingly becoming more real-time in nature, they are not batch processed. Yes, there will always be some of that, but as the workloads are becoming, there is a need for faster processing, there is a need for quick insights, and real-time is not a moniker anymore, right. Real-time truly needs to be delivered today. And so, I think those three propositions for the C-Suite are resonating very well. >> Let's talk about ROI calculators for a second. I love talking about it because it underscores what a company feels as though its core value proposition is. I would think with Redis Labs part of the value proposition is you are enabling new types of workloads and new types of, whether it's sources of revenue or productivity. And these are generally telephone numbers as compared to some of the cost savings head to head to your competition, which of course you want to stress as well because the CFO cares about the cap-backs. What do you emphasize in that, and we don't have to get into the calculator itself, but in the conceptual model, what's the emphasis? Is it on those sort of business value attributes, is it on the sort of cost-savings? How do you translate performance into that business value? A lot of questions there, but if you could summarize, that'd be great. >> Well, I think you can think of it in three dimensions. The very first one is, does the performance support the use case or the solution that is required? That's the very first one. The second piece that fits in it, and that's in our books, that's operations per second and the latency. The second piece is the cost side, and that has two components to it. The first component is, what are the compute requirements? So, what is the infrastructure underneath that has to support it? And the efficiency that Redis and Redis Enterprise has is dramatically superior to the alternatives. And so, the economics show up. To run a million operations per second, we can do that on two nodes as opposed to alternative, which might need 50 nodes or 300 nodes. >> You can utilize your assets on the floor much better than maybe the competition can. >> This is where the data structures come into play quite a bit. That's one part of-- >> Dave: That's one part of the cost. >> Yeah. The other part of the cost is the human cost. >> Dave: People, yeah. >> And because, and this goes back to the open source, because the people available with the talent and the competency and appreciation for Redis, it's easy to procure those people, and your cost of acquisition and deploying goes down quite a bit. So, there's a human cost to it. The third dimension to this whole equation is time to market. And time to market is measured in many ways. Is it lost revenue if it takes you longer to get there? And Redis consistently from multiple analysts' reports gets top ranking for fastest way to get to market because of how simple it is. Beyond performance, simplicity is a second hallmark. >> That's a benefit acceleration, and you can quantify that. >> Absolutely, absolutely. And that's a revenue parameter, right. >> For years, people have been saying this Cambrian explosion of databases is unsustainable, and sort of in response we've gotten a squaring of the Cambrian explosion. The question is, with your sort of very flexible, I don't want to get too geeky, 'cause Dave'll cut me off, but the idea that you can accommodate time series and all these different ways of, all these different types of data, are we approaching a situation where customers can start consolidating their database choices and have fewer vendors, fewer products in their landscape? >> I think not only are we getting there, but we must get there. You've got over 300 databases in the marketplace, and imagine a CIO or an architect trying to have to sort through that to make a decision, it's difficult, and you certainly cannot support it from a trading standpoint or from an investment, cap-backs, and all that standpoint. What we have done with Redis is introduce something called Redis Modules. We released that at the last RedisConf in May in San Francisco. And the Redis Module is a very simple concept but a very powerful concept. It's an API which can be utilized to take an existing development effort, written as CC++, that can be ported onto the Redis data structures. This gives you the flexibility without having to reinvent the wheel every single time to take that investment, port it on top of Redis, and you get the performance, and you can make now Redis becomes a multi-model database. And I'm going to get to your answer of how do you address the multiple needs so you don't need multiple databases. To give you some examples, since the introduction of Redis Modules, we have now over 50 modules that have been published by a variety of places, not just Redis Labs. To indicate how simple and how powerful this model is. We took Lucene and developed the world's fastest full-text search engine as a module. We have very recently introduced Redis machine learning as a module that works with Spark ML and serves as a great serving layer in the machine learning domain. Just two very simple examples, but work that's being done ported over onto Redis data structures and now you have ability to do some very powerful things because of what Redis is. And this is the way future's going to be. I think every database is trying to offer multi-functionality to be multi-model in nature, but instead of doing it one step at a time, this approach gives us the ability to leverage the entire ecosystem. >> Your point being consolidation's inevitable in this business as well. >> Manish: Architectural consolidation. >> Yes, but also you would think, company consolidation, isn't that going to follow? What do you make of the market, and tell me, if you look back on the database market and what Oracle was able to achieve in the face of, maybe not as many players, but you had Sybase and Informix, and certainly DB2's still around, and SQL Server's still around, but Oracle won, and maybe it was SQL standards that. It's great to be lucky and good. Can we learn from that, or is this a whole different world? Are there similarities, and how do you, how do you see that consolidation potentially shaking out, if you agree that there will be consolidation? >> Yeah, there has to be, first and foremost, an architectural approach that solves the OPEX, CAPEX challenge for the enterprise. But beyond that, no industry can sustain the diversity and the fragmentation that exists in database world. I think there will always be new things coming out, of universities particularly. There's great innovation and research happening, and that is required to augment. But at the end of the day, the commercial enterprises cannot be of the fragmented volume that we have today in the database world, so there is going to be some consolidation, and it's not unnatural. I think it's natural, it's expected, time will tell what that looks like. We've seen some of our competitors acquire smaller companies to add graph functionality, to add search functionality. We just don't think that's the level of consolidation that really moves the needle for the industry. It's got to be at a higher level of consolidation. >> I don't want to, don't take this the wrong way, don't hate me for saying it, but is Oracle sort of the enemy, if I can say that. I mean, it's like, no, okay. >> Depends how you define enemy. >> I'm not going to go do many of the workloads that you're talking about on Oracle, despite what Larry tells me at Oracle OpenWorld. And I'm not going to make Oracle my choice for any of the workloads that you guys are working on. I guess in terms, I mean, everybody who's in the database business looks at that and say, "Hey, we can do it cheaper, better, "more productively," but, could you respond to that, and what do you make of Amazon's moves in the database world? Does that concern you? >> We think of Amazon and Oracle as two very different philosophies, if you can use that word. The approach we have taken is really a forward-looking approach and philosophy. We believe that the needs of the market need to be solved in new ways, and new ways should not be encumbered by old approaches. We're not trying to go and replicate what was done in the SQL world or in a relational database world. Our approach is how do you deliver a multi-model database that has the real-time attribute attached to it in a way that requires very limited computer force power and very few resources to manage? You take all of those things as kind of the core philosophy, which is a forward-looking philosophy. We are definitely not trying to replicate what an Oracle used to be. AWS I think is a very different animal. >> Dave: Interesting, though. >> They have defined the cloud, and I think play a very important role. We are a strong partner of theirs, much of our traffic runs on AWS infrastructure, certainly also on other clouds. I think AWS is one to watch in how they evolve. They have database offerings, including Redis offerings. However, we fully recognize, and the industry recognizes that that's not to the same capability as Redis Enterprise. It's open sourced Redis managed by AWS, and that's fine as a cache, but you cannot persist, and you really cannot have a multi-model capability that's a full database in that approach. >> And you're in the marketplace. >> Manish: We are in the marketplace. >> Obviously. >> And actually, we announced earlier, a few weeks ago, that you can buy and get Redis cloud access, which is Redis Enterprise cloud, on AWS through the integrated billing approach on their marketplace. You can have an AWS account and get our service, the true Redis Enterprise service. >> And as a software company, you'd figure, okay, the cloud infrastructures are service, we don't care what infrastructure it runs on. Whatever the customer wants, but you see AWS making these moves up-market, you got to obviously be paying attention to that. >> Manish: Certainly, certainly. >> Go ahead, last question. >> Interesting that you were saying that to solve this problem of proliferation of choice it has to be multi-model with speed and low resource requirement. If I were to interpret that from an old-style database perspective, it would be you're going to get, the multi-model is something you are addressing now, with the extensibility, but the speed means taking out that abstraction layer that was the query optimizer sort of and working almost at the storage layer, or having an option to do that. Would that be a fair way to say? >> No, I don't think that necessarily needs to be the case. For us, speed translates from the simplicity and the power of the data structures. Instead of having to serialize, deserialize before you process data in a Spark context, or instead of having to look for data that is perhaps not put in sorted sets for a use case that you might be doing, running a query on, if the data is already handled through one of the data structures, you now have a much faster query time, you now have the ability to reach the data in the right approach. And again, this is no-SQL, right, so it's a schema lesson write and it sets your scheme as you want it be on read. We marry that with the data structures, and that gives you the ultimate speed. >> We have to leave it there, but Manish, I'll give you the last word. Things we should be paying attention to for Redis Labs this year, events, announcements? >> I think the big thing I would leave the audience with is RedisConf 2017. It's May 31 to June 2 in San Francisco. We are expecting over 1,000 people. The brightest minds around Redis of the database world will be there, and anybody who is considering deploying the next generation database should attend. >> Dave: Where are you doing that? >> It's the Marriott Marquis in San Franciso. >> Great, is that on Howard Street, across from the--? >> It is right across from Moscone. >> Great, awesome location. People know it, easy to get to. Well, congratulations on the success. We'll be lookin' for outputs from that event, and hope to see you again on theCUBE. >> Thank you, enjoyed the conversation. >> Alright, good. Keep it right there, everybody, we'll be back with our next guest. This is theCUBE, we're live from Spark Summit East. Be right back. (upbeat electronic rock music)
SUMMARY :
Brought to you by Databricks. Manish Gupta is here, he's the CMO at Redis Labs. So, you know, 10 years ago you say We are happy to be on the top of that heap. Redis Labs is the company behind But add some color to that if you would. and the reason you can get that performance Let's get the business model piece out of the way. We also allow for the enterprise to select a VPC environment That's right. Google, and IBM. Go to the whip IBM. Along the lines of the business model, Certainly, you know, database is an integral part and the use case, the needs of consistency vary. in terms of the requirement for those acid properties? you must support acid, and we do. the growth workloads don't necessarily require that, Dave: Good CMO question. but if you have to wait for a couple other servers and the consistency management of the back end and that's the core, and that makes and the word of mouth, that ground level love but certainly the CIO, CDO are like, For the C-level, the message is very simple. part of the value proposition is you are enabling That's the very first one. much better than maybe the competition can. This is where the data structures of the cost. The other part of the cost is the human cost. and the competency and appreciation for Redis, And that's a revenue parameter, right. but the idea that you can accommodate time series We released that at the last RedisConf in this business as well. and tell me, if you look back on the database market that really moves the needle for the industry. but is Oracle sort of the enemy, if I can say that. for any of the workloads that you guys are working on. We believe that the needs of the market and that's fine as a cache, but you cannot persist, the true Redis Enterprise service. okay, the cloud infrastructures are service, the multi-model is something you are addressing now, and the power of the data structures. but Manish, I'll give you the last word. of the database world will be there, and hope to see you again on theCUBE. This is theCUBE, we're live from Spark Summit East.
SENTIMENT ANALYSIS :
ENTITIES
Entity | Category | Confidence |
---|---|---|
Amazon | ORGANIZATION | 0.99+ |
Dave Vellante | PERSON | 0.99+ |
George Gilbert | PERSON | 0.99+ |
Dave | PERSON | 0.99+ |
AWS | ORGANIZATION | 0.99+ |
George | PERSON | 0.99+ |
IBM | ORGANIZATION | 0.99+ |
Oracle | ORGANIZATION | 0.99+ |
Howard Street | LOCATION | 0.99+ |
Curt | PERSON | 0.99+ |
second piece | QUANTITY | 0.99+ |
San Francisco | LOCATION | 0.99+ |
Redis Labs | ORGANIZATION | 0.99+ |
Manish Gupta | PERSON | 0.99+ |
two nodes | QUANTITY | 0.99+ |
Redis | ORGANIZATION | 0.99+ |
two components | QUANTITY | 0.99+ |
two | QUANTITY | 0.99+ |
San Franciso | LOCATION | 0.99+ |
Larry | PERSON | 0.99+ |
Manish | PERSON | 0.99+ |
first component | QUANTITY | 0.99+ |
Boston, Massachusetts | LOCATION | 0.99+ |
over 50 modules | QUANTITY | 0.99+ |
June 2 | DATE | 0.99+ |
May 31 | DATE | 0.99+ |
ORGANIZATION | 0.99+ | |
Boston | LOCATION | 0.99+ |
Curt Monash | PERSON | 0.99+ |
May | DATE | 0.99+ |
millions | QUANTITY | 0.99+ |
third dimension | QUANTITY | 0.98+ |
50 nodes | QUANTITY | 0.98+ |
Moscone | LOCATION | 0.98+ |
fourth | QUANTITY | 0.98+ |
Redis Enterprise | TITLE | 0.98+ |
300 nodes | QUANTITY | 0.98+ |
Redis | TITLE | 0.98+ |
Kazakhstan | LOCATION | 0.98+ |
over 1,000 people | QUANTITY | 0.98+ |
one part | QUANTITY | 0.98+ |
both | QUANTITY | 0.98+ |
one step | QUANTITY | 0.97+ |
C-Suite | TITLE | 0.97+ |
Marriott Marquis | ORGANIZATION | 0.97+ |
second hallmark | QUANTITY | 0.97+ |
10 years ago | DATE | 0.97+ |
Spark Summit East 2017 | EVENT | 0.97+ |
Groupon | ORGANIZATION | 0.97+ |
first one | QUANTITY | 0.97+ |
CDO | TITLE | 0.97+ |
over 300 databases | QUANTITY | 0.96+ |
SQL Server | TITLE | 0.96+ |
Redis Enterprise cloud | TITLE | 0.96+ |