Breaking Analysis: Supercloud2 Explores Cloud Practitioner Realities & the Future of Data Apps
>> Narrator: From theCUBE Studios in Palo Alto and Boston bringing you data-driven insights from theCUBE and ETR. This is breaking analysis with Dave Vellante >> Enterprise tech practitioners, like most of us they want to make their lives easier so they can focus on delivering more value to their businesses. And to do so, they want to tap best of breed services in the public cloud, but at the same time connect their on-prem intellectual property to emerging applications which drive top line revenue and bottom line profits. But creating a consistent experience across clouds and on-prem estates has been an elusive capability for most organizations, forcing trade-offs and injecting friction into the system. The need to create seamless experiences is clear and the technology industry is starting to respond with platforms, architectures, and visions of what we've called the Supercloud. Hello and welcome to this week's Wikibon Cube Insights powered by ETR. In this breaking analysis we give you a preview of Supercloud 2, the second event of its kind that we've had on the topic. Yes, folks that's right Supercloud 2 is here. As of this recording, it's just about four days away 33 guests, 21 sessions, combining live discussions and fireside chats from theCUBE's Palo Alto Studio with prerecorded conversations on the future of cloud and data. You can register for free at supercloud.world. And we are super excited about the Supercloud 2 lineup of guests whereas Supercloud 22 in August, was all about refining the definition of Supercloud testing its technical feasibility and understanding various deployment models. Supercloud 2 features practitioners, technologists and analysts discussing what customers need with real-world examples of Supercloud and will expose thinking around a new breed of cross-cloud apps, data apps, if you will that change the way machines and humans interact with each other. 
Now the example we'd use: if you think about applications today, say a CRM system, sales reps, what are they doing? They're entering data into opportunities, they're choosing products, they're importing contacts, et cetera. And sure, the machine can then take all that data and spit out a forecast by rep, by region, by product, et cetera. But today's applications are largely about filling in forms and/or codifying processes. In the future, the Supercloud community sees a new breed of applications emerging where data resides on different clouds, in different data stores, databases, lakehouses, et cetera. And the machine uses AI to inspect the e-commerce system, the inventory data, supply chain information and other systems, and puts together a plan without any human intervention whatsoever. Think about a system that orchestrates people, places and things like an Uber for business. So at Supercloud 2, you'll hear about this vision along with some of today's challenges facing practitioners. Zhamak Dehghani, the founder of Data Mesh, is a headliner. Kit Colbert also is headlining. He laid out at the first Supercloud an initial architecture for what that's going to look like. That was last August. And he's going to present his most current thinking on the topic. Veronika Durgin of Saks will be featured and talk about data sharing across clouds and, you know, what she needs in the future. One of the main highlights of Supercloud 2 is a dive into Walmart's Supercloud. Other featured practitioners include Western Union, Ionis Pharmaceuticals, Warner Media. We've got deep, deep technology dives with folks like Bob Muglia, David Flynn, Tristan Handy of dbt Labs, Nir Zuk, the founder of Palo Alto Networks, focused on security. Thomas Hazel, who's going to talk about a new type of database for Supercloud. Plus several analysts including Keith Townsend, Maribel Lopez, George Gilbert, Sanjeev Mohan and so many more guests, we don't have time to list them all. 
They're all up on supercloud.world with a full agenda, so you can check that out. Now let's take a look at some of the things that we're exploring in more detail, starting with the Walmart Cloud Native Platform, they call it WCNP. We definitely see this as a Supercloud and we dig into it with Jack Greenfield. He's the head of architecture at Walmart. Here's a quote from Jack. "WCNP is an implementation of Kubernetes for the Walmart ecosystem. We've taken Kubernetes off the shelf as open source." By the way, they do the same thing with OpenStack. "And we have integrated it with a number of foundational services that provide other aspects of our computational environment. Kubernetes off the shelf doesn't do everything." And so what Walmart chose to do, they took a do-it-yourself approach to build a Supercloud for a variety of reasons that Jack will explain, along with Walmart's so-called triplet architecture connecting on-prem, Azure and GCP. No surprise, there's no Amazon at Walmart for obvious reasons. And what they do is they create a common experience for devs across clouds. Jack is going to talk about how Walmart is evolving its Supercloud in the future. You don't want to miss that. Now, next, let's take a look at how Veronika Durgin of Saks thinks about data sharing across clouds. Data sharing, we think, is a potential killer use case for Supercloud. In fact, let's hear it in Veronika's own words. Please play the clip. >> How do we talk to each other? And more importantly, how do we data share? You know, I work with data, you know, this is what I do. So if, you know, I want to get data from a company that's using, say, Google, how do we share it in a smooth way where it doesn't have to be this crazy, I don't know, SFTP file moving? So that's where I think Supercloud comes to me in my mind, is like practical applications. How do we create that mesh, that network that we can easily share data with each other? 
>> Now data mesh is a possible architectural approach that will enable more facile data sharing and the monetization of data products. You'll hear Zhamak Dehghani live in studio talking about what standards are missing to make this vision a reality across the Supercloud. Now one of the other things that we're really excited about is digging deeper into the right approach for Supercloud adoption. And we're going to share a preview of a debate that's going on right now in the community. Bob Muglia, former CEO of Snowflake and Microsoft exec, was kind enough to spend some time looking at the community's Supercloud definition and he felt that it needed to be simplified. So in near real time he came up with the following definition that we're showing here. I'll read it. "A Supercloud is a platform that provides programmatically consistent services hosted on heterogeneous cloud providers." So not only did Bob simplify the initial definition, he stressed that the Supercloud is a platform versus an architecture, implying that the platform provider, e.g., Snowflake, VMware, Databricks, Cohesity, et cetera, is responsible for determining the architecture. Now interestingly, in the shared Google doc that the working group uses to collaborate on the Supercloud definition, Dr. Nelu Mihai, who is actually building a Supercloud, responded as follows to Bob's assertion: "We need to avoid creating many Supercloud platforms with their own architectures. If we do that, then we create other proprietary clouds on top of existing ones. We need to define an architecture of how Supercloud interfaces with all other clouds. What is the information model? What is the execution model and how users will interact with Supercloud?" What does this seemingly nuanced point tell us and why does it matter? 
Well, history suggests that de facto standards will emerge more quickly to resolve real-world practitioner problems and catch on more quickly than consensus-based architectures and standards-based architectures. But in the long run, the latter may serve customers better. So we'll be exploring this topic in more detail in Supercloud 2, and of course we'd love to hear what you think: platform, architecture, both? Now one of the real technical gurus that we'll have in studio at Supercloud 2 is David Flynn. He's one of the people behind the movement that enabled enterprise flash adoption, that craze. And he did that with Fusion-io, and he is now working on a system to enable read-write data access to any user in any application in any data center or on any cloud anywhere. So think of this company as a Supercloud enabler. Allow me to share an excerpt from a conversation David Floyer and I had with David Flynn last year. He as well gave a lot of thought to the Supercloud definition and was really helpful with an opinionated point of view. He said something to us that was, we thought, relevant. "What is the operating system for a decentralized cloud? The main two functions of an operating system or an operating environment are one, the process scheduler and two, the file system. The strongest argument for supercloud is made when you go down to the platform layer and talk about it as an operating environment on which you can run all forms of applications." So a couple of implications here that we'll be exploring with David Flynn in studio. First, we're inferring from his comment that he's in the platform camp, where the platform owner is responsible for the architecture, and there are obviously trade-offs there and benefits, but we'll have to clarify that with him. And second, he's basically saying you kill the concept the further you move up the stack. So the further you move up the stack, the weaker the Supercloud argument becomes because it's just becoming SaaS. 
Now this is something we're going to explore to better understand his thinking on this, but also whether the existing notion of SaaS is changing and whether or not a new breed of Supercloud apps will emerge. Which brings us to this really interesting fellow that George Gilbert and I riffed with ahead of Supercloud 2. Tristan Handy, he's the founder and CEO of dbt Labs and he has a highly opinionated and technical mind. Here's what he said: "One of the things that we still don't know how to API-ify is concepts that live inside of your data warehouse, inside of your data lake. These are core concepts that the business should be able to create applications around very easily. In fact, that's not the case because it involves a lot of data engineering pipeline and other work to make these available. So if you really want to make it easy to create these data experiences for users you need to have an ability to describe these metrics and then to turn them into APIs to make them accessible to application developers who have literally no idea how they're calculated behind the scenes and they don't need to." A lot of implications to this statement that we'll explore at Supercloud 2. First, Zhamak Dehghani's data mesh comes into play here with her critique of hyper-specialized data pipeline experts with little or no domain knowledge. Also the need for simplified self-service infrastructure, which Kit Colbert is likely going to touch upon. Veronika Durgin of Saks and her ideal state for data sharing, along with Harveer Singh of Western Union. They got to deal with 200 locations around the world and data privacy issues, data sovereignty. How do you share data safely? Same with Nick Taylor of Ionis Pharmaceuticals. And not to blow your mind, but Thomas Hazel and Bob Muglia posit that to make data apps a reality across the Supercloud you have to rethink everything. You can't just let in-memory databases and caching architectures take care of everything in a brute force manner. 
Rather you have to get down to really detailed levels, even things like how data is laid out on disk, i.e., flash, and think about rewriting applications for the Supercloud and the ML/AI era. All of this and more at Supercloud 2, which wouldn't be complete without some data. So we pinged our friends from ETR, Eric Bradley and Daren Brabham, to see if they had any data on Supercloud that we could tap. And so we're going to be analyzing a number of the players as well at Supercloud 2. Now, many of you are familiar with this graphic. Here we show some of the players involved in delivering or enabling Supercloud-like capabilities. On the Y axis is spending momentum and on the horizontal axis is market presence or pervasiveness in the data. So Net Score versus what they call overlap, or N, in the data. And the table insert shows how the dots are plotted. Now, not to steal ETR's thunder, but the first point is you really can't have Supercloud without the hyperscale cloud platforms, which is shown on this graphic. But the exciting aspect of Supercloud is the opportunity to build value on top of that hyperscale infrastructure. Snowflake here continues to show strong spending velocity, as do Databricks, Hashi, Rubrik. VMware Tanzu, which we all put under the magnifying glass after the Broadcom announcements, is also showing momentum. Unfortunately, due to a scheduling conflict we weren't able to get Red Hat on the program, but they're clearly a player here. And we've put Cohesity and Veeam on the chart as well because backup is a likely use case across clouds and on-premises. And now one other callout that we drill down on at Supercloud 2 is Cloudflare, which actually uses the term Supercloud maybe in a different way. They look at Supercloud really as, you know, serverless on steroids. And so the data brains at ETR will have more to say on this topic at Supercloud 2 along with many others. Okay, so why should you attend Supercloud 2? What's in it for me kind of thing? 
So first of all, if you're a practitioner and you want to understand what the possibilities are for doing cross-cloud services, for monetizing data, how your peers are doing data sharing, how some of your peers are actually building out a Supercloud, you're going to get real-world input from practitioners. If you're a technologist, you're trying to figure out various ways to solve problems around data, data sharing, cross-cloud service deployment, there's going to be a number of deep technology experts that are going to share how they're doing it. We're also going to drill down with Walmart into a practical example of Supercloud with some other examples of how practitioners are dealing with cross-cloud complexity. Some of them, by the way, have kind of thrown up their hands and are saying, hey, we're going mono cloud. And we'll talk about the potential implications and dangers and risks of doing that. And also some of the benefits. You know, there's a question, right? Is Supercloud the same wine, new bottle, or is it truly something different that can drive substantive business value? So look, go to supercloud.world. It's January 17th at 9:00 AM Pacific. You can register for free and participate directly in the program. Okay, that's a wrap. I want to give a shout out to the Supercloud supporters. VMware has been a great partner as our anchor sponsor, ChaosSearch, Prosimo, and Alkira as well. For contributing to the effort I want to thank Alex Myerson, who's on production and manages the podcast. Ken Schiffman is his supporting cast as well. Kristen Martin and Cheryl Knight help get the word out on social media and in our newsletters. And Rob Hof is our editor-in-chief over at SiliconANGLE. Thank you all. Remember, these episodes are all available as podcasts wherever you listen. We really appreciate the support that you've given. We just saw some stats from Buzzsprout: we hit the top 25%, we're almost at 400,000 downloads last year. So really appreciate your participation. 
All you got to do is search Breaking Analysis podcast and you'll find those. I publish each week on wikibon.com and siliconangle.com. Or if you want to get ahold of me you can email me directly at David.Vellante@siliconangle.com or DM me @DVellante or comment on our LinkedIn post. I want you to check out etr.ai. They've got the best survey data in the enterprise tech business. This is Dave Vellante for theCUBE Insights, powered by ETR. Thanks for watching. We'll see you next week at Supercloud 2 or next time on Breaking Analysis. (light music)
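Tristan Handy's point in the transcript above, describing a metric once and then exposing it through an API so application developers never need to know how it is calculated, can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: `Metric`, `MetricRegistry`, `average_order_value` and the sample `orders` rows are hypothetical names invented for this example, not dbt Labs' API or anything presented at Supercloud 2.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Mapping, Sequence

# A "row" stands in for one record from a warehouse or lake table (assumption).
Row = Mapping[str, float]


@dataclass(frozen=True)
class Metric:
    """A business metric described once: a name, a description, a calculation."""
    name: str
    description: str
    compute: Callable[[Sequence[Row]], float]


class MetricRegistry:
    """Hypothetical registry that turns described metrics into a tiny API."""

    def __init__(self) -> None:
        self._metrics: Dict[str, Metric] = {}

    def register(self, metric: Metric) -> None:
        if metric.name in self._metrics:
            raise ValueError(f"metric already defined: {metric.name}")
        self._metrics[metric.name] = metric

    def names(self) -> List[str]:
        # What an application developer can discover without reading pipelines.
        return sorted(self._metrics)

    def query(self, name: str, rows: Sequence[Row]) -> float:
        # Callers ask for a metric by name; the calculation stays hidden.
        if name not in self._metrics:
            raise KeyError(f"unknown metric: {name}")
        return self._metrics[name].compute(rows)


def average_order_value(rows: Sequence[Row]) -> float:
    """Total revenue divided by number of orders; 0.0 when there are no orders."""
    if not rows:
        return 0.0
    return sum(row["revenue"] for row in rows) / len(rows)


registry = MetricRegistry()
registry.register(
    Metric(
        name="average_order_value",
        description="Total revenue divided by number of orders",
        compute=average_order_value,
    )
)

# Made-up sample data, purely for illustration.
orders = [{"revenue": 120.0}, {"revenue": 80.0}, {"revenue": 100.0}]
print(registry.query("average_order_value", orders))
```

The design choice worth noting is the split between whoever registers the metric (and owns the calculation) and whoever queries it by name; in a real cross-cloud setting the rows would presumably come from different warehouses or lakes rather than an in-memory list.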
ENTITIES
Entity | Category | Confidence |
---|---|---|
Bob Muglia | PERSON | 0.99+ |
Alex Myerson | PERSON | 0.99+ |
Cheryl Knight | PERSON | 0.99+ |
David Flynn | PERSON | 0.99+ |
Veronica | PERSON | 0.99+ |
Jack | PERSON | 0.99+ |
Nelu Mihai | PERSON | 0.99+ |
Zhamak Dehghani | PERSON | 0.99+ |
Thomas Hazel | PERSON | 0.99+ |
Nick Taylor | PERSON | 0.99+ |
Dave Vellante | PERSON | 0.99+ |
Jack Greenfield | PERSON | 0.99+ |
Kristen Martin | PERSON | 0.99+ |
Ken Schiffman | PERSON | 0.99+ |
Veronica Durgin | PERSON | 0.99+ |
Walmart | ORGANIZATION | 0.99+ |
Rob Ho | PERSON | 0.99+ |
Warner Media | ORGANIZATION | 0.99+ |
Tristan Handy | PERSON | 0.99+ |
Veronika Durgin | PERSON | 0.99+ |
George Gilbert | PERSON | 0.99+ |
Ionis Pharmaceutical | ORGANIZATION | 0.99+ |
David Flore | PERSON | 0.99+ |
DBT Labs | ORGANIZATION | 0.99+ |
ORGANIZATION | 0.99+ | |
Bob | PERSON | 0.99+ |
Palo Alto | LOCATION | 0.99+ |
21 sessions | QUANTITY | 0.99+ |
Darren Bramberm | PERSON | 0.99+ |
33 guests | QUANTITY | 0.99+ |
Nir Zuk | PERSON | 0.99+ |
Boston | LOCATION | 0.99+ |
Amazon | ORGANIZATION | 0.99+ |
Harveer Singh | PERSON | 0.99+ |
Kit Colbert | PERSON | 0.99+ |
Databricks | ORGANIZATION | 0.99+ |
Sanjeev Mohan | PERSON | 0.99+ |
Supercloud 2 | TITLE | 0.99+ |
Snowflake | ORGANIZATION | 0.99+ |
last year | DATE | 0.99+ |
Western Union | ORGANIZATION | 0.99+ |
Cohesity | ORGANIZATION | 0.99+ |
Supercloud | ORGANIZATION | 0.99+ |
200 locations | QUANTITY | 0.99+ |
August | DATE | 0.99+ |
Keith Townsend | PERSON | 0.99+ |
Data Mesh | ORGANIZATION | 0.99+ |
Palo Alto Networks | ORGANIZATION | 0.99+ |
David.Vellante@siliconangle.com | OTHER | 0.99+ |
next week | DATE | 0.99+ |
both | QUANTITY | 0.99+ |
one | QUANTITY | 0.99+ |
second | QUANTITY | 0.99+ |
first point | QUANTITY | 0.99+ |
One | QUANTITY | 0.99+ |
First | QUANTITY | 0.99+ |
VMware | ORGANIZATION | 0.98+ |
Silicon Angle | ORGANIZATION | 0.98+ |
ETR | ORGANIZATION | 0.98+ |
Eric Bradley | PERSON | 0.98+ |
two | QUANTITY | 0.98+ |
today | DATE | 0.98+ |
Sachs | ORGANIZATION | 0.98+ |
SAKS | ORGANIZATION | 0.98+ |
Supercloud | EVENT | 0.98+ |
last August | DATE | 0.98+ |
each week | QUANTITY | 0.98+ |
Breaking Analysis: Grading our 2022 Enterprise Technology Predictions
>>From theCUBE Studios in Palo Alto and Boston, bringing you data-driven insights from theCUBE and ETR. This is Breaking Analysis with Dave Vellante. >>Making technology predictions in 2022 was tricky business, especially if you were projecting the performance of markets or identifying IPO prospects and making binary forecasts on data, AI, the macro spending climate and other related topics in enterprise tech. 2022, of course, was characterized by a seesaw economy where central banks were restructuring their balance sheets. The war in Ukraine fueled inflation, supply chains were a mess, and the unintended consequences of the forced march to digital and the acceleration are still being sorted out. Hello and welcome to this week's Wikibon Cube Insights powered by ETR. In this Breaking Analysis, we continue our annual tradition of transparently grading last year's enterprise tech predictions. And you may or may not agree with our self-grading system, but look, we're gonna give you the data and you can draw your own conclusions and tell us what you think. >>All right, let's get right to it. So our first prediction was tech spending increases by 8% in 2022. And as we exited 2021, CIOs were optimistic about their digital transformation plans. You know, they rushed to make changes to their business and were eager to sharpen their focus and continue to iterate on their digital business models and plug the holes in the learnings that they had. And so we predicted that 8% rise in enterprise tech spending, which looked pretty good until Ukraine, and the Fed decided that, you know, it had to rush and make up for lost time. We kind of nailed the momentum in the energy sector, but we can't give ourselves too much credit for that layup. And as of October, Gartner had IT spending growing at just over 5%. I think it was 5.1%. So we're gonna take a C plus on this one and move on. >>Our next prediction was basically kind of a slow ground ball. 
To second base, if I have to be honest, but we felt it was important to highlight that security would remain front and center as the number one priority for organizations in 2022. As is our tradition, you know, we try to up the degree of difficulty by specifically identifying companies that are gonna benefit from these trends. So we highlighted some possible IPO candidates, which of course didn't pan out. Snyk was on our radar. The company had just had to do another raise and they recently took a valuation hit and it was a down round. They raised 196 million. So a good chunk of cash, but not the IPO that we had predicted. Aqua Security's focus on containers and cloud native, that was a trendy call, and we thought maybe an MSSP or multiple managed security service providers like Arctic Wolf would IPO, but no way that was happening in the crummy market. >>Nonetheless, we think these types of companies are still faring well as the talent shortage in security remains really acute, particularly in the sort of mid-size and small businesses that often don't have a SOC. Lacework laid off 20% of its workforce in 2022, and co-CEO Dave Hatfield left the company. So that IPO didn't happen. It was probably too early for Lacework anyway. Meanwhile you got Netskope, which we've cited as strong in the ETR data, particularly in the Emerging Technology Survey. And then, you know, Illumio holding its own. You know, we never liked that 7 billion price tag that Okta paid for Auth0, but we loved the TAM expansion strategy to target developers beyond sort of Okta's enterprise strength. But we gotta take some points off for the failure thus far of Okta to really nail the integration and the go-to-market model with Auth0 and, you know, bring that into the core Okta. 
>>So the focus on endpoint security, that was a winner in 2022, as CrowdStrike led that charge with others holding their own, not the least of which was Palo Alto Networks as it continued to expand beyond its core network security and firewall business, you know, through acquisition. So overall we're gonna give ourselves an A minus for this relatively easy call, but again, we had some specifics associated with it to make it a little tougher. And of course we're watching very closely this coming year in 2023 the vendor consolidation trend. You know, according to a recent Palo Alto Networks survey with 1,300 SecOps pros, on average organizations have more than 30 tools to manage security. So this is a logical way to optimize cost: consolidating vendors and consolidating redundant vendors. The ETR data shows that's clearly a trend that's on the upswing. >>Now moving on, a big theme of 2020 and 2021 of course was remote work and hybrid work and new ways to work and return to work. So we predicted in 2022 that hybrid work models would become the dominant protocol, which clearly is the case. We predicted that about 33% of the workforce would come back to the office in 2022. In September, the ETR data showed that figure was at 29%, but organizations expected that 32% would be in the office, you know, pretty much full-time by year end. That hasn't quite happened, but we were pretty close with the projection, so we're gonna take an A minus on this one. Now, supply chain disruption was another big theme that we felt would carry through 2022. And sure, that sounds like another easy one, but as is our tradition, again we try to put some binary metrics around our predictions to put some meat on the bone, so to speak, and allow us and you to say, okay, did it come true or not? 
We said at the time, you can see this on the left hand side of this chart, the PC laptop demand would remain above pre covid levels, which would reverse a decade of year on year declines, which I think started in around 2011, 2012. Now, while demand is down this year pretty substantially relative to 2021, I D C has worldwide unit shipments for PCs at just over 300 million for 22. If you go back to 2019 and you're looking at around let's say 260 million units shipped globally, you know, roughly, so, you know, pretty good call there. Definitely much higher than pre covid levels. But so what you might be asking why the B, well, we projected that 30% of customers would replace security appliances with cloud-based services and that more than a third would replace their internal data center server and storage hardware with cloud services like 30 and 40% respectively. >>And we don't have explicit survey data on exactly these metrics, but anecdotally we see this happening in earnest. And we do have some data that we're showing here on cloud adoption from ET R'S October survey where the midpoint of workloads running in the cloud is around 34% and forecast, as you can see, to grow steadily over the next three years. So this, well look, this is not, we understand it's not a one-to-one correlation with our prediction, but it's a pretty good bet that we were right, but we gotta take some points off, we think for the lack of unequivocal proof. Cause again, we always strive to make our predictions in ways that can be measured as accurate or not. Is it binary? Did it happen, did it not? Kind of like an O K R and you know, we strive to provide data as proof and in this case it's a bit fuzzy. >>We have to admit that although we're pretty comfortable that the prediction was accurate. And look, when you make an hard forecast, sometimes you gotta pay the price. 
All right, next, we said in 2022 that the big four cloud players would generate 167 billion in IS and PaaS revenue combining for 38% market growth. And our current forecasts are shown here with a comparison to our January, 2022 figures. So coming into this year now where we are today, so currently we expect 162 billion in total revenue and a 33% growth rate. Still very healthy, but not on our mark. So we think a w s is gonna miss our predictions by about a billion dollars, not, you know, not bad for an 80 billion company. So they're not gonna hit that expectation though of getting really close to a hundred billion run rate. We thought they'd exit the year, you know, closer to, you know, 25 billion a quarter and we don't think they're gonna get there. >>Look, we pretty much nailed Azure even though our prediction W was was correct about g Google Cloud platform surpassing Alibaba, Alibaba, we way overestimated the performance of both of those companies. So we're gonna give ourselves a C plus here and we think, yeah, you might think it's a little bit harsh, we could argue for a B minus to the professor, but the misses on GCP and Alibaba we think warrant a a self penalty on this one. All right, let's move on to our prediction about Supercloud. We said it becomes a thing in 2022 and we think by many accounts it has, despite the naysayers, we're seeing clear evidence that the concept of a layer of value add that sits above and across clouds is taking shape. And on this slide we showed just some of the pickup in the industry. I mean one of the most interesting is CloudFlare, the biggest supercloud antagonist. >>Charles Fitzgerald even predicted that no vendor would ever use the term in their marketing. And that would be proof if that happened that Supercloud was a thing and he said it would never happen. Well CloudFlare has, and they launched their version of Supercloud at their developer week. 
Chris Miller of the register put out a Supercloud block diagram, something else that Charles Fitzgerald was, it was was pushing us for, which is rightly so, it was a good call on his part. And Chris Miller actually came up with one that's pretty good at David Linthicum also has produced a a a A block diagram, kind of similar, David uses the term metacloud and he uses the term supercloud kind of interchangeably to describe that trend. And so we we're aligned on that front. Brian Gracely has covered the concept on the popular cloud podcast. Berkeley launched the Sky computing initiative. >>You read through that white paper and many of the concepts highlighted in the Supercloud 3.0 community developed definition align with that. Walmart launched a platform with many of the supercloud salient attributes. So did Goldman Sachs, so did Capital One, so did nasdaq. So you know, sorry you can hate the term, but very clearly the evidence is gathering for the super cloud storm. We're gonna take an a plus on this one. Sorry, haters. Alright, let's talk about data mesh in our 21 predictions posts. We said that in the 2020s, 75% of large organizations are gonna re-architect their big data platforms. So kind of a decade long prediction. We don't like to do that always, but sometimes it's warranted. And because it was a longer term prediction, we, at the time in, in coming into 22 when we were evaluating our 21 predictions, we took a grade of incomplete because the sort of decade long or majority of the decade better part of the decade prediction. >>So last year, earlier this year, we said our number seven prediction was data mesh gains momentum in 22. But it's largely confined and narrow data problems with limited scope as you can see here with some of the key bullets. 
So there's a lot of discussion in the data community about data mesh, and while there are an increasing number of examples, JPMorgan Chase, Intuit, HSBC, HelloFresh, and others, that are completely rearchitecting parts of their data platform, completely rearchitecting entire data platforms is non-trivial. There are organizational challenges, there are data ownership debates, technical considerations, and in particular two of the four fundamental data mesh principles, the need for a self-service infrastructure and federated computational governance, are challenging. Look, democratizing data and facilitating data sharing creates conflicts with regulatory requirements around data privacy. As such, many organizations are being really selective with their data mesh implementations, and hence our prediction of narrowing the scope of data mesh initiatives. >>I think that was right on. JPMC is a good example of this, where you got a single group within a division narrowly implementing the data mesh architecture. They're using AWS, they're using data lakes, they're using Amazon Glue, creating a catalog and a variety of other techniques to meet their objectives. They're kind of automating data quality, and it was a pretty well thought out and interesting approach, and I think it's gonna be made easier by some of the announcements that Amazon made at the recent, you know, re:Invent, particularly trying to eliminate ETL, better connections between Aurora and Redshift, and better data sharing, the data clean room. So a lot of that is gonna help. Of course, Snowflake has been on this for a while now. Many other companies are facing, you know, limitations, as we said here on this slide, with their Hadoop data platforms. They need to do some new thinking around that to scale. HelloFresh is a really good example of this. 
Look, the bottom line is that organizations want to get more value from data, and having centralized, highly specialized teams that own the data problem has been a barrier and a blocker to success. The data mesh starts with organizational considerations, as described in great detail by Ash Naseer of Warner Brothers. So take a listen to this clip. >>Yeah, so when people think of Warner Brothers, you always think of like the movie studio, but we're more than that, right? I mean, you think of HBO, you think of TNT, you think of CNN. We have 30-plus brands in our portfolio and each have their own needs. So the idea of a data mesh really helps us because what we can do is we can federate access across the company so that, you know, CNN can work at their own pace. You know, when there's election season, they can ingest their own data and they don't have to, you know, bump up against, as an example, HBO if Game of Thrones is going on. >>So it's often the case that data mesh is in the eyes of the implementer. And while a company's implementation may not strictly adhere to Zhamak Dehghani's vision of data mesh, that's okay, the goal is to use data more effectively. And despite Gartner's attempts to deposition data mesh in favor of the somewhat confusing, or frankly far more confusing, data fabric concept that they stole from NetApp, data mesh is taking hold in organizations globally today. So we're gonna take a B on this one. The prediction is shaping up the way we envisioned, but as we previously reported, it's gonna take some time. The better part of a decade in our view. New standards have to emerge to make this vision become reality, and they'll come in the form of both open and de facto approaches. Okay, our eighth prediction last year focused on the face-off between Snowflake and Databricks. 
>>And we realize this is a popular topic, and maybe one that's getting a little overplayed, but these are two companies that initially, you know, looked like they were shaping up as partners, and by the way, they are still partnering in the field. But you go back a couple of years ago, the idea of using AWS infrastructure and Databricks machine intelligence, and applying that on top of Snowflake as a facile data warehouse, that's still very viable. But both of these companies have much larger ambitions. They've got big total available markets to chase and large valuations that they have to justify. So what's happening is, as we've previously reported, each of these companies is moving toward the other firm's core domain, and they're building out an ecosystem that'll be critical for their future. So as part of that effort, we said each is gonna become an aggressive investor and maybe start doing some M&A, and they have, in various companies. >>And on this chart that we produced last year, we studied some of the companies that were targets, and we've added some recent investments of both Snowflake and Databricks. As you can see, they've both, for example, invested in Alation. Snowflake's put money into Lacework, the security firm; ThoughtSpot, which is trying to democratize data with AI; Collibra is a governance platform. And you can see Databricks' investments in data transformation with dbt Labs, Matillion, doing simplified business intelligence, Hunters, so that's, you know, their security investment, and so forth. So other than our thought that we'd see a Databricks IPO last year, this prediction has been pretty spot on. So we'll give ourselves an A on that one. Now, observability has been a hot topic, and we've been covering it for a while with our friends at ETR, particularly Eric Bradley. Our number nine prediction last year was basically that if you're not cloud native in observability, you are gonna be in big trouble. >>So everything, guys, has gotta go cloud native.
And that's clearly been the case. Splunk, the big player in the space, has been transitioning to the cloud; it hasn't always been pretty, as we reported. Datadog has real momentum. The ELK Stack, that's the open source model. You've got new entrants that we've cited before, like Observe, Honeycomb, ChaosSearch, and others that we've reported on; they're all born in the cloud. So we're gonna take another A on this one. Admittedly, yeah, it's a reasonably easy call, but you gotta have a few of those in the mix. Okay, our last prediction, our number 10, was around events, something theCUBE knows a little bit about. We said that a new category of events would emerge as hybrid, and that for the most part has happened. So that's gonna be the mainstay, is what we said: that pure play virtual events are gonna give way to hybrid. >>And the narrative is that virtual-only events are, you know, good for quick hits, but lousy replacements for in-person events. And, you know, that said, organizations of all shapes and sizes learned how to create better virtual content and support remote audiences during the pandemic. So when we said that pure play is gonna give way to hybrid, we implied, or specified, that the physical event, that VIP experience, is going to define that overall experience, and those VIP events would create a little FOMO, fear of missing out, and a virtual component would overlay that and serve an audience 10x the size of the physical. We saw that in two really good examples. Red Hat Summit in Boston, a small event, a couple thousand people, served tens of thousands, you know, online. Second was Google Cloud Next, a VIP event in New York City. >>Everything else was virtual. You know, even examples of our prediction of metaverse-like immersion have popped up, and, you know, other companies are doing roadshows, as we predicted; a lot of companies are doing it.
You're seeing that as a major trend, where organizations are going with their sales teams out into the regions and doing a little belly-to-belly action as opposed to the big giant event. That's definitely a trend that we're seeing. So in reviewing this prediction, the grade we gave ourselves is, you know, maybe a bit unfair; you could argue for a higher grade, but organizations still haven't figured it out. They have hybrid experiences, but they generally do a really poor job of leveraging the afterglow of an event. It still tends to be one and done, let's move on to the next event or the next city. >>Let the sales team pick up the pieces, if they were paying attention. So because of that, we're only taking a B plus on this one. Okay, so that's the review of last year's predictions. You know, overall, if you average out our grade on the 10 predictions, that comes out to a B plus. I dunno why we can't seem to get that elusive A, but we're gonna keep trying. Our friends at ETR and we are starting to look at the data for 2023 from the surveys and all the work that we've done on theCUBE and our analysis, and we're gonna put together our predictions. We've had literally hundreds of inbounds from PR pros pitching us. We've got this huge thick folder that we've started to review with our yellow highlighter. And our plan is to review it this month, take a look at all the data, get some ideas from the inbounds, and then the ETR January survey is in the field. >>It's probably got a little over a thousand responses right now. You know, they'll get up to, you know, 1400 or so. And once we've digested all that, we're gonna go back and publish our predictions for 2023 sometime in January. So stay tuned for that. All right, we're gonna leave it there for today. I wanna thank Alex Myerson, who's on production and manages the podcast, and Ken Schiffman as well, out of our Boston studio.
I've got a really heartfelt thank you to Kristen Martin and Cheryl Knight and their team. They help get the word out on social and in our newsletters. Rob Hof is our editor in chief over at SiliconANGLE, who does some great editing for us. Thank you all. Remember, all these episodes are available as podcasts wherever you listen; all you do is search Breaking Analysis podcast. We're really getting some great traction there. Appreciate you guys subscribing. I publish each week on wikibon.com and siliconangle.com, or you can email me directly at david.vellante@siliconangle.com, or DM me @dvellante, or you can comment on my LinkedIn posts. And please check out etr.ai for the very best survey data in the enterprise tech business. Some awesome stuff in there. This is Dave Vellante for theCUBE Insights powered by ETR. Thanks for watching and we'll see you next time on Breaking Analysis.
ENTITIES
Entity | Category | Confidence |
---|---|---|
Alex Myerson | PERSON | 0.99+ |
Cheryl Knight | PERSON | 0.99+ |
Ken Schiffman | PERSON | 0.99+ |
Chris Miller | PERSON | 0.99+ |
CNN | ORGANIZATION | 0.99+ |
Rob Hof | PERSON | 0.99+ |
Alibaba | ORGANIZATION | 0.99+ |
Dave Vellante | PERSON | 0.99+ |
Amazon | ORGANIZATION | 0.99+ |
5.1% | QUANTITY | 0.99+ |
2022 | DATE | 0.99+ |
Charles Fitzgerald | PERSON | 0.99+ |
Dave Hatfield | PERSON | 0.99+ |
Brian Gracely | PERSON | 0.99+ |
2019 | DATE | 0.99+ |
Lacework | ORGANIZATION | 0.99+ |
two | QUANTITY | 0.99+ |
GCP | ORGANIZATION | 0.99+ |
33% | QUANTITY | 0.99+ |
Walmart | ORGANIZATION | 0.99+ |
David | PERSON | 0.99+ |
2021 | DATE | 0.99+ |
20% | QUANTITY | 0.99+ |
Kristen Martin | PERSON | 0.99+ |
Palo Alto | LOCATION | 0.99+ |
2020 | DATE | 0.99+ |
Ash Nair | PERSON | 0.99+ |
Goldman Sachs | ORGANIZATION | 0.99+ |
162 billion | QUANTITY | 0.99+ |
New York City | LOCATION | 0.99+ |
Databricks | ORGANIZATION | 0.99+ |
October | DATE | 0.99+ |
last year | DATE | 0.99+ |
Arctic Wolf | ORGANIZATION | 0.99+ |
two companies | QUANTITY | 0.99+ |
38% | QUANTITY | 0.99+ |
September | DATE | 0.99+ |
Fed | ORGANIZATION | 0.99+ |
JP Morgan Chase | ORGANIZATION | 0.99+ |
80 billion | QUANTITY | 0.99+ |
29% | QUANTITY | 0.99+ |
32% | QUANTITY | 0.99+ |
21 predictions | QUANTITY | 0.99+ |
30% | QUANTITY | 0.99+ |
HBO | ORGANIZATION | 0.99+ |
75% | QUANTITY | 0.99+ |
Game of Thrones | TITLE | 0.99+ |
January | DATE | 0.99+ |
2023 | DATE | 0.99+ |
10 predictions | QUANTITY | 0.99+ |
both | QUANTITY | 0.99+ |
22 | QUANTITY | 0.99+ |
ThoughtSpot | ORGANIZATION | 0.99+ |
196 million | QUANTITY | 0.99+ |
30 | QUANTITY | 0.99+ |
each | QUANTITY | 0.99+ |
Palo Alto Networks | ORGANIZATION | 0.99+ |
2020s | DATE | 0.99+ |
167 billion | QUANTITY | 0.99+ |
Okta | ORGANIZATION | 0.99+ |
Second | QUANTITY | 0.99+ |
Gartner | ORGANIZATION | 0.99+ |
Eric Bradley | PERSON | 0.99+ |
Aqua Securities | ORGANIZATION | 0.99+ |
Dante | PERSON | 0.99+ |
8% | QUANTITY | 0.99+ |
Warner Brothers | ORGANIZATION | 0.99+ |
Intuit | ORGANIZATION | 0.99+ |
Cube Studios | ORGANIZATION | 0.99+ |
each week | QUANTITY | 0.99+ |
7 billion | QUANTITY | 0.99+ |
40% | QUANTITY | 0.99+ |
Snowflake | ORGANIZATION | 0.99+ |
Starburst The Data Lies FULL V2b
>>In 2011, early Facebook employee and Cloudera co-founder Jeff Hammerbacher famously said, the best minds of my generation are thinking about how to get people to click on ads. And that sucks. Let's face it, more than a decade later, organizations continue to be frustrated with how difficult it is to get value from data and build a truly agile, data-driven enterprise. What does that even mean, you ask? Well, it means that everyone in the organization has the data they need when they need it, in a context that's relevant to advance the mission of an organization. Now, that could mean cutting costs, could mean increasing profits, driving productivity, saving lives, accelerating drug discovery, making better diagnoses, solving supply chain problems, predicting weather disasters, simplifying processes, and thousands of other examples where data can completely transform people's lives beyond manipulating internet users to behave a certain way. We've heard the prognostications about the possibilities of data before, and in fairness we've made progress, but the hard truth is the original promises of master data management, enterprise data warehouses, data marts, data hubs, and yes, even data lakes were broken and left us wanting for more. Welcome to The Data Doesn't Lie, or Does It?, a series of conversations produced by theCUBE and made possible by Starburst Data. >>I'm your host, Dave Vellante, and joining me today are three industry experts. Justin Borgman is the co-founder and CEO of Starburst. Richard Jarvis is the CTO at EMIS Health, and Teresa Tung is cloud first technologist at Accenture. Today we're gonna have a candid discussion that will expose the unfulfilled and, yes, broken promises of a data past. We'll expose data lies, big lies, little lies, white lies, and hidden truths. And we'll challenge age-old data conventions and bust some data myths. We're debating questions like: is the demise of a single source of truth
Inevitable? Will the data warehouse ever have feature parity with the data lake, or vice versa? Is the so-called modern data stack simply centralization in the cloud, a.k.a. the old guard's model in new cloud clothes? How can organizations rethink their data architectures and regimes to realize the true promises of data? Can and will an open ecosystem deliver on these promises in our lifetimes? We're spanning much of the Western world today. Richard is in the UK, Teresa is on the West Coast, and Justin is in Massachusetts with me. I'm in theCUBE studios about 30 miles outside of Boston. Folks, welcome to the program. Thanks for coming on. Thanks for having us. Let's get right into it. You're very welcome. Now here's the first lie: the most effective data architecture is one that is centralized, with a team of data specialists serving various lines of business. What do you think, Justin? >>Yeah, definitely a lie. My first startup was a company called Hadapt, which was an early SQL engine for Hadoop that was acquired by Teradata. And when I got to Teradata, of course, Teradata is the pioneer of that central enterprise data warehouse model. One of the things that I found fascinating was that not one of their customers had actually lived up to that vision of centralizing all of their data into one place. They all had data silos. They all had data in different systems. They had data on prem, data in the cloud. You know, those companies were acquiring other companies and inheriting their data architecture. So, you know, despite being the industry leader for 40 years, not one of their customers truly had everything in one place. So I think definitely history has proven that to be a lie. >>So Richard, from a practitioner's point of view, you know, what are your thoughts? I mean, there's a lot of pressure to cut cost, keep things centralized, you know, serve the business as best as possible from that standpoint. What does your experience show?
>>Yeah, I mean, I think I would echo Justin's experience, really, that we as a business have grown up through acquisition, through storing data in different places, sometimes to do information governance in different ways, to store data in a platform that's close to data experts, people who really understand healthcare data from pharmacies or from doctors. And so, although if you were starting from a greenfield site and you were building something brand new, you might be able to centralize all the data and all of the tooling and teams in one place, the reality is that businesses just don't grow up like that. And it's just really impossible to get that academic perfection of storing everything in one place. >>You know, Teresa, I feel like Sarbanes-Oxley kinda saved the data warehouse, you know, right? You actually did have to have a single version of the truth for certain financial data, but really for some of those other use cases I mentioned, I do feel like the industry has kinda let us down. What's your take on this? Where does it make sense to have that sort of centralized approach versus where does it make sense to maybe decentralize? >>I think you've gotta have centralized governance, right? So from the central team, for things like Sarbanes-Oxley, for things like security, for certainly very core data sets, having a centralized set of roles and responsibilities to really QA, right, to serve as a design authority for your entire data estate, just like you might with security. But how it's implemented has to be distributed. Otherwise you're not gonna be able to scale, right? So being able to have different parts of the business really make the right data investments for their needs. And then ultimately you're gonna collaborate with your partners. So partners that are not within the company, right, external partners; we're gonna see a lot more data sharing and model creation. And so you're definitely going to be decentralized.
>>So, you know, Justin, you guys last, geez, I think it was about a year ago, had a session on data mesh. It was a great program. You invited Zhamak Dehghani; of course, she's the creator of the data mesh. And one of her fundamental premises is that you've got this hyper-specialized team that you've gotta go through if you want anything. But at the same time, these individuals actually become a bottleneck, even though they're some of the most talented people in the organization. So I guess a question for you, Richard: how do you deal with that? Do you organize so that there are a few sort of rock stars that, you know, build cubes and the like, or have you had any success in sort of decentralizing that data model with, you know, your constituencies? >>Yeah. So we absolutely have got rockstar data scientists and data guardians, if you like, people who understand what it means to use this data, particularly as the data that we use at EMIS is very private; it's healthcare information. And some of the rules and regulations around using the data are very complex and strict. So we have to have people who understand the usage of the data, then people who understand how to build models, how to process the data effectively. And you can think of them like consultants to the wider business, because a pharmacist might not understand how to structure a SQL query, but they do understand how they want to process medication information to improve patient lives. And so that becomes a consulting-type experience from a set of rock stars to help a more decentralized business who needs to understand the data and to generate some valuable output. >>Justin, what do you say to a customer or prospect that says, look, Justin, I've got a centralized team and that's the most cost-effective way to serve the business. Otherwise I've got duplication. What do you say to that?
>>Well, I would argue it's probably not the most cost effective, and the reason being really twofold. I think, first of all, when you are deploying an enterprise data warehouse model, the data warehouse itself is very expensive, generally speaking. And so you're putting all of your most valuable data in the hands of one vendor who now has tremendous leverage over you, you know, for many, many years to come. I think that's the story at Oracle or Teradata or other proprietary database systems. But the other aspect, I think, is that the reality is those central data warehouse teams, as much as they are experts in the technology, don't necessarily understand the data itself. And this is one of the core tenets of data mesh that Zhamak writes about, this idea that the domain owners actually know the data the best. >>And so, you know, not only acknowledging that data is generally decentralized, and to your earlier point about Sarbanes-Oxley maybe saving the data warehouse, I would argue maybe GDPR and data sovereignty will destroy it, because data has to be decentralized to be compliant with those laws. But I think the reality is, you know, the data mesh model basically says data's decentralized, and we're gonna turn that into an asset rather than a liability. And we're gonna turn that into an asset by empowering the people that know the data the best to participate in the process of, you know, curating and creating data products for consumption. So I think when you think about it that way, you're going to get higher quality data and faster time to insight, which is ultimately going to drive more revenue for your business and reduce costs. So I think that's the way I see the two models comparing and contrasting. >>So do you think the demise of the data warehouse is inevitable? I mean, you know, Teresa, you work with a lot of clients; they're not just gonna rip and replace their existing infrastructure.
Maybe they're gonna build on top of it, but what does that mean? Does that mean the EDW just becomes, you know, less and less valuable over time, or is it maybe just isolated to specific use cases? What's your take on that? >>Listen, I still would love all my data within a data warehouse. Would love it mastered, would love it owned by a central team, right? I think that's still what I would love to have. That's just not the reality, right? The investment to actually migrate and keep that up to date, I would say it's a losing battle. Like, we've been trying to do it for a long time. Nobody has the budgets, and then data changes, right? There's gonna be a new technology that's gonna emerge that we're gonna wanna tap into. There's going to be not enough investment to bring all the legacy, but still very useful, systems into that centralized view. So you keep the data warehouse. I think it's a very, very valuable, very high performance tool for what it's there for, but you could have this, you know, new mesh layer that still takes advantage of the things I mentioned: the data products in the systems that are meaningful today, and the data products that actually might span a number of systems, maybe either the source systems for the domains that know it best, or the consumer-based systems and products that need to be packaged in a way that's really meaningful for that end user, right? Each of those is useful for a different part of the business, and you're making sure that the mesh actually allows you to use all of them. >>So, Richard, let me ask you. Take Zhamak's principles, back to those. You've got, you know, domain ownership and data as product. Okay, great. Sounds good. But it creates what I would argue are two, you know, challenges. Self-serve infrastructure, let's park that for a second.
And then in your industry, one of the most highly regulated, most sensitive: computational governance. How do you automate and ensure federated governance in that mesh model that Teresa was just talking about? >>Well, it absolutely depends on some of the tooling and processes that you put in place around those tools to centralize the security and the governance of the data. And I think, although a data warehouse makes that very simple, because it's a single tool, it's not impossible with some of the data mesh technologies that are available. And so what we've done at EMIS is we have a single security layer that sits on top of our data mesh, which means that no matter which user is accessing which data source, we go through a well-audited, well-understood security layer. That means that we know exactly who's got access to which data field, which data tables. And then everything that they do is audited in a very kind of standard way, regardless of the underlying data storage technology. So for me, although storing the data in one place might not be possible, understanding where your source of truth is and securing that in a common way is still a valuable approach, and you can do it without having to bring all that data into a single bucket so that it's all in one place. And so having done that, and investing quite heavily in making that possible, has paid dividends in terms of giving wider access to the platform and ensuring that only data that's available under GDPR and other regulations is being used by the data users. >>Yeah. So Justin, I mean, we always talk about data democratization, and, you know, up until recently there really hasn't been line of sight as to how to get there. But do you have anything to add to this, because you're essentially doing analytic queries with data that's all dispersed all over. How are you seeing your customers handle this challenge? >>Yeah.
I mean, I think data products is a really interesting aspect of the answer to that. It allows you to, again, leverage the data domain owners, the people who know the data the best, to create, you know, data as a product ultimately to be consumed. And we try to represent that in our product as effectively an almost e-commerce-like experience, where you go and discover and look for the data products that have been created in your organization. And then you can start to consume them as you'd like. And so really trying to build on that notion of, you know, data democratization and self-service, and making it very easy to discover and start to use with whatever BI tool you may like, or even just running, you know, SQL queries yourself. >>Okay, guys, grab a sip of water. After this short break, we'll be back to debate whether proprietary or open platforms are the best path to the future of data excellence. Keep it right there. >>Your company has more data than ever, and more people trying to understand it, but there's a problem. Your data is stored across multiple systems. It's hard to access, and that delays analytics and ultimately decisions. The old method of moving all of your data into a single source of truth is slow and definitely not built for the volume of data we have today or where we are headed. While your data engineers spend over half their time moving data, your analysts and data scientists are left waiting, feeling frustrated, unproductive, and unable to move the needle for your business. But what if you could spend less time moving or copying data? What if your data consumers could analyze all your data quickly? >>Starburst helps your teams run fast queries on any data source. We help you create a single point of access to your data, no matter where it's stored.
And we support high concurrency. We solve for speed and scale. Whether it's fast SQL queries on your data lake or faster queries across multiple data sets, Starburst helps your teams run analytics anywhere. You can't afford to wait for data to be available. Your team has questions that need answers now. With Starburst, the wait is over. You'll have faster access to data with enterprise-level security, easy connectivity, and 24/7 support from experts. Organizations like Zalando, Comcast, and FINRA rely on Starburst to move their businesses forward. Contact our Trino experts to get started. >>We're back with Justin Borgman of Starburst and Richard Jarvis of EMIS Health. Okay, we're gonna get to lie number two, and that is this: an open source based platform cannot give you the performance and control that you can get with a proprietary system. Is that a lie? Justin, the enterprise data warehouse has been pretty dominant and has evolved and matured. Its stack has matured over the years. Why is it not the default platform for data? >>Yeah, well, I think that's become a lie over time. So I think, you know, if we go back 10 or 12 years ago, with the advent of the first data lake, really around Hadoop, that probably was true, that you couldn't get the performance that you needed to run fast, interactive SQL queries in a data lake. Now, a lot's changed in 10 or 12 years. I remember in the very early days, people would say, you'll never get performance because you need to be columnar. You need to store data in a columnar format. And then, you know, columnar formats were introduced to data lakes; you have Parquet, ORC file, and Avro that were created to ultimately deliver performance out of that. So, okay, we got, you know, largely over the performance hurdle. You know, more recently people will say, well, you don't have the ability to do updates and deletes like a traditional data warehouse.
>>And now we've got the creation of new data formats, again, like Iceberg and Delta and Hudi, that do allow for updates and deletes. So I think the data lake has continued to mature. And I remember a quote from, you know, Curt Monash many years ago where he said, you know, it takes six or seven years to build a functional database. I think that's right. And now we've had almost a decade go by. So, you know, these technologies have matured to really deliver very, very close to the same level of performance and functionality as cloud data warehouses. So I think the reality is that's become a lie, and now we have large, giant hyperscale internet companies that, you know, don't have the traditional data warehouse at all. They do all of their analytics in a data lake. So I think we've proven that it's very much possible today. >>Thank you for that. And so Richard, talk about your perspective as a practitioner in terms of what open brings you versus closed. I mean, look, open is a moving target. I remember Unix used to be open systems, and so it is an evolving, you know, spectrum. But from your perspective, what does open give you that you can't get from a proprietary system, or what are you fearful of in a proprietary system? >>I suppose for me, open buys us the ability to be unsure about the future, because one thing that's always true about technology is it evolves in a direction slightly different to what people expect. And what you don't want to end up having done is backed yourself into a corner that then prevents you from innovating. So if you have chosen a technology and you've stored trillions of records in that technology, and suddenly a new way of processing or machine learning comes out, you wanna be able to take advantage, and your competitive edge might depend upon it. And so I suppose for us, we acknowledge that we don't have perfect vision of what the future might be.
And so by backing open storage technologies, we can apply a number of different technologies to the processing of that data. And that gives us the ability to remain relevant and innovate on our data storage. And we have bought our way out of any performance concerns because we can use cloud-scale infrastructure to scale up and scale down as we need. And so we don't have the concern that we don't have enough hardware today to process what we want to achieve. We can just scale up when we need it and scale back down. So open source has really allowed us to stay at the cutting edge. >>So Justin, let me play devil's advocate here a little bit. I've talked to Zhamak about this and, you know, obviously her vision is that the data mesh is open source, with open source tooling, and it's not proprietary; you know, you're not gonna buy a data mesh. You're gonna build it with open source tooling, and vendors like you are gonna support it. But to come back to sort of today, you can get to market with a proprietary solution faster. I'm gonna make that statement; you tell me if it's a lie. And then you can say, okay, we support Apache Iceberg, we're gonna support open source tooling. Take a company like VMware, not really in the data business, but look at the way they embraced Kubernetes and, you know, every new open source thing that comes along; they say, we do that too. Why can't proprietary systems do that and be as effective? >>Yeah, well, I think, at least within the data landscape, saying that you can access open data formats like Iceberg or others is a bit disingenuous, because really what you're selling to your customer is a certain degree of performance, a certain SLA, and, you know, those cloud data warehouses that can reach beyond their own proprietary storage drop all the performance that they were able to provide.
So it reminds me of, again, going back 10 or 12 years ago, when everybody had a connector to Hadoop and they thought that was the solution, right? But the reality was a connector was not the same as running workloads in Hadoop back then. And I think similarly, being able to connect to an external table that lives in an open data format, you're not going to give it the performance that your customers are accustomed to. And at the end of the day, they're always going to be predisposed, always going to be incentivized, to get that data ingested into the data warehouse, because that's where they have control. And the bottom line is the database industry has really been built around vendor lock-in, from the start. How many people love Oracle today but are customers nonetheless? I think lock-in is part of this industry. And I think that's really what we're trying to change with open data formats.
>>Well, that's interesting. It reminds me of when I see the gas price, the tease of a gas price, and I drive up and then I say, oh, that's the cash price. With a credit card I've gotta pay 20 cents more. But okay. So let me come back to you, Justin. What's wrong with saying, hey, we support open data formats, but you're gonna get better performance if you keep it in our closed system? Are you saying that long term that's gonna come back and bite you, because you're gonna end up... you mentioned Oracle, you mentioned Teradata. By implication, you're saying that's where Snowflake customers are headed.
>>Yeah, absolutely. I think this is a movie that we've all seen before, at least those of us who've been in the industry long enough to see this movie play a couple of times. So I do think that's the future. And I loved what Richard said. I actually wrote it down.
Because I thought it was an amazing quote. He said it buys us the ability to be unsure of the future. That pretty much says it all. The future is unknowable, and the reality is, using open data formats, you remain interoperable with any technology you want to utilize. If you want to use Spark to train a machine learning model and you want to use Starburst to query via SQL, that's totally cool. They can both work off the same exact data sets. By contrast, if you're focused on a proprietary model, then you're kind of locked in again to that model. I think the same applies to data sharing, to data products, to a wide variety of aspects of the data landscape, where a proprietary approach kind of closes you in and locks you in.
>>So I would say this, and Richard, I'd love to get your thoughts on it, because I talk to a lot of Oracle customers, not as many Teradata customers, but a lot of Oracle customers, and they'll admit, yeah, they're jamming us on price and the license cost, but we do get value out of it. And so my question to you, Richard, is: are the, let's call them data warehouse systems, or the proprietary systems, gonna deliver a greater ROI sooner? And is that the allure that customers are attracted to, or can open platforms deliver ROI as fast?
>>I think the answer to that is it can depend a bit. It depends on your business's skill set. So we are lucky that we have a number of proprietary teams that work in databases that provide our operational data capability. And we have teams of analytics and big data experts who can work with open data sets and open data formats. And so those different teams can get to an ROI more quickly with different technologies. For the business, though, we can't do better for our operational data stores than proprietary databases. Today we can back off very tight SLAs to them.
We can demonstrate reliability from millions of hours of those databases being run at enterprise scale. But for an analytics workload, where increasingly our business is growing in that direction, we can't do better than open data formats with cloud-based, data mesh type technologies. And so there's no simple answer that will always be the right one for our business. We definitely have times when proprietary databases provide a capability that we couldn't easily replicate with open technologies.
>>Yeah. Richard, staying with you. You mentioned some things before that strike me. The Databricks-Snowflake thing is a lot of fun for analysts like me. Richard, you mentioned you have a lot of rockstar data engineers. You've got Databricks coming at it from a data engineering heritage. You've got Snowflake coming at it from an analytics heritage. Those two worlds are colliding. People like Sanjeev Mohan have said, you know what, I think it's actually harder to play in data engineering, i.e., it's easier for the data engineering world to go into the analytics world than the reverse. But thinking about up-and-coming engineers and developers preparing for this future of data engineering and data analytics, how should they be thinking about the future? What's your advice to those young people?
>>So I think I'd probably fall back on general programming skill sets. So the advice that I saw years ago was if you have open source technologies, the Pythons and Javas, on your CV, you command a 20% pay hike over people who can only do proprietary programming languages. And I think that's true of data technologies as well. And from a business point of view, that makes sense. I'd rather spend the money that I save on proprietary licenses on better engineers, because they can provide more value to the business and can innovate us beyond our competitors.
So I think my advice to people who are starting here, or trying to build teams to capitalize on data assets, is to begin with open-license, free capabilities, because they're very cheap to experiment with. And they generate a lot of interest from people who want to join you as a business. And you can make them very successful early doors with your analytics journey.
>>It's interesting. Again, analysts like myself, we do a lot of TCO work and have over the last 20-plus years. Normally it's the staff that's the biggest nut in total cost of ownership, but not in the world of Oracle. There, the license cost is by far the biggest component in the blame pie. All right, Justin, help us close out this segment. We've been talking about this sort of data mesh, open, closed, Snowflake, Databricks. Where does Starburst, as this engine for the data lake, the data lakehouse, the data warehouse, fit in this world?
>>Yeah. So our view on how the future ultimately unfolds is we think that data lakes will be a natural center of gravity, for a lot of the reasons that we described: open data formats and the lowest total cost of ownership, because you get to choose the cheapest storage available to you. Maybe that's S3 or Azure Data Lake Storage or Google Cloud Storage, or maybe it's on-prem object storage that you bought at a really good price. So ultimately storing a lot of data in a data lake makes a lot of sense. But I think what makes our perspective unique is we still don't think you're gonna get everything there either. We think that centralization of all your data assets is just an impossible endeavor. And so you wanna be able to access data that lives outside of the lake as well.
So we think of the lake as maybe the biggest place by volume in terms of how much data you have, but to have comprehensive analytics and to truly understand your business holistically, you need to be able to access other data sources as well. And so that's the role that we wanna play: to be a single point of access for our customers, provide the right level of fine-grained access controls so that the right people have access to the right data, and ultimately make it easy to discover and consume via the creation of data products as well.
>>Great. Okay. Thanks, guys. Right after this quick break, we're gonna be back to debate whether the cloud data model that we see emerging and the so-called modern data stack is really modern, or is it the same wine in a new bottle when it comes to data architectures? You're watching theCUBE, the leader in enterprise and emerging tech coverage.
>>Your data is capable of producing incredible results, but data consumers are often left in the dark without fast access to the data they need. Starburst makes your data visible from wherever it lives. Your company is acquiring more data in more places, more rapidly than ever. To rely solely on a data centralization strategy, whether it's in a lake or a warehouse, is unrealistic. A single source of truth approach is no longer viable, but disconnected data silos are often left untapped. We need a new approach. One that embraces distributed data.
One that enables fast and secure access to any of your data from anywhere. With Starburst, you'll have the fastest query engine for the data lake, one that allows you to connect and analyze your disparate data sources no matter where they live. Starburst provides the foundational technology required for you to build towards the vision of a decentralized data mesh. Starburst Enterprise and Starburst Galaxy offer enterprise-ready connectivity, interoperability, and security features for multiple regions, multiple clouds, and ever-changing global regulatory requirements. The data is yours. And with Starburst, you can perform analytics anywhere in light of your world.
>>Okay. We're back with Justin Borgman, CEO of Starburst; Richard Jarvis, the CTO of EMIS Health; and Teresa Tung, the cloud first technologist from Accenture. We're on lie number three. And that is the claim that today's modern data stack is actually modern. So I guess that's the lie: it's not modern. Justin, what do you say?
>>Yeah. I mean, I think new isn't modern, right? I think it's the new data stack. It's the cloud data stack, but that doesn't necessarily mean it's modern. I think a lot of the components actually are exactly the same as what we've had for 40 years. Rather than Teradata, you have Snowflake. Rather than Informatica, you have Fivetran. So it's the same general stack, just a cloud version of it. And I think a lot of the challenges that have plagued us for 40 years still remain.
>>So let me come back to you, Justin. Okay, but there are differences, right? I mean, you can scale, you can throw resources at the problem. You can separate compute from storage. There's a lot of money being thrown at that by venture capitalists, and at Snowflake, you mentioned, and its competitors. So that's different, is it not? Is that not at least an aspect of modern: dial it up, dial it down? So what do you say to that?
>>Well, it is certainly taking what the cloud offers and taking advantage of that, but it's important to note that the cloud data warehouses out there are really just separating their compute from their storage. So it's allowing them to scale up and down, but your data is still stored in a proprietary format. You're still locked in. You still have to ingest the data to get it even prepared for analysis. So a lot of the same structural constraints that existed with the old enterprise data warehouse model on-prem still exist, just, yes, a little bit more elastic now because the cloud offers that.
>>So Teresa, let me go to you, because you have cloud first in your title. So what say you to this conversation?
>>Well, even the cloud providers are looking towards more of a cloud continuum, right? So the centralized cloud as we know it, maybe a data lake or data warehouse in a central place, that's not even how the cloud providers are looking at it. They have new query services. Every provider has one that really expands those queries beyond a single location. And if we look at where the future goes, right, that's gonna very much follow the same thing. There's gonna be more edge. There's gonna be more on-premises because of data sovereignty and data gravity, and because you're working with different parts of the business that have already made major cloud investments in different cloud providers. Right? So there are a lot of reasons why the modern, I guess the next modern generation of the data stack needs to be much more federated.
>>Okay. So Richard, how do you deal with this? You've obviously got the technical debt, the existing infrastructure; it's on the books. You don't wanna just throw it out. There's a lot of conversation about modernizing applications, which a lot of times is a microservices layer on top of legacy apps. How do you think about the modern data stack?
>>Well, I think probably the first thing to say is that the stack really has to include the processes and people around the data as well. It's all well and good changing the technology, but if you don't modernize how people use that technology, then you're not going to be able to scale, because just because you can scale CPU and storage doesn't mean you can get more people to use your data to generate more value for the business. And so what we've been looking at is really changing that, very much aligned to data products and data mesh. How do you enable more people to consume the service and have the stack respond in a way that keeps costs low? Because that's important for our customers consuming this data, but it also allows people to occasionally run enormous queries and then tick along with smaller ones when required. And it's a good job we did, because during COVID all of a sudden we had enormous pressures on our data platform to answer really important, life-threatening queries. And if we couldn't scale both our data stack and our teams, we wouldn't have been able to answer those as quickly as we did. So I think the stack needs to support a scalable business, not just the technology itself.
>>Well, thank you for that. So Justin, let's try to break down what the critical aspects are of the modern data stack. So you think about the past five, seven years. Cloud obviously has given us a different pricing model. It de-risked experimentation. We talked about the ability to scale up and scale down. But I'm taking away that that's not enough, based on what Richard just said. The modern data stack has to serve the business and enable the business to build data products. I buy that. I'm a big fan of the data mesh concepts, even though we're in the early days. So what are the critical aspects? If you had to think about maybe putting some guardrails and definitions around the modern data stack, what does that look like?
What are some of the attributes and principles there?
>>Of how it should look, or how...
>>Yeah. What it should be.
>>Yeah. Well, I think Teresa mentioned this in a previous segment: the data warehouse is not necessarily going to disappear. It just becomes one node, one element of the overall data mesh. And I certainly agree with that. So by no means are we suggesting that Snowflake or Redshift or whatever cloud data warehouse you may be using is going to disappear, but it's not going to be the end-all, be-all. It's not the central single source of truth. And I think that's the paradigm shift that needs to occur. And I think it's also worth noting that those who were the early adopters of the modern data stack were primarily digital-native, born-in-the-cloud young companies who had the benefit of idealism. They had the benefit of starting with a clean slate. That does not reflect the vast majority of enterprises.
>>And even those companies, as they grow up and mature out of that ideal state, they go buy a business. Now they've got something on another cloud provider that has a different data stack, and they have to deal with that heterogeneity. That is just change, and change is a part of life. And so I think there is an element here that is almost philosophical. It's like, do you believe in an absolute ideal where I can just fit everything into one place, or do I believe in reality? And I think the far more pragmatic approach is really what data mesh represents. So to answer your question directly, I think it's adding the ability to access data that lives outside of the data warehouse, maybe living in open data formats in a data lake, or accessing operational systems as well. Maybe you want to directly access data that lives in an Oracle database or a Mongo database or what have you.
So creating that flexibility to really future-proof yourself from the inevitable change that you will encounter over time.
>>So thank you. So Teresa, based on what Justin just said, my takeaway there is it's inclusive. Whether it's a data mart, data hub, data lake, or data warehouse, it's just a node on the mesh. Okay, I get that. Does that include on-prem data? Obviously it has to. What are you seeing in terms of the ability to take that data mesh concept on-prem? I mean, most implementations I've seen in data mesh, frankly, really aren't adhering to the philosophy. Maybe it's a data lake and maybe it's using Glue. You look at what JPMC is doing, HelloFresh, a lot of stuff happening on the AWS cloud in that closed stack, if you will. What's the answer to that, Teresa?
>>I mean, I think it's a killer case for data mesh, the fact that you have valuable data sources on-prem, and yet you still wanna modernize and take the best of cloud. Cloud is still, like we mentioned, there are a lot of great reasons for it around the economics and the ability to tap into the innovation that the cloud providers are giving around data and AI architecture. It's an easy button. So the mesh allows you to have the best of both worlds. You can start using the data products on-prem or in the existing systems that are working already. It's meaningful for the business. At the same time, you can modernize the ones that make business sense, because it needs better performance, it needs something that is cheaper, or maybe it just needs to tap into better analytics to get better insights, right? So you're gonna be able to stretch and really have the best of both worlds. That, again, going back to Richard's point, is meaningful to the business. Not everything has to have that one-size-fits-all set of tools.
>>Okay. Thank you.
So Richard, talking about data as product, I wonder if you could give us your perspective here. What are the advantages of treating data as a product? What role do data products have in the modern data stack? We talk about monetizing data. What are your thoughts on data products?
>>So for us, one of the most important data products that we've been creating is taking healthcare data across a wide variety of different settings, so information about patients' demographics, about their treatment, about their medications and so on, and putting that into a standard format that can be utilized by a wide variety of different researchers. Because misinterpreting that data, or having the data not presented in the way that the user is expecting, means that you generate the wrong insight. And in any business, that's clearly not a desirable outcome, but when that insight is so critical, as it might be in healthcare or some security settings, you really have to have gone to the trouble of understanding the data and presenting it in a format that everyone can clearly agree on, and then letting people consume it in a very structured, managed way, even if that data comes from a variety of different sources in the first place. And so our data product journey has really begun by standardizing data across a number of different silos through the data mesh, so we can present it both internally and, through the right governance, externally to researchers.
>>So that data product, through whatever APIs, is accessible, it's discoverable, but it's obviously gotta be governed as well. You mentioned you appropriately provide it internally, but also to external folks as well.
So you've architected that capability today?
>>We have. And because the data is standard, it can generate value much more quickly, and we can be sure of the security and value that it's providing, because the data product isn't just about formatting the data into the correct tables. It's understanding what it means to redact the data, or to remove certain rows from it, or to interpret what a date actually means. Is it the start of the contract, or the start of the treatment, or the date of birth of a patient? These things can be lost in the data storage without having the proper product management around the data to say, in a very clear business context, what does this data mean? And what does it mean to process this data for a particular use case?
>>Yeah, it makes sense. It's got the context. If the domains own the data, you've gotta cut through a lot of the centralized teams, the technical teams that are data agnostic; they don't really have that context. All right, Justin, how does Starburst fit into this modern data stack? Bring us home.
>>Yeah. So I think for us, it's really providing our customers with the flexibility to operate on and analyze data that lives in a wide variety of different systems, ultimately giving them that optionality. And optionality provides the ability to reduce costs and store more in a data lake rather than a data warehouse. It provides the fastest time to insight by accessing the data directly where it lives. And ultimately, with this concept of data products that we've now incorporated into our offering as well, you can really create and curate data as a product to be shared and consumed. So we're trying to help enable the data mesh model and make that an appropriate complement to the modern data stack that people have today.
>>Excellent.
Hey, I wanna thank Justin, Teresa, and Richard for joining us today. You guys are great. I'm a big believer in the data mesh concept, and I think we're seeing the future of data architecture. So thank you. Now, remember, all these conversations are gonna be available on thecube.net for on-demand viewing. You can also go to starburst.io. They have some great content on the website, and they host some really thought-provoking interviews, and they have awesome resources, lots of data mesh conversations over there, and really good stuff in the resource section. So check that out. Thanks for watching The Data Doesn't Lie, or Does It?, made possible by Starburst Data. This is Dave Vellante for theCUBE, and we'll see you next time.
>>The explosion of data sources has forced organizations to modernize their systems and architecture and come to terms with the fact that one size does not fit all for data management today. Your teams are constantly moving and copying data, which requires time, management, and in some cases double paying for compute resources. Instead, what if you could access all your data anywhere using the BI tools and SQL skills your users already have? And what if this also included enterprise security and fast performance? With Starburst Enterprise, you can provide your data consumers with a single point of secure access to all of your data, no matter where it lives. With features like strict, fine-grained access control, end-to-end data encryption, and data masking, Starburst meets the security standards of the largest companies. Starburst Enterprise can easily be deployed anywhere and managed with Insights, where data teams holistically view their clusters' operation and query execution so they can reach meaningful business decisions faster. All this with the support of the largest team of Trino experts in the world, delivering fully tested, stable releases and available to support you 24/7 to unlock the value in all of your data.
You need a solution that easily fits with what you have today and can adapt to your architecture tomorrow. Starburst Enterprise gives you the fastest path from big data to better decisions, because your team can't afford to wait. Trino was created to empower analytics anywhere, and Starburst Enterprise was created to give you the enterprise-grade performance, connectivity, security, management, and support your company needs. Organizations like Zalando, Comcast, and FINRA rely on Starburst to move their businesses forward. Contact us to get started.
Starburst The Data Lies FULL V1
>>In 2011, early Facebook employee and Cloudera co-founder Jeff Hammerbacher famously said, the best minds of my generation are thinking about how to get people to click on ads, and that sucks. Let's face it: more than a decade later, organizations continue to be frustrated with how difficult it is to get value from data and build a truly agile, data-driven enterprise. What does that even mean, you ask? Well, it means that everyone in the organization has the data they need when they need it, in a context that's relevant to advance the mission of the organization. Now that could mean cutting costs, increasing profits, driving productivity, saving lives, accelerating drug discovery, making better diagnoses, solving supply chain problems, predicting weather disasters, simplifying processes, and thousands of other examples where data can completely transform people's lives beyond manipulating internet users to behave a certain way. We've heard the prognostications about the possibilities of data before, and in fairness we've made progress, but the hard truth is the original promises of master data management, enterprise data warehouses, data marts, data hubs, and yes, even data lakes were broken and left us wanting more. Welcome to The Data Doesn't Lie, or Does It?, a series of conversations produced by theCUBE and made possible by Starburst Data.
>>I'm your host, Dave Vellante, and joining me today are three industry experts. Justin Borgman is the co-founder and CEO of Starburst. Richard Jarvis is the CTO at EMIS Health, and Teresa Tung is cloud first technologist at Accenture. Today we're gonna have a candid discussion that will expose the unfulfilled and, yes, broken promises of a data past. We'll expose data lies: big lies, little lies, white lies, and hidden truths. And we'll challenge age-old data conventions and bust some data myths. We're debating questions like: Is the demise of a single source of truth inevitable? Will the data warehouse ever have feature parity with the data lake, or vice versa? Is the so-called modern data stack simply centralization in the cloud, AKA the old guard's model in new cloud clothes? How can organizations rethink their data architectures and regimes to realize the true promises of data? Can and will an open ecosystem deliver on these promises in our lifetimes? We're spanning much of the Western world today. Richard is in the UK, Teresa is on the West Coast, and Justin is in Massachusetts with me. I'm in theCUBE studios about 30 miles outside of Boston. Folks, welcome to the program. Thanks for coming on.
>>Thanks for having us.
>>Let's get right into it. You're very welcome. Now here's the first lie: the most effective data architecture is one that is centralized, with a team of data specialists serving various lines of business. What do you think, Justin?
>>Yeah, definitely a lie. My first startup was a company called Hadapt, which was an early SQL engine for Hadoop that was acquired by Teradata. And when I got to Teradata, of course, Teradata is the pioneer of that central enterprise data warehouse model. One of the things that I found fascinating was that not one of their customers had actually lived up to that vision of centralizing all of their data into one place. They all had data silos. They all had data in different systems. They had data on-prem, data in the cloud. Those companies were acquiring other companies and inheriting their data architecture. So despite being the industry leader for 40 years, not one of their customers truly had everything in one place. So I think history has definitely proven that to be a lie.
>>So Richard, from a practitioner's point of view, what are your thoughts? I mean, there's a lot of pressure to cut costs, keep things centralized, serve the business as best as possible from that standpoint. What does your experience show?
>>Yeah, I mean, I think I would echo Justin's experience, really. We as a business have grown up through acquisition, through storing data in different places, sometimes to do information governance in different ways, to store data in a platform that's close to data experts, people who really understand healthcare data from pharmacies or from doctors. And so, although if you were starting from a greenfield site and building something brand new you might be able to centralize all the data and all of the tooling and teams in one place, the reality is that businesses just don't grow up like that. And it's just really impossible to get that academic perfection of storing everything in one place.
>>You know, Teresa, I feel like Sarbanes-Oxley kinda saved the data warehouse, right? You actually did have to have a single version of the truth for certain financial data. But really, for some of those other use cases I mentioned, I do feel like the industry has kinda let us down. What's your take on this? Where does it make sense to have that sort of centralized approach, versus where does it make sense to maybe decentralize?
>>I think you gotta have centralized governance, right? So from the central team, for things like Sarbanes-Oxley, for things like security, for certain very core data sets, having a centralized set of roles and responsibilities to really QA, right, to serve as a design authority for your entire data estate, just like you might with security. But how it's implemented has to be distributed; otherwise you're not gonna be able to scale, right? So being able to have different parts of the business really make the right data investments for their needs. And then ultimately you're gonna collaborate with your partners, so partners that are not within the company, right, external partners. We're gonna see a lot more data sharing and model creation. And so you're definitely going to be decentralized.
>>So, you know, Justin, you guys last, geez, I think it was about a year ago, had a session on data mesh. It was a great program. You invited Zhamak Dehghani; of course, she's the creator of the data mesh. And one of her fundamental premises is that you've got this hyper-specialized team that you've gotta go through if you want anything. But at the same time, these individuals actually become a bottleneck, even though they're some of the most talented people in the organization. So I guess a question for you, Richard: how do you deal with that? Do you organize so that there are a few sort of rock stars that build cubes and the like? Or have you had any success in sort of decentralizing that data model with your constituencies?
>>Yeah. So we absolutely have got rockstar data scientists and data guardians, if you like: people who understand what it means to use this data, particularly as the data that we use at EMIS is very private; it's healthcare information. And some of the rules and regulations around using the data are very complex and strict. So we have to have people who understand the usage of the data, then people who understand how to build models, how to process the data effectively. And you can think of them like consultants to the wider business, because a pharmacist might not understand how to structure a SQL query, but they do understand how they want to process medication information to improve patient lives. And so that becomes a consulting-type experience from a set of rock stars to help a more decentralized business who needs to understand the data and to generate some valuable output.
>>Justin, what do you say to a customer or prospect that says, look, Justin, I got a centralized team and that's the most cost-effective way to serve the business. Otherwise I got duplication. What do you say to that?
>>Well, I would argue it's probably not the most cost-effective, and the reason being really twofold. I think, first of all, when you are deploying an enterprise data warehouse model, the data warehouse itself is very expensive, generally speaking. And so you're putting all of your most valuable data in the hands of one vendor who now has tremendous leverage over you for many, many years to come. I think that's the story at Oracle or Teradata or other proprietary database systems. But the other aspect, I think, is that the reality is those central data warehouse teams, as much as they are experts in the technology, don't necessarily understand the data itself. And this is one of the core tenets of data mesh that Zhamak writes about: this idea that the domain owners actually know the data the best.
>>And so it's about not only acknowledging that data is generally decentralized. And to your earlier point about Sarbanes-Oxley maybe saving the data warehouse, I would argue maybe GDPR and data sovereignty will destroy it, because data has to be decentralized to comply with those laws. But I think the reality is the data mesh model basically says, data's decentralized, and we're gonna turn that into an asset rather than a liability. And we're gonna turn that into an asset by empowering the people that know the data the best to participate in the process of curating and creating data products for consumption. So I think when you think about it that way, you're going to get higher quality data and faster time to insight, which is ultimately going to drive more revenue for your business and reduce costs. So I think that's the way I see the two models comparing and contrasting.
>>So do you think the demise of the data warehouse is inevitable? I mean, Teresa, you work with a lot of clients. They're not just gonna rip and replace their existing infrastructure.
Maybe they're gonna build on top of it, but what does that mean? Does that mean the EDW just becomes less and less valuable over time, or is it maybe just isolated to specific use cases? What's your take on that?
>>Listen, I still would love all my data within a data warehouse. Would love it mastered, would love it owned by a central team, right? I think that's still what I would love to have. That's just not the reality, right? The investment to actually migrate and keep that up to date, I would say it's a losing battle. We've been trying to do it for a long time. Nobody has the budgets, and then data changes, right? There's gonna be a new technology that's gonna emerge that we're gonna wanna tap into. There's going to be not enough investment to bring all the legacy, but still very useful, systems into that centralized view. So you keep the data warehouse. I think it's a very, very valuable, very high-performance tool for what it's there for. But you could have this new mesh layer that still takes advantage of the things I mentioned: the data products in the systems that are meaningful today, and the data products that actually might span a number of systems, maybe either the source systems for the domains that know it best, or the consumer-based systems and products that need to be packaged in a way that would be really meaningful for that end user, right? Each of those is useful for a different part of the business, and you make sure that the mesh actually allows you to use all of them.
>>So, Richard, let me ask you. Take Zhamak's principles, back to those. You've got domain ownership and data as product. Okay, great, sounds good. But it creates what I would argue are two challenges. Self-serve infrastructure, let's park that for a second.
And then, in your industry, one of the most highly regulated and most sensitive, computational governance. How do you automate and ensure federated governance in that mesh model that Teresa was just talking about?
>>Well, it absolutely depends on some of the tooling and the processes that you put in place around those tools to centralize the security and the governance of the data. And I think, although a data warehouse makes that very simple, 'cause it's a single tool, it's not impossible with some of the data mesh technologies that are available. And so what we've done at EMIS is we have a single security layer that sits on top of our data mesh, which means that no matter which user is accessing which data source, we go through a well-audited, well-understood security layer. That means that we know exactly who's got access to which data fields and which data tables. And then everything that they do is audited in a very standard way, regardless of the underlying data storage technology. So for me, although storing the data in one place might not be possible, understanding where your source of truth is and securing that in a common way is still a valuable approach, and you can do it without having to bring all that data into a single bucket so that it's all in one place. And so having done that, investing quite heavily in making that possible has paid dividends in terms of giving wider access to the platform and ensuring that only data that's available under GDPR and other regulations is being used by the data users.
>>Yeah. So Justin, we always talk about data democratization, and up until recently there really hasn't been line of sight as to how to get there. But do you have anything to add to this? Because you're essentially doing analytic queries with data that's dispersed all over. How are you seeing your customers handle this challenge?
>>Yeah.
I mean, I think data products is a really interesting aspect of the answer to that. It allows you to, again, leverage the data domain owners, the people who know the data the best, to create data as a product, ultimately to be consumed. And we try to represent that in our product as effectively an almost e-commerce-like experience, where you go and discover and look for the data products that have been created in your organization, and then you can start to consume them as you'd like. And so we're really trying to build on that notion of data democratization and self-service, and making it very easy to discover and start to use with whatever BI tool you may like, or even just running SQL queries yourself.
>>Okay, guys, grab a sip of water. After this short break, we'll be back to debate whether proprietary or open platforms are the best path to the future of data excellence. Keep it right there.
>>Your company has more data than ever, and more people trying to understand it. But there's a problem: your data is stored across multiple systems. It's hard to access, and that delays analytics and, ultimately, decisions. The old method of moving all of your data into a single source of truth is slow and definitely not built for the volume of data we have today or where we are headed. While your data engineers spend over half their time moving data, your analysts and data scientists are left waiting, feeling frustrated, unproductive, and unable to move the needle for your business. But what if you could spend less time moving or copying data? What if your data consumers could analyze all your data quickly?
>>Starburst helps your teams run fast queries on any data source. We help you create a single point of access to your data, no matter where it's stored.
And we support high concurrency. We solve for speed and scale. Whether it's fast SQL queries on your data lake or faster queries across multiple data sets, Starburst helps your teams run analytics anywhere. You can't afford to wait for data to be available. Your team has questions that need answers now. With Starburst, the wait is over. You'll have faster access to data with enterprise-level security, easy connectivity, and 24/7 support from experts. Organizations like Zalando, Comcast, and FINRA rely on Starburst to move their businesses forward. Contact our Trino experts to get started.
>>We're back with Justin Borgman of Starburst and Richard Jarvis of EMIS Health. Okay, we're gonna get to lie number two, and that is this: an open-source-based platform cannot give you the performance and control that you can get with a proprietary system. Is that a lie? Justin, the enterprise data warehouse has been pretty dominant and has evolved and matured. Its stack has matured over the years. Why is it not the default platform for data?
>>Yeah, well, I think that's become a lie over time. So I think if we go back 10 or 12 years, with the advent of the first data lake, really around Hadoop, it probably was true that you couldn't get the performance that you needed to run fast, interactive SQL queries in a data lake. Now a lot's changed in 10 or 12 years. I remember in the very early days, people would say, you'll never get performance because you need to be columnar. You need to store data in a columnar format. And then columnar formats were introduced to data lakes. You have Parquet, ORC file, and Avro that were created to ultimately deliver performance out of that. So, okay, we got largely over the performance hurdle. More recently, people will say, well, you don't have the ability to do updates and deletes like a traditional data warehouse.
>>And now we've got the creation of new data formats, again, like Iceberg and Delta and Hudi, that do allow for updates and deletes. So I think the data lake has continued to mature. And I remember a quote from Curt Monash many years ago where he said, you know, it takes six or seven years to build a functional database. I think that's right. And now we've had almost a decade go by. So these technologies have matured to really deliver very, very close to the same level of performance and functionality as cloud data warehouses. So I think the reality is that's become a lie, and now we have large, giant hyperscale internet companies that don't have the traditional data warehouse at all. They do all of their analytics in a data lake. So I think we've proven that it's very much possible today.
>>Thank you for that. And so Richard, talk about your perspective as a practitioner in terms of what open brings you versus closed. I mean, look, open is a moving target. I remember Unix used to be open systems, and so it is an evolving spectrum. But from your perspective, what does open give you that you can't get from a proprietary system, or what are you fearful of in a proprietary system?
>>I suppose for me open buys us the ability to be unsure about the future, because one thing that's always true about technology is it evolves in a direction slightly different to what people expect. And what you don't want is to end up having backed yourself into a corner that then prevents you from innovating. So if you have chosen a technology and you've stored trillions of records in that technology, and suddenly a new way of processing or machine learning comes out, you wanna be able to take advantage, and your competitive edge might depend upon it. And so I suppose for us, we acknowledge that we don't have perfect vision of what the future might be.
And so by backing open storage technologies, we can apply a number of different technologies to the processing of that data. And that gives us the ability to remain relevant and innovate on our data storage. And we have bought our way out of any performance concerns because we can use cloud-scale infrastructure to scale up and scale down as we need. And so we don't have the concern that we don't have enough hardware today to process what we want to achieve. We can just scale up when we need it and scale back down. So open source has really allowed us to remain at the cutting edge.
>>So Justin, let me play devil's advocate here a little bit. I've talked to Zhamak about this, and obviously her vision is that the data mesh is open source, open source tooling, and it's not proprietary. You're not gonna buy a data mesh. You're gonna build it with open source tooling, and vendors like you are gonna support it. But to come back to today, you can get to market with a proprietary solution faster. I'm gonna make that statement; you tell me if it's a lie. And then you can say, okay, we support Apache Iceberg, we're gonna support open source tooling. Take a company like VMware, not really in the data business, but the way they embraced Kubernetes and every new open source thing that comes along, they say, we do that too. Why can't proprietary systems do that and be as effective?
>>Yeah, well, I think, at least within the data landscape, saying that you can access open data formats like Iceberg or others is a bit disingenuous, because really what you're selling to your customer is a certain degree of performance, a certain SLA. And those cloud data warehouses that can reach beyond their own proprietary storage drop all the performance that they were able to provide.
So it reminds me, again, of going back 10 or 12 years ago, when everybody had a connector to Hadoop and they thought that was the solution, right? But the reality was a connector was not the same as running workloads in Hadoop back then. And I think, similarly, being able to connect to an external table that lives in an open data format, you're not going to deliver the performance that your customers are accustomed to. And at the end of the day, they're always going to be predisposed, they're always going to be incentivized, to get that data ingested into the data warehouse, 'cause that's where they have control. And the bottom line is the database industry has really been built around vendor lock-in, I mean, from the start. How many people love Oracle today but are customers nonetheless? I think lock-in is part of this industry. And I think that's really what we're trying to change with open data formats.
>>Well, that's interesting. It reminded me of when I see the gas price, the teaser gas price. I drive up and then I say, oh, that's the cash price. With a credit card I gotta pay 20 cents more. But okay. So let me come back to you, Justin. So what's wrong with saying, hey, we support open data formats, but yeah, you're gonna get better performance if you keep it in our closed system? Are you saying that long term that's gonna come back and bite you, 'cause you're gonna end up... you mentioned Oracle, you mentioned Teradata. By implication, you're saying that's where Snowflake customers are headed.
>>Yeah, absolutely. I think this is a movie that we've all seen before, at least those of us who've been in the industry long enough to see this movie play a couple of times. So I do think that's the future. And I loved what Richard said. I actually wrote it down.
'Cause I thought it was an amazing quote. He said it buys us the ability to be unsure of the future. That pretty much says it all. The future is unknowable, and the reality is, using open data formats, you remain interoperable with any technology you want to utilize. If you want to use Spark to train a machine learning model and you want to use Starburst to query via SQL, that's totally cool. They can both work off the same exact data sets. By contrast, if you're focused on a proprietary model, then you're kind of locked in again to that model. I think the same applies to data sharing, to data products, to a wide variety of aspects of the data landscape: a proprietary approach kind of closes you in and locks you in.
>>So I would say this, and Richard, I'd love to get your thoughts on it, 'cause I talk to a lot of Oracle customers, not as many Teradata customers, but a lot of Oracle customers, and they'll admit, yeah, they're jamming us on price and the license cost, but we do get value out of it. And so my question to you, Richard, is: do the, let's call them data warehouse systems or the proprietary systems, deliver a greater ROI sooner? And is that an allure that customers are attracted to? Or can open platforms deliver ROI as fast?
>>I think the answer to that is it can depend a bit. It depends on your business's skill set. So we are lucky that we have a number of proprietary teams that work in databases that provide our operational data capability. And we have teams of analytics and big data experts who can work with open data sets and open data formats. And so those different teams can get to an ROI more quickly with different technologies. For the business, though, we can't do better for our operational data stores than proprietary databases. Today we can back off very tight SLAs to them.
We can demonstrate reliability from millions of hours of those databases being run at enterprise scale. But for an analytics workload, where increasingly our business is growing in that direction, we can't do better than open data formats with cloud-based, data mesh type technologies. And so it's not a simple answer that one will always be the right answer for our business. We definitely have times when proprietary databases provide a capability that we couldn't easily represent or replicate with open technologies.
>>Yeah. Richard, stay with you. You mentioned some things before that strike me. The Databricks-Snowflake thing is a lot of fun for analysts like me. Richard, you mentioned you have a lot of rockstar data engineers. You've got Databricks coming at it from a data engineering heritage. You get Snowflake coming at it from an analytics heritage. Those two worlds are colliding. People like Sanjeev Mohan said, you know what, I think it's actually harder to play in data engineering; i.e., it's easier for the data engineering world to go into the analytics world versus the reverse. But thinking about up-and-coming engineers and developers preparing for this future of data engineering and data analytics, how should they be thinking about the future? What's your advice to those young people?
>>So I think I'd probably fall back on general programming skill sets. So the advice that I saw years ago was, if you have open source technologies, the Pythons and Javas, on your CV, you command a 20% pay hike over people who can only do proprietary programming languages. And I think that's true of data technologies as well. And from a business point of view, that makes sense. I'd rather spend the money that I save on proprietary licenses on better engineers, because they can provide more value to the business and can innovate us beyond our competitors.
So I think my advice to people who are starting here, or trying to build teams to capitalize on data assets, is begin with open-license, free capabilities, because they're very cheap to experiment with, and they generate a lot of interest from people who want to join you as a business. And you can make them very successful early doors with your analytics journey.
>>It's interesting. Again, analysts like myself, we do a lot of TCO work and have over the last 20-plus years. And, you know, normally it's the staff that's the biggest nut in total cost of ownership. Not in the world of Oracle; there the license cost is by far the biggest component in the pie. All right, Justin, help us close out this segment. We've been talking about this sort of data mesh, open, closed, Snowflake, Databricks. Where does Starburst, as this engine for the data lake, the data lakehouse, the data warehouse, fit in this world?
>>Yeah. So our view on how the future ultimately unfolds is we think that data lakes will be a natural center of gravity, for a lot of the reasons that we described: open data formats, lowest total cost of ownership, because you get to choose the cheapest storage available to you. Maybe that's S3 or Azure Data Lake Storage or Google Cloud Storage, or maybe it's on-prem object storage that you bought at a really good price. So ultimately storing a lot of data in a data lake makes a lot of sense. But I think what makes our perspective unique is we still don't think you're gonna get everything there either. We think that basically centralization of all your data assets is just an impossible endeavor. And so you wanna be able to access data that lives outside of the lake as well.
So we kind of think of the lake as maybe the biggest place by volume in terms of how much data you have. But to have comprehensive analytics and to truly understand your business, and understand it holistically, you need to be able to go access other data sources as well. And so that's the role that we wanna play: to be a single point of access for our customers, provide the right level of fine-grained access controls so that the right people have access to the right data, and ultimately make it easy to discover and consume via the creation of data products as well.
>>Great. Okay. Thanks, guys. Right after this quick break, we're gonna be back to debate whether the cloud data model that we see emerging and the so-called modern data stack is really modern, or is it the same wine, new bottle, when it comes to data architectures. You're watching theCUBE, the leader in enterprise and emerging tech coverage.
>>Your data is capable of producing incredible results, but data consumers are often left in the dark without fast access to the data they need. Starburst makes your data visible from wherever it lives. Your company is acquiring more data in more places, more rapidly than ever. To rely solely on a data centralization strategy, whether it's in a lake or a warehouse, is unrealistic. A single source of truth approach is no longer viable, but disconnected data silos are often left untapped. We need a new approach. One that embraces distributed data.
One that enables fast and secure access to any of your data from anywhere. With Starburst, you'll have the fastest query engine for the data lake that allows you to connect and analyze your disparate data sources no matter where they live. Starburst provides the foundational technology required for you to build towards the vision of a decentralized data mesh. Starburst Enterprise and Starburst Galaxy offer enterprise-ready connectivity, interoperability, and security features for multiple regions, multiple clouds, and ever-changing global regulatory requirements. The data is yours. And with Starburst, you can perform analytics anywhere in light of your world.
>>Okay. We're back with Justin Borgman, CEO of Starburst; Richard Jarvis, the CTO of EMIS Health; and Teresa Tung, the cloud first technologist from Accenture. We're on lie number three, and that is the claim that today's modern data stack is actually modern. So I guess that's the lie: it's not modern. Justin, what do you say?
>>Yeah. I mean, I think new isn't modern, right? I think it's the new data stack. It's the cloud data stack, but that doesn't necessarily mean it's modern. I think a lot of the components actually are exactly the same as what we've had for 40 years. Rather than Teradata, you have Snowflake. Rather than Informatica, you have Fivetran. So it's the same general stack, just a cloud version of it. And I think a lot of the challenges that have plagued us for 40 years still remain.
>>So let me come back to you, Justin. But okay, there are differences, right? I mean, you can scale, you can throw resources at the problem, you can separate compute from storage. There's a lot of money being thrown at that by venture capitalists and Snowflake, you mentioned, and its competitors. So that's different, is it not? Is that not at least an aspect of modern: dial it up, dial it down? So what do you say to that?
>>Well, it is. It's certainly taking what the cloud offers and taking advantage of that. But it's important to note that the cloud data warehouses out there are really just separating their compute from their storage. So it's allowing them to scale up and down, but your data's still stored in a proprietary format. You're still locked in. You still have to ingest the data to get it even prepared for analysis. So a lot of the same sort of structural constraints that exist with the old on-prem enterprise data warehouse model still exist, just, yes, a little bit more elastic now because the cloud offers that.
>>So Teresa, let me go to you, 'cause you have cloud first in your title. So what say you to this conversation?
>>Well, even the cloud providers are looking towards more of a cloud continuum, right? So the centralized cloud as we know it, maybe data lake, data warehouse in the central place, that's not even how the cloud providers are looking at it. They have new query services. Every provider has one that really expands those queries to be beyond a single location. And if we look at where the future goes, right, that's gonna very much follow the same thing. There's gonna be more edge. There's gonna be more on-premise because of data sovereignty, data gravity, because you're working with different parts of the business that have already made major cloud investments in different cloud providers, right? So there's a lot of reasons why the modern, I guess the next modern generation of the data stack needs to be much more federated.
>>Okay. So Richard, how do you deal with this? You've obviously got the technical debt, the existing infrastructure. It's on the books. You don't wanna just throw it out. There's a lot of conversation about modernizing applications, which a lot of times is a microservices layer on top of legacy apps. How do you think about the modern data stack?
>>Well, I think probably the first thing to say is that the stack really has to include the processes and people around the data as well. It's all well and good changing the technology, but if you don't modernize how people use that technology, then you're not going to be able to scale, because just 'cause you can scale CPU and storage doesn't mean you can get more people to use your data to generate more value for the business. And so what we've been looking at is really changing, very much aligned to data products and data mesh: how do you enable more people to consume the service and have the stack respond in a way that keeps costs low? Because that's important for our customers consuming this data, but it also allows people to occasionally run enormous queries and then tick along with smaller ones when required. And it's a good job we did, because during COVID all of a sudden we had enormous pressures on our data platform to answer really important, life-threatening queries. And if we couldn't scale both our data stack and our teams, we wouldn't have been able to answer those as quickly as we did. So I think the stack needs to support a scalable business, not just the technology itself. >>Well, thank you for that. So Justin, let's try to break down what the critical aspects are of the modern data stack. So you think about the past, you know, five, seven years: cloud obviously has given a different pricing model, de-risked experimentation, you know, what we talked about, the ability to scale up, scale down. But I'm taking away that that's not enough, based on what Richard just said. The modern data stack has to serve the business and enable the business to build data products. I buy that. I'm a big fan of the data mesh concepts, even though we're early days. So what are the critical aspects, if you had to think about, you know, maybe putting some guardrails and definitions around the modern data stack? What does that look like?
What are some of the attributes and principles there? >>Of how it should look, or how... >>Yeah. What it should be. >>Yeah. Well, I think, you know, Teresa mentioned this in a previous segment: the data warehouse is not necessarily going to disappear. It just becomes one node, one element of the overall data mesh. And I certainly agree with that. So by no means are we suggesting that, you know, Snowflake or Redshift or whatever cloud data warehouse you may be using is going to disappear, but it's not going to become the end-all, be-all. It's not the central single source of truth. And I think that's the paradigm shift that needs to occur. And I think it's also worth noting that those who were the early adopters of the modern data stack were primarily digital-native, born-in-the-cloud young companies who had the benefit of idealism. They had the benefit of starting with a clean slate. That does not reflect the vast majority of enterprises. >>And even those companies, as they grow up and mature out of that ideal state, they go buy a business. Now they've got something on another cloud provider that has a different data stack, and they have to deal with that heterogeneity. That is just change, and change is a part of life. And so I think there is an element here that is almost philosophical. It's like, do you believe in an absolute ideal where I can just fit everything into one place, or do I believe in reality? And I think the far more pragmatic approach is really what data mesh represents. So to answer your question directly, I think it's adding, you know, the ability to access data that lives outside of the data warehouse, maybe living in open data formats in a data lake, or accessing operational systems as well. Maybe you want to directly access data that lives in an Oracle database or a Mongo database or what have you.
So creating that flexibility to really future-proof yourself from the inevitable change that you will encounter over time. >>So thank you. So, based on what Justin just said, my takeaway there is it's inclusive. Whether it's a data mart, data hub, data lake, data warehouse, it's just a node on the mesh. Okay, I get that. Does that include on-prem data? Obviously it has to. What are you seeing in terms of the ability to take that data mesh concept on-prem? I mean, most implementations I've seen in data mesh, frankly, really aren't, you know, adhering to the philosophy. Maybe it's a data lake and maybe it's using Glue. You look at what JPMC is doing, HelloFresh, a lot of stuff happening on the AWS cloud in that, you know, closed stack, if you will. What's the answer to that, Teresa? >>I mean, I think it's a killer case for data mesh, the fact that you have valuable data sources on-prem, and yet you still wanna modernize and take the best of cloud. Cloud is still, like we mentioned, there's a lot of great reasons for it around the economics and the ability to tap into the innovation that the cloud providers are giving around data and AI architecture. It's an easy button. So the mesh allows you to have the best of both worlds. You can start using the data products on-prem or in the existing systems that are working already. It's meaningful for the business. At the same time, you can modernize the ones that make business sense, because it needs better performance, it needs, you know, something that is cheaper, or maybe just to tap into better analytics to get better insights, right? So you're gonna be able to stretch and really have the best of both worlds. That, again, going back to Richard's point, is meaningful to the business. Not everything has to have that one-size-fits-all set of tools. >>Okay. Thank you.
So Richard, you know, talking about data as product, wonder if you could give us your perspectives here. What are the advantages of treating data as a product? What role do data products have in the modern data stack? We talk about monetizing data. What are your thoughts on data products? >>So for us, one of the most important data products that we've been creating is taking data that is healthcare data across a wide variety of different settings, so information about patients' demographics, about their treatment, about their medications and so on, and taking that into a standard format that can be utilized by a wide variety of different researchers. Because misinterpreting that data, or having the data not presented in the way that the user is expecting, means that you generate the wrong insight. And in any business, that's clearly not a desirable outcome. But when that insight is so critical, as it might be in healthcare or some security settings, you really have to have gone to the trouble of understanding the data, presenting it in a format that everyone can clearly agree on, and then letting people consume it in a very structured, managed way, even if that data comes from a variety of different sources in the first place. And so our data product journey has really begun by standardizing data across a number of different silos through the data mesh, so we can present it out both internally and, through the right governance, externally to researchers. >>So that data product, through whatever APIs, is accessible, it's discoverable, but it's obviously gotta be governed as well. You mentioned you appropriately provide it internally, yeah, but also, you know, to external folks as well.
So you've architected that capability today? >>We have, and because the data is standard, it can generate value much more quickly, and we can be sure of the security and value that that's providing. Because the data product isn't just about formatting the data into the correct tables; it's understanding what it means to redact the data, or to remove certain rows from it, or to interpret what a date actually means. Is it the start of the contract, or the start of the treatment, or the date of birth of a patient? These things can be lost in the data storage without having the proper product management around the data to say, in a very clear business context, what does this data mean? And what does it mean to process this data for a particular use case? >>Yeah, it makes sense. It's got the context. If the domains own the data, you gotta cut through a lot of the centralized teams, the technical teams that are data agnostic. They don't really have that context. All right. Let's end. Justin, how does Starburst fit into this modern data stack? Bring us home. >>Yeah. So I think for us, it's really providing our customers with, you know, the flexibility to operate and analyze data that lives in a wide variety of different systems, ultimately giving them that optionality. You know, and optionality provides the ability to reduce costs, store more in a data lake rather than a data warehouse. It provides the ability for the fastest time to insight, to access the data directly where it lives. And ultimately, with this concept of data products that we've now, you know, incorporated into our offering as well, you can really create and curate, you know, data as a product to be shared and consumed. So we're trying to help enable the data mesh, you know, model and make that an appropriate complement to, you know, the modern data stack that people have today. >>Excellent.
Hey, I wanna thank Justin, Teresa, and Richard for joining us today. You guys are great. I'm a big believer in the data mesh concept, and I think, you know, we're seeing the future of data architecture. So thank you. Now, remember, all these conversations are gonna be available on thecube.net for on-demand viewing. You can also go to starburst.io. They have some great content on the website, and they host some really thought-provoking interviews, and they have awesome resources, lots of data mesh conversations over there, and really good stuff in the resource section. So check that out. Thanks for watching The Data Doesn't Lie... Or Does It?, made possible by Starburst Data. This is Dave Vellante for theCUBE, and we'll see you next time. >>The explosion of data sources has forced organizations to modernize their systems and architecture and come to terms with the fact that one size does not fit all for data management. Today, your teams are constantly moving and copying data, which requires time, management, and in some cases, double paying for compute resources. Instead, what if you could access all your data anywhere using the BI tools and SQL skills your users already have? And what if this also included enterprise security and fast performance? With Starburst Enterprise, you can provide your data consumers with a single point of secure access to all of your data, no matter where it lives. With features like strict, fine-grained access control, end-to-end data encryption, and data masking, Starburst meets the security standards of the largest companies. Starburst Enterprise can easily be deployed anywhere and managed with Insights, where data teams holistically view their cluster operation and query execution so they can reach meaningful business decisions faster. All this with the support of the largest team of Trino experts in the world, delivering fully tested, stable releases and available to support you 24/7 to unlock the value in all of your data.
You need a solution that easily fits with what you have today and can adapt to your architecture tomorrow. Starburst Enterprise gives you the fastest path from big data to better decisions, 'cause your team can't afford to wait. Trino was created to empower analytics anywhere, and Starburst Enterprise was created to give you the enterprise-grade performance, connectivity, security, management, and support your company needs. Organizations like Zalando, Comcast, and FINRA rely on Starburst to move their businesses forward. Contact us to get started.
ENTITIES
Entity | Category | Confidence |
---|---|---|
Richard | PERSON | 0.99+ |
Dave Lanta | PERSON | 0.99+ |
Jess Borgman | PERSON | 0.99+ |
Justin | PERSON | 0.99+ |
Theresa | PERSON | 0.99+ |
Justin Borgman | PERSON | 0.99+ |
Teresa | PERSON | 0.99+ |
Jeff Ocker | PERSON | 0.99+ |
Richard Jarvis | PERSON | 0.99+ |
Dave Valante | PERSON | 0.99+ |
Justin Boardman | PERSON | 0.99+ |
six | QUANTITY | 0.99+ |
Dani | PERSON | 0.99+ |
Massachusetts | LOCATION | 0.99+ |
20 cents | QUANTITY | 0.99+ |
Teradata | ORGANIZATION | 0.99+ |
Oracle | ORGANIZATION | 0.99+ |
Jamma | PERSON | 0.99+ |
UK | LOCATION | 0.99+ |
FINRA | ORGANIZATION | 0.99+ |
40 years | QUANTITY | 0.99+ |
Kurt Monash | PERSON | 0.99+ |
20% | QUANTITY | 0.99+ |
two | QUANTITY | 0.99+ |
five | QUANTITY | 0.99+ |
Jess | PERSON | 0.99+ |
2011 | DATE | 0.99+ |
Starburst | ORGANIZATION | 0.99+ |
10 | QUANTITY | 0.99+ |
Accenture | ORGANIZATION | 0.99+ |
seven years | QUANTITY | 0.99+ |
thousands | QUANTITY | 0.99+ |
pythons | TITLE | 0.99+ |
Boston | LOCATION | 0.99+ |
GDPR | TITLE | 0.99+ |
Today | DATE | 0.99+ |
two models | QUANTITY | 0.99+ |
Zolando Comcast | ORGANIZATION | 0.99+ |
Gemma | PERSON | 0.99+ |
Starbust | ORGANIZATION | 0.99+ |
JPMC | ORGANIZATION | 0.99+ |
Javas | TITLE | 0.99+ |
today | DATE | 0.99+ |
AWS | ORGANIZATION | 0.99+ |
millions | QUANTITY | 0.99+ |
first lie | QUANTITY | 0.99+ |
10 | DATE | 0.99+ |
12 years | QUANTITY | 0.99+ |
one place | QUANTITY | 0.99+ |
Tomorrow | DATE | 0.99+ |
Jon Loyens, data.world | Snowflake Summit 2022
>>Good morning, everyone. Welcome back to theCUBE's coverage of Snowflake Summit 22, live from Caesars Forum in Las Vegas. Lisa Martin here with Dave Vellante. This is day three of our coverage. We've had an amazing, amazing time. Great conversations talking with Snowflake executives, partners, customers. We're gonna be digging into data mesh with data.world. Please welcome Jon Loyens, the chief product officer. Great to have you on the program, Jon. >>Thank you so much for having me here. I mean, the summit, like you said, has been incredible. So many great people, such a good time. Really, really nice to be back in person with folks. >>It is fabulous to be back in person. The fact that we're on day four for them, and the solution showcase is as packed as it is at 10:11 in the morning, yeah, is saying something. >>Yeah. Usually... >>Chomping at the bit to hear what they're doing and innovate. >>Absolutely. Usually those last days of conferences, everybody starts getting a little tired, but we're not seeing that at all here, especially >>In Vegas. This is impressive. Talk to the audience a little bit about data.world, what you guys do, and talk about the Snowflake relationship. >>Absolutely. data.world is the only true cloud-native enterprise data catalog. We've been an incredible Snowflake partner, and Snowflake's been an incredible partner to us, really since 2018, when we became the first data catalog in the Snowflake Partner Connect experience. You know, Snowflake and the Data Cloud make it so possible.
And it's changed so much in terms of being able to, you know, very easily transition data into the cloud, to break down those silos, and to have a platform that enables folks to be incredibly agile with data from an engineering and infrastructure standpoint. data.world is able to provide a layer of discovery and governance that matches that agility, and the ability for a lot of different stakeholders to really participate in the process of data management and data governance. >>So data mesh, basically, Zhamak Dehghani lays out, first of all, the fault domains of existing data and big data initiatives. And she boils it down to the fact that it's just this monolithic architecture with hyper-specialized teams that you have to go through, and it just slows everything down and it doesn't scale. They don't have domain context. So she came up with four principles, if I may. Domain ownership: push it out to the businesses. They have the context; they should own the data. The second is data as product. We're certainly hearing a lot about that this week. The third is, so that all sounds good, push out the data, great, but it creates two problems. Self-serve infrastructure, okay, but her premise is that infrastructure should be an operational detail. And then the fourth is computational governance. So you talked about data catalog. Where do you fit in those four principles? >>You know, honestly, we are able to help teams realize the data mesh architecture. And we know that data mesh is really both a process and a culture change. But then when you want to enact a process and a culture change like this, you also need to select the appropriate tools to match the culture that you're trying to build, the process and the architecture that you're trying to build. And the data.world data catalog can really help along all four of those axes. When you start thinking first about, let's say, let's take the first one, you know, data as a product, right?
We even, like, very meta of us, from a metadata management platform at the end of the day. But very meta of us: when you talk about data as a product, we track adoption and usage of all your data assets within your organization and provide program teams and, you know, offices of the CDO with incredible evented analytics, very detailed, that give them the right audit trail, that enable them to direct very scarce data engineering and data architecture resources to make sure that their data assets are getting adopted and used properly. >>On the domain-driven side, we are entirely knowledge graph and open standards based, enabling those different domains. We have, you know, incredible joint Snowflake customers like Prologis. And we chatted a lot about this in our session here yesterday, where, because of our knowledge graph underpinnings, because of the flexibility of our metadata model, it enables those domains to actually model their assets uniquely from group to group, without having to relaunch or run different environments. Like, you can do that all within one data catalog platform without having to have separate environments for each of those domains. Federated governance: again, the amount of, like, data exhaust that we create really enables ambient governance and participatory governance as well. We call it agile data governance, really the adoption of agile and open principles applied to governance to make it more inclusive and transparent. And we provide that in a way that can federate across those domains and make it consistent. >>Okay. So you facilitate across that whole spectrum of principles. And so in the early examples of data mesh that I've studied and actually collaborated with, like with JPMC, who I don't think is using your data catalog, but HelloFresh, who may or may not be, but I mean, there are numbers and I wanna get to that.
But what they've done is they've enabled the domains to spin up their own, whatever, data lakes, data warehouses, data hubs, at least in concept. Most of 'em are data lakes on AWS, but still, in concept, they wanna be inclusive, and they've created a master data catalog. And then each domain has its sub-catalog, which feeds into the master, and that's how they get consistency and governance and everything else. Is that the right way to think about it? Or do you have a different spin on that? >>Yeah, you know, I have a slightly different spin on it. I think organizationally it's the right way to think about it. And in the absence of a catalog that can truly have multiple federated metadata models, multiple graphs in one platform, that is really kind of the only way to do it, right? With data.world, you don't have to do that. You can have one platform, one environment, one instance of data.world that spans all of your domains, enable them to operate independently, and then federate across. >>So you just answered my question as to why I should use data.world versus Amazon Glue. >>Oh, absolutely. >>And that's awesome that you've done that. Now, how have you done that? What's your secret sauce? >>The secret sauce here is really all credit to our CTO, one of my closest friends, who was a true student of knowledge graph practices and principles, and really felt that the right way to manage metadata and knowledge about the data analytics ecosystem that companies were building was through federated linked data, right? So we use standards, and we've built an open and extensible metadata model that we call costs that really takes the best parts of existing open standards in the semantics space, things like schema.org, DCAT, Dublin Core, brings them together, and models out the most typical enterprise data assets, providing you with an ontology that's ready to go.
But because of the graph nature of what we do, it's instantly accessible without having to rebuild environments, without having to do a lot of management against it. It's really quite something. And it's something all of our customers are very impressed with and, you know, are getting a lot of leverage out of. >>And we have a lot of time today, so we're not gonna shortchange this topic. So one last question, then I'll shut up and let you jump in. This is an open standard. It's not open source. >>No, it's built on open standards. We also fundamentally believe in extensibility and openness. We do not want to vertically, like, lock you into our platform. So everything that we have is API-driven, API-available. Your metadata belongs to you. If you need to export your graph, you know, it's instantly available in open machine-readable formats. That's really, we come from the open data community. That was a lot of the founding of data.world. We worked a lot with the open data community, and we fundamentally believe in that. And that's enabled a lot of our customers as well to truly take data.world and not have it be a data catalog application, but really an entire metadata management platform, and extend it even further into their enterprise to really catalog all of their assets, but also to build incredible integrations to things like corporate search. You know, having data assets show up in corporate wiki search, along with all the descriptive metadata that people need, has been incredibly powerful and an incredible extension of our platform that I'm so happy to see our customers in. >>So leasing. So it's not exclusive to Snowflake. It's not exclusive to AWS. You can bring it anywhere. Azure, GCP. >>Anytime. Yeah. You know, we love Snowflake. Look, we're at the Snowflake Summit.
And we've always had a great relationship with Snowflake, though, and really leaned in there because we really believe in Snowflake's principles, particularly around cloud and being cloud native and the operating advantages that it affords companies. That's really aligned with what we do. And so Snowflake was really the first of the cloud data catalogs that we ultimately, or say the cloud data warehouses, that we integrated with, and to see them transition to really building out the Data Cloud has been awesome. >>Talk about how data.world and Snowflake enable companies like Prologis to be data companies. These days, every company has to be a data company, but they have to be able to do so quickly to be competitive and to really win. How do you help them, if we, like, up-level the conversation to really impacting the overall business? >>That's a great question, especially right now. Everybody knows, and Prologis is a great example. They're a logistics and supply chain company at the end of the day. And we know how important logistics and supply chain is nowadays, for them and for a lot of our customers. I think one of the advantages of having a data catalog is the ability to build trust, transparency, and inclusivity into their data analytics practice. By adopting agile principles, by adopting a data mesh, you're able to extend your data analytics practice to a much broader set of stakeholders and to involve them in the process while the work is getting done. One of the greatest things about agile software development, when it became a thing in the early 2000s, was how inclusive it was. And that inclusivity led to a much faster ROI on software projects. And we see the same thing happening in data analytics. People, you know, we have amazing data scientists and data analysts coming up with these insights that could be business changing, that could make their company significantly more resilient, especially in the face of economic uncertainty.
>>But if you have to sit there and argue with your business stakeholders about the validity of the data, about the techniques that were used to do the analysis, and it takes you three months to get people to trust what you've done, that opportunity's passed. So how do we shorten those cycles? How do we bring them closer? And that's really a huge benefit that Prologis has realized: just tightening that cycle time, building trust, building inclusion. And ultimately humans learn by doing, and if you can be inclusive, it even increases things that we all want to help, 'cause Lord knows the world needs it, things like data literacy. Yeah. Right. >>So data.world can inform me as to where on the spectrum of data quality my data set lives. So I can say, okay, this is usable, shareable, you know, exactly, gold standard, versus fix this. Right. Okay. Yep. >>Yep. >>That's, yeah. Okay. And you could do that with one data catalog, not a bunch of... >>Yeah. And trust is really a multifaceted and multi-angle idea, right? It's not just necessarily data quality or data observability. And we have incredible partnerships in that space, like our partnership with Monte Carlo, where we can ingest all their, like, amazing observability information and display that in a really consumable way in our data catalog. But it also includes things like the lineage: who touched it, who is involved in the process? Can I get a question answered quickly about this data? What's it been used for previously? And do I understand it? It's so multifaceted that you have to be able to really model and present that in a way that's unique to any given organization, even unique within domains within a single organization. >>That's not meant to suggest you're a data quality supplier. No, no. Absolutely. But you partner with them, and then you become the master catalog. >>That's brilliant.
I love it. Exactly. And you're... >>You just raised your Series C, 15 million. >>We did. Yeah. So, you know, really lucky to have incredible investors like Goldman Sachs, who led our Series C. It really, I think, communicates the trust that they have in our vision and what we're doing, and the impact that we can have on organizations' ability to be agile and resilient around data analytics. >>Enabling customers to have that single source of truth is so critical. You talked about trust. That is absolutely, it's no joke. >>Absolutely. >>That is critical. And there's a tremendous amount of business impact, positive business impact, that can come from that. What are some of the things that are next for data.world that we're gonna see? >>Oh, you know, I love this. We have such an incredibly innovative team that's so dedicated to this space and the mission of what we're doing. We're out there trying to fundamentally change how people get data analytics work done together. One of the big reasons I founded the company is I really truly believe that data analytics needs to be a team sport. It needs to go from, you know, single-player mode to team mode, and everything that we've worked on in the last six years has leaned into that. Our architecture being cloud native, we've done over a thousand releases a year that nobody has to manage. You don't have to worry about upgrading your environment. It's a lot of the same story that's made Snowflake so great. We are really excited to have announced in March at our own summit, and we're rolling this suite of features out over the course of the year, a new package of features that we call data.world Eureka, which is a suite of automations and, you know, knowledge-driven functionality that really helps you leverage a knowledge graph to make decisions faster and to operationalize your data in the data ops way with significantly less effort. >>Big, big impact there.
Jon, thank you so much for joining Dave and me, unpacking what data.world is doing, the data mesh, the opportunities that you're giving to customers in every industry. We appreciate your time, and congratulations on the news and the funding. >>Ah, thank you. It's been a true pleasure. Thank you for having me on, and I hope you guys enjoy the rest of the day and your other guests that you have. Thank you. >>We will. All right. For our guest and Dave Vellante, I'm Lisa Martin. You're watching theCUBE's third day of coverage of Snowflake Summit 22, live from Vegas. Dave and I will be right back with our next guest, so stick around.
ENTITIES
Entity | Category | Confidence |
---|---|---|
David | PERSON | 0.99+ |
Lisa Martin | PERSON | 0.99+ |
Dave Valante | PERSON | 0.99+ |
Dave | PERSON | 0.99+ |
John | PERSON | 0.99+ |
Jon Loyens | PERSON | 0.99+ |
Monte Carlo | ORGANIZATION | 0.99+ |
John loins | PERSON | 0.99+ |
Amazon | ORGANIZATION | 0.99+ |
March | DATE | 0.99+ |
Las Vegas | LOCATION | 0.99+ |
Vegas | LOCATION | 0.99+ |
Goldman Sachs | ORGANIZATION | 0.99+ |
yesterday | DATE | 0.99+ |
three months | QUANTITY | 0.99+ |
AWS | ORGANIZATION | 0.99+ |
one platform | QUANTITY | 0.99+ |
one day | QUANTITY | 0.99+ |
third | QUANTITY | 0.99+ |
two problems | QUANTITY | 0.99+ |
fourth | QUANTITY | 0.99+ |
One | QUANTITY | 0.99+ |
2018 | DATE | 0.99+ |
15 million | QUANTITY | 0.98+ |
Dani | PERSON | 0.98+ |
second | QUANTITY | 0.98+ |
first | QUANTITY | 0.98+ |
third day | QUANTITY | 0.98+ |
first one | QUANTITY | 0.98+ |
Snowflake | ORGANIZATION | 0.98+ |
DCA | ORGANIZATION | 0.98+ |
one last question | QUANTITY | 0.98+ |
data.world. | ORGANIZATION | 0.97+ |
Prologis | ORGANIZATION | 0.97+ |
JPMC | ORGANIZATION | 0.97+ |
each domain | QUANTITY | 0.97+ |
today this week | DATE | 0.97+ |
Jamma | PERSON | 0.97+ |
both | QUANTITY | 0.97+ |
first data catalog | QUANTITY | 0.95+ |
Snowflake Summit 2022 | EVENT | 0.95+ |
each | QUANTITY | 0.94+ |
today | DATE | 0.94+ |
single | QUANTITY | 0.94+ |
data.world | ORGANIZATION | 0.93+ |
day three | QUANTITY | 0.93+ |
one | QUANTITY | 0.93+ |
one instance | QUANTITY | 0.92+ |
over a thousand releases a year | QUANTITY | 0.92+ |
day four | QUANTITY | 0.91+ |
Snowflake | TITLE | 0.91+ |
four | QUANTITY | 0.91+ |
10 11 in the morning | DATE | 0.9+ |
22 | QUANTITY | 0.9+ |
one environment | QUANTITY | 0.9+ |
single organization | QUANTITY | 0.88+ |
four principles | QUANTITY | 0.86+ |
agile | TITLE | 0.85+ |
last six years | DATE | 0.84+ |
one data catalog | QUANTITY | 0.84+ |
Eureka | ORGANIZATION | 0.83+ |
Azure GCP | TITLE | 0.82+ |
Caesar | PERSON | 0.82+ |
series C | OTHER | 0.8+ |
Cube | ORGANIZATION | 0.8+ |
data.world | OTHER | 0.78+ |
Lord | PERSON | 0.75+ |
thousands | QUANTITY | 0.74+ |
single source | QUANTITY | 0.74+ |
Dublin | ORGANIZATION | 0.73+ |
snowflake summit 22 | EVENT | 0.7+ |
Wiki | TITLE | 0.68+ |
schema.org | ORGANIZATION | 0.67+ |
early two | DATE | 0.63+ |
CDO | TITLE | 0.48+ |
Rik Tamm Daniels, Informatica & Peter Ku, Informatica | Snowflake Summit 2022
>>Hey everyone. Welcome back to theCUBE. Lisa Martin here with Dave Vellante. We're covering Snowflake Summit 22. This is day two of our wall-to-wall CUBE coverage of three days. We've been talking with a lot of customers and partners, and we've got some more partners to talk with us next: Informatica. Two of our guests are back with us on the program. Rik Tamm-Daniels joins us, the GVP of global ecosystems and technology at Informatica, and Peter Ku, vice president and chief strategist, banking and financial services. Welcome, guys. >>Thank you guys. Thanks for having us. >>Peter, talk to us about what some of the trends are that you're seeing in the financial services space with respect to cloud and data and AI. >>Absolutely. You know, I'd say 10 years ago, the conversation around cloud was, what is that? Right? How do we actually, or, no way, because there were a lot of concerns about privacy and security and so forth. You know, now, as you see organizations modernizing their business capabilities, they're investing in cloud solutions for analytics applications, as well as data. Data is not only just a byproduct of transactions and interactions in financial services; it truly fuels business success. But we have a term here at Informatica, where data really has no value unless it's fit for business use. Data has to be accessible in the systems and applications you use to run your business. It has to be clean. It has to be valid. It has to be transparent. People need to understand where it comes from, where it's going, how it's used, and who's using it. It also has to be understood by the business. >>You can have all the data in the world in your business applications, but if people don't know what they need to use it for or how they should use it, it has no value as well. 
And then lastly, it has to be protected when it matters most. What we're seeing across financial services is that, with the evolution of cloud now really being the center of focus for many of the net new investments, data is scattered everywhere, not just in one cloud environment but in multiple cloud environments, and they're still dealing with many of the on-premises systems that have been running this industry for many, many years. So organizations need to have the ability to understand what they need to do with their data and, more importantly, tie that to a measurable business outcome. So we're seeing the data conversation really at the board level, right? It's an asset of the business. It's no longer just owned by IT. Data governance brings business, technology, and data leaders together to really understand how we use, manage, govern, and really leverage data for positive business outcomes. So we see that as an imperative that cuts across all sectors of financial services, both for large firms as well as for the mid-market, so >>Quick follow-up, if I may. You say it's board level. I totally agree. Is it also a line-of-business level? Are you seeing increasingly that lines of business are leaning in, owning the data, building data products and the like? >>Absolutely. Because at the end of the day, business needs information in order to be successful. And data ownership now really belongs in the front office. Business executives understand that data, again, is not just a bunch of zeros and ones. These are critical elements for them to make decisions and to run their business, whether it's to improve customer experience, whether it's to grow wallet share, whether it's to comply with regulations, manage risks in today's environment, and of course be agile. Business knows that data's important. They have ownership of it, and technology and data organizations help facilitate those solutions. 
And, of course, the investments to ensure that business can make the decisions and take the appropriate actions. >>A lot of asks and requirements on data. That's a big challenge for organizations. You mentioned, well, one of the things that we've mentioned many times on this program recently is that every company has to be a data company. It's not an option anymore if you wanna be successful. How does Informatica help customers navigate all of the requirements on data for them to be able to extract that business value and create new products and services in a timely fashion? >>So Informatica announced what we call the Intelligent Data Management Cloud platform. The platform has capabilities to help organizations access the data that they need, share it across the applications that run their business, and be able to identify and deal with data quality issues and requirements, being able to provide that transparency, the lineage that people need across multiple environments. So we've been investing in this platform that really allows our customers to address these critical data management, data governance, and data privacy requirements, all in one single solution. So we're no longer out there just selling piecemeal products. The platform is the offering that we provide across all industries. >>So how has that affected the way Informatica does business over the last several years? Snowflake is relatively new. You guys have been around a long time. How has your business evolved and, specifically, how are you serving the Snowflake joint customers with Informatica? >>Yeah. I think when I've been talking with folks here at the event, there are two big areas that keep coming up. So, data governance, data governance, data governance, right? It's such a hot topic out there. And as Peter was mentioning, data governance is a critical enabler of access to data. 
In fact, there was an IDC study from last year that said that, you know, 80, 84% of executives, no surprise, right, wanna have data-driven outcomes, data-driven organizations, but only 30% of practitioners actually use data to make decisions. There's a huge gap there. And really, that's where governance comes in: creating trust around data, and not only creating trust, but delivering data to end users. So that's one big trend. The other one is departmental user adoption. We're seeing a huge push towards agility and rapid startup of new projects, new data-driven transformations that are happening at the departmental level, you know, individual contributors, that sort of thing. So at Informatica, we made an announcement yesterday with Snowflake of a whole host of innovations that are really targeting those two big trend areas. >>I wanna get into the announcements, but, you know, the point about governance and business users being reluctant, it's kind of chicken and egg, isn't it? If I don't have the governance, I'm afraid to use it. But even if I do have it, the architecture of my company, my data organization, you know, may not facilitate that. And so I'm gonna change the architecture, but then it's a wild west. So it has to be governed. Isn't that a challenge for companies? >>Absolutely. And governance is a lot more than just technology, right? It's a people and process problem. And there really is a community or an ecosystem inside every organization for governance. So it's really important, when you think about deploying governance and being successful, that every stakeholder has the ability to interact with this common framework, right? They get what they need out of it. It's tailored for how they wanna work. You've got your IT folks, you've got your chief data officer, data stewards, you have your privacy folks, and you have your business users. They're all different personas. 
So we really focus on creating a holistic, single-pane-of-glass view with our cloud data governance and catalog offering that really takes it all the way from the raw technical data and actually delivers data in a shopping-cart-like experience for actual enterprise users. Right? And so I think that's where data governance changes. Historically, data governance was seen as an impediment. It was seen as a tax, I think. But now it's really an accelerator, an enabler, driving consumption of data, which in turn, for our friends here at Snowflake, is exactly what they're looking for. >>Talk about the news. So, Data Loader, what does that do? >>Well, it's all in the name. The Data Loader, it's a free utility that we announced here at Snowflake Summit that allows any user to sign up. It's completely free, no capacity limits. You just need an email address, and in three simple steps you start rapidly loading data into Snowflake. Right? So that first step is just get data in there. Start working with Snowflake. Informatica is investing in making that easy for every single user out there, and especially those departmental users who wanna get started quickly. >>Yeah. So, I mean, that's a key point, getting data into the Snowflake Data Cloud, right? It's like any cloud, you gotta get data in. How does it work with customers? I mean, you guys are known, you have a long history of, you know, extract, transform, ETL. How does it work in the Snowflake world? Is it different? You remember the Hadoop days? It was ELT, right? How are customers doing that today in this environment? >>Yeah, it's different. I mean, a lot of the same patterns are still in play. A lot more rapid data loading, right, is a key theme. Just get it into Snowflake and then work on the data, transform it inside of Snowflake. So it's a flavor of ELT, right? 
But it's really pushing down to the Snowflake Data Cloud as opposed to Hadoop with Spark or something like that. Right. So that's definitely how customers are using it. And, you know, the majority of our customers with Snowflake are actually using our cloud technology, but we're also helping customers who are on-premises customers automate the migration from our on-premises technology to our cloud-native platform as well. Yeah. >>And I'd say, you know, in addition to that, if you think about building a Snowflake environment, Informatica helps with our Data Loader solution, but that's not enough. Then you need to get value out of your data. So you can put raw data into the Snowflake environment, but then you realize the data's not actually fit for business use. What do we need to do to actually transform it, to clean it, to govern it? And our customers that use Informatica with Snowflake are managing the entire data management and data governance process so that they can allow the business to get value out of the Snowflake investment. >>How quickly can you enable a business to get value from that data, to be able to make business decisions that can transform, right, deliver competitive advantage? >>I think it really depends on the organization, on a case-by-case basis. At the end of the day, you need to understand why you are doing this in the first place, right? What's the business outcome that you're trying to achieve? Next, identify what data elements you actually need to capture, govern, and manage in order to support the decisions and the actions that the business needs to take. If you don't have those things defined, and that's where data governance comes into play, then all you're doing is setting up a technical environment with a bunch of zeros and ones that no one knows what to do with. 
So we talk about data governance more holistically and say you need to align it to your business outcomes, but ensure that you have people, processes, roles and responsibilities, and the underlying technology to not just load data into Snowflake but to leverage it, again, for the business needs across the organization. >>Oh, go ahead, please. >>I just wanted to add to that real quickly. Yeah. One of the things at Informatica we're philosophically focused on is how do you accelerate the entire business of data management? So with our cloud platform, we have what's called our CLAIRE AI engine, right? So we use AI techniques, machine learning recommendations, to accelerate, with the knowledge of the metadata, what's going on in the organization. For example, when we discover data assets, we figure out, is this customer data, is it product data? That dramatically shortens the time to find data assets and deliver them. And so across our whole portfolio, we're taking things that were traditionally months to do, and we're taking 'em down to weeks and days and even hours, right? So that's the whole goal: just accelerate that entire journey and life cycle through cloud-native approaches and AI. Yeah. >>You kind of just answered my question, I think, Rik. So you have this joint value statement together. This is Informatica and Snowflake together: we help customers modernize their data architecture, enable the most critical workloads, provide AI-driven data governance, and accelerate added value with advanced analytics. I mean, you definitely touched on some of those, but kind of unpack the rest of that. What do you mean by modernize? What is their data architecture? What is that? Let's start there. What does that look like? Modernizing a data architecture. Yeah. >>So, with so many customers, right, they built data warehouses, core data and analytics systems, on premises, right? They're using ETL technology, using either warehouse appliances or databases. 
And what they're looking for is they wanna move to a cloud-native model, right, and all the benefits of cloud in terms of TCO, elasticity, instant scale-up, agility, all those benefits. So what we're looking to do with our modernization programs, for our current customer base that is on premises, is automate the process to get them to fully cloud native, which means they can now do hybrid, they can do multi-cloud, elastic processing. And it's all in a consumption-based model that we introduced about a year and a half ago. So they're looking for all those elements of a cloud-native platform, but they're solving the same problems, right? We still have to connect data. We still have to transform data, prepare it, cleanse it. All those things exist, but in a cloud-native footprint, and that's what we're helping them get to. >>And the modern architecture these days, quite honestly, is no longer about getting best-of-breed tools and stitching them together and hoping that it will actually work. And Informatica's value proposition is that our platform has all those capabilities as services. So our customers don't have to deal with the costs and the risks of trying to make everything work behind the scenes. And with what we've done with IDMC, or Intelligent Data Management Cloud, for financial services, retail, CPG, and healthcare and life sciences, in addition to our core capabilities and our CLAIRE AI machine learning engine, we also have industry accelerators: prebuilt data quality rules for certain regulations within banking. We've got master data management customer models for the healthcare insurance industry, all prebuilt. So these are accelerators that we've actually built over the years, and we're now making them available to our customers who adopt Informatica's Intelligent Data Management Cloud for their data management and governance needs. 
>>And then the other part of this statement that's interesting is "provide AI-driven data governance." You know, we are seeing a move toward, you know, decentralized data architectures and organizations. And we talk to Snowflake about that. They go, yeah, we're a globally distributed cloud. Okay, great. So that's decentralized. But what we see a lot of customers doing is to say, okay, we're gonna give lines of business responsibility for data. We're gonna argue about who owns what. And then once we settle that, here's your own data lake. Maybe they'll try to cobble together a catalog or a super catalog. Right. And then they'll try to figure out, you know, some algorithms to determine data quality: you know, best, okay, don't use. Right? So if I understand it, you automate all that. >>So what we're doing with AI and machine learning is really helping the data professional, whether in the business, in technology, or in between, not only to get the job done faster, better, and cheaper, but actually do it intelligently. What do we mean by that? For example, our AI engine and machine learning will look at data patterns and determine not only what's wrong with your data, but how you should fix it, and recommend data quality rules to actually apply and get those errors addressed. We also infer data relationships across a multi-cloud environment where those definitions were never there in the beginning. So we have the ability to scan the metadata and determine, hey, this data set is actually related to that data set across multiple clouds. It makes the organization more productive, but more importantly, it increases the confidence level that these organizations have the right infrastructure in place in order to manage and govern their data for what they're trying to do from a business perspective. >>And I'd add to that as well. I think you're talking a lot about data mesh architectures, right? 
Those are really kind of popular right now. And I think they live or die on data governance, right? If you don't have data governance, a shared taxonomy, these things, it's very hard to, I think, scale those individual working groups. But if you have a platform where the data owners can publish out visibility into what their data means, how to use it, how to interpret it, and get that insight, that context, directly to the data consumers, that's game changing. Right. And that's exactly what we're doing with our cloud data governance and catalog. >>Well, you talk about data mesh, there's four principles, right? It's like decentralized architecture, data products. So once you figure out those two, yep, you just created two more problems, which are the other two parts of the four principles: self-service infrastructure and computational governance. And that's like the hardest part, federated computational governance. That's the hardest part. That's the problem that you're solving. >>Yeah. Yeah, absolutely. I mean, think about the whole decentralization and self-service. Well, I may be able to access my data in a mesh architecture, but if I don't know what it means, how to use it, for what purpose, when not to use it, you're creating more problems than what you originally expected to solve. So what we're doing is addressing the data management and the governance requirements, regardless of what the architecture is, whether it's a mesh architecture, a fabric architecture, or a traditional data lake or a data store. >>Yeah. I mean, I say, I think data mesh is more of an organizational construct than it is. I'm not quite sure what data fabric is. I think Gartner confused the issue. Data fabric was an old NetApp term. Yeah. You were probably working at NetApp at the time, and it made sense in the NetApp context. 
And then I think Gartner didn't like the fact that Zhamak Dehghani co-opted this cool term. So they created data fabric, but whatever. But my point being, I think when I talk to customers, they're trying to get more value outta data, and they recognize that going through all these hyper-specialized roles is time consuming and it's not working for them. And they're frustrated, to your points and your joint statement. They want to accelerate that. And they're realizing the only way to do that is to distribute responsibility, get more people involved in the process. >>And that kind of dovetails with some of the announcements we made on data governance for Snowflake, right? You're taking these operational controls at the Snowflake layer that are typically managed by SQL, and that decentralized-architecture data owner doesn't know how to set those patterns and things like that. Right. So we're saying, all right, we're creating these deep integrations so that, again, we have a fit-for-persona type of experience where they can publish data assets, they can set the rules and policies, and we're gonna push that down to Snowflake. So when it actually comes to provisioning data and doing data sharing through Snowflake, it's all a seamless experience for the end user and the data owner. Yeah. >>That's great. Beautiful. >>Seamless experience, absolutely necessary these days for everybody. Guys, thanks so much for joining Dave and me today, talking about Informatica: what's new, what you're doing with Snowflake, and what you're enabling customers to do in terms of really extracting value from that data. We appreciate your insights. >>Thank you. Yep. >>Thank you for having us. >>For our guests and Dave Vellante, I'm Lisa Martin. You're watching theCUBE's coverage of Snowflake Summit, day two of theCUBE's coverage. Stick around. Dave and I will be right back with our next guest.
ENTITIES
Entity | Category | Confidence |
---|---|---|
David | PERSON | 0.99+ |
Lisa Martin | PERSON | 0.99+ |
Rick | PERSON | 0.99+ |
Peter | PERSON | 0.99+ |
Informatica | ORGANIZATION | 0.99+ |
Dave | PERSON | 0.99+ |
Gartner | ORGANIZATION | 0.99+ |
three days | QUANTITY | 0.99+ |
Rik Tamm Daniels | PERSON | 0.99+ |
two parts | QUANTITY | 0.99+ |
Peter Ku | PERSON | 0.99+ |
two | QUANTITY | 0.99+ |
last year | DATE | 0.99+ |
30% | QUANTITY | 0.99+ |
both | QUANTITY | 0.99+ |
yesterday | DATE | 0.99+ |
NetApp | TITLE | 0.99+ |
Rick TA Daniels | PERSON | 0.99+ |
three simple steps | QUANTITY | 0.99+ |
today | DATE | 0.99+ |
first step | QUANTITY | 0.98+ |
10 years ago | DATE | 0.98+ |
first | QUANTITY | 0.98+ |
one | QUANTITY | 0.98+ |
80, 84% | QUANTITY | 0.97+ |
IDMC | ORGANIZATION | 0.97+ |
Snowflake Summit 2022 | EVENT | 0.94+ |
about a year and a half ago | DATE | 0.94+ |
two more problems | QUANTITY | 0.93+ |
Princip four | OTHER | 0.93+ |
four principles | QUANTITY | 0.91+ |
G P | ORGANIZATION | 0.9+ |
two big areas | QUANTITY | 0.89+ |
single pane | QUANTITY | 0.89+ |
one single solution | QUANTITY | 0.87+ |
day two | QUANTITY | 0.87+ |
years | DATE | 0.85+ |
Wallace | PERSON | 0.85+ |
One | QUANTITY | 0.85+ |
one cloud | QUANTITY | 0.83+ |
IDC | ORGANIZATION | 0.83+ |
two of our guests | QUANTITY | 0.8+ |
two big trend areas | QUANTITY | 0.79+ |
Jamma Dani | PERSON | 0.79+ |
Dave ante | PERSON | 0.77+ |
COO | PERSON | 0.77+ |
about | DATE | 0.75+ |
every single user | QUANTITY | 0.71+ |
zeros | QUANTITY | 0.69+ |
SQL | TITLE | 0.68+ |
last | DATE | 0.67+ |
once | QUANTITY | 0.58+ |
Hado | TITLE | 0.52+ |
vice | PERSON | 0.51+ |
ones | QUANTITY | 0.5+ |
summit 22 | LOCATION | 0.44+ |
Hadoop | EVENT | 0.37+ |
Day 1 Keynote Analysis | Snowflake Summit 2022
>>Good morning, live from Las Vegas. Lisa Martin and Dave Vellante here, covering Snowflake Summit 22. Dave, it's great to be here in person. The keynote we just came from was standing room only. In fact, there was overflow. People are excited to be back and to hear from the company in person for the first time since the IPO. >>Lots of stuff, lots of deep technical dives. You know, they took the high end of the pyramid and then dove down deep in the keynotes. It >>Was good. They did. And we've got Doug Henschen with us to break this down in the next eight to 10 minutes, VP and principal analyst at Constellation Research. Doug, welcome to theCUBE. >>Great to be here. >>All right, so guys, I was telling Dave, as we were walking back from the keynote, this was probably the most technical keynote I've seen in a very long time, obviously in person. Let's break down some of the key announcements. What were some of the things, Dave, that stood out to you in what they announced just in the last hour and a half alone? >>Well, you know, we had to leave before they did it, but the Unistore piece was really interesting to me, 'cause, you know, the big criticism is, oh, Snowflake, that doesn't do transaction data. It's just a data warehouse. And now they're sort of reaching out. We're seeing the evolution of the ecosystem. Slootman said it was by design. It was one of the questions I had for them: did this just kind of happen, or is it by design? So that's one of many things that we can unpack. I mean, the security workload, the Apache Iceberg tables, we were just talking about that, which not a lot of hands went up for when they said, who uses Apache Iceberg tables? But a lot of the things they're doing seem, to me anyway, to be trying to counteract the narrative that Databricks has put out there: you guys aren't open, you're a walled garden. And now they're saying, hey, we're as open as anybody. But what are your thoughts, Doug? 
>>Well, that's the Iceberg announcement, and also the announcement of Unistore being able to reach out to any source. You know, I think the big theme here was this contrast you constantly see with Snowflake between their effort to democratize and simplify and disrupt the market by bringing in a great big tent. And you saw that great big tent here today: 7,000 people, 7,000-plus I'm told, 2,000 just three years ago. So this company is growing hugely quickly. >>Unprecedented, everybody. >>Yeah. Fastest company to a billion in revenue, as Frank Slootman said in his keynote today. You know, I think there's that great big tent, and then there's the innovations they're delivering. And a lot of their announcements are way ahead of general availability. A lot of the things they talked about today, Python support and some other aspects, they're just getting into public preview. And many of the things that they're announcing today are in private preview. So it could be six to 12 months before they're generally available. So they're here educating a lot of these customers. What is Iceberg? You know, they're letting them know: hey, we're not just the data warehouse. We're not just letting you migrate your old workloads into the cloud. We're helping you innovate with things like the data marketplace. I see the data marketplace as really crucial to a lot of the announcements they're making today, particularly the native apps. >>You know, what was interesting: Slootman in his keynote said, we don't use the term data mesh, because that has meaning to people. Then the lady from GEICO stood up and said, we're building a data mesh. And when you think about, you know, Zhamak Dehghani's definition of data mesh, Snowflake's actually ticking a lot of boxes. I mean, is it a decentralized architecture?
You could argue that it's sort of their own walled garden, but things like data as product, we heard about building data products, self-serve infrastructure, computational governance, automated governance. So those are all principles of Zhamak's data mesh. So they're as close as anybody that I've seen, with the exception that it's all in the data cloud. >>Why do you think he was very particular in saying we're not gonna call it a data mesh? >>I think he's respecting the principles that have been put forth by the data mesh community generally, and specifically Zhamak Dehghani. And they don't want to, you know, data-mesh-wash. I think that's a good call. >>Yeah, it's a little bit out there, and they didn't talk about data mesh so much as GEICO, the keynoter, mentioned they're building one. So again, they have this mix of the great big tent of customers and then very forward-looking, very sophisticated customers. And that's who they're speaking to with some of these announcements, like the native apps and Unistore, to bring transactional data, bring more data in, and innovate, create new apps. And the key to the apps is that they're made available through the marketplace. Things like data sharing, that's pretty simple. A lot of their competitors are talking about, hey, we can data share, but they don't have the things that make it easy, like the way to distribute the data, the way to monetize the data. So now they're looking forward to monetizing apps. They changed the name from the Data Marketplace to the Snowflake Marketplace. So it'll be apps, it will be data, it'll be all sorts of innovative products. >>We talk about GEICO; JPMC is speaking at this conference, the lead technical person of their data mesh initiative. So these are some of the customers that they're putting forth. So it's kind of interesting.
And then Doug, something else that you and I have talked about on some of the panels that we've done: you've got an application development stack, you've got the database over there, and then you have the data analytics stack. And I've said, well, those things come together. Then people have said, yeah, they have to. And this is what Snowflake seems to be driving towards. >>Well, with Unistore, they're reaching out and trying to bring transactional data in, right? Hey, don't limit this to analytical information. And there's other ways to do that, like CDC and streaming, but they're very closely tying that again to that marketplace, with the idea of: bring your data over here and you can monetize it. Don't just leave it in that transactional database. So, another reach to a broader play across a big community that they're building. >>Different than what we saw last week at Mongo, different than what, you know, Oracle does with HeatWave. A lot of ways to skin a cat. >>That was gonna be my next question to both of you: talk to me about all the announcements that we saw. And like we said, we didn't actually get to see the entire keynote, we had to come back here. Where are they from a differentiation perspective in terms of the competitive market? You mentioned, Doug, a lot of the announcements are in either private preview or soon-to-be public preview. Talk to me about your thoughts on where they are from a competitive standpoint. >>Again, it's that dichotomy with their very forward-looking announcements. They're just coming on with things like Python support. That's just becoming generally available. They're just introducing machine learning algorithms, like time series, built into the database.
So in some ways they're catching up while painting this vision of future capabilities and talking about things that are in development or in private preview that won't be here for a year or two. So they're out there talking about a bleeding-edge story, yet the reality is the products sometimes are lagging behind. >>Yeah, it's interesting. I mean, a lot of companies choose not to announce anything until it's ready to ship. Typically, announcing early is a technique used by the big whales to try to freeze the market, but I think it's different here. And the strategy is to educate customers on what's possible, because Snowflake really is trying to differentiate: hey, we're not just a data warehouse. We have a highly differentiable strategy, whether it's from Oracle or, certainly, you know, Mongo is more transactional, but whether it's Couchbase or Redis or all the other databases out there, they're saying, we're not a database, we're a data cloud. (laughs) Right. Okay, what is that? Well, look at all the things that you can do with the data cloud. But to me, the most interesting is you can actually build data products and you can monetize that. And the emphasis on ecosystem: you look at Slootman's previous company, ServiceNow, it took a long time for them to build an ecosystem. It was a lot of SIs, smaller SIs, and they finally kind of took off. But this is exceeding my expectations, and ecosystem is critical because they can't do it all. You know, otherwise they're gonna spread themselves too thin. >>That's what I think some competitors just don't get about Snowflake. They don't get that it's all about the community, about the network that they're building and the relationships between these customers, and that they're facilitating that with distribution, with monetization, things that are hard.
So you can't just add sharing. Or you can share data from one of their legacy competitors in somebody else's marketplace, but that doesn't facilitate the transaction, that doesn't, you know, build on the community. >>Well, and you know, one of the criticisms of Snowflake goes, you know, they can't do complex joins. They don't do workload management. And I think their answer to that is, well, we're gonna look to the ecosystem to do that. Or, you saw some kind of cost governance today in the keynote: we're gonna help you optimize your spend. A little different than workload management, but related. >>Part of their governance was having a node for every workload. So workload isolation in that way, but that led to the cost problems, you know, like too many nodes with not enough optimization. So here too, you saw a lot of announcements around cost controls, budgets, new features, user groups, so that you could bring caps and guardrails around those costs. >>In the last couple of minutes, guys, talk about their momentum. Frank Slootman showed a slide today that showed over 5,900 customers. I was looking at some stats in the last couple of days that showed that there is an over 1,200% increase in the number of customers with a million-plus ARR. Talk about their momentum and what you expect to see here. A lot of people are here, and people are ready to hear what they're doing in person. >>Well, I think the stats say it all: fastest company to a billion in revenue. You see the land-and-expand experience that many companies have, and in the cost control announcements they were making, they showed the typical curve, and he talked about it being a roller coaster, and, we wanna help you level that out. So that's a matter of maturation. That's one of the downsides of this rapid growth.
You know, you have customers adding new users, adding new clusters, multi-clusters, and the costs get out of control. They want to help customers even that out with reporting, with these budget and cost control measures. So, one of the growing pains that comes with adding so many customers so quickly, and those customers adding so many users and new workloads quickly. >>I know we gotta break, but the last point I'll make about the keynote is Slootman alluded to the fact that they're not taking the foot off the gas. They don't see any reason to, despite the narrative in the press. They have inherent profitability. If they want to be more profitable, they could be, but they're going for growth. >>Going for growth. There is so much to unpack in the next three days. You won't wanna miss it. theCUBE's wall-to-wall coverage. Lisa Martin for Dave Vellante; Doug Henschen joined us in our keynote analysis. Thanks so much for watching. Stick around. Our first guest is up in just a few minutes.