Bob Muglia, George Gilbert & Tristan Handy | How Supercloud will Support a new Class of Data Apps
(upbeat music) >> Hello, everybody. This is Dave Vellante. Welcome back to Supercloud2, where we're exploring the intersection of data analytics and the future of cloud. In this segment, we're going to look at how the Supercloud will support a new class of applications, not just work that runs on multiple clouds, but rather a new breed of apps that can orchestrate things in the real world. Think Uber for many types of businesses. These applications, they're not about codifying forms or business processes. They're about orchestrating people, places, and things in a business ecosystem. And I'm pleased to welcome my colleague and friend, George Gilbert, former Gartner Analyst, Wikibon market analyst, former equities analyst as my co-host. And we're thrilled to have Tristan Handy, who's the founder and CEO of DBT Labs, and Bob Muglia, who's the former President of Microsoft's Enterprise business and former CEO of Snowflake. Welcome all, gentlemen. Thank you for coming on the program. >> Good to be here. >> Thanks for having us. >> Hey, look, I'm going to start actually with the SuperCloud because both Tristan and Bob, you've read the definition. Thank you for doing that. And Bob, you have some really good input, some thoughts on maybe some of the drawbacks and how we can advance this. So what are your thoughts in reading that definition around SuperCloud? >> Well, I thought first of all that you did a very good job of laying out all of the characteristics of it and helping to define it overall. But I do think it can be tightened a bit, and I think it's helpful to do it in as short a way as possible. And so in the last day I've spent a little time thinking about how to take it and write a crisp definition. And here's my go at it. This is one day old, so gimme a break if it's going to change. And of course we have to follow the industry, and so that, and whatever the industry decides, but let's give this a try. So in the way I think you're defining it, what I would say is a SuperCloud is a platform that provides programmatically consistent services hosted on heterogeneous cloud providers. >> Boom. Nice. Okay, great. I'm going to go back and read the script on that one and tighten that up a bit. Thank you for spending the time thinking about that. Tristan, would you add anything to that or what are your thoughts on the whole SuperCloud concept? >> So as I read through this, I fully realize that we need a word for this thing because I have experienced the inability to talk about it as well. But for many of us who have been living in the Confluent, Snowflake, you know, this world of like new infrastructure, this seems fairly uncontroversial. Like I read through this, and I'm just like, yeah, this is like the world I've been living in for years now. And I noticed that you called out Snowflake for being an example of this, but I think that there are like many folks, myself included, for whom this world like fully exists today. >> Yeah, I think that's a fair, I dunno if it's criticism, but people observe, well, what's the big deal here? It's just kind of what we're living in today. It reminds me of, you know, Tim Berners-Lee saying, well, this is what the internet was supposed to be. It was supposed to be Web 2.0, so maybe this is what multi-cloud was supposed to be. Let's turn our attention to apps. Bob first and then go to Tristan. Bob, what are data apps to you? When people talk about data products, is that what they mean? Are we talking about something more, different? What are data apps to you?
>> Well, to understand data apps, it's useful to contrast them to something, and I just use the simple term people apps. I know that's a little bit awkward, but it's clear. And almost everything we work with, almost every application that we're familiar with, be it email or Salesforce or any consumer app, those are applications that are targeted at responding to people. You know, in contrast, a data application reacts to changes in data and uses some set of analytic services to autonomously take action. So where applications that we're familiar with respond to people, data apps respond to changes in data. And they both do something, but they do it for different reasons. >> Got it. You know, George, you and I were talking about, you know, it comes back to SuperCloud, broad definition, narrow definition. Tristan, how do you see it? Do you see it the same way? Do you have a different take on data apps? >> Oh, geez. This is like a conversation that I don't know has an end. It's like been, I write a substack, and there's like this little community of people who all write substack. We argue with each other about these kinds of things. Like, you know, as many different takes on this question as you can find, but the way that I think about it is that data products are atomic units of functionality that are fundamentally data driven in nature. So a data product can be as simple as an interactive dashboard that is like actually had design thinking put into it and serves a particular user group and has like actually gone through kind of a product development life cycle. And then a data app or data application is a kind of cohesive end-to-end experience that often encompasses like many different data products. So from my perspective there, this is very, very related to the way that these things are produced, the kinds of experiences that they're provided, that like data innovates every product that we've been building in, you know, software engineering for, you know, as long as there have been computers. >> You know, Zhamak Dehghani oftentimes uses the, you know, she doesn't name Spotify, but I think it's Spotify as that kind of example she uses. But I wonder if we can maybe try to take some examples. If you take, like George, if you take a CRM system today, you're inputting leads, you got opportunities, it's driven by humans, they're really inputting the data, and then you got this system that kind of orchestrates the business process, like runs a forecast. But in this data driven future, are we talking about the app itself pulling data in and automatically looking at data from the transaction systems, the call center, the supply chain and then actually building a plan? George, is that how you see it? >> I go back to the example of Uber, may not be the most sophisticated data app that we build now, but it was like one of the first where you do have users interacting with their devices as riders trying to call a car or driver. But the app then looks at the location of all the drivers in proximity, and it matches a driver to a rider. It calculates an ETA to the rider. It calculates an ETA then to the destination, and it calculates a price. Those are all activities that are done sort of autonomously that don't require a human to type something into a form. The application is using changes in data to calculate an analytic product and then to operationalize that, to assign the driver to, you know, calculate a price. Those are, that's an example of what I would think of as a data app.
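A minimal sketch of the pattern described here may help: the trigger is a change in data (a new ride request arriving alongside a stream of driver locations), not a person filling in a form, and the outputs are analytic products (a match, ETAs, a price) that are acted on immediately. Every name, distance formula, and pricing constant below is a made-up simplification for illustration, not anything Uber actually uses.

```python
import math
from dataclasses import dataclass

@dataclass
class Driver:
    driver_id: str
    lat: float
    lon: float

@dataclass
class RideRequest:
    rider_id: str
    lat: float
    lon: float
    dest_lat: float
    dest_lon: float

def distance_km(lat1, lon1, lat2, lon2):
    # Crude equirectangular approximation; good enough for a toy example.
    dx = (lon2 - lon1) * 111.32 * math.cos(math.radians((lat1 + lat2) / 2))
    dy = (lat2 - lat1) * 110.57
    return math.hypot(dx, dy)

def on_ride_requested(request: RideRequest, drivers: list[Driver]) -> dict:
    """React to a change in data (a new request) with no human in the loop:
    match a driver, estimate pickup and trip ETAs, and compute a price."""
    nearest = min(drivers, key=lambda d: distance_km(d.lat, d.lon, request.lat, request.lon))
    pickup_km = distance_km(nearest.lat, nearest.lon, request.lat, request.lon)
    trip_km = distance_km(request.lat, request.lon, request.dest_lat, request.dest_lon)
    avg_speed_kmh = 30.0              # hypothetical city average speed
    price = 2.50 + 1.20 * trip_km     # hypothetical base fare plus per-km rate
    return {
        "rider_id": request.rider_id,
        "driver_id": nearest.driver_id,
        "pickup_eta_min": round(60 * pickup_km / avg_speed_kmh, 1),
        "trip_eta_min": round(60 * trip_km / avg_speed_kmh, 1),
        "price": round(price, 2),
    }

if __name__ == "__main__":
    drivers = [Driver("d1", 40.75, -73.99), Driver("d2", 40.70, -74.01)]
    request = RideRequest("r42", 40.74, -73.98, 40.78, -73.96)
    print(on_ride_requested(request, drivers))
```

The arithmetic is beside the point; what makes this a data app in the sense described above is that nothing in the loop waits on human input.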
And my question then I guess for Tristan is if we don't have all the pieces in place for sort of mainstream companies to build those sorts of apps easily yet, like how would we get started? What's the role of a semantic layer in making that easier for mainstream companies to build? And how do we get started, you know, say with metrics? How does that, how does that take us down that path? >> So what we've seen in the past, I dunno, decade or so, is that one of the most successful business models in infrastructure is taking hard things and rolling 'em up behind APIs. You take messaging, you take payments, and you all of a sudden increase the capability of kind of your median application developer. And you say, you know, previously you were spending all your time being focused on how do you accept credit cards, how do you send SMS payments, and now you can focus on your business logic, and just create the thing. One of, interestingly, one of the things that we still don't know how to API-ify is concepts that live inside of your data warehouse, inside of your data lake. These are core concepts that, you know, you would imagine that the business would be able to create applications around very easily, but in fact that's not the case. It's actually quite challenging to, and involves a lot of data engineering pipeline and all this work to make these available. And so if you really want to make it very easy to create some of these data experiences for users, you need to have an ability to describe these metrics and then to turn them into APIs to make them accessible to application developers who have literally no idea how they're calculated behind the scenes, and they don't need to. >> So how rich can that API layer grow if you start with metric definitions that you've defined? And DBT has, you know, the metric, the dimensions, the time grain, things like that, that's a well scoped sort of API that people can work within. How much can you extend that to say non-calculated business rules or governance information like data reliability rules, things like that, or even, you know, features for an AIML feature store. In other words, it starts, you started pragmatically, but how far can you grow? >> Bob is waiting with bated breath to answer this question. I'm, just really quickly, I think that we as a company and DBT as a product tend to be very pragmatic. We try to release the simplest possible version of a thing, get it out there, and see if people use it. But the idea that, the concept of a metric is really just a first landing pad. The really, there is a physical manifestation of the data and then there's a logical manifestation of the data. And what we're trying to do here is make it very easy to access the logical manifestation of the data, and metric is a way to look at that. Maybe an entity, a customer, a user is another way to look at that. And I'm sure that there will be more kind of logical structures as well. >> So, Bob, chime in on this. You know, what's your thoughts on the right architecture behind this, and how do we get there? >> Yeah, well first of all, I think one of the ways we get there is by what companies like DBT Labs and Tristan is doing, which is incrementally taking and building on the modern data stack and extending that to add a semantic layer that describes the data. Now the way I tend to think about this is a fairly major shift in the way we think about writing applications, which is today a code first approach to moving to a world that is model driven. 
And I think that's what the big change will be is that where today we think about data, we think about writing code, and we use that to produce APIs as Tristan said, which encapsulates those things together in some form of services that are useful for organizations. And that idea of that encapsulation is never going to go away. It's very, that concept of an API is incredibly useful and will exist well into the future. But what I think will happen is that in the next 10 years, we're going to move to a world where organizations are defining models first of their data, but then ultimately of their business process, their entire business process. Now the concept of a model driven world is a very old concept. I mean, I first started thinking about this and playing around with some early model driven tools, probably before Tristan was born in the early 1980s. And those tools didn't work because the semantics associated with executing the model were too complex to be written in anything other than a procedural language. We're now reaching a time where that is changing, and you see it everywhere. You see it first of all in the world of machine learning and machine learning models, which are taking over more and more of what applications are doing. And I think that's an incredibly important step. And learned models are an important part of what people will do. But if you look at the world today, I will claim that we've always been modeling. Modeling has existed in computers since there have been integrated circuits and any form of computers. But what we do is what I would call implicit modeling, which means that it's the model is written on a whiteboard. It's in a bunch of Slack messages. It's on a set of napkins in conversations that happen and during Zoom. That's where the model gets defined today. It's implicit. There is one in the system. It is hard coded inside application logic that exists across many applications with humans being the glue that connects those models together. And really there is no central place you can go to understand the full attributes of the business, all of the business rules, all of the business logic, the business data. That's going to change in the next 10 years. And we'll start to have a world where we can define models about what we're doing. Now in the short run, the most important models to build are data models and to describe all of the attributes of the data and their relationships. And that's work that DBT Labs is doing. A number of other companies are doing that. We're taking steps along that way with catalogs. People are trying to build more complete ontologies associated with that. The underlying infrastructure is still super, super nascent. But what I think we'll see is this infrastructure that exists today that's building learned models in the form of machine learning programs. You know, some of these incredible machine learning programs in foundation models like GPT and DALL-E and all of the things that are happening in these global scale models, but also all of that needs to get applied to the domains that are appropriate for a business. And I think we'll see the infrastructure developing for that, that can take this concept of learned models and put it together with more explicitly defined models. And this is where the concept of knowledge graphs come in and then the technology that underlies that to actually implement and execute that, which I believe are relational knowledge graphs. >> Oh, oh wow. There's a lot to unpack there. 
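Tristan's point about putting metrics behind APIs and Bob's point about moving from implicit to explicit models are the same move at different scales: the definition of a business concept becomes data that lives in one place, and applications consume it through an interface without knowing how it is calculated. The sketch below is a hypothetical simplification in Python, not dbt's actual metric spec or semantic-layer API (dbt's definitions are written in YAML and served through its own query interfaces); the table, column, and metric names are invented.

```python
from dataclasses import dataclass, field

@dataclass
class Metric:
    """A toy, explicit model of a business metric: the definition lives as data,
    not buried inside any one application's code."""
    name: str
    table: str
    expression: str          # aggregation expression
    time_column: str
    dimensions: list = field(default_factory=list)
    filters: list = field(default_factory=list)

# Declared once, centrally, instead of being re-implemented (slightly differently)
# inside every dashboard and application that needs it.
MONTHLY_REVENUE = Metric(
    name="monthly_revenue",
    table="analytics.orders",
    expression="sum(amount)",
    time_column="ordered_at",
    dimensions=["region", "product_line"],
    filters=["status = 'complete'"],
)

def compile_metric(metric, grain="month", dimensions=None):
    """Turn the declarative definition into SQL. An application developer can call
    this (or an HTTP endpoint wrapping it) without knowing how the metric is computed."""
    dims = [d for d in (dimensions or []) if d in metric.dimensions]
    select_dims = ", ".join(dims + [f"date_trunc('{grain}', {metric.time_column}) as period"])
    where = " and ".join(metric.filters) or "true"
    group_by = ", ".join(dims + ["period"])
    return (
        f"select {select_dims}, {metric.expression} as {metric.name}\n"
        f"from {metric.table}\nwhere {where}\ngroup by {group_by}"
    )

print(compile_metric(MONTHLY_REVENUE, grain="month", dimensions=["region"]))
```

Change the definition once and every caller picks up the new logic, which is exactly the property that hard-coding the calculation inside each application destroys.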
>> So let me ask the Columbo question, Tristan, we've been making fun of your youth. We're just, we're just jealous. Columbo, I'll explain it offline maybe. >> I watch Columbo. >> Okay. All right, good. So but today if you think about the application stack and the data stack, which is largely an analytics pipeline, they're separate. Do they, those worlds, do they have to come together in order to achieve Bob's vision? When I talk to practitioners about that, they're like, well, I don't want to complexify the application stack 'cause the data stack today is so, you know, hard to manage. But do those worlds have to come together? And you know, through that model, I guess abstraction or translation that Bob was just describing, how do you guys think about that? Who wants to take that? >> I think it's inevitable that data and AI are going to become closer together. I think that the infrastructure there has been moving in that direction for a long time. Whether you want to use the Lakehouse portmanteau or not. There's also, there's a next generation of data tech that is still in the like early stage of being developed. There's a company that I love that is essentially Cross Cloud Lambda, and it's just a wonderful abstraction for computing. So I think that, you know, people have been predicting that these worlds are going to come together for a while. A16Z wrote a great post on this back in I think 2020, predicting this, and I've been predicting this since 2020. But what's not clear is the timeline, but I think that this is still just as inevitable as it's been. >> Who's that that does Cross Cloud? >> Let me follow up on. >> Who's that, Tristan, that does Cross Cloud Lambda? Can you name names? >> Oh, they're called Modal Labs. >> Modal Labs, yeah, of course. All right, go ahead, George. >> Let me ask about this vision of trying to put the semantics or the code that represents the business with the data. It gets us to a world that's sort of more data centric, where data's not locked inside or behind the APIs of different applications so that we don't have silos. But at the same time, Bob, I've heard you talk about building the semantics gradually on top of, into a knowledge graph that maybe grows out of a data catalog. And the vision of getting to that point, essentially the enterprise's metadata and then the semantics you're going to add onto it are really stored in something that's separate from the underlying operational and analytic data. So at the same time then why couldn't we gradually build semantics beyond the metric definitions that DBT has today? In other words, you build more and more of the semantics in some layer that DBT defines and that sits above the data management layer, but any requests for data have to go through the DBT layer. Is that a workable alternative? Or where, what type of limitations would you face? >> Well, I think that it is the way the world will evolve is to start with the modern data stack and, you know, which is operational applications going through a data pipeline into some form of data lake, data warehouse, the Lakehouse, whatever you want to call it. And then, you know, this wide variety of analytics services that are built together. To the point that Tristan made about machine learning and data coming together, you see that in every major data cloud provider. Snowflake certainly now supports Python and Java. Databricks is of course building their data warehouse.
Certainly Google, Microsoft and Amazon are doing very, very similar things in terms of building complete solutions that bring together an analytics stack that typically supports languages like Python together with the data stack and the data warehouse. I mean, all of those things are going to evolve, and they're not going to go away because that infrastructure is relatively new. It's just being deployed by companies, and it solves the problem of working with petabytes of data if you need to work with petabytes of data, and nothing will do that for a long time. What's missing is a layer that understands and can model the semantics of all of this. And if you need to, if you want to model all, if you want to talk about all the semantics of even data, you need to think about all of the relationships. You need to think about how these things connect together. And unfortunately, there really is no platform today. None of our existing platforms are ultimately sufficient for this. It was interesting, I was just talking to a customer yesterday, you know, a large financial organization that is building out these semantic layers. They're further along than many companies are. And you know, I asked what they're building it on, and you know, it's not surprising they're using a, they're using combinations of some form of search together with, you know, textual based search together with a document oriented database. In this case it was Cosmos. And that really is kind of the state of the art right now. And yet those products were not built for this. They don't really, they can't manage the complicated relationships that are required. They can't issue the queries that are required. And so a new generation of database needs to be developed. And fortunately, you know, that is happening. The world is developing a new set of relational algorithms that will be able to work with hundreds of different relations. If you look at a SQL database like Snowflake or BigQuery, you know, you get tens of different joins coming together, and that query is going to take a really long time. Well, fortunately, technology is evolving, and it's possible with new join algorithms, worst-case optimal join algorithms they're called, where you can join hundreds of different relations together and run semantic queries that you simply couldn't run. Now that technology is nascent, but it's really important, and I think that will be a requirement to have this semantic layer reach its full potential. In the meantime, Tristan can do a lot of great things by building up on what he's got today and solve some problems that are very real. But in the long run I think we'll see a new set of databases to support these models. >> So Tristan, you got to respond to that, right? You got to, so take the example of Snowflake. We know it doesn't deal well with complex joins, but they're, they've got big aspirations. They're building an ecosystem to really solve some of these problems. Tristan, you guys are part of that ecosystem, and others, but please, your thoughts on what Bob just shared. >> Bob, I'm curious if, I would have no idea what you were talking about except that you introduced me to somebody who gave me a demo of a thing and do you not want to go there right now? >> No, I can talk about it. I mean, we can talk about it.
Look, the company I've been working with is Relational AI, and they're doing this work to actually first of all work across the industry with academics and research, you know, across many, many different, over 20 different research institutions across the world to develop this new set of algorithms. They're all fully published, just like SQL, the underlying algorithms that are used by SQL databases are. If you look today, every single SQL database uses a similar set of relational algorithms underneath that. And those algorithms actually go back to system R and what IBM developed in the 1970s. We're just, there's an opportunity for us to build something new that allows you to take, for example, instead of taking data and grouping it together in tables, treat all data as individual relations, you know, a key and a set of values and then be able to perform purely relational operations on it. If you go back to what, to Codd, and what he wrote, he defined two things. He defined a relational calculus and relational algebra. And essentially SQL is a query language that is translated by the query processor into relational algebra. But however, the calculus of SQL is not even close to the full semantics of the relational mathematics. And it's possible to have systems that can do everything and that can store all of the attributes of the data model or ultimately the business model in a form that is much more natural to work with. >> So here's like my short answer to this. I think that we're dealing in different time scales. I think that there is actually a tremendous amount of work to do in the semantic layer using the kind of technology that we have on the ground today. And I think that there's, I don't know, let's say five years of like really solid work that there is to do for the entire industry, if not more. But the wonderful thing about DBT is that it's independent of what the compute substrate is beneath it. And so if we develop new platforms, new capabilities to describe semantic models in more fine grain detail, more procedural, then we're going to support that too. And so I'm excited about all of it. >> Yeah, so interpreting that short answer, you're basically saying, cause Bob was just kind of pointing to you as incremental, but you're saying, yeah, okay, we're applying it for incremental use cases today, but we can accommodate a much broader set of examples in the future. Is that correct, Tristan? >> I think you're using the word incremental as if it's not good, but I think that incremental is great. We have always been about applying incremental improvement on top of what exists today, but allowing practitioners to like use different workflows to actually make use of that technology. So yeah, yeah, we are a very incremental company. We're going to continue being that way. >> Well, I think Bob was using incremental as a pejorative. I mean, I, but to your point, a lot. >> No, I don't think so. I want to stop that. No, I don't think it's pejorative at all. I think incremental, incremental is usually the most successful path. >> Yes, of course. >> In my experience. >> We agree, we agree on that. >> Having tried many, many moonshot things in my Microsoft days, I can tell you that being incremental is a good thing. And I'm a very big believer that that's the way the world's going to go. I just think that there is a need for us to build something new and that ultimately that will be the solution. 
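Two of the ideas in Bob's answer can be grounded with a deliberately tiny example: holding data as individual relations rather than wide tables, and joining many relations at once without materializing a bloated intermediate result. The Python below is only a toy illustration of the classic triangle query; it is not Relational AI's technology, and real worst-case optimal join algorithms (leapfrog triejoin, for example) rely on sorted indexes and careful variable ordering rather than brute-force loops. The relation names and data are invented.

```python
# Data held as individual binary relations (sets of tuples) rather than wide tables.
follows  = {("ann", "bob"), ("bob", "cat"), ("cat", "ann"), ("ann", "dan")}
cites    = {("bob", "cat"), ("cat", "ann"), ("ann", "bob")}
mentions = {("cat", "ann"), ("ann", "bob"), ("bob", "cat")}

def triangles_pairwise(r, s, t):
    """SQL-style plan: join two relations first, then filter with the third.
    The intermediate result (r join s) can grow far beyond the final answer."""
    intermediate = [(a, b, c) for (a, b) in r for (b2, c) in s if b == b2]
    return {(a, b, c) for (a, b, c) in intermediate if (c, a) in t}

def triangles_multiway(r, s, t):
    """Variable-at-a-time search over all three relations at once -- the shape of
    the idea behind worst-case optimal joins: never build a bloated intermediate."""
    out = set()
    for a in {x for (x, _) in r}:                    # bind variable a
        for b in {y for (x, y) in r if x == a}:      # bind b consistent with r(a, b)
            for c in {z for (y, z) in s if y == b}:  # bind c consistent with s(b, c)
                if (c, a) in t:                      # check t(c, a)
                    out.add((a, b, c))
    return out

assert triangles_pairwise(follows, cites, mentions) == triangles_multiway(follows, cites, mentions)
print(triangles_multiway(follows, cites, mentions))
```

On three relations of N tuples each, the pairwise plan can produce an intermediate of size up to roughly N squared, while the triangle output itself is bounded by roughly N to the 1.5; joining variable-at-a-time across all relations is how the newer algorithms avoid paying for that gap.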
Now you can argue whether it's two years, three years, five years, or 10 years, but I'd be shocked if it didn't happen in 10 years. >> Yeah, so we all agree that incremental is less disruptive. Boom, but Tristan, you're, I think I'm inferring that you believe you have the architecture to accommodate Bob's vision, and then Bob, and I'm inferring from Bob's comments that maybe you don't think that's the case, but please. >> No, no, no. I think that, so Bob, let me put words into your mouth and you tell me if you disagree, DBT is completely useless in a world where a large scale cloud data warehouse doesn't exist. We were not able to bring the power of Python to our users until these platforms started supporting Python. Like DBT is a layer on top of large scale computing platforms. And to the extent that those platforms extend their functionality to bring more capabilities, we will also service those capabilities. >> Let me try and bridge the two. >> Yeah, yeah, so Bob, Bob, Bob, do you concur with what Tristan just said? >> Absolutely, I mean there's nothing to argue with in what Tristan just said. >> I wanted. >> And it's what he's doing. It'll continue to, I believe he'll continue to do it, and I think it's a very good thing for the industry. You know, I'm just simply saying that on top of that, I would like to provide Tristan and all of those who are following similar paths to him with a new type of database that can actually solve these problems in a much more architected way. And when I talk about Cosmos with something like Mongo or Cosmos together with Elastic, you're using Elastic as the join engine, okay. That's the purpose of it. It becomes a poor man's join engine. And I kind of go, I know there's a better answer than that. I know there is, but that's kind of where we are state of the art right now. >> George, we got to wrap it. So give us the last word here. Go ahead, George. >> Okay, I just, I think there's a way to tie together what Tristan and Bob are both talking about, and I want them to validate it, which is for five years we're going to be adding or some number of years more and more semantics to the operational and analytic data that we have, starting with metric definitions. My question is for Bob, as DBT accumulates more and more of those semantics for different enterprises, can that layer not run on top of a relational knowledge graph? And what would we lose by not having, by having the knowledge graph store sort of the joins, all the complex relationships among the data, but having the semantics in the DBT layer? >> Well, I think this, okay, I think first of all that DBT will be an environment where many of these semantics are defined. The question we're asking is how are they stored and how are they processed? And what I predict will happen is that over time, as companies like DBT begin to build more and more richness into their semantic layer, they will begin to experience challenges that customers want to run queries, they want to ask questions, they want to use this for things where the underlying infrastructure becomes an obstacle. I mean, this has happened in always in the history, right? I mean, you see major advances in computer science when the data model changes. And I think we're on the verge of a very significant change in the way data is stored and structured, or at least metadata is stored and structured. Again, I'm not saying that anytime in the next 10 years, SQL is going to go away. 
In fact, more SQL will be written in the future than has been written in the past. And those platforms will mature to become the engines, the slicer dicers of data. I mean that's what they are today. They're incredibly powerful at working with large amounts of data, and that infrastructure is maturing very rapidly. What is not maturing is the infrastructure to handle all of the metadata and the semantics that that requires. And that's where I say knowledge graphs are what I believe will be the solution to that. >> But Tristan, bring us home here. It sounds like, let me put pause at this, is that whatever happens in the future, we're going to leverage the vast system that has become cloud that we're talking about a supercloud, sort of where data lives irrespective of physical location. We're going to have to tap that data. It's not necessarily going to be in one place, but give us your final thoughts, please. >> 100% agree. I think that the data is going to live everywhere. It is the responsibility for both the metadata systems and the data processing engines themselves to make sure that we can join data across cloud providers, that we can join data across different physical regions and that we as practitioners are going to kind of start forgetting about details like that. And we're going to start thinking more about how we want to arrange our teams, how does the tooling that we use support our team structures? And that's when data mesh I think really starts to get very, very critical as a concept. >> Guys, great conversation. It was really awesome to have you. I can't thank you enough for spending time with us. Really appreciate it. >> Thanks a lot. >> All right. This is Dave Vellante for George Gilbert, John Furrier, and the entire Cube community. Keep it right there for more content. You're watching SuperCloud2. (upbeat music)
Ash Naseer, Warner Bros. Discovery | Busting Silos With Monocloud
(vibrant electronic music) >> Welcome back to SuperCloud2. You know, this event, and the Super Cloud initiative in general, it's an open industry-wide collaboration. Last August at SuperCloud22, we really honed in on the definition, which of course we've published. And there's this shared doc, which folks are still adding to and refining, in fact, just recently, Dr. Nelu Mihai added some critical points that really advanced some of the community's initial principles, and today at SuperCloud2, we're digging further into the topic with input from real world practitioners, and we're exploring that intersection of data, data mesh, and cloud, and importantly, the realities and challenges of deploying technology to drive new business capability, and I'm pleased to welcome Ash Naseer to the program. He's a Senior Director of Data Engineering at Warner Bros. Discovery. Ash, great to see you again, thanks so much for taking time with us. >> It's great to be back, these conversations are always very fun. >> I was so excited when we met last spring, I guess, so before we get started I wanted to play a clip from that conversation, it was June, it was at the Snowflake Summit in Las Vegas. And it's a comment that you made about your company but also data mesh. Guys, roll the clip. >> Yeah, so, when people think of Warner Bros., you always think of the movie studio. But we're more than that, right, I mean, you think of HBO, you think of TNT, you think of CNN. We have 30 plus brands in our portfolio, and each have their own needs. So the idea of a data mesh really helps us because what we can do is we can federate access across the company, so that CNN can work at their own pace, you know, when there's election season, they can ingest their own data. And they don't have to bump up against, as an example, HBO, if Game of Thrones is goin' on. >> So-- Okay, so that's pretty interesting, so you've got these sort of different groups that have different data requirements inside of your organization. Now data mesh, it's a relatively new concept, so you're kind of ahead of the curve. So Ash, my question is, when you think about getting value from data, and how that's changed over the past decade, you've had pre-Hadoop, Hadoop, what do you see that's changed, now you got the cloud coming in, what's changed? What had to be sort of fixed? What's working now, and where do you see it going? >> Yeah, so I feel like in the last decade, we've gone through quite a maturity curve. I actually like to say that we're in the golden age of data, because the tools and technology in the data space, particularly and then broadly in the cloud, they allow us to do things that we couldn't do way back when, like you suggested, back in the Hadoop era or even before that. So there's certainly a lot of maturity, and a lot of technology that has come about. So in terms of the good, bad, and ugly, so let me kind of start with the good, right? In terms of bringing value from the data, I really feel like we're in this place where the folks that are charged with unlocking that value from the data, they're actually spending the majority of their time actually doing that. And what do I mean by that? If you think about it, 10 years ago, the data scientist was the person that was going to sort of solve all of the data problems in a company. But what happened was, companies asked these data scientists to come in and do a multitude of things. 
And what these data scientists found out was, they were spending most of their time on, really, data wrangling, and less on actually getting the value out of the data. And in the last decade or so, I feel like we've made the shift, and we realize that data engineering, data management, data governance, those are as important practices as data science, which is sort of getting the value out of the data. And so what that has done is, it has freed up the data scientist and the business analyst and the data analyst, and the BI expert, to really focus on how to get value out of the data, and spend less time wrangling data. So I really think that that's the good. In terms of the bad, I feel like, there's a lot of legacy data platforms out there, and I feel like there's going to be a time where we'll be in that hybrid mode. And then the ugly, I feel like, with all the data and all the technology, creates another problem in itself. Because most companies don't have arms around their data, and making sure that they know who's using the data, what they're using it for, and how can the company leverage the collective intelligence. That is a bigger problem to solve today than 10 years ago. And that's where technologies like the data mesh come in. >> Yeah, so when I think of data mesh, and I say, you're an early practitioner of data mesh, you mentioned legacy technology, so the concept of data mesh is inclusive. In theory anyway, you're supposed to be including the legacy technologies. Whether it's a data lake or data warehouse or Oracle or Snowflake or whatever it is. And when you think about Zhamak Dehghani's principles, it's domain-centric ownership, data as product. And that creates challenges around self-serve infrastructure and automated governance, and then when you start to combine these different technologies. You got legacy, you got cloud. Everything's different. And so you have to figure out how to deal with that, so my question is, how have you dealt with that, and what role has the cloud played in solving those problems, in particular, that self-serve infrastructure, and that automated governance, and where are we in terms of solving that problem from a practitioner's standpoint? >> Yeah, I always like to say that data is a team sport, and we should sort of think of it as such, and that's, I feel like, the key of the data mesh concept, is treating it as a team sport. A lot of people ask me, they're like, "Oh hey, Ash, I've heard about this thing called data mesh. Where can I buy one?" or, "What's the technology that I use to get a data mesh?" And the reality is that there isn't one technology, you can't really buy a data mesh. It's really a way of life, it's how organizations decide to approach data, like I said, back to a team sport analogy, making sure that everyone has a seat at the table, making sure that we embrace the fact that we have a lot of data, we have a lot of data problems to solve. And the way we'll be successful is to make everyone inclusive. You know, you think about the old days, Data silos or shadow IT, some might call it. That's been around for decades. And what hasn't changed was this notion that, hey, everything needs to be sort of managed centrally. But with the cloud and with the technologies that we have today, we have the right technology and the tooling to democratize that data, and democratize not only just the access, but also sort of building building blocks and sort of taking building blocks which are relevant to your product or your business.
And adding to the overall data mesh. We've got all that technology. The challenge is for us to really embrace it, and make sure that we implement it from an organizational standpoint. >> So, thinking about super cloud, there's a layer that lives above the clouds and adds value. And you think about your brands you got 30 brands, you mentioned shadow IT. If, let's say, one of those brands, HBO or TNT, whatever. They want to go, "Hey, we really like Google's analytics tools," and they maybe go off and build something, I don't know if that's even allowed, maybe it's not. But then you build this data mesh. My question is around multi-cloud, cross cloud, super cloud if you will. Is that a advantage for you as a practitioner, or does that just make things more complicated? >> I really love the idea of a multi-cloud. I think it's great, I think that it should have been the norm, not the exception, I feel like people talk about it as if it's the exception. That should have been the case. I will say, though, I feel like multi-cloud should evolve organically, so back to your point about some of these different brands, and, you know, different brands or different business units. Or even in a merger and acquisitions situation, where two different companies or multiple different companies come together with different technology stacks. You know, I feel like that's an organic evolution, and making sure that we use the concepts and the technologies around the multi-cloud to bring everyone together. That's where we need to be, and again, it talks to the fact that each of those business units and each of those groups have their own unique needs, and we need to make sure that we embrace that and we enable that, rather than stifling everything. Now where I have a little bit of a challenge with the multi-cloud is when technology leaders try to build it by design. So there's a notion there that, "Hey, you need to sort of diversify "and don't put all your eggs in one basket." And so we need to have this multi-cloud thing. I feel like that is just sort of creating more complexity where it doesn't need to be, we can all sort of simplify our lives, but where it evolves organically, absolutely, I think that's the right way to go. >> But, so Ash, if it evolves organically don't you need some kind of cloud interpreter, to create a common experience across clouds, does that exist today? What are your thoughts on that? >> There is a lot of technology that exists today, and that helps go between these different clouds, a lot of these sort of cloud agnostic technologies that you talked about, the Snowflakes and the Databricks and so forth of the world, they operate in multiple clouds, they operate in multiple regions, within a given cloud and multiple clouds. So they span all of that, and they have the tools and technology, so, I feel like the tooling is there. There does need to be more of an evolution around the tooling and I think the market's need are going to dictate that, I feel like the market is there, they're asking for it, so, there's definitely going to be that evolution, but the technology is there, I think just making sure that we embrace that and we sort of embrace that as a challenge and not try to sort of shut all of that down and box everything into one. >> What's the biggest challenge, is it governance or security? Or is it more like you're saying, adoption, cultural? >> I think it's a combination of cultural as well as governance. 
And so, the cultural side I've talked about, right, just making sure that we give these different teams a seat at the table, and they actually bring that technology into the mix. And we use the modern tools and technologies to make sure that everybody sort of plays nice together. That is definitely, we have ways to go there. But then, in terms of governance, that is another big problem that most companies are just starting to wrestle with. Because like I said, I mean, the data silos and shadow IT, that's been around there, right? The only difference is that we're now sort of bringing everything together in a cloud environment, the collective organization has access to that. And now we just realized, oh we have quite a data problem at our hands, so how do we sort of organize this data, make sure that the quality is there, the trust is there. When people look at that data, a lot of those questions are now coming to the forefront because everything is sort of so transparent with the cloud, right? And so I feel like, again, putting in the right processes, and the right tooling to address that is going to be critical in the next years to come. >> Is sharing data across clouds, something that is valuable to you, or even within a single cloud, being able to share data. And my question is, not just within your organization, but even outside your organization, is that something that has sort of hit your radar or is it mature or is that something that really would add value to your business? >> Data sharing is huge, and again, this is another one of those things which isn't new. You know, I remember back in the '90s, when we had to share data externally, with our partners or our vendors, they used to physically send us stacks of these tapes, or physical media on some truck. And we've evolved since then, right, I mean, it went from that to sharing files online and so forth. But data sharing as a concept and as a concept which is now very frictionless, through these different technologies that we have today, that is very new. And that is something, like I said, it's always been going on. But that needs to be really embraced more as well. We as a company heavily leverage data sharing between our own different brands and business units, that helps us make that data mesh, so that when CNN, as an example, builds their own data model based on election data and the kinds of data that they need, compare that with other data in the rest of the company, sports, entertainment, and so forth and so on. Everyone has their unique data, but that data sharing capability brings it together wherever there is a need. So you think about having a Tiger Woods documentary, as an example, on HBO Max and making sure that you reach the audiences that are interested in golf and interested in sports and so forth, right? That all comes through the magic of data sharing, so, it's really critical, internally, for us. And then externally as well, because just understanding how our products are doing on our partners' networks and different distribution channels, that's important, and then just understanding how our consumers are consuming it off properties, right, I mean, we have brands that transcend just the screen, right? We have a lot of physical merchandise that you can buy in the store. So again, understanding who's buying the Batman action figures after the Batman movie was released, that's another critical insight. So it all gets enabled through data sharing, and something we rely heavily on. 
>> So I wanted to get your perspective on this. So I feel like the nirvana of data mesh is if I want to use Google BigQuery, an Oracle database, or a Microsoft database, or Snowflake, Databricks, Amazon, whatever. That that's a node on the mesh. And in the perfect world, you can share that data, it can be governed, I don't think we're quite there today, so. But within a platform, maybe it's within Google or within Amazon or within Snowflake or Databricks. If you're in that world, maybe even Oracle. You actually can do some levels of data sharing, maybe greater with some than others. Do you mandate as an organization that you have to use this particular data platform, or are you saying "Hey, we are architecting a data mesh for the future "where we believe the technology will support that," or maybe you've invented some technology that supports that today, can you help us understand that? >> Yeah, I always feel like mandate is a strong area, and it breeds the shadow IT and the data silos. So we don't mandate, we do make sure that there's a consistent set of governance rules, policies, and tooling that's there, so that everyone is on the same page. However, at the same time our focus is really operating in a federated way, that's been our solution, right? Is to make sure that we work within a common set of tooling, which may be different technologies, which in some cases may be different clouds. Although we're not that multi-cloud. So what we're trying to do is making sure that everyone who has that technology already built, as long as it sort of follows certain standards, it's modern, it has the capabilities that will eventually allow us to be successful and eventually allow for that data sharing, amongst those different nodes, as you put it. As long as that's the case, and as long as there's a governance layer, a master governance layer, where we know where all that data is and who has access to what and we can sort of be really confident about the quality of the data, as long as that case, our approach to that is really that federated approach. >> Sorry, did I hear you correctly, you're not multi-cloud today? >> Yeah, that's correct. There are certain spots where we use that, but by and large, we rely on a particular cloud, and that's just been, like I said, it's been the evolution, it was our evolution. We decided early on to focus on a single cloud, and that's the direction we've been going in. >> So, do you want to go to a multi-cloud, or, you mentioned organic before, if a business unit wants to go there, as long as they're adhering to those standards that you put out, maybe recommendations, that that's okay? I guess my question is, does that bring benefit to your business that you'd like to tap, or do you feel like it's not necessary? >> I'll go back to the point of, if it happens organically, we're going to be open about it. Obviously we'll have to look at every situations, not all clouds are created equal as well, so there's a number of different considerations. But by and large, when it happens organically, the key is time to value, right? How do you quickly bring those technologies in, as long as you could share the data, they're interconnected, they're secured, they're governed, we are confident on the quality, as long as those principles are met, we could definitely go in that direction. But by and large, we're sort of evolving in a singular direction, but even within a singular cloud, we're a global company. 
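For readers who have not seen it in practice, the frictionless sharing Ash describes is typically a metadata-level grant rather than a copy of the data. The sketch below shows roughly what that looks like using Snowflake's Python connector and secure data sharing; every database, table, share, and account identifier is invented for illustration, and the exact objects, roles, and grants would depend on how a real deployment is governed.

```python
import snowflake.connector

# Provider side: the team that owns election analytics publishes a share that other
# brands' Snowflake accounts can mount read-only. No data is copied or moved.
conn = snowflake.connector.connect(
    account="provider_org-news_account",   # hypothetical account identifier
    user="DATA_PLATFORM_ADMIN",            # hypothetical user
    password="********",
    role="ACCOUNTADMIN",
)
cur = conn.cursor()

statements = [
    "CREATE SHARE ELECTION_METRICS_SHARE",
    "GRANT USAGE ON DATABASE NEWS_ANALYTICS TO SHARE ELECTION_METRICS_SHARE",
    "GRANT USAGE ON SCHEMA NEWS_ANALYTICS.PUBLIC TO SHARE ELECTION_METRICS_SHARE",
    # In practice this would usually be a curated secure view rather than a raw table.
    "GRANT SELECT ON TABLE NEWS_ANALYTICS.PUBLIC.ELECTION_METRICS TO SHARE ELECTION_METRICS_SHARE",
    # Grant the consuming brand's account access to the share.
    "ALTER SHARE ELECTION_METRICS_SHARE ADD ACCOUNTS = provider_org.streaming_brand_account",
]
for stmt in statements:
    cur.execute(stmt)

cur.close()
conn.close()
```

On the consuming side, the other brand would mount the share as a read-only database and query it in place, which is what makes cross-brand analysis feel like querying local data.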
And we have audiences around the world, so making sure that even within a single cloud, those different regions interoperate as one, that's a bigger challenge that we're having to solve as well. >> Last question is kind of to the future of data and cloud and how it's going to evolve, do you see a day when companies like yours are increasingly going to be offering data, their software, services, and becoming more of a technology company, sort of pointing your tooling and your proprietary knowledge at the external world, as an opportunity, as a business opportunity? >> That's a very interesting concept, and I know companies have done that, and some of them have been extremely successful, I mean, Amazon is the biggest example that comes to mind, right-- >> Yeah. >> When they launched AWS, something that they had that expertise they had internally, and they offered it to the world as a product. But by and large, I think it's going to be far and few between, especially, it's going to be focused on companies that have technology as their DNA, or almost like in the technology sector, building technology. Most other companies have different markets that they are addressing. And in my opinion, a lot of these companies, what they're trying to do is really focus on the problems that we can solve for ourselves, I think there are more problems than we have people and expertise. So my guess is that most large companies, they're going to focus on solving their own problems. A few, like I said, more tech-focused companies, that would want to be in that business, would probably branch out, but by and large, I think companies will continue to focus on serving their customers and serving their own business. >> Alright, Ash, we're going to leave it there, Ash Naseer. Thank you so much for your perspectives, it was great to see you, I'm sure we'll see you face-to-face later on this year. >> This is great, thank you for having me. >> Ah, you're welcome, alright. Keep it right there for more great content from SuperCloud2. We'll be right back. (gentle percussive music)