Collibra Day 1: Felix & Zhamak
>>Hi, Felix. Great to be here. >>Likewise. Um, so when I started reading about data mesh, I think about a year ago, I found that the more I read about it, the more I found myself agreeing with the principles behind data mesh. It actually took me back to almost the start of Collibra 13 years ago, based on the research we were doing on semantic technologies — even, personally, my own master's thesis, which was about domain-driven ontologies. And we'll talk about domain-driven thinking, as it's a key principle behind data mesh. But before we get into that, let's not assume that everybody knows what data mesh is about, although we've seen a lot of traction and momentum, which is fantastic to see. Maybe you could start by talking about some of the key principles and give a brief overview of what data mesh is. >>Of course, happy to. So data mesh is a new approach. It's a decentralized approach to managing and accessing data, and particularly analytical data, at scale. We can break that down a little bit. What is analytical data? Well, analytical data is the data that fuels our reporting and business intelligence and, most importantly, machine learning training, right? So it's an aggregate view of the historical events that happen across organizations — many domains within organizations, or even beyond one organization. Um, and today we manage this analytical data through very centralized solutions, whether it's a data lake or a data warehouse or a combination of the two, and, to be honest, we have kind of outsourced the accountability for it to the data team, right? It doesn't happen within the domains. What we have found ourselves with is a centralized bottleneck. >>So as we see growth in the scale of organizations, in terms of the origins of the data and in terms of the great expectations for the data — all of these wonderful use cases that require access to that analytical data — we find ourselves constrained and limited in our agility to respond, you know, because we have a centralized bottleneck, from team to technology to architecture. So data mesh looks at the accidental complexity that we've created in the past and tries to reimagine a different way of managing and accessing data that can truly scale as the origins of the data grow, as they become available within one organization, within one cloud or another. And it lays down an approach based on four principles. Uh, so far I haven't tried to be prescriptive as to exactly how you implement it; I leave that to the imaginations of the implementers. Um, of course I have my opinions, but without being prescriptive, I think there are four shifts that need to happen. One is, uh, we need to start breaking down this complex problem of access to data around boundaries that can allow a scaled-out solution. The boundaries that naturally fit that model are domains, right — our business domains. So the first principle is domain ownership of the data: analytical data will be shared and served, and accounted for, by the domains where it comes from. And then the second dimension of that is, okay, once we break down the ownership of the data based on domains, how can we prevent data siloing? So the second principle is really treating data as a product.
>>So measuring the success of that data based on its accessibility, its usability, and the end-to-end experience of data analysts and data scientists — so we talk about data as a product. And the third principle is that, to really make this feasible, we need to rethink our data platforms, our infrastructure capabilities, and create a new set of capabilities that allow domain teams to own their data and to manage the lifecycle of their analytical data. So self-serve data infrastructure as a platform is the third principle. And the last principle is really around governance. We have to think about governance. In fact, when I first wrote this down, governance was a little concern embedded in some of my text, and then I thought: okay, to make this real, we need to think about security and quality and accessibility of the data at scale, in a fashion that embraces this autonomous domain ownership. So we have to think about how we can make this real with computational governance — how we can make those domains part of the governance, federated governance. So federated computational governance is the fourth principle. In essence, it's an organizational shift, it's an architectural change, and of course technology needs to change, to get us to decentralized access and management of analytical data. >>Yeah, I think that makes a ton of sense. If you want to scale, typically you have to think much more distributed versus centralized, and we've seen that in other practices as well — that domain-driven thinking — I think especially around engineering, right? We've seen a lot of the same principles and best practices used to scale engineering teams without making the same mistakes again. But maybe we can start there, with the core principles around domain-driven thinking. Can you elaborate a little bit on that — why is it so important for data organizations and data functions as well? >>Absolutely. I mean, if you look at organizations, organizations are complex systems, right? They are made of parts, which are basically domains, functions of the business: your order management, your customer management, your sales, your marketing. And the behavior of the organization is the result of an intricate network of dependencies and interactions between these domains. So if we just overlay data on this complex system, it does make sense — to really scale — to bring the ownership of, and, um, really the access to, data right to the domain where it originates, right, to the people who know that data best and are most capable of providing it. So to optimize response to change, to optimize creating new features, new services, new machine learning models, we've got to think about local optimization, but not at the cost of the global good. Uh, so domain ownership really talks about giving autonomy to the domains, and accountability, to provide their data and model that data, um, in a responsible way, and be accountable for its quality. So empower them and localize some of those responsibilities, but at the same time, you know, think about the global good: how does each domain need to be accountable to the other domains on the mesh? That's what the governance piece covers. And that leads to some interesting architectural shifts, because when you think about that decomposition of the data, then you think: okay, if I have a machine learning model that needs, you know, three pieces of data from three different domains, I end up actually distributing the compute back to those domains as well. So the architecture actually starts shifting too. We start with ownership. Yeah.
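To make that last point concrete, here is a minimal, illustrative sketch of what consuming analytical data from several domain-owned data products might look like. Everything here — the in-memory `mesh` registry, the domain identifiers, the `read()` call — is hypothetical, invented for illustration; it is a sketch of the idea, not any particular product's API.

```python
# Hypothetical sketch: a machine learning feature set assembled by reading
# from three independently owned domain data products, rather than from one
# central warehouse. All names here are invented for illustration.

# A tiny in-memory stand-in for a mesh registry of domain-owned data products.
mesh = {
    "orders/monthly-aggregates": [
        {"customer_id": 1, "month": "2021-01", "order_total": 120.0},
    ],
    "customers/profiles": [
        {"customer_id": 1, "segment": "smb"},
    ],
    "payments/chargebacks": [
        {"customer_id": 1, "month": "2021-01", "chargebacks": 0},
    ],
}

def read(product_id):
    """Read a domain data product by its mesh-wide identifier.

    In a real mesh this would be served by the owning domain's own
    infrastructure; the compute is distributed back to the domains.
    """
    return mesh[product_id]

# The consumer (say, a churn model) joins data from three domains locally.
orders = {r["customer_id"]: r for r in read("orders/monthly-aggregates")}
profiles = {r["customer_id"]: r for r in read("customers/profiles")}
risk = {r["customer_id"]: r for r in read("payments/chargebacks")}

features = [
    {**orders[cid], **profiles[cid], **risk[cid]}
    for cid in orders.keys() & profiles.keys() & risk.keys()
]
print(features)
```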
>>No, I think that makes a ton of sense. But I can imagine people thinking: well, if you're organizing according to these domains, aren't we going to create different silos — even more silos? And I think that's where the second principle, thinking of data as a product, comes in, and I think that's incredibly powerful in my mind. It's powerful because it helps us think about usability, it helps us think about the consumer of that data, and really packaging it in the right way. And there's one sentence that I've heard you use that I think is incredibly powerful: it's less collecting, more connecting. Um, can you elaborate on that a little bit? >>Absolutely. I mean, the power and the value of the data is not in what we have collected and stored on disk, right? It's really about connecting that data to other data sets to illuminate new insights — the higher-order information — and connecting that data to the users, right, who want to use it. So if we shift that thinking from just collecting more in one place to the ability to connect data sets, then we arrive at a different solution. So, uh, data as a product, as you said, was exactly a kind of response to the challenges that domain-driven siloing could create. And the idea is that the data that these domains now own needs to be shared — with some accountability and incentive structure — as a product. So if you bring product thinking to data, what does that mean? That means delighting the experience of the users. Who are they? They're the data analysts and data scientists. So, you know, how can we delight the experience of their journey? It starts with a hypothesis: I have a question — do I have the right data to answer this question with a particular model? Let me discover it, let me find it, see if it's useful, see whether I trust it. So really facilitating that journey. I think we have two choices there. The people at the source of that data, who really should be accountable for it, can shrug off the responsibility and say: you know, I dump this data on some event streaming system, and somebody downstream — the governance or data team — will take care of turning it into a usable piece of information. And that's what we have done for, you know, half a century almost. Or, let's say, we bring the intention of providing quality data back to the source, and both empower the folks there and make them accountable for providing that data, right at the source, as a product. And I think by being intentional about that, we're going to remove a lot of the accidental complexity that we have created with, you know, labyrinthine pipelines moving data from one place to another and trying to build quality back into it. Um, and that requires, you know, architectural shifts, organizational shifts, incentive models — the whole package.
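As a concrete illustration of product thinking applied to data, here is a minimal sketch of what a domain team might publish alongside its data so that analysts can discover and evaluate it. The `DataProductDescriptor` structure and its fields are hypothetical — an assumption made for illustration, not a standard.

```python
# Hypothetical sketch: the metadata a domain team might publish with its
# analytical data so consumers can discover, understand, and evaluate it.
# The structure and field names are invented for illustration.
from dataclasses import dataclass, field

@dataclass
class DataProductDescriptor:
    product_id: str   # mesh-wide identifier, e.g. "orders/monthly-aggregates"
    owner: str        # the accountable domain data product owner
    description: str  # what the data means, in the domain's own language
    schema: dict      # column name -> type: the product's contract
    tags: list = field(default_factory=list)  # discovery aids for search

# Published at the source by the owning domain, not by a central data team.
descriptor = DataProductDescriptor(
    product_id="orders/monthly-aggregates",
    owner="order-management-team",
    description="Monthly order totals per customer, aggregated from order events.",
    schema={"customer_id": "int", "month": "str", "order_total": "float"},
    tags=["orders", "aggregates", "monthly"],
)
print(descriptor.product_id, "->", descriptor.owner)
```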
>>Absolutely, and we'll talk about that — federated computational governance is going to be a really important aspect. But the other part of data as a product, next to usability, is trust, right? If you want to use it — why is trust so important if you think about data as a product? >>Well, uh, maybe I'll turn this question back to you: would you buy the shiniest product if you don't trust it, if you don't trust where it comes from, whether you can use it, whether it has integrity? I wouldn't. I think it's almost irresponsible to use data that you can't trust, right? And really, the meaning of trust is: do I know enough about this data for it to be useful for the purpose that I'm using it for? So, um, I think trust is absolutely fundamental — a fundamental characteristic of data as a product. And again, it comes back to bridging the gap between what the data user needs to know to really trust and use that data — to find it, to judge whether it's suitable — and what they know today. So we can bridge that gap by, you know, adding documentation, adding SLOs, adding lineage — all of this additional information — but not only that: also by having people who are accountable for providing that integrity and those SLOs, and guaranteeing them. Those are really the product owners. So I think, for me, trust is a non-negotiable characteristic of a data product, like any other consumer product. >>Exactly. Like you said, if you think about consumer products and consumer marketplaces — Uber, Amazon, Airbnb — you have a simple rating as a very simple way of establishing trust between all those different stakeholders. And then we can also ask: okay, how do we actually get there? I think data mesh also talks a little bit about roles and responsibilities, and I think the overall importance of a data product owner probably aligns with that importance of trust. Yeah? >>Absolutely. I think we can't just wish for these good things to happen without putting the accountability and the right roles in place. And the data product owner is just the starting point for us to stop playing hot potato when it comes to, you know, who owns the data, who will be accountable for it. It's not so much who the actual owner of the data is — the owner of the data is you and me, where the data really comes from — but it's the data product owner who's going to be responsible for the lifecycle of this data: knowing when the data gets changed, as consumers need new information making sure that gets carried out, and maybe one day retiring that data. So that long-term ownership, with an intimate understanding of the needs of the users of that data, as well as of the data itself and the domain itself, and managing the lifecycle of all that — I think that's a necessary role. Um, and then we have to think about why anybody would want to be a data product owner, right? What are the incentives we have to set up in the organization? Um, and it really comes down to, I think, adopting the prior art that exists in the product ownership landscape, bringing it to data, and treating the data users as the customers, right — making them happy. So the incentives and KPIs for these data product owners need to be aligned with the happiness of their data users. >>Yep, I love that. The alignment, again, to the consumer, using things we know from product management — the product owner role — and reusing that for data, I think that makes a ton of sense.
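To illustrate what those published trust signals might look like in practice, here is a minimal sketch of SLOs attached to a data product and a consumer-side check against them. The field names and thresholds are assumptions made up for the example, not any product's real interface.

```python
# Hypothetical sketch: service-level objectives a data product owner might
# publish so consumers can decide whether to trust the data. The fields and
# thresholds are invented for illustration.
from datetime import datetime, timedelta, timezone

slos = {
    "max_staleness_hours": 24,  # data must be fresher than this
    "min_completeness": 0.99,   # fraction of non-null required fields
}

# Metrics the product's own pipeline would publish alongside the data.
observed = {
    "last_updated": datetime.now(timezone.utc) - timedelta(hours=6),
    "completeness": 0.997,
}

def meets_slos(slos, observed):
    """A consumer-side trust check: does the product honor its promises?"""
    age = datetime.now(timezone.utc) - observed["last_updated"]
    fresh_enough = age <= timedelta(hours=slos["max_staleness_hours"])
    complete_enough = observed["completeness"] >= slos["min_completeness"]
    return fresh_enough and complete_enough

print("trustworthy:", meets_slos(slos, observed))
```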
And it's a good leeway to talk a little bit about governance, right? We've mentioned federated governance, computational governance, already, and we see that challenge often with our customers: centralizing versus decentralizing — how do we find the right balance? Can you talk a little bit about that in the context of data mesh? How do we do this? >>Yeah, absolutely. I was hoping to pack three concepts into the title of that governance principle, but I thought that would be quite a mouthful. So, as you mentioned, there's the federated aspect and the computational aspect — and embedded governance, if I could add another phrase there — and really it's about, as we talked about, how to make it happen. So I think the federation matters because the people who are really in a position — the data product owners — to provide data in a trustworthy way, with integrity, and securely, have to have a stake in doing that, right? They have to be accountable, not just for their own little domain or big domain; they also have to have accountability for the mesh. There are concerns that apply to all of the data products: how we secure them, and really secure them consistently; how we model the data, the schema language, the SLO metrics, in a way that allows this data to be interoperable, so we can join multiple data products. So we have to have, I think, a minimum set of policies that we apply globally to all the data products, and then, in a federated fashion, incentivize the data product owners to have a stake in that and make it happen. Because there's always going to be a challenge in prioritizing: would I add another few attributes to my data sets to make my customers happy, or would I adopt that standardized modeling language, right? They have to make that kind of continuous prioritization, and they have to be incentivized to do both. Uh, and then the other piece of it is: okay, if we want to apply these consistent policies across many data products on the mesh, how would that be physically possible? And the only way I can see — and I have seen it done in service mesh — is by embedding those policies, as computation, as code, into every single data product. And how do we do that? Again, the platform has a big part in it: being able to embed policy engines, and whatever those things are, into the data products, and to be able to compute them. So by default, when you create a data product, as part of the scaffolding of that data product you get all of these, um, computational capabilities to configure your policies according to the global policies. >>No, that makes sense. That makes a ton of sense. >>I'm just curious, really — you've been at this for a while, you've built this system over 13 years, coming from an academic background. And, to be honest, we run into your products with lots of our clients; there's always a chat conversation within ThoughtWorks — "do you guys know about this product?" — and so on. So I'm curious: how do you think data governance technology needs to shift with data mesh? And if I may ask, how would your roadmap change with data mesh?
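Here is a minimal sketch of what "policies embedded as code in every data product" could look like: a set of global checks that the platform scaffolding runs against each product before it is published. The policy names and the descriptor shape are hypothetical, carried over from the earlier illustrative examples.

```python
# Hypothetical sketch: global mesh policies embedded as code and evaluated
# against every data product at publish time. Policy names and the product
# shape are invented for illustration.

def has_accountable_owner(product):
    # Every product must name the domain team accountable for it.
    return bool(product.get("owner"))

def pii_is_masked(product):
    # Columns flagged as PII must declare a masking strategy.
    return all(col.get("masking") for col in product["columns"] if col.get("pii"))

GLOBAL_POLICIES = [has_accountable_owner, pii_is_masked]

def publish(product):
    """Platform-side gate: a product joins the mesh only if all global
    policies pass; the checks travel with the scaffolding, not a central team."""
    failures = [p.__name__ for p in GLOBAL_POLICIES if not p(product)]
    if failures:
        raise ValueError(f"policy violations: {failures}")
    print(f"published {product['product_id']}")

publish({
    "product_id": "customers/profiles",
    "owner": "customer-domain-team",
    "columns": [
        {"name": "customer_id", "pii": False},
        {"name": "email", "pii": True, "masking": "sha256"},
    ],
})
```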
>>Yeah, I think it's a really good question. Um, what I don't want to do is make the mistake that vendors often make and think of data mesh as a product. I think it's a much more holistic mindset change, right? It's organizational. Yes, there needs to be a kind of platform enablement component there, and I think, fundamentally, how we think about governance is very aligned with some of the principles in data mesh — that federated thinking. Our customers can organize governance around communities, domains, or their operating model, and we really support that flexibility. From a roadmap perspective, making that even easier is always a focus area for us. Um, specifically around data mesh, a few things come to mind. One, I think, is connectivity, right? If you give different teams more ownership and accountability, we're not going to live in a world where all of the data is stored in one location, right? You want to give people — teams — the opportunity and the accountability to make their own technology decisions, so that they are fit for purpose. So I think, for a platform, being able to provide out-of-the-box connectivity to a very wide range of technologies is absolutely critical. Um, on the data-as-a-product thinking: usability is top of mind — that's part of our roadmap; you're going to hear us talk about that tomorrow as well. Um, that data consumer — how do we make it as easy as possible for people to discover data that they can trust, that they can access? That thinking is a big part of our roadmap. So again, making that as easy as possible is a big part of it. And also, on the computational aspect that you mentioned — I think we believe in that as well. If it's just documentation, it's going to be really hard to keep it alive, right? You have to make it active; we have to get close to the actual data. So if you think about policy enforcement, for example: it's not just the definition, it's the enforcement. Data quality — that's why we're so excited about our recent data quality acquisition as well. Um, so these are a couple of the things that we're thinking of. And again, your message around moving from collecting to connecting, I think, works really, really well with our mission and vision. So, Zhamak, thank you so much. I wish we had more time to continue the conversation, but it's been great to have it here. Thank you so much for being here today, and let's continue to work together on data mesh. >>I'm excited to see it. Thank you.
Zhamak Dehghani, ThoughtWorks | theCUBE on Cloud 2021
>>From around the globe, it's theCUBE, presenting Cube on Cloud, brought to you by SiliconANGLE. >>In 2009, Hal Varian, Google's chief economist, said that statistician would be the sexiest job of the coming decade. The modern big data movement really took off the following year, after the second Hadoop World, which was hosted by Cloudera in New York City. Jeff Hammerbacher famously declared to me and John Furrier in theCUBE that the best minds of his generation were trying to figure out how to get people to click on ads, and he said that sucks. The industry was abuzz with the realization that data was the new competitive weapon; Hadoop was heralded as the new data management paradigm. Now, what actually transpired over the next 10 years? Only a small handful of companies could really master the complexities of big data and attract the data science talent necessary to realize massive returns. As well, back then, cloud was in the early stages of its adoption — think about it, at the beginning of the last decade — and as the years passed, more and more data got moved to the cloud, and the number of data sources absolutely exploded. Experimentation accelerated, as did the pace of change. Complexity just overwhelmed big data infrastructures and data teams, leading to a continuous stream of incremental technical improvements designed to try and keep pace — things like data lakes, data hubs, new open source projects, new tools — which piled on even more complexity. And as we reported, we believe what's needed is a complete bit flip in how we approach data architectures. Our next guest is Zhamak Dehghani, who is the director of emerging technologies at ThoughtWorks. Zhamak is a software engineer, architect, thought leader, and adviser to some of the world's most prominent enterprises. She's, in my view, one of the foremost advocates for rethinking and changing the way we create and manage data architectures — favoring a decentralized over a monolithic structure, and elevating domain knowledge as a primary criterion in how we organize so-called big data teams and platforms. Zhamak, welcome to theCUBE. It's a pleasure to have you on the program. >>Hi, David. It's wonderful to be here. >>Well, okay, so you're pretty outspoken about the need for a paradigm shift in how we manage our data and our platforms at scale. Why do you feel we need such a radical change? What are your thoughts there? >>Well, I think if you just look back over the last decade — you gave us, you know, a summary of what happened since 2010 — but even if we go before then, what we have done over the last few decades is basically repeating, and, as you mentioned, incrementally improving, how we've managed data, based on certain assumptions around, as you mentioned, centralization: data has to be in one place so we can get value from it. But if you look at the parallel movement of our industry in general since the birth of the internet, we are actually moving towards decentralization. If we think today — like, if we put the data story aside — if we said the only way the web would work, the only way we get access to, you know, various applications on the web, is to centralize them, we would laugh at that idea. But for some reason we don't question that when it comes to data, right?
So I think it's time to embrace the complexity that comes with the growth in the number of sources, the proliferation of sources and consumption models — embrace the distribution of sources of data that are not just within one part of the organization, not even just within the bounds of the organization, but beyond the bounds of the organization. And then look back and say: okay, if that's the trend of our industry in general, given the fabric of computation and data that we have put in place, you know, globally, then how do the architecture, the technology, the organizational structure, and the incentives need to move to embrace that complexity? And to me, that requires a paradigm shift — a full stack, from how we organize our organizations, how we organize our teams, how we, you know, put technology in place — um, to look at it from a decentralized angle. >>Okay, so let's unpack that a little bit. I mean, you've spoken about, and written, that today's big data architecture is flawed. So I want to bring up — I love your diagrams — a simple diagram. Guys, if you could bring up figure one. So on the left here, we're ingesting data from the operational systems and other enterprise data sets and, of course, external data. We cleanse it — you know, you've got to do the quality thing — and then serve it up to the business. So what's wrong with that picture that we just described, granted it's a simplified form? >>Yeah, quite a few things. Maybe I would flip the question back to you or the audience. We said that there are so many sources of the data, and actually the data comes from systems and from teams that are very diverse in terms of domains, right? If you just think about, I don't know, retail: e-commerce versus order management versus customer — these are very diverse domains. The data comes from many different, diverse domains, and then we expect to put it under the control of a centralized team, a centralized system. And I know that centralization — probably, if you zoom out it's centralized, and if you zoom in it's compartmentalized based on functions, and we can talk about that — and we assume that the centralized model will serve: you know, getting that data, making sense of it, cleansing and transforming it, and then satisfying the needs of a very diverse set of consumers, without really understanding the domains, because the team responsible for it is not close to the source of the data. So there is a bit of a cognitive gap and a domain-understanding gap, um, you know, without really understanding how the data is going to be used. When I came up with this idea, I talked to a lot of data teams globally, just to see, you know, what are the pain points, how are they doing it? And one thing that was evident in all of those conversations: they actually didn't know, after they built these pipelines and put the data in — whether into data warehouse tables or elsewhere — how the data was being used. And yet they were responsible for making the data available for these diverse sets of use cases. So a centralized system, a monolithic system, often is a bottleneck. What you find is that a lot of the teams are struggling with satisfying the needs of the consumers, struggling with really understanding the data; the domain knowledge is lost, and there is a loss of understanding in that transformation.
Often, you know, we end up training machine learning models on data that is not really representative of the reality of the business, and then we put them into production and they don't work, because the semantics and the syntax of the data get lost within that translation. And we're struggling with finding people to manage a centralized system, because the technology is still, in my opinion, fairly low level and exposes the users of those technologies — let's say a warehouse — to a lot of, you know, complexity. So, in summary: it's a bottleneck that's not going to, you know, satisfy the pace of change, the pace of innovation, and the pace of availability of sources. Um, it's disconnected and fragmented — even though it's centralized, it's disconnected and fragmented from where the data comes from and where the data gets used — and it's managed by, you know, a team of hyper-specialized people who are struggling to understand the actual value of the data, the actual format of the data. So it's not going to get us where our aspirations and ambitions need to be. >>Yes. So the big data platform is essentially, I think you call it, context-agnostic. And as data becomes, you know, more important in our lives, you've got all these new data sources injected into the system, and experimentation, as we said, becomes much, much easier with the cloud. So one of the blockers that you've cited — you just mentioned it — is that you've got these hyper-specialized roles: the data engineer, the quality engineer, the data scientist. And it's illusory — I mean, it's like an illusion: these roles seemingly are independent and can scale independently, but I think you've made the point that in fact they can't — that a change in a data source has an effect across the entire data lifecycle, the entire data pipeline. So maybe you could add some color to why that's problematic for some of the organizations that you work with, and maybe give some examples. >>Yeah, absolutely. In fact, initially the hypothesis around that image came from a series of requests that we received from our large-scale and progressive clients — progressive in terms of their investment in data architectures. These were clients that were large scale; they had diverse and rich sets of domains. Some of them were big tech companies, some of them were retail companies, big healthcare companies. So they had that diversity of data and that number of, you know, sources and domains. They had invested for quite a few years, over generations — they had multiple generations of proprietary data warehouses on-prem that they were moving to the cloud; they had moved through various revisions of their Hadoop clusters, and they were moving those to the cloud. And the challenges that they were facing were, simply — if I want to, you know, simplify it in one phrase — they were not getting value from the data that they were collecting. They were continuously struggling to shift the culture, because there was so much friction between all three phases: consumption of the data from sources, transformation, and making it available and serving it to the consumers. So that whole process was full of friction; everybody was unhappy. So the bottom line is: you're collecting all this data, and there is delay.
There is a lack of trust in the data itself, because the data is not representative of the reality — it has gone through a transformation by people who didn't really understand what the data was, and it got delayed. So there is no trust; it's hard to get to the data; ultimately, it's hard to create value from the data. And people are working really hard, under a lot of pressure, but are still, you know, struggling. So we — you know, as technologists — often point to technology. So we go: okay, this version of, you know, some proprietary data warehouse we're using is not the right thing; we should go to the cloud, and that will certainly solve our problems, right? Or: the warehouse wasn't a good one; let's build a data lake instead. So instead of, you know, extracting and then transforming and loading into the warehouse — and that transformation is a heavy process, because you fundamentally made an assumption, using warehouses, that if I transform this data into this multidimensional, perfectly designed schema, then everybody can run whatever query they want, and that's going to solve, you know, everybody's problem; but in reality it doesn't, because you are delayed, and there is no universal model that serves everybody's needs — the data scientists who need diverse data don't necessarily like the perfectly modeled data; they're looking for both the signals and the noise. So then, you know, we've just gone from ETL to, let's say, the lake: okay, let's move the transformation to the last mile; let's just load the data into object stores, into semi-structured files, and let the data scientists use it. But they're still struggling, because of the problems that we mentioned. So then what is the solution? Well, a next-generation data platform — let's put it on the cloud. And we've seen clients that had actually gone through, you know, a year or multiple years of migration to the cloud — 18 months I've seen; you know, nine-month migrations of the warehouse versus two-year migrations of the various data sources to the cloud. But ultimately, the result is the same: unsatisfied, frustrated data users and data providers, um, you know, with a lack of ability to innovate quickly on relevant data, and to have the experience that they deserve to have — a delightful experience of discovering and exploring data that they trust. And all of that was still missed. So something else, something more fundamental, needed to change than just the technology. >>So then the linchpin of your scenario is this notion of context, and you pointed out — you made the other observation — that, look, we've made our operational systems context-aware, but our data platforms are not. And, like a CRM system: the sales guys are very comfortable with what's in the CRM system; they own the data. So let's talk about the answer that you and your colleagues are proposing. You're essentially flipping the architecture, whereby those domain knowledge workers — the builders, if you will, of data products or data services — are now first-class citizens in the data flow, and they're injecting, by design, domain knowledge into the system. So I want to put up another one of your charts. Guys, bring up figure two there. Um, it talks about, you know, convergence: you show distributed domain-driven architecture,
this self-serve platform design, and this notion of product thinking. So maybe you could explain why this approach is so desirable, in your view. >>Sure. The motivation and inspiration for the approach came from studying what has happened over the last few decades in operational systems. We had a very similar problem, prior to microservices, with monolithic systems: monolithic systems were, you know, the bottleneck; the changes we needed to make were always constrained by how the architecture was centralized. And we found a way out — I'm not saying it's the perfect way of decoupling a monolith, but for where we currently are in our journey to become data-driven, um, it's a nice place to be — which is distribution, or decomposition, of your system as well as your organization. I think whenever we talk about systems, we've got to talk about the people and teams responsible for managing those systems. So: the decomposition of the systems, and the teams, and the data, around domains — because that's how today we are decoupling our business, right? We're decoupling our businesses around domains, and that's a good thing. And what does that really do for us? It localizes change to the bounded context of that business; it creates clear boundaries and interfaces and contracts between that particular team and the rest of the universe of the organization; so it removes the friction that we often have in both managing change and serving data or capability. So the first principle of data mesh is: let's decouple this world of analytical data, to mirror the same way we have decoupled our systems and teams and business — why is data any different? And the moment you do that — the moment you bring the ownership to the people who understand the data best — then you get asked: well, how is that any different from the silos of disconnected databases that we have today, where nobody can get to the data? So the rest of the principles are really there to address all of the challenges that come with this first principle of decomposition around domain context. The second principle is: well, we have to expect a certain level of quality and accountability and responsibility from the teams that provide the data. So let's bring product thinking — treating data as a product — to the data that these teams now, um, share, and let's put accountability around it. We need a new set of incentives and metrics for domain teams to share the data; we need a new set of quality metrics that define what it means for the data to be a product — and we can go through that conversation perhaps later. So the second principle is: okay, the domain teams now responsible for the analytical data need to provide that data with a certain level of quality and assurance; let's call that a product, and bring product thinking to it. And then the next question you get asked, by CEOs or CIOs or the people who build the infrastructure and, you know, spend the money, is: well, it's actually quite complex to manage big data, and now we want everybody — every independent team — to manage the full stack of, you know, storage and computation and pipelines and access control and all of that? Well, we have solved that problem in the operational world.
And that requires really a new level of platform thinking: providing infrastructure and tooling to the domain teams so they can now manage and serve their big data. And I think that requires reimagining the world of our tooling and technology. But for now, let's just assume that we need a new level of abstraction to hide away the ton of complexity that people unnecessarily get exposed to. That's the third principle: creating self-serve infrastructure, um, to allow autonomous teams to build their domains. But then the last fundamental pillar is: okay, once you've distributed the problem into smaller problems, you find yourself with another set of problems — how am I going to connect this data? Because the insights, you know, happen and emerge from the interconnection of the data domains, right? They're not necessarily locked into one domain. So the concerns around interoperability and standardization, and getting value as a result of the composition and interconnection of these domains, require a new approach to governance. And we have to think about governance very differently: based on a federated model, and based on a computational model. Once we have this powerful self-serve platform, we can computationally automate a lot of governance decisions — security decisions and policy decisions — that apply to, you know, this fabric of the mesh, not just a single domain, and not in a centralized way either. So really, as you mentioned, the most important components of the data mesh are the distribution of ownership and the distribution of architecture and data; the rest of the principles are there to solve all the problems that come with that. >>Very powerful. Guys, we actually have a picture of what Zhamak just described. Bring up figure three, if you would. Essentially, you're advocating for pushing the pipeline and all its various functions into the lines of business, and abstracting that complexity of the underlying infrastructure, which you show here in this figure — data infrastructure as a platform, down below. And you know what I love about this, Zhamak: to me, it underscores that data is not the new oil, because I can put oil in my car or I can put it in my house, but I can't put the same quart in both places. But I think you call it polyglot data — really different forms, batch or whatever — but the same data: data doesn't follow the laws of scarcity; I can use the same data for many, many uses, and that's what this graphic shows. And then you brought in the really important, you know, sticking problem, which is the governance — which is now not command and control; it's federated governance. So maybe you could add some thoughts on that. >>Sure, absolutely. It's one of those things — I keep referring to data mesh as a paradigm shift, and it's not just to make it sound grand and, you know, kind of exciting; it's really because I want to point out that we need to question every moment we make a decision around how we're going to design security or governance or the modeling of the data. We need to reflect and go back and ask: am I applying some of my cognitive biases, around how I have worked for the last 40 years and what I have seen work? Or do I really need to question the way we have applied governance? I think, at the end of the day, the role of data governance and its objective remain the same.
I mean, we all want quality data, accessible to a diverse set of users — and these users now have different personas: data analyst, data scientist, data application user — very diverse personas. So at the end of the day, we want quality data, accessible to them, trustworthy, and easily consumable. Um, however, how we get there looks very different. As you mentioned, the governance model in the old world has been very command-and-control, very centralized. You know, they were responsible for quality, they were responsible for certification of the data, for making sure the data complies with regulations, for making sure data gets discovered and made available. In the world of the data mesh, really, the job of data governance as a function becomes finding the equilibrium between what decisions need to be, um, you know, made and enforced globally, and what decisions need to be made locally, so that we can have an interoperable mesh of data sets that can move fast and change fast. It's really about — instead of putting those systems in a straitjacket of being constant and unchanging — embracing change and the continuous change of the landscape, because that's just the reality we can't escape. So the governance model is, I call it, federated and computational. And by that I mean: every domain needs to have a representative in the governance team. So the role of the domain data product owner — who really understands the data of that domain well, but also wears the hat of a product owner — is an important role that has to have representation in the governance. So it's a federation of domains coming together, plus the SMEs — the subject matter experts who understand the regulations in that environment, who understand the data security concerns. But instead of trying to enforce and do this as a central team, they make decisions as to what needs to be standardized, what needs to be enforced, and we push that, computationally and in an automated fashion, into the platform itself. For example, instead of trying to, you know, be part of the data quality pipeline and inject ourselves as people into that process, let's actually, as a group, define what constitutes quality — like, how do we measure quality? — and then let's automate that and codify it into the platform, so that every data product will have a CI/CD pipeline, and, as part of that pipeline, those quality metrics get validated, and every data product publishes its SLOs, or service-level objectives. So, you know, whatever we choose as a measure of quality — maybe it's, you know, the integrity of the data, the delay in the data, the liveliness of it — whatever the decisions are that you're making, let's codify them. So the objectives of the governance team stay the same, but how they do it is very, very different. I wrote a new article recently trying to explain the logical architecture that would emerge from applying these principles, and I put in a little table to compare and contrast how we do governance today versus how we will do it differently — to just give people a flavor of what it means to embrace decentralization, and what it means to embrace change and continuous change. So hopefully that can be helpful.
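As a rough illustration of that codification, here is a minimal sketch of a quality gate that a data product's CI/CD pipeline might run: computed metrics are validated against the product's declared SLOs, and the build fails on a violation. The metric names and thresholds are assumptions invented for the example.

```python
# Hypothetical sketch: a CI/CD quality gate for a data product. Declared SLOs
# are codified as checks; the pipeline computes metrics on the candidate data
# and fails the build if any objective is violated. Names are illustrative.

declared_slos = {
    "null_rate_max": 0.01,      # at most 1% rows with null values
    "freshness_hours_max": 24,  # data no older than a day
    "row_count_min": 1_000,     # guard against partial loads
}

def compute_metrics(rows, age_hours):
    nulls = sum(1 for r in rows if any(v is None for v in r.values()))
    return {
        "null_rate": nulls / max(len(rows), 1),
        "freshness_hours": age_hours,
        "row_count": len(rows),
    }

def quality_gate(metrics, slos):
    """Return the list of violated objectives; an empty list means pass."""
    violations = []
    if metrics["null_rate"] > slos["null_rate_max"]:
        violations.append("null_rate")
    if metrics["freshness_hours"] > slos["freshness_hours_max"]:
        violations.append("freshness")
    if metrics["row_count"] < slos["row_count_min"]:
        violations.append("row_count")
    return violations

candidate = [{"customer_id": i, "order_total": 10.0} for i in range(2_000)]
failed = quality_gate(compute_metrics(candidate, age_hours=6), declared_slos)
assert not failed, f"build failed: {failed}"
print("quality gate passed; publishing SLO report with the data product")
```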
>>Yes, very helpful — so many questions I have. But on the point you make about data quality: sometimes I feel like quality is treated as the end game, whereas the end game should be how fast you can go from idea to monetization with a data service. And what happens — again, you sort of addressed this — but what happens to the underlying infrastructure? I mean, spinning up EC2s and S3 buckets and my PyTorches and TensorFlows — where does that live in the business, and who's responsible for it? >>Yeah, I'm glad you're asking this question, maybe because, um, I truly believe we need to reimagine that world. I think there are many pieces that we can use as utilities and foundational pieces, but I can see for myself a five-to-seven-year roadmap of building this new tooling. In terms of the question around ownership: that would remain with the platform team — the domain-agnostic, technology-focused team — but they are providing a set of products themselves, and the users of those products are data product developers, right? Data domain teams that now have really high expectations in terms of low friction, in terms of the lead time to create a new data product. So we need a new set of tooling, and I think the language needs to shift from "I need a storage bucket," "I need a storage account," "I need a cluster to run my, you know, Spark jobs," to: "Here's the declaration of my data product. This is where the data for it will come from. This is the data that I want to serve. These are the policies that I need applied, in terms of perhaps encryption or access control. Go make it happen, platform; go provision everything" — so that, as a data product developer, all I focus on is the data itself: the representation of the semantics and the representation of the syntax, and making sure that the data meets the quality that I have to assure, and that it's available. The provisioning of everything that sits underneath will have to be taken care of by the platform. And that's what I mean by "requires a reimagination." And in fact, there will be a data platform team — the data platform teams that we set up for our clients in fact themselves have a fair bit of complexity; internally, they divide into multiple teams, multiple planes. So there would be a plane — as in a group of capabilities — that satisfies the data product developer experience; there would be a set of capabilities that deal with the underlying utilities — I call them utilities at this point, because to me the level of abstraction of the platform needs to go higher than where it is; what we call a platform today is a set of utilities we'll continue to use: we'll continue using object storage, we'll continue using relational databases, and so on — so there will be a plane, and a group of people, responsible for that. And there will be a group of people responsible for the capabilities that, you know, enable the mesh-level functionality: for example, being able to correlate and connect and query data from multiple nodes — that's a mesh-level capability; being able to discover and explore the mesh's data products — that's a mesh-level capability. So it would be a set of teams as part of the platform, with — again — strong platform product thinking and product ownership embedded into it, to satisfy the experience of these now business-oriented domain data teams. So we have a lot of work to do.
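Here is a minimal sketch of what that declarative shift might look like: a data product declaration that the platform consumes in order to provision the underlying utilities. The specification format and the `provision` behavior are hypothetical, invented for illustration of the division of responsibility she describes.

```python
# Hypothetical sketch: a declarative data product specification that the
# self-serve platform turns into provisioned infrastructure. The spec format
# and provisioning steps are invented for illustration.

spec = {
    "product_id": "orders/monthly-aggregates",
    "inputs": ["orders/order-events"],          # where the data comes from
    "output_schema": {"customer_id": "int",
                      "month": "str",
                      "order_total": "float"},  # the data I want to serve
    "policies": {"encryption": "aes-256",
                 "access": ["analysts", "data-scientists"]},
}

def provision(spec):
    """Platform-side: translate the declaration into underlying utilities.

    A real platform would allocate storage, pipelines, and policy engines;
    here we just print the plan to show the division of responsibility.
    """
    print(f"provisioning {spec['product_id']}:")
    print(f"  storage + serving for schema {list(spec['output_schema'])}")
    for source in spec["inputs"]:
        print(f"  input connector for {source}")
    for policy, value in spec["policies"].items():
        print(f"  policy '{policy}' configured as {value}")

provision(spec)
```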
>>I could go on — unfortunately, we're out of time. But I want to tell people there are two pieces that you've put out so far. One is "How to Move Beyond a Monolithic Data Lake to a Distributed Data Mesh" — you should read that — and "Data Mesh Principles and Logical Architecture" is kind of part two. I guess my last question, in the very limited time we have: are organizations ready for this? >>I think the desire is there. I've been overwhelmed by the number of large and medium and small, private and public, government and federal organizations that have reached out to us globally. I mean, this is a global movement, and I'm humbled by the response of the industry. I think the desire is there; the pains are real; people acknowledge that something needs to change. So that's the first step. I think the awareness is spreading; organizations are more and more becoming aware. In fact, many technology providers reach out to us asking, you know, "what shall we do?" — because their clients are asking them. People are already asking: we need the data mesh, and we need the tooling to support it. So that awareness is there, in terms of the first step of being ready. However, the ingredients of a successful transformation require top-down and bottom-up support. So it requires, you know, support from chief data and analytics officers or above. The most successful clients that we have with data mesh are the ones where, you know, the CEOs have made a statement that "we want to change the experience of every single customer using data, and we are going to commit to this." So the investment and support exist from the top through all layers, the engineers are excited, and perhaps the traditional data teams are open to change. So there are a lot of ingredients of a successful transformation that have to come together. Um, are we really ready for it? I think the pioneers, perhaps the innovators — if you think about the innovation adoption curve, the pioneers and innovators and early adopters — are making moves towards it. And hopefully, as the technology becomes more available, organizations that are less engineering-oriented — they don't have the capability in-house today, but they can buy it — will come next. Maybe those are the ones that aren't quite ready for it yet, because the technology is not readily available and requires, you know, internal investment today. >>I think you're right on. I think the leaders are going to lean in hard, and they're going to show us the path over the next several years. And I think the end of this decade is going to be defined a lot differently than the beginning. Zhamak, thanks so much for coming on theCUBE and participating in the program. >>It's been my pleasure. >>All right, keep it right there, everybody. We'll be back right after this short break.
Zhamak Dehghani, Director of Emerging Technologies at ThoughtWorks
(bright music) >> In 2009, Hal Varian, Google's Chief Economist, said that statistician would be the sexiest job in the coming decade. The modern big data movement really took off later in the following year, after the second Hadoop World, which was hosted by Cloudera in New York City. Jeff Hammerbacher famously declared to me and John Furrier, in "theCUBE," that the best minds of his generation were trying to figure out how to get people to click on ads. And he said that sucks. The industry was abuzz with the realization that data was the new competitive weapon. Hadoop was heralded as the new data management paradigm. Now what actually transpired over the next 10 years was that only a small handful of companies could really master the complexities of big data and attract the data science talent really necessary to realize massive returns. As well, back then, cloud was in the early stages of its adoption. At the beginning of the last decade, and as the years passed, more and more data got moved to the cloud, and the number of data sources absolutely exploded, experimentation accelerated, as did the pace of change. Complexity just overwhelmed big data infrastructures and data teams, leading to a continuous stream of incremental technical improvements designed to try and keep pace, things like data lakes, data hubs, new open source projects, new tools, which piled on even more complexity. And as we reported, we believe what's needed is a complete bit flip in how we approach data architectures. Our next guest is Zhamak Dehghani, who is the Director of Emerging Technologies at ThoughtWorks. Zhamak is a software engineer, architect, thought leader and advisor to some of the world's most prominent enterprises. She's, in my view, one of the foremost advocates for rethinking and changing the way we create and manage data architectures, favoring a decentralized over monolithic structure, and elevating domain knowledge as a primary criterion in how we organize so-called big data teams and platforms. Zhamak, welcome to theCUBE, it's a pleasure to have you on the program. >> Hi David, it's wonderful to be here. >> Okay. So you're pretty outspoken about the need for a paradigm shift in how we manage our data and our platforms at scale. Why do you feel we need such a radical change? What are your thoughts there? >> Well, I think if you just look back over the last decades, you gave us a summary of what happened since 2010. But even if we go before then, what we have done over the last few decades is basically repeating, and as you mentioned, incrementally improving how we manage data, based on certain assumptions around, as you mentioned, centralization. Data has to be in one place so we can get value from it. But if you look at the parallel movement of our industry in general, since the birth of the internet, we are actually moving towards decentralization. If we think today, on the data side, if we said the only way the web would work, the only way we get access to various applications on the web or pages, is to centralize it, we would laugh at that idea. But for some reason, we don't question that when it comes to data, right? So I think it's time to embrace the complexity that comes with the growth of the number of sources, the proliferation of sources and consumption models, embrace the distribution of sources of data, that they're not just within one part of the organization. They're not just within even the bounds of organizations.
They're beyond the bounds of the organization. And then look back and say, okay, if that's the trend of our industry in general, given the fabric of computation and data that we have put in place globally, then how do the architecture and technology and organizational structure and incentives need to move to embrace that complexity? And to me, that requires a paradigm shift: a full stack, from how we organize our organizations, how we organize our teams, to how we put technology in place, to look at it from a decentralized angle. >> Okay, so let's unpack that a little bit. I mean, you've spoken about and written about today's big data architecture, and you've basically just mentioned that it's flawed. So I want to bring up, I love your diagrams, you have a simple diagram, guys if you could bring up figure one. So on the left here, we're ingesting data from the operational systems and other enterprise data sets, and of course external data; we cleanse it, you've got to do the quality thing, and then serve it up to the business. So what's wrong with that picture that we just described, granted it's a simplified form? >> Yeah. Quite a few things. So I would flip the question maybe back to you or the audience. We said that there are so many sources of the data, and actually data comes from systems and from teams that are very diverse in terms of domains, right? Domains. If you just think about, I don't know, retail: e-commerce versus order management versus customer. These are very diverse domains. The data comes from many different diverse domains, and then we expect to put them under the control of a centralized team, a centralized system. And I know that centralization, probably if you zoom out it's centralized, if you zoom in it's compartmentalized based on functions, and we can talk about that. And we assume that the centralized model will be getting that data, making sense of it, cleansing and transforming it, then to satisfy the needs of a very diverse set of consumers, without really understanding the domains, because the teams responsible for it are not close to the source of the data. So there is a bit of a cognitive gap and a domain understanding gap, without really understanding how the data is going to be used. When we came to this, when I came up with the idea, I talked to a lot of data teams globally, just to see, what are the pain points? How are they doing it? And one thing that was evident in all of those conversations is that they actually didn't know, after they built these pipelines and put the data in, whether into data warehouse tables or lakes, they didn't know how the data was being used. But yet they're responsible for making the data available for this diverse set of use cases. So essentially the system, a monolithic system, often is a bottleneck. So what you find is that a lot of the teams are struggling with satisfying the needs of the consumers, are struggling with really understanding the data; the domain knowledge is lost, there is a loss of understanding in that transformation. Often we end up training machine learning models on data that is not really representative of the reality of the business, and then we put them into production and they don't work, because the semantics and the syntax of the data get lost within that translation.
So we are struggling with finding people to manage a centralized system, because the technology is still fairly, in my opinion, fairly low level, and exposes the users of those technology sets, let's say the warehouse, to a lot of complexity. So in summary, I think it's a bottleneck. It's not going to satisfy the pace of change, or the pace of innovation, and the availability of sources. It's disconnected and fragmented, even though it's centralized: it's disconnected and fragmented from where the data comes from and where the data gets used, and it's managed by a team of hyper-specialized people who are struggling to understand the actual value of the data, the actual format of the data. So it's not going to get us where our aspirations, our ambitions need to be. >> Yeah, so the big data platform is essentially, I think you call it, context agnostic. And so as data becomes more important in our lives, you've got all these new data sources injected into the system, and experimentation, as we said, becomes much, much easier with the cloud. So one of the blockers that you've cited, and you just mentioned it, is you've got these hyper-specialized roles: the data engineer, the quality engineer, the data scientist. And it's illusory. I mean, it's like an illusion. These guys, seemingly they're independent and can scale independently, but I think you've made the point that in fact they can't. A change in a data source has an effect across the entire data life cycle, the entire data pipeline. So maybe you could add some color to why that's problematic for some of the organizations that you work with, and maybe give some examples. >> Yeah, absolutely. So in fact, initially the hypothesis around data mesh came from a series of requests that we received from both our large-scale and progressive clients, progressive in terms of their investment in data architecture. So these were clients that were larger scale, they had a diverse and rich set of domains: some of them were big tech companies, some of them were big retail companies, big healthcare companies. So they had that diversity of data and a number of sources and domains. They had invested for quite a few years; they had multiple generations of proprietary data warehouses on-prem that were moving to the cloud. They had moved through the various revisions of the Hadoop clusters, and they were moving that to the cloud. And the challenges that they were facing were, simply, if I want to just simplify it in one phrase, they were not getting value from the data that they were collecting. They were continuously struggling to shift the culture, because there was so much friction between all of these three phases: consumption of the data, then transformation, and making it available, consumption from sources and then providing it and serving it to the consumer. So that whole process was full of friction. Everybody was unhappy. So the bottom line is that you're collecting all this data, there is delay, there is lack of trust in the data itself, because the data is not representative of the reality: it's gone through a transformation by people who didn't really understand what the data was, and it got delayed. And so there's no trust, it's hard to get to the data, and ultimately it's hard to create value from the data. People are working really hard and under a lot of pressure, but are still struggling. So often, as technologists, we point to technology as the solution. So we go.
Okay, this version of some proprietary data warehouse we're using is not the right thing; we should go to the cloud, and that certainly will solve our problem, right? Or, the warehouse wasn't a good one, let's make a data lake version. So instead of extracting and then transforming and loading into the database, with that transformation being a heavy process, because you fundamentally made an assumption, using warehouses, that if I transform this data into this multidimensional, perfectly designed schema, then everybody can run whatever query they want, and that's going to solve everybody's problem. But in reality it doesn't, because you are delayed and there is no universal model that serves everybody's needs; everybody's needs are diverse. Data scientists don't necessarily like the perfectly modeled data; they're after both the signals and the noise. So then we've just gone from ETL to, let's say, now to lake, which is: okay, let's move the transformation to the last mile. Let's just load the data into the object stores in sort of semi-structured files and let the data scientists use it. But they're still struggling, because of the problems that we mentioned. So then what is the solution? What is the solution? Well, a next generation data platform. Let's put it on the cloud. And we saw clients that actually had gone through a year or multiple years of migration to the cloud, 18 months even; I've seen nine-month migrations of the warehouse versus two-year migrations of various data sources to the cloud. But ultimately the result is the same: unsatisfied, frustrated data users and data providers, with a lack of ability to innovate quickly on relevant data, and without the experience that they deserve to have, a delightful experience of discovering and exploring data that they trust. And all of that was still amiss. So something else more fundamental needed to change than just the technology. >> So the linchpin to your scenario is this notion of context. And you pointed out, you made the other observation that, look, we've made our operational systems context aware, but our data platforms are not. And like a CRM system: sales guys are very comfortable with what's in the CRM system. They own the data. So let's talk about the answer that you and your colleagues are proposing. You're essentially flipping the architecture, whereby those domain knowledge workers, the builders, if you will, of data products or data services, are now first-class citizens in the data flow, and they're injecting, by design, domain knowledge into the system. So I want to put up another one of your charts. Guys, bring up figure two there. It talks about convergence. It shows distributed domain-driven architecture, the self-serve platform design, and this notion of product thinking. So maybe you could explain why this approach is so desirable, in your view. >> Sure. The motivation and inspiration for that approach came from studying what has happened over the last few decades in operational systems. We had a very similar problem, prior to microservices, with monolithic systems. Monolithic systems were the bottleneck; the changes we needed to make always cut across how the architecture was centralized. And we found a nice niche. And I'm not saying this is a perfect way of decoupling a monolith, but it's a way that, currently, where we are in our journey to become data driven, is a nice place to be, which is distribution, or a decomposition, of your system as well as your organization.
I think whenever we talk about systems, we've got to talk about the people and teams that are responsible for managing those systems. So the decomposition of the systems and the teams and the data around domains. Because that's how today we are decoupling our business, right? We are decoupling our businesses around domains, and that's a good thing. And what does that really do for us? What it does is localize change to the bounded context of that business. It creates clear boundaries and interfaces and contracts between that particular team and the rest of the universe of the organization, so it removes the friction that we often have for both managing change and serving data or capability. So the first principle of data mesh is: let's decouple this world of analytical data to mirror the way we have decoupled our systems and teams and business. Why is data any different? And the moment you do that, the moment you bring the ownership to the people who understand the data best, then you get questions like, well, how is that any different from the silos of disconnected databases that we have today, where nobody can get to the data? So the rest of the principles are really there to address all of the challenges that come with this first principle of decomposition around domain context. The second principle is, well, we have to expect a certain level of quality and accountability and responsibility from the teams that provide the data. So let's bring product thinking, treating data as a product, to the data that these teams now share, and let's put accountability around it. We need a new set of incentives and metrics for domain teams to share the data; we need a new set of quality metrics that define what it means for the data to be a product, and we can go through that conversation perhaps later. So the second principle is: okay, the domain teams responsible for their analytical data need to provide that data with a certain level of quality and assurance. Let's call that a product, and bring product thinking to that. And then the next question you get asked, often by the CIO or CTO, the people who build the infrastructure and spend the money, is: well, it's actually quite complex to manage big data, and now you want everybody, every independent team, to manage the full stack of storage and computation and pipelines and access control and all of that? Well, we've solved that problem in the operational world. And that really requires a new level of platform thinking, to provide infrastructure and tooling to the domain teams so they are able to manage and serve their big data. And I think that requires reimagining the world of our tooling and technology. But for now, let's just assume that we need a new level of abstraction to hide away a ton of complexity that people unnecessarily get exposed to. And that's the third principle: creating self-serve infrastructure to allow autonomous teams to build their domains. But then the last pillar, the last fundamental pillar, is: okay, once you've distributed the problem into smaller problems, you find yourself with another set of problems, which is how am I going to connect this data? Insight happens and emerges from the interconnection of the data domains, right? It's not necessarily locked into one domain.
So the concerns around interoperability and standardization, and getting value as a result of the composition and interconnection of these domains, require a new approach to governance. And we have to think about governance very differently, based on a federated model and based on a computational model. Like, once we have this powerful self-serve platform, we can computationally automate a lot of governance decisions and security decisions and policy decisions that apply to this fabric of mesh, not just a single domain, and not in a centralized way. So really, as you mentioned, the most important component of the data mesh is distribution of ownership and distribution of architecture and data; the rest of the principles are there to solve all the problems that come with that. >> So, very powerful. And guys, we actually have a picture of what Zhamak just described. Bring up figure three, if you would. So I mean, essentially, you're advocating for the pushing of the pipeline and all its various functions into the lines of business, and abstracting that complexity of the underlying infrastructure, which you kind of show here in this figure: data infrastructure as a platform down below. And you know what I love about this, Zhamak, is, to me it underscores that data is not the new oil. Because I can put oil in my car, I can put it in my house, but I can't put the same oil in both places. But I think you call it polyglot data, which is really different forms, batch or whatever. The same data doesn't follow the laws of scarcity; I can use the same data for many, many uses, and that's what this sort of graphic shows. And then you brought in the really important sticking problem, which is governance, which is now not command and control, it's federated governance. So maybe you could add some thoughts on that. >> Sure, absolutely. It's one of those... I think I keep referring to data mesh as a paradigm shift, and it's not just to make it sound grand and exciting or important. It's really because I want to point out that we need to question every moment when we make a decision around how we're going to design security or governance or modeling of the data. We need to reflect and go back and say, "Am I applying some of my cognitive biases around how I have worked for the last 40 years, where I've seen it work? Or do I really need to question the way we have applied governance?" I think, at the end of the day, the role of data governance and its objective remain the same. I mean, we all want quality data accessible to a diverse set of users, and those users now have different personas: data analysts, data scientists, data application users. These are very diverse personas. So at the end of the day, we want quality data accessible to them, trustworthy, in an easily consumable way. However, how we get there looks very different. As you mentioned, the governance model in the old world has been very command and control, very centralized: they were responsible for quality, they were responsible for certification of the data, for applying and making sure the data complies with all sorts of regulations, for making sure data gets discovered and made available. In the world of data mesh, really the job of data governance as a function becomes finding the equilibrium between what decisions need to be made and enforced globally, and what decisions need to be made locally, so that we can have an interoperable mesh of data sets that can move fast and can change fast.
It's really about, instead of kind of putting those systems in a straitjacket of staying constant and never changing, embracing change, and the continuous change of the landscape, because that's just a reality we can't escape. So the role of governance, really, the modern governance model, I call federated and computational. And by that I mean every domain needs to have a representative in the governance team. So the role of the domain data product owner, who really understands that domain well but also wears the hat of the product owner, is an important role that has to have representation in the governance. So it's a federation of domains coming together, plus the SMEs, the people who are Subject Matter Experts, who understand the regulations in that environment, who understand the data security concerns. But instead of trying to enforce and do this as a central team, they make decisions on what needs to be standardized and what needs to be enforced, and push that, computationally and in an automated fashion, into the platform itself. For example, instead of trying to be part of the data quality pipeline and inject ourselves as people into that process, let's actually, as a group, define what constitutes quality. How do we measure quality? And then let's automate that and codify it into the platform, so that every data product will have a CICD pipeline, and as part of that pipeline, those quality metrics get validated, and every data product needs to publish those SLOs, or Service Level Objectives, or whatever we choose as a measure of quality, maybe it's the integrity of the data, or the delay in the data, the liveliness of the data, whatever the decisions are that you're making. Let's codify that. So really, the objectives the governance team is trying to satisfy stay the same, but how they do it is very, very different. And I wrote a new article recently trying to explain the logical architecture that would emerge from applying these principles, and I put in a kind of a light table to compare and contrast how we do governance today versus how we'll do it differently, to just give people a flavor of what it means to embrace decentralization and what it means to embrace change, and continuous change. So hopefully that could be helpful. >> Yes. There's so many questions I have. But the point you make too on data quality, sometimes I feel like quality is the end game, where the end game should be how fast you can go from idea to monetization with a data service. And you've sort of addressed this, but what happens to the underlying infrastructure? I mean, spinning up EC2s and S3 buckets, and my PyTorches and TensorFlows. That lives in the business, and who's responsible for that? >> Yeah, that's why I'm glad you're asking this question, David, because I truly believe we need to reimagine that world. I think there are many pieces that we can use as utilities or foundational pieces, but I can see for myself a five-to-seven-year roadmap of building this new tooling. I think, in terms of the question around ownership, that would remain with the platform team, but perhaps a domain-agnostic, technology-focused team, right? They are providing a set of products themselves, but the users of those products are data product developers, right? Data domain teams that now have really high expectations in terms of low friction, in terms of lead time to create a new data product.
So we need a new set of tooling, and I think the language needs to shift from "I need a storage bucket," or "I need a storage account," or "I need a cluster to run my Spark jobs," to: here's the declaration of my data product. This is where the data will come from, this is the data that I want to serve, these are the policies that I need to apply, in terms of perhaps encryption or access control; go make it happen, platform, go provision everything that I need, so that as a data product developer, all I focus on is the data itself: the representation of the semantics and the representation of the syntax, and making sure that data meets the quality that I have to assure, and that it's available. The provisioning of everything that sits underneath will have to get taken care of by the platform. And that's what I mean when I say it requires a reimagination. And there will be a data platform team. The data platform teams that we set up for our clients in fact themselves have a fair bit of complexity internally; they divide into multiple teams, multiple planes. So there would be a plane, as in a group of capabilities, that satisfies the data product developer experience. There would be a set of capabilities that deal with those nitty-gritty underlying utilities, I call them (indistinct) utilities, because to me the level of abstraction of the platform needs to go higher than where it is. So what we call a platform today is a set of utilities we'll continue using: we'll continue using object storage, we'll continue using relational databases, and so on. So there will be a plane and a group of people responsible for that. There will be a group of people responsible for capabilities that enable the mesh-level functionality, for example, being able to correlate and connect and query data from multiple nodes, that's a mesh-level capability, or being able to discover and explore the mesh of data products, that's a mesh-level capability. So it would be a set of teams as part of the platform. So we use, again, strong product thinking, with product and ownership embedded into that, to satisfy the experience of these now business-oriented domain data teams. So we have a lot of work to do. >> I could go on, unfortunately we're out of time, but first of all, I want to tell people there's two pieces that you've put out so far. One is how to move beyond a monolithic data lake to a distributed data mesh; you guys should read that. And "Data Mesh Principles and Logical Architecture" is kind of part two. I guess my last question, in the very limited time we have, is: are organizations ready for this? >> I think the desire is there. I've been overwhelmed with the number of large and medium and small, and private and public, and government and federal organizations that have reached out to us globally. I mean, this is a global movement, and I'm humbled by the response of the industry. I think the desire is there, the pains are real, people acknowledge that something needs to change here. So that's the first step. I think awareness is spreading; organizations are more and more becoming aware. In fact, many technology providers are reaching out to us asking what shall we do, because our clients are asking us. People are already asking: we need the data mesh and we need the tooling to support it. So that awareness is there, in terms of the first step of being ready. However, the ingredients of a successful transformation require top-down and bottom-up support.
So it requires support from chief data analytics officers or above. The most successful clients that we have with data mesh are the ones where the CEOs have made a statement that, "We want to change the experience of every single customer using data, and we're going to commit to this." So the investment and support exists from the top through all layers, the engineers are excited, and perhaps the traditional data teams are open to change. So there are a lot of ingredients of transformation that have to come together. Are we really ready for it? I think the pioneers, perhaps the innovators, if you think about that innovation adoption curve; probably pioneers and innovators and early adopters are making moves towards it. And hopefully, as the technology becomes more available, organizations that are less engineering oriented, that don't have the capability in-house today but can buy it, would come next. Maybe those are the ones who aren't quite ready for it, because the technology is not readily available and requires internal investment today. >> I think you're right on. I think the leaders are going to lean in hard and they're going to show us the path over the next several years. And I think that the end of this decade is going to be defined a lot differently than the beginning. Zhamak, thanks so much for coming to "theCUBE" and participating in the program. >> Thank you for hosting me, David. >> Pleasure having you. >> It's been wonderful. >> All right, keep it right there everybody, we'll be back right after this short break. (slow music)
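To make the platform and governance ideas above concrete, here is a minimal sketch, in Python, of what a declarative data product might look like and how a federated rule could be checked computationally in a CICD pipeline. Everything in it, the spec fields, the SLO names, the thresholds, and the output address, is a hypothetical illustration, not the API of any real data mesh platform.

```python
# Hypothetical sketch only: the spec fields, SLO names, thresholds, and the
# validate() check are invented for illustration and are not the API of any
# real data mesh platform.
from dataclasses import dataclass, field


@dataclass
class DataProductSpec:
    """What a domain team declares; the platform provisions everything else."""
    name: str
    domain: str                                   # owning business domain
    output_port: str                              # where consumers read the data
    schema: dict                                  # column name -> type
    slos: dict = field(default_factory=dict)      # e.g. {"freshness_hours": 24}
    policies: list = field(default_factory=list)  # e.g. ["encrypt-at-rest"]


# Federated governance rules, codified once and enforced computationally in
# every data product's CICD pipeline (freshness is a ceiling, completeness
# is a floor).
GLOBAL_RULES = {"freshness_hours_max": 48, "completeness_pct_min": 95.0}


def validate(spec: DataProductSpec) -> list:
    """Return governance violations; an empty list means the product passes."""
    violations = []
    freshness = spec.slos.get("freshness_hours")
    if freshness is None or freshness > GLOBAL_RULES["freshness_hours_max"]:
        violations.append(f"{spec.name}: freshness SLO missing or too lax")
    completeness = spec.slos.get("completeness_pct")
    if completeness is None or completeness < GLOBAL_RULES["completeness_pct_min"]:
        violations.append(f"{spec.name}: completeness SLO missing or too low")
    return violations


orders = DataProductSpec(
    name="orders-daily",
    domain="ecommerce",
    output_port="mesh://ecommerce/orders-daily",  # hypothetical address
    schema={"order_id": "string", "total": "decimal", "placed_at": "timestamp"},
    slos={"freshness_hours": 24, "completeness_pct": 99.5},
    policies=["encrypt-at-rest", "mask-pii"],
)
print(validate(orders) or "PASS")
```

The division of labor is the point of the sketch: the domain team declares intent and SLOs, the platform provisions whatever sits underneath, and globally agreed rules are enforced automatically in every product's pipeline rather than by a central review step.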
SUMMARY :
Dave Vellante talks with Zhamak Dehghani, Director of Emerging Technologies at ThoughtWorks, about why centralized big data platforms fail to scale, and how data mesh reimagines them: decompose ownership around business domains, treat data as a product with published SLOs, provide a self-serve data platform, and govern through a federated, computational model. The conversation closes on readiness: pioneers and early adopters with top-down and bottom-up support are already making the move.
SENTIMENT ANALYSIS :
ENTITIES
Entity | Category | Confidence |
---|---|---|
Dave | PERSON | 0.99+ |
David | PERSON | 0.99+ |
Michael | PERSON | 0.99+ |
Marc Lemire | PERSON | 0.99+ |
Chris O'Brien | PERSON | 0.99+ |
Verizon | ORGANIZATION | 0.99+ |
Hilary | PERSON | 0.99+ |
Mark | PERSON | 0.99+ |
Dave Vellante | PERSON | 0.99+ |
Ildiko Vancsa | PERSON | 0.99+ |
John | PERSON | 0.99+ |
Alan Cohen | PERSON | 0.99+ |
Lisa Martin | PERSON | 0.99+ |
John Troyer | PERSON | 0.99+ |
Rajiv | PERSON | 0.99+ |
Europe | LOCATION | 0.99+ |
Stefan Renner | PERSON | 0.99+ |
Ildiko | PERSON | 0.99+ |
Mark Lohmeyer | PERSON | 0.99+ |
JJ Davis | PERSON | 0.99+ |
IBM | ORGANIZATION | 0.99+ |
Beth | PERSON | 0.99+ |
Jon Bakke | PERSON | 0.99+ |
John Farrier | PERSON | 0.99+ |
Boeing | ORGANIZATION | 0.99+ |
AWS | ORGANIZATION | 0.99+ |
Dave Nicholson | PERSON | 0.99+ |
Cassandra Garber | PERSON | 0.99+ |
Peter McKay | PERSON | 0.99+ |
Cisco | ORGANIZATION | 0.99+ |
Dave Brown | PERSON | 0.99+ |
Beth Cohen | PERSON | 0.99+ |
Stu Miniman | PERSON | 0.99+ |
John Walls | PERSON | 0.99+ |
Seth Dobrin | PERSON | 0.99+ |
Seattle | LOCATION | 0.99+ |
5 | QUANTITY | 0.99+ |
Hal Varian | PERSON | 0.99+ |
JJ | PERSON | 0.99+ |
Jen Saavedra | PERSON | 0.99+ |
Michael Loomis | PERSON | 0.99+ |
Lisa | PERSON | 0.99+ |
Jon | PERSON | 0.99+ |
Rajiv Ramaswami | PERSON | 0.99+ |
Stefan | PERSON | 0.99+ |
Jack Greenfield, Walmart | A Dive into Walmart's Retail Supercloud
>> Welcome back to SuperCloud2. This is Dave Vellante, and we're here with Jack Greenfield. He's the Vice President of Enterprise Architecture and the Chief Architect for the global technology platform at Walmart. Jack, I want to thank you for coming on the program. Really appreciate your time. >> Glad to be here, Dave. Thanks for inviting me and appreciate the opportunity to chat with you. >> Yeah, it's our pleasure. Now we call what you've built a SuperCloud. That's our term, not yours, but how would you describe the Walmart Cloud Native Platform? >> So WCNP, as the acronym goes, is essentially an implementation of Kubernetes for the Walmart ecosystem. And what that means is that we've taken Kubernetes off the shelf as open source, and we have integrated it with a number of foundational services that provide other aspects of our computational environment. So Kubernetes off the shelf doesn't do everything. It does a lot. In particular the orchestration of containers, but it delegates through API a lot of key functions. So for example, secret management, traffic management, there's a need for telemetry and observability at a scale beyond what you get from raw Kubernetes. That is to say, harvesting the metrics that are coming out of Kubernetes and processing them, storing them in time series databases, dashboarding them, and so on. There's also an angle to Kubernetes that gets a lot of attention in the daily DevOps routine, that's not really part of the open source deliverable itself, and that is the DevOps sort of CICD pipeline-oriented lifecycle. And that is something else that we've added and integrated nicely. And then one more piece of this picture is that within a Kubernetes cluster, there's a function that is critical to allowing services to discover each other and integrate with each other securely and with proper configuration provided by the concept of a service mesh. So Istio, Linkerd, these are examples of service mesh technologies. And we have gone ahead and integrated actually those two. There's more than those two, but we've integrated those two with Kubernetes. So the net effect is that when a developer within Walmart is going to build an application, they don't have to think about all those other capabilities where they come from or how they're provided. Those are already present, and the way the CICD pipelines are set up, it's already sort of in the picture, and there are configuration points that they can take advantage of in the primary YAML and a couple of other pieces of config that we supply where they can tune it. But at the end of the day, it offloads an awful lot of work for them, having to stand up and operate those services, fail them over properly, and make them robust. All of that's provided for. >> Yeah, you know, developers often complain they spend too much time wrangling and doing things that aren't productive. So I wonder if you could talk about the high level business goals of the initiative in terms of the hardcore benefits. Was the real impetus to tap into best of breed cloud services? Were you trying to cut costs? Maybe gain negotiating leverage with the cloud guys? Resiliency, you know, I know was a major theme. Maybe you could give us a sense of kind of the anatomy of the decision making process that went in. >> Sure, and in the course of answering your question, I think I'm going to introduce the concept of our triplet architecture which we haven't yet touched on in the interview here. 
First off, just to sort of wrap up the motivation for WCNP itself which is kind of orthogonal to the triplet architecture. It can exist with or without it. Currently does exist with it, which is key, and I'll get to that in a moment. The key drivers, business drivers for WCNP were developer productivity by offloading the kinds of concerns that we've just discussed. Number two, improving resiliency, that is to say reducing opportunity for human error. One of the challenges you tend to run into in a large enterprise is what we call snowflakes, lots of gratuitously different workloads, projects, configurations to the extent that by developing and using WCNP and continuing to evolve it as we have, we end up with cookie cutter like consistency across our workloads which is super valuable when it comes to building tools or building services to automate operations that would otherwise be manual. When everything is pretty much done the same way, that becomes much simpler. Another key motivation for WCNP was the ability to abstract from the underlying cloud provider. And this is going to lead to a discussion of our triplet architecture. At the end of the day, when one works directly with an underlying cloud provider, one ends up taking a lot of dependencies on that particular cloud provider. Those dependencies can be valuable. For example, there are best of breed services like say Cloud Spanner offered by Google or say Cosmos DB offered by Microsoft that one wants to use and one is willing to take the dependency on the cloud provider to get that functionality because it's unique and valuable. On the other hand, one doesn't want to take dependencies on a cloud provider that don't add a lot of value. And with Kubernetes, we have the opportunity, and this is a large part of how Kubernetes was designed and why it is the way it is, we have the opportunity to sort of abstract from the underlying cloud provider for stateless workloads on compute. And so what this lets us do is build container-based applications that can run without change on different cloud provider infrastructure. So the same applications can run on WCNP over Azure, WCNP over GCP, or WCNP over the Walmart private cloud. And we have a private cloud. Our private cloud is OpenStack based and it gives us some significant cost advantages as well as control advantages. So to your point, in terms of business motivation, there's a key cost driver here, which is that we can use our own private cloud when it's advantageous and then use the public cloud provider capabilities when we need to. A key place with this comes into play is with elasticity. So while the private cloud is much more cost effective for us to run and use, it isn't as elastic as what the cloud providers offer, right? We don't have essentially unlimited scale. We have large scale, but the public cloud providers are elastic in the extreme which is a very powerful capability. So what we're able to do is burst, and we use this term bursting workloads into the public cloud from the private cloud to take advantage of the elasticity they offer and then fall back into the private cloud when the traffic load diminishes to the point where we don't need that elastic capability, elastic capacity at low cost. And this is a very important paradigm that I think is going to be very commonplace ultimately as the industry evolves. Private cloud is easier to operate and less expensive, and yet the public cloud provider capabilities are difficult to match. 
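As a rough illustration of the bursting behavior described above, here is a minimal Python sketch of a placement decision that prefers the cheaper private cloud and spills into a public provider under load. The thresholds, the hysteresis band, and the idea that a single utilization number drives the choice are all simplifying assumptions; Walmart's actual control logic is not public.

```python
# Toy model: thresholds and the single-utilization-number signal are
# assumptions; Walmart's actual control logic is not public.

BURST_THRESHOLD = 0.80    # spill into the public cloud above 80% utilization
RECLAIM_THRESHOLD = 0.60  # pull work back only once below 60%


def place_workload(private_utilization: float, bursting: bool) -> str:
    """Decide where the next unit of work should run."""
    if private_utilization >= BURST_THRESHOLD:
        return "public"   # elastic capacity, higher cost
    if bursting and private_utilization > RECLAIM_THRESHOLD:
        return "public"   # hysteresis: avoid flapping between clouds
    return "private"      # cheaper, but finite capacity


bursting = False
for util in (0.50, 0.85, 0.70, 0.55):
    target = place_workload(util, bursting)
    bursting = (target == "public")
    print(f"utilization={util:.0%} -> route new work to the {target} cloud")
```

The hysteresis band (burst above 80%, reclaim only below 60%) is one common way to keep workloads from flapping back and forth between clouds as load hovers near a single threshold.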
>> And the triplet, the "tri," is your on-prem private cloud and the two public clouds that you mentioned, is that right? >> That is correct. And we actually have an architecture in which we operate all three of those cloud platforms in close proximity with one another in three different major regions in the US. So we have east, west, and central. And in each of those regions, we have all three cloud providers. And the way it's configured, those data centers are within 10 milliseconds of each other, meaning that it's of negligible cost to interact between them. And this allows us to be fairly agnostic to where a particular workload is running. >> Does a human make that decision, Jack, or is there some intelligence in the system that determines that? >> That's a really great question, Dave. And it's a great question because we're at the cusp of that transition. So currently humans make that decision. Humans choose to deploy workloads into a particular region and a particular provider within that region. That said, we're actively developing patterns and practices that will allow us to automate the placement of the workloads for a variety of criteria. For example, if in a particular region a particular provider is heavily overloaded and is unable to provide the level of service that's expected through our SLAs, we could choose to fail workloads over from that cloud provider to a different one within the same region. But that's manual today. We do that, but people do it. Okay, we'd like to get to where that happens automatically. In the same way, we'd like to be able to automate the failovers, both for high availability and sort of the heavier disaster recovery model: within a region between providers, and even within a provider between the availability zones that are there, but also between regions for the sort of heavier disaster recovery or maintenance-driven realignment of workload placement. Today, that's all manual. So we have people moving workloads from region A to region B, or data center A to data center B. It's clean because of the abstraction. The workloads don't have to know or care, but there are latency considerations that come into play, and the humans have to be cognizant of those. And automating that can help ensure that we get the best performance and the best reliability. >> But you're developing the dataset to actually, I would imagine, be able to make those decisions in an automated fashion over time anyway. Is that a fair assumption? >> It is, and that's what we're actively developing right now. So if you were to look at us today, we have these nice abstractions and APIs in place, but people run that machine, if you will. We're moving toward a world where that machine is fully automated. >> What exactly are you abstracting? Is it sort of the deployment model? Or, you know, are you able to abstract, I'm just making this up, like Azure functions and GCP functions, so that you can sort of run them, you know, with a consistent experience? What exactly are you abstracting, and how difficult was it to achieve that objective technically? >> That's a good question. What we're abstracting is the Kubernetes node construct. That is to say, a cluster of Kubernetes nodes, which are typically VMs, although they can run bare metal in certain contexts, is something that typically requires knowledge of the underlying cloud provider to stand up. So for example, with GCP, you would use GKE to set up a Kubernetes cluster, and in Azure, you'd use AKS.
We are actually abstracting that aspect of things, so that the developers standing up applications don't have to know what the underlying cluster management provider is. They don't have to know if it's GKE, AKS, or our own Walmart private cloud. Now, in terms of functions, like the Azure functions that you've mentioned there, we haven't done that yet. That's another piece that we have sort of on our radar screen that we'd like to get to, a serverless approach; the Knative work from Google and the Azure functions, those are things that we see good opportunity to use for a whole variety of use cases. But right now we're not doing much with that. We're strictly container based right now, and we do have some VMs that are running in sort of more of a traditional model. So our stateful workloads are primarily VM based, but for serverless, that's an opportunity for us to take some of these stateless workloads and turn them into cloud functions. >> Well, and that's another cost lever that you can pull down the road that's going to drop right to the bottom line. Do you see a day, or maybe you're doing it today, but I'd be surprised, where you build applications that actually span multiple clouds? Or is there, in your view, always going to be a direct one-to-one mapping between where an application runs and the specific cloud platform? >> That's a really great question. Well, yes and no. So today, application development teams choose a cloud provider to deploy to and a location to deploy to, and they have to get involved in moving an application, like we talked about today. That said, the bursting capability that I mentioned previously is something that is a step in the direction of automatic migration. That is to say, we're migrating workloads to different locations automatically. Currently, the prototypes we've been developing, and that we think are going to eventually make their way into production, are leveraging Istio to assess the load incoming on a particular cluster and start shedding that load into a different location. Right now, the configuration of that is still manual, but there's another opportunity for automation there. And I think a key piece of this is that down the road, well, that's a sort of small step in the direction of an application being multi-provider. We expect to see really an abstraction of the fact that there is a triplet, even. So the workloads are moving around according to whatever the control plane decides is necessary, based on a whole variety of inputs. And at that point, you will have true multi-cloud applications, applications that are distributed across the different providers, in a way that application developers don't have to think about. >> So Walmart's been a leader, Jack, in using data for competitive advantage for decades. It's kind of been a poster child for that. You've got a mountain of IP in the form of data, tools, applications, best practices that until the cloud came out was all on-prem. But I'm really interested in this idea of building a Walmart ecosystem, which obviously you have. Do you see a day, or maybe you're even doing it today, where you take what we call the Walmart SuperCloud, WCNP in your words, and point or turn that toward an external world or your ecosystem, you know, supporting those partners or customers, that could drive new revenue streams, you know, directly from the platform? >> Great questions, Dave. So there's really two things to say here. The first is that with respect to data, our data workloads are primarily VM based.
I've mentioned before, some VMware, some straight OpenStack. But the key here is that WCNP and Kubernetes are very powerful for stateless workloads, but stateful workloads are still climbing a bit of a growth curve in the industry. So our data workloads are not primarily based on WCNP; they're VM based. Now that said, there is opportunity to make some progress there, and we are looking at ways to move things into containers that are currently running in VMs which are stateful. The other question you asked is related to how we expose data to third parties, and also functionality. Right now we do have in-house, for our own use, a very robust data architecture, and we have followed the sort of domain-oriented data architecture guidance from Martin Fowler. And we have data lakes in which we collect data from all the transactional systems, which we can then use, and do use, to build models which are then used in our applications. But right now we're not exposing the data directly to customers as a product. That's an interesting direction that's been talked about and may happen at some point, but right now that's internal. What we are exposing to customers is applications. So we're offering our global integrated fulfillment capabilities, our order picking and curbside pickup capabilities, and our cloud-powered checkout capabilities to third parties. And this means we're standing up our own internal applications as externally facing SaaS applications which can serve our partners' customers.
And what that allows us to do is devote whatever bandwidth we do have to those pieces of the process that are most latency sensitive, and allow the queue lengths to increase in parts of the process that are not latency sensitive, knowing that they will eventually catch up when the bandwidth is restored. And to put that in a little bit of context, we have fiber links to all of our locations, and we have, I'll just use a round number, 10-ish thousand locations. It's larger than that, but that's the ballpark, and we have fiber to all of them. But when the fiber is disconnected, we're able to fall back to 5G and to Starlink. Starlink is preferred; it's higher bandwidth. 5G if that fails. But in each of those cases, the bandwidth drops significantly. And so the applications have to be intelligent about throttling back the traffic that isn't essential, so that they can push the essential traffic in those lower bandwidth scenarios. >> So much technology to support this amazing business, which started in the early 1960s. Jack, unfortunately we're out of time. I would love to have you back, or some members of your team, and drill into how you're using open source, but really thank you so much for explaining the approach that you've taken and participating in SuperCloud2. >> You're very welcome, Dave, and we're happy to come back and talk about other aspects of what we do. For example, we could talk more about the data lakes and the data mesh that we have in place. We could talk more about the directions we might go with serverless. So please look us up again. Happy to chat. >> I'm going to take you up on that, Jack. All right. This is Dave Vellante for John Furrier and theCUBE community. Keep it right there for more action from SuperCloud2. (upbeat music)
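The queue-per-step, bandwidth-aware pattern Jack outlines can be sketched in a few lines of Python. This is a toy model under stated assumptions: the step names, the per-tick message budget, and the strict essential-first ordering are invented for illustration, not taken from Walmart's systems.

```python
# Toy model: step names, the per-tick budget, and strict essential-first
# ordering are invented for illustration, not taken from Walmart's systems.
from collections import deque


class StepQueue:
    """One queue per step of a multi-step business process."""
    def __init__(self, name: str, latency_sensitive: bool):
        self.name = name
        self.latency_sensitive = latency_sensitive
        self.items = deque()


def drain(queues, bandwidth_budget: int) -> None:
    """Spend a per-tick message budget, essential traffic first; everything
    else accumulates and catches up when the link is restored."""
    for q in sorted(queues, key=lambda q: not q.latency_sensitive):
        while q.items and bandwidth_budget > 0:
            q.items.popleft()  # send upstream
            bandwidth_budget -= 1


checkout = StepQueue("checkout-auth", latency_sensitive=True)
inventory = StepQueue("inventory-sync", latency_sensitive=False)
for i in range(5):
    checkout.items.append(f"auth-{i}")
    inventory.items.append(f"delta-{i}")

drain([checkout, inventory], bandwidth_budget=6)  # degraded link (5G/Starlink)
print(len(checkout.items), "checkout messages pending")    # 0: essentials sent
print(len(inventory.items), "inventory messages pending")  # 4: will catch up
```

Under a degraded link, the latency-sensitive queue drains first and the non-essential queue simply grows, which matches the described behavior of letting queue lengths increase and catch up once fiber is restored.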
SUMMARY :
Dave Vellante interviews Jack Greenfield, Chief Architect for the global technology platform at Walmart, about the Walmart Cloud Native Platform (WCNP): a Kubernetes-based abstraction, integrated with secret management, traffic management, telemetry, CICD, and a service mesh, that runs workloads unchanged across Azure, GCP, and Walmart's OpenStack private cloud in a triplet architecture. They discuss bursting into the public clouds for elasticity, today's manual and tomorrow's automated workload placement and failover, Walmart's VM-based data workloads and domain-oriented data architecture, exposing fulfillment, pickup, and checkout capabilities to partners as SaaS, and edge patterns (offline/online synchronization and queue-per-step throttling) for its roughly ten thousand locations.
SENTIMENT ANALYSIS :
ENTITIES
Entity | Category | Confidence |
---|---|---|
Dave Vellante | PERSON | 0.99+ |
Jack Greenfield | PERSON | 0.99+ |
Dave | PERSON | 0.99+ |
Jack | PERSON | 0.99+ |
Microsoft | ORGANIZATION | 0.99+ |
Martin Fowler | PERSON | 0.99+ |
Walmart | ORGANIZATION | 0.99+ |
US | LOCATION | 0.99+ |
Zhamak Dehghani | PERSON | 0.99+ |
Today | DATE | 0.99+ |
each | QUANTITY | 0.99+ |
One | QUANTITY | 0.99+ |
two | QUANTITY | 0.99+ |
 | ORGANIZATION | 0.99+ |
today | DATE | 0.99+ |
two things | QUANTITY | 0.99+ |
three | QUANTITY | 0.99+ |
first | QUANTITY | 0.99+ |
each step | QUANTITY | 0.99+ |
First | QUANTITY | 0.99+ |
early 1960s | DATE | 0.99+ |
Starlink | ORGANIZATION | 0.99+ |
one | QUANTITY | 0.98+ |
a day | QUANTITY | 0.97+ |
GCP | TITLE | 0.97+ |
Azure | TITLE | 0.96+ |
WCNP | TITLE | 0.96+ |
10 milliseconds | QUANTITY | 0.96+ |
both | QUANTITY | 0.96+ |
Kubernetes | TITLE | 0.94+ |
Cloud Spanner | TITLE | 0.94+ |
Linkerd | ORGANIZATION | 0.93+ |
triplet | QUANTITY | 0.92+ |
three cloud providers | QUANTITY | 0.91+ |
Cube | ORGANIZATION | 0.9+ |
SuperCloud2 | ORGANIZATION | 0.89+ |
two core sets | QUANTITY | 0.88+ |
John Furrier | PERSON | 0.88+ |
one more piece | QUANTITY | 0.86+ |
two public clouds | QUANTITY | 0.86+ |
thousand locations | QUANTITY | 0.83+ |
Vice President | PERSON | 0.8+ |
10-ish | QUANTITY | 0.79+ |
WCNP | ORGANIZATION | 0.75+ |
decades | QUANTITY | 0.75+ |
three different major regions | QUANTITY | 0.74+ |
Welcome to Supercloud2
(bright upbeat melody) >> Hello everyone, welcome back to Supercloud2. I'm John Furrier, with my co-host Dave Vellante, here at theCUBE in Palo Alto, California, for our live stage performance all day for Supercloud2, unpacking this next generation movement in cloud computing. Dave, Supercloud1 was in August. We had great response and acceleration of that momentum. We had some haters too. We had some folks out there throwing shade on this. But at the same time, a lot of leaders came out of the woodwork, a lot of practitioners. And this Supercloud2 event, I think, will expose and illustrate some of the examples of what's happening in the industry and, more importantly, kind of where it's going. >> Well it's great to be back in our studios in Palo Alto, John. Seems like just yesterday was August 9th, when the community was really refining the definition of Supercloud. We were identifying the essential characteristics with some of the leading technologists in Silicon Valley. We were digging into the deployment models. Whereas this Supercloud, Supercloud2, is really taking a practitioner view. We're going to hear from Walmart today. They've built a Supercloud. They called it the Walmart Cloud Native Platform. We're going to hear from other data practitioners, like Saks. We're going to hear from Western Union. They've got 200 locations around the world; how they're dealing with data sovereignty. And of course we've got some local technologists and practitioners coming in, analysts, consultants, theCUBE community. I'm really excited to be here. >> And we've got some great keynotes from executives at VMware. We're going to expose some of the things that they're working on around cross-cloud services, which leads into multicloud. I think the practitioner angle highlights my favorite part of this program, 'cause you're starting to see the builders, a term coined by Andy Jassy in the early days of AWS. That builder movement has been continuing to go. And you're seeing the enterprise, global enterprises, adopt this builder mentality with cloud native. This is going to power the next generation global economy. And I think the role of the cloud computing vendors like AWS, Azure, Google, Alibaba is going to be the source engine of innovation. And what gets built on top of and with the clouds will be a big significant market value for all businesses and their business models. So I think the market wants the supercloud, the business models are pointing to supercloud. The technology needs supercloud. And society, from an economic standpoint and from a use case standpoint, needs supercloud. You're seeing it today. Everyone's talking about ChatGPT. This is an example of what will come out of this next generation, and it's just getting started. So to me, you're either on the supercloud side of the camp, or you're on the old school side, hugging onto the old school mentality of "wait a minute, that's cloud computing." So I think if you're not on the supercloud wave, you're going to be driftwood, and that's a term coined by Pat Gelsinger. And this is really the reality. Are you on the supercloud side? Or are you huggin' the old model? And that's going to be a determinant. And you're going to see who the players are going to be on that, Dave. This is going to be a real big year. >> Everybody's heard the phrase "follow the money." Well, my philosophy is follow the data. And that's a big part of what Supercloud2 is, because the data is where the money is across the clouds.
And people want more simplicity, or greater simplicity, across the clouds. So really there are two forces here. You've got the ecosystem that's saying, hey, the hyperscalers have done a great job, but there are problems that they're not solving, so we're going to lean in and solve those problems. At the same time, you have the practitioners saying: we have multicloud, we have to deal with this, help us, it's got to be simpler, because we want to share data across clouds, we want to build data products, we want to monetize and drive revenue and cut costs. >> This is the key thing. The builder movement is hitting a wall, and that wall will be broken down because the business models of the companies themselves are demanding that the value from the data, with security, has to be embedded. So I think you're going to see a big year this next year or so where the builders accelerate through this next generation. The Supercloud wave will be a builder's wave for business, and I think that's going to be the nuance here. And all the people that are on the side of Supercloud are pro-business, pro-technology. The ones that aren't are like, "wait a minute, I used to do things differently." They're stuck. And so I think this is going to be a question of: are we stuck? Are builders accelerating? Will the business models develop around it? That's digital transformation. At the end of the day, the market's speaking, Dave. The market wants more. ChatGPT, you're seeing AI starting to flourish, powered by data. It's unstoppable, Supercloud's unstoppable. >> One of our headliners today is Zhamak Dehghani, the creator of data mesh. We've got some news around her. She's going to be live in studio. Super excited about that. Kit Colbert, at the first Supercloud last August, laid out an initial architecture for Supercloud. He's going to advance that today, tell us what's changed, and really dig into the meat on the bone, if you will. And we've got some other technologists coming in saying: hey, is it a platform? Is it an architecture? What's the right model here? So we're going to debate that a little bit today. >> And before we close, I'll just say look at the guests, look at the talk tracks. You're seeing a diversity of startups doing cloud networking, you're seeing big practitioners building their own thing, being builders for business value and business model advantages. And you've got companies like VMware, who have been on the wave of virtualization. So everyone who's involved in Supercloud, they're seeing it, they're on the front lines. They're seeing the trend, they are riding that wave, and they're bringing data to the table. So to me, you look at who's involved and you judge it that way. To me, that's the way I look at this. And because we're making it open, Supercloud is going to continue to be debated. But more importantly, the results are going to come in. The market supports it, the business needs it, tech's there, and will it happen? So I think the builder movement, Dave, is going to be big to watch, and then ultimately how that business transformation kicks in. I think those are the two variables that I would watch on Supercloud. >> Our mission has always been around free content, giving back to the community. So I really want to thank our sponsors today. We've had a great partnership with VMware, who's not only contributed some financial support, but also great content.
Alkira, ChaosSearch, prosimo, all phenomenal, allowing us to achieve our mission of serving our audiences and really trying to give more than we take. >> Free content, that's our mission. Dave, great to kick it off. Kickin' off Supercloud2 all day, we've got some great programs here. We've got VMware coming up next. We have Victoria Viering, who's been on before, with a great vision for cross-cloud services. We're also getting a keynote with Kit Colbert, who's going to lay out the fragmentation problem and the benefits of solving it: breaking down the silos and bringing a multicloud future to the table via Supercloud. So stay with us. We'll be right back after this short break. (bright upbeat music) (music fades)
SENTIMENT ANALYSIS :
ENTITIES
Entity | Category | Confidence |
---|---|---|
Dave Vellante | PERSON | 0.99+ |
Dave | PERSON | 0.99+ |
Pat Gelsinger | PERSON | 0.99+ |
Alibaba | ORGANIZATION | 0.99+ |
Kit Colbert | PERSON | 0.99+ |
Zhamak Dehghani | PERSON | 0.99+ |
Walmart | ORGANIZATION | 0.99+ |
AWS | ORGANIZATION | 0.99+ |
Andy Jassy | PERSON | 0.99+ |
ORGANIZATION | 0.99+ | |
Silicon Valley | LOCATION | 0.99+ |
August | DATE | 0.99+ |
Victoria Viering | PERSON | 0.99+ |
August 9th | DATE | 0.99+ |
John Furrier | PERSON | 0.99+ |
200 locations | QUANTITY | 0.99+ |
VMware | ORGANIZATION | 0.99+ |
Supercloud | ORGANIZATION | 0.99+ |
Palo Alto, California | LOCATION | 0.99+ |
Supercloud2 | EVENT | 0.99+ |
two forces | QUANTITY | 0.99+ |
last August | DATE | 0.99+ |
yesterday | DATE | 0.99+ |
first | QUANTITY | 0.99+ |
two variables | QUANTITY | 0.99+ |
today | DATE | 0.98+ |
One | QUANTITY | 0.98+ |
supercloud | ORGANIZATION | 0.98+ |
Azure | ORGANIZATION | 0.97+ |
ChaosSearch | ORGANIZATION | 0.95+ |
super cloud wave | EVENT | 0.94+ |
Supercloud1 | EVENT | 0.94+ |
Super Cloud | TITLE | 0.93+ |
Alkira | PERSON | 0.83+ |
Palo Alto, John | LOCATION | 0.83+ |
this next year | DATE | 0.81+ |
Data Mesh | ORGANIZATION | 0.8+ |
supercloud wave | EVENT | 0.79+ |
wave of | EVENT | 0.79+ |
Western Union | LOCATION | 0.78+ |
Saks | ORGANIZATION | 0.76+ |
GPT | ORGANIZATION | 0.73+ |
Supercloud2 | ORGANIZATION | 0.72+ |
Cloud Native | TITLE | 0.69+ |
Supercloud | TITLE | 0.67+ |
Supercloud2 | COMMERCIAL_ITEM | 0.66+ |
multicloud | ORGANIZATION | 0.57+ |
Supercloud | COMMERCIAL_ITEM | 0.53+ |
Supercloud2 | TITLE | 0.53+ |
theCUBE | ORGANIZATION | 0.51+ |
super cloud | TITLE | 0.51+ |
Cloud | TITLE | 0.41+ |
Is Data Mesh the Killer App for Supercloud | Supercloud2
(gentle bright music) >> Okay, welcome back to our "Supercloud 2" event live coverage, here at our stage performance in Palo Alto, syndicating around the world. I'm John Furrier with Dave Vellante. We've got exclusive news and a scoop here for SiliconANGLE and theCUBE. Zhamak Dehghani, creator of data mesh, has formed a new company called Nextdata (Nextdata.com). She's a CUBE alumni and contributor to our Supercloud initiative, as well as our coverage and Breaking Analysis with Dave Vellante on data, the killer app for Supercloud. Zhamak, great to see you. Thank you for coming into the studio, and congratulations on your newly formed venture and continued success on the data mesh. >> Thank you so much. It's great to be here. Great to see you in person. >> Dave: Yeah, finally. >> John: Wonderful. Your contributions to the data conversation have been well-documented, certainly by us and others in the industry. Data mesh is taking the world by storm. Some people are debating it, throwing, you know, cold water on it. Some think it's the next big thing. Tell us about the data mesh super data apps that are emerging out of cloud. >> I mean, data mesh, as you said, you know, the pain points that it surfaced were universal. Everybody said, "Oh, why didn't I think of that?" You know, it was just an obvious next step, and people are approaching it, implementing it. Over the last few years, I've been involved in many of those implementations, and I guess Supercloud is somewhat a prerequisite for it, because data mesh, and building applications using data mesh, is about sharing data responsibly across boundaries. And those boundaries include organizational boundaries, cloud technology boundaries, and trust boundaries. >> I want to bring that up because your venture, Nextdata, which is new, just formed. Tell us about that. What wave is that riding? What specifically are you targeting? What's the pain point? >> Zhamak: Absolutely, yes. So Nextdata is the result of, I suppose, the pains that I suffered from implementing data mesh for many of the organizations. Basically, a lot of organizations that I've worked with, they want decentralized data. So they really embrace this idea of decentralized ownership of the data, but yet they want interconnectivity through standard APIs, yet they want discoverability and governance. So they want to have policies implemented, they want to govern that data, they want to be able to discover that data, and yet they want to decentralize it. And we do that with a developer experience that is easy and native to a generalist developer. So we try to find, I guess, the common denominator that solves those problems and enables that developer experience for data sharing. >> John: Since you just announced the news, what's been the reaction? >> Zhamak: I just announced the news right now, so what's the reaction? >> John: But people in the industry that know you, you did a lot of work in the area. What has been some of the feedback on the new venture in terms of the approach, the customers, the problem? >> Yeah, so we've been in stealth mode, so we haven't publicly talked about it, but folks that have been close to us in fact have reached out. We already have implementations of our pilot platform with early customers, which is super exciting, and we're going to have multiple of those. Of course, we're a tiny, tiny company, we can't have many of those, but we are going to have multiple pilot implementations of our platform in the real world,
with real global large-scale organizations that have real-world problems. So we're not going to build our platform in a vacuum. And that's what's happening right now. >> Dave: Zhamak, when I think about your role at ThoughtWorks, you had a very wide observation space, with a number of clients, helping them implement data mesh and other things as well prior to your data mesh initiative. But when I look at data mesh implementations, at least the ones that I've seen, they're very narrow. I think of JPMC, I think of HelloFresh. They're generally, obviously not surprising, they don't include the big vision of inclusivity across clouds, across different data stores. But it seems like people are having to go through some gymnastics to get to, you know, the organizational reality of decentralizing data, and at least pushing data ownership to the line of business. How are you approaching, or are you approaching, solving that problem? Are you taking a narrow slice? What can you tell us about Nextdata? >> Zhamak: Sure, yeah, absolutely. Gymnastics, the cute word to describe what the organizations have to go through. And one of those problems is that, you know, the data resides on different platforms, it's owned by different people, it's processed by pipelines that, who knows who owns them. So there's this very disparate and disconnected set of technologies that were very useful for when we thought about data and processing as a centralized problem. But when you think about data as a decentralized problem, the cost of integration of these technologies in a cohesive developer experience is what's missing. And we want to focus on that cohesive end-to-end developer experience to share data responsibly in these autonomous units, we call them data products, I guess, in data mesh, right? That constitutes computation, the policies that govern that data, discoverability. So I guess, I heard this expression in the last talks, that you can have your cake and eat it too. So we want people to have their cake, which is, you know, data in different places, decentralization, and eat it too, which is interconnected access to it. So we start with standardizing and codifying this idea of a data product container that encapsulates data, computation, and APIs to get to it, in a technology-agnostic way, in an open way. And then sit on top and use existing tech, you know, Snowflake, Databricks, whatever exists, you know, the millions of dollars of investments that companies have made; sit on top of those but create this cohesive, integrated experience where the data product is a first-class primitive. And that's really key here. The language and the modeling that we use is really native to data mesh, which is that I'm building a data product, I'm sharing a data product, and that encapsulates: I'm providing metadata about this, I'm providing computation that's constantly changing the data, I'm providing the API for that. So we're trying to kind of codify and create a new developer experience based on that. And developers, both from the provider side and the user side, connect to peer-to-peer data sharing with the data product as a primitive first-class concept. >> Okay, so the idea would be developers would build applications leveraging those data products, which are discoverable and governed. Now, today you see some companies, you know, take Snowflake for example. >> Zhamak: Yeah. >> Attempting to do that within their own little walled garden. They even, at one point, used the term "mesh." I dunno if they pulled back on that.
And then they sort of became aware of some of your work. But a lot of the things that they're doing within their little insulated environment, you know, support that governance, and they're building out an ecosystem. What's different in your vision? >> Exactly. So we realized that, you know, and this is a reality: you go to organizations, they have a Snowflake, and half of the organization happily operates on Snowflake, and the other half says, oh, we are on, you know, bare infrastructure on AWS, or we are on Databricks. This is the reality. You know, this Supercloud that's written up here, it's about working across boundaries of technology, so we try to embrace that. And even for our own technology, with the way we're building it, we say, "Okay, nobody's going to use only Nextdata's data mesh operating system. People will have different platforms." So you have to build with openness in mind. And in the case of Snowflake, I think, you know, they have, I'm sure, very happy customers, as long as customers can be on Snowflake. But once you cross that boundary of platforms, then that becomes a problem. And we try to keep that in mind in our solution. >> So it's worth reviewing that, basically, the concept of data mesh is that, whether you're a data lake or a data warehouse, an S3 bucket, an Oracle database as well, they should all be inclusive inside of the data mesh. >> We did a session with AWS on the startup showcase, data as code. And remember, I wrote a blog post in 2007 called "Data's the New Developer Kit." Back then, they used to call 'em developer kits, if you remember. And we said at that time, whoever can code data >> Zhamak: Yes. >> Will have a competitive advantage. >> Aren't machines going to be doing that? Didn't we just hear that? >> Well we have, and you know: hey Siri, hey Cube, find me the best video for data mesh. There it is. I mean, this is the point: what's happening is that now data has to be addressable >> Zhamak: Yes. >> For machines and for coding. >> Zhamak: Yes. >> Because you need to call the data. So the question is, how do you manage the complexity of making data as promiscuous as possible, making it available, as well as then governing it? Because it's a trade-off. The more you make open >> Zhamak: Definitely. >> The better the machine learning. >> Zhamak: Yes. >> But yet, the governance issue. So this is the, you need an OS to handle this, maybe. >> Yes. Well, our mental model for our platform is an OS, an operating system. Operating systems, you know, have shown us how you can kind of abstract what's complex and take care of, you know, a lot of complexities, but yet provide an open and, you know, dynamic enough interface. So we think about it that way. We try to make the policies live with the data. Enforcement of the policies happens at the most granular level, which is, in this concept, the data product. And that would happen whether you read, write, or access a data product. But we can never imagine what all these policies could be. So our thinking is, okay, we should have an open policy framework that allows organizations to write their own policy drivers and policy definitions, and encode them and encapsulate them in this data product container.
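A minimal sketch may make the container idea concrete: data, computation, an API, and organization-supplied policies packaged as one unit, with the policies enforced on every read. Everything below, the class shape, field names, and policy signature, is an illustrative assumption for this sketch, not Nextdata's actual interface.

```python
# Sketch of a "data product container": data, computation, metadata, and
# policies travel together, and policies are enforced at the most granular
# level, the data product itself, on every access.
from dataclasses import dataclass, field
from typing import Any, Callable

Policy = Callable[[dict], bool]  # takes an access context, returns allow/deny


@dataclass
class DataProduct:
    name: str                         # e.g. "sales.orders" (hypothetical)
    domain: str                       # the owning business domain
    metadata: dict                    # schema, semantics, lineage, owner...
    transform: Callable[[Any], Any]   # the computation that maintains the data
    policies: list[Policy] = field(default_factory=list)
    _data: Any = None

    def publish(self, raw: Any) -> None:
        # The owning domain runs its own computation to produce the data.
        self._data = self.transform(raw)

    def read(self, context: dict) -> Any:
        # Policies live with the product and are checked on every read.
        for policy in self.policies:
            if not policy(context):
                raise PermissionError(
                    f"{self.name}: access denied for {context.get('who')}")
        return self._data


# An organization-supplied policy "driver" (illustrative).
def purpose_policy(context: dict) -> bool:
    return context.get("purpose") in {"analytics", "ml-training"}


orders = DataProduct(
    name="sales.orders",
    domain="sales",
    metadata={"schema": ["order_id", "amount"], "owner": "sales-team"},
    transform=lambda rows: [r for r in rows if r["amount"] > 0],
    policies=[purpose_policy],
)
orders.publish([{"order_id": 1, "amount": 42.0}, {"order_id": 2, "amount": -1.0}])
print(orders.read({"who": "analyst@example.com", "purpose": "analytics"}))
```

The policy here is just a callable the owner plugs in, which is one way to read the "open policy framework" idea: the container enforces whatever drivers its owner registers, without the platform prescribing what those policies are.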
But I'm not going to fool myself to say that, you know, that's going to solve the problem that you just described. I think we are in this, I don't know, if I look into my crystal ball, what I think might happen is that right now, the primitives that we work with to train machine-learning models are still bits and bytes of data. They're fields, rows, columns, right? And that creates quite a large surface area, an attack area, for, you know, the privacy of the data. So perhaps one of the trends that we might see is this evolution of data APIs to become more and more computationally aware, to bring the compute to the data, to reduce that surface area, so you can really leave the control of the data to the sovereign owners of that data, right? That data product. So I think the evolution of our data APIs perhaps will become more and more computational: you describe what you want, and the data owner decides, you know, how to manage the- >> John: That's interesting, Dave, 'cause it's almost like ChatGPT, which we just talked about in the last segment; machine learning has been around the industry for a while. It's almost as if you're starting to see reason come into the data, reasoning. You're starting to see not just metadata, but using the data to reason, so that you don't have to expose the raw data. It's almost like a, I won't say curation layer, but an intelligence layer. >> Zhamak: Exactly. >> Can you share your vision on that? 'Cause that seems to be where the dots are connecting. >> Zhamak: Yes, this is perhaps further into the future, because just from where we stand, we still have to create that bridge of familiarity between that future and the present. So we are still in that bridge-making mode. However, just the basic notion of saying "I'm going to put an API in front of my data," and that API today might be as primitive as a level of indirection, as in: you tell me what you want, tell me who you are, let me go process that, all the policies and lineage, and insert all of this intelligence that needs to happen, and then today, I will still give you a file. But by just defining that API and standardizing it, now we have this amazing extension point where we can say, "Well, in the next revision of this API, you not just tell me who you are, but you actually tell me what intelligence you're after. What's the logic that I need to go and now compute on your API?" And you can kind of evolve that, right? Now you have a point of evolution to this very futuristic, I guess, future, where you just describe the question that you're asking, like from the chat. >> Well, this is the Supercloud, Dave. >> I have a question from a fan, I got to get it in. It's George Gilbert. And so his question is: you're blowing away the way we synchronize data from operational systems to the data stack to applications. So the concern that he has, and he wants your feedback on this, is that data product app devs get exposed to more complexity with respect to moving data between data products, or maybe it's attributes between data products. How do you respond to that? How do you see it? Is that a problem, or is that something that is overstated, or do you have an answer for that? >> Zhamak: Absolutely. So I think there's a sweet spot in getting data developers, data product developers, closer to the app, but yet not burdening them with the complexity of the application and application logic, and yet reducing their cognitive load by localizing what they need to know about, which is that domain where they're operating within. Because what's happening right now?
What's happening right now is that data engineers, and a ton of empathy for them, for the high threshold of pain that they can, you know, deal with, they have been centralized, they've been put into the data team, and they have been given this unbelievable task of making meaning out of data: put semantics over it, curate it, clean it, and so on. So what we are saying is: get those folks embedded into the domain, closer to the application developers. These are still separately moving units. Your app and your data products are independent, but yet tightly coupled with each other based on the context of the domain. So reduce cognitive load by localizing what they need to know about to the domain, get them closer to the application, but yet have them separate from the app, because the app provides a very different service: transactional data for my e-commerce transaction. The data product provides a very different service: longitudinal data for the, you know, variety of intelligent analysis that I can do on the data. But yet, it's all within the domain of e-commerce or sales or whatnot. >> So a lot of decoupling and coupling create that cohesiveness. >> Zhamak: Absolutely. >> Architecture. So I have to ask you, this is an interesting question, 'cause it came up on theCUBE all last year. Back in the old server, data center days and cloud, Google coined the term "site reliability engineer" for someone to look over the hundreds of thousands of servers. We asked the question to the data engineering community, who have been suffering, by the way, I agree: is there an SRE-like role for data? Because in a way, data engineering, that platform engineer, they are like the SRE for data. In other words, managing the large scale to enable automation and self-service. What are your thoughts and reaction to that? >> Zhamak: Yes, exactly. So maybe we go through that history of how SRE came to be. So we had the first DevOps movement, which was: remove the wall between dev and ops and bring them together. So you have one cross-functional unit of the organization that's responsible for "you build it, you run it," right? So then there is no "I'm going to just shoot my application over the wall for somebody else to manage it." So we did that, and then we said, "Okay, as we decentralized and had this many microservices running around, we had to create a layer that abstracted a lot of the complexity around monitoring, observing, and running all of that, while giving autonomy to this cross-functional team." And that's where the SRE, a new generation of engineers, came to exist. So I think if I just look- >> Hence Borg, hence Kubernetes. >> Hence, hence, exactly. Hence chaos engineering, hence embracing the complexity and messiness, right? And putting engineering discipline into embracing that, and yet giving a cohesive and high-integrity experience of those systems. So I think, if we look at that evolution, perhaps something like that is happening by bringing data and apps closer: making them these domain-oriented data product teams, or domain-oriented cross-functional teams, full stop, and still having a very advanced, maybe at the platform infrastructure level, kind of operational team, so that they're not busy doing two jobs, which is taking care of domains and the infrastructure, but they're building infrastructure that is embracing that complexity and interconnectivity of this data process. >> John: So you see similarities.
>> Absolutely, but I feel like we're probably in the early days of that movement. >> So it's a data DevOps kind of thing happening, where scale is happening. Good things are happening, yet it's a little bit fast and loose, with some complexities to clean up. >> Yes, yes. This is a different restructure. As you said, you know, the job of this industry as a whole, and of architects, is decompose, recompose; decomposing and recomposing in a new way. And now we're, like, decomposing the centralized team, recomposing them as domains and- >> John: So is data mesh the killer app for Supercloud? >> You had to do this to me. >> Dave: Sorry, I couldn't- (John and Dave laughing) >> Zhamak: What do you want me to say, Dave? >> John: Yes. >> Zhamak: Yes, of course. >> I mean, Supercloud, I think, really the terminology's Supercloud, Opencloud. But I think, in the spirit of it, this embracing of diversity and giving autonomy for people to make decisions for what's right for them, and not locking them in, I think just embracing that is baked into how data mesh assumes the world would work. >> John: Well thank you so much for coming on Supercloud 2, really appreciate it. Data has driven this conversation. Your success with data mesh has really opened up the conversation and exposed the slow-moving data industry. >> Dave: Been a great catalyst. (John laughs) >> John: That's now going well. We can move faster, so thanks for coming on. >> Thank you for hosting me. It was wonderful. >> Okay, Supercloud 2, live here in Palo Alto, our stage performance. I'm John Furrier with Dave Vellante. We're back with more after this short break. Stay with us all day for Supercloud 2. (gentle bright music)
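A note on the data API evolution Dehghani describes in this segment: version one is a level of indirection that still hands back a file; a later revision lets the consumer declare the computation, so the compute moves to the data and only the result crosses the boundary. Here is a minimal sketch of that progression, assuming hypothetical function names and an owner-defined allow-list of operations; it is illustrative only, not any vendor's actual API.

```python
# v1 vs. v2 of a data API: the first hands over raw data, the second runs
# the declared computation next to the data and returns only the answer,
# shrinking the privacy surface area.
from statistics import mean

DATA = [12.0, 15.5, 9.9, 20.1]  # stays with the sovereign owner


def read_v1(who: str) -> list[float]:
    # v1: "tell me who you are, I'll run policies and lineage, then give
    # you a file" -- the whole dataset still leaves the boundary.
    assert who, "identity required"
    return list(DATA)


ALLOWED_OPS = {"mean": mean, "max": max, "count": len}


def read_v2(who: str, op: str):
    # v2: "tell me what intelligence you're after" -- the owner decides
    # which computations are permitted and runs them beside the data.
    assert who, "identity required"
    if op not in ALLOWED_OPS:
        raise ValueError(f"operation {op!r} not permitted by the data owner")
    return ALLOWED_OPS[op](DATA)


print(read_v1("analyst"))           # raw rows cross the boundary
print(read_v2("analyst", "mean"))   # only the answer crosses the boundary
```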
SENTIMENT ANALYSIS :
ENTITIES
Entity | Category | Confidence |
---|---|---|
Dave Vellante | PERSON | 0.99+ |
John | PERSON | 0.99+ |
Zhamak | PERSON | 0.99+ |
Dave | PERSON | 0.99+ |
George Gilbert | PERSON | 0.99+ |
AWS | ORGANIZATION | 0.99+ |
2007 | DATE | 0.99+ |
Palo Alto | LOCATION | 0.99+ |
John Furrier | PERSON | 0.99+ |
John Furrier | PERSON | 0.99+ |
Zhamak Dehghani | PERSON | 0.99+ |
JPMC | ORGANIZATION | 0.99+ |
ORGANIZATION | 0.99+ | |
Dav | PERSON | 0.99+ |
two jobs | QUANTITY | 0.99+ |
Supercloud | ORGANIZATION | 0.99+ |
NextData | ORGANIZATION | 0.99+ |
today | DATE | 0.99+ |
Opencloud | ORGANIZATION | 0.99+ |
last year | DATE | 0.99+ |
Siri | TITLE | 0.99+ |
ThoughtWorks | ORGANIZATION | 0.98+ |
NextData.com | ORGANIZATION | 0.98+ |
Supercloud 2 | EVENT | 0.98+ |
both | QUANTITY | 0.98+ |
one | QUANTITY | 0.98+ |
HelloFresh | ORGANIZATION | 0.98+ |
first | QUANTITY | 0.98+ |
millions of dollars | QUANTITY | 0.96+ |
Snowflake | EVENT | 0.96+ |
Oracle | ORGANIZATION | 0.96+ |
SRE | TITLE | 0.94+ |
Snowflake | ORGANIZATION | 0.94+ |
Cube | PERSON | 0.93+ |
Zhama | PERSON | 0.92+ |
Data Mesh the Killer App | TITLE | 0.92+ |
SiliconANGLE | ORGANIZATION | 0.91+ |
Databricks | ORGANIZATION | 0.9+ |
first class | QUANTITY | 0.89+ |
Supercloud 2 | ORGANIZATION | 0.88+ |
theCUBE | ORGANIZATION | 0.88+ |
hundreds of thousands | QUANTITY | 0.85+ |
one point | QUANTITY | 0.84+ |
Zham | PERSON | 0.83+ |
Supercloud | EVENT | 0.83+ |
ChatGPT | ORGANIZATION | 0.72+ |
SRE | ORGANIZATION | 0.72+ |
Borg | PERSON | 0.7+ |
Snowflake | TITLE | 0.66+ |
Supercloud | TITLE | 0.65+ |
half | QUANTITY | 0.64+ |
Breaking Analysis: ChatGPT Won't Give OpenAI First Mover Advantage
>> From theCUBE Studios in Palo Alto and Boston, bringing you data-driven insights from theCUBE and ETR. This is Breaking Analysis with Dave Vellante. >> OpenAI, the company, and ChatGPT have taken the world by storm. Microsoft reportedly is investing an additional 10 billion dollars into the company. But in our view, while the hype around ChatGPT is justified, we don't believe OpenAI will lock up the market with its first-mover advantage. Rather, we believe that success in this market will be directly proportional to the quality and quantity of data that a technology company has at its disposal, and the compute power that it can deploy to run its systems. Hello and welcome to this week's Wikibon CUBE Insights, powered by ETR. In this Breaking Analysis, we unpack the excitement around ChatGPT and debate the premise that the company's early entry into the space may not confer winner-take-all advantage to OpenAI. And to do so, we welcome CUBE collaborator and alum Sarbjeet Johal, (chuckles) and John Furrier, co-host of theCUBE. Great to see you Sarbjeet, John. Really appreciate you guys coming to the program. >> Great to be on. >> Okay, so what is ChatGPT? Well, actually we asked ChatGPT: what is ChatGPT? So here's what it said. ChatGPT is a state-of-the-art language model developed by OpenAI that can generate human-like text. It could be fine tuned for a variety of language tasks, such as conversation, summarization, and language translation. So I asked it: give it to me in 50 words or less. How did it do? Anything to add? >> Yeah, I think it did good. It's a large language model, like previous models, but it applies the transformer sort of mechanism to focus on the prompt you have given it, and also on the answer it has already given you, the first sentence or two, and then it introspects on itself, like, "what have I already said to you?", and works on that. So it's a self sort of focus, if you will. The transformers help the large language models do that. >> So to your point, it's a large language model, and GPT stands for generative pre-trained transformer. >> And if you put the definition back up there again, if you put it back up on the screen, let's see it back up. Okay, it actually missed the word "large." So one of the problems with ChatGPT is it's not always accurate. It's actually a large language model, and it says "state-of-the-art language model." And if you look at Google, Google has dominated AI for a long time, and they're well known as being the best at this. And apparently Google has their own large language model, an LLM, in play, and they have been holding back its release because of backlash on the accuracy. That example you showed is a great point: it got it almost right, but it missed the key word. >> You know what's funny about that, John, is I had previously asked it in my prompt to give it to me in less than a hundred words, and it was too long. I said it was too long for Breaking Analysis, and there it went into the fact that it's a large language model. So it gave me a really different answer each time. But it's still pretty amazing for those of you who haven't played with it yet.
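Since the transformer mechanism only comes up here in passing, a toy sketch may help: scaled dot-product self-attention with a causal mask, so each position attends to the prompt and to what the model has already generated, never to the future. This is a didactic, single-head version with no learned projections, an illustration under stated assumptions rather than anything resembling OpenAI's actual implementation.

```python
# Minimal causal self-attention: each token's output is a weighted blend of
# the tokens at or before it, with weights from a masked, scaled dot product.
import numpy as np


def causal_self_attention(x: np.ndarray) -> np.ndarray:
    # x: (sequence_length, model_dim) token representations
    T, d = x.shape
    queries, keys, values = x, x, x          # single head, no learned projections
    scores = queries @ keys.T / np.sqrt(d)   # pairwise attention scores
    mask = np.triu(np.ones((T, T), dtype=bool), k=1)
    scores = np.where(mask, -np.inf, scores)  # causal mask: no peeking ahead
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ values                  # blend of earlier tokens, per position


tokens = np.random.randn(5, 8)               # toy "prompt": 5 tokens, 8-dim embeddings
print(causal_self_attention(tokens).shape)   # (5, 8)
```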
And one of the best examples that I saw was Ben Charrington from the This Week In ML AI podcast. I stumbled on this thanks to Brian Gracely, who was listening to one of his Cloudcasts. Basically what Ben did is he prompted ChatGPT to interview ChatGPT: he simply gave the system the prompts, and then he ran the questions and answers into this avatar builder and sped it up 2X so it didn't sound like a machine. And voila, it was amazing. So John, is ChatGPT going to take over as a CUBE host? >> Well, I was thinking, we get the questions in advance sometimes from PR people. We should actually just plug them into ChatGPT, add them to our notes, and say, "Is this good enough for you? Let's ask the real question." So I think, you know, there's a lot of heavy lifting that gets done. I think ChatGPT is a phenomenal revolution. I think it highlights the use case. Like that example we showed earlier, it gets most of it right. So it's directionally correct, and it feels like it's an answer, but it's not a hundred percent accurate. And I think that's where people are seeing value in it: writing marketing copy, brainstorming, a guest list, a gift list for somebody, write me some lyrics to a song, give me a thesis about healthcare policy in the United States. It'll do a bang-up job, and then you've got to go in and massage it. So it's going to do three quarters of the work. That's why plagiarism and schools are kind of freaking out. And that's why Microsoft put 10 billion in, because why wouldn't this be a feature of Word, or the OS, to help it do stuff on behalf of the user? So linguistically, it's a beautiful thing. You can input a string and get a good answer. It's not a search result. >> And we're going to get your take on Microsoft, and it kind of levels the playing- but ChatGPT writes better than I do, Sarbjeet, and I know you have some good examples too. You mentioned the Reed Hastings example. >> Yeah, I was listening to the Reed Hastings fireside chat with ChatGPT, and the answers were coming in voice format. And it was amazing; he was having a very sort of philosophical kind of talk with ChatGPT, with longer sentences, like he was going on, just like we are talking. He was talking for almost two minutes, and then ChatGPT was answering. It was not a one-sentence question and then a lot of answers from ChatGPT. And yeah, you're right, this is a new ability. I've been thinking deeply about this since yesterday, when we talked about doing this segment. The data is fed into the data model, and it can be current data as well, but I think that models like ChatGPT, other companies will have those too. They're democratizing the intelligence, but they're not creating intelligence yet, I can definitely say that. They will give you all the finite answers: like, okay, how do you do this for loop in Java versus, you know, C#? As a programmer, you can do that. But they can't tell you how to write a new algorithm, or write a new search algorithm for you. They cannot create secret code for you to- >> Not yet. >> Have competitive advantage. >> Not yet, not yet. >> but you- >> Can Google do that today? >> No one really can. The reasoning side of the data is something we talked about at our Supercloud event with Zhamak Dehghani, who's now CEO of Nextdata. This next wave of data intelligence is going to come from entrepreneurs that are probably cross-discipline: computer science and some other discipline. But there are going to be new things, for example, around data and metadata. It's hard to do reasoning like a human being, so that needs more data to train itself.
So I think the first gen of this training model for the large language models they have is a corpus of text. A lot of that is, you know, blog posts, but the facts are wrong and sometimes out of context, because that contextual reasoning takes time, it takes intelligence. So machines need to become intelligent, and therefore they need to be trained. So you're going to start to see, I think, a lot of acceleration on training the data sets. And again, it's only as good as the data you can get. And again, proprietary data sets will be a huge winner. Anyone who's got a large corpus of content, proprietary content, like theCUBE or SiliconANGLE as a publisher, will benefit from this. Large FinTech companies, anyone with large proprietary data, will probably be a big winner on this generative AI wave, because it will just eat that up and turn it back into something better. So I think there's going to be a lot of interesting things to look at here. And certainly productivity's going to be off the charts, and the internet is going to get swarmed with vanilla content. So if you're in the content business, and you're an original content producer of any kind, you're going to be not vanilla, so you're going to be better. So I think there's so much at play, Dave (indistinct). >> I think the playing field has been raised, so we- >> Raised and leveled? >> Yeah, and leveled to a certain extent. So it's not like a few people, as consumers of AI, will have an advantage and others cannot have that advantage. It will be democratized, I'm sure about that. But if you take the example of the calculator: when the calculator came in, a lot of people were like, "Oh, people can't do math anymore because the calculator is there," right? So it's a similar sort of moment, just like a calculator for the next level. But, again- >> I see it more like open source, Sarbjeet, because if you think about what ChatGPT's doing, you do a query and it comes from somewhere; the value of a post from ChatGPT is just a reuse of AI. The original content will come from a human. So if I lay out a paragraph from ChatGPT that did some heavy lifting on some facts, I check the facts, it saves me about maybe- >> Yeah, it's productive. >> An hour of writing, and then I write a killer two, three sentences of, like, sharp original thinking or critical analysis. I've then taken that body of work, open-source content, and laid something on top of it. >> And Sarbjeet's example is a good one, because, like the calculator, kids don't do math as well anymore; the slide rule, remember we had slide rules as kids. Remember when we first started using Waze? You know, we were this minority, and you had an advantage over other drivers. Now Waze is like, you know, social traffic navigation, everybody has it, you know- >> All the back roads are crowded. >> They're crowded with cars. (group laughs) Exactly. All right, let's move on. What about this notion that futurist Roy Amara put forth, really Amara's Law, that we're showing here: "We tend to overestimate the effect of technology in the short run and underestimate it in the long run." Is that the case, do you think, with ChatGPT? What do you think, Sarbjeet? >> I think that's true, actually. There's a lot of, >> We don't debate this. >> There's a lot of awe, like when people see the results from ChatGPT, they say, what, what the heck? Like, it can do this?
But then if you use it more and more and more, and I ask a set of similar questions, not the same question, and it gives you, like, the same answer. It's like reading from the same bucket of text (indistinct) where ChatGPT, you will see that in a couple of segments. It's very, it sounds so boring when ChatGPT is coming out with the same two sentences every time. So it is kind of good, but it's not as good as people think it is right now. But we will go through this, you know, hype sort of cycle and get realistic with it. And then in the long term, I think it's a great thing; in the short term, it's not something which will (indistinct) >> What's your counterpoint? You're saying it's not. >> No, I think the question was, it's hyped up in the short term and it's underestimated long term. That's what I think he said, quote. >> Yes, yeah. That's what he said. >> Okay, I think that's wrong in this case, because ChatGPT is a unique kind of impact and it's very generational. People have been comparing it, I have been comparing it to the internet, like the web, the web browsers Mosaic and Netscape, right, Navigator. I mean, I clearly still remember the days seeing Navigator for the first time, wow. And there weren't many sites you could go to; everyone typed in, you know, cars.com, you know. >> That (indistinct), wasn't that overestimated, overhyped at the beginning and underestimated? >> No, it was, it was underestimated long run, people thought. >> But that's Amara's law. >> That's what it is. >> No, they said overestimated? >> Overestimated near term, underestimated- overhyped near term, underestimated long term. I got it right, I mean? >> Well, I, yeah, okay, so I would then agree, okay then- >> We were off the charts about the internet in the early days, and it actually exceeded our expectations. >> Well, there were people who were, like, poo-pooing it early on. So when the browser came out, people were like, "Oh, the web's a toy for kids." I mean, in 1995 the web was a joke, right? So in '96, you had online populations growing, so you had structural changes going on around the browser, the internet population. And then that replaced other things: direct mail, other business activities that were once analog then went to the web, kind of read-only, as we always talk about. So I think that's a moment where the hype long term, the smart money, and the smart industry experts all get the long term. And in this case, there's more poo-pooing in the short term. "Ah, it's not a big deal, it's just AI." I've heard many people poo-pooing ChatGPT, and a lot of smart people saying, "No, this is next gen, this is different and it's only going to get better." So I think people are estimating a big long game on this one. >> So you're saying it's bifurcated. There's those who say- >> Yes. >> Okay, all right, let's get to the heart of the premise, and possibly the debate for today's episode. Will OpenAI's early entry into the market confer sustainable competitive advantage for the company? And if you look at the history of the technology industry, it's kind of littered with first-mover failures. Altair, IBM, Tandy, Commodore, and even Apple, they were really early in the PC game. They took a backseat to Dell, who came on the scene years later with a better business model. Netscape, you were just talking about, was all the rage in Silicon Valley, with the first browser, drove up all the housing prices out here.
AltaVista was the first search engine to really, you know, index full text. >> Owned by Dell, I mean DEC. >> Owned by Digital. >> Yeah, Digital Equipment. >> Compaq bought it. And of course, as an aside, Digital, they wanted to showcase their hardware, right? Their supercomputer stuff. And then so Friendster and MySpace, they came before Facebook. The iPhone certainly wasn't the first mobile device. So lots of failed examples, but there are some recent successes like AWS and cloud. >> You could say smartphone. So I mean. >> Well, I know, and we can parse this, so we'll debate it. Now Twitter, you could argue, had first-mover advantage. You kind of gave me that one, John. Bitcoin and crypto clearly had first-mover advantage, and sustained that. Guys, will OpenAI make it to the list on the right with ChatGPT? What do you think? >> I think categorically as a company, it probably won't, but as a category, I think what they're doing will. So OpenAI as a company, they get funding, there's power dynamics involved. Microsoft put a billion dollars in early on, then they just ponied up. Now they're reporting 10 billion more. So, like, with the browsers, Microsoft had competitive advantage over Netscape, and used monopoly power, and was convicted by the Department of Justice for killing Netscape with their monopoly. Netscape should have won that battle, but Microsoft killed it. In this case, Microsoft's not killing it, they're buying into it. So I think the embrace-and-extend Microsoft power here makes OpenAI vulnerable as that one-vendor solution. So OpenAI as a company might not make the list, but the category of what this is, large language model AI, probably will be on the right-hand side. >> Okay, we're going to come back to the government intervention and maybe do some comparisons, but what are your thoughts on this premise here? The premise put forth is that ChatGPT, its early entry into the market, will not confer competitive advantage to- >> For OpenAI. >> To Open- yeah, do you agree with that? >> I agree with that, actually, because Google has been at it, and they have been holding back, as John said, because of the scrutiny from the feds, right, so- >> And privacy too. >> And the privacy and the accuracy as well. But I think Sam Altman and the company, those guys, right? They have put this out there in a hasty way, you know, because it makes mistakes, and there are a lot of questions around, sort of, where the content is coming from. You saw that in your example: it just stole the content, without your permission, you know? >> Yeah. So as a quick aside- >> And it codes on people's behalf, and those codes are wrong. So there's a lot of, sort of, false information it's putting out there. So it's a very vulnerable thing to do what Sam Altman- >> So even though it'll get better, others will compete. >> So look, just a side note, a term which Reid Hoffman used a little bit. Like he said, it's an experimental launch, like, you know, it's- >> It's pretty damn good. >> It is clever because according to Sam- >> It's more than clever. It's good. >> It's awesome, if you haven't used it. I mean, you write- you read what it writes and you go, "This thing writes so well, it writes so much better than you." >> The human emotion drives that too. I think that's a big thing. But- >> I want to add one more- >> Make your last point. >> Last one. Okay. So, but he's still holding back. He's conducting quite a few interviews.
If you want to get the gist of it, there's a StrictlyVC interview from yesterday with Sam Altman. Listen to that one; it's eye-opening where they want to take it. But the last point I want to make on this is that Satya Nadella yesterday did an interview with the Wall Street Journal. I think he was doing- >> You were not impressed. >> I was not impressed because he was pushing it too much. So Sam Altman's holding back so there's less backlash. >> Got 10 billion reasons to push. >> I think he's almost- >> Microsoft just laid off 10,000 people. Hey ChatGPT, find me a job. You know, like. (group laughs) >> He's overselling it to an extent that I think it will backfire on Microsoft. And he's over-promising a lot of stuff right now, I think. I don't know why he's so jittery about all these things. And he did the same thing during Ignite as well. So he said, "Oh, this AI will write code for you and this and that." Like you called him out- >> The hyperbole- >> During your- >> from Satya Nadella, he's got a lot of hyperbole. (group talks over each other) >> All right, let's, go ahead. >> Well, can I weigh in on the whole- >> Yeah, sure. >> Microsoft thing on whether OpenAI, here's the take on this. I think it's more like the browser moment to me, because I could relate to that experience with ChatGPT, personally, emotionally, when I saw that, and I remember vividly- >> You mean that aha moment (indistinct). >> Like this is obviously the future. Anything else in the old world is dead, websites are going to be everywhere. It was just instant dot-connecting for me. And a lot of other smart people saw this. A lot of people, by the way, didn't see it. Someone said the web's a toy. At the company I worked for at the time, Hewlett Packard, they could have been in it; they had invented HTML, and all this stuff was, like, just passed over; the web was just being passed over. But at that time, the browser got better, more websites came on board. So the structural advantage there was online web usage was growing, the online user population. So that was growing exponentially with the rise of the Netscape browser. So OpenAI could stay on the right side of your list as durable, if they leverage the category that they're creating and can get the scale. And if they can get the scale, just like Twitter, which failed so many times but still hung around. So it was a product that was always successful, right? So I mean, it should have- >> You're right, it was terrible, we kept coming back. >> The fail whale, but it still grew. So OpenAI has that moment. They could do it if Microsoft doesn't meddle too much with too much power as a vendor. They could be the Netscape Navigator, without the anti-competitive behavior of somebody else. So to me, they have the pole position. So they have an opportunity. And if they don't execute, then there's opportunity for others. There are not a lot of barriers to entry, vis-a-vis, say, the CapEx of a cloud company like AWS. You can't replicate that; many have tried. But I think you can replicate OpenAI. >> And we're going to talk about that. Okay, so real quick, I want to bring in some ETR data. This isn't an ETR-heavy segment, only because this is so new, you know, they don't have coverage yet, but they do cover AI. So basically what we're seeing here is a slide where the vertical axis is net score, which is a measure of spending momentum, and the horizontal axis is presence in the dataset. Think of it as, like, market presence.
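As a rough visual aid, here is a minimal sketch of the kind of chart being described: spending momentum on the y-axis, market presence on the x-axis. The vendor positions below are made-up placeholder values for illustration only, not ETR's actual survey data.

```python
# Minimal sketch of an ETR-style "net score vs. market presence" scatter.
# All vendor values below are illustrative placeholders, not ETR survey data.
import matplotlib.pyplot as plt

vendors = {
    # name: (presence in dataset, net score) -- hypothetical positions
    "Microsoft": (0.80, 0.60),
    "AWS": (0.70, 0.55),
    "Google": (0.50, 0.50),
    "Oracle": (0.25, 0.15),
    "IBM": (0.30, 0.10),
    "DataRobot": (0.05, 0.35),
    "C3 AI": (0.04, 0.20),
}

fig, ax = plt.subplots()
for name, (presence, net_score) in vendors.items():
    ax.scatter(presence, net_score)
    ax.annotate(name, (presence, net_score),
                textcoords="offset points", xytext=(5, 5))

ax.axhline(0.40, linestyle="--", color="red")  # 40% net score: elevated spending momentum
ax.set_xlabel("Presence in dataset (market presence)")
ax.set_ylabel("Net score (spending momentum)")
ax.set_title("AI vendors, illustrative positions only")
plt.show()
```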
And in the insert right there, you can see how the dots are plotted, the two columns. And so, but the key point here that we want to make is there's a bunch of companies on the left, like, you know, DataRobot and C3 AI and some others, but the big whales, Google, AWS, Microsoft, are really dominant in this market. So that's really the key takeaway, that, can we- >> I notice IBM is way low. >> Yeah, IBM's low, and actually bring that back up. But then you see Oracle, who actually is injecting it. So I guess the other point is, you're not necessarily going to go buy AI and, you know, build your own AI; it's going to be there. Salesforce is going to embed it into its platform, the SaaS companies will, and you're going to purchase AI. You're not necessarily going to build it. But some companies obviously are. >> I mean, to quote IBM's general manager Rob Thomas, "You can't have AI without IA," information architecture. And David Flynn- >> You can't have AI without IA. >> Right, you can't have AI without IA. If you have an information architecture, you then can power AI. Yesterday David Flynn, with Hammerspace, was on our Supercloud. He was pointing out that the relationship of storage, where you store things, also impacts the data and its addressability, and Zhamak from Nextdata, she was pointing out that same thing. So the data problem factors into all this too, Dave. >> So you've got the big cloud and internet giants, they're all poised to go after this opportunity. Microsoft is investing up to 10 billion. Google's code red, which was, you know, the headline in the New York Times. Of course Apple is there, and several alternatives in the market today, things like Chinchilla, BLOOM, and there's a company Jasper, and several others. And then Lina Khan looms large, and the governments around the world, EU, US, China, are all taking notice before the market really has coalesced around a single player. You know, John, you mentioned Netscape; they kind of really, the US government was way late to that game. It was kind of game over. And Netscape, I remember Barksdale was like, "Eh, we're going to be selling software in the enterprise anyway," and then, pshew, the company just dissipated. So, but it looks like the US government, especially with Lina Khan, they're changing the definition of antitrust and what cause they use to go after people, and they're really much more aggressive. It's only what, two years ago that (indistinct). >> Yeah, the problem I have with the federal oversight is this: they're always, like, late to the game, and they're slow to catch up. So in other words, they're working on stuff that should have been solved a year and a half, two years ago, around some of the social networks hiding behind some of the rules around the open web back in the day, and I think- >> But they're, like, 15 years late to that. >> Yeah, and now they've got this new thing on top of it. So, like, I just worry about them getting their fingers in it. >> But it's only two years, you know, OpenAI. >> No, but the thing (indistinct). >> No, they're still fighting other battles. But the problem with government is that they're going to label Big Tech as, like, an evil thing like Pharma, it's like smoke- >> You know Lina Khan wants to kill Big Tech, there's no question. >> So I think Big Tech is getting a very serious bad rap. And I think anything the government does that casts darkness on tech is politically motivated in most cases.
You can almost look at everything, and my 80/20 rule is in play here. 80% of the government activity around tech is bullshit, it's politically motivated, and the 20% is probably relevant, but off the mark and not organized. >> Well, market forces have always been the determining factor of success. The governments, you know, have pretty much failed. I mean, you look at IBM's antitrust case, what did that do? The market ultimately beat them. You look at Microsoft back in the day, right? Windows 95 was peaking, the government came in. But you know, like you said, they missed the web, right, and so they were hanging on to Windows. >> There's nobody in government that actually knows- >> And so, I think you're right. It's market forces that are going to determine this. But Sarbjeet, what do you make of Microsoft's big bet here? You weren't impressed with Nadella. How do you think, where are they going to apply it? Is this going to be a Hail Mary for Bing, or is it going to be applied elsewhere? What do you think? >> They are saying that they will, sort of, weave this into their products, Office products, productivity, and also to write code as well, developer productivity as well. That's a big play for them. But coming back to your antitrust sort of comments, right? I believe your comment was like, oh, the feds were 10 years or 15 years late earlier, but now it's two years. But things are moving very fast now compared to how they used to move. >> So two years is like 10 years. >> Yeah, two years is like 10 years. Just want to make that point. (Dave laughs) This thing is going like wildfire. With any new tech which comes in, I think they're going after the distribution channels. Lina Khan has commented time and again that the marketplace model is something she wants to have some grip on. Cloud marketplaces are kind of monopolistic in a way. >> I don't, I don't see this, I don't see a Chat AI. >> You told me it's not Bing, you had an interesting comment. >> No, no. First of all, this is great for Microsoft. If you're Microsoft- >> Why? >> Because Microsoft doesn't have the AI chops that Google has, right? Google has got so much core competency in how they run their search, how they run their backends, their cloud; even though they don't get a lot of cloud market share in the enterprise, they've got a kick-ass cloud 'cause they needed one. >> Totally. >> They invented SRE. I mean, Google's development and engineering chops are off the scales, right? Amazon's got some good chops, but Google's got, like, 10 times more chops than AWS in my opinion. Cloud's a whole different story. Microsoft gets AI, they get a playbook, they get a product they can render it into: not only Bing, but productivity software, helping people write papers, PowerPoint, and also, don't forget, the cloud, where AI can super help. We had this conversation at our Supercloud event, where AI's going to do a lot of the heavy lifting, around understanding observability, managing service meshes, managing microservices, turning applications on and off, and/or maybe writing code in real time. So there's a plethora of use cases for Microsoft to deploy this. Combined with their R&D budgets, they can then turbocharge more research and build on it. So I think this gives them a car in the race. Google may have pole position with AI, but this puts Microsoft right in the game, and they already have a lot of stuff going on. But this just, I mean, everything gets lifted up. Security, cloud, productivity suite, everything.
>> What's under the hood at Google, and why aren't they talking about it? I mean, they've got to be freaked out about this. No? Or do they have kind of a magic bullet? >> I think they have the chops, definitely. Magic bullet, I don't know where they are as compared to the ChatGPT 3 or 4 models. But if you look at the online sort of activity and the videos put out there from Google folks, Google technology folks, that's the account you should look at if you're looking there; they have covered all these techniques that ChatGPT 3 has used, they have been talking about them for a while as well. So it's not like it's a secret thing that you cannot replicate. As you said earlier, in the beginning of this segment, anybody who has more data and the capacity to process that data, and Google has both, I think they will win this. >> Obviously living in Palo Alto, where the Google founders are, and Google's headquarters next town over, we have- >> We're so close to them. We have inside information on some of the thinking, and that hasn't been reported by any outlet yet. And that is, from what I'm hearing from my sources, Google has it; they don't want to release it for many reasons. One, it might screw up their search monopoly; two, they're worried about the accuracy, 'cause Google will get sued. 'Cause a lot of people are jamming on this ChatGPT as, "Oh, it does everything for me," when it's clearly not a hundred percent accurate all the time. >> So Lina Khan is looming, and so Google's like, be careful. >> Yeah, so Google's just like, this is the third, could be a third rail. >> But the first thing you said is a concern. >> Well, no. >> The disruptive (indistinct) >> What they will do is do a Waymo kind of thing, where they spin out a separate company. >> They're doing that. >> The discussions are happening; they're going to spin out the separate company and put it over there, saying, "This is AI, we've got search over there, don't touch that search, 'cause that's where all the revenue is." (chuckles) >> So, okay, so that's how they deal with the Clay Christensen dilemma. What's the business model here? I mean, it's not advertising, right? Is it to charge you for a query? How do you make money at this? >> It's a good question. I mean, my thinking is, first of all, it's cool to type stuff in and see a paper get written, or write a blog post, or gimme a marketing slogan for this or that, or write some code. I think the API side of the business will be critical. And I think Howie Xu, I know you're going to reference some of his comments yesterday on Supercloud, I think this brings a whole 'nother user interface into technology consumption. I think the business model's not yet clear, but it will probably be either some sort of API and developer environment or just a straight-up free consumer product, with some sort of freemium backend thing for business. >> And he was saying too, natural language is the way in which you're going to interact with these systems. >> I think it's APIs, APIs, APIs, APIs, because these people who are cooking up these models, it takes a lot of compute power to train these, and for inference as well. Somebody did the analysis on how many cents a Google search costs Google, and how many cents a ChatGPT query costs. It's, you know, 100x or something like that. You can take a look at that. >> A 100x on which side? >> You're saying two orders of magnitude more expensive for ChatGPT- >> Much more, yeah. >> Than for Google.
>> It's very expensive. >> So Google's got the data, they've got the infrastructure, and they've got, you're saying they've got the cost (indistinct) >> No, actually, it's a simple query as well, but they are trying to put together the answers, and they're going through a lot more data, versus data that's already indexed, you know. >> Let me clarify, you're saying that Google's version of ChatGPT is more efficient? >> No, I'm, I'm saying Google search results. >> Ah, search results. >> What we're used to today, but cheaper. >> But does that, is that going to confer advantage to Google's large language (indistinct)? >> It will, because there's deep science (indistinct). >> Google, I don't think Google search is doing a large language model on their search; it's keyword search. You know, what's the weather in Santa Cruz? Or what's the weather going to be? Or, you know, how do I find this? Now they have done a smart job of doing some things with those queries: autocomplete, redirect navigation. But it's not entity-based. It's not like, "Hey, what's Dave Vellante thinking this week in Breaking Analysis?" ChatGPT might get that, because it'll get your Breaking Analysis, it'll synthesize it. There'll be some, maybe some clips. It'll be like, you know, I mean. >> Well, I've got to tell you, I asked ChatGPT to, like, I said, I'm going to enter a transcript of a discussion I had with Nir Zuk, the CTO of Palo Alto Networks, and I want you to write a 750-word blog. I never input the transcript. It wrote a 750-word blog. It attributed quotes to him, and it just pulled a bunch of stuff and said, okay, here it is. It talked about Supercloud, it defined Supercloud. >> It's made, it makes you- >> Wow. But it was a big lie. It was fraudulent, but still, it blew me away. >> Again, vanilla content and inaccurate content. So we are going to see a surge of misinformation on steroids, but I call it the vanilla content. Wow, that's just so boring (indistinct). >> There's so many dangers. >> Make your point, 'cause we're almost out of time. >> Okay, so the consumption, like, how do you consume this thing? As humans, we are consuming it and we are, like, getting nicely, surprisingly shocked, you know, wow, that's cool. It's going to increase productivity and all that stuff, right? And on the danger side as well, the bad actors can take hold of it and create fake content, and we'll have fake sort of intelligence out there. So that's one thing. The second thing is, we as humans are consuming this as language. We read it, we listen to it, whatever format we consume it in, but the ultimate usage will be when the machines can take that output from the likes of ChatGPT and take actions based on it. The robots can work, the robot can paint your house, we were talking about, right? Right now we can't do that. >> Data apps. >> So the data has to be ingested by the machines. It has to be digestible by the machines. And the machines cannot digest unorganized data right now; we will get better on the ingestion side as well. So we are getting better. >> Data, reasoning, insights, and action. >> I like that, paint my house. >> So, okay- >> By the way, that means drones that'll come in, spray-painting your house. >> Hey, it wasn't too long ago that robots couldn't climb stairs, as I like to point out. Okay, and of course it's no surprise the venture capitalists are lining up to eat at the trough, as I like to say.
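Before the clip, it's worth making the panel's cost points concrete with some back-of-the-envelope arithmetic. Every input below is an assumed, illustrative figure; neither Google nor OpenAI has disclosed these numbers, and the panel itself only cites "roughly 100x" and, in the next segment, an "order of a hundred million dollars."

```python
# Back-of-the-envelope sketch of the two cost claims discussed on the panel.
# All inputs are assumptions for illustration; none are disclosed figures.

# 1) Serving: an LLM answer vs. a keyword search, per query.
keyword_search_cost = 0.0002   # assumed: a few hundredths of a cent per search
llm_query_cost = 0.02          # assumed: a couple of cents per generated answer
print(f"LLM query is ~{llm_query_cost / keyword_search_cost:.0f}x a keyword search")

# 2) Training: what an "order of a hundred million dollars" might decompose into.
gpus = 10_000                  # assumed cluster size
days_per_run = 30              # assumed length of one full training run
price_per_gpu_hour = 2.50      # assumed cloud GPU price
one_run = gpus * days_per_run * 24 * price_per_gpu_hour
print(f"One assumed training run: ${one_run / 1e6:.0f}M")               # ~$18M
print(f"Five runs (experiments, retries): ${5 * one_run / 1e6:.0f}M")   # ~$90M
```

The point of the sketch is only the shape of the math: per-query serving cost compounds with every user at scale, while training, though large, is a bounded expense, which is consistent with the panel's view that the inference side is the bigger ongoing bill.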
You referenced this earlier, John; let's hear what AI expert Howie Xu said at the Supercloud event about what it takes to clone ChatGPT. Please play the clip. >> So one of the VCs actually asked me the other day, right? "Hey, how much money do I need to spend, to invest, to get, you know, another shot at the OpenAI sort of level?" You know, I did a (indistinct) >> Line up. >> A hundred million dollars is the order of magnitude that I came up with, right? You know, not a billion, not 10 million, right? So a hundred- >> Guys, a hundred million dollars, that's an astoundingly low figure. What do you make of it? >> I was in an interview with, I was interviewing, I think he said a hundred million or so, but in the hundreds of millions, not a billion, right? >> You were trying to get him up, you were like, "Hundreds of millions." >> Well, I think, I- >> He's like, eh, not 10, not a billion. >> Well, first of all, Howie Xu's an expert in machine learning. He's at Zscaler, he's a machine learning AI guy. But he comes from VMware; his technology pedigree is really off the chart. Great friend of theCUBE and kind of like a CUBE analyst for us. And he's smart. He's right. I think the barriers to entry from a dollar standpoint are lower than, say, the CapEx required to compete with AWS. Clearly lower than the CapEx spending to build all the tech to run a cloud. >> And you don't need a huge sales force. >> And in some cases apps too, it's the same thing. But I think it's not that hard. >> But am I right about that? You don't need a huge sales force either. It's, what, you know- >> If the product's good, it will sell; this is a new era. The better mousetrap will win. This is the new economics in software, right? So- >> Because you look at the amount of money Lacework, Snyk, Snowflake, Databricks have raised. Look at the amount of money they've raised. I mean, it's like a billion dollars or more before they get to IPO. 'Cause they need promotion, they need go-to-market. You don't need (indistinct) >> OpenAI's been working on this for five years plus; it wasn't born yesterday. It took a lot of years to get going. And Sam is depositioning all the success, because he's trying to manage expectations, to your point earlier, Sarbjeet. It's like, yeah, he's trying to say, "Whoa, whoa, settle down everybody, (Dave laughs) it's not that great," because he doesn't want to fall into that, you know, hero-and-then-get-taken-down thing, so. >> It may take 100 million or 150 or 200 million to train the model. But for the inference machine, it will take a lot more, I believe. >> Give it, so imagine- >> Because- >> Go ahead, sorry. >> Go ahead. Because it consumes a lot more compute cycles and a certain level of storage and everything, right, which they already have. So I think the compute is different. To train the model is a different cost. But to run the business is different, because I think 100 million can go into just fighting the feds. >> Well, there's a flywheel too. >> Oh, that's (indistinct) >> (indistinct) >> We are running the business, right? >> It's an interesting number, but there's also kind of, like, context to it. So here, a hundred million, spend it, you get there, but you've got to factor in the fact that the way companies win these days is critical mass scale, hitting a flywheel. If they can keep that flywheel of the value that they've got going on and get better, you can almost imagine a marketplace where, hey, we have proprietary data, we're SiliconANGLE and theCUBE.
We have proprietary content, CUBE videos, transcripts. Well, wouldn't it be great if someone in a marketplace could sell a module for us, right? We buy that, Amazon's thing, and things like that. So if they can get a marketplace going where you can apply it to data sets that may be proprietary, you can start to see this become bigger. And so I think the key barrier to entry is going to be success. I'll give you an example: Reddit. Reddit is successful and it's hard to copy, not because of the software. >> They built the moat. >> Because you can take Reddit's open source software and try to compete. >> They built the moat with their community. >> Their community, their scale, their user expectation. Twitter, we referenced earlier, that thing should have gone under in the first two years, but there was such a great emotional product. People would tolerate the fail whale. And then, you know, well, that was a whole 'nother thing. >> Then a plane landed in (John laughs) the Hudson and it was over. >> I think verticals, a lot of verticals will build applications using these models, like for lawyers, for doctors, for scientists, for content creators, for- >> So you'll have many hundreds of millions of dollars of investments that are going to be seeping out. All right, we've got to wrap. If you had to put odds on it that OpenAI is going to be the leader, maybe not a winner-take-all leader; like, you look at Amazon and cloud, they're not winner-take-all, these aren't necessarily winner-take-all markets. It's not necessarily a zero-sum game, but let's call it winner-take-most. What odds would you give that OpenAI 10 years from now will be in that position? >> If I'm 0 to 10 kind of thing? >> Yeah, it's like a horse race: 3 to 1, 2 to 1, even money, 10 to 1, 50 to 1. >> Maybe 2 to 1. >> 2 to 1, that's pretty low odds. That's basically saying they're the favorite, they're the front runner. Would you agree with that? >> I'd say 4 to 1. >> Yeah, I was going to say I'm like a 5 to 1, 7 to 1 type of person, 'cause I'm a skeptic with, you know, there's so much competition, but- >> I think they're definitely the leader. I mean, you've got to say, I mean. >> Oh, there's no question. There's no question about it. >> The question is can they execute? >> They're not Friendster, is what you're saying. >> They're not Friendster, and they're more like Twitter and Reddit, where they have momentum. If they can execute on the product side, and if they don't stumble on that, they will continue to have the lead. >> If they stay neutral, as Sam has been saying, that, hey, Microsoft is one of our partners; if you look at their company model, how they have structured the company, they're going to pay back the investors, like Microsoft is the biggest one, up to a certain, like, by a certain number of years; they're going to pay back from all the money they make, and after that, they're going to give the money back to the public, to the, I don't know who they give it to, like a non-profit or something. (indistinct) >> Okay, the odds are dropping. (group talks over each other) That's a good point though. >> Actually they might have done that to fend off the criticism of this. But it's really interesting to see the model they have adopted.
>> The wildcard in all this, my last word on this, is that if there's a developer shift in how developers and data can come together, again, we have conferences around the future of data, Supercloud and meshes versus, you know, how the data world, coding with data, how that evolves will also dictate, 'cause a wildcard could be a shift in the landscape around how developers are using either machine learning or AI-like techniques to code into their apps, so. >> That's fantastic insight. I can't thank you enough for your time, on the heels of Supercloud 2. Really appreciate it. All right, thanks to John and Sarbjeet for the outstanding conversation today. Special thanks to the Palo Alto studio team. My goodness, Anderson, this great backdrop; you guys got it all out here, I'm jealous. And Noah, really appreciate it. Chuck, Andrew Frick and Cameron; Andrew Frick switching, Cameron on the video lake, great job. And Alex Myerson, he's on production, manages the podcast for us, Ken Schiffman as well. Kristen Martin and Cheryl Knight help get the word out on social media and our newsletters. Rob Hof is our editor-in-chief over at SiliconANGLE, does some great editing, thanks to all. Remember, all these episodes are available as podcasts. All you've got to do is search Breaking Analysis podcast, wherever you listen. Published each week on wikibon.com and siliconangle.com. Want to get in touch? Email me directly, david.vellante@siliconangle.com, or DM me @dvellante, or comment on our LinkedIn posts. And by all means, check out etr.ai. They've got really great survey data in the enterprise tech business. This is Dave Vellante for theCUBE Insights powered by ETR. Thanks for watching; we'll see you next time on Breaking Analysis. (electronic music)
Closing Remarks | Supercloud2
>> Welcome back, everyone, to the closing remarks here before we kick off our ecosystem portion of the program. We're live in Palo Alto for theCUBE's special presentation of Supercloud 2. It's the second edition; the first one was in August. I'm John Furrier with Dave Vellante, here to wrap up with our special guest analyst George Gilbert, investor and industry legend, former colleague of ours, analyst at Wikibon. George, great to see you. Dave, you know, wrapping up this day, what a phenomenal program. We had contributions from industry vendors, industry experts, practitioners and customers building and redefining their companies' business models, rolling out technology for Supercloud and multicloud, and ultimately changing how they do data. And data was the theme today. So very, very great program. Before we jump into our favorite parts, let's give a shout-out to the folks who make this possible. Free content's our mission. We'll always stay true to that mission. We want to thank VMware, Alkira, ChaosSearch, and Prosimo for being sponsors of this great program. We will have Supercloud 3 coming up in a month or so, or two months. We'll see. Or sooner, we don't know. But it'll be more about security, but a lot more momentum. Okay, so that's... >> And don't forget too that this program's not going to end now. We've got a whole Ecosystem Speaks track, so stay tuned for that. >> John: Yeah, we got another 20 interviews. Feels like it. >> Well, you're going to hear from Saks, Veronika Durgin. You're going to hear from Western Union, Harveer Singh. You're going to hear from Ionis Pharmaceuticals, Nick Taylor. Brian Gracely chimes in on Supercloud. He's the man behind the Cloudcast. >> Yeah, and you know, the practitioners again, pay attention also to the cloud networking interviews. A lot of change going on there that's going to be disruptive and actually change the landscape as well, again, as Supercloud progresses to be the next big thing. If you're not on this next wave, you'll be driftwood, as Pat Gelsinger says. >> Yep. >> To kick off the closing segments, George, Dave, this is a wave that's been identified. Again, people debate the word Supercloud all you want. It is a gateway to multicloud; eventually it is the standard for new applications, new ways to do data. There's new computer science being generated and customer requirements being addressed. So it's the confluence of, you know, tectonic plates shifting in the industry: new computer science, seeing things like AI and machine learning and data at the center of it, and new infrastructure, all kind of coming together. So, to me, that's my takeaway so far. That is the big story, and it's going to change society and ultimately the business models of these companies. >> Well, you know, you think about it, we came out of the financial crisis. We've had 10, 12 years, despite Covid, of tech success, right? And just now CIOs are starting to hit the brakes. And so my point is, you've had all this innovation building up for a decade, and you've got this massive ecosystem that is running on the cloud, and the ecosystem is saying, hey, we can have even more value by tapping best of breed across clouds. And you've got customers saying, hey, we need help. We want to do more, and we want to point our business and our intellectual property, our software tooling, at our customers and monetize our data. So you have all these forces coming together, and it's sort of entering a new era.
>> George, I want to go to you for a second because you are a big contributor to this event. Your interview with Bob Muglia with Dave was, I thought, a watershed moment for me, hearing about the data apps, how databases are being rethought, because we've been seeing a diversity of databases, with Amazon Web Services, you know, promoting the idea that no one database rules the world. Now it's not one kind of database architecture that's pulling these new apps along. What's your takeaway from this event? >> So if you keep your eye on this North Star, where instead of building apps that are based on code, you're building apps that are defined by data coming off of things that are linked to the real world, like people, places, things and activities, then the idea is, and the example we use is, you know, Uber, but it could be, you know, amazon.com, defined by stuff coming off data in the Amazon ecosystem or marketplace. And then the question is, and everyone was talking at different angles on this, where does the data live? How much do you hide from the developer? You know, and when can you offer that? You know, and you started with Walmart, which was describing apps, traditional apps, that are just code. And frankly, it's easier to make that cross-cloud and, you know, essentially location independent. As soon as you have data, you need data management technology that a customer does not have the sophistication to build. And then the argument was, so how much can you hide from the developer who's building data apps? Tristan's version was you take the modern data stack and you start adding these APIs that define business concepts like bookings, billings and revenue, you know, or in the Uber example, like drivers and riders, you know, and ETAs and prices. But those things still execute on the data warehouse or data lakehouse. Then Bob Muglia was saying you're not really hiding enough from the developer, because you still have to say how to do all that. And his vision is, not only do you hide where the data is, but you hide all that how-to code by just saying what you want. You define how a car and a driver and a rider work, and then those things automatically get figured out underneath the covers. >> So huge challenges, right? There's governance, there's security; they could be big blockers to, you know, the Supercloud, but the industry's going to be attacking that problem. >> Well, what's your take? What's your favorite segment? Zhamak Dehghani came on, she's starting a new company, exclusive news. That was a big, notable moment for theCUBE. She launched her company. She pioneered the data mesh concept. And I think what George is saying, and what data mesh points to, is something that we've been saying for a long time: that data is now going to flip the script on how apps behave. And the Uber example I think is illustrative, 'cause people can relate to Uber. But imagine that for every business, whether it's a manufacturing business or retail or oil and gas or FinTech: they can look at their business like a game, almost gamify it with data, riders, cars, you know, moving data around, the value of data. This is something that Adam Selipsky teased out at AWS, Dave. So what's your takeaway from this Supercloud? Where are we, in your mind? >> Well, the big thing is data products and decentralizing your data architecture, but putting data in the hands of domain experts who can actually monetize the data. And I think that's, to me, that's really exciting.
Because, look, data products, the financial industry has always been building data products. Mortgage-backed securities are a data product. But why should the financial industry have all the fun? I mean, virtually every organization can tap its ecosystem, build data products, take its internal IP and processes and software, point them to the world, and actually begin to make money out of it. >> Okay, so let's go around the horn. I'll start, I'll give you guys some time to think. Next question: what did you learn today? I learned that I think it's an infrastructure game, and talking to Kit Colbert at VMware, I think it's all about infrastructure refactoring, and I think the data's going to be an ingredient that's going to be operating-system-like. I think you're going to see the infrastructure influencing operations, and that will enable Superclouds to be real. And developers won't even know what a Supercloud is, because they'll be using it. The operations focus is going to be very critical. Just like the DevOps movement started cloud native, I think you're going to see a data native movement, and I think infrastructure is critical as people go to the next level. That's my big takeaway today. And I'll say the data conversation is at the center. I think security and data are always going to be active, horizontally scalable concerns, but every company's going to reset their infrastructure, how it looks, and if it's not set up for data, or for the things they need to be agile on, it's going to be a non-starter. So I think that's the cloud next gen: distributed computing. >> I mean, what came into focus for me was, I think the hyperscalers are going to continue to do their thing, you know, and be very, very successful, and they're each coming at it from different approaches. We talk about this all the time on theCUBE. Amazon's got the best infrastructure; you know, Google's got its, you know, data and AI thing, and it's playing catch-up; and Microsoft's got this massive estate. Okay, cool. Check. The next wave of innovation, which is coming from data, I've always said follow the data, that's where the money's going to be, is going to come from other places. Organizations want to be able to share data across clouds, across their organization, outside of their ecosystem, and make money with that data sharing. They don't want to FTP it anymore: I got it, you take it. They want to work with live data in real time, and I think the edge, which we didn't talk much about today, is going to take that to even a new level: real-time inferencing at the edge, AI, and being able to do new things with data that we haven't even seen. But playing around with ChatGPT, it's blowing our minds. And I think you're right, it's like when we first saw the browser: holy crap, this is going to change the world.
>> That never blew me away, a hundred million to create kind of an open AI, you know, competitor. Look at companies like Lacework. >> John: Some people have that much cash on the balance sheet. >> These are security companies that have raised a billion dollars, right? To compete. You know, so... >> If you're not shifting left what do you do with data, shift up? >> But, you know. >> What did you learn, George? >> I'm listening to you and I think you're helping me crystallize something which is the software infrastructure to enable the data apps is wide open. The way Zhamak described it is like if you want a data product like a sales and operation plan, that is built on other data products, like a sales plan which has a forecast in it, it has a production plan, it has a procurement plan and then a sales and operation plan is actually a composition of all those and they call each other. Now in her current platform, you need to expose to the developer a certain amount of mechanics on how to move all that data, when to move it. Like what happens if something fails. Now Muglia is saying I can hide that completely. So all you have to say is what you want and the underlying machinery takes care of everything. The problem is Muglia stuff is still a few years off. And Tristan is saying, I can give you much of that today but it's got to run in the data warehouse. So this trade offs all different ways. But again, I agree with you that the Cloud platform vendors or the ecosystem participants who can run across Cloud platforms and private infrastructure will be the next platform. And then the cloud platform is sort of where you run the big honking centralized stuff where someone else manages the operations. >> Sounds like middleware to me, Dave >> And key is, I'll just end with this. The key is being able to get to the data, whether it's in a data warehouse or a data lake or a S3 bucket or an object store, Oracle database, whatever. It's got to be inclusive that is critical to execute on the vision that you just talked about 'cause that data's in different systems and you're not going to put it all into some new system. >> So creating middleware in the cloud that sounds what it sounds like to me. >> It's like, you discovered PaaS >> It's a super PaaS. >> But it's platform services 'cause PaaS connotes like a tightly integrated platform. >> Well this is the real thing that's going on. We're going to see how this evolves. George, great to have you on, Dave. Thanks for the summary. I enjoyed this segment a lot today. This ends our stage performance live here in Palo Alto. As you know, we're live stage performance and syndicate out virtually. Our afternoon program's going to kick in now you're going to hear some great interviews. We got ChaosSearch. Defining the network Supercloud from prosimo. Future of Cloud Network, alkira. We got Saks, a retail company here, Veronika Durgin. We got Dave with Western Union. So a lot of customers, a pharmaceutical company Warner Brothers, Discovery, media company. And then you know, what is really needed for Supercloud, good panels. So stay with us for the afternoon program. That's part two of Supercloud 2. This is a wrap up for our stage live performance. I'm John Furrier with Dave Vellante and George Gilbert here wrapping up. Thanks for watching and enjoy the program. (bright music)
Breaking Analysis: Supercloud2 Explores Cloud Practitioner Realities & the Future of Data Apps
>> Narrator: From theCUBE Studios in Palo Alto and Boston, bringing you data-driven insights from theCUBE and ETR. This is Breaking Analysis with Dave Vellante. >> Enterprise tech practitioners, like most of us, want to make their lives easier so they can focus on delivering more value to their businesses. And to do so, they want to tap best-of-breed services in the public cloud, but at the same time connect their on-prem intellectual property to emerging applications which drive top-line revenue and bottom-line profits. But creating a consistent experience across clouds and on-prem estates has been an elusive capability for most organizations, forcing trade-offs and injecting friction into the system. The need to create seamless experiences is clear, and the technology industry is starting to respond with platforms, architectures, and visions of what we've called the Supercloud. Hello, and welcome to this week's Wikibon CUBE Insights powered by ETR. In this Breaking Analysis we give you a preview of Supercloud 2, the second event of its kind that we've had on the topic. Yes, folks, that's right, Supercloud 2 is here. As of this recording, it's just about four days away: 33 guests, 21 sessions, combining live discussions and fireside chats from theCUBE's Palo Alto studio with prerecorded conversations on the future of cloud and data. You can register for free at supercloud.world. And we are super excited about the Supercloud 2 lineup of guests. Whereas Supercloud 22, in August, was all about refining the definition of Supercloud, testing its technical feasibility and understanding various deployment models, Supercloud 2 features practitioners, technologists and analysts discussing what customers need, with real-world examples of Supercloud, and will expose thinking around a new breed of cross-cloud apps, data apps, if you will, that change the way machines and humans interact with each other. Now, the example we'd use: if you think about applications today, say a CRM system, sales reps, what are they doing? They're entering data into opportunities, they're choosing products, they're importing contacts, et cetera. And sure, the machine can then take all that data and spit out a forecast by rep, by region, by product, et cetera. But today's applications are largely about filling in forms and/or codifying processes. In the future, the Supercloud community sees a new breed of applications emerging, where data resides on different clouds, in different data stores, databases, lakehouses, et cetera. And the machine uses AI to inspect the e-commerce system, the inventory data, supply chain information and other systems, and puts together a plan without any human intervention whatsoever. Think about a system that orchestrates people, places and things, like an Uber for business. So at Supercloud 2, you'll hear about this vision along with some of today's challenges facing practitioners. Zhamak Dehghani, the founder of data mesh, is a headliner. Kit Colbert also is headlining. He laid out at the first Supercloud an initial architecture for what that's going to look like. That was last August. And he's going to present his most current thinking on the topic. Veronika Durgin of Saks will be featured and will talk about data sharing across clouds and, you know, what she needs in the future. One of the main highlights of Supercloud 2 is a dive into Walmart's Supercloud. Other featured practitioners include Western Union, Ionis Pharmaceuticals, and Warner Media.
We've got deep, deep technology dives with folks like Bob Muglia, David Flynn, and Tristan Handy of DBT Labs, and Nir Zuk, the founder of Palo Alto Networks, focused on security. Thomas Hazel, who's going to talk about a new type of database for Supercloud. There are several analysts, including Keith Townsend, Maribel Lopez, George Gilbert, Sanjeev Mohan, and so many more guests we don't have time to list them all. They're all up on supercloud.world with a full agenda, so you can check that out. Now let's take a look at some of the things that we're exploring in more detail, starting with the Walmart Cloud Native Platform; they call it WCNP. We definitely see this as a Supercloud, and we dig into it with Jack Greenfield. He's the head of architecture at Walmart. Here's a quote from Jack: "WCNP is an implementation of Kubernetes for the Walmart ecosystem. We've taken Kubernetes off the shelf as open source." By the way, they do the same thing with OpenStack. "And we have integrated it with a number of foundational services that provide other aspects of our computational environment. Kubernetes off the shelf doesn't do everything." And so what Walmart chose to do, they took a do-it-yourself approach to build a Supercloud, for a variety of reasons that Jack will explain, along with Walmart's so-called triplet architecture connecting on-prem, Azure, and GCP. No surprise, there's no Amazon at Walmart, for obvious reasons. And what they do is they create a common experience for devs across clouds. Jack is going to talk about how Walmart is evolving its Supercloud in the future. You don't want to miss that. Now, next, let's take a look at how Veronika Durgin of Saks thinks about data sharing across clouds. Data sharing, we think, is a potential killer use case for Supercloud. In fact, let's hear it in Veronika's own words. Please play the clip. >> How do we talk to each other? And more importantly, how do we data-share? You know, I work with data; you know this is what I do. So if, you know, I want to get data from a company that's using, say, Google, how do we share it in a smooth way, where it doesn't have to be this crazy, I don't know, SFTP file moving? So that's where I think Supercloud comes in, to me, in my mind: practical applications. How do we create that mesh, that network, where we can easily share data with each other? >> Now, data mesh is a possible architectural approach that will enable more facile data sharing and the monetization of data products. You'll hear Zhamak Dehghani, live in studio, talking about what standards are missing to make this vision a reality across the Supercloud. Now, one of the other things that we're really excited about is digging deeper into the right approach for Supercloud adoption. And we're going to share a preview of a debate that's going on right now in the community. Bob Muglia, former CEO of Snowflake and Microsoft exec, was kind enough to spend some time looking at the community's Supercloud definition, and he felt that it needed to be simplified. So in near real time, he came up with the following definition that we're showing here. I'll read it: "A Supercloud is a platform that provides programmatically consistent services hosted on heterogeneous cloud providers." So not only did Bob simplify the initial definition, he stressed that the Supercloud is a platform versus an architecture, implying that the platform provider, e.g., Snowflake, VMware, Databricks, Cohesity, et cetera, is responsible for determining the architecture.
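One way to read Muglia's "programmatically consistent services hosted on heterogeneous cloud providers" is as a single programmatic contract with provider-specific backends underneath. The sketch below is an assumption-laden illustration, not any vendor's actual API; ObjectStore, FakeAwsStore, and archive are invented names, and no real SDK is called.

```python
from abc import ABC, abstractmethod

class ObjectStore(ABC):
    """One programmatic contract, regardless of which cloud hosts the bytes."""
    @abstractmethod
    def put(self, key: str, data: bytes) -> None: ...
    @abstractmethod
    def get(self, key: str) -> bytes: ...

class FakeAwsStore(ObjectStore):
    """Stand-in for an AWS-backed implementation (no real SDK calls)."""
    def __init__(self):
        self._blobs = {}
    def put(self, key, data):
        self._blobs[key] = data   # a real version would call the cloud SDK here
    def get(self, key):
        return self._blobs[key]

def archive(store: ObjectStore, key: str, payload: bytes) -> None:
    # Application code is identical whichever provider is underneath;
    # the platform owner, not the app developer, absorbs the differences.
    store.put(key, payload)

store = FakeAwsStore()
archive(store, "report.csv", b"q4,revenue\n")
print(store.get("report.csv"))
```

The design choice Muglia's definition implies is exactly this inversion: the platform provider owns the per-cloud implementations, and application code sees only the consistent interface.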
Now, interestingly, in the shared Google doc that the working group uses to collaborate on the Supercloud definition, Dr. Nelu Mihai, who is actually building a Supercloud, responded as follows to Bob's assertion: "We need to avoid creating many Supercloud platforms with their own architectures. If we do that, then we create other proprietary clouds on top of existing ones. We need to define an architecture of how Supercloud interfaces with all other clouds. What is the information model? What is the execution model, and how will users interact with Supercloud?" What does this seemingly nuanced point tell us, and why does it matter? Well, history suggests that de facto standards will emerge more quickly to resolve real-world practitioner problems and catch on more quickly than consensus-based architectures and standards-based architectures. But in the long run, the latter may serve customers better. So we'll be exploring this topic in more detail at Supercloud 2, and of course we'd love to hear what you think: platform, architecture, both? Now, one of the real technical gurus that we'll have in studio at Supercloud 2 is David Flynn. He's one of the people behind the movement that enabled the enterprise flash adoption craze. He did that with Fusion-io, and he is now working on a system to enable read-write data access to any user, in any application, in any data center, or on any cloud, anywhere. So think of this company as a Supercloud enabler. Allow me to share an excerpt from a conversation David Floyer and I had with David Flynn last year. He as well gave a lot of thought to the Supercloud definition and was really helpful with an opinionated point of view. He said something to us that we thought was relevant: "What is the operating system for a decentralized cloud? The main two functions of an operating system or an operating environment are, one, the process scheduler and, two, the file system. The strongest argument for Supercloud is made when you go down to the platform layer and talk about it as an operating environment on which you can run all forms of applications." So, a couple of implications here that we'll be exploring with David Flynn in studio. First, we're inferring from his comment that he's in the platform camp, where the platform owner is responsible for the architecture; there are obviously trade-offs there and benefits, but we'll have to clarify that with him. And second, he's basically saying you kill the concept the further you move up the stack: the further you move up the stack, the weaker the Supercloud argument becomes, because it's just becoming SaaS. Now, this is something we're going to explore to better understand his thinking on this, but also whether the existing notion of SaaS is changing and whether or not a new breed of Supercloud apps will emerge. Which brings us to this really interesting fellow that George Gilbert and I riffed with ahead of Supercloud 2: Tristan Handy. He's the founder and CEO of DBT Labs, and he has a highly opinionated and technical mind. Here's what he said: "One of the things that we still don't know how to API-ify is concepts that live inside of your data warehouse, inside of your data lake. These are core concepts that the business should be able to create applications around very easily. In fact, that's not the case, because it involves a lot of data engineering, pipeline, and other work to make these available.
So if you really want to make it easy to create these data experiences for users, you need to have an ability to describe these metrics and then to turn them into APIs to make them accessible to application developers who have literally no idea how they're calculated behind the scenes, and they don't need to." A lot of implications to this statement that we'll explore at Supercloud 2 (a rough sketch of that metrics-to-API idea follows below). Zhamak Dehghani's data mesh comes into play here, with her critique of hyper-specialized data pipeline experts with little or no domain knowledge. Also the need for simplified self-service infrastructure, which Kit Colbert is likely going to touch upon. Veronika Durgin of Saks and her ideal state for data sharing, along with Harveer Singh of Western Union, who has got to deal with 200 locations around the world, data privacy issues, data sovereignty: how do you share data safely? Same with Nick Taylor of Ionis Pharmaceuticals. And not to blow your mind, but Thomas Hazel and Bob Muglia posit that to make data apps a reality across the Supercloud, you have to rethink everything. You can't just let in-memory databases and caching architectures take care of everything in a brute-force manner. Rather, you have to get down to really detailed levels, even things like how data is laid out on disk, i.e., flash, and think about rewriting applications for the Supercloud and the ML/AI era. All of this and more at Supercloud 2, which wouldn't be complete without some data. So we pinged our friends from ETR, Eric Bradley and Darren Bramberm, to see if they had any data on Supercloud that we could tap. And so we're going to be analyzing a number of the players as well at Supercloud 2. Now, many of you are familiar with this graphic. Here we show some of the players involved in delivering or enabling Supercloud-like capabilities. On the Y axis is spending momentum, and on the horizontal axis is market presence, or pervasiveness, in the data. So Net Score versus what they call Overlap, or N, in the data. And the table insert shows how the dots are plotted. Now, not to steal ETR's thunder, but the first point is you really can't have Supercloud without the hyperscale cloud platforms, which is shown on this graphic. But the exciting aspect of Supercloud is the opportunity to build value on top of that hyperscale infrastructure. Snowflake here continues to show strong spending velocity, as do Databricks, HashiCorp, and Rubrik. VMware Tanzu, which we all put under the magnifying glass after the Broadcom announcements, is also showing momentum. Unfortunately, due to a scheduling conflict, we weren't able to get Red Hat on the program, but they're clearly a player here. And we've put Cohesity and Veeam on the chart as well, because backup is a likely use case across clouds and on-premises. And now, one other call-out that we drill down on at Supercloud 2 is Cloudflare, which actually uses the term supercloud, maybe in a different way. They look at Supercloud really as, you know, serverless on steroids. And so the data brains at ETR will have more to say on this topic at Supercloud 2, along with many others. Okay, so why should you attend Supercloud 2? What's-in-it-for-me kind of thing? So first of all, if you're a practitioner and you want to understand what the possibilities are for doing cross-cloud services, for monetizing data, how your peers are doing data sharing, how some of your peers are actually building out a Supercloud, you're going to get real-world input from practitioners.
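As promised above, here is a rough sketch of Tristan Handy's metrics-to-API idea: a metric is declared once, with its SQL hidden behind a name, and served to application developers who never see how it is calculated. The Metric class, the sample SQL, and serve_metric are hypothetical stand-ins for what a semantic layer might expose, not DBT's or anyone else's actual API.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Metric:
    """A declarative metric: consumers call it by name; the SQL stays hidden."""
    name: str
    sql: str

NET_REVENUE = Metric(
    name="net_revenue",
    sql="SELECT SUM(amount) - SUM(refunds) FROM orders WHERE status = 'closed'",
)

def serve_metric(metric: Metric, run_query) -> dict:
    """Turn a metric definition into an API-shaped response. In practice,
    run_query would be a real warehouse client; here it is injected."""
    return {"metric": metric.name, "value": run_query(metric.sql)}

# An app developer gets the number with no idea how it's calculated:
print(serve_metric(NET_REVENUE, run_query=lambda sql: 42_000))
```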
If you're a technologist, you're trying to figure out various ways to solve problems around data, data sharing, and cross-cloud service deployment; there are going to be a number of deep technology experts who are going to share how they're doing it. We're also going to drill down with Walmart into a practical example of Supercloud, with some other examples of how practitioners are dealing with cross-cloud complexity. Some of them, by the way, have kind of thrown up their hands and said, hey, we're going mono-cloud. And we'll talk about the potential implications, dangers, and risks of doing that, and also some of the benefits. You know, there's a question, right? Is Supercloud the same wine in a new bottle, or is it truly something different that can drive substantive business value? So look, go to supercloud.world. It's January 17th at 9:00 AM Pacific. You can register for free and participate directly in the program. Okay, that's a wrap. I want to give a shout-out to the Supercloud supporters. VMware has been a great partner as our anchor sponsor, with ChaosSearch, Proximo, and Alura contributing to the effort as well. I want to thank Alex Myerson, who's on production and manages the podcast, and Ken Schiffman as his supporting cast. Kristen Martin and Cheryl Knight help get the word out on social media and in our newsletters. And Rob Hof is our editor-in-chief over at SiliconANGLE. Thank you all. Remember, these episodes are all available as podcasts; wherever you listen, we really appreciate the support that you've given. We just saw some stats from Buzzsprout: we hit the top 25%, and we were almost at 400,000 downloads last year. So really appreciate your participation. All you've got to do is search "Breaking Analysis podcast" and you'll find those. I publish each week on wikibon.com and siliconangle.com. Or if you want to get ahold of me, you can email me directly at David.Vellante@siliconangle.com, or DM me @dvellante, or comment on our LinkedIn posts. I want you to check out etr.ai. They've got the best survey data in the enterprise tech business. This is Dave Vellante for theCUBE Insights, powered by ETR. Thanks for watching. We'll see you next week at Supercloud 2, or next time on Breaking Analysis. (light music)
Analyst Predictions 2023: The Future of Data Management
(upbeat music) >> Hello, this is Dave Vellante with theCUBE, and one of the most gratifying aspects of my role as a host of "theCUBE TV" is I get to cover a wide range of topics. And quite often, we're able to bring to our program a level of expertise that allows us to more deeply explore and unpack some of the topics that we cover throughout the year. And one of our favorite topics, of course, is data. Now, in 2021, after being in isolation for the better part of two years, a group of industry analysts met up at AWS re:Invent and started a collaboration to look at the trends in data and predict what some likely outcomes will be for the coming year. And it resulted in a very popular session that we had last year focused on the future of data management. And I'm very excited and pleased to tell you that the 2023 edition of that predictions episode is back, and with me are five outstanding market analysts: Sanjeev Mohan of SanjMo, Tony Baer of dbInsight, Carl Olofson from IDC, Dave Menninger from Ventana Research, and Doug Henschen, VP and Principal Analyst at Constellation Research. Now, what is it that we're calling you guys? A data pack, like the Rat Pack? No, no, no, no, that's not it. It's the data crowd, the data crowd, and the crowd includes some of the best minds in the data analyst community. They'll discuss how data management is evolving and what listeners should prepare for in 2023. Guys, welcome back. Great to see you. >> Good to be here. >> Thank you. >> Thanks, Dave. (Tony and Dave speak faintly) >> All right, before we get into 2023 predictions, we thought it'd be good to do a look back at how we did in 2022 and give a transparent assessment of those predictions. So, let's get right into it. We're going to bring these up here, the predictions from 2022. They're color-coded red, yellow, and green to signify the degree of accuracy. And I'm pleased to report there's no red. Well, maybe some of you will want to debate that grading system. But as always, we want to be open, so you can decide for yourselves. So, we're going to ask each analyst to review their 2022 prediction and explain their rating and what evidence they have that led them to their conclusion. So, Sanjeev, please kick it off. Your prediction was data governance becomes key. I know that's going to knock you guys over, but elaborate, because you had more detail when you double-click on that. >> Yeah, absolutely. Thank you so much, Dave, for having us on the show today. And we self-graded ourselves. I could have very easily made my prediction from last year green, but I mentioned why I left it as yellow. I totally, fully believe that data governance was in a renaissance in 2022. And why do I say that? You have to look no further than AWS launching its own data catalog, called DataZone. Before that, mid-year, we saw Unity Catalog from Databricks go GA. So, overall, I saw there was tremendous movement. When you see these big players launching a new data catalog, you know that they want to be in this space. And this space is highly critical to everything that I feel we will talk about in today's call. Also, if you look at established players: I spoke at Collibra's conference and at data.world, and I work closely with Alation, Informatica, and a bunch of other companies; they all added tremendous new capabilities. So, it did become key. The reason I left it as yellow is because I had made a prediction that Collibra would go IPO, and it did not. And I don't think anyone is going IPO right now.
The market is really, really down, as are the funding, VC, and IPO markets. But other than that, data governance had a banner year in 2022. >> Yeah. Well, thank you for that. And of course, you saw data clean rooms being announced at AWS re:Invent, so more evidence. And I like the fact that you included in your predictions some things that were binary, so you dinged yourself there. So, good job. Okay, Tony Baer, you're up next. Data mesh hits reality check. As you see here, you've given yourself a bright green thumbs up. (Tony laughing) Okay. Let's hear why you feel that was the case. What do you mean by reality check? >> Okay. Thanks, Dave, for having us back again. This is something I just wrote and just tried to get away from, and this topic just won't go away. I did speak with a number of folks, early adopters and non-adopters, during the year. And I did find that it pretty much validated what I was expecting, which was that this has now become a front-burner issue. And if I had any doubt in my mind, the evidence I would point to is what was originally intended to be a throwaway post on LinkedIn, which I just quickly scribbled down the night before leaving for re:Invent. I was packing at the time, and for some reason, I was doing a Google search on data mesh. And I happened to have tripped across this ridiculous article, I will not say where, because it doesn't deserve any publicity, about the eight (Dave laughing) best data mesh software companies of 2022. (Tony laughing) One of my predictions was that you'd see data mesh washing. And I just quickly hopped on that, maybe three sentences, and wrote it in about a couple minutes, saying this is hogwash, essentially. (laughs) And then, I left for re:Invent. And the next night, when I got into my Vegas hotel room, I clicked on my computer. I saw 15,000 hits on that post, which was the most hits of any single post I put up all year. And the responses were wildly pro and con. So, it pretty much validates my expectation in that data mesh really did hit a lot more scrutiny over this past year. >> Yeah, thank you for that. I remember that article. I remember rolling my eyes when I saw it, and then recently, (Tony laughing) I talked to Walmart and they actually invoked Martin Fowler and said that they're working through their data mesh. So, it takes really a lot of thought, and it really, as we've talked about, is as much an organizational construct. You're not buying data mesh >> Bingo. >> to your point. Okay. Thank you, Tony. Carl Olofson, here we go. You've graded yourself a yellow on the prediction of graph databases take off. Please elaborate. >> Yeah, sure. So, I realized in looking at the prediction that it seemed to imply that graph databases could be a major factor in the data world in 2022, which obviously didn't become the case. It was an error on my part in that I should have said it in the right context. It's really a three to five-year time period that graph databases will really become significant, because they still need accepted methodologies that can be applied in a business context as well as proper tools in order for people to be able to use them seriously. But I stand by the idea that it is taking off, because for one thing, Neo4j, which is the leading independent graph database provider, had a very good year.
And also, we're seeing interesting developments in terms of things like AWS with Neptune, and Oracle providing graph support in Oracle Database this past year. Those things are, as I said, growing gradually. There are other companies like TigerGraph and so forth that deserve watching as well. But as far as becoming mainstream, it's going to be a few years before we get all the elements together to make that happen. Like any new technology, you have to create an environment in which ordinary people without a whole ton of technical training can actually apply the technology to solve business problems. >> Yeah, thank you for that. These specialized databases, graph databases, time series databases, you see them embedded into mainstream data platforms, but there's a place for these specialized databases. I would suspect we're going to see new types of databases emerge with all this cloud sprawl that we have, and maybe to the edge. >> Well, part of it is that it's not as specialized as you might think. You can apply graphs to a great many workloads and use cases. It's just that people have yet to fully explore and discover what those are. >> Yeah. >> And so, it's going to be a process. (laughs) >> All right, Dave Menninger, streaming data permeates the landscape. You gave yourself a yellow. Why? >> Well, I couldn't think of an appropriate combination of yellow and green. Maybe I should have used chartreuse, (Dave laughing) but I was probably a little hard on myself making it yellow. This is another type of specialized data processing, like Carl was talking about with graph databases: stream processing. And nearly every data platform offers streaming capabilities now. Often, it's based on Kafka. If you look at Confluent, their revenues have grown at more than 50%, and continue to grow at more than 50% a year. They're expected to do more than half a billion dollars in revenue this year. But the thing that hasn't happened yet, and to be honest, they didn't necessarily expect it to happen in one year, is that streaming hasn't become the default way in which we deal with data. It's still a sidecar to data at rest. And I do expect that we'll continue to see streaming become more and more mainstream. I do expect, perhaps in the five-year timeframe, that we will first deal with data as streaming and then at rest, but the worlds are starting to merge. And we even see some vendors bringing products to market, such as K2View, Hazelcast, and RisingWave Labs. So, in addition to all those core data platform vendors adding these capabilities, there are new vendors approaching this market as well. >> I like the tough grading system, and it's not trivial. And when you talk to practitioners doing this stuff, there are still some complications in the data pipeline. So I think you're right, it probably was a yellow plus. Doug Henschen, data lakehouses will emerge as dominant. When you talk to people about lakehouses, practitioners, they all use that term. They certainly use the term data lake, but now they're using lakehouse more and more. What are your thoughts here? Why the green? What's your evidence there? >> Well, I think I was accurate. I spoke about it specifically as something that vendors would be pursuing. And we saw yet more lakehouse advocacy in 2022. Google introduced its BigLake service alongside BigQuery. Salesforce introduced Genie, which is really a lakehouse architecture.
And it was a safe prediction to say vendors are going to be pursuing this, in that AWS, Cloudera, Databricks, Microsoft, Oracle, SAP, Salesforce now, IBM, all advocate this idea of a single platform for all of your data. Now, the trend was also supported in 2022, in that we saw a big embrace of Apache Iceberg. That's a structured table format. It's used with these lakehouse platforms. It's open, so it ensures portability, and it also ensures performance; that structured table format helps with the warehouse-side performance. And among those announcements, Snowflake, Google, Cloudera, SAP, Salesforce, IBM, all embraced Iceberg. But keep in mind, again, I'm talking about this as something that vendors are pursuing as their approach. So, they're advocating to end users. It's very cutting edge. I'd say the top, leading-edge 5% of companies have really embraced the lakehouse. I think we're now seeing the fast followers, the next 20 to 25% of firms, embracing this idea and embracing a lakehouse architecture. I recall Christian Kleinerman at the big Snowflake event last summer making the announcement about Iceberg, and he asked for a show of hands: for any of you in the audience at the keynote, have you heard of Iceberg? And just a smattering of hands went up. So, the vendors are ahead of the curve. They're pushing this trend, and we're now seeing a little bit more mainstream uptake. >> Good. Doug, I was there. It was you, me, and I think two other hands were up. That was just humorous. (Doug laughing) All right, well, so I liked the fact that we had some yellow and some green. When you think about these things, there's the prediction itself. Did it come true or not? There are the sub-predictions that you guys make, and of course, the degree of difficulty. So, thank you for that open assessment. All right, let's get into the 2023 predictions. Let's bring up the predictions. Sanjeev, you're going first. You've got a prediction around unified metadata. What's the prediction, please? >> So, my prediction is that the metadata space is currently a mess. It needs to get unified. There are too many use cases of metadata which are being addressed by disparate systems. For example, data quality has become really big in the last couple of years, data observability, the whole catalog space. Actually, people don't like to use the word data catalog anymore, because data catalog sounds like it's a catalog, a museum, if you will, of metadata that you go and admire. So, what I'm saying is that in 2023, we will see that metadata will become the driving force behind things like data ops, things like orchestration of tasks using metadata, not rules. Not saying that if this fails, then do this; if this succeeds, go do that. But it's like getting to the metadata level, and then making a decision as to what to orchestrate, what to automate, how to do a data quality check, data observability. So, this space is starting to gel, and I see there'll be more maturation in the metadata space. Even security and privacy, some of these topics which are handled separately. And I'm just talking about data security and data privacy; I'm not talking about infrastructure security. These also need to merge into a unified metadata management piece, with some knowledge graph, semantic layer on top, so you can do analytics on it. So, it's no longer something that sits on the side, limited in its scope. It is actually the very engine, the very glue that is going to connect data producers and consumers.
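A rough sketch of what Sanjeev's "orchestration using metadata, not rules" could look like: rather than hard-coding if-this-fails-do-that, the pipeline consults dataset metadata (freshness, quality score) and derives its next steps. The field names and thresholds below are invented for illustration only.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

@dataclass
class DatasetMetadata:
    name: str
    last_refreshed: datetime
    quality_score: float  # e.g. fraction of rows passing checks

def next_actions(md: DatasetMetadata) -> list:
    """Derive orchestration steps from metadata, not hard-coded if/then rules."""
    actions = []
    if datetime.now(timezone.utc) - md.last_refreshed > timedelta(hours=24):
        actions.append(f"refresh {md.name}")
    if md.quality_score < 0.95:
        actions.append(f"quarantine and re-check {md.name}")
    return actions or [f"{md.name}: healthy, nothing to do"]

stale = DatasetMetadata("orders",
                        datetime.now(timezone.utc) - timedelta(hours=30),
                        0.90)
print(next_actions(stale))  # ['refresh orders', 'quarantine and re-check orders']
```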
>> Great. Thank you for that. Doug Henschen, any thoughts on what Sanjeev just said? Do you agree? Do you disagree? >> Well, I agree with many aspects of what he says. I think there's a huge opportunity for consolidation and streamlining of these aspects of governance. Last year, Sanjeev, you said something like, we'll see more people using catalogs than BI. And I have to disagree. I don't think this is a category that's headed for mainstream adoption. It's a behind-the-scenes activity for the wonky few, or better yet, companies want machine learning and automation to take care of these messy details. We've seen these waves of management technologies, some of the latest being data observability and customer data platforms, but they failed to sweep away all the earlier investments in data quality and master data management. So, yes, I hope the latest tech offers glimmers that there's going to be a better, cleaner way of addressing these things. But to my mind, the business leaders, including the CIO, only want to spend as much time, effort, money, and resources on these sorts of things as it takes to avoid getting breached, ending up in headlines, getting fired, or going to jail. So, vendors, bring on the ML and AI smarts and the automation of these sorts of activities. >> So, if I may say something, the reason why we have this dichotomy between data catalogs and the BI vendors is because data catalogs are very soon not going to be standalone products, in my opinion. They're going to get embedded. So, when you use a BI tool, you'll actually use the catalog to find out what it is that you want to do, whether you are looking for data or you're looking for an existing dashboard. So, the catalog becomes embedded into the BI tool. >> Hey, Dave Menninger, sometimes you have some data in your back pocket. Do you have any stats (chuckles) on this topic? >> I'm glad you asked, because I do. Now, data catalogs are something that's interesting. Sanjeev made a statement that data catalogs are falling out of favor. I don't care what you call them. They're valuable to organizations. Our research shows that organizations that have adequate data catalog technologies are three times more likely to express satisfaction with their analytics, for just the reasons that Sanjeev was talking about. You can find what you want, you know you're getting the right information, you know whether or not it's trusted. So, those are good things. So, we expect to see those capabilities, whether embedded or separate, continue to permeate the market. >> And a lot of those catalogs are now driven by machine learning and things. So, they're learning from those patterns of usage as people use the data. (airy laughs) >> All right. Okay. Thank you, guys. All right. Let's move on to the next one. Tony Baer, let's bring up the predictions. You got something in here about the modern data stack. We need to rethink it. Is the modern data stack getting long in the tooth? Is it not so modern anymore? >> I think, in a way, it's gotten almost too modern. I don't know if it's long in the tooth, but it is getting long. The modern data stack has traditionally been defined as: you have the data platform, which would be the operational database and the data warehouse.
And in between, you have all the tools that are necessary to essentially get that data from the operational realm, or the streaming realm for that matter, into basically the data warehouse, or as we might be seeing more and more, the data lakehouse. And I think what's important here is that we have seen a lot of progress, and this would be in the cloud, with the SaaS services. And especially you see that in the modern data stack, where all these players, not just the MongoDBs or the Oracles or the Amazons, have their database platforms; you see the Informaticas and all the other players like Fivetran have their own SaaS services. And within those SaaS services, you get a certain degree of simplicity, which is it takes all the housekeeping off the shoulders of the customers. That's a good thing. The problem is that what we're getting to, unfortunately, is what I would call lots of islands of simplicity, which means that it leaves it (Dave laughing) to the customer to have to integrate or put all that stuff together. It's a complex tool chain. And so, what we really need to think about here: we have too many pieces. And going back to the discussion of catalogs, it's like we have so many catalogs out there; which one do we use? 'Cause chances are most organizations do not rely on a single catalog at this point. What I'm calling on all the data providers, or all the SaaS service providers, to do is to literally get it together and essentially make this modern data stack less of a stack; make it more of a blending of an end-to-end solution. And that can come in a number of different ways. Part of it is that the data platform providers have been adding services that are adjacent. And there are some very good examples of this. We've seen progress over the past year or so. For instance, MongoDB integrating search: it's a very common sort of tool that the applications developed on MongoDB use, so MongoDB built it into the database rather than requiring an extra Elasticsearch or OpenSearch stack. AWS just did zero-ETL, which is a first step towards simplifying the process of going from Aurora to Redshift. You've seen the same thing with Google, with BigQuery integrating streaming pipelines. And you're also seeing a lot of movement in database machine learning. So, there are some good moves in this direction, and I expect to see more this year. Part of it is the SaaS platforms adding some functionality. But I also see, more importantly... because you're never going to get everyone standardized. This is like asking your data team and your developers, herding cats, to standardize on the same tool. In most organizations, that is not going to happen. So, take a look at the most popular combinations of tools and start to come up with some pre-built integrations and pre-built orchestrations, and offer some promotional pricing; maybe not quite two-for-one, but in other words, get two services for the price of one and a half. I see a lot of potential for this. And to me, if the goal is to simplify things, this is the next logical step, and I expect to see more of this here. >> Yeah, and you see it in Oracle MySQL HeatWave, yet another example of eliminating that ETL. Carl Olofson, today, if you think about the data stack and the application stack, they're largely separate. Do you have any thoughts on how that's going to play out? Does that play into this prediction? What do you think?
>> Well, I really like Tony's phrase, islands of simplification. It really says (Tony chuckles) what's going on here, which is that all these different vendors you ask about how these stacks work have their own stack vision. And one application group is going to use one, and another application group is going to use another. And some people will say, let's go to... like, you go to an Informatica conference and they say, we should be the center of your universe, but you can't connect everything in your universe to Informatica, so you need to use other things. So, the challenge is how do we make those things work together? As Tony has said, and I totally agree, we're never going to get to the point where people standardize on one organizing system. So, the alternative is to have metadata that can be shared amongst those systems, and protocols that allow those systems to coordinate their operations. This is standard stuff. It's not easy. But the motive for the vendors is that they can become more active, critical players in the enterprise. And of course, the motive for the customer is that things will run better and more completely. So, I've been looking at this in terms of two kinds of metadata. One is the meaning metadata, which says what data can be put together. The other is the operational metadata, which says basically where did it come from, who created it, what's its current state, what's the security level, et cetera, et cetera, et cetera. The good news is the operational stuff can actually be done automatically, whereas the meaning stuff requires some human intervention. And as we've already heard from, was it Doug, I think, people are disinclined to put a lot of definition into meaning metadata. So, that may be the harder one, but coordination is key. This problem has been with us forever, but with the addition of new data sources, with streaming data, with data in different formats, the whole thing has... it's been like what a customer of mine used to say: "I understand your product can make my system run faster, but right now I just feel I'm putting my problems on roller skates. (chuckles) I don't need that to accelerate what's already not working." >> Excellent. Okay, Carl, let's stay with you. I remember in the early days of the big data movement, the Hadoop movement, NoSQL was the big thing. And I remember Amr Awadallah said to us in theCUBE that SQL is the killer app for big data. So, your prediction here, if we bring that up, is SQL is back. Please elaborate. >> Yeah. So, of course, some people would say, well, it never left. Actually, that's probably closer to true, but in the perception of the marketplace, there's been all this noise about alternative ways of storing and retrieving data, whether it's in key-value stores or document databases and so forth. We're getting a lot of messaging that for a while had persuaded people that, oh, we're not going to do analytics in SQL anymore; we're going to use Spark for everything. Except that only a handful of people know how to use Spark. Oh, well, that's a problem. And for ordinary, conventional business analytics, Spark is like an over-engineered solution to the problem. SQL works just great. What's happened in the past couple years, and what's going to continue to happen, is that SQL is insinuating itself into everything we're seeing. We're seeing all the major data lake providers offering SQL support, whether it's Databricks or...
And of course, Snowflake is loving this, because that is what they do, and their success certainly points to the success of SQL. Even MongoDB. And we were all, I think, at the MongoDB conference where, on one day, we hear SQL is dead, they're not teaching SQL in schools anymore, and this kind of thing. And then, a couple days later at the same conference, they announced we're adding a new analytic capability based on SQL. But didn't you just say SQL is dead? So, the reality is that SQL is better understood than most other methods of, certainly, retrieving and finding data in a data collection, no matter whether it happens to be relational or non-relational. And even in systems that are very non-relational, such as graph and document databases, their query languages are being built or extended to resemble SQL, because SQL is something people understand. >> Now, you remember when we were in high school and you had to take the... you're debating in class and you were forced to take one side and defend it. So, I was at a Vertica conference one time, up on stage with Curt Monash, and I had to take the NoSQL, the-world-is-changing, paradigm-shift side. And so, just to be controversial, I said to him, Curt Monash, I said, who really needs ACID compliance anyway? Tony Baer. And so, (chuckles) of course, his head exploded, but what are your thoughts (guests laughing) on all this? >> Well, my first thought is congratulations, Dave, for surviving being up on stage with Curt Monash. >> Amen. (group laughing) >> I definitely would concur with Carl. We actually are definitely seeing a SQL renaissance, and if there's any proof of the pudding here, I see lakehouse as being icing on the cake. As Doug had predicted last year... now, (clears throat) for the record, I think Doug was about a year ahead of time in his predictions, and this year is really the year that I see (clears throat) the lakehouse ecosystems really firming up. You saw the first shots last year. But anyway, on this, data lakes will not go away. I'm actually on the home stretch of doing a market landscape on the lakehouse. And lakehouse will not replace data lakes, in terms of that. There is the need for those data scientists who do know Python, who know Spark, to go in there and basically do their thing without all the restrictions or the constraints of a pre-built, pre-designed table structure. I get that. Same thing for developing models. But on the other hand, there is huge need. Basically, (clears throat) maybe MongoDB was saying that we're not teaching SQL anymore; well, maybe we have an oversupply of SQL developers. I'm being facetious there, but there is a huge skills base in SQL. Analytics have been built on SQL. Why lakehouse really helps to fuel a SQL revival is that the core need in the data lake, what brought on the lakehouse, was not so much SQL; it was a need for ACID. And what was the best way to do it? It was through a relational table structure. So, the whole idea of ACID in the lakehouse was not to turn it into a transaction database, but to make the data trusted, secure, and more granularly governed, where you could govern down to column and row level, which you really could not do in a data lake or a file system. So, while lakehouse can be queried in a manner... you can go in there with Python or whatever... it's built on a relational table structure. And so, for that end, for those types of data lakes, it becomes the end state.
You cannot bypass that table structure, as I learned the hard way during my research. So, the bottom line I'd say here is that lakehouse is proof that we're starting to see the revenge of the SQL nerds. (Dave chuckles) >> Excellent. Okay, let's bring the predictions back up. Dave Menninger, this one's really thought-provoking and interesting. We're hearing things like data as code, new data applications, machines actually generating plans with no human involvement. And your prediction is the definition of data is expanding. What do you mean by that? >> So, I think, for too long, we've thought about data as, I would say, the facts that we collect, the readings off of devices and things like that, but data on its own is really insufficient. Organizations need to manipulate that data and examine derivatives of the data to really understand what's happening in their organization, why it has happened, and to project what might happen in the future. And my comment is that these data derivatives need to be supported and managed just like the data needs to be managed. We can't treat this as entirely separate. Think about all the governance discussions we've had. Think about the metadata discussions we've had. If you separate these things, now you've got more moving parts. We're talking about simplicity and simplifying the stack. So, if these things are treated separately, it creates much more complexity. I also think it creates a little bit of a myopic view on the part of the IT organizations that are acquiring these technologies. They need to think more broadly. So, for instance, metrics. Metric stores are becoming a much more common part of the tooling that's part of a data platform. Similarly, feature stores are gaining traction. Those are designed to promote the reuse and consistency across the AI and ML initiatives, the elements that are used in developing an AI or ML model. And let me go back to metrics and just clarify what I mean by that: any type of formula involving the data points. I'm distinguishing metrics from features that are used in AI and ML models. And the data platforms themselves are increasingly managing the models as an element of data. So, just like figuring out how to calculate a metric: well, if you're going to have the features associated with an AI and ML model, you probably need to be managing the model that's associated with those features. The other element where I see expansion is around external data. Organizations for decades have been focused on the data that they generate within their own organization. We see more and more of these platforms acquiring and publishing data to external third-party sources, whether they're within some sort of a partner ecosystem or whether it's a commercial distribution of that information. And our research shows that when organizations use external data, they derive even more benefits from the various analyses that they're conducting. And the last great frontier, in my opinion, on this expanding world of data is the world of driver-based planning. Very few of the major data platform providers provide these capabilities today. These are the types of things you would do in a spreadsheet. And we all know the issues associated with spreadsheets. They're hard to govern, they're error-prone.
And so, if we can take that type of analysis: collecting the occupancy of a rental property, the projected rise in rental rates, the fluctuations perhaps in occupancy, the interest rates associated with financing that property, we can project forward what the income might look like from that property and the expenses, and we can plan and purchase things appropriately. So, I think we need this broader purview, and I'm beginning to see some of those things happen. And the evidence today, I would say, is more focused around the metric stores and the feature stores, where we're starting to see vendors offer those capabilities. And we're starting to see the MLOps elements of managing the AI and ML models find their way closer to the data platforms as well. >> Very interesting. When I hear metrics, I think of KPIs; I think of data apps, orchestrating people, places, and things to optimize around a set of KPIs. It sounds like a metadata challenge more... Somebody once predicted we'll have more metadata than data. Carl, what are your thoughts on this prediction? >> Yeah, I think that what Dave is describing as data derivatives is, in a way, another word for what I was calling operational metadata, which is not about the data itself, but how it's used, where it came from, what the rules are governing it, and that kind of thing. If you have a rich enough set of those things, then not only can you do a model of how well your vacation property rental may do in terms of income, but also how well your application that's measuring that is doing for you. In other words, how many times have I used it, how much data have I used, and what is the relationship between the data that I've used and the benefits that I've derived from using it? Well, we don't have ways of doing that. What's interesting to me is that folks in the content world are way ahead of us here, because they have always tracked their content using these kinds of attributes: where did it come from, when was it created, when was it modified, who modified it, and so on and so forth. We need to do more of that with the structured data that we have, so that we can track how it's used. And also, it tells us how well we're doing with it. Is it really benefiting us? Are we being efficient? Are there improvements in processes that we need to consider? Because maybe data gets created and then it isn't used, or it gets used, but it gets altered in some way that actually misleads people. (laughs) So, we need the mechanisms to be able to do that. So, I'd say that it's true that we need that stuff, and I think that "starting to expand" is probably the right way to put it. It's going to be expanding for some time. I think we're still a distance from having all that stuff really working together. >> Maybe we should say it's gestating. (Dave and Carl laughing) >> Sorry, if I may- >> Sanjeev, yeah, I was going to say this... Sanjeev, please comment. This sounds to me like it supports Zhamak Dehghani's principles, but please. >> Absolutely. So, whether we call it data mesh or not, I'm not getting into that conversation, (Dave chuckles) but data products... (audio breaking) (Tony laughing) Everything that I'm hearing in what Dave is saying, Carl: this is the year when data products will start to take off. I'm not saying they'll become mainstream; they may take a couple of years to become so. But this is data products: all this stuff about vacation rentals and how they're doing, that data is coming from different sources.
I'm packaging it into our data product. And to Carl's point, there's a whole set of operational metadata associated with it. The idea is for organizations to see things like developer productivity: how many releases am I doing of this? What data products are most popular? I'm actually right now in the process of formulating this concept that, just like we had data catalogs, we are very soon going to be requiring data product catalogs, so I can discover these data products. I'm not just creating data products left, right, and center. I need to know, do they already exist? What is the usage? If no one is using a data product, maybe I want to retire it and save cost. But this is a data product. Now, there's an associated thing that is also getting debated quite a bit, called data contracts. And a data contract, to me, is literally just a formalization of all these aspects of a product. How do you use it? What is the SLA on it? What is the quality that I am prescribing? So, data product, in my opinion, shifts the conversation to the consumers, or to the business people. Up to this point, Dave, when you're talking about data, all of data discovery and curation has been very data producer-centric. So, I think we'll see a shift more into the consumer space. >> Yeah. Dave, can I just jump in there very quickly? What Sanjeev has been saying is really central to what Zhamak has been talking about. Data products are about the lifecycle management of data, and metadata is just elemental to that. And essentially, one of the things that she calls for is making data products discoverable. That's exactly what Sanjeev was talking about. >> By the way, did everyone notice how Sanjeev just snuck in another prediction there? So, we've got- >> Yeah. (group laughing) >> But you- >> Can we also say that he snuck in, I think, the term that we'll remember today, which is metadata museums. >> Yeah, but- >> Yeah. >> And also, to comment, Tony, on your last year's prediction: you're really talking about how it's not something that you're going to buy from a vendor. >> No. >> It's very specific >> Mm-hmm. >> to an organization, their own data product. So, touche on that one. Okay, last prediction. Let's bring them up. Doug Henschen, BI analytics is headed to embedding. What does that mean? >> Well, we all know that conventional BI dashboarding and reporting is really commoditized from a vendor perspective. It never enjoyed truly mainstream adoption; it's always that 25% of employees really using these things. I'm seeing rising interest in embedding concise analytics at the point of decision or, better still, using analytics as triggers for automation and workflows, not even necessitating human interaction with visualizations, for example, if we have confidence in the analytics. So, leading companies are pushing for next-generation applications, part of this low-code, no-code movement we've seen. And they want to build that decision support right into the app, so the analytic is right there. Leading enterprise apps vendors, Salesforce, SAP, Microsoft, Oracle, are all building smart apps with the analytics, predictions, even recommendations, built into these applications. And I think the progressive BI analytics vendors are supporting this idea of driving insight to action, not necessarily necessitating humans interacting with it if there's confidence. So, we want prediction, we want embedding, we want automation.
This low-code, no-code development movement is very important to bringing the analytics to where people are doing their work. We've got to move beyond what I call swivel-chair integration, between where people do their work and going off to separate reports and dashboards, and having to interpret and analyze before you can go back and take action. >> And Dave Menninger, today, if you want analytics, or you want to absorb what's happening in the business, you've typically got to go ask an expert and then wait. So, what are your thoughts on Doug's prediction? >> I'm in total agreement with Doug. I'm going to say that collectively... so, how did we get here? I'm going to say, collectively as an industry, we made a mistake. We made BI and analytics separate from the operational systems. Now, okay, it wasn't really a mistake. We were limited by the technology available at the time. Decades ago, we had to separate these two systems so that the analytics didn't impact the operations. You don't want the operations preventing you from being able to do a transaction. But we've gone beyond that now. We can bring these two systems and worlds together, and organizations recognize the need to change. As Doug said, the majority of the workforce in the majority of organizations doesn't have access to analytics. That's wrong. (chuckles) We've got to change that. And one of the ways that's going to change is with embedded analytics. Two-thirds of organizations recognize that embedded analytics are important, and it even ranks higher in importance than AI and ML in those organizations. So, it's interesting. This is a really important topic to the organizations that are consuming these technologies. The good news is it works. Organizations that have embraced embedded analytics are more comfortable with self-service than those that have not, as opposed to turning somebody loose in the wild with the data; they're given a guided path to the data. And the research shows that 65% of organizations that have adopted embedded analytics are comfortable with self-service, compared with just 40% of organizations that are turning people loose in an ad hoc way with the data. So, totally behind Doug's prediction. >> Can I just break in with something here, a comment on what Dave said about what Doug said, which (laughs) is that I totally agree with what you said about embedded analytics. And at IDC, we made a prediction in our Future of Intelligence service three years ago that this was going to happen. And the thing that we're waiting for is for developers to build... You have to write the applications to work that way. It just doesn't happen automagically. Developers have to write applications that reference analytic data and apply it while they're running. And that could involve simple things like complex queries against live data, which is done through something that I've been calling analytic transaction processing. Or it could be through something more sophisticated that involves AI operations, as Doug has been suggesting, where the result is enacted pretty much automatically unless the scores are too low and you need to have a human being look at it. So, I think that is definitely something we've been watching for. I'm not sure how soon it will come, because it seems to take a long time for people to change their thinking. But I think, as Dave was saying, once they do and they apply these principles in their application development, the rewards are great.
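Carl's analytic-transaction-processing point, act automatically unless the scores are too low and a human needs to look, boils down to a pattern like the following hedged sketch. The model, threshold, and action names are all assumptions for illustration, not anyone's shipping product.

```python
def handle_transaction(txn: dict, risk_score, threshold: float = 0.8) -> str:
    """Embed the analytic at the point of decision: act automatically on a
    confident score, escalate to a human only when confidence is low."""
    score = risk_score(txn)          # e.g. output of a fraud model, 0..1
    if score >= threshold:
        return "auto-reject"         # confidently risky
    if score <= 1 - threshold:
        return "auto-approve"        # confidently fine
    return "route-to-human"          # the scores are too low to act alone

# Stand-in model for illustration:
print(handle_transaction({"amount": 9_500}, risk_score=lambda t: 0.55))
# -> route-to-human
```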
>> Yeah, this is very much, I would say, very consistent with what I was talking about before, about basically rethinking the modern data stack and going into more of an end-to-end solution. I think what we're talking about clearly here is operational analytics. There'll still be a need for your data scientists to go offline, just in their data lakes, to do all that very exploratory work and that deep modeling. But clearly, it just makes sense to bring operational analytics to where people work, into their workspace, and further flatten that modern data stack. >> But with all this metadata and all this intelligence, we're talking about injecting AI into applications; it does seem like we're entering a new era of not only data, but a new era of apps. Today, most applications are about filling out forms or codifying processes and require human input. And it seems like there's enough data now and enough intelligence in the system that the system can actually pull data from the transaction system, e-commerce, the supply chain, ERP, and actually do something with that data without human involvement, then present it to humans. Do you guys see this as a new frontier? >> I think that's certainly- >> Very much so, but it's going to take a while, as Carl said. You have to design it, you have to get the prediction into the system, and the analytics at the point of decision have to be relevant to that decision point. >> And I also recall a lot of the ERP vendors, back like 10 years ago, were promising that. And the fact that we're still looking at the promises shows just how difficult, how much of a challenge, it is to get to what Doug's saying. >> One element that could be applied in this case is (indistinct) architecture. If applications are developed that are event-driven, rather than following the script or sequence that some programmer or designer had preconceived, then you'll have much more flexible applications. You can inject decisions at various points using this technology much more easily. It's a completely different way of writing applications. And it actually involves a lot more data, which is why we should all like it. (laughs) But in the end, (Tony laughing) it's more stable, it's easier to manage, easier to maintain, and it's actually more efficient, which is the result of an MIT study from about 10 years ago. And still, we are not seeing this come to fruition in most business applications.
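Carl's event-driven point can be sketched minimally as well: handlers subscribe to events, and decisions are injected at any point instead of following a preconceived script. The toy in-process bus below is an illustrative assumption; a production system would use a real broker.

```python
from collections import defaultdict

class EventBus:
    """A toy in-process bus standing in for a real broker (Kafka, etc.)."""
    def __init__(self):
        self._handlers = defaultdict(list)
    def subscribe(self, event_type, handler):
        self._handlers[event_type].append(handler)
    def publish(self, event_type, payload):
        for handler in self._handlers[event_type]:
            handler(payload)

bus = EventBus()
# Decisions are injected as handlers, not baked into a fixed sequence:
bus.subscribe("order_placed", lambda e: print(f"reserve stock for {e['sku']}"))
bus.subscribe("order_placed", lambda e: print(f"score fraud risk on order {e['id']}"))
bus.publish("order_placed", {"id": 7, "sku": "A-100"})
```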
Bringing together data from multiple sources dynamically and in real time requires all the things we've been talking about. All the predictions that we've talked about today add up to elements that can make this happen. >> Well, guys, it's always tremendous to get these wonderful minds together and get your insights, and I love how it shapes the outcome of the predictions here. Let's see how we did. We're going to leave it there. I want to thank Sanjeev, Tony, Carl, David, and Doug. Really appreciate the collaboration and thought that you guys put into these sessions. Really, thank you. >> Thank you. >> Thanks, Dave. >> Thank you for having us. >> Thanks. >> Thank you. >> All right, this is Dave Vellante for theCUBE, signing off for now. Follow these guys on social media. Look for coverage on siliconangle.com, theCUBE.net. Thank you for watching. (upbeat music)
Jack Greenfield, Walmart | A Dive into Walmart's Retail Supercloud
>> Welcome back to SuperCloud2. This is Dave Vellante, and we're here with Jack Greenfield. He's the Vice President of Enterprise Architecture and the Chief Architect for the global technology platform at Walmart. Jack, I want to thank you for coming on the program. Really appreciate your time. >> Glad to be here, Dave. Thanks for inviting me and appreciate the opportunity to chat with you. >> Yeah, it's our pleasure. Now we call what you've built a SuperCloud. That's our term, not yours, but how would you describe the Walmart Cloud Native Platform? >> So WCNP, as the acronym goes, is essentially an implementation of Kubernetes for the Walmart ecosystem. And what that means is that we've taken Kubernetes off the shelf as open source, and we have integrated it with a number of foundational services that provide other aspects of our computational environment. So Kubernetes off the shelf doesn't do everything. It does a lot. In particular the orchestration of containers, but it delegates through API a lot of key functions. So for example, secret management, traffic management, there's a need for telemetry and observability at a scale beyond what you get from raw Kubernetes. That is to say, harvesting the metrics that are coming out of Kubernetes and processing them, storing them in time series databases, dashboarding them, and so on. There's also an angle to Kubernetes that gets a lot of attention in the daily DevOps routine, that's not really part of the open source deliverable itself, and that is the DevOps sort of CI/CD pipeline-oriented lifecycle. And that is something else that we've added and integrated nicely. And then one more piece of this picture is that within a Kubernetes cluster, there's a function that is critical to allowing services to discover each other and integrate with each other securely and with proper configuration, provided by the concept of a service mesh. So Istio, Linkerd, these are examples of service mesh technologies. And we have gone ahead and integrated actually those two. There's more than those two, but we've integrated those two with Kubernetes. So the net effect is that when a developer within Walmart is going to build an application, they don't have to think about all those other capabilities, where they come from, or how they're provided. Those are already present, and the way the CI/CD pipelines are set up, it's already sort of in the picture, and there are configuration points that they can take advantage of in the primary YAML and a couple of other pieces of config that we supply where they can tune it. But at the end of the day, it offloads an awful lot of work for them, having to stand up and operate those services, fail them over properly, and make them robust. All of that's provided for. >> Yeah, you know, developers often complain they spend too much time wrangling and doing things that aren't productive. So I wonder if you could talk about the high level business goals of the initiative in terms of the hardcore benefits. Was the real impetus to tap into best of breed cloud services? Were you trying to cut costs? Maybe gain negotiating leverage with the cloud guys? Resiliency, you know, I know was a major theme. Maybe you could give us a sense of kind of the anatomy of the decision-making process that went into it. >> Sure, and in the course of answering your question, I think I'm going to introduce the concept of our triplet architecture which we haven't yet touched on in the interview here.
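Before the conversation turns to the triplet architecture, it is worth making Jack's "configuration points" remark concrete. As a rough illustration, and not Walmart's actual WCNP configuration, here is what that division of labor can look like using the official Kubernetes Python client: the developer supplies little more than a name, an image, and a replica count, while platform concerns like mesh membership are an opt-in annotation. The app name, namespace, and registry URL are hypothetical.

```python
# Hypothetical sketch: the small slice of config an app team supplies on a
# platform like WCNP; secrets, telemetry, and CI/CD come from the platform.
from kubernetes import client, config

config.load_kube_config()  # or load_incluster_config() inside a cluster
apps = client.AppsV1Api()

deployment = client.V1Deployment(
    metadata=client.V1ObjectMeta(name="demo-app", labels={"app": "demo-app"}),
    spec=client.V1DeploymentSpec(
        replicas=3,  # a typical developer-tunable knob
        selector=client.V1LabelSelector(match_labels={"app": "demo-app"}),
        template=client.V1PodTemplateSpec(
            metadata=client.V1ObjectMeta(
                labels={"app": "demo-app"},
                # Istio-style sidecar injection is commonly opt-in per pod
                annotations={"sidecar.istio.io/inject": "true"},
            ),
            spec=client.V1PodSpec(
                containers=[client.V1Container(
                    name="demo-app",
                    image="registry.example.com/demo-app:1.0",
                )]
            ),
        ),
    ),
)
apps.create_namespaced_deployment(namespace="demo", body=deployment)
```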
First off, just to sort of wrap up the motivation for WCNP itself, which is kind of orthogonal to the triplet architecture. It can exist with or without it. Currently does exist with it, which is key, and I'll get to that in a moment. The key drivers, business drivers for WCNP were developer productivity by offloading the kinds of concerns that we've just discussed. Number two, improving resiliency, that is to say reducing opportunity for human error. One of the challenges you tend to run into in a large enterprise is what we call snowflakes, lots of gratuitously different workloads, projects, configurations, to the extent that by developing and using WCNP and continuing to evolve it as we have, we end up with cookie-cutter-like consistency across our workloads, which is super valuable when it comes to building tools or building services to automate operations that would otherwise be manual. When everything is pretty much done the same way, that becomes much simpler. Another key motivation for WCNP was the ability to abstract from the underlying cloud provider. And this is going to lead to a discussion of our triplet architecture. At the end of the day, when one works directly with an underlying cloud provider, one ends up taking a lot of dependencies on that particular cloud provider. Those dependencies can be valuable. For example, there are best of breed services like say Cloud Spanner offered by Google or say Cosmos DB offered by Microsoft that one wants to use, and one is willing to take the dependency on the cloud provider to get that functionality because it's unique and valuable. On the other hand, one doesn't want to take dependencies on a cloud provider that don't add a lot of value. And with Kubernetes, we have the opportunity, and this is a large part of how Kubernetes was designed and why it is the way it is, we have the opportunity to sort of abstract from the underlying cloud provider for stateless workloads on compute. And so what this lets us do is build container-based applications that can run without change on different cloud provider infrastructure. So the same applications can run on WCNP over Azure, WCNP over GCP, or WCNP over the Walmart private cloud. And we have a private cloud. Our private cloud is OpenStack based and it gives us some significant cost advantages as well as control advantages. So to your point, in terms of business motivation, there's a key cost driver here, which is that we can use our own private cloud when it's advantageous and then use the public cloud provider capabilities when we need to. A key place where this comes into play is with elasticity. So while the private cloud is much more cost effective for us to run and use, it isn't as elastic as what the cloud providers offer, right? We don't have essentially unlimited scale. We have large scale, but the public cloud providers are elastic in the extreme, which is a very powerful capability. So what we're able to do is burst, and we use this term bursting, workloads into the public cloud from the private cloud to take advantage of the elasticity they offer, and then fall back into the private cloud when the traffic load diminishes to the point where we don't need that elastic capacity at low cost. And this is a very important paradigm that I think is going to be very commonplace ultimately as the industry evolves. Private cloud is easier to operate and less expensive, and yet the public cloud provider capabilities are difficult to match.
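A minimal sketch of the bursting policy Jack describes: prefer the cheaper private cloud, spill into a public cloud only when private capacity is exhausted, and fall back as load diminishes. The capacity figure and location names are invented for illustration.

```python
# Hypothetical bursting policy: private cloud by default, public overflow.
PRIVATE_CAPACITY = 1000  # assumed schedulable units in the private cloud

def place(requested_units: int, private_in_use: int) -> str:
    free = PRIVATE_CAPACITY - private_in_use
    if requested_units <= free:
        return "private-cloud"   # cost-effective default
    return "public-cloud-burst"  # elastic overflow (e.g., Azure or GCP)

print(place(200, 700))  # private-cloud
print(place(400, 700))  # public-cloud-burst: private capacity exhausted
```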
>> And the triplet, the tri, is your on-prem private cloud and the two public clouds that you mentioned, is that right? >> That is correct. And we actually have an architecture in which we operate all three of those cloud platforms in close proximity with one another in three different major regions in the US. So we have east, west, and central. And in each of those regions, we have all three cloud providers. And the way it's configured, those data centers are within 10 milliseconds of each other, meaning that it's of negligible cost to interact between them. And this allows us to be fairly agnostic to where a particular workload is running. >> Does a human make that decision, Jack, or is there some intelligence in the system that determines that? >> That's a really great question, Dave. And it's a great question because we're at the cusp of that transition. So currently humans make that decision. Humans choose to deploy workloads into a particular region and a particular provider within that region. That said, we're actively developing patterns and practices that will allow us to automate the placement of the workloads for a variety of criteria. For example, if in a particular region a particular provider is heavily overloaded and is unable to provide the level of service that's expected through our SLAs, we could choose to fail workloads over from that cloud provider to a different one within the same region. But that's manual today. We do that, but people do it. Okay, we'd like to get to where that happens automatically. In the same way, we'd like to be able to automate the failovers, both for high availability and for the sort of heavier disaster recovery model: within a region between providers, even within a provider between the availability zones that are there, but also between regions for the heavier disaster recovery or maintenance-driven realignment of workload placement. Today, that's all manual. So we have people moving workloads from region A to region B or data center A to data center B. It's clean because of the abstraction. The workloads don't have to know or care, but there are latency considerations that come into play, and the humans have to be cognizant of those. And automating that can help ensure that we get the best performance and the best reliability. >> But you're developing the dataset to actually, I would imagine, be able to make those decisions in an automated fashion over time anyway. Is that a fair assumption? >> It is, and that's what we're actively developing right now. So if you were to look at us today, we have these nice abstractions and APIs in place, but people run that machine, if you will, moving toward a world where that machine is fully automated. >> What exactly are you abstracting? Is it sort of the deployment model or, you know, are you able to abstract, I'm just making this up, like Azure Functions and GCP functions so that you can sort of run them, you know, with a consistent experience? What exactly are you abstracting, and how difficult was it to achieve that objective technically? >> That's a good question. What we're abstracting is the Kubernetes node construct. That is to say, a cluster of Kubernetes nodes, which are typically VMs, although they can run bare metal in certain contexts, is something that typically to stand up requires knowledge of the underlying cloud provider. So for example, with GCP, you would use GKE to set up a Kubernetes cluster, and in Azure, you'd use AKS.
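To illustrate the kind of abstraction Jack is describing, here is a toy sketch, not Walmart's code: one cluster-provisioning interface with per-provider implementations (GKE, AKS, a private OpenStack cloud), so the caller never sees which backend it landed on. The class names, endpoints, and default node count are all hypothetical.

```python
# Hypothetical sketch of abstracting the "Kubernetes node construct":
# one interface, several provider backends, provider-agnostic callers.
from abc import ABC, abstractmethod

class ClusterProvider(ABC):
    @abstractmethod
    def provision(self, name: str, nodes: int) -> str:
        """Stand up a Kubernetes cluster; return its endpoint."""

class GKEProvider(ClusterProvider):
    def provision(self, name: str, nodes: int) -> str:
        return f"https://gke.example/{name}"        # would call GKE APIs

class AKSProvider(ClusterProvider):
    def provision(self, name: str, nodes: int) -> str:
        return f"https://aks.example/{name}"        # would call AKS APIs

class PrivateCloudProvider(ClusterProvider):
    def provision(self, name: str, nodes: int) -> str:
        return f"https://openstack.example/{name}"  # OpenStack-backed

def new_cluster(provider: ClusterProvider, name: str, nodes: int = 3) -> str:
    return provider.provision(name, nodes)  # caller is provider-agnostic

print(new_cluster(GKEProvider(), "demo-east"))
```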
We are actually abstracting that aspect of things so that the developers standing up applications don't have to know what the underlying cluster management provider is. They don't have to know if it's GKE, AKS, or our own Walmart private cloud. Now, in terms of functions like Azure Functions that you've mentioned there, we haven't done that yet. That's another piece that we have sort of on our radar screen that we'd like to get to, a serverless approach, and the Knative work from Google and the Azure Functions, those are things that we see good opportunity to use for a whole variety of use cases. But right now we're not doing much with that. We're strictly container based right now, and we do have some VMs that are running in sort of more of a traditional model. So our stateful workloads are primarily VM based, but for serverless, that's an opportunity for us to take some of these stateless workloads and turn them into cloud functions. >> Well, and that's another cost lever that you can pull down the road that's going to drop right to the bottom line. Do you see a day, or maybe you're doing it today, but I'd be surprised, where you build applications that actually span multiple clouds, or is there, in your view, always going to be a direct one-to-one mapping between where an application runs and the specific cloud platform? >> That's a really great question. Well, yes and no. So today, application development teams choose a cloud provider to deploy to and a location to deploy to, and they have to get involved in moving an application like we talked about today. That said, the bursting capability that I mentioned previously is something that is a step in the direction of automatic migration. That is to say, we're migrating workload to different locations automatically. Currently, the prototypes we've been developing and that we think are going to eventually make their way into production are leveraging Istio to assess the incoming load on a particular cluster and start shedding that load into a different location. Right now, the configuration of that is still manual, but there's another opportunity for automation there. And I think a key piece of this is that down the road, well, that's a sort of small step in the direction of an application being multi-provider. We expect to see really an abstraction of the fact that there is a triplet even. So the workloads are moving around according to whatever the control plane decides is necessary based on a whole variety of inputs. And at that point, you will have true multi-cloud applications, applications that are distributed across the different providers and in a way that application developers don't have to think about. >> So Walmart's been a leader, Jack, in using data for competitive advantage for decades. It's kind of been a poster child for that. You've got a mountain of IP in the form of data, tools, applications, best practices that until the cloud came out was all on-prem. But I'm really interested in this idea of building a Walmart ecosystem, which obviously you have. Do you see a day, or maybe you're even doing it today, where you take what we call the Walmart SuperCloud, WCNP in your words, and point or turn that toward an external world or your ecosystem, you know, supporting those partners or customers that could drive new revenue streams, you know, directly from the platform? >> Great question, Steve. So there's really two things to say here. The first is that with respect to data, our data workloads are primarily VM based.
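A rough sketch of the Istio-driven load shedding Jack mentions a moment earlier: watch a cluster's incoming load (as a service mesh would report it) and shed overflow to a less-loaded location in the same region. The threshold, utilization figures, and cluster names are illustrative assumptions, not Walmart's configuration.

```python
# Hypothetical load-shedding prototype: shed traffic when a cluster is hot.
SHED_THRESHOLD = 0.80  # assumed utilization above which we shed load

def route_request(primary: str, alternates: list, utilization: dict) -> str:
    """Pick a destination cluster for the next request."""
    if utilization.get(primary, 0.0) < SHED_THRESHOLD:
        return primary
    for alt in alternates:
        if utilization.get(alt, 1.0) < SHED_THRESHOLD:
            return alt   # shed into a less-loaded location
    return primary       # nowhere better; degrade in place

util = {"east-azure": 0.93, "east-gcp": 0.41, "east-private": 0.88}
print(route_request("east-azure", ["east-gcp", "east-private"], util))
# -> 'east-gcp'
```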
I've mentioned before, some VMware, some straight OpenStack. But the key here is that WCNP and Kubernetes are very powerful for stateless workloads, but stateful workloads tend to still be climbing a bit of a growth curve in the industry. So our data workloads are not primarily based on WCNP. They're VM based. Now that said, there is opportunity to make some progress there, and we are looking at ways to move things into containers that are currently running in VMs which are stateful. The other question you asked is related to how we expose data to third parties and also functionality. Right now we do have in-house, for our own use, a very robust data architecture, and we have followed the sort of domain-oriented data architecture guidance from Martin Fowler. And we have data lakes in which we collect data from all the transactional systems and which we can then use and do use to build models which are then used in our applications. But right now we're not exposing the data directly to customers as a product. That's an interesting direction that's been talked about and may happen at some point, but right now that's internal. What we are exposing to customers is applications. So we're offering our global integrated fulfillment capabilities, our order picking and curbside pickup capabilities, and our cloud powered checkout capabilities to third parties. And this means we're standing up our own internal applications as externally facing SaaS applications which can serve our partners' customers. >> Yeah, of course, Martin Fowler really first introduced to the world Zhamak Dehghani's data mesh concept and this whole idea of data products and domain oriented thinking. Zhamak Dehghani, by the way, is a speaker at our event as well. Last question I had is edge, and how you think about the edge? You know, the stores are an edge. Are you putting resources there that sort of mirror this triplet model? Or is it better to consolidate things in the cloud? I know there are trade-offs in terms of latency. How are you thinking about that? >> All really good questions. It's a challenging area as you can imagine because edges are subject to disconnection, right? Or reduced connection. So we do place the same architecture at the edge. So WCNP runs at the edge, and an application that's designed to run at WCNP can run at the edge. That said, there are a number of very specific considerations that come up when running at the edge, such as the possibility of disconnection or degraded connectivity. And so one of the challenges we have faced and have grappled with and done a good job of, I think, is dealing with the fact that applications go offline and come back online and have to reconnect and resynchronize. The sort of online/offline capability is something that can be quite challenging. And we have a couple of application architectures that sort of form the two core sets of patterns that we use. One is an offline/online synchronization architecture where we discover that we've come back online, and we understand the differences between the online dataset and the offline dataset and how they have to be reconciled. The other is a message-based architecture. And here in our health and wellness domain, we've developed applications that are queue based. So they're essentially business processes that consist of multiple steps where each step has its own queue.
And what that allows us to do is devote whatever bandwidth we do have to those pieces of the process that are most latency sensitive and allow the queue lengths to increase in parts of the process that are not latency sensitive, knowing that they will eventually catch up when the bandwidth is restored. And to put that in a little bit of context, we have fiber links to all of our locations, and we have, I'll just use a round number, 10-ish thousand locations. It's larger than that, but that's the ballpark, and we have fiber to all of them. But the fiber does get disconnected, and on a regular basis. In fact, I forget the exact number, but several dozen locations get disconnected daily just by virtue of the fact that there's construction going on and things are happening in the real world. When the disconnection happens, we're able to fall back to 5G and to Starlink. Starlink is preferred. It's a higher bandwidth. 5G if that fails. But in each of those cases, the bandwidth drops significantly. And so the applications have to be intelligent about throttling back the traffic that isn't essential, so that it can push the essential traffic in those lower bandwidth scenarios.
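A rough sketch of that queue-per-step throttling pattern: when the link degrades (fiber down, falling back to Starlink or 5G), only the latency-sensitive queues keep draining, while the rest buffer and catch up once bandwidth returns. The queue names and "essential" flags are invented for illustration.

```python
# Hypothetical bandwidth-aware draining of per-step queues at an edge site.
from collections import deque

queues = {
    "checkout": {"q": deque(), "essential": True},   # latency sensitive
    "inventory-sync": {"q": deque(), "essential": False},
    "log-shipping": {"q": deque(), "essential": False},
}

def drain(link: str):
    """Send what the current link allows; return the messages sent."""
    degraded = link in ("starlink", "5g")
    sent = []
    for name, entry in queues.items():
        if degraded and not entry["essential"]:
            continue                  # let this queue's length grow for now
        while entry["q"]:
            sent.append(entry["q"].popleft())
    return sent

queues["checkout"]["q"].append("order-123")
queues["log-shipping"]["q"].append("log-batch-9")
print(drain("5g"))     # ['order-123']: only essential traffic moves
print(drain("fiber"))  # ['log-batch-9']: backlog catches up
```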
Supercloud2: What's in it for me?
>> On January 17th, 2023, join theCUBE community for SuperCloud2, where we explore the intersection of cloud and data. One of our gold sponsors is ChaosSearch, and I'm here with Ed Walsh, CEO of the company. Ed, why should people attend SuperCloud2? >> That's a good question. Listen, supercloud is a megatrend, just like you said, data and cloud, and I would also add analytics to it. Some companies, including end user enterprises, are using it for great things you couldn't possibly do without this design principle. In fact, if you're doing anything around cloud, data, analytics, you need to look at these things or you're not going to keep up with your data growth. >> Awesome. January 17th, go to SuperCloud.World and register. You don't want to miss the conversations with data mesh founder Zhamak Dehghani, technologists like Bob Muglia, and customers building superclouds like Walmart. Don't miss it.
Why Attend Supercloud2?
>> Ed, Supercloud 2, why should I go? >> So I would say supercloud is really a megatrend. You always talk about cloud and data analytics, which are the megatrends. And if you're focused on those things, you need to be looking at the principles. I look at the design principles behind how you can use your supercloud. 'Cause companies, end user companies and big companies, are using these types of principles to get results that you just simply can't get if you're not looking at these principles. So if you're looking at cloud, data, or analytics, these are the principles you've got to understand, 'cause it's really a megatrend in the industry. >> January 17th, go to supercloud.world and register. You don't want to miss the conversations with data mesh founder Zhamak Dehghani, technologists like Bob Muglia, and customers building superclouds like Walmart. Don't miss it.
David Flynn Supercloud Audio
>> From every ISV to solve the problems. You want there to be tools in place that you can use, either open source tools or whatever it is, that help you build it. And slowly over time, that building will become easier and easier. So my question to you was, where do you see yourself playing? Do you see yourself playing to ISVs as a set of tools, which will make their life a lot easier and provide that work? >> Absolutely. >> If they don't have, so they don't have to do it. Or you're providing this for the end users? Or both? >> So it's a progression. If you go to the ISVs first, you're doomed to starve before you have time for that other option. >> Yeah. >> Right? So it's a question of phase, the phasing of it. And also if you go directly to end users, you can demonstrate the power of it and get the attention of the ISVs. I believe that the ISVs, especially those with the biggest footprints and the most, you know, coveted estates, they have already made massive investments at trying to solve decentralization of their software stack. And I believe that they have used it as a hook to try to move to a software as a service model and rope people into leasing their infrastructure. So if you look at the clouds that have been propped up by Autodesk or by Adobe, or you name the company, they are building proprietary makeshift solutions for decentralizing or hybrid clouding. Or maybe they're not even doing that at all, and all they're saying is, hey, if you want to get location agnosticness, then what you should do is just move into our cloud. >> Right. >> And then they try to solve on the background how to decentralize it between different regions so they can have decent offerings in each region. But those who are more advanced have already made larger investments and will be more averse to, you know, throwing all of their makeshift machinery away and using a platform that gives them high performance parallel, low level file system access, while at the same time having metadata-driven, you know, policy-based, intent-based orchestration to manage the diffusion of data across a decentralized infrastructure. They are not going to be as open because they've made such an investment, and they're going to look at how they monetize it. So what we have found with, like, the movie studios who are using us already, many of the apps they're using, many of those software offerings, the ISVs have their own cloud that offers that software for the cloud. But what we got when I asked about this, 'cause I delved specifically into this question because I'm very interested to know how we're going to make that leap from end user upstream into the ISVs where I believe we need to, and they said, look, we cannot use these software ISV-specific SaaS clouds for two reasons. Number one is we lose control of the data. We're giving it to them. That's security and other issues. And here you're talking about we're doing work for Disney, we're doing work for Netflix, and they're not going to let us put our data on those software clouds, on those SaaS clouds. Secondly, in any reasonable pipeline, the data is shared by many different applications. We need to be agnostic as to the application. 'Cause the output of one application provides the input to the next, and it's not necessarily from the same vendor. So they need to have a data platform that lets them, you know, go from one software stack, and you know, to run it on another.
Because they might do the rendering with this and yet do the editing with that, and you know, et cetera, et cetera. So I think the further you go up the stack in the structured data and dedicated applications for specific functions in specific verticals, the further up the stack you go, the harder it is to justify a SaaS offering where you're basically telling the end users you need to park all your data with us and then you can run your application in our cloud and get this. That ultimately is a dead end path versus having the data be open and available to many applications across this supercloud layer. >> Okay, so-- >> Is that making any sense? >> Yes, so if I could just ask a clarifying question. So, if I had to take Snowflake as an example, I think they're doing exactly what you're saying is a dead end, put everything into our proprietary system and then we'll figure out how to distribute it. >> Yeah. >> And I think if you're familiar with Zhamak Dehghani's data mesh concept. Are you? >> A little bit, yeah. >> But in her model, Snowflake, a Snowflake warehouse, is just a node on the mesh, and that mesh is-- >> That's right. >> Ultimately the supercloud, and you're an enabler of that is what I'm hearing. >> That's right. What they're doing up at the structured level and what they're talking about at the structured level, we're doing at the underlying, unstructured level, which by the way has implications for how you implement those distributed database things. In other words, implementing a Snowflake on top of Hammerspace would have made building stuff like that in the first place easier. It would allow you to easily shift and run the database engine anywhere. You still have to solve how to shard and distribute at the transaction layer above, so I'm not saying we're a substitute for what you need to do at the app layer. By the way, there is another example of that, and that's Microsoft Office, right? It's one thing to share that, to have a file share where you can share all the docs. It's something else to have Word and PowerPoint, Excel know how to allow people to be simultaneously editing the same doc. That's always going to happen in the app layer. But not all applications need that level of, you know, in-app decentralization. You know, many workflows are pipelined, especially the ones that are very data intensive, where you're doing drug discovery or you're doing rendering, or you're doing machine learning training. These things are human in the loop with large stages of processing across tens of thousands of cores. And I think that kind of data processing pipeline is what we're focusing on first. Not so much the Microsoft Office or the Snowflake, you know, parking a relational database, because that takes a lot of application layer stuff, and that's what they're good at. >> Right. >> But I think... >> Go ahead, sorry. >> Later entrants in these markets will find Hammerspace as a way to accelerate their work so they can focus more narrowly on just the stuff that's app-specific, higher level sharing in the app. >> Yes, Snowflake founders-- >> I think it might be worth mentioning also, just keep this confidential guys, but one of our customers is Blue Origin. And one of the things that we have found is kind of the point of what you're talking about with our customers. They're needing to build this, and since it's not commercially available or they don't know where to look for it to be commercially available, they're all building it themselves. So this layer is needed.
And Blue is just one of the examples of quite a few we're now talking to. And like manufacturing, HPC, research, where they're out trying to solve this problem with their own scripting tools and things like that. And I just, I don't know if there's anything you want to add, David, but there's definitely a demand here, and customers are trying to figure out how to solve it beyond what Hammerspace is doing. Like, the need is so great that they're just putting developers on trying to do it themselves. >> Well, and you know, Snowflake founders, they didn't have a Hammerspace to lean on. But one of the things that's interesting about supercloud is we feel as though industry clouds will emerge, that as part of companies' digital transformations, they will, you know, every company's a software company, they'll begin to build their own clouds, and they will be able to use a Hammerspace to do that. >> A super PaaS layer. >> Yes. It's really, I don't know if David's speaking, I don't want to speak over him, but we can't hear you. May be going through a bad... >> Well, regional clouds make that possible. And so they're doing these render farms and editing farms, and it's a cloud specific to the types of workflows in the media and entertainment world. Or clouds specific to workflows in the chip design world or in the drug and bio and life sciences exploration world. There are large organizations that are kind of a blend of end users, like the Broad, which has their own kind of cloud where they're asking collaborators to come in and work with them. So it starts to even blur who's an end user versus an ISV. >> Yes. >> Right? When you start talking about massive data as the main gravity, it's about having lots of people participate. >> Yep, and that's where the value is. And this is a megatrend that we see. And so it's really important for us to get to the point of what is and what is not a supercloud, and, you know, that's where we're trying to evolve. >> Let's talk about this for a second, 'cause I want to challenge you on something, and it's something that I got challenged on, and it has led me to thinking differently than I did at first, which Molly can attest to. Okay? So, we have been looking for a way to talk about the concept of cloud, of utility computing, run anything anywhere, that isn't addressed in today's realization of cloud. 'Cause today's cloud is not run anything anywhere, it's quite the opposite. You park your data in AWS and that's where you run stuff. And you pretty much have to. Same with Azure. They're using data gravity to keep you captive there, just like the old infrastructure guys did. But now it's even worse because it's coupled back with the software to some degree, as well. And you have to use their storage, networking, and compute. I mean, it fell back to the mainframe era. Anyhow, so I love the concept of supercloud. By the way, I was going to suggest that a better term might be hypercloud, since hyper speaks to the multidimensionality of it and the ability to be in, you know, a different dimension, a different plane of existence, kind of like hyperspace. But super and hyper are somewhat synonyms. I mean, you have hypercars and you have supercars and blah, blah, blah. I happen to like hyper maybe also because it ties into the whole Hammerspace notion of a hyper-dimensional, you know, reality, having your data centers connected by a wormhole that is Hammerspace.
But regardless, what I got challenged on is calling it something different at all versus simply saying, this is what cloud has always meant to be. This is the true cloud, this is real cloud, this is cloud. And I think back to what happened, you'll remember, at Fusion-io. We talked about IO memory, and we did that because people had a conceptualization of what an SSD was. And an SSD back then was low capacity, low endurance, made to go military, aerospace, where things needed to be rugged, but was completely useless in the data center. And we needed people to imagine this thing as being able to displace entire SANs, with the kind of capacity density, performance density, endurance. And so we talked IO memory. We could have said enterprise SSD, and that's what the industry now refers to for that concept. What will people be saying five and 10 years from now? Will they simply say, well, this is cloud as it was always meant to be, where you are truly able to run anything anywhere and have not only the same APIs, but your same data available with high performance access, all forms of access, block, file, and object, everywhere? So yeah. And I wonder, and this is just me throwing it out there, I wonder if, well, there's trade-offs, right? Giving it a new moniker, supercloud, versus simply talking about how cloud is always intended to be and what it was meant to be, you know, the real cloud or true cloud, there are trade-offs. By putting a name on it and branding it, that lets people talk about it and understand they're talking about something different. But also, is that an affront to people who thought that that's what they already had?
And over time, you would want to be able to run the data where you want to and in any of those concepts. >> Or even modern apps, right? Or even modern apps that are siloed in SaaS within an individual cloud, right? >> So yeah, I guess it's twofold. Number one, if you're going at the high application layers, there's lots of ways that you can give the appearance of anything running anywhere. The ISV, the SaaS vendor, can engineer stuff to have the ability to serve with low enough latency to different geographies, right? So if you go too high up the stack, it kind of loses its meaning, because there's lots of different ways to make do and give the appearance of omnipresence of the service. Okay? As you come down more towards the platform layer, it gets harder and harder to mask the fact that supercloud is something entirely different than just a good regionally-distributed SaaS service. So I don't think you can distinguish supercloud if you go too high up the stack, because it's just SaaS, it's just a good SaaS service where the SaaS vendor has done the hard work to give you low latency access from different geographic regions. >> Yeah, so this is one of the hardest things, David. >> Common among them. >> Yeah, this is really an important point. This is one of the things I've had the most trouble with is why is this not just SaaS? >> So you dilute your message when you go up to the SaaS layer. If you were to focus most of this around the super PaaS layer: how can you host applications and run them anywhere, and not host this as a service, not have a service available everywhere? So how can you take any application, even applications that are written, you know, in a traditional legacy data center fashion, and be able to run them anywhere and have them have their binaries and their datasets and the runtime environment and the infrastructure to start them and stop them? You know, the jobs, the Kubernetes, the job scheduler? What we're really talking about here, what I think we're really talking about here, is building the operating system for a decentralized cloud. What is the operating system, the operating environment, for a decentralized cloud? And the two main functions of an operating system or an operating environment are the process scheduler, the thing that's scheduling what is running where and when and so forth, and the file system, right? The thing that's supplying a common view of and access to data. So when we talk about this, I think that the strongest argument for supercloud is made when you go down to the platform layer and talk about it as an operating environment on which you can run all forms of applications. >> Would you exclude--? >> Not a specific application that's been engineered as a SaaS. (audio distortion) >> He'll come back. >> Are you there? >> Yeah, yeah, you just cut out for a minute. >> I lost your last statement when you broke up. >> We heard you, you said that not the specific application. So would you exclude Snowflake from supercloud? >> Frankly, I would. I would. Because, well, and this is kind of hard to do, because Snowflake doesn't like to, Frank doesn't like to talk about Snowflake as a SaaS service. It has a negative connotation. >> But it is. >> I know, we all know it is. We all know it is, and because it is, yes, I would exclude them. >> I think I actually have him on camera. >> There's nothing in common.
I think it's Slootman. I think I said to Slootman, "I know you don't like to say you're a SaaS." And I think he said, "Well, we are a SaaS." >> Because again, if you go to the top of the application stack, there's any number of ways you can give it location agnostic function or, you know, regional, local stuff. It's like, let's solve the location problem by having me be your one location. How can it be decentralized if you're centralizing on (audio distortion)? >> Well, it's more decentralized than if it's all in one cloud. So let me actually, so the spectrum. So again, in the spirit of what is and what isn't, I think it's safe to say Hammerspace is supercloud. I think there's no debate there, right? Certainly among this crowd. And I think we can all agree that Dell, Dell Storage, is not supercloud. Where it gets fuzzy is this Snowflake example, or even, how about a Cohesity that instantiates its stack in different cloud regions in different clouds and synchronizes, however magic sauce it does that. Is that a supercloud? I mean, so I'm cautious about having too strict of a definition, 'cause then only-- >> Fair enough, fair enough. >> But I could use your help and thoughts on that. >> So I think we're talking about two different spectrums here. One is the spectrum of platform to application-specific. As you go up the application stack, it becomes this specific thing. Or you go up to the more and more structured, where it's serving a specific application function, where it's more of a SaaS thing. I think it's harder to call a SaaS service a supercloud. And I would argue that the reason there, and what you're lacking in the definition, is to talk about it as general purpose. Okay? Now, that said, a data warehouse is general purpose at the structured data level. So you could make the argument for why Snowflake is a supercloud by saying that it is a general purpose platform for doing lots of different things. It's just one at a higher level, up at the structured data level. So one spectrum is the high level going from platform to, you know, unstructured data to structured data to very application-specific, right? Like a specific, you know, CAD/CAM mechanical design cloud, like an Autodesk would want to give you their cloud for running, you know, and sharing CAD/CAM designs, doing your CAD/CAM anywhere stuff. Well, the other spectrum is how well does the purported supercloud technology actually live up to allowing you to run anything anywhere with not just the same APIs, but with the local presence of data, with the exact same runtime environment everywhere, and to be able to correctly manage how to get that runtime environment anywhere. So a Cohesity has some means of running things in different places and some means of coordinating what's where and of serving, you know, things in different places. I would argue that it is a very poor approximation of what Hammerspace does in providing the exact same file system with local high performance access everywhere, with metadata ability to control where the data is actually instantiated so that you don't have to wait for it to get orchestrated. But even then, when you do have to wait for it, it happens automatically, and so it's still only a matter of, well, how quick is it? And on the other end of the spectrum is you could look at NetApp with FlexCache and say, "Is that supercloud?" And I would argue, well, kind of, because it allows you to run things in different places, because it's a cache.
But you know, it really isn't, because it presumes some central silo from which you're caching stuff. So, you know, is it or isn't it? Well, it's on a spectrum of exactly how fully it is decoupling a runtime environment from specific locality. And I think a cache doesn't; it stretches a specific silo and makes it have some semblance of similar access in other places. But there's still a very big difference to the central silo, right? You can't turn off that central silo, for example. >> So it comes down to how specific you make the definition. And this is where it gets kind of really interesting. It's like cloud. Does IBM have a cloud? >> Exactly. >> I would say yes. Does it have the kind of quality that you would expect from a hyper-scale cloud? No. Or you could say the same thing about-- >> But that's a problem with choosing a name. That's the problem with choosing a name, supercloud, versus talking about the concept of cloud and how true you are to that concept. >> For sure. >> Right? Because without getting a name, you don't have to draw, yeah. >> I'd like to explore one particular, or bring them together. You made a very interesting observation that from an enterprise point of view, they want to safeguard their store, their data, and they want to make sure that they can have that data running in their own workflows, as well as other service providers providing services to them for that data. So, in particular, if you go back to Snowflake. If Snowflake could provide the ability for you to have your data where you wanted, you were in charge of that, would that make Snowflake a supercloud? >> I'll tell you, in my mind, they would be closer to my conceptualization of supercloud if you can instantiate Snowflake as software on your own infrastructure, and pump your own data to Snowflake that's instantiated on your own infrastructure. The fact that it has to be on their infrastructure, on their account in the cloud, that you're giving them the data, that fundamentally goes against it to me. They would be a pure play if they were a software defined thing where you could instantiate Snowflake machinery on the infrastructure of your choice and then put your data into that machinery and get all the benefits of Snowflake. >> So did you see--? >> In other words, if they were not a SaaS service, but offered all of the similar benefits of being, you know, if it were a service that you could run on your own infrastructure. >> So did you see what they announced, that--? >> I hope that's making sense. >> It does. Did you see what they announced at Dell? They basically announced the ability to take non-native Snowflake data, read it in from an object store on-prem, like a Dell object store. They do the same thing with Pure: read it in, run it in the cloud, and then push it back out. And I was saying to Dell, look, that's fine. Okay, that's interesting. You're taking a materialized view or an extended table, whatever you're doing. Wouldn't it be more interesting if you could actually run the query locally with your compute? That would be an extension that would actually get my attention and extend that. >> That is what I'm talking about. That's what I'm talking about. And that's why I'm saying I think Hammerspace is more progressive on that front, because with our technology, anybody who can instantiate a service can make a service.
And so MSPs can use Hammerspace as a way to build a super PaaS layer and host their clients on their infrastructure in a cloud-like fashion. And their clients can have their own private data centers, and the MSP or the public clouds, and Hammerspace can be instantiated, get this, by different parties in these different pieces of infrastructure and yet linked together to make a common file system across all of it. >> But this is data mesh. If I were HPE and Dell, it's exactly what I'd be doing. I'd be working with Hammerspace to create my own data. I'd work with Databricks, Snowflake, and any other-- >> Data mesh is a good way to put it. Data mesh is a good way to put it. And this is at the lowest level of, you know, the underlying file system that's mountable by the operating system, consumed as a real file system. You can't get lower level than that. That's why this is the foundation for all of the other apps and structured data systems, because you need to have a data mesh that can at least mesh the binary blob. >> Okay. >> That holds the binaries and that holds the datasets that those applications are running. >> So David, in the third week of January, we're doing Supercloud 2, and I'm trying to convince John Furrier to make it a data slash data mesh edition. I'm slowly getting him to the knothole. I would very much, I mean you're in the Bay Area, I'd very much like you to be one of the headliners. Zhamak Dehghani is going to speak; she's the creator of data mesh. >> Sure. >> I'd love to have you come into our studio as well, for the live session. If you can't make it, we can pre-record. But you're right there, so I'll get you the dates. >> We'd love to, yeah. No, you can count on it. No, definitely. And you know, we don't typically talk about what we do as data mesh. We've been, you know, using global data environment. But, you know, under the covers, that's what the thing is. And so yeah, I think we can frame the discussion like that to line up with the other discussions. >> Yeah, and data mesh, of course, is one of those evocative names, but she has come up with some very well defined principles around decentralized data, data as products, self-serve infrastructure, automated governance, and so forth, which I think your vision plugs right into. And she's brilliant. You'll love meeting her. >> Well, you know, and I think... Oh, go ahead. Go ahead, Peter. >> I'd just like to explore one other interface which I think is important. How do you see yourself and the open source? You talked about having an operating system. Obviously, Linux is the operating system at one level. How are you imagining that you would interface with the open source community as part of this development? >> Well, it's funny you ask, 'cause my CTO is the kernel maintainer of the storage networking stack. So how the Linux operating system perceives and consumes networked data at the file system level, the network file system stack, is his purview. He owns that. He wrote most of it over the last decade that he's been the maintainer, but he's the gatekeeper of what goes in. And we have leveraged his abilities to enhance Linux to be able to use this decentralized data, in particular with decoupling the control plane driven by metadata from the data access path and the many storage systems on which the data gets accessed. So this factoring, this splitting of control plane from data path, metadata from data, was absolutely necessary to create a data mesh like we're talking about.
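To make that factoring concrete before David continues, here is a toy sketch, emphatically not Hammerspace's implementation: a metadata-driven control plane decides where a file should be instantiated, while the data path simply serves reads from the nearest site that holds a copy. The policy rules and site names are invented for illustration.

```python
# Hypothetical split of metadata control plane from data path.
import fnmatch

PLACEMENT_POLICY = {
    "render/*": ["dc-west", "gcp-west"],  # keep renders near the farm
    "archive/*": ["dc-east"],             # cold data stays in one site
}

def placements(path):
    """Control plane: where should this file be instantiated?"""
    for pattern, sites in PLACEMENT_POLICY.items():
        if fnmatch.fnmatch(path, pattern):
            return sites
    return ["dc-east"]                    # default site

def read_site(path, client_site):
    """Data path: serve from the local copy when one exists."""
    sites = placements(path)
    return client_site if client_site in sites else sites[0]

print(placements("render/shot42.exr"))             # ['dc-west', 'gcp-west']
print(read_site("render/shot42.exr", "gcp-west"))  # local, low-latency read
```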
And to be able to build this supercloud concept. And the highways on which the data runs and the client which knows how to talk to it are all open source. And we have, we've driven the NFS 4.2 spec. The newest NFS spec came from my team. And it was specifically the enhancements needed to be able to build a spanning file system, a data mesh, at a file system level. Now that said, our file system itself and our server, our file server, our data orchestration, our data management stuff, that's all closed source, proprietary Hammerspace tech. But the highways on which the mesh connects are actually all open source, and the client that knows how to consume it. So we would, honestly, I would welcome competitors using those same highways. They would be at a major disadvantage because we kind of built them, but it would still be very validating, and I think it would only increase the potential adoption rate by more than whatever they might take of the market. So it'd actually be good to split the market with somebody else to come in and share those now super highways for how to mesh data at the file system level, you know, in here. So yeah, hopefully that answered your question. Does that answer the question about how we embrace the open source? >> Right, and there was one other, just that my last one is how do you enable something to run in every environment? And if we take the edge, for example, as being an environment which is very, very compute heavy, but having a lot less capability, how do you handle that?
So a global compute environment, having a common compute environment where you can instantiate things that need computing. Okay? So that's the first part. And then the second is the data platform, where you can have file, block, and object volumes, and have them available with the same APIs in each of these distributed data centers, and have the exact same data omnipresent with the ability to control where the data is from one moment to the next, local to wherever the compute is instantiated. So my definition would be a common runtime environment that's bifurcate-- >> Oh. (attendees chuckling) We just lost them at the money slide. >> That's part of the magic that makes people listen. We keep someone on pins and needles waiting. (attendees chuckling) >> That's good. >> Are you back, David? >> I'm on the edge of my seat. Common runtime environment. It was like... >> And just wait, there's more. >> But see, I'm maybe hyper-focused on the lower level of what it takes to host and run applications. And that's the stuff to schedule what resources they need to run, and to get them going, and to get them connected through to their persistence, you know, and their data. And to have that data available in all forms and have it be the same data everywhere. On top of that, you could then instantiate applications of different types, including relational databases, and data warehouses and such. And then you could say, now I've got, you know, now I've got these more application-level or structured data-level things. I tend to focus less on that structured data level and the application level and am more focused on what it takes to host any of them generically on that super PaaS layer. And I'll admit, I'm maybe hyper-focused on the PaaS layer, and I think it's valid to include, you know, higher levels up the stack like the structured data level. But as soon as you go all the way up to, like, you know, a very specific SaaS service, I don't know that you would call that supercloud. >> Well, and that's the question, is there value? And Marianna Tessel from Intuit said, you know, we looked at it, we did it, and it just, it was actually negative value for us because connecting to all these separate clouds was a real pain in the neck. Didn't bring us any additional-- >> Well, that's 'cause they don't have this PaaS layer underneath it so they can't even shop around, which actually makes it hard to stand up your own SaaS service. And ultimately they end up having to build their own infrastructure. Like, you know, I think there's been examples like Netflix moving away from the cloud to their own infrastructure. Basically, if you're going to rent it for more than a few months, it makes sense to build it yourself, if it's at any kind of scale. >> Yeah, for certain components of that cloud. But if Goldman Sachs came to you, David, and said, "Hey, we want to collaborate and we want to build out a cloud and essentially build our SaaS system, and we want to do that with Hammerspace, and we want to tap the physical infrastructure of not only our data centers but all the clouds," then that essentially would be a SaaS, would it not? And wouldn't that be a Super SaaS or a supercloud? >> Well, you know, what they may be using to build their service is a supercloud, but their service at the end of the day is just a SaaS service with global reach. Right? >> Yeah. >> You know, look at, oh shoot. What's the name of the company that does? It has a cloud for doing bookkeeping and accounting. I forget their name, net something. NetSuite. >> NetSuite.
NetSuite, yeah, Oracle. >> Yeah. >> Yep. >> Oracle acquired them, right? Is NetSuite a supercloud or is it just a SaaS service? You know? I think under the covers you might ask, are they using supercloud under the covers so that they can run their SaaS service anywhere and be able to shop the venue, get elasticity, get all the benefits of cloud, to the benefit of the service that they're offering? But you know, folks who consume the service, they don't care, because to them they're just connecting to some endpoint somewhere and they don't have to care. So the further up the stack you go, the more location-agnostic it is inherently anyway. >> And I think PaaS is really the critical layer. We thought about IaaS-plus and we thought about SaaS-minus, you know, Heroku and such, and that's why we kind of got caught up and included it. But SaaS, I admit, is the hardest one to crack. And so maybe we exclude that as a deployment model. >> That's right, and maybe coming down a level to saying, but you can have a structured data supercloud, so you could still include, say, Snowflake. Because what Snowflake is doing is more general purpose. So it's about how general purpose it is. Is it hosting lots of other applications or is it the end application? Right? >> Yeah. >> So I would argue general purpose nature forces you to go further towards platform, down-stack. And you really need that general purpose or else there is no real distinguishing. So if you want defensible turf to say supercloud is something different, I think it's important to not try to wrap your arms around SaaS in the general sense. >> Yeah, and we've kind of not really leaned hard into SaaS, we've just included it as a deployment model, which, given the constraints that you just described, for structured data would apply if it's general purpose. So David, super helpful. >> Got it. So you'd define the SaaS as including the hybrid model, the whole of SaaS. >> Yep. >> Okay, so with your permission, I'm going to add you to the list of contributors to the definition. I'm going to add-- >> Absolutely. >> I'm going to add this in. I'll share with Molly. >> Absolutely. >> We'll get on the calendar for the date. >> If Molly can share some specific language that we've been putting in that kind of goes to stuff we've been talking about, so. >> Oh, great. >> I think we can, we can share some written, kind of concrete recommendations around this stuff, around the general purpose nature, the common data thing, and yeah. >> Okay. >> Really look forward to it and would be glad to be part of this thing. You said it's in February? >> It's in January, I'll let Molly know. >> Oh, January. >> What the date is. >> Excellent. >> Yeah, third week of January. Third week of January on a Tuesday, whatever that is. So yeah, we would welcome you in. But like I said, if it doesn't work for your schedule, we can prerecord something. But it would be awesome to have you in studio. >> I'm sure with this much notice we'll be able to get something. Let's make sure we have the dates communicated to Molly and she'll work with my admin to set it up on our side so that we have it. >> I'll get those today to you, Molly. Thank you. >> By the way, I am so, so pleased with being able to work with you guys on this. I think the industry needs it very badly. They need something to break them out of the box of their own mental constraints of what the cloud is versus what it's supposed to be.
And obviously, the more we get people to question their reality, and what is real, and what we're really capable of today, the more business we're going to get. So we're excited to lend a hand behind this notion of supercloud and a super PaaS layer in whatever way we can. >> Awesome. >> Can I ask you whether your platforms include ARM as well as x86? >> So we have not done an ARM port yet. It has been entertained and won't be much of a stretch. >> Yeah, it's just a matter of time. >> We've actually entertained doing it on behalf of NVIDIA, but it will absolutely happen, because ARM in the data center I think is a foregone conclusion. Well, it's already there in some cases, but not quite at volume. So it definitely will be the case. And I'll tell you where this gets really interesting, a discussion for another time, is back to my old friend, the SSD, and having SSDs that have enough brains on them to be part of that fabric. Directly. >> Interesting. Interesting. >> Very interesting. >> Directly attached to ethernet and able to create a data mesh global file system, that's going to be really fascinating. Got to run now. >> All right, hey, thanks, you guys. Thanks David, thanks Molly. Great to catch up. Bye-bye. >> Bye. >> Talk to you soon.
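Before moving on, David's two-part chalkboard definition from earlier in the conversation, a common runtime environment plus a common data platform at every site, can be restated as a short, hedged Python sketch. It is purely illustrative; the class names and fields are invented, not any vendor's product.

```python
# A hedged sketch restating the chalkboard definition of supercloud above:
# every site runs the same job-scheduling runtime and exposes the same data
# APIs, so workload placement becomes pure policy. All names are hypothetical.
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Site:
    name: str                                       # "aws-us-east-1", "on-prem-dc1", "edge-42"
    scheduler: str = "kubernetes"                   # common runtime (could be Slurm, etc.)
    data_apis: tuple = ("file", "block", "object")  # identical data APIs at every site

    def run(self, job: str) -> str:
        return f"{job}: scheduled on {self.name} via {self.scheduler}"


@dataclass
class Supercloud:
    sites: list = field(default_factory=list)

    def run_anywhere(self, job: str, prefer: Optional[str] = None) -> str:
        # Same runtime and same data APIs everywhere, so choosing a site
        # is just a placement decision, not a porting exercise.
        site = next((s for s in self.sites if s.name == prefer), self.sites[0])
        return site.run(job)


sc = Supercloud([Site("aws-us-east-1"), Site("on-prem-dc1"), Site("edge-42")])
print(sc.run_anywhere("nightly-etl", prefer="on-prem-dc1"))
```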
Lie 1, The Most Effective Data Architecture Is Centralized | Starburst
(bright upbeat music) >> In 2011, early Facebook employee and Cloudera co-founder Jeff Hammerbacher famously said, "The best minds of my generation are thinking about how to get people to click on ads, and that sucks!" Let's face it. More than a decade later, organizations continue to be frustrated with how difficult it is to get value from data and build a truly agile and data-driven enterprise. What does that even mean, you ask? Well, it means that everyone in the organization has the data they need when they need it, in a context that's relevant, to advance the mission of an organization. Now, that could mean cutting costs, could mean increasing profits, driving productivity, saving lives, accelerating drug discovery, making better diagnoses, solving supply chain problems, predicting weather disasters, simplifying processes, and thousands of other examples where data can completely transform people's lives beyond manipulating internet users to behave a certain way. We've heard the prognostications about the possibilities of data before, and in fairness we've made progress, but the hard truth is the original promises of master data management, enterprise data warehouses, data marts, data hubs, and yes, even data lakes were broken and left us wanting for more. Welcome to The Data Doesn't Lie... Or Does It? A series of conversations produced by theCUBE and made possible by Starburst Data. I'm your host, Dave Vellante, and joining me today are three industry experts. Justin Borgman is the co-founder and CEO of Starburst, Richard Jarvis is the CTO at EMIS Health, and Teresa Tung is cloud-first technologist at Accenture. Today, we're going to have a candid discussion that will expose the unfulfilled, and yes, broken promises of a data past. We'll expose data lies: big lies, little lies, white lies, and hidden truths. And we'll challenge age-old data conventions and bust some data myths. We're debating questions like: is the demise of a single source of truth inevitable? Will the data warehouse ever have feature parity with the data lake or vice versa? Is the so-called modern data stack simply centralization in the cloud, AKA the old guard's model in new cloud clothes? How can organizations rethink their data architectures and regimes to realize the true promises of data? Can and will an open ecosystem deliver on these promises in our lifetimes? We're spanning much of the Western world today. Richard is in the UK, Teresa is on the West Coast, and Justin is in Massachusetts with me. I'm in theCUBE studios, about 30 miles outside of Boston. Folks, welcome to the program. Thanks for coming on. >> Thanks for having us. >> Okay, let's get right into it. You're very welcome. Now, here's the first lie. The most effective data architecture is one that is centralized with a team of data specialists serving various lines of business. What do you think, Justin? >> Yeah, definitely a lie. My first startup was a company called Hadapt, which was an early SQL engine for Hadoop that was acquired by Teradata. And when I got to Teradata, of course, Teradata is the pioneer of that central enterprise data warehouse model. One of the things that I found fascinating was that not one of their customers had actually lived up to that vision of centralizing all of their data into one place. They all had data silos. They all had data in different systems. They had data on-prem, data in the cloud. Those companies were acquiring other companies and inheriting their data architecture.
So despite being the industry leader for 40 years, not one of their customers truly had everything in one place. So I think definitely history has proven that to be a lie. >> So Richard, from a practitioner's point of view, what are your thoughts? I mean, there's a lot of pressure to cut cost, keep things centralized, serve the business as best as possible from that standpoint. What does your experience show? >> Yeah, I mean, I think I would echo Justin's experience really, that we as a business have grown up through acquisition, through storing data in different places, sometimes to do information governance in different ways, to store data in a platform that's close to data experts, people who really understand healthcare data, from pharmacies or from doctors. And so, although if you were starting from a greenfield site and you were building something brand new, you might be able to centralize all the data and all of the tooling and teams in one place, the reality is that businesses just don't grow up like that. And it's just really impossible to get that academic perfection of storing everything in one place. >> Teresa, I feel like Sarbanes-Oxley has kind of saved the data warehouse, right? (laughs) You actually did have to have a single version of the truth for certain financial data, but really for some of those other use cases I mentioned, I do feel like the industry has kind of let us down. What's your take on this? Where does it make sense to have that sort of centralized approach versus where does it make sense to maybe decentralize? >> I think you've got to have centralized governance, right? So from the central team, for things like Sarbanes-Oxley, for things like security, for certain very core data sets, having a centralized set of roles and responsibilities to really QA, right? To serve as a design authority for your entire data estate, just like you might with security. But how it's implemented has to be distributed. Otherwise, you're not going to be able to scale, right? So being able to have different parts of the business really make the right data investments for their needs. And then ultimately, you're going to collaborate with your partners. So partners that are not within the company, right? External partners. We're going to see a lot more data sharing and model creation. And so you're definitely going to be decentralized.
So we have to have people who understand the usage of the data, then people who understand how to build models, how to process the data effectively. And you can think of them like consultants to the wider business, because a pharmacist might not understand how to structure a SQL query, but they do understand how they want to process medication information to improve patient lives. And so that becomes a consulting-type experience from a set of rock stars to help a more decentralized business that needs to understand the data and to generate some valuable output. >> Justin, what do you say to a customer or prospect that says, "Look, Justin. I got a centralized team and that's the most cost effective way to serve the business. Otherwise, I got duplication." What do you say to that? >> Well, I would argue it's probably not the most cost effective, and the reason being really twofold. I think, first of all, when you are deploying an enterprise data warehouse model, the data warehouse itself is very expensive, generally speaking. And so you're putting all of your most valuable data in the hands of one vendor who now has tremendous leverage over you for many, many years to come. I think that's the story at Oracle or Teradata or other proprietary database systems. But the other aspect I think is that the reality is those central data warehouse teams, as much as they are experts in the technology, they don't necessarily understand the data itself. And one of the core tenets of data mesh that Zhamak writes about is this idea that the domain owners actually know the data the best. And so by not only acknowledging that data is generally decentralized, and to your earlier point about Sarbanes-Oxley maybe saving the data warehouse, I would argue maybe GDPR and data sovereignty will destroy it, because data has to be decentralized for those laws to be compliant. But I think the reality is the data mesh model basically says data's decentralized and we're going to turn that into an asset rather than a liability. And we're going to turn that into an asset by empowering the people that know the data the best to participate in the process of curating and creating data products for consumption. So I think when you think about it that way, you're going to get higher quality data and faster time to insight, which is ultimately going to drive more revenue for your business and reduce costs. So I think that that's the way I see the two models comparing and contrasting. >> So do you think the demise of the data warehouse is inevitable? Teresa, you work with a lot of clients. They're not just going to rip and replace their existing infrastructure. Maybe they're going to build on top of it, but what does that mean? Does that mean the EDW just becomes less and less valuable over time, or is it maybe just isolated to specific use cases? What's your take on that? >> Listen, I still would love all my data within a data warehouse. I would love it mastered, would love it owned by a central team, right? I think that's still what I would love to have. That's just not the reality, right? The investment to actually migrate and keep that up to date, I would say it's a losing battle. Like we've been trying to do it for a long time. Nobody has the budgets, and then data changes, right? There's going to be a new technology that's going to emerge that we're going to want to tap into. There's not going to be enough investment to bring all the legacy, but still very useful, systems into that centralized view.
So you keep the data warehouse. I think it's a very, very valuable, very high performance tool for what it's there for, but you could have this new mesh layer that still takes advantage of the things I mentioned: the data products in the systems that are meaningful today, and the data products that actually might span a number of systems. Maybe those are either source systems within the domains that know them best, or the consumer-based systems or products that need to be packaged in a way that'd be really meaningful for that end user, right? Each of those is useful for a different part of the business, and making sure that the mesh actually allows you to use all of them. >> So, Richard, let me ask you. Take Zhamak's principles, back to those. You got the domain ownership and data as product. Okay, great. Sounds good. But it creates what I would argue are two challenges: self-serve infrastructure, let's park that for a second, and then in your industry, one of the most regulated, most sensitive, computational governance. How do you automate and ensure federated governance in that mesh model that Teresa was just talking about? >> Well, it absolutely depends on some of the tooling and processes that you put in place around those tools to centralize the security and the governance of the data. And I think although a data warehouse makes that very simple 'cause it's a single tool, it's not impossible with some of the data mesh technologies that are available. And so what we've done at EMIS is we have a single security layer that sits on top of our data mesh, which means that no matter which user is accessing which data source, we go through a well audited, well understood security layer. That means that we know exactly who's got access to which data field, which data tables. And then everything that they do is audited in a very kind of standard way, regardless of the underlying data storage technology. So for me, although storing the data in one place might not be possible, understanding where your source of truth is and securing that in a common way is still a valuable approach, and you can do it without having to bring all that data into a single bucket so that it's all in one place. And so having done that, and investing quite heavily in making that possible, has paid dividends in terms of giving wider access to the platform, and ensuring that only data that's available under GDPR and other regulations is being used by the data users. >> Yeah. So Justin, we always talk about data democratization, and up until recently, there really hasn't been line of sight as to how to get there, but do you have anything to add to this? Because you're essentially doing analytic queries with data that's all dispersed all over. How are you seeing your customers handle this challenge? >> Yeah, I mean, I think data products is a really interesting aspect of the answer to that. It allows you to, again, leverage the data domain owners, the people who know the data the best, to create data as a product ultimately to be consumed. And we try to represent that in our product as, effectively, an almost eCommerce-like experience, where you go and discover and look for the data products that have been created in your organization, and then you can start to consume them as you'd like. And so really trying to build on that notion of data democratization and self-service, and making it very easy to discover and start to use with whatever BI tool you may like, or even just running SQL queries yourself.
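The single security layer Richard describes, one audited chokepoint regardless of the underlying store, can be sketched as follows. This is a hedged illustration of the pattern only; the roles, fields, and store names are hypothetical, not EMIS's actual system.

```python
# Sketch of the pattern described above: one security/audit layer in front of
# every underlying store, so access is policy-checked and logged uniformly.
# The policy rules and store names are invented for illustration.
import datetime

AUDIT_LOG = []
POLICY = {  # role -> fields that role may read
    "pharmacist": {"medication", "dosage"},
    "data_scientist": {"medication", "dosage", "age_band"},
}

def governed_read(user: str, role: str, store: str, fields: set[str]) -> set[str]:
    allowed = POLICY.get(role, set())
    granted = fields & allowed
    AUDIT_LOG.append({
        "ts": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "user": user, "role": role, "store": store,
        "requested": sorted(fields), "granted": sorted(granted),
    })
    return granted  # the caller only ever sees permitted fields

# Same chokepoint no matter which storage technology sits underneath.
print(governed_read("rjones", "pharmacist", "warehouse", {"medication", "nhs_number"}))
print(AUDIT_LOG[-1])
```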
>> Okay guys, grab a sip of water. After the short break, we'll be back to debate whether proprietary or open platforms are the best path to the future of data excellence. Keep it right there. (bright upbeat music)
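To ground the "data as a product" idea Justin describes just above, here is a hedged sketch of a minimal catalog where domain owners publish data products and consumers discover them, eCommerce-style. It is illustrative only; this is not Starburst's API, and every name and endpoint below is hypothetical.

```python
# Hedged sketch of a data product catalog: domains publish, consumers search.
# All class names, tags, and endpoints are invented for illustration.
from dataclasses import dataclass, field


@dataclass
class DataProduct:
    name: str
    domain: str          # the owning domain, per data mesh's first principle
    description: str
    endpoint: str        # where consumers query it (a SQL view, an API, ...)
    tags: list = field(default_factory=list)


class Catalog:
    def __init__(self):
        self._products: list[DataProduct] = []

    def publish(self, product: DataProduct) -> None:
        self._products.append(product)

    def search(self, keyword: str) -> list[DataProduct]:
        kw = keyword.lower()
        return [p for p in self._products
                if kw in p.name.lower()
                or kw in p.description.lower()
                or kw in (t.lower() for t in p.tags)]


catalog = Catalog()
catalog.publish(DataProduct(
    name="medication_events",
    domain="pharmacy",
    description="Curated, de-identified medication dispensing events",
    endpoint="mesh://pharmacy/medication_events",
    tags=["healthcare", "medication"],
))
for hit in catalog.search("medication"):
    print(f"{hit.domain}/{hit.name} -> {hit.endpoint}")
```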
Paula Hansen, Alteryx | Supercloud22
(upbeat music) >> Welcome back to Supercloud22. This is an open community event, and it's dedicated to tracking the future of cloud in the 2020s. Supercloud is a term that we use to describe an architectural abstraction layer that hides the underlying complexities of the individual cloud primitives and APIs and creates a common experience for developers and users, irrespective of where data is physically stored or on which cloud platform it lives. We're now going to explore the nuances of going to market in a world where data architectures span on-premises, across multiple clouds, and are increasingly stretching out to the edge. Paula Hansen is the President and Chief Revenue Officer at Alteryx. And the reason we asked her to join us for Supercloud22 is because, first of all, Alteryx is a company that is building a form of Supercloud in our view. If you have data in a bunch of different places and you need to pull different data sets together, you might want to filter it or blend it, cleanse it, shape it, enrich it with other data, analyze it, report it out to your colleagues. Alteryx allows you to do that and automate that life cycle. And in our view it's working to break down the data silos across clouds, hence Supercloud. Now, the other reason we invited Paula to the program is because she's a rockstar female in tech, and since day one at theCUBE, we've celebrated great women in tech, and in this case, a woman of data. Paula Hansen, welcome to the program. >> Thank you, Dave. I am absolutely thrilled to be here. >> Okay, we're going to focus on customers, their challenges, and going to market in this cross-cloud, multi-cloud, Supercloud world. First, Paula, what's changing in your view in the way that customers are innovating with data in the 2020s? >> Well, I think we've all learned very clearly over these last two years that the global pandemic has altered life and business as we know it. And now we're in an interesting time from a macroeconomic perspective as well. And so what we've seen is that every company in every industry has had to pivot and think about how they meet redefined customer expectations and an ever-evolving competitive landscape. There really isn't an industry that wasn't reshaped in some way over the last couple of years. And we've been fortunate to work with companies in all industries that have adapted to this ever-changing environment by leveraging Alteryx to help accelerate their digital transformations. Companies know that they need to unlock the full potential of their data to be able to move quickly, to pivot and to respond to their customers' needs, as well as manage their businesses most efficiently. So I think nothing tells that story better than sharing a customer example with you, Dave. We love to share stories of our very innovative customers. And so the one that I'll share with you today in regards to this is Delta Airlines, who we're all very familiar with. And of course Delta's goal is to always keep their airplanes in the air, flying passengers and getting people to their destinations efficiently. So they focus on the maintenance of their aircraft as a necessary part of running their business, and they need to manage their maintenance stops and the maintenance of their aircraft very efficiently and effectively. So we work with them. They leverage our platform to automate all the processes for their aircraft maintenance centers. And so they've built out a fully automated reporting system on our platform leveraging tons of data.
And this gives their service managers and their aircraft technicians foresight into what's happening with their scheduling and their maintenance processes. So this ensures that they've got the right technicians in the service center when the aircraft land and that everything across that process is fully in place. And previously, because of data silos and just complexity of data, this process would've taken them many, many hours in each independent service center, and now, leveraging Alteryx and the power of analytics and bringing all the data together, those centers can do this process in just minutes and get their planes back in the air efficiently, delivering on their promises to their customers. So that's just one of many examples that we have in terms of the way Alteryx analytics automation helps customers in this new age and helps to really unlock the power of their data. >> You know, Paula, that's an interesting example. Because in a previous life I worked with some airlines, and people maybe don't realize this, but aircraft maintenance is the mission-critical application for carriers. It's not the booking system. Because we've been there before; we show you there's a problem when you're booking, or sometimes it's unfortunate, but people get de-booked. But the aircraft maintenance is the one that matters the most and that keeps planes in the air. So we hear all the time, you just mentioned it, about data silos and how problematic they are. So, specifically, how are you seeing customers thinking about busting the data silos? >> Yeah, that's right, it's a big topic right now. Because companies realize that the business processes that they run their business with are very cross-functional in nature and require data across every department in the enterprise. And you can't keep data locked in one department. So if you think of business processes like pay to procure or quote to cash, these are business processes that companies in every industry run their business on. And that requires them to get data from multiple departments and bring all of that data together seamlessly to make the best business decisions that they can make. So what our platform does, and is really well known for, is being very easy for users, number one, and then, number two, being really great at getting access to data quickly and easily from all those data silos, really regardless of where it is. We talk about being everywhere. And when we say that, we mean whether it's on-prem, in your legacy applications and databases, or whether it's in the cloud, with, of course, all the multiple cloud platforms and modern cloud data warehouses. Regardless of where it is, we have the ability to bring that data together across hundreds of different data sources, bring it together to help drive insights and ultimately help our customers make better decisions, take action, and deliver on the business outcomes that they all are trying to drive within their respective industries. And what's- >> You know- >> Go ahead. >> Please carry on.
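As an aside, the blend/cleanse/enrich cycle Paula describes can be sketched in a few lines of pandas. Alteryx expresses this as a visual, automated workflow, so this is not its API; every table and column name below is invented for illustration.

```python
# Illustrative only: the filter/blend/cleanse/shape/enrich/analyze cycle
# described above, sketched in pandas. Column names are made up.
import pandas as pd

# Two silos: an on-prem system of record and a cloud warehouse extract.
orders = pd.DataFrame({
    "account_id": [1, 2, 2, None],
    "amount": [1200.0, -50.0, 300.0, 80.0],
})
accounts = pd.DataFrame({
    "account_id": [1, 2],
    "annual_revenue": [2.5e8, 4.0e7],
})

clean = orders.dropna(subset=["account_id"]).copy()    # cleanse: drop unkeyed rows
clean["account_id"] = clean["account_id"].astype(int)  # normalize the join key
clean["amount"] = clean["amount"].clip(lower=0)        # shape: no negative amounts

blended = clean.merge(accounts, on="account_id")       # blend across the silos
blended["segment"] = blended["annual_revenue"].apply(  # enrich with a derived field
    lambda r: "enterprise" if r > 1e8 else "mid-market")

report = blended.groupby("segment")["amount"].sum()    # analyze and report out
print(report)
```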
Well, I was just going to say that what I do think has really hit a tipping point in the last six months in particular is that executives themselves are really demanding of their organizations this democratization of data, and the breaking down of the silos, and empowering all of the employees across their enterprise, regardless of how sophisticated they are with analytics, to participate in the analytic opportunity. So we've seen some really cool things of late where executives, CEOs, chief financial officers, chief data officers are sponsoring events within their organizations to break down these silos and encourage their employees to come together on this opportunity of democratization of data and analytics. And there's a shortage of data scientists on top of this. So there's no way that you're going to be able to hire enough data scientists to make sense of all this data running around your enterprise. So we believe with our platform we empower people regardless of their skill set. And so we see executives sponsoring these hackathons within their environments to bring together people to brainstorm and ideate on use cases, to share examples of how they leverage our platform and leverage the data within their organization to make better decisions. And it's really quite cool. Companies like Stanley Black & Decker, Ingersoll Rand, Inchcape PLC, these are all companies where the executive team has sponsored these hackathon events and seen really powerful things come out of them. As an example, Ingersoll Rand sponsored their Alteryx hackathon with all of their data workers across the various different functions where the data exists. And they focused on both top-line revenue use cases as well as bottom-line efficiency cases. And one of the outcomes was a use case that helped with their distribution center in North America, bringing all the data together across their various applications to reduce the amount of over-ordering and under-ordering of parts and more effectively manage their inventory within that distribution center. So, really cool to see this is now an executive-level, board-level conversation. >> Very cool, a hackathon bringing people together for collaboration. A couple things that you said I want to comment on. Again, one of the reasons why we invited you guys to come on is, when you think about on-prem data, and anybody who follows theCUBE and my Breaking Analysis program knows we're big fans of Zhamak Dehghani's concept of data mesh. And data mesh is supposed to be inclusive. It doesn't matter if it's an S3 bucket, Oracle database, or data warehouse, or data lake; that's just a node on the data mesh. And so it should be inclusive, and Supercloud should include on-prem data, to the extent that you can make that experience consistent. We have a lot of technical sessions here at Supercloud22; we're focusing now on go-to-market and the ecosystem. And we live in a world of multiple partners, exploding ecosystems. And a lot of times it's co-opetition. So Paula, when you joined Alteryx you brought a proven go-to-market discipline to the company. Alignment with the customer, playbooks, best practices of sales, et cetera. And we've seen the results. It's a big reason why Mark Anderson and the board promoted you to president just after 10 months. Summarize how you approached the situation at Alteryx when you joined last spring. >> Yeah, I think first we were really intentional about what part of the market, what type of enterprises, get the most benefit from the innovation that we deliver. And it's really clear that it's large enterprises. The more complex a company is, most likely the more data they have, and oftentimes the more decentralized that data is. And they're also really all trying to figure out how to remain competitive by leveraging that data.
So, the first thing we did was be very intentional that we're focused on the enterprise and building out all of the capability required to be able to serve the enterprise. Of course, essential to all of that is having a platform capability, because enterprises require that. So Suresh Vittal, our Chief Product Officer, has been fantastic in building out an end-to-end analytic platform that serves a wide range of analytic capabilities to a wide range of users. And then of course it has this flexibility to operate both on-prem and in the cloud, which is very important, because we see this hybrid environment and this multi-cloud environment being something that is important to our customers. The second thing that I was really focused on was understanding how you have those conversations with customers when they all come from maybe different types of backgrounds. So the way that you work with a business analyst in the office of finance or supply chain or sales and marketing is different than the way that you serve a data scientist or a data engineer in IT. The way that you talk to a business owner, who doesn't really want to understand the workflow level of data but wants to understand the insights of data, that's a different conversation. When you want to have a conversation of analytics for all, or democratization of analytics, at the executive level with the chief data officer or a CIO, that's a whole different conversation. And so we've built very specific sales plays to be able to have those conversations, bring the relevant information to the relevant person, so that we're really making sure that we explain the value proposition of the platform, fully understand their world, their language, and can work with them to deliver the value to them. And then the third thing that we did was really heavily invest in our partnerships, and you referenced this, Dave. It's a broad ecosystem out there. And we know that we have to integrate into that broad data ecosystem and be a good partner to serve our customers. So we've invested both in technology integration as well as go-to-market strategies with cloud data warehouse companies like Snowflake and Databricks, or RPA companies like UiPath and Blue Prism, as well as a wide range of other applications and all of the cloud platforms, because that's what our customers expect from us. So that's been a really important sort of third pillar of our strategy, making sure that from a go-to-market perspective, we understand where we fit in the ecosystem and how we collectively deliver on value to our joint customers. >> So that's super helpful. What I'm taking away from this is you didn't come to it with a generic playbook. Frank Slootman always talks about situational leadership. You assessed the situation and applied that, and a great example of partners is Snowflake and Databricks, these sort of opposites, but trying to solve similar problems. So you've got to be inclusive of all that. So we're trying to sort of squint through this, Paula, and say, okay, are there nuances and best practices beyond some of the things that you just described that are unique to what we call Supercloud? Are there observations you can make with respect to what's different in this post-isolation economy? Specifically in managing remote employees and of course remote partners, working with these complex ecosystems and the rise of this multi-cloud world, is it different, or is it same wine, new bottle?
>> Well, I think some of it is common with the on-prem, or pre-cloud, world, but there are also some differences as well. So what's common is that companies still expect innovation from us and still want us to be able to serve a wide range of skill sets. So our belief is that regardless of the skill set that you have, you can participate in the analytics opportunity for your company and unlocking the potential of your data. So we've been very focused since our inception on building out a platform that really serves this wide range of capabilities across the enterprise space. What's perhaps changed more, or continues to evolve in this cloud world, is just the flexibility that's required. You have to be everywhere. You have to be able to serve users wherever they are and be able to live in a multi-cloud or supercloud world. So when I think of cloud, I think it just unlocks a whole bigger opportunity for Alteryx and for companies that want to become analytic leaders. Because now you have users all over the globe, many of them looking for web-based analytic solutions. And of course these enterprises are all in various places on their journey to cloud, and they want a partner and a platform that operates in all of those environments, which is what we do at Alteryx. So, I think it's an exciting time. I think that it's still very early in the analytic market and what companies are going to do to leverage their data to drive their transformation. And we're really excited to be a part of it. >> So last question is, I said up front we always like to celebrate women in tech. How'd you get into tech? You've got somewhat of a technical background, being in technical sales. And then of course you rose up throughout your career and now have a leadership position. I called you a woman of data. How'd you get into it? Where'd you find the love of data? Give us the background and help us inspire some of the young women out there. >> Oh, well, I'm super passionate about inspiring young women and thinking about the next generation of women that can participate in technology, and in data specifically. I grew up loving math and science. I went to school and got an electrical engineering degree, but my passion around technology hasn't been just around technology for technology's sake; my passion around technology is what can it enable? What can it do? What are the outcomes that technology makes possible? And that's why data is so attractive, because data makes amazing things possible. I shared some of those examples with you earlier, but not only can we have an effect with data in businesses and the enterprise; governments globally now are realizing the ability for data to really have broad societal impact. And so I think that speaks to women many times: what does technology enable? What are the outcomes? What are the stories and examples that we can all share and be inspired by, and feel good and inspired to be a part of a broader opportunity that technology and data specifically enables? So that's what drives me. And those are the conversations that I have with the women that I speak with, of all ages, all the way down to K through 12, to inspire them to have a career in technology. >> Awesome, the more people in STEM the better, and the more women in our industry the better. Paula Hansen, thanks so much for coming on the program. Appreciate it. >> Thank you, Dave. >> Okay, keep it right there for more coverage from Supercloud 22, you're watching theCUBE. (upbeat music)
Breaking Analysis: H1 of '22 was ugly… H2 could be worse. Here's why we're still optimistic
>> From theCUBE Studios in Palo Alto and Boston, bringing you data-driven insights from theCUBE and ETR. This is Breaking Analysis with Dave Vellante. >> After a two-year epic run in tech, 2022 has been an epically bad year. Through yesterday, the NASDAQ Composite is down 30%. The S&P 500 is off 21%. And the Dow Jones Industrial Average is down 16%. And the poor holders of Bitcoin have had to endure a nearly 60% decline year to date. But judging by the attendance and enthusiasm at major in-person tech events this spring, you'd never know that tech was in the tank. Moreover, walking around the streets of Las Vegas, where most tech conferences are held these days, one can't help but notice that the good folks of Main Street don't seem the least bit concerned that the economy is headed for a recession. Hello, and welcome to this week's Wikibon CUBE Insights, powered by ETR. In this Breaking Analysis we'll share our main takeaways from the first half of 2022, and talk about the outlook for tech going forward, and why, despite some pretty concerning headwinds, we remain sanguine about tech generally, but especially enterprise tech. Look, here's the bumper sticker on why many folks are really bearish at the moment. Of course, inflation is high; other than last year, the previous inflation high this century was in July of 2008, at 5.6%. Inflation has proven to be very, very hard to tame. You've got gas at $7 a gallon. Energy prices are not going to suddenly drop. Interest rates are climbing, which will eventually damage housing. Going to have that ripple effect, no doubt. We're seeing layoffs at companies like Tesla, and the crypto names are also trimming staff. Workers, however, are still in short supply. So wages are going up. Companies in retail are really struggling with the right inventory, and they can't even accurately guide on their earnings. We've seen a version of this movie before. Now, as it pertains to tech, Crawford Del Prete, who's the president of IDC, explained this on theCUBE this very week. And I thought he did a really good job. He said the following: >> Matt, you have a great statistic that 80% of companies used COVID as their point to pivot into digital transformation, and to invest in a different way. And so what we saw now is that tech is now where I think companies need to focus. They need to invest in tech. They need to make people more productive with tech, and it played out in the numbers. Now so this year what's fascinating is we're looking at two vastly different markets. We've got gasoline at $7 a gallon. We've got that affecting food prices. Interesting fun fact: recently it now costs over $1,000 to fill an 18-wheeler. All right, based on, I mean, this just kind of can't continue. So you think about it. >> Don't put the boat in the water. >> Yeah, yeah, yeah. Good luck if ya, yeah exactly. So a family has kind of this bag of money, and that bag of money goes up by maybe three, 4% every year, depending upon earnings. So that is sort of sloshing around. So if food and fuel and rent is taking up more, gadgets and consumer tech are not. You're going to use that iPhone a little longer. You're going to use that Android phone a little longer. You're going to use that TV a little longer. So consumer tech is getting crushed, really, and you saw it immediately in ad spending. You've seen it in Meta, you've seen it in Facebook. Consumer tech is doing very, very, it is tough. Enterprise tech, we haven't been in the office for two and a half years.
We haven't upgraded, whether that be campus wifi, whether that be servers, whether that be commercial PCs, as much as we would have. So enterprise tech, we're seeing double-digit order rates. We're seeing strong, strong demand. You combine that with a component shortage, and you're seeing some enterprise companies with a quarter of backlog. I mean, that's really unheard of. >> And higher prices, which also help profit. >> And therefore that drives up the prices. >> And this is a theme that we've heard this year at major tech events; they've really come roaring back. Last year, theCUBE had a huge presence at AWS re:Invent, the first re:Invent since 2019; it was really well attended. Now this was before the effects of the omicron variant were really well understood. And in the first quarter of 2022, things were pretty quiet as far as tech events go. But theCUBE's been really busy this spring and early into the summer. We did 12 physical events, as we're showing here in the slide. We did Women in Data Science at Stanford; Coupa Inspire was in Las Vegas. Now these are both smaller events, but they were well attended and beat expectations. The AWS San Francisco Summit was a bit off, frankly 'cause of the COVID concerns; they were on the rise. Then we hit Dell Tech World, which was packed; it had probably around 7,000 attendees. Now Dockercon was virtual, but we decided to include it here because it was a huge global event with watch parties and many, many tens of thousands of people attending. Now the Red Hat Summit was really interesting, the choice that Red Hat made this year. It was purposefully scaled down and turned into a smaller VIP event in Boston at the Westin, a couple thousand people only. It was very intimate, with a much larger virtual presence. VeeamON was very well attended, not as large as previous VeeamON events, but again beat expectations. KubeCon and Cloud Native Con was really successful in Valencia, Spain. PagerDuty Summit was again a smaller, intimate event in San Francisco. And then MongoDB World was at the new Javits Center and really well attended over the three-day period. There were lots of developers there, lots of business people, lots of ecosystem partners. And then the Snowflake Summit in Las Vegas was the most vibrant from the standpoint of the ecosystem, with nearly 10,000 attendees. And I'll come back to that in a moment. Amazon re:MARS is the Amazon AI robotics event; it's smaller but very, very cool, a lot of innovation. And just last week we were at HPE Discover. They had around 8,000 people attending, which was really good. Now I've been to over a dozen HP or HPE Discover events within Europe and the United States over the past decade, and this was by far the most vibrant, a lot of action. HPE had a little spring in its step because the company's much more focused now, but it was really well attended and people were excited to be there, not only to be back at physical events, but also to hear about some of the new innovations that are coming. And HPE has a long way to go in terms of building out that ecosystem, but it's starting to form. So we saw that last week. So tech events are back, but they are smaller. And of course now there's a virtual overlay; they're hybrid. And just to give you some context, theCUBE did, as I said, 12 physical events in the first half of 2022. Just to compare that, in 2019, through June of that year we had done 35 physical events. Yeah, 35.
And what's perhaps more interesting is we had our largest first half ever in our 12-year history, because we're doing so much hybrid and virtual to complement the physical. So that's the new format: CUBE plus digital, or sometimes just digital, but that's really what's happening in our business. So I think it's a reflection of what's happening in the broader tech community. So everyone's still trying to figure that out, but it's clear that events are back, and there's no replacing face to face. Or as I like to say, belly to belly, because deals are done at physical events. All these events we've been to, the salespeople are so excited. They're saying we're closing business. Pipelines coming out of these events are much stronger than they are out of the virtual events, but the post-virtual event continues to deliver that long-tail effect. So that's not going to go away. The bottom line is hybrid is the new model. Okay, let's look at some of the big themes that we've taken away from the first half of 2022. Now of course, this is all happening under the umbrella of digital transformation. I'm not going to talk about that too much; you've had plenty of DX Kool-Aid injected into your veins over the last 27 months. But one of the first observations I'll share is that the so-called big data ecosystem that was forming during the hoopla around the Hadoop infrastructure days and years, then, remember, dispersed right when the cloud came in; it wasn't wiped out, but the cloud definitely dampened the Hadoop enthusiasm for on-prem. The ecosystem dispersed, but now it's reforming. There are large pockets that are obviously seen in the various clouds. And we definitely see an ecosystem forming around MongoDB and the open source community gathering in the Databricks ecosystem. But the most notable momentum is within the Snowflake ecosystem. Snowflake is moving fast to win the day in the data ecosystem. They're providing a single platform that's bringing different data types together: live data from systems of record, systems of engagement, together with so-called systems of insight. These are converging, and while others, notably Oracle, are architecting for this new reality, Snowflake is leading with the ecosystem momentum, and a new stack is emerging that comprises cloud infrastructure at the bottom layer, a data PaaS layer for app dev, and is enabling an ecosystem of partners to build data products and data services that can be monetized. That's the key, that's the top of the stack. So let's dig into that further in a moment, but you're seeing machine intelligence and data being driven into applications, and the data and application stacks are coming together to support the acceleration of physical into digital. It's happening right before our eyes in every industry. We're also seeing the evolution of cloud. It started with the SaaS-ification of the enterprise, where organizations realized that they didn't have to run their own software on-prem and it made sense to move to SaaS for CRM or HR, certainly email and collaboration, and certain parts of ERP. And early IaaS was really about getting out of the data center infrastructure management business; call that cloud 1.0. And then 2.0 was really about changing the operating model. And now we're seeing that operating model spill into on-prem workloads, finally. We're talking here about initiatives like HPE's GreenLake, which we heard a lot about last week at Discover, and Dell's Apex, which we heard about in May in Las Vegas.
John Furrier had a really interesting observation that basically this is HPE's and Dell's version of outposts. And I found that interesting because outpost was kind of a wake up call in 2018 and a shot across the bow at the legacy enterprise infrastructure players. And they initially responded with these flexible financial schemes, but finally we're seeing real platforms emerge. Again, we saw this at Discover and at Dell Tech World, early implementations of the cloud operating model on-prem. I mean, honestly, you're seeing things like consoles and billing, similar to AWS circa 2014, but players like Dell and HPE they have a distinct advantage with respect to their customer bases, their service organizations, their very large portfolios, especially in the case of Dell and the fact that they have more mature stacks and knowhow to run mission critical enterprise applications on-prem. So John's comment was quite interesting that these firms are basically building their own version of outposts. Outposts obviously came into their wheelhouse and now they've finally responded. And this is setting up cloud 3.0 or Supercloud, as we like to call it, an abstraction layer, that sits above the clouds that serves as a unifying experience across a continuum of on-prem across clouds, whether it's AWS, Azure, or Google. And out to both the near and far edge, near edge being a Lowes or a Home Depot, but far edge could be space. And that edge again is fragmented. You've got the examples like the retail stores at the near edge. Outer space maybe is the far edge and IOT devices is perhaps the tiny edge. No one really knows how the tiny edge is going to play out but it's pretty clear that it's not going to comprise traditional X86 systems with a cool name tossed out to the edge. Rather, it's likely going to require a new low cost, low power, high performance architecture, most likely RM based that will enable things like realtime AI inferencing at that edge. Now we've talked about this a lot on Breaking Analysis, so I'm not going to double click on it. But suffice to say that it's very possible that new innovations are going to emerge from the tiny edge that could really disrupt the enterprise in terms of price performance. Okay, two other quick observations. One is that data protection is becoming a much closer cohort to the security stack where data immutability and air gaps and fast recovery are increasingly becoming a fundamental component of the security strategy to combat ransomware and recover from other potential hacks or disasters. And I got to say from our observation, Veeam is leading the pack here. It's now claiming the number one revenue spot in a statistical dead heat with the Dell's data protection business. That's according to Veeam, according to IDC. And so that space continues to be of interest. And finally, Broadcom's acquisition of Dell. It's going to have ripple effects throughout the enterprise technology business. And there of course, there are a lot of questions that remain, but the one other thing that John Furrier and I were discussing last night John looked at me and said, "Dave imagine if VMware runs better on Broadcom components and OEMs that use Broadcom run VMware better, maybe Broadcom doesn't even have to raise prices on on VMware licenses. Maybe they'll just raise prices on the OEMs and let them raise prices to the end customer." 
Interesting thought, I think because Broadcom is so P&L focused that it's probably not going to be the prevailing model but we'll see what happens to some of the strategic projects rather like Monterey and Capitola and Thunder. We've talked a lot about project Monterey, the others we'll see if they can make the cut. That's one of the big concerns because it's how OEMs like the ones that are building their versions of outposts are going to compete with the cloud vendors, namely AWS in the future. I want to come back to the comment on the data stack for a moment that we were talking about earlier, we talked about how the big data ecosystem that was once coalescing around hadoop dispersed. Well, the data value chain is reforming and we think it looks something like this picture, where cloud infrastructure lives at the bottom. We've said many times the cloud is expanding and evolving. And if companies like Dell and HPE can truly build a super cloud infrastructure experience then they will be in a position to capture more of the data value. If not, then it's going to go to the cloud players. And there's a live data layer that is increasingly being converged into platforms that not only simplify the movement in ELTing of data but also allow organizations to compress the time to value. Now there's a layer above that, we sometimes call it the super PaaS layer if you will, that must comprise open source tooling, partners are going to write applications and leverage platform APIs and build data products and services that can be monetized at the top of the stack. So when you observe the battle for the data future it's unlikely that any one company is going to be able to do this all on their own, which is why I often joke that the 2020s version of a sweaty Steve Bomber running around the stage, screaming, developers, developers developers, and getting the whole audience into it is now about ecosystem ecosystem ecosystem. Because when you need to fill gaps and accelerate features and provide optionality a list of capabilities on the left hand side of this chart, that's going to come from a variety of different companies and places, we're talking about catalogs and AI tools and data science capabilities, data quality, governance tools and it should be of no surprise to followers of Breaking Analysis that on the right hand side of this chart we're including the four principles of data mesh, which of course were popularized by Zhamak Dehghani. So decentralized data ownership, data as products, self-serve platform and automated or computational governance. Now whether this vision becomes a reality via a proprietary platform like Snowflake or somehow is replicated by an open source remains to be seen but history generally shows that a defacto standard for more complex problems like this is often going to emerge prior to an open source alternative. And that would be where I would place my bets. Although even that proprietary platform has to include open source optionality. But it's not a winner take all market. It's plenty of room for multiple players and ecosystem innovators, but winner will definitely take more in my opinion. Okay, let's close with some ETR data that looks at some of those major platform plays who talk a lot about digital transformation and world changing impactful missions. And they have the resources really to compete. This is an XY graphic. It's a view that we often show, it's got net score on the vertical access. 
That's a measure of spending momentum, and overlap or presence in the ETR survey. That red, that's the horizontal access. The red dotted line at 40% indicates that the platform is among the highest in terms of spending velocity. Which is why I always point out how impressive that makes AWS and Azure because not only are they large on the horizontal axis, the spending momentum on those two platforms rivals even that of Snowflake which continues to lead all on the vertical access. Now, while Google has momentum, given its goals and resources, it's well behind the two leaders. We've added Service Now and Salesforce, two platform names that have become the next great software companies. Joining likes of Oracle, which we show here and SAP not shown along with IBM, you can see them on this chart. We've also plotted MongoDB, which we think has real momentum as a company generally but also with Atlas, it's managed cloud database as a service specifically and Red Hat with trying to become the standard for app dev in Kubernetes environments, which is the hottest trend right now in application development and application modernization. Everybody's doing something with Kubernetes and of course, Red Hat with OpenShift wants to make that a better experience than do it yourself. The DYI brings a lot more complexity. And finally, we've got HPE and Dell both of which we've talked about pretty extensively here and VMware and Cisco. Now Cisco is executing on its portfolio strategy. It's got a lot of diverse components to its company. And it's coming at the cloud of course from a networking and security perspective. And that's their position of strength. And VMware is a staple of the enterprise. Yes, there's some uncertainty with regards to the Broadcom acquisition, but one thing is clear vSphere isn't going anywhere. It's entrenched and will continue to run lots of IT for years to come because it's the best platform on the planet. Now, of course, these are just some of the players in the mix. We expect that numerous non-traditional technology companies this is important to emerge as new cloud players. We've put a lot of emphasis on the data ecosystem because to us that's really going to be the main spring of digital, i.e., a digital company is a data company and that means an ecosystem of data partners that can advance outcomes like better healthcare, faster drug discovery, less fraud, cleaner energy, autonomous vehicles that are safer, smarter, more efficient grids and factories, better government and virtually endless litany of societal improvements that can be addressed. And these companies will be building innovations on top of cloud platforms creating their own super clouds, if you will. And they'll come from non-traditional places, industries, finance that take their data, their software, their tooling bring them to their customers and run them on various clouds. Okay, that's it for today. Thanks to Alex Myerson, who is on production and does the podcast for Breaking Analysis, Kristin Martin and Cheryl Knight, they help get the word out. And Rob Hoofe is our editor and chief over at Silicon Angle who helps edit our posts. Remember all these episodes are available as podcasts wherever you listen. All you got to do is search Breaking Analysis podcast. I publish each week on wikibon.com and siliconangle.com. You can email me directly at david.vellante@siliconangle.com or DM me at dvellante, or comment on my LinkedIn posts. 
And please do check out etr.ai for the best survey data in the enterprise tech business. This is Dave Vellante for theCUBE's Insights powered by ETR. Thanks for watching be well. And we'll see you next time on Breaking Analysis. (upbeat music)
Breaking Analysis: Snowflake Summit 2022...All About Apps & Monetization
>> From theCUBE studios in Palo Alto and Boston, bringing you data driven insights from theCUBE and ETR. This is "Breaking Analysis" with Dave Vellante. >> Snowflake Summit 2022 underscored that the ecosystem excitement which was once forming around Hadoop is being reborn, escalated, and coalescing around Snowflake's data cloud. What was once seen as a simpler cloud data warehouse, and good marketing with the data cloud, is evolving rapidly with new workloads, a vertical industry focus, data applications, monetization, and more. The question is, will the promise of data be fulfilled this time around, or is it same wine, new bottle? Hello, and welcome to this week's Wikibon CUBE Insights powered by ETR. In this "Breaking Analysis," we'll talk about the event, the announcements that Snowflake made that are of greatest interest, the major themes of the show, what was hype and what was real, the competition, and some concerns that remain in many parts of the ecosystem and pockets of customers. First let's look at the overall event. It was held at Caesars Forum. Not my favorite venue, but I'll tell you it was packed. Fire marshal full, as we sometimes say. Nearly 10,000 people attended the event. Here's Snowflake's CMO Denise Persson on theCUBE describing how this event has evolved. >> Yeah, two, three years ago, we were about 1,800 people at a Hilton in San Francisco. We had about 40 partners attending. This week we're close to 10,000 attendees here, almost 10,000 people online as well, and over 200 partners here on the show floor. >> Now, those numbers from 2019 remind me of the early days of Hadoop World, which was put on by Cloudera, but then Cloudera handed off the event to O'Reilly, as this article that we've inserted shows, if you bring back that slide. The headline almost got it right: Hadoop World was a failure, but it didn't have to be. Snowflake has filled the void created by O'Reilly when it first killed Hadoop World, and killed the name, and then killed Strata. Now, ironically, the momentum and excitement from Hadoop's early days probably could have stayed with Cloudera, but the beginning of the end was when they gave the conference over to O'Reilly. We can't imagine Frank Slootman handing the keys to the kingdom to a third party. Serious business was done at this event. I'm talking substantive deals. Salespeople from a host sponsor, and the ecosystems that support these events, they love physical. They really don't like virtual, because physical, belly to belly, means relationship building, pipeline, and deals. And that was blatantly obvious at this show. And in fairness, that's true of all theCUBE events we've done this year, but this one was more vibrant because of its attendance and the action in the ecosystem. Ecosystem is a hallmark of a cloud company, and that's what Snowflake is. We asked Frank Slootman on theCUBE, was this ecosystem evolution by design or did Snowflake just kind of stumble into it? Here's what he said. >> Well, when you are a data cloud, you have data, people want to do things with that data. They don't want to just run data operations, populate dashboards, run reports. Pretty soon they want to build applications, and after they build applications, they want to build businesses on it. So it goes on and on and on. So it drives your development to enable more and more functionality on that data cloud. Didn't start out that way, you know, we were very, very much focused on data operations. 
Then it becomes application development, and then it becomes, hey, we're developing whole businesses on this platform. So similar to what happened to Facebook in many ways. >> So it sounds like it was maybe a little bit of both. The Facebook analogy is interesting, because Facebook is a walled garden, as is Snowflake, but when you come into that garden, you have assurances that things are going to work in a very specific way, because a set of standards and protocols is being enforced by a steward, i.e. Snowflake. This means things run better inside of Snowflake than if you try to do all the integration yourself. Now, maybe over time an open source version of that will come out, but if you wait for that, you're going to be left behind. That said, Snowflake has made moves to make its platform more accommodating to open source tooling in many of its announcements this week. Now, I'm not going to do a deep dive on the announcements. Matt Sulkins from Monte Carlo wrote a decent summary of the keynotes, and a number of analysts like Sanjeev Mohan, Tony Baer, and others are posting some deeper analysis on these innovations, and so we'll point to those. I'll say a few things, though. Unistore extends the type of data that can live in the Snowflake data cloud. It's enabled by a new feature called hybrid tables, a new table type in Snowflake. One of the big knocks against Snowflake was it couldn't handle transaction data. Several database companies are creating this notion of a hybrid, where both analytic and transactional workloads can live in the same data store. Oracle's doing this, for example, with MySQL HeatWave, and there are many others. We saw Mongo earlier this month add an analytics capability to its transaction system. Mongo also added SQL, which was kind of interesting. Here's what Constellation Research analyst Doug Henschen said about Snowflake's moves into transaction data. Play the clip. >> Well with Unistore, they're reaching out and trying to bring transactional data in. Hey, don't limit this to analytical information, and there's other ways to do that, like CDC and streaming, but they're very closely tying that again to that marketplace, with the idea of bring your data over here and you can monetize it. Don't just leave it in that transactional database. So another reach to a broader play across a big community that they're building. >> And you're also seeing Snowflake expand its workload types in its unique way, and through Snowpark and its Streamlit acquisition, enabling Python so that native apps can be built in the data cloud and benefit from all that structure and the features that Snowflake has built in. Hence that Facebook analogy, or maybe the App Store, the Apple App Store, as I proposed as well. Python support also widens the aperture for machine intelligence workloads. We asked Snowflake senior VP of product, Christian Kleinerman, which announcements he thought were the most impactful. And despite the who's-your-favorite-child nature of the question, he did answer. Here's what he said. >> I think the native applications is the one that looks like, eh, I don't know about it on the surface, but it has the biggest potential to change everything. That could create an entire ecosystem of solutions, within a company or across companies, that I don't know that we know what's possible. >> Snowflake also announced support for Apache Iceberg, which is a new open table format standard that's emerging. 
So you're seeing Snowflake respond to these concerns about its lack of openness, and they're building optionality into their cloud. They also showed some cost optimization tools, both from Snowflake itself and from the ecosystem, notably Capital One, which launched a software business on top of Snowflake focused on optimizing cost, and eventually the rollout of data management capabilities, and all kinds of features that Snowflake announced at the show around governance, cross cloud, what we call supercloud, a new security workload, and they reemphasized their ability to read non-native on-prem data into Snowflake through partnerships with Dell and Pure, and a lot more. Let's hear from some of the analysts that came on theCUBE this week at Snowflake Summit to see what they said about the announcements and their takeaways from the event. This is Dave Menninger, Sanjeev Mohan, and Tony Baer, roll the clip. >> Our research shows that the majority of organizations, the majority of people, do not have access to analytics. And so a couple of the things they've announced, I think, address those or help to address those issues very directly. So Snowpark and support for Python and other languages is a way for organizations to embed analytics into different business processes. And so I think that'll be really beneficial to try and get analytics into more people's hands. And I also think that the native applications as part of the marketplace is another way to get applications into people's hands, rather than just analytical tools. Because most people in the organization are not analysts. They're doing some line of business function. They're HR managers, they're marketing people, they're sales people, they're finance people, right? They're not sitting there mucking around in the data, they're doing a job, and they need analytics in that job. >> Primarily, I think it is to counteract this whole notion that once you move data into Snowflake, it's a proprietary format. So I think that's how it started, but it's hugely beneficial to the customers, to the users, because now if you have a large amount of data in Parquet files, you can leave it on S3, but then, using the Apache Iceberg table format in Snowflake, you get all the benefits of Snowflake's optimizer. So for example, you get the micro-partitioning, you get the metadata. And in a single query, you can join, you can do a select from a Snowflake table union a select from an Iceberg table, and you can do stored procedures, user defined functions. So I think what they've done is extremely interesting. Iceberg by itself still does not have multi-table transactional capabilities. So if I'm running a workload, I might be touching 10 different tables. So if I use Apache Iceberg in a raw format, they don't have it, but Snowflake does. So the way I see it is Snowflake is adding more and more capabilities right into the database. So for example, they've gone ahead and added security and privacy. So you can now create policies and do even cell level masking, dynamic masking. But most organizations have more than Snowflake. So what we are starting to see all around here is that there's a whole series of data catalog companies, a bunch of companies that are doing dynamic data masking, security and governance, data observability, which is not a space Snowflake has gone into. So there's a whole ecosystem of companies that is mushrooming. Although, you know, they're using the native capabilities of Snowflake, but they are at a level higher. 
So if you have a data lake and a cloud data warehouse, and you have other, like, relational databases, you can run these cross-platform capabilities in that layer. So that way, you know, Snowflake's done a great job of enabling that ecosystem. >> I think it's like the last mile, essentially. In other words, it's like, okay, you have folks that are very comfortable with Tableau, but you do have developers who don't want to have to shell out to a separate tool. And so this is where Snowflake is essentially working to address that constituency. To Sanjeev's point, and I think part of it, this kind of plays into it, is what makes this different from the Hadoop era is the fact that all these capabilities, you know, a lot of vendors are taking it very seriously to put this native. Now, obviously Snowflake acquired Streamlit, so we can expect that the Streamlit capabilities are going to be native. >> I want to share a little bit about the higher level thinking at Snowflake. Here's a chart from Frank Slootman's keynote. It's his version of the modern data stack, if you will. Now, Snowflake of course was built on the public cloud. If there were no AWS, there would be no Snowflake. Now, they're all about bringing data and live data and expanding the types of data, including structured, we just heard about that, unstructured, geospatial, and the list is going to continue on and on. Eventually I think it's going to bleed into the edge, if we can figure out what to do with that edge data. Executing on new workloads is a big deal. They started with data sharing, and they recently added security, and they've essentially created a PaaS layer. We call it a SuperPaaS layer, if you will, to attract application developers. Snowflake has a developer-focused event coming up in November, and they've extended the marketplace with 1,300 native app listings. And at the top, that's the holy grail: monetization. We always talk about building data products, and we saw a lot of that at this event, very, very impressive and unique. Now here's the thing. There's a lot of talk in the press, on Wall Street, and in the broader community about consumption-based pricing and concerns over Snowflake's visibility and its forecast, and how analytics may be discretionary. But if you're a company building apps in Snowflake and monetizing, like Capital One intends to do, and you're now selling in the marketplace, that is not discretionary, unless of course your costs are greater than your revenue for that service, in which case it's going to fail anyway. But the point is, we're entering a new era where data apps and data products are beginning to be built, and Snowflake is attempting to make the data cloud the de facto place as to where you're going to build them. In our view, they're well ahead in that journey. Okay, let's talk about some of the bigger themes that we heard at the event. Bringing apps to the data instead of moving the data to the apps, this was a constant refrain, and one that certainly makes sense from a physics point of view. But having a single source of data that is discoverable, sharable, and governed, with increasingly robust ecosystem options, it doesn't have to be moved. Sometimes it may have to be moved if you're going across regions, but that's unique and a differentiator for Snowflake in our view. I mean, I've yet to see a data ecosystem that is as rich and growing as fast as the Snowflake ecosystem.
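As an aside, the two table types discussed above can be made concrete with a short sketch. The snippet below is illustrative only: it assumes the snowflake-connector-python client, all account, credential, and table names are hypothetical, and the hybrid table and Iceberg capabilities are rendered as announced at the summit (both were new or in preview at the time), so exact syntax may differ in the shipping product.

```python
# Illustrative sketch only: hypothetical names throughout; features shown as
# announced at Summit 2022 (hybrid tables and Iceberg support were new/preview).
import snowflake.connector  # Snowflake's official Python connector

conn = snowflake.connector.connect(
    account="myorg-myaccount",   # hypothetical account identifier
    user="analyst",
    password="...",
    warehouse="ANALYTICS_WH",
    database="DEMO_DB",
    schema="PUBLIC",
)
cur = conn.cursor()

# Unistore: a hybrid table stores rows for transactional work but remains
# queryable alongside analytical tables (a primary key is required).
cur.execute("""
    CREATE HYBRID TABLE IF NOT EXISTS orders_live (
        order_id    INT PRIMARY KEY,
        customer_id INT,
        amount      NUMBER(10, 2)
    )
""")

# Sanjeev's point: one query can span a native Snowflake table and an
# Iceberg-format table (assumed to exist already), and both sides
# benefit from the same optimizer, micro-partitioning, and metadata.
cur.execute("""
    SELECT order_id, amount FROM orders_live
    UNION ALL
    SELECT order_id, amount FROM orders_archive_iceberg
""")
for row in cur.fetchall():
    print(row)

cur.close()
conn.close()
```

The architectural implication is the interesting part: the same engine, optimizer, and governance policies apply whether the bytes live in Snowflake's native format or in open Parquet/Iceberg files sitting on S3.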
Monetization, we talked about that. Industry clouds, financial services, healthcare, retail, and media, all front and center at the event. My understanding is that Frank Slootman was a major force behind this shift, this development and go-to-market focus on verticals. It's really an attempt, and he talked about this in his keynote, to align with the customer mission, ultimately align with their objectives, which, not surprisingly, are increasingly monetizing with data as a differentiating ingredient. We heard a ton about data mesh, there were numerous presentations about the topic. And I'll say this: if you map the seven pillars Snowflake talks about, Benoit Dageville talked about this in his keynote, into Zhamak Dehghani's data mesh framework and the four principles, they align better than most of the data mesh washing that I've seen. The seven pillars: all data, all workloads, global architecture, self-managed, programmable, marketplace, and governance. Those are the seven pillars that he talked about in his keynote. All data, well, maybe with hybrid tables that becomes more of a reality. Global architecture means the data is globally distributed, it's not necessarily physically in one place. Self-managed is key. Self-service infrastructure is one of Zhamak's four principles. And then inherent governance. Zhamak talks about computational, what I'll call automated, governance, built in. And with all the talk about monetization, that aligns with the second principle, which is data as product. So while it's not a pure hit, and to its credit, by the way, Snowflake doesn't use data mesh in its messaging anymore. But by the way, its customers do. Several customers talked about it: Geico, JPMC, and a number of other customers and partners are using the term, and using it pretty closely to the concepts put forth by Zhamak Dehghani. But back to the point. They essentially, Snowflake that is, is building a proprietary system that substantially addresses some, if not many, of the goals of data mesh. Okay, back to the list. Supercloud, that's our term. We saw lots of examples of clouds on top of clouds that are architected to span multiple clouds, not just run on individual clouds as separate services. And this includes Snowflake's data cloud itself, but a number of ecosystem partners are headed in a very similar direction. Snowflake still talks about data sharing, but now it uses the term collaboration in its high level messaging, which is, I think, smart. Data sharing is kind of a geeky term. And also this is an attempt by Snowflake to differentiate from everyone else that's saying, hey, we do data sharing too. And finally, Snowflake doesn't say data marketplace anymore. It's now marketplace, accounting for its application market. Okay, let's take a quick look at the competitive landscape via this ETR X-Y graph. The vertical axis, remember, is net score, or spending momentum, and the x-axis is penetration, pervasiveness in the data center. That's what ETR calls overlap. Snowflake continues to lead on the vertical axis. They guided conservatively last quarter, remember, so I wouldn't be surprised if that lofty height, even though it's well down from its earlier levels, I wouldn't be surprised if it ticks down again a bit in the July survey, which will be in the field shortly. Databricks is a key competitor, obviously, with strong spending momentum, as you can see. We didn't draw it here, but we usually draw that 40% line, or red line, at 40%; anything above that is considered elevated. 
So you can see Databricks is quite elevated. But it doesn't have the market presence of Snowflake. It didn't get to IPO during the bubble, and it doesn't have nearly as deep and capable a go-to-market machine. Now, they're getting better, and they're getting some attention in the market nonetheless. But as a private company, you just naturally, more people are aware of Snowflake. Some analysts, Tony Baer in particular, believe Mongo and Snowflake are on a bit of a collision course long term. I actually can see his point. You know, I mean, they're both platforms, they're both about data. It's a long ways off, but you can see them sort of on a similar path. They talk about kind of similar aspirations and visions, even though they're in quite different markets today, but they're definitely participating in a similar TAM. The cloud players are probably, no, definitely the biggest partners, and probably the biggest competitors, to Snowflake. And then there's always Oracle. It doesn't have the spending velocity of the others, but it's got strong market presence. It owns a cloud, it knows a thing about data, and it definitely is a go-to-market machine. Okay, we're going to end on some of the things that we heard in the ecosystem. 'Cause look, we've heard before how a particular technology, enterprise data warehouse, data hubs, MDM, data lakes, Hadoop, et cetera, were going to solve all of our data problems, and of course they didn't. And in fact, sometimes they create more problems that allow vendors to push more incremental technology to solve the problems that they created, like tools and platforms to clean up the no-schema-on-write nature of data lakes, or data swamps. But here are some of the things that I heard firsthand from some customers and partners. First thing is, they said to me that they're having a hard time keeping up sometimes with the pace of Snowflake. It reminds me of AWS in the 2014, 2015 timeframe. You remember that fire hose of announcements, which causes increased complexity for customers and partners. I talked to several customers that said, well, yeah, this is all well and good, but I still need skilled people to understand all these tools that I'm integrating in the ecosystem, the catalogs, the machine learning observability. A number of customers said, I just can't use one governance tool, I need multiple governance tools and a lot of other technologies as well, and they're concerned that that's going to drive up their cost and their complexity. I heard other concerns from the ecosystem, that it used to be sort of clear as to where they could add value, you know, when Snowflake was just a better data warehouse. But to point number one, they're either concerned that they'll be left behind, or they're concerned that they'll be subsumed. Look, I mean, just like we tell AWS customers and partners, you've got to move fast, you've got to keep innovating. If you don't, you're going to be left: either, if you're a customer, you're going to be left behind your competitor, or if you're a partner, somebody else is going to get there, or AWS is going to solve the problem for you. Okay, and there were a number of skeptical practitioners, really thoughtful and experienced data pros, that suggested that they've seen this movie before. Hence the same wine, new bottle. Well, this time around I certainly hope not, given all the energy and investment that is going into this ecosystem. And the fact is, Snowflake is unquestionably making it easier to put data to work. 
They built on AWS so you didn't have to worry about provisioning compute and storage and networking, and scaling. Snowflake is optimizing its platform to take advantage of things like Graviton, so you don't have to, and they're doing some of their own optimization tools. The ecosystem is building optimization tools, so that's all good. And the firm belief is, the less expensive it is, the more data will get brought into the data cloud. And they're building a data platform on which their ecosystem can build and run data applications, aka data products, without having to worry about all the hard work that needs to get done to make data discoverable, shareable, and governed. And unlike the last 10 years, you don't have to be a zookeeper and integrate all the animals in the Hadoop zoo. Okay, that's it for today. Thanks to my colleague Stephanie Chan, who helps research "Breaking Analysis" topics. Sometimes Alex Myerson is on production and manages the podcasts. Kristin Martin and Cheryl Knight help get the word out on social and in our newsletters, and Rob Hof is our editor in chief over at SiliconANGLE, and Hailey does some wonderful editing, thanks to all. Remember, all these episodes are available as podcasts wherever you listen. All you got to do is search Breaking Analysis Podcasts. I publish each week on wikibon.com and siliconangle.com, and you can email me at David.Vellante@siliconangle.com or DM me @DVellante. If you got something interesting, I'll respond. If you don't, I'm sorry, I won't. Or comment on my LinkedIn post. Please check out etr.ai for the best survey data in the enterprise tech business. This is Dave Vellante for theCUBE Insights powered by ETR. Thanks for watching, and we'll see you next time. (upbeat music)
theCUBE Insights | Snowflake Summit 2022
(upbeat music) >> Hey everyone, welcome back to theCUBE's three day coverage of Snowflake Summit 22. Lisa Martin here with Dave Vellante. We have been here, as I said, for three days. Dave, we have had an amazing three days. The energy, the momentum, the number of people still here speaks volumes for- >> Yeah, I was just saying, you look back, theCUBE, when it started, early days, was a big part of the Hadoop ecosystem. You know, Cloudera kind of got it started, the whole big data movement. It was awesome energy, and that whole ecosystem has been, I think, just hoovered into the Snowflake ecosystem. They've taken over as the data company, the data cloud. I mean, that was Cloudera, it could have been Cloudera, but they missed it, it was a variety of factors, but Snowflake has nailed it. And now it's theirs to lose. Benoit talked about that on our previous segment, how he knew that technically Hadoop was too complex and was going to fail, but they didn't know it was going to do this. They were going to turn their company into what we see here. But the event itself, Lisa, is almost 10,000 people, the right people. People are doing business, we've had a number of people tell us that they're booking deals. That's why people come to face-to-face shows, right? That's the criticism of virtual. It takes too long to close business. Salespeople want to be belly-to-belly. And this is a belly-to-belly show. >> It absolutely is. When you and I were trying to get into the keynote on Tuesday, we finally got in, standing room only, multiple overflow rooms, and we're even hearing that, so this is day four of the summit for them, there are still queues to get into breakout sessions. The momentum, but the appetite for this flywheel, and what they're creating, but also they're involving this massively growing ecosystem in its evolution. That synergy was really very much heard and echoed throughout pretty much all of our segments the last couple days. >> Yeah, it was amazing actually. So we like to go, we want to be in the front row in the keynotes, we're taking notes, we always do that. Sometimes we listen remotely, but when you listen remotely, you miss some things. When you're there, you can see the executives, you can feel their energy, you can chit chat to them on the side, be seen, whatever. And it was crazy, we couldn't get in. So we had to do our thing, and sneak our way in, and "Hey, we're media." "Oh yeah, come on in." And then no, they were taking us to a breakout room. We had to sneak in a side door, got like the last two seats, and wow, I'm glad we were in there, because it gave us a better sense. When you're in the remote watching rooms, you just can't get a sense of the energy. That's why I like to be there, I know you do too. And then to your point about ecosystem. So we've said many times that what Snowflake is developing is what we call supercloud. It's not just a SaaS, it's not just a cloud database, it's a new layer that they're creating. And so what are the attributes of that layer? Well, it hides the underlying complexity of the underlying primitives of the cloud. We've said that ad nauseam, and it adds new value on top. Well, what's that value that they're adding? Well, they're adding the value of being able to share data, collaborate, have data that's governed and secure, globally. And now the other hallmark of a cloud company is ecosystem. And so they're building that ecosystem much more rapidly than we saw at ServiceNow, which is Slootman's previous company. 
And the key to me is they've launched an application development platform, essentially a super PaaS, so that you can develop applications on top of the data cloud. And we're hearing tons about monetization. Duh, you could actually make money with data. You can package data into data products and data services, or feed data products and services, and actually sell that in a cloud, in a supercloud. That's exactly what's happening here. So that's critical. I think my one question mark, if I had to lay one out, is the other hallmark of a cloud is startups; startups come into that cloud. And I think we're seeing that, maybe not at the pace that AWS did, it's a little different. Snowflake, they're whale hunters. They're after big companies. But it looks to me like they're relying on the ecosystem to be the startup innovators. That's the important thing about cloud; cloud brings scale. It definitely brings lower cost, 'cause you're eliminating all this undifferentiated labor, but it also brings innovation through startups. So unlike AWS, which sold to the startups directly, and startups built businesses on AWS and paid AWS, it's a little bit indirect, but it's actually happening: startups in the ecosystem are building products on the data cloud, and that ultimately is going to drive value for customers, and money for Snowflake, and ultimately AWS, and Google, and Azure. The other thing I would say is the criticism or concern that the cost of goods sold for cloud is going to be so high that it's going to force people to come back on-prem; I think that's a step in the wrong direction. I think cloud, and the cloud operating model, is here to stay. I think it's going to be very difficult to replicate that on-prem. I don't think you can do cloud without cloud, and we'll see what the edge brings. >> Curious what your thoughts are. We were just at Dell Technologies World a month or so ago, when the big announcement, the Snowflake partnership there, cloud native companies recognizing, ah, there's still a lot of data that lives on-prem. Given that, and everything that we've heard the last couple of days, what are your thoughts around that and their partnerships there? >> So Dell is, I think finally, now maybe they weren't publicly talking like this, but certainly their marketing was defensive. But in the last year or so, Dell has really embraced cloud, not just the cloud operating model. Dell has said, "Look, we can build value on top of all these hyperscalers." And we saw some examples at Dell Tech World of them stepping their toe into supercloud. Project Alpine is an example, and there are others. And then of course the Snowflake deal, where Snowflake and Dell got together. I asked Frank Slootman how that deal came about, 'cause I said, "Did the customer get you into a headlock?" 'Cause I presumed that was the case, the customer said, "You got to do this or we're not going to do business with you." He said, "Well, no, not really. Michael and I had a chat, and that's how it started." Which was my other scenario, and that's exactly what happened, I guess. The point being that those worlds are coming together. And so what it means for Dell is, as they embrace cloud, as they develop supercloud capabilities, they're going to do a lot of business. Dell for sure knows how to sell, they know how to execute. What I would be doing if I were Dell is I would be trying to substantially replicate what's happening in the cloud on-prem with on-prem data. 
So what happens with that Snowflake deal is, it's read-only data. You read the data into the cloud; the compute is in the cloud. And I should've asked Terry this, I mean Benoit. Can there be an architecture on-prem? We've seen Vertica has one, it's called Vertica Eon, where you separate compute from storage. It doesn't have unlimited elasticity, but you can grow compute and storage independently and have a lot more. With Dell doing APEX on demand, it's cloudlike. They could begin to develop a little mini data cloud, or a big data cloud, within on-prem that connects to the public cloud. So what Snowflake is missing, a big part of their TAM that they're missing, is the on-prem. The Dell and Pure deals are forays into that, but the on-prem is massive, and Dell is the on-prem poster child. So I think again what it means for them is they've got to continue to embrace it, they've got to do more in software, more in data management, they've got to push on APEX. And I'd say the same thing for HPE. I think they're both well behind on this in terms of ecosystems. I mean, they're not even close. But they have to start, they've got to start somewhere, and they've got resources to make it happen. >> You said in your breaking analysis, that you published just a few days ago before the event, that Snowflake plans to create a de facto standard in data platforms. With what we heard from our guests on this program, your mainstage session with Frank Slootman, still think that? >> I do. I believe it even more than I did coming in. And the reason I called it that is because I am a super fan of Zhamak Dehghani and her data mesh. And what her vision is, it's kind of the Immaculate Conception, where she wants everything to be open, open standards, and those don't exist today. And I think she perfectly realizes the practicality that de facto standards are going to get to market and add value sooner than open standards. Now open standards, over time, and I'll come back to that, may occur, but it's clear to me what Snowflake is creating: the de facto standard for data platforms, the data cloud, the supercloud. And what's most impressive, or I think really important, is they're layering applications now on top of that. The metric to me, and I don't know if we can even count this, but VMware used to use it: for every dollar spent on a VMware license, $15 was spent in the ecosystem. It started at 1 to 1.5, 1 to 2, 1 to 10, 1 to 15; I think it went up to 1 to 30 at the max. I don't know how they counted that, but it's countable. Reasonable people can make estimates like that. And I think as the ecosystem grows, what Snowflake's doing is, in many respects, modeling the cloud, what the cloud has. The cloud has ecosystems, we talked about startups, and the cloud also has optionality. And optionality means open source. So what you saw with Apache Iceberg is, we're going to extend to open technologies. What you saw with hybrid tables is, we're going to extend to new workloads like transactions. The other thing about Snowflake that's really impressive is you're seeing the vertical focus: financial services, healthcare, retail, media and entertainment. It's very rare for a company at this tenure, they're only 10 years old, to really start going vertical with their go-to-market and building expertise around that. I think what's going to happen is the GSIs are going to come in. They love to eat at the trough; the trough here is maybe not big enough for them yet, but it will be. 
And they're going to start to align with the GSIs, and they're going to do really well within those industries, connecting people, collaborating with data. But I think it's a killer strategy, and they're executing on it. >> Right, and we heard a lot of great customer stories from all of those four verticals that you talked about, and then some. That direction and that pivot, from a customer perspective, from a sales and marketing perspective, is all aligned. And that was kind of one of the themes as well that Frank talked about in his keynote: mission alignment, mission alignment with customers, but also with the ecosystem. And I feel that I heard that with every customer conversation, with every partner conversation, and Snowflake conversation that we had over the last, I think, 36 segments, Dave. >> Yeah, I mean, yeah, it's the power of many versus the resources of one. And even though Snowflake will tell you they have $5 billion in cash and assets on the balance sheet, and that's fine, that's nothing compared to what an ecosystem has. And Amazon's part of that ecosystem. Azure is part of that ecosystem. Google is part of that ecosystem. Those companies have huge resources, and Snowflake, it seems, has figured out how to tap those resources and build value on top of them. To me they're doing a better job than a lot of the cloud databases out there. They don't necessarily have a better database; in fact, I could argue that their database is less functional, and I would argue that actually in many cases. Their database is less functional if you just want a database. But if you want a data cloud, and an ecosystem, and to develop applications on top of that, and to be able to monetize, that's unique, and that is a moat that they're building that is highly differentiable, and being able to do that relatively easily. I mean, I think they overstate the simplicity with which that is being done. We talked to some customers about this. One didn't say same wine, new bottle, I did ask him that, about Hadoop complexity, and he said, "No, it's not that bad." But you still got to put this stuff together. And I think in the early parts of a market that are immature, people get really excited because it's so much easier than what came before. So my other question is, okay, what's somebody working on now that's looking at what Snowflake's doing and saying, I can improve on that? And what's going to be really interesting to see is, can they improve on it in a way, and can they raise enough capital, such that they can disrupt, or is Snowflake going to keep staying paranoid, 'cause they got good leaders, and keep executing? And then I think the other wild card is edge. Snowflake doesn't really have an edge strategy right now. I think they will develop one. >> Through the ecosystem? >> And I don't think they're missing the boat, and they'll do it through the ecosystem, exactly. I don't think they're missing the boat, I think they're just like, "Well, we don't know what to do today." It's all distributed data, and it's ephemeral, and nobody's storing the data. You know, anything that comes back to the cloud, we get. But new architectures are emerging on the edge that are going to bring new economics. There's new silicon, you see what's happening with Apple, and the M1, the M1 Ultra, and the new systems that they've just developed. What Tesla is doing with custom silicon, and amazing things, and the programmability of the Arm model. So it's early days, but semiconductors are the mainspring of innovation in this industry. 
Without chips, you got nothing. And when you get innovations in silicon, it drives innovations in software, because developers go, "Wow, I can do that now?" I can do things in parallel, I can do things faster, I can do things more simply, and programmable at scale. So that's happening. And that's going to bring a new set of economics that, the premise is, will eventually bleed into the data center. It will, it always does. And I guess the other thing is, every 15 years or so the tech world gets disrupted. We're about 15, 16 years in now to the cloud. So at this point, everybody's like, "Wow, this is insurmountable, this is all we'll ever see. Everything that's ever been invented, this is the model of the future." We know that's not the case. I don't know how it's going to get disrupted, but I think edge is going to be part of that. It could be public policy. Governments could come in and take big tech on; it seems like Lina Khan wants to do that. So that's what makes this industry so fun. >> Never a dull moment, Dave. This has been a great three days hosting this show with you. We've uncovered a lot. Your breaking analysis was great to get me prepared for the show. If you haven't seen it, check it out on siliconangle.com. Thanks, Dave, I appreciate all of your insights. >> Thank you, Lisa, it's been a pleasure working with you. >> Always good to work with you. >> Awesome, great job. >> Likewise. Great job to the team. >> Yes, thank you to our awesome production team. They've kept us going for three days. >> Yes, and the team back home, Kristin, and Cheryl, and everybody back at the office. >> Exactly, it takes a village. For Dave Vellante, I am Lisa Martin. We are wrappin' up three days of wall-to-wall coverage at Snowflake Summit 22 from Vegas. Thanks for watching guys, we'll see you soon. (upbeat music)
Power Panel: Does Hardware Still Matter
(upbeat music) >> The ascendancy of cloud and SaaS has shone new light on how organizations think about, pay for, and value hardware. Once sought-after skills for practitioners, with expertise in hardware troubleshooting, configuring ports, tuning storage arrays, and maximizing server utilization, have been superseded by demand for cloud architects, DevOps pros, and developers with expertise in microservices, containers, application development, and the like. Even a company like Dell, the largest hardware company in enterprise tech, touts that it has more software engineers than those working in hardware. Begs the question, is hardware going the way of COBOL? Well, not likely. Software has to run on something, but the labor needed to deploy, and troubleshoot, and manage hardware infrastructure is shifting. At the same time, we've seen the value flow also shifting in hardware. Once a world dominated by x86 processors, value is flowing to alternatives like Nvidia and Arm-based designs. Moreover, other componentry like NICs, accelerators, and storage controllers is becoming more advanced, integrated, and increasingly important. The question is, does it matter? And if so, why does it matter and to whom? What does it mean to customers, workloads, OEMs, and the broader society? Hello and welcome to this week's Wikibon theCUBE Insights powered by ETR. In this breaking analysis, we've organized a special power panel of industry analysts and experts to address the question, does hardware still matter? Allow me to introduce the panel. Bob O'Donnell is president and chief analyst at TECHnalysis Research. Zeus Kerravala is the founder and principal analyst at ZK Research. David Nicholson is a CTO and tech expert. Keith Townsend is CEO and founder of CTO Advisor. And Marc Staimer is the chief dragon slayer at Dragon Slayer Consulting, and oftentimes a Wikibon contributor. Guys, welcome to theCUBE. Thanks so much for spending some time here. >> Good to be here. >> Thanks. >> Thanks for having us. >> Okay, before we get into it, I just want to bring up some data from ETR. This is a survey that ETR does every quarter. It's a survey of about 1200 to 1500 CIOs and IT buyers, and I'm showing a subset of the taxonomy here. This is an XY axis, and the vertical axis is something called net score. That's a measure of spending momentum. It's essentially the percentage of customers that are spending more on a particular area than those spending less. You subtract the lesses from the mores and you get a net score. And the horizontal axis is pervasiveness in the data set. Sometimes they call it market share. It's not like IDC market share, it's just the percentage of activity in the data set as a percentage of the total. That red 40% line, anything over that is considered highly elevated. And for the past, I don't know, eight to 12 quarters, the big four have been AI and machine learning, containers, RPA, and cloud, and cloud of course is very impressive because not only is it elevated on the vertical axis, but, you know, it's very highly pervasive on the horizontal. So what I've done is highlighted in red that historical hardware sector: the servers, the storage, the networking, and even PCs, despite the work from home, are depressed in relative terms, and of course data center colocation services. Okay, so you're seeing, obviously, hardware is not... People don't have the spending momentum today that they used to.
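As an aside, the net score arithmetic Dave describes here is simple enough to sketch in a few lines. The counts below are made up for illustration, and ETR's actual methodology uses more response categories than this simplified more-versus-less description:

```python
# Minimal sketch of an ETR-style net score: the share of respondents spending
# more minus the share spending less. All survey counts below are invented.
def net_score(spending_more: int, spending_less: int, total: int) -> float:
    """Percent spending more minus percent spending less."""
    return 100.0 * (spending_more - spending_less) / total

sectors = {
    "servers": (180, 230, 1000),  # (more, less, total respondents)
    "cloud":   (520,  60, 1000),
}
for name, (more, less, total) in sectors.items():
    score = net_score(more, less, total)
    label = "highly elevated" if score > 40.0 else "not elevated"
    print(f"{name}: net score {score:+.1f}% ({label})")
```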
They've got other priorities, et cetera. But I want to start and go kind of around the horn with each of you: what is the number one trend that each of you sees in hardware, and why does it matter? Bob O'Donnell, can you please start us off? >> Sure, Dave. So look, I mean, hardware is incredibly important, and one comment first I'll make on that slide is let's not forget that hardware, even though it may not be growing, the amount of money spent on hardware continues to be very, very high. It's just a little bit more stable. It's not as subject to big jumps as we see certainly in other software areas. But look, the important thing that's happening in hardware is the diversification of the types of chip architectures we're seeing and how and where they're being deployed, right? You referred to this in your opening. We've moved from a world of x86 CPUs from Intel and AMD to things like obviously GPUs and DPUs. We've got VPUs for, you know, computer vision processing. We've got AI-dedicated accelerators. We've got all kinds of other network acceleration tools and AI-powered tools. There's an incredible diversification of these chip architectures, and that's been happening for a while, but now we're seeing them more widely deployed, and it's being done that way because workloads are evolving. The kinds of workloads that we're seeing in some of these software areas require different types of compute engines than we've traditionally had. The other thing is (coughs), excuse me, the power requirements based on where geographically that compute happens are also evolving. This whole notion of the edge, which I'm sure we'll get into in a little bit more detail later, is driven by the fact that where the compute actually sits, closer in theory to the edge and to where edge devices are, depending on your definition, changes the power requirements. It changes the kind of connectivity that connects the applications to those edge devices and those applications. So all of those things are being impacted by this growing diversity in chip architectures. And that's a very long-term trend that I think we're going to continue to see play out through this decade and well into the 2030s as well. >> Excellent, great, great points. Thank you, Bob. Zeus, up next, please. >> Yeah, and I think the other thing when you look at this chart to remember too is, you know, through the pandemic and the work from home period, a lot of companies did put their office modernization projects on hold, and you heard that echoed, you know, from really all the network manufacturers anyway. They always had projects underway to upgrade networks. They put 'em on hold. Now that people are starting to come back to the office, they're looking at that now. So we might see some change there, but Bob's right, the sizes of those markets are quite a bit different. I think the other big trend here is that the hardware companies, at least in the areas that I look at, networking, are understanding now that it's a combination of hardware and software and silicon working together that creates that optimum type of performance and experience, right? So some things are best done in silicon, like data forwarding and things like that. Historically, when you look at the way network devices were built, you did everything in hardware. You configured it in hardware, it did all the data forwarding for you, and it did all the management. And that's been decoupled now. So more and more of the control element has been placed in software, while a lot of the high-performance things, encryption and, as I mentioned, data forwarding and packet analysis, are still done in hardware.
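As a toy illustration of that decoupling, the sketch below separates a control plane that computes routes in software from a data plane that only does fast lookups. It is a simulation for intuition only; real switches program forwarding silicon through vendor SDKs, and every name here is invented for the example.

# Control plane (software): decides where traffic should go.
def compute_forwarding_table(routes):
    # Pick the lowest-cost next hop for each destination prefix.
    return {dest: min(paths, key=lambda p: p["cost"])["next_hop"]
            for dest, paths in routes.items()}

# Data plane (hardware in a real device): a dumb, fast lookup per packet.
def forward(packet, table):
    return table.get(packet["dest"], "drop")

routes = {"10.0.1.0/24": [{"next_hop": "eth0", "cost": 10},
                          {"next_hop": "eth1", "cost": 5}]}
table = compute_forwarding_table(routes)        # runs occasionally, in software
print(forward({"dest": "10.0.1.0/24"}, table))  # runs per packet: "eth1"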
But not everything is done in hardware, and so it's a combination of the two. I think, for the people that work with the equipment as well, there's been more of a shift to understanding how to work with software. And this is a mistake I think the industry made for a while: we had everybody convinced they had to become a programmer. It's really more about being a software power user. Can you pull things out of software, through API calls and things like that? But I think the big frame here is, Dave, it's a combination of hardware and software working together that really makes a difference. And, you know, how much you invest in hardware versus software kind of depends on the performance requirements you have, and I'll talk about that later, but that's really the big shift that's happened here. It's the vendors that figured out how to optimize performance by leveraging the best of all of those. >> Excellent. You guys both brought up some really good themes that we can tap into. Dave Nicholson, please. >> Yeah, so just kind of picking up where Bob started off. Not only are we seeing the rise of a variety of CPU designs, but I think increasingly the connectivity that's involved, from a hardware perspective, from a kind of server or system design perspective, has become increasingly important. I think we'll get a chance to look at this in more depth a little bit later, but when you look at what happens on the motherboard, you know, we're not in so much a CPU-centric world anymore. Various application environments have various demands, and you can meet them by using a variety of components. And it's extremely significant when you start looking down at the component level. It's really important that you optimize around those components. So I guess my summary would be, I think we are moving out of the CPU-centric hardware model into more of a connectivity-centric model. We can talk more about that later. >> Yeah, great. And thank you, David. And Keith Townsend, I'm really interested in your perspectives on this. I mean, for years you worked in a data center surrounded by hardware. Now that we have the software-defined data center, please chime in here. >> Well, you know, I'm going to dig deeper into that software-defined data center nature of what's happening with hardware. Hardware is meeting software; infrastructure as code is a thing. What does that code look like? We're still trying to figure that out, but servicing up these capabilities that the previous analysts have brought up, how do I ensure that I can get the level of services needed for the applications that I need, whether they're legacy traditional data center workloads, AI/ML workloads, or workloads at the edge? How do I codify that and consume that as a service? And hardware vendors are figuring this out. HPE, with the big push into GreenLake as a service. Dell now with Apex: taking what we need, these bare-bones components, moving it forward with DDR5 and 6, CXL, et cetera, and surfacing that as code or as services. This is a very tough problem as we transition from consuming a hardware-based configuration to this infrastructure-as-code paradigm shift. >> Yeah, programmable infrastructure, really attacking that sort of labor discussion that we were having earlier, okay. Last but not least, Marc Staimer, please. >> Thanks, Dave. My peers raised really good points.
I agree with most of them, but I'm going to disagree with the title of this session, which is, does hardware matter? It absolutely matters. You can't run software on the air. You can't run it in an ephemeral cloud, although there's the technical cloud, and that's a different issue. The cloud has kind of changed everything, and from a market perspective, in the 40-plus years I've been in this business, I've seen this perception that hardware has to go down in price every year, and part of that was driven by Moore's law. And we're coming to, let's say, a lag or an end, depending on who you talk to, of Moore's law. We're not doubling our transistors in a chip every 18 to 24 months, and as a result of that, there's been a higher emphasis on software. From a market perception, there's no penalty. The market doesn't put the same pressure on software to reduce its cost every year that it does on hardware, which is kind of bass-ackwards when you think about it. Hardware costs are fixed. Software costs tend to be very low. It's kind of a weird thing that we do in the market. And what's changing is we're now starting to treat hardware like software, from an OPEX versus CapEx perspective. So yes, hardware matters, and we'll talk about that more at length. >> You know, I want to follow up on that, and I wonder if you guys have a thought on this. Bob O'Donnell, you and I have talked about this a little bit. Marc, you just pointed out that Moore's law could be waning. Pat Gelsinger, recently at Intel's investor meeting, promised that Moore's law is alive and well. And the point I made in Breaking Analysis was, okay, great. You know, Pat said doubling transistors every 18 to 24 months, let's say that Intel can do that, even though we know it's waning somewhat. Look at the M1 Ultra from Apple (chuckles). In about 15 months they increased transistor density on their package by 6X. So to your earlier point, Bob, we have these alternative processors that are really changing things. And to Dave Nicholson's point, there's a whole lot of supporting components as well. Do you have a comment on that, Bob? >> Yeah, I mean, it's a great point, Dave. And one thing to bear in mind as well, not only are we seeing a diversity of these different chip architectures and different types of components, as a number of us have raised, but the other big point, and I think it was Keith that mentioned it, is that CXL and interconnects on the chip itself are dramatically changing things. And a lot of the more interesting advances that are going to continue to drive Moore's law forward in terms of the way we think about performance, if perhaps not the number of transistors per se, are the interconnects that become available. You're seeing the development of chiplets or tiles, people use different names, but the idea is you can have different components being put together, eventually, in sort of a Lego-block style. And not only is that going to give interesting performance possibilities because of the faster interconnect, so you can have shared memory between things, which for big workloads like AI with huge data sets can make a huge difference versus talking to memory over a network connection, for example, but on top of that, you're going to see more diversity in the types of solutions that can be built. So we're going to see even more choices in hardware from a silicon perspective, because you'll be able to piece together different elements.
And oh, by the way, the other benefit of that is we've reached a point in chip architectures where not everything benefits from being smaller. We've been so focused and so obsessed, when it comes to Moore's law, on the size of each individual transistor, and yes, for certain architecture types, CPUs and GPUs in particular, that's absolutely true. But we've already hit the point where things like RF for 5G and WiFi and other wireless technologies, and a whole bunch of other things, actually don't get any better with a smaller transistor size. They actually get worse. So the beauty of these chiplet architectures is you can actually combine different chip manufacturing sizes. You know, you hear about four nanometer and five nanometer along with 14 nanometer on a single chip, each one optimized for its specific application, yet together they can give you the best of all worlds. And so we're just at the very beginning of that era, which I think is going to drive a ton of innovation. Again, it gets back to my comment about different types of devices located in geographically different places: at the edge, in the data center, you know, in a private cloud versus a public cloud. All of those things are going to be impacted, and there'll be a lot more options because of this silicon diversity and this interconnect diversity that we're just starting to see. >> Yeah, David Nicholson's got a graphic on that that we're going to show later. Before we do that, I want to introduce some data. I actually want to ask Keith to comment on this before we, you know, go on. This next slide is some data from ETR that shows the percent of customers that cited difficulty procuring hardware. And you can see the red is they had significant issues, and it's most pronounced in laptops and networking hardware on the far right-hand side, but virtually all categories, firewalls, peripherals, servers, storage, are having moderately difficult procurement issues; that's the sort of pinkish, the moderately significant challenges. So Keith, I mean, what are you seeing with your customers in the hardware supply chains and bottlenecks? And you know, we're seeing it with automobiles and appliances, so it goes beyond IT, these semiconductor challenges. What's been the impact on the buyer community and society, and do you have any sense as to when it will subside? >> You know, I was just asked this question yesterday, and I'm feeling the pain. As kind of a side project within CTO Advisor, we built a hybrid infrastructure, a traditional IT data center, where we're walking the walk with the traditional customer and modernizing that data center. So it was, you know, kind of a snapshot in time, 2016, 2017: 10 gigabit Arista switches, some older Dell 730XDs, you know, speeds and feeds. And we said we would modernize that with the latest Intel stack and connect it to the public cloud, and then the pandemic hit, and we are experiencing a lot of the same challenges. I thought we'd easily migrate from 10 gig networking to 25 gig networking, the path that customers are going down. The 10 gig network switches that I bought used are now double the price, because you can't get legacy 10 gig network switches, because all of the manufacturers are focusing their capacity on the more profitable 25 gig. Even the 25 gig switches, and we're focused on networking right now, are hard to procure. We're talking about nine to 12 months or more of lead time. So we're seeing customers adjust by adopting cloud.
But if you remember early on in the pandemic, Microsoft Azure kind of gated customers that didn't have a capacity agreement. So customers are keeping an eye on that. There's a desire to abstract away from the underlying vendor, to be able to control or provision your IT services in the way that we do with VMware or some other virtualization technology, where it doesn't matter who can get me the hardware; they can just get me the hardware, because it's critically impacting projects and timelines. >> So that's a great setup for you, Zeus, with Keith mentioning earlier the software-defined data center, with software-defined networking and cloud. Do you see a day where networking hardware is commoditized and it's all about the software, or are we there already? >> No, we're not there already, and I don't see that really happening any time in the near future. I do think it's changed, though. And just to be clear, I mean, when you look at that data, this is saying customers have had problems procuring the equipment, right? And there's not a network vendor out there, I've talked to Norman Rice at Extreme, and I've talked to the folks at Cisco and Arista about this, they all said they could have had blowout quarters had they had the inventory to ship. So it's not like customers aren't buying this anymore, right? I do think, though, when it comes to networking, the network has certainly changed some, because there are a lot more controls, as I mentioned before, that you can do in software. And I think customers need to start thinking about the types of hardware they buy, and you know, where they're going to use it and, you know, what its purpose is. Because I've talked to customers that have tried to run software on commodity hardware where the performance requirements are very high, and it bogged down, right? It just doesn't have the horsepower to run it. And, you know, even when you do that, you have to start thinking about the components you use, the NICs you buy. And I've talked to customers that have simply just gone through the process of replacing a NIC card in a commodity box and had some performance problems and, you know, things like that. So if agility is more important than performance, then by all means try running software on commodity hardware. I think that works in some cases. If performance, though, is more important, that's when you need that kind of turnkey hardware system. And I've actually seen more and more customers reverting back to that model. In fact, when you talk to even some startups today about when they come to market, they're delivering things more on appliances, because that's what customers want. And so there's this kind of pendulum of agility and performance, and if performance absolutely matters, that's when you do need to buy these kinds of turnkey, prebuilt hardware systems. If agility matters more, that's when you can go more to software. But the underlying hardware still does matter. So I think, you know, will we ever have a day where you can just run it on whatever hardware? Maybe, but I'll long be retired by that point, so I don't care. >> Well, you bring up a good point, Zeus. And I remember the early days of cloud, the narrative was, oh, the cloud vendors, they don't use EMC storage, they just run on commodity storage. And then of course, lo and behold, you know, they trotted out James Hamilton to talk about all the custom hardware that they were building, and you saw Google and Microsoft follow suit.
>> Well, (indistinct) we've been calling for this forever, right? And I mean, all the way back to the turn of the century, we were calling for the commoditization of hardware, and it's never really happened, because as long as you can drive innovation into it, customers will always lean towards the innovation cycles, 'cause they get more features faster and things. And so the vendors have done a good job of keeping that cycle up, but it'll be a long time before that changes. >> Yeah, and that's why you see companies like Pure Storage, a storage company with 69% gross margins. All right, I want to jump ahead. We're going to bring up slide four. I want to go back to something that Bob O'Donnell was talking about, the sort of supporting act, the diversity of silicon. And we've marched to the cadence of Moore's law for decades. You know, we asked, you know, is Moore's law dead? We say it's moderating. Dave Nicholson, you want to talk about those supporting components, and you shared with us a slide about that shift. You call it a shift from a processor-centric world to a connectivity-centric world. What do you mean by that? Let's bring up slide four and you can talk to that. >> Yeah, yeah. So first, I want to echo the sentiment that the answer to the question, does hardware matter, is of course it matters. Maybe the real question should be, should you care about it? And the answer to that is, it depends who you are. If you're an end user using an application on your mobile device, maybe you don't care how the architecture is put together. You just care that the service is delivered. But as you back away from that and you get closer and closer to the source, someone needs to care about the hardware, and it should matter. Why? Because essentially what hardware is doing is consuming electricity and dollars, and the more efficiently you can configure hardware, the more bang you're going to get for your buck. So it's not only a quantitative question in terms of how much you can deliver, but it also ends up being a qualitative change, as capabilities allow for things we couldn't do before, because we just didn't have the aggregate horsepower to do it. So this chart actually comes out of some performance tests that were done. It happens to be Dell servers with Broadcom components. And the point here was to peel back, you know, peel off the top of the server and look at what's in that server, starting with, you know, the PCIe interconnect. So PCIe Gen 3, Gen 4, moving forward. What are the effects of the interconnect on application performance, translating into new orders per minute processed per dollar, et cetera, et cetera? If you look at the advances in CPU architecture mapped against the advances in interconnect and storage subsystem performance, you can see that CPU architecture is sort of lagging behind, in a way. And Bob mentioned this idea of tiling and all of the different ways to get around that. When we do performance testing, we can actually peg CPUs just running the performance tests, without any actual database environments working. So right now we're at this sort of imbalance point, where you have to make sure you design things properly to get the most bang per kilowatt hour of power, per dollar of input. So the key thing this is highlighting, as a very specific example: you take a card that's designed as a Gen 3 PCIe device and you plug it into a Gen 4 slot, and now the card is the bottleneck. You plug a Gen 4 card into a Gen 4 slot, and now the Gen 4 slot is the bottleneck.
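To put rough numbers on that bottleneck chasing, here is a back-of-envelope sketch. The per-lane figures are the commonly cited theoretical maxima for PCIe generations after encoding overhead; real-world throughput is lower, so treat this as an approximation.

# Approximate usable bandwidth per PCIe lane in GB/s, after encoding overhead.
GBPS_PER_LANE = {3: 0.985, 4: 1.969, 5: 3.938}

def effective_bandwidth(card_gen, slot_gen, lanes=16):
    # The link trains down to the older of the two sides,
    # so whichever side is older becomes the bottleneck.
    return GBPS_PER_LANE[min(card_gen, slot_gen)] * lanes

print(effective_bandwidth(3, 4))  # ~15.8 GB/s: a Gen 3 card caps a Gen 4 slot
print(effective_bandwidth(4, 4))  # ~31.5 GB/s: now the slot itself is the ceiling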
So we're constantly chasing these bottlenecks. Someone has to be focused on that; from an architectural perspective, it's critically important. So there's no question that it matters, but of course various people in this food chain won't care where it comes from. I guess a good analogy might be, where does our food come from? If I get a steak, it's a pink thing wrapped in plastic, right? Well, there are a lot of inputs that a lot of people have to care about to get that to me. Do I care about all of those things? No. Are they important? They're critically important. >> So, okay, I want to get to the, okay, so what does this all mean to customers? And what I'm hearing from you is that balancing a system is becoming, you know, more complicated. And I've kind of been waiting for this day for a long time, because as we all know, the bottleneck was always the spinning disk, the last mechanical device. So people who wrote software knew that, when they were doing it right, the disk had to go and do stuff, and so they were doing other things in the software. And now, with all these new interconnects and flash, you can do things like atomic writes, and that opens up new software possibilities. Combine that with alternative processors. But what's the "so what" on this for the customer, and the application impact? Can anybody address that? >> Yeah, let me address that for a moment. I want to leverage some of the things that Bob said, Keith said, Zeus said, and David said. So I'm a bit of a contrarian on some of this. For example, on the chip side: as the chips get smaller, 14 nanometer, 10 nanometer, five nanometer, soon three nanometer, we talk about more cores, but the biggest problem on the chip is the interconnect, 'cause the wires get smaller. People don't realize that in 2004 the latency on those wires in the chips was 80 picoseconds. Today it's 1300 picoseconds. That's on the chip. This is why they're not getting faster. So we may be seeing a little bit of a slowdown in Moore's law. But even as we kind of conquer that, you still have the interconnect problem, and the interconnect problem goes beyond the chip. It goes within the system, composable architectures. It goes to the point that Keith made: ultimately you need a hybrid, because what we're seeing, what I'm seeing when I talk to customers, is that the biggest issue they have is moving data. Whether it be in a chip, in a system, in a data center, or between data centers, moving data is now the biggest gating item in performance. So if you want to move it from, let's say, your transactional database to your machine learning, it's the bottleneck, it's moving the data. And so when you look at it from a distributed environment, now you've got to move the compute to the data. The only way to get around these bottlenecks today is to spend less time trying to move the data and more time taking the compute, the software running on hardware, closer to the data. Go ahead. >> So is this what you mean, when Nicholson was talking about a shift from a processor-centric world to a connectivity-centric world? You're talking about moving the bits across all the different components, and the processor, you're saying, is essentially no longer the main bottleneck, or the memory, I guess. >> Well, that's one of them, and there are a lot of different bottlenecks, but it's the data movement itself. It's moving away from, wait, why do we need to move the data?
Can we move the compute, the processing, closer to the data? Because if we keep them separate... and this has been a trend now, where people are moving processing closer to it. It's like the edge. I think it was Zeus or David, you were talking about the edge earlier. As you look at the edge, who defines the edge, right? Is the edge a closet or is it a sensor? If it's a sensor, how do you do AI at the edge, when you don't have enough power, when you don't have enough compute? People are inventing chips to do that, to do all that at the edge, to do AI within the sensor, instead of moving the data to a data center or a cloud to do the processing. Because the lag in latency is always limited by the speed of light. How fast can you move the electrons? And all this interconnecting, all the processing, and all the improvement we're seeing in the PCIe bus, from three, to four, to five, to CXL, to higher bandwidth on the network, that's all great, but none of it deals with the speed-of-light latency. And that's an-- Go ahead. >> You know, Marc, no, I just want to... because what you're referring to could be looked at at a macro level, which I think is what you're describing. You can also look at it at a more micro level, from a systems design perspective, right? I'm going to be the resident knuckle-dragging hardware guy on the panel today. But it's exactly right. Moving compute closer to data includes concepts like peripheral cards that have built-in intelligence, right? So again, in some of this testing that I'm referring to, we saw dramatic improvements when, instead of using CPU horsepower for things like I/O, you basically took that work off the CPU. Now you have essentially offload engines in the form of storage controllers, RAID controllers, and of course, for Ethernet, NICs and smart NICs. And so when you can have these sorts of offload engines, and we've gone through these waves over time, people think, well, wait a minute, a RAID controller and NVMe flash storage devices? Does that make sense? It turns out it does. Why? Because you're actually, at a micro level, doing exactly what you're referring to. You're bringing compute closer to the data. Now, closer to the data meaning closer to the data storage subsystem. It doesn't solve the macro issue that you're referring to, but it is important. Again, going back to this idea of system design optimization: always chasing the bottleneck, plugging the holes. Someone needs to do that in this value chain in order to get the best value for every kilowatt hour of power and every dollar. >> Yeah. >> Well, this whole drive for performance has created some really interesting architectural designs, right? Like Nicholson said, the rise of the DPU, right? It brings more processing power into systems that already had a lot of processing power. There's also been some really interesting, you know, kind of innovation in the area of systems architecture too. If you look at the way Nvidia goes to market, their DRIVE kit is a prebuilt piece of hardware, you know, optimized for self-driving cars, right? They partnered with Pure Storage and Arista to build that AI-ready infrastructure. I remember talking to Charlie Giancarlo, the CEO of Pure, about when the three companies rolled that out. He said, "Look, if you're going to do AI, you need fast storage, a fast processor, and a fast network." And so for customers to be able to put that together themselves was very, very difficult. There's a lot of software that needs tuning as well.
So the three companies partnered together to create a fully integrated turnkey hardware system with a bunch of optimized software that runs on it. And so in that case, in some ways, the hardware was leading the software innovation. And so the variety of different architectures we have today around hardware has really exploded, and I think that's part of what Bob brought up at the beginning about the different chip designs. >> Yeah, Bob talked about that earlier. Bob, I mean, most AI today is modeling, you know, and a lot of that's done in the cloud, and it looks, from my standpoint anyway, like the future is going to be a lot of AI inferencing at the edge. And that's a radically different architecture, Bob, isn't it? >> It is, it's a completely different architecture. And just to follow up on a couple of points, excellent conversation, guys. Dave talked about system architecture, and really that's what this boils down to, right? But it's looking at architecture at every level. I was talking about the individual different components, the new interconnect methods. There's this new thing called UCIe, universal connection... I forget exactly what it stands for, but it's a mechanism for doing chiplet architectures. But then again, you have to take it up to the system level, 'cause it's all fine and good if you have this SoC that's tuned and optimized, but it has to talk to the rest of the system. And that's where you see other issues, and you've seen things like CXL and other interconnect standards, you know, and nobody likes to talk about interconnect 'cause it's really wonky and really technical and not that sexy, but at the end of the day it's incredibly important, exactly to the other points that were being raised, like Marc raised, for example, about getting that compute closer to where the data is. And that's where, again, a diversity of chip architectures helps. And exactly to your last comment there, Dave, putting that ability in an edge device is really at the cutting edge of what we're seeing in semiconductor design, and the ability to, for example, maybe it's an FPGA, maybe it's a dedicated AI chip, it's another kind of chip architecture that's being created to do that inferencing on the edge. Because again, the cost and the challenges of moving lots of data, whether it be from, say, a smartphone to a cloud-based application, or from a private network to a cloud, or any other permutations we can think of, really matter. And the other thing is, we're tackling bigger problems. So architecturally, not even just architecturally within a system, but when we think about DPUs and the sort of east-west data center movement conversation that we hear Nvidia and others talk about, it's about combining multiple sets of these systems to function together more efficiently, again with even bigger sets of data. So it really is about tackling where the processing is needed, having the interconnect and the ability to get the data you need to the right place at the right time. And because those needs are diversifying, we're just going to continue to see an explosion of different choices and options, which is going to make hardware even more essential, I would argue, than it is today. And so I think what we're going to see is not only does hardware matter, it's going to matter even more in the future than it does now. >> Great, yeah. Great discussion, guys. I want to bring Keith back into the conversation here.
Keith, if your main expertise in tech is provisioning LUNs, you probably want to look for another job. So clearly hardware matters, but with software-defined everything, do people with hardware expertise matter outside of, for instance, component manufacturers or cloud companies? I mean, VMware certainly changed the dynamic in servers. Dell just spun off its most profitable asset in VMware, so it obviously thinks hardware can stand alone. How does an enterprise architect view the shift to software-defined hyperscale cloud, and how do you see the shifting demand for skills in enterprise IT? >> So I love the question, and I'll take a different view of it. If you're a data analyst and your primary value-add is that you do ETL transformation... I talked to a CDO, a chief data officer, of a midsize bank a little while ago. He said 80% of his data scientists' time is spent on ETL. Super not value-add. He wants his data scientists to do data science work. Chances are, if your only value is that you do LUN provisioning, then you probably don't have a job now. The technologies have gotten much more intelligent. As infrastructure pros, we want to give infrastructure pros the opportunities to shine, and I think the software-defined nature and the automation that we're seeing vendors undertake, whether it's Dell, HPE, Lenovo, take your pick, or Pure Storage, NetApp, that are doing the automation and the ML needed, means these practitioners don't spend 80% of their time doing LUN provisioning and can focus on their true expertise, which is ensuring that data is stored, data is retrievable, data's protected, et cetera. I think the shift is to focus on the part of the job where you're ensuring that, no matter where the data's at... because my data is spread across the enterprise, hybrid, different types. You know, Dave, you talk about the super cloud a lot. If my data is in the super cloud, protecting that data and securing that data becomes much more complicated than when it was me just procuring or provisioning LUNs. So when you ask where the shift should be, it's focusing on the real value, which is making sure that customers can access data, can recover data, can get data at the performance levels they need, within the price point they need, and where they need it. One last point about this interconnecting: I have this vision, and I think we all do, of composable infrastructure, this idea that scale-out does not solve every problem. The cloud can give me infinite scale-out. Sometimes I just need a single OS with 64 terabytes of RAM and 204 GPUs or GPU instances. That single OS does not exist today. And the opportunity is to create composable infrastructure so that we solve a lot of these problems that simply don't scale out. >> You know, wow, so many interesting points there. I had just interviewed Zhamak Dehghani, who's the founder of data mesh, last week. And she made a really interesting point. She said, "Think about it, we have separate stacks. We have an application stack and we have a data pipeline stack, and the transaction systems, the transaction database, we extract data from that," to your point, "we ETL it in, you know, it takes forever, and then we have this separate sort of data stack." If we're going to inject more intelligence and data and AI into applications, those two stacks, her contention is, have to come together.
And when you think about, you know, super cloud bringing compute to data, that was what Hadoop was supposed to be. It ended up all sort of going into a central location, but it's almost a rhetorical question. I mean, it seems that that necessitates new thinking around hardware architectures, as kind of everything becomes the edge. And the other point is, to your point, Keith, it's really hard to secure that. So when you think about offloads, right, you've heard the stats, you know, Nvidia talks about it, Broadcom talks about it, that 25 to 30% of CPU cycles are wasted on doing things like storage offloads, or networking, or security. It seems like new architectures need to come together to support, you know, all of that stuff that Keith and I just discussed. Maybe, Zeus, you have a comment on this. >> Yeah, and by the way, I do want to come back to the question you just asked, Keith. It's the point I made at the beginning too: engineers do need to be more software-centric, right? They do need to have better software skills. In fact, I remember talking to Cisco about this last year: when they surveyed their engineer base, only about a third of 'em had ever made an API call, which, you know, kind of shows the big skill-set change that has to come. But on the point of architectures, I think the big change here is edge, because it brings in distributed compute models. Historically, when you think about compute, even with multi-cloud, we never really had multi-cloud. We'd use multiple centralized clouds, but compute was always centralized, right? It was in a branch office, in a data center, in a cloud. With edge, what we create is the rise of distributed computing, where we'll have an application that actually accesses different resources at different edge locations. And I think, Marc, you were talking about this: the edge could be in your IoT device, it could be your campus edge, it could be the cellular edge, it could be your car, right? And so we need to start thinkin' about how our applications interact with all those different parts of that edge ecosystem, you know, to create a single experience. A lot of consumer apps largely work that way. If you think of an app like Uber, right, it pulls in information from all kinds of different edge applications and edge services, and, you know, it creates a pretty cool experience. We're just starting to get to that point in the business world now. There are a lot of security implications and things like that, but I do think it drives more architectural decisions to be made about how I deploy what data where, and where I do my processing, where I do my AI, and things like that. It actually makes the world more complicated. In some ways we can do so much more with it, but I think it does drive us more towards turnkey systems, at least initially, in order to, you know, ensure performance and security. >> Right. Marc, I wanted to go to you. You had indicated to me that you wanted to chat about this a little bit. You've written quite a bit about the integration of hardware and software. You know, we've watched Oracle's move from, you know, buying Sun and then basically using that in a highly differentiated approach, engineered systems. What's your take on all that? I know you also have some thoughts on the shift from CapEx to OPEX; chime in on that. >> Sure. When you look at it, there are advantages to having one vendor who has the software and hardware.
They can synergistically make them work together in ways that you can't on a commodity basis, where you own the software and somebody else has the hardware. An example would be Oracle. As you talked about, with their Exadata platform, they literally are leveraging microcode in the Intel chips, and now in AMD chips, and all the way down to Optane. They make basically AMD database servers work with Optane memory, PMM, in their storage systems, not NVMe SSDs. PMM, I'm talking about the cards themselves. So there are advantages you can take advantage of, if you own the stack, as you were pointing out earlier, Dave, of both the software and the hardware. Okay, that's great. But on the other side of that, while that tends to give you better performance, it tends to cost a little more. On the commodity side, it costs less, but you get less performance. To what Zeus had said earlier, it depends where you're running your application. How much performance do you need? What kind of performance do you need? One of the issues about moving to the edge, and I'll get to the OPEX-CapEx issue in a second, is what kind of processing do you need? If you're running in a CCTV camera on top of a traffic light, how much power do you have? How much cooling do you have, that you can run this? And more importantly, do you have to take the data you're getting, move it somewhere else, have it processed, and have the information sent back? I mean, there are companies out there like BrainChip that have developed AI chips that can run on the sensor without a CPU, without any additional memory. So, I mean, there's innovation going on to deal with this question of data movement. There are companies out there like Tachyum that are combining GPUs, CPUs, and DPUs in a single chip. Think of it as super-composable architecture. They're looking at being able to do more in less. On the OPEX and CapEx issue--
And I remember talking to Chad Sakac when he was running, you know, VCE, which became Rack and Rail, their ability to stay in lockstep with what VMware was doing. What was the number one workload running on hyperconverged forever? It was VMware. So their ability to remain in lockstep with VMware gave them a huge competitive advantage. And Dell came out of nowhere in, you know, the hyper-converged market and just started taking share because of that relationship. So, you know, this sort I guess it's, you know, from a Dell perspective I thought it gave them a pretty big advantage that they didn't really exploit across their other properties, right? Networking and service and things like they could have given the dominance that VMware had. From an industry perspective though, I do think it's better to have them be coupled. So. >> I agree. I mean, they could. I think they could have dominated in super cloud and maybe they would become the next Oracle where everybody hates 'em, but they kick ass. But guys. We got to wrap up here. And so what I'm going to ask you is I'm going to go and reverse the order this time, you know, big takeaways from this conversation today, which guys by the way, I can't thank you enough phenomenal insights, but big takeaways, any final thoughts, any research that you're working on that you want highlight or you know, what you look for in the future? Try to keep it brief. We'll go in reverse order. Maybe Marc, you could start us off please. >> Sure, on the research front, I'm working on a total cost of ownership of an integrated database analytics machine learning versus separate services. On the other aspect that I would wanted to chat about real quickly, OPEX versus CapEx, the cloud changed the market perception of hardware in the sense that you can use hardware or buy hardware like you do software. As you use it, pay for what you use in arrears. The good thing about that is you're only paying for what you use, period. You're not for what you don't use. I mean, it's compute time, everything else. The bad side about that is you have no predictability in your bill. It's elastic, but every user I've talked to says every month it's different. And from a budgeting perspective, it's very hard to set up your budget year to year and it's causing a lot of nightmares. So it's just something to be aware of. From a CapEx perspective, you have no more CapEx if you're using that kind of base system but you lose a certain amount of control as well. So ultimately that's some of the issues. But my biggest point, my biggest takeaway from this is the biggest issue right now that everybody I talk to in some shape or form it comes down to data movement whether it be ETLs that you talked about Keith or other aspects moving it between hybrid locations, moving it within a system, moving it within a chip. All those are key issues. >> Great, thank you. Okay, CTO advisor, give us your final thoughts. >> All right. Really, really great commentary. Again, I'm going to point back to us taking the walk that our customers are taking, which is trying to do this conversion of all primary data center to a hybrid of which I have this hard earned philosophy that enterprise IT is additive. When we add a service, we rarely subtract a service. So the landscape and service area what we support has to grow. So our research focuses on taking that walk. 
We are taking a monolithic application, decomposing that to containers, and putting that in a public cloud, and connecting that back private data center and telling that story and walking that walk with our customers. This has been a super enlightening panel. >> Yeah, thank you. Real, real different world coming. David Nicholson, please. >> You know, it really hearkens back to the beginning of the conversation. You talked about momentum in the direction of cloud. I'm sort of spending my time under the hood, getting grease under my fingernails, focusing on where still the lions share of spend will be in coming years, which is OnPrem. And then of course, obviously data center infrastructure for cloud but really diving under the covers and helping folks understand the ramifications of movement between generations of CPU architecture. I know we all know Sapphire Rapids pushed into the future. When's the next Intel release coming? Who knows? We think, you know, in 2023. There have been a lot of people standing by from a practitioner's standpoint asking, well, what do I do between now and then? Does it make sense to upgrade bits and pieces of hardware or go from a last generation to a current generation when we know the next generation is coming? And so I've been very, very focused on looking at how these connectivity components like rate controllers and NICs. I know it's not as sexy as talking about cloud but just how these opponents completely change the game and actually can justify movement from say a 14th-generation architecture to a 15th-generation architecture today, even though gen 16 is coming, let's say 12 months from now. So that's where I am. Keep my phone number in the Rolodex. I literally reference Rolodex intentionally because like I said, I'm in there under the hood and it's not as sexy. But yeah, so that's what I'm focused on Dave. >> Well, you know, to paraphrase it, maybe derivative paraphrase of, you know, Larry Ellison's rant on what is cloud? It's operating systems and databases, et cetera. Rate controllers and NICs live inside of clouds. All right. You know, one of the reasons I love working with you guys is 'cause have such a wide observation space and Zeus Kerravala you, of all people, you know you have your fingers in a lot of pies. So give us your final thoughts. >> Yeah, I'm not a propeller heady as my chip counterparts here. (all laugh) So, you know, I look at the world a little differently and a lot of my research I'm doing now is the impact that distributed computing has on customer employee experiences, right? You talk to every business and how the experiences they deliver to their customers is really differentiating how they go to market. And so they're looking at these different ways of feeding up data and analytics and things like that in different places. And I think this is going to have a really profound impact on enterprise IT architecture. We're putting more data, more compute in more places all the way down to like little micro edges and retailers and things like that. And so we need the variety. Historically, if you think back to when I was in IT you know, pre-Y2K, we didn't have a lot of choice in things, right? We had a server that was rack mount or standup, right? And there wasn't a whole lot of, you know, differences in choice. But today we can deploy, you know, these really high-performance compute systems on little blades inside servers or inside, you know, autonomous vehicles and things. I think the world from here gets... 
You know, just the choice of what we have and the way hardware and software works together is really going to, I think, change the world the way we do things. We're already seeing that, like I said, in the consumer world, right? There's so many things you can do from, you know, smart home perspective, you know, natural language processing, stuff like that. And it's starting to hit businesses now. So just wait and watch the next five years. >> Yeah, totally. The computing power at the edge is just going to be mind blowing. >> It's unbelievable what you can do at the edge. >> Yeah, yeah. Hey Z, I just want to say that we know you're not a propeller head and I for one would like to thank you for having your master's thesis hanging on the wall behind you 'cause we know that you studied basket weaving. >> I was actually a physics math major, so. >> Good man. Another math major. All right, Bob O'Donnell, you're going to bring us home. I mean, we've seen the importance of semiconductors and silicon in our everyday lives, but your last thoughts please. >> Sure and just to clarify, by the way I was a great books major and this was actually for my final paper. And so I was like philosophy and all that kind of stuff and literature but I still somehow got into tech. Look, it's been a great conversation and I want to pick up a little bit on a comment Zeus made, which is this it's the combination of the hardware and the software and coming together and the manner with which that needs to happen, I think is critically important. And the other thing is because of the diversity of the chip architectures and all those different pieces and elements, it's going to be how software tools evolve to adapt to that new world. So I look at things like what Intel's trying to do with oneAPI. You know, what Nvidia has done with CUDA. What other platform companies are trying to create tools that allow them to leverage the hardware, but also embrace the variety of hardware that is there. And so as those software development environments and software development tools evolve to take advantage of these new capabilities, that's going to open up a lot of interesting opportunities that can leverage all these new chip architectures. That can leverage all these new interconnects. That can leverage all these new system architectures and figure out ways to make that all happen, I think is going to be critically important. And then finally, I'll mention the research I'm actually currently working on is on private 5g and how companies are thinking about deploying private 5g and the potential for edge applications for that. So I'm doing a survey of several hundred us companies as we speak and really looking forward to getting that done in the next couple of weeks. >> Yeah, look forward to that. Guys, again, thank you so much. Outstanding conversation. Anybody going to be at Dell tech world in a couple of weeks? Bob's going to be there. Dave Nicholson. Well drinks on me and guys I really can't thank you enough for the insights and your participation today. Really appreciate it. Okay, and thank you for watching this special power panel episode of theCube Insights powered by ETR. Remember we publish each week on Siliconangle.com and wikibon.com. All these episodes they're available as podcasts. DM me or any of these guys. I'm at DVellante. You can email me at David.Vellante@siliconangle.com. Check out etr.ai for all the data. This is Dave Vellante. We'll see you next time. (upbeat music)
Breaking Analysis: Technology & Architectural Considerations for Data Mesh
>> From theCUBE Studios in Palo Alto and Boston, bringing you data-driven insights from theCUBE and ETR, this is Breaking Analysis with Dave Vellante.
Zhamak, first, your new book is out, 'Data Mesh: Delivering Data-Driven Value at Scale' just recently published, so congratulations on getting that done, awesome. Now in a recent presentation, you pulled excerpts from the book, and we're going to talk through some of the technology and architectural considerations. Just quickly for the audience, the four principles of data mesh: domain-driven ownership, data as product, self-serve data platform, and federated computational governance. So I want to start with the self-serve platform and some of the data that you shared recently. You say that, "Data mesh serves autonomous domain oriented teams versus existing platforms, which serve a centralized team." Can you elaborate? >> Sure. I mean the role of the platform is to lower the cognitive load for domain teams, for people who are focusing on the business outcomes, the technologists that are building the applications, to really lower the cognitive load for them, to be able to work with data. Whether they are building analytics, automated decision making, intelligent modeling, they need to be able to get access to data and use it. So the role of the platform, I guess, just stepping back for a moment, is to empower and enable these teams. Data mesh by definition is a scale-out model. It's a decentralized model that wants to give autonomy to cross-functional teams. So it at its core requires a set of tools that work really well in that decentralized model. When we look at the existing platforms, they try to achieve this similar outcome, right? Lower the cognitive load, give the tools to data practitioners to manage data at scale. Because today, the centralized data teams, their job isn't really directly aligned with one or two, you know, different business units and business outcomes in terms of getting value from data. Their job is to manage the data and make the data available for those cross-functional teams or business units to use the data. So the platforms they've been given are really centralized around, or tuned to work with, this team structure, the structure of a centralized team. Although on the surface, it seems, why not? Why can't I use my, you know, cloud storage or computation or data warehouse in a decentralized way? You should be able to, but some changes need to happen to those underlying platforms. As an example, some cloud providers simply have hard limits on the number of, like, storage accounts that you can have. Because they never envisaged you'd have hundreds of lakes. They envisaged one or two, maybe 10 lakes, right. They envisaged really centralizing data, not decentralizing data. So I think we see a shift in thinking about enabling autonomous independent teams versus a centralized team. >> So just a follow up if I may, and we could be here for a while. But so this assumes that you've sorted out the organizational considerations? That you've defined what a data product is and a sub-product. And people will say, of course, we use the term monolithic as a pejorative, let's face it. But the data warehouse crowd will say, "Well, that's what data marts did. So we've got that covered." But your... the premise of data mesh, if I understand it, is whether it's a data mart or a data warehouse, or a data lake or whatever, a Snowflake warehouse, it's a node on the mesh. Okay. So don't build your organization around the technology; let the technology serve the organization. Is that-- >> That's a perfect way of putting it, exactly.
I mean, for a very long time, when we look at decomposition of complexity, we've looked at decomposition of complexity around technology, right? So we have technology, and that's maybe a good segue to actually the next item on that list that we looked at. Oh, I need to decompose based on whether I want to have access to raw data and put it on the lake, whether I want to have access to modeled data and put it on the warehouse. You know, I need to have a team in the middle to move the data around. And then we try to fit the organization into that model. So data mesh really inverses that, and as you said, it looks at the organizational structure first. Then set boundaries around which your organization and operation can scale. And then, at the second layer, look at the technology and how you decompose it. >> Okay. So let's go to that next point and talk about how you serve and manage autonomous, interoperable data products. Where code, data, and policy, you say, are treated as one unit. Whereas your contention is existing platforms of course have independent management and dashboards for catalogs or storage, et cetera. Maybe we double click on that a bit. >> Yeah. So if you think about that functional, or technical, decomposition, right? Of concerns. That's one way, a very valid way, of decomposing complexity and concerns, and then building independent solutions to address them. That's what we see in the technology landscape today. We see technologies that are taking care of your management of data, bringing your data under some sort of control and modeling. You see technology that moves that data around, performs various transformations and computations on it. And then you see technology that tries to overlay some level of meaning: metadata, understandability, discovery, and then policy, right? So that's where your data processing kind of pipeline technologies versus data warehouse, storage, lake technologies, and then the governance come into play. And over time, we decompose and we compose, right? Deconstruct and reconstruct this back together. But right now, that's where we stand. I think for data mesh really to become a reality, as in, independent sources of data, where teams can responsibly share data in a way that can be understood right then and there, that can impose policies right then, when the data gets accessed at that source, and in a resilient manner, like in a way that changes to the structure of the data or changes to the schema of the data don't cause those downstream downtimes, we've got to think about this new nucleus or new unit of data sharing. And we need to really bring transformation and governing of data and the data itself back together, around these decentralized nodes on the mesh. So that's another, I guess, deconstruction and reconstruction that needs to happen around the technology, to formulate ourselves around the domains. And again, the data and the logic of the data itself, the meaning of the data itself.
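To make the "code, data, and policy as one unit" idea concrete, here is a minimal sketch in Python of a data product whose producing code, rows, and access policies deploy and travel together. Every name in it (DataProduct, AccessPolicy, the roles) is a hypothetical illustration, not a prescribed data mesh API.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class AccessPolicy:
    """A policy evaluated at the node, at access time."""
    name: str
    allows: Callable[[str], bool]  # role -> permitted?

@dataclass
class DataProduct:
    """One unit on the mesh: the data, the code that produces it,
    and the policies that govern it, never managed separately."""
    domain: str
    name: str
    transform: Callable[[list[dict]], list[dict]]  # the producing code
    policies: list[AccessPolicy] = field(default_factory=list)
    _rows: list[dict] = field(default_factory=list)

    def publish(self, source_rows: list[dict]) -> None:
        self._rows = self.transform(source_rows)

    def read(self, role: str) -> list[dict]:
        # Policy is enforced where the data lives, not in a separate tool.
        if not all(p.allows(role) for p in self.policies):
            raise PermissionError(f"'{role}' denied by {self.name} policies")
        return list(self._rows)

orders = DataProduct(
    domain="sales",
    name="monthly-orders",
    transform=lambda rows: [r for r in rows if r["status"] == "complete"],
    policies=[AccessPolicy("analysts-only", lambda role: role == "analyst")],
)
orders.publish([{"id": 1, "status": "complete"}, {"id": 2, "status": "open"}])
print(orders.read("analyst"))  # [{'id': 1, 'status': 'complete'}]
```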
>> Great. Got it. And we're going to talk more about the importance of data sharing and the implications. But the third point deals with how operational and analytical technologies are constructed. You've got an app DevStack, you've got a data stack. You've made the point many times, actually, that we've contextualized our operational systems, but not our data systems; they remain separate. Maybe you could elaborate on this point. >> Yes. I think this, again, has a historical background and beginning. For a really long time, applications have dealt with features and the logic of running the business, and encapsulating the data and the state that they need to run that feature or that business function. And then, for anything analytically driven, which required access to data across these applications and across the longer dimension of time, around different subjects within the organization... this analytical data, we had made a decision that, "Okay, let's leave those applications aside. Let's leave those databases aside. We'll extract the data out, and we'll load it, or we'll transform it, and put it under the analytical kind of data stack, and then downstream from it, we will have analytical data users, the data analysts, the data scientists and, you know, the growing portfolio of users, use that data stack." And that led to this real separation of dual stacks with point-to-point integration. So applications went down the path of transactional databases or, you know, document stores, using APIs for communicating, and then we've gone to, you know, lake storage or data warehouse on the other side. And that, again, enforces the silo of data versus app, right? So if we are moving to a world where our ambitions are around making applications more intelligent, making them data driven, these two worlds need to come closer. As in, ML and analytics get embedded into those applications themselves. And data sharing, as a very essential ingredient of that, gets embedded and becomes closer to those applications. So, if you are looking at this now cross-functional, app-and-data-based team, right, business team, then the technology stacks can't be so segregated, right? There has to be a continuum of experience from app delivery, to sharing of the data, to using that data, to embedding models back into those applications. And that continuum of experience requires well-integrated technologies. I'll give you an example, which actually, in some sense, we are somewhat moving toward that direction. If we are talking about data sharing or data modeling, applications use one set of APIs, you know, HTTP-compliant GraphQL or REST APIs. And on the other hand, you have proprietary SQL, like, connect to my database and run SQL. Those are two very different models of representing and accessing data. So we kind of have to harmonize or integrate those two worlds a bit more closely to achieve that domain-oriented, cross-functional team.
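The two models of access Zhamak contrasts above, side by side; a minimal sketch using only the Python standard library. The /orders endpoint and the table layout are hypothetical. The point is the coupling: the API consumer depends on a published contract, while the SQL consumer depends on the physical schema.

```python
import json
import sqlite3
import urllib.request

# Model 1: API-style access. The consumer sees a contract, not a schema.
def fetch_orders_api(base_url: str) -> list[dict]:
    # base_url and the /orders resource are hypothetical
    with urllib.request.urlopen(f"{base_url}/orders?status=shipped") as resp:
        return json.load(resp)

# Model 2: connect-and-run-SQL access. The consumer is coupled to the
# physical schema; renaming a column breaks every downstream query.
def fetch_orders_sql(db_path: str) -> list[tuple]:
    conn = sqlite3.connect(db_path)
    try:
        return conn.execute(
            "SELECT id, total FROM orders WHERE status = 'shipped'"
        ).fetchall()
    finally:
        conn.close()
```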
>> Yeah. We are going to talk about some of the gaps later, and actually you look at them as opportunities more than barriers. They are barriers, but they're opportunities for more innovation. Let's go on to the fourth one. The next point deals with the roles that the platform serves. Data mesh proposes that domain experts own the data and take responsibility for it end to end, and are served by the technology. Kind of, we referenced that before. Whereas your contention is that today, data systems are really designed for specialists. I think you use the term hyper-specialists a lot. I love that term. And the generalists are kind of passive bystanders waiting in line for the technical teams to serve them. >> Yes. I mean, if you think about the, again, the intention behind data mesh, it was creating a responsible data sharing model that scales out. And I challenge any organization that has scale ambitions around data, or usage of data, that relies on small pockets of very expensive specialist resources, right? So we have no choice but upskilling, cross-skilling the majority population of our technologists. We often call them generalists, right? That's a shorthand for people that can really move from one technology to another technology. Sometimes we call them paint-drip people, sometimes we call them T-shaped people. But regardless, we need to have the ability to really mobilize our generalists. And we had to do that at Thoughtworks. We serve a lot of our clients, and like many other organizations, we are also challenged with hiring specialists. So we have tested the model of having a few specialists really conveying and translating the knowledge to generalists, and bringing them forward. And of course, platform is a big enabler of that. Like, what is the language of using the technology? What are the APIs that delight that generalist experience? This doesn't mean no-code, low-code, where we throw away good engineering practices. I think good software engineering practices remain to exist. Of course, they get adapted to the world of data to build resilient, you know, sustainable solutions. But specialty, especially around kind of proprietary technology, is going to be a hard one to scale. >> Okay. I'm definitely going to come back and pick your brain on that one. And, you know, your point about scale out shows in the examples, the practical examples of companies that have implemented data mesh that I've talked to. I think in all cases, you know, there's only a handful that I've really gone deep with, but their Hadoop instances, their clusters wouldn't scale, they couldn't scale the business around it. So that's really a key point of a common pattern that we've seen. Now, I think in all cases, they went to like the data lake model on AWS. And so that maybe has some violation of the principles, but we'll come back to that. But so let me go on to the next one. Of course, data mesh leans heavily toward this concept of decentralization, to support domain ownership over the centralized approaches. And we certainly see this, the public cloud players, database companies as key actors here, with very large install bases, pushing a centralized approach. So I guess my question is, how realistic is this next point, where you have decentralized technologies ruling the roost? >> I think if you look at the history of places in our industry where decentralization has succeeded, they heavily relied on standardization of connectivity, you know, across different components of technology. And I think right now you are right. The way we get value from data relies on collection. At the end of the day, collection of data. Whether you have a deep learning model that you're training, or you have, you know, reports to generate, regardless, the model is: bring your data to a place where you can collect it, so that we can use it. And that leads naturally to a set of technologies that try to operate as a full stack, integrated, proprietary, with no intention of, you know, opening data for sharing. Now, conversely, if you think about the internet itself, the web itself, microservices, even at the enterprise level, not at the planetary level, they succeeded as decentralized technologies to a large degree because of their emphasis on openness and sharing, right. API sharing. In the API world, we don't say, you know, "I will build a platform to manage your application logic." Maybe to a degree, but we actually moved away from that.
We say, "I'll build a platform that opens around applications to manage your APIs, manage your interfaces." Right? Give you access to API. So I think the shift needs to... That definition of decentralized there means really composable, open pieces of the technology that can play nicely with each other, rather than a full stack, all have control of your data yet being somewhat decentralized within the boundary of my platform. That's just simply not going to scale if data needs to come from different platforms, different locations, different geographical locations, it needs to rethink. >> Okay, thank you. And then the final point is, is data mesh favors technologies that are domain agnostic versus those that are domain aware. And I wonder if you could help me square the circle cause it's nuanced and I'm kind of a 100 level student of your work. But you have said for example, that the data teams lack context of the domain and so help us understand what you mean here in this case. >> Sure. Absolutely. So as you said, we want to take... Data mesh tries to give autonomy and decision making power and responsibility to people that have the context of those domains, right? The people that are really familiar with different business domains and naturally the data that that domain needs, or that naturally the data that domains shares. So if the intention of the platform is really to give the power to people with most relevant and timely context, the platform itself naturally becomes as a shared component, becomes domain agnostic to a large degree. Of course those domains can still... The platform is a (chuckles) fairly overloaded world. As in, if you think about it as a set of technology that abstracts complexity and allows building the next level solutions on top, those domains may have their own set of platforms that are very much doing agnostic. But as a generalized shareable set of technologies or tools that allows us share data. So that piece of technology needs to relinquish the knowledge of the context to the domain teams and actually becomes domain agnostic. >> Got it. Okay. Makes sense. All right. Let's shift gears here. Talk about some of the gaps and some of the standards that are needed. You and I have talked about this a little bit before, but this digs deeper. What types of standards are needed? Maybe you could walk us through this graphic, please. >> Sure. So what I'm trying to depict here is that if we imagine a world that data can be shared from many different locations, for a variety of analytical use cases, naturally the boundary of what we call a node on the mesh will encapsulates internally a fair few pieces. It's not just the boundary of that, not on the mesh, is the data itself that it's controlling and updating and maintaining. It's of course a computation and the code that's responsible for that data. And then the policies that continue to govern that data as long as that data exists. So if that's the boundary, then if we shift that focus from implementation details, that we can leave that for later, what becomes really important is the scene or the APIs and interfaces that this node exposes. And I think that's where the work that needs to be done and the standards that are missing. And we want the scene and those interfaces be open because that allows, you know, different organizations with different boundaries of trust to share data. 
Not only to share data to kind of move that data to, yes, another location, but to share the data in a way that distributed workloads, distributed analytics, distributed machine learning models can happen on the data where it is. So if you follow that line of thinking around decentralization and connection of data, versus collection of data, I think the very, very important piece of it that needs really deep thinking, and I don't claim that I have done that, is how do we share data responsibly and sustainably, right? In a way that is not brittle. If you think about it, today, one of the very common ways we share data is: I'll give you a JDBC endpoint, or I give you an endpoint to your, you know, database of choice. And now, as a user, you actually have access to the schema of the underlying data and can then run various queries or SQL queries on it. That's very simple and easy to get started with. That's why SQL is an evergreen, you know, standard or semi-standard, pseudo-standard that we all use. But it's also very brittle, because we are dependent on an underlying schema and formatting of the data that's been designed to tell the computer how to store and manage the data. So I think the data sharing APIs of the future really need to think about removing these brittle dependencies. Think about sharing not only the data, but what we call metadata, I suppose, an additional set of characteristics that is always shared along with the data, to make the data usage, I suppose, ethical and also friendly for the users. And also, I think we have to... that data sharing API, the other element of it, is to allow kind of computation to run where the data exists. So if you think about SQL again, as a simple primitive example of computation: when we select, and when we filter, and when we join, the computation is happening on that data. So maybe there is a next level of articulating distributed computation on data, that simply trains models, right? Your language primitives change in a way to allow sophisticated analytical workloads to run on the data more responsibly, with policies and access control enforced. So I think that output port that I mentioned is simply about next-generation, responsible data sharing APIs, suitable for decentralized analytical workloads.
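A minimal sketch of the kind of data-sharing API described here: the node never exposes raw storage, always returns metadata alongside results, and accepts computation shipped to where the data lives. Everything in it (DataNode, the metadata fields, the sample rows) is a hypothetical illustration of the idea, not a standard.

```python
from typing import Any, Callable

class DataNode:
    """Hypothetical mesh node: computation runs where the data lives,
    and metadata always travels with the result."""

    def __init__(self, rows: list[dict], metadata: dict[str, Any]):
        self._rows = rows          # internal storage, never exposed directly
        self._metadata = metadata  # owner, freshness, quality, lineage

    def compute(self, fn: Callable[[list[dict]], Any]) -> dict[str, Any]:
        # The caller ships a computation instead of pulling raw data out.
        return {"result": fn(self._rows), "metadata": self._metadata}

node = DataNode(
    rows=[{"region": "EU", "amount": 120}, {"region": "US", "amount": 80}],
    metadata={"owner": "sales-domain", "freshness": "2022-02-18T00:00Z"},
)
print(node.compute(lambda rows: sum(r["amount"] for r in rows)))
# {'result': 200, 'metadata': {'owner': 'sales-domain', ...}}
```

Because the node returns results plus metadata rather than a schema handle, internal restructuring of the data never breaks downstream consumers, which is the resilience property described above.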
>> So I'm not trying to bait you here, but I have a follow up as well. So, you know, schema, for all its good, creates constraints. No schema, schema on read, that didn't work, 'cause it was just a free-for-all and it created the data swamps. But now you have technology companies trying to solve that problem. Take Snowflake, for example, you know, enabling data sharing, but it is within its proprietary environment. Certainly Databricks is doing something, you know, trying to come at it from its angle, bringing some of the best of the data warehouse together with data science. Is your contention that those remain sort of proprietary, de facto standards? And that what we need is more open standards? Maybe you could comment. >> Sure. I think there are two points. One is, as you mentioned, open standards that allow... that actually make the underlying platform invisible. I mean, my litmus test for a technology provider to say, "I'm data mesh" (laughs) kind of compliant, is: "Is your platform invisible?" As in, can I replace it with another, and yet get the similar data sharing experience that I need? So part of it is that. Part of it is open standards that are not really proprietary. The other angle, for kind of sharing data across different platforms so that, you know, we don't get stuck with one technology or another, is around APIs. It is around code that is protecting that internal schema. So where we are on the curve of evolution of technology, right now we are exposing the internal structure of the data, which is designed to optimize certain modes of access, to the end client and application APIs, right? So the APIs that use the data today are very much aware that this database was optimized for machine learning workloads, hence you will deal with a columnar storage of the file, versus this other API that is optimized for a very different, report-type access, relational access, and is optimized around rows. I think that should become irrelevant in the API sharing of the future. Because as a user, I shouldn't care how this data is internally optimized, right? The language primitive that I'm using should be really agnostic to the machine optimization underneath that. And if we did that, perhaps this war between warehouse or lake or the other will actually become irrelevant. So we're optimizing for the best human experience, as opposed to the best machine experience. We still have to do that, but we have to make that invisible, make that an implementation concern. So that's another angle of what should... if we daydream together, the best and most resilient experience in terms of data usage would be these APIs that are agnostic to the internal storage structure.
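One way to read "agnostic to the internal storage structure": the same access primitive over a row-oriented and a column-oriented store, so the machine optimization underneath becomes an implementation concern. A toy sketch; both store classes are hypothetical stand-ins for real engines.

```python
class RowStore:
    """Row-oriented layout, e.g. tuned for record-at-a-time access."""
    def __init__(self, rows: list[dict]):
        self.rows = rows
    def scan(self):
        yield from self.rows

class ColumnStore:
    """Column-oriented layout, e.g. tuned for analytical scans."""
    def __init__(self, columns: dict[str, list]):
        self.columns = columns  # {field name: list of values}
    def scan(self):
        names = list(self.columns)
        for values in zip(*self.columns.values()):
            yield dict(zip(names, values))

def read_all(store) -> list[dict]:
    # One language primitive; the layout underneath is invisible.
    return list(store.scan())

rows = read_all(RowStore([{"id": 1, "total": 9.5}]))
cols = read_all(ColumnStore({"id": [1], "total": [9.5]}))
assert rows == cols  # identical experience over different machine layouts
```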
>> Great, thank you for that. We've wrapped our ankles now on the controversy, so we might as well wade all the way in; I can't let you go without addressing some of this. Which you've catalyzed, which I, by the way, see as a sign of progress. So this gentleman, Paul Andrew, is an architect, and he gave a presentation, I think last night. And he teased it as, quote, "The theory from Zhamak Dehghani versus the practical experience of a technical architect, AKA me," meaning him. And Zhamak, you were quick to shoot back that data mesh is not theory, it's based on practice, and some practices are experimental, some are more baked, and data mesh really avoids, by design, the specificity of vendor or technology. Perhaps you intended to frame your post as a technology- or vendor-specific implementation. So touche, that was excellent. (Zhamak laughs) Now, you don't need me to defend you, but I will anyway. You spent 14-plus years as a software engineer and the better part of a decade consulting with some of the most technically advanced companies in the world. But I'm going to push you a little bit here and say, some of this tension is of your own making, because you purposefully don't talk about technologies and vendors. Sometimes doing so is instructive for us neophytes. So, why don't you ever, like, use specific examples of technology for frames of reference? >> Yes. My role is to push us to the next level. So, you know, everybody picks their fights, picks their battles. My role in this battle is to push us to think beyond what's available today. Of course, that's my public persona. On a day-to-day basis, actually, I work with clients and existing technology, and I think at Thoughtworks, we gave a case study talk with a colleague of mine, and I intentionally got him to talk about (indistinct), to talk about the technology that we used to implement data mesh. And the reason I haven't really embraced, in my conversations, specific technologies... One is, I feel the technology solutions we're using today are still not ready for the vision. I mean, we have to be in this transitional step, no matter what; we have to be pragmatic, of course, and practical, I suppose, and use the existing vendors that exist, and I wholeheartedly embrace that, but that's just not my role, to show that. I've gone through this transformation once before in my life. When microservices happened, we were building microservices-like architectures with technology that wasn't ready for it: big web application servers that were designed to run these giant monolithic applications, and now we were trying to run little microservices onto them. And the tail was wagging the dog; the environmental complexity of running these services was consuming so much of our effort that we couldn't really pay attention to that business logic, the business value. And that's where we are today. The complexity of integrating existing technologies is really overwhelming, capturing a lot of our attention and cost, money and effort, as opposed to really focusing on the data products themselves. So that's just the role I have. But it doesn't mean that, you know, we have to rebuild the world. We've got to do with what we have in this transitional phase, until the new generation, I guess, of technologies come around and reshape our landscape of tools. >> Well, impressive public discipline. Your point about microservices is interesting, because a lot of those early microservices weren't so micro, and for the naysayers, look, past is not prologue; Thoughtworks was really early on in the whole concept of microservices. So I'll be very excited to see how this plays out. But now, there were some other good comments. There was one from a gentleman who said the most interesting aspects of data mesh are organizational. And that's how my colleague Sanjeev Mohan frames data mesh versus data fabric. You know, I'm not sure; I think we've only sort of scratched the surface today. Data mesh is more. And I still think data fabric is what NetApp defined as software-defined storage infrastructure that can serve on-prem and public cloud workloads, back whatever, 2016. But the point you make in the thread that we're showing you here is your warning, and you referenced this earlier, that segregating different modes of access will lead to fragmentation. And we don't want to repeat the mistakes of the past. >> Yes, the comments around that, again, go back to that original conversation that we had, at a macro level. We've got this tendency to decompose complexity based on technical solutions. And, you know, the conversation could be, "Oh, I do batch and you do stream, and we are different." We create these bifurcations in our decisions based on the technology, where I do events and you do tables, right? So that sort of segregation of modes of access causes accidental complexity that we keep dealing with. Because every time, in this tree, you create a new branch, you create a new kind of, new set of tools, that then somehow need to be point-to-point integrated. You create new specialization around that. So the least number of branches that we have, the better; and think really about the continuum of experiences that we need to create, and the technologies that simplify that continuum of experience. So one of the things, for example, to give you a past experience...
I was really excited around the papers and the work that came out around Apache Beam, and generally flow-based programming and stream processing. Because basically they were saying, whether you are doing batch or whether you're doing streaming, it's all one stream. And sometimes the window of time over which you're computing narrows, and sometimes it widens, and at the end of the day, you are just doing stream processing. So it is those sorts of notions, that simplify and create a continuum of experience, that resonate with me personally, more than creating these tribal fights of this type versus that mode of access. So that's why data mesh naturally selects kind of this multimodal access to support end users, right? The persona of end users.
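The Apache Beam point, in code: batch versus streaming collapses into one model where only the window over which you compute changes. A minimal sketch assuming the Beam Python SDK (pip install apache-beam); the timestamps and values are made up.

```python
import apache_beam as beam
from apache_beam.transforms import window

with beam.Pipeline() as p:
    (
        p
        | beam.Create([(1.0, 10), (2.0, 20), (61.0, 5)])  # (timestamp, value)
        | beam.MapTuple(lambda ts, v: window.TimestampedValue(v, ts))
        | beam.WindowInto(window.FixedWindows(60))  # narrow or widen this
        | beam.CombineGlobally(sum).without_defaults()
        | beam.Map(print)  # 30 for the first minute, 5 for the second
    )
```

Swapping FixedWindows for a global window turns the same pipeline into a batch aggregation; nothing else in the code changes, which is the continuum of experience described above.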
>> Okay. So the last topic I want to hit: this whole discussion, the topic of data mesh, it's highly nuanced, it's new, and people are going to shoehorn data mesh into their respective views of the world. And we talked about lakehouses, and there's three buckets. And of course, the gentleman from LinkedIn, with Azure, Microsoft has a data mesh community. See, you're going to have to enlist some serious army of enforcers to adjudicate. And I wrote some of the stuff down. I mean, it's interesting. Monte Carlo has a data mesh calculator. Starburst is leaning in. ChaosSearch sees themselves as an enabler. Oracle and Snowflake both use the term data mesh. And then of course you've got big practitioners: JPMC, we've talked to Intuit, Orlando, HelloFresh has been on, Netflix has this event-based sort of streaming implementation. So my question is, how realistic is it that the clarity of your vision can be implemented and not polluted by really rich technology companies and others? (Zhamak laughs) >> Is it even possible, right? Is it even possible? That's a yes. That's why I practice, then. This is why I should practice things. 'Cause I think it's going to be hard. What I'm hopeful is that the socio-technical framing I mentioned, that this is a socio-technical concern or solution, not just a technology solution, hopefully always brings us back to, you know, the reality, because vendors will try to sell you snake oil that solves all of your problems. (chuckles) All of your data mesh problems. It's just going to cause more problems down the track. So we'll see, time will tell, Dave, and I count on you as one of those, (laughs) you know, folks that will continue to share their platform. To go back to the roots: why, in the first place? I mean, I dedicated a whole part of the book to 'Why?' Because we get, as you said, carried away with vendors and technology solutions trying to ride a wave. And in that story, we forget the reason for which we are even making this change and spending all of these resources. So hopefully we can always come back to that. >> Yeah. And I think we can. I think you have really given this some deep thought, and as we pointed out, this was based on practical knowledge and experience. And look, we've been trying to solve this data problem for a long, long time. You've not only articulated it well, but you've come up with solutions. So Zhamak, thank you so much. We're going to leave it there, and I'd love to have you back. >> Thank you for the conversation. I really enjoyed it. And thank you for sharing your platform to talk about data mesh. >> Yeah, you bet. All right. And I want to thank my colleague, Stephanie Chan, who helps research topics for us. Alex Myerson is on production, and Kristen Martin, Cheryl Knight and Rob Hoff are on editorial. Remember, all these episodes are available as podcasts, wherever you listen. All you've got to do is search Breaking Analysis Podcast. Check out ETR's website at etr.ai for all the data. And we publish a full report every week on wikibon.com and siliconangle.com. You can reach me by email at david.vellante@siliconangle.com or DM me @dvellante, or hit us up on our LinkedIn post. This is Dave Vellante for theCUBE Insights powered by ETR. Have a great week, stay safe, be well. And we'll see you next time. (bright music)
Breaking Analysis: Enterprise Technology Predictions 2022
>> From theCUBE Studios in Palo Alto and Boston, bringing you data-driven insights from theCUBE and ETR, this is Breaking Analysis with Dave Vellante. >> The pandemic has changed the way we think about and predict the future. As we enter the third year of a global pandemic, we see the significant impact that it's had on technology strategy, spending patterns, and company fortunes. Much has changed. And while many of these changes were forced reactions to a new abnormal, the trends that we've seen over the past 24 months have become more entrenched, and point to what's coming ahead in the technology business. Hello and welcome to this week's Wikibon CUBE Insights powered by ETR. In this Breaking Analysis, we welcome our partner and colleague and business friend, Erik Porter Bradley, as we deliver what's becoming an annual tradition for Erik and me, our predictions for enterprise technology in 2022 and beyond. Erik, welcome. Thanks for taking some time out. >> Thank you, Dave. Luckily we did pretty well last year, so we were able to do this again. So hopefully we can keep that momentum going. >> Yeah, you know, I want to mention that, you know, we get a lot of inbound predictions from companies and PR firms that help shape our thinking. But one of the main objectives that we have is we try to make predictions that can be measured. That's why we use a lot of data. Now, not all will necessarily fit that parameter, but if you've seen the grading of our 2021 predictions that Erik and I did, you'll see we do a pretty good job of trying to put forth prognostications that can be declared correct or not, you know, as black and white as possible. Now let's get right into it. Our first prediction, we're going to go right into spending, something that ETR surveys for quarterly. And we've reported extensively on this. We're calling for tech spending to increase somewhere around 8% in 2022. We can see there on the slide, Erik, we predicted spending last year would increase by 4%; IDC's last check came in at five and a half percent, Gartner was somewhat higher, but in general, you know, not too bad. But looking ahead, we're seeing an acceleration from the ETR September surveys, as you can see in the yellow versus the blue bar in this chart. Many of the SMBs that were hard hit by the pandemic are picking up spending again. And the ETR data is showing acceleration above the mean for industries like energy, utilities, retail, and services, and also, notably, in the Forbes largest 225 private companies. These are companies like Mars or Koch Industries. They're predicting well above average spending for 2022. So Erik, please weigh in here. >> Yeah, a lot to bring up on this one; I'm going to be quick. So 1,200 respondents on this, over a third of which were at the C-suite level. So really good data that we brought in. The usual bucket of, you know, Fortune 500, Global 2000 make up the meat of that median, but it's 8.3% and rising with momentum, as we see. What's really interesting right now is energy and utilities. This is usually like, you know, an orphan stock dividend type of play. You don't see them at the highest point of tech spending. And the reason why right now is really because the state of tech infrastructure in our energy infrastructure needs help. And it's obvious; remember the Florida municipality breach last year? When they took over the water systems, or they had the ability to?
And this is a real issue, you know; there are bad nation-state actors out there, and I'm no alarmist, but energy and utilities have to spend this money to keep up. It's really important. And then you also hit on the retail consumer. Obviously what's happened, the work-from-home shift created a shop-from-home shift, and with the trends that are happening right now in retail, if you don't spend and keep up, you're not going to be around much longer. So I think the really two interesting things here to call out are energy and utilities, usually a laggard in IT spend, and it's leading; and also retail consumer, a lot of changes happening. >> Yeah. Great stuff. I mean, I recall when we entered the pandemic, really ETR was the first to emphasize the impact that work from home was going to have, so I really put a lot of weight on this data. Okay. Our next prediction is we're going to get into security, it's one of our favorite topics. And that is that the number one priority that needs to be addressed by organizations in 2022 is security, and you can see, in this slide, the degree to which security is top of mind, relative to some other pretty important areas like cloud, productivity, data, and automation, and some others. Now people may say, "Oh, this is obvious." But I'm going to add some context here, Erik, and then bring you in. First, organizations, they don't have unlimited budgets. And there are a lot of competing priorities for dollars, especially with the digital transformation mandate. And depending on the size of the company, this data will vary. For example, while security is still number one at the largest public companies, and those are of course the biggest spenders, it's not nearly as pronounced as it is on average, or in, for example, mid-sized companies and government agencies. And this is because mid-sized companies or smaller companies, they don't have the resources that larger companies do. Larger companies have done a better job of securing their infrastructure. So these mid-sized firms are playing catch-up, and the data suggests cyber is an even bigger priority there, with gaps that they have to fill, you know, going forward. And that's why we think there's going to be more demand for MSSPs, managed security service providers. And we may even see some IPO action there. And then of course, Erik, you and I have talked about events like the SolarWinds hack, there are more ransomware attacks, other vulnerabilities. Just recently, like Log4j in December. All of this has heightened concerns. Now I want to talk a little bit more about how we measure this, you know, relatively. Okay, it's an obvious prediction, but let's stick our necks out a little bit. And so in addition to the rise of managed security services, we're calling for M&A and/or IPOs; we've specified some names here on this chart, and we're also pointing to the digital supply chain as an area of emphasis. Again, Log4j really shone a light on that. And this is going to help the likes of Auth0, which is now Okta, SailPoint, which is called out on this chart, and some others. We're calling some winners in endpoint security. Erik, you're going to talk about sort of that lifecycle, that transformation that we're seeing, that migration to new endpoint technologies that are going to benefit from this reset refresh cycle. So Erik, weigh in here; let's talk about some of the elements of this prediction and some of the names on that chart. >> Yeah, certainly. I'm going to start right with Log4j, top of mind.
And the reason why is because we're seeing a real paradigm shift here, where things are no longer being attacked at the network layer; they're being attacked at the application layer, and in the application stack itself. And that is a huge shift left. And that's bringing in DevSecOps now as a real priority in 2022. That's a real paradigm shift over the last 20 years. That's not where attacks used to come from. And this is going to have a lot of changes. You called out a bunch of names in there that are going to work. I would add to that list Wiz. I would add Orca Security. Two names in our emerging technology study, in addition to the ones you added, that are involved in cloud security and container security. These names are either going to get gobbled up, so the traditional legacy names are going to have to start writing checks — and, you know, legacy is not fair, but they're in the data center, right? They're on-prem, they're not cloud native. So these are the names that money is going to be flowing to. So they're either going to get gobbled up, or we're going to see some IPOs. And the other thing I want to talk about too is what you mentioned. We have CrowdStrike on that list, we have SentinelOne on the list. Everyone knows them. Our data was so strong on Tanium that we actually went positive for the first time just today, just this morning, when that was released. The trifecta of these is so important, because of what you mentioned: under-resourcing. We can't have security just tell us when something happens; it has to automate, and it has to respond. So in this next generation of EDR and XDR, an automated response has to happen. Because people are under-resourced, salaries are really high, and there's a skill shortage out there, security has to become responsive. It can't just monitor anymore. >> Yeah. Great. And we should call out too. So we named some names: Snyk, Aqua, Arctic Wolf, Lacework, Netskope, Illumio. These are all sort of IPO, or possibly even M&A, candidates. All right. Our next prediction goes right to the way we work. Again, something that ETR has been on for a while. We're calling for a major rethink in remote work for 2022. We had predicted last year that by the end of 2021, there'd be a larger return to the office, with the norm being around a third of workers permanently remote. And of course the variants changed that equation and, you know, gave more time for people to think about this idea of hybrid work, and that's really come into focus. So we're predicting that hybrid is going to overtake fully remote as the dominant work model, with only about a third of the workers back in the office full-time. And Erik, we expect a somewhat lower percentage to be fully remote. It's now sort of dipped under 30%, at around 29%, but it's still significantly higher than the historical average of around 15 to 16%. So still a major change, but this idea of hybrid, and getting hybrid right, has really come into focus. Hasn't it? >> Yeah. It's here to stay. There's no doubt about it. We started this in March of 2020, as soon as the virus hit. This is the 10th iteration of the survey. No one, no one ever thought we'd see a number where only 34% of people were going to be in office permanently. That's a permanent number. They're expecting only a third of the workers to ever come back fully in office. And against that, there's 63% that are saying their permanent workforce is going to be either fully remote or hybrid.
And this, I can't really explain how big of a paradigm shift this is. Since the start of the industrial revolution, people leave their house and go to work. Now they're saying that's not going to happen. The economic impact here is so broad, on so many different areas. And, you know, the reason is like, why not? Right? The productivity increase is real. We're seeing the productivity increase. Enterprises are spending on collaboration tools, productivity tools; we're seeing an increased perception in productivity of their workforce. And the CFOs can cut down an expense item. I just don't see a reason why this would end. You know, I think it's going to continue. And I also want to point out these results, as high as they are, were before the Omicron wave hit us. I can only imagine what these results would have been if we had sent the survey out just two or three weeks later. >> Yeah. That's a great point. Okay. Next prediction, we're going to look at the supply chain, specifically in how it's affecting some of the hardware spending and cloud strategies in the future. So in this chart, ETR asks buyers, have you experienced problems procuring hardware as a result of supply chain issues? And, you know, despite the fact that some companies are, you know, I would call out Dell, for example, doing really well in terms of delivering, you can see that in the numbers, it's pretty clear there's been an impact. And that's not an across-the-board, you know, thing where vendors are able to deliver; it's especially acute in PCs, but also pronounced in networking, also in firewalls, servers, and storage. And what's interesting is how companies are responding and reacting. So first, you know, I'm going to call out the laptop and PC demand staying well above pre-COVID norms. It had peaked in 2012. Pre-pandemic, it kept dropping and dropping and dropping, in terms of, you know, unit volume, where the market was contracting. And we think it can continue to grow this year in double digits in 2022. But what's interesting, Erik, is when you survey customers, despite the difficulty they're having in procuring network hardware, there's not as much of a migration away from existing networks to the cloud. You could probably comment on that. Their networks are more fossilized, but when it comes to firewalls and servers and storage, there's a much higher propensity to move to the cloud. 30% of customers that ETR surveyed will replace security appliances with cloud services, and 41% and 34%, respectively, will move to cloud compute and storage in 2022. So cloud's relentless march on traditional on-prem models continues. Erik, what do you make of this data? Please weigh in on this prediction. >> As if we needed another reason to go to the cloud. Right here, here it is yet again. So this was added to the survey by client demand. They were asking about the procurement difficulties, the supply chain issues, and how it was impacting our community. So this is the first time we ran it. And it really was interesting to see, you know, the move there. And storage particularly I found interesting, because it correlated with a huge jump that we saw on one of our vendor names, which was Rubrik, which had the highest net score that it's ever had.
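Net score, the momentum metric cited here and throughout these segments, is ETR's core measure of spending intentions. A minimal sketch assuming ETR's published definition, where the percentage of customers decreasing spend or replacing a product is subtracted from the percentage adopting it or increasing spend; the response labels and sample below are hypothetical.

```python
def net_score(responses: list[str]) -> float:
    """Spending-momentum metric, assuming ETR's published definition:
    (% adopting new + % increasing spend) - (% decreasing + % replacing).
    'flat' responses count only toward the denominator."""
    n = len(responses)
    positive = sum(r in ("adopting", "increasing") for r in responses)
    negative = sum(r in ("decreasing", "replacing") for r in responses)
    return 100.0 * (positive - negative) / n

sample = ["adopting", "increasing", "increasing", "flat", "decreasing"]
print(net_score(sample))  # 40.0 -- right at the 40% 'elevated' line
```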
So again, you didn't need another reason to, you know, hasten this digital transformation, but here we are, we have it yet again, and I don't see it slowing down anytime soon. >> You know, that's a really good point. I mean, it's not necessarily bad news for the... I mean, obviously you wish that it had no change, that would be great, but things, you know, are always going to change. So we'll talk about this a little bit later when we get into the Supercloud conversation, but this is an opportunity for people who embrace the cloud. So we'll come back to that. And I want to hang on cloud a bit and share some recent projections that we've made. The next prediction is the big four cloud players are going to surpass $167 billion in IaaS and PaaS revenue in 2022. We track this. Observers of this program know that we try to create an apples-to-apples comparison between AWS, Azure, GCP and Alibaba in IaaS and PaaS. So we're calling for 38% revenue growth in 2022, which is astounding for such a massive market. You know, AWS is probably not going to hit a hundred billion dollar run rate, but they're going to be close this year. And they're going to get there by 2023; you know, they're going to surpass that. Azure continues to close the gap. Now they're about two-thirds of the size of AWS, and Google, we think, is going to surpass Alibaba and take the number three spot. Erik, anything you'd like to add here? >> Yeah, first of all, just on a sector level, we saw our sector-level, new survey net score on cloud jump another 10%. It was already really high at 48. Went up to 53. This train is not slowing down anytime soon. And we even added an edge compute type of player, like CloudFlare, into our cloud bucket this year. And it debuted with a net score of almost 60. So this is really an area that's expanding, not just the big three, but everywhere. We even saw Oracle and IBM jump up. So even they're having success, taking some of their on-prem customers and selling them their cloud services. This is a massive opportunity, and it's not changing anytime soon; it's going to continue. >> And I think the operative word there is opportunity. So, you know, the next prediction is something that we've been having fun with, and that's this Supercloud becomes a thing. Now, the reason I say we've been having fun is we put this concept of Supercloud out and it's become a bit of a controversy. First, you know, what the heck is a Supercloud, right? It's sort of a buzzwordy term, but there really is, we believe, a thing here. We think there needs to be a rethinking, or at least an evolution, of the term multi-cloud. And what we mean is that, in our view, you know, multicloud from a vendor perspective was really cloud compatibility. It wasn't marketed that way, but that's what it was. Either a vendor would containerize its legacy stack and shove it into the cloud, or a company, you know, they'd do the work, they'd build a cloud native service on one of the big clouds, and then they'd do it for AWS, and then Azure, and then Google. But there really wasn't much, if any, leverage across clouds. Now, from a buyer perspective, we've always said multicloud was a symptom of multi-vendor, meaning I've got different workloads running in different clouds, or I bought a company and they run on Azure, and I do a lot of work on AWS. But generally it wasn't necessarily a prescribed strategy to build value on top of hyperscale infrastructure. There certainly was somewhat of a, you know, reducing lock-in and hedging the risk.
But we're talking about something more here. We're talking about building value on top of the hyperscale gift of hundreds of billions of dollars in CapEx. So in addition, we're not just talking about transforming IT, which is what the last 10 years of cloud have been like. And, you know, doing work in the cloud because it's cheaper or simpler or more agile, all of those things. So that's beginning to change. And this chart shows some of the technology vendors that are leaning toward this Supercloud vision, in our view, building on top of the hyperscalers that are highlighted in red. Now, Jerry Chen at Greylock, they wrote a piece called Castles in the Cloud. It got our thinking going, and he and the team at Greylock, they're building out a database of all the cloud services and all the sub-markets in cloud. And that got us thinking that there's a higher level of abstraction coalescing in the market, where there's tight integration of services across clouds, but the underlying complexity is hidden, and there's an identical experience across clouds, and even, in my dreams, on-prem for some platforms. So what's new, or new-ish and evolving, are things like location independence — you've got to include the edge on that — metadata services to optimize locality of reference and data source awareness, governance, privacy, you know, application-independent and -dependent recovery across clouds. So we're seeing this evolve. And in our view, the two biggest things that are new are the technology, which is evolving, where you're seeing services truly integrate cross-cloud; and the other big change is digital transformation, where there's this new innovation curve developing, and it's not just about making your IT better. It's about SaaS-ifying and automating your entire company's workflows. So Supercloud, it's not just a vendor thing to us. It's the evolution of, you know, the Marc Andreessen quote, "Every company will be a SaaS company." Every company will deliver capabilities that can be consumed as cloud services. So Erik, the chart shows net score, or spending momentum, on the y-axis, and presence in the ETR data set, or market share, on the x-axis. We've talked about Snowflake as the poster child for this concept, where the vision is you're in their cloud and sharing data in that safe place. Maybe you could make some comments. You know, what do you think of this Supercloud concept and this change that we're sensing in the market? >> Well, I think you did a great job describing the concept. So maybe I'll support it a little bit on the vendor level and then kind of give examples of the ones that are doing it. You stole the lead there with Snowflake, right? There is no better example than what we've seen with what Snowflake can do. Cross-portability in the cloud, the ability to be, you know, completely agnostic, but then build those services on top that are better than anything they could offer. And it's not just there. I mean, you mentioned edge compute; that's a whole other layer where this is coming in. And CloudFlare, the momentum there is out of control. I mean, this is a company that started off just doing CDN and trying to compete with Akamai. And now they're giving you a full soup-to-nuts offering with security and an actual edge compute layer. But it's a fantastic company; what they're doing is another great example of what you're seeing here. I'm going to call out HashiCorp as well.
They're more of an infrastructure services play, a little bit more of an open-source freemium model, but what they're doing as well is completely cloud agnostic. It's dynamic. It doesn't care if you're in a container, it doesn't matter where you are. They recently IPO'd and they're down 25%, but their data looks so good across both our emerging technology and TSIS surveys. It's certainly another name that's playing on this. And another one that we mentioned as well is Rubrik. If you need storage and compute in the cloud layer and you need to be agnostic to it, they're another one that's really playing in this space. So I think it's a great concept you're bringing up. I think it's one that's here to stay, and there are certainly a lot of vendors that fit into what you're describing. >> Excellent. Thank you. All right, let's shift to data. The next prediction, it might be a little tough to measure. Before, I said we're trying to be a little black and white here, but it relates to Data Mesh, which is... the ideas behind that term were created by Zhamak Dehghani of Thoughtworks. And we see Data Mesh really gaining momentum in 2022, but it's largely going to be, we think, confined to a more narrow scope. Now, the impetus for change in data architecture in many companies really stems from the fact that their Hadoop infrastructure didn't solve their data problems, and they struggled to get more value out of their data investments. Data Mesh prescribes a shift to a decentralized architecture and domain ownership of data, and a shift to data product thinking: beyond data for analytics, to data products and services that can be monetized. Now this is very powerful in our view, but it's difficult for organizations to get their heads around, and further decentralization creates the need for a self-service platform and federated data governance that can be automated. And there are not a lot of standards around this. So it's going to take some time. At our power panel a couple of weeks ago on data management, Tony Baer predicted a backlash on Data Mesh. And I don't think it's going to be so much of a backlash, but rather the adoption will be more limited. Most implementations, we think, are going to use a starting point of AWS, and they'll enable domains to access and control their own data lakes. And while that is a very small slice of the Data Mesh vision, I think it's going to be a starting point. And the last thing I'll say is, this is going to take a decade to evolve, but I think it's the right direction. And whether it's a data lake or a data warehouse or a data hub or an S3 bucket, these are really... the concept is, they'll eventually just become nodes on the data mesh that are discoverable, and access is governed. And so the idea is that the stranglehold that the data pipeline, process, and hyper-specialized roles have on data agility is going to evolve. And decentralized architectures and the democratization of data will eventually become a norm for a lot of different use cases. And Erik, I wonder if you'd add anything to this. >> Yeah. There's a lot to add there. The first thing that jumped out to me was that mention of the word backlash you said, and you said it's not really a backlash, but what it could be is these are new words trying to solve an old problem. And I do think sometimes the industry will notice that right away, and maybe that'll be a little pushback. And the problems are what you already mentioned, right?
>> Great, thank you for that, Erik. Some great points. All right, for the next prediction, we're going to shine the spotlight on two of our favorite topics, Snowflake and Databricks, and the prediction here is that, of course, Databricks is going to IPO this year, as expected. Everybody sort of expects that. But the prediction really is that while these two companies are already facing off in the market, they're also going to compete with each other for M&A, especially as Databricks, after the IPO, is going to have more prominence and a war chest. So first, these companies are both looking pretty good on the same XY graph, with spending velocity on the vertical axis and presence, or market share, on the horizontal axis. And both Snowflake and Databricks are well above that magic 40% red dotted line, the elevated line, to us. And for context, we've included a few other firms, so you can see what a good position these two companies are really in. I mean, Snowflake, wow, it just keeps moving to the right on this horizontal picture while maintaining that net score on the y-axis. Amazing. So here's the thing: Databricks is using the term Lakehouse, implying that it has the best of data lakes and data warehouses, and Snowflake has the vision of the data cloud and data sharing. Snowflake has nailed analytics and is now moving into data science, the domain of Databricks. Databricks, on the other hand, has nailed data science and is moving into the domain of Snowflake, the data warehouse and analytics space. But to really make this seamless, there has to be a semantic layer between these two worlds, and they're either going to build it or buy it or both. And there are other areas like data clean rooms and privacy and data prep and governance and machine learning tooling and AI, all that stuff. So the prediction is they'll not only compete in the market, but they'll also step up their competition for M&A, especially after the Databricks IPO.
We've listed some target names here, like AtScale, you know, Iguazio, InfoSum, Habu, Immuta, and I'm sure there are many, many others. Erik, you care to comment? >> Yeah. I remember a year ago when we were talking about Snowflake when they first came out, and I said, "I'll be shocked if they don't use this war chest of money and start going after more, because we know Slootman, we have so much respect for him, we've seen his playbook." And I'm actually a little bit surprised that here we are, 12 months later, and he hasn't spent that money yet. So I think this prediction is just spot on. To talk a little bit about the data side, Snowflake is in rarefied air. It's all by itself. It is the number one net score in our entire TSIS universe. It is absolutely incredible. There are almost no negative spending intentions. Global 2000 organizations are increasing their spend on it. We maintain our positive outlook. It really just stands alone. Databricks, however, also has one of the highest overall net sentiments in the entire universe, not just its area, and this is the first time we're coming up positive on this name as well. It looks like it's not slowing down. Really interesting comment you made, though, and it matches what we normally hear from our end-user commentary in our panels and our interviews: Databricks is really more used for the data science side. ML and AI is where it's best positioned in our survey. So it might still have some catching up to do to really have the caliber of usability that Snowflake is seeing right now. Plus, Snowflake has its own marketplace. There's just a lot more to Snowflake right now than there is to Databricks. But I do think you're right. These two massive vendors are sort of heading towards a collision course, and it'll be very interesting to see how they deploy their cash. I think Snowflake, with their incredible management and leadership, will probably make the first move. >> Well, I think you're right on that. And by the way, I'll just add, you know, Databricks has basically said, hey, it's going to be easier for us to come from data lakes into data warehousing. I'm not sure I buy that. I think, again, that semantic layer is a missing ingredient, so it's going to be really interesting to see how this plays out. And to your point, you know, Snowflake's got the war chest, they've got the momentum, and they've had the public presence since November 2020. So, you know, they're probably going to start making some aggressive moves. Anyway, the next prediction is something, Erik, that you and I have talked about many, many times, and that is observability. I know it's one of your favorite topics. And we see this world screaming for more consolidation as it goes all in on cloud native. These legacy stacks are fighting to stay relevant, but the direction is pretty clear. And the same XY graph lays out the players in the field, with some of the new entrants that we've also highlighted, like Observe and Honeycomb and ChaosSearch, that we've talked about. Erik, we put a big red target around Splunk because everyone wants their gold. So please give us your thoughts. >> Oh man, I feel like I've been saying negative things about Splunk for too long. I've got a bad rap on this name. The Splunk shareholders come after me all the time. Listen, it really comes down to this. They're a fantastic company that was designed to do logging and monitoring, and they had some great tool sets around what you could do with it. But they were designed for the data center.
They were designed for on-prem. The world we're in now is so dynamic. Everything I hear from our end-user community is that all net-new workloads will be going to cloud native players. It's that simple. So Splunk is entrenched. It's going to continue doing what it's doing, and it does it really, really well. But if you're doing something new, the new workloads are going to be in a dynamic environment, and that's going to go to the cloud native players. And in our data, it is extremely clear that that means Datadog and Elastic. They are by far number one and two in net score, increase rates, and adoption rates. It's not even close. Even New Relic is actually starting to entrench itself really well. We saw New Relic's adoption going up, which is super important, because they went to that freemium model, you know, to try to get a little bit of an entrenched customer base, and that's working as well. And then you made a great list here of all the new entrants, but it goes beyond this. There are so many more. In our emerging technology survey, we're seeing Sentry, Catchpoint, Securonix, Lucidworks. There are so many options in this space. And let's not forget, some of the biggest data we're seeing is with Grafana, and Grafana Labs has yet to turn on their enterprise. Elastic did it; why can't Grafana Labs do it? They have an enterprise stack. So when you look at how crowded this space is, there has to be consolidation. I recently hosted a panel, and every single person on that panel said, "Please give me consolidation," because they're the end users trying to actually deploy these, and it's getting a little bit confusing. >> Great. Thank you for that. Okay, last prediction. Erik, this might be a little out of your wheelhouse, but you might have some thoughts on it, and that is that hybrid events become the new digital model and a new category in 2022. You've got these pure-play digital or virtual events; they're going to take a back seat to in-person hybrids. The virtual experience will eventually give way to metaverse experiences, and that's going to take some time, but the physical hybrid is going to drive it. And the metaverse is ultimately going to define the virtual experience, because the virtual experience today is not great. Nobody likes virtual. And hybrid is going to become the business model. Today's pure virtual experience has to evolve. You know, theCUBE first delivered hybrid in the middle of last decade, but nobody really wanted it. We did Mobile World Congress last summer in Barcelona in an amazing hybrid model, which we're showing in some of the pictures here. Alex, if you don't mind bringing that back up. And every physical event that we're doing now has a hybrid and virtual component, including the pre-records. You can see our studios; you can see the green screen. I don't know. Erik, what do you think about, you know, the Zoom fatigue and all this? I know you host regular events with your round tables, but what are your thoughts? >> Well, first of all, I think you and your company here have just done an amazing job on this, so that's really your expertise. I spent 20 years of my career hosting intimate Wall Street idea dinners, so I'm better at navigating a wine list than I am navigating a conference floor. But I will say that, you know, the trend just goes along with what we saw. If 35% are going to be fully remote and 70% are going to be hybrid, then our events are going to be as well. I used to host round table dinners, you know, one or two nights a week.
Now those have gone virtual. They're now panels, they're now one-on-one interviews. You know, we do chats, we do submitted questions, we do what we can, but there's no reason that this is going to change anytime soon. I think you're spot on here. >> Yeah. Great. All right. So there you have it, Erik and I. Listen, we always love the feedback. Love to know what you think. Thank you, Erik, for your partnership, your collaboration, and I love doing these predictions with you. >> Yeah, I always enjoy them too. And I'm actually happy. Last year you made us do a baker's dozen, so thanks for keeping it to 10 this year. >> (laughs) We've got a lot to say. I know, you know, we cut a lot out. We didn't do much on crypto. We didn't really talk about SaaS. I mean, I've got some thoughts there. We didn't really do much on containers and AI. >> You want to keep going? I've got another 10 for you. >> RPA... All right, we'll have you back, and then let's do that. All right. Don't forget, these episodes are all available as podcasts wherever you listen; all you have to do is search Breaking Analysis podcast. Check out ETR's website at etr.plus; they've got a new website out. It's the best data in the industry. And we publish a full report every week on wikibon.com and siliconangle.com. You can always reach out via email, David.Vellante@siliconangle.com, and I'm @DVellante on Twitter. Comment on our LinkedIn posts. This is Dave Vellante for theCUBE Insights powered by ETR. Have a great week, stay safe, be well, and we'll see you next time. (mellow music)
Clemence W. Chee & Christoph Sawade, HelloFresh
(upbeat music) >> Hello everyone. We're here at theCUBE startup showcase made possible by AWS. Thanks so much for joining us today. You know, when Zhamak Dehghani was formulating her ideas around data mesh, she wasn't the only one thinking about decentralized data architectures. HelloFresh was going into hyper-growth mode and realized that in order to support its scale, it needed to rethink how it thought about data. Like many companies that started in the early part of the last decade, HelloFresh relied on a monolithic data architecture, and its internal team had concerns about its ability to support continued innovation at high velocity. The company's data team began to think about the future and work backwards from a target architecture, one which possessed many principles of so-called data mesh, even though they didn't use that term specifically. The company is a strong example of an early but practical pioneer of data mesh. Now, there are many practitioners and stakeholders involved in evolving the company's data architecture, many of whom are listed here on this slide. Two are highlighted in red and joining us today. We're really excited to welcome to theCUBE Clemence Chee, who is the global senior director for data at HelloFresh, and Christoph Sawade, who's the global senior director of data, also of course at HelloFresh. Folks, welcome. Thanks so much for making some time today and sharing your story. >> Thank you very much. >> Thanks, Dave. >> All right, let's start with HelloFresh. You guys are number one in the world in your field. You deliver hundreds of millions of meals each year to many, many millions of people around the globe. You're scaling. Christoph, tell us a little bit more about your company and its vision. >> Yeah. Should I start, or Clemence? Maybe Clemence should take the first piece, because he has actually been a director at HelloFresh longer. >> Yeah, go ahead Clemence. >> I mean, yes, I joined HelloFresh approximately six years ago, and I didn't think the startup I was joining would eventually IPO. Just two years later, HelloFresh went public. And approximately three years and 10 months after HelloFresh was listed on the German stock exchange, which was just last week, HelloFresh was included in the DAX, Germany's leading stock market index, and that, to my mind, is a great, great milestone. I'm really looking forward, and I'm very excited for the future, for HelloFresh and also for our data. The vision that we have is to become the world's leading food solutions group, and there are a lot of attractive opportunities. So recently we launched and expanded in Norway; this was in July. And earlier this year, we launched the US brand Green Chef in the UK as well. We're committed to continuously launching in different geographies in the coming years, and we have a strong path ahead of us. With the acquisition of ready-to-eat companies like Factor in the US and the planned acquisition of Youfoodz in Australia, we are diversifying our offer, now reaching even more untapped customer segments and increasing our total addressable market. So by offering customers a growing range of different alternatives to shop for food and to consume meals, we are charging towards this vision and this goal to become the world's leading integrated food solutions group. >> Love it. You guys are on a rocket ship. You're really transforming the industry. And as you expand your TAM, it brings us to data as a core part of that strategy.
So maybe you guys could talk a little bit about your journey as a company, specifically as it relates to your data journey. I mean, you began as a startup, you had a basic architecture, and like everyone, you made extensive use of spreadsheets. You built a Hadoop-based system that started to grow. And when the company IPO'd, you really started to explode. So maybe describe that journey from a data perspective. >> Yes, Dave. So by 2015, HelloFresh had evolved into what amounted to a classical centralized data management setup. We grew very organically over the years, and there were a lot of very smart people around the globe really building the company and building our infrastructure. This also means that there were a small number of internal and external data sources, and a centralized BI team with a number of people producing different reports, dashboards, and products for our executives, for example, or for different operations teams, to see the company's performance. Knowledge was transferred just by talking to each other in face-to-face conversations, and the people in the data warehouse team were considered the data wizards or the ETL wizards; it was a very classical, ETL-driven style of data management, with very classical challenges. So our central data warehouse team was responsible for different types of verticals in different domains and different geographies. And all this setup gave us, in the beginning, the flexibility to grow fast as a company in 2015. >> Christoph, anything to add to that? >> Yes. As Clemence said, this was the setup that actually worked for us for quite a while. And then in 2017, when HelloFresh went public, the company also grew rapidly. Just to give you an idea what that looked like, the tech department actually increased from about 40 people to almost 300 engineers, and in the same way, the business units, as Clemence described, also grew sustainably. So we continued to launch HelloFresh in new countries, launched new brands like EveryPlate, and also acquired other brands like Factor. And from a data perspective, the number of data requests that the central team was getting became more and more numerous, and also more and more complex. For the team, that meant they had a fairly high mental load. They had to build a very deep understanding of the business and also suffered a lot from context switching back and forth. Essentially, they had to prioritize across requests from our physical product, our digital product, from the marketing perspective, and also from the central reporting teams. In a nutshell, this was very hard for these people, and the solution that we had built was not really optimal. So the central function became a bottleneck and slowed down all the innovation of the company. >> It's a classic case, isn't it? I mean, Clemence, you see the central team becomes a bottleneck, and so the lines of business, the marketing team, sales teams say, "Okay, we're going to take things into our own hands." And then of course IT and the technical team is called in later to clean up the mess. Maybe I'm overstating it, but that's a common situation, isn't it?
>> Yeah, this is exactly what happened. Right. So we had a bottleneck, we had those central teams, there was always a bit of tension. Analytics teams in those business domains like marketing, supply chain, finance, HR, and so on really started to build their own data solutions. At some point you have to get the ball rolling, right, and then continue the trajectory, which meant that the data pipelines didn't meet the engineering standards, and there was an increased need for maintenance and support from central teams. Hence, over time, the knowledge about those pipelines and how to maintain a particular infrastructure, for example, left the company, such that most of those data assets and data sets turned into a huge debt, with decreasing data quality, decreasing trust, and decreasing transparency. And this was an increasing challenge, where the majority of time was spent in meeting rooms aligning on data quality, for example. >> Yeah. And the point you were making, Christoph, about context switching, and this is a point that Zhamak makes quite often: we've contextualized our operational systems like our sales systems, our marketing systems, but not our data systems. So you're asking the data team, okay, be an expert in sales, be an expert in marketing, be an expert in logistics, be an expert in supply chain, and it's start, stop, start, stop. It's a paper-cut environment, and it's just not as productive. But the flip side of that is when you think about a centralized organization, you think, hey, this is going to be a very efficient way, a cross-functional team to support the organization. But it's not necessarily the highest-velocity, most effective organizational structure. >> Yeah, so I agree with that piece, up to a certain scale. A centralized function has a lot of advantages, right? There's a single place for everyone to go to, a dedicated expert team. However, if you actually would like to accelerate, especially given this type of growth, you want to give autonomy to certain teams and move the data to the experts in those teams. And this, as you have mentioned, increases the mental load. You can either internally start splitting your team into different kinds of sub-teams focusing on different areas; however, that is then again just adding another piece where collaboration needs to happen, because the interfaces still exist. So why not bridge that gap immediately and actually move these teams end to end into the functions themselves? So maybe just to continue what Clemence was saying, this is actually where Clemence's and my journey started to become one joint journey. Clemence was coming from one of those teams that built their own solutions, and I was basically heading the platform team, called the data warehouse team in those days. And in 2019, when this became more and more serious, I would say, the leadership of the company came together and identified data as a key strategic asset. And what we mean by that is that if we leverage it in an appropriate way, it gives us a unique competitive advantage, which can help us to support and actually fully automate our decision-making process across the entire value chain.
So what we are trying to do now, or what we are aiming for, is that HelloFresh is able to build data products that have a purpose. We're moving away from the idea that data is just a byproduct. There is a purpose for why we collect this data; there's a clear business need behind it. And because it's so important for the company as a business, we also want to provide it as a trustworthy asset to the rest of the organization, ideally with the best customer experience, but at least in a way that users can easily discover, understand, and securely access high-quality data. >> Yeah. And Clemence, when you see Zhamak's writing, you see she has the four pillars and the principles. As practitioners, you look at that and say, okay, hey, that's pretty good thinking, and now we have to apply it. And that's where the devil meets the details. So there's decentralized data ownership; data as a product, which we'll talk about a little bit; self-serve, which you guys have spent a lot of time on; and, Clemence, your wheelhouse, which is governance and a federated governance model. And it's almost like if you achieve the first two, then you have to solve for the second two; it almost creates new challenges. But maybe you could talk about that a little bit as to how it relates to HelloFresh. >> Yes. So Christoph mentioned that we identified this challenge beforehand and asked, how can we actually decentralize and empower our different colleagues? And we realized that it was more of an organizational or cultural change. This is something that, I think, ThoughtWorks mentioned in one of the white papers: it's more of an organizational or cultural impact. And we kicked off a phased reorganization, different phases that we're currently still in the middle of, trying to unlock this data at scale. The idea was really moving away from ever-growing, complex matrix organizations or matrix setups and splitting between two different things. One is value creation: basically, when people ask the question, what can we actually do, what should we do, this is value creation. The other is the how, which is capability building, and both are equal in authority. This actually creates a strong need for collaboration, and this collaboration breaks up the different silos that were built. And of course, this also includes different staffing needs for teams, staffing with more, let's say, data scientists or data engineers, data professionals, in those business domains; hence, some more capability building. >> Okay, go ahead. Sorry. >> So back to Zhamak Dehghani. The idea also crossed over when she published her papers in May 2019, and we thought, well, the four pillars that she described, around decentralized data ownership, a data-as-a-product mindset, a self-service infrastructure, and, as you mentioned, federated computational governance, aligned very much with our thinking at that point in time to reorganize the different teams. And that led to not only an organizational restructure, but also a completely new approach to how we need to manage data. >> Got it. Okay. So your business is exploding.
The data team was having to become domain experts in many areas, constantly context switching, as we said, and people started to take things into their own hands. So again, we said, classic story. But you didn't let it get out of control, and that's important. And so we actually have a picture of kind of where you're going today and how it's evolved into this. Pat, if you could bring up the picture with the elephant, here we go. So I'll talk a little bit about the architecture. It doesn't show the spreadsheet era here, but Christoph, maybe you could talk about that. It does show the Hadoop monolith, which exists today; I think that's in a managed hosting service, but you've preserved that piece of it. But if I understand it correctly, everything is evolving to the cloud; I think you're running a lot of this, or all of it, in AWS. Everybody's got their own data sources. You've got a data hub, which I think is enabled by a master catalog for discovery, and all this underlying technical infrastructure that is really not the focus of this conversation today. But the key here, if I understand correctly, is that these domains are autonomous, and that not only required technical thinking, but really a supportive organizational mindset, which we're going to talk about today. But Christoph, maybe you could address, at a high level, some of the architectural evolution that you guys went through. >> Yeah, sure. Maybe it's also a good summary of the entire history. So as you have mentioned, we started in the very beginning with a monolith on the operational plane. Actually, it wasn't just one monolith, it was two: one for the backend and one for the front end. And our analytical plane was essentially a couple of spreadsheets. And I think there's nothing wrong with spreadsheets: they allow you to store information, to transform data, to share this information, to visualize this data. But it's all in one tool, and it's not actually separating concerns, right? And this means that it's obviously not scalable; you reach the point where this kind of data management setup in one tool reaches its limits. So what we did then is we created our data lake, as we have seen here, on Hadoop, which in the very beginning actually very much reflected our operational model. On top of that, we used Impala as a data warehouse, but there was not really a distinction between what was our data warehouse and what was our data lake, as Impala was used as the engine for both, to create the warehouse on the data lake construct itself. And this organic growth led to a situation, as I think is clear now, where we had a centralized model, and for all the domains there were really loose Kimball modeling standards and no uniformity. We built, in-house, a base of materialized views that we used for the presentation layer. There was a lot of duplication of effort, and in the end, essentially no feedback loop that helped us improve what we had built, and, as you said, a lack of trust. And this basically was the starting point for us to understand, okay, how can we move away from this? And there are a lot of different things that we can discuss, apart from this organizational structure that we have set up here; we have the three or four pillars from Zhamak.
However, there's also the next question around, how do we implement data products, right? What are the implications on that level? And that's something we are currently still working through. >> Got it. Okay. So I wonder if we could switch gears a little bit and talk about the organizational and cultural challenges that you faced. What were those conversations like? Let's dig into that a little bit. I want to get into governance as well. >> The conversations on the cultural change. I mean, yes, we went through hyper-growth over the last years, and obviously there were a lot of new joiners, a lot of different, very smart people joining the company, which meant that collaboration got a bit more difficult. Of course, there are the time zone differences, and different artifacts and recreated documentation flying around. So we basically had to rebuild parts of the company from scratch, right? And of course this always produced the tension which I described before. But the most important part here is that data has always been a very important factor at HelloFresh, and we collected more of this data and continued to use it to improve the different key areas of our business. Even through organizational struggles, like the central-team struggles, data somehow always helped us to grow through this kind of change. In the end, those decentralized teams in our local geographies started with solutions that served the business, which was very, very important; otherwise, we wouldn't be where we are today. But they did violate best practices and standards. And I always use the sports analogy, Dave. Like in any sport, there are different rules and regulations that need to be followed. These rules are defined by, I'll call it, the sports association, and this is what you can think of as our data governance and our compliance team. Now we add the players, who need to follow those rules and abide by them; this is what we then call data management. Now, the different players, the professionals, also need to be trained and understand the strategy and the rules before they can play; this is what I then call data literacy. So we realized that we need to focus on helping our teams develop those capabilities and teach the standards for how work is done, to truly drive functional excellence in the different domains. And one ambition of our data literacy program, for example, is to really empower every employee at HelloFresh, everyone, to make the right data-informed decisions by providing data education that scales. And that can be different things, like including data capabilities in the learning paths, for example, right? So help them to create and deploy data products, connect data producers and data consumers, and create a common sense and more understanding of each other's dependencies, which is important. For example, SLAs, SLOs, data contracts, et cetera: people get more of a sense of ownership and responsibility. Of course, we have to define what that means, what ownership means, what responsibility means, but we are teaching this to our colleagues via individual learning paths and helping them upskill to use our shared infrastructure and those self-service data applications. And to summarize, we are still in this process of learning. We're still learning as well.
So learning never stops at HelloFresh, but we are really trying to make it as much fun as possible. And in the end, we all know user behavior is changed through positive experience. So instead of having massive training programs, endless courses and workshops, leaving our new joiners and colleagues confused and overwhelmed, we're applying gamification, right? We split it into different levels of certification where our colleagues can earn badges along the way, which simplifies the process of learning and increases the engagement of the users. And this is what we see in surveys, for example, where our employees value this gamification approach a lot and are even competing to collect those learning badges to become number one on the leaderboard. >> I love the gamification. I mean, we've seen it work so well in so many different industries, not the least of which is crypto. So you identified some of the process gaps that you saw, and you didn't just gloss over them. Sometimes, as I say, people just pave the cow path. You didn't try to force a new architecture into the legacy processes; you really had to rethink your approach to data management. So what did that entail? >> To rethink the way of data management, 100%. If I take the example of the industrial revolution or the classical supply chain revolution: just imagine that you have been riding a horse your whole life, and suddenly you can operate a car, you suddenly receive a completely new way of transporting assets from A to B. So we needed to establish a new set of cross-functional business processes to run faster, drive faster, more robustly, and deliver data products which can be trusted and used by downstream processes and systems. Hence we had a set of new standards and new procedures that fall into internal data governance and compliance. With internal, I'm always referring to the data operations around new things like the data catalog: how to identify ownership, how to change ownership, how to certify data assets, everything around classical software development, which we now apply to data. This is some old and new thinking, right? Deployment, versioning, QA, all the different things, ingestion policies, deletion procedures: all the things that software development has been doing, we now do with data as well. In simple terms, it's a whole redesign of the supply chain of our data, with new procedures and new processes in asset creation, asset management, and asset consumption.
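As a rough illustration of that software-delivery discipline applied to data, here is a minimal, hypothetical publish step that refuses to release a new dataset version that fails its QA checks; the dataset, checks, and path layout are invented for illustration and are not HelloFresh's actual tooling.

```python
# Hypothetical sketch: versioning plus QA gates applied to a dataset release,
# mirroring deployment/QA practices from software development.
import pandas as pd

def qa_checks(df: pd.DataFrame) -> list:
    """Ingestion-policy style checks; returns a list of violations."""
    errors = []
    if df.empty:
        errors.append("dataset is empty")
    if df["customer_id"].isna().any():
        errors.append("null keys violate the ingestion policy")
    return errors

def publish(df: pd.DataFrame, name: str, version: int) -> str:
    """Release a new, immutable dataset version only if QA passes."""
    violations = qa_checks(df)
    if violations:
        raise ValueError(f"refusing to publish {name}: {violations}")
    path = f"/data/{name}/v{version}/part-0.parquet"  # versioned, immutable
    # df.to_parquet(path)  # uncomment where the target location exists
    return path

orders = pd.DataFrame({"customer_id": [1, 2], "meals": [3, 5]})
print(publish(orders, "orders_daily", version=2))
```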
>> So data's become kind of the new development kit, if you will. I want to shift gears and talk about the notion of a data product, and we have a slide that we pulled from your deck that I'd like to unpack a little bit. I'll just read it, if you can bring that up: a data product is a product whose primary objective is to leverage data to solve customer problems, where customers are both internal and external. So, pretty straightforward. I know you've gone much deeper in your thinking and into your organization, but how do you think about that, and how do you determine, for instance, who owns what? How did you get everybody to agree? >> I can take that one. Maybe let me start with what a data product is. I think that's an ongoing debate, right? And I think the debate itself is the important piece here: you have the debate, you clarify what you actually mean by a product, and what the mindset actually is. So just from a definition perspective, I think the common denominator we find is to say, okay, a data product is something which is important for the company and comes with value. What do we mean by that? Okay, it's a solution to a customer problem that delivers ideally maximum value to the business, and, yes, leverages the power of data. And we have a couple of examples at HelloFresh: the historical and classical ones around dashboards, for example, to monitor our error rates, but also more sophisticated ones, for example incorporating machine learning algorithms in our recipe recommendations. However, I think the important aspects of a data product are, first, that there is an owner, right? There's someone accountable for making sure that the product is actually served and maintained, and there's someone making sure that it actually keeps delivering the value that we are promising. That's combined with the idea of proper documentation, like a product description, so people understand how to use it and what it is about. And related to that piece is the idea that there's a purpose, right? We need to ask ourselves, okay, why does this thing exist? Does it provide the value that we think it does? Then it leads into a good understanding of the life cycle of the data product. From the creation onward, you need to collect feedback, learn from it, rework, and finally also think about, okay, when is it time to decommission that piece. So overall, I think the core of this data product is product thinking 101, right? The starting point needs to be the problem, not the solution. And this is essentially what was missing, what brought us to this kind of data spaghetti that we had built: in a rush, essentially, certain data assets were developed in isolation, and we continuously patched the solution just to fulfill the ad hoc requests that we got, without really understanding what the stakeholder needs. And the interesting piece is that this results in duplication of effort, which is not just frustrating and probably not the most efficient way for the company to work; if I build the same data assets with slightly different assumptions across the company and multiple teams, that leads to data inconsistency. And imagine the following scenario: from a management perspective, you're asking a specific question, and you get, from a couple of different teams, different kinds of graphs, different kinds of data and numbers, and in the end, you do not know which ones to trust. You do not know whether what you're observing is noise, or whether there's actually the signal you're looking for. And the same if I'm running an AB test: I have a new feature, and I would like to understand the business impact of this feature. I run that with a specific source, and, in an unfortunate scenario, your production system is actually running on a different source. You see different numbers, and what you have seen in the AB test is not what you then see in production. A typical thing.
Then you ask some analytics team to do a deep dive, to understand where the discrepancies are coming from, and, worst-case scenario, again there's a different kind of source. So in the end, it's a pretty frustrating scenario, and it's actually a waste of the time of the people who have to identify the root cause of this type of divergence. So in a nutshell, the highest degree of consistency is achieved if people are just reusing data assets. And, as in the meetup talk we've given, right, we started trying to establish this approach with AB testing. So we have a team, associated with the business teams, that owns their target metric, and they're providing that as a product to other services, including the AB testing team. The AB testing team can use this information via an interface and say, okay, I'm drawing information from the metadata of an experiment, and in the end, after the assignment, after this data collection phase, they can easily add a graph to a dashboard, just grouped by the AB testing variant. And we have seen that also at other companies, so it's not just a nice dream that we have, right? I have seen at other companies, for example in search, a complete KPI pipeline established that computed all this information, hosted by the owning team, and used both for AB testing deep dives and for regular reporting. So, just one last point on why I'm coming back to that: it requires that we are treating this data as a product, right? If we want to have multiple people using the thing that I am owning and building, we have to provide it as a trustworthy asset, and in a way that it's easy for people to discover and to actually work with.
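Here is a minimal sketch of the metric-reuse idea Christoph describes, with invented table and column names: the AB-testing side joins its experiment assignments against a conversion metric served as a product by the owning team, so the experiment reads the same number the business reports on.

```python
# Hypothetical sketch of metric reuse across teams; the data is invented.
import pandas as pd

# Experiment assignments, owned by the AB-testing team.
assignments = pd.DataFrame({
    "customer_id": [1, 2, 3, 4],
    "variant": ["control", "treatment", "control", "treatment"],
})

# Conversion metric, served as a data product by the owning domain team.
conversions = pd.DataFrame({
    "customer_id": [1, 2, 3, 4],
    "converted": [0, 1, 1, 1],
})

# Because both sides consume the same metric definition, the number the
# experimenters see per variant is the same one the business reports.
per_variant = (
    assignments.merge(conversions, on="customer_id")
    .groupby("variant")["converted"]
    .mean()
)
print(per_variant)
```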
>> Yeah. And coming back to that, this is why I get so excited about data mesh, because I really do think it's the right direction for organizations. When people hear data product, they think, "Well, what does that mean?" But then when you start to define it as you did, it's using data to add value. That could be cutting costs, that could be generating revenue, it could be actually directly creating a product that you monetize. So it's sort of in the eyes of the beholder. But I think the other point, and you made it earlier on too, is, again, context. When you have a centralized data team and you have all these P&L managers, a lot of times they'll question the data because they don't own it. They're like, "Well, wait a minute." If it doesn't agree with their agenda, they'll attack the data. But if they own the data, then they're responsible for defending it, and that is a mindset change that's really important. And I'm curious how you got to that ownership. Was it top-down, or was somebody providing leadership? Was it more organic, bottom-up? Was it a sort of combination? How did you decide who owned what? In other words, how did you get the business to take ownership of the data, and what does owning the data actually mean? >> That's a very good question, Dave. I think that's one of the pieces where we have a lot of learnings, and if you ask me how we would start over, I think that would be the first piece: really think about how that should be approached. If a team has ownership, that means the team has the responsibility to host and serve the data assets to minimum acceptable standards, with minimum dependencies up and downstream. The interesting piece, looking backwards, is that under that definition, the process we had to go through was in most cases not actually transferring ownership from a central team to the other teams, but establishing ownership in the first place. I make this distinction because saying we have to transfer ownership would erroneously suggest that the data set was owned before. The platform team, yes, they had the capability to make changes, but it was the analytics teams and the business who understood the use cases, and no one had actually bought into owning the assets in the way that was expected. So we had to go through this very lengthy process of establishing ownership. How we did that: in the beginning, very naively, we started with a document listing all the data assets and who was probably the nearest neighbor who could take care of each one, and then we moved them over. But the problem is that all these things were kind of technical debt, right? They were not really properly documented, pretty unstable, built in a very inconsistent way over the years, and the people who built them had already left the company. So this is actually not a nice thing to inherit, and people build up a certain resistance, even if they have actually bought into the idea of domain ownership. So if you ask me, the learning is that what needs to happen first is that the company really understands what its core business concepts are. We need a mapping from those core business concepts to the domain teams who own them, and then actually link that to the assets and integrate it better: to support understanding how the data assets can evolve and how new things get built in the domains, but also how we can address the reduction of technical debt and stabilize what we already have. >> Thank you for that, Christoph. So I want to turn direction here and talk to Clemence about governance, and I know that's an area you're passionate about. I pulled this slide from your deck, which I kind of messed up a little bit, sorry for that. But, by the way, we're going to publish a link to the full video that you guys did, so we'll share that with folks. It's one of the most challenging aspects of data mesh: if you're going to decentralize, you quickly realize this could be the wild west, as we talked about, all over again. So how are you approaching governance? There's a lot of items on this slide that underscore the complexity, whether it's privacy, compliance, et cetera. So how did you approach this? >> Yeah, it's about connecting those dots, right? The aim of the data governance program is to promote the autonomy of every team while still ensuring that everybody has the right interoperability. When we want to move from the wild west, riding horses, to a civilized way of transport, I can take the example of modern street traffic: all participants can maneuver independently, and as long as they follow the same rules and standards, everybody remains compatible with each other and can understand and learn from each other, so we can avoid car crashes.
So when I go from country to country, I understand what the street infrastructure means, how to drive my car, and I can read the traffic lights and the different signals. Likewise, as a business, at HelloFresh we operate autonomously and consequently need to follow the external and internal rules and standards set forth by the jurisdictions in which we operate. In order to prevent a car crash, we need to at least ensure compliance with regulations, to account for society's and our customers' increasing concern with data protection and privacy. Teaching, advocating, and evangelizing this to everyone in the company was a key communication strategy. And the same goes for internal regulations and processes, to help our colleagues adapt to this very new environment. When I mentioned before the new way of thinking, the new way of dealing with and managing data, this of course implies that we need new processes and regulations for our colleagues as well. In a nutshell, this means that data governance provides a framework for managing our people, the processes, the technology, and the culture around our data traffic. And all of that must come together to have an effective program. Providing at least a common denominator is especially critical for shared data sets, which we manage across our different geographies, and for shared applications on shared infrastructure, as they are consumed by centralized processes, for example master data, and all the metrics and KPIs, which are also used for central steering. It's a big change, right? And our ultimate goal is to have non-invasive, federated, automated, and computational governance. And for that, we can't just talk about it; we actually have to go deep, use case by use case, PoC by PoC, and generate learnings with the different teams. This would be a classical approach of identifying the target state, matching it with the current state together with the business teams, with the different domains, and doing a risk assessment, for example, to increase transparency, because a lot of teams might not even know what kind of situation they might be in. And this is where the training and this piece of data literacy come into place, where we go in and train based on the findings, based on the most valuable use cases. And based on that, we help our teams make this change and increase their capabilities. It's a little bit more than, I wouldn't say hand-holding, but a lot of guidance. >> Can I quickly chime in on that? There's a lot to the governance piece, but I think this is important: if you're talking about documentation, for example, yes, we can go from team to team and tell these people, hey, you have to document your data assets in the data catalog, or you have to establish a data contract, and so on and so forth. But if we would like to build data products at scale, following actual governance, we need to think about automation, right? We need to think about a lot of things that we can learn from engineering, and it starts with simple things. If we would like to build up trust in our data products and actually want to apply the same rigor and best practices that we know from engineering, there are things that we can do, and we should probably think about what we can copy.
One example might be SLAs, SLOs, and SLIs: service level agreements, service level objectives, and service level indicators. On an engineering level, they represent the promises we make to our customers and consumers, the internal objectives that help us keep those promises, and the indicators that track how we are actually doing against them. And this is just one example of where the federated computational governance comes into play, right? In an ideal world, you should not just talk about data as a product, but also data product as code. That is to say, as much as possible, give the engineers the tools that they are familiar with, and don't ask the product managers, for example, to document the data assets in the data catalog by hand, but make it part of the configuration of a CI/CD continuous delivery pipeline, as we typically see in other engineering tasks and services. Say, okay, there is configuration: we can think about PII, we can think about data quality monitoring, we can think about the ingestion, the data catalog, and so on and so forth. Ideally, data products become a sort of template that can be deployed and is rejected or verified at build time, before we actually make them and deploy them to production. >> Yeah, so it's like DevOps for data products. So I'm envisioning almost a three-phase approach to governance, and it sounds like you're in the early phase of it, call it phase zero, where there's learning, there's literacy, there's training and education, there's kind of self-governance, and then there's some oversight and a lot of manual stuff going on. Then you try to build processes at this phase, and then you codify it, and then you can automate it. Is that fair? >> Yeah, I would rather think about automation as early as possible, in a way. Yes, there need to be certain rules, but then actually start use case by use case: is there any small piece that we can already automate? If possible, roll that out, and then extend step by step. >> Is there a role, though, that adjudicates that? Is there a central, you know, chief data officer who's responsible for making sure people are complying, or how do you handle it? >> I mean, from a platform perspective, yes, it's on us to implement certain pieces that we say are important and would like to implement. However, that works very closely with the governance department; it's Clemence's piece to understand and define the policies that need to be implemented. >> So good. So Clemence, essentially it's your responsibility to make sure that the policy is being followed. And then, as you were saying, Christoph, you want to compress the time to automation as fast as possible. Is that, is that... >> Yeah, what needs to be really clear is that it's always a split effort, right? You can't just do one or the other thing; some of it really goes hand in hand, because for the right information, for the right engineering tooling, we need to have the transparency first. I mean, the code needs to be coded, so we kind of need to operate on the same level, with the right understanding. So there are actually two things that are important: one is policies and guidelines, but equally important is to align with the end users, the tech teams, and engineering, and really bridge between the business teams and the engineering teams.
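Christoph's point about treating a data product as code lends itself to a concrete illustration. Below is a small, hypothetical sketch, not HelloFresh's actual template, of a declarative product descriptor that a CI step could verify or reject at build time; every field name and rule here is invented.

```python
# Hypothetical sketch of "data product as code": a descriptor that a CI
# pipeline validates before anything is deployed to production.
from dataclasses import dataclass, field

@dataclass
class DataProductDescriptor:
    name: str
    owner: str                     # accountable team, not an individual
    purpose: str                   # why the product exists
    contains_pii: bool
    freshness_slo_hours: int       # promise made to downstream consumers
    tags: list = field(default_factory=list)

def validate(descriptor: DataProductDescriptor) -> list:
    """Return a list of build-time violations; empty means deployable."""
    errors = []
    if not descriptor.owner:
        errors.append("every data product needs an accountable owner")
    if not descriptor.purpose:
        errors.append("a product without a purpose should not exist")
    if descriptor.contains_pii and "pii" not in descriptor.tags:
        errors.append("PII products must be tagged for governance tooling")
    if descriptor.freshness_slo_hours <= 0:
        errors.append("declare a positive freshness SLO")
    return errors

product = DataProductDescriptor(
    name="recipe_conversion_daily",
    owner="team-growth-analytics",
    purpose="Daily conversion KPI consumed by reporting and AB testing",
    contains_pii=False,
    freshness_slo_hours=24,
)
assert validate(product) == []   # a CI step would fail the build otherwise
```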
>> Got it. So just a couple more questions, because we've got to wrap up. I want to talk a little bit about the business outcome. I know it's hard to quantify, and I'll talk about that in a moment, but first, major learnings. We've got some of the challenges that you cited; I'll just put them up here. We don't have to go into detail on this, but I just wanted to share it with folks. My question, and this is the advice-for-your-peers question: if you had to do it differently, if you had a do-over or a mulligan, as we like to say for you golfers, what would you do differently? >> I mean, can we start with the transformational challenge? Understanding that it also comes with a high load of cultural change. I think this is important: a particular communication strategy needs to be put into place, and people really need to be supported, right? It's not enough that we go in and say, well, we have to change towards data mesh. Naturally, it's human nature to be resistant to change, right, and change is uncomfortable. So we need to take that away with training and communication. Chris, you might want to add something to that. >> Definitely. I think the point that I've also made before: we need to acknowledge that data mesh is an architecture at scale, right? It's something for large companies that are building products at scale. I mean, Dave, you mentioned that, right? There are a lot of advantages to having a centralized team, but at some point it may make sense to decentralize. And at this point, if you think about data mesh, you have to recognize that you're not building on a green field. And I think there's a big learning, which is also reflected on the slide: don't underestimate your baggage. Typically, you come to a point where the old model doesn't work anymore; at HelloFresh, right, we lost the trust in our data, and we saw real risks of slowing down our innovation. That was what triggered the need to actually change something. And this transition implies that we had a lot of technical debt accumulated over the years. I think what we have learned is that we potentially decentralized some assets too early, without actually taking the maturity of the teams into account. We are investigating that too, and are now in the phase of correcting pieces of it, right? But if you start from scratch, you have to understand, okay, are all my teams actually ready to take on this new capability? And you have to make sure that with this decentralization, you build up these capabilities in the teams and, as Clemence has mentioned, make sure that you take the people on your journey. I think these are the pieces. It also comes with a knowledge gap, right? We need to think about hiring, literacy, and the technical debt I just talked about.
And I think the last piece that I would add now, which is not here on the slide deck, is also, from our perspective: we started on the analytical layer because that was kind of where things were exploding, right — that is where people feel the pain. But through a lot of the efforts we have started, to actually modernize the current state of data products towards data mesh, we've understood that it always comes down, basically, to a proper shape of our operational plane. And I think we went through a lot of pain, but the learning here is that this really needs to be a commitment from the company — it needs to be end to end. >> I think that last point you made is so critical, because I hear a lot from the vendor community about how they're going to make analytics better, and that's not unimportant, but true data product thinking and decentralized data organizations really have to operationalize in order to scale. So these decisions around data architecture and organization are fundamental and lasting; it's not necessarily about an individual project ROI. There are going to be projects and sub-projects, you know, within this architecture, but the architectural decision itself is organizational, it's cultural, and it's about what the best approach is to support your business at scale. It really speaks to who you are as a company, how you operate, and getting that right, as we've seen in the success of data-driven companies, yields tremendous results. So I'll ask each of you to give us your final thoughts, and then we'll wrap. Maybe-- >> Can I quickly jump in on that piece you mentioned, the target architecture? If you talk about these pieces, right, people often have this picture in mind: okay, there are different kinds of stages — we have an ingestion layer, we have a storage layer, a transformation layer, a presentation layer — and then we basically put a lot of technology on top of that, and that's kind of our target architecture. However, I think what we really need to make sure of is that we have these different kinds of views, right? We need to understand what the capabilities are that we need and want, how it looks and feels from the different personas' and experience views, and then finally that should lead to the target architecture from a technical perspective. Maybe just to give an outlook on what we are planning to do, on how we want to move that forward: it's actually based on our strategy, in the sense that we would like to increase the data maturity as a whole across the entire company. And this is kind of a framework around the business strategy, and it breaks down into four pillars as well. People, meaning the data culture, data literacy, data organizational structure and so on. Then governance — as Clemence actually mentioned, right — compliance, governance, data management and so on. Then technology — and I think we could talk for hours about that one — it's around the data platform, the data science platform. And then finally, enablement through data, meaning we need to understand data quality, data accessibility, applied science and data monetization. >> Great. Thank you, Christoph. Clemence, why don't you bring us home? Give us your final thoughts. >> Okay.
I can just agree with Christoph that what's important is to understand what kind of maturity people have — what maturity level the company, the people, the organization is at — and to really understand what kind of change applies to those four pillars, for example, what needs to be tackled first. And this is not very clear from the very beginning. It's kind of a green field: you come up with must-wins, you come up with things that you really want to do, out of theory and out of different white papers. Only if you really start conducting the first initiatives do you understand whether you have put those thoughts together correctly, and where you miss out on one of those four different pillars — people, process, technology and governance. And then, through iteration — doing step by step, small steps by small steps, not boiling the ocean — you're really capable of identifying the gaps and seeing where you can either fill the gaps, or where you have to increase maturity first and train people or improve your tech stack. >> You know, HelloFresh is an excellent example of a company that is innovating that was not born in Silicon Valley, which I love. It's a global company. And I've got to ask you guys, it seems like it's just an amazing place to work. Are you guys hiring? >> Yes, definitely, we are. As mentioned, right, we are distributed and actually hiring across the entire company, specifically for data. There are a lot of open roles, so yes, please visit our page — from data engineering to data product management — and Clemence has a lot of roles that you can speak to him about. But yes. >> Guys, thanks so much for sharing with theCUBE audience. You're pioneers, and we look forward to collaborations in the future to track progress, and really want to thank you for your time. >> Thank you very much. >> Thank you very much, Dave. >> And thank you for watching theCUBE's startup showcase, made possible by AWS. This is Dave Vellante. We'll see you next time. (cheerful music)
Mai Lan Tomsen Bukovec | AWS Storage Day 2021
(pensive music) >> Thank you, Jenna, it's great to see you guys, and thank you for watching theCUBE's continuous coverage of AWS Storage Day. We're here at The Spheres — it's an amazing venue. My name is Dave Vellante. I'm here with Mai-Lan Tomsen Bukovec, who's Vice President of Block and Object Storage. Mai-Lan, always a pleasure to see you. Thanks for coming on. >> Nice to see you, Dave. >> It's pretty crazy, you know, this is kind of a hybrid event. We were in Barcelona a while ago — big hybrid event. And now, you know, it's hard to tell; it's almost like day to day what's happening with COVID, and some things are permanent. I think a lot of things are becoming permanent. What are you seeing out there when you talk to customers? How are they thinking about their business, building resiliency and agility into their business, in the context of COVID and beyond? >> Well, Dave, I think what we've learned today is that this is a new normal. These fluctuations that companies are having in supply and demand, in all industries, all over the world — that's the new normal. And that is what has driven so much more adoption of cloud in the last 12 to 18 months. And we're going to continue to see that rapid migration to the cloud, because companies now know that in the course of days and months, the whole world of your expectations — of where your business is going and what your customers are going to do — can change. And that can change not just for a year, but maybe longer than that. That's the new normal. And I think companies are realizing it, and our AWS customers are seeing how important it is to accelerate moving everything to the cloud, to continue to adapt to this new normal. >> So storage historically has been: I'm going to drop a box off at the loading dock and, you know, have a nice day. And then maybe the services team is involved in a more intimate way. But you're involved every day. So I'm curious as to what that permanence, that new normal — some people call it the new abnormal, but it's the new normal now — what does that mean for storage? >> Dave, in the course of us sitting here over the next few minutes, we're going to have dozens of deployments go out all across our AWS storage services. That means our customers that are using our file services, our transfer services, our block and object services — they're all getting improvements as we sit here and talk. That is such a fundamentally different model than the one you talked about, where the appliance gets dropped off at the loading dock, it takes a couple of months for it to get scheduled for setup, and then you have to do a data migration to get the data onto the new appliance. Meanwhile, we're sitting here, and customers' storage is just improving — under the hood, and in major announcements like what we're doing today. >> So take us through — let's go back, 'cause I remember vividly when S3 was announced. That launched this cloud era, and people would do a lot of experimentation; we were storing, you know, maybe gigabytes, maybe even some terabytes back then, and that's evolved. What are you seeing in terms of how people are using data? What are the patterns that you're seeing today? How is that different than maybe 10 years ago? >> I think what's really unique about AWS is that we are the only provider that has been operating at scale for 15 years.
And what that means is that we have customers of all sizes — terabytes, petabytes, exabytes — that are running their storage on AWS and running their applications using that storage. And so we have this really unique position of being able to observe and work with customers to develop what they need for storage. And it really breaks down to three main patterns. The first one is what I call the crown jewels — the crown jewels in the cloud. And that pattern is adopted by customers who are looking at the core mission of their business and saying to themselves, I actually can't scale this core mission on premises. And they're choosing to go to the cloud on the most important thing that their business does, because they must — they have to. A great example of that is FINRA, the regulatory body of the US stock exchanges, where, you know, a number of years ago they took a look at all the data silos that were popping up across their data centers, they were looking at the rate of stock transactions going up, and they said, we just can't keep up — not if we want to follow the mission of being the watchdog for consumers, for stock transactions. And so they moved that crown jewel of their application to AWS. And what's really interesting, Dave, is — as you know, 'cause you've talked to many different companies — it's not technology that stops people from moving to the cloud as quickly as they want to; it's culture, it's people, it's processes, it's how businesses work. And when you move the crown jewels into the cloud, you accelerate that cultural change, and that's certainly what FINRA saw. The second thing we see is where a company will pick a few cloud pilots. They'll take a couple of applications — maybe one, or several across the organization — and move those as sort of a reference implementation to the cloud, and then the goal is to try to get the people who did that to generalize all the learning across the company. That is actually a really slow way to change culture, because, as many of us know, in large organizations you have some resistance from other organizations to changing culture. And so the cloud pilot, while it seems like it would work — it seems logical — is actually counterproductive for a lot of companies that want to move quickly to the cloud. And the third example is what I think of as new applications, or cloud first — net new. That pattern is where a company or a startup says all new technology initiatives are on the cloud. And we see that for companies like McDonald's, which has transformed their drive-up experience by dynamically looking at location orders and providing recommendations. And we see it in the Digital Athlete, which is what the NFL has put together to dynamically take data sources and build models that help them programmatically simulate risks to player health, and put in place ways to predict and prevent them.
And so that seems to me to be a very sensible way to approach that blocker, if you will. What are your thoughts on that? >> I think you're right, Dave. I think what it does is it allows a company to see the ideas and the technology and the cultural change of cloud in different parts of the organization. And so rather than having one group that's supposed to generalize it across an organization, you get it decentralized and adopted by different groups, and the culture change just goes faster. >> So, you bring up decentralization, and there's an emerging trend referred to as a data mesh. The term was coined by Zhamak Dehghani, a very thought-provoking individual. And the concept is basically that, you know, data is decentralized, and yet we have this tendency to sort of shove it all into one box or one container — or you could say one cloud. Well, the cloud is expanding; the cloud is decentralizing in many ways. So how do you see data mesh fitting into those patterns? >> We have customers today that are taking the data mesh architectures and implementing them with AWS services. And Dave, I want to go back to the start of Amazon: when Amazon first began, we grew because the Amazon technologies were built as microservices. Fundamentally, a data mesh is about separation, or abstraction, of what individual components do. And so if I look at data mesh, really you're talking about two things: you're talking about separating the data storage, and the characteristics of the data, from the data services that interact and operate on that storage. And with data mesh, it's all about making sure that the businesses — the decentralized business model — can work with that data. Now, our AWS customers are putting their storage in a centralized place, because it's easier to track, it's easier to view compliance, and it's easier to predict growth and control costs. But we started with building blocks, and we deliberately built our storage services separate from our data services. So we have data services like Lake Formation and Glue — we have a number of these data services that our customers are using to build that customized data mesh on top of that centralized storage. So really, at the end of the day, it's about speed, it's about innovation. It's about making sure that you can decentralize and separate your data services from your storage, so businesses can go faster. >> But that centralized storage is logically centralized. It might not be physically centralized — I mean, we put storage all over the world, >> Mai-Lan: That's correct. >> right? But to the developer, it looks like it's in one place. >> Mai-Lan: That's right. >> Right? And so that's not antithetical to the concept of a data mesh; in fact, it fits in perfectly to the point you were making. I wonder if we could talk a little bit about AWS's storage strategy. It started, of course, with S3, and that was the focus for years, and now of course EBS as well. But now we're seeing — we heard from Wayne this morning — the portfolio is expanding, the innovation is accelerating that flywheel that we always talk about. How would you characterize and how do you think about AWS's storage strategy per se? >> We are dynamically and constantly evolving our AWS storage services based on what the application and the customer want. That is fundamentally what we do every day. We talked a little bit about those deployments that are happening right now, Dave.
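As a concrete illustration of the separation Mai-Lan describes — centralized S3 storage underneath, data services layered on top — here is a hedged boto3 sketch. The bucket, database, table, column and role names are invented placeholders; the pattern is that the Glue Data Catalog describes the storage and Lake Formation grants access to it, without consumers ever touching S3 credentials directly.

```python
# A minimal sketch of "storage separate from data services" on AWS.
# All resource names are illustrative assumptions, not a real deployment.
import boto3

glue = boto3.client("glue")
lf = boto3.client("lakeformation")

# 1. Storage: the domain's data already lives under s3://analytics-lake/orders/

# 2. Data service: describe that storage in the Glue Data Catalog.
glue.create_database(DatabaseInput={"Name": "orders_domain"})
glue.create_table(
    DatabaseName="orders_domain",
    TableInput={
        "Name": "orders",
        "StorageDescriptor": {
            "Location": "s3://analytics-lake/orders/",
            "Columns": [
                {"Name": "order_id", "Type": "string"},
                {"Name": "placed_at", "Type": "timestamp"},
            ],
        },
    },
)

# 3. Governance service: grant a consuming team read access via Lake Formation,
#    so access flows through the catalog rather than raw S3 permissions.
lf.grant_permissions(
    Principal={"DataLakePrincipalIdentifier": "arn:aws:iam::111122223333:role/marketing-analysts"},
    Resource={"Table": {"DatabaseName": "orders_domain", "Name": "orders"}},
    Permissions=["SELECT"],
)
```

A data mesh implementation would repeat this per domain: each domain owns its catalog entries and grants, while the storage itself stays centrally operated.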
That is something — that idea of constant, dynamic evolution — that just can't be replicated on-premises, where you buy a box and it sits in your data center for three or more years. And what's unique about us among the cloud services is, again, that perspective of the 15 years, where we are building applications in ways that are unique, because we have more customers, and we have more customers doing more things. So, you know, I've said this before: it's all about speed of innovation, Dave; time and change wait for no one. And if you're a business and you're trying to transform your business and base it on a set of technologies that change rapidly, you have to use AWS services. I mean, if you look at some of the launches that we talked about today, and you think about S3's Multi-Region Access Points — that's a fundamental change for customers that want to store copies of their data in any number of different regions and get a 60% performance improvement, by leveraging the technology that we've built up over time, our ability to intelligently route a request across our network. That, and FSx for NetApp ONTAP — nobody else has these capabilities today. And it's because we are at the forefront of talking to different customers, and that dynamic evolution of storage is the core of our strategy. >> So Andy Jassy used to say that AWS is oftentimes misunderstood, and you're comfortable with that. So help me square this circle, 'cause you talked about things you couldn't do on-prem, and yet you mentioned the relationship with NetApp, and look at things like Outposts and Local Zones — you're actually moving the cloud out to the edge, including on-prem data centers. So how do you think about hybrid in that context? >> For us, Dave, it always comes back to what the customer's asking for. We were talking to customers, and they were talking about their edge and what they wanted to do with it, and we said, how are we going to help? And so if I just take S3 on Outposts as an example, or EBS on Outposts: you know, we have customers like Morningstar, and Morningstar wants Outposts because they are using it as a step in their journey to being on the cloud. If you take a customer like First Abu Dhabi Bank, they're using Outposts because they need data residency for their compliance requirements. And then we have other customers that are using Outposts to place the storage as close as they can to the applications, for low latency — Dish, Dish Network, as an example. All of those are customer-driven requirements for their architecture. For us, Dave, we think in the fullness of time every customer and all applications are going to be on the cloud, because it makes sense, and those businesses need that speed of innovation. But when we build things like our announcement today of FSx for NetApp ONTAP, we build them because customers asked us to help them with their journey to the cloud, just like we built S3 and EBS for Outposts for the same reason. >> Well, when you say over time you believe that all workloads will be on the cloud — but the cloud is like the universe, I mean, it's expanding. So what's not cloud in the future? When you say on the cloud, you mean wherever you meet customers with that cloud — that includes Outposts — it's the programmability of that model. Is that correct? That's it, that's what you're talking about?
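For a sense of how the Multi-Region Access Point capability looks from the API side, here is a hedged boto3 sketch. The account ID and bucket names are placeholders, and the buckets are assumed to already exist in their respective regions, with replication between them configured separately.

```python
# A sketch of the S3 Multi-Region Access Point idea: one global endpoint
# fronting bucket copies in several regions, with AWS routing each request
# over its network to a nearby copy. Identifiers below are placeholders.
import uuid
import boto3

# MRAP control-plane requests are made through the us-west-2 endpoint.
s3control = boto3.client("s3control", region_name="us-west-2")

response = s3control.create_multi_region_access_point(
    AccountId="111122223333",
    ClientToken=str(uuid.uuid4()),  # idempotency token; creation is asynchronous
    Details={
        "Name": "orders-global",
        "Regions": [  # one existing bucket per region
            {"Bucket": "orders-data-us-east-1"},
            {"Bucket": "orders-data-eu-west-1"},
        ],
    },
)
print("creation request token:", response["RequestTokenARN"])

# Once created, applications address the access point by its ARN instead of a
# regional bucket name, e.g. (alias shown is a placeholder; requires SigV4A
# signing, available via the AWS CRT integration):
#   s3.get_object(Bucket="arn:aws:s3::111122223333:accesspoint/<alias>.mrap",
#                 Key="2021/08/orders.parquet")
```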
>> That's right. In fact, for our S3 and EBS on Outposts customers, the way they look at how they use Outposts is either as part of developing applications that will eventually go to the cloud, or taking applications that are in the cloud today, in AWS regions, and running them locally. And so, as you say, this definition of the cloud is going to evolve over time. But the one thing that we know for sure is that AWS storage, and AWS in general, is going to be there, one or two steps ahead of where customers are, and deliver on what they need. >> I want to talk about block storage for a moment, if I can. You guys are making some moves in that space; we heard some announcements earlier today. Some of the hardest stuff to move — whether it's cultural, or maybe it's just hardened ops, or governance edicts, or those really hardcore mission-critical apps and workloads, whether it's SAP, Oracle, Microsoft, et cetera — you're clearly seeing that as an opportunity for your customers, and storage in some respects was a blocker previously, because of latency and so on; there are still some considerations there. How do you see those workloads eventually moving to the cloud? >> Well, they can move now. With io2 Block Express, we have the performance that those high-end applications need, and it's available today. We have customers using it, and they're very excited about that technology. And, you know, again, it goes back to what I just said, Dave: we had customers saying, I would like to move my highest-performing applications to the cloud, and this is what I need from the storage underneath them. And that's why we built io2 Block Express, and that's how we'll continue to evolve it. It is the first SAN technology in the cloud, but it's built on those core principles that we talked about a few minutes ago: dynamically evolving, with capabilities that we can add on the fly, so customers just get the benefit without the cost of migration. >> I want to ask you about how you think about storage in general, because typically it's been a bucket, you know, a container. But, as I always say, the next 10 years aren't going to be like the last. It seems like you're really in the data business, and you're bringing in machine intelligence, you're bringing in other database technology, this rich set of other services to apply to the data. There's a lot of data in the cloud now, and so we can build data products, build data services. So how do you think about the business in that sense? It's no longer just a place to store stuff; it's actually a place to accelerate innovation, and to build and monetize, for your customers. How do you think about that? >> Our customers use the word foundational. Every time they talk about storage, they say, for us, it's foundational. And Dave, that's because every business is a data business. Every business is making decisions now on a changing landscape, in a world where the new normal means you cannot predict what's going to happen in six months, in a year. And the way that they're making those smart decisions is through data. And so they're taking the data that they have in our storage services, and they're using SageMaker to build models.
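Returning for a moment to the io2 Block Express discussion above, here is a minimal, hedged sketch of provisioning a high-performance io2 volume through the EC2 API. The availability zone, size, IOPS figure and tag values are invented for illustration, and whether a given volume actually runs on the Block Express architecture depends on the volume parameters and the instance family it attaches to.

```python
# Provisioning an io2 volume with provisioned IOPS via the EC2 API.
# Values are illustrative assumptions, not a recommended configuration.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

volume = ec2.create_volume(
    AvailabilityZone="us-east-1a",
    VolumeType="io2",
    Size=4096,       # GiB
    Iops=64000,      # provisioned IOPS for a demanding database workload
    TagSpecifications=[{
        "ResourceType": "volume",
        "Tags": [{"Key": "workload", "Value": "sap-hana"}],
    }],
)
print("created volume:", volume["VolumeId"])

# Then attach it to an instance sized for that performance, e.g.:
# ec2.attach_volume(VolumeId=volume["VolumeId"],
#                   InstanceId="i-0123456789abcdef0",  # placeholder instance
#                   Device="/dev/sdf")
```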
They're using all kinds of different applications, like Lake Formation and Glue, to build some of the services that you're talking about around authorization and data discovery, to sit on top of the data. And they're able to leverage the data in a way that they have never been able to before, because they have to — that's what the business world demands today, and that's what we need in the new normal. We need the flexibility and the dynamic, foundational storage that we provide in AWS. >> And you think about the great data companies — those worth, you know, trillions in market cap — they're data companies. They put data at their core, but that doesn't mean they shove all the data into a centralized location. It means they have the identity and access capabilities, the governance capabilities, to enable data to be used wherever it needs to be used, and to build that future. These are exciting times we're entering, Mai-Lan. >> We're just at the start, Dave, we're just at the start. >> Really? What inning do you think we're in? So, how do you think about Amazon? It's not a baby anymore; it's not even an adolescent, right? You guys are obviously a major player — early adulthood, day one, day zero? (chuckles) >> Dave, we don't age ourselves. I think if I look at where we're going for AWS, we are just at the start. So many companies are moving to the cloud, but we're really just at the start. And what's really exciting for those of us who work on AWS storage is that when we build these storage services and these data services, we are seeing customers do things that they never thought they could do before. And it's just the beginning. >> I think the potential is unlimited. You mentioned Dish before — I mean, I see what they're doing in the cloud for telco. I mean, telco transformation — that's an industry; every industry has a transformation scenario, a disruption scenario. Healthcare has been so reluctant for years, and that's happening so quickly now — COVID's certainly accelerating it. Obviously financial services have been super tech-savvy, but they're looking at the fintechs saying, okay, how do we play? And then there's manufacturing, with EV. >> Mai-Lan: Government. >> Government, totally. >> It's everywhere — oil and gas. >> There isn't a single industry that's not a digital industry. >> That's right. >> And there are implications for everyone. And it's not just bits and atoms anymore — the old Negroponte line, although Nicholas, I think, was prescient, because he saw this coming. It really is fundamental: data is fundamental to every business. >> And I think, for all of those in different industries, you want to pick the provider where innovation and invention is in our DNA. And that is true not just for storage but for AWS as a whole, and that is driving a lot of the changes you have today, and really what's coming in the future. >> You're right. It's the combinatorial factors: it's not just the storage of the data, it's the ability to apply other technologies that map into your business process, that map into your organizational skill sets, that drive innovation in whatever industry you're in. It's great, Mai-Lan — awesome to see you. Thanks so much for coming on theCUBE. >> Great seeing you, Dave, take care. >> All right, you too. And keep it right there for more action. We're going to now toss it back to Jenna, Canal and Darko in the studio. Guys, over to you. (pensive music)