Is Data Mesh the Killer App for Supercloud | Supercloud2

(gentle bright music) >> Okay, welcome back to our "Supercloud 2" event live coverage here at stage performance in Palo Alto, syndicating around the world. I'm John Furrier with Dave Vellante. We've got exclusive news and a scoop here for SiliconANGLE and theCUBE. Zhamak Dehghani, creator of data mesh, has formed a new company called NextData.com, NextData. She's a CUBE alumni and contributor to our Supercloud initiative, as well as our coverage and Breaking Analysis with Dave Vellante on data, the killer app for Supercloud. Zhamak, great to see you. Thank you for coming into the studio and congratulations on your newly formed venture and continued success on the data mesh. >> Thank you so much. It's great to be here. Great to see you in person. >> Dave: Yeah, finally. >> John: Wonderful. Your contributions to the data conversation have been well documented, certainly by us and others in the industry. Data mesh taking the world by storm. Some people are debating it, throwing, you know, cold water on it. Some are thinking it's the next big thing. Tell us about the data mesh, super data apps that are emerging out of cloud. >> I mean, data mesh, as you said, it's, you know, the pain points that it surfaced were universal. Everybody said, "Oh, why didn't I think of that?" You know, it was just an obvious next step and people are approaching it, implementing it. I guess the last few years, I've been involved in many of those implementations, and I guess Supercloud is somewhat a prerequisite for it, because data mesh and building applications using data mesh is about sharing data responsibly across boundaries. And those boundaries include organizational boundaries, cloud technology boundaries, and trust boundaries. >> I want to bring that up because your venture, NextData, which is new, just formed. Tell us about that. What wave is that riding? What specifically are you targeting? What's the pain point? >> Zhamak: Absolutely, yes. 
So NextData is the result of, I suppose, the pains that I suffered from implementing data mesh for many of the organizations. Basically, a lot of organizations that I've worked with, they want decentralized data. So they really embrace this idea of decentralized ownership of the data, but yet they want interconnectivity through standard APIs, yet they want discoverability and governance. So they want to have policies implemented, they want to govern that data, they want to be able to discover that data and yet they want to decentralize it. And we do that with a developer experience that is easy and native to a generalist developer. So we try to find, I guess, the common denominator that solves those problems and enables that developer experience for data sharing. >> John: Since you just announced the news, what's been the reaction? >> Zhamak: I just announced the news right now, so what's the reaction? >> John: But people in the industry that know you, you did a lot of work in the area. What has been some of the feedback on the new venture in terms of the approach, the customers, problem? >> Yeah, so we've been in stealth mode, so we haven't publicly talked about it, but folks that have been close to us in fact have reached out. We already have implementations of our pilot platform with early customers, which is super exciting. And we're going to have multiple of those. Of course, we're a tiny, tiny company. We can't have many of those, but we are going to have multiple pilot implementations of our platform in the real world, with real global, large-scale organizations that have real-world problems. So we're not going to build our platform in a vacuum. And that's what's happening right now. >> Zhamak, when I think about your role at ThoughtWorks, you had a very wide observation space with a number of clients, helping them implement data mesh and other things as well prior to your data mesh initiative. 
But when I look at data mesh, at least the ones that I've seen, they're very narrow. I think of JPMC, I think of HelloFresh. They're generally, obviously not surprising, they don't include the big vision of inclusivity across clouds, across different data stores. But it seems like people are having to go through some gymnastics to get to, you know, the organizational reality of decentralizing data, and at least pushing data ownership to the line of business. How are you approaching, or are you approaching, solving that problem? Are you taking a narrow slice? What can you tell us about NextData? >> Zhamak: Sure, yeah, absolutely. Gymnastics, the cute word to describe what the organizations have to go through. And one of those problems is that, you know, the data, as you know, resides on different platforms. It's owned by different people, it's processed by pipelines that who knows who owns them. So there's this very disparate and disconnected set of technologies that were very useful for when we thought about data and processing as a centralized problem. But when you think about data as a decentralized problem, the cost of integration of these technologies in a cohesive developer experience is what's missing. And we want to focus on that cohesive end-to-end developer experience to share data responsibly in these autonomous units, we call them data products, I guess in data mesh, right? That constitutes computation, that governs that data policies, discoverability. So I guess, I heard this expression in the last talks that you can have your cake and eat it too. So we want people to have their cake, which is, you know, data in different places, decentralization, and eat it too, which is interconnected access to it. So we start with standardizing and codifying this idea of a data product container that encapsulates data, computation, APIs to get to it in a technology-agnostic way, in an open way. 
And then, sit on top and use existing tech, you know, Snowflake, Databricks, whatever exists, you know, the millions of dollars of investments that companies have made, sit on top of those but create this cohesive, integrated experience where data product is a first-class primitive. And that's really key here, that the language and the modeling that we use is really native to data mesh, which is that I'm building a data product, I'm sharing a data product, and that encapsulates that I'm providing metadata about this. I'm providing computation that's constantly changing the data. I'm providing the API for that. So we're trying to kind of codify and create a new developer experience based on that. And developer, both from provider side and user side, connected to peer-to-peer data sharing with data product as a primitive first-class concept. >> Okay, so the idea would be developers would build applications leveraging those data products, which are discoverable and governed. Now, today you see some companies, you know, take Snowflake for example. >> Zhamak: Yeah. >> Attempting to do that within their own little walled garden. They even, at one point, used the term, "Mesh." I don't know if they pulled back on that. And then they sort of became aware of some of your work. But a lot of the things that they're doing within their little insulated environment, you know, support that, you know, governance, they're building out an ecosystem. What's different in your vision? >> Exactly. So we realize that, you know, and this is a reality, like you go to organizations, they have Snowflake and half of the organization happily operates on Snowflake. And on the other half, "Oh, we are on, you know, bare infrastructure on AWS, or we are on Databricks." This is the reality, you know. This Supercloud that's written up here, it's about working across boundaries of technology. So we try to embrace that. 
And even for our own technology with the way we're building it, we say, "Okay, nobody's going to use NextData data mesh operating system. People will have different platforms." So you have to build with openness in mind, and in case of Snowflake, I think, you know, they have, I'm sure, very happy customers as long as customers can be on Snowflake. But once you cross that boundary of platforms then that becomes a problem. And we try to keep that in mind in our solution. >> So, it's worth reviewing that basically, the concept of data mesh is that, whether you're a data lake or a data warehouse, an S3 bucket, an Oracle database as well, they should be inclusive inside of the data. >> We did a session with AWS on the startup showcase, data as code. And remember, I wrote a blog post in 2007 called, "Data's the new developer kit." Back then, they used to call 'em developer kits, if you remember. And we said at that time, whoever can code data >> Zhamak: Yes. >> Will have a competitive advantage. >> Aren't the machines going to be doing that? Didn't we just hear that? >> Well we have, and you know, Hey Siri, hey Cube. Find me that best video for data mesh. There it is. I mean, this is the point, like what's happening is that, now, data has to be addressable >> Zhamak: Yes. >> For machines and for coding. >> Zhamak: Yes. >> Because you need to call the data. So the question is, how do you manage the complexity of big things as promiscuous as possible, making it available as well as then governing it, because it's a trade-off. The more you make open >> Zhamak: Definitely. >> The better the machine learning. >> Zhamak: Yes. >> But yet, the governance issue, so this is the, you need an OS to handle this maybe. >> Yes, well, our mental model for our platform is an OS, an operating system. 
Operating systems, you know, have shown us how you can kind of abstract what's complex and take care of, you know, a lot of complexities, but yet provide an open and, you know, dynamic enough interface. So we think about it that way. We try to solve the problem of policies living with the data, and enforcement of the policies happens at the most granular level which is, in this concept, the data product. And that would happen whether you read, write, or access a data product. But we can never imagine what these policies could be. So our thinking is, okay, we should have an open policy framework that can allow organizations to write their own policy drivers and policy definitions, and encode and encapsulate them in this data product container. But I'm not going to fool myself to say that, you know, that's going to solve the problem that you just described. I think we are in this, I don't know, if I look into my crystal ball, what I think might happen is that right now, the primitives that we work with to train a machine-learning model are still bits and bytes in data. They're fields, rows, columns, right? And that creates quite a large surface area, an attack area for, you know, for privacy of the data. So perhaps, one of the trends that we might see is this evolution of data APIs to become more and more computationally aware, to bring the compute to the data to reduce that surface area so you can really leave the control of the data to the sovereign owners of that data, right? So that data product. So I think the evolution of our data APIs perhaps will become more and more computational. So you describe what you want, and the data owner decides, you know, how to manage the- >> John: That's interesting, Dave, 'cause it's almost like we just talked about ChatGPT in the last segment with you, who's a machine learning, could really been around the industry. It's almost as if you're starting to see reason come into the data, reasoning. 
It's like you're starting to see not just metadata, but using the data to reason so that you don't have to expose the raw data. It's almost like a, I won't say curation layer, but an intelligence layer. >> Zhamak: Exactly. >> Can you share your vision on that? 'Cause that seems to be where the dots are connecting. >> Zhamak: Yes, this is perhaps further into the future because just from where we stand, we have to create still that bridge of familiarity between that future and present. So we are still in that bridge-making mode. However, by just the basic notion of saying, "I'm going to put an API in front of my data." And that API today might be as primitive as a level of indirection, as in you tell me what you want, tell me who you are, let me go process that, all the policies and lineage, and insert all of this intelligence that needs to happen. And then, today, I will still give you a file. But by just defining that API and standardizing it, now we have this amazing extension point that we can say, "Well, the next revision of this API, you don't just tell me who you are, but you actually tell me what intelligence you're after. What's the logic that I need to go and now compute on your API?" And you can kind of evolve that, right? Now you have a point of evolution to this very futuristic, I guess, future where you just describe the question that you're asking from the chat. >> Well, this is the Supercloud, Dave. >> I have a question from a fan, I got to get it in. It's George Gilbert. And so, his question is, you're blowing away the way we synchronize data from operational systems to the data stack to applications. So the concern that he has, and he wants your feedback on this, is: do the data product app devs get exposed to more complexity with respect to moving data between data products, or maybe it's attributes between data products? How do you respond to that? How do you see it, is that a problem or is that something that is overstated, or do you have an answer for that? 
>> Zhamak: Absolutely. So I think there's a sweet spot in getting data developers, data product developers closer to the app, but yet not burdening them with the complexity of the application and application logic, and yet reducing their cognitive load by localizing what they need to know about, which is that domain where they're operating within. Because what's happening right now? What's happening right now is that data engineers, a ton of empathy for them for their high threshold of pain that they can, you know, deal with, they have been centralized, they've been put into the data team, and they have been given this unbelievable task of make meaning out of data, put semantics over it, curate it, clean it, and so on. So what we are saying is that get those folks embedded into the domain closer to the application developers. These are still separately moving units. Your app and your data products are independent but yet tightly coupled with each other based on the context of the domain. So reduce cognitive load by localizing what they need to know about to the domain, get them closer to the application but yet have them separate from the app because the app provides a very different service. Transactional data for my e-commerce transaction. Data product provides a very different service, longitudinal data for the, you know, variety of this intelligent analysis that I can do on the data. But yet, it's all within the domain of e-commerce or sales or whatnot. >> So a lot of decoupling and coupling create that cohesiveness. >> Zhamak: Absolutely. >> Architecture. So I have to ask you, this is an interesting question 'cause it came up on theCUBE all last year. Back on the old server, data center days and cloud, SRE, Google coined the term, "Site Reliability Engineer" for someone to look over the hundreds of thousands of servers. We asked a question to the data engineering community who have been suffering, by the way, I agree. 
Is there an SRE-like role for data? Because in a way, data engineering, that platform engineer, they are like the SRE for data. In other words, managing the large scale to enable automation and self-service. What's your thoughts and reaction to that? >> Zhamak: Yes, exactly. So, maybe we go through that history of how SRE came to be. So we had the first DevOps movement which was, remove the wall between dev and ops and bring them together. So you have one cross-functional unit of the organization that's responsible for, you build it you run it, right? So then there is no, I'm going to just shoot my application over the wall for somebody else to manage it. So we did that, and then we said, "Okay, as we decentralized and had this many microservices running around, we had to create a layer that abstracted a lot of the complexity around running now a lot, or monitoring, observing and running a lot, while giving autonomy to this cross-functional team." And that's where the SRE, a new generation of engineers came to exist. So I think if I just look- >> Hence Borg, hence Kubernetes. >> Hence, hence, exactly. Hence chaos engineering, hence embracing the complexity and messiness, right? And putting engineering discipline to embrace that and yet give a cohesive and high integrity experience of those systems. So I think, if we look at that evolution, perhaps something like that is happening by bringing data and apps closer and making them these domain-oriented data product teams or domain-oriented cross-functional teams, full stop, and still have a very advanced, maybe at the platform infrastructure level, kind of operational team that's not busy doing two jobs, which is taking care of domains and the infrastructure, but they're building infrastructure that is embracing that complexity, interconnectivity of this data process. >> John: So you see similarities. >> Absolutely, but I feel like we're probably in a more early days of that movement. 
>> So it's a data DevOps kind of thing happening where scale's happening. Good things are happening, yet, eh, a little bit fast and loose with some complexities to clean up. >> Yes, yes. This is a different restructure. As you said we, you know, the job of this industry as a whole, of architects, is decompose, recompose, decompose, recompose in a new way, and now we're like decomposing centralized teams, recomposing them as domains and- >> John: So is data mesh the killer app for Supercloud? >> You had to do this for me. >> Dave: Sorry, I couldn't- (John and Dave laughing) >> Zhamak: What do you want me to say, Dave? >> John: Yes. >> Zhamak: Yes of course. >> I mean Supercloud, I think it's, really the terminology's Supercloud, Opencloud. But I think, in the spirit of it, this embracing of diversity and giving autonomy for people to make decisions for what's right for them and yet not lock them in. I think just embracing that is baked into how data mesh assumes the world would work. >> John: Well thank you so much for coming on Supercloud 2, really appreciate it. Data has driven this conversation. Your success of data mesh has really opened up the conversation and exposed the slow moving data industry. >> Dave: Been a great catalyst. (John laughs) >> John: That's now going well. We can move faster, so thanks for coming on. >> Thank you for hosting me. It was wonderful. >> Okay, Supercloud 2 live here in Palo Alto. Our stage performance, I'm John Furrier with Dave Vellante. We're back with more after this short break. Stay with us all day for Supercloud 2. (gentle bright music)
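
Editor's sketch: the interview describes a data product as a container that bundles data, metadata, policies, and an API, with policies enforced on every access, and an API that could evolve from "tell me who you are and I give you a file" to "tell me what you want computed." The Python below is only an illustration of that idea under stated assumptions. It is not NextData's platform or API; the names `DataProduct`, `read`, `compute`, and `owners_read_others_compute` are hypothetical and do not come from the interview.

```python
from dataclasses import dataclass, field
from typing import Any, Callable

Row = dict[str, Any]
# Hypothetical policy shape: (caller, action) -> allowed?
Policy = Callable[[str, str], bool]


@dataclass
class DataProduct:
    """Illustrative container: data, metadata, and policies travel together."""
    name: str
    owner: str
    rows: list[Row]
    metadata: dict[str, str] = field(default_factory=dict)
    policies: list[Policy] = field(default_factory=list)

    def _authorize(self, caller: str, action: str) -> None:
        # Every policy attached to the product must allow the action.
        if not all(policy(caller, action) for policy in self.policies):
            raise PermissionError(f"{caller} may not {action} {self.name}")

    def read(self, caller: str) -> list[Row]:
        """Today's style of API: say who you are, get the data back."""
        self._authorize(caller, "read")
        return [dict(row) for row in self.rows]

    def compute(self, caller: str, fn: Callable[[list[Row]], Any]) -> Any:
        """A possible later revision: send the question to the data.

        Only the result of fn is returned to the caller, not the rows.
        """
        self._authorize(caller, "compute")
        return fn([dict(row) for row in self.rows])


def owners_read_others_compute(caller: str, action: str) -> bool:
    # Hypothetical policy: only the owning team reads raw rows;
    # anyone may ask for a computed answer.
    return action == "compute" or caller == "orders-team"


orders = DataProduct(
    name="orders",
    owner="orders-team",
    rows=[{"order_id": 1, "total": 30.0}, {"order_id": 2, "total": 12.5}],
    metadata={"domain": "e-commerce"},
    policies=[owners_read_others_compute],
)

# An analyst gets an aggregate without ever receiving the raw rows.
revenue = orders.compute("analyst", lambda rows: sum(r["total"] for r in rows))
```

In this sketch the policy check sits inside the container, so it runs regardless of which platform stores the rows. A real system would also need to sandbox the submitted function, since plain Python offers no such isolation.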

Published Date : Feb 17 2023


ENTITIES

Entity	Category	Confidence
Dave Vellante	PERSON	0.99+
John	PERSON	0.99+
Zhamak	PERSON	0.99+
Dave	PERSON	0.99+
George Gilbert	PERSON	0.99+
AWS	ORGANIZATION	0.99+
2007	DATE	0.99+
Palo Alto	LOCATION	0.99+
John Furrier	PERSON	0.99+
John Furrier	PERSON	0.99+
Zhamak Dehghani	PERSON	0.99+
JPMC	ORGANIZATION	0.99+
Google	ORGANIZATION	0.99+
Dav	PERSON	0.99+
two jobs	QUANTITY	0.99+
Supercloud	ORGANIZATION	0.99+
NextData	ORGANIZATION	0.99+
today	DATE	0.99+
Opencloud	ORGANIZATION	0.99+
last year	DATE	0.99+
Siri	TITLE	0.99+
ThoughtWorks	ORGANIZATION	0.98+
NextData.com	ORGANIZATION	0.98+
Supercloud 2	EVENT	0.98+
both	QUANTITY	0.98+
one	QUANTITY	0.98+
HelloFresh	ORGANIZATION	0.98+
first	QUANTITY	0.98+
millions of dollars	QUANTITY	0.96+
Snowflake	EVENT	0.96+
Oracle	ORGANIZATION	0.96+
SRE	TITLE	0.94+
Snowflake	ORGANIZATION	0.94+
Cube	PERSON	0.93+
Zhama	PERSON	0.92+
Data Mesh the Killer App	TITLE	0.92+
SiliconANGLE	ORGANIZATION	0.91+
Databricks	ORGANIZATION	0.9+
first class	QUANTITY	0.89+
Supercloud 2	ORGANIZATION	0.88+
theCUBE	ORGANIZATION	0.88+
hundreds of thousands	QUANTITY	0.85+
one point	QUANTITY	0.84+
Zham	PERSON	0.83+
Supercloud	EVENT	0.83+
ChatGPT	ORGANIZATION	0.72+
SRE	ORGANIZATION	0.72+
Borg	PERSON	0.7+
Snowflake	TITLE	0.66+
Supercloud	TITLE	0.65+
half	QUANTITY	0.64+

Is Data Mesh the Next Killer App for Supercloud?

(upbeat music) >> Welcome back to our Supercloud 2 event live coverage here of stage performance in Palo Alto syndicating around the world. I'm John Furrier with Dave Vellante. We got exclusive news and a scoop here for SiliconANGLE in theCUBE. Zhamak Dehghani, creator of data mesh has formed a new company called Nextdata.com, Nextdata. She's a cube alumni and contributor to our supercloud initiative, as well as our coverage and Breaking Analysis with Dave Vellante on data, the killer app for supercloud. Zhamak, great to see you. Thank you for coming into the studio and congratulations on your newly formed venture and continued success on the data mesh. >> Thank you so much. It's great to be here. Great to see you in person. >> Dave: Yeah, finally. >> Wonderful. Your contributions to the data conversation has been well documented certainly by us and others in the industry. Data mesh taking the world by storm. Some people are debating it, throwing cold water on it. Some are thinking it's the next big thing. Tell us about the data mesh, super data apps that are emerging out of cloud. >> I mean, data mesh, as you said, the pain point that it surface were universal. Everybody said, "Oh, why didn't I think of that?" It was just an obvious next step and people are approaching it, implementing it. I guess the last few years I've been involved in many of those implementations and I guess supercloud is somewhat a prerequisite for it because it's data mesh and building applications using data mesh is about sharing data responsibly across boundaries. And those boundaries include organizational boundaries, cloud technology boundaries, and trust boundaries. >> I want to bring that up because your venture, Nextdata, which is new just formed. Tell us about that. What wave is that riding? What specifically are you targeting? What's the pain point? >> Absolutely. 
Yes, so Nextdata is the result of, I suppose, the pains that I suffered from implementing data mesh for many of the organizations. Basically a lot of organizations that I've worked with, they want decentralized data. So they really embrace this idea of decentralized ownership of the data, but yet they want interconnectivity through standard APIs, yet they want discoverability and governance. So they want to have policies implemented, they want to govern that data, they want to be able to discover that data, and yet they want to decentralize it. And we do that with a developer experience that is easy and native to a generalist developer. So we try to find the, I guess the common denominator that solves those problems and enables that developer experience for data sharing. >> Since you just announced the news, what's been the reaction? >> I just announced the news right now, so what's the reaction? >> But people in the industry know you did a lot of work in the area. What has been some of the feedback on the new venture in terms of the approach, the customers, problem? >> Yeah, so we've been in stealth mode so we haven't publicly talked about it, but folks that have been close to us in fact have reached out. We already have implementations of our pilot platform with early customers, which is super exciting. And we're going to have multiple of those. Of course, we're a tiny, tiny company. We can't have many of those, but we are going to have multiple pilot implementations of our platform in the real world, with real global, large-scale organizations that have real-world problems. So we're not going to build our platform in a vacuum. And that's what's happening right now. >> Zhamak, when I think about your role at ThoughtWorks, you had a very wide observation space with a number of clients, helping them implement data mesh and other things as well prior to your data mesh initiative. But when I look at data mesh, at least the ones that I've seen, they're very narrow. 
I think of JPMC, I think of HelloFresh. They're generally, obviously not surprising, they don't include the big vision of inclusivity across clouds, across different data storage. But it seems like people are having to go through some gymnastics to get to the organizational reality of decentralizing data and at least pushing data ownership to the line of business. How are you approaching, or are you approaching solving that problem? Are you taking a narrow slice? What can you tell us about Nextdata? >> Yeah, absolutely. Gymnastics, the cute word to describe what the organizations have to go through. And one of those problems is that the data as you know resides on different platforms, it's owned by different people, is processed by pipelines that who knows who owns them. So there's this very disparate and disconnected set of technologies that were very useful for when we thought about data and processing as a centralized problem. But when you think about data as a decentralized problem the cost of integration of these technologies in a cohesive developer experience is what's missing. And we want to focus on that cohesive end-to-end developer experience to share data responsibly in these autonomous units. We call them data products, I guess in data mesh. That constitutes computation. That governs that data policies, discoverability. So I guess, I heard this expression in the last talks that you can have your cake and eat it too. So we want people have their cakes, which is data in different places, decentralization, and eat it too, which is interconnected access to it. So we start with standardizing and codifying this idea of a data product container that encapsulates data computation APIs to get to it in a technology agnostic way, in an open way. 
And then sit on top and use existing tech, Snowflake, Databricks, whatever exists, the millions of dollars of investments that companies have made, sit on top of those but create this cohesive, integrated experience where data product is a first-class primitive. And that's really key here. The language and the modeling that we use is really native to data mesh, which is that I'm building a data product, I'm sharing a data product, and that encapsulates that I'm providing metadata about this. I'm providing computation that's constantly changing the data. I'm providing the API for that. So we're trying to kind of codify and create a new developer experience based on that. And developer, both from provider side and user side, connected to peer-to-peer data sharing with data product as a primitive first-class concept. >> So the idea would be developers would build applications leveraging those data products, which are discoverable and governed. Now today you see some companies, take Snowflake for example, attempting to do that within their own little walled garden. They even at one point used the term mesh. I don't know if they pulled back on that. And then they became aware of some of your work. But a lot of the things that they're doing within their little insulated environment support that governance, they're building out an ecosystem. What's different in your vision? >> Exactly. So we realized that, and this is a reality, like you go to organizations, they have Snowflake and half of the organization happily operates on Snowflake. And on the other half, "Oh, we are on bare infrastructure on AWS or we are on Databricks." This is the reality. This supercloud that's written up here, it's about working across boundaries of technology. So we try to embrace that. And even for our own technology with the way we're building it, we say, "Okay, nobody's going to use Nextdata data mesh operating system. People will have different platforms." 
So you have to build with openness in mind and in case of Snowflake, I think, they have, I'm sure, very happy customers as long as customers can be on Snowflake. But once you cross that boundary of platforms then that becomes a problem. And we try to keep that in mind in our solution. >> So it's worth reviewing that basically the concept of data mesh is that whether you're a data lake or a data warehouse, an S3 bucket, an Oracle database as well, they should be inclusive inside of the data. >> We did a session with AWS on the startup showcase, data as code. And remember I wrote a blog post in 2007 called "Data as the New Developer Kit." Back then we used to call them developer kits if you remember. And we said at that time, whoever can code data will have a competitive advantage. >> Aren't the machines going to be doing that? Didn't we just hear that? >> Well, we have. Hey, Siri. Hey, Cube, find me that best video for data mesh. There it is. But this is the point, like what's happening is that now data has to be addressable for machines and for coding, because you need to call the data. So the question is how do you manage the complexity of big things as promiscuous as possible, making it available, as well as then governing it? Because it's a trade-off. The more you make open, the better the machine learning. But yet the governance issue, so this is the, you need an OS to handle this maybe. >> Yes. So yes, well, our mental model for our platform is an OS, an operating system. Operating systems have shown us how you can abstract what's complex and take care of a lot of complexities, but yet provide an open and dynamic enough interface. So we think about it that way. We try to solve the problem of policies living with the data, and enforcement of the policies happens at the most granular level, which is in this concept of the data product. And that would happen whether you read, write or access a data product. 
But we can never imagine what these policies could be. So our thinking is we should have an open policy framework that can allow organizations to write their own policy drivers and policy definitions and encode and encapsulate them in this data product container. But I'm not going to fool myself to say that that's going to solve the problem that you just described. I think we are in this, I don't know, if I look into my crystal ball, what I think might happen is that right now the primitives that we work with to train a machine learning model are still bits and bytes and data. They're fields, rows, columns, and that creates quite a large surface area, an attack area, for privacy of the data. So perhaps one of the trends that we might see is this evolution of data APIs to become more and more computationally aware, to bring the compute to the data to reduce that surface area. So you can really leave the control of the data to the sovereign owners of that data. So that data product. So I think that evolution of our data APIs perhaps will become more and more computational. So you describe what you want and the data owner decides how to manage. >> That's interesting, Dave, 'cause it's almost like we just talked about ChatGPT in the last segment we had with you. It was a machine learning have been around the industry. It's almost as if you're starting to see reason come into the data, reasoning. It's like you're starting to see not just metadata, but using the data to reason so that you don't have to expose the raw data. So almost like a, I won't say curation layer, but an intelligence layer. >> Zhamak: Exactly. >> Can you share your vision on that? 'Cause that seems to be where the dots are connecting. >> Yes, perhaps further into the future because just from where we stand, we have to create still that bridge of familiarity between that future and present. So we are still in that bridge-making mode. 
However, by just the basic notion of saying, "I'm going to put an API in front of my data." And that API today might be as primitive as a level of indirection, as in you tell me what you want, tell me who you are, let me go process that, all the policies and lineage and insert all of this intelligence that needs to happen. And then today, I will still give you a file. But by just defining that API and standardizing it now we have this amazing extension point that we can say, "Well, the next revision of this API, you not just tell me who you are, but you actually tell me what intelligence you're after. What's the logic that I need to go and now compute on your API?" And you can evolve that. Now you have a point of evolution to this very futuristic, I guess, future where you just describe the question that you're asking from the ChatGPT. >> Well, this is the supercloud, go ahead, Dave. >> I have a question from a fan, I got to get it in. It's George Gilbert. And so his question is, you're blowing away the way we synchronize data from operational systems to the data stack to applications. So the concern that he has and he wants your feedback on this, is the data product app devs get exposed to more complexity with respect to moving data between data products or maybe it's attributes between data products? How do you respond to that? How do you see? Is that a problem? Is that something that is overstated or do you have an answer for that? >> Absolutely. So I think there's a sweet spot in getting data developers, data product developers closer to the app, but yet not overburdening them with the complexity of the application and application logic and yet reducing their cognitive load by localizing what they need to know about, which is that domain where they're operating within. Because what's happening right now? 
What's happening right now is that data engineers with, a ton of empathy for them for their high threshold of pain that they can deal with, they have been centralized, they've been put into the data team, and they have been given this unbelievable task of make meaning out of data, put semantics over it, curate it, cleanse it, and so on. So what we are saying is that get those folks embedded into the domain closer to the application developers. These are still separately moving units. Your app and your data products are independent, but yet tightly coupled with each other based on the context of the domain. So reduce cognitive load by localizing what they need to know about to the domain, get them closer to the application, but yet have them separate from app because app provides a very different service. Transactional data for my e-commerce transaction. Data product provides a very different service. Longitudinal data for the variety of this intelligent analysis that I can do on the data. But yet it's all within the domain of e-commerce or sales or whatnot. >> It's a lot of decoupling and coupling to create that cohesive architecture. So I have to ask you, this is an interesting question 'cause it came up on theCUBE all last year. Back on the old server data center days and cloud, SRE, Google coined the term, site reliability engineer, for someone to look over the hundreds of thousands of servers. We asked the question to data engineering community who have been suffering, by the way, I agree. Is there an SRE-like role for data? Because in a way data engineering, that platform engineer, they are like the SRE for data. In other words managing the large scale to enable automation and self-service. What are your thoughts and reaction to that? >> Yes, exactly. So maybe we go through that history of how SRE came to be. So we had the first DevOps movement, which was remove the wall between dev and ops and bring them together. 
So you have one cross-functional unit of the organization that's responsible for you build it, you run it. So then there is no, I'm going to just shoot my application over the wall for somebody else to manage it. So we did that and then we said, okay, there is a ton, as we decentralized and had these many microservices running around, we had to create a layer that abstracted a lot of the complexity around running now a lot or monitoring, observing, and running a lot while giving autonomy to this cross-functional team. And that's where the SRE, a new generation of engineers came to exist. So I think if I just look at. >> Hence, Kubernetes. >> Hence, hence, exactly. Hence, chaos engineering. Hence, embracing the complexity and messiness. And putting engineering discipline to embrace that and yet give a cohesive and high integrity experience of those systems. So I think if we look at that evolution, perhaps something like that is happening by bringing data and apps closer and make them these domain-oriented data product teams or domain-oriented cross-functional teams full stop and still have a very advanced maybe at the platform level, infrastructure level operational team that they're not busy doing two jobs, which is taking care of domains and the infrastructure, but they're building infrastructure that is embracing that complexity, interconnectivity of this data process. >> So you see similarities? >> I see, absolutely. But I feel like we're probably in a more early days of that movement. >> So it's a data DevOps kind of thing happening where scale's happening. It's good things are happening, yet a little bit fast and loose with some complexities to clean up. >> Yes. This is a different restructure. As you said, the job of this industry as a whole, an architect, is decompose, recompose, decompose, recompose in a new way and now we're like decomposing centralized team, recomposing them as domains. >> So is data mesh the killer app for supercloud? 
>> You had to do this to me. >> Sorry, I couldn't resist. >> I know. Of course you want me to say this. >> Yes. >> Yes, of course. I mean, supercloud, I think it's really, the terminology supercloud, open cloud, but I think in the spirit of it this embracing of diversity and giving autonomy for people to make decisions for what's right for them and yet not lock them in. I think just embracing that is baked into how data mesh assumes the world would work. >> Well, thank you so much for coming on Supercloud 2. We really appreciate it. Data has driven this conversation. Your success of data mesh has really opened up the conversation and exposed the slow moving data industry. >> Dave: Been a great catalyst. >> That's now going well. We can move faster. So thanks for coming on. >> Thank you for hosting me. It was wonderful. >> Supercloud 2 live here in Palo Alto, our stage performance. I'm John Furrier with Dave Vellante. We'll be back with more after this short break. Stay with us all day for Supercloud 2. (upbeat music)

Published Date : Jan 25 2023



Breaking Analysis: Technology & Architectural Considerations for Data Mesh

>> From theCUBE Studios in Palo Alto and Boston, bringing you data driven insights from theCUBE and ETR, this is Breaking Analysis with Dave Vellante. >> The introduction and socialization of data mesh has caused practitioners, business technology executives, and technologists to pause, and ask some probing questions about the organization of their data teams, their data strategies, future investments, and their current architectural approaches. Some in the technology community have embraced the concept, others have twisted the definition, while still others remain oblivious to the momentum building around data mesh. Here we are in the early days of data mesh adoption. Organizations that have taken the plunge will tell you that aligning stakeholders is a non-trivial effort, but necessary to break through the limitations that monolithic data architectures and highly specialized teams have imposed on frustrated business and domain leaders. However, practical data mesh examples often lie in the eyes of the implementer, and may not strictly adhere to the principles of data mesh. Now, part of the problem is the lack of open technologies and standards that can accelerate adoption and reduce friction, and that's what we're going to talk about today. Some of the key technology and architecture questions around data mesh. Hello, and welcome to this week's Wikibon CUBE Insights powered by ETR, and in this Breaking Analysis, we welcome back the founder of data mesh and director of Emerging Technologies at Thoughtworks, Zhamak Dehghani. Hello, Zhamak. Thanks for being here today. >> Hi Dave, thank you for having me back. It's always a delight to connect and have a conversation. Thank you. >> Great, looking forward to it. Okay, so before we get into the technology details, I just want to quickly share some data from our friends at ETR. 
You know, despite the importance of data initiatives since the pandemic, CIOs and IT organizations have had to juggle, of course, a few other priorities, this is why in the survey data, cyber and cloud computing are rated as the two most important priorities. Analytics and machine learning, and AI, which are kind of data topics, still make the top of the list, well ahead of many other categories. And look, a sound data architecture and strategy is fundamental to digital transformations, and much of the past two years, as we've often said, has been like a forced march into digital. So while organizations are moving forward, they really have to think hard about the data architecture decisions that they make, because it's going to impact them, Zhamak, for years to come, isn't it? >> Yes, absolutely. I mean, we are moving really from, slowly moving from reason-based, logical, algorithmic to model-based computation and decision making, where we exploit the patterns and signals within the data. So data becomes a very important ingredient, of not only decision making, and analytics and discovering trends, but also the features and applications that we build for the future. So we can't really ignore it, and as we see, the existing challenge around getting value from data is no longer access to computation, it is actually access to trustworthy, reliable data at scale. >> Yeah, and you see these domains coming together with the cloud and obviously it has to be secure and trusted, and that's why we're here today talking about data mesh. So let's get into it. Zhamak, first, your new book is out, 'Data Mesh: Delivering Data-Driven Value at Scale' just recently published, so congratulations on getting that done, awesome. Now in a recent presentation, you pulled excerpts from the book and we're going to talk through some of the technology and architectural considerations. Just quickly for the audience, four principles of data mesh. 
Domain driven ownership, data as product, self-serve data platform and federated computational governance. So I want to start with self-serve platform and some of the data that you shared recently. You say that, "Data mesh serves autonomous domain oriented teams versus existing platforms, which serve a centralized team." Can you elaborate? >> Sure. I mean the role of the platform is to lower the cognitive load for domain teams, for people who are focusing on the business outcomes, the technologists that are building the applications, to really lower the cognitive load for them, to be able to work with data. Whether they are building analytics, automated decision making, intelligent modeling. They need to be able to get access to data and use it. So the role of the platform, I guess, just stepping back for a moment is to empower and enable these teams. Data mesh by definition is a scale out model. It's a decentralized model that wants to give autonomy to cross-functional teams. So at its core it requires a set of tools that work really well in that decentralized model. When we look at the existing platforms, they try to achieve a similar outcome, right? Lower the cognitive load, give the tools to data practitioners, to manage data at scale because today centralized teams, really their job, the centralized data teams, their job isn't really directly aligned with a one or two or different, you know, business units and business outcomes in terms of getting value from data. Their job is to manage the data and make the data available for then those cross-functional teams or business units to use the data. So the platforms they've been given are really centralized around or tuned to work with this structure of a centralized team. Although on the surface, it seems that why not? Why can't I use my, you know, cloud storage or computation or data warehouse in a decentralized way? You should be able to, but some changes need to happen to those online platforms. 
As an example, some cloud providers simply have hard limits on the number of, like, storage accounts that you can have. Because they never envisaged you'd have hundreds of lakes. They envisage one or two, maybe 10 lakes, right. They envisage really centralizing data, not decentralizing data. So I think we see a shift in thinking about enabling autonomous independent teams versus a centralized team. >> So just a follow up if I may, we could be here for a while. But so this assumes that you've sorted out the organizational considerations? That you've defined all the, what a data product is and a sub product. And people will say, of course we use the term monolithic as a pejorative, let's face it. But the data warehouse crowd will say, "Well, that's what data marts did. So we got that covered." But your... The premise of data mesh, if I understand it is whether it's a data mart or a data warehouse, or a data lake or whatever, a Snowflake warehouse, it's a node on the mesh. Okay. So don't build your organization around the technology, let the technology serve the organization is that-- >> That's a perfect way of putting it, exactly. I mean, for a very long time, when we look at decomposition of complexity, we've looked at decomposition of complexity around technology, right? So we have technology and that's maybe a good segue to actually the next item on that list that we looked at. Oh, I need to decompose based on whether I want to have access to raw data and put it on the lake. Whether I want to have access to model data and put it on the warehouse. You know I need to have a team in the middle to move the data around. And then try to fit the organization into that model. So data mesh really inverts that, and as you said, is to look at the organizational structure first. Then scale boundaries around which your organization and operation can scale. And then the second layer look at the technology and how you decompose it. >> Okay. 
So let's go to that next point and talk about how you serve and manage autonomous interoperable data products. Where code, data, and policy, you say, are treated as one unit. Whereas your contention is existing platforms of course have independent management and dashboards for catalogs or storage, et cetera. Maybe we double click on that a bit. >> Yeah. So if you think about that functional, or technical decomposition, right? Of concerns, that's one way, that's a very valid way of decomposing complexity and concerns. And then build solutions, independent solutions to address them. That's what we see in the technology landscape today. We will see technologies that are taking care of your management of data, bring your data under some sort of a control and modeling. You'll see technology that moves that data around, will perform various transformations and computations on it. And then you see technology that tries to overlay some level of meaning. Metadata, understandability, discovery, and in the end policy, right? So that's where your data processing kind of pipeline technologies versus data warehouse, storage, lake technologies, and then the governance come to play. And over time, we decomposed and we compose, right? Deconstruct and reconstruct back this together. But, right now that's where we stand. I think for data mesh really to become a reality, as in independent sources of data and teams can responsibly share data in a way that can be understood right then and there, can impose policies right then when the data gets accessed in that source, and in a resilient manner, like in a way that changes to the structure of the data or changes to the schema of the data don't have those downstream downtimes. We've got to think about this new nucleus or new unit of data sharing. And we need to really bring back transformation and governing data and the data itself together around these decentralized nodes on the mesh. 
So that's another, I guess, deconstruction and reconstruction that needs to happen around the technology to formulate ourselves around the domains. And again the data and the logic of the data itself, the meaning of the data itself. >> Great. Got it. And we're going to talk more about the importance of data sharing and the implications. But the third point deals with how operational, analytical technologies are constructed. You've got an app dev stack, you've got a data stack. You've made the point many times actually that we've contextualized our operational systems, but not our data systems, they remain separate. Maybe you could elaborate on this point. >> Yes. I think this is, again, has a historical background and beginning. For a really long time, applications have dealt with features and the logic of running the business and encapsulating the data and the state that they need to run that feature or run that business function. And then we had for anything analytical driven, which required access to data across these applications and across the longer dimension of time around different subjects within the organization. This analytical data, we had made a decision that, "Okay, let's leave those applications aside. Let's leave those databases aside. We'll extract the data out and we'll load it, or we'll transform it and put it under the analytical kind of a data stack and then downstream from it, we will have analytical data users, the data analysts, the data scientists and the, you know, the portfolio of users that is growing, use that data stack." And that led to this real separation of dual stack with point to point integration. So applications went down the path of transactional databases or even document stores, but using APIs for communicating and then we've gone to, you know, lake storage or data warehouse on the other side. And that, again, enforces the silo of data versus app, right? 
So if we are moving to a world where our ambitions are around making applications more intelligent, making them data-driven, these two worlds need to come closer. As in ML and analytics get embedded into those applications themselves. And the data sharing, as a very essential ingredient of that, gets embedded and gets closer, becomes closer to those applications. So, if you are looking at this now cross-functional, app and data based team, right? Business team, then the technology stacks can't be so segregated, right? There has to be a continuum of experience from app delivery, to sharing of the data, to using that data, to embed models back into those applications. And that continuum of experience requires well integrated technologies. I'll give you an example, which actually in some sense, we are somewhat moving to that direction. But if we are talking about data sharing or data modeling and applications use one set of APIs, you know, HTTP compliant, GraphQL or REST APIs. And on the other hand, you have proprietary SQL, like connect to my database and run SQL. Like those are two very different models of representing and accessing data. So we kind of have to harmonize or integrate those two worlds a bit more closely to achieve that domain oriented cross-functional teams. >> Yeah. We are going to talk about some of the gaps later and actually you look at them as opportunities, more than barriers. But they are barriers, but they're opportunities for more innovation. Let's go on to the fourth one. The next point, it deals with the roles that the platform serves. Data mesh proposes that domain experts own the data and take responsibility for it end to end and are served by the technology. Kind of, we referenced that before. Whereas your contention is that today, data systems are really designed for specialists. I think you use the term hyper specialists a lot. I love that term. 
And the generalists are kind of passive bystanders waiting in line for the technical teams to serve them. >> Yes. I mean, if you think about the, again, the intention behind data mesh was creating a responsible data sharing model that scales out. And I challenge any organization that has scaled ambitions around data or usage of data that relies on small pockets of very expensive specialist resources, right? So we have no choice, but upskilling and cross-skilling the majority population of our technologists, we often call them generalists, right? That's a short hand for people that can really move from one technology to another technology. Sometimes we call them paint-drip people, sometimes we call them T-shaped people. But regardless, like we need to have ability to really mobilize our generalists. And we had to do that at Thoughtworks. We serve a lot of our clients and like many other organizations, we are also challenged with hiring specialists. So we have tested the model of having a few specialists, really conveying and translating the knowledge to generalists and bring them forward. And of course, platform is a big enabler of that. Like what is the language of using the technology? What are the APIs that delight that generalist experience? This doesn't mean no code, low code, that we have to throw away good engineering practices. And I think good software engineering practices remain. Of course, they get adapted to the world of data to build resilient, you know, sustainable solutions, but specialty, especially around kind of proprietary technology, is going to be a hard one to scale. >> Okay. I'm definitely going to come back and pick your brain on that one. And, you know, your point about scale out in the examples, the practical examples of companies that have implemented data mesh that I've talked to. 
I think in all cases, you know, there's only a handful that I've really gone deep with, but it was their Hadoop instances, their clusters wouldn't scale, they couldn't scale the business around it. So that's really a key point of a common pattern that we've seen now. I think in all cases, they went to like the data lake model and AWS. And so that maybe has some violation of the principles, but we'll come back to that. But so let me go on to the next one. Of course, data mesh leans heavily toward this concept of decentralization, to support domain ownership over the centralized approaches. And we certainly see this, the public cloud players, database companies as key actors here with very large install bases, pushing a centralized approach. So I guess my question is, how realistic is this next point where you have decentralized technologies ruling the roost? >> I think if you look at the history of places, in our industry where decentralization has succeeded, they heavily relied on standardization of connectivity with, you know, across different components of technology. And I think right now you are right. The way we get value from data relies on collection. At the end of the day, collection of data. Whether you have a deep learning or machine learning model that you're training, or you have, you know, reports to generate. Regardless, the model is bring your data to a place that you can collect it, so that we can use it. And that leads naturally to a set of technologies that try to operate as a full stack, integrated, proprietary, with no intention of, you know, opening data for sharing. Now, conversely, if you think about internet itself, web itself, microservices, even at the enterprise level, not at the planetary level, they succeeded as decentralized technologies to a large degree because of their emphasis on openness and sharing, right. API sharing. 
We don't talk about, in the API world, like we don't say, you know, "I will build a platform to manage your logical applications." Maybe to a degree but we actually moved away from that. We say, "I'll build a platform that opens around applications to manage your APIs, manage your interfaces." Right? Give you access to API. So I think the shift needs to... That definition of decentralized there means really composable, open pieces of the technology that can play nicely with each other, rather than a full stack, all have control of your data yet being somewhat decentralized within the boundary of my platform. That's just simply not going to scale if data needs to come from different platforms, different locations, different geographical locations, it needs a rethink. >> Okay, thank you. And then the final point is, data mesh favors technologies that are domain agnostic versus those that are domain aware. And I wonder if you could help me square the circle 'cause it's nuanced and I'm kind of a 100 level student of your work. But you have said for example, that the data teams lack context of the domain and so help us understand what you mean here in this case. >> Sure. Absolutely. So as you said, we want to take... Data mesh tries to give autonomy and decision making power and responsibility to people that have the context of those domains, right? The people that are really familiar with different business domains and naturally the data that that domain needs, or naturally the data that that domain shares. So if the intention of the platform is really to give the power to people with most relevant and timely context, the platform itself naturally becomes as a shared component, becomes domain agnostic to a large degree. Of course those domains can still... The platform is a (chuckles) fairly overloaded word. 
As in, if you think about it as a set of technology that abstracts complexity and allows building the next level solutions on top, those domains may have their own set of platforms that are very much domain-aware. But as a generalized shareable set of technologies or tools that allows us to share data. So that piece of technology needs to relinquish the knowledge of the context to the domain teams and actually becomes domain agnostic. >> Got it. Okay. Makes sense. All right. Let's shift gears here. Talk about some of the gaps and some of the standards that are needed. You and I have talked about this a little bit before, but this digs deeper. What types of standards are needed? Maybe you could walk us through this graphic, please. >> Sure. So what I'm trying to depict here is that if we imagine a world that data can be shared from many different locations, for a variety of analytical use cases, naturally the boundary of what we call a node on the mesh will encapsulate internally a fair few pieces. It's not just the boundary of that node on the mesh, is the data itself that it's controlling and updating and maintaining. It's of course a computation and the code that's responsible for that data. And then the policies that continue to govern that data as long as that data exists. So if that's the boundary, then if we shift that focus from implementation details, that we can leave that for later, what becomes really important is the seam or the APIs and interfaces that this node exposes. And I think that's where the work needs to be done and where the standards are missing. And we want the seam and those interfaces to be open because that allows, you know, different organizations with different boundaries of trust to share data. Not only to share data to kind of move that data to yes, another location, to share the data in a way that distributed workloads, distributed analytics, distributed machine learning models can happen on the data where it is. 
So if you follow that line of thinking around the decentralization and connection of data versus collection of data, I think the very, very important piece of it that needs really deep thinking, and I don't claim that I have done that, is how do we share data responsibly and sustainably, right? That is not brittle. If you think about it today, the ways we share data, one of the very common ways is around, I'll give you a JDBC endpoint, or I give you an endpoint to your, you know, database of choice. And now as a technology, or as a user actually, you can now have access to the schema of the underlying data and then run various queries or SQL queries on it. That's very simple and easy to get started with. That's why SQL is an evergreen, you know, standard or semi standard, pseudo standard that we all use. But it's also very brittle, because we are dependent on an underlying schema and formatting of the data that's been designed to tell the computer how to store and manage the data. So I think that the data sharing APIs of the future really need to think about removing these brittle dependencies, think about sharing, not only the data, but what we call metadata, I suppose. Additional set of characteristics that is always shared along with data to make the data usage, I suppose ethical and also friendly for the users and also, I think we have to... That data sharing API, the other element of it, is to allow kind of computation to run where the data exists. So if you think about SQL again, as a simple primitive example of computation, when we select and when we filter and when we join, the computation is happening on that data. So maybe there is a next level of articulating, distributed computational data that simply trains models, right? Your language primitives change in a way to allow sophisticated analytical workloads run on the data more responsibly with policies and access control enforced. 
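One way to picture a data sharing API that hides the storage schema and runs the computation where the data lives is sketched below. It is a minimal illustration under stated assumptions, not a proposed standard: the `OutputPort` and `AggregateRequest` names, the published field names, and the internal column names are all hypothetical.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class AggregateRequest:
    """Hypothetical declarative request: what the consumer wants, not how."""
    measure: str    # published field name to aggregate
    operation: str  # "count", "sum", or "mean"
    group_by: str   # published field name to group on


class OutputPort:
    """Hypothetical output port of a data product.

    Consumers only see published field names. The mapping to internal
    storage columns stays private, so the owner can change the storage
    layout without breaking consumers, and raw rows never leave the port.
    """

    _OPERATIONS = {"count": len, "sum": sum, "mean": mean}

    def __init__(self, rows: list[dict], published_fields: dict[str, str]):
        self._rows = rows
        self._fields = published_fields  # published name -> internal column

    def compute(self, request: AggregateRequest) -> dict:
        if request.operation not in self._OPERATIONS:
            raise ValueError(f"unsupported operation: {request.operation}")
        # Unknown published names raise KeyError here, before any data is read.
        measure_col = self._fields[request.measure]
        group_col = self._fields[request.group_by]
        groups: dict = {}
        for row in self._rows:
            groups.setdefault(row[group_col], []).append(row[measure_col])
        aggregate = self._OPERATIONS[request.operation]
        return {key: aggregate(values) for key, values in groups.items()}


# Internal storage uses terse column names the consumer never sees.
internal_rows = [
    {"rgn": "EU", "amt_cents": 1000},
    {"rgn": "EU", "amt_cents": 3000},
    {"rgn": "US", "amt_cents": 500},
]
port = OutputPort(internal_rows, {"region": "rgn", "order_value": "amt_cents"})
totals = port.compute(AggregateRequest("order_value", "sum", "region"))
# totals == {"EU": 4000, "US": 500}
```

If the owner later renames `amt_cents` internally, only the private mapping changes; requests written against `order_value` keep working, which is the brittleness a raw schema-level endpoint cannot avoid.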
So I think that output port that I mentioned simply is about next generation data sharing, responsible data sharing APIs, suitable for decentralized analytical workloads. >> So I'm not trying to bait you here, but I have a follow up as well. So schema, for all its good, creates constraints. No schema on write, that didn't work, 'cause it was just a free-for-all and it created the data swamps. But now you have technology companies trying to solve that problem. Take Snowflake for example, you know, enabling data sharing. But it is within its proprietary environment. Certainly Databricks is doing something, you know, trying to come at it from its angle, bringing some of the best of the data warehouse together with data science. Is your contention that those remain sort of proprietary and de facto standards? And then what we need is more open standards? Maybe you could comment. >> Sure. I think there are two points. One is, as you mentioned, open standards that actually make the underlying platform invisible. I mean, my litmus test for a technology provider to say, "I'm data mesh compliant," (laughs) is, "Is your platform invisible?" As in, can I replace it with another and yet get the similar data sharing experience that I need? So part of it is that. Part of it is open standards that are not really proprietary. The other angle for sharing data across different platforms, so that, you know, we don't get stuck with one technology or another, is around APIs. It is around code that is protecting that internal schema. So where we are on the curve of evolution of technology, right now we are exposing the internal structure of the data, which is designed to optimize certain modes of access. We're exposing that to the end client and application APIs, right? So the APIs that use the data today are very much aware that this database was optimized for machine learning workloads.
Hence you will deal with a columnar storage of the file, versus this other API that is optimized for a very different, report type access, relational access, and is optimized around rows. I think that should become irrelevant in the API sharing of the future. Because as a user, I shouldn't care how this data is internally optimized, right? The language primitive that I'm using should be really agnostic to the machine optimization underneath that. And if we did that, perhaps this war between warehouse or lake or the other will become actually irrelevant. So we're optimizing for the best human experience, as opposed to the best machine experience. We still have to do that, but we have to make that invisible. Make that an implementation concern. So that's another angle of what should... If we daydream together about the best and most resilient experience in terms of data usage, then these APIs would be agnostic to the internal storage structure. >> Great, thank you for that. We've wet our ankles now on the controversy, so we might as well wade all the way in. I can't let you go without addressing some of this, which you've catalyzed, and which, by the way, I see as a sign of progress. So this gentleman, Paul Andrew, is an architect and he gave a presentation, I think last night. And he teased it as, quote, "The theory from Zhamak Dehghani versus the practical experience of a technical architect, AKA me," meaning him. And Zhamak, you were quick to shoot back that data mesh is not theory, it's based on practice. And some practices are experimental. Some are more baked, and data mesh really avoids, by design, the specificity of vendor or technology. Perhaps you intend to frame your post as a technology- or vendor-specific implementation. So touché, that was excellent. (Zhamak laughs) Now you don't need me to defend you, but I will anyway.
You spent 14 plus years as a software engineer and the better part of a decade consulting with some of the most technically advanced companies in the world. But I'm going to push you a little bit here and say, some of this tension is of your own making because you purposefully don't talk about technologies and vendors. Sometimes doing so is instructive for us neophytes. So, why don't you ever, like, use specific examples of technology for frames of reference? >> Yes. My role is to push us to the next level. So, you know, everybody picks their fights, picks their battles. My role in this battle is to push us to think beyond what's available today. Of course, that's my public persona. On a day to day basis, actually, I work with clients and existing technology, and I think at Thoughtworks we gave a case study talk with a colleague of mine, and I intentionally got him to talk about (indistinct) I want to talk about the technology that we use to implement data mesh. And the reason I haven't really embraced, in my conversations, the specific technology: one is, I feel the technology solutions we're using today are still not ready for the vision. I mean, we have to be in this transitional step, no matter what. We have to be pragmatic, of course, and practical, I suppose, and use the existing vendors that exist, and I wholeheartedly embrace that, but that's just not my role, to show that. I've gone through this transformation once before in my life. When microservices happened, we were building microservices-like architectures with technology that wasn't ready for it. Big application, web application servers that were designed to run these giant monolithic applications. And now we're trying to run little microservices onto them. And the tail was wagging the dog. The environmental complexity of running these services was consuming so much of our effort that we couldn't really pay attention to the business logic, the business value.
And that's where we are today. The complexity of integrating existing technologies is really overwhelming, capturing a lot of our attention and cost and effort, money and effort, as opposed to really focusing on the data products themselves. So it's just that's the role I have, but it doesn't mean that, you know, we have to rebuild the world. We've got to do with what we have in this transitional phase until the new generation, I guess, of technologies comes around and reshapes our landscape of tools. >> Well, impressive public discipline. Your point about microservices is interesting because a lot of those early microservices weren't so micro. And for the naysayers, look, past is not prologue, but Thoughtworks was really early on in the whole concept of microservices. So I'll be very excited to see how this plays out. But now there were some other good comments. There was one from a gentleman who said the most interesting aspects of data mesh are organizational. And that's how my colleague Sanjeev Mohan frames data mesh versus data fabric. You know, I'm not sure. I think we've sort of scratched the surface today that data mesh is more. And I still think data fabric is what NetApp defined as software defined storage infrastructure that can serve on-prem and public cloud workloads, back whatever, 2016. But the point you make in the thread that we're showing you here is that you're warning, and you referenced this earlier, that segregating different modes of access will lead to fragmentation. And we don't want to repeat the mistakes of the past. >> Yes, there are comments around that. Again, going back to that original conversation, we have got this at a macro level. We've got this tendency to decompose complexity based on technical solutions. And, you know, the conversation could be, "Oh, I do batch or you do a stream and we are different." They create these bifurcations in our decisions based on the technology, where I do events and you do tables, right?
So that sort of segregation of modes of access causes accidental complexity that we keep dealing with. Because every time in this tree you create a new branch, you create a new set of tools that then somehow need to be point-to-point integrated. You create new specialization around that. So we should have the least number of branches, and really think about the continuum of experiences that we need to create and technologies that simplify that continuum experience. So one of the things, for example, to give you a past experience: I was really excited around the papers and the work that came around on Apache Beam, and generally flow based programming and stream processing. Because basically they were saying whether you are doing batch or whether you're doing streaming, it's all one stream. And sometimes the window of time over which you're computing narrows, and sometimes it widens, and at the end of the day, you are just doing the stream processing. So it is those sort of notions that simplify and create a continuum of experience that, I think, resonate with me personally, more than creating these tribal fights of this type versus that mode of access. So that's why data mesh naturally selects kind of this multimodal access to support end users, right? The persona of end users. >> Okay. So the last topic I want to hit, this whole discussion, the topic of data mesh, it's highly nuanced, it's new, and people are going to shoehorn data mesh into their respective views of the world. And we talked about lake houses and there's three buckets. And of course, the gentleman from LinkedIn with Azure, Microsoft has a data mesh community. So you're going to have to enlist some serious army of enforcers to adjudicate. And I wrote some of the stuff down. I mean, it's interesting. Monte Carlo has a data mesh calculator. Starburst is leaning in. ChaosSearch sees themselves as an enabler. Oracle and Snowflake both use the term data mesh.
And then of course you've got big practitioners: JPMC, we've talked to Intuit, Orlando, HelloFresh has been on, Netflix has this event based sort of streaming implementation. So my question is, how realistic is it that the clarity of your vision can be implemented and not polluted by really rich technology companies and others? (Zhamak laughs) >> Is it even possible, right? Is it even possible? That's a yes. That's why I practice then. This is why I should practice things. 'Cause I think it's going to be hard. What I'm hopeful about is that the socio-technical... Leveling Data mentioned that this is a socio-technical concern or solution, not just a technology solution. Hopefully that always brings us back to, you know, the reality that vendors try to sell you snake oil that solves all of your problems. (chuckles) All of your data mesh problems. It's just going to cause more problems down the track. So we'll see, time will tell, Dave, and I count on you as one of those members of, (laughs) you know, folks that will continue to share their platform. To go back to the roots, as to why in the first place? I mean, I dedicated a whole part of the book to "Why?" Because we get, as you said, we get carried away with vendors and technology solutions trying to ride a wave. And in that story, we forget the reason for which we're even making this change and why we are going to spend all of these resources. So hopefully we can always come back to that. >> Yeah. And I think we can. I think you have really given this some deep thought and, as we pointed out, this was based on practical knowledge and experience. And look, we've been trying to solve this data problem for a long, long time. You've not only articulated it well, but you've come up with solutions. So Zhamak, thank you so much. We're going to leave it there and I'd love to have you back. >> Thank you for the conversation. I really enjoyed it. And thank you for sharing your platform to talk about data mesh. >> Yeah, you bet. All right.
And I want to thank my colleague, Stephanie Chan, who helps research topics for us. Alex Myerson is on production and Kristen Martin, Cheryl Knight and Rob Hoff on editorial. Remember all these episodes are available as podcasts, wherever you listen. And all you got to do is search Breaking Analysis Podcast. Check out ETR's website at etr.ai for all the data. And we publish a full report every week on wikibon.com, siliconangle.com. You can reach me by email david.vellante@siliconangle.com or DM me @dvellante. Hit us up on our LinkedIn post. This is Dave Vellante for theCUBE Insights powered by ETR. Have a great week, stay safe, be well. And we'll see you next time. (bright music)

Published Date : Apr 20 2022

Breaking Analysis: Data Mesh...A New Paradigm for Data Management

>> From theCUBE Studios in Palo Alto and Boston, bringing you data-driven insights from theCUBE and ETR, this is Breaking Analysis with Dave Vellante. >> Data mesh is a new way of thinking about how to use data to create organizational value. Leading edge practitioners are beginning to implement data mesh in earnest. And importantly, data mesh is not a single tool or a rigid reference architecture, if you will. Rather, it's an architectural and organizational model that's really designed to address the shortcomings of decades of data challenges and failures, many of which we've talked about on theCUBE. As important, by the way, it's a new way to think about how to leverage data at scale across an organization and across ecosystems. Data mesh, in our view, will become the defining paradigm for the next generation of data excellence. Hello and welcome to this week's Wikibon CUBE Insights powered by ETR. In this Breaking Analysis, we welcome the founder and creator of data mesh, author, thought leader, technologist, Zhamak Dehghani. Zhamak, thank you for joining us today. Good to see you. >> Hi Dave, it's great to be here. >> All right, real quick, let's talk about what we're going to cover. I'll introduce or reintroduce you to Zhamak. She joined us earlier this year in our CUBE on Cloud program. She's the director of emerging tech at Thoughtworks North America and a thought leader, practitioner, software engineer, architect, and a passionate advocate for decentralized technology solutions and data architectures. And Zhamak, since we last had you on as a guest, which was less than a year ago, I think you've written two books in your spare time, one on data mesh and another called "Software Architecture: The Hard Parts," both published by O'Reilly. So how are you? You've been busy. >> I've been busy, yes. Good, it's been a great year. It's been a busy year. I'm looking forward to the end of the year and the end of these two books, but it's great to be back and speaking with you. >> Well, you've got to be pleased with the momentum that
data mesh has. And let's just jump back to the agenda for a bit and get that out of the way. We're going to set the stage by sharing some ETR data, from our data partner, on the spending profile in some of the key data sectors. And then we're going to review the four key principles of data mesh. It's always worthwhile to sort of set that framework. We'll talk a little bit about some of the dependencies and the data flows. And we're really going to dig today into principle number three, and a bit around the self-service data platforms. And to that end, we're going to talk about some of the learnings that Zhamak has captured since she embarked on the data mesh journey with her colleagues and her clients. And we specifically want to talk about some of the successful models for building the data mesh experience. And then we're going to hit on some practical advice, and we'll wrap with some thought exercises, maybe a little tongue-in-cheek, some of the community questions that we get. So the first thing I want to do, we'll just get this out of the way, is introduce the spending climate. We use this XY chart to do this. We do this all the time. It shows the spending profiles in the ETR data set for some of the more data related sectors of the ETR taxonomy. They dropped their October data last Friday, so I'm using the July survey here. We'll get into the October survey in future weeks, but it's about 1,500 respondents. I don't see a dramatic change coming in the October survey. The y-axis is net score, or spending momentum. The horizontal axis is market share, or presence in the data set. And that red line, that 40%, anything over that we consider elevated. So for the past eight quarters or so, we've seen machine learning/AI, RPA, containers and cloud as the four areas where CIOs and technology buyers have shown the highest net scores. And as we've said, what's so impressive for cloud is it's both pervasive and it shows high velocity from a spending standpoint. And we plotted
the three other data related areas: database/EDW, analytics/BI and big data, and storage. The first two, while under the red line, are still elevated. The storage market continues to kind of plod along. And we've plotted outsourced IT just to balance it out for context. That's an area that's not so hot right now. So I just want to point out that these areas, AI, automation, containers and cloud, they're all relevant to data and they're fundamental building blocks of data architectures, as are the two that are directly related to data, database and analytics, and of course storage. So it just gives you a picture of the spending sector. So I wanted to share this slide, Zhamak, that you presented in your webinar. I love this. It's a taxonomy put together by Matt Turck, who's a VC, and he called this the MAD landscape: machine learning and AI and data. And Zhamak, the key point here is there's no lack of tooling. You've made the data mesh concept sort of tools agnostic. It's not like we need more tools to succeed in data mesh, right? >> Absolutely, right. I think we have plenty of tools. I think what's missing is a meta architecture that defines the landscape in a way that it's in step with organizational growth, and then defines that meta architecture in a way that these tools can actually interoperate and integrate really well. The clients right now have a lot of challenges in terms of picking the right tool. Regardless of the technology path they go down, either they have to go in and bite into a big data solution and then try to fit the other integrated solutions around it, or, as you see, go to that menu of a large list of applications and spend a lot of time trying to kind of integrate and stitch this tooling together. So I'm hoping that data mesh creates that kind of meta architecture for tools to interoperate and plug in. And I think our conversation today around the self-serve data platform will hopefully illuminate that.
>> Yeah, we'll definitely circle back, because that's one of the questions we get all the time from the community. Okay, let's review the four main principles of data mesh for those who might not be familiar with it, and for those who are, it's worth reviewing. Zhamak, allow me to introduce them and then we can discuss a bit. So a big frustration I hear constantly from practitioners is that the data teams don't have domain context. The data team is separated from the lines of business, and as a result they have to constantly context switch, and as such there's a lack of alignment. So principle number one is focused on putting end-to-end data ownership in the hands of the domain, or what I would call the business lines. The second principle is data as a product, which does cause people's brains to hurt sometimes, but it's a key component. And if you start sort of thinking about it and talking to people who have done it, it actually makes a lot of sense. And this leads to principle number three, which is a self-serve data infrastructure, which we're going to drill into quite a bit today. And then the question we always get when we introduce data mesh is how to enforce governance in a federated model. So let me bring up a more detailed slide, Zhamak, with the dependencies, and ask you to comment, please. >> Sure. As you said, really the root cause we're trying to address is the siloing of the data external to where the action happens, where the data gets produced, where the data needs to be shared, where the data gets used, right, in the context of the business. So really the root cause of the centralization gets addressed by distribution of the accountability, end to end, back to the domains. And this distribution of technical accountability to the domains has already happened in the last, you know, decade or so. We saw the transition from, you know, one general IT addressing all of the needs of the organization to technology groups within IT or
even outside of IT aligning themselves to build applications and services that the different business units need. So what data mesh does is just extend that model and say, okay, we're aligning business with the tech and data now, right? So both application of the data in ML or insight generation in the domains, related to the domain's needs, as well as sharing the data that the domains are generating with the rest of the organization. But the moment you do that, then you have to solve other problems that may arise, and that, you know, gives birth to the second principle, which is about data as a product, as a way of preventing data siloing from happening within the domain. So it changes the focus of the domains that are now producing data from "I'm just going to create the data I collect for myself, and that satisfies my needs" to "in fact, the responsibility of the domain is to share the data as a product, with all of the, you know, wonderful characteristics that a product has." And I think that leads to really interesting architectural and technical implications of what actually constitutes data as a product, and we can have a separate conversation. But once you do that, then that's the point in the conversation where the CIO says, "Well, how do I even manage the cost of operation if I decentralize, you know, building and sharing data to my technical teams, to my application teams? Do I need to go and hire another hundred data engineers?" And I think that's the role of a self-serve data platform, in a way that it enables and empowers the generalist technologists that we already have in the technical domains, the majority population of our developers these days, right? So the data platform attempts to mobilize the generalist technologists to become data producers, to become data consumers, and really rethink what tools these people need. So the data platform is really about giving autonomy to domain teams and empowering them and reducing the cost of ownership of the data products. And finally,
as you mentioned, the question around how do I still assure that these different data products are interoperable, are secure, you know, respecting privacy, now in a decentralized fashion, right, when we are respecting the sovereignty or the domain ownership of each domain. And that leads to this idea, from the operational model, of applying some sort of a federation where the domain owners are accountable for interoperability of their data product. They have incentives that are aligned with global harmony of the data mesh. As well as, from the technology perspective, thinking about this data as a product with a new lens, with a lens that all of those policies that need to be respected by these data products, such as privacy, such as confidentiality, can we encode these policies as computational, executable units and encode them in every data product so that we get automation, we get governance through automation? So that's the relationship, the complex relationship, between the four principles. >> Yeah, thank you for that. I mean, just a couple of points. There's so many important points in there, but the idea of the silos and the data as a product sort of breaking down those silos, because if you have a product and you want to sell more of it, you make it discoverable. And, you know, as a P&L manager, you put it out there, you want to share it as opposed to hide it. And then, you know, this idea of managing the cost, number three, where people say, well, centralize and you can be more efficient, but that essentially was the failure. And your other related point is generalist versus specialist. That's kind of one of the failures of Hadoop: you had these hyper-specialist roles emerge, and so you couldn't scale. And so let's talk about the goals of data mesh for a moment. You've said that the objective is to exchange, you call it, a new unit of value between data producers and data consumers, and that unit of value is a data product.
And you've stated that a goal is to lower the cognitive load on our brains, I love this, and simplify the way in which data are presented to both producers and consumers, and doing so in a self-serve manner that eliminates the tapping on the shoulders or emails or raising tickets when, you know, I'm trying to understand how data should be used, et cetera. So please explain why this is so important and how you've seen organizations reduce the friction across the data flows and the interconnectedness of things like data products across the company. >> Yeah, I mean, this is important. As you mentioned, you know, initially, when this whole idea of data-driven innovation came to exist and we needed all sorts of, you know, technology stacks, we centralized creation of the data and usage of the data. And that's okay when you first get started, where the expertise and knowledge is not yet diffused and it's only, you know, the privilege of a very few people in the organization. But as we move to a data-driven innovation cycle in the organization, as we learn how data can unlock new programs, new models of experience, new products, then it's really, really important, as you mentioned, to get the consumers and producers to talk to each other directly without a broker in the middle. Because even though having that centralized broker could be a cost-effective model, if we include the cost of missed opportunity for something that we could have innovated, well, we missed that opportunity because of months of looking for the right data, then the cost-benefit parameters and formula change. So to have that data-driven innovation really embedded into every domain, every team, we need to enable a model where the producer can directly, peer to peer, discover the data, understand it and use it. So the litmus test for that would be going from, you know, a hypothesis that, you know, as a data scientist, I think there is a pattern
and there is an insight in, you know, the customer behavior, and that if I have access to all of the different information about the customer, all of the different touch points, I might be able to discover that pattern and personalize the experience of my customer. The litmus test is going from that hypothesis to finding all of the different sources, being able to understand them and being able to connect them, and then turning them into, you know, training of my machine learning model, and the rest is, I guess, known as an intelligent product. >> Got it, thank you. So, you know, a lot of what we do here in Breaking Analysis is we try to curate and then point people to new resources. So we will have some additional resources, because this is not superficial, what you and your colleagues in the community are creating. But I do want to, you know, curate some of the other material that you had. So if I bring up this next chart, the left-hand side is a curated description, both sides, of your observations of most of the monolithic data platforms. They're optimized for control. They serve a centralized team that has hyper-specialized roles, as we talked about. The operational stacks are running enterprise software, they're on Kubernetes, and the microservices are isolated from, let's say, the Spark clusters, you know, which are managing the analytical data, et cetera. Whereas the data mesh proposes much greater autonomy and the management of code and data pipelines and policy as independent entities versus a single unit. And you've made the point that we have to enable generalists, to borrow from so many other examples in the industry. So it's an architecture based on decentralized thinking that can really be applied to any domain, really domain agnostic in a way. >> Yes, and I think if I pick one key point from that diagram, or that comparison, it is that the data platforms, or the platform capabilities, need to present a continuous experience, from an application
developer building an application that generates some data, let's say I have an e-commerce application that generates some data, to the data product that now presents and shares that data as temporal, immutable facts that can be used for analytics, to the data scientist that uses that data to personalize the experience, to the deployment of that ML model now back to that e-commerce application. So if we really look at this continuous journey, the walls between these separate platforms that we have built need to come down. The platforms underneath that support the operational systems, versus supporting the data platforms, versus supporting the ML models, they need to kind of play really nicely together, because as a user I'll probably fall off the cliff every time I go through these stages of this value stream. So then the interoperability of our data solutions and operational solutions needs to increase drastically, because so far we've got away with running operational systems, an application, on one end of the organization, running data analytics in another, and building a spaghetti pipeline to, you know, connect them together. Neither of the ends are happy. I hear from data scientists, you know, data analysts pointing fingers at the application developer, saying, "You're not developing your database the right way," and the application developer pointing fingers back, saying, "My database is for running my application. It wasn't designed for sharing analytical data." So what data mesh, as a mesh, tries to do is bring these two worlds closer together, and then the platform itself has to come closer and turn into a continuous set of, you know, services and capabilities, as opposed to these disjointed, big, you know, isolated stacks. >> Very powerful observations there. So we want to dig a little bit deeper into the platform, Zhamak, and have you explain your thinking here, because everybody always goes to the platform: what do I do with the infrastructure, what do I do? So
you've stressed the importance of interfaces, the entries to and the exits from the platform, and you use a particular parlance to describe it. This chart shows what you call the planes, not layers, the planes of the platform. It's complicated, with a lot of connection points, so please explain these planes and how they fit together. >> Sure. I mean, there was a really good point that you started with: when we think about capabilities that enable the build of applications, the build of our data products, the build of analytical solutions, usually we jump too quickly to the deep end of the actual implementation of these technologies, right? Do I need to go buy a data catalog, or do I need some sort of a warehouse storage? What I'm trying to do is elevate us up and out, to force us to think about interfaces and APIs, the experiences that the platform needs to provide to run this secure, safe, trustworthy, performant mesh of data products. And if you focus on the interfaces, then the implementation underneath can swap out, right? You can swap one for the other over time. So that's the purpose of having those lollipops and emphasizing: okay, what is the interface that provides a certain capability, like the storage, like the data product life cycle management, and so on? The purpose of the planes (the mesh experience plane, the data product experience plane, the utility plane) is really giving us a language to classify different sets of interfaces and capabilities that play nicely together to provide that cohesive journey of a data product developer or data consumer. So then the three planes are really around: okay, at the bottom layer we have a lot of utilities. We have that Matt Turck, you know, kind of mad data tooling chart, so we have a lot of utilities right now. They do workflow management, they do data processing. You've got your Spark, Flink. You've got your storage, you've got your lake storage, you've got your
time series storage. You've got a lot of tooling at that level. But the layer that we need to imagine and build today (we don't buy it yet, as far as I know) is this layer that allows us to exchange that unit of value, right, to build and manage these data products. So the language and the APIs and interfaces of this data product experience plane are not "oh, I need this storage" or "I need that workflow processing." It's that I have a data product, and it needs to deliver certain types of data, so I need to be able to model my data. As part of this data product I need to write some processing code that keeps this data constantly alive, because it's receiving upstream, let's say, user interactions with a website and generating the profile of my user. So I need to be able to write that. I need to serve the data, I need to keep the data alive, and I need to provide a set of SLOs and guarantees for my data, and good documentation, so that someone who comes to the data product knows what's the cadence of refresh, what's the retention of the data, and a lot of other SLOs that I need to provide. And finally, I need to be able to enforce and guarantee certain policies in terms of access control, privacy, encryption, and so on. So as a data product developer, I just work with this unit, a complete, autonomous, self-contained unit, and the platform should give me ways of provisioning this unit and testing this unit and so on. That's why I emphasize the experience. And of course we're not dealing with one or two data products, we're dealing with a mesh of data products. So at the mesh-level experience we need a set of capabilities and interfaces to be able to search the mesh for the right data, to be able to explore the knowledge graph that emerges from this interconnection of data products, to be able to observe the mesh for any anomalies. Did we create one of these giant master data products that all the data goes into
and all the data comes out of? Have we found ourselves the bottlenecks? To be able to do those mesh-level capabilities, we need to have a certain level of APIs and interfaces. And once we decide what constitutes that, to satisfy this mesh experience, then we can step back and say, "Okay, now what sort of a tool do I need to build or buy to satisfy them?" And that is not what the data community, or the data part of our organizations, is used to. I think traditionally we're very comfortable with buying a tool and then changing the way we work to serve the tool, and this is slightly inverse to that model that we might be comfortable with. >> Right, and pragmatists, people who've implemented data mesh, will tell you they spent a lot of time on figuring out data as a product, and the definitions there, and the organizational part, getting domain experts to actually own the data. And they will tell you, look, the technology will come and go. And so, to your point, if you have those lollipops and those interfaces, you'll be able to evolve, because we know one thing's for sure in this business: technology is going to change. So you had some practical advice, and I wanted to discuss that for those that are thinking about data mesh. I scraped this slide from a presentation that you made, and by the way, we'll put links in there. Your colleague Emily, who I believe is a data scientist, had some really great points there as well that practitioners should dig into. But you made a couple of points that I'd like you to summarize, and to me the big takeaway was, it's not a one and done. This is not a 60-day project, it's a journey, and I know that's kind of cliche, but it's so very true here. >> Yes. These were a few starting points for people who are embarking on building or buying the platform that enables the mesh creation. So it was a bit of a focus on the platform
angle, and I think the first one is what we just discussed: instead of thinking about mechanisms that you're building, think about the experiences that you're enabling. Identify who are the people, what is the persona of the data scientist (I mean, data scientist has a wide range of personas), or the data product developer, the same. What is the persona I need to enable and empower today? What skill sets do they have? So think about experiences, not mechanisms. I think we are at this really magical point. I mean, how many times in our lifetime do we come across a complete blank, you know, kind of white space, to a degree, to innovate? So let's take that opportunity and use a bit of creativity while being pragmatic. Of course we need solutions today, or yesterday, but still think about the experiences, not the mechanisms that you need to buy. So that was the first step. And the nice thing about that is that there is an evolutionary, an iterative path to maturity of your data mesh. I mean, if you start with thinking about, okay, which are the initial use cases I need to enable, what are the data products that those use cases depend on that we need to unlock, and what is the persona or general skill set of my data product developer, what are the interfaces I need to enable, you can start with the simplest possible platform for your first two use cases, and then think about, okay, the next set of data developers, they have a different set of needs. Maybe today I just enable SQL-like querying of the data, tomorrow I enable the data scientists' file-based access to the data, the day after I enable the streaming aspect. So have this evolutionary path ahead of you, and don't think that you have to start with building out everything. I mean, one of the things we've done is taking this harvesting approach: we work collaboratively with those technical, cross-functional domains that are building the data products and
see how they are using those utilities, and we harvest what they are building as solutions for themselves back into the platform. But at the end of the day, we have to think about mobilization of the largest population of technologists we have. We have to think about diffusing the technology and making it available and accessible to the generalist technologists. And we've come a long way. We've gone through these sorts of paradigm shifts in terms of mobile development, in terms of functional programming, in terms of cloud operation. It's not that we're struggling with learning something new, but we have to learn something that works nicely with the rest of the tooling that we have in our toolbox right now. So again, put that generalist as one of your center personas. Not the only persona, of course: we will have specialists, of course we will always have data scientist specialists. But any problem that can be solved as a general engineering problem (and I think there are a lot of aspects of data mesh that can be just a simple engineering problem), let's just approach it that way and then create the tooling to empower those generalists. >> Great, thank you. So listen, I've been around a long time, and as an analyst I've seen many waves, and we often say language matters. I mean, I've seen it with the mainframe language: it was different than the PC language, which is different than internet, different than cloud, different than big data, et cetera, et cetera. And so we have to evolve our language. So I was going to throw a couple of things out here. I often say data is not the new oil, because data doesn't live by the laws of scarcity. We're not running out of data. But I get the analogy. It's powerful, it powered the industrial economy, but it's bigger than that. What do you feel, what do you think, when you hear "data is the new oil"? >> Yeah, I don't respond to those
data as the gold or oil or whatever scarce resource, because, as you said, it evokes a very different emotion. It doesn't evoke the emotion of "I want to use this, I want to utilize it." It feels like I need to hide it and collect it and keep it to myself and not share it with anyone. It doesn't evoke that emotion of sharing. I really do think that data (with a little asterisk, because I think the definition of data changes, and that's why I keep using the language of data product or data quantum) becomes the most important, essential element of existence of computation. What do I mean by that? I mean that a lot of applications that we have written so far are based on logic, imperative logic: if this happens, do that, else do the other. And we're moving to a world where those applications are generating data that we then look at, and the data that's generated becomes the source, the patterns that we can exploit to build our applications, as in, you know, curate the weekly playlist for Dave every Monday based on what he has listened to and what other people have listened to, based on his profile. So we're moving to a world that is not so much about applications using the data necessarily to run their businesses; data truly is the foundational building block for the applications of the future. And then I think in that we need to rethink the definition of the data, and maybe that's for a different conversation, but I really think we have to converge the processing and the data together, the substance and the processing together, to have a unit that is composable, reusable, trustworthy. And that's the idea behind the data product as an atomic unit of what we build future solutions from. >> Got it. Now, something else that I heard you say or read that really struck me, because it's another sort of often-stated phrase, which is "data is our most valuable asset," and you
push back a little bit on that when you hear people call data an asset. People have often said they think data should be, or will eventually be, listed as an asset on the balance sheet. And in hearing what you said, I thought about that. I said, well, you know, maybe data as a product, that's an income statement thing, that's generating revenue or it's cutting costs. It's not necessarily an asset, because I don't share my assets with people, I don't make them discoverable. Add some color to this discussion. >> I think it's actually interesting you mentioned that, because I read the new policy in China that CFOs actually have a line item around the data that they capture. We don't have to go to the political conversation around authoritarian collection of data and the power that that creates and the society that it leads to. But that big conversation aside, I think you're right. I mean, data as an asset generates a different behavior. It creates different performance metrics that we would measure. I mean, before the conversation around data mesh came to exist, we were measuring the success of our data teams by the terabytes of data they were collecting, by the thousands of tables that they had stamped as golden data. There's no direct line I can see between that and actually the value that data generated. So that's why I think it's rather harmful, because it leads to the wrong metrics to measure for success. But if you invert that to a bit of product thinking, or something that you share to delight the experience of users, your measures are very different. Your measures are the happiness of the user, the decreased lead time for them to actually use and get value out of it, the growth of the population of the users. So it evokes a very different kind of behavior and success metrics. I do say, if I may, that I'll probably
come back and regret the choice of word around "product" one day because of the monetization aspect of it. Maybe there is a better word to use, but that's the best I think we can use at this point in time. >> Why do you say that, Zhamak? Because it's too directly related to monetization, and that has a negative connotation, or it might not apply in things like healthcare? >> I think because if we want to take shortcuts (and I remember this conversation years back), people think that the reason to collect data or have data is so that we can sell it. You know, it's just the monetization of the data, and we have this idea of data marketplaces and so on. And I think that is actually the least valuable outcome that we can get from thinking about data as a product: that direct sale, an exchange of data as a monetary exchange of value. So I think that might redirect our attention away from something that really matters, which is enabling the use of data for ultimately generating value for people, for the customers, for the organizations, for the partners, as opposed to thinking about it as a unit of exchange for money. >> I love data as a product. I think your instinct was right on, and I'm glad you brought that up, because I think people misunderstood, in the last decade, data as selling data directly. But really what you're talking about is using data as an ingredient to actually build a product that has value, and value either generates revenue, cuts costs, or helps with a mission. Like, it could be saving lives, but in some way, for a commercial company, it's about the bottom line, and that's just the way it is. So I love data as a product. I think it's going to stick. So one of the other things that struck me in one of your webinars was one of the Q&A questions: "Can I finally get rid of my data warehouse?" So I want to talk about the data warehouse, the data lake, JPMC used that term, the data
lake, which some people don't like. I know John Furrier, my business partner, doesn't like that term. And the data hub. One of the things I've learned from observing your work is that whether it's a data lake, a data warehouse, a data hub, data whatever, it should be a discoverable node on the mesh. The technology really doesn't matter. What are your thoughts on that? >> Yeah, I think the real shift is from a centralized data warehouse to a data warehouse where it fits. So I think if you just cross out that "centralized" piece, we are all in agreement that data warehousing provides interesting capabilities that are still required, perhaps as an edge node of the mesh that is optimizing for certain queries, let's say financial reporting. We still want to direct a fair bit of data into a node that is just for that financial reporting, and it requires the precision and the speed of operation that the warehouse technology provides. So I think that technology definitely has a place. Where it falls apart is when you want to have a warehouse to rule all of your data and canonically model your data, because you have to put so much energy into trying to harness this model, and create these very complex and fragile snowflake schemas and so on, that that's all you do. You spend energy against the entropy of your organization to try to get your arms around this model, and the model is constantly out of step with what's happening in reality, because the reality of the business is moving faster than our ability to model everything into one canonical representation. I think that's the one we need to challenge, not necessarily the application of data warehousing on a node. >> I want to close by coming back to the issue of standards. You've specifically envisioned data mesh to be technology agnostic, as I said before, and of course everyone,
myself included, is going to run a vendor's technology platform through a data mesh filter. The reality is, per the Matt Turck chart we showed earlier, there are lots of technologies that can be nodes within the data mesh, or facilitate data sharing or governance, et cetera, but there's clearly a lack of standardization. I'm sometimes skeptical that the vendor community will drive this, but maybe, like with Kubernetes, Google or some other internet giant is going to contribute something to open source that addresses this problem. But talk a little bit more about your thoughts on standardization. What kinds of standards are needed, and where do you think they'll come from? >> Sure. I mean, you're right that the vendors are not incentivized today to create those open standards, because some vendors' operational model (not all of them) is about "bring your data to my platform, and then bring your computation to me, and all will be great." And that will be great for a portion of the clients and a portion of environments where the complexity we're talking about doesn't exist. So yes, we need other players, perhaps some of the cloud providers, or people that are more incentivized to open their platform in a way for data sharing. So as a starting point, I think standardization around data sharing. If you look at the spectrum right now, we have a de facto standard (it's not even a standard) for something like SQL. I mean, everybody has bastardized SQL and extended it with so many things that I don't even know what standard SQL is anymore. But we have that for some form of querying. Beyond that, I know, for example, folks at Databricks have started to create some standards around Delta Sharing and sharing the data in different models. So I think of data sharing as a concept the same way that APIs were about capability sharing. We need to have the data APIs, or analytical data APIs, and data sharing extended to go beyond simply SQL or languages
like that. I think we need standards around computational policies. This is again something that is formulating in the operational world. We have a few standards around how you articulate access control, how you identify the agents who are trying to access, with different authentication mechanisms. We need to bring some of those to our own data-specific articulation of policies. Something as simple as identity management across different technologies is non-existent. So if you want to secure your data across three different technologies, there is no common way of saying who the agent is that is acting to access the data, and whether I can authenticate and authorize them. So those are some of the very basic building blocks. And then the gravy on top would be new standards around enriched semantic modeling of the data, so we have a common language to describe the semantics of the data in different nodes and then the relationships between them. We have prior work with RDF and folks that were focused on, I guess, linking data across the web, with the data web work that we had in the past. We need to revisit those and see their practicality in the enterprise context. So data modeling, a rich language for data semantic modeling, and data connectivity. Most importantly, I think those are some of the items on my wish list. >> That's good. Well, we'll do our part to try to push that standards movement. Zhamak, we're going to leave it there. I'm so grateful to have you come on to theCUBE. Really appreciate your time. It's just always a pleasure. You're such a clear thinker, so thanks again. >> Thank you, Dave. It's wonderful to be here. >> Now, we're going to post a number of links to some of the great work that Zhamak and her team have done, and her books, so check that out. Remember, we publish each week on siliconangle.com and wikibon.com, and these episodes are all available as podcasts
wherever you listen. Just search "Breaking Analysis podcast." Don't forget to check out etr.plus for all the survey data. Do keep in touch: I'm @dvellante, follow Zhamak @zhamakd, or you can email me at david.vellante@siliconangle.com, or comment on the LinkedIn post. This is Dave Vellante for theCUBE Insights powered by ETR. Be well, and we'll see you next time.

Published Date : Oct 25 2021


Breaking Analysis: How JPMC is Implementing a Data Mesh Architecture on the AWS Cloud

>> From theCUBE studios in Palo Alto and Boston, bringing you data-driven insights from theCUBE and ETR. This is Breaking Analysis with Dave Vellante. >> A new era of data is upon us, and we're in a state of transition. You know, even our language reflects that. We rarely use the phrase big data anymore, rather we talk about digital transformation or digital business, or data-driven companies. Many have come to the realization that data is not the new oil, because unlike oil, the same data can be used over and over for different purposes. We still use terms like data as an asset. However, that same narrative, when it's put forth by the vendor and practitioner communities, includes further discussions about democratizing and sharing data. Let me ask you this, when was the last time you wanted to share your financial assets with your coworkers or your partners or your customers? Hello everyone, and welcome to this week's Wikibon Cube Insights powered by ETR. In this Breaking Analysis, we want to share our assessment of the state of the data business. We'll do so by looking at the data mesh concept and how a leading financial institution, JP Morgan Chase, is practically applying these relatively new ideas to transform its data architecture. Let's start by looking at what the data mesh is. As we've previously reported many times, data mesh is a concept and set of principles that was introduced in 2018 by Zhamak Dehghani, who's director of technology at ThoughtWorks, a global consultancy and software development company. And she created this movement because her clients, who were some of the leading firms in the world, had invested heavily in predominantly monolithic data architectures that had failed to deliver desired outcomes in ROI. So her work went deep into trying to understand that problem. 
And her main conclusion that came out of this effort was that the world of data is distributed, and shoving all the data into a single monolithic architecture is an approach that fundamentally limits agility and scale. Now a profound concept of data mesh is the idea that data architectures should be organized around business lines with domain context. That the highly technical and hyper specialized roles of a centralized cross functional team are a key blocker to achieving our data aspirations. This is the first of four high level principles of data mesh. So first again, that the business domain should own the data end-to-end, rather than have it go through a centralized big data technical team. Second, a self-service platform is fundamental to a successful architectural approach where data is discoverable and shareable across an organization and an ecosystem. Third, product thinking is central to the idea of data mesh. In other words, data products will power the next era of data success. And fourth, data products must be built with governance and compliance that is automated and federated. Now there's a lot more to this concept and there are tons of resources on the web to learn more, including an entire community that has formed around data mesh. But this should give you a basic idea. Now, the other point is that, in observing Zhamak Dehghani's work, she has deliberately avoided discussions around specific tooling, which I think has frustrated some folks because we all like to have references that tie to products and tools and companies. So this has been a two-edged sword in that, on the one hand it's good, because data mesh is designed to be tool agnostic and technology agnostic. On the other hand, it's led some folks to take liberties with the term data mesh and claim mission accomplished when their solution, you know, may be more marketing than reality. So let's look at JP Morgan Chase in their data mesh journey. 
That's why I got really excited when I saw that, this past week, a team from JPMC held a meetup to discuss what they called data lake strategy via data mesh architecture. I saw that title, I thought, well, that's a weird title. And I wondered, are they just taking their legacy data lakes and claiming they're now transformed into a data mesh? But in listening to the presentation, which was over an hour long, the answer is a definitive no, not at all in my opinion. A gentleman named Scott Hollerman organized the session that comprised these three speakers here, James Reid, who's a divisional CIO at JPMC, Arup Nanda who is a technologist and architect and Serita Bakst who is an information architect, again, all from JPMC. This was the most detailed and practical discussion that I've seen to date about implementing a data mesh. And this is JP Morgan's approach, and we know they're extremely savvy and technically sound. And they've invested, it has to be billions in the past decade, on data architecture across their massive company. And rather than dwell on the downsides of their big data past, I was really pleased to see how they're evolving their approach and embracing new thinking around data mesh. So today, we're going to share some of the slides that they used and comment on how it dovetails into the concept of data mesh that Zhamak Dehghani has been promoting, at least as we understand it. And dig a bit into some of the tooling that is being used by JP Morgan, particularly around its AWS cloud. So the first point is it's all about business value. JPMC, they're in the money business, and in that world, business value is everything. So JR Reid, the CIO, showed this slide and talked about their overall goals, which centered on a cloud first strategy to modernize the JPMC platform. I think it's simple and sensible, but there are three factors on which he focused. Cut costs, always, sure, you've got to do that. 
Number two was about unlocking new opportunities, or accelerating time to value. But I was really happy to see number three, data reuse. That's a fundamental value ingredient in the slide that he's presenting here. And his commentary was all about aligning with the domains and maximizing data reuse, i.e. data is not like oil, and making sure there's appropriate governance around that. Now don't get caught up in the term data lake, I think it's just how JP Morgan communicates internally. It's invested in the data lake concept, so they use water analogies. They use things like data puddles, for example, which are single project data marts, or data ponds, which comprise multiple data puddles. And these can feed into data lakes. And as we'll see, JPMC doesn't strive to have a single version of the truth from a data standpoint that resides in a monolithic data lake, rather it enables the business lines to create and own their own data lakes that comprise fit for purpose data products. And they do have a single truth of metadata. Okay, we'll get to that. But generally speaking, each of the domains will own end-to-end their own data and be responsible for those data products, we'll talk about that more. Now the genesis of this was sort of a cloud first platform, JPMC is leaning into public cloud, which is ironic since, in the early days of cloud, all the financial institutions were like never. Anyway, JPMC is going hard after it, they're adopting agile methods and microservices architectures, and it sees cloud as a fundamental enabler, but it recognizes that on-prem data must be part of the data mesh equation. Here's a slide that starts to get into some of that generic tooling, and then we'll go deeper. And I want to make a couple of points here that tie back to Zhamak Dehghani's original concept. The first is that unlike many data architectures, this puts data as products right in the fat middle of the chart. 
The data products live in the business domains and are at the heart of the architecture. The databases, the Hadoop clusters, the files and APIs on the left-hand side, they serve the data product builders. The specialized roles on the right hand side, the DBAs, the data engineers, the data scientists, the data analysts, we could have put in quality engineers, et cetera, they serve the data products. Because the data products are owned by the business, they inherently have the context that is the middle of this diagram. And you can see at the bottom of the slide, the key principles include domain thinking and end-to-end ownership of the data products. They build it, they own it, they run it, they manage it. At the same time, the goal is to democratize data with self-service as a platform. One of the biggest points of contention of data mesh is governance. And as Serita Bakst said on the meetup, metadata is your friend, and she kind of made a joke, she said, "This sounds kind of geeky, but it's important to have a metadata catalog to understand where data resides and the data lineage in overall change management." So to me, this really passed the data mesh stink test pretty well. Let's look at data as products. CIO Reid said the most difficult thing for JPMC was getting their heads around data product, and they spent a lot of time getting this concept to work. Here's the slide they use to describe their data products as it related to their specific industry. They said a common language and taxonomy is very important, and you can imagine how difficult that was. He said, for example, it took a lot of discussion and debate to define what a transaction was. But you can see at a high level these three product groups, around wholesale credit risk, party, and trade and position data as products, and each of these can have sub products. Like, party will have know your customer, KYC, for example. 
So a key for JPMC was to start at a high level and iterate to get more granular over time. So lots of decisions had to be made around who owns the products and the sub-products. The product owners, interestingly, had to defend why that product should even exist, what boundaries should be in place and what data sets do and don't belong in the various products. And this was a collaborative discussion, I'm sure there was contention around that between the lines of business. And which sub-products should be part of these circles? They didn't say this, but tying it back to data mesh, each of these products, whether in a data lake or a data hub or a data pond or data warehouse, data puddle, each of these is a node in the global data mesh that is discoverable and governed. And supporting this notion, Serita said that, "This should not be infrastructure-bound, logically, any of these data products, whether on-prem or in the cloud can connect via the data mesh." So again, I felt like this really stayed true to the data mesh concept. Well, let's look at some of the key technical considerations that JPM discussed in quite some detail. This chart here shows a diagram of how JP Morgan thinks about the problem, and some of the challenges they had to consider were how to write to various data stores. Can you, and how can you, move data from one data store to another? How can data be transformed? Where's the data located? Can the data be trusted? How can it be easily accessed? Who has the right to access that data? These are all problems that technology can help solve. And to address these issues, Arup Nanda explained that the heart of this slide is the data ingestor instead of ETL. All data producers and contributors, they send their data to the ingestor and the ingestor then registers the data so it's in the data catalog. It does a data quality check and it tracks the lineage. 
Then, data is sent to the router, which persists the data in the data store based on the best destination as informed by the registration. This is designed to be a flexible system. In other words, the data store for a data product is not fixed, it's determined at the point of inventory, and that allows changes to be easily made in one place. The router simply reads that optimal location and sends it to the appropriate data store. Now, you see the schema inferrer there; it's used when there is no clear schema on write. In this case, the data product is not allowed to be consumed until the schema is inferred, and so the data goes into a raw area, and the inferrer determines the schema and then updates the inventory system so that the data can be routed to the proper location and properly tracked. So that's some of the detail of how the sausage factory works in this particular use case, it was very interesting and informative. Now let's take a look at the specific implementation on AWS and dig into some of the tooling. As described in some detail by Arup Nanda, this diagram shows the reference architecture used by this group within JP Morgan, and it shows all the various AWS services and components that support their data mesh approach. So start with the authorization block right there underneath Kinesis. Lake Formation is the single point of entitlement and has a number of buckets including, you can see there, the raw area that we just talked about, a trusted bucket, a refined bucket, et cetera. Depending on the data characteristics at the data catalog registration block, where you see the Glue catalog, that determines in which bucket the router puts the data. 
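To make the ingestor, router and schema inferrer flow a bit more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not JPMC's implementation: the class and method names (`CatalogEntry`, `DataMeshPipeline`, `ingest`, `infer_schema`), the in-memory "stores" and the simple quality check are all hypothetical stand-ins for the catalog, buckets and checks described in the talk.

```python
from dataclasses import dataclass, field
from typing import Optional

RAW_AREA = "raw"  # hypothetical holding area for data with no known schema


@dataclass
class CatalogEntry:
    """One registered data product in the (hypothetical) catalog/inventory."""
    product: str
    destination: str               # best store, decided at registration time
    schema: Optional[dict] = None  # None means the schema still has to be inferred
    lineage: list = field(default_factory=list)

    @property
    def consumable(self) -> bool:
        # A product may only be consumed once its schema is known.
        return self.schema is not None


class DataMeshPipeline:
    """Toy ingestor + router + schema inferrer backed by in-memory stores."""

    def __init__(self):
        self.catalog = {}  # product name -> CatalogEntry
        self.stores = {}   # store name -> list of (product, record)

    def ingest(self, product, records, source, destination, schema=None):
        # Stand-in quality check: require a non-empty batch of dict records.
        if not records or any(not isinstance(r, dict) for r in records):
            raise ValueError("quality check failed")
        # Register the product (first time only) and track lineage.
        entry = self.catalog.setdefault(
            product, CatalogEntry(product, destination, schema)
        )
        entry.lineage.append(source)
        self._route(entry, records)
        return entry

    def _route(self, entry, records):
        # The router only reads the destination recorded in the catalog.
        target = entry.destination if entry.consumable else RAW_AREA
        self.stores.setdefault(target, []).extend(
            (entry.product, r) for r in records
        )

    def infer_schema(self, product):
        # Infer field types from the raw records, update the catalog,
        # then move the records to their registered destination.
        entry = self.catalog[product]
        raw = self.stores.get(RAW_AREA, [])
        mine = [r for p, r in raw if p == product]
        schema = {}
        for record in mine:
            for key, value in record.items():
                schema.setdefault(key, type(value).__name__)
        entry.schema = schema
        self.stores[RAW_AREA] = [(p, r) for p, r in raw if p != product]
        self._route(entry, mine)
        return schema
```

The point of the design is the one made above: the destination lives in the catalog entry, so changing where a product is stored is a change in one place, and the router itself stays simple.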
And you can see the many AWS services in use here: identity, EMR, the Elastic MapReduce cluster from the legacy Hadoop work done over the years, Redshift Spectrum and Athena. JPMC uses Athena for single-threaded workloads and Redshift Spectrum for nested types so they can be queried independent of each other. Now remember, very importantly, in this use case there is not a single Lake Formation; rather, multiple lines of business will be authorized to create their own lakes, and that creates a challenge. So how can that be done in a flexible and automated manner? And that's where the data mesh comes into play. So JPMC came up with this federated Lake Formation accounts idea, and each line of business can create as many data producer or consumer accounts as they desire and roll them up into their master line of business Lake Formation account. And they cross-connect these data products in a federated model. And these all roll up into a master Glue catalog so that any authorized user can find out where a specific data element is located. So this is like a superset catalog that comprises multiple sources and syncs up across the data mesh. So again, to me, this was a very well thought out and practical application of data mesh. Yes, it includes some notion of centralized management, but much of that responsibility has been passed down to the lines of business. It does roll up to a master catalog, but that's a metadata management effort that seems compulsory to ensure federated and automated governance. As well, at JPMC, the office of the chief data officer is responsible for ensuring governance and compliance throughout the federation. All right, so let's take a look at some of the suspects in this world of data mesh and bring in the ETR data. Now, of course, ETR doesn't have a data mesh category, there's no such thing as a data mesh vendor, you build a data mesh, you don't buy it. 
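The federated catalog roll-up described a moment ago can be sketched in a few lines of Python. This is a toy model under stated assumptions: `LineOfBusinessCatalog`, `MasterCatalog`, `sync`, `grant` and `locate` are hypothetical names, the entitlement check is a simple set lookup, and none of this uses or represents the actual AWS Lake Formation or Glue APIs.

```python
class LineOfBusinessCatalog:
    """Hypothetical per-line-of-business catalog of data product locations."""

    def __init__(self, name):
        self.name = name
        self.products = {}  # product name -> location (e.g. a bucket path)

    def publish(self, product, location):
        self.products[product] = location


class MasterCatalog:
    """Hypothetical superset catalog: metadata only, the data stays with each line of business."""

    def __init__(self):
        self.index = {}         # product name -> (line of business, location)
        self.entitlements = {}  # user -> set of product names they may look up

    def sync(self, lob):
        # Roll one line-of-business catalog up into the master index.
        for product, location in lob.products.items():
            self.index[product] = (lob.name, location)

    def grant(self, user, product):
        self.entitlements.setdefault(user, set()).add(product)

    def locate(self, user, product):
        # Only authorized users can find out where a data element lives.
        if product not in self.entitlements.get(user, set()):
            raise PermissionError(f"{user} is not entitled to {product}")
        return self.index[product]
```

A caller would publish products in each line-of-business catalog, call `sync` for each one, and then use `locate` to look up where a product lives. Ownership stays federated while discovery is centralized, which is the trade-off discussed above.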
So, what we did is we used the ETR dataset to select and filter on some of the culprits that we thought might contribute to the data mesh to see how they're performing. This chart depicts a popular view that we often like to share. It's a two-dimensional graphic with net score, or spending momentum, on the vertical axis and market share, or pervasiveness in the dataset, on the horizontal axis. And we filtered the data on sectors such as analytics, data warehouse, and the adjacencies to things that might fit into data mesh. And we think that these pretty well reflect participation in data mesh, though it's certainly not all-encompassing. And it's a subset, obviously, of all the vendors who could play in the space. Let's make a few observations. Now as is often the case, Azure and AWS, they're almost literally off the charts with very high spending velocity and large presence in the market. Oracle, you can see, also stands out because much of the world's data lives inside of Oracle databases. It doesn't have the spending momentum or growth, but the company remains prominent. And you can see Google Cloud doesn't have nearly the presence in the dataset, but its momentum is highly elevated. Remember that red dotted line there, that 40% line, anything over that indicates elevated spending momentum. Let's go to Snowflake. Snowflake is consistently shown to be the gold standard in net score in the ETR dataset. It continues to maintain highly elevated spending velocity in the data. And in many ways, Snowflake, with its data marketplace and its data cloud vision and data sharing approach, fits nicely into the data mesh concept. Now, a caution, Snowflake has used the term data mesh in its marketing, but in our view, it lacks clarity, and we feel like they're still trying to figure out how to communicate what that really is. But there is really, we think, a lot of potential there to that vision. 
Databricks is also interesting because the firm has momentum and we expect further elevated levels on the vertical axis in upcoming surveys, especially as it readies for its IPO. The firm has a strong product and managed service, and is really one to watch. Now we included a number of other database companies for obvious reasons, like Redis and Mongo, MariaDB, Couchbase and Teradata. SAP as well is in there, but that's not all database, but SAP is prominent so we included them. As is IBM, more of a traditional database player, also with a big presence. Cloudera includes Hortonworks, and HPE Ezmeral comprises the MapR business that HPE acquired. So these guys got the big data movement started, between Cloudera, Hortonworks, which was born out of Yahoo, which was the early big data, sorry, early Hadoop innovator, and MapR, which kind of went its own course, and now that's all kind of come together in various forms. And of course, we've got Talend and Informatica there, they are two data integration companies that are worth noting. We also included some of the AI and ML specialists and data science players in the mix, like DataRobot, who just did a monster $250 million round, Dataiku, H2O.ai and ThoughtSpot, which is all about democratizing data and injecting AI, and I think fits well into the data mesh concept. And you know, we put VMware Cloud in there for reference because it really is the predominant on-prem infrastructure platform. All right, let's wrap with some final thoughts here. First, thanks a lot to the JP Morgan team for sharing this data. I really want to encourage practitioners and technologists, go watch the YouTube of that meetup, we'll include it in the link of this session. And thank you to Zhamak Dehghani and the entire data mesh community for the outstanding work that you're doing, challenging the established conventions of monolithic data architectures. 
The JPM presentation, it gives you real credibility, it takes data mesh well beyond concept, it demonstrates how it can be and is being done. And you know, this is not a perfect world, you're going to start somewhere and there's going to be some failures. The key is to recognize that shoving everything into a monolithic data architecture won't support the massive scale and agility that you're after. It's maybe fine for smaller use cases in smaller firms, but if you're building a global platform in a data business, it's time to rethink data architecture. Now much of this is enabled by the cloud, but cloud first doesn't mean cloud only, doesn't mean you'll leave your on-prem data behind. On the contrary, you have to include non-public cloud data in your data mesh vision just as JPMC has done. You've got to get some quick wins, that's crucial so you can gain credibility within the organization and grow. And one of the key takeaways from the JP Morgan team is, there is a place for dogma, like organizing around data products and domains and getting that right. On the other hand, you have to remain flexible because technology is going to come, technology is going to go, so you've got to be flexible in that regard. And look, if you're going to embrace the metaphor of water, like puddles and ponds and lakes, we suggest, maybe a little tongue in cheek, but still we believe in this, that you expand your scope to include data ocean, something John Furrier and I have talked about and laughed about extensively in theCUBE. Data oceans, it's huge. It's the new data lake, go transcend data lake, think oceans. And think about this, just as we're evolving our language, we should be evolving our metrics. Much of the last decade of big data was around just getting the stuff to work, getting it up and running, standing up infrastructure and managing massive, how much data you got? Massive amounts of data. 
And there were many KPIs built around, again, standing up that infrastructure, ingesting data, a lot of technical KPIs. This decade is not just about enabling better insights, it's more than that. Data mesh points us to a new era of data value, and that requires new metrics around monetizing data products, like how long does it take to go from data product conception to monetization? And how does that compare to what it is today? And what is the time to quality? If the business owns the data, and the business has the context, the quality that comes out of them, out of the chute, should be at a basic level pretty good, and at a higher mark than out of a big data team with no business context. Automation, AI, and very importantly, organizational restructuring of our data teams will heavily contribute to success in the coming years. So we encourage you, learn, lean in and create your data future. Okay, that's it for now. Remember these episodes, they're all available as podcasts wherever you listen, all you've got to do is search "Breaking Analysis podcast", and please subscribe. Check out ETR's website at etr.plus for all the data and all the survey information. We publish a full report every week on wikibon.com and siliconangle.com. And you can get in touch with us, email me david.vellante@siliconangle.com, you can DM me @dvellante, or you can comment on my LinkedIn posts. This is Dave Vellante for theCUBE insights powered by ETR. Have a great week everybody, stay safe, be well, and we'll see you next time. (upbeat music)

Published Date : Jul 12 2021

SUMMARY :

This is Breaking Analysis and the adjacencies to things

ENTITIES

Entity	Category	Confidence
JPMC	ORGANIZATION	0.99+
Dave Vellante	PERSON	0.99+
2018	DATE	0.99+
Zhamak Dehghani	PERSON	0.99+
James Reid	PERSON	0.99+
JP Morgan	ORGANIZATION	0.99+
Cloudera	ORGANIZATION	0.99+
Serita Bakst	PERSON	0.99+
IBM	ORGANIZATION	0.99+
HPE	ORGANIZATION	0.99+
AWS	ORGANIZATION	0.99+
Scott Hollerman	PERSON	0.99+
Hortonworks	ORGANIZATION	0.99+
Boston	LOCATION	0.99+
40%	QUANTITY	0.99+
JP Morgan Chase	ORGANIZATION	0.99+
Serita	PERSON	0.99+
Yahoo	ORGANIZATION	0.99+
Arup Nanda	PERSON	0.99+
each	QUANTITY	0.99+
ThoughtWorks	ORGANIZATION	0.99+
first	QUANTITY	0.99+
Oracle	ORGANIZATION	0.99+
Palo Alto	LOCATION	0.99+
david.vellante@siliconangle.com	OTHER	0.99+
each line	QUANTITY	0.99+
Teradata	ORGANIZATION	0.99+
Redis	ORGANIZATION	0.99+
$250 million	QUANTITY	0.99+
first point	QUANTITY	0.99+
three factors	QUANTITY	0.99+
Second	QUANTITY	0.99+
MapR	ORGANIZATION	0.99+
today	DATE	0.99+
Informatica	ORGANIZATION	0.99+
Talend	ORGANIZATION	0.99+
John Furrier	PERSON	0.99+
first platform	QUANTITY	0.98+
YouTube	ORGANIZATION	0.98+
fourth	QUANTITY	0.98+
single	QUANTITY	0.98+
One	QUANTITY	0.98+
Third	QUANTITY	0.97+
Couchbase	ORGANIZATION	0.97+
three speakers	QUANTITY	0.97+
two data	QUANTITY	0.97+
first strategy	QUANTITY	0.96+
one	QUANTITY	0.96+
one place	QUANTITY	0.96+
Jr Reid	PERSON	0.96+
single lake	QUANTITY	0.95+
SAP	ORGANIZATION	0.95+
wikibon.com	OTHER	0.95+
siliconangle.com	OTHER	0.94+
Azure	ORGANIZATION	0.93+

Welcome to Supercloud2

(bright upbeat melody) >> Hello everyone, welcome back to Supercloud2. I'm John Furrier, my co-host Dave Vellante, here at theCUBE in Palo Alto, California, for our live stage performance all day for Supercloud2. Unpacking this next generation movement in cloud computing. Dave, Supercloud1 was in August. We had great response and acceleration of that momentum. We had some haters too. We had some folks out there throwing shade on this. But at the same time, a lot of leaders came out of the woodwork, a lot of practitioners. And this Supercloud2 event I think will expose and illustrate some of the examples of what's happening in the industry and more importantly, kind of where it's going. >> Well it's great to be back in our studios in Palo Alto, John. Seems like just yesterday was August 9th, where the community was really refining the definition of Super Cloud. We were identifying the essential characteristics, with some of the leading technologists in Silicon Valley. We were digging into the deployment models. Whereas this Supercloud, Supercloud2 is really taking a practitioner view. We're going to hear from Walmart today. They've built a Supercloud. They called it the Walmart Cloud native platform. We're going to hear from other data practitioners, like Saks. We're going to hear from Western Union. They've got 200 locations around the world, how they're dealing with data sovereignty. And of course we've got some local technologists and practitioners coming in, analysts, consultants, theCUBE community. I'm really excited to be here. >> And we've got some great keynotes from executives at VMware. We're going to expose some of the things that they're working on around cross cloud services, which leads into multicloud. I think the practitioner angle highlights my favorite part of this program, 'cause you're starting to see the builders, a term coined by Andy Jassy, early days of AWS. That builder movement has been continuing to go. 
And you're seeing the enterprise, global enterprises adopt this builder mentality with Cloud Native. This is going to power the next generation global economy. And I think the role of the cloud computing vendors like AWS, Azure, Google, Alibaba are going to be the source engine of innovation. And what gets built on top of and with the clouds will be a big significant market value for all businesses and their business models. So I think the market wants the supercloud, the business models are pointing to Supercloud. The technology needs supercloud. And society, from an economic standpoint and from a use case standpoint, needs supercloud. You're seeing it today. Everyone's talking about chat GPT. This is an example of what will come out of this next generation and it's just getting started. So to me, you're either on the supercloud side of the camp or you're on the old school, hugging onto the old school mentality of wait a minute, that's cloud computing. So I think if you're not on the super cloud wave, you're going to be driftwood. And that's a term coined by Pat Gelsinger. And this is really the reality. Are you on the super cloud side? Or are you on the old huggin' the old model? And that's going to be a determinant. And you're going to see who's going to be the players on that, Dave. This is going to be a real big year. >> Everybody's heard the phrase follow the money. Well, my philosophy is follow the data. And that's a big part of what Supercloud2 is, because the data is where the money is across the clouds. And people want more simplicity, or greater simplicity across the clouds. So it's really, there's two forces here. You've got the ecosystem that's saying, hey the hyperscalers, they've done a great job but there's problems that they're not solving. So we're going to lean in and solve those problems. At the same time, you have the practitioners saying we have multicloud, we have to deal with this, help us. It's got to be simpler. 
Because we want to share data across clouds. We want to build data products, we want to monetize and drive revenue and cut costs. >> This is the key thing. The builder movement is hitting a wall, and that wall will be broken down because the business models of the companies themselves are demanding that the value from the data with security has to be embedded. So I think you're going to see a big year this next year or so where the builders will accelerate through this next generation, supercloud wave, will be a builder's wave for business. And I think that's going to be the nuance here. And all the people that are on the side of Supercloud are all pro-business, pro-technology. The ones that aren't are like, wait a minute I used to do things differently. They're stuck. And so I think this is going to be a question of are we stuck? Are builders accelerating? Will the business models develop around it? That's digital transformation. At the end of the day, the market's speaking, Dave. The market wants more. Chat GPT, you're seeing AI starting to flourish, powered by data. It's unstoppable, supercloud's unstoppable. >> One of our headliners today is Zhamak Dehghani, the creator of Data Mesh. We've got some news around her. She's going to be live in studio. Super excited about that. Kit Colbert in Supercloud, the first Supercloud in last August, laid out an initial architecture for Supercloud. He's going to advance that today, tell us what's changed, and really dig into and really talk about the meat on the bone, if you will. And we've got some other technologists that are coming in saying, Hey, is it a platform? Is it an architecture? What's the right model here? So we're going to debate that a little bit today. >> And before we close, I'll just say look at the guests, look at the talk tracks. 
You're seeing a diversity of startups doing cloud networking, you're seeing big practitioners building their own thing, being builders for business value and business model advantages. And you've got companies like VMware, who have been on the wave of virtualization. So everyone who's involved in supercloud, they're seeing it, they're on the front lines. They're seeing the trend. They are riding that wave. And they're bringing data to the table. So to me, you look at who's involved and you judge it that way. To me, that's the way I look at this. And because we're making it open, Supercloud is going to continue to be debated. But more importantly, the results are going to come in. The market supports it, the business needs it, tech's there, and will it happen? So I think the builders movement, Dave, is going to be big to watch. And then ultimately how that business transformation kicks in, and I think those are the two variables that I would watch on Supercloud. >> Our mission has always been around free content, giving back to the community. So I really want to thank our sponsors today. We've had a great partnership with VMware, who's not only contributed some financial support, but also great content. Alkira, ChaosSearch, Prosimo, all phenomenal, allowing us to achieve our mission of serving our audiences and really trying to give more than we take. >> Free content, that's our mission. Dave, great to kick it off. Kickin' off Supercloud2 all day, we've got some great programs here. We've got VMware coming up next. We have Vittorio Viarengo, who's been on before. He's got a great vision for cross-cloud services. We're also getting a keynote with Kit Colbert, who's going to lay out the fragmentation and the benefits that come from solving fragmentation and silos, breaking down the silos and bringing a multicloud future to the table via Supercloud. So stay with us. We'll be right back after this short break. (bright upbeat music) (music fades)

Published Date : Feb 17 2023

SUMMARY :

and illustrate some of the examples We're going to hear from Walmart today. And that's going to be a determinant. At the same time, you And so I think this is going to the meat on the bone, if you will. Dave, is going to be big to watch. giving back to the community. and the benefits that that solves,

ENTITIES

Entity	Category	Confidence
Dave Vellante	PERSON	0.99+
Dave	PERSON	0.99+
Pat Gelsinger	PERSON	0.99+
Alibaba	ORGANIZATION	0.99+
Kit Colbert	PERSON	0.99+
Zhamak Dehghani	PERSON	0.99+
Walmart	ORGANIZATION	0.99+
AWS	ORGANIZATION	0.99+
Andy Jassy	PERSON	0.99+
Google	ORGANIZATION	0.99+
Silicon Valley	LOCATION	0.99+
August	DATE	0.99+
Victoria Viering	PERSON	0.99+
August 9th	DATE	0.99+
John Furrier	PERSON	0.99+
200 locations	QUANTITY	0.99+
VMware	ORGANIZATION	0.99+
Supercloud	ORGANIZATION	0.99+
Palo Alto, California	LOCATION	0.99+
Supercloud2	EVENT	0.99+
two forces	QUANTITY	0.99+
last August	DATE	0.99+
yesterday	DATE	0.99+
first	QUANTITY	0.99+
two variables	QUANTITY	0.99+
today	DATE	0.98+
One	QUANTITY	0.98+
supercloud	ORGANIZATION	0.98+
Azure	ORGANIZATION	0.97+
ChaosSearch	ORGANIZATION	0.95+
super cloud wave	EVENT	0.94+
Supercloud1	EVENT	0.94+
Super Cloud	TITLE	0.93+
Alkira	PERSON	0.83+
Palo Alto, John	LOCATION	0.83+
this next year	DATE	0.81+
Data Mesh	ORGANIZATION	0.8+
supercloud wave	EVENT	0.79+
wave of	EVENT	0.79+
Western Union	LOCATION	0.78+
Saks	ORGANIZATION	0.76+
GPT	ORGANIZATION	0.73+
Supercloud2	ORGANIZATION	0.72+
Cloud Native	TITLE	0.69+
Supercloud	TITLE	0.67+
Supercloud2	COMMERCIAL_ITEM	0.66+
multicloud	ORGANIZATION	0.57+
Supercloud	COMMERCIAL_ITEM	0.53+
Supercloud2	TITLE	0.53+
theCUBE	ORGANIZATION	0.51+
super cloud	TITLE	0.51+
Cloud	TITLE	0.41+

Breaking Analysis: Supercloud2 Explores Cloud Practitioner Realities & the Future of Data Apps

>> Narrator: From theCUBE Studios in Palo Alto and Boston bringing you data-driven insights from theCUBE and ETR. This is breaking analysis with Dave Vellante >> Enterprise tech practitioners, like most of us they want to make their lives easier so they can focus on delivering more value to their businesses. And to do so, they want to tap best of breed services in the public cloud, but at the same time connect their on-prem intellectual property to emerging applications which drive top line revenue and bottom line profits. But creating a consistent experience across clouds and on-prem estates has been an elusive capability for most organizations, forcing trade-offs and injecting friction into the system. The need to create seamless experiences is clear and the technology industry is starting to respond with platforms, architectures, and visions of what we've called the Supercloud. Hello and welcome to this week's Wikibon Cube Insights powered by ETR. In this breaking analysis we give you a preview of Supercloud 2, the second event of its kind that we've had on the topic. Yes, folks that's right Supercloud 2 is here. As of this recording, it's just about four days away 33 guests, 21 sessions, combining live discussions and fireside chats from theCUBE's Palo Alto Studio with prerecorded conversations on the future of cloud and data. You can register for free at supercloud.world. And we are super excited about the Supercloud 2 lineup of guests whereas Supercloud 22 in August, was all about refining the definition of Supercloud testing its technical feasibility and understanding various deployment models. Supercloud 2 features practitioners, technologists and analysts discussing what customers need with real-world examples of Supercloud and will expose thinking around a new breed of cross-cloud apps, data apps, if you will that change the way machines and humans interact with each other. 
Now the example we'd use, if you think about applications today, say a CRM system, sales reps, what are they doing? They're entering data into opportunities, they're choosing products, they're importing contacts, et cetera. And sure, the machine can then take all that data and spit out a forecast by rep, by region, by product, et cetera. But today's applications are largely about filling in forms and/or codifying processes. In the future, the Supercloud community sees a new breed of applications emerging where data resides on different clouds, in different data stores, databases, lakehouses, et cetera. And the machine uses AI to inspect the e-commerce system, the inventory data, supply chain information and other systems, and puts together a plan without any human intervention whatsoever. Think about a system that orchestrates people, places and things, like an Uber for business. So at Supercloud 2, you'll hear about this vision along with some of today's challenges facing practitioners. Zhamak Dehghani, the founder of Data Mesh, is a headliner. Kit Colbert also is headlining. He laid out at the first Supercloud an initial architecture for what that's going to look like. That was last August. And he's going to present his most current thinking on the topic. Veronika Durgin of Saks will be featured and talk about data sharing across clouds and, you know, what she needs in the future. One of the main highlights of Supercloud 2 is a dive into Walmart's Supercloud. Other featured practitioners include Western Union, Ionis Pharmaceuticals, Warner Media. We've got deep, deep technology dives with folks like Bob Muglia, David Flynn, Tristan Handy of DBT Labs, Nir Zuk, the founder of Palo Alto Networks, focused on security. Thomas Hazel, who's going to talk about a new type of database for Supercloud. There are several analysts, including Keith Townsend, Maribel Lopez, George Gilbert, Sanjeev Mohan, and so many more guests, we don't have time to list them all. 
They're all up on supercloud.world with a full agenda, so you can check that out. Now let's take a look at some of the things that we're exploring in more detail, starting with the Walmart Cloud Native Platform, they call it WCNP. We definitely see this as a Supercloud and we dig into it with Jack Greenfield. He's the head of architecture at Walmart. Here's a quote from Jack. "WCNP is an implementation of Kubernetes for the Walmart ecosystem. We've taken Kubernetes off the shelf as open source." By the way, they do the same thing with OpenStack. "And we have integrated it with a number of foundational services that provide other aspects of our computational environment. Kubernetes off the shelf doesn't do everything." And so what Walmart chose to do, they took a do-it-yourself approach to build a Supercloud for a variety of reasons that Jack will explain, along with Walmart's so-called triplet architecture connecting on-prem, Azure and GCP. No surprise, there's no Amazon at Walmart for obvious reasons. And what they do is they create a common experience for devs across clouds. Jack is going to talk about how Walmart is evolving its Supercloud in the future. You don't want to miss that. Now, next, let's take a look at how Veronika Durgin of Saks thinks about data sharing across clouds. Data sharing, we think, is a potential killer use case for Supercloud. In fact, let's hear it in Veronika's own words. Please play the clip. >> How do we talk to each other? And more importantly, how do we data share? You know, I work with data, you know, this is what I do. So if, you know, I want to get data from a company that's using, say, Google, how do we share it in a smooth way where it doesn't have to be this crazy, I don't know, SFTP file moving? So that's where I think Supercloud comes to me in my mind, is like practical applications. How do we create that mesh, that network that we can easily share data with each other? 
>> Now data mesh is a possible architectural approach that will enable more facile data sharing and the monetization of data products. You'll hear Zhamak Dehghani live in studio talking about what standards are missing to make this vision a reality across the Supercloud. Now one of the other things that we're really excited about is digging deeper into the right approach for Supercloud adoption. And we're going to share a preview of a debate that's going on right now in the community. Bob Muglia, former CEO of Snowflake and Microsoft exec, was kind enough to spend some time looking at the community's Supercloud definition and he felt that it needed to be simplified. So in near real time he came up with the following definition that we're showing here. I'll read it. "A Supercloud is a platform that provides programmatically consistent services hosted on heterogeneous cloud providers." So not only did Bob simplify the initial definition, he stressed that the Supercloud is a platform versus an architecture, implying that the platform provider, e.g., Snowflake, VMware, Databricks, Cohesity, et cetera, is responsible for determining the architecture. Now interestingly, in the shared Google doc that the working group uses to collaborate on the Supercloud definition, Dr. Nelu Mihai, who is actually building a Supercloud, responded as follows to Bob's assertion: "We need to avoid creating many Supercloud platforms with their own architectures. If we do that, then we create other proprietary clouds on top of existing ones. We need to define an architecture of how Supercloud interfaces with all other clouds. What is the information model? What is the execution model and how users will interact with Supercloud?" What does this seemingly nuanced point tell us and why does it matter? 
Well, history suggests that de facto standards will emerge more quickly to resolve real-world practitioner problems and catch on more quickly than consensus-based architectures and standards-based architectures. But in the long run, the latter may serve customers better. So we'll be exploring this topic in more detail in Supercloud 2, and of course we'd love to hear what you think: platform, architecture, both? Now one of the real technical gurus that we'll have in studio at Supercloud 2 is David Flynn. He's one of the people behind the movement that enabled enterprise flash adoption, that craze. And he did that with Fusion-io, and he is now working on a system to enable read-write data access to any user in any application in any data center or on any cloud anywhere. So think of this company as a Supercloud enabler. Allow me to share an excerpt from a conversation David Floyer and I had with David Flynn last year. He as well gave a lot of thought to the Supercloud definition and was really helpful with an opinionated point of view. He said something to us that was, we thought, relevant. "What is the operating system for a decentralized cloud? The main two functions of an operating system or an operating environment are, one, the process scheduler and, two, the file system. The strongest argument for Supercloud is made when you go down to the platform layer and talk about it as an operating environment on which you can run all forms of applications." So a couple of implications here that we'll be exploring with David Flynn in studio. First, we're inferring from his comment that he's in the platform camp, where the platform owner is responsible for the architecture, and there are obviously trade-offs there and benefits, but we'll have to clarify that with him. And second, he's basically saying, you kill the concept the further you move up the stack. So the further you move up the stack, the weaker the Supercloud argument becomes, because it's just becoming SaaS. 
Now, this is something we're going to explore to better understand his thinking on this, but also whether the existing notion of SaaS is changing and whether or not a new breed of Supercloud apps will emerge. Which brings us to this really interesting fellow that George Gilbert and I riffed with ahead of Supercloud 2. Tristan Handy, he's the founder and CEO of dbt Labs and he has a highly opinionated and technical mind. Here's what he said: "One of the things that we still don't know how to API-ify is concepts that live inside of your data warehouse, inside of your data lake. These are core concepts that the business should be able to create applications around very easily. In fact, that's not the case because it involves a lot of data engineering pipeline and other work to make these available. So if you really want to make it easy to create these data experiences for users, you need to have an ability to describe these metrics and then to turn them into APIs to make them accessible to application developers who have literally no idea how they're calculated behind the scenes, and they don't need to." A lot of implications to this statement that we'll explore at Supercloud 2. Zhamak Dehghani's data mesh comes into play here with her critique of hyper-specialized data pipeline experts with little or no domain knowledge. Also the need for simplified self-service infrastructure, which Kit Colbert is likely going to touch upon. Veronika Durgin of Saks and her ideal state for data sharing, along with Harveer Singh of Western Union. They've got to deal with 200 locations around the world and data privacy issues, data sovereignty: how do you share data safely? Same with Nick Taylor of Ionis Pharmaceutical. And not to blow your mind, but Thomas Hazel and Bob Muglia posit that to make data apps a reality across the Supercloud you have to rethink everything. You can't just let in-memory databases and caching architectures take care of everything in a brute force manner. 
Rather, you have to get down to really detailed levels, even things like how data is laid out on disk, i.e., flash, and think about rewriting applications for the Supercloud and the ML/AI era. All of this and more at Supercloud 2, which wouldn't be complete without some data. So we pinged our friends from ETR, Eric Bradley and Daren Brabham, to see if they had any data on Supercloud that we could tap. And so we're going to be analyzing a number of the players as well at Supercloud 2. Now, many of you are familiar with this graphic. Here we show some of the players involved in delivering or enabling Supercloud-like capabilities. On the Y axis is spending momentum and on the horizontal axis is market presence or pervasiveness in the data. So Net Score versus what they call Overlap, or N, in the data. And the table inset shows how the dots are plotted. Now, not to steal ETR's thunder, but the first point is you really can't have supercloud without the hyperscale cloud platforms, which is shown on this graphic. But the exciting aspect of Supercloud is the opportunity to build value on top of that hyperscale infrastructure. Snowflake here continues to show strong spending velocity, as do Databricks, Hashi, Rubrik. VMware Tanzu, which we all put under the magnifying glass after the Broadcom announcements, is also showing momentum. Unfortunately, due to a scheduling conflict, we weren't able to get Red Hat on the program, but they're clearly a player here. And we've put Cohesity and Veeam on the chart as well because backup is a likely use case across clouds and on-premises. And now one other callout that we drill down on at Supercloud 2 is Cloudflare, which actually uses the term supercloud, maybe in a different way. They look at Supercloud really as, you know, serverless on steroids. And so the data brains at ETR will have more to say on this topic at Supercloud 2, along with many others. Okay, so why should you attend Supercloud 2? What's in it for me kind of thing? 
So first of all, if you're a practitioner and you want to understand what the possibilities are for doing cross-cloud services, for monetizing data, how your peers are doing data sharing, how some of your peers are actually building out a Supercloud, you're going to get real-world input from practitioners. If you're a technologist and you're trying to figure out various ways to solve problems around data, data sharing, cross-cloud service deployment, there's going to be a number of deep technology experts that are going to share how they're doing it. We're also going to drill down with Walmart into a practical example of Supercloud, with some other examples of how practitioners are dealing with cross-cloud complexity. Some of them, by the way, have kind of thrown up their hands and are saying, hey, we're going mono cloud. And we'll talk about the potential implications and dangers and risks of doing that. And also some of the benefits. You know, there's a question, right? Is Supercloud the same wine in a new bottle, or is it truly something different that can drive substantive business value? So look, go to Supercloud.world. It's January 17th at 9:00 AM Pacific. You can register for free and participate directly in the program. Okay, that's a wrap. I want to give a shout out to the Supercloud supporters. VMware has been a great partner as our anchor sponsor, ChaosSearch, Prosimo, and Alkira as well, for contributing to the effort. I want to thank Alex Myerson, who's on production and manages the podcast. Ken Schiffman is his supporting cast as well. Kristen Martin and Cheryl Knight help get the word out on social media and in our newsletters. And Rob Hof is our editor-in-chief over at SiliconANGLE. Thank you all. Remember, these episodes are all available as podcasts wherever you listen. We really appreciate the support that you've given. We just saw some stats from Buzzsprout: we hit the top 25%, and we're almost at 400,000 downloads last year. So really appreciate your participation. 
All you got to do is search Breaking Analysis podcast and you'll find those. I publish each week on wikibon.com and siliconangle.com. Or if you want to get ahold of me, you can email me directly at David.Vellante@siliconangle.com or DM me @dvellante or comment on our LinkedIn post. I want you to check out etr.ai. They've got the best survey data in the enterprise tech business. This is Dave Vellante for theCUBE Insights, powered by ETR. Thanks for watching. We'll see you next week at Supercloud 2 or next time on Breaking Analysis. (light music)

Published Date : Jan 14 2023

Breaking Analysis: Grading our 2022 Enterprise Technology Predictions

>> From theCUBE Studios in Palo Alto and Boston, bringing you data-driven insights from theCUBE and ETR, this is Breaking Analysis with Dave Vellante. >> Making technology predictions in 2022 was tricky business, especially if you were projecting the performance of markets or identifying IPO prospects and making binary forecasts on data, AI, the macro spending climate, and other related topics in enterprise tech. 2022, of course, was characterized by a seesaw economy where central banks were restructuring their balance sheets. The war in Ukraine fueled inflation, supply chains were a mess, and the unintended consequences of the forced march to digital and the acceleration are still being sorted out. Hello and welcome to this week's Wikibon CUBE Insights, powered by ETR. In this Breaking Analysis, we continue our annual tradition of transparently grading last year's enterprise tech predictions. And you may or may not agree with our self-grading system, but look, we're gonna give you the data and you can draw your own conclusions and tell us what you think. >> All right, let's get right to it. So our first prediction was tech spending increases by 8% in 2022. And as we exited 2021, CIOs were optimistic about their digital transformation plans. You know, they rushed to make changes to their business and were eager to sharpen their focus and continue to iterate on their digital business models and plug the holes in the learnings that they had. And so we predicted that 8% rise in enterprise tech spending, which looked pretty good until Ukraine, and the Fed decided that, you know, it had to rush and make up for lost time. We kind of nailed the momentum in the energy sector, but we can't give ourselves too much credit for that layup. And as of October, Gartner had IT spending growing at just over 5%. I think it was 5.1%. So we're gonna take a C plus on this one and move on. >> Our next prediction was basically kind of a slow ground ball to second base, if I have to be honest, but we felt it was important to highlight that security would remain front and center as the number one priority for organizations in 2022. As is our tradition, you know, we try to up the degree of difficulty by specifically identifying companies that are gonna benefit from these trends. So we highlighted some possible IPO candidates, which of course didn't pan out. Snyk was on our radar. The company had just had to do another raise, and they recently took a valuation hit and it was a down round. They raised 196 million. So a good chunk of cash, but not the IPO that we had predicted. Aqua Security's focus on containers and cloud native, that was a trendy call, and we thought maybe an MSSP or multiple managed security service providers like Arctic Wolf would IPO, but no way that was happening in the crummy market. >> Nonetheless, we think these types of companies, they're still faring well, as the talent shortage in security remains really acute, particularly in the sort of mid-size and small businesses that often don't have a SOC. Lacework laid off 20% of its workforce in 2022. And co-CEO Dave Hatfield left the company. So that IPO didn't happen. It was probably too early for Lacework anyway. Meanwhile, you got Netskope, which we've cited as strong in the ETR data, particularly in the Emerging Technology Survey. And then, you know, Illumio holding its own. You know, we never liked that 7 billion price tag that Okta paid for Auth0, but we loved the TAM expansion strategy to target developers beyond sort of Okta's enterprise strength. But we gotta take some points off for the failure thus far of Okta to really nail the integration and the go-to-market model with Auth0 and, you know, bring that into the core Okta. 
>> So the focus on endpoint security, that was a winner in 2022, as CrowdStrike led that charge with others holding their own, not the least of which was Palo Alto Networks, as it continued to expand beyond its core network security and firewall business, you know, through acquisition. So overall we're gonna give ourselves an A minus for this relatively easy call, but again, we had some specifics associated with it to make it a little tougher. And of course we're watching very closely, this coming year in 2023, the vendor consolidation trend. You know, according to a recent Palo Alto Networks survey with 1300 SecOps pros, on average organizations have more than 30 security tools to manage. So this is a logical way to optimize cost: consolidating vendors and consolidating redundant vendors. The ETR data shows that's clearly a trend that's on the upswing. >> Now moving on, a big theme of 2020 and 2021 of course was remote work and hybrid work and new ways to work and return to work. So we predicted in 2022 that hybrid work models would become the dominant protocol, which clearly is the case. We predicted that about 33% of the workforce would come back to the office in 2022. In September, the ETR data showed that figure was at 29%, but organizations expected that 32% would be in the office, you know, pretty much full-time by year end. That hasn't quite happened, but we were pretty close with the projection, so we're gonna take an A minus on this one. Now, supply chain disruption was another big theme that we felt would carry through 2022. And sure, that sounds like another easy one, but as is our tradition, again, we try to put some binary metrics around our predictions to put some meat on the bone, so to speak, and allow us and you to say, okay, did it come true or not? 
>> So we had some data that we presented last year on supply chain issues impacting hardware spend. We said at the time, you can see this on the left-hand side of this chart, the PC laptop demand would remain above pre-COVID levels, which would reverse a decade of year-on-year declines, which I think started in around 2011, 2012. Now, while demand is down this year pretty substantially relative to 2021, IDC has worldwide unit shipments for PCs at just over 300 million for '22. If you go back to 2019, you're looking at around, let's say, 260 million units shipped globally, you know, roughly. So, you know, pretty good call there. Definitely much higher than pre-COVID levels. But so you might be asking, why the B? Well, we projected that 30% of customers would replace security appliances with cloud-based services and that more than a third would replace their internal data center server and storage hardware with cloud services, like 30 and 40% respectively. >> And we don't have explicit survey data on exactly these metrics, but anecdotally we see this happening in earnest. And we do have some data that we're showing here on cloud adoption from ETR's October survey, where the midpoint of workloads running in the cloud is around 34% and forecast, as you can see, to grow steadily over the next three years. So look, we understand it's not a one-to-one correlation with our prediction, but it's a pretty good bet that we were right. But we gotta take some points off, we think, for the lack of unequivocal proof. 'Cause again, we always strive to make our predictions in ways that can be measured as accurate or not. Is it binary? Did it happen, did it not? Kind of like an OKR. And you know, we strive to provide data as proof, and in this case it's a bit fuzzy. 
>> We have to admit that, although we're pretty comfortable that the prediction was accurate. And look, when you make a hard forecast, sometimes you gotta pay the price. All right, next, we said in 2022 that the big four cloud players would generate 167 billion in IaaS and PaaS revenue, combining for 38% market growth. And our current forecasts are shown here with a comparison to our January 2022 figures. So coming into this year versus where we are today. So currently we expect 162 billion in total revenue and a 33% growth rate. Still very healthy, but not on our mark. So we think AWS is gonna miss our predictions by about a billion dollars, you know, not bad for an 80 billion company. So they're not gonna hit that expectation, though, of getting really close to a hundred billion run rate. We thought they'd exit the year, you know, closer to 25 billion a quarter, and we don't think they're gonna get there. >> Look, we pretty much nailed Azure. Even though our prediction was correct about Google Cloud Platform surpassing Alibaba, we way overestimated the performance of both of those companies. So we're gonna give ourselves a C plus here, and yeah, you might think it's a little bit harsh, we could argue for a B minus to the professor, but the misses on GCP and Alibaba, we think, warrant a self-penalty on this one. All right, let's move on to our prediction about Supercloud. We said it becomes a thing in 2022, and we think by many accounts it has. Despite the naysayers, we're seeing clear evidence that the concept of a layer of value-add that sits above and across clouds is taking shape. And on this slide we show just some of the pickup in the industry. I mean, one of the most interesting is Cloudflare. >> The biggest supercloud antagonist, Charles Fitzgerald, even predicted that no vendor would ever use the term in their marketing. And that would be proof, if that happened, that Supercloud was a thing, and he said it would never happen. Well, Cloudflare has, and they launched their version of Supercloud at their Developer Week. 
Chris Mellor of The Register put out a Supercloud block diagram, something else that Charles Fitzgerald was pushing us for, and rightly so, it was a good call on his part. And Chris Mellor actually came up with one that's pretty good. David Linthicum also has produced a block diagram, kind of similar. David uses the term metacloud and he uses the term supercloud kind of interchangeably to describe that trend. And so we're aligned on that front. Brian Gracely has covered the concept on the popular Cloudcast podcast. Berkeley launched the Sky Computing initiative. >> You read through that white paper and many of the concepts highlighted in the Supercloud 3.0 community-developed definition align with that. Walmart launched a platform with many of the supercloud salient attributes. So did Goldman Sachs, so did Capital One, so did Nasdaq. So you know, sorry, you can hate the term, but very clearly the evidence is gathering for the supercloud storm. We're gonna take an A plus on this one. Sorry, haters. Alright, let's talk about data mesh. In our '21 predictions post, we said that in the 2020s, 75% of large organizations are gonna re-architect their big data platforms. So kind of a decade-long prediction. We don't like to do that always, but sometimes it's warranted. And because it was a longer-term prediction, at the time, coming into '22 when we were evaluating our '21 predictions, we took a grade of incomplete because of the sort of decade-long, or better part of the decade, prediction. >> So last year, earlier this year, we said our number seven prediction was data mesh gains momentum in '22. But it's largely confined to narrow data problems with limited scope, as you can see here with some of the key bullets. 
So there's a lot of discussion in the data community about data mesh, and while there are an increasing number of examples, JPMorgan Chase, Intuit, HSBC, HelloFresh, and others that are completely rearchitecting parts of their data platform, completely rearchitecting entire data platforms is non-trivial. There are organizational challenges, there are data ownership debates, technical considerations, and in particular two of the four fundamental data mesh principles, the need for a self-service infrastructure and federated computational governance, are challenging. Look, democratizing data and facilitating data sharing creates conflicts with regulatory requirements around data privacy. As such, many organizations are being really selective with their data mesh implementations, and hence our prediction of narrowing the scope of data mesh initiatives. >> I think that was right on. JPMC is a good example of this, where you got a single group within a division narrowly implementing the data mesh architecture. They're using AWS, they're using data lakes, they're using Amazon Glue, creating a catalog and a variety of other techniques to meet their objectives. They're kind of automating data quality, and it was a pretty well thought out and interesting approach, and I think it's gonna be made easier by some of the announcements that Amazon made at the recent, you know, re:Invent, particularly trying to eliminate ETL, better connections between Aurora and Redshift, and better data sharing, the data clean room. So a lot of that is gonna help. Of course, Snowflake has been on this for a while now. Many other companies are facing, you know, limitations, as we said here on this slide, with their Hadoop data platforms. They need to do some new thinking around that to scale. HelloFresh is a really good example of this. 
Look, the bottom line is that organizations want to get more value from data, and having centralized, highly specialized teams that own the data problem, it's been a barrier and a blocker to success. The data mesh starts with organizational considerations, as described in great detail by Ash Naseer of Warner Brothers. So take a listen to this clip. >> Yeah, so when people think of Warner Brothers, you always think of like the movie studio, but we're more than that, right? I mean, you think of HBO, you think of TNT, you think of CNN. We have 30-plus brands in our portfolio and each have their own needs. So the idea of a data mesh really helps us, because what we can do is we can federate access across the company so that, you know, CNN can work at their own pace. You know, when there's election season, they can ingest their own data and they don't have to, you know, bump up against, as an example, HBO if Game of Thrones is going on. >> So it's often the case that data mesh is in the eyes of the implementer. And while a company's implementation may not strictly adhere to Zhamak Dehghani's vision of data mesh, that's okay; the goal is to use data more effectively. And despite Gartner's attempts to deposition data mesh in favor of the somewhat confusing, or frankly far more confusing, data fabric concept that they stole from NetApp, data mesh is taking hold in organizations globally today. So we're gonna take a B on this one. The prediction is shaping up the way we envision, but as we previously reported, it's gonna take some time. The better part of a decade, in our view. New standards have to emerge to make this vision become reality, and they'll come in the form of both open and de facto approaches. Okay, our eighth prediction last year focused on the face-off between Snowflake and Databricks. 
>> And we realize this is a popular topic, and maybe one that's getting a little overplayed, but these are two companies that initially, you know, looked like they were shaping up as partners, and by the way, they are still partnering in the field. But you go back a couple years ago, the idea of using AWS infrastructure, Databricks machine intelligence, and applying that on top of Snowflake as a facile data warehouse, still very viable. But both of these companies, they have much larger ambitions. They got big total available markets to chase and large valuations that they have to justify. So what's happening is, as we've previously reported, each of these companies is moving toward the other firm's core domain and they're building out an ecosystem that'll be critical for their future. So as part of that effort, we said each is gonna become aggressive investors and maybe start doing some M&A, and they have, in various companies. >> And on this chart that we produced last year, we studied some of the companies that were targets, and we've added some recent investments of both Snowflake and Databricks. As you can see, they've both, for example, invested in Alation. Snowflake's put money into Lacework, the security firm; ThoughtSpot, which is trying to democratize data with AI; Collibra is a governance platform. And you can see Databricks' investments in data transformation with dbt Labs; Matillion, doing simplified business intelligence; Hunters, so that's, you know, their security investment, and so forth. So other than our thought that we'd see Databricks IPO last year, this prediction has been pretty spot on. So we'll give ourselves an A on that one. Now, observability has been a hot topic and we've been covering it for a while with our friends at ETR, particularly Eric Bradley. Our number nine prediction last year was basically that if you're not cloud native in observability, you are gonna be in big trouble. >> So everything, guys, gotta go cloud native. 
And that's clearly been the case. Splunk, the big player in the space, has been transitioning to the cloud; it hasn't always been pretty, as we reported. Datadog, real momentum; the ELK Stack, that open source model. You got new entrants that we've cited before, like Observe, Honeycomb, ChaosSearch, and others that we've reported on; they're all born in the cloud. So we're gonna take another A on this one. Admittedly, yeah, it's a reasonably easy call, but you gotta have a few of those in the mix. Okay, our last prediction, our number 10, was around events. Something theCUBE knows a little bit about. We said that a new category of events would emerge as hybrid, and that for the most part has happened. So that's gonna be the mainstay, is what we said. That pure play virtual events are gonna give way to hybrid. >> And the narrative is that virtual-only events are, you know, they're good for quick hits, but lousy replacements for in-person events. And you know, that said, organizations of all shapes and sizes, they learned how to create better virtual content and support remote audiences during the pandemic. So when we said that pure play is gonna give way to hybrid, we implied, or specified, that the physical event, that VIP experience, is going to define that overall experience, and those VIP events would create a little FOMO, fear of missing out, and a virtual component would overlay that serves an audience 10x the size of the physical. We saw that, really, two really good examples. Red Hat Summit in Boston, small event, couple thousand people, served tens of thousands, you know, online. Second was Google Cloud Next, VIP event in New York City. >> Everything else was virtual. You know, even examples of our prediction of metaverse-like immersion have popped up, and, you know, other companies are doing roadshows, as we predicted; like, a lot of companies are doing it. 
You're seeing that as a major trend, where organizations are going with their sales teams out into the regions and doing a little belly-to-belly action as opposed to the big giant event. That's definitely a trend that we're seeing. So in reviewing this prediction, the grade we gave ourselves is, you know, maybe a bit unfair, you could argue for a higher grade, but the organizations still haven't figured it out. They have hybrid experiences, but they generally do a really poor job of leveraging the afterglow of an event. It still tends to be one and done, let's move on to the next event or the next city. >> Let the sales team pick up the pieces if they were paying attention. So because of that, we're only taking a B plus on this one. Okay, so that's the review of last year's predictions. You know, overall, if you average out our grade on the 10 predictions, that comes out to a B plus. I dunno why we can't seem to get that elusive A, but we're gonna keep trying. Our friends at ETR and we are starting to look at the data for 2023 from the surveys and all the work that we've done on theCUBE and our analysis, and we're gonna put together our predictions. We've had literally hundreds of inbounds from PR pros pitching us. We've got this huge thick folder that we've started to review with our yellow highlighter. And our plan is to review it this month, take a look at all the data, get some ideas from the inbounds, and then the ETR January survey is in the field. >> It's probably got a little over a thousand responses right now. You know, they'll get up to, you know, 1400 or so. And once we've digested all that, we're gonna go back and publish our predictions for 2023 sometime in January. So stay tuned for that. All right, we're gonna leave it there for today. I wanna thank Alex Myerson, who's on production and manages the podcast, and Ken Schiffman as well, out of our Boston studio. 
I gotta give a really heartfelt thank you to Kristen Martin and Cheryl Knight and their team. They help get the word out on social and in our newsletters. Rob Hof is our editor-in-chief over at SiliconANGLE, who does some great editing for us. Thank you all. Remember, all these episodes are available as podcasts wherever you listen. All you do is search Breaking Analysis podcast. Really getting some great traction there. Appreciate you guys subscribing. I publish each week on wikibon.com and siliconangle.com, or you can email me directly at david.vellante@siliconangle.com or DM me @dvellante, or you can comment on my LinkedIn post. And please check out etr.ai for the very best survey data in the enterprise tech business. Some awesome stuff in there. This is Dave Vellante for theCUBE Insights, powered by ETR. Thanks for watching and we'll see you next time on Breaking Analysis.

Published Date : Dec 18 2022

ENTITIES

Entity	Category	Confidence
Alex Myerson	PERSON	0.99+
Cheryl Knight	PERSON	0.99+
Ken Schiffman	PERSON	0.99+
Chris Miller	PERSON	0.99+
CNN	ORGANIZATION	0.99+
Rob Ho	PERSON	0.99+
Alibaba	ORGANIZATION	0.99+
Dave Vellante	PERSON	0.99+
Amazon	ORGANIZATION	0.99+
5.1%	QUANTITY	0.99+
2022	DATE	0.99+
Charles Fitzgerald	PERSON	0.99+
Dave Hatfield	PERSON	0.99+
Brian Gracely	PERSON	0.99+
2019	DATE	0.99+
Lacework	ORGANIZATION	0.99+
two	QUANTITY	0.99+
GCP	ORGANIZATION	0.99+
33%	QUANTITY	0.99+
Walmart	ORGANIZATION	0.99+
David	PERSON	0.99+
2021	DATE	0.99+
20%	QUANTITY	0.99+
Kristen Martin	PERSON	0.99+
Palo Alto	LOCATION	0.99+
2020	DATE	0.99+
Ash Nair	PERSON	0.99+
Goldman Sachs	ORGANIZATION	0.99+
162 billion	QUANTITY	0.99+
New York City	LOCATION	0.99+
Databricks	ORGANIZATION	0.99+
October	DATE	0.99+
last year	DATE	0.99+
Arctic Wolf	ORGANIZATION	0.99+
two companies	QUANTITY	0.99+
38%	QUANTITY	0.99+
September	DATE	0.99+
Fed	ORGANIZATION	0.99+
JP Morgan Chase	ORGANIZATION	0.99+
80 billion	QUANTITY	0.99+
29%	QUANTITY	0.99+
32%	QUANTITY	0.99+
21 predictions	QUANTITY	0.99+
30%	QUANTITY	0.99+
HBO	ORGANIZATION	0.99+
75%	QUANTITY	0.99+
Game of Thrones	TITLE	0.99+
January	DATE	0.99+
2023	DATE	0.99+
10 predictions	QUANTITY	0.99+
both	QUANTITY	0.99+
22	QUANTITY	0.99+
ThoughtSpot	ORGANIZATION	0.99+
196 million	QUANTITY	0.99+
30	QUANTITY	0.99+
each	QUANTITY	0.99+
last year	DATE	0.99+
Palo Alto Networks	ORGANIZATION	0.99+
2020s	DATE	0.99+
167 billion	QUANTITY	0.99+
Okta	ORGANIZATION	0.99+
Second	QUANTITY	0.99+
Gartner	ORGANIZATION	0.99+
Eric Bradley	PERSON	0.99+
Aqua Securities	ORGANIZATION	0.99+
Dante	PERSON	0.99+
8%	QUANTITY	0.99+
Warner Brothers	ORGANIZATION	0.99+
Intuit	ORGANIZATION	0.99+
Cube Studios	ORGANIZATION	0.99+
each week	QUANTITY	0.99+
7 billion	QUANTITY	0.99+
40%	QUANTITY	0.99+
Snowflake	ORGANIZATION	0.99+

Felix Van de Maele, Collibra, Data Citizens 22

(upbeat techno music) >> Collibra is a company that was founded in 2008 right before the so-called modern big data era kicked into high gear. The company was one of the first to focus its business on data governance. Now, historically, data governance and data quality initiatives, they were back office functions, and they were largely confined to regulated industries that had to comply with public policy mandates. But as the cloud went mainstream the tech giants showed us how valuable data could become, and the value proposition for data quality and trust, it evolved from primarily a compliance-driven issue, to becoming a linchpin of competitive advantage. But, data in the decade of the 2010s was largely about getting the technology to work. You had these highly centralized technical teams that were formed and they had hyper-specialized skills, to develop data architectures and processes, to serve the myriad data needs of organizations. And it resulted in a lot of frustration, with data initiatives for most organizations, that didn't have the resources of the cloud guys and the social media giants, to really attack their data problems and turn data into gold. This is why today, for example, there's quite a bit of momentum to re-thinking monolithic data architectures. You see, you hear about initiatives like Data Mesh and the idea of data as a product. They're gaining traction as a way to better serve the data needs of decentralized business users. You hear a lot about data democratization. So these decentralization efforts around data, they're great, but they create a new set of problems. Specifically, how do you deliver, like, a self-service infrastructure to business users and domain experts? Now the cloud is definitely helping with that but also, how do you automate governance? This becomes especially tricky as protecting data privacy has become more and more important. 
In other words, while it's enticing to experiment, and run fast and loose with data initiatives, kind of like the Wild West, to find new veins of gold, it has to be done responsibly. As such, the idea of data governance has had to evolve to become more automated and intelligent. Governance and data lineage are still fundamental to ensuring trust as data moves like water through an organization. No one is going to use data that isn't trusted. Metadata has become increasingly important for data discovery and data classification. As data flows through an organization, the ability to continuously check for data flaws and automating that data quality, they become a functional requirement of any modern data management platform. And finally, data privacy has become a critical adjacency to cyber security. So you can see how data governance has evolved into a much richer set of capabilities than it was 10 or 15 years ago. Hello and welcome to theCUBE's coverage of Data Citizens made possible by Collibra, a leader in so-called data intelligence and the host of Data Citizens 2022, which is taking place in San Diego. My name is Dave Vellante and I'm one of the hosts of our program which is running in parallel to Data Citizens. Now at theCUBE we like to say we extract the signal from the noise, and over the next couple of days we're going to feature some of the themes from the keynote speakers at Data Citizens, and we'll hear from several of the executives. Felix Van de Maele, who is the co-founder and CEO of Collibra, will join us. Along with one of the other founders of Collibra, Stan Christiaens, who's going to join my colleague Lisa Martin. I'm going to also sit down with Laura Sellers, she's the Chief Product Officer at Collibra. We'll talk about some of the announcements and innovations they're making at the event, and then we'll dig in further to data quality with Kirk Haslbeck. He's the Vice President of Data Quality at Collibra. 
He's an amazingly smart dude who founded OwlDQ, a company that he sold to Collibra last year. Now, many companies, they didn't make it through the Hadoop era, you know, they missed the industry waves and they became driftwood. Collibra, on the other hand, has evolved its business, they've leveraged the cloud, expanded its product portfolio and leaned in heavily to some major partnerships with cloud providers, as well as receiving a strategic investment from Snowflake earlier this year. So, it's a really interesting story that we're thrilled to be sharing with you. Thanks for watching and I hope you enjoy the program. (upbeat rock music) Last year theCUBE covered Data Citizens, Collibra's customer event, and the premise that we put forth prior to that event was that despite all the innovation that's gone on over the last decade or more with data, you know, starting with the Hadoop movement, we had data lakes, we had Spark, the ascendancy of programming languages like Python, the introduction of frameworks like TensorFlow, the rise of AI, low code, no code, et cetera, businesses still find it's too difficult to get more value from their data initiatives, and we said at the time, you know, maybe it's time to rethink data innovation. While a lot of the effort has been focused on, you know, more efficiently storing and processing data, perhaps more energy needs to go into thinking about the people and the process side of the equation. Meaning, making it easier for domain experts to both gain insights from data, trust the data, and begin to use that data in new ways, fueling data products, monetization, and insights. Data Citizens 2022 is back and we're pleased to have Felix Van de Maele, who is the founder and CEO of Collibra. He's on theCUBE. We're excited to have you, Felix. Good to see you again. >> Likewise, Dave. Thanks for having me again. >> You bet. 
All right, we're going to get the update from Felix on the current data landscape, how he sees it, why data intelligence is more important now than ever, and get current on what Collibra has been up to over the past year, and what's changed since Data Citizens 2021, and we may even touch on some of the product news. So Felix, we're living in a very different world today with businesses and consumers. They're struggling with things like supply chains, uncertain economic trends, and we're not just snapping back to the 2010s, that's clear, and that's really true as well in the world of data. So what's different in your mind, in the data landscape of the 2020s, from the previous decade, and what challenges does that bring for your customers? >> Yeah, absolutely, and I think you said it well, Dave, in the intro, that rising complexity and fragmentation in the broader data landscape, that hasn't gotten any better over the last couple of years. When we talk to our customers, that level of fragmentation, the complexity, how do we find data that we can trust, that we know we can use, has only gotten more difficult. So that trend is continuing. I think what is changing is that trend has become much more acute. Well, the other thing we've seen over the last couple of years is that the level of scrutiny that organizations are under with respect to data, as data becomes more mission critical, as data becomes more impactful and important, the level of scrutiny with respect to privacy, security, regulatory compliance, is only increasing as well. Which again, is really difficult in this environment of continuous innovation, continuous change, continuously growing complexity and fragmentation. So, it's become much more acute. And to your earlier point, we do live in a different world, and in the past couple of years we could probably just kind of brute force it, right? We could focus on the top line, there was enough kind of investment to be had. 
I think nowadays organizations are in a very different environment where there's much more focus on cost control, productivity, efficiency, how do we truly get the value from that data? So again, I think it's just another incentive for organizations to now truly look at data and to scale with data, not just from a technology and infrastructure perspective, but how do we actually scale data from an organizational perspective, right? You said it, the people and process, how do we do that at scale? And that's only becoming much more important, and we do believe that the economic environment that we find ourselves in today is going to be a catalyst for organizations to really take that more seriously, if you will, than they maybe have in the past. >> You know, I don't know when you guys founded Collibra, if you had a sense as to how complicated it was going to get, but you've been on a mission to really address these problems from the beginning. How would you describe your mission and what are you doing to address these challenges? >> Yeah, absolutely. We started Collibra in 2008. So, in some sense, in the last kind of financial crisis, and that was really the start of Collibra, where we found product market fit, working with large financial institutions to help them cope with the increasing compliance requirements that they were faced with because of the financial crisis. And kind of here we are again, in a very different environment of course, almost 15 years later, but data only becoming more important. But our mission to deliver trusted data for every user, every use case and across every source, frankly, has only become more important. 
So, what has been an incredible journey over the last 14, 15 years, I think we're still relatively early in our mission to again, be able to provide everyone, and that's why we call it Data Citizens, we truly believe that everyone in the organization should be able to use trusted data in an easy, easy manner. That mission is only becoming more important, more relevant. We definitely have a lot more work ahead of us because we're still relatively early in that journey. >> Well that's interesting, because you know, in my observation it takes 7 to 10 years to actually build a company, and then the fact that you're still in the early days is kind of interesting. I mean, Collibra's had a good 12 months or so since we last spoke at Data Citizens. Give us the latest update on your business. What do people need to know about your current momentum? >> Yeah, absolutely. Again, there's a lot of tailwind. Organizations are only maturing their data practices, and we've seen that kind of transform or influence a lot of the business growth that we've seen, and broader adoption of the platform. We work with some of the largest organizations in the world, whether it's Adobe, Heineken, Bank of America, and many more. We have now over 600 enterprise customers, all industry leaders and in every single vertical. So it's really exciting to see that and continue to partner with those organizations. On the partnership side, again, a lot of momentum in the market with some of the cloud partners like Google, Amazon, Snowflake, Databricks, and others, right? As those kind of new modern data infrastructures, modern data architectures, are definitely all moving to the cloud. A great opportunity for us, our partners, and of course our customers, to help them kind of transition to the cloud even faster. And so we see a lot of excitement and momentum there. 
We did an acquisition about 18 months ago around data quality, data observability, which we believe is an enormous opportunity. Of course data quality isn't new, but I think there's a lot of reasons why we're so excited about quality and observability now. One is around leveraging AI and machine learning, again to drive more automation. And a second is that those data pipelines, that are now being created in the cloud, in these modern data architectures, they've become mission critical. They've become real time. And so monitoring, observing those data pipelines continuously, has become absolutely critical, so we're really excited about that as well. And on the organizational side, I'm sure you've heard the term around kind of data mesh, something that's gaining a lot of momentum, rightfully so. It's really the type of governance that we always believed in. Federated, focused on domains, giving a lot of ownership to different teams. I think that's the way to scale data organizations, and so that aligns really well with our vision and from a product perspective, we've seen a lot of momentum with our customers there as well. >> Yeah, you know, a couple things there. I mean, the acquisition of OwlDQ, you know, Kirk Haslbeck and their team. It's interesting, you know, the whole data quality used to be this back office function and really confined to highly regulated industries. It's come to the front office, it's top of mind for Chief Data Officers. Data mesh, you mentioned you guys are a connective tissue for all these different nodes on the data mesh. That's key. And of course we see you at all the shows. You're a critical part of many ecosystems and you're developing your own ecosystem. So, let's chat a little bit about the products. 
We're going to go deeper into products later on, at Data Citizens 22, but we know you're debuting some new innovations, you know, whether it's, you know, under the covers in security, sort of making data more accessible for people, just dealing with workflows and processes, as you talked about earlier. Tell us a little bit about what you're introducing. >> Yeah, absolutely. We're super excited, a ton of innovation. And if we think about the big theme, like I said, we're still relatively early in this journey towards kind of that mission of data intelligence, that really bold and compelling mission. Either customers are just starting on that journey, and we want to make it as easy as possible for organizations to actually get started, because we know it's important that they do. And for organizations and customers that have been with us for some time, there's still a tremendous amount of opportunity to kind of expand the platform further. And again to make it easier to really accomplish that mission and vision around that Data Citizen, that everyone has access to trustworthy data in a very easy, easy way. So that's really the theme of a lot of the innovation that we're driving, a lot of kind of ease of adoption, ease of use, but also then, how do we make sure that, as Collibra becomes this kind of mission critical enterprise platform, from a security, performance, architecture, scale, supportability perspective, that we're truly able to deliver that kind of an enterprise mission critical platform. And so that's the big theme. From an innovation perspective, from a product perspective, a lot of new innovation that we're really excited about. A couple of highlights. One is around data marketplace. Again, a lot of our customers have plans in that direction. How do we make it easy? How do we make available a true kind of shopping experience? 
So that anybody in the organization can, in a very easy search-first way, find the right data product, find the right dataset, that they can then consume. Usage analytics, how do we help organizations drive adoption? Tell them where they're working really well and where they have opportunities. Homepages, again to make things easy for people, for anyone in your organization, to kind of get started with Collibra. You mentioned Workflow Designer, again, we have a very powerful enterprise platform, one of our key differentiators is the ability to really drive a lot of automation through workflows. And now we provide a new low-code, no-code kind of workflow designer experience. So really customers can take it to the next level. There's a lot more new product around Collibra Protect, which is in partnership with Snowflake, which has been a strategic investor in Collibra, focused on how do we make access governance easier? How are we able to make sure that as you move to the cloud, things like access management, masking around sensitive data, PII data, is managed in a much more effective way. Really excited about that product. There's more around data quality. Again, how do we get that deployed as easily, and quickly, and widely as we can? Moving that to the cloud has been a big part of our strategy. So, we launched our data quality cloud product, as well as making use of those native compute capabilities in platforms like Snowflake, Databricks, Google, Amazon, and others. And so we are beta-ing a capability that we call push down, so we're actually pushing down the compute and data quality monitoring into the underlying platform, which again from a scale, performance, and ease of use perspective, is going to make a massive difference. And then more broadly, we talked a little bit about the ecosystem. Again, integrations, we talk about being able to connect to every source. 
Integrations are absolutely critical, and we're really excited to deliver new integrations with Snowflake, Azure and Google Cloud storage as well. So that's a lot coming out, the team has been at work really hard, and we are really, really excited about what we're bringing to market. >> Yeah, a lot going on there. I wonder if you could give us your closing thoughts. I mean, you talked about, you know, the marketplace, you know, you think about Data Mesh, you think of data as product, one of the key principles, you think about monetization. This is really different than what we've been used to in data, where just getting the technology to work has been so hard. So, how do you see sort of the future and, you know, give us your closing thoughts please? >> Yeah, absolutely. And I think we're really at a pivotal moment and I think you said it well. We all know the constraints and the challenges with data, how to actually do data at scale. And while we've seen a ton of innovation on the infrastructure side, we fundamentally believe that just getting a faster database is important, but it's not going to fully solve the challenges and truly kind of deliver on the opportunity. And that's why now is really the time to deliver this data intelligence vision, this data intelligence platform. We are still early, making it as easy as we can, as our mission. And so I'm really, really excited to see how the market is going to evolve over the next few quarters and years. I think the trend is clearly there. We talked about Data Mesh, this kind of federated approach, the focus on data products, is just another signal that we believe a lot of our organizations are now at the point where they understand the need to go beyond just the technology. 
And to really, really think about how to actually scale data as a business function, just like we've done with IT, with HR, with sales and marketing, with finance. That's how we need to think about data. I think now is the time, given the economic environment that we are in, much more focus on control, much more focus on productivity, efficiency, and now is the time we need to look beyond just the technology and infrastructure to think of how to scale data, how to manage data at scale. >> Yeah, it's a new era. The next 10 years of data won't be like the last, as I always say. Felix, thanks so much. Good luck in San Diego. I know you're going to crush it out there. >> Thank you Dave. >> Yeah, it's a great spot for an in-person event and of course the content post-event is going to be available at collibra.com and you can of course catch theCUBE coverage at theCUBE.net and all the news at siliconangle.com. This is Dave Vellante for theCUBE, your leader in enterprise and emerging tech coverage. (upbeat techno music)

Published Date : Nov 2 2022

SUMMARY :

Dave Vellante talks with Felix Van de Maele, co-founder and CEO of Collibra, ahead of Data Citizens 2022 in San Diego. They cover how the data landscape of the 2020s differs from the previous decade, why organizations now need to scale data as a business function and not only as technology, Collibra's momentum with customers and cloud partners, and the product news coming at the event.

ENTITIES

Entity	Category	Confidence
Dave Vellante	PERSON	0.99+
Lisa Martin	PERSON	0.99+
Heineken	ORGANIZATION	0.99+
Adobe	ORGANIZATION	0.99+
Felix Van de Maele	PERSON	0.99+
Amazon	ORGANIZATION	0.99+
Google	ORGANIZATION	0.99+
Laura Sellers	PERSON	0.99+
Collibra	ORGANIZATION	0.99+
2008	DATE	0.99+
Felix	PERSON	0.99+
San Diego	LOCATION	0.99+
Stan Christiaens	PERSON	0.99+
Dave	PERSON	0.99+
Bank of America	ORGANIZATION	0.99+
7	QUANTITY	0.99+
Snowflake	ORGANIZATION	0.99+
2020s	DATE	0.99+
last year	DATE	0.99+
2010s	DATE	0.99+
Python	TITLE	0.99+
Last year	DATE	0.99+
12 months	QUANTITY	0.99+
siliconangle.com	OTHER	0.99+
one	QUANTITY	0.99+
Data Citizens	ORGANIZATION	0.99+
Databricks	ORGANIZATION	0.99+
Owl DQ	ORGANIZATION	0.98+
10	DATE	0.98+
OwlDQ	ORGANIZATION	0.98+
Kirk Haslbeck	PERSON	0.98+
10 years	QUANTITY	0.98+
One	QUANTITY	0.98+
Spark	TITLE	0.98+
today	DATE	0.98+
first	QUANTITY	0.97+
Data Citizens	EVENT	0.97+
earlier this year	DATE	0.96+
Tensorflow	TITLE	0.96+
Data Citizens 22	ORGANIZATION	0.95+
both	QUANTITY	0.94+
theCUBE	ORGANIZATION	0.94+
15 years ago	DATE	0.93+
over 600 enterprise customers	QUANTITY	0.91+
past couple of years	DATE	0.91+
about 18 months ago	DATE	0.9+
collibra.com	OTHER	0.89+
Data citizens 2021	ORGANIZATION	0.88+
Data Citizens 2022	EVENT	0.86+
almost 15 years later	DATE	0.85+
West	LOCATION	0.85+
Azure	TITLE	0.84+
first way	QUANTITY	0.83+
Vice President	PERSON	0.83+
last couple of years	DATE	0.8+

Power Panel: Does Hardware Still Matter

(upbeat music) >> The ascendancy of cloud and SaaS has shone new light on how organizations think about, pay for, and value hardware. Once sought-after skills for practitioners with expertise in hardware troubleshooting, configuring ports, tuning storage arrays, and maximizing server utilization have been superseded by demand for cloud architects, DevOps pros, developers with expertise in microservices, container, application development, and the like. Even a company like Dell, the largest hardware company in enterprise tech, touts that it has more software engineers than those working in hardware. Begs the question, is hardware going the way of COBOL? Well, not likely. Software has to run on something, but the labor needed to deploy, and troubleshoot, and manage hardware infrastructure is shifting. At the same time, we've seen the value flow also shifting in hardware. Once a world dominated by x86 processors, value is flowing to alternatives like Nvidia and Arm-based designs. Moreover, other componentry like NICs, accelerators, and storage controllers are becoming more advanced, integrated, and increasingly important. The question is, does it matter? And if so, why does it matter and to whom? What does it mean to customers, workloads, OEMs, and the broader society? Hello and welcome to this week's Wikibon theCUBE Insights powered by ETR. In this breaking analysis, we've organized a special power panel of industry analysts and experts to address the question, does hardware still matter? Allow me to introduce the panel. Bob O'Donnell is president and chief analyst at TECHnalysis Research. Zeus Kerravala is the founder and principal analyst at ZK Research. David Nicholson is a CTO and tech expert. Keith Townsend is CEO and founder of CTO Advisor. And Marc Staimer is the chief dragon slayer at Dragon Slayer Consulting and oftentimes a Wikibon contributor. Guys, welcome to theCUBE. Thanks so much for spending some time here. >> Good to be here. >> Thanks. 
>> Thanks for having us. >> Okay, before we get into it, I just want to bring up some data from ETR. This is a survey that ETR does every quarter. It's a survey of about 1200 to 1500 CIOs and IT buyers and I'm showing a subset of the taxonomy here. This is an XY graph, and the vertical axis is something called net score. That's a measure of spending momentum. It's essentially the percentage of customers that are spending more on a particular area than those spending less. You subtract the lesses from the mores and you get a net score. The horizontal axis is pervasion in the data set. Sometimes they call it market share. It's not like IDC market share. It's just the percentage of activity in the data set as a percentage of the total. That red 40% line, anything over that is considered highly elevated. And for the past, I don't know, eight to 12 quarters, the big four have been AI and machine learning, containers, RPA and cloud, and cloud of course is very impressive because not only is it elevated on the vertical axis, but you know it's very highly pervasive on the horizontal. So what I've done is highlighted in red that historical hardware sector. The server, the storage, the networking, and even PCs despite the work from home are depressed in relative terms. And of course, data center colocation services. Okay so you're seeing obviously hardware is not... People don't have the spending momentum today that they used to. They've got other priorities, et cetera, but I want to start and go kind of around the horn with each of you, what is the number one trend that each of you sees in hardware and why does it matter? Bob O'Donnell, can you please start us off? >> Sure Dave, so look, I mean, hardware is incredibly important and one comment first I'll make on that slide is let's not forget that hardware, even though it may not be growing, the amount of money spent on hardware continues to be very, very high. It's just a little bit more stable. 
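(Editorial sketch.) The net score arithmetic Dave describes in the ETR walkthrough above, the percentage of respondents spending more minus the percentage spending less, with 40% as the "highly elevated" line, can be illustrated in a few lines of Python. This is only a sketch under stated assumptions: the function names and the respondent counts are hypothetical, and ETR's actual methodology may use more response categories than the two mentioned in the conversation.

```python
# Hypothetical illustration of the "net score" idea described in the transcript.
# Names and sample numbers are assumptions, not ETR data or ETR's exact method.

ELEVATED_THRESHOLD = 40.0  # the "red 40% line" mentioned in the discussion


def net_score(spending_more: int, spending_less: int, total_respondents: int) -> float:
    """Return percent of respondents spending more minus percent spending less."""
    if total_respondents <= 0:
        raise ValueError("total_respondents must be positive")
    pct_more = 100.0 * spending_more / total_respondents
    pct_less = 100.0 * spending_less / total_respondents
    return pct_more - pct_less


def is_highly_elevated(score: float) -> bool:
    """A sector counts as highly elevated when its net score is above 40."""
    return score > ELEVATED_THRESHOLD


if __name__ == "__main__":
    # Made-up counts: 1200 respondents, 660 spending more, 120 spending less.
    score = net_score(spending_more=660, spending_less=120, total_respondents=1200)
    print(f"net score: {score:.1f}, highly elevated: {is_highly_elevated(score)}")
```

With these made-up counts, 55% of respondents are spending more and 10% are spending less, so the sector would sit above the 40% line.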
It's not as subject to big jumps as we see certainly in other software areas. But look, the important thing that's happening in hardware is the diversification of the types of chip architectures we're seeing and how and where they're being deployed, right? You refer to this in your opening. We've moved from a world of x86 CPUs from Intel and AMD to things like obviously GPUs, DPUs. We've got VPU for, you know, computer vision processing. We've got AI-dedicated accelerators, we've got all kinds of other network acceleration tools and AI-powered tools. There's an incredible diversification of these chip architectures and that's been happening for a while but now we're seeing them more widely deployed and it's being done that way because workloads are evolving. The kinds of workloads that we're seeing in some of these software areas require different types of compute engines than traditionally we've had. The other thing is (coughs), excuse me, the power requirements based on where geographically that compute happens is also evolving. This whole notion of the edge, which I'm sure we'll get into a little bit more detail later is driven by the fact that where the compute actually sits closer to in theory the edge and where edge devices are, depending on your definition, changes the power requirements. It changes the kind of connectivity that connects the applications to those edge devices and those applications. So all of those things are being impacted by this growing diversity in chip architectures. And that's a very long-term trend that I think we're going to continue to see play out through this decade and well into the 2030s as well. >> Excellent, great, great points. Thank you, Bob. Zeus up next, please. 
>> Yeah, and I think the other thing when you look at this chart to remember too is, you know, through the pandemic and the work from home period a lot of companies did put their office modernization projects on hold and you heard that echoed, you know, from really all the network manufacturers anyways. They always had projects underway to upgrade networks. They put 'em on hold. Now that people are starting to come back to the office, they're looking at that now. So we might see some change there, but Bob's right. The size of those markets is quite a bit different. I think the other big trend here is the hardware companies, at least in the areas that I look at, networking, are understanding now that it's a combination of hardware and software and silicon that works together that creates that optimum type of performance and experience, right? So some things are best done in silicon. Some things, like data forwarding and things like that. Historically when you look at the way network devices were built, you did everything in hardware. You configured in hardware, it did all the data forwarding, and did all the management. And that's been decoupled now. So more and more of the control element has been placed in software. A lot of the high-performance things, encryption, and as I mentioned, data forwarding, packet analysis, stuff like that is still done in hardware, but not everything is done in hardware. And so it's a combination of the two. I think, for the people that work with the equipment as well, there's been more shift to understanding how to work with software. And this is a mistake I think the industry made for a while is we had everybody convinced they had to become a programmer. It's really more a software power user. Can you pull things out of software? Can you do things through API calls and things like that? But I think the big frame here is, David, it's a combination of hardware and software working together that really makes a difference. 
And you know, how much you invest in hardware versus software kind of depends on the performance requirements you have. And I'll talk about that later but that's really the big shift that's happened here. It's the vendors that figured out how to optimize performance by leveraging the best of all of those. >> Excellent. You guys both brought up some really good themes that we can tap into. Dave Nicholson, please. >> Yeah, so just kind of picking up where Bob started off. Not only are we seeing the rise of a variety of CPU designs, but I think increasingly the connectivity that's involved from a hardware perspective, from a kind of a server or service design perspective has become increasingly important. I think we'll get a chance to look at this in more depth a little bit later but when you look at what happens on the motherboard, you know we're not in so much a CPU-centric world anymore. Various application environments have various demands and you can meet them by using a variety of components. And it's extremely significant when you start looking down at the component level. It's really important that you optimize around those components. So I guess my summary would be, I think we are moving out of the CPU-centric hardware model into more of a connectivity-centric model. We can talk more about that later. >> Yeah, great. And thank you, David, and Keith Townsend, I'm really interested in your perspectives on this. I mean, for years you worked in a data center surrounded by hardware. Now that we have the software defined data center, please chime in here. >> Well, you know, I'm going to dig deeper into that software-defined data center nature of what's happening with hardware. Hardware is meeting software. Infrastructure as code is a thing. What does that code look like? 
We're still trying to figure that out, but in surfacing up these capabilities that the previous analysts have brought up, how do I ensure that I can get the level of services needed for the applications that I need? Whether they're legacy, traditional data center workloads, AI/ML workloads, or workloads at the edge. How do I codify that and consume that as a service? And hardware vendors are figuring this out. HPE, the big push into GreenLake as a service. Dell now with Apex taking what we need, these bare bone components, moving it forward with DDR5, 6, CXL, et cetera, and surfacing that as code or as services. This is a very tough problem as we transition from consuming a hardware-based configuration to this infrastructure as code paradigm shift. >> Yeah, programmable infrastructure, really attacking that sort of labor discussion that we were having earlier, okay. Last but not least Marc Staimer, please. >> Thanks, Dave. My peers raised really good points. I agree with most of them, but I'm going to disagree with the title of this session, which is, does hardware matter? It absolutely matters. You can't run software on the air. You can't run it in an ephemeral cloud, although there's the technical cloud and that's a different issue. The cloud has kind of changed everything. And from a market perspective in the 40 plus years I've been in this business, I've seen this perception that hardware has to go down in price every year. And part of that was driven by Moore's law. And we're coming to, let's say a lag or an end, depending on who you talk to, of Moore's law. So we're not doubling our transistors every 18 to 24 months in a chip and as a result of that, there's been a higher emphasis on software. From a market perception, there's no penalty. They don't put the same pressure on software from the market to reduce the cost every year that they do on hardware, which is kind of bass ackwards when you think about it. Hardware costs are fixed. Software costs tend to be very low. 
It's kind of a weird thing that we do in the market. And what's changing is we're now starting to treat hardware like software from an OPEX versus CapEx perspective. So yes, hardware matters. And we'll talk about that more at length. >> You know, I want to follow up on that. And I wonder if you guys have a thought on this, Bob O'Donnell, you and I have talked about this a little bit. Marc, you just pointed out that Moore's law could be waning. Pat Gelsinger recently at their investor meeting promised that Moore's law is alive and well. And the point I made in Breaking Analysis was okay, great. You know, Pat said, doubling transistors every 18 to 24 months, let's say that Intel can do that. Even though we know it's waning somewhat. Look at the M1 Ultra from Apple (chuckles). In about 15 months they increased transistor density on their package by 6X. So to your earlier point, Bob, we have these alternative processors that are really changing things. And to Dave Nicholson's point, there's a whole lot of supporting components as well. Do you have a comment on that, Bob? >> Yeah, I mean, it's a great point, Dave. And one thing to bear in mind as well, not only are we seeing a diversity of these different chip architectures and different types of components, as a number of us have raised, the other big point, and I think it was Keith that mentioned it, is CXL and interconnect on the chip itself is dramatically changing it. And a lot of the more interesting advances that are going to continue to drive Moore's law forward in terms of the way we think about performance, if perhaps not number of transistors per se, is the interconnects that become available. You're seeing the development of chiplets or tiles, people use different names, but the idea is you can have different components being put together eventually in sort of a Lego block style. 
And what that's also going to allow, not only is that going to give interesting performance possibilities 'cause of the faster interconnect. So you can share, have shared memory between things which for big workloads like AI, huge data sets can make a huge difference in terms of how you talk to memory over a network connection, for example, but not only that you're going to see more diversity in the types of solutions that can be built. So we're going to see even more choices in hardware from a silicon perspective because you'll be able to piece together different elements. And oh, by the way, the other benefit of that is we've reached a point in chip architectures where not everything benefits from being smaller. We've been so focused and so obsessed when it comes to Moore's law, to the size of each individual transistor and yes, for certain architecture types, CPUs and GPUs in particular, that's absolutely true, but we've already hit the point where things like RF for 5g and wifi and other wireless technologies and a whole bunch of other things actually don't get any better with a smaller transistor size. They actually get worse. So the beauty of these chiplet architectures is you could actually combine different chip manufacturing sizes. You know you hear about four nanometer and five nanometer along with 14 nanometer on a single chip, each one optimized for its specific application yet together, they can give you the best of all worlds. And so we're just at the very beginning of that era, which I think is going to drive a ton of innovation. Again, gets back to my comment about different types of devices located geographically different places at the edge, in the data center, you know, in a private cloud versus a public cloud. All of those things are going to be impacted and there'll be a lot more options because of this silicon diversity and this interconnect diversity that we're just starting to see. >> Yeah, David. David Nicholson's got a graphic on that. 
They're going to show later. Before we do that, I want to introduce some data. I actually want to ask Keith to comment on this before we, you know, go on. This next slide is some data from ETR that shows the percent of customers that cited difficulty procuring hardware. And you can see the red is they had significant issues and it's most pronounced in laptops and networking hardware on the far right-hand side, but virtually all categories, firewalls, peripheral servers, storage are having moderately difficult procurement issues. That's the sort of pinkish or significant challenges. So Keith, I mean, what are you seeing with your customers in the hardware supply chains and bottlenecks? And you know we're seeing it with automobiles and appliances but so it goes beyond IT. The semiconductor, you know, challenges. What's been the impact on the buyer community and society and do you have any sense as to when it will subside? >> You know, I was just asked this question yesterday and I'm feeling the pain. People question, kind of a side project within the CTO advisor, we built a hybrid infrastructure, traditional IT data center that we're walking with the traditional customer and modernizing that data center. So it was, you know, kind of a snapshot of time in 2016, 2017, 10 gigabit, ARISTA switches, some older Dell's 730 XD switches, you know, speeds and feeds. And we said we would modern that with the latest Intel stack and connected to the public cloud and then the pandemic hit and we are experiencing a lot of the same challenges. I thought we'd easily migrate from 10 gig networking to 25 gig networking path that customers are going on. The 10 gig network switches that I bought used are now double the price because you can't get legacy 10 gig network switches because all of the manufacturers are focusing on the more profitable 25 gig for capacity, even the 25 gig switches. And we're focused on networking right now. It's hard to procure. 
We're talking about nine to 12 months or more lead time. So we're seeing customers adjust by adopting cloud. But if you remember early on in the pandemic, Microsoft Azure kind of gated customers that didn't have a capacity agreement. So customers are keeping an eye on that. There's a desire to abstract away from the underlying vendor to be able to control or provision your IT services in a way that we do with VMware VP or some other virtualization technology where it doesn't matter who can get me the hardware, they can just get me the hardware because it's critically impacting projects and timelines. >> So that's a great setup for you, Zeus. Keith mentioned earlier the software-defined data center, with software-defined networking and cloud. Do you see a day where networking hardware is commoditized and it's all about the software, or are we there already? >> No, we're not there already. And I don't see that really happening any time in the near future. I do think it's changed though. And just to be clear, I mean, when you look at that data, this is saying customers have had problems procuring the equipment, right? And there's not a network vendor out there. I've talked to Norman Rice at Extreme, and I've talked to the folks at Cisco and ARISTA about this. They all said they could have had blowout quarters had they had the inventory to ship. So it's not like customers aren't buying this anymore. Right? I do think though, when it comes to networking, the network has certainly changed some because there's a lot more control, as I mentioned before, that you can do in software. And I think the customers need to start thinking about the types of hardware they buy and you know, where they're going to use it and, you know, what its purpose is. Because I've talked to customers that have tried to run software on commodity hardware where the performance requirements are very high and it's bogged down, right? It just doesn't have the horsepower to run it. 
And, you know, even when you do that, you have to start thinking of the components you use. The NICs you buy. And I've talked to customers that have simply just gone through the process of replacing a NIC card in a commodity box and had some performance problems and, you know, things like that. So if agility is more important than performance, then by all means try running software on commodity hardware. I think that works in some cases. If performance though is more important, that's when you need that kind of turnkey hardware system. And I've actually seen more and more customers reverting back to that model. In fact, when you talk to even some startups I think today about when they come to market, they're delivering things more on appliances because that's what customers want. And so there's this kind of a pivot, this pendulum of agility and performance. And if performance absolutely matters, that's when you do need to buy these kind of turnkey, prebuilt hardware systems. If agility matters more, that's when you can go more to software, but the underlying hardware still does matter. So I think, you know, will we ever have a day where you can just run it on whatever hardware? Maybe but I'll long be retired by that point. So I don't care. >> Well, you bring up a good point Zeus. And I remember the early days of cloud, the narrative was, oh, the cloud vendors. They don't use EMC storage, they just run on commodity storage. And then of course, lo and behold, you know, they trot out James Hamilton to talk about all the custom hardware that they were building. And you saw Google and Microsoft follow suit. >> Well, (indistinct) been falling for this forever. Right? And I mean, all the way back to the turn of the century, we were calling for the commodity of hardware. And it's never really happened because you can still drive. 
As long as you can drive innovation into it, customers will always lean towards the innovation cycles 'cause they get more features faster and things. And so the vendors have done a good job of keeping that cycle up but it'll be a long time before. >> Yeah, and that's why you see companies like Pure Storage. A storage company has 69% gross margins. All right. I want to go jump ahead. We're going to bring up the slide four. I want to go back to something that Bob O'Donnell was talking about, the sort of supporting act. The diversity of silicon and we've marched to the cadence of Moore's law for decades. You know, we asked, you know, is Moore's law dead? We say it's moderating. Dave Nicholson. You want to talk about those supporting components. And you shared with us a slide that shift. You call it a shift from a processor-centric world to a connect-centric world. What do you mean by that? And let's bring up slide four and you can talk to that. >> Yeah, yeah. So first, I want to echo this sentiment that the question does hardware matter is sort of the answer is of course it matters. Maybe the real question should be, should you care about it? And the answer to that is it depends who you are. If you're an end user using an application on your mobile device, maybe you don't care how the architecture is put together. You just care that the service is delivered but as you back away from that and you get closer and closer to the source, someone needs to care about the hardware and it should matter. Why? Because essentially what hardware is doing is it's consuming electricity and dollars and the more efficiently you can configure hardware, the more bang you're going to get for your buck. So it's not only a quantitative question in terms of how much can you deliver? But it also ends up being a qualitative change as capabilities allow for things we couldn't do before, because we just didn't have the aggregate horsepower to do it. 
So this chart actually comes out of some performance tests that were done. So it happens to be Dell servers with Broadcom components. And the point here was to peel back, you know, peel off the top of the server and look at what's in that server, starting with, you know, the PCI interconnect. So PCIE gen three, gen four, moving forward. What are the effects on from an interconnect versus on performance application performance, translating into new orders per minute, processed per dollar, et cetera, et cetera? If you look at the advances in CPU architecture mapped against the advances in interconnect and storage subsystem performance, you can see that CPU architecture is sort of lagging behind in a way. And Bob mentioned this idea of tiling and all of the different ways to get around that. When we do performance testing, we can actually peg CPUs, just running the performance tests without any actual database environments working. So right now we're at this sort of imbalance point where you have to make sure you design things properly to get the most bang per kilowatt hour of power per dollar input. So the key thing here what this is highlighting is just as a very specific example, you take a card that's designed as a gen three PCIE device, and you plug it into a gen four slot. Now the card is the bottleneck. You plug a gen four card into a gen four slot. Now the gen four slot is the bottleneck. So we're constantly chasing these bottlenecks. Someone has to be focused on that from an architectural perspective, it's critically important. So there's no question that it matters. But of course, various people in this food chain won't care where it comes from. I guess a good analogy might be, where does our food come from? If I get a steak, it's a pink thing wrapped in plastic, right? Well, there are a lot of inputs that a lot of people have to care about to get that to me. Do I care about all of those things? No. Are they important? They're critically important. 
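The gen three versus gen four point above can be sketched numerically. This is a minimal illustration, not anything from the tests Dave describes: it assumes the published PCIe 3.0 and 4.0 per-lane transfer rates (8 and 16 GT/s with 128b/130b encoding), and the function names and the stage figures in the usage note are hypothetical.

```python
# Sketch: a PCIe link runs at the slower generation of card and slot,
# and a data path is only as fast as its slowest stage.
# Assumption: raw per-lane rates of 8 GT/s (gen 3) and 16 GT/s (gen 4),
# both with 128b/130b line encoding; protocol overhead is ignored.

GT_PER_S = {3: 8.0, 4: 16.0}
ENCODING_EFFICIENCY = 128 / 130


def lane_gbytes_per_s(gen: int) -> float:
    """Approximate usable GB/s for one lane of the given PCIe generation."""
    return GT_PER_S[gen] * ENCODING_EFFICIENCY / 8  # 8 bits per byte


def link_gbytes_per_s(card_gen: int, slot_gen: int, lanes: int) -> float:
    """The link negotiates down to the older of the two generations."""
    return lane_gbytes_per_s(min(card_gen, slot_gen)) * lanes


def bottleneck(stages: dict[str, float]) -> tuple[str, float]:
    """Return the name and throughput (GB/s) of the slowest stage."""
    name = min(stages, key=stages.get)
    return name, stages[name]
```

For example, `link_gbytes_per_s(3, 4, 16)` is about 15.75 GB/s while `link_gbytes_per_s(4, 4, 16)` is about 31.5 GB/s; passing made-up stage numbers such as `{"pcie_link": 15.75, "storage": 20.0, "network": 12.5}` to `bottleneck` names `network` as the next thing to chase.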
>> So, okay. So what I want to get to is, okay, what does this all mean to customers? And so what I'm hearing from you is that to balance a system, it's becoming, you know, more complicated. And I've kind of been waiting for this day for a long time, because as we all know the bottleneck was always the spinning disk, the last mechanical. So people who wrote software knew that when they were doing a write, the disk had to go and do stuff. And so they were doing other things in the software. And now with all these new interconnects and flash and things like that, you could do atomic writes. And so that opens up new software possibilities, and combine that with alternative processors. But what's the so what on this to the customer and the application impact? Can anybody address that? >> Yeah, let me address that for a moment. I want to leverage some of the things that Bob said, Keith said, Zeus said, and David said, yeah. So I'm a bit of a contrarian in some of this. For example, on the chip side. As the chips get smaller, 14 nanometer, 10 nanometer, five nanometer, soon three nanometer, we talk about more cores, but the biggest problem on the chip is the interconnect from the chip 'cause the wires get smaller. People don't realize in 2004 the latency on those wires in the chips was 80 picoseconds. Today it's 1300 picoseconds. That's on the chip. This is why they're not getting faster. So we may be getting a little bit of slowing down in Moore's law. But even as we kind of conquer that you still have the interconnect problem and the interconnect problem goes beyond the chip. It goes within the system, composable architectures. It goes to the point that Keith made, ultimately you need a hybrid because what we're seeing, what I'm seeing and I'm talking to customers, the biggest issue they have is moving data. Whether it be in a chip, in a system, in a data center, between data centers, moving data is now the biggest gating item in performance. 
So if you want to move it from, let's say your transactional database to your machine learning, it's the bottleneck, it's moving the data. And so when you look at it from a distributed environment, now you've got to move the compute to the data. The only way to get around these bottlenecks today is to spend less time in trying to move the data and more time in taking the compute, the software, running on hardware closer to the data. Go ahead. >> So is this what you mean when Nicholson was talking about a shift from a processor centric world to a connectivity centric world? You're talking about moving the bits across all the different components, not having the processor you're saying is essentially becoming the bottleneck or the memory, I guess. >> Well, that's one of them and there's a lot of different bottlenecks, but it's the data movement itself. It's moving away from, wait, why do we need to move the data? Can we move the compute, the processing closer to the data? Because if we keep them separate and this has been a trend now where people are moving processing away from it. It's like the edge. I think it was Zeus or David. You were talking about the edge earlier. As you look at the edge, who defines the edge, right? Is the edge a closet or is it a sensor? If it's a sensor, how do you do AI at the edge? When you don't have enough power, you don't have enough compute. People are inventing chips to do that. To do all that at the edge, to do AI within the sensor, instead of moving the data to a data center or a cloud to do the processing. Because the lag in latency is always limited by speed of light. How fast can you move the electrons? And all this interconnecting, all the processing, and all the improvement we're seeing in the PCIE bus from three, to four, to five, to CXL, to a higher bandwidth on the network. And that's all great but none of that deals with the speed of light latency. And that's an-- Go ahead. 
>> You know Marc, no, I just want to just because what you're referring to could be looked at at a macro level, which I think is what you're describing. You can also look at it at a more micro level from a systems design perspective, right? I'm going to be the resident knuckle dragging hardware guy on the panel today. But it's exactly right. You moving compute closer to data includes concepts like peripheral cards that have built in intelligence, right? So again, in some of this testing that I'm referring to, we saw dramatic improvements when you basically took the horsepower instead of using the CPU horsepower for the like IO. Now you have essentially offload engines in the form of storage controllers, RAID controllers, of course, for ethernet NICs, smart NICs. And so when you can have these sort of offload engines and we've gone through these waves over time. People think, well, wait a minute, RAID controller and NVMe? You know, flash storage devices. Does that make sense? It turns out it does. Why? Because you're actually at a micro level doing exactly what you're referring to. You're bringing compute closer to the data. Now, closer to the data meaning closer to the data storage subsystem. It doesn't solve the macro issue that you're referring to but it is important. Again, going back to this idea of system design optimization, always chasing the bottleneck, plugging the holes. Someone needs to do that in this value chain in order to get the best value for every kilowatt hour of power and every dollar. >> Yeah. >> Well this whole drive for performance has created some really interesting architectural designs, right? Like Nicholson said, the rise of the DPU, right? It brings more processing power into systems that already had a lot of processing power. There's also been some really interesting, you know, kind of innovation in the area of systems architecture too. 
If you look at the way Nvidia goes to market, their drive kit is a prebuilt piece of hardware, you know, optimized for self-driving cars, right? They partnered with Pure Storage and ARISTA to build that AI-ready infrastructure. I remember when I talked to Charlie Giancarlo, the CEO of Pure about when the three companies rolled that out. He said, "Look, if you're going to do AI, "you need good storage. "You need fast storage, fast processor and fast network." And so for customers to be able to put that together themselves was very, very difficult. There's a lot of software that needs tuning as well. So the three companies partnered together to create a fully integrated turnkey hardware system with a bunch of optimized software that runs on it. And so in that case, in some ways the hardware was leading the software innovation. And so, the variety of different architectures we have today around hardware has really exploded. And I think it's part of what Bob brought up at the beginning about the different chip designs. >> Yeah, Bob talked about that earlier. Bob, I mean, most AI today is modeling, you know, and a lot of that's done in the cloud and it looks from my standpoint anyway that the future is going to be a lot of AI inferencing at the edge. And that's a radically different architecture, Bob, isn't it? >> It is, it's a completely different architecture. And just to follow up on a couple points, excellent conversation guys. Dave talked about system architecture and really that's what this boils down to, right? But it's looking at architecture at every level. I was talking about the individual different components, the new interconnect methods. There's this new thing called UCIe, universal connection. I forget what it stands for, but it's a mechanism for doing chiplet architectures, but then again, you have to take it up to the system level, 'cause it's all fine and good. 
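Marc's speed-of-light point can be put into rough numbers. The sketch below is illustrative only: the vacuum speed of light is the defined constant, but the fiber slowdown factor of about two thirds is a common rule of thumb assumed here, and the function name is hypothetical.

```python
# Sketch: the latency floor set by propagation delay alone.
# No amount of extra bandwidth (PCIe generations, CXL, faster networks)
# lowers this number; only shortening the distance does.

SPEED_OF_LIGHT_M_PER_S = 299_792_458  # exact, in vacuum
FIBER_FACTOR = 2 / 3                  # assumption: light in glass fiber is roughly 2/3 c


def one_way_latency_ms(distance_km: float, medium_factor: float = 1.0) -> float:
    """Minimum one-way delay in milliseconds over the given distance."""
    distance_m = distance_km * 1_000
    return distance_m / (SPEED_OF_LIGHT_M_PER_S * medium_factor) * 1_000


if __name__ == "__main__":
    for km in (1, 100, 1_000, 10_000):
        vacuum = one_way_latency_ms(km)
        fiber = one_way_latency_ms(km, FIBER_FACTOR)
        print(f"{km:>6} km: vacuum {vacuum:.3f} ms, fiber (assumed) {fiber:.3f} ms")
```

Under these assumptions a sensor 1,000 km from a data center pays roughly 5 ms each way over fiber before any processing happens, which is the argument for doing the inference at the sensor instead.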
If you have this SOC that's tuned and optimized, but it has to talk to the rest of the system. And that's where you see other issues. And you've seen things like CXL and other interconnect standards, you know, and nobody likes to talk about interconnect 'cause it's really wonky and really technical and not that sexy, but at the end of the day it's incredibly important exactly. To the other points that were being raised like mark raised, for example, about getting that compute closer to where the data is and that's where again, a diversity of chip architectures help and exactly to your last comment there Dave, putting that ability in an edge device is really at the cutting edge of what we're seeing on a semiconductor design and the ability to, for example, maybe it's an FPGA, maybe it's a dedicated AI chip. It's another kind of chip architecture that's being created to do that inferencing on the edge. Because again, it's that the cost and the challenges of moving lots of data, whether it be from say a smartphone to a cloud-based application or whether it be from a private network to a cloud or any other kinds of permutations we can think of really matters. And the other thing is we're tackling bigger problems. So architecturally, not even just architecturally within a system, but when we think about DPUs and the sort of the east west data center movement conversation that we hear Nvidia and others talk about, it's about combining multiple sets of these systems to function together more efficiently again with even bigger sets of data. So really is about tackling where the processing is needed, having the interconnect and the ability to get where the data you need to the right place at the right time. And because those needs are diversifying, we're just going to continue to see an explosion of different choices and options, which is going to make hardware even more essential I would argue than it is today. 
And so I think what we're going to see is, not only does hardware matter, it's going to matter even more in the future than it does now. >> Great, yeah. Great discussion, guys. I want to bring Keith back into the conversation here. Keith, if your main expertise in tech is provisioning LUNs, you probably want to look for another job. So clearly hardware matters, but with software defined everything, do people with hardware expertise matter outside of for instance, component manufacturers or cloud companies? I mean, VMware certainly changed the dynamic in servers. Dell just spun off its most profitable asset in VMware. So it obviously thinks hardware can stand alone. How does an enterprise architect view the shift to software defined hyperscale cloud and how do you see the shifting demand for skills in enterprise IT? >> So I love the question and I'll take a different view of it. If you're a data analyst and your primary value add is that you do ETL transformation, I talked to a CDO, a chief data officer at a midsize bank, a little bit ago. He said 80% of his data scientists' time is spent on ETL. Super not value add. He wants his data scientists to do data science work. Chances are if your only value is that you do LUN provisioning, then you probably don't have a job now. The technologies have gotten much more intelligent. As infrastructure pros, we want to give infrastructure pros the opportunities to shine and I think the software defined nature and the automation that we're seeing vendors undertake, whether it's Dell, HP, Lenovo, take your pick, Pure Storage, NetApp, they are doing the automation and the ML needed so that these practitioners don't spend 80% of their time doing LUN provisioning and can focus on their true expertise, which is ensuring that data is stored. Data is retrievable, data's protected, et cetera. 
I think the shift is to focus on that part of the job that you're ensuring no matter where the data's at, because as my data is spread across the enterprise hybrid different types, you know, Dave, you talk about the super cloud a lot. If my data is in the super cloud, protecting that data and securing that data becomes much more complicated than when it was me just procuring or provisioning LUNs. So when you say, where should the shift be, or look be, you know, focusing on the real value, which is making sure that customers can access data, can recover data, can get data at performance levels that they need within the price point they need, to get at those datasets where they need them. We talked a lot about where they need it. One last point about this interconnecting. I have this vision and I think we all do of composable infrastructure. This idea that scale-out does not solve every problem. The cloud can give me infinite scale out. Sometimes I just need a single OS with 64 terabytes of RAM and 204 GPUs or GPU instances. That single OS does not exist today. And the opportunity is to create composable infrastructure so that we solve a lot of these problems that just simply don't scale out. >> You know, wow. So many interesting points there. I had just interviewed Zhamak Dehghani, who's the founder of Data Mesh last week. And she made a really interesting point. She said, "Think about, we have separate stacks. "We have an application stack and we have "a data pipeline stack and the transaction systems, "the transaction database, we extract data from that," to your point, "We ETL it in, you know, it takes forever. "And then we have this separate sort of data stack." If we're going to inject more intelligence and data and AI into applications, those two stacks, her contention is they have to come together. And when you think about, you know, super cloud bringing compute to data, that was what Hadoop was supposed to be. 
It ended up all sort of going into a central location, but it's almost a rhetorical question. I mean, it seems that that necessitates new thinking around hardware architectures as kind of everything's the edge. And the other point is to your point, Keith, it's really hard to secure that. So when you can think about offloads, right, you've heard the stats, you know, Nvidia talks about it. Broadcom talks about it that, you know, that 30%, 25 to 30% of the CPU cycles are wasted on doing things like storage offloads, or networking or security. It seems like maybe Zeus you have a comment on this. It seems like new architectures need to come together to support, you know, all of that stuff that Keith and I just discussed. >> Yeah, and by the way, I do want to address, Keith, the question you just asked. Keith, it's the point I made at the beginning too about engineers do need to be more software-centric, right? They do need to have better software skills. In fact, I remember talking to Cisco about this last year when they surveyed their engineer base, only about a third of 'em had ever made an API call, which, you know, kind of shows this big skillset change, you know, that has to come. But on the point of architectures, I think the big change here is edge because it brings in distributed compute models. Historically, when you think about compute, even with multi-cloud, we never really had multi-cloud. We'd use multiple centralized clouds, but compute was always centralized, right? It was in a branch office, in a data center, in a cloud. With edge, what we create is the rise of distributed computing where we'll have an application that actually accesses different resources at different edge locations. And I think Marc, you were talking about this, like the edge could be in your IoT device. It could be your campus edge. It could be cellular edge, it could be your car, right? 
And so we need to start thinkin' about how our applications interact with all those different parts of that edge ecosystem, you know, to create a single experience. The consumer apps, a lot of consumer apps largely works that way. If you think of like app like Uber, right? It pulls in information from all kinds of different edge application, edge services. And, you know, it creates pretty cool experience. We're just starting to get to that point in the business world now. There's a lot of security implications and things like that, but I do think it drives more architectural decisions to be made about how I deploy what data where and where I do my processing, where I do my AI and things like that. It actually makes the world more complicated. In some ways we can do so much more with it, but I think it does drive us more towards turnkey systems, at least initially in order to, you know, ensure performance and security. >> Right. Marc, I wanted to go to you. You had indicated to me that you wanted to chat about this a little bit. You've written quite a bit about the integration of hardware and software. You know, we've watched Oracle's move from, you know, buying Sun and then basically using that in a highly differentiated approach. Engineered systems. What's your take on all that? I know you also have some thoughts on the shift from CapEx to OPEX chime in on that. >> Sure. When you look at it, there are advantages to having one vendor who has the software and hardware. They can synergistically make them work together that you can't do in a commodity basis. If you own the software and somebody else has the hardware, I'll give you an example would be Oracle. As you talked about with their exit data platform, they literally are leveraging microcode in the Intel chips. And now in AMD chips and all the way down to Optane, they make basically AMD database servers work with Optane memory PMM in their storage systems, not MVME, SSD PMM. I'm talking about the cards itself. 
So there are advantages you can take advantage of if you own the stack, as you were pointing out earlier, Dave, of both the software and the hardware. Okay, that's great. But on the other side of that, that tends to give you better performance, but it tends to cost a little more. On the commodity side it costs less but you get less performance. As Zeus had said earlier, it depends where you're running your application. How much performance do you need? What kind of performance do you need? One of the things about moving to the edge, and I'll get to the OPEX CapEx in a second. One of the issues about moving to the edge is what kind of processing do you need? If you're running in a CCTV camera on top of a traffic light, how much power do you have? How much cooling do you have that you can run this? And more importantly, do you have to take the data you're getting and move it somewhere else to get processed and have the information sent back? I mean, there are companies out there like BrainChip that have developed AI chips that can run on the sensor without a CPU. Without any additional memory. So, I mean, there's innovation going on to deal with this question of data movement. There's companies out there like Tachyum that are combining GPUs, CPUs, and DPUs in a single chip. Think of it as super composable architecture. They're looking at being able to do more in less. On the OPEX and CapEx issue. >> Hold that thought, hold that thought on the OPEX CapEx, 'cause we're running out of time and maybe you can wrap on that. I just wanted to pick up on something you said about the integrated hardware software. I mean, other than the fact that, you know, Michael Dell unlocked whatever $40 billion for himself and Silver Lake, I was always a fan of a spin-in with VMware, basically to become the Oracle of hardware. 
Now I know it would've been a nightmare for the ecosystem and culturally, they probably would've had a VMware brain drain, but does anybody have any thoughts on that as sort of a thought exercise? I was always a fan of that on paper. >> I got to eat a little crow. I did not like the Dell VMware acquisition for the industry in general. And I think it hurt the industry in general; HPE, Cisco walked away a little bit from that VMware relationship. But when I talked to customers, they loved it. You know, I got to be honest. They absolutely loved the integration. The VxRail, VxRack solution exploded. Nutanix became kind of an afterthought when it came to competing. So that spin-in, when we talk about the ability to innovate and the ability to create solutions that you just simply can't create because you don't have the full stack. Dell was well positioned to do that with a potential spin-in of VMware. >> Yeah, we're going to be-- Go ahead please. >> Yeah, in fact, I think you're right, Keith, it was terrible for the industry. Great for Dell. And I remember talking to Chad Sakac when he was running, you know, VCE, which became Rack and Rail, about their ability to stay in lockstep with what VMware was doing. What was the number one workload running on hyperconverged forever? It was VMware. So their ability to remain in lockstep with VMware gave them a huge competitive advantage. And Dell came out of nowhere in, you know, the hyper-converged market and just started taking share because of that relationship. So, you know, this sort, I guess it's, you know, from a Dell perspective I thought it gave them a pretty big advantage that they didn't really exploit across their other properties, right? Networking and service and things like that, they could have, given the dominance that VMware had. From an industry perspective though, I do think it's better to have them be decoupled. So. >> I agree. I mean, they could. 
I think they could have dominated in super cloud and maybe they would become the next Oracle where everybody hates 'em, but they kick ass. But guys. We got to wrap up here. And so what I'm going to ask you is, I'm going to go and reverse the order this time, you know, big takeaways from this conversation today, which, guys, by the way, I can't thank you enough, phenomenal insights, but big takeaways, any final thoughts, any research that you're working on that you want to highlight or, you know, what you look for in the future? Try to keep it brief. We'll go in reverse order. Maybe Marc, you could start us off please. >> Sure, on the research front, I'm working on a total cost of ownership of an integrated database analytics machine learning versus separate services. On the other aspect that I wanted to chat about real quickly, OPEX versus CapEx, the cloud changed the market perception of hardware in the sense that you can use hardware or buy hardware like you do software. As you use it, pay for what you use in arrears. The good thing about that is you're only paying for what you use, period. You're not paying for what you don't use. I mean, it's compute time, everything else. The bad side about that is you have no predictability in your bill. It's elastic, but every user I've talked to says every month it's different. And from a budgeting perspective, it's very hard to set up your budget year to year and it's causing a lot of nightmares. So it's just something to be aware of. From a CapEx perspective, you have no more CapEx if you're using that kind of base system but you lose a certain amount of control as well. So ultimately that's some of the issues. But my biggest point, my biggest takeaway from this, is that the biggest issue right now for everybody I talk to, in some shape or form, comes down to data movement, whether it be ETLs that you talked about, Keith, or other aspects: moving it between hybrid locations, moving it within a system, moving it within a chip. 
All those are key issues. >> Great, thank you. Okay, CTO advisor, give us your final thoughts. >> All right. Really, really great commentary. Again, I'm going to point back to us taking the walk that our customers are taking, which is trying to do this conversion of all primary data center to a hybrid, of which I have this hard-earned philosophy that enterprise IT is additive. When we add a service, we rarely subtract a service. So the landscape and service area that we support has to grow. So our research focuses on taking that walk. We are taking a monolithic application, decomposing that to containers, and putting that in a public cloud, and connecting that back to the private data center and telling that story and walking that walk with our customers. This has been a super enlightening panel. >> Yeah, thank you. Real, real different world coming. David Nicholson, please. >> You know, it really hearkens back to the beginning of the conversation. You talked about momentum in the direction of cloud. I'm sort of spending my time under the hood, getting grease under my fingernails, focusing on where still the lion's share of spend will be in coming years, which is OnPrem. And then of course, obviously data center infrastructure for cloud, but really diving under the covers and helping folks understand the ramifications of movement between generations of CPU architecture. I know we all know Sapphire Rapids pushed into the future. When's the next Intel release coming? Who knows? We think, you know, in 2023. There have been a lot of people standing by from a practitioner's standpoint asking, well, what do I do between now and then? Does it make sense to upgrade bits and pieces of hardware or go from a last generation to a current generation when we know the next generation is coming? And so I've been very, very focused on looking at how these connectivity components like RAID controllers and NICs. 
I know it's not as sexy as talking about cloud but just how these components completely change the game and actually can justify movement from say a 14th-generation architecture to a 15th-generation architecture today, even though gen 16 is coming, let's say 12 months from now. So that's where I am. Keep my phone number in the Rolodex. I literally reference Rolodex intentionally because like I said, I'm in there under the hood and it's not as sexy. But yeah, so that's what I'm focused on, Dave. >> Well, you know, to paraphrase it, maybe derivative paraphrase of, you know, Larry Ellison's rant on what is cloud? It's operating systems and databases, et cetera. RAID controllers and NICs live inside of clouds. All right. You know, one of the reasons I love working with you guys is 'cause you have such a wide observation space, and Zeus Kerravala, you, of all people, you know you have your fingers in a lot of pies. So give us your final thoughts. >> Yeah, I'm not as propeller heady as my chip counterparts here. (all laugh) So, you know, I look at the world a little differently and a lot of my research I'm doing now is the impact that distributed computing has on customer and employee experiences, right? You talk to every business and the experiences they deliver to their customers are really differentiating how they go to market. And so they're looking at these different ways of feeding up data and analytics and things like that in different places. And I think this is going to have a really profound impact on enterprise IT architecture. We're putting more data, more compute in more places all the way down to like little micro edges and retailers and things like that. And so we need the variety. Historically, if you think back to when I was in IT, you know, pre-Y2K, we didn't have a lot of choice in things, right? We had a server that was rack mount or standup, right? And there wasn't a whole lot of, you know, differences in choice. 
But today we can deploy, you know, these really high-performance compute systems on little blades inside servers or inside, you know, autonomous vehicles and things. I think the world from here gets... You know, just the choice of what we have and the way hardware and software work together is really going to, I think, change the world the way we do things. We're already seeing that, like I said, in the consumer world, right? There's so many things you can do from, you know, a smart home perspective, you know, natural language processing, stuff like that. And it's starting to hit businesses now. So just wait and watch the next five years. >> Yeah, totally. The computing power at the edge is just going to be mind blowing. >> It's unbelievable what you can do at the edge. >> Yeah, yeah. Hey Z, I just want to say that we know you're not a propeller head and I for one would like to thank you for having your master's thesis hanging on the wall behind you 'cause we know that you studied basket weaving. >> I was actually a physics math major, so. >> Good man. Another math major. All right, Bob O'Donnell, you're going to bring us home. I mean, we've seen the importance of semiconductors and silicon in our everyday lives, but your last thoughts please. >> Sure, and just to clarify, by the way, I was a great books major and this was actually for my final paper. And so I was like philosophy and all that kind of stuff and literature, but I still somehow got into tech. Look, it's been a great conversation and I want to pick up a little bit on a comment Zeus made, which is this: it's the combination of the hardware and the software coming together, and the manner with which that needs to happen, I think, is critically important. And the other thing is, because of the diversity of the chip architectures and all those different pieces and elements, it's going to be how software tools evolve to adapt to that new world. So I look at things like what Intel's trying to do with oneAPI. 
You know, what Nvidia has done with CUDA. What other platform companies are trying to do is create tools that allow them to leverage the hardware, but also embrace the variety of hardware that is there. And so as those software development environments and software development tools evolve to take advantage of these new capabilities, that's going to open up a lot of interesting opportunities that can leverage all these new chip architectures. That can leverage all these new interconnects. That can leverage all these new system architectures, and figuring out ways to make that all happen, I think, is going to be critically important. And then finally, I'll mention the research I'm actually currently working on is on private 5G and how companies are thinking about deploying private 5G and the potential for edge applications for that. So I'm doing a survey of several hundred US companies as we speak and really looking forward to getting that done in the next couple of weeks. >> Yeah, look forward to that. Guys, again, thank you so much. Outstanding conversation. Anybody going to be at Dell Tech World in a couple of weeks? Bob's going to be there. Dave Nicholson. Well, drinks on me, and guys, I really can't thank you enough for the insights and your participation today. Really appreciate it. Okay, and thank you for watching this special power panel episode of theCube Insights powered by ETR. Remember we publish each week on Siliconangle.com and wikibon.com. All these episodes, they're available as podcasts. DM me or any of these guys. I'm at DVellante. You can email me at David.Vellante@siliconangle.com. Check out etr.ai for all the data. This is Dave Vellante. We'll see you next time. (upbeat music)

Published Date : Apr 25 2022
