Clemence W. Chee & Christoph Sawade, HelloFresh

>> Hello everyone. We're here at theCUBE startup showcase made possible by AWS. Thanks so much for joining us today. You know, when Zhamak Dehghani was formulating her ideas around data mesh, she wasn't the only one thinking about decentralized data architectures. HelloFresh was going into hyper-growth mode and realized that in order to support its scale, it needed to rethink how it thought about data. Like many companies that started in the early part of the last decade, HelloFresh relied on a monolithic data architecture, and the internal team had concerns about its ability to support continued innovation at high velocity. The company's data team began to think about the future and work backwards from a target architecture which possessed many principles of so-called data mesh, even though they didn't use that term specifically. The company is a strong example of an early but practical pioneer of data mesh. Now, there are many practitioners and stakeholders involved in evolving the company's data architecture, many of whom are listed here on this slide. Two are highlighted in red and joining us today. We're really excited to welcome into theCUBE Clemence Chee, the global senior director for data at HelloFresh, and Christoph Sawade, who's the global senior director of data, also of course at HelloFresh. Folks, welcome. Thanks so much for making some time today and sharing your story.

>> Thank you very much.

>> Thanks, Dave.

>> All right, let's start with HelloFresh. You guys are number one in the world in your field. You deliver hundreds of millions of meals each year to many, many millions of people around the globe. You're scaling. Christoph, tell us a little bit more about your company and its vision.

>> Yeah. Should I start, or Clemence? Maybe take over the first piece, because Clemence has actually had the longer trajectory at HelloFresh.

>> Yeah, go ahead, Clemence.

>> I mean, yes, approximately six years ago I joined HelloFresh, and I didn't think the startup I was joining would eventually IPO. Just two years later, HelloFresh went public, and approximately three years and 10 months after HelloFresh was listed on the German stock exchange, which was just last week, HelloFresh was included in the DAX, Germany's leading stock market index. That, to my mind, is a great, great milestone, and I'm really looking forward and very excited for the future of HelloFresh and also of our data. The vision that we have is to become the world's leading food solutions group, and there are a lot of attractive opportunities. Recently we did launch and expand in Norway, this was in July, and earlier this year we launched the US brand Green Chef in the UK as well. We're committed to launching continuously in different geographies in the coming years, and we have a strong pipeline ahead of us: with the acquisition of ready-to-eat companies like Factor in the US and the planned acquisition of Youfoodz in Australia, we are diversifying our offer, now reaching even more untapped customer segments and increasing our total addressable market. So by offering customers a growing range of different alternatives to shop for food and to consume meals, we are charging towards this vision and this goal to become the world's leading integrated food solutions group.

>> Love it. You guys are on a rocket ship; you're really transforming the industry, and as you expand your TAM, it brings us to sort of the data as a core part of that strategy.
So maybe you guys could talk a little bit about your journey as a company, specifically as it relates to your data journey. I mean, you began as a startup, you had a basic architecture and, like everyone, you made extensive use of spreadsheets. You built a Hadoop-based system that started to grow, and when the company IPO'd, you really started to explode. So maybe describe that journey from a data perspective.

>> Yes, Dave. So HelloFresh by 2015 had evolved approximately what amounts to a classical centralized data management setup. We grew very organically over the years, and there were a lot of very smart people around the globe really building the company and building our infrastructure. This also means that there were a small number of internal and external data sources, and a centralized BI team with a number of people producing different reports, dashboards and products for our executives, for example, or for the different operations teams to see the company's performance. Knowledge was transferred just by talking to each other in face-to-face conversations, and the people in the data warehouse team were considered the data wizards, or the ETL wizards. Very classical challenges. And those ETL wizards held a kind of silent knowledge of data management. So a central data warehouse team was then responsible for different types of verticals, different domains, different geographies, and all this setup gave us, in the beginning, the flexibility to grow fast as a company in 2015.

>> Christoph, anything to add to that?

>> Yes, not explicitly to that one, but as Clemence said, this was the kind of setup that actually worked for us for quite a while. And then in 2017, when HelloFresh went public, the company also grew rapidly. Just to give you an idea of how that looked: the tech department actually increased from about 40 people to almost 300 engineers, and in the same way the business units, as Clemence has described, also grew sustainably. So we continued to launch HelloFresh in new countries, launched new brands like Every Plate, and also acquired other brands like Factor. With that growth, also from a data perspective, the number of data requests that the central team was getting became more and more, and also more and more complex. For the team, that meant a fairly high mental load: they had to get a very deep understanding of the business, and they also suffered a lot from context switching back and forth. Essentially they had to prioritize across requests from our physical product, our digital product, from the marketing perspective, and also from the central reporting teams. In a nutshell, this was very hard for these people, and it also led to a situation where, let's say, the solutions we built were not really optimal. So the central function became a bottleneck and slowed down all the innovation of the company.

>> It's a classic case, isn't it? I mean, Clemence, you see the central team become a bottleneck, and so the lines of business, the marketing team, the sales teams say, okay, we're going to take things into our own hands. And then of course IT and the technical team is called in later to clean up the mess. Maybe I'm overstating it, but that's a common situation, isn't it?
>> Yeah, this is exactly what happened, right? So we had a bottleneck, we had those central teams, and there was always a bit of tension. Analytics teams in the business domains, like marketing, supply chain, finance, HR and so on, then started to really build their own data solutions; at some point you have to get the ball rolling, right, and then continue the trajectory. Which meant that the data pipelines didn't meet the engineering standards, and there was an increased need for maintenance and support from the central teams. Hence, over time, the knowledge about those pipelines, and how to maintain a particular infrastructure for example, left the company, such that most of those data assets and data sets turned into a huge debt, with decreasing data quality, decreasing trust, decreasing transparency. And this became an increasing challenge, where the majority of time was spent in meeting rooms to align on data quality, for example.

>> Yeah, and the point you were making, Christoph, about context switching, this is a point that Zhamak makes quite often: we've contextualized our operational systems, like our sales systems, our marketing systems, but not our data systems. So you're asking the data team, okay, be an expert in sales, be an expert in marketing, be an expert in logistics, be an expert in supply chain, and it's start, stop, start, stop; it's a paper-cut environment, and it's just not as productive. But the flip side of that is, when you think about a centralized organization, you think, hey, this is going to be a very efficient way, a cross-functional team, to support the organization; but it's not necessarily the highest-velocity, most effective organizational structure.

>> Yeah, so I agree with that piece up to a certain scale: a centralized function has a lot of advantages, right? It's clear for everyone that requests go to a dedicated expert team. However, if you would like to accelerate, specifically in this hyper-growth, you want to have autonomy in certain teams and move the teams, or let's say the data, to the experts in those teams. And this, as you have mentioned, increases mental load. You can internally start splitting your team into different kinds of sub-teams focusing on different areas; however, that is then again just adding another place where collaboration needs to happen across team boundaries. So why not bridge that gap immediately and move these teams end to end into the functions themselves? Maybe just to continue what Clemence was saying, and this is actually where Clemence's and my journey started to become one joint journey: Clemence was coming from one of these teams that built their own solutions, and I was basically heading the platform team, called the data warehouse team in those days. And in 2019, when the situation became more and more serious, more and more people recognized that this model doesn't really scale. In 2019 the leadership of the company came together and identified data as a key strategic asset, and what we mean by that is: if we leverage data in a proper way, it gives us a unique competitive advantage which can help us to support and actually fully automate our decision-making process across the entire value chain. So what we should be aiming for is that HelloFresh is able to build data products that have a purpose.
We're moving away from the idea that data is just a byproduct. Products have a purpose; there is a clear business need behind why we would like to collect this data. And because it's so important for the company as a business, we also want to provide it as a trustworthy asset to the rest of the organization, ideally with the best user experience, but at least in a way that users can easily discover, understand and securely access high-quality data.

>> Yeah, so Clemence, when you see Zhamak's writing, she has the four pillars and the principles, and as practitioners you look at that and say, okay, hey, that's pretty good thinking, and now we have to apply it. And that's where the devil meets the details. So it's the four pillars: decentralized data ownership; data as a product, which we'll talk about a little bit; self-serve, which you guys have spent a lot of time on; and Clemence, your wheelhouse, which is governance, in a federated governance model. And it's almost like if you achieve the first two, then you have to solve for the second two; it almost creates new challenges. But maybe you could talk about that a little bit as it relates to HelloFresh.

>> Yes. So Christoph mentioned that we identified the core challenge beforehand: how can we actually decentralize and empower our different colleagues? We realized that this was more of an organizational or cultural change, and this is something Zhamak also mentioned in one of the white papers: it's more of an organizational or cultural impact. So we kicked off a phased reorganization, in different phases that we are currently still in the middle of, to try to unlock this data at scale. The idea was really to move away from ever-growing, complex matrix organizations or matrix setups and to split between two different things. One is value creation, so basically when people ask the question, what can we actually do, what shall we do; the other is the how, which is capability building. Both are equal in authority. This creates a high urge for collaboration, and this collaboration breaks up the different silos that were built. Of course this also includes different staffing needs for those teams: staffing with more, let's say, data scientists or data engineers, data professionals, into those business domains, and hence also more capability building.

>> Okay, go ahead. Sorry.

>> So, back to Zhamak Dehghani: the idea also crossed over when she published her papers in May 2019, and we thought, well, the four pillars that she described were around decentralized data ownership, a data-as-a-product mindset, a self-service infrastructure and, as you mentioned, federated computational governance. And this suited our thinking at that point in time very well, to reorganize the different teams, and this then led to not only an organizational restructure but also a completely new approach to how we need to manage and share data.

>> Got it. Okay, so your business is exploding. Your data team would have to become domain experts in too many areas, constantly context switching, as we said, and people started to take things into their own hands. So again, as we said, a classic story, but you didn't let it get out of control, and that's important.
So we actually have a picture of kind of where you're going today, and it's evolved into this. Pat, if you could bring up the picture with the elephant; here we go. So I'll talk a little bit about the architecture. It doesn't show the spreadsheet era here, but Christoph, maybe you can talk about that. It does show the Hadoop monolith, which exists today; I think that's in a managed hosting service, but you preserved that piece of it. If I understand it correctly, everything is evolving to the cloud; I think you're running a lot of this, or all of it, in AWS. Everybody's got their own data sources, you've got a data hub, which I think is enabled by a master catalog for discovery, and all this underlying technical infrastructure that is really not the focus of this conversation today. But the key here, if I understand it correctly, is that these domains are autonomous, and not only did this require technical thinking, but really a supportive organizational mindset, which we're going to talk about today. But Christoph, maybe you could address, at a high level, some of the architectural evolution that you guys went through.

>> Yeah, sure. Maybe it's also a good summary of the entire history. So, as you have mentioned, we started in the very beginning with a monolith on the operational plane; actually, it wasn't just one monolith, there were two, one for the back end and one for the front end. And our analytical plane was essentially a couple of spreadsheets. And I think there's nothing wrong with spreadsheets: they allow you to store information, to transform data, to share the information, to visualize the data. But that's not actually separating concerns, right? Everything is in one tool, and this means it's obviously not scalable; you reach the point where this kind of data management setup in one tool hits its limits. So what we did was create our data lake, as you see here on the slide, and at the very beginning this actually reflected very much our operational plane. On top of that we used Impala as a data warehouse, but there was not really a distinction between what was our data warehouse and what was our data lake; Impala was used as the kind of engine to create the warehouse and the data lake construct itself. And this organic growth led to a situation, as I think is clear now, where we had a centralized monolith for all the domains, which only loosely followed Kimball modeling standards; there was no uniformity. We actually built in-house ways of building materialized views that we used for the presentation layer; there was a lot of duplication of effort; and in the end there were essentially missing feedback loops, which would have helped us to improve what we built. So in the end, as we have said, the lack of trust, and that's basically the starting point for us to understand, okay, how can we move away from that? And there are a lot of different things you can discuss, apart from the organizational structure where we have said, okay, we have these three or four pillars from Zhamak; there's also the next question around how we implement it on a technical level, what the implications on that level are. And I think that is something we are currently still working through.
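To make the "master catalog for discovery" idea from this exchange concrete, here is a minimal, purely illustrative Python sketch: autonomous domains register their own datasets into a shared index that consumers can search. The Catalog API, field names and dataset names are invented assumptions for this example; HelloFresh's actual platform is not public.

```python
# Illustration of the "data hub + master catalog" idea: domains stay
# autonomous and publish their datasets to a shared index so consumers
# can discover them without knowing every team. All names hypothetical.
from dataclasses import dataclass

@dataclass
class DatasetEntry:
    name: str         # e.g. "orders.daily_box_shipments"
    domain: str       # owning domain team, e.g. "supply_chain"
    owner: str        # accountable person or team alias
    description: str  # human-readable purpose of the dataset
    location: str     # physical address, e.g. an S3 prefix

class Catalog:
    def __init__(self) -> None:
        self._entries: dict[str, DatasetEntry] = {}

    def register(self, entry: DatasetEntry) -> None:
        # Each domain registers its own assets; the catalog only indexes them.
        self._entries[entry.name] = entry

    def discover(self, domain: str) -> list[DatasetEntry]:
        # Consumers search centrally; the data itself stays with the domain.
        return [e for e in self._entries.values() if e.domain == domain]

catalog = Catalog()
catalog.register(DatasetEntry(
    name="marketing.voucher_redemptions",
    domain="marketing",
    owner="marketing-analytics",
    description="Daily voucher redemptions per campaign and country.",
    location="s3://example-bucket/marketing/voucher_redemptions/",
))
print(catalog.discover("marketing"))
```

The design point this tries to capture is the separation of duties: the domains own and serve their assets, while the central piece only indexes them for discovery.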
>> Got it. Okay, so I wonder if we could switch gears a little bit and talk about the organizational and cultural challenges that you faced. What were those conversations like? Let's dig into that a little bit; I want to get into governance as well.

>> The conversations on the cultural change: I mean, yes, we went through hyper-growth over the last years, so obviously there were a lot of new joiners, a lot of different, very, very smart people joining the company, which then meant that collaboration got a bit more difficult. Of course, with time and change, you have different artifacts that were created and documentation that was flying around. We had to build the company from scratch, right? And of course this resulted in the tension which I described before. But the most important part here is that data has always been a very important factor at HelloFresh, and we collected more of this data and continued to use it to improve the different key areas of our business. Even through the organizational struggles, the central organizational struggles, data somehow always helped us to get through this kind of change. In the end, those decentralized teams in our local geographies started with solutions that served the business, which was very, very important, otherwise we wouldn't be at the place where we are today; but they did violate best practices and standards. I always use a sports analogy, Dave: like in any sport, there are different rules and regulations that need to be followed. These rules are defined by, call it, the sports association, and that is how you can think about the data governance and compliance team. Now we add the players, who need to follow those rules and abide by them: this is what we then call data management. Now we have the different players and professionals; they need to be trained and to understand the strategy and the rules before they can play. And this is what I then call data literacy. So we realized that we needed to focus on helping our teams develop those capabilities and teach the standards for how work is done, to truly drive functional excellence in the different domains. One of the missions of our data literacy program, for example, is to really empower every employee at HelloFresh, everyone, to make the right data-informed decisions, by providing data education that is scaled by our data literacy team. This can include different things, such as data capabilities with learning paths, for example: helping people create and deploy data products, connecting data producers and data consumers, and creating a common sense and more understanding of each other's dependencies, which is important, for example, for SLOs, data contracts and so on. People get more of a sense of ownership and responsibility; of course, we have to define what ownership means and what responsibility means, but we're teaching this to our colleagues via individual learning paths and helping them upskill to use the shared infrastructure and those self-service applications. Overall, to summarize, we're still in this process of learning; we are still learning as well, and learning never stops at HelloFresh, but we are really trying to make it as much fun as possible. And in the end, we all know user behavior is changed through positive experience.
So instead of having massive training programs, endless courses and workshops leaving our new joiners and colleagues confused and overwhelmed, we're applying gamification: different levels of certification that our colleagues can access, where they earn badges along the way, which simplifies the process of learning and keeps users engaged. And this is what we see in surveys, for example: our employees like the gamification approach a lot and are even competing to collect those learning-path badges to become number one on the leaderboard.

>> I love the gamification; we've seen it work so well in so many different industries, not the least of which is crypto. So, you've identified some of the process gaps that you saw, and you didn't gloss over them; sometimes, I say, people pave the cow path. You didn't try to force, in other words, a new architecture into the legacy processes. You really had to rethink your approach to data management. So what did that entail?

>> To rethink the way of data management, 100%. If I take the example of the industrial revolution, or a classical supply chain revolution: just imagine that you have been riding a horse your whole life, and suddenly you can operate a car, or you suddenly receive a completely new way of transporting assets from A to B. So we needed to establish a new set of cross-functional business processes to run faster, drive faster, more robustly, and deliver data products which can be trusted and used by downstream processes and systems. Hence we had a set of new standards and new procedures that fall into the internal data governance and compliance sector; with "internal" I'm always referring to the data operations around new things like the data catalog: how to identify ownership, how to change ownership, how to certify data assets, everything around classical software development, which we now apply to data. This is a new way of thinking: deployment, versioning, QA, all the different things, ingestion policies, publishing procedures, all the things that software development has been doing, we now do with data as well. In simple terms, it's a whole redesign of the supply chain of our data, with new procedures and new processes for its creation, its management and its consumption.

>> So data has become kind of the new development kit, if you will. I want to shift gears and talk about the notion of a data product, and we have a slide that we pulled from your deck that I'd like to unpack a little bit. If you can bring that up, I'll read it: "A data product is a product whose primary objective is to leverage data to solve customer problems, where customers are both internal and external." So, pretty straightforward. I know you've gone much deeper in your thinking and into your organization, but how do you think about that, and how do you determine, for instance, who owns what? How did you get everybody to agree?

>> I can take that one. Maybe let me start with the data product. So I think that's an ongoing debate, right? And I think the debate itself is an important piece here: in the debate you clarify what we actually mean by a data product and what the mindset actually is. So, just from a definition perspective, right?
I think we found the common denominator: we say, okay, a data product is something which is important for the company and contributes to its value. What do we mean by that? It's a solution to a customer problem that delivers, ideally, maximum value to the business, and yes, it leverages the power of data. We have a couple of examples at HelloFresh, the historical and classical ones around dashboards, for example to monitor error rates, but also more sophisticated ones, for example incorporating machine learning algorithms into our recipe recommendations. However, I think the important aspects of a data product are these. First, there is an owner: someone accountable for making sure that the product we are providing is actually served and maintained, and that it keeps delivering its value. Second, combined with that, the idea of proper documentation, like a product description, so that people understand how to use it and what it is about. And third, related to that piece, the idea of a purpose: you need to ask yourself, okay, why does this thing exist, does it provide the value that you think it does? That leads into a good understanding of the life cycle of the data product. By life cycle we mean: from the creation you need to have a good understanding, you need to collect feedback, you need to learn from it, you need to rework it, and finally you also need to think about when it is time to decommission the piece. So overall, I think the core of the data product is product thinking 101: the starting point needs to be the problem, not the solution.
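As a rough sketch of the attributes Christoph lists here, an owner, documentation, a purpose and an explicit life cycle, a data-product record could look like the following. The field names and lifecycle states are illustrative assumptions, not HelloFresh's actual schema.

```python
# Sketch of a data product's core attributes: accountable owner, documented
# purpose, and an explicit lifecycle from creation to decommissioning.
from dataclasses import dataclass
from enum import Enum

class Lifecycle(Enum):
    DRAFT = "draft"
    LIVE = "live"
    DEPRECATED = "deprecated"
    DECOMMISSIONED = "decommissioned"

@dataclass
class DataProduct:
    name: str               # e.g. "recipe_recommendations"
    owner: str              # who is accountable for serving and maintaining it
    purpose: str            # the customer problem the product exists to solve
    documentation_url: str  # the "product description" consumers read first
    lifecycle: Lifecycle = Lifecycle.DRAFT

    def decommission(self) -> None:
        # Products with no remaining purpose are retired, not left to rot.
        self.lifecycle = Lifecycle.DECOMMISSIONED

product = DataProduct(
    name="recipe_recommendations",
    owner="data-science-recsys",
    purpose="Rank recipes per customer to improve weekly menu selection.",
    documentation_url="https://wiki.example.com/recipe-recommendations",
)
print(product.owner, product.lifecycle.value)
```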
And this is essentially what was missing, and what brought us to the kind of data spaghetti that we had built: in a rush, essentially, we built certain data assets, developed them in isolation, and continuously patched the solution just to fulfill the tickets that we got, without a real understanding of the stakeholder needs. The interesting piece is that this results in duplication of work, and that is not just frustrating and probably not the most efficient way for the company to work; if I build the same data assets, with slightly different assumptions, across the company and across multiple teams, that leads to data inconsistency. Imagine the following: from a management perspective, you ask a specific question, and you get, from a couple of different teams, different kinds of graphs, different kinds of data and numbers, and in the end you do not know which ones to trust. So there's actually much more ambiguity: you do not know whether it's just noise you're observing or whether there is actually a signal you're looking for. The same goes if I'm running an A/B test: I have a new feature, and I would like to understand its business impact. I run it against a specific source; in an unfortunate scenario, your production system is actually running on a different source, so you see different numbers, and what you've seen in the A/B test is not what you then see in production. The typical thing then is that you ask some analytics team to do a deep dive to understand where the discrepancies are coming from; in the worst case, they use yet another source. So in the end it's a pretty frustrating scenario, and it's a waste of the time of the people who have to identify the root cause of this divergence. In a nutshell, the highest degree of consistency is achieved when people simply reuse data assets. And in the talk that we have given, we describe how we started to establish this approach for A/B testing: we have a team that is providing, or kind of owning, the target metric associated with the business teams, and they provide that as a product, also to other services, including the A/B testing team. The A/B testing team can use this information and defines an interface: okay, I'm joining this information with the metadata of an experiment, and in the end, after the assignment and the data collection phase, they can easily add a graph to a dashboard, just grouped by the experiment variant. And we have seen this in other companies too, so it's not just a nice dream that we have: I have actually worked at other companies where we worked on search, and we established a complete KPI pipeline that computed all this information; it was hosted by one team and used for everything, A/B tests, deep dives and regular reporting. So, just to come back to why this is the important piece: it requires that we treat this data as a product. If you want multiple people to use the things that I own and build, I have to provide them as a trustworthy asset, in a way that makes it easy for people to discover and actually work with.
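Here is a hedged sketch of the A/B-testing flow Christoph describes: one business team owns the target metric and serves it as a product, and the experimentation team joins its assignment metadata against that single source and groups by variant. The table and column names are invented for illustration, and the example assumes pandas is available.

```python
# Hypothetical sketch: experiment assignments joined against a single,
# team-owned target metric, so experiment and production reporting
# cannot diverge. All table and column names are invented.
import pandas as pd

# Assignment metadata owned by the A/B-testing team.
assignments = pd.DataFrame({
    "customer_id": [1, 2, 3, 4],
    "variant": ["control", "treatment", "control", "treatment"],
})

# The target metric, owned and served as a product by the business team.
conversions = pd.DataFrame({
    "customer_id": [1, 2, 3, 4],
    "orders": [2, 3, 1, 4],
})

# Both the experiment dashboard and production reporting read the same
# owned metric; the per-variant view is just a join plus a group-by.
report = (
    assignments.merge(conversions, on="customer_id")
    .groupby("variant")["orders"]
    .mean()
)
print(report)
```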
>> Yeah, and coming back to that: this is why I get so excited about data mesh, because I really do think it's the right direction for organizations. When people hear "data product" they say, well, what does that mean? But when you start to define it as you did, it's using data to add value: that could be cutting costs, that could be generating revenue, it could be directly creating a product that you monetize, so it's sort of in the eyes of the beholder. But I think the other point we've made, and you made it earlier on too, is context. When you have a centralized data team and you have all these P&L managers, a lot of times they'll question the data because they don't own it; if it doesn't agree with their agenda, they'll attack the data. But if they own the data, then they're responsible for defending it, and that is a mindset change that's really important. And I'm curious how you got to that ownership. Was it top-down, with somebody providing leadership? Was it more organic, bottom-up? Was it a combination? How did you decide who owned what? In other words, how did you get the business to take ownership of the data, and what does owning the data actually mean?

>> That's a very good question, Dave. I think this is one of the pieces where we have a lot of learnings, and if you asked me how we would start again, I think this would be the first piece: really think about how ownership should be approached. What does ownership mean? It means that the team has a responsibility to host and serve the data assets to minimum acceptable standards, with minimum dependencies upstream and downstream. The interesting piece, looking backwards, is that under that definition, the process we had to go through was not actually transferring ownership from the central team to the distributed teams, but in most cases establishing ownership. I make this distinction because saying we had to "transfer" ownership would erroneously suggest that the data sets were owned before. The platform team, yes, they had the capability to make changes to the data pipelines, but the analytics teams were always the ones who had the business understanding and knew the use cases; no one was actually accountable in the way you would expect. So we had to go through this very lengthy process of establishing ownership. We did that, in the beginning, very naively: here's a document, here are all the data assets, who is the nearest neighbor who can take care of this, and then we moved it over. But the problem is that all these things carry technical debt: not properly documented, pretty unstable, built very inconsistently over the years, and the people who built them had already left the company. So it's not something you are happy to receive, and people build up a certain resistance, even if they have actually bought into the idea of domain ownership. So, if you ask me for the learnings: what needs to happen first is that the company really understands what its core business concepts are. You need the mapping: these are our core business concepts, these are the domain teams who own those concepts, and then actually link that to the data assets, combined with a good understanding of how to evolve the data assets and build new things in that domain, but also how to address the reduction of technical debt and stabilize what already exists.
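A toy version of the mapping Christoph recommends building first, core business concepts to owning domain teams to the concrete assets behind them, might look like the following. The concepts, team names and asset names are assumptions made up for illustration.

```python
# Toy mapping: business concept -> owning domain -> concrete data assets.
# Establishing ownership means every asset resolves to exactly one
# accountable domain before anything is handed over.
concept_map = {
    "customer": {
        "owning_domain": "crm",
        "assets": ["crm.customers", "crm.subscriptions"],
    },
    "order": {
        "owning_domain": "supply_chain",
        "assets": ["ops.orders", "ops.deliveries"],
    },
    "recipe": {
        "owning_domain": "product",
        "assets": ["catalog.recipes", "catalog.ingredients"],
    },
}

def owner_of(asset: str) -> str:
    # Resolve an asset to its accountable domain, or flag the gap loudly.
    for concept in concept_map.values():
        if asset in concept["assets"]:
            return concept["owning_domain"]
    raise LookupError(f"{asset} has no established owner yet")

print(owner_of("ops.orders"))  # -> supply_chain
```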
>> Thank you for that, Christoph. So I want to turn in a different direction here and talk about governance, and I know that's an area you're passionate about, Clemence. I pulled this slide from your deck, which I kind of messed up a little bit, sorry for that; but by the way, we're going to publish a link to the full video that you guys did, so we'll share that with folks. It's one of the most challenging aspects of data mesh: if you're going to decentralize, you quickly realize this could be the Wild West, as we talked about, all over again. So how are you approaching governance? There are a lot of items on this slide that underscore the complexity, whether it's privacy, compliance, etcetera. So how did you approach this?

>> Yeah, it's about connecting those dots, right? The aim of the data governance program is the autonomy of every team while still ensuring that everybody has the right interoperability. So when we want to move from the Wild West, riding horses, to a civilized way of transport, you can take the example of modern street traffic: all participants can maneuver independently, and as long as they follow the same rules and standards, everybody remains compatible, can understand and learn from each other, and we avoid car crashes. So when I go from country to country, I understand what the street infrastructure means, I know how to drive my car, and I can read the traffic lights and the different signals. Likewise, as a business, at HelloFresh we operate autonomously and consequently need to follow the external and internal rules and standards set forth by the jurisdictions in which we operate. In order to prevent a car crash, we need to at least ensure compliance with regulations, to account for society's and our customers' increasing concern with data protection and privacy. So teaching and advocating this understanding, making it real to everyone in the company, was a key communication strategy. And of course, I mentioned data privacy and external factors; the same goes for internal regulations and processes, to help our colleagues adapt to this very new environment. When I mentioned before the new way of thinking, the new way of dealing with and managing data, this of course implies that we need new processes and regulations for our colleagues as well. In a nutshell, data governance provides a framework for managing our people, processes, technology and culture around our data traffic, and those components must come together for the program to be effective, providing at least a common denominator. This is especially critical for shared data sets, which we have across our different geographies; for shared applications on shared infrastructure; and for what is consumed by centralized processes, for example master data and all the metrics and KPIs which are used for central steering. It's a big change, Dave, and our ultimate goal is to have this noninvasive, federated, automated and computational governance. And for that we can't just talk about it; we actually have to go deep, use case by use case, PoC by PoC, and generate learnings with the different teams. A classical approach is to identify the target state, match it with the current state together with the business teams in the different domains, and run a risk assessment, for example, to increase transparency, because a lot of teams might not even know what kind of situation they are in. And this is where the training and the data literacy piece come into place: we go in and train, based on the findings, based on the most valuable use cases, and help our teams make this change and increase their capabilities, with a little less hand-holding but a lot of guidance.

>> Can I add something quickly, Dave, if you allow me? There is a lot to the governance piece, but I think one thing is important. If we're talking about documentation, for example, yes, we can go from team to team and tell people, you have to document your data in the data catalog, or you have to establish data contracts and so on and so forth. But if we would like to build data products at scale, following actual governance, we need to think about automation. We need to think about a lot of things that we can learn from engineering, and that starts with simple things. If we would like to build up trust in our data products, and actually want to apply the same rigor and best practices that we know from engineering, there are things we can do, and we should think about what we can copy. One example is service level agreements, service level objectives and service level indicators. On an engineering level, if we're providing services, these represent the promises we make to our customers or consumers; the objectives are the internal targets that help us keep those promises, and the indicators are how we track how we are doing. And this is just one example.
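Translated into a minimal sketch, a freshness SLO for a data product could be tracked like this. The six-hour objective, the function names and the whole setup are assumptions for illustration, not HelloFresh's actual values.

```python
# Sketch of the SLO/SLI idea carried over from service engineering: the
# owning team promises a freshness objective for its data product and
# tracks an indicator against it. Threshold and names are assumptions.
from datetime import datetime, timedelta, timezone

FRESHNESS_SLO = timedelta(hours=6)  # promise: data is never older than 6h

def freshness_sli(last_successful_load: datetime) -> timedelta:
    # The indicator: how stale the product actually is right now.
    return datetime.now(timezone.utc) - last_successful_load

def slo_met(last_successful_load: datetime) -> bool:
    return freshness_sli(last_successful_load) <= FRESHNESS_SLO

last_load = datetime.now(timezone.utc) - timedelta(hours=2)
print(slo_met(last_load))  # True -> the promise to consumers is being kept
```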
This is where the federated computational governance comes into play. In an ideal world, we should not just talk about data as a product but also about data products as code: as much as possible, give the engineers the tools they are familiar with, and don't ask the product managers, for example, to document their data assets in the data catalog by hand, but make it part of the configuration. Have this in a CI/CD, a continuous delivery pipeline, as we typically see in other engineering tasks and services. So we say, okay, there is configuration; we can think about PII, we can think about data quality monitoring, we can think about the ingestion, the data catalog, and so on and so forth. Ideally, the data product becomes a certain template that can be deployed, and that is actually verified, or rejected, at build time, before we deploy it to production.
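Here is a hedged sketch of that "data products as code" idea: the product descriptor lives as configuration next to the pipeline, and a build-time check in CI rejects anything missing an owner, documentation or a PII declaration, so governance is enforced by the pipeline rather than by meetings. The required fields and rules are assumptions for illustration, not a real HelloFresh schema.

```python
# Build-time validation of a data-product descriptor: governance rules are
# checked in CI, so a non-compliant product never reaches production.
# The field names and rules below are illustrative assumptions.
REQUIRED_FIELDS = {
    "name", "owner", "documentation_url", "contains_pii", "slo_freshness_hours",
}

def validate_descriptor(descriptor: dict) -> list[str]:
    errors = [f"missing field: {f}" for f in REQUIRED_FIELDS - descriptor.keys()]
    if descriptor.get("contains_pii") and "retention_days" not in descriptor:
        errors.append("PII products must declare a retention policy")
    return errors

descriptor = {
    "name": "marketing.voucher_redemptions",
    "owner": "marketing-analytics",
    "documentation_url": "https://wiki.example.com/voucher-redemptions",
    "contains_pii": False,
    "slo_freshness_hours": 6,
}

errors = validate_descriptor(descriptor)
if errors:
    # In a CI pipeline this non-zero exit would fail the build.
    raise SystemExit("\n".join(errors))
print("descriptor valid -- safe to deploy")
```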
>> Yeah, so it's like DevOps for data products. I'm envisioning almost a three-phase approach to governance, and it sounds like you're in the early phases of it, call it phase zero, where there's learning, literacy, training, education, a kind of self-governance with some oversight and a lot of manual work; then you become process builders; and then you codify it and can automate it. Is that fair?

>> Yeah, although I would rather think about automation as early as possible along the way. Yes, there need to be certain rules, but then start, use case by use case: is there any small piece that we can already automate? If possible, roll that out, and then extend it step by step.

>> Is there a role, though, that adjudicates that? Is there a central chief data officer who is responsible for making sure people are complying, or how do you handle that?

>> From a platform perspective, yes, we have a centralized team to implement certain pieces that, let's say, are important and that we actually want to implement. However, that team works very closely with the governance department. So it's Clemence's piece to understand and define the policies that need to be implemented.

>> So, Clemence, essentially it's your responsibility to make sure that the policy is being followed, and then, as you were saying, Christoph, to compress the time to automation as fast as possible.

>> What needs to be really clear is that it's always a split effort, right? You can't just do one thing or the other; everything really goes hand in hand, because for the right automation, for the right engineering tooling, we need to have the transparency first. Policies need to be coded, so we need to operate on the same level, with the right understanding. So there are two things that are important: one is policies and guidelines; but equally important is aligning with the end users and the tech and engineering teams, really bridging between the business value, the business teams, and the engineering teams.

>> Got it. So, just a couple more questions, because we've got to wrap. I want to talk a little bit about the business outcome; I know it's hard to quantify, and I'll talk about that in a moment. But first, major learnings: we've got some of the challenges that you cited, I'll just put them up here; we don't have to go into detail, I just wanted to share them with folks. My question is the "advice for your peers" question: if you had to do it differently, if you had a do-over, or a mulligan, as we like to say for you golfers, what would you do differently?

>> I mean, we can start with the transformational challenge: understanding that this also carries a high load of cultural change. I think this is important; a particular communication strategy needs to be put into place, and people really need to be supported. It's not enough to go in and say, well, we have to change towards data mesh; it's human nature, you know, we're kind of resistant to change, right? Change is uncomfortable. So we need to take that away by training and by communicating. Christoph, do you want to add something to that?

>> Definitely. I think the point that I have also made before: we need to acknowledge that data mesh is an architecture of scale. It is something needed by large companies that want to build data products at scale. I mean, Dave, you mentioned it: there are a lot of advantages to having a centralized team, but at some point it may make sense to decentralize. And if you think about data mesh, you have to recognize that you're not building something on a greenfield; I think there's a big learning, which is also reflected here on the slide: don't underestimate your baggage. Typically you come to a point where the old model doesn't work anymore, and at HelloFresh we had lost trust in our data and saw certain risks slowing down our innovation, and this triggered the need to actually change something. This transition implies that you typically have a lot of technical debt accumulated over the years, and I think what we have learned is that we potentially decentralized some assets too early, not actually taking into account the maturity of the teams we distributed to; we are now in the phase of correcting pieces of that. So if you start from scratch, you have to understand, okay, are my teams actually ready to take on these new capabilities, and you have to make sure that, with this decentralization, you build up those capabilities in the teams, and, as Clemence has mentioned, that you take the people with you on the journey. It also comes with the knowledge gap that we need to think about: hiring, literacy, and the technical debt I just talked about. And the last piece I would add, which is not here on the slide deck: from our perspective, we started on the analytical layer, because that's where things were exploding, that's where people feel the pain. But across all the efforts we have started to modernize the current state towards data products, towards data mesh,
we've understood that it always comes down, basically, to a proper shape of our operational plane. We went through a lot of pain, and the learning is that there really needs to be a commitment from the company that this has to happen, and to act on it.

>> I think that last point you made is so critical, because I hear a lot from the vendor community about how they're going to make analytics better, and that's not unimportant; but through data product thinking, decentralized data organizations really have to operationalize in order to scale. These decisions around data architecture and organization are fundamental and lasting. It's not necessarily about an individual project or its ROI; there are going to be projects and sub-projects within this architecture, but the architectural decision itself is organizational, it's cultural: what's the best approach to support your business at scale? It really speaks to who you are as a company and how you operate, and getting that right, as we've seen in the success of data-driven companies, yields tremendous results. So I'll ask each of you to give us your final thoughts, and then we'll wrap. Christoph, maybe you quickly, please.

>> Yeah, maybe just jumping on this piece that you mentioned, the target architecture. When we talk about these pieces, people often have this picture in mind: there are different stages, we have sources, an ingestion layer, a historical layer, a transformation layer, a presentation layer, and then we basically put a lot of technology on top of that and call it our target architecture. However, what we really need to make sure of is that we have these different views: we need to understand what capabilities we actually need in our new world, how it looks and feels from the different personas' experience, and then, finally, that should lead to the target architecture from a technical perspective. Maybe just to give an outlook on what we're planning to do and how we want to move forward: based on our strategy, we would like to increase the data maturity as a whole across the entire company, and this is a framework around the business strategy that breaks down into four pillars as well. People, meaning the data culture, data literacy, the data organizational structure, and so on. Governance, as Clemence has mentioned: compliance, governance, data management, and so on. Technology, and I think we could talk for hours about that one: the data platform, the data science platform. And then, finally, enablement through data, meaning data quality, data accessibility, data science and data monetization.

>> Great, thank you, Christoph. Clemence, bring us home; give us your final thoughts.
>> I can only agree with Christoph that it's important to understand what kind of maturity people have, to understand where the company, the people, the organization stand, and to really understand what kind of change applies to those four pillars, for example, and what needs to be taken on first. And this is not very clear from the very beginning; of course it's kind of a greenfield: you come up with must-wins, with things that you really want to do, out of theory and out of different white papers. Only when you really start conducting the first initiatives do you understand, okay, where do we have to put the pieces together, and where did I miss out on one of those four different pillars, people, process, technology and governance, and their integration. Doing it step by step, in small steps, not boiling the ocean, is how you become capable and ready to identify the gaps, and to see where you can fill them, or where you have to increase maturity first and train people, or grow your tech stack.

>> You know, HelloFresh is an excellent example of a company that is innovating. It was not born in Silicon Valley, which I love; it's a global company. And I've got to ask you guys: it seems like this is an amazing place to work. Are you guys hiring?

>> Yes, definitely. We do.

>> And we are actually hiring as an entire company, specifically for data. There are a lot of open roles, seriously; please visit our page for data engineering and data product management, and Clemence has a lot of roles that he can speak about. But yes.

>> Guys, thanks so much for sharing your story with theCUBE audience. You're pioneers, and we look forward to collaborating in the future to track your progress. Really want to thank you for your time.

>> Thank you very much.

>> Thank you very much, Dave.

>> And thank you for watching theCUBE's startup showcase made possible by AWS. This is Dave Vellante. We'll see you next time.

Published Date : Sep 20 2021

SUMMARY :

and realized that in order to support its scale, it needed to rethink how it thought Thank you very much. You guys are number one in the world in your field, Clements has actually been a longer trajectory yet have a fresh. So recently we did lounge and expand Norway. ready to eat companies like factor in the U. S. And the planned acquisition of you foods in Australia. So maybe you guys could talk a little bit about your journey as a company specifically as So we grew very organically So that for the team becomes a bottleneck and so the lines of business, the marketing team salesman's okay, we're going to take things into our own Started really to build their own data solutions at some point you have to get the ball rolling But but on the flip side of that is when you think about a centralized organization say the data to the experts in these teams and this, as you have mentioned, right, that increases mental load look at that say, okay, hey, that's pretty good thinking and then now we have to apply it and that's And the idea was really moving away from um ever growing complex go ahead. we have a self service infrastructure and as you mentioned, the spreadsheet era but christoph maybe you can talk about that. So in the end, in the natural, as we have said, the lack of trust and that's and cultural challenges that you faced. The conversations on the cultural change. got a bit more difficult. there are times and changes, you have different different artifacts that you were created These rules are defined by calling the sports association and this is what you can think about So learning never stops the tele fish, but we are really trying this and this is what we see in surveys, for example, where our employees that your justification not the least of which is crypto so you've identified some of the process gaps uh So if I take the example of This this is similar to a new thinking, right? gears and talk about the notion of data product and, and we have a slide uh that we There's someone accountable for making sure that the product that we are providing is actually So it's not just a nice dream that we have right. So this is to me this is why I get so excited about data mesh because I really do the company needs to really understand what our core business concept that they have, they need to have this mapping from. to the full video that you guys did. in order to prevent a car crash, we need to at least ensure the promises we made to our customers or consumers, these are the internal objectives that help us to keep a three phase approach to governance and you kind of, it sounds like you're in early phases called phase zero where Is there anything that small piece that we can already automate? and defy the policies that needs to be implemented. that the policy is being followed. so we kind of need to operate on the same level with the right understanding. or a Mulligan as we like to say for you golfers, what would you do differently? So it's not that we go in and say So this transition implies that you typically have a lot of the company that needs to happen and to act. It really speaks to to to what you are, who you are as a company, how you operate and in the in the sense of we would like to increase that to maturity as a whole across the entire company and this is kind Once you bring us home give us your final thoughts. and see where either you can fill um the gaps are where you Uh and I gotta ask you guys, it seems like this is an amazing place to work you guys hiring? 
We do you can speak about. really want to thank you for your time. Thank you very much. thank you for watching the cubes startup showcase made possible by A W. S.

SENTIMENT ANALYSIS :

ENTITIES

EntityCategoryConfidence
DavePERSON

0.99+

2015DATE

0.99+

AustraliaLOCATION

0.99+

Dave VolontePERSON

0.99+

May 2019DATE

0.99+

2017DATE

0.99+

2019DATE

0.99+

threeQUANTITY

0.99+

Hello FreshORGANIZATION

0.99+

RussiaLOCATION

0.99+

DavidPERSON

0.99+

Silicon ValleyLOCATION

0.99+

100%QUANTITY

0.99+

julyDATE

0.99+

DenmarkLOCATION

0.99+

ClementsPERSON

0.99+

Jim McDaid GhaniPERSON

0.99+

U. S.LOCATION

0.99+

christophePERSON

0.99+

two years laterDATE

0.99+

last yearDATE

0.99+

first pieceQUANTITY

0.99+

one exampleQUANTITY

0.99+

ClementsORGANIZATION

0.99+

stevePERSON

0.99+

last weekDATE

0.99+

BeatlesORGANIZATION

0.99+

OneQUANTITY

0.99+

oneQUANTITY

0.99+

one toolQUANTITY

0.98+

two thingsQUANTITY

0.98+

NorwayLOCATION

0.98+

secondQUANTITY

0.98+

bothQUANTITY

0.98+

fourQUANTITY

0.98+

christophPERSON

0.98+

todayDATE

0.98+

first twoQUANTITY

0.98+

hundreds of millions of mealsQUANTITY

0.98+

one modelQUANTITY

0.98+

four colorsQUANTITY

0.97+

four pillarsQUANTITY

0.97+

firstQUANTITY

0.97+

first initiativesQUANTITY

0.97+

earlier this yearDATE

0.97+

JemaahPERSON

0.97+

eachQUANTITY

0.96+

handle freshORGANIZATION

0.96+

U. K.LOCATION

0.95+

DallasLOCATION

0.95+

christoph NevadaPERSON

0.95+

johnnyPERSON

0.95+

Wild WestLOCATION

0.94+

YoutubeORGANIZATION

0.94+

christophe clementPERSON

0.94+

four different pillarsQUANTITY

0.94+

about 40 peopleQUANTITY

0.93+

each yearQUANTITY

0.93+

A W. S.PERSON

0.92+

two different thingsQUANTITY

0.92+

Hello freshORGANIZATION

0.92+

millions of peopleQUANTITY

0.91+

Clemence W. Chee & Christoph Sawade, HelloFresh


 

(upbeat music) >> Hello everyone. We're here at theCUBE startup showcase made possible by AWS. Thanks so much for joining us today. You know, when Zhamak Dehghani was formulating her ideas around data mesh, she wasn't the only one thinking about decentralized data architectures. HelloFresh was going into hyper-growth mode and realized that in order to support its scale, it needed to rethink how it thought about data. Like many companies that started in the early part of the last decade, HelloFresh relied on a monolithic data architecture and the internal team it had concerns about its ability to support continued innovation at high velocity. The company's data team began to think about the future and work backwards from a target architecture, which possessed many principles of so-called data mesh, even though they didn't use that term specifically. The company is a strong example of an early but practical pioneer of data mesh. Now, there are many practitioners and stakeholders involved in evolving the company's data architecture many of whom are listed here on this slide. Two are highlighted in red and joining us today. We're really excited to welcome you to theCUBE, Clemence Chee, who is the global senior director for data at HelloFresh, and Christoph Sawade, who's the global senior director of data also of course at HelloFresh. Folks, welcome. Thanks so much for making some time today and sharing your story. >> Thank you very much. >> Thanks, Dave. >> All right, let's start with HelloFresh. You guys are number one in the world in your field. You deliver hundreds of millions of meals each year to many, many millions of people around the globe. You're scaling. Christoph, tell us a little bit more about your company and its vision. >> Yeah. Should I start or Clemence? Maybe take over the first piece because Clemence has actually been longer a director at HelloFresh. >> Yeah go ahead Clemence. >> I mean, yes, about approximately six years ago I joined and HelloFresh, and I didn't think about the startup I was joining would eventually IPO. And just two years later, HelloFresh went public. And approximately three years and 10 months after HelloFresh was listed on the German stock exchange which was just last week, HelloFresh was included in the DAX Germany's leading stock market index and that, to mind a great, great milestone, and I'm really looking forward and I'm very excited for the future for HelloFresh and also our data. The vision that we have is to become the world's leading food solution group. And there are a lot of attractive opportunities. So recently we did launch and expand in Norway. This was in July. And earlier this year, we launched the US brand, Green Chef, in the UK as well. We're committed to launch continuously different geographies in the next coming years and have a strong path ahead of us. With the acquisition of ready to eat companies like factor in the US and the plant acquisition of Youfoodz in Australia, we are diversifying our offer, now reaching even more and more untapped customer segments and increase our total address for the market. So by offering customers and growing range of different alternatives to shop food and to consume meals, we are charging towards this vision and this goal to become the world's leading integrated food solutions group. >> Love it. You guys are on a rocket ship. You're really transforming the industry. And as you expand your TAM, it brings us to sort of the data as a core part of that strategy. 
So maybe you guys could talk a little bit about your journey as a company, specifically as it relates to your data journey. I mean, you began as a startup, you had a basic architecture, and like everyone, you made extensive use of spreadsheets. You built a Hadoop-based system that started to grow, and when the company IPO'd, you really started to explode. So maybe describe that journey from a data perspective. >> Yes, Dave. So by approximately 2015, HelloFresh had evolved what amounted to a classical centralized data management setup. We grew very organically over the years, and there were a lot of very smart people around the globe really building the company and building our infrastructure. This also means that there were a small number of internal and external data sources, and a centralized BI team with a number of people producing different reports, dashboards and products for our executives, for example, or for different operations teams, to see the company's performance, and knowledge was transferred just by talking to each other in face-to-face conversations. The people in the data warehouse team were considered the data wizards, or the ETL wizards, and it was these wizards who effectively held the knowledge of data management, right? Very classical challenges. So our central data warehouse team was then responsible for different types of verticals in different domains, different geographies, and all this setup gave us, in the beginning, the flexibility to grow fast as a company. >> Christoph, anything to add to that? >> Yes, not explicitly to that one, but as Clemence said, right, this was kind of the setup that actually worked for us for quite a while. And then in 2017, when HelloFresh went public, the company also grew rapidly. Just to give you an idea of how that looked: the tech department actually increased from about 40 people to almost 300 engineers, and in the same way the business units, as Clemence has described, also grew sustainably. So we continued to launch HelloFresh in new countries, launched new brands like EveryPlate, and also acquired other brands like Factor. From a data perspective, that meant the number of data requests that the central team was getting became more and more, and also more and more complex. For the team, that meant a fairly high mental load. They had to get a very deep understanding of the business, and they also suffered a lot from context switching back and forth. Essentially, they had to prioritize requests from our physical product, from our digital product, from the marketing perspective, and also from the central reporting teams. In a nutshell, this was very hard for these people, and it led to situations where the solutions that were built were not really optimal. So, in a nutshell, the central function became a bottleneck and slowed down all the innovation of the company. >> It's a classic case, isn't it? I mean, Clemence, you see the central team becomes a bottleneck, and so the lines of business, the marketing team, sales teams say, "Okay, we're going to take things into our own hands." And then of course IT and the technical team is called in later to clean up the mess. Maybe I'm overstating it, but that's a common situation, isn't it? 
>> Yeah, this is exactly what happened, right? So we had a bottleneck, we had those central teams, and there was always a bit of tension. Analytics teams in those business domains like marketing, supply chain, finance, HR, and so on then started to really build their own data solutions. At some point you have to get the ball rolling, right, and then continue the trajectory, which meant that the data pipelines didn't meet the engineering standards, and there was an increased need for maintenance and support from central teams. Hence, over time, the knowledge about those pipelines and how to maintain a particular infrastructure, for example, left the company, such that most of those data assets and datasets turned into a huge debt, with decreasing data quality, decreasing trust, and decreasing transparency. And this was an increasing challenge, where the majority of time was spent in meeting rooms to align on data quality, for example. >> Yeah, and the point you were making, Christoph, about context switching, this is a point that Zhamak makes quite often. We've contextualized our operational systems, like our sales systems, our marketing systems, but not our data systems. So you're asking the data team: okay, be an expert in sales, be an expert in marketing, be an expert in logistics, be an expert in supply chain, and it's start, stop, start, stop. It's a paper-cut environment, and it's just not as productive. But the flip side of that is, when you think about a centralized organization, you think, hey, this is going to be a very efficient way, a cross-functional team to support the organization, but it's not necessarily the highest-velocity, most effective organizational structure. >> Yeah, so I agree with that piece up to a certain scale. A centralized function has a lot of advantages, right? It's one team everyone can go to, a dedicated kind of expert team. However, at some point you actually would like to accelerate, specifically at our type of growth, and you want to have autonomy in certain teams and move the teams, or let's say the data, to the experts in these teams. And this, as you have mentioned, right, increases mental load. You can either internally start splitting your team into different kinds of sub-teams focusing on different areas; however, that is then again just adding another piece where collaboration needs to happen, because the external demand stays. So why not bridge that gap immediately and actually move these teams end to end into the functions themselves? So maybe just to continue what Clemence was saying, this is actually where Clemence's and my journey started to become one joint journey. Clemence was coming from one of these teams who built their own solutions, and I was basically heading the platform team, called the data warehouse team in those days. And in 2019, when these pains became more and more serious, I would say, and more and more people had recognized that this model does not really scale, the leadership of the company came together and identified data as a key strategic asset. And what we mean by that is that, if we leverage it in an appropriate way, it gives us a unique competitive advantage which could help us to support and actually fully automate our decision-making process across the entire value chain. 
So what we're trying to do now, or what we are aiming for, is that HelloFresh is able to build data products that have a purpose. We're moving away from the idea that data is just a by-product. We have a purpose for why we would like to collect this data; there's a clear business need behind it. And because it's so important for the company as a business, we also want to provide it as a trustworthy asset to the rest of the organization, ideally with the best customer experience, but at least in a way that users can easily discover, understand and securely access high-quality data. >> Yeah. So, Clemence, when you see Zhamak's writing, you see she has the four pillars and the principles. As practitioners, you look at that and say, okay, hey, that's pretty good thinking, and now we have to apply it. And that's where the devil meets the details. So it's the four: decentralized data ownership; data as a product, which we'll talk about a little bit; self-serve, which you guys have spent a lot of time on; and, Clemence, your wheelhouse, which is governance, and a federated governance model. And it's almost like, if you achieve the first two, then you have to solve for the second two; it almost creates new challenges. But maybe you could talk about that a little bit as to how it relates to HelloFresh. >> Yes. So Chris mentioned that we identified kind of a challenge beforehand and asked: how can we actually decentralize and empower the different colleagues of ours? And we realized that it was more an organizational or a cultural change. This is something that others have also noted; I think ThoughtWorks mentioned in one of the white papers that it's more of an organizational or a cultural impact. So we kicked off a phased reorganization, different phases which we're currently still in the middle of, trying to unlock this data at scale. The idea was really moving away from ever-growing, complex matrix organizations or matrix setups, and splitting between two different things. One is value creation: basically, when people ask the question, what can we actually do, what should we do? This is value creation. And the how, which is capability building. And both are equal in authority. This actually creates a high urge for collaboration, and this collaboration breaks up the different silos that were built. And of course, this also includes different staffing needs for teams: staffing with more, let's say, data scientists or data engineers, data professionals, into those business domains, and hence some more capability building. >> Okay, go ahead. Sorry. >> So back to Zhamak Dehghani. The idea crossed over when she published her papers in May 2019, and we thought, well, the four pillars that she described, around decentralized data ownership, a data-as-a-product mindset, self-serve data infrastructure, and, as you mentioned, federated computational governance, suited very much our thinking at that point in time to reorganize the different teams. And this then led to not only an organizational restructuring, but also a completely new approach of how we need to manage data. >> Got it. Okay. So your business is exploding. 
The data team was having to become domain experts in many areas, constantly context switching, as we said, and people started to take things into their own hands. So again, we said, a classic story, but you didn't let it get out of control, and that's important. And so we actually have a picture of kind of where you're going today, and it's evolved into this. Pat, if you could bring up the picture with the elephant; here we go. So I'll talk a little bit about the architecture. It doesn't show the spreadsheet era here, but Christoph, maybe you could talk about that. It does show the Hadoop monolith, which exists today; I think that's in a managed hosting service, but you preserved that piece of it. But if I understand it correctly, everything is evolving to the cloud; I think you're running a lot of this, or all of it, in AWS. Everybody's got their own data sources. You've got a data hub, which I think is enabled by a master catalog for discovery, and all this underlying technical infrastructure that is really not the focus of this conversation today. But the key here, if I understand correctly, is that these domains are autonomous, and that this required not only technical thinking but a really supportive organizational mindset, which we're going to talk about today. Christoph, maybe you could address, you know, at a high level, some of the architectural evolution that you guys went through. >> Yeah, sure. Maybe it's also a good summary of the entire history. So as you have mentioned, right, in the very beginning we started with a monolith on the operational plane. Actually, it wasn't just one monolith, it was two: one for the backend and one for the frontend. And our analytical plane was essentially a couple of spreadsheets. And I think there's nothing wrong with spreadsheets: they allow you to store information, to transform data, to share this information, to visualize this data. But it's all in one tool, which is not actually separating concerns, right? And this means that it's obviously not scalable; you reach the point where this kind of data management in one tool reaches its limits. So what we started is, we created our data lake, as we have seen here, on Hadoop, and in the very beginning it actually reflected very much our operational plane. On top of that, we used Impala as a data warehouse, but there was not really a distinction between what is our data warehouse and what is our data lake, as Impala was used as kind of the engine for both the warehouse and the data lake construct itself. And this organic growth actually led to a situation, as I think is clear now, where we had the centralized model, and for all the domains there were really loose Kimball modeling standards and no uniformity. We actually built in-house a way of building materialized views that we used for the presentation layer. There was a lot of duplication of effort, and in the end there were essentially no measurements and feedback loops which would have helped us to improve what we had built, and, as you said, in the end, a lack of trust. And this basically was a starting point for us to understand, okay, how can we move away from this? And there are a lot of different things that we can discuss; apart from the organizational structure that we have set up here, we have the four pillars from Zhamak. 
However, there's also the next question around: how do we implement data products, right? What are the implications on that level? And I think that's something that we are currently still working through. >> Got it. Okay. So I wonder if we could switch gears a little bit and talk about the organizational and cultural challenges that you faced. What were those conversations like? Let's dig into that a little bit; I want to get into governance as well. >> The conversations on the cultural change. I mean, yes, we went through hyper-growth over the last years, and obviously there were a lot of new joiners, a lot of different, very, very smart people joining the company, which then meant that collaboration got a bit more difficult. Of course, the time zones change, and you have different artifacts and duplicated documentation flying around. So we had to build the company from scratch, right? Of course, this then always resulted in this tension which I described before. But the most important part here is that data has always been a very important factor at HelloFresh, and we collected more of this data and continued to use data to improve the different key areas of our business. Even with organizational struggles, like the central-team struggles, data somehow always helped us to grow through this kind of change, right? In the end, those decentralized teams in our local geographies started with solutions that served the business, which was very, very important; otherwise, we wouldn't be at the place where we are today. But they did violate best practices and standards. And I always use the sports analogy, Dave. Like in any sport, there are different rules and regulations that need to be followed. These rules are defined by, I'll call it, the sports association, and this is what you can think of as our data governance and our compliance team. Now we add the players, who need to follow those rules and abide by them; this is what we then call data management. And the players, the professionals, also need to be trained and understand the strategy and the rules before they can play; this is what I then call data literacy. So we realized that we need to focus on helping our teams to develop those capabilities and teach the standards for how work is being done, to truly drive functional excellence in the different domains. And one ambition of our data literacy program, for example, is to really empower every employee at HelloFresh, everyone, to make the right data-informed decisions by providing data education that scales. And that can be different things, like including data capabilities in the learning path, for example, right? So help them to create and deploy data products, connect data producers and data consumers, and create a common sense and more understanding of each other's dependencies, which is important. For example, with SLAs, SLOs, data contracts, et cetera, people get more of a sense of ownership and responsibility. Of course, we have to define what that means: what does ownership mean, what does responsibility mean? But we are teaching this to our colleagues via individual learning paths and helping them upskill to use our shared infrastructure and those self-service data applications. And, to summarize, we are still in this process of learning; we're still learning as well. 
So learning never stops at HelloFresh, but we are really trying to make it as much fun as possible. And in the end, we all know user behavior is changed through positive experience. So instead of having massive training programs, or endless courses and workshops leaving our new joiners and colleagues confused and overwhelmed, we're applying gamification, right? We split it into different levels of certification that our colleagues can access. They can earn badges along the way, which simplifies the process of learning and the engagement of the users. And this is what we see in surveys, for example, where our employees value this gamification approach a lot and are even competing to collect those learning badges, to become number one on the leaderboard. >> I love the gamification. I mean, we've seen it work so well in so many different industries, not the least of which is crypto. So you've identified some of the process gaps that you saw. Sometimes people just gloss over them, pave the cow path, as I say. You didn't try to force, in other words, a new architecture into the legacy processes; you really had to rethink your approach to data management. So what did that entail? >> To rethink the way of data management, 100%. So if I take the example of the industrial revolution, or a classical supply chain revolution: just imagine that you have been riding a horse your whole life, and suddenly you can operate a car, you suddenly receive a completely new way of transporting assets from A to B. So we needed to establish a new set of cross-functional business processes to run faster, drive faster, more robustly, and deliver data products which can be trusted and used by downstream processes and systems. Hence we had a set of new standards and new procedures that fall into internal data governance and compliance. With internal, I'm always referring to the data operations around new things like the data catalog: how to identify ownership, how to change ownership, how to certify data assets, everything around classical software development, which we now apply to data. This is some old and new thinking, right? Deployment, versioning, QA, all the different things, ingestion policies, deletion procedures. All the things that software development has been doing, we now do with data as well. In simple terms, it's a whole redesign of the supply chain of our data, with new procedures and new processes in asset creation, asset management and asset consumption. >> So data's become kind of the new development kit, if you will. I want to shift gears and talk about the notion of a data product, and we have a slide that we pulled from your deck. I'd like to unpack it a little bit; I'll just read it: "A data product is a product whose primary objective is to leverage on data to solve customer problems, where customers are both internal and external." So pretty straightforward. I know you've gone much deeper in your thinking and into your organization, but how do you think about that, and how do you determine, for instance, who owns what? How did you get everybody to agree? >> I can take that one. Maybe let me start with the data product. So I think that's an ongoing debate, right? And I think the debate itself is the important piece here: in that debate you clarify what you actually mean by a product and what the mindset actually is. 
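Before the definition discussion continues, Chee's point above about applying software-development practice to data (deployment, versioning, QA, deletion procedures) can be made concrete. Below is a minimal sketch of a quality gate that could run in a deployment pipeline for a data asset; it is an illustration only, not HelloFresh's actual tooling, and the table name, columns and thresholds are all hypothetical.

    # A minimal sketch of a QA gate for a data asset, run in a deployment
    # pipeline the same way unit tests gate a software release.
    # All names and thresholds are hypothetical, not HelloFresh's tooling.
    import pandas as pd

    def check_orders_asset(df: pd.DataFrame) -> list[str]:
        """Return a list of violations; an empty list means the asset may ship."""
        violations = []
        if df["order_id"].duplicated().any():
            violations.append("order_id must be unique")
        if df["order_total"].lt(0).any():
            violations.append("order_total must be non-negative")
        null_rate = df["customer_id"].isna().mean()
        if null_rate > 0.01:  # tolerate at most 1% missing customer ids
            violations.append(f"customer_id null rate {null_rate:.2%} exceeds 1%")
        return violations

    if __name__ == "__main__":
        df = pd.read_parquet("orders_snapshot.parquet")  # hypothetical input
        problems = check_orders_asset(df)
        if problems:
            raise SystemExit("Blocking deployment:\n" + "\n".join(problems))

The design point is the one Chee makes: the asset is promoted or blocked by an automated check, exactly as code would be, rather than by a conversation in a meeting room.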
So I think, just from a definition perspective, we found the common denominator: a data product is something which is important for the company and that comes with value. What do we mean by that? It's a solution to a customer problem that delivers, ideally, maximum value to the business, and yes, it leverages the power of data. And we have a couple of examples at HelloFresh: the historical and classical ones around dashboards, for example, to monitor our error rates, but also more sophisticated ones, for example incorporating machine learning algorithms in our recipe recommendations. However, I think the important aspects of a data product are: A, there is an owner, right? There's someone accountable for making sure that the product you're providing is actually served and maintained, and someone who's making sure that the product actually keeps the value of what we are promising. Combined with that is the idea of proper documentation, like a product description, right, so that people understand how to use it and what it is about. And related to that piece is the idea that there's a purpose, right? We need to ask ourselves: okay, why does this thing exist? Does it provide the value that we think it does? That then leads into a good understanding of the life cycle of the data product. What do we mean? From the beginning, from the creation, you need to have a good understanding, you need to collect feedback, you need to learn from it, you need to rework it, and finally also think about, okay, when is it time to decommission that piece? So overall, I think the core of this data product is product thinking 101, right? The starting point needs to be the problem and not the solution. And this is essentially what was missing, what brought us to this kind of data spaghetti that we had built: in a rush, essentially, certain data assets were developed in isolation, continuously patching the solution just to fulfill the ad hoc requests that we got, without really understanding what the stakeholder needs. The result is duplication of effort, and this is not just frustrating and probably not the most efficient way for the company to work; if I build the same data assets with slightly different assumptions across the company and multiple teams, that leads to data inconsistency. And imagine the following scenario: from a management perspective, you're asking a specific question, and you get from a couple of different teams different kinds of graphs, different kinds of data and numbers, and in the end you do not know which ones to trust. You do not know: is what I'm observing actually noise, or is there actually a signal that I'm looking for? And the same if I'm running an A/B test, right? I have a new feature, and I would like to understand the business impact of this feature. I run it against a specific source, and, in an unfortunate scenario, your production system is actually running on a different source. You see different numbers; what you have seen in the A/B test is actually not what you see then in production. Typical thing. 
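The attributes Sawade lists (an accountable owner, a product description, a purpose, a managed life cycle) can be captured in a machine-readable descriptor that travels with the product. A minimal sketch follows; every field name and value is hypothetical and not drawn from HelloFresh's platform.

    # A minimal, hypothetical descriptor for a data product; the fields mirror
    # the attributes discussed above: owner, documentation, purpose, life cycle.
    from dataclasses import dataclass, field
    from enum import Enum

    class Lifecycle(Enum):
        CREATED = "created"
        ACTIVE = "active"
        DEPRECATED = "deprecated"
        DECOMMISSIONED = "decommissioned"

    @dataclass
    class DataProduct:
        name: str             # e.g. "recipe-recommendations"
        owner: str            # team accountable for serving and maintaining it
        purpose: str          # why the product exists; the problem it solves
        description_url: str  # product documentation for consumers
        lifecycle: Lifecycle = Lifecycle.CREATED
        consumers: list[str] = field(default_factory=list)

        def decommission(self) -> None:
            """End of the life cycle: only allowed once consumers have migrated."""
            if self.consumers:
                raise RuntimeError(f"{self.name} still has consumers: {self.consumers}")
            self.lifecycle = Lifecycle.DECOMMISSIONED

    # Usage: registering ownership makes accountability explicit.
    product = DataProduct(
        name="recipe-recommendations",
        owner="domain-team-recommendations",
        purpose="Serve ranked recipe suggestions to the storefront",
        description_url="https://example.internal/catalog/recipe-recommendations",
    )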
Then you ask some analytics team to do a deep dive, to understand where the discrepancies are coming from, and, worst-case scenario, again, they use a different kind of source. So in the end, it's a pretty frustrating scenario, and it's actually a waste of the time of the people that have to identify the root cause of this type of divergence. So, in a nutshell, the highest degree of consistency is actually achieved if people are just reusing data assets. And, coming back to the meetup talk we've given, right, we started trying to establish this approach with A/B testing. So we have a team that is kind of owning their target metric, associated with the business teams, and they're providing that as a product, also to other services, including the A/B testing team. The A/B testing team can use this information through an interface and say, okay, I'm drawing information from the metadata of an experiment, and in the end, after the assignment, after this data collection phase, they can easily add a graph to a dashboard, just grouped by the A/B testing variant. And we have seen that also in other companies, so it's not just a nice dream that we have, right? I have actually looked at other companies, for example in search, that established a complete KPI pipeline that was computing all this information, and this information was both hosted by the owning team and used for A/B testing, deep dives and regular reporting. So, just one last second on the important piece: why I'm coming back to that is that it requires that we are treating this data as a product, right? If we want to have multiple people using the thing that I am owning and building, we have to provide it as a trustworthy asset, and in a way that it's easy for people to discover and to actually work with. >> Yeah, and coming back to that, this is, to me, why I get so excited about data mesh, because I really do think it's the right direction for organizations. When people hear data product, they think, "Well, what does that mean?" But then when you start to define it as you did, it's using data to add value. That could be cutting costs, that could be generating revenue, it could be actually directly creating a product that you monetize. So it's sort of in the eyes of the beholder. But I think the other point, and you made it earlier on too, is, again, context. So when you have a centralized data team and you have all these P&L managers, a lot of times they'll question the data 'cause they don't own it. They're like, "Well, wait a minute." If it doesn't agree with their agenda, they'll attack the data. But if they own the data, then they're responsible for defending it. And that is a mindset change that's really important. And I'm curious how you got to that ownership. Was it top-down, or was somebody providing leadership? Was it more organic, bottom-up? Was it a sort of a combination? How did you decide who owned what? In other words, how did you get the business to take ownership of the data, and what does owning the data actually mean? >> That's a very good question, Dave. I think that is one of the pieces where we had a lot of learnings, and basically, if you ask me how we could have avoided some of the failings, I think that would be the first piece where we would need to start: really think about how that should be approached. If a team has ownership, right? 
That means, somehow, that the team has the responsibility to host the data assets themselves, to minimum acceptable standards, with minimal dependencies up- and downstream. The interesting piece, looking backwards, is that under that definition, this process that we had to go through was not actually transferring ownership from a central team to the other teams, but in most cases establishing ownership. I make this distinction because saying we have to transfer ownership would erroneously suggest that the dataset was owned before. The platform team, yes, they had the capability to make changes, but it was actually the analytics teams and the business who understood the use cases, and no one had actually bought into what was expected. So we had to go through this very lengthy process of establishing ownership. How we did that, in the beginning, very naively: here's a document, here are all the data assets, who is probably the nearest neighbor who can take care of this one, and then we moved it over. But the problem here is that all of these things were kind of technical debt, right? Not really properly documented, pretty unstable, built in a very inconsistent way over years, and the people that built these things had already left the company. So this is actually not a nice thing to inherit, and people built up a certain resistance, even if they had actually bought into the idea of domain ownership. So if you ask me about these learnings: what needs to happen is, first, the company needs to really understand what our core business concepts are; then we need a mapping from those core business concepts to the domain teams who own each concept, and then actually link that to the assets and integrate it better. That supports understanding how we can evolve the data assets and build new things in each domain, but also how we can address the reduction of technical debt and stabilize what we already have. >> Thank you for that, Christoph. So I want to turn in a different direction here and talk to Clemence about governance, and I know that's an area you're passionate about. I pulled this slide from your deck, which I kind of messed up a little bit, sorry for that, but, by the way, we're going to publish a link to the full video that you guys did, so we'll share that with folks. It's one of the most challenging aspects of data mesh: if you're going to decentralize, you quickly realize this could be the wild west, as we talked about, all over again. So how are you approaching governance? There are a lot of items on this slide that underscore the complexity, whether it's privacy, compliance, et cetera. So how did you approach this? >> Yeah, it's about connecting those dots, right? The aim of the data governance program is to promote the autonomy of every team while still ensuring that everybody has the right interoperability. So when we want to move from the wild west, riding horses, to a civilized way of transport, I can take the example of modern street traffic: all participants can maneuver independently, and as long as they follow the same rules and standards, everybody remains compatible with each other, and we can understand and learn from each other and avoid car crashes. 
So when I go from country to country, I understand what the street infrastructure means and how to drive my car, and I can also read the traffic lights and the different signals. Likewise, as a business, at HelloFresh we operate autonomously and consequently need to follow the external and internal rules and standards set forth by the jurisdictions in which we operate. So in order to prevent a car crash, we need to at least ensure compliance with regulations, to account for society's and our customers' increasing concern with data protection and privacy. So teaching, advocating and evangelizing this to everyone in the company was a key communication strategy. And of course, I mentioned data privacy and external factors; the same goes for internal regulations and processes, to help our colleagues adapt to this very new environment. When I mentioned before the new way of thinking, the new way of dealing with and managing data, this of course implies that we need new processes and regulations for our colleagues as well. In a nutshell, this means that data governance provides a framework for managing our people, the processes, the technology and the culture around our data traffic. And that governance must come together in order to have an effective program. Providing at least a common denominator is especially critical for shared datasets, which we have across our different geographies, and for shared applications on shared infrastructure, as they are then consumed by centralized processes, for example master data, and all the metrics and KPIs which are also used for central steering. It's a big change, right? And our ultimate goal is to have this non-invasive, federated, automated and computational governance. And for that, we can't just talk about it; we actually have to go deep, use case by use case, PoC by PoC, and generate learnings with the different teams. This would be a classical approach: identifying the target state, matching it with the current state together with the business teams and the different domains, and doing a risk assessment, for example, to increase transparency, because a lot of teams might not even know what kind of situation they are in. And this is where the training and this piece of data literacy come into place, where we go in and train based on the findings, based on the most valuable use cases, and, based on that, help our teams to make this change, to increase their capability. It's a little bit more than, I wouldn't say hand-holding, but a lot of guidance. >> Can I quickly chime in on the governance piece, because I think that is important? If we're talking about documentation, for example, yes, we can go from team to team and tell these people, hey, you have to document your data assets in the data catalog, or you have to establish a data contract, and so on and so forth. But if we would like to build data products at scale, following actual governance, we need to think about automation, right? We need to think about a lot of things that we can learn from engineering, and it starts with simple things. If we would like to build up trust in our data products, we should actually apply the same rigor and the best practices that we know from engineering. There are things that we can do, and we should probably think about what we can copy. 
And one example might be service level agreements, service level objectives and service level indicators, right, which represent, on an engineering level, for the services we are providing, the promises we make to our customers and consumers. These are the internal objectives that help us keep those promises, and the audits of how we are tracking ourselves, how we are doing. And this is just one example of where, I think, federated governance comes into play. In an ideal world, you should not just talk about data as a product, but also data product as code. That is to say: as much as possible, right, give the engineers the tools that they are familiar with, and actually don't ask the product managers, for example, to document the data assets in the data catalog, but make it part of the configuration, part of a CI/CD continuous delivery pipeline, as we typically see for other engineering tasks and services. Say, okay, there is configuration: we can think about PII, we can think about data quality monitoring, we can think about ingestion, the data catalog, and so on and so forth. Ideally, a data product's goals become a sort of template that can be deployed and is actually rejected or verified at build time, before we actually make it and deploy it to production. >> Yeah, so it's like DevOps for data products. So I'm envisioning almost a three-phase approach to governance, and it sounds like you're in the early phase of it, call it phase zero, where there's learning, there's literacy, there's training, education, there's kind of self-governance. Then there's some kind of oversight, a lot of manual stuff going on, and you try to be process builders at that phase, and then you codify it, and then you can automate it. Is that fair? >> Yeah, I would rather think about automation as early as possible, in a way. Yes, there need to be certain rules, but then actually start, use case by use case: is there any small piece that we can already automate? If possible, roll that out, and take the next step, step by step. >> Is there a role, though, that adjudicates that? Is there a central, you know, chief data officer who's responsible for making sure people are complying, or how do you handle it? >> I mean, from a platform perspective, yes, this applies to implementing certain pieces that we are saying are important and would actually like to implement; however, that works very closely with the governance department, so it's Clemence's piece to understand and define the policies that need to be implemented. >> So good. So, Clemence, essentially it's your responsibility to make sure that the policy is being followed, and then, as you were saying, Christoph, you want to compress the time to automation as fast as possible. Is that... >> Yeah, what needs to be really clear is that it's always a split effort, right? You can't just do one or the other thing; it really goes hand in hand, because for the right information, for the right engineering tooling, we need to have the transparency first. I mean, code needs to be coded, so we kind of need to operate on the same level, with the right understanding. 
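What "rejected or verified at build time" might look like in practice is a CI step that validates a data product's configuration before anything is deployed. The sketch below is an assumption-laden illustration: the required keys, the PII masking rule and the freshness SLO are invented for the example, not a description of HelloFresh's actual pipeline.

    # Hypothetical build-time check: reject a data product whose configuration
    # is missing the governance metadata discussed above (PII tags, quality
    # monitoring, a catalog entry). Runs as one step in a CI/CD pipeline.
    REQUIRED_KEYS = {"owner", "catalog_entry", "pii_fields", "quality_checks"}

    def validate_product_config(config: dict) -> list[str]:
        errors = [f"missing key: {k}" for k in REQUIRED_KEYS - config.keys()]
        # Every column flagged as PII must declare a masking strategy.
        for column in config.get("pii_fields", []):
            if column not in config.get("masking", {}):
                errors.append(f"PII field '{column}' has no masking strategy")
        # At least one freshness SLO, so consumers know what is promised.
        if not any(c.get("type") == "freshness" for c in config.get("quality_checks", [])):
            errors.append("no freshness SLO declared")
        return errors

    config = {
        "owner": "domain-team-orders",
        "catalog_entry": "catalog://orders/daily",
        "pii_fields": ["customer_email"],
        "masking": {"customer_email": "sha256"},
        "quality_checks": [{"type": "freshness", "max_lag_hours": 24}],
    }
    assert validate_product_config(config) == [], "build fails on any violation"

The design choice this illustrates is Sawade's: governance rules are evaluated automatically where engineers already work, in the build, rather than audited manually afterwards.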
So there are actually two things that are important. One is policies and guidelines; but more importantly, or equally importantly, we need to align with the end users and the tech and engineering teams, and really bridge between the business teams that own the business value and the engineering teams. >> Got it. So, just a couple more questions, because we've got to wrap up. I want to talk a little bit about the business outcome; I know it's hard to quantify, and I'll talk about that in a moment. But major learnings: we've got some of the challenges that you cited, I'll just put them up here. We don't have to go into detail on this, but I just wanted to share them with folks. My question, I mean, this is the advice-for-your-peers question: if you had to do it differently, if you had a do-over or a mulligan, as we like to say for you golfers, what would you do differently? >> I mean, can I start with the transformational challenge: understanding that it comes with a high load of cultural change. I think this is important: a particular communication strategy needs to be put in place, and people really need to be supported, right? It's not that we go in and say, well, we have to change towards data mesh; naturally, it's human nature to be kind of resistant to change, right, and to feel uncomfortable. So we need to take that away by training and by communicating. Chris, you might want to add something to that. >> Definitely. I think the point I've also made before, right: we need to acknowledge that data mesh is an architecture for scale. It's something that's necessary for huge companies that build products at scale. I mean, Dave, you mentioned it, right: there are a lot of advantages to a centralized team, but at some point it may make sense to actually decentralize. And at this point, right, if you think about data mesh, you have to recognize that you're not building something on a green field. And I think there's a big learning, which is also reflected on the slide: don't underestimate your baggage. Typically, you come to a point where the old model doesn't work anymore, and at HelloFresh, right, we lost the trust in our data, and we actually saw certain risks of slowing down our innovation. This was triggering the need to actually change something, and this transition implies that you have a lot of technical debt accumulated over the years. And I think what we have learned is that we potentially decentralized some assets too early, not actually taking into account the maturity of the teams we were giving them to, and we are now actually in the phase of correcting pieces of that, right? But if you start from scratch, you have to understand: okay, are all my teams actually ready for taking on this new capability? And you have to make sure that, with this decentralization, you build up these capabilities in the teams and, as Clemence has mentioned, make sure that you take the people on your journey. I think these are the pieces. Also, here it comes with this knowledge gap, right, that we need to think about: hiring, literacy, the technical debt I just talked about. 
And I think the last piece that I would add, which is not here on the slide deck, is that, from our perspective, we started on the analytical layer, because that was where things were exploding, right? That is where people feel the pain. But a lot of the efforts that we have started, to actually modernize the current stack and move data products towards data mesh, have made us understand that it always comes down, basically, to a proper shape of our operational plane. And I think what needs to happen is, we got through a lot of pains, but the learning here is that this really needs to be a commitment from the company; it needs to be end to end. >> I think that point, that last point you made, is so critical, because I hear a lot from the vendor community about how they're going to make analytics better, and that's not unimportant. But true data product thinking and decentralized data organizations really have to operationalize in order to scale. So these decisions around data architecture and organization, they're fundamental and lasting; it's not necessarily about an individual project ROI. There are going to be projects, sub-projects, you know, within this architecture, but the architectural decision itself is organizational, it's cultural, and it's about the best approach to support your business at scale. It really speaks to who you are as a company, how you operate, and getting that right, as we've seen in the success of data-driven companies, yields tremendous results. So I'll ask each of you to give us your final thoughts, and then we'll wrap. Maybe... >> Can I quickly jump in on this piece that you mentioned, right, the target architecture? If you talk about these pieces, people often have this picture in mind: okay, there are different kinds of stages, we have an ingestion layer, we have a storage layer, a transformation layer, a presentation layer, and then we basically put a lot of technology on top of that. That's kind of our target architecture. However, I think what we really need to make sure is that we have these different kinds of views, right? We need to understand what the capabilities are that we need, what the goals are, how it looks and feels from the different personas' and experience views, and then finally that should actually map to the target architecture from a technical perspective. Maybe just to give an outlook on what we are planning to do, how we want to move that forward: based on our strategy, we would like to increase the maturity as a whole across the entire company. And this is kind of a framework around the business strategy, and it breaks down into four pillars as well. People, meaning the data culture, data literacy, the data organizational structure and so on. If you're talking about governance, as Clemence has actually mentioned, right: compliance, governance, data management and so on. If you're talking about technology, and I think we could talk for hours about that one, it's around the data platform and the data science platform. And then finally also about enablement through data, meaning we need to understand data quality, data accessibility, applied science and data monetization. >> Great. Thank you, Christoph. Clemence, why don't you bring us home; give us your final thoughts. >> Okay. 
I can just agree with Christoph that it is important to understand what kind of maturity people have: to understand what the maturity level is, where the company, the people, the organization stand, and really understand what kind of change applies to those four pillars, for example, what needs to be tackled first. And this is not very clear from the very beginning. It's kind of a green field: you come up with must-wins, with things that you really want to do, out of theory and out of different white papers. Only if you really start conducting the first initiatives do you understand how to put those thoughts together, and where you miss out on one of those four different pillars: people, process, technology and governance. And then comes the integration, doing it step by step, small steps by small steps, not boiling the ocean, where you are really capable of identifying the gaps and seeing where you can either fill the gaps or where you have to increase maturity first and train people or improve your tech stack. >> You know, HelloFresh is an excellent example of a company that is innovating. It was not born in Silicon Valley, which I love; it's a global company. And I've got to ask you guys, it seems like it's just an amazing place to work. Are you guys hiring? >> Yes, definitely, we are. As mentioned, right, we are hiring across the entire company, and specifically for data. I think there are a lot of open roles, so yes, please visit our page: from data engineering to data product management, and Clemence has a lot of roles that you can speak to him about. But yes. >> Guys, thanks so much for sharing your story with theCUBE audience. You're pioneers, and we look forward to collaborations in the future to track progress, and really want to thank you for your time. >> Thank you very much. >> Thank you very much, Dave. >> And thank you for watching theCUBE's startup showcase made possible by AWS. This is Dave Vellante. We'll see you next time. (cheerful music)

Published Date : Sep 15 2021


Is Data Mesh the Killer App for Supercloud | Supercloud2


 

(gentle bright music) >> Okay, welcome back to our "Supercloud 2" event, live coverage here at stage performance in Palo Alto, syndicating around the world. I'm John Furrier with Dave Vellante. We've got exclusive news and a scoop here for SiliconANGLE and theCUBE. Zhamak Dehghani, creator of data mesh, has formed a new company called NextData. She's a CUBE alumna and contributor to our Supercloud initiative, as well as our coverage and breaking analysis with Dave Vellante on data, the killer app for Supercloud. Zhamak, great to see you. Thank you for coming into the studio, and congratulations on your newly formed venture and continued success on the data mesh. >> Thank you so much. It's great to be here. Great to see you in person. >> Dave: Yeah, finally. >> John: Wonderful. Your contributions to the data conversation have been well-documented, certainly by us and others in the industry. Data mesh is taking the world by storm. Some people are debating it, throwing, you know, cold water on it; some think it's the next big thing. Tell us about the data mesh super data apps that are emerging out of cloud. >> I mean, data mesh, as you said, you know, the pain points that it surfaced were universal. Everybody said, "Oh, why didn't I think of that?" You know, it was just an obvious next step, and people are approaching it, implementing it. Over the last few years, I've been involved in many of those implementations, and I guess Supercloud is somewhat a prerequisite for it, because data mesh, and building applications using data mesh, is about sharing data responsibly across boundaries. And those boundaries include organizational boundaries, cloud technology boundaries and trust boundaries. >> I want to bring that up because your venture, NextData, which is new, just formed. Tell us about that. What wave is it riding? What specifically are you targeting? What's the pain point? >> Zhamak: Absolutely, yes. So NextData is the result of, I suppose, the pains that I suffered from implementing data mesh for many of the organizations. Basically, a lot of organizations that I've worked with, they want decentralized data. So they really embrace this idea of decentralized ownership of the data, but yet they want interconnectivity through standard APIs, and they want discoverability and governance. So they want to have policies implemented, they want to govern that data, they want to be able to discover that data, and yet they want to decentralize it. And we do that with a developer experience that is easy and native to a generalist developer. So we try to find, I guess, the common denominator that solves those problems and enables that developer experience for data sharing. >> John: Since you just announced the news, what's been the reaction? >> Zhamak: I just announced the news right now, so what's the reaction? >> John: But people in the industry that know you, you did a lot of work in the area. What has been some of the feedback on the new venture in terms of the approach, the customers, the problem? >> Yeah, so we've been in stealth mode, so we haven't publicly talked about it, but folks that have been close to us have in fact reached out. We already have implementations of our pilot platform with early customers, which is super exciting, and we're going to have multiple of those. Of course, we're a tiny, tiny company, but we are going to have multiple pilot implementations of our platform in the real world,
with real global, large-scale organizations that have real-world problems. So we're not going to build our platform in a vacuum. And that's what's happening right now. >> Dave: When I think about your role at ThoughtWorks, you had a very wide observation space, with a number of clients, helping them implement data mesh and other things as well, prior to this new venture. But when I look at data mesh, at least the implementations that I've seen, they're very narrow. I think of JPMC, I think of HelloFresh. Generally, and obviously not surprisingly, they don't include the big vision of inclusivity across clouds, across different data stores. But it seems like people are having to go through some gymnastics to get to, you know, the organizational reality of decentralizing data, and at least pushing data ownership to the line of business. How are you approaching, or are you approaching, solving that problem? Are you taking a narrow slice? What can you tell us about NextData? >> Zhamak: Sure, yeah, absolutely. Gymnastics, the cute word to describe what the organizations have to go through. And one of those problems is that, you know, the data, as you know, resides on different platforms, it's owned by different people, and it's processed by pipelines that different people own. So there's this very disparate and disconnected set of technologies that were very useful when we thought about data and processing as a centralized problem. But when you think about data as a decentralized problem, the cost of integration of these technologies into a cohesive developer experience is what's missing. And we want to focus on that cohesive, end-to-end developer experience, to share data responsibly in these autonomous units, we call them data products, I guess, in data mesh, right? Each constitutes computation, the policies that govern that data, discoverability. So I guess, I heard this expression in the last talks, that you can have your cake and eat it too. So we want people to have their cake, which is, you know, data in different places, decentralization, and eat it too, which is interconnected access to it. So we start with standardizing and codifying this idea of a data product container that encapsulates the data, the computation, and the APIs to get to it, in a technology-agnostic way, in an open way. And then sit on top and use existing tech, you know, Snowflake, Databricks, whatever exists, you know, the millions of dollars of investments that companies have made; sit on top of those, but create this cohesive, integrated experience where the data product is a first-class primitive. And that's really key here: the language and the modeling that we use is really native to data mesh. I will make a data product, I'm sharing a data product, and that encapsulates providing metadata about it, providing the computation that's constantly changing the data, providing the API for it. So we're trying to kind of codify and create a new developer experience based on that, where developers, both on the provider side and the user side, are connected through peer-to-peer data sharing, with the data product as a first-class primitive concept. >> Okay, so the idea would be developers would build applications leveraging those data products, which are discoverable and governed. Now, today you see some companies, you know, take Snowflake, for example. >> Zhamak: Yeah. >> Attempting to do that within their own little walled garden. They even, at one point, used the term "mesh." I dunno if they pulled back on that. 
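One way to picture the container Dehghani describes, bundling data, the computation that maintains it, and a technology-agnostic API into a single addressable unit, is an interface sketch like the following. This illustrates the concept only; it is not NextData's actual API, and every name in it is hypothetical.

    # Illustrative sketch of a "data product container": one addressable unit
    # bundling output data, the transformation that maintains it, and metadata.
    # Hypothetical names; not NextData's actual interface.
    from abc import ABC, abstractmethod
    from typing import Any, Iterable

    class DataProductContainer(ABC):
        @abstractmethod
        def metadata(self) -> dict[str, Any]:
            """Descriptive metadata: owner, schema, SLOs, lineage."""

        @abstractmethod
        def transform(self) -> None:
            """The computation that keeps the product's data up to date."""

        @abstractmethod
        def read(self, consumer_id: str) -> Iterable[dict[str, Any]]:
            """Technology-agnostic output port; the storage behind it could be
            Snowflake, Databricks, S3, etc. without the consumer knowing."""

    class OrdersProduct(DataProductContainer):
        def metadata(self):
            return {"owner": "orders-domain", "schema": ["order_id", "total"]}

        def transform(self):
            ...  # e.g. an incremental job running on the underlying platform

        def read(self, consumer_id):
            # Policy checks would run here before any rows are returned.
            yield {"order_id": 1, "total": 42.0}

The point of the abstraction is the one made above: the consumer programs against the container's port, so crossing a platform boundary does not change the developer experience.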
And then they sort of became aware of some of your work. But a lot of the things that they're doing within their little insulated environment, you know, support that, you know, governance, and they're building out an ecosystem. What's different in your vision? >> Exactly. So we realized that, you know, and this is a reality: you go to organizations, they have a Snowflake, and half of the organization happily operates on Snowflake. And the other half, oh, we are on, you know, bare infrastructure on AWS, or we are on Databricks. This is the reality. You know, this Supercloud that's written up here, it's about working across boundaries of technology. So we try to embrace that. And even for our own technology, with the way we're building it, we say, "Okay, nobody's going to use NextData's data mesh operating system alone. People will have different platforms." So you have to build with openness in mind. And in the case of Snowflake, I think, you know, they have, I'm sure, very happy customers, as long as customers can be on Snowflake. But once you cross that boundary of platforms, then that becomes a problem. And we try to keep that in mind in our solution. >> So it's worth reviewing that, basically, the concept of data mesh is that whether your data sits in a data lake or a data warehouse, an S3 bucket, or an Oracle database as well, it should all be included inside of the mesh. >> We did a session with AWS on the startup showcase, data as code. And remember, I wrote a blog post in 2007 called "Data's the New Developer Kit." Back then, they used to call 'em developer kits, if you remember. And we said at that time, whoever can code data >> Zhamak: Yes. >> will have a competitive advantage. >> Aren't the machines going to be doing that? Didn't we just hear that? >> Well, we have. And you know: "Hey Siri, hey Cube, find me that best video for data mesh." There it is. I mean, this is the point. What's happening is that now, data has to be addressable >> Zhamak: Yes. >> for machines and for coding >> Zhamak: Yes. >> because you need to call the data. So the question is, how do you manage the complexity of making data as promiscuous as possible, making it available, as well as then governing it? Because it's a trade-off. The more you make it open >> Zhamak: Definitely. >> the better the machine learning. >> Zhamak: Yes. >> But yet, the governance issue. So you need an OS to handle this, maybe. >> Yes, well, our mental model for our platform is an OS, an operating system. Operating systems, you know, have shown us how you can kind of abstract what's complex and take care of, you know, a lot of complexities, but yet provide an open and, you know, dynamic enough interface. So we think about it that way. We try to solve the problem of policies living with the data. Enforcement of the policies happens at the most granular level, which is, in this concept, the data product. And that would happen whether you read, write, or access a data product. But we can never imagine what all of these policies could be. So our thinking is, okay, we should have an open policy framework that can allow organizations to write their own policy drivers and policy definitions, and encode them and encapsulate them in this data product container. But I'm not going to fool myself to say that, you know, that's going to solve the problem that you just described.
I think we are in this... I don't know, if I look into my crystal ball, what I think might happen is that right now, the primitives that we work with to train machine-learning models are still bits and bytes of data. They're fields, rows, columns, right? And that creates quite a large surface area, an attack area, for, you know, the privacy of the data. So perhaps one of the trends that we might see is this evolution of data APIs to become more and more computationally aware, to bring the compute to the data to reduce that surface area, so you can really leave the control of the data to the sovereign owners of that data, right? That data product. So I think the evolution of our data APIs perhaps will become more and more computational. So you describe what you want, and the data owner decides, you know, how to manage the- >> John: That's interesting, Dave, 'cause it's almost like we just talked about ChatGPT in the last segment, machine learning that's really been around the industry for a while. It's almost as if you're starting to see reason come into the data, reasoning. You're starting to see not just metadata, but using the data to reason, so that you don't have to expose the raw data. It's almost like a, I won't say curation layer, but an intelligence layer. >> Zhamak: Exactly. >> Can you share your vision on that? 'Cause that seems to be where the dots are connecting. >> Zhamak: Yes, this is perhaps further into the future, because just from where we stand, we still have to create that bridge of familiarity between that future and the present. So we are still in that bridge-making mode. However, by just the basic notion of saying, "I'm going to put an API in front of my data," and that API today might be as primitive as a level of indirection, as in: you tell me what you want, tell me who you are, let me go process that, all the policies and lineage, and insert all of this intelligence that needs to happen. And then today, I will still give you a file. But by just defining that API and standardizing it, now we have this amazing extension point where we can say, "Well, in the next revision of this API, you not just tell me who you are, but you actually tell me what intelligence you're after. What's the logic that I need to go and now compute on your API?" And you can kind of evolve that, right? Now you have a point of evolution to this very futuristic, I guess, future where you just describe the question that you're asking from the chat. >> Well, this is the Supercloud, Dave. >> I have a question from a fan, I got to get it in. It's George Gilbert. And so, his question is: you're blowing away the way we synchronize data from operational systems to the data stack to applications. So the concern that he has, and he wants your feedback on this: "Do the data product app devs get exposed to more complexity with respect to moving data between data products, or maybe it's attributes between data products? How do you respond to that? How do you see it? Is that a problem, or is that something that is overstated, or do you have an answer for that?" >> Zhamak: Absolutely. So I think there's a sweet spot in getting data product developers closer to the app, but yet not burdening them with the complexity of the application and application logic, and yet reducing their cognitive load by localizing what they need to know about, which is that domain where they're operating within. Because what's happening right now?
What's happening right now is that data engineers, and I have a ton of empathy for them, for the high threshold of pain that they can, you know, deal with, have been centralized. They've been put into the data team, and they have been given this unbelievable task of: make meaning out of data, put semantics over it, curate it, clean it, and so on. So what we are saying is: get those folks embedded into the domain, closer to the application developers. These are still separately moving units. Your app and your data products are independent, but yet tightly coupled with each other based on the context of the domain. So reduce cognitive load by localizing what they need to know about to the domain, get them closer to the application, but yet have them separate from the app, because the app provides a very different service: transactional data for my e-commerce transaction. The data product provides a very different service: longitudinal data for the, you know, variety of this intelligent analysis that I can do on the data. But yet, it's all within the domain of e-commerce or sales or whatnot. >> So a lot of decoupling and coupling create that cohesiveness. >> Zhamak: Absolutely. >> Architecture. So I have to ask you, this is an interesting question 'cause it came up on theCUBE all last year. Back in the old server, data center days and cloud, SRE. Google coined the term "site reliability engineer" for someone to look over the hundreds of thousands of servers. We asked the data engineering community, who have been suffering, by the way, I agree: is there an SRE-like role for data? Because in a way, data engineering, that platform engineer, they are like the SRE for data. In other words, managing the large scale to enable automation and self-service. What are your thoughts and reaction to that? >> Zhamak: Yes, exactly. So, maybe we go through that history of how SRE came to be. So we had the first DevOps movement, which was: remove the wall between dev and ops and bring them together. So you have one cross-functional unit of the organization that's responsible, you build it, you run it, right? So then there is no "I'm going to just shoot my application over the wall for somebody else to manage it." So we did that, and then we said, "Okay, as we decentralized and had this many microservices running around, we had to create a layer that abstracted a lot of the complexity around monitoring, observing, and running a lot of them, while giving autonomy to this cross-functional team." And that's where the SRE, a new generation of engineers, came to exist. So I think if I just look- >> Hence Borg, hence Kubernetes. >> Hence, hence, exactly. Hence chaos engineering, hence embracing the complexity and messiness, right? And putting engineering discipline in place to embrace that and yet give a cohesive and high-integrity experience of those systems. So I think, if we look at that evolution, perhaps something like that is happening by bringing data and apps closer, making them these domain-oriented data product teams, or domain-oriented cross-functional teams, full stop, and still having a very advanced, maybe at the platform infrastructure level, kind of operational team. They're not busy doing two jobs, which is taking care of domains and the infrastructure, but they're building infrastructure that is embracing that complexity and interconnectivity of this data process. >> John: So you see similarities.
>> Absolutely, but I feel like we're probably in the earlier days of that movement. >> So it's a data DevOps kind of thing happening, where scale is happening. Good things are happening, yet it's a little bit fast and loose, with some complexities to clean up. >> Yes, yes. This is a different restructure. As you said, you know, the job of this industry as a whole, of architects, is decompose, recompose; decompose, recompose in a new way. And now we're decomposing centralized teams, recomposing them as domains and- >> John: So is data mesh the killer app for Supercloud? >> You had to do this to me. >> Dave: Sorry, I couldn't- (John and Dave laughing) >> Zhamak: What do you want me to say, Dave? >> John: Yes. >> Zhamak: Yes, of course. I mean Supercloud, I think, really, the terminology is Supercloud, Opencloud. But I think, in the spirit of it, this embracing of diversity and giving autonomy for people to make decisions for what's right for them, and not yet locking them in. I think just embracing that is baked into how data mesh assumed the world would work. >> John: Well, thank you so much for coming on Supercloud 2, really appreciate it. Data has driven this conversation. Your success with data mesh has really opened up the conversation and exposed the slow-moving data industry. >> Dave: Been a great catalyst. (John laughs) >> John: That's now going well. We can move faster, so thanks for coming on. >> Thank you for hosting me. It was wonderful. >> Okay, Supercloud 2, live here in Palo Alto, our stage performance. I'm John Furrier with Dave Vellante. We're back with more after this short break. Stay with us all day for Supercloud 2. (gentle bright music)
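To make the data product container idea from the conversation above concrete, here is a minimal sketch of what such a container might look like. This is an illustration of the concept only, not NextData's actual API, which has not been published; names like `DataProduct` and `OutputPort` are invented for the example.

```python
from dataclasses import dataclass, field
from typing import Any, Callable

@dataclass
class OutputPort:
    """A standard, platform-agnostic API through which consumers read the product."""
    name: str
    schema: dict[str, str]                     # column name -> type
    read: Callable[[], list[dict[str, Any]]]   # platform-specific reader, injected

@dataclass
class DataProduct:
    """One sharable unit: data access, metadata, and computation in one container."""
    name: str
    domain: str                                # e.g. "e-commerce", "sales"
    owner: str                                 # the team accountable for the product
    metadata: dict[str, str] = field(default_factory=dict)
    ports: dict[str, OutputPort] = field(default_factory=dict)

    def publish(self, port: OutputPort) -> None:
        self.ports[port.name] = port

    def read(self, port_name: str) -> list[dict[str, Any]]:
        return self.ports[port_name].read()

# Usage: the bytes could live in Snowflake, Databricks, or S3; consumers only see the port.
orders = DataProduct(name="orders", domain="e-commerce", owner="orders-team",
                     metadata={"freshness": "hourly"})
orders.publish(OutputPort(
    name="longitudinal",
    schema={"order_id": "string", "total": "float"},
    read=lambda: [{"order_id": "a1", "total": 42.0}],   # stand-in for a real reader
))
print(orders.read("longitudinal"))   # [{'order_id': 'a1', 'total': 42.0}]
```

The design point the sketch tries to capture is that the port, not the consumer, knows which platform holds the bytes, so the same product could front Snowflake today and S3 tomorrow without consumers changing.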
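The "policies live with the data" idea, an open framework where organizations write their own policy drivers and enforcement happens on every read, write, or access, could be sketched as follows. Again, this is a hypothetical illustration under assumed names (`PolicyDriver`, `enforce`), not a real framework.

```python
from abc import ABC, abstractmethod

class PolicyDriver(ABC):
    """Open extension point: organizations plug in their own policy definitions."""
    @abstractmethod
    def check(self, principal: str, action: str, record: dict) -> bool:
        ...

class MaskPII(PolicyDriver):
    """Example driver: mask PII fields unless the reader is on the allowed list."""
    def __init__(self, pii_fields: set, allowed: set):
        self.pii_fields, self.allowed = pii_fields, allowed

    def check(self, principal: str, action: str, record: dict) -> bool:
        if action == "read" and principal not in self.allowed:
            for f in self.pii_fields & record.keys():
                record[f] = "***"          # mask in place rather than deny outright
        return True                        # the record may still be returned

def enforce(drivers, principal, action, records):
    """Enforcement at the most granular level: every record, on every access."""
    results = []
    for rec in records:
        rec = dict(rec)                    # never mutate the product's own data
        if all(d.check(principal, action, rec) for d in drivers):
            results.append(rec)
    return results

drivers = [MaskPII(pii_fields={"email"}, allowed={"orders-team"})]
rows = [{"order_id": "a1", "email": "jo@example.com"}]
print(enforce(drivers, "analytics-team", "read", rows))
# -> [{'order_id': 'a1', 'email': '***'}]
```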
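And the evolution Dehghani describes for data APIs, from "tell me who you are and I will hand you a file" to "ship me your logic and I will compute next to the data," might look like the toy comparison below; both request shapes are assumptions made for illustration.

```python
from typing import Any, Callable

DATA = [{"meal": "pasta", "rating": 5}, {"meal": "tacos", "rating": 3}]

def data_api_v1(who: str, what: str) -> list:
    """v1: a level of indirection. Identity lets policies and lineage run,
    but the owner still hands back raw records, effectively 'a file'."""
    assert who, "identity is required"
    return [dict(r) for r in DATA]         # raw data crosses the boundary

def data_api_v2(who: str, logic: Callable[[list], Any]) -> Any:
    """A later revision: the caller ships its logic, the owner runs it
    next to the data, and only the answer crosses the boundary."""
    assert who, "identity is required"
    return logic(DATA)                     # compute moves to the data

# v1: the consumer receives everything, then aggregates locally.
rows = data_api_v1("analytics-team", "ratings")
print(sum(r["rating"] for r in rows) / len(rows))      # 4.0

# v2: the consumer only ever sees the aggregate; raw rows never leave.
print(data_api_v2("analytics-team",
                  lambda rs: sum(r["rating"] for r in rs) / len(rs)))  # 4.0
```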

Published Date : Feb 17 2023


Breaking Analysis: Grading our 2022 Enterprise Technology Predictions


 

>>From the Cube Studios in Palo Alto and Boston, bringing you data-driven insights from theCUBE and ETR, this is Breaking Analysis with Dave Vellante. >>Making technology predictions in 2022 was tricky business, especially if you were projecting the performance of markets or identifying IPO prospects and making binary forecasts on data, AI, and the macro spending climate, and other related topics in enterprise tech. 2022, of course, was characterized by a seesaw economy where central banks were restructuring their balance sheets, the war in Ukraine fueled inflation, supply chains were a mess, and the unintended consequences of the forced march to digital and the acceleration are still being sorted out. Hello and welcome to this week's Wikibon Cube Insights powered by ETR. In this Breaking Analysis, we continue our annual tradition of transparently grading last year's enterprise tech predictions. And you may or may not agree with our self-grading system, but look, we're gonna give you the data, you can draw your own conclusions, and tell you what: tell us what you think. >>All right, let's get right to it. So our first prediction was tech spending increases by 8% in 2022. And as we exited 2021, CIOs were optimistic about their digital transformation plans. You know, they rushed to make changes to their business and were eager to sharpen their focus and continue to iterate on their digital business models and plug the holes in the learnings that they had. And so we predicted that 8% rise in enterprise tech spending, which looked pretty good until the war in Ukraine and the Fed deciding that, you know, it had to rush and make up for lost time. We kind of nailed the momentum in the energy sector, but we can't give ourselves too much credit for that layup. And as of October, Gartner had IT spending growing at just over 5%. I think it was 5.1%. So we're gonna take a C plus on this one and move on. >>Our next prediction was basically kind of a slow ground ball to second base, if I have to be honest, but we felt it was important to highlight that security would remain front and center as the number one priority for organizations in 2022. As is our tradition, you know, we try to up the degree of difficulty by specifically identifying companies that are gonna benefit from these trends. So we highlighted some possible IPO candidates, which of course didn't pan out. Snyk was on our radar. The company just had to do another raise, and they recently took a valuation hit and it was a down round. They raised 196 million, so a good chunk of cash, but not the IPO that we had predicted. Aqua Security's focus on containers and cloud native, that was a trendy call, and we thought maybe an MSSP, or multiple managed security service providers like Arctic Wolf, would IPO, but no way that was happening in the crummy market. >>Nonetheless, we think these types of companies are still faring well, as the talent shortage in security remains really acute, particularly in the sort of mid-size and small businesses that often don't have a SOC. Lacework laid off 20% of its workforce in 2022, and co-CEO Dave Hatfield left the company. So that IPO didn't happen; it was probably too early for Lacework anyway. Meanwhile, you've got Netskope, which we've cited as strong in the ETR data, particularly in the Emerging Technology Survey.
And then, you know, Illumio holding its own. You know, we never liked that 7 billion price tag that Okta paid for Auth0, but we loved the TAM expansion strategy to target developers beyond sort of Okta's enterprise strength. But we gotta take some points off for the failure thus far of Okta to really nail the integration and the go-to-market model with Auth0 and bring that into the core Okta. >>So the focus on endpoint security, that was a winner in 2022, as CrowdStrike led that charge with others holding their own, not the least of which was Palo Alto Networks, as it continued to expand beyond its core network security and firewall business, you know, through acquisition. So overall we're gonna give ourselves an A minus for this relatively easy call, but again, we had some specifics associated with it to make it a little tougher. And of course we're watching very closely this coming year, in 2023, the vendor consolidation trend. You know, according to a recent Palo Alto Networks survey of 1300 SecOps pros, on average organizations have more than 30 tools to manage security. So this is a logical way to optimize cost: consolidating vendors and consolidating redundant tools. The ETR data shows that's clearly a trend that's on the upswing. >>Now moving on, a big theme of 2020 and 2021 of course was remote work and hybrid work and new ways to work and return to work. So we predicted in 2022 that hybrid work models would become the dominant protocol, which clearly is the case. We predicted that about 33% of the workforce would come back to the office in 2022. In September, the ETR data showed that figure was at 29%, but organizations expected that 32% would be in the office, you know, pretty much full-time by year end. That hasn't quite happened, but we were pretty close with the projection, so we're gonna take an A minus on this one. Now, supply chain disruption was another big theme that we felt would carry through 2022. And sure, that sounds like another easy one, but as is our tradition, again we try to put some binary metrics around our predictions to put some meat on the bone, so to speak, and allow us, and you, to say, okay, did it come true or not? >>So we had some data that we presented last year on supply chain issues impacting hardware spend. We said at the time, and you can see this on the left hand side of this chart, that PC and laptop demand would remain above pre-COVID levels, which would reverse a decade of year-on-year declines, which I think started in around 2011, 2012. Now, while demand is down this year pretty substantially relative to 2021, IDC has worldwide unit shipments for PCs at just over 300 million for '22. If you go back to 2019, you're looking at around, let's say, 260 million units shipped globally, you know, roughly. So, you know, pretty good call there; definitely much higher than pre-COVID levels. But so what, you might be asking, why the B? Well, we projected that 30% of customers would replace security appliances with cloud-based services, and that more than a third would replace their internal data center server and storage hardware with cloud services, like 30 and 40% respectively.
And we do have some data that we're showing here on cloud adoption from ET R'S October survey where the midpoint of workloads running in the cloud is around 34% and forecast, as you can see, to grow steadily over the next three years. So this, well look, this is not, we understand it's not a one-to-one correlation with our prediction, but it's a pretty good bet that we were right, but we gotta take some points off, we think for the lack of unequivocal proof. Cause again, we always strive to make our predictions in ways that can be measured as accurate or not. Is it binary? Did it happen, did it not? Kind of like an O K R and you know, we strive to provide data as proof and in this case it's a bit fuzzy. >>We have to admit that although we're pretty comfortable that the prediction was accurate. And look, when you make an hard forecast, sometimes you gotta pay the price. All right, next, we said in 2022 that the big four cloud players would generate 167 billion in IS and PaaS revenue combining for 38% market growth. And our current forecasts are shown here with a comparison to our January, 2022 figures. So coming into this year now where we are today, so currently we expect 162 billion in total revenue and a 33% growth rate. Still very healthy, but not on our mark. So we think a w s is gonna miss our predictions by about a billion dollars, not, you know, not bad for an 80 billion company. So they're not gonna hit that expectation though of getting really close to a hundred billion run rate. We thought they'd exit the year, you know, closer to, you know, 25 billion a quarter and we don't think they're gonna get there. >>Look, we pretty much nailed Azure even though our prediction W was was correct about g Google Cloud platform surpassing Alibaba, Alibaba, we way overestimated the performance of both of those companies. So we're gonna give ourselves a C plus here and we think, yeah, you might think it's a little bit harsh, we could argue for a B minus to the professor, but the misses on GCP and Alibaba we think warrant a a self penalty on this one. All right, let's move on to our prediction about Supercloud. We said it becomes a thing in 2022 and we think by many accounts it has, despite the naysayers, we're seeing clear evidence that the concept of a layer of value add that sits above and across clouds is taking shape. And on this slide we showed just some of the pickup in the industry. I mean one of the most interesting is CloudFlare, the biggest supercloud antagonist. >>Charles Fitzgerald even predicted that no vendor would ever use the term in their marketing. And that would be proof if that happened that Supercloud was a thing and he said it would never happen. Well CloudFlare has, and they launched their version of Supercloud at their developer week. Chris Miller of the register put out a Supercloud block diagram, something else that Charles Fitzgerald was, it was was pushing us for, which is rightly so, it was a good call on his part. And Chris Miller actually came up with one that's pretty good at David Linthicum also has produced a a a A block diagram, kind of similar, David uses the term metacloud and he uses the term supercloud kind of interchangeably to describe that trend. And so we we're aligned on that front. Brian Gracely has covered the concept on the popular cloud podcast. Berkeley launched the Sky computing initiative. >>You read through that white paper and many of the concepts highlighted in the Supercloud 3.0 community developed definition align with that. 
Walmart launched a platform with many of the supercloud salient attributes. So did Goldman Sachs, so did Capital One, so did nasdaq. So you know, sorry you can hate the term, but very clearly the evidence is gathering for the super cloud storm. We're gonna take an a plus on this one. Sorry, haters. Alright, let's talk about data mesh in our 21 predictions posts. We said that in the 2020s, 75% of large organizations are gonna re-architect their big data platforms. So kind of a decade long prediction. We don't like to do that always, but sometimes it's warranted. And because it was a longer term prediction, we, at the time in, in coming into 22 when we were evaluating our 21 predictions, we took a grade of incomplete because the sort of decade long or majority of the decade better part of the decade prediction. >>So last year, earlier this year, we said our number seven prediction was data mesh gains momentum in 22. But it's largely confined and narrow data problems with limited scope as you can see here with some of the key bullets. So there's a lot of discussion in the data community about data mesh and while there are an increasing number of examples, JP Morgan Chase, Intuit, H S P C, HelloFresh, and others that are completely rearchitecting parts of their data platform completely rearchitecting entire data platforms is non-trivial. There are organizational challenges, there're data, data ownership, debates, technical considerations, and in particular two of the four fundamental data mesh principles that the, the need for a self-service infrastructure and federated computational governance are challenging. Look, democratizing data and facilitating data sharing creates conflicts with regulatory requirements around data privacy. As such many organizations are being really selective with their data mesh implementations and hence our prediction of narrowing the scope of data mesh initiatives. >>I think that was right on J P M C is a good example of this, where you got a single group within a, within a division narrowly implementing the data mesh architecture. They're using a w s, they're using data lakes, they're using Amazon Glue, creating a catalog and a variety of other techniques to meet their objectives. They kind of automating data quality and it was pretty well thought out and interesting approach and I think it's gonna be made easier by some of the announcements that Amazon made at the recent, you know, reinvent, particularly trying to eliminate ET t l, better connections between Aurora and Redshift and, and, and better data sharing the data clean room. So a lot of that is gonna help. Of course, snowflake has been on this for a while now. Many other companies are facing, you know, limitations as we said here and this slide with their Hadoop data platforms. They need to do new, some new thinking around that to scale. HelloFresh is a really good example of this. Look, the bottom line is that organizations want to get more value from data and having a centralized, highly specialized teams that own the data problem, it's been a barrier and a blocker to success. The data mesh starts with organizational considerations as described in great detail by Ash Nair of Warner Brothers. So take a listen to this clip. >>Yeah, so when people think of Warner Brothers, you always think of like the movie studio, but we're more than that, right? I mean, you think of H B O, you think of t n t, you think of C N N. We have 30 plus brands in our portfolio and each have their own needs. 
So the idea of a data mesh really helps us, because what we can do is we can federate access across the company so that, you know, CNN can work at their own pace. You know, when there's election season, they can ingest their own data and they don't have to, you know, bump up against, as an example, HBO if Game of Thrones is going on. >>So it's often the case that data mesh is in the eyes of the implementer. And while a company's implementation may not strictly adhere to Zhamak Dehghani's vision of data mesh, that's okay; the goal is to use data more effectively. And despite Gartner's attempts to deposition data mesh in favor of the somewhat confusing, or frankly far more confusing, data fabric concept that they stole from NetApp, data mesh is taking hold in organizations globally today. So we're gonna take a B on this one. The prediction is shaping up the way we envisioned, but as we previously reported, it's gonna take some time, the better part of a decade in our view. New standards have to emerge to make this vision become reality, and they'll come in the form of both open and de facto approaches. Okay, our eighth prediction last year focused on the face-off between Snowflake and Databricks. And we realize this is a popular topic, and maybe one that's getting a little overplayed, but these are two companies that initially, you know, looked like they were shaping up as partners, and, by the way, they are still partnering in the field. But if you go back a couple years, the idea of using AWS infrastructure, Databricks machine intelligence, and applying that on top of Snowflake as a facile data warehouse was still very viable. But both of these companies have much larger ambitions. They've got big total available markets to chase and large valuations that they have to justify. So what's happening is, as we've previously reported, each of these companies is moving toward the other firm's core domain, and they're building out an ecosystem that'll be critical for their future. So as part of that effort, we said each is gonna become an aggressive investor and maybe start doing some M&A, and they have, in various companies. >>And on this chart that we produced last year, we studied some of the companies that were targets, and we've added some recent investments of both Snowflake and Databricks. As you can see, they've both, for example, invested in Alation. Snowflake's put money into Lacework, the security firm, ThoughtSpot, which is trying to democratize data with AI, and Collibra, a governance platform. And you can see Databricks' investments in data transformation with dbt Labs, Matillion doing simplified data integration, and Hunters, which is, you know, their security investment, and so forth. So other than our thought that we'd see a Databricks IPO last year, this prediction has been pretty spot on, so we'll give ourselves an A on that one. Now, observability has been a hot topic and we've been covering it for a while with our friends at ETR, particularly Eric Bradley. Our number nine prediction last year was basically that if you're not cloud native in observability, you are gonna be in big trouble.
You got new entrants that we've cited before, like observe, honeycomb, chaos search and others that we've, we've reported on, they're all born in the cloud. So we're gonna take another a on this one, admittedly, yeah, it's a re reasonably easy call, but you gotta have a few of those in the mix. Okay, our last prediction, our number 10 was around events. Something the cube knows a little bit about. We said that a new category of events would emerge as hybrid and that for the most part is happened. So that's gonna be the mainstay is what we said. That pure play virtual events are gonna give way to hi hybrid. >>And the narrative is that virtual only events are, you know, they're good for quick hits, but lousy replacements for in-person events. And you know that said, organizations of all shapes and sizes, they learn how to create better virtual content and support remote audiences during the pandemic. So when we set at pure play is gonna give way to hybrid, we said we, we i we implied or specific or specified that the physical event that v i p experience is going defined. That overall experience and those v i p events would create a little fomo, fear of, of missing out in a virtual component would overlay that serves an audience 10 x the size of the physical. We saw that really two really good examples. Red Hat Summit in Boston, small event, couple thousand people served tens of thousands, you know, online. Second was Google Cloud next v i p event in, in New York City. >>Everything else was, was, was, was virtual. You know, even examples of our prediction of metaverse like immersion have popped up and, and and, and you know, other companies are doing roadshow as we predicted like a lot of companies are doing it. You're seeing that as a major trend where organizations are going with their sales teams out into the regions and doing a little belly to belly action as opposed to the big giant event. That's a definitely a, a trend that we're seeing. So in reviewing this prediction, the grade we gave ourselves is, you know, maybe a bit unfair, it should be, you could argue for a higher grade, but the, but the organization still haven't figured it out. They have hybrid experiences but they generally do a really poor job of leveraging the afterglow and of event of an event. It still tends to be one and done, let's move on to the next event or the next city. >>Let the sales team pick up the pieces if they were paying attention. So because of that, we're only taking a B plus on this one. Okay, so that's the review of last year's predictions. You know, overall if you average out our grade on the 10 predictions that come out to a b plus, I dunno why we can't seem to get that elusive a, but we're gonna keep trying our friends at E T R and we are starting to look at the data for 2023 from the surveys and all the work that we've done on the cube and our, our analysis and we're gonna put together our predictions. We've had literally hundreds of inbounds from PR pros pitching us. We've got this huge thick folder that we've started to review with our yellow highlighter. And our plan is to review it this month, take a look at all the data, get some ideas from the inbounds and then the e t R of January surveys in the field. >>It's probably got a little over a thousand responses right now. You know, they'll get up to, you know, 1400 or so. And once we've digested all that, we're gonna go back and publish our predictions for 2023 sometime in January. So stay tuned for that. 
All right, we're gonna leave it there for today. You wanna thank Alex Myerson who's on production and he manages the podcast, Ken Schiffman as well out of our, our Boston studio. I gotta really heartfelt thank you to Kristen Martin and Cheryl Knight and their team. They helped get the word out on social and in our newsletters. Rob Ho is our editor in chief over at Silicon Angle who does some great editing for us. Thank you all. Remember all these podcasts are available or all these episodes are available is podcasts. Wherever you listen, just all you do Search Breaking analysis podcast, really getting some great traction there. Appreciate you guys subscribing. I published each week on wikibon.com, silicon angle.com or you can email me directly at david dot valante silicon angle.com or dm me Dante, or you can comment on my LinkedIn post. And please check out ETR AI for the very best survey data in the enterprise tech business. Some awesome stuff in there. This is Dante for the Cube Insights powered by etr. Thanks for watching and we'll see you next time on breaking analysis.

Published Date : Dec 18 2022

Lie 3, Today’s Modern Data Stack Is Modern | Starburst


 

(energetic music) >> Okay, we're back with Justin Borgman, CEO of Starburst; Richard Jarvis, the CTO of EMIS Health; and Teresa Tung, the cloud-first technologist from Accenture. We're on to lie number three, and that is the claim that today's "Modern Data Stack" is actually modern. So (chuckles) I guess that's the lie, or rather, that it's not modern. Justin, what do you say? >> Yeah, I think new isn't modern, right? I think it's the new data stack. It's the cloud data stack, but that doesn't necessarily mean it's modern. I think a lot of the components, actually, are exactly the same as what we've had for 40 years. Rather than Teradata, you have Snowflake. Rather than Informatica, you have Fivetran. So it's the same general stack, just, y'know, a cloud version of it. And I think a lot of the challenges that have plagued us for 40 years still remain. >> So, let me come back to you, Justin. Okay, but there are differences, right? You can scale. You can throw resources at the problem. You can separate compute from storage. There's a lot of money being thrown at that by venture capitalists, and at Snowflake, which you mentioned, and its competitors. So that's different, is it not? Is that not at least an aspect of modern: dial it up, dial it down? So what do you say to that? >> Well, it is. It's certainly taking, y'know, what the cloud offers and taking advantage of that. But it's important to note that the cloud data warehouses out there are really just separating their compute from their storage. So it's allowing them to scale up and down, but your data's still stored in a proprietary format. You're still locked in. You still have to ingest the data to get it even prepared for analysis. So a lot of the same structural constraints that existed with the old enterprise data warehouse model on-prem still exist; just, yes, a little bit more elastic now, because the cloud offers that. >> So Teresa, let me go to you, 'cause you have cloud-first in your title. What say you to this conversation? >> Well, even the cloud providers are looking towards more of a cloud continuum, right? So the centralized cloud as we know it, maybe data lake, data warehouse in the central place, that's not even how the cloud providers are looking at it. They each have query services; every provider has one that really expands those queries to be beyond a single location. And if we look at a lot of where the future goes, right, that's going to very much follow the same thing. There's going to be more edge. There's going to be more on-premise, because of data sovereignty, data gravity, because you're working with different parts of the business that have already made major cloud investments in different cloud providers, right? So there are a lot of reasons why the modern, I guess, the next modern generation of the data stack needs to be much more federated. >> Okay, so Richard, how do you deal with this? You've obviously got, you know, the technical debt, the existing infrastructure; it's on the books. You don't want to just throw it out. There's a lot of conversation about modernizing applications, which a lot of times is, you know, a microservices layer on top of legacy apps. How do you think about the modern data stack? >> Well, I think probably the first thing to say is that the stack really has to include the processes and people around the data as well. It's all well and good changing the technology.
But if you don't modernize how people use that technology, then you're not going to be able to scale, because just 'cause you can scale CPU and storage doesn't mean you can get more people to use your data to generate more value for the business. And so what we've been looking at is really changing, in a way very much aligned to data products and data mesh, how do you enable more people to consume the service and have the stack respond in a way that keeps costs low? Because that's important for our customers consuming this data, but it also allows people to occasionally run enormous queries and then tick along with smaller ones when required. And it's a good job we did, because during COVID, all of a sudden, we had enormous pressures on our data platform to answer really important, life-threatening queries. And if we couldn't scale both our data stack and our teams, we wouldn't have been able to answer those as quickly as we did. So I think the stack needs to support a scalable business, not just the technology itself. >> Well, thank you for that. So Justin, let's try to break down what the critical aspects are of the modern data stack. So you think about the past, you know, five, seven years: cloud obviously has given us a different pricing model and derisked experimentation, and we talked about the ability to scale up and scale down. But my takeaway is that that's not enough. Based on what Richard just said, the modern data stack has to serve the business and enable the business to build data products. I buy that. I'm a big fan of the data mesh concepts, even though we're early days. So what are the critical aspects, if you had to think about, you know, maybe putting some guardrails and definitions around the modern data stack? What does that look like? What are some of the attributes and principles there? >> Of how it should look, or how- >> Yeah, what it should be. >> Yeah, well, I think, you know, Teresa mentioned this in a previous segment: the data warehouse is not necessarily going to disappear. It just becomes one node, one element, of the overall data mesh. And I certainly agree with that. So by no means are we suggesting that, you know, Snowflake or Redshift or whatever cloud data warehouse you may be using is going to disappear, but it's not going to become the end-all, be-all. It's not the central single source of truth. And I think that's the paradigm shift that needs to occur. And I think it's also worth noting that those who were the early adopters of the modern data stack were primarily digital-native, born-in-the-cloud young companies who had the benefit of idealism. They had the benefit of starting with a clean slate. That does not reflect the vast majority of enterprises. And even those companies, as they grow up, mature out of that ideal state: they go buy a business, now they've got something on another cloud provider that has a different data stack, and they have to deal with that heterogeneity. That is just change, and change is a part of life. And so I think there is an element here that is almost philosophical. It's like, do you believe in an absolute ideal where I can just fit everything into one place, or do I believe in reality? And I think the far more pragmatic approach is really what data mesh represents.
So to answer your question directly, I think it's adding, you know, the ability to access data that lives outside of the data warehouse, maybe living in open data formats in a data lake, or accessing operational systems as well. Maybe you want to directly access data that lives in an Oracle database or a Mongo database, or what have you. So it's creating that flexibility to really future-proof yourself from the inevitable change that you will encounter over time.
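Justin's query-in-place idea is easier to picture with a short sketch. The following is a minimal illustration in the Starburst/Trino style, not anything from the conversation itself: it assumes a Trino-compatible coordinator on localhost, and the catalog, schema, and table names are all invented for the example.

```python
# A hedged sketch of a federated query: one statement joins data where
# it lives, with no ingestion step first. Requires `pip install trino`.
import trino

conn = trino.dbapi.connect(
    host="localhost",    # assumption: a local Trino/Starburst coordinator
    port=8080,
    user="analyst",
    catalog="lake",      # hypothetical catalog over open lake formats
    schema="curated",    # hypothetical schema
)

cur = conn.cursor()
cur.execute("""
    SELECT c.region, count(*) AS order_count
    FROM lake.curated.customers AS c
    JOIN ordersdb.public.orders AS o   -- hypothetical operational catalog
      ON o.customer_id = c.customer_id
    GROUP BY c.region
""")
for region, order_count in cur.fetchall():
    print(region, order_count)
```

The design point is that the lake and the operational store each stay where they are; only the query travels.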
>> So thank you. So Teresa, based on what Justin just said, my takeaway there is that it's inclusive: whether it's a data mart, data hub, data lake, or data warehouse, it's just a node on the mesh. Okay, I get that. Does that include, Teresa, on-prem data? Obviously it has to. What are you seeing in terms of the ability to take that data mesh concept on-prem? I mean, most implementations I've seen of data mesh frankly really aren't, you know, adhering to the philosophy. Maybe it's a data lake and maybe it's using Glue. You look at what JPMC is doing, HelloFresh, a lot of stuff happening on the AWS cloud in that, you know, closed stack, if you will. What's the answer to that, Teresa? >> I mean, I think it's a killer case for data mesh: the fact that you have valuable data sources on-prem, and yet you still want to modernize and take the best of cloud. Cloud is still, like we mentioned, there are a lot of great reasons for it, around the economics and the ability to tap into the innovation that the cloud providers are delivering around data and AI architecture. It's an easy button. So the mesh allows you to have the best of both worlds. You can start using the data products on-prem, or in the existing systems that are working already; that's meaningful for the business. At the same time, you can modernize the ones that make business sense, because they need better performance, they need, you know, something that is cheaper, or maybe just tapping into better analytics to get better insights, right? So you're going to be able to stretch and really have the best of both worlds in ways that, again, going back to Richard's point, are meaningful to the business. Not everything has to have that one-size-fits-all set of tools. >> Okay, thank you. So Richard, you know, talking about data as product, I wonder if you could give us your perspective here. What are the advantages of treating data as a product? What role do data products have in the modern data stack? We talk about monetizing data. What are your thoughts on data products? >> So for us, one of the most important data products that we've been creating is taking data that is healthcare data across a wide variety of different settings. So information about patients, demographics, about their treatment, about their medications and so on, and taking that into a standards format that can be utilized by a wide variety of different researchers. Because misinterpreting that data, or having the data not presented in the way that the user is expecting, means that you generate the wrong insight, and in any business that's clearly not a desirable outcome. But when that insight is so critical, as it might be in healthcare or some security settings, you really have to have gone to the trouble of understanding the data, presenting it in a format that everyone can clearly agree on, and then letting people consume it in a very structured, managed way, even if that data comes from a variety of different sources in the first place. And so our data product journey has really begun by standardizing data across a number of different silos through the data mesh, so we can present it out, both internally and, through the right governance, externally to researchers. >> So that data product, through whatever APIs, is accessible, it's discoverable, but it's obviously got to be governed as well. You mentioned it's appropriately provided internally. >> Yeah. >> But also, you know, to external folks as well. So you've architected that capability today? >> We have, and because the data is standard, it can generate value much more quickly, and we can be sure of the security and value it's providing. Because the data product isn't just about formatting the data into the correct tables; it's understanding what it means to redact the data, or to remove certain rows from it, or to interpret what a date actually means. Is it the start of the contract, or the start of the treatment, or the date of birth of a patient? These things can be lost in the data storage without having the proper product management around the data to say, in a very clear business context, what does this data mean, and what does it mean to process this data for a particular use case? >> Yeah, it makes sense. It's got the context. If the domains own the data, you know, you've got to cut through a lot of the centralized teams, the technical teams, that are data agnostic; they don't really have that context. All right, let's end it. Justin, how does Starburst fit into this modern data stack? Bring us home. >> Yeah, so I think for us it's really providing our customers with, you know, the flexibility to operate and analyze data that lives in a wide variety of different systems, ultimately giving them that optionality. You know, optionality provides the ability to reduce costs, to store more in a data lake rather than a data warehouse. It provides the ability for the fastest time to insight, to access the data directly where it lives. And ultimately, with this concept of data products that we've now, you know, incorporated into our offering as well, you can really create and curate, you know, data as a product to be shared and consumed. So we're trying to help enable the data mesh, you know, model, and make that an appropriate complement to, you know, the modern data stack that people have today. >> Excellent. Hey, I want to thank Justin, Teresa, and Richard for joining us today. You guys are great, big believers in the data mesh concept, and I think, you know, we're seeing the future of data architecture. So thank you. Now, remember, all these conversations are going to be available on thecube.net for on-demand viewing. You can also go to starburst.io. They have some great content on the website, and they host some really thought-provoking interviews, and they have awesome resources, lots of data mesh conversations over there, and really good stuff in the resource section. So check that out. Thanks for watching "Data Doesn't Lie... or Does It?" made possible by Starburst Data. This is Dave Vellante for theCUBE, and we'll see you next time. (upbeat music)

Published Date : Aug 22 2022


George Fraser, Fivetran & Veronika Durgin, Saks | Snowflake Summit 2022


 

(upbeat music) >> Hey, gang. Welcome back to theCUBE's coverage of Snowflake Summit '22, live on the show floor at Caesars Forum in Las Vegas. Lisa Martin here with Dave Vellante. A couple of guests are joining us to unpack more of what we've been talking about today. George Fraser joins us, the CEO of Fivetran, and Veronika Durgin, the head of data at Saks Fifth Avenue. Guys, welcome to the program. >> Thank you for having us. >> Hello. >> George, talk to us about Fivetran for the audience that may not be super familiar. Talk to us about the company, your vision, your mission, your differentiation, and then maybe the partnership with Snowflake. >> Well, a lot of people in the audience here at Snowflake Summit probably are familiar with Fivetran. We have almost 2,000 shared customers with them, so a considerable amount of the data that we're all talking about here flows through Fivetran. But in brief, what Fivetran is, is a data pipeline. And that means that we go get all the data of your company, in all the places that it lives, all your tools and systems that you use to run your company. We go get that data, and we bring it all together in one place, like Snowflake. And that is the first step in doing anything with data: getting it all in one place. >> So, a considerable amount of shared customers. I think I saw on the slide this morning over 5,900, but you're saying you're already at around 2,000 shared customers. Lots of innovation, I'm sure, between both companies, but talk to us about some of the latest developments at Fivetran, in terms of product, in terms of company growth. What's going on? >> Well, one of the biggest things that happened recently with Fivetran is we acquired another data integration company called HVR. And HVR's specialty has always been replicating the biggest, baddest enterprise databases, like Oracle and SQL Server databases that are enormous, that are run within an inch of their capabilities by their DBAs. And HVR was always known as the best in the business at that scenario. And by bringing that together with Fivetran, we now really have the full spectrum of capabilities. We can replicate all types of data for all sizes of company. And so that's a really exciting development for us and for the industry. >> So Veronika, head of data at Saks. What does that entail? How do you spend your time? What's your purview? >> So the cool thing about Saks is that it's a very old company. Saks is the premier luxury e-commerce platform, and we help our Saks Fifth Avenue customers express themselves through fashion. So we're trying to modernize a very old company, and we do have the biggest, baddest databases of any flavor you can imagine. So my job is to modernize, to bring us to near real-time data, to make sure data is available to all of our users so they can actually take advantage of it. >> So let's talk about some of those biggest, baddest hairballs, and how you deal with that. So over time, you've built up a lot of data. You've got different data stores. So, what are you doing with that? And what role do Fivetran and Snowflake play in helping you modernize? >> Yeah, Fivetran helps us ingest data from all of those data sources into Snowflake in near real-time. It's very important to us. And one of the examples that I give is that within a matter of maybe a few weeks, we were able to get data from over a dozen different data sources into Snowflake in near real-time.
And some of those data sources were not available to our users in the past, and everybody was so excited. And the reason they weren't available is because they required a lot of engineering effort to actually build those data pipelines, to manage them and maintain them. >> Lisa: Whoa, sorry. >> That was just a follow-up. So, Fivetran is the consolidator of all that data, and- >> That's right. >> Snowflake plays that role also. >> We bring it all together, and the place that it is consolidated is Snowflake. And from there you can really do anything with it. And there are really three things, you were touching on it, that make data integration hard. One is volume, and that's the one that people tend to talk about, just size of data. And that is important, but it's not the only thing. It's also latency: how fresh is the data in the locus of consolidation? Before Fivetran, the state of the art was nightly snapshots; once a day was considered pretty good. And we consider now once a minute pretty good, and we're trying to make it even better. And then the last challenge, which people tend not to talk about, it's the dark secret of our industry, is just incidental complexity. All of these data sources have a lot of strange behaviors and rules and corner cases. Every data source is a little bit different. And so a lot of what we bring to the table is that we've done the work, over 10 years, and in the case of HVR, since the '90s, to map out all of these little complexities of all these data sources, so that as a user, you don't have to see it. You just connect source, connect destination, and that's it. >> So you don't have to do the M word, migrate, off of all those databases. You can maybe allow them to dial down over time, then create new value using Fivetran and Snowflake. Is that the right way to think about it? >> Well, Fivetran, it's incredibly simple. You just connect it to whatever source, and then in a matter of minutes you have a pipeline. And for us it's a matter of minutes; behind Fivetran, there are hundreds of engineers, so we're extending our data engineering team to now include Fivetran. And we can pick and choose which tables we want to replicate, and which fields. And once data lands in Snowflake, we have data across different sources in one place, in a central place. And now we can do all kinds of different things. We can integrate data together, we can do validations, we can do reconciliations. We now have the ability to do point-in-time historical queries. In the past, in a transactional system, you don't see that; you only see the data that's there right now. But now that we replicate everything to Snowflake, and Snowflake being so powerful as an analytical platform, we can ask: what did it look like two months ago? What did it look like two years ago?
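Veronika's point-in-time remark is worth a small illustration. The sketch below assumes the pipeline lands every change record in the warehouse rather than only the latest row; the table and column names are invented, and a real implementation would express this as SQL inside the warehouse rather than in pandas.

```python
# Reconstructing "what did it look like two months ago?" from replicated
# change history. A toy example; requires `pip install pandas`.
from datetime import datetime
import pandas as pd

# One row per update that the pipeline replicated into the warehouse.
history = pd.DataFrame({
    "product_id": [1, 1, 2, 2, 1],
    "price":      [80.0, 95.0, 40.0, 55.0, 110.0],
    "updated_at": pd.to_datetime([
        "2022-01-05", "2022-03-10", "2022-02-01", "2022-04-20", "2022-05-01",
    ]),
})

def state_as_of(df: pd.DataFrame, ts: datetime) -> pd.DataFrame:
    """Return the latest known row per product as of timestamp ts."""
    seen = df[df["updated_at"] <= ts]
    return (seen.sort_values("updated_at")
                .groupby("product_id", as_index=False)
                .last())

print(state_as_of(history, datetime(2022, 3, 15)))
# product 1 was 95.0 and product 2 was 40.0 at that point in time
```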
>> You've got all that time series data, okay. >> And to address that word you mentioned a moment ago, migrate: this is something people often get confused about. What we're talking about here is not a migration. These source systems are not going away. These databases are the systems powering saks.com, and they're staying right there. They're the systems you interact with when you place an order on the site. The purpose of our tool, and the whole stack that Veronika has put together, is to serve other workloads in Snowflake that need to have access to all of the data together. >> But if you didn't have Snowflake, you would have to push those other data stores, try to have them do things that they sometimes have a tough time doing. >> Yeah, and you can't run analytical workloads. You cannot do reporting on the transactional database; it's not meant for that. It's supporting the capability of an application, and it's configured to be optimized for that. So we always had to offload that specific analytical reporting functionality, or machine learning, somewhere else, and Snowflake is excellent for that. It's meant for that, yeah. >> I was going to ask you what you were doing before; you just answered that. What was the aha moment for realizing you needed to work with the power of Fivetran and Snowflake? You talked about Saks being a legacy company that's obviously been very successful at transforming to the digital age, but what was that one thing where, as the head of data, you felt, this is it? >> Great question. I've worked with Fivetran in the past. This is my third company, same with Snowflake. I actually brought Fivetran into two companies at this point. So from my first experience with both Fivetran and Snowflake, it was, this is where I want to be; this is the stack and the tooling, and just the engineering behind it. So as I moved on to the next company, that was it: I'm bringing the tools with me. And the other thing I wanted to mention: when we evaluate tools for a new platform, we look at things in three dimensions, right? One, cloud first: we want to have cloud-native tools, and they have to be modular, but we also don't want to have too many tools. So Fivetran certainly checks that off. They're cloud native first, and they also have a very long list of connectors. The other thing is, for us it's very important that data engineering effort is spent on actually analyzing data, not building pipelines and supporting infrastructure. Fivetran is reliable, it's secure, it has various connectors, so it checks off that box as well. And another thing is, we're looking for companies we can partner with. So companies that help us grow and grow with us. We look at a company's culture, their maturity, how they treat their customers, and how they innovate. And again, Fivetran checks off that box as well. >> And I imagine Snowflake does as well. Frank Slootman on stage this morning talked about mission alignment, and it seemed to me like, wow, one of the missions of Snowflake is to align with its customers' missions. It sounds like, from the conversations that Dave and I have had today, that it's the same with partners, and it sounds like you have that cultural alignment with Fivetran and Snowflake. >> Oh, absolutely. >> And Fivetran has that, obviously, with 2,000 shared customers. >> Yeah, I think that, well, not quite there yet, but we're close. (laughs) I think that the most important way that we've always been aligned with our customers is that we've been very clear on what we do and don't do, and that our job is to get the data from here to there, and that the data be accurately replicated. Which means, in practice, we often joke that it is exactly as messed up as it was in the source, no better and no worse, but we really will accomplish that task. You do not need to worry about that. You can well and fully delegate it to us. But then, what you do with the data, we don't claim that we're going to solve that problem for you. That's up to you. And anyone who claims that they're going to solve that problem for you, you should be very skeptical. >> So how do you solve that problem? >> Well, that's where modeling comes in, right?
You get data from point A to point B, and it's bad in, bad out. That's it, and that's where we do those reconciliations, and that's where we model our data. We actually try to understand what our business is, how our users talk about data, how they talk about the business. And that's where a data warehouse is important, and in our case, it's Data Vault. >> Talk to me a little bit, before we wrap here, about the benefits to the end user, the consumer. Say I'm on saks.com, I'm looking for a particular item. What is it about this foundation that Saks has built with Fivetran and with Snowflake that's empowering me as a consumer to find what I want and get the transaction done like that? >> So, our end goal is to help our customers, right? Make their experience beautiful, luxurious. We want to make sure that what we put in front of you is what you're looking for, so you can actually make that purchase and be happy with it. So having that data, having that data coming from various different sources into one place, enables us to do that near real-time analytics, so we can help you as a customer find what you're looking for. >> Magic on the back end, delighting customers. >> So the world is still messed up, right? Airlines are out of whack. There are supply imbalances. You've got the situation in Ukraine, with oil prices. The Fed missed the mark. So can data solve these problems? If you think about the context of the macro environment, and you bring it down to what you're seeing at Saks, with your relationship with Fivetran and with Snowflake, do you see light at the end of that confusion tunnel? >> That's such a great question, very philosophical. I don't think data alone can solve it. It's the people looking at data and working together that can solve it. >> I think data can help. Data can't stop a war, but data can help you forecast supply chain misses and mitigate those problems. So data can help. >> It can be a facilitator. >> Sorry, what? >> It can be a facilitator. >> Yeah, it can be a facilitator of whatever you end up doing with it. Data can be used for good or evil. It's ultimately up to the user. >> It's a tool, right? Do you bring a hammer to a gunfight? No. But it's a tool, and in the right hands, for the right purpose, it can definitely help. >> So you have this great foundation; you're able to delight customers, especially from a luxury brand perspective. I imagine that luxury customers have high expectations. What's next for Saks from a data perspective? >> Well, first and foremost, we want to modernize our data platform. We want to make sure we actually bring that near real-time data to our customers. We want to make sure data is reliable and well understood, and that we do the data engineering and the modeling behind the scenes, so that people using our data can rely on it. Because bad data is bad data, and we want to make sure that's very clear. And what's next? The sky's the limit. >> Can you describe your data teams? Is it highly centralized? What's your philosophy in terms of the architecture of the organization? >> So right now we are starting with a centralized team. It just works for us as we're trying to rebuild our platform and modernize it. But as we become more mature and establish our practices, our data governance, our definitions, then I see a future where we decentralize a little bit, and each team actually has their own analytical function, or potentially a data engineering function as well.
>> That'll be an interesting discussion when you get there. >> That's a hot topic. >> It's one of the hardest problems in building a data team: whether to centralize or decentralize. We're still centralized at Fivetran, but the company's now over 1,000 people, and we're starting to feel the strain of that. And inevitably, you eventually have to find a way to create seams and create specialization. >> You just have to be fluid, right? And then go with the company as the company grows and things change. >> Yeah, I've worked with some companies. JPMC is here; they've got a little, I'll call it a skunkworks. They probably understate what they're doing, but they're testing that out. A company like HelloFresh is doing some things 'cause their Hadoop cluster just couldn't scale, so they had to begin to decentralize. It is a hot topic these days, and I'm not sure there's a right or wrong; it's really situational. But I think in a lot of situations, it's maybe the trend. >> Yeah. >> Yeah, I think centralized versus decentralized technology is a different question than centralized versus decentralized teams. >> Yes. >> They're both valid, but they're very different, and sometimes people conflate them, and that's very dangerous, because you might want one to be centralized and the other to be decentralized. >> Well, it's true. And I think a lot of folks look at a centralized team and say, "Hey, it's more efficient to have these specialized roles," but at the same time, what's the outcome? If the outcome can be optimized, and it's maybe a little bit more people-expensive, and the work sits in the lines of business where there's data context, that might be a better solution for a company. >> So to truly understand the value of data, you have to specialize in that specific area. So I see people deep-diving into a specific vertical, or whatever that is, and truly understanding what data they have and how to take advantage of it. >> Well, all this talk about monetization and building data products: you're there, right? >> Yeah. >> You're on the cusp of that. And so, who's going to build those data products? It's going to be somebody in the business. Today they don't own the life cycle of the data. They don't feel responsible for it, but they complain when it's not what they want. And so I feel as though what Snowflake is doing is actually attacking some of those problems. Not 100% there, obviously; a lot of work to do. >> Great analysts are great navigators of organizations, amongst other things. And one of the best things that's happened as part of this evolution, from technology like Hadoop to technology like Snowflake, is that the new stack is a lot simpler. There's a lot less technical knowledge that you need. You still need technical knowledge, but not nearly what you used to, and that has made it accessible to more people, people who bring different skills to the table. And in many cases, the skill you really need to deliver value from data is not, do you know the inner workings of HDFS, but, do you know how to extract from your constituents in the organization a precise version of the question that they're trying to ask? >> We really want them spending their time there; the technical infrastructure is an operational detail. So you can put your teams on those types of questions, not on, how do we make it work? And that's what Hadoop was: "Hey, we got it to work." >> And that's something we're obsessed with.
We're always trying to hide the technical complexities of the problem of data centralization behind the scenes, even if it's harder for us, even if it's more expensive for us. We will pay any cost so that you don't have to see it, because that allows our customers to focus on higher-impact work. >> Well, this is a case where a technology vendor's R&D is making your life easier. >> Veronika: Easier, right. >> I would presume you'd rather spend money to save time than spend engineering time to save money. >> That's true. And at the end of the day, hiring three data engineers to do custom work that a tool does is actually not saving money; it costs more in the end. But to your point, pulling business people into those data teams gives them ownership, and they feel like they're part of the solution. And it's such a great feeling, so they're excited to contribute, they're excited to help us. So I love that the industry's going in that direction. >> And of course, that's the theme of the show: the world around data collaboration. Absolutely critical, guys. Thank you so much for joining Dave and me, talking about Fivetran and Snowflake together, and what you're doing to empower Saks to be a data company. I'm going to absolutely have a different perspective next time I shop there. Thanks for joining us. >> Thank you. >> Dave: Thank you, guys. >> Thank you. >> For our guests and for Dave Vellante, I'm Lisa Martin. You're watching theCUBE live from Snowflake Summit '22, from Vegas. Stick around; our next guest joins us momentarily. (upbeat music)

Published Date : Jun 15 2022

Breaking Analysis: Technology & Architectural Considerations for Data Mesh


 

>> From theCUBE Studios in Palo Alto and Boston, bringing you data-driven insights from theCUBE and ETR, this is Breaking Analysis with Dave Vellante. >> The introduction and socialization of data mesh has caused practitioners, business technology executives, and technologists to pause and ask some probing questions about the organization of their data teams, their data strategies, future investments, and their current architectural approaches. Some in the technology community have embraced the concept, others have twisted the definition, while still others remain oblivious to the momentum building around data mesh. Here we are in the early days of data mesh adoption. Organizations that have taken the plunge will tell you that aligning stakeholders is a non-trivial effort, but necessary to break through the limitations that monolithic data architectures and highly specialized teams have imposed over frustrated business and domain leaders. However, practical data mesh examples often lie in the eyes of the implementer and may not strictly adhere to the principles of data mesh. Now, part of the problem is a lack of open technologies and standards that can accelerate adoption and reduce friction, and that's what we're going to talk about today: some of the key technology and architecture questions around data mesh. Hello, and welcome to this week's Wikibon CUBE Insights, powered by ETR. In this Breaking Analysis, we welcome back the founder of data mesh and director of Emerging Technologies at Thoughtworks, Zhamak Dehghani. Hello, Zhamak. Thanks for being here today. >> Hi Dave, thank you for having me back. It's always a delight to connect and have a conversation. Thank you. >> Great, looking forward to it. Okay, so before we get into the technology details, I just want to quickly share some data from our friends at ETR. You know, despite the importance of data initiatives since the pandemic, CIOs and IT organizations have had to juggle, of course, a few other priorities. This is why, in the survey data, cyber and cloud computing are rated as the two most important priorities. Analytics and machine learning and AI, which are kind of data topics, still make the top of the list, well ahead of many other categories. And look, a sound data architecture and strategy is fundamental to digital transformations, and much of the past two years, as we've often said, has been like a forced march into digital. So while organizations are moving forward, they really have to think hard about the data architecture decisions that they make, because it's going to impact them, Zhamak, for years to come, isn't it? >> Yes, absolutely. I mean, we are slowly moving from reason-based, logical, algorithmic computation to model-based computation and decision making, where we exploit the patterns and signals within the data. So data becomes a very important ingredient, not only of decision making, analytics, and discovering trends, but also of the features and applications that we build for the future. So we can't really ignore it. And as we see, some of the existing challenges around getting value from data are no longer about access to computation; they're about access to trustworthy, reliable data at scale. >> Yeah, and you see these domains coming together with the cloud, and obviously it has to be secure and trusted, and that's why we're here today talking about data mesh. So let's get into it.
Zhamak, first, your new book is out, "Data Mesh: Delivering Data-Driven Value at Scale," just recently published, so congratulations on getting that done. Awesome. Now, in a recent presentation, you pulled excerpts from the book, and we're going to talk through some of the technology and architectural considerations. Just quickly for the audience, the four principles of data mesh: domain-driven ownership, data as a product, self-serve data platform, and federated computational governance. So I want to start with the self-serve platform and some of the data that you shared recently. You say that data mesh serves autonomous, domain-oriented teams, versus existing platforms, which serve a centralized team. Can you elaborate? >> Sure. I mean, the role of the platform is to lower the cognitive load for domain teams, for people who are focusing on the business outcomes, the technologists that are building the applications, to really lower the cognitive load for them to be able to work with data, whether they are building analytics, automated decision making, or intelligent modeling. They need to be able to get access to data and use it. So the role of the platform, I guess, just stepping back for a moment, is to empower and enable these teams. Data mesh, by definition, is a scale-out model. It's a decentralized model that wants to give autonomy to cross-functional teams. So at its core, it requires a set of tools that work really well in that decentralized model. When we look at the existing platforms, they try to achieve a similar outcome, right? Lower the cognitive load, give the tools to data practitioners to manage data at scale. But today, the job of the centralized data teams isn't really directly aligned with any one business unit or business outcome in terms of getting value from data. Their job is to manage the data and make the data available for those cross-functional teams or business units to use. So the platforms they've been given are really centralized around, or tuned to work with, that centralized team structure. Although on the surface it seems, why not? Why can't I use my, you know, cloud storage or computation or data warehouse in a decentralized way? You should be able to, but some changes need to happen to those underlying platforms. As an example, some cloud providers simply have hard limits on the number of storage accounts that you can have, because they never envisaged you would have hundreds of lakes. They envisaged one or two, maybe 10 lakes, right? They envisaged really centralizing data, not decentralizing data. So I think we see a shift in thinking about enabling autonomous, independent teams versus a centralized team. >> So just a follow-up, if I may; we could be here for a while. But this assumes that you've sorted out the organizational considerations, that you've defined what a data product is, and a sub-product. And people will say, of course, we use the term monolithic as a pejorative, let's face it. The data warehouse crowd will say, "Well, that's what data marts did. So we got that covered." But your premise of data mesh, if I understand it, is that whether it's a data mart or a data warehouse, or a data lake or whatever, a Snowflake warehouse, it's a node on the mesh. Okay. So don't build your organization around the technology; let the technology serve the organization. Is that- >> That's a perfect way of putting it, exactly.
I mean, for a very long time, when we've looked at decomposition of complexity, we've decomposed around technology, right? So we start with the technology, and that's maybe a good segue to the next item on that list. Oh, I need to decompose based on whether I want to have access to raw data, and put it on the lake; whether I want to have access to modeled data, and put it on the warehouse. You know, I need to have a team in the middle to move the data around. And then we try to fit the organization into that model. So data mesh really inverses that. As you said, look at the organizational structure first, then the scale boundaries around which your organization and operation can scale, and then, at the second layer, look at the technology and how you decompose it. >> Okay. So let's go to that next point and talk about how you serve and manage autonomous, interoperable data products, where code, data, and policy, you say, are treated as one unit. Whereas your contention is that existing platforms, of course, have independent management and dashboards for catalogs or storage, et cetera. Maybe we double-click on that a bit. >> Yeah. So if you think about that functional or technical decomposition, right, of concerns, that's one way, a very valid way, of decomposing complexity and concerns, and then building independent solutions to address them. That's what we see in the technology landscape today. We see technologies that are taking care of your management of data, bringing your data under some sort of control and modeling. You'll see technology that moves that data around, that performs various transformations and computations on it. And then you see technology that tries to overlay some level of meaning: metadata, understandability, discovery, and then policy, right? So that's where your data processing, kind of pipeline, technologies, versus your data warehouse, storage, and lake technologies, and then the governance, come into play. And over time we decompose and recompose, right? Deconstruct and reconstruct back together. But right now, that's where we stand. I think for data mesh really to become a reality, as in independent sources of data, where teams can responsibly share data in a way that can be understood right then and there, that can impose policies right then, when the data gets accessed, in that source, and in a resilient manner, in a way that changes to the structure of the data, or changes to the schema of the data, don't cause those downstream downtimes, we've got to think about this new nucleus, or new unit, of data sharing. And we need to really bring transformation and the governing of data, and the data itself, back together around these decentralized nodes on the mesh. So that's another, I guess, deconstruction and reconstruction that needs to happen around the technology, to formulate ourselves around the domains, and again, the data and the logic of the data itself, the meaning of the data itself. >> Great. Got it. And we're going to talk more about the importance of data sharing and the implications. But the third point deals with how operational and analytical technologies are constructed. You've got an app dev stack, you've got a data stack. You've made the point many times, actually, that we've contextualized our operational systems, but not our data systems; they remain separate. Maybe you could elaborate on this point. >> Yes. I think this, again, has a historical background.
For a really long time, applications have dealt with features and the logic of running the business, encapsulating the data and the state that they need to run that feature or business function. And then, for anything analytically driven, which required access to data across these applications, and across the longer dimension of time, around different subjects within the organization, for this analytical data we made a decision: "Okay, let's leave those applications aside, let's leave those databases aside. We'll extract the data out, we'll load it, or we'll transform it, and put it under the analytical data stack." And then, downstream from it, we have the analytical data users, the data analysts, the data scientists, and, you know, the growing portfolio of users that use that data stack. And that led to this real separation of a dual stack with point-to-point integration. So applications went down the path of transactional databases or, you know, document stores, using APIs for communicating, and then we've gone to, you know, lake storage or data warehouse on the other side. And that, again, enforces the silo of data versus app, right? So if we are moving to a world where our ambitions are around making applications more intelligent, making them data-driven, these two worlds need to come closer. As in, ML analytics gets embedded into those applications themselves, and data sharing, as a very essential ingredient of that, gets embedded and becomes closer to those applications. So if you are looking at this now cross-functional, app-and-data-based team, right, a business team, then the technology stacks can't be so segregated, right? There has to be a continuum of experience from app delivery, to sharing of the data, to using that data, to embedding models back into those applications. And that continuum of experience requires well-integrated technologies. I'll give you an example; actually, in some sense, we are somewhat moving in that direction. But if we are talking about data sharing or data modeling, applications use one set of APIs, you know, HTTP-compliant, GraphQL or REST APIs. And on the other hand, you have proprietary SQL: connect to my database and run SQL. Those are two very different models of representing and accessing data. So we kind of have to harmonize, or integrate, those two worlds a bit more closely to achieve those domain-oriented, cross-functional teams.
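Her dual-stack contrast can be shown in miniature. Everything in this toy sketch is invented (the endpoint, the file name, the table); it only illustrates that the operational world consumes data through an API contract, while the analytical world binds directly to a physical schema, which is the coupling she wants harmonized.

```python
# A toy contrast of the two access models; all names are hypothetical.
import json
import sqlite3
import urllib.request

def get_customer_via_api(customer_id: int) -> dict:
    # Operational access: one entity through an API contract
    # (REST/GraphQL style); storage details stay hidden behind it.
    url = f"https://api.example.com/customers/{customer_id}"  # invented endpoint
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)

def get_customers_via_sql() -> list:
    # Analytical access: connect to the store itself and bind to its
    # physical schema, the brittle coupling described in the discussion.
    conn = sqlite3.connect("warehouse.db")  # stand-in for a warehouse connection
    try:
        return conn.execute(
            "SELECT customer_id, region, lifetime_value FROM dim_customer"
        ).fetchall()
    finally:
        conn.close()
```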
>> Yeah. We are going to talk about some of the gaps later, and actually you look at them as opportunities more than barriers. They are barriers, but they're opportunities for more innovation. Let's go on to the fourth one, the next point. It deals with the roles that the platform serves. Data mesh proposes that domain experts own the data, take responsibility for it end to end, and are served by the technology; kind of, we referenced that before. Whereas your contention is that today, data systems are really designed for specialists. I think you use the term hyper-specialists a lot; I love that term. And the generalists are kind of passive bystanders, waiting in line for the technical teams to serve them. >> Yes. I mean, if you think about, again, the intention behind data mesh: it was creating a responsible data sharing model that scales out. And I challenge any organization that has scaled ambitions around data, or usage of data, that relies on small pockets of very expensive specialist resources, right? So we have no choice but upskilling and cross-skilling the majority population of our technologists. We often call them generalists, right? That's shorthand for people that can really move from one technology to another technology. Sometimes we call them paint drip people, sometimes we call them T-shaped people. But regardless, we need to have the ability to really mobilize our generalists, and we had to do that at Thoughtworks. We serve a lot of our clients, and like many other organizations, we are also challenged with hiring specialists. So we have tested the model of having a few specialists really conveying and translating the knowledge to generalists, and bringing them forward. And of course, the platform is a big enabler of that. Like, what is the language of using the technology? What are the APIs that delight that generalist experience? This doesn't mean no-code, low-code; we don't have to throw away good engineering practices. And I think good software engineering practices remain. Of course, they get adapted to the world of data to build resilient, you know, sustainable solutions. But specialty, especially around kind of proprietary technology, is going to be a hard one to scale. >> Okay. I'm definitely going to come back and pick your brain on that one. And, you know, to your point about scale-out: in the examples, the practical examples of companies that have implemented data mesh that I've talked to, and there's only a handful that I've really gone deep with, I think in all cases it was their Hadoop instances; their clusters wouldn't scale, and they couldn't scale the business around them. So that's really a key point of a common pattern that we've seen. I think in all cases they went to the data lake model on AWS, and so that maybe has some violation of the principles, but we'll come back to that. But let me go on to the next one. Of course, data mesh leans heavily toward this concept of decentralization, to support domain ownership, over the centralized approaches. And we certainly see the public cloud players and database companies as key actors here, with very large install bases, pushing a centralized approach. So I guess my question is, how realistic is this next point, where you have decentralized technologies ruling the roost? >> I think if you look at the history of places in our industry where decentralization has succeeded, they heavily relied on standardization of connectivity across different components of technology. And I think right now you are right: the way we get value from data relies on collection, at the end of the day, collection of data. Whether you have a deep learning model that you're training, or you have, you know, reports to generate, regardless, the model is: bring your data to a place where you can collect it, so that we can use it. And that leads naturally to a set of technologies that try to operate as a full-stack, integrated, proprietary offering, with no intention of, you know, opening data for sharing. Now, conversely, if you think about the internet itself, the web itself, microservices, even at the enterprise level, not at the planetary level, they succeeded as decentralized technologies to a large degree because of their emphasis on openness and sharing, right? API sharing. In the API world, we don't say, you know, "I will build a platform to manage your logical applications." Maybe to a degree, but we actually moved away from that.
We say, "I'll build a platform that allows your applications to manage your APIs, manage your interfaces," right? Give you access to the API. So I think that definition of decentralization means really composable, open pieces of technology that can play nicely with each other, rather than a full stack that has control of all of your data yet is only decentralized within the boundary of my platform. That's simply not going to scale. If data needs to come from different platforms, different locations, different geographical locations, it needs a rethink. >> Okay, thank you. And then the final point is, data mesh favors technologies that are domain agnostic versus those that are domain aware. And I wonder if you could help me square the circle, because it's nuanced and I'm kind of a 100-level student of your work. But you have said, for example, that the data teams lack context of the domain, so help us understand what you mean here in this case. >> Sure. Absolutely. So as you said, data mesh tries to give autonomy, decision-making power and responsibility to people that have the context of those domains, right? The people that are really familiar with different business domains and, naturally, the data that domain needs, or the data that domain shares. So if the intention of the platform is really to give the power to people with the most relevant and timely context, the platform itself, as a shared component, naturally becomes domain agnostic to a large degree. Of course, "platform" is a (chuckles) fairly overloaded word. If you think about it as a set of technology that abstracts complexity and allows building the next level of solutions on top, those domains may have their own set of platforms that are very much domain aware. But as a generalized, shareable set of technologies or tools that allows us to share data, that piece of technology needs to relinquish the knowledge of the context to the domain teams and actually become domain agnostic. >> Got it. Okay. Makes sense. All right, let's shift gears here and talk about some of the gaps and some of the standards that are needed. You and I have talked about this a little bit before, but this digs deeper. What types of standards are needed? Maybe you could walk us through this graphic, please. >> Sure. What I'm trying to depict here is that if we imagine a world where data can be shared from many different locations, for a variety of analytical use cases, naturally the boundary of what we call a node on the mesh encapsulates internally a fair few pieces. The boundary of that node on the mesh is not just the data itself that it's controlling, updating and maintaining; it's of course the computation and the code that's responsible for that data, and then the policies that continue to govern that data as long as that data exists. So if that's the boundary, then if we shift the focus from implementation details, which we can leave for later, what becomes really important is the seam, the APIs and interfaces that this node exposes. And I think that's where the work needs to be done and the standards are missing. And we want that seam and those interfaces to be open, because that allows, you know, different organizations with different boundaries of trust to share data. Not only to share data by kind of moving that data to yet another location, but to share the data in a way that distributed workloads, distributed analytics, distributed machine learning models can happen on the data where it is.
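As a hedged sketch of what such a node's open seam might look like, the hypothetical output port below shares data together with the metadata that travels with it, and it accepts a computation to run where the data lives rather than handing out its internal schema. All names and fields are invented for illustration.

```python
from typing import Any, Callable

# Hypothetical output port for a mesh node: it exposes a self-describing
# contract (metadata travels with the data) and runs caller-supplied
# computations at the data, instead of leaking an internal storage schema.
class OutputPort:
    def __init__(self, rows: list[dict], metadata: dict):
        self._rows = rows
        self._metadata = metadata  # semantics, freshness, ownership, policy

    def describe(self) -> dict:
        """Share the contract, not the physical layout."""
        return self._metadata

    def compute(self, fn: Callable[[list[dict]], Any]) -> Any:
        """Run the caller's computation on the data, at the data."""
        return fn(self._rows)

port = OutputPort(
    rows=[{"meal": "pasta", "rating": 4.5}, {"meal": "tacos", "rating": 4.8}],
    metadata={"owner": "menu-domain", "freshness": "daily", "pii": False},
)
print(port.describe()["freshness"])                                    # daily
print(port.compute(lambda rows: sum(r["rating"] for r in rows) / len(rows)))
```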
So if you follow that line of thinking, around decentralization and connection of data versus collection of data, I think the very, very important piece of it that needs really deep thinking, and I don't claim that I have done that, is: how do we share data responsibly and sustainably, right? In a way that is not brittle. If you think about it, one of the very common ways we share data today is: I'll give you a JDBC endpoint, or I give you an endpoint to your, you know, database of choice. And now, as a user, you have access to the schema of the underlying data and can run various SQL queries on it. That's very simple and easy to get started with; that's why SQL is an evergreen, you know, standard, or semi-standard, pseudo-standard, that we all use. But it's also very brittle, because we are dependent on an underlying schema and formatting of the data that's been designed to tell the computer how to store and manage the data. So I think the data sharing APIs of the future really need to think about removing these brittle dependencies: think about sharing not only the data, but what we call metadata, I suppose, an additional set of characteristics that is always shared along with the data, to make the data usage, I suppose, ethical and also friendly for the users. And the other element of that data sharing API, I think, is to allow computation to run where the data exists. If you think about SQL again, as a simple, primitive example of computation: when we select, and when we filter, and when we join, the computation is happening on that data. So maybe there is a next level of articulating distributed computation on data that, say, trains models, right? Your language primitives change in a way that allows sophisticated analytical workloads to run on the data more responsibly, with policies and access control enforced. So that output port that I mentioned is simply about next-generation, responsible data sharing APIs, suitable for decentralized analytical workloads. >> So I'm not trying to bait you here, but I have a follow-up as well. So, schema, for all its good, creates constraints. "No schema on write," that didn't work, because it was just a free-for-all and it created the data swamps. But now you have technology companies trying to solve that problem. Take Snowflake, for example, you know, enabling data sharing, but within its proprietary environment. Certainly Databricks is doing something, you know, trying to come at it from its angle, bringing some of the best of the data warehouse together with data science. Is your contention that those remain sort of proprietary, de facto standards, and that what we need is more open standards? Maybe you could comment. >> Sure. I think there are two points. One is, as you mentioned, open standards that allow... actually make the underlying platform invisible. I mean, my litmus test for a technology provider to say "I'm data mesh" (laughs) kind of compliant is: is your platform invisible? As in, can I replace it with another and yet get the similar data sharing experience that I need? So part of it is that. Part of it is open standards; they're not really proprietary.
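One way to picture that "invisible," domain-agnostic platform is the hypothetical sketch below: the shared component knows nothing about any business domain and hides behind a minimal interface, so the domain team supplies all of the context and the implementation could, in principle, be swapped for another. Every name here is invented.

```python
from typing import Callable

# Hypothetical domain-agnostic platform primitive. It can register and serve
# any data product, but every piece of domain context (the name, the schema,
# the access policy) comes from the owning domain team. Because callers only
# touch register/write/read, the implementation behind them is replaceable.
class MeshPlatform:
    def __init__(self):
        self._products: dict[str, dict] = {}

    def register(self, name: str, schema: dict, policy: Callable[[str], bool]):
        self._products[name] = {"schema": schema, "policy": policy, "rows": []}

    def write(self, name: str, row: dict):
        self._products[name]["rows"].append(row)

    def read(self, name: str, user: str) -> list[dict]:
        product = self._products[name]
        if not product["policy"](user):  # the policy travels with the data
            raise PermissionError(user)
        return product["rows"]

platform = MeshPlatform()
# The 'orders' team brings the domain knowledge; the platform stays generic.
platform.register("orders.daily", {"order_id": "str", "total": "float"},
                  policy=lambda user: user.endswith("@orders-team"))
platform.write("orders.daily", {"order_id": "o1", "total": 42.0})
print(platform.read("orders.daily", "ana@orders-team"))
```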
The other angle, for kind of sharing data across different platforms so that, you know, we don't get stuck with one technology or another, is around APIs. It is around code that is protecting that internal schema. Where we are on the curve of evolution of technology right now, we are exposing the internal structure of the data, structure that was designed to optimize certain modes of access, to the end client and application APIs, right? So the APIs that use the data today are very much aware that this database was optimized for machine learning workloads, hence you will deal with a columnar storage format, versus this other API that is optimized for a very different, report-type, relational access and is organized around rows. I think that should become irrelevant in the data sharing APIs of the future, because as a user, I shouldn't care how this data is internally optimized, right? The language primitives that I'm using should be really agnostic to the machine optimization underneath. And if we did that, perhaps this war between the warehouse and the lake and the rest would actually become irrelevant. We're optimizing for the best human experience, as opposed to the best machine experience. We still have to do the latter, but we have to make it invisible, make it an implementation concern. So that's another angle of what should... if we daydream together, the best and most resilient experience in terms of data usage is one where these APIs are agnostic to the internal storage structure.
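Here is a tiny, hypothetical illustration of that point in Python: the caller asks one logical question, and whether the answer comes from a column-major layout (good for scans and ML) or a row-major layout (good for point reads and reports) is a machine optimization the API never leaks. The data is made up.

```python
# Hypothetical layout-agnostic read: the same logical call is answered from
# a column-major store or a row-major store, and the caller never sees which.
COLUMNAR = {"meal": ["pasta", "tacos"], "rating": [4.5, 4.8]}  # column-major
ROWS = [{"meal": "pasta", "rating": 4.5}, {"meal": "tacos", "rating": 4.8}]

def read_column(name: str, *, use_columnar: bool) -> list:
    if use_columnar:
        return COLUMNAR[name]           # one contiguous column
    return [row[name] for row in ROWS]  # gather the field across rows

# Same logical request, same answer, different machine optimization beneath.
assert read_column("rating", use_columnar=True) == read_column("rating", use_columnar=False)
print(sum(read_column("rating", use_columnar=True)) / 2)
```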
>> Great, thank you for that. We've gotten our ankles wet now on the controversy, so we might as well wade all the way in; I can't let you go without addressing some of this, which you've catalyzed and which, by the way, I see as a sign of progress. So this gentleman, Paul Andrew, is an architect, and he gave a presentation, I think last night. And he teased it as, quote, "The theory from Zhamak Dehghani versus the practical experience of a technical architect, AKA me," meaning him. And Zhamak, you were quick to shoot back that data mesh is not theory, it's based on practice, and some practices are experimental, some are more baked, and data mesh really avoids, by design, the specificity of vendor or technology; "perhaps you intend to frame your post as a technology- or vendor-specific implementation." So touché, that was excellent. (Zhamak laughs) Now, you don't need me to defend you, but I will anyway. You spent 14-plus years as a software engineer and the better part of a decade consulting with some of the most technically advanced companies in the world. But I'm going to push you a little bit here and say: some of this tension is of your own making, because you purposefully don't talk about technologies and vendors, and sometimes doing so is instructive for us neophytes. So why don't you ever use specific examples of technology as frames of reference? >> Yes. My role is to push us to the next level. So, you know, everybody picks their fights, picks their battles. My role in this battle is to push us to think beyond what's available today. Of course, that's my public persona. On a day-to-day basis, I actually work with clients and existing technology, and at Thoughtworks we gave a case study talk, with a colleague of mine, where I intentionally got him to talk about (indistinct), to talk about the technology that we used to implement data mesh. And there's a reason I haven't really embraced, in my conversations, specific technology. One is, I feel the technology solutions we're using today are still not ready for the vision. I mean, we have to be in this transitional step no matter what; we have to be pragmatic, of course, and practical, I suppose, and use the existing vendors that exist, and I wholeheartedly embrace that, but that's just not my role, to show that. I've gone through this transformation once before in my life. When microservices happened, we were building microservices-like architectures with technology that wasn't ready for it: big web application servers that were designed to run giant monolithic applications, and now we were trying to run little microservices on them. And the tail was wagging the dog; the environmental complexity of running these services was consuming so much of our effort that we couldn't really pay attention to the business logic, the business value. And that's where we are today. The complexity of integrating existing technologies is really overwhelming, capturing a lot of our attention and cost, money and effort, as opposed to really focusing on the data products themselves. So that's the role I have. But it doesn't mean that, you know, we have to rebuild the world; we've got to do with what we have in this transitional phase, until the new generation of technologies, I guess, comes around and reshapes our landscape of tools. >> Well, impressive public discipline. Your point about microservices is interesting, because a lot of those early microservices weren't so micro, and for the naysayers, look, past is not necessarily prologue here. Thoughtworks was really early on in the whole concept of microservices, so I'll be very excited to see how this plays out. But there were some other good comments. There was one from a gentleman who said the most interesting aspects of data mesh are organizational, and that's how my colleague Sanji Mohan frames data mesh versus data fabric. You know, I'm not sure; I think we've only scratched the surface today, and data mesh is more than that. And I still think data fabric is what NetApp defined as software-defined storage infrastructure that can serve on-prem and public cloud workloads, back in, whatever, 2016. But the point you make in the thread that we're showing you here is a warning, and you referenced this earlier: that segregating different modes of access will lead to fragmentation, and we don't want to repeat the mistakes of the past. >> Yes, there were comments around that. Again, going back to that original conversation: we have got this tendency, at a macro level, to decompose complexity based on technical solutions. And, you know, the conversation could be, "Oh, I do batch and you do stream, and we are different." We create these bifurcations in our decisions based on the technology, where I do events and you do tables, right? So that sort of segregation of modes of access causes accidental complexity that we keep dealing with, because every time you create a new branch in this tree, you create a new set of tools that then somehow need to be point-to-point integrated, and you create new specialization around that. So the fewer branches we have, the better; think really about the continuum of experiences that we need to create, and about technologies that simplify that continuum of experience. Let me give you an example from past experience. I was really excited about the papers and the work that came out around Apache Beam, and generally around flow-based programming and stream processing, because basically they were saying: whether you are doing batch or whether you're doing streaming, it's all one stream. Sometimes the window of time over which you're computing narrows, and sometimes that window widens, and at the end of the day you are just doing stream processing. It is those sorts of notions, ones that simplify and create a continuum of experience, that resonate with me personally, more than creating these tribal fights of this type versus that mode of access. So that's why data mesh naturally selects kind of this multimodal access to support the end users, right? The personas of end users.
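For readers who haven't seen the Beam model, here is a minimal sketch of the idea referenced above: the same pipeline code handles bounded ("batch") and unbounded ("streaming") data, and the window over which the computation happens is the only knob that changes. The events and timestamps are made up, and a real streaming job would swap the bounded source for something like a Pub/Sub read.

```python
import apache_beam as beam
from apache_beam.transforms import window

# Made-up (key, value) events; the value doubles as a fake event timestamp.
events = [("meals", 1), ("meals", 2), ("meals", 5)]

with beam.Pipeline() as p:
    (p
     | beam.Create(events)  # bounded source; a streaming read slots in here
     | beam.Map(lambda kv: window.TimestampedValue(kv, kv[1]))
     | beam.WindowInto(window.FixedWindows(60))  # narrow or widen this window
     | beam.CombinePerKey(sum)                   # same code, batch or stream
     | beam.Map(print))
```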
>> Okay. So the last topic I want to hit: this whole discussion, the topic of data mesh, is highly nuanced, it's new, and people are going to shoehorn data mesh into their respective views of the world. We talked about lakehouses, and there's three buckets. And of course the gentleman from LinkedIn; and with Azure, Microsoft has a data mesh community. So you're going to have to enlist a serious army of enforcers to adjudicate. And I wrote some of this stuff down; I mean, it's interesting: Monte Carlo has a data mesh calculator, Starburst is leaning in, ChaosSearch sees themselves as an enabler, and Oracle and Snowflake both use the term data mesh. And then of course you've got big practitioners: JPMC, and we've talked to Intuit, Orlando; HelloFresh has been on; Netflix has this event-based, sort of streaming implementation. So my question is: how realistic is it that the clarity of your vision can be implemented and not polluted by really rich technology companies and others? (Zhamak laughs) >> Is it even possible, right? Is it even possible? That's why I'm a practitioner; this is why I practice these things. Because I think it's going to be hard. What I'm hopeful about is the socio-technical framing; as I've mentioned, this is a socio-technical concern, or solution, not just a technology solution. Hopefully that always brings us back to, you know, the reality, because vendors will try to sell you snake oil that solves all of your problems. (chuckles) All of your data mesh problems. It's just going to cause more problems down the track. So we'll see; time will tell, Dave, and I count on you as one of those (laughs) folks that will continue to share their platform. To go back to the roots: the "why" in the first place. I mean, I dedicated a whole part of the book to "why," because, as you said, we get carried away with vendors and technology solutions trying to ride a wave, and in that story we forget the reason for which we're even making this change and spending all of these resources. So hopefully we can always come back to that. >> Yeah. And I think we can. I think you have really given this some deep thought, and as we pointed out, this was based on practical knowledge and experience. And look, we've been trying to solve this data problem for a long, long time. You've not only articulated it well, but you've come up with solutions. So Zhamak, thank you so much. We're going to leave it there, and I'd love to have you back. >> Thank you for the conversation. I really enjoyed it. And thank you for sharing your platform to talk about data mesh. >> Yeah, you bet. All right, and I want to thank my colleague Stephanie Chan, who helps research topics for us.
Alex Myerson is on production and Kristen Martin, Cheryl Knight and Rob Hoff on editorial. Remember all these episodes are available as podcasts, wherever you listen. And all you got to do is search Breaking Analysis Podcast. Check out ETR's website at etr.ai for all the data. And we publish a full report every week on wikibon.com, siliconangle.com. You can reach me by email david.vellante@siliconangle.com or DM me @dvellante. Hit us up on our LinkedIn post. This is Dave Vellante for theCUBE Insights powered by ETR. Have a great week, stay safe, be well. And we'll see you next time. (bright music)

Published Date : Apr 20 2022

Breaking Analysis: Thinking Outside the Box...AWS signals a new era for storage


 

From the CUBE studios in Palo Alto and Boston, bringing you data-driven insights from theCUBE and ETR, this is Breaking Analysis with Dave Vellante.

By our estimates, AWS will generate around nine billion dollars in storage revenue this year and is now the second largest supplier of enterprise storage behind Dell. We believe AWS storage revenue will hit $11 billion in 2022 and continue to outpace on-prem storage growth by more than a thousand basis points for the next three to four years. At its third annual Storage Day event, AWS signaled a continued drive to think differently about data storage and to transform the way customers migrate, manage and add value to their data over the next decade. Hello, and welcome to this week's Wikibon CUBE Insights, powered by ETR. In this Breaking Analysis we'll give you a brief overview of what we learned at AWS's Storage Day, share our assessment of the big announcement of the day, a deal with NetApp to run ONTAP natively in the cloud as a managed service, and share some new data on how we see the market evolving, with AWS executive perspectives on its strategy, how it thinks about hybrid, and where it fits into the emerging data mesh conversation.

Let's start with a snapshot of the announcements made at Storage Day. As with most AWS events, this one had a number of announcements, introduced at a pace that was predictably fast and oftentimes hard to follow. Here's a quick list of most of them, with some comments on each. The big, big news is the announcement with NetApp. NetApp and AWS have engineered a solution which ports the rich NetApp stack onto AWS, delivered as a fully managed service. This is a big deal because previously customers had to make a trade-off: either settle for a cloud-based file service with less functionality than you could get with NetApp on-prem, or lose the agility and elasticity of the cloud and the whole pay-by-the-drink model. Now customers can get access to a fully functional NetApp stack, with services like data reduction, snaps, clones, full multi-protocol support and replication, all the services ONTAP delivers, in the cloud, as a managed service, through the AWS console. Our estimate is that 80% of on-prem data is stored in file format (that's the data, not the revenue), and while we all know about S3 object storage, the biggest market from a capacity standpoint is file storage. This announcement reminds us quite a bit of the VMware Cloud on AWS deal, but applied to storage. NetApp's Anthony Lye told me, "Dave, this is bigger," and we're going to come back to that in a moment.

AWS announced S3 Multi-Region Access Points, a service that optimizes storage performance. It takes into account latency, network congestion and the location of data copies to deliver data via the best route and ensure the best performance. This is something we've talked about for quite some time: using metadata to optimize access. AWS also announced improvements to S3 tiering, where it will no longer charge for small objects of less than 128KB; so, for example, customers won't be charged for most metadata and other smaller objects. Remember, AWS years ago hired a bunch of EMC engineers, and those guys built a lot of tiering functionality into their boxes; we'll come back to that later in this episode.
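To give a flavor of how the Multi-Region Access Point surfaces to developers, here is a hedged boto3 sketch. The account ID and MRAP alias below are invented, it assumes an access point has already been created over buckets in several regions, and note that MRAP requests are signed with SigV4A, which in practice means installing botocore's CRT extras.

```python
# pip install "boto3[crt]"   (SigV4A signing for multi-region access points)
import boto3

s3 = boto3.client("s3")

# Hypothetical multi-region access point ARN; it stands in wherever a bucket
# name would normally go, and S3 routes the request to an optimal regional
# copy based on proximity and congestion.
MRAP_ARN = "arn:aws:s3::123456789012:accesspoint/mfzwi23gnjvgw.mrap"

resp = s3.get_object(Bucket=MRAP_ARN, Key="meals/2021/08/events.json")
print(resp["Body"].read()[:200])
```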
AWS also announced backup and monitoring tools to ensure backups are in compliance with regulations and corporate edicts; this, frankly, is table stakes and was overdue in my view. And AWS made a number of other announcements that have been well covered in the press, around block storage and simplified data migration tools, so we'll leave those to your perusal through other outlets.

I want to come back to the big picture on the market dynamics. As we've reported in previous Breaking Analysis segments, AWS storage revenue is on a path to $10 billion; we reported this last year. This chart puts the market in context. It shows our estimates for worldwide enterprise storage revenue in calendar year 2021; this data is meant to include all storage revenue, primary, secondary and archival, plus related maintenance services. Dell is the leader in the $60 billion market, with AWS now hot on its tail at 15% of the market in terms of the way we've cut it. In the pre-cloud days, customers would tell us their storage strategy was "we buy EMC for block and NetApp for file," keeping it simple. While remnants of this past habit continue, the market is definitely changing. The companies highlighted in red represent the growing hyperscaler presence, and as you can see in the pie on the right, they now account for around 25 percent of the market, and they're growing much, much faster than the on-prem vendors, well over that thousand basis points when you combine them all. A couple of other things to note in the data: we're excluding Kyndryl, IBM's spinout, from IBM's figures, but including our estimates of storage software (for example, Spectrum Protect) that is sold as part of the IBM cloud but not reported in IBM's income statement. By the way, pre-Kyndryl spin, IBM's storage business, we believe, would approach the size of NetApp's. In yellow we've highlighted the portion of hyperconverged that comprises storage; this includes VMware, Nutanix, Cisco and others. VMware and Nutanix are the largest HCI players, but in total the storage piece of that market is less than two billion. Okay, so the way to look at this market is changing: traditional on-prem is vying for budgets with cloud storage services, which are rapidly gaining presence in the market, and we're seeing the on-prem piece evolve into as-a-service models with HPE's GreenLake, Dell's Apex and other on-prem, cloud-like models.

Now let's come back to the NetApp-AWS deal. NetApp, as we know, is the gold standard for file services. They've been the market leader for a long, long time, and other than Pure, which is considerably smaller, NetApp is the one company that consistently was able to beat EMC in the market. EMC developed its own NAS business, and it bought Isilon, with its excellent global file system, to compete with NetApp. But generally, NetApp remains the best file storage company today. Emerging disruptors like Qumulo, VAST and Weka would take issue with this statement, and rightly so, as they have really promising technology, but NetApp remains the king of the file hill. NetApp, however, has had some serious headwinds as the largest independent storage player, as seen in this ETR chart. The data shows a nine-year view of NetApp's presence in the ETR survey. Presence is referred to by ETR as "market share," though it's not traditional market share; it measures the pervasiveness of responses in the ETR survey of over a thousand customers each quarter, essentially the percentage of mentions that NetApp is getting. And you can see that while NetApp remains a leader, it has had a difficult time expanding its TAM, and it has become, frankly, less relevant in the grand scheme and in the eyes of IT buyers.
The company hit headwinds when it began migrating its base to ONTAP 8 and was late riding a number of new waves, including flash, but generally it has recovered from those headwinds, and it's really now focused on the cloud opportunity, as evidenced by this deal with AWS. Now, as I said earlier, NetApp EVP Anthony Lye told me that this deal is bigger than VMware Cloud on AWS. Like me, you may be wondering how that can be. VMware is the leader in the data center; it has half a million customers, and its deal with AWS has been a tremendous success, as seen in this ETR chart. The data here shows spending momentum, or Net Score, from when VMware Cloud on AWS was first picked up in the ETR surveys, with a meaningful N that today is approaching 100 responses. The yellow line is there for context: it's VMware's overall business, buyers who responded on VMware generally, versus specifically VMware Cloud on AWS. You see VMware overall has a huge presence in the survey, more than 600 N. The red line is VMware Cloud on AWS, and that red dotted line is my magic 40% mark; anything above that line we consider elevated Net Score, or spending velocity. And while we saw some deceleration earlier this year in that top line, VMware Cloud on AWS has been consistently showing well in the survey, well above that 40 percent line. So could this NetApp deal be bigger than VMware Cloud on AWS? Well, probably not, in our view. But we like the strategy of NetApp going cloud native on AWS, and AWS's commitment to deliver this as a managed service. Now, where it could get interesting is across clouds. In other words, if NetApp can take a page out of Snowflake's playbook and build an abstraction layer that hides the underlying complexity of not only the AWS cloud but also GCP and Azure, where you log into the NetApp data cloud, if you will (just go ahead and steal that from Snowflake), and NetApp optimizes your on-prem, your AWS, your Azure and/or your GCP file storage, we see that as a winning strategy that could dramatically expand NetApp's TAM. Politically it may not sit well with AWS, but so what; NetApp has to go multi-cloud to expand that TAM. When the VMware deal was announced, many people felt it was a one-way street where all the benefit would eventually accrue to AWS. In reality, this has certainly been a near-term winner for AWS and VMware and, importantly, for joint customers. Longer term it's clearly going to be a win for AWS, because it gets access to VMware's customer base, but we also think it will serve VMware well, because it gives the company a clear and concise cloud strategy, especially if it can go across clouds and eventually get to the edge. So with this NetApp-AWS deal, will it be as big? Probably not, in our view. But it is big: NetApp, in our view, just leapfrogged the competition because of the deep engineering commitment AWS has made. This isn't a marketplace deal; it's a native managed service, and we think that's pretty huge.

Okay, we're going to close with a few thoughts on AWS's storage strategy and some other thoughts on hybrid, on capturing mission-critical workloads, and on where AWS fits in the overall data mesh conversation, which is one of our favorite topics. First let's talk about AWS's storage strategy overall. As with other services, AWS's approach is to give builders access to tools at a very granular level. That does mean a lot of APIs and access to primitives that are essentially building blocks.
While this may require greater developer skills, it also allows AWS to get to market quickly and add functionality faster than the competition. Enterprises, however, will pay up for solutions, so this leaves some nice white space for partners and also for competitors, especially the on-prem folks. But let's hear from an AWS executive. I spoke with Mai-Lan Tomsen Bukovec, an AWS VP, on theCUBE, and asked her to describe AWS's storage strategy. Here's what she said; play the clip:

"We are dynamically and constantly evolving our AWS storage services based on what the application and the customer want. That is fundamentally what we do every day. We talked a little bit about those deployments that are happening right now, Dave. That idea of constant, dynamic evolution just can't be replicated by on-premises, where you buy a box and it sits in your data center for three or more years. And what's unique about us among the cloud services is, again, that perspective of the 15 years where we are building applications in ways that are unique, because we have more customers, and we have more customers doing more things. So, you know, I've said this before: it's all about speed of innovation, Dave. Time and change wait for no one, and if you're a business and you're trying to transform your business and base it on a set of technologies that change rapidly, you have to use AWS services. I mean, if you look at some of the launches that we talk about today and you think about S3's Multi-Region Access Points, that's a fundamental change for customers that want to store copies of their data in any number of different regions and get a 60% performance improvement by leveraging the technology that we've built up over time, the ability for us to intelligently route requests across our network. That, and FSx for NetApp ONTAP; nobody else has these capabilities today. And it's because we are at the forefront of talking to different customers, and that dynamic evolution of storage, that's the core of our strategy."

So, as you can hear and see from Mai-Lan's statements, these guys have a think-outside-the-box mentality. At the end of the day, customers want rock-solid storage that's dirt cheap and lightning fast. They always have, and they always will. But what I'm hearing from AWS is that they think about delivering these capabilities in the broader context of an application or a business. Think deeper business integration. Not that the traditional suppliers don't think about that as well, but the cloud services mentality is different from dropping off a box at a loading dock, turning it over to a professional services organization and moving on to the next deal. Now, I also had a chance to speak with Wayne Duso, another AWS VP in the storage group. Wayne is a longtime tech athlete; for years he was responsible for building storage arrays at EMC. AWS, as I said, hired a bunch of EMC engineers years ago, and those guys did a lot of tiered storage. So I asked Wayne: what's the difference in mentality when you're building boxes versus cloud services? Here's what he said:

"You have physical constraints. You have to worry about the physical resources on that device for the life of that device, which is years. Think about what changes in three or five years; think about the last two years alone and what's changed. Can you imagine being constrained to only having boxes available to you during these last two years, versus having the cloud and being able to expand or contract based on your business needs? That would be really tough, right?
And it has been tough, and that's why we've seen customers from every industry accelerate their use of the cloud during these last two years." So I get that; what's your mindset when you're building storage services and data services? "Each of the services that we have, object, block, file, movement services, data services, provides very specific customer value, and each is deeply integrated with the rest of AWS, so that when you need object services, you start using them and the integrations come along with you. If you're using traditional block, we talked about EBS io2 Block Express. When you're using file, just the example alone today with ONTAP: you get to use what you need when you need it, in the way that you're used to using it, without any concern." So the big difference is no constraints in the box, but lots of opportunities to blend in with other services.

Now, all that said, there are cases where the box is going to win, because of locality and physics and latency issues. Particularly where latency is king, that's where a box is going to be advantageous, and we'll come back to that in a bit. Okay, but what about hybrid? How does AWS think about hybrid and on-prem? Here's my take, and then let's hear from Mai-Lan again. The cloud is expanding; it's moving out to the edge, and AWS looks at the data center as just another edge node. It's bringing its infrastructure-as-code mentality to that edge and, of course, to data centers. So if AWS is truly customer-centric, which we believe it is, it will naturally have to accommodate on-prem use cases, and it is doing just that. Here's how Mai-Lan Tomsen Bukovec explained how AWS is thinking about hybrid; roll the clip:

"For us, Dave, it always comes back to what the customer is asking for. We were talking to customers, and they were talking about their edge and what they wanted to do with it. We said: how are we going to help? So if I just take S3 for Outposts as an example, or EBS on Outposts: we have customers like Morningstar, and Morningstar wants Outposts because they are using it as a step in their journey to being on the cloud. If you take a customer like First Abu Dhabi Bank, they're using Outposts because they need data residency for their compliance requirements. And then we have other customers that are using Outposts, like DISH Network as an example, to place the storage as close as they can to the applications, for low latency. All of those are customer-driven requirements for their architecture. For us, Dave, we think in the fullness of time every customer and all applications are going to be on the cloud, because it makes sense, and those businesses need that speed of innovation. But when we build things like our announcement today of FSx for NetApp ONTAP, we build them because customers asked us to help them with their journey to the cloud, just like we built S3 and EBS for Outposts for the same reason."
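Since FSx for NetApp ONTAP is the marquee announcement here, a brief, hedged sketch of what provisioning it programmatically might look like via boto3 follows. The subnet IDs and sizing are invented placeholders, and the exact parameter set should be verified against the current FSx documentation.

```python
import boto3

fsx = boto3.client("fsx", region_name="us-east-1")

# Hypothetical values for illustration only; a real deployment needs your
# own VPC subnets, sizing and credentials.
resp = fsx.create_file_system(
    FileSystemType="ONTAP",
    StorageCapacity=1024,                # GiB of SSD storage
    SubnetIds=["subnet-0abc1234", "subnet-0def5678"],
    OntapConfiguration={
        "DeploymentType": "MULTI_AZ_1",  # HA pair spanning two AZs
        "ThroughputCapacity": 512,       # MB/s
        "PreferredSubnetId": "subnet-0abc1234",
    },
)
print(resp["FileSystem"]["FileSystemId"])
```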
So look, this is a case where the box or the appliance wins. Latency matters, as we said, and AWS gets that. This is where Matt Baker of Dell is right: it's not a zero-sum game, and that's especially accurate as it pertains to the cloud-versus-on-prem discussion. But a budget dollar is a budget dollar, and the dollar can't go to two places, so the battle will come down to who has the best solution and the best relationships, and who can deliver the most rock-solid storage at the lowest cost and highest performance.

Let's take a look at mission-critical workloads for a second. We're seeing AWS go after these; it's doing it with databases, and it's doing it with block storage. We're talking about Oracle, SAP, Microsoft SQL Server, DB2, that kind of stuff: high-volume OLTP transactions, mission-critical work. Now, there's no doubt that AWS is picking up a lot of low-hanging fruit with business-critical workloads, but the really hard-to-move work isn't going without a fight; frankly, it's not going that fast. AWS has made some improvements to block storage to remove some of the related challenges, but generally we see a very long road ahead for AWS and the other cloud suppliers. Oracle is the king of mission-critical work, along with IBM mainframes, and those infrastructures generally are not easy to move to the cloud. It's too risky, it's too expensive, and the business case oftentimes isn't there, because very frequently you'd have to freeze applications to do it. Generally, what people are doing is building an abstraction layer over that estate, putting that abstraction layer maybe in the cloud and building new apps there that can connect to the back end, but that back end is largely cemented and fossilized. Look, it's all in the definition. No doubt there's plenty of mission-critical work that is going to move, but it really depends on how you define it; even AWS struggles to move its most critical transaction systems off of Oracle. We'll continue to keep an open mind there, but today, for the most mission-critical workloads as we define them, we don't see a lot of movement to the hyperscale clouds.

And we're going to close with some thoughts on data mesh, one of our favorite topics. We've written extensively about this, we've interviewed and are collaborating with Zhamak Dehghani, who coined the term, and we've announced a media collaboration with the data mesh community; we believe it's a strong direction for the industry. So we wanted to understand how AWS thinks about data mesh and where it fits in the conversation. Here's what Mai-Lan had to say about that; play the clip:

"We have customers today that are taking data mesh architectures and implementing them with AWS services. And Dave, I want to go back to the start of Amazon. When Amazon first began, we grew because the Amazon technologies were built as microservices. Fundamentally, a data mesh is about separation, or abstraction, of what individual components do. So if I look at data mesh, really you're talking about two things: you're talking about separating the data storage and the characteristics of the data from the data services that interact and operate on that storage, and with data mesh it's all about making sure that the decentralized business model can work with that data. Now, our AWS customers are putting their storage in a centralized place, because it's easier to track, it's easier to view compliance, and it's easier to predict growth and control costs. But we started with building blocks, and we deliberately built our storage services separate from our data services. So we have data services like Lake Formation and Glue; we have a number of these data services that our customers are using to build that customized data mesh on top of that centralized storage. So really, at the end of the day, it's about speed, it's about innovation, and it's about making sure that you can decentralize and separate your data services from your storage, so businesses can go faster."
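As a concrete, hedged illustration of that "decentralized governance over centralized storage" pattern: with Lake Formation, a central catalog can grant a domain team its own fine-grained permissions on a shared table, so the domain governs use of its data while the storage stays in one place. The role ARN, database and table names below are hypothetical.

```python
import boto3

lf = boto3.client("lakeformation", region_name="us-east-1")

# Hypothetical principal and catalog objects: grant the 'orders' domain team
# SELECT on its table in the shared catalog, so the domain controls access
# to its data product even though the storage itself is centralized.
lf.grant_permissions(
    Principal={
        "DataLakePrincipalIdentifier": (
            "arn:aws:iam::123456789012:role/orders-domain-analyst"
        )
    },
    Resource={
        "Table": {
            "DatabaseName": "orders_domain",
            "Name": "daily_orders",
        }
    },
    Permissions=["SELECT"],
)
```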
So it's very true that AWS has customers that are implementing data mesh. Or data mess: data mesh can be a data mess if you don't do it right. JPMorgan Chase is a firm that is doing it right; we've covered that, and they've got a great video out there, so check out the Breaking Analysis archive and you'll see it. HelloFresh has also initiated a data mesh architecture in the cloud, and several others are starting to pop up. I think the point is that the issues and challenges around data mesh are more organizational and process-related, and less focused on the technology platform. Look, data by its very nature is decentralized. So when Mai-Lan talks about customers building on centralized storage, that's a logical view of the storage, not necessarily physically centralized; it may be in a hybrid device, or a copy may live outside of that same physical location. This is an important point. As JPMorgan Chase pointed out, the data mesh must accommodate data products and services that are in the cloud and also on-prem; it's got to be inclusive. The data mesh looks at the data store as a node on the mesh, and that node shouldn't be confined by the technology, whether it's a data warehouse, a data hub, a data mart or an S3 bucket. So I would say this: while people think of the cloud as a centralized walled garden, and in many respects it is, that very same cloud is expanding into a massively distributed architecture, and that fits with the data mesh architectural model. As I say, the big challenges of data mesh are less technical and more cultural, and we're super excited to see how data mesh plays out over time; we're really excited to be part of the community and a media partner of the data mesh community.

Okay, that's it for now. Remember, I publish each week on wikibon.com and siliconangle.com, and these episodes are all available as podcasts; all you've got to do is search "Breaking Analysis podcast." You can always connect on Twitter, I'm @dvellante, or email me at david.vellante@siliconangle.com. I appreciate the comments you guys make on LinkedIn, and don't forget to check out etr.plus for all the survey action. This is Dave Vellante for theCUBE Insights, powered by ETR. Be well, and we'll see you next time. (bright music)

Published Date : Sep 3 2021
