Predictions 2022: Top Analysts See the Future of Data
(bright music) >> In the 2010s, organizations became keenly aware that data would become the key ingredient to driving competitive advantage, differentiation, and growth. But to this day, putting data to work remains a difficult challenge for many, if not most organizations. Now, as the cloud matures, it has become a game changer for data practitioners by making cheap storage and massive processing power readily accessible. We've also seen better tooling in the form of data workflows, streaming, machine intelligence, AI, developer tools, security, observability, automation, new databases and the like. These innovations they accelerate data proficiency, but at the same time, they add complexity for practitioners. Data lakes, data hubs, data warehouses, data marts, data fabrics, data meshes, data catalogs, data oceans are forming, they're evolving and exploding onto the scene. So in an effort to bring perspective to the sea of optionality, we've brought together the brightest minds in the data analyst community to discuss how data management is morphing and what practitioners should expect in 2022 and beyond. Hello everyone, my name is Dave Velannte with theCUBE, and I'd like to welcome you to a special Cube presentation, analysts predictions 2022: the future of data management. We've gathered six of the best analysts in data and data management who are going to present and discuss their top predictions and trends for 2022 in the first half of this decade. Let me introduce our six power panelists. Sanjeev Mohan is former Gartner Analyst and Principal at SanjMo. Tony Baer, principal at dbInsight, Carl Olofson is well-known Research Vice President with IDC, Dave Menninger is Senior Vice President and Research Director at Ventana Research, Brad Shimmin, Chief Analyst, AI Platforms, Analytics and Data Management at Omdia and Doug Henschen, Vice President and Principal Analyst at Constellation Research. Gentlemen, welcome to the program and thanks for coming on theCUBE today. >> Great to be here. >> Thank you. >> All right, here's the format we're going to use. I as moderator, I'm going to call on each analyst separately who then will deliver their prediction or mega trend, and then in the interest of time management and pace, two analysts will have the opportunity to comment. If we have more time, we'll elongate it, but let's get started right away. Sanjeev Mohan, please kick it off. You want to talk about governance, go ahead sir. >> Thank you Dave. I believe that data governance which we've been talking about for many years is now not only going to be mainstream, it's going to be table stakes. And all the things that you mentioned, you know, the data, ocean data lake, lake houses, data fabric, meshes, the common glue is metadata. If we don't understand what data we have and we are governing it, there is no way we can manage it. So we saw Informatica went public last year after a hiatus of six. I'm predicting that this year we see some more companies go public. My bet is on Culebra, most likely and maybe Alation we'll see go public this year. I'm also predicting that the scope of data governance is going to expand beyond just data. It's not just data and reports. We are going to see more transformations like spark jawsxxxxx, Python even Air Flow. We're going to see more of a streaming data. So from Kafka Schema Registry, for example. We will see AI models become part of this whole governance suite. So the governance suite is going to be very comprehensive, very detailed lineage, impact analysis, and then even expand into data quality. We already seen that happen with some of the tools where they are buying these smaller companies and bringing in data quality monitoring and integrating it with metadata management, data catalogs, also data access governance. So what we are going to see is that once the data governance platforms become the key entry point into these modern architectures, I'm predicting that the usage, the number of users of a data catalog is going to exceed that of a BI tool. That will take time and we already seen that trajectory. Right now if you look at BI tools, I would say there a hundred users to BI tool to one data catalog. And I see that evening out over a period of time and at some point data catalogs will really become the main way for us to access data. Data catalog will help us visualize data, but if we want to do more in-depth analysis, it'll be the jumping off point into the BI tool, the data science tool and that is the journey I see for the data governance products. >> Excellent, thank you. Some comments. Maybe Doug, a lot of things to weigh in on there, maybe you can comment. >> Yeah, Sanjeev I think you're spot on, a lot of the trends the one disagreement, I think it's really still far from mainstream. As you say, we've been talking about this for years, it's like God, motherhood, apple pie, everyone agrees it's important, but too few organizations are really practicing good governance because it's hard and because the incentives have been lacking. I think one thing that deserves mention in this context is ESG mandates and guidelines, these are environmental, social and governance, regs and guidelines. We've seen the environmental regs and guidelines and posts in industries, particularly the carbon-intensive industries. We've seen the social mandates, particularly diversity imposed on suppliers by companies that are leading on this topic. We've seen governance guidelines now being imposed by banks on investors. So these ESGs are presenting new carrots and sticks, and it's going to demand more solid data. It's going to demand more detailed reporting and solid reporting, tighter governance. But we're still far from mainstream adoption. We have a lot of, you know, best of breed niche players in the space. I think the signs that it's going to be more mainstream are starting with things like Azure Purview, Google Dataplex, the big cloud platform players seem to be upping the ante and starting to address governance. >> Excellent, thank you Doug. Brad, I wonder if you could chime in as well. >> Yeah, I would love to be a believer in data catalogs. But to Doug's point, I think that it's going to take some more pressure for that to happen. I recall metadata being something every enterprise thought they were going to get under control when we were working on service oriented architecture back in the nineties and that didn't happen quite the way we anticipated. And so to Sanjeev's point it's because it is really complex and really difficult to do. My hope is that, you know, we won't sort of, how do I put this? Fade out into this nebula of domain catalogs that are specific to individual use cases like Purview for getting data quality right or like data governance and cybersecurity. And instead we have some tooling that can actually be adaptive to gather metadata to create something. And I know its important to you, Sanjeev and that is this idea of observability. If you can get enough metadata without moving your data around, but understanding the entirety of a system that's running on this data, you can do a lot. So to help with the governance that Doug is talking about. >> So I just want to add that, data governance, like any other initiatives did not succeed even AI went into an AI window, but that's a different topic. But a lot of these things did not succeed because to your point, the incentives were not there. I remember when Sarbanes Oxley had come into the scene, if a bank did not do Sarbanes Oxley, they were very happy to a million dollar fine. That was like, you know, pocket change for them instead of doing the right thing. But I think the stakes are much higher now. With GDPR, the flood gates opened. Now, you know, California, you know, has CCPA but even CCPA is being outdated with CPRA, which is much more GDPR like. So we are very rapidly entering a space where pretty much every major country in the world is coming up with its own compliance regulatory requirements, data residents is becoming really important. And I think we are going to reach a stage where it won't be optional anymore. So whether we like it or not, and I think the reason data catalogs were not successful in the past is because we did not have the right focus on adoption. We were focused on features and these features were disconnected, very hard for business to adopt. These are built by IT people for IT departments to take a look at technical metadata, not business metadata. Today the tables have turned. CDOs are driving this initiative, regulatory compliances are beating down hard, so I think the time might be right. >> Yeah so guys, we have to move on here. But there's some real meat on the bone here, Sanjeev. I like the fact that you called out Culebra and Alation, so we can look back a year from now and say, okay, he made the call, he stuck it. And then the ratio of BI tools to data catalogs that's another sort of measurement that we can take even though with some skepticism there, that's something that we can watch. And I wonder if someday, if we'll have more metadata than data. But I want to move to Tony Baer, you want to talk about data mesh and speaking, you know, coming off of governance. I mean, wow, you know the whole concept of data mesh is, decentralized data, and then governance becomes, you know, a nightmare there, but take it away, Tony. >> We'll put this way, data mesh, you know, the idea at least as proposed by ThoughtWorks. You know, basically it was at least a couple of years ago and the press has been almost uniformly almost uncritical. A good reason for that is for all the problems that basically Sanjeev and Doug and Brad we're just speaking about, which is that we have all this data out there and we don't know what to do about it. Now, that's not a new problem. That was a problem we had in enterprise data warehouses, it was a problem when we had over DoOP data clusters, it's even more of a problem now that data is out in the cloud where the data is not only your data lake, is not only us three, it's all over the place. And it's also including streaming, which I know we'll be talking about later. So the data mesh was a response to that, the idea of that we need to bait, you know, who are the folks that really know best about governance? It's the domain experts. So it was basically data mesh was an architectural pattern and a process. My prediction for this year is that data mesh is going to hit cold heart reality. Because if you do a Google search, basically the published work, the articles on data mesh have been largely, you know, pretty uncritical so far. Basically loading and is basically being a very revolutionary new idea. I don't think it's that revolutionary because we've talked about ideas like this. Brad now you and I met years ago when we were talking about so and decentralizing all of us, but it was at the application level. Now we're talking about it at the data level. And now we have microservices. So there's this thought of have we managed if we're deconstructing apps in cloud native to microservices, why don't we think of data in the same way? My sense this year is that, you know, this has been a very active search if you look at Google search trends, is that now companies, like enterprise are going to look at this seriously. And as they look at it seriously, it's going to attract its first real hard scrutiny, it's going to attract its first backlash. That's not necessarily a bad thing. It means that it's being taken seriously. The reason why I think that you'll start to see basically the cold hearted light of day shine on data mesh is that it's still a work in progress. You know, this idea is basically a couple of years old and there's still some pretty major gaps. The biggest gap is in the area of federated governance. Now federated governance itself is not a new issue. Federated governance decision, we started figuring out like, how can we basically strike the balance between getting let's say between basically consistent enterprise policy, consistent enterprise governance, but yet the groups that understand the data and know how to basically, you know, that, you know, how do we basically sort of balance the two? There's a huge gap there in practice and knowledge. Also to a lesser extent, there's a technology gap which is basically in the self-service technologies that will help teams essentially govern data. You know, basically through the full life cycle, from develop, from selecting the data from, you know, building the pipelines from, you know, determining your access control, looking at quality, looking at basically whether the data is fresh or whether it's trending off course. So my prediction is that it will receive the first harsh scrutiny this year. You are going to see some organization and enterprises declare premature victory when they build some federated query implementations. You going to see vendors start with data mesh wash their products anybody in the data management space that they are going to say that where this basically a pipelining tool, whether it's basically ELT, whether it's a catalog or federated query tool, they will all going to get like, you know, basically promoting the fact of how they support this. Hopefully nobody's going to call themselves a data mesh tool because data mesh is not a technology. We're going to see one other thing come out of this. And this harks back to the metadata that Sanjeev was talking about and of the catalog just as he was talking about. Which is that there's going to be a new focus, every renewed focus on metadata. And I think that's going to spur interest in data fabrics. Now data fabrics are pretty vaguely defined, but if we just take the most elemental definition, which is a common metadata back plane, I think that if anybody is going to get serious about data mesh, they need to look at the data fabric because we all at the end of the day, need to speak, you know, need to read from the same sheet of music. >> So thank you Tony. Dave Menninger, I mean, one of the things that people like about data mesh is it pretty crisply articulate some of the flaws in today's organizational approaches to data. What are your thoughts on this? >> Well, I think we have to start by defining data mesh, right? The term is already getting corrupted, right? Tony said it's going to see the cold hard light of day. And there's a problem right now that there are a number of overlapping terms that are similar but not identical. So we've got data virtualization, data fabric, excuse me for a second. (clears throat) Sorry about that. Data virtualization, data fabric, data federation, right? So I think that it's not really clear what each vendor means by these terms. I see data mesh and data fabric becoming quite popular. I've interpreted data mesh as referring primarily to the governance aspects as originally intended and specified. But that's not the way I see vendors using it. I see vendors using it much more to mean data fabric and data virtualization. So I'm going to comment on the group of those things. I think the group of those things is going to happen. They're going to happen, they're going to become more robust. Our research suggests that a quarter of organizations are already using virtualized access to their data lakes and another half, so a total of three quarters will eventually be accessing their data lakes using some sort of virtualized access. Again, whether you define it as mesh or fabric or virtualization isn't really the point here. But this notion that there are different elements of data, metadata and governance within an organization that all need to be managed collectively. The interesting thing is when you look at the satisfaction rates of those organizations using virtualization versus those that are not, it's almost double, 68% of organizations, I'm sorry, 79% of organizations that were using virtualized access express satisfaction with their access to the data lake. Only 39% express satisfaction if they weren't using virtualized access. >> Oh thank you Dave. Sanjeev we just got about a couple of minutes on this topic, but I know you're speaking or maybe you've always spoken already on a panel with (indistinct) who sort of invented the concept. Governance obviously is a big sticking point, but what are your thoughts on this? You're on mute. (panelist chuckling) >> So my message to (indistinct) and to the community is as opposed to what they said, let's not define it. We spent a whole year defining it, there are four principles, domain, product, data infrastructure, and governance. Let's take it to the next level. I get a lot of questions on what is the difference between data fabric and data mesh? And I'm like I can't compare the two because data mesh is a business concept, data fabric is a data integration pattern. How do you compare the two? You have to bring data mesh a level down. So to Tony's point, I'm on a warpath in 2022 to take it down to what does a data product look like? How do we handle shared data across domains and governance? And I think we are going to see more of that in 2022, or is "operationalization" of data mesh. >> I think we could have a whole hour on this topic, couldn't we? Maybe we should do that. But let's corner. Let's move to Carl. So Carl, you're a database guy, you've been around that block for a while now, you want to talk about graph databases, bring it on. >> Oh yeah. Okay thanks. So I regard graph database as basically the next truly revolutionary database management technology. I'm looking forward for the graph database market, which of course we haven't defined yet. So obviously I have a little wiggle room in what I'm about to say. But this market will grow by about 600% over the next 10 years. Now, 10 years is a long time. But over the next five years, we expect to see gradual growth as people start to learn how to use it. The problem is not that it's not useful, its that people don't know how to use it. So let me explain before I go any further what a graph database is because some of the folks on the call may not know what it is. A graph database organizes data according to a mathematical structure called a graph. The graph has elements called nodes and edges. So a data element drops into a node, the nodes are connected by edges, the edges connect one node to another node. Combinations of edges create structures that you can analyze to determine how things are related. In some cases, the nodes and edges can have properties attached to them which add additional informative material that makes it richer, that's called a property graph. There are two principle use cases for graph databases. There's semantic property graphs, which are use to break down human language texts into the semantic structures. Then you can search it, organize it and answer complicated questions. A lot of AI is aimed at semantic graphs. Another kind is the property graph that I just mentioned, which has a dazzling number of use cases. I want to just point out as I talk about this, people are probably wondering, well, we have relation databases, isn't that good enough? So a relational database defines... It supports what I call definitional relationships. That means you define the relationships in a fixed structure. The database drops into that structure, there's a value, foreign key value, that relates one table to another and that value is fixed. You don't change it. If you change it, the database becomes unstable, it's not clear what you're looking at. In a graph database, the system is designed to handle change so that it can reflect the true state of the things that it's being used to track. So let me just give you some examples of use cases for this. They include entity resolution, data lineage, social media analysis, Customer 360, fraud prevention. There's cybersecurity, there's strong supply chain is a big one actually. There is explainable AI and this is going to become important too because a lot of people are adopting AI. But they want a system after the fact to say, how do the AI system come to that conclusion? How did it make that recommendation? Right now we don't have really good ways of tracking that. Machine learning in general, social network, I already mentioned that. And then we've got, oh gosh, we've got data governance, data compliance, risk management. We've got recommendation, we've got personalization, anti money laundering, that's another big one, identity and access management, network and IT operations is already becoming a key one where you actually have mapped out your operation, you know, whatever it is, your data center and you can track what's going on as things happen there, root cause analysis, fraud detection is a huge one. A number of major credit card companies use graph databases for fraud detection, risk analysis, tracking and tracing turn analysis, next best action, what if analysis, impact analysis, entity resolution and I would add one other thing or just a few other things to this list, metadata management. So Sanjeev, here you go, this is your engine. Because I was in metadata management for quite a while in my past life. And one of the things I found was that none of the data management technologies that were available to us could efficiently handle metadata because of the kinds of structures that result from it, but graphs can, okay? Graphs can do things like say, this term in this context means this, but in that context, it means that, okay? Things like that. And in fact, logistics management, supply chain. And also because it handles recursive relationships, by recursive relationships I mean objects that own other objects that are of the same type. You can do things like build materials, you know, so like parts explosion. Or you can do an HR analysis, who reports to whom, how many levels up the chain and that kind of thing. You can do that with relational databases, but yet it takes a lot of programming. In fact, you can do almost any of these things with relational databases, but the problem is, you have to program it. It's not supported in the database. And whenever you have to program something, that means you can't trace it, you can't define it. You can't publish it in terms of its functionality and it's really, really hard to maintain over time. >> Carl, thank you. I wonder if we could bring Brad in, I mean. Brad, I'm sitting here wondering, okay, is this incremental to the market? Is it disruptive and replacement? What are your thoughts on this phase? >> It's already disrupted the market. I mean, like Carl said, go to any bank and ask them are you using graph databases to get fraud detection under control? And they'll say, absolutely, that's the only way to solve this problem. And it is frankly. And it's the only way to solve a lot of the problems that Carl mentioned. And that is, I think it's Achilles heel in some ways. Because, you know, it's like finding the best way to cross the seven bridges of Koenigsberg. You know, it's always going to kind of be tied to those use cases because it's really special and it's really unique and because it's special and it's unique, it's still unfortunately kind of stands apart from the rest of the community that's building, let's say AI outcomes, as a great example here. Graph databases and AI, as Carl mentioned, are like chocolate and peanut butter. But technologically, you think don't know how to talk to one another, they're completely different. And you know, you can't just stand up SQL and query them. You've got to learn, know what is the Carl? Specter special. Yeah, thank you to, to actually get to the data in there. And if you're going to scale that data, that graph database, especially a property graph, if you're going to do something really complex, like try to understand you know, all of the metadata in your organization, you might just end up with, you know, a graph database winter like we had the AI winter simply because you run out of performance to make the thing happen. So, I think it's already disrupted, but we need to like treat it like a first-class citizen in the data analytics and AI community. We need to bring it into the fold. We need to equip it with the tools it needs to do the magic it does and to do it not just for specialized use cases, but for everything. 'Cause I'm with Carl. I think it's absolutely revolutionary. >> Brad identified the principal, Achilles' heel of the technology which is scaling. When these things get large and complex enough that they spill over what a single server can handle, you start to have difficulties because the relationships span things that have to be resolved over a network and then you get network latency and that slows the system down. So that's still a problem to be solved. >> Sanjeev, any quick thoughts on this? I mean, I think metadata on the word cloud is going to be the largest font, but what are your thoughts here? >> I want to (indistinct) So people don't associate me with only metadata, so I want to talk about something slightly different. dbengines.com has done an amazing job. I think almost everyone knows that they chronicle all the major databases that are in use today. In January of 2022, there are 381 databases on a ranked list of databases. The largest category is RDBMS. The second largest category is actually divided into two property graphs and IDF graphs. These two together make up the second largest number databases. So talking about Achilles heel, this is a problem. The problem is that there's so many graph databases to choose from. They come in different shapes and forms. To Brad's point, there's so many query languages in RDBMS, in SQL. I know the story, but here We've got cipher, we've got gremlin, we've got GQL and then we're proprietary languages. So I think there's a lot of disparity in this space. >> Well, excellent. All excellent points, Sanjeev, if I must say. And that is a problem that the languages need to be sorted and standardized. People need to have a roadmap as to what they can do with it. Because as you say, you can do so many things. And so many of those things are unrelated that you sort of say, well, what do we use this for? And I'm reminded of the saying I learned a bunch of years ago. And somebody said that the digital computer is the only tool man has ever device that has no particular purpose. (panelists chuckle) >> All right guys, we got to move on to Dave Menninger. We've heard about streaming. Your prediction is in that realm, so please take it away. >> Sure. So I like to say that historical databases are going to become a thing of the past. By that I don't mean that they're going to go away, that's not my point. I mean, we need historical databases, but streaming data is going to become the default way in which we operate with data. So in the next say three to five years, I would expect that data platforms and we're using the term data platforms to represent the evolution of databases and data lakes, that the data platforms will incorporate these streaming capabilities. We're going to process data as it streams into an organization and then it's going to roll off into historical database. So historical databases don't go away, but they become a thing of the past. They store the data that occurred previously. And as data is occurring, we're going to be processing it, we're going to be analyzing it, we're going to be acting on it. I mean we only ever ended up with historical databases because we were limited by the technology that was available to us. Data doesn't occur in patches. But we processed it in patches because that was the best we could do. And it wasn't bad and we've continued to improve and we've improved and we've improved. But streaming data today is still the exception. It's not the rule, right? There are projects within organizations that deal with streaming data. But it's not the default way in which we deal with data yet. And so that's my prediction is that this is going to change, we're going to have streaming data be the default way in which we deal with data and how you label it and what you call it. You know, maybe these databases and data platforms just evolved to be able to handle it. But we're going to deal with data in a different way. And our research shows that already, about half of the participants in our analytics and data benchmark research, are using streaming data. You know, another third are planning to use streaming technologies. So that gets us to about eight out of 10 organizations need to use this technology. And that doesn't mean they have to use it throughout the whole organization, but it's pretty widespread in its use today and has continued to grow. If you think about the consumerization of IT, we've all been conditioned to expect immediate access to information, immediate responsiveness. You know, we want to know if an item is on the shelf at our local retail store and we can go in and pick it up right now. You know, that's the world we live in and that's spilling over into the enterprise IT world We have to provide those same types of capabilities. So that's my prediction, historical databases become a thing of the past, streaming data becomes the default way in which we operate with data. >> All right thank you David. Well, so what say you, Carl, the guy who has followed historical databases for a long time? >> Well, one thing actually, every database is historical because as soon as you put data in it, it's now history. They'll no longer reflect the present state of things. But even if that history is only a millisecond old, it's still history. But I would say, I mean, I know you're trying to be a little bit provocative in saying this Dave 'cause you know, as well as I do that people still need to do their taxes, they still need to do accounting, they still need to run general ledger programs and things like that. That all involves historical data. That's not going to go away unless you want to go to jail. So you're going to have to deal with that. But as far as the leading edge functionality, I'm totally with you on that. And I'm just, you know, I'm just kind of wondering if this requires a change in the way that we perceive applications in order to truly be manifested and rethinking the way applications work. Saying that an application should respond instantly, as soon as the state of things changes. What do you say about that? >> I think that's true. I think we do have to think about things differently. It's not the way we designed systems in the past. We're seeing more and more systems designed that way. But again, it's not the default. And I agree 100% with you that we do need historical databases you know, that's clear. And even some of those historical databases will be used in conjunction with the streaming data, right? >> Absolutely. I mean, you know, let's take the data warehouse example where you're using the data warehouse as its context and the streaming data as the present and you're saying, here's the sequence of things that's happening right now. Have we seen that sequence before? And where? What does that pattern look like in past situations? And can we learn from that? >> So Tony Baer, I wonder if you could comment? I mean, when you think about, you know, real time inferencing at the edge, for instance, which is something that a lot of people talk about, a lot of what we're discussing here in this segment, it looks like it's got a great potential. What are your thoughts? >> Yeah, I mean, I think you nailed it right. You know, you hit it right on the head there. Which is that, what I'm seeing is that essentially. Then based on I'm going to split this one down the middle is that I don't see that basically streaming is the default. What I see is streaming and basically and transaction databases and analytics data, you know, data warehouses, data lakes whatever are converging. And what allows us technically to converge is cloud native architecture, where you can basically distribute things. So you can have a node here that's doing the real-time processing, that's also doing... And this is where it leads in or maybe doing some of that real time predictive analytics to take a look at, well look, we're looking at this customer journey what's happening with what the customer is doing right now and this is correlated with what other customers are doing. So the thing is that in the cloud, you can basically partition this and because of basically the speed of the infrastructure then you can basically bring these together and kind of orchestrate them sort of a loosely coupled manner. The other parts that the use cases are demanding, and this is part of it goes back to what Dave is saying. Is that, you know, when you look at Customer 360, when you look at let's say Smart Utility products, when you look at any type of operational problem, it has a real time component and it has an historical component. And having predictive and so like, you know, my sense here is that technically we can bring this together through the cloud. And I think the use case is that we can apply some real time sort of predictive analytics on these streams and feed this into the transactions so that when we make a decision in terms of what to do as a result of a transaction, we have this real-time input. >> Sanjeev, did you have a comment? >> Yeah, I was just going to say that to Dave's point, you know, we have to think of streaming very different because in the historical databases, we used to bring the data and store the data and then we used to run rules on top, aggregations and all. But in case of streaming, the mindset changes because the rules are normally the inference, all of that is fixed, but the data is constantly changing. So it's a completely reversed way of thinking and building applications on top of that. >> So Dave Menninger, there seem to be some disagreement about the default. What kind of timeframe are you thinking about? Is this end of decade it becomes the default? What would you pin? >> I think around, you know, between five to 10 years, I think this becomes the reality. >> I think its... >> It'll be more and more common between now and then, but it becomes the default. And I also want Sanjeev at some point, maybe in one of our subsequent conversations, we need to talk about governing streaming data. 'Cause that's a whole nother set of challenges. >> We've also talked about it rather in two dimensions, historical and streaming, and there's lots of low latency, micro batch, sub-second, that's not quite streaming, but in many cases its fast enough and we're seeing a lot of adoption of near real time, not quite real-time as good enough for many applications. (indistinct cross talk from panelists) >> Because nobody's really taking the hardware dimension (mumbles). >> That'll just happened, Carl. (panelists laughing) >> So near real time. But maybe before you lose the customer, however we define that, right? Okay, let's move on to Brad. Brad, you want to talk about automation, AI, the pipeline people feel like, hey, we can just automate everything. What's your prediction? >> Yeah I'm an AI aficionados so apologies in advance for that. But, you know, I think that we've been seeing automation play within AI for some time now. And it's helped us do a lot of things especially for practitioners that are building AI outcomes in the enterprise. It's helped them to fill skills gaps, it's helped them to speed development and it's helped them to actually make AI better. 'Cause it, you know, in some ways provide some swim lanes and for example, with technologies like AutoML can auto document and create that sort of transparency that we talked about a little bit earlier. But I think there's an interesting kind of conversion happening with this idea of automation. And that is that we've had the automation that started happening for practitioners, it's trying to move out side of the traditional bounds of things like I'm just trying to get my features, I'm just trying to pick the right algorithm, I'm just trying to build the right model and it's expanding across that full life cycle, building an AI outcome, to start at the very beginning of data and to then continue on to the end, which is this continuous delivery and continuous automation of that outcome to make sure it's right and it hasn't drifted and stuff like that. And because of that, because it's become kind of powerful, we're starting to actually see this weird thing happen where the practitioners are starting to converge with the users. And that is to say that, okay, if I'm in Tableau right now, I can stand up Salesforce Einstein Discovery, and it will automatically create a nice predictive algorithm for me given the data that I pull in. But what's starting to happen and we're seeing this from the companies that create business software, so Salesforce, Oracle, SAP, and others is that they're starting to actually use these same ideals and a lot of deep learning (chuckles) to basically stand up these out of the box flip-a-switch, and you've got an AI outcome at the ready for business users. And I am very much, you know, I think that's the way that it's going to go and what it means is that AI is slowly disappearing. And I don't think that's a bad thing. I think if anything, what we're going to see in 2022 and maybe into 2023 is this sort of rush to put this idea of disappearing AI into practice and have as many of these solutions in the enterprise as possible. You can see, like for example, SAP is going to roll out this quarter, this thing called adaptive recommendation services, which basically is a cold start AI outcome that can work across a whole bunch of different vertical markets and use cases. It's just a recommendation engine for whatever you needed to do in the line of business. So basically, you're an SAP user, you look up to turn on your software one day, you're a sales professional let's say, and suddenly you have a recommendation for customer churn. Boom! It's going, that's great. Well, I don't know, I think that's terrifying. In some ways I think it is the future that AI is going to disappear like that, but I'm absolutely terrified of it because I think that what it really does is it calls attention to a lot of the issues that we already see around AI, specific to this idea of what we like to call at Omdia, responsible AI. Which is, you know, how do you build an AI outcome that is free of bias, that is inclusive, that is fair, that is safe, that is secure, that its audible, et cetera, et cetera, et cetera, et cetera. I'd take a lot of work to do. And so if you imagine a customer that's just a Salesforce customer let's say, and they're turning on Einstein Discovery within their sales software, you need some guidance to make sure that when you flip that switch, that the outcome you're going to get is correct. And that's going to take some work. And so, I think we're going to see this move, let's roll this out and suddenly there's going to be a lot of problems, a lot of pushback that we're going to see. And some of that's going to come from GDPR and others that Sanjeev was mentioning earlier. A lot of it is going to come from internal CSR requirements within companies that are saying, "Hey, hey, whoa, hold up, we can't do this all at once. "Let's take the slow route, "let's make AI automated in a smart way." And that's going to take time. >> Yeah, so a couple of predictions there that I heard. AI simply disappear, it becomes invisible. Maybe if I can restate that. And then if I understand it correctly, Brad you're saying there's a backlash in the near term. You'd be able to say, oh, slow down. Let's automate what we can. Those attributes that you talked about are non trivial to achieve, is that why you're a bit of a skeptic? >> Yeah. I think that we don't have any sort of standards that companies can look to and understand. And we certainly, within these companies, especially those that haven't already stood up an internal data science team, they don't have the knowledge to understand when they flip that switch for an automated AI outcome that it's going to do what they think it's going to do. And so we need some sort of standard methodology and practice, best practices that every company that's going to consume this invisible AI can make use of them. And one of the things that you know, is sort of started that Google kicked off a few years back that's picking up some momentum and the companies I just mentioned are starting to use it is this idea of model cards where at least you have some transparency about what these things are doing. You know, so like for the SAP example, we know, for example, if it's convolutional neural network with a long, short term memory model that it's using, we know that it only works on Roman English and therefore me as a consumer can say, "Oh, well I know that I need to do this internationally. "So I should not just turn this on today." >> Thank you. Carl could you add anything, any context here? >> Yeah, we've talked about some of the things Brad mentioned here at IDC and our future of intelligence group regarding in particular, the moral and legal implications of having a fully automated, you know, AI driven system. Because we already know, and we've seen that AI systems are biased by the data that they get, right? So if they get data that pushes them in a certain direction, I think there was a story last week about an HR system that was recommending promotions for White people over Black people, because in the past, you know, White people were promoted and more productive than Black people, but it had no context as to why which is, you know, because they were being historically discriminated, Black people were being historically discriminated against, but the system doesn't know that. So, you know, you have to be aware of that. And I think that at the very least, there should be controls when a decision has either a moral or legal implication. When you really need a human judgment, it could lay out the options for you. But a person actually needs to authorize that action. And I also think that we always will have to be vigilant regarding the kind of data we use to train our systems to make sure that it doesn't introduce unintended biases. In some extent, they always will. So we'll always be chasing after them. But that's (indistinct). >> Absolutely Carl, yeah. I think that what you have to bear in mind as a consumer of AI is that it is a reflection of us and we are a very flawed species. And so if you look at all of the really fantastic, magical looking supermodels we see like GPT-3 and four, that's coming out, they're xenophobic and hateful because the people that the data that's built upon them and the algorithms and the people that build them are us. So AI is a reflection of us. We need to keep that in mind. >> Yeah, where the AI is biased 'cause humans are biased. All right, great. All right let's move on. Doug you mentioned mentioned, you know, lot of people that said that data lake, that term is not going to live on but here's to be, have some lakes here. You want to talk about lake house, bring it on. >> Yes, I do. My prediction is that lake house and this idea of a combined data warehouse and data lake platform is going to emerge as the dominant data management offering. I say offering that doesn't mean it's going to be the dominant thing that organizations have out there, but it's going to be the pro dominant vendor offering in 2022. Now heading into 2021, we already had Cloudera, Databricks, Microsoft, Snowflake as proponents, in 2021, SAP, Oracle, and several of all of these fabric virtualization/mesh vendors joined the bandwagon. The promise is that you have one platform that manages your structured, unstructured and semi-structured information. And it addresses both the BI analytics needs and the data science needs. The real promise there is simplicity and lower cost. But I think end users have to answer a few questions. The first is, does your organization really have a center of data gravity or is the data highly distributed? Multiple data warehouses, multiple data lakes, on premises, cloud. If it's very distributed and you'd have difficulty consolidating and that's not really a goal for you, then maybe that single platform is unrealistic and not likely to add value to you. You know, also the fabric and virtualization vendors, the mesh idea, that's where if you have this highly distributed situation, that might be a better path forward. The second question, if you are looking at one of these lake house offerings, you are looking at consolidating, simplifying, bringing together to a single platform. You have to make sure that it meets both the warehouse need and the data lake need. So you have vendors like Databricks, Microsoft with Azure Synapse. New really to the data warehouse space and they're having to prove that these data warehouse capabilities on their platforms can meet the scaling requirements, can meet the user and query concurrency requirements. Meet those tight SLS. And then on the other hand, you have the Oracle, SAP, Snowflake, the data warehouse folks coming into the data science world, and they have to prove that they can manage the unstructured information and meet the needs of the data scientists. I'm seeing a lot of the lake house offerings from the warehouse crowd, managing that unstructured information in columns and rows. And some of these vendors, Snowflake a particular is really relying on partners for the data science needs. So you really got to look at a lake house offering and make sure that it meets both the warehouse and the data lake requirement. >> Thank you Doug. Well Tony, if those two worlds are going to come together, as Doug was saying, the analytics and the data science world, does it need to be some kind of semantic layer in between? I don't know. Where are you in on this topic? >> (chuckles) Oh, didn't we talk about data fabrics before? Common metadata layer (chuckles). Actually, I'm almost tempted to say let's declare victory and go home. And that this has actually been going on for a while. I actually agree with, you know, much of what Doug is saying there. Which is that, I mean I remember as far back as I think it was like 2014, I was doing a study. I was still at Ovum, (indistinct) Omdia, looking at all these specialized databases that were coming up and seeing that, you know, there's overlap at the edges. But yet, there was still going to be a reason at the time that you would have, let's say a document database for JSON, you'd have a relational database for transactions and for data warehouse and you had basically something at that time that resembles a dupe for what we consider your data life. Fast forward and the thing is what I was seeing at the time is that you were saying they sort of blending at the edges. That was saying like about five to six years ago. And the lake house is essentially on the current manifestation of that idea. There is a dichotomy in terms of, you know, it's the old argument, do we centralize this all you know in a single place or do we virtualize? And I think it's always going to be a union yeah and there's never going to be a single silver bullet. I do see that there are also going to be questions and these are points that Doug raised. That you know, what do you need for your performance there, or for your free performance characteristics? Do you need for instance high concurrency? You need the ability to do some very sophisticated joins, or is your requirement more to be able to distribute and distribute our processing is, you know, as far as possible to get, you know, to essentially do a kind of a brute force approach. All these approaches are valid based on the use case. I just see that essentially that the lake house is the culmination of it's nothing. It's a relatively new term introduced by Databricks a couple of years ago. This is the culmination of basically what's been a long time trend. And what we see in the cloud is that as we start seeing data warehouses as a check box items say, "Hey, we can basically source data in cloud storage, in S3, "Azure Blob Store, you know, whatever, "as long as it's in certain formats, "like, you know parquet or CSP or something like that." I see that as becoming kind of a checkbox item. So to that extent, I think that the lake house, depending on how you define is already reality. And in some cases, maybe new terminology, but not a whole heck of a lot new under the sun. >> Yeah. And Dave Menninger, I mean a lot of these, thank you Tony, but a lot of this is going to come down to, you know, vendor marketing, right? Some people just kind of co-op the term, we talked about you know, data mesh washing, what are your thoughts on this? (laughing) >> Yeah, so I used the term data platform earlier. And part of the reason I use that term is that it's more vendor neutral. We've tried to sort of stay out of the vendor terminology patenting world, right? Whether the term lake houses, what sticks or not, the concept is certainly going to stick. And we have some data to back it up. About a quarter of organizations that are using data lakes today, already incorporate data warehouse functionality into it. So they consider their data lake house and data warehouse one in the same, about a quarter of organizations, a little less, but about a quarter of organizations feed the data lake from the data warehouse and about a quarter of organizations feed the data warehouse from the data lake. So it's pretty obvious that three quarters of organizations need to bring this stuff together, right? The need is there, the need is apparent. The technology is going to continue to converge. I like to talk about it, you know, you've got data lakes over here at one end, and I'm not going to talk about why people thought data lakes were a bad idea because they thought you just throw stuff in a server and you ignore it, right? That's not what a data lake is. So you've got data lake people over here and you've got database people over here, data warehouse people over here, database vendors are adding data lake capabilities and data lake vendors are adding data warehouse capabilities. So it's obvious that they're going to meet in the middle. I mean, I think it's like Tony says, I think we should declare victory and go home. >> As hell. So just a follow-up on that, so are you saying the specialized lake and the specialized warehouse, do they go away? I mean, Tony data mesh practitioners would say or advocates would say, well, they could all live. It's just a node on the mesh. But based on what Dave just said, are we gona see those all morphed together? >> Well, number one, as I was saying before, there's always going to be this sort of, you know, centrifugal force or this tug of war between do we centralize the data, do we virtualize? And the fact is I don't think that there's ever going to be any single answer. I think in terms of data mesh, data mesh has nothing to do with how you're physically implement the data. You could have a data mesh basically on a data warehouse. It's just that, you know, the difference being is that if we use the same physical data store, but everybody's logically you know, basically governing it differently, you know? Data mesh in space, it's not a technology, it's processes, it's governance process. So essentially, you know, I basically see that, you know, as I was saying before that this is basically the culmination of a long time trend we're essentially seeing a lot of blurring, but there are going to be cases where, for instance, if I need, let's say like, Upserve, I need like high concurrency or something like that. There are certain things that I'm not going to be able to get efficiently get out of a data lake. And, you know, I'm doing a system where I'm just doing really brute forcing very fast file scanning and that type of thing. So I think there always will be some delineations, but I would agree with Dave and with Doug, that we are seeing basically a confluence of requirements that we need to essentially have basically either the element, you know, the ability of a data lake and the data warehouse, these need to come together, so I think. >> I think what we're likely to see is organizations look for a converge platform that can handle both sides for their center of data gravity, the mesh and the fabric virtualization vendors, they're all on board with the idea of this converged platform and they're saying, "Hey, we'll handle all the edge cases "of the stuff that isn't in that center of data gravity "but that is off distributed in a cloud "or at a remote location." So you can have that single platform for the center of your data and then bring in virtualization, mesh, what have you, for reaching out to the distributed data. >> As Dave basically said, people are happy when they virtualized data. >> I think we have at this point, but to Dave Menninger's point, they are converging, Snowflake has introduced support for unstructured data. So obviously literally splitting here. Now what Databricks is saying is that "aha, but it's easy to go from data lake to data warehouse "than it is from databases to data lake." So I think we're getting into semantics, but we're already seeing these two converge. >> So take somebody like AWS has got what? 15 data stores. Are they're going to 15 converge data stores? This is going to be interesting to watch. All right, guys, I'm going to go down and list do like a one, I'm going to one word each and you guys, each of the analyst, if you would just add a very brief sort of course correction for me. So Sanjeev, I mean, governance is going to to be... Maybe it's the dog that wags the tail now. I mean, it's coming to the fore, all this ransomware stuff, which you really didn't talk much about security, but what's the one word in your prediction that you would leave us with on governance? >> It's going to be mainstream. >> Mainstream. Okay. Tony Baer, mesh washing is what I wrote down. That's what we're going to see in 2022, a little reality check, you want to add to that? >> Reality check, 'cause I hope that no vendor jumps the shark and close they're offering a data niche product. >> Yeah, let's hope that doesn't happen. If they do, we're going to call them out. Carl, I mean, graph databases, thank you for sharing some high growth metrics. I know it's early days, but magic is what I took away from that, so magic database. >> Yeah, I would actually, I've said this to people too. I kind of look at it as a Swiss Army knife of data because you can pretty much do anything you want with it. That doesn't mean you should. I mean, there's definitely the case that if you're managing things that are in fixed schematic relationship, probably a relation database is a better choice. There are times when the document database is a better choice. It can handle those things, but maybe not. It may not be the best choice for that use case. But for a great many, especially with the new emerging use cases I listed, it's the best choice. >> Thank you. And Dave Menninger, thank you by the way, for bringing the data in, I like how you supported all your comments with some data points. But streaming data becomes the sort of default paradigm, if you will, what would you add? >> Yeah, I would say think fast, right? That's the world we live in, you got to think fast. >> Think fast, love it. And Brad Shimmin, love it. I mean, on the one hand I was saying, okay, great. I'm afraid I might get disrupted by one of these internet giants who are AI experts. I'm going to be able to buy instead of build AI. But then again, you know, I've got some real issues. There's a potential backlash there. So give us your bumper sticker. >> I'm would say, going with Dave, think fast and also think slow to talk about the book that everyone talks about. I would say really that this is all about trust, trust in the idea of automation and a transparent and visible AI across the enterprise. And verify, verify before you do anything. >> And then Doug Henschen, I mean, I think the trend is your friend here on this prediction with lake house is really becoming dominant. I liked the way you set up that notion of, you know, the data warehouse folks coming at it from the analytics perspective and then you get the data science worlds coming together. I still feel as though there's this piece in the middle that we're missing, but your, your final thoughts will give you the (indistinct). >> I think the idea of consolidation and simplification always prevails. That's why the appeal of a single platform is going to be there. We've already seen that with, you know, DoOP platforms and moving toward cloud, moving toward object storage and object storage, becoming really the common storage point for whether it's a lake or a warehouse. And that second point, I think ESG mandates are going to come in alongside GDPR and things like that to up the ante for good governance. >> Yeah, thank you for calling that out. Okay folks, hey that's all the time that we have here, your experience and depth of understanding on these key issues on data and data management really on point and they were on display today. I want to thank you for your contributions. Really appreciate your time. >> Enjoyed it. >> Thank you. >> Thanks for having me. >> In addition to this video, we're going to be making available transcripts of the discussion. We're going to do clips of this as well we're going to put them out on social media. I'll write this up and publish the discussion on wikibon.com and siliconangle.com. No doubt, several of the analysts on the panel will take the opportunity to publish written content, social commentary or both. I want to thank the power panelists and thanks for watching this special CUBE presentation. This is Dave Vellante, be well and we'll see you next time. (bright music)
SUMMARY :
and I'd like to welcome you to I as moderator, I'm going to and that is the journey to weigh in on there, and it's going to demand more solid data. Brad, I wonder if you that are specific to individual use cases in the past is because we I like the fact that you the data from, you know, Dave Menninger, I mean, one of the things that all need to be managed collectively. Oh thank you Dave. and to the community I think we could have a after the fact to say, okay, is this incremental to the market? the magic it does and to do it and that slows the system down. I know the story, but And that is a problem that the languages move on to Dave Menninger. So in the next say three to five years, the guy who has followed that people still need to do their taxes, And I agree 100% with you and the streaming data as the I mean, when you think about, you know, and because of basically the all of that is fixed, but the it becomes the default? I think around, you know, but it becomes the default. and we're seeing a lot of taking the hardware dimension That'll just happened, Carl. Okay, let's move on to Brad. And that is to say that, Those attributes that you And one of the things that you know, Carl could you add in the past, you know, I think that what you have to bear in mind that term is not going to and the data science needs. and the data science world, You need the ability to do lot of these, thank you Tony, I like to talk about it, you know, It's just a node on the mesh. basically either the element, you know, So you can have that single they virtualized data. "aha, but it's easy to go from I mean, it's coming to the you want to add to that? I hope that no vendor Yeah, let's hope that doesn't happen. I've said this to people too. I like how you supported That's the world we live I mean, on the one hand I And verify, verify before you do anything. I liked the way you set up We've already seen that with, you know, the time that we have here, We're going to do clips of this as well
SENTIMENT ANALYSIS :
ENTITIES
Entity | Category | Confidence |
---|---|---|
Dave Menninger | PERSON | 0.99+ |
Dave | PERSON | 0.99+ |
Dave Vellante | PERSON | 0.99+ |
Doug Henschen | PERSON | 0.99+ |
David | PERSON | 0.99+ |
Brad Shimmin | PERSON | 0.99+ |
Doug | PERSON | 0.99+ |
Tony Baer | PERSON | 0.99+ |
Dave Velannte | PERSON | 0.99+ |
Tony | PERSON | 0.99+ |
Carl | PERSON | 0.99+ |
Brad | PERSON | 0.99+ |
Carl Olofson | PERSON | 0.99+ |
Microsoft | ORGANIZATION | 0.99+ |
2014 | DATE | 0.99+ |
Sanjeev Mohan | PERSON | 0.99+ |
Ventana Research | ORGANIZATION | 0.99+ |
2022 | DATE | 0.99+ |
Oracle | ORGANIZATION | 0.99+ |
last year | DATE | 0.99+ |
January of 2022 | DATE | 0.99+ |
three | QUANTITY | 0.99+ |
381 databases | QUANTITY | 0.99+ |
IDC | ORGANIZATION | 0.99+ |
Informatica | ORGANIZATION | 0.99+ |
Snowflake | ORGANIZATION | 0.99+ |
Databricks | ORGANIZATION | 0.99+ |
two | QUANTITY | 0.99+ |
Sanjeev | PERSON | 0.99+ |
2021 | DATE | 0.99+ |
ORGANIZATION | 0.99+ | |
Omdia | ORGANIZATION | 0.99+ |
AWS | ORGANIZATION | 0.99+ |
SanjMo | ORGANIZATION | 0.99+ |
79% | QUANTITY | 0.99+ |
second question | QUANTITY | 0.99+ |
last week | DATE | 0.99+ |
15 data stores | QUANTITY | 0.99+ |
100% | QUANTITY | 0.99+ |
SAP | ORGANIZATION | 0.99+ |
JG Chirapurath, Microsoft CLEAN
>> Okay, we're now going to explore the vision of the future of cloud computing from the perspective of one of the leaders in the field, JG Chirapurath is the Vice President of Azure Data AI and Edge at Microsoft. JG, welcome to theCUBE on Cloud, thanks so much for participating. >> Well, thank you, Dave. And it's a real pleasure to be here with you and just want to welcome the audience as well. >> Well, JG, judging from your title, we have a lot of ground to cover and our audience is definitely interested in all the topics that are implied there. So let's get right into it. We've said many times in theCUBE that the new innovation cocktail comprises machine intelligence or AI applied to troves of data with the scale of the cloud. It's no longer we're driven by Moore's law. It's really those three factors and those ingredients are going to power the next wave of value creation in the economy. So first, do you buy into that premise? >> Yes, absolutely. We do buy into it and I think one of the reasons why we put data analytics and AI together, is because all of that really begins with the collection of data and managing it and governing it, unlocking analytics in it. And we tend to see things like AI, the value creation that comes from AI as being on that continuum of having started off with really things like analytics and proceeding to be machine learning and the use of data in interesting ways. >> Yes, I'd like to get some more thoughts around data and how you see the future of data and the role of cloud and maybe how Microsoft strategy fits in there. I mean, your portfolio, you've got SQL Server, Azure SQL, you got Arc which is kind of Azure everywhere for people that aren't familiar with that you got Synapse which course does all the integration, the data warehouse and it gets things ready for BI and consumption by the business and the whole data pipeline. And then all the other services, Azure Databricks, you got you got Cosmos in there, you got Blockchain, you've got Open Source services like PostgreSQL and MySQL. So lots of choices there. And I'm wondering, how do you think about the future of cloud data platforms? It looks like your strategy is right tool for the right job. Is that fair? >> It is fair, but it's also just to step back and look at it. It's fundamentally what we see in this market today, is that customers they seek really a comprehensive proposition. And when I say a comprehensive proposition it is sometimes not just about saying that, "Hey, listen "we know you're a sequence of a company, "we absolutely trust that you have the best "Azure SQL database in the cloud. "But tell us more." We've got data that is sitting in Hadoop systems. We've got data that is sitting in PostgreSQL, in things like MongoDB. So that open source proposition today in data and data management and database management has become front and center. So our real sort of push there is when it comes to migration management modernization of data to present the broadest possible choice to our customers, so we can meet them where they are. However, when it comes to analytics, one of the things they ask for is give us lot more convergence use. It really, it isn't about having 50 different services. It's really about having that one comprehensive service that is converged. That's where things like Synapse fits in where you can just land any kind of data in the lake and then use any compute engine on top of it to drive insights from it. So fundamentally, it is that flexibility that we really sort of focus on to meet our customers where they are. And really not pushing our dogma and our beliefs on it but to meet our customers according to the way they've deployed stuff like this. >> So that's great. I want to stick on this for a minute because when I have guests on like yourself they never want to talk about the competition but that's all we ever talk about. And that's all your customers ever talk about. Because the counter to that right tool for the right job and that I would say is really kind of Amazon's approach is that you got the single unified data platform, the mega database. So it does it all. And that's kind of Oracle's approach. It sounds like you want to have your cake and eat it too. So you got the right tool with the right job approach but you've got an integration layer that allows you to have that converged database. I wonder if you could add color to that and confirm or deny what I just said. >> No, that's a very fair observation but I'd say there's a nuance in what I sort of described. When it comes to data management, when it comes to apps, we have then customers with the broadest choice. Even in that perspective, we also offer convergence. So case in point, when you think about cosmos DB under that one sort of service, you get multiple engines but with the same properties. Right, global distribution, the five nines availability. It gives customers the ability to basically choose when they have to build that new cloud native app to adopt cosmos DB and adopt it in a way that is an choose an engine that is most flexible to them. However, when it comes to say, writing a SequenceServer for example, if modernizing it, you want sometimes, you just want to lift and shift it into things like IS. In other cases, you want to completely rewrite it. So you need to have the flexibility of choice there that is presented by a legacy of what sits on premises. When you move into things like analytics, we absolutely believe in convergence. So we don't believe that look, you need to have a relational data warehouse that is separate from a Hadoop system that is separate from say a BI system that is just, it's a bolt-on. For us, we love the proposition of really building things that are so integrated that once you land data, once you prep it inside the Lake you can use it for analytics, you can use it for BI, you can use it for machine learning. So I think, our sort of differentiated approach speaks for itself there. >> Well, that's interesting because essentially again you're not saying it's an either or, and you see a lot of that in the marketplace. You got some companies you say, "No, it's the data lake." And others say "No, no, put it in the data warehouse." And that causes confusion and complexity around the data pipeline and a lot of cutting. And I'd love to get your thoughts on this. A lot of customers struggle to get value out of data and specifically data product builders are frustrated that it takes them too long to go from, this idea of, hey, I have an idea for a data service and it can drive monetization, but to get there you got to go through this complex data life cycle and pipeline and beg people to add new data sources and do you feel like we have to rethink the way that we approach data architecture? >> Look, I think we do in the cloud. And I think what's happening today and I think the place where I see the most amount of rethink and the most amount of push from our customers to really rethink is the area of analytics and AI. It's almost as if what worked in the past will not work going forward. So when you think about analytics only in the enterprise today, you have relational systems, you have Hadoop systems, you've got data marts, you've got data warehouses you've got enterprise data warehouse. So those large honking databases that you use to close your books with. But when you start to modernize it, what people are saying is that we don't want to simply take all of that complexity that we've built over, say three, four decades and simply migrate it en masse exactly as they are into the cloud. What they really want is a completely different way of looking at things. And I think this is where services like Synapse completely provide a differentiated proposition to our customers. What we say there is land the data in any way you see, shape or form inside the lake. Once you landed inside the lake, you can essentially use a Synapse Studio to prep it in the way that you like. Use any compute engine of your choice and operate on this data in any way that you see fit. So case in point, if you want to hydrate a relational data warehouse, you can do so. If you want to do ad hoc analytics using something like Spark, you can do so. If you want to invoke Power BI on that data or BI on that data, you can do so. If you want to bring in a machine learning model on this prep data, you can do so. So inherently, so when customers buy into this proposition, what it solves for them and what it gives to them is complete simplicity. One way to land the data multiple ways to use it. And it's all integrated. >> So should we think of Synapse as an abstraction layer that abstracts away the complexity of the underlying technology? Is that a fair way to think about it? >> Yeah, you can think of it that way. It abstracts away Dave, a couple of things. It takes away that type of data. Sort of complexities related to the type of data. It takes away the complexity related to the size of data. It takes away the complexity related to creating pipelines around all these different types of data. And fundamentally puts it in a place where it can be now consumed by any sort of entity inside the Azure proposition. And by that token, even Databricks. You can in fact use Databricks in sort of an integrated way with the Azure Synapse >> Right, well, so that leads me to this notion of and I wonder if you buy into it. So my inference is that a data warehouse or a data lake could just be a node inside of a global data mesh. And then it's Synapse is sort of managing that technology on top. Do you buy into that? That global data mesh concept? >> We do and we actually do see our customers using Synapse and the value proposition that it brings together in that way. Now it's not where they start, oftentimes when a customer comes and says, "Look, I've got an enterprise data warehouse, "I want to migrate it." Or "I have a Hadoop system, I want to migrate it." But from there, the evolution is absolutely interesting to see. I'll give you an example. One of the customers that we're very proud of is FedEx. And what FedEx is doing is it's completely re-imagining its logistics system. That basically the system that delivers, what is it? The 3 million packages a day. And in doing so, in this COVID times, with the view of basically delivering on COVID vaccines. One of the ways they're doing it, is basically using Synapse. Synapse is essentially that analytic hub where they can get complete view into the logistic processes, way things are moving, understand things like delays and really put all of that together in a way that they can essentially get our packages and these vaccines delivered as quickly as possible. Another example, it's one of my favorite. We see once customers buy into it, they essentially can do other things with it. So an example of this is really my favorite story is Peace Parks initiative. It is the premier of white rhino conservancy in the world. They essentially are using data that has landed in Azure, images in particular to basically use drones over the vast area that they patrol and use machine learning on this data to really figure out where is an issue and where there isn't an issue. So that this part with about 200 radios can scramble surgically versus having to range across the vast area that they cover. So, what you see here is, the importance is really getting your data in order, landing consistently whatever the kind of data it is, build the right pipelines, and then the possibilities of transformation are just endless. >> Yeah, that's very nice how you worked in some of the customer examples and I appreciate that. I want to ask you though that some people might say that putting in that layer while you clearly add simplification and is I think a great thing that there begins over time to be a gap, if you will, between the ability of that layer to integrate all the primitives and all the piece parts, and that you lose some of that fine grain control and it slows you down. What would you say to that? >> Look, I think that's what we excel at and that's what we completely sort of buy into. And it's our job to basically provide that level of integration and that granularity in the way that it's an art. I absolutely admit it's an art. There are areas where people crave simplicity and not a lot of sort of knobs and dials and things like that. But there are areas where customers want flexibility. And so I think just to give you an example of both of them, in landing the data, in consistency in building pipelines, they want simplicity. They don't want complexity. They don't want 50 different places to do this. There's one way to do it. When it comes to computing and reducing this data, analyzing this data, they want flexibility. This is one of the reasons why we say, "Hey, listen you want to use Databricks. "If you're buying into that proposition. "And you're absolutely happy with them, "you can plug it into it." You want to use BI and essentially do a small data model, you can use BI. If you say that, "Look, I've landed into the lake, "I really only want to use ML." Bring in your ML models and party on. So that's where the flexibility comes in. So that's sort of that we sort of think about it. >> Well, I like the strategy because one of our guests, Jumark Dehghani is I think one of the foremost thinkers on this notion of of the data mesh And her premise is that the data builders, data product and service builders are frustrated because the big data system is generic to context. There's no context in there. But by having context in the big data architecture and system you can get products to market much, much, much faster. So, and that seems to be your philosophy but I'm going to jump ahead to my ecosystem question. You've mentioned Databricks a couple of times. There's another partner that you have, which is Snowflake. They're kind of trying to build out their own DataCloud, if you will and GlobalMesh, and the one hand they're a partner on the other hand they're a competitor. How do you sort of balance and square that circle? >> Look, when I see Snowflake, I actually see a partner. When we see essentially we are when you think about Azure now this is where I sort of step back and look at Azure as a whole. And in Azure as a whole, companies like Snowflake are vital in our ecosystem. I mean, there are places we compete, but effectively by helping them build the best Snowflake service on Azure, we essentially are able to differentiate and offer a differentiated value proposition compared to say a Google or an AWS. In fact, that's been our approach with Databricks as well. Where they are effectively on multiple clouds and our opportunity with Databricks is to essentially integrate them in a way where we offer the best experience the best integrations on Azure Berna. That's always been our focus. >> Yeah, it's hard to argue with the strategy or data with our data partner and ETR shows Microsoft is both pervasive and impressively having a lot of momentum spending velocity within the budget cycles. I want to come back to AI a little bit. It's obviously one of the fastest growing areas in our survey data. As I said, clearly Microsoft is a leader in this space. What's your vision of the future of machine intelligence and how Microsoft will participate in that opportunity? >> Yeah, so fundamentally, we've built on decades of research around essentially vision, speech and language. That's been the three core building blocks and for a really focused period of time, we focused on essentially ensuring human parity. So if you ever wonder what the keys to the kingdom are, it's the boat we built in ensuring that the research or posture that we've taken there. What we've then done is essentially a couple of things. We've focused on essentially looking at the spectrum that is AI. Both from saying that, "Hey, listen, "it's got to work for data analysts." We're looking to basically use machine learning techniques to developers who are essentially, coding and building machine learning models from scratch. So for that select proposition manifest to us as really AI focused on all skill levels. The other core thing we've done is that we've also said, "Look, it'll only work as long "as people trust their data "and they can trust their AI models." So there's a tremendous body of work and research we do and things like responsible AI. So if you asked me where we sort of push on is fundamentally to make sure that we never lose sight of the fact that the spectrum of AI can sort of come together for any skill level. And we keep that responsible AI proposition absolutely strong. Now against that canvas Dave, I'll also tell you that as Edge devices get way more capable, where they can input on the Edge, say a camera or a mic or something like that. You will see us pushing a lot more of that capability onto the edge as well. But to me, that's sort of a modality but the core really is all skill levels and that responsibility in AI. >> Yeah, so that brings me to this notion of, I want to bring an Edge and hybrid cloud, understand how you're thinking about hybrid cloud, multicloud obviously one of your competitors Amazon won't even say the word multicloud. You guys have a different approach there but what's the strategy with regard to hybrid? Do you see the cloud, you're bringing Azure to the edge maybe you could talk about that and talk about how you're different from the competition. >> Yeah, I think in the Edge from an Edge and I even I'll be the first one to say that the word Edge itself is conflated. Okay, a little bit it's but I will tell you just focusing on hybrid, this is one of the places where, I would say 2020 if I were to look back from a COVID perspective in particular, it has been the most informative. Because we absolutely saw customers digitizing, moving to the cloud. And we really saw hybrid in action. 2020 was the year that hybrid sort of really became real from a cloud computing perspective. And an example of this is we understood that it's not all or nothing. So sometimes customers want Azure consistency in their data centers. This is where things like Azure Stack comes in. Sometimes they basically come to us and say, "We want the flexibility of adopting "flexible button of platforms let's say containers, "orchestrating Kubernetes "so that we can essentially deploy it wherever you want." And so when we designed things like Arc, it was built for that flexibility in mind. So, here's the beauty of what something like Arc can do for you. If you have a Kubernetes endpoint anywhere, we can deploy an Azure service onto it. That is the promise. Which means, if for some reason the customer says that, "Hey, I've got "this Kubernetes endpoint in AWS. And I love Azure SQL. You will be able to run Azure SQL inside AWS. There's nothing that stops you from doing it. So inherently, remember our first principle is always to meet our customers where they are. So from that perspective, multicloud is here to stay. We are never going to be the people that says, "I'm sorry." We will never say (speaks indistinctly) multicloud but it is a reality for our customers. >> So I wonder if we could close, thank you for that. By looking back and then ahead and I want to put forth, maybe it's a criticism, but maybe not. Maybe it's an art of Microsoft. But first, you did Microsoft an incredible job at transitioning its business. Azure is omnipresent, as we said our data shows that. So two-part question first, Microsoft got there by investing in the cloud, really changing its mindset, I think and leveraging its huge software estate and customer base to put Azure at the center of it's strategy. And many have said, me included, that you got there by creating products that are good enough. We do a one Datto, it's still not that great, then a two Datto and maybe not the best, but acceptable for your customers. And that's allowed you to grow very rapidly expand your market. How do you respond to that? Is that a fair comment? Are you more than good enough? I wonder if you could share your thoughts. >> Dave, you hurt my feelings with that question. >> Don't hate me JG. (both laugh) We're getting it out there all right, so. >> First of all, thank you for asking me that. I am absolutely the biggest cheerleader you'll find at Microsoft. I absolutely believe that I represent the work of almost 9,000 engineers. And we wake up every day worrying about our customer and worrying about the customer condition and to absolutely make sure we deliver the best in the first attempt that we do. So when you take the plethora of products we deliver in Azure, be it Azure SQL, be it Azure Cosmos DB, Synapse, Azure Databricks, which we did in partnership with Databricks, Azure Machine Learning. And recently when we premiered, we sort of offered the world's first comprehensive data governance solution in Azure Purview. I would humbly submit it to you that we are leading the way and we're essentially showing how the future of data, AI and the Edge should work in the cloud. >> Yeah, I'd be disappointed if you capitulated in any way, JG. So, thank you for that. And that's kind of last question is looking forward and how you're thinking about the future of cloud. Last decade, a lot about cloud migration, simplifying infrastructure to management and deployment. SaaSifying My Enterprise, a lot of simplification and cost savings and of course redeployment of resources toward digital transformation, other valuable activities. How do you think this coming decade will be defined? Will it be sort of more of the same or is there something else out there? >> I think that the coming decade will be one where customers start to unlock outsize value out of this. What happened to the last decade where people laid the foundation? And people essentially looked at the world and said, "Look, we've got to make a move. "They're largely hybrid, but you're going to start making "steps to basically digitize and modernize our platforms. I will tell you that with the amount of data that people are moving to the cloud, just as an example, you're going to see use of analytics, AI or business outcomes explode. You're also going to see a huge sort of focus on things like governance. People need to know where the data is, what the data catalog continues, how to govern it, how to trust this data and given all of the privacy and compliance regulations out there essentially their compliance posture. So I think the unlocking of outcomes versus simply, Hey, I've saved money. Second, really putting this comprehensive sort of governance regime in place and then finally security and trust. It's going to be more paramount than ever before. >> Yeah, nobody's going to use the data if they don't trust it, I'm glad you brought up security. It's a topic that is at number one on the CIO list. JG, great conversation. Obviously the strategy is working and thanks so much for participating in Cube on Cloud. >> Thank you, thank you, Dave and I appreciate it and thank you to everybody who's tuning into today. >> All right then keep it right there, I'll be back with our next guest right after this short break.
SUMMARY :
of one of the leaders in the field, to be here with you that the new innovation cocktail comprises and the use of data in interesting ways. and how you see the future that you have the best is that you got the single that once you land data, but to get there you got to go in the way that you like. Yeah, you can think of it that way. of and I wonder if you buy into it. and the value proposition and that you lose some of And so I think just to give you an example So, and that seems to be your philosophy when you think about Azure Yeah, it's hard to argue the keys to the kingdom are, Do you see the cloud, you're and I even I'll be the first one to say that you got there by creating products Dave, you hurt my We're getting it out there all right, so. that I represent the work Will it be sort of more of the same and given all of the privacy the data if they don't trust it, thank you to everybody I'll be back with our next guest
SENTIMENT ANALYSIS :
ENTITIES
Entity | Category | Confidence |
---|---|---|
Dave | PERSON | 0.99+ |
JG | PERSON | 0.99+ |
ORGANIZATION | 0.99+ | |
Microsoft | ORGANIZATION | 0.99+ |
FedEx | ORGANIZATION | 0.99+ |
Amazon | ORGANIZATION | 0.99+ |
Jumark Dehghani | PERSON | 0.99+ |
Databricks | ORGANIZATION | 0.99+ |
AWS | ORGANIZATION | 0.99+ |
JG Chirapurath | PERSON | 0.99+ |
first | QUANTITY | 0.99+ |
50 different services | QUANTITY | 0.99+ |
both | QUANTITY | 0.99+ |
Oracle | ORGANIZATION | 0.99+ |
50 different places | QUANTITY | 0.99+ |
MySQL | TITLE | 0.99+ |
one | QUANTITY | 0.99+ |
GlobalMesh | ORGANIZATION | 0.99+ |
Both | QUANTITY | 0.99+ |
first attempt | QUANTITY | 0.99+ |
Second | QUANTITY | 0.99+ |
Last decade | DATE | 0.99+ |
three | QUANTITY | 0.99+ |
three factors | QUANTITY | 0.99+ |
Synapse | ORGANIZATION | 0.99+ |
one way | QUANTITY | 0.99+ |
COVID | OTHER | 0.99+ |
One | QUANTITY | 0.98+ |
first one | QUANTITY | 0.98+ |
first principle | QUANTITY | 0.98+ |
today | DATE | 0.98+ |
Azure Stack | TITLE | 0.98+ |
Azure SQL | TITLE | 0.98+ |
Spark | TITLE | 0.98+ |
First | QUANTITY | 0.98+ |
MongoDB | TITLE | 0.98+ |
2020 | DATE | 0.98+ |
about 200 radios | QUANTITY | 0.98+ |
Moore | PERSON | 0.97+ |
PostgreSQL | TITLE | 0.97+ |
four decades | QUANTITY | 0.97+ |
Arc | TITLE | 0.97+ |
single | QUANTITY | 0.96+ |
Snowflake | ORGANIZATION | 0.96+ |
last decade | DATE | 0.96+ |
Azure Purview | TITLE | 0.95+ |
3 million packages a day | QUANTITY | 0.95+ |
One way | QUANTITY | 0.94+ |
three core | QUANTITY | 0.94+ |