Tony Baer, dbInsight | MongoDB World 2022


 

>>Welcome back to the big apple, everybody. The Cube's continuous coverage here of MongoDB world 2022. We're at the new Javet center. It's it's quite nice. It was built during the pandemic. I believe on top of a former bus terminal. I'm told by our next guest Tony bear, who's the principal at DB insight of data and database expert, longtime analyst, Tony. Good to see you. Thanks for coming >>On. Thanks >>For having us. You face to face >>And welcome to New York. >>Yeah. Right. >>New York is open for business. >>So, yeah. And actually, you know, it's interesting. We've been doing a lot of these events lately and, and especially the ones in Vegas, it's the first time everybody's been out, you know, face to face, not so much here, you know, people have been out and about a lot of masks >>In, >>In New York city, but, but it's good. And, and this new venue is fantastic >>Much nicer than the old Javits. >>Yeah. And I would say maybe 3000 people here. >>Yeah. Probably, but I think like most conferences right now are kind of, they're going through like a slow ramp up. And like for instance, you know, sapphires had maybe about one third, their normal turnout. So I think that you're saying like one third to one half seems to be the norm right now are still figuring out how we're, how and where we're gonna get back together. Yeah. >>I think that's about right. And, and I, but I do think that that in most of the cases that we've seen, it's exceeded people's expectations at tenants, but anyway sure. Let's talk about Mongo, very interesting company. You know, we've been kind of been watching their progression from just sort of document database and all the features and functions they're adding, you just published a piece this morning in venture beat is time for Mongo to get into analytics. Yes. You know? Yes. One of your favorite topics. Well, can they expand analytics? They seem to be doing that. Let's dig into it. Well, >>They're taking, they've been taking slow. They've been taking baby steps and there's good reason for that because first thing is an operational database. The last thing you wanna do is slow it down with very complex analytics. On the other hand, there's huge value to be had if you would, if you could, you know, turn, let's say a smart, if you can turn, let's say an operational database or a transaction database into a smart transaction database. In other words, for instance, you know, let's say if you're, you're, you're doing, you know, an eCommerce site and a customer has made an order, that's basically been out of the norm. Whether it be like, you know, good or bad, it would be nice. Basically, if at that point you could then have a next best action, which is where analytics comes in. But it's a very lightweight form of analytics. It's not gonna, it's actually, I think probably the best metaphor for this is real time credit scoring. It's not that they're doing your scoring you in real time. It's that the model has been computed offline so that when you come on in real time, it can make a smart decision. >>Got it. Okay. So, and I think it was your article where I, I wrote down some examples. Sure. Operational, you know, use cases, patient data. There's certainly retail. We had Forbes on earlier, right? Obviously, so very wide range of, of use cases for operational will, will Mongo, essentially, in your view, is it positioned to replace traditional R D BMS? >>Well, okay. That's a long that's, that's much, it's >>Sort of a loaded question, but >>That's, that's a very loaded question. 
>>I think that for certain cases it will replace the RDBMS, but where I depart from Mongo is that I do not believe they're going to replace all RDBMSs. For instance, when you're doing financial transactions, the world has been used to columns and rows and tables; that's a natural form for something that structured. On the other hand, when you take a look at, let's say, OT data, or at home listings, that tends to more naturally represent itself as documents. So documents are the way you naturally see the world; relational is the way you would structure the world. >>Okay, well, I like that. But in the early days, and even to this day, the target for Mongo has been Oracle. And you talk to a lot of Oracle customers, as do I, and they are running the most mission-critical applications in the world, banking and financial services and so many others. They've kind of carved out that space. But should we be rethinking the definition of mission critical? Is that changing? >>Number one, what we've traditionally associated with mission-critical systems is financial transaction systems and, to a lesser extent, systems that schedule operations. But there are many forms of operations where, for instance, if you're running a social network, do you need the very latest update, or can you read from a server that's eventually consistent? It's just like when you go on Twitter: do you necessarily see all of the latest tweets? The system is not going to crash for that reason. Whereas a banking ATM system had better be current. So there's a delineation. In a social network, arguably that operational system is mission critical, but it's mission critical in a different way than, say, a banking system. >>So coming back to this idea of hybrid, I think Gartner calls it HTAP, hybrid transactional/analytical processing, >>Which has changed by >>The minute, right. You mentioned it in your article, but basically it's bringing analytics to transactions, bringing those roles together. And you're saying that with Mongo it's lightweight. Now, you used two other examples in your article, MySQL HeatWave and, I believe, a Google example as well, AlloyDB. Those, you're saying, are much heavier analytics. Is that correct? >>I'll put it this way. Because they're coming from a relational background, and because they're coming from companies that already have analytic databases, data warehouses if you will, their analytic capabilities are going to be much more fully rounded than what Mongo has at this point. It's not a criticism of MongoDB per se. >>Is that by design, though? Or is that a function of maturity? >>I think it's a function of maturity. >>Oh, okay.
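Tony's home-listings example is easier to picture with a concrete document. The sketch below is illustrative only: the connection string, database, collection, and field names are all hypothetical, and it assumes the PyMongo driver against a MongoDB or Atlas cluster.

```python
from pymongo import MongoClient

# Hypothetical connection string; replace with a real Atlas URI.
client = MongoClient("mongodb+srv://user:pass@cluster0.example.mongodb.net")
listings = client["realestate"]["listings"]

# A home listing maps naturally onto a single document: the nested address,
# the feature list, and the open-house history would otherwise be spread
# across several joined relational tables.
listings.insert_one({
    "address": {"street": "123 Main St", "city": "Hoboken", "state": "NJ"},
    "price": 725_000,
    "features": ["garage", "finished basement"],
    "open_houses": [
        {"date": "2022-06-12", "agent": "J. Smith"},
    ],
})

# Reads use the same query-by-shape style, with dot notation into the nesting.
for doc in listings.find({"address.city": "Hoboken", "price": {"$lt": 800_000}}):
    print(doc["address"]["street"], doc["price"])
```

The point of the shape-for-shape mapping is that the whole listing lives in one document, which is the "documents are how you see the world" framing Tony uses, whereas a relational design would normalize it into several tables up front.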
>>I mean, look, to a certain extent it's also a function of design, in that the document model is not impossible to model for analytics, but it takes more transformation to decide which field in that document is going to be a column. >>Now, the big thing about some of these other hybrid systems is eliminating the need for two databases, eliminating the need for complex ETL. Is that a value proposition that will emerge with Mongo, in your view? >>Put it this way. If you take a look at how Mongo has added more function to its operations, and we're talking about analytics here, for instance adding streaming, adding search, adding time series, that's where they've eliminated the need to do transformation and ETL. But that's not for analytics per se. For analytics, I think that through replication there is still going to be some transformation in terms of turning data that's shaped as a document into something that's represented as columns. There is a form of transformation there. That said, Mongo already has some nascent capability, but it's still at a rev 1.0 level, and I expect a lot more. >>So to refine that: you know how Amazon says that in the fullness of time all workloads will be in the cloud, and we could certainly debate what we mean by cloud. There's an analog for Mongo that I'll ask you about. In the fullness of time, will Mongo be in a position to replace data warehouses or data lakes? And we know the answer is no. But are these two worlds on a quasi collision course? >>I think they're more on a convergence course than a collision course, because, as I said, the first principle of an operational database is that the last thing you want to do is slow it down. The complex modeling you would do in a Databricks, or the very complex analytics you would do in a Snowflake, no matter how much you partition the load in Atlas, and yes, you can have separate nodes, the fact is you really do not want to burden the operational database with that. That's not what it's meant for. What it is meant for is: can I make a smart decision on the spot and close the loop? So there is a form of lightweight analytics you can perform in there. And that's the same principle that, for instance, MySQL HeatWave and AlloyDB are predicated on. They're not meant to replace Exadata or BigQuery; the idea is to do the more lightweight stuff and keep the operational database >>Operating. But from a practitioner's standpoint, you're saying I can and should isolate that node. That's what they'll do. How does that work? Because my understanding is that Mongo specifically, and I think document databases generally, will have a primary node, and then you can set up secondary nodes, which you then have to think about for availability. But would that analytics node be sort of fenced off?
>>Is that part of the... >>Well, that's actually what they've done; they already laid the groundwork for it last year by saying that you can set up separate nodes and dedicate them to analytics. >>As a primary? >>Right, yes, for analytics. And what they are adding this year is that the separate node does not have to be the same instance class as the... >>What does that mean? Explain that. >>In other words, you could have a node for operations that's very IOPS-intensive, whereas you could have a node for analytics that might be more compute-intensive, or more heavily configured with memory. The idea is that you can tailor a node to the workload. So the fact that you can specify a different type of node, a different type of instance, for the analytics node, and I forget what they're calling it, I think is a major step forward. >>And that's enabled by the cloud architecture. >>Of course, yes. Separating compute from data is the starting point, and at that point you can start to go less vanilla. I think the fruition of this is going to be when they say, okay, you can run your operational nodes dedicated, but we'll let you run your analytics nodes serverless. They can't do it yet, but I've got to believe that's on the roadmap. >>Yeah. So SQL brings a lot of overhead, which is why you get MQL, but now square this circle for me, because now you've got Mongo talking SQL. >>They had to start doing that sometime. It's been a critique I've had of them from the get-go: I understand that you're positioning this as an alternative to SQL, and that's perfectly valid, but don't deny the validity of SQL or the reason why we need it. According to the TIOBE index, JavaScript is the seventh most popular language, and SQL follows closely behind as the ninth most popular. Those people exist in the enterprise, and they're disproportionately concentrated in analytics. It's getting a little less so now, with Python and the programmatic side, but there's still a lot of SQL expertise there. It makes no sense for Mongo to ignore or overlook that audience, and I think they're now taking baby steps to start reaching out to them. >>It's interesting, you see it going both ways. Oracle announces a MongoDB-compatible API. I mean, it's just convergence. You called it convergence; I love collisions, you know. >>I know, because you thrive on drama and I thrive on can't-we-all-love-each-other. But the thing is, I wrote about this, I forget whether it was 2014 or 2016, when I was noting the rise of all these specialized databases, and AWS is probably the best exemplar of that.
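To make the workload-isolation point concrete, here is a minimal sketch of steering an analytical aggregation at Atlas analytics nodes from PyMongo. It assumes an Atlas cluster with analytics nodes enabled and relies on the nodeType:ANALYTICS tag Atlas applies to those nodes; the URI, database, and field names are placeholders, not the product's defaults.

```python
from pymongo import MongoClient
from pymongo.read_preferences import Secondary

# Hypothetical URI; assumes an Atlas cluster that has analytics nodes enabled.
client = MongoClient("mongodb+srv://user:pass@cluster0.example.mongodb.net")

# A secondary read preference with a tag set steers this workload away from
# the primary and the secondaries serving operational traffic.
analytics = client.get_database(
    "realestate",
    read_preference=Secondary(tag_sets=[{"nodeType": "ANALYTICS"}]),
)

# A lightweight aggregation: pick the document fields that act as "columns"
# and compute a per-city average price.
pipeline = [
    {"$project": {"city": "$address.city", "price": 1}},
    {"$group": {"_id": "$city", "avg_price": {"$avg": "$price"}, "n": {"$sum": 1}}},
    {"$sort": {"avg_price": -1}},
]
for row in analytics["listings"].aggregate(pipeline):
    print(row)
```

The read preference keeps the heavier aggregation off the operational nodes, which is the "don't slow down the operational database" principle Tony describes, and the $project stage is where you decide which document fields behave as columns.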
I've got 15 or 16 or however, number of databases and they're all dedicated purpose. Right. But I also was, you know, basically saw that inevitably there was gonna be some overlap. It's not that all databases were gonna become one and the same we're gonna be, we're gonna become back into like the, you know, into a pan G continent or something like that. But that you're gonna have a relational database that can do JSON and, and a, and a document database that can do relational. I mean, you know, it's, to me, that's a no brainer. >>So I asked Andy Ja one time, I'd love to get your take on this, about those, you know, multiple data stores at the time. They probably had a thousand. I think they're probably up to 15 now, right? Different APIs, different S et cetera. And his response. I said, why don't you make it easier for, for customers and maybe build an abstraction or converge these? And he said, well, it's by design. What if you buy this? And, and what your thoughts are, cuz I, you know, he's a pretty straight shooter. Yeah. It's by design because it allows us as the market moves, we can move with it. And if we, if we give developers access to those low level primitives and APIs, then they can move with, with at market speed. Right. And so that again, by design, now we heard certainly Mongo poo pooing that today they didn't mention, they didn't call out Amazon. Yeah. Oracle has no compunction about specifically calling out Amazon. They do it all the time. What do you make of that? Can't Amazon have its cake and eat it too. In other words, extend some of the functionality of those specific databases without going to the Swiss army. >>I I'll put it this way. You, you kind of tapped in you're, you're sort of like, you know, killing me softly with your song there, which is that, you know, I was actually kind of went on a rant about this, actually know in, you know, come, you know, you know, my year ahead sort of out predictions. And I said, look, cloud folks, it's great that you're making individual SAS, you know, products easy to use. But now that I have to mix and match SAS products, you know, the burden of integration is on my shoulders. Start making my life easier. I think a good, you know, a good example of this would be, you know, for instance, you could take something like, you know, let's say like a Google big query. There's no reason why I can't have a piece of that that might, you know, might be paired, say, you know, say with span or something like that. >>The idea being is that if we're all working off a common, you know, common storage, we, you know, it's in cloud native, we can separate the computer engines. It means that we can use the right engine for the right part of the task. And the thing is that maybe, you know, myself as a consumer, I should not have to be choosing between big query and span. But the thing is, I should be able to say, look, I want to, you know, globally distribute database, but I also wanna do some analytics and therefore behind the scenes, you know, new microservices, it could connect the two wouldn't >>Microsoft synapse be an example of doing that. >>It should be an example. I wish I, I would love to hear more from Microsoft about this. They've been radio silent for about the past two or three years in data. You hardly hear about it, but synapse is actually those actually one of the ideas I had in mind now keep in mind that with synapse, you're not talking about, let's say, you know, I mean, it's, it's obviously a sequel data warehouse. 
It's not pure spark. It's basically their, it was their curated version of spark, but that's fine. But again, I would love to hear Microsoft talk more about that. They've been very quiet. >>Yeah. You, you, the intent is there to >>Simplify >>It exactly. And create an abstraction. Exactly. Yeah. They have been quiet about it. Yeah. Yeah. You would expect that, that maybe they're still trying to figure it out. So what's your prognosis from Mongo? I mean, since this company IP, you know, usually I, I tell and I tell everybody this, especially my kids, like don't buy a stock at IPO. You'll always get a better chance at a cheaper price to buy it. Yeah. And even though that was true with Mongo, you didn't have a big window. No. Like you did, for instance, with, with Facebook, certainly that's been the case with snowflake and sure. Alibaba, I mean, I name a zillion style was almost universal. Yeah. But, but since that, that, that first, you know, few months, period, this, this company has been on a roll. Right. And it, it obviously has been some volatility, but the execution has been outstanding. >>No question about that. I mean, the thing is, look what I, what I, and I'm just gonna talk on the product side on the sales side. Yeah. But on the product side, from the get go, they made a product that was easy for developers. Whereas let's say someone's giving an example, for instance, Cosmo CB, where to do certain operations. They had to go through multiple services in, you know, including Azure portal with Atlas, it's all within Atlas. So they've really, it's been kinda like design thinking from the start initially with, with the core Mongo DB, you know, you, the on premise, both this predates Atlas, I mean, part of it was that they were coming with a language that developers knew was just Javas script. The construct that they knew, which was JS on. So they started with that home core advantage, but they weren't the only ones doing that. But they did it with tooling that was very intuitive to developers that met developers, where they lived and what I give them, you know, then additional credit for is that when they went to the cloud and it wasn't an immediate thing, Atlas was not an overnight success, but they employed that same design thinking to Atlas, they made Atlas a good cloud experience. They didn't just do a lift and shift the cloud. And so that's why today basically like five or six years later, Atlas's most of their business. >>Yeah. It's what, 60% of the business now. Yeah. And then Dave, on the, on the earning scholar, maybe it wasn't Dave and somebody else in response to question said, yeah, ultimately this is the future will be be 90% of the business. I'm not gonna predict when. So my, my question is, okay, so let's call that the midterm midterm ATLA is gonna be 90% of the business with some exceptions that people just won't move to the cloud. What's next is the edge. A new opportunity is Mongo architecturally suited for the, I mean, it's certainly suited for the right, the home Depot store. Sure. You know, at the edge. Yeah. If you, if you consider that edge, which I guess it is form of edge, but how about the far edge EVs cell towers, you know, far side, real time, AI inferencing, what's the requirement there, can Mongo fit there? Any thoughts >>On that? I think the AI and the inferencing stuff is interesting. It's something which really Mongo has not tackled yet. I think we take the same principle, which is the lightweight stuff. 
>>In other words, doing, let's say, a classification or a prediction or some sort of prescriptive action, where you're not running convolutional neural networks and trying to do text-to-speech or the reverse. You're not trying to do all the really fancy stuff. If you keep it simple, kind of the KISS principle, I think that's very much within Mongo's future. With Realm, they basically have the infrastructure to go out to the edge, and the fact that they've embraced GraphQL has also made them a lot more extensible. So I do see the edge as being in their pathway, and I do see lightweight analytics and, let's say, lightweight machine learning definitely in their >>Future. And would you agree that they're in a better position to tap that opportunity than, say, a Snowflake or an Oracle? Maybe M&A can change that, R&D can maybe change that, but fundamentally, from an architectural standpoint, are they in a better position? >>Good question. I think that Snowflake, by virtue of the fact that they've been all-cloud from the start, will find it more difficult, not impossible, to move out to the edge. And as I said, they're starting to make some tentative moves in that direction; I'm looking forward to next week to hearing what they're going to say about that. But to answer your question directly, I'd say that right now Mongo probably has a head start there. >>I'm losing track of time, I could go forever with you. Tony Baer, dbInsight, with tons of insights. Thanks so much for coming back on. >>It's only one insight, Dave. Good to see you again. >>All right, good to see you. Thank you. Okay, keep it right there. Right back at the Javits Center, MongoDB World 2022, you're watching theCUBE.
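As a coda, the "real-time credit scoring" metaphor from earlier in the conversation, where the model is trained offline and only applied at decision time, reduces to a few lines. Everything below is illustrative: the feature names, weights, and threshold are made up, and a real deployment would load them from wherever the offline training job publishes them.

```python
import math

# A minimal sketch of the "scored offline, applied in real time" pattern:
# a model trained elsewhere is reduced to a handful of coefficients, and the
# operational path only does a cheap weighted sum plus a threshold check.
WEIGHTS = {"order_value": 0.004, "orders_last_30d": -0.35, "new_device": 1.2}
BIAS = -2.0
THRESHOLD = 0.5

def risk_score(order: dict) -> float:
    """Logistic score from precomputed weights; no model training at request time."""
    z = BIAS + sum(w * float(order.get(k, 0)) for k, w in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))

def next_best_action(order: dict) -> str:
    """Close the loop: flag out-of-the-norm orders for review, wave the rest through."""
    return "manual_review" if risk_score(order) > THRESHOLD else "auto_approve"

print(next_best_action({"order_value": 950, "orders_last_30d": 0, "new_device": 1}))
```

The operational path does nothing heavier than a weighted sum and a threshold check, which is the kind of lightweight, in-the-moment analytics the interview keeps coming back to.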

Published Date : Jun 7 2022



Tony Baer, Doug Henschen and Sanjeev Mohan, Couchbase | Couchbase Application Modernization


 

(upbeat music) >> Welcome to this CUBE Power Panel where we're going to talk about application modernization, also success templates, and take a look at some new survey data to see how CIOs are thinking about digital transformation, as we get deeper into the post isolation economy. And with me are three familiar VIP guests to CUBE audiences. Tony Bear, the principal at DB InSight, Doug Henschen, VP and principal analyst at Constellation Research and Sanjeev Mohan principal at SanjMo. Guys, good to see you again, welcome back. >> Thank you. >> Glad to be here. >> Thanks for having us. >> Glad to be here. >> All right, Doug. Let's get started with you. You know, this recent survey, which was commissioned by Couchbase, 650 CIOs and CTOs, and IT practitioners. So obviously very IT heavy. They responded to the following question, "In response to the pandemic, my organization accelerated our application modernization strategy and of course, an overwhelming majority, 94% agreed or strongly agreed." So I'm sure, Doug, that you're not shocked by that, but in the same survey, modernizing existing technologies was second only behind cyber security is the top investment priority this year. Doug, bring us into your world and tell us the trends that you're seeing with the clients and customers you work with in their modernization initiatives. >> Well, the survey, of course, is spot on. You know, any Constellation Research analyst, any systems integrator will tell you that we saw more transformation work in the last two years than in the prior six to eight years. A lot of it was forced, you know, a lot of movement to the cloud, a lot of process improvement, a lot of automation work, but transformational is aspirational and not every company can be a leader. You know, at Constellation, we focus our research on those market leaders and that's only, you know, the top 5% of companies that are really innovating, that are really disrupting their markets and we try to share that with companies that want to be fast followers, that these are the next 20 to 25% of companies that don't want to get left behind, but don't want to hit some of the same roadblocks and you know, pioneering pitfalls that the real leaders are encountering when they're harnessing new technologies. So the rest of the companies, you know, the cautious adopters, the laggards, many of them fall by the wayside, that's certainly what we saw during the pandemic. Who are these leaders? You know, the old saw examples that people saw at the Amazons, the Teslas, the Airbnbs, the Ubers and Lyfts, but new examples are emerging every year. And as a consumer, you immediately recognize these transformed experiences. One of my favorite examples from the pandemic is Rocket Mortgage. No disclaimer required, I don't own stock and you're not client, but when I wanted to take advantage of those record low mortgage interest rates, I called my current bank and some, you know, stall word, very established conventional banks, I'm talking to you Bank of America, City Bank, and they were taking days and weeks to get back to me. Rocket Mortgage had the locked in commitment that day, a very proactive, consistent communications across web, mobile, email, all customer touchpoints. I closed in a matter of weeks an entirely digital seamless process. This is back in the gloves and masks days and the loan officer came parked in our driveway, wiped down an iPad, handed us that iPad, we signed all those documents digitally, completely electronic workflow. 
The only wet signatures required were those demanded by the state. So it's easy to spot these transformed experiences. You know, Rocket had most of that in place before the pandemic, and that's why they captured 8% of the national mortgage market by 2020 and they're on track to hit 10% here in 2022. >> Yeah, those are great examples. I mean, I'm not a shareholder either, but I am a customer. I even went through the same thing in the pandemic. It was all done in digital it was a piece of cake and I happened to have to do another one with a different firm and stuck with that firm for a variety of reasons and it was night and day. So to your point, it was a forced merge to digital. If you were there beforehand, you had real advantage, it could accelerate your lead during the pandemic. Okay, now Tony bear. Mr. Bear, I understand you're skeptical about all this buzz around digital transformation. So in that same survey, the data shows that the majority of respondents said that their digital initiatives were largely reactive to outside forces, the pandemic compliance changes, et cetera. But at the same time, they indicated that the results while somewhat mixed were generally positive. So why are you skeptical? >> The reason being, and by the way, I have nothing against application modernization. The problem... I think the problem I ever said, it often gets conflated with digital transformation and digital transformation itself has become such a buzzword and so overused that it's really hard, if not impossible to pin down (coughs) what digital transformation actually means. And very often what you'll hear from, let's say a C level, you know, (mumbles) we want to run like Google regardless of whether or not that goal is realistic you know, for that organization (coughs). The thing is that we've been using, you know, businesses have been using digital data since the days of the mainframe, since the... Sorry that data has been digital. What really has changed though, is just the degree of how businesses interact with their customers, their partners, with the whole rest of the ecosystem and how their business... And how in many cases you take look at the auto industry that the nature of the business, you know, is changing. So there is real change of foot, the question is I think we need to get more specific in our goals. And when you look at it, if we can boil it down to a couple, maybe, you know, boil it down like really over simplistically, it's really all about connectedness. No, I'm not saying connectivity 'cause that's more of a physical thing, but connectedness. Being connected to your customer, being connected to your supplier, being connected to the, you know, to the whole landscape, that you operate in. And of course today we have many more channels with which we operate, you know, with customers. And in fact also if you take a look at what's happening in the automotive industry, for instance, I was just reading an interview with Bill Ford, you know, their... Ford is now rapidly ramping up their electric, you know, their electric vehicle strategy. And what they realize is it's not just a change of technology, you know, it is a change in their business, it's a change in terms of the relationship they have with their customer. Their customers have traditionally been automotive dealers who... And the automotive dealers have, you know, traditionally and in many cases by state law now have been the ones who own the relationship with the end customer. 
But when you go to an electric vehicle, the product becomes a lot more of a software product. And in turn, that means that Ford would have much more direct interaction with its end customers. So that's really what it's all about. It's about, you know, connectedness, it's also about the ability to act, you know, we can say agility, it's about ability not just to react, but to anticipate and act. And so... And of course with all the proliferation, you know, the explosion of data sources and connectivity out there and the cloud, which allows much more, you know, access to compute, it changes the whole nature of the ball game. The fact is that we have to avoid being overwhelmed by this and make our goals more, I guess, tangible, more strictly defined. >> Yeah, now... You know, great points there. And I want to just bring in some survey data, again, two thirds of the respondents said their digital strategies were set by IT and only 26% by the C-suite, 8% by the line of business. Now, this was largely a survey of CIOs and CTOs, but, wow, doesn't seem like the right mix. It's a Doug's point about, you know, leaders in lagers. My guess is that Rocket Mortgage, their digital strategy was led by the chief digital officer potentially. But at the same time, you would think, Tony, that application modernization is a prerequisite for digital transformation. But I want to go to Sanjeev in this war in the survey. And respondents said that on average, they want 58% of their IT spend to be in the public cloud three years down the road. Now, again, this is CIOs and CTOs, but (mumbles), but that's a big number. And there was no ambiguity because the question wasn't worded as cloud, it was worded as public cloud. So Sanjeev, what do you make of that? What's your feeling on cloud as flexible architecture? What does this all mean to you? >> Dave, 58% of IT spend in the cloud is a huge change from today. Today, most estimates, peg cloud IT spend to be somewhere around five to 15%. So what this number tells us is that the cloud journey is still in its early days, so we should buckle up. We ain't seen nothing yet, but let me add some color to this. CIOs and CTOs maybe ramping up their cloud deployment, but they still have a lot of problems to solve. I can tell you from my previous experience, for example, when I was in Gartner, I used to talk to a lot of customers who were in a rush to move into the cloud. So if we were to plot, let's say a maturity model, typically a maturity model in any discipline in IT would have something like crawl, walk, run. So what I was noticing was that these organizations were jumping straight to run because in the pandemic, they were under the gun to quickly deploy into the cloud. So now they're kind of coming back down to, you know, to crawl, walk, run. So basically they did what they had to do under the circumstances, but now they're starting to resolve some of the very, very important issues. For example, security, data privacy, governance, observability, these are all very big ticket items. Another huge problem that nav we are noticing more than we've ever seen, other rising costs. Cloud makes it so easy to onboard new use cases, but it leads to all kinds of unexpected increase in spikes in your operating expenses. So what we are seeing is that organizations are now getting smarter about where the workloads should be deployed. And sometimes it may be in more than one cloud. Multi-cloud is no longer an aspirational thing. 
So that is a huge trend that we are seeing and that's why you see there's so much increased planning to spend money in public cloud. We do have some issues that we still need to resolve. For example, multi-cloud sounds great, but we still need some sort of single pane of glass, control plane so we can have some fungibility and move workloads around. And some of this may also not be in public cloud, some workloads may actually be done in a more hybrid environment. >> Yeah, definitely. I call it Supercloud. People win sometimes-- >> Supercloud. >> At that term, but it's above multi-cloud, it floats, you know, on topic. But so you clearly identified some potholes. So I want to talk about the evolution of the application experience 'cause there's some potholes there too. 81% of their respondents in that survey said, "Our development teams are embracing the cloud and other technologies faster than the rest of the organization can adopt and manage them." And that was an interesting finding to me because you'd think that infrastructure is code and designing insecurity and containers and Kubernetes would be a great thing for organizations, and it is I'm sure in terms of developer productivity, but what do you make of this? Does the modernization path also have some potholes, Sanjeev? What are those? >> So, first of all, Dave, you mentioned in your previous question, there's no ambiguity, it's a public cloud. This one, I feel it has quite a bit of ambiguity because it talks about cloud and other technologies, that sort of opens up the kimono, it's like that's everything. Also, it says that the rest of the organization is not able to adopt and manage. Adoption is a business function, management is an IT function. So I feed this question is a bit loaded. We know that app modernization is here to stay, developing in the cloud removes a lot of traditional barriers or procuring instantiating infrastructure. In addition, developers today have so many more advanced tools. So they're able to develop the application faster because they have like low-code/no-code options, they have notebooks to write the machine learning code, they have the entire DevOps CI/CD tool chain that makes it easy to version control and push changes. But there are potholes. For example, are developers really interested in fixing data quality problems, all data, privacy, data, access, data governance? How about monitoring? I doubt developers want to get encumbered with all of these operationalization management pieces. Developers are very keen to deliver new functionality. So what we are now seeing is that it is left to the data team to figure out all of these operationalization productionization things that the developers have... You know, are not truly interested in that. So which actually takes me to this topic that, Dave, you've been quite actively covering and we've been talking about, see, the whole data mesh. >> Yeah, I was going to say, it's going to solve all those data quality problems, Sanjeev. You know, I'm a sucker for data mesh. (laughing) >> Yeah, I know, but see, what's going to happen with data mesh is that developers are now going to have more domain resident power to develop these applications. What happens to all of the data curation governance quality that, you know, a central team used to do. So there's a lot of open ended questions that still need to be answered. >> Yeah, That gets automated, Tony, right? With computational governance. So-- >> Of course. 
>> It's not trivial, it's not trivial, but I'm still an optimist by the end of the decade we'll start to get there. Doug, I want to go to you again and talk about the business case. We all remember, you know, the business case for modernization that is... We remember the Y2K, there was a big it spending binge and this was before the (mumbles) of the enterprise, right? CIOs, they'd be asked to develop new applications and the business maybe helps pay for it or offset the cost with the initial work and deployment then IT got stuck managing the sprawling portfolio for years. And a lot of the apps had limited adoption or only served a few users, so there were big pushes toward rationalizing the portfolio at that time, you know? So do I modernize, they had to make a decision, consolidate, do I sunset? You know, it was all based on value. So what's happening today and how are businesses making the case to modernize, are they going through a similar rationalization exercise, Doug? >> Well, the Y2K era experience that you talked about was back in the days of, you know, throw the requirements over the wall and then we had waterfall development that lasted months in some cases years. We see today's most successful companies building cross functional teams. You know, the C-suite the line of business, the operations, the data and analytics teams, the IT, everybody has a seat at the table to lead innovation and modernization initiatives and they don't start, the most successful companies don't start by talking about technology, they start by envisioning a business outcome by envisioning a transformed customer experience. You hear the example of Amazon writing the press release for the product or service it wants to deliver and then it works backwards to create it. You got to work backwards to determine the tech that will get you there. What's very clear though, is that you can't transform or modernize by lifting and shifting the legacy mess into the cloud. That doesn't give you the seamless processes, that doesn't give you data driven personalization, it doesn't give you a connected and consistent customer experience, whether it's online or mobile, you know, bots, chat, phone, everything that we have today that requires a modern, scalable cloud negative approach and agile deliver iterative experience where you're collaborating with this cross-functional team and course correct, again, making sure you're on track to what's needed. >> Yeah. Now, Tony, both Doug and Sanjeev have been, you know, talking about what I'm going to call this IT and business schism, and we've all done surveys. One of the things I'd love to see Couchbase do in future surveys is not only survey the it heavy, but also survey the business heavy and see what they say about who's leading the digital transformation and who's in charge of the customer experience. Do you have any thoughts on that, Tony? >> Well, there's no question... I mean, it's kind like, you know, the more things change. I mean, we've been talking about that IT and the business has to get together, we talked about this back during, and Doug, you probably remember this, back during the Y2K ERP days, is that you need these cross functional teams, we've been seeing this. I think what's happening today though, is that, you know, back in the Y2K era, we were basically going into like our bedrock systems and having to totally re-engineer them. 
And today what we're looking at is that, okay, those bedrock systems, the ones that basically are keeping the lights on, okay, those are there, we're not going to mess with that, but on top of that, that's where we're going to innovate. And that gives us a chance to be more, you know, more directed and therefore we can bring these related domains together. I mean, that's why just kind of, you know, talk... Where Sanjeev brought up the term of data mesh, I've been a bit of a cynic about data mesh, but I do think that work and work is where we bring a bunch of these connected teams together, teams that have some sort of shared context, though it's everybody that's... Every team that's working, let's say around the customer, for instance, which could be, you know, in marketing, it could be in sales, order processing in some cases, you know, in logistics and delivery. So I think that's where I think we... You know, there's some hope and the fact is that with all the advanced, you know, basically the low-code/no-code tools, they are ways to bring some of these other players, you know, into the process who previously had to... Were sort of, you know, more at the end of like a, you know, kind of a... Sort of like they throw it over the wall type process. So I do believe, but despite all my cynicism, I do believe there's some hope. >> Thank you. Okay, last question. And maybe all of you could answer this. Maybe, Sanjeev, you can start it off and then Doug and Tony can chime in. In the survey, about a half, nearly half of the 650 respondents said they could tangibly show their organizations improve customer experiences that were realized from digital projects in the last 12 months. Now, again, not surprising, but we've been talking about digital experiences, but there's a long way to go judging from our pandemic customer experiences. And we, again, you know, some were great, some were terrible. And so, you know, and some actually got worse, right? Will that improve? When and how will it improve? Where's 5G and things like that fit in in terms of improving customer outcomes? Maybe, Sanjeev, you could start us off here. And by the way, plug any research that you're working on in this sort of area, please do. >> Thank you, Dave. As a resident optimist on this call, I'll get us started and then I'm sure Doug and Tony will have interesting counterpoints. So I'm a technology fan boy, I have to admit, I am in all of all these new companies and how they have been able to rise up and handle extreme scale. In this time that we are speaking on this show, these food delivery companies would have probably handled tens of thousands of orders in minutes. So these concurrent orders, delivery, customer support, geospatial location intelligence, all of this has really become commonplace now. It used to be that, you know, large companies like Apple would be able to handle all of these supply chain issues, disruptions that we've been facing. But now in my opinion, I think we are seeing this in, Doug mentioned Rocket Mortgage. So we've seen it in FinTech and shopping apps. So we've seen the same scale and it's more than 5G. It includes things like... Even in the public cloud, we have much more efficient, better hardware, which can do like deep learning networks much more efficiently. So machine learning, a lot of natural language programming, being able to handle unstructured data. 
So in my opinion, it's quite phenomenal to see how technology has actually come to rescue and as, you know, billions of us have gone online over the last two years. >> Yeah, so, Doug, so Sanjeev's point, he's saying, basically, you ain't seen nothing yet. What are your thoughts here, your final thoughts. >> Well, yeah, I mean, there's some incredible technologies coming including 5G, but you know, it's only going to pave the cow path if the underlying app, if the underlying process is clunky. You have to modernize, take advantage of, you know, serverless scalability, autonomous optimization, advanced data science. There's lots of cutting edge capabilities out there today, but you know, lifting and shifting you got to get your hands dirty and actually modernize on that data front. I mentioned my research this year, I'm doing a lot of in depth looks at some of the analytical data platforms. You know, these lake houses we've had some conversations about that and helping companies to harness their data, to have a more personalized and predictive and proactive experience. So, you know, we're talking about the Snowflakes and Databricks and Googles and Teradata and Vertica and Yellowbrick and that's the research I'm focusing on this year. >> Yeah, your point about paving the cow path is right on, especially over the pandemic, a lot of the processes were unknown. But you saw this with RPA, paving the cow path only got you so far. And so, you know, great points there. Tony, you get the last word, bring us home. >> Well, I'll put it this way. I think there's a lot of hope in terms of that the new generation of developers that are coming in are a lot more savvy about things like data. And I think also the new generation of people in the business are realizing that we need to have data as a core competence. So I do have optimism there that the fact is, I think there is a much greater consciousness within both the business side and the technical. In the technology side, the organization of the importance of data and how to approach that. And so I'd like to just end on that note. >> Yeah, excellent. And I think you're right. Putting data at the core is critical data mesh I think very well describes the problem and (mumbles) credit lays out a solution, just the technology's not there yet, nor are the standards. Anyway, I want to thank the panelists here. Amazing. You guys are always so much fun to work with and love to have you back in the future. And thank you for joining today's broadcast brought to you by Couchbase. By the way, check out Couchbase on the road this summer at their application modernization summits, they're making up for two years of shut in and coming to you. So you got to go to couchbase.com/roadshow to find a city near you where you can meet face to face. In a moment. Ravi Mayuram, the chief technology officer of Couchbase will join me. You're watching theCUBE, the leader in high tech enterprise coverage. (bright music)

Published Date : May 19 2022



Analyst Predictions 2023: The Future of Data Management


 

(upbeat music) >> Hello, this is Dave Valente with theCUBE, and one of the most gratifying aspects of my role as a host of "theCUBE TV" is I get to cover a wide range of topics. And quite often, we're able to bring to our program a level of expertise that allows us to more deeply explore and unpack some of the topics that we cover throughout the year. And one of our favorite topics, of course, is data. Now, in 2021, after being in isolation for the better part of two years, a group of industry analysts met up at AWS re:Invent and started a collaboration to look at the trends in data and predict what some likely outcomes will be for the coming year. And it resulted in a very popular session that we had last year focused on the future of data management. And I'm very excited and pleased to tell you that the 2023 edition of that predictions episode is back, and with me are five outstanding market analyst, Sanjeev Mohan of SanjMo, Tony Baer of dbInsight, Carl Olofson from IDC, Dave Menninger from Ventana Research, and Doug Henschen, VP and Principal Analyst at Constellation Research. Now, what is it that we're calling you, guys? A data pack like the rat pack? No, no, no, no, that's not it. It's the data crowd, the data crowd, and the crowd includes some of the best minds in the data analyst community. They'll discuss how data management is evolving and what listeners should prepare for in 2023. Guys, welcome back. Great to see you. >> Good to be here. >> Thank you. >> Thanks, Dave. (Tony and Dave faintly speaks) >> All right, before we get into 2023 predictions, we thought it'd be good to do a look back at how we did in 2022 and give a transparent assessment of those predictions. So, let's get right into it. We're going to bring these up here, the predictions from 2022, they're color-coded red, yellow, and green to signify the degree of accuracy. And I'm pleased to report there's no red. Well, maybe some of you will want to debate that grading system. But as always, we want to be open, so you can decide for yourselves. So, we're going to ask each analyst to review their 2022 prediction and explain their rating and what evidence they have that led them to their conclusion. So, Sanjeev, please kick it off. Your prediction was data governance becomes key. I know that's going to knock you guys over, but elaborate, because you had more detail when you double click on that. >> Yeah, absolutely. Thank you so much, Dave, for having us on the show today. And we self-graded ourselves. I could have very easily made my prediction from last year green, but I mentioned why I left it as yellow. I totally fully believe that data governance was in a renaissance in 2022. And why do I say that? You have to look no further than AWS launching its own data catalog called DataZone. Before that, mid-year, we saw Unity Catalog from Databricks went GA. So, overall, I saw there was tremendous movement. When you see these big players launching a new data catalog, you know that they want to be in this space. And this space is highly critical to everything that I feel we will talk about in today's call. Also, if you look at established players, I spoke at Collibra's conference, data.world, work closely with Alation, Informatica, a bunch of other companies, they all added tremendous new capabilities. So, it did become key. The reason I left it as yellow is because I had made a prediction that Collibra would go IPO, and it did not. And I don't think anyone is going IPO right now. 
The market is really, really down, the funding in VC IPO market. But other than that, data governance had a banner year in 2022. >> Yeah. Well, thank you for that. And of course, you saw data clean rooms being announced at AWS re:Invent, so more evidence. And I like how the fact that you included in your predictions some things that were binary, so you dinged yourself there. So, good job. Okay, Tony Baer, you're up next. Data mesh hits reality check. As you see here, you've given yourself a bright green thumbs up. (Tony laughing) Okay. Let's hear why you feel that was the case. What do you mean by reality check? >> Okay. Thanks, Dave, for having us back again. This is something I just wrote and just tried to get away from, and this just a topic just won't go away. I did speak with a number of folks, early adopters and non-adopters during the year. And I did find that basically that it pretty much validated what I was expecting, which was that there was a lot more, this has now become a front burner issue. And if I had any doubt in my mind, the evidence I would point to is what was originally intended to be a throwaway post on LinkedIn, which I just quickly scribbled down the night before leaving for re:Invent. I was packing at the time, and for some reason, I was doing Google search on data mesh. And I happened to have tripped across this ridiculous article, I will not say where, because it doesn't deserve any publicity, about the eight (Dave laughing) best data mesh software companies of 2022. (Tony laughing) One of my predictions was that you'd see data mesh washing. And I just quickly just hopped on that maybe three sentences and wrote it at about a couple minutes saying this is hogwash, essentially. (laughs) And that just reun... And then, I left for re:Invent. And the next night, when I got into my Vegas hotel room, I clicked on my computer. I saw a 15,000 hits on that post, which was the most hits of any single post I put all year. And the responses were wildly pro and con. So, it pretty much validates my expectation in that data mesh really did hit a lot more scrutiny over this past year. >> Yeah, thank you for that. I remember that article. I remember rolling my eyes when I saw it, and then I recently, (Tony laughing) I talked to Walmart and they actually invoked Martin Fowler and they said that they're working through their data mesh. So, it takes a really lot of thought, and it really, as we've talked about, is really as much an organizational construct. You're not buying data mesh >> Bingo. >> to your point. Okay. Thank you, Tony. Carl Olofson, here we go. You've graded yourself a yellow in the prediction of graph databases. Take off. Please elaborate. >> Yeah, sure. So, I realized in looking at the prediction that it seemed to imply that graph databases could be a major factor in the data world in 2022, which obviously didn't become the case. It was an error on my part in that I should have said it in the right context. It's really a three to five-year time period that graph databases will really become significant, because they still need accepted methodologies that can be applied in a business context as well as proper tools in order for people to be able to use them seriously. But I stand by the idea that it is taking off, because for one thing, Neo4j, which is the leading independent graph database provider, had a very good year. 
And also, we're seeing interesting developments in terms of things like AWS with Neptune and with Oracle providing graph support in Oracle database this past year. Those things are, as I said, growing gradually. There are other companies like TigerGraph and so forth, that deserve watching as well. But as far as becoming mainstream, it's going to be a few years before we get all the elements together to make that happen. Like any new technology, you have to create an environment in which ordinary people without a whole ton of technical training can actually apply the technology to solve business problems. >> Yeah, thank you for that. These specialized databases, graph databases, time series databases, you see them embedded into mainstream data platforms, but there's a place for these specialized databases, I would suspect we're going to see new types of databases emerge with all this cloud sprawl that we have and maybe to the edge. >> Well, part of it is that it's not as specialized as you might think it. You can apply graphs to great many workloads and use cases. It's just that people have yet to fully explore and discover what those are. >> Yeah. >> And so, it's going to be a process. (laughs) >> All right, Dave Menninger, streaming data permeates the landscape. You gave yourself a yellow. Why? >> Well, I couldn't think of a appropriate combination of yellow and green. Maybe I should have used chartreuse, (Dave laughing) but I was probably a little hard on myself making it yellow. This is another type of specialized data processing like Carl was talking about graph databases is a stream processing, and nearly every data platform offers streaming capabilities now. Often, it's based on Kafka. If you look at Confluent, their revenues have grown at more than 50%, continue to grow at more than 50% a year. They're expected to do more than half a billion dollars in revenue this year. But the thing that hasn't happened yet, and to be honest, they didn't necessarily expect it to happen in one year, is that streaming hasn't become the default way in which we deal with data. It's still a sidecar to data at rest. And I do expect that we'll continue to see streaming become more and more mainstream. I do expect perhaps in the five-year timeframe that we will first deal with data as streaming and then at rest, but the worlds are starting to merge. And we even see some vendors bringing products to market, such as K2View, Hazelcast, and RisingWave Labs. So, in addition to all those core data platform vendors adding these capabilities, there are new vendors approaching this market as well. >> I like the tough grading system, and it's not trivial. And when you talk to practitioners doing this stuff, there's still some complications in the data pipeline. And so, but I think, you're right, it probably was a yellow plus. Doug Henschen, data lakehouses will emerge as dominant. When you talk to people about lakehouses, practitioners, they all use that term. They certainly use the term data lake, but now, they're using lakehouse more and more. What's your thoughts on here? Why the green? What's your evidence there? >> Well, I think, I was accurate. I spoke about it specifically as something that vendors would be pursuing. And we saw yet more lakehouse advocacy in 2022. Google introduced its BigLake service alongside BigQuery. Salesforce introduced Genie, which is really a lakehouse architecture. 
And it was a safe prediction to say vendors are going to be pursuing this in that AWS, Cloudera, Databricks, Microsoft, Oracle, SAP, Salesforce now, IBM, all advocate this idea of a single platform for all of your data. Now, the trend was also supported in that we saw a big embrace of Apache Iceberg in 2022. That's a structured table format. It's used with these lakehouse platforms. It's open, so it ensures portability and it also ensures performance. And that's a structured table that helps with the warehouse side performance. But among those announcements, Snowflake, Google, Cloudera, SAP, Salesforce, IBM, all embraced Iceberg. But keep in mind, again, I'm talking about this as something that vendors are pursuing as their approach. So, they're advocating to end users. It's very cutting edge. I'd say the top, leading edge, 5% of companies have really embraced the lakehouse. I think, we're now seeing the fast followers, the next 20 to 25% of firms embracing this idea and embracing a lakehouse architecture. I recall Christian Kleinerman at the big Snowflake event last summer, making the announcement about Iceberg, and he asked for a show of hands for any of you in the audience at the keynote, have you heard of Iceberg? And just a smattering of hands went up. So, the vendors are ahead of the curve. They're pushing this trend, and we're now seeing a little bit more mainstream uptake. >> Good. Doug, I was there. It was you, me, and I think, two other hands were up. That was just humorous. (Doug laughing) All right, well, so I liked the fact that we had some yellow and some green. When you think about these things, there's the prediction itself. Did it come true or not? There are the sub predictions that you guys make, and of course, the degree of difficulty. So, thank you for that open assessment. All right, let's get into the 2023 predictions. Let's bring up the predictions. Sanjeev, you're going first. You've got a prediction around unified metadata. What's the prediction, please? >> So, my prediction is that the metadata space is currently a mess. It needs to get unified. There are too many use cases of metadata, which are being addressed by disparate systems. For example, data quality has become really big in the last couple of years, data observability, the whole catalog space is actually, people don't like to use the word data catalog anymore, because data catalog sounds like it's a catalog, a museum, if you may, of metadata that you go and admire. So, what I'm saying is that in 2023, we will see that metadata will become the driving force behind things like data ops, things like orchestration of tasks using metadata, not rules. Not saying that if this fails, then do this, if this succeeds, go do that. But it's like getting to the metadata level, and then making a decision as to what to orchestrate, what to automate, how to do data quality checks, data observability. So, this space is starting to gel, and I see there'll be more maturation in the metadata space. Even security privacy, some of these topics, which are handled separately. And I'm just talking about data security and data privacy. I'm not talking about infrastructure security. These also need to merge into a unified metadata management piece with some knowledge graph, semantic layer on top, so you can do analytics on it. So, it's no longer something that sits on the side, it's limited in its scope. It is actually the very engine, the very glue that is going to connect data producers and consumers. >> Great.
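To make the orchestration-by-metadata idea concrete, here is a minimal Python sketch. The catalog record, field names, and thresholds are all illustrative assumptions rather than any vendor's schema or API; the point is simply that the pipeline's next action is derived from metadata such as freshness, quality, and sensitivity instead of hard-coded if-this-then-that rules.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical unified metadata record for one dataset. The field names are
# assumptions for illustration, not a particular catalog's schema.
orders_metadata = {
    "name": "sales.orders",
    "last_refreshed": datetime.now(timezone.utc) - timedelta(hours=30),
    "quality_score": 0.87,       # e.g., from a data observability tool
    "contains_pii": True,        # e.g., from a privacy classification scan
    "downstream_consumers": ["finance_dashboard", "churn_model"],
}

def plan_actions(meta, freshness_sla_hours=24, min_quality=0.9):
    """Derive orchestration tasks from metadata rather than hard-coded rules."""
    actions = []
    age = datetime.now(timezone.utc) - meta["last_refreshed"]
    if age > timedelta(hours=freshness_sla_hours):
        actions.append(f"trigger refresh of {meta['name']}")
    if meta["quality_score"] < min_quality:
        actions.append(f"run data quality checks on {meta['name']}")
    if meta["contains_pii"]:
        actions.append(f"apply masking policy before exposing {meta['name']}")
    if actions and meta["downstream_consumers"]:
        actions.append("notify consumers: " + ", ".join(meta["downstream_consumers"]))
    return actions

print(plan_actions(orders_metadata))
```

In a real deployment the record would come from a catalog or observability service and the resulting tasks would be handed to an orchestrator; the sketch only shows how the decision can live at the metadata level rather than in static rules.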
Thank you for that. Doug, Doug Henschen, any thoughts on what Sanjeev just said? Do you agree? Do you disagree? >> Well, I agree with many aspects of what he says. I think, there's a huge opportunity for consolidation and streamlining of these aspects of governance. Last year, Sanjeev, you said something like, we'll see more people using catalogs than BI. And I have to disagree. I don't think this is a category that's headed for mainstream adoption. It's a behind the scenes activity for the wonky few, or better yet, companies want machine learning and automation to take care of these messy details. We've seen these waves of management technologies, some of the latest data observability, customer data platform, but they failed to sweep away all the earlier investments in data quality and master data management. So, yes, I hope the latest tech offers glimmers that there's going to be a better, cleaner way of addressing these things. But to my mind, the business leaders, including the CIO, only want to spend as much time and effort and money and resources on these sorts of things to avoid getting breached, ending up in headlines, getting fired or going to jail. So, vendors bring on the ML and AI smarts and the automation of these sorts of activities. >> So, if I may say something, the reason why we have this dichotomy between data catalog and the BI vendors is because data catalogs are very soon, not going to be standalone products, in my opinion. They're going to get embedded. So, when you use a BI tool, you'll actually use the catalog to find out what is it that you want to do, whether you are looking for data or you're looking for an existing dashboard. So, the catalog becomes embedded into the BI tool. >> Hey, Dave Menninger, sometimes you have some data in your back pocket. Do you have any stats (chuckles) on this topic? >> No, I'm glad you asked, because I'm going to... Now, data catalogs are something that's interesting. Sanjeev made a statement that data catalogs are falling out of favor. I don't care what you call them. They're valuable to organizations. Our research shows that organizations that have adequate data catalog technologies are three times more likely to express satisfaction with their analytics for just the reasons that Sanjeev was talking about. You can find what you want, you know you're getting the right information, you know whether or not it's trusted. So, those are good things. So, we expect to see the capabilities, whether it's embedded or separate. We expect to see those capabilities continue to permeate the market. >> And a lot of those catalogs are driven now by machine learning and things. So, they're learning from those patterns of usage by people when people use the data. (airy laughs) >> All right. Okay. Thank you, guys. All right. Let's move on to the next one. Tony Baer, let's bring up the predictions. You got something in here about the modern data stack. We need to rethink it. Is the modern data stack getting long in the tooth? Is it not so modern anymore? >> I think, in a way, it's got almost too modern. It's gotten too, I don't know if it's being long in the tooth, but it is getting long. The modern data stack, it's traditionally been defined as basically you have the data platform, which would be the operational database and the data warehouse.
And in between, you have all the tools that are necessary to essentially get that data from the operational realm or the streaming realm for that matter into basically the data warehouse, or as we might be seeing more and more, the data lakehouse. And I think, what's important here is that, or I think, we have seen a lot of progress, and this would be in the cloud, is with the SaaS services. And especially you see that in the modern data stack, which is like all these players, not just the MongoDBs or the Oracles or the Amazons have their database platforms. You see they have the Informatica's, and all the other players there in Fivetrans have their own SaaS services. And within those SaaS services, you get a certain degree of simplicity, which is it takes all the housekeeping off the shoulders of the customers. That's a good thing. The problem is that what we're getting to unfortunately is what I would call lots of islands of simplicity, which means that it leads it (Dave laughing) to the customer to have to integrate or put all that stuff together. It's a complex tool chain. And so, what we really need to think about here, we have too many pieces. And going back to the discussion of catalogs, it's like we have so many catalogs out there, which one do we use? 'Cause chances are of most organizations do not rely on a single catalog at this point. What I'm calling on all the data providers or all the SaaS service providers, is to literally get it together and essentially make this modern data stack less of a stack, make it more of a blending of an end-to-end solution. And that can come in a number of different ways. Part of it is that we're data platform providers have been adding services that are adjacent. And there's some very good examples of this. We've seen progress over the past year or so. For instance, MongoDB integrating search. It's a very common, I guess, sort of tool that basically, that the applications that are developed on MongoDB use, so MongoDB then built it into the database rather than requiring an extra elastic search or open search stack. Amazon just... AWS just did the zero-ETL, which is a first step towards simplifying the process from going from Aurora to Redshift. You've seen same thing with Google, BigQuery integrating basically streaming pipelines. And you're seeing also a lot of movement in database machine learning. So, there's some good moves in this direction. I expect to see more than this year. Part of it's from basically the SaaS platform is adding some functionality. But I also see more importantly, because you're never going to get... This is like asking your data team and your developers, herding cats to standardizing the same tool. In most organizations, that is not going to happen. So, take a look at the most popular combinations of tools and start to come up with some pre-built integrations and pre-built orchestrations, and offer some promotional pricing, maybe not quite two for, but in other words, get two products for the price of two services or for the price of one and a half. I see a lot of potential for this. And it's to me, if the class was to simplify things, this is the next logical step and I expect to see more of this here. >> Yeah, and you see in Oracle, MySQL heat wave, yet another example of eliminating that ETL. Carl Olofson, today, if you think about the data stack and the application stack, they're largely separate. Do you have any thoughts on how that's going to play out? Does that play into this prediction? What do you think? 
>> Well, I think, that the... I really like Tony's phrase, islands of simplification. It really says (Tony chuckles) what's going on here, which is that all these different vendors you ask about, about how these stacks work. All these different vendors have their own stack vision. And you can... One application group is going to use one, and another application group is going to use another. And some people will say, let's go to, like you go to a Informatica conference and they say, we should be the center of your universe, but you can't connect everything in your universe to Informatica, so you need to use other things. So, the challenge is how do we make those things work together? As Tony has said, and I totally agree, we're never going to get to the point where people standardize on one organizing system. So, the alternative is to have metadata that can be shared amongst those systems and protocols that allow those systems to coordinate their operations. This is standard stuff. It's not easy. But the motive for the vendors is that they can become more active critical players in the enterprise. And of course, the motive for the customer is that things will run better and more completely. So, I've been looking at this in terms of two kinds of metadata. One is the meaning metadata, which says what data can be put together. The other is the operational metadata, which says basically where did it come from? Who created it? What's its current state? What's the security level? Et cetera, et cetera, et cetera. The good news is the operational stuff can actually be done automatically, whereas the meaning stuff requires some human intervention. And as we've already heard from, was it Doug, I think, people are disinclined to put a lot of definition into meaning metadata. So, that may be the harder one, but coordination is key. This problem has been with us forever, but with the addition of new data sources, with streaming data with data in different formats, the whole thing has, it's been like what a customer of mine used to say, "I understand your product can make my system run faster, but right now I just feel I'm putting my problems on roller skates. (chuckles) I don't need that to accelerate what's already not working." >> Excellent. Okay, Carl, let's stay with you. I remember in the early days of the big data movement, Hadoop movement, NoSQL was the big thing. And I remember Amr Awadallah said to us in theCUBE that SQL is the killer app for big data. So, your prediction here, if we bring that up is SQL is back. Please elaborate. >> Yeah. So, of course, some people would say, well, it never left. Actually, that's probably closer to true, but in the perception of the marketplace, there's been all this noise about alternative ways of storing, retrieving data, whether it's in key value stores or document databases and so forth. We're getting a lot of messaging that for a while had persuaded people that, oh, we're not going to do analytics in SQL anymore. We're going to use Spark for everything, except that only a handful of people know how to use Spark. Oh, well, that's a problem. Well, how about, and for ordinary conventional business analytics, Spark is like an over-engineered solution to the problem. SQL works just great. What's happened in the past couple years, and what's going to continue to happen is that SQL is insinuating itself into everything we're seeing. We're seeing all the major data lake providers offering SQL support, whether it's Databricks or... 
And of course, Snowflake is loving this, because that is what they do, and their success certainly points to the success of SQL, even MongoDB. And we were all, I think, at the MongoDB conference where on one day, we hear SQL is dead. They're not teaching SQL in schools anymore, and this kind of thing. And then, a couple days later at the same conference, they announced we're adding a new analytic capability based on SQL. But didn't you just say SQL is dead? So, the reality is that SQL is better understood than most other methods, certainly, of retrieving and finding data in a data collection, no matter whether it happens to be relational or non-relational. And even in systems that are very non-relational, such as graph and document databases, their query languages are being built or extended to resemble SQL, because SQL is something people understand. >> Now, you remember when we were in high school and you had to take the... You're debating in the class and you were forced to take one side and defend it. So, I was at a Vertica conference one time up on stage with Curt Monash, and I had to take the NoSQL, the world is changing paradigm shift. And so just to be controversial, I said to him, Curt Monash, I said, who really needs ACID compliance anyway? Tony Baer. And so, (chuckles) of course, his head exploded, but what are your thoughts (guests laughing) on all this? >> Well, my first thought is congratulations, Dave, for surviving being up on stage with Curt Monash. >> Amen. (group laughing) >> I definitely would concur with Carl. We actually are definitely seeing a SQL renaissance and if there's any proof of the pudding here, I see lakehouse as being icing on the cake. As Doug had predicted last year, now, (clears throat) for the record, I think, Doug was about a year ahead of time in his predictions that this year is really the year that I see (clears throat) the lakehouse ecosystems really firming up. You saw the first shots last year. But anyway, on this, data lakes will not go away. I've actually, I'm on the home stretch of doing a market, a landscape on the lakehouse. And lakehouse will not replace data lakes in terms of that. There is the need for those data scientists who do know Python, who know Spark, to go in there and basically do their thing without all the restrictions or the constraints of a pre-built, pre-designed table structure. I get that. Same thing for developing models. But on the other hand, there is huge need. Basically, (clears throat) maybe MongoDB was saying that we're not teaching SQL anymore. Well, maybe we have an oversupply of SQL developers. Well, I'm being facetious there, but there is a huge skills base in SQL. Analytics have been built on SQL. They came with lakehouse, and why this really helps to fuel a SQL revival is that the core need in the data lake, what brought on the lakehouse, was not so much SQL, it was a need for ACID. And what was the best way to do it? It was through a relational table structure. So, the whole idea of ACID in the lakehouse was not to turn it into a transaction database, but to make the data trusted, secure, and more granularly governed, where you could govern down to column and row level, which you really could not do in a data lake or a file system. So, while lakehouse can be queried in a manner, you can go in there with Python or whatever, it's built on a relational table structure. And so, for that end, for those types of data lakes, it becomes the end state.
You cannot bypass that table structure as I learned the hard way during my research. So, the bottom line I'd say here is that lakehouse is proof that we're starting to see the revenge of the SQL nerds. (Dave chuckles) >> Excellent. Okay, let's bring up back up the predictions. Dave Menninger, this one's really thought-provoking and interesting. We're hearing things like data as code, new data applications, machines actually generating plans with no human involvement. And your prediction is the definition of data is expanding. What do you mean by that? >> So, I think, for too long, we've thought about data as the, I would say facts that we collect the readings off of devices and things like that, but data on its own is really insufficient. Organizations need to manipulate that data and examine derivatives of the data to really understand what's happening in their organization, why has it happened, and to project what might happen in the future. And my comment is that these data derivatives need to be supported and managed just like the data needs to be managed. We can't treat this as entirely separate. Think about all the governance discussions we've had. Think about the metadata discussions we've had. If you separate these things, now you've got more moving parts. We're talking about simplicity and simplifying the stack. So, if these things are treated separately, it creates much more complexity. I also think it creates a little bit of a myopic view on the part of the IT organizations that are acquiring these technologies. They need to think more broadly. So, for instance, metrics. Metric stores are becoming much more common part of the tooling that's part of a data platform. Similarly, feature stores are gaining traction. So, those are designed to promote the reuse and consistency across the AI and ML initiatives. The elements that are used in developing an AI or ML model. And let me go back to metrics and just clarify what I mean by that. So, any type of formula involving the data points. I'm distinguishing metrics from features that are used in AI and ML models. And the data platforms themselves are increasingly managing the models as an element of data. So, just like figuring out how to calculate a metric. Well, if you're going to have the features associated with an AI and ML model, you probably need to be managing the model that's associated with those features. The other element where I see expansion is around external data. Organizations for decades have been focused on the data that they generate within their own organization. We see more and more of these platforms acquiring and publishing data to external third-party sources, whether they're within some sort of a partner ecosystem or whether it's a commercial distribution of that information. And our research shows that when organizations use external data, they derive even more benefits from the various analyses that they're conducting. And the last great frontier in my opinion on this expanding world of data is the world of driver-based planning. Very few of the major data platform providers provide these capabilities today. These are the types of things you would do in a spreadsheet. And we all know the issues associated with spreadsheets. They're hard to govern, they're error-prone. 
And so, if we can take that type of analysis, collecting the occupancy of a rental property, the projected rise in rental rates, the fluctuations perhaps in occupancy, the interest rates associated with financing that property, we can project forward. And that's a very common thing to do. What the income might look like from that property income, the expenses, we can plan and purchase things appropriately. So, I think, we need this broader purview and I'm beginning to see some of those things happen. And the evidence today I would say, is more focused around the metric stores and the feature stores starting to see vendors offer those capabilities. And we're starting to see the ML ops elements of managing the AI and ML models find their way closer to the data platforms as well. >> Very interesting. When I hear metrics, I think of KPIs, I think of data apps, orchestrate people and places and things to optimize around a set of KPIs. It sounds like a metadata challenge more... Somebody once predicted they'll have more metadata than data. Carl, what are your thoughts on this prediction? >> Yeah, I think that what Dave is describing as data derivatives is in a way, another word for what I was calling operational metadata, which not about the data itself, but how it's used, where it came from, what the rules are governing it, and that kind of thing. If you have a rich enough set of those things, then not only can you do a model of how well your vacation property rental may do in terms of income, but also how well your application that's measuring that is doing for you. In other words, how many times have I used it, how much data have I used and what is the relationship between the data that I've used and the benefits that I've derived from using it? Well, we don't have ways of doing that. What's interesting to me is that folks in the content world are way ahead of us here, because they have always tracked their content using these kinds of attributes. Where did it come from? When was it created, when was it modified? Who modified it? And so on and so forth. We need to do more of that with the structure data that we have, so that we can track what it's used. And also, it tells us how well we're doing with it. Is it really benefiting us? Are we being efficient? Are there improvements in processes that we need to consider? Because maybe data gets created and then it isn't used or it gets used, but it gets altered in some way that actually misleads people. (laughs) So, we need the mechanisms to be able to do that. So, I would say that that's... And I'd say that it's true that we need that stuff. I think, that starting to expand is probably the right way to put it. It's going to be expanding for some time. I think, we're still a distance from having all that stuff really working together. >> Maybe we should say it's gestating. (Dave and Carl laughing) >> Sorry, if I may- >> Sanjeev, yeah, I was going to say this... Sanjeev, please comment. This sounds to me like it supports Zhamak Dehghani's principles, but please. >> Absolutely. So, whether we call it data mesh or not, I'm not getting into that conversation, (Dave chuckles) but data (audio breaking) (Tony laughing) everything that I'm hearing what Dave is saying, Carl, this is the year when data products will start to take off. I'm not saying they'll become mainstream. They may take a couple of years to become so, but this is data products, all this thing about vacation rentals and how is it doing, that data is coming from different sources. 
I'm packaging it into our data product. And to Carl's point, there's a whole operational metadata associated with it. The idea is for organizations to see things like developer productivity, how many releases am I doing of this? What data products are most popular? I'm actually in right now in the process of formulating this concept that just like we had data catalogs, we are very soon going to be requiring data products catalog. So, I can discover these data products. I'm not just creating data products left, right, and center. I need to know, do they already exist? What is the usage? If no one is using a data product, maybe I want to retire and save cost. But this is a data product. Now, there's a associated thing that is also getting debated quite a bit called data contracts. And a data contract to me is literally just formalization of all these aspects of a product. How do you use it? What is the SLA on it, what is the quality that I am prescribing? So, data product, in my opinion, shifts the conversation to the consumers or to the business people. Up to this point when, Dave, you're talking about data and all of data discovery curation is a very data producer-centric. So, I think, we'll see a shift more into the consumer space. >> Yeah. Dave, can I just jump in there just very quickly there, which is that what Sanjeev has been saying there, this is really central to what Zhamak has been talking about. It's basically about making, one, data products are about the lifecycle management of data. Metadata is just elemental to that. And essentially, one of the things that she calls for is making data products discoverable. That's exactly what Sanjeev was talking about. >> By the way, did everyone just no notice how Sanjeev just snuck in another prediction there? So, we've got- >> Yeah. (group laughing) >> But you- >> Can we also say that he snuck in, I think, the term that we'll remember today, which is metadata museums. >> Yeah, but- >> Yeah. >> And also comment to, Tony, to your last year's prediction, you're really talking about it's not something that you're going to buy from a vendor. >> No. >> It's very specific >> Mm-hmm. >> to an organization, their own data product. So, touche on that one. Okay, last prediction. Let's bring them up. Doug Henschen, BI analytics is headed to embedding. What does that mean? >> Well, we all know that conventional BI dashboarding reporting is really commoditized from a vendor perspective. It never enjoyed truly mainstream adoption. Always that 25% of employees are really using these things. I'm seeing rising interest in embedding concise analytics at the point of decision or better still, using analytics as triggers for automation and workflows, and not even necessitating human interaction with visualizations, for example, if we have confidence in the analytics. So, leading companies are pushing for next generation applications, part of this low-code, no-code movement we've seen. And they want to build that decision support right into the app. So, the analytic is right there. Leading enterprise apps vendors, Salesforce, SAP, Microsoft, Oracle, they're all building smart apps with the analytics predictions, even recommendations built into these applications. And I think, the progressive BI analytics vendors are supporting this idea of driving insight to action, not necessarily necessitating humans interacting with it if there's confidence. So, we want prediction, we want embedding, we want automation. 
This low-code, no-code development movement is very important to bringing the analytics to where people are doing their work. We got to move beyond the, what I call swivel chair integration, between where people do their work and going off to separate reports and dashboards, and having to interpret and analyze before you can go back and do take action. >> And Dave Menninger, today, if you want, analytics or you want to absorb what's happening in the business, you typically got to go ask an expert, and then wait. So, what are your thoughts on Doug's prediction? >> I'm in total agreement with Doug. I'm going to say that collectively... So, how did we get here? I'm going to say collectively as an industry, we made a mistake. We made BI and analytics separate from the operational systems. Now, okay, it wasn't really a mistake. We were limited by the technology available at the time. Decades ago, we had to separate these two systems, so that the analytics didn't impact the operations. You don't want the operations preventing you from being able to do a transaction. But we've gone beyond that now. We can bring these two systems and worlds together and organizations recognize that need to change. As Doug said, the majority of the workforce and the majority of organizations doesn't have access to analytics. That's wrong. (chuckles) We've got to change that. And one of the ways that's going to change is with embedded analytics. 2/3 of organizations recognize that embedded analytics are important and it even ranks higher in importance than AI and ML in those organizations. So, it's interesting. This is a really important topic to the organizations that are consuming these technologies. The good news is it works. Organizations that have embraced embedded analytics are more comfortable with self-service than those that have not, as opposed to turning somebody loose, in the wild with the data. They're given a guided path to the data. And the research shows that 65% of organizations that have adopted embedded analytics are comfortable with self-service compared with just 40% of organizations that are turning people loose in an ad hoc way with the data. So, totally behind Doug's predictions. >> Can I just break in with something here, a comment on what Dave said about what Doug said, which (laughs) is that I totally agree with what you said about embedded analytics. And at IDC, we made a prediction in our future intelligence, future of intelligence service three years ago that this was going to happen. And the thing that we're waiting for is for developers to build... You have to write the applications to work that way. It just doesn't happen automagically. Developers have to write applications that reference analytic data and apply it while they're running. And that could involve simple things like complex queries against the live data, which is through something that I've been calling analytic transaction processing. Or it could be through something more sophisticated that involves AI operations as Doug has been suggesting, where the result is enacted pretty much automatically unless the scores are too low and you need to have a human being look at it. So, I think that that is definitely something we've been watching for. I'm not sure how soon it will come, because it seems to take a long time for people to change their thinking. But I think, as Dave was saying, once they do and they apply these principles in their application development, the rewards are great. 
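As a rough illustration of the pattern Doug and Carl are describing, the sketch below embeds a score check directly in an application workflow, with the action automated only when confidence is high enough and routed to a person otherwise. The scoring function, thresholds, and order fields are hypothetical placeholders, not any vendor's embedded-analytics product.

```python
# Minimal sketch of analytics embedded at the point of decision, assuming a
# pre-trained model is already available. All names and thresholds here are
# illustrative assumptions.

def fraud_score(order):
    """Stand-in for a model scored against live operational data."""
    return 0.93 if order["amount"] > 5_000 and order["new_customer"] else 0.12

def handle_order(order, auto_threshold=0.9, review_threshold=0.6):
    score = fraud_score(order)
    if score >= auto_threshold:
        return "hold order and open a case automatically"   # no human in the loop
    if score >= review_threshold:
        return "route to an analyst for review"              # low confidence, human decides
    return "approve and continue the workflow"

print(handle_order({"amount": 7_200, "new_customer": True}))   # automated action
print(handle_order({"amount": 120, "new_customer": False}))    # workflow continues untouched
```

The design choice is the confidence threshold: above it, the analytic triggers the workflow action itself; below it, the decision falls back to a person, which is the guardrail Carl mentions.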
>> Yeah, this is very much, I would say, very consistent with what we were talking about, I was talking about before, about basically rethinking the modern data stack and going into more of an end-to-end solution solution. I think, that what we're talking about clearly here is operational analytics. There'll still be a need for your data scientists to go offline just in their data lakes to do all that very exploratory and that deep modeling. But clearly, it just makes sense to bring operational analytics into where people work into their workspace and further flatten that modern data stack. >> But with all this metadata and all this intelligence, we're talking about injecting AI into applications, it does seem like we're entering a new era of not only data, but new era of apps. Today, most applications are about filling forms out or codifying processes and require a human input. And it seems like there's enough data now and enough intelligence in the system that the system can actually pull data from, whether it's the transaction system, e-commerce, the supply chain, ERP, and actually do something with that data without human involvement, present it to humans. Do you guys see this as a new frontier? >> I think, that's certainly- >> Very much so, but it's going to take a while, as Carl said. You have to design it, you have to get the prediction into the system, you have to get the analytics at the point of decision has to be relevant to that decision point. >> And I also recall basically a lot of the ERP vendors back like 10 years ago, we're promising that. And the fact that we're still looking at the promises shows just how difficult, how much of a challenge it is to get to what Doug's saying. >> One element that could be applied in this case is (indistinct) architecture. If applications are developed that are event-driven rather than following the script or sequence that some programmer or designer had preconceived, then you'll have much more flexible applications. You can inject decisions at various points using this technology much more easily. It's a completely different way of writing applications. And it actually involves a lot more data, which is why we should all like it. (laughs) But in the end (Tony laughing) it's more stable, it's easier to manage, easier to maintain, and it's actually more efficient, which is the result of an MIT study from about 10 years ago, and still, we are not seeing this come to fruition in most business applications. >> And do you think it's going to require a new type of data platform database? Today, data's all far-flung. We see that's all over the clouds and at the edge. Today, you cache- >> We need a super cloud. >> You cache that data, you're throwing into memory. I mentioned, MySQL heat wave. There are other examples where it's a brute force approach, but maybe we need new ways of laying data out on disk and new database architectures, and just when we thought we had it all figured out. >> Well, without referring to disk, which to my mind, is almost like talking about cave painting. I think, that (Dave laughing) all the things that have been mentioned by all of us today are elements of what I'm talking about. In other words, the whole improvement of the data mesh, the improvement of metadata across the board and improvement of the ability to track data and judge its freshness the way we judge the freshness of a melon or something like that, to determine whether we can still use it. Is it still good? That kind of thing. 
Bringing together data from multiple sources dynamically and in real time requires all the things we've been talking about. All the predictions that we've talked about today add up to elements that can make this happen. >> Well, guys, it's always tremendous to get these wonderful minds together and get your insights, and I love how it shapes the outcome here of the predictions, and let's see how we did. We're going to leave it there. I want to thank Sanjeev, Tony, Carl, David, and Doug. Really appreciate the collaboration and thought that you guys put into these sessions. Really, thank you. >> Thank you. >> Thanks, Dave. >> Thank you for having us. >> Thanks. >> Thank you. >> All right, this is Dave Vellante for theCUBE, signing off for now. Follow these guys on social media. Look for coverage on siliconangle.com, theCUBE.net. Thank you for watching. (upbeat music)

Published Date : Jan 11 2023


Breaking Analysis: Enterprise Technology Predictions 2022


 

>> From theCUBE Studios in Palo Alto and Boston, bringing you data-driven insights from theCUBE and ETR, this is Breaking Analysis with Dave Vellante. >> The pandemic has changed the way we think about and predict the future. As we enter the third year of a global pandemic, we see the significant impact that it's had on technology strategy, spending patterns, and company fortunes. Much has changed. And while many of these changes were forced reactions to a new abnormal, the trends that we've seen over the past 24 months have become more entrenched, and point to the way ahead in the technology business. Hello and welcome to this week's Wikibon CUBE Insights powered by ETR. In this Breaking Analysis, we welcome our partner and colleague and business friend, Erik Porter Bradley, as we deliver what's becoming an annual tradition for Erik and me, our predictions for Enterprise Technology in 2022 and beyond. Erik, welcome. Thanks for taking some time out. >> Thank you, Dave. Luckily we did pretty well last year, so we were able to do this again. So hopefully we can keep that momentum going. >> Yeah, you know, I want to mention that, you know, we get a lot of inbound predictions from companies and PR firms that help shape our thinking. But one of the main objectives that we have is we try to make predictions that can be measured. That's why we use a lot of data. Now not all will necessarily fit that parameter, but if you've seen the grading of our 2021 predictions that Erik and I did, you'll see we do a pretty good job of trying to put forth prognostications that can be declared correct or not, you know, as black and white as possible. Now let's get right into it. Our first prediction, we're going to go right into spending, something that ETR surveys for quarterly. And we've reported extensively on this. We're calling for tech spending to increase somewhere around 8% in 2022, we can see there on the slide, Erik, we predicted spending last year would increase by 4%. IDC's last check came in at five and a half percent. Gartner was somewhat higher, but in general, you know, not too bad, but looking ahead, we're seeing an acceleration from the ETR September surveys, as you can see in the yellow versus the blue bar in this chart, many of the SMBs that were hard hit by the pandemic are picking up spending again. And the ETR data is showing acceleration above the mean for industries like energy, utilities, retail, and services, and also, notably, in the Forbes largest 225 private companies. These are companies like Mars or Koch Industries. They're predicting well above average spending for 2022. So Erik, please weigh in here. >> Yeah, a lot to bring up on this one, I'm going to be quick. So 1200 respondents on this, over a third of which were at the C-suite level. So really good data that we brought in, the usual bucket of, you know, Fortune 500, Global 2000 make up the meat of that median, but it's 8.3% and rising with momentum as we see. What's really interesting right now is that energy and utilities. This is usually like, you know, an orphan stock dividend type of play. You don't see them at the highest point of tech spending. And the reason why right now is really because the state of tech infrastructure in our energy infrastructure needs help. And it's obvious, remember the Florida municipality breach last year? When they took over the water systems or they had the ability to?
And this is a real issue, you know, there's bad nation state actors out there, and I'm no alarmist, but the energy and utility has to spend this money to keep up. It's really important. And then you also hit on the retail consumer. Obviously what's happened, the work from home shift created a shop from home shift, and the trends that are happening right now in retail. If you don't spend and keep up, you're not going to be around much longer. So I think the really two interesting things here to call out are energy utilities, usually a laggard in IT spend and it's leading, and also retail consumer, a lot of changes happening. >> Yeah. Great stuff. I mean, I recall when we entered the pandemic, really ETR was the first to emphasize the impact that work from home was going to have, so I really put a lot of weight on this data. Okay. Our next prediction is we're going to get into security, it's one of our favorite topics. And that is that the number one priority that needs to be addressed by organizations in 2022 is security and you can see, in this slide, the degree to which security is top of mind, relative to some other pretty important areas like cloud, productivity, data, and automation, and some others. Now people may say, "Oh, this is obvious." But I'm going to add some context here, Erik, and then bring you in. First, organizations, they don't have unlimited budgets. And there are a lot of competing priorities for dollars, especially with the digital transformation mandate. And depending on the size of the company, this data will vary. For example, while security is still number one at the largest public companies, and those are of course of the biggest spenders, it's not nearly as pronounced as it is on average, or in, for example, mid-sized companies and government agencies. And this is because midsized companies or smaller companies, they don't have the resources that larger companies do. Larger companies have done a better job of securing their infrastructure. So these mid-size firms are playing catch up and the data suggests cyber is even a bigger priority there, gaps that they have to fill, you know, going forward. And that's why we think there's going to be more demand for MSSPs, managed security service providers. And we may even see some IPO action there. And then of course, Erik, you and I have talked about events like the SolarWinds Hack, there's more ransomware attacks, other vulnerabilities. Just recently, like Log4j in December. All of this has heightened concerns. Now I want to talk a little bit more about how we measure this, you know, relatively, okay, it's an obvious prediction, but let's stick our necks out a little bit. And so in addition to the rise of managed security services, we're calling for M&A and/or IPOs, we've specified some names here on this chart, and we're also pointing to the digital supply chain as an area of emphasis. Again, Log4j really shone that under a light. And this is going to help the likes of Auth0, which is now Okta, SailPoint, which is called out on this chart, and some others. We're calling some winners in end point security. Erik, you're going to talk about sort of that lifecycle, that transformation that we're seeing, that migration to new endpoint technologies that are going to benefit from this reset refresh cycle. So Erik, weigh in here, let's talk about some of the elements of this prediction and some of the names on that chart. >> Yeah, certainly. I'm going to start right with Log4j top of mind. 
And the reason why is because we're seeing a real paradigm shift here where things are no longer being attacked at the network layer, they're being attacked at the application layer, and in the application stack itself. And that is a huge shift left. And that's taking in DevSecOps now as a real priority in 2022. That's a real paradigm shift over the last 20 years. That's not where attacks used to come from. And this is going to have a lot of changes. You called out a bunch of names in there that are, they're either going to work. I would add to that list Wiz. I would add Orca Security. Two names in our emerging technology study, in addition to the ones you added that are involved in cloud security and container security. These names are either going to get gobbled up. So the traditional legacy names are going to have to start writing checks and, you know, legacy is not fair, but they're in the data center, right? They're on-prem, they're not cloud native. So these are the names that money is going to be flowing to. So they're either going to get gobbled up, or we're going to see some IPOs. And the other thing I want to talk about too is what you mentioned. We have CrowdStrike on that list, we have SentinelOne on the list. Everyone knows them. Our data was so strong on Tanium that we actually went positive for the first time just today, just this morning, where that was released. The trifecta of these are so important because of what you mentioned, under-resourcing. We can't have security just tell us when something happens, it has to automate, and it has to respond. So in this next generation of EDR and XDR, an automated response has to happen because people are under-resourced, salaries are really high, there's a skill shortage out there. Security has to become responsive. It can't just monitor anymore. >> Yeah. Great. And we should call out too. So we named some names, Snyk, Aqua, Arctic Wolf, Lacework, Netskope, Illumio. These are all sort of IPO, or possibly even M&A candidates. All right. Our next prediction goes right to the way we work. Again, something that ETR has been on for a while. We're calling for a major rethink in remote work for 2022. We had predicted last year that by the end of 2021, there'd be a larger return to the office with the norm being around a third of workers permanently remote. And of course the variants changed that equation and, you know, gave more time for people to think about this idea of hybrid work and that's really come into focus. So we're predicting that is going to overtake fully remote as the dominant work model with only about a third of the workers back in the office full-time. And Erik, we expect a somewhat lower percentage to be fully remote. It's now sort of dipped under 30%, at around 29%, but it's still significantly higher than the historical average of around 15 to 16%. So still a major change, but this idea of hybrid and getting hybrid right, has really come into focus. Hasn't it?
And this, I can't really explain how big of a paradigm shift this is. Since the start of the industrial revolution, people leave their house and go to work. Now they're saying that's not going to happen. The economic impact here is so broad, on so many different areas. And, you know, the reason is like, why not? Right? The productivity increase is real. We're seeing the productivity increase. Enterprises are spending on collaboration tools, productivity tools, we're seeing an increased perception in productivity of their workforce. And the CFOs can cut down an expense item. I just don't see a reason why this would end, you know, I think it's going to continue. And I also want to point out these results, as high as they are, were before the Omicron wave hit us. I can only imagine what these results would have been if we had sent the survey out just two or three weeks later. >> Yeah. That's a great point. Okay. Next prediction, we're going to look at the supply chain, specifically in how it's affecting some of the hardware spending and cloud strategies in the future. So in this chart, ETR asks buyers, have you experienced problems procuring hardware as a result of supply chain issues? And, you know, despite the fact that some companies are, you know, I would call out Dell, for example, doing really well in terms of delivering, you can see that in the numbers, it's pretty clear, there's been an impact. And that's not an across-the-board, you know, thing where vendors are able to deliver, especially acute in PCs, but also pronounced in networking, also in firewalls, servers, and storage. And what's interesting is how companies are responding and reacting. So first, you know, I'm going to call for laptop and PC demand staying well above pre-COVID norms. It had peaked in 2012. Pre-pandemic it kept dropping and dropping and dropping, in terms of, you know, unit volume, where the market was contracting. And we think it can continue to grow this year in double digits in 2022. But what's interesting, Erik, when you survey customers, is despite the difficulty they're having in procuring network hardware, there's as much of a migration away from existing networks to the cloud. You could probably comment on that. Their networks are more fossilized, but when it comes to firewalls and servers and storage, there's a much higher propensity to move to the cloud. 30% of customers that ETR surveyed will replace security appliances with cloud services and 41% and 34% respectively will move to cloud compute and storage in 2022. So cloud's relentless march on traditional on-prem models continues. Erik, what do you make of this data? Please weigh in on this prediction. >> As if we needed another reason to go to the cloud. Right here, here it is yet again. So this was added to the survey by client demand. They were asking about the procurement difficulties, the supply chain issues, and how it was impacting our community. So this is the first time we ran it. And it really was interesting to see, you know, the move there. And storage particularly I found interesting because it correlated with a huge jump that we saw on one of our vendor names, which was Rubrik, which had the highest net score that it's ever had. So clearly we're seeing some correlation with some of these names that are there, you know, really well positioned to take storage, to take data into the cloud.
So again, you didn't need another reason to, you know, hasten this digital transformation, but here we are, we have it yet again, and I don't see it slowing down anytime soon. >> You know, that's a really good point. I mean, it's not necessarily bad news for the... I mean, obviously you wish that it had no change, would be great, but things, you know, always going to change. So we'll talk about this a little bit later when we get into the Supercloud conversation, but this is an opportunity for people who embrace the cloud. So we'll come back to that. And I want to hang on cloud a bit and share some recent projections that we've made. The next prediction is the big four cloud players are going to surpass 167 billion, an IaaS and PaaS revenue in 2022. We track this. Observers of this program know that we try to create an apples to apples comparison between AWS, Azure, GCP and Alibaba in IaaS and PaaS. So we're calling for 38% revenue growth in 2022, which is astounding for such a massive market. You know, AWS is probably not going to hit a hundred billion dollar run rate, but they're going to be close this year. And we're going to get there by 2023, you know they're going to surpass that. Azure continues to close the gap. Now they're about two thirds of the size of AWS and Google, we think is going to surpass Alibaba and take the number three spot. Erik, anything you'd like to add here? >> Yeah, first of all, just on a sector level, we saw our sector, new survey net score on cloud jumped another 10%. It was already really high at 48. Went up to 53. This train is not slowing down anytime soon. And we even added an edge compute type of player, like CloudFlare into our cloud bucket this year. And it debuted with a net score of almost 60. So this is really an area that's expanding, not just the big three, but everywhere. We even saw Oracle and IBM jump up. So even they're having success, taking some of their on-prem customers and then selling them to their cloud services. This is a massive opportunity and it's not changing anytime soon, it's going to continue. >> And I think the operative word there is opportunity. So, you know, the next prediction is something that we've been having fun with and that's this Supercloud becomes a thing. Now, the reason I say we've been having fun is we put this concept of Supercloud out and it's become a bit of a controversy. First, you know, what the heck's the Supercloud right? It's sort of a buzz-wordy term, but there really is, we believe, a thing here. We think there needs to be a rethinking or at least an evolution of the term multi-cloud. And what we mean is that in our view, you know, multicloud from a vendor perspective was really cloud compatibility. It wasn't marketed that way, but that's what it was. Either a vendor would containerize its legacy stack, shove it into the cloud, or a company, you know, they'd do the work, they'd build a cloud native service on one of the big clouds and they did do it for AWS, and then Azure, and then Google. But there really wasn't much, if any, leverage across clouds. Now from a buyer perspective, we've always said multicloud was a symptom of multi-vendor, meaning I got different workloads, running in different clouds, or I bought a company and they run on Azure, and I do a lot of work on AWS, but generally it wasn't necessarily a prescribed strategy to build value on top of hyperscale infrastructure. There certainly was somewhat of a, you know, reducing lock-in and hedging the risk. 
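As a quick, illustrative check on the growth math above, here is a minimal sketch; the roughly $121B 2021 base is inferred from the stated 38% growth and $167B projection, not a reported figure.

```python
# Back-of-the-envelope check of the big-four IaaS/PaaS projection.
# Assumption: the 2021 base is implied by the prediction itself
# (167 ~= base * 1.38); these are not reported vendor figures.

projected_2022_bn = 167.0
growth_rate = 0.38

implied_2021_base_bn = projected_2022_bn / (1 + growth_rate)
print(f"Implied 2021 base: ~${implied_2021_base_bn:.0f}B")  # ~$121B

# Sanity check: growing the implied base by 38% reproduces the projection.
print(f"2022 projection:  ~${implied_2021_base_bn * (1 + growth_rate):.0f}B")  # ~$167B
```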
But we're talking about something more here. We're talking about building value on top of the hyperscale gift of hundreds of billions of dollars in CapEx. And we're not just talking about transforming IT, which is what the last 10 years of cloud have been about: you know, doing work in the cloud because it's cheaper or simpler or more agile, all of those things. So that's beginning to change. And this chart shows some of the technology vendors that are leaning toward this Supercloud vision, in our view, building on top of the hyperscalers that are highlighted in red. Now, Jerry Chen at Greylock wrote a piece called Castles in the Cloud. It got our thinking going, and he and the team at Greylock are building out a database of all the cloud services and all the sub-markets in cloud. And that got us thinking that there's a higher level of abstraction coalescing in the market, where there's tight integration of services across clouds, but the underlying complexity is hidden, and there's an identical experience across clouds, and even, in my dreams, on-prem for some platforms. So what's new or new-ish and evolving are things like location independence (you've got to include the edge in that), metadata services to optimize locality of reference and data source awareness, governance, privacy (application independent and dependent, actually), and recovery across clouds. So we're seeing this evolve. And in our view, the two biggest things that are new are, one, the technology is evolving, where you're seeing services truly integrate cross-cloud. And the other big change is digital transformation, where there's this new innovation curve developing, and it's not just about making your IT better. It's about SaaS-ifying and automating your entire company's workflows. So Supercloud, it's not just a vendor thing to us. It's the evolution of, you know, the Marc Andreessen quote, "Every company will be a SaaS company." Every company will deliver capabilities that can be consumed as cloud services. So Erik, the chart shows net score, or spending momentum, on the y-axis, and presence in the ETR data set, or market share, on the x-axis. We've talked about Snowflake as the poster child for this concept, where the vision is you're in their cloud and sharing data in that safe place. Maybe you could make some comments, you know, what do you think of this Supercloud concept and this change that we're sensing in the market? >> Well, I think you did a great job describing the concept. So maybe I'll support it a little bit on the vendor level and then kind of give examples of the ones that are doing it. You stole the lead there with Snowflake, right? There is no better example than what we've seen with what Snowflake can do. Cross-portability in the cloud, the ability to be, you know, completely agnostic, but then build services on top that are better than anything the clouds themselves could offer. And it's not just there. I mean, you mentioned edge compute, that's a whole other layer where this is coming in. And CloudFlare, the momentum there is out of control. I mean, this is a company that started off just doing CDN and trying to compete with Akamai. And now they're giving you a full soup-to-nuts offering with security and an actual edge compute layer. It's a fantastic company, and what they're doing is another great example of what you're seeing here. I'm going to call out HashiCorp as well.
They're more of an infrastructure services, a little bit more of an open-source freemium model, but what they're doing as well is completely cloud agnostic. It's dynamic. It doesn't care if you're in a container, it doesn't matter where you are. They recently IPO'd and they're down 25%, but their data looks so good across both of our emerging technology and TISA survey. It's certainly another name that's playing on this. And another one that we mentioned as well is Rubrik. If you need storage, compute, and in the cloud layer and you need to be agnostic to it, they're another one that's really playing in this space. So I think it's a great concept you're bringing up. I think it's one that's here to stay and there's certainly a lot of vendors that fit into what you're describing. >> Excellent. Thank you. All right, let's shift to data. The next prediction, it might be a little tough to measure. Before I said we're trying to be a little black and white here, but it relates to Data Mesh, which is, the ideas behind that term were created by Zhamak Dehghani of ThoughtWorks. And we see Data Mesh is really gaining momentum in 2022, but it's largely going to be, we think, confined to a more narrow scope. Now, the impetus for change in data architecture in many companies really stems from the fact that their Hadoop infrastructure really didn't solve their data problems and they struggle to get more value out of their data investments. Data Mesh prescribes a shift to a decentralized architecture in domain ownership of data and a shift to data product thinking, beyond data for analytics, but data products and services that can be monetized. Now this a very powerful in our view, but they're difficult for organizations to get their heads around and further decentralization creates the need for a self-service platform and federated data governance that can be automated. And not a lot of standards around this. So it's going to take some time. At our power panel a couple of weeks ago on data management, Tony Baer predicted a backlash on Data Mesh. And I don't think it's going to be so much of a backlash, but rather the adoption will be more limited. Most implementations we think are going to use a starting point of AWS and they'll enable domains to access and control their own data lakes. And while that is a very small slice of the Data Mesh vision, I think it's going to be a starting point. And the last thing I'll say is, this is going to take a decade to evolve, but I think it's the right direction. And whether it's a data lake or a data warehouse or a data hub or an S3 bucket, these are really, the concept is, they'll eventually just become nodes on the data mesh that are discoverable and access is governed. And so the idea is that the stranglehold that the data pipeline and process and hyper-specialized roles that they have on data agility is going to evolve. And decentralized architectures and the democratization of data will eventually become a norm for a lot of different use cases. And Erik, I wonder if you'd add anything to this. >> Yeah. There's a lot to add there. The first thing that jumped out to me was that that mention of the word backlash you said, and you said it's not really a backlash, but what it could be is these are new words trying to solve an old problem. And I do think sometimes the industry will notice that right away and maybe that'll be a little pushback. And the problems are what you already mentioned, right? 
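To make the AWS-centric starting point above a bit more concrete, here is a minimal, hypothetical sketch of handing a domain team ownership of its own Glue database through Lake Formation. The account, role, and database names are invented for illustration, and a real data mesh would layer product metadata and federated governance on top of a grant like this.

```python
import boto3

# Hypothetical example: give the "sales" domain team control of the Glue
# database backing its own data lake (all names are made up).
lakeformation = boto3.client("lakeformation", region_name="us-east-1")

DOMAIN_ROLE_ARN = "arn:aws:iam::123456789012:role/sales-domain-owners"
DOMAIN_DATABASE = "sales_domain"

# Let the domain team create, alter, and describe tables in its database...
lakeformation.grant_permissions(
    Principal={"DataLakePrincipalIdentifier": DOMAIN_ROLE_ARN},
    Resource={"Database": {"Name": DOMAIN_DATABASE}},
    Permissions=["CREATE_TABLE", "ALTER", "DESCRIBE"],
)

# ...and let it share read access to its tables with other domains,
# which is roughly the "data as a product" hand-off point.
lakeformation.grant_permissions(
    Principal={"DataLakePrincipalIdentifier": DOMAIN_ROLE_ARN},
    Resource={"Table": {"DatabaseName": DOMAIN_DATABASE, "TableWildcard": {}}},
    Permissions=["SELECT", "DESCRIBE"],
    PermissionsWithGrantOption=["SELECT", "DESCRIBE"],
)
```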
We're trying to get to an area where we can have more assets in our data site, more deliverable, and more usable and relevant to the business. And you mentioned that as self-service with governance laid on top. And that's really what we're trying to get to. Now, there's a lot of ways you can get there. Data fabric is really the technical aspect and data mesh is really more about the people, the process, and the governance, but the two of those need to meet, in order to make that happen. And as far as tools, you know, there's even cataloging names like Informatica that play in this, right? Istio plays in this, Snowflake plays in this. So there's a lot of different tools that will support it. But I think you're right in calling out AWS, right? They have AWS Lake, they have AWS Glue. They have so much that's trying to drive this. But I think the really important thing to keep here is what you said. It's going to be a decade long journey. And by the way, we're on the shoulders of giants a decade ago that have even gotten us to this point to talk about these new words because this has been an ongoing type of issue, but ultimately, no matter which vendors you use, this is going to come down to your data governance plan and the data literacy in your business. This is really about workflows and people as much as it is tools. So, you know, the new term of data mesh is wonderful, but you still have to have the people and the governance and the processes in place to get there. >> Great, thank you for that, Erik. Some great points. All right, for the next prediction, we're going to shine the spotlight on two of our favorite topics, Snowflake and Databricks, and the prediction here is that, of course, Databricks is going to IPO this year, as expected. Everybody sort of expects that. And while, but the prediction really is, well, while these two companies are facing off already in the market, they're also going to compete with each other for M&A, especially as Databricks, you know, after the IPO, you're going to have, you know, more prominence and a war chest. So first, these companies, they're both looking pretty good, the same XY graph with spending velocity and presence and market share on the horizontal axis. And both Snowflake and Databricks are well above that magic 40% red dotted line, the elevated line, to us. And for context, we've included a few other firms. So you can see kind of what a good position these two companies are really in, especially, I mean, Snowflake, wow, it just keeps moving to the right on this horizontal picture, but maintaining the next net score in the Y axis. Amazing. So, but here's the thing, Databricks is using the term Lakehouse implying that it has the best of data lakes and data warehouses. And Snowflake has the vision of the data cloud and data sharing. And Snowflake, they've nailed analytics, and now they're moving into data science in the domain of Databricks. Databricks, on the other hand, has nailed data science and is moving into the domain of Snowflake, in the data warehouse and analytics space. But to really make this seamless, there has to be a semantic layer between these two worlds and they're either going to build it or buy it or both. And there are other areas like data clean rooms and privacy and data prep and governance and machine learning tooling and AI, all that stuff. So the prediction is they'll not only compete in the market, but they'll step up and in their competition for M&A, especially after the Databricks IPO. 
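The semantic-layer point above can be illustrated with a tiny, purely hypothetical sketch: one governed metric definition rendered as SQL for more than one engine. The metric, table, and dialect handling here are invented for illustration and are far simpler than what a real semantic layer product would do.

```python
# Toy illustration of a "semantic layer": one governed metric definition,
# rendered as SQL for different engines. Real products handle joins,
# dialect differences, caching, and access control; this does not.
from dataclasses import dataclass

@dataclass
class Metric:
    name: str
    table: str
    expression: str        # aggregation over a column
    dimensions: list[str]  # allowed group-bys

    def to_sql(self, group_by: str, dialect: str = "ansi") -> str:
        if group_by not in self.dimensions:
            raise ValueError(f"{group_by} is not a governed dimension of {self.name}")
        # A real implementation would branch on many dialect quirks here.
        quote = '"' if dialect in ("snowflake", "ansi") else "`"
        return (
            f"SELECT {quote}{group_by}{quote}, {self.expression} AS {self.name}\n"
            f"FROM {self.table}\n"
            f"GROUP BY {quote}{group_by}{quote}"
        )

revenue = Metric(
    name="total_revenue",
    table="analytics.orders",
    expression="SUM(order_amount)",
    dimensions=["region", "product_line"],
)

print(revenue.to_sql("region", dialect="snowflake"))
print(revenue.to_sql("region", dialect="spark"))
```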
We've listed some target names here, like Atscale, you know, Iguazio, Infosum, Habu, Immuta, and I'm sure there are many, many others. Erik, you care to comment? >> Yeah. I remember a year ago when we were talking Snowflake when they first came out and you, and I said, "I'm shocked if they don't use this war chest of money" "and start going after more" "because we know Slootman, we have so much respect for him." "We've seen his playbook." And I'm actually a little bit surprised that here we are, at 12 months later, and he hasn't spent that money yet. So I think this prediction's just spot on. To talk a little bit about the data side, Snowflake is in rarefied air. It's all by itself. It is the number one net score in our entire TISA universe. It is absolutely incredible. There's almost no negative intentions. Global 2000 organizations are increasing their spend on it. We maintain our positive outlook. It's really just, you know, stands alone. Databricks, however, also has one of the highest overall net sentiments in the entire universe, not just its area. And this is the first time we're coming up positive on this name as well. It looks like it's not slowing down. Really interesting comment you made though that we normally hear from our end-user commentary in our panels and our interviews. Databricks is really more used for the data science side. The MLAI is where it's best positioned in our survey. So it might still have some catching up to do to really have that caliber of usability that you know Snowflake is seeing right now. That's snowflake having its own marketplace. There's just a lot more to Snowflake right now than there is Databricks. But I do think you're right. These two massive vendors are sort of heading towards a collision course, and it'll be very interesting to see how they deploy their cash. I think Snowflake, with their incredible management and leadership, probably will make the first move. >> Well, I think you're right on that. And by the way, I'll just add, you know, Databricks has basically said, hey, it's going to be easier for us to come from data lakes into data warehouse. I'm not sure I buy that. I think, again, that semantic layer is a missing ingredient. So it's going to be really interesting to see how this plays out. And to your point, you know, Snowflake's got the war chest, they got the momentum, they've got the public presence now since November, 2020. And so, you know, they're probably going to start making some aggressive moves. Anyway, next prediction is something, Erik, that you and I have talked about many, many times, and that is observability. I know it's one of your favorite topics. And we see this world screaming for more consolidation it's going all in on cloud native. These legacy stacks, they're fighting to stay relevant, but the direction is pretty clear. And the same XY graph lays out the players in the field, with some of the new entrants that we've also highlighted, like Observe and Honeycomb and ChaosSearch that we've talked about. Erik, we put a big red target around Splunk because everyone wants their gold. So please give us your thoughts. >> Oh man, I feel like I've been saying negative things about Splunk for too long. I've got a bad rap on this name. The Splunk shareholders come after me all the time. Listen, it really comes down to this. They're a fantastic company that was designed to do logging and monitoring and had some great tool sets around what you could do with it. But they were designed for the data center. 
They were designed for on-prem. The world we're in now is so dynamic. Everything I hear from our end user community is that all net new workloads will be going to cloud native players. It's that simple. So Splunk is entrenched. It's going to continue doing what it's doing, and it does it really, really well. But if you're doing something new, the new workloads are going to be in a dynamic environment, and that's going to go to the cloud native players. And in our data, it is extremely clear that that means Datadog and Elastic. They are by far number one and two in net score, increase rates, adoption rates. It's not even close. Even New Relic actually is starting to, you know, entrench itself really well. We saw New Relic's adoption going up, which is super important because they went to that freemium model, you know, to try to get a little bit of an entrenched customer base, and that's working as well. And then you made a great list here of all the new entrants, but it goes beyond this. There are so many more. In our emerging technology survey, we're seeing Sentry, Catchpoint, Securonix, Lucidworks. There are so many options in this space. And let's not forget, the biggest data point we're seeing is with Grafana. And Grafana Labs has yet to turn on their enterprise business. Elastic did it, why can't Grafana Labs do it? They have an enterprise stack. So when you look at how crowded this space is, there has to be consolidation. I recently hosted a panel and every single guy on that panel said, "Please give me a consolidation." Because they're the end users trying to actually deploy these, and it's getting a little bit confusing. >> Great. Thank you for that. Okay. Last prediction. Erik, it might be a little out of your wheelhouse, but you know, you might have some thoughts on it. And that's that hybrid events become the new digital model and a new category in 2022. You've got these pure play digital or virtual events. They're going to take a back seat to in-person hybrids. The virtual experience will eventually give way to metaverse experiences, and that's going to take some time, but the physical hybrid is going to drive it. And metaverse is ultimately going to define the virtual experience, because the virtual experience today is not great. Nobody likes virtual. And hybrid is going to become the business model. Today's pure virtual experience has to evolve, you know, theCUBE first delivered hybrid mid last decade, but nobody really wanted it. We did Mobile World Congress last summer in Barcelona in an amazing hybrid model, which we're showing in some of the pictures here. Alex, if you don't mind bringing that back up. And every physical event that we're doing now has a hybrid and virtual component, including the pre-records. You can see in our studios, you see the green screen. I don't know. Erik, what do you think about, you know, the Zoom fatigue and all this? I know you host regular events with your round tables, but what are your thoughts? >> Well, first of all, I think you and your company here have just done an amazing job on this. So that's really your expertise. I spent 20 years of my career hosting intimate Wall Street idea dinners. So I'm better at navigating a wine list than I am navigating a conference floor. But I will say that, you know, the trend just goes along with what we saw. If 35% are going to be fully remote and 70% are going to be hybrid, then our events are going to be as well. I used to host round table dinners on, you know, one or two nights a week.
Now those have gone virtual. They're now panels. They're now one-on-one interviews. You know, we do chats. We do submitted questions. We do what we can, but there's no reason that this is going to change anytime soon. I think you're spot on here. >> Yeah. Great. All right. So there you have it, Erik and I, Listen, we always love the feedback. Love to know what you think. Thank you, Erik, for your partnership, your collaboration, and love doing these predictions with you. >> Yeah. I always enjoy them too. And I'm actually happy. Last year you made us do a baker's dozen, so thanks for keeping it to 10 this year. >> (laughs) We've got a lot to say. I know, you know, we cut out. We didn't do much on crypto. We didn't really talk about SaaS. I mean, I got some thoughts there. We didn't really do much on containers and AI. >> You want to keep going? I've got another 10 for you. >> RPA...All right, we'll have you back and then let's do that. All right. All right. Don't forget, these episodes are all available as podcasts, wherever you listen, all you can do is search Breaking Analysis podcast. Check out ETR's website at etr.plus, they've got a new website out. It's the best data in the industry, and we publish a full report every week on wikibon.com and siliconangle.com. You can always reach out on email, David.Vellante@siliconangle.com I'm @DVellante on Twitter. Comment on our LinkedIn posts. This is Dave Vellante for the Cube Insights powered by ETR. Have a great week, stay safe, be well. And we'll see you next time. (mellow music)

Published Date : Jan 22 2022


Analyst Predictions 2022: The Future of Data Management


 

[Music] in the 2010s organizations became keenly aware that data would become the key ingredient in driving competitive advantage differentiation and growth but to this day putting data to work remains a difficult challenge for many if not most organizations now as the cloud matures it has become a game changer for data practitioners by making cheap storage and massive processing power readily accessible we've also seen better tooling in the form of data workflows streaming machine intelligence ai developer tools security observability automation new databases and the like these innovations they accelerate data proficiency but at the same time they had complexity for practitioners data lakes data hubs data warehouses data marts data fabrics data meshes data catalogs data oceans are forming they're evolving and exploding onto the scene so in an effort to bring perspective to the sea of optionality we've brought together the brightest minds in the data analyst community to discuss how data management is morphing and what practitioners should expect in 2022 and beyond hello everyone my name is dave vellante with the cube and i'd like to welcome you to a special cube presentation analyst predictions 2022 the future of data management we've gathered six of the best analysts in data and data management who are going to present and discuss their top predictions and trends for 2022 in the first half of this decade let me introduce our six power panelists sanjeev mohan is former gartner analyst and principal at sanjamo tony bear is principal at db insight carl olufsen is well-known research vice president with idc dave meninger is senior vice president and research director at ventana research brad shimon chief analyst at ai platforms analytics and data management at omnia and doug henschen vice president and principal analyst at constellation research gentlemen welcome to the program and thanks for coming on thecube today great to be here thank you all right here's the format we're going to use i as moderator are going to call on each analyst separately who then will deliver their prediction or mega trend and then in the interest of time management and pace two analysts will have the opportunity to comment if we have more time we'll elongate it but let's get started right away sanjeev mohan please kick it off you want to talk about governance go ahead sir thank you dave i i believe that data governance which we've been talking about for many years is now not only going to be mainstream it's going to be table stakes and all the things that you mentioned you know with data oceans data lakes lake houses data fabric meshes the common glue is metadata if we don't understand what data we have and we are governing it there is no way we can manage it so we saw informatica when public last year after a hiatus of six years i've i'm predicting that this year we see some more companies go public uh my bet is on colibra most likely and maybe alation we'll see go public this year we we i'm also predicting that the scope of data governance is going to expand beyond just data it's not just data and reports we are going to see more transformations like spark jaws python even airflow we're going to see more of streaming data so from kafka schema registry for example we will see ai models become part of this whole governance suite so the governance suite is going to be very comprehensive very detailed lineage impact analysis and then even expand into data quality we already seen that happen with some of the tools 
where they are buying these smaller companies and bringing in data quality monitoring and integrating it with metadata management data catalogs also data access governance so these so what we are going to see is that once the data governance platforms become the key entry point into these modern architectures i'm predicting that the usage the number of users of a data catalog is going to exceed that of a bi tool that will take time and we already seen that that trajectory right now if you look at bi tools i would say there are 100 users to a bi tool to one data catalog and i i see that evening out over a period of time and at some point data catalogs will really become you know the main way for us to access data data catalog will help us visualize data but if we want to do more in-depth analysis it'll be the jumping-off point into the bi tool the data science tool and and that is that is the journey i see for the data governance products excellent thank you some comments maybe maybe doug a lot a lot of things to weigh in on there maybe you could comment yeah sanjeev i think you're spot on a lot of the trends uh the one disagreement i think it's it's really still far from mainstream as you say we've been talking about this for years it's like god motherhood apple pie everyone agrees it's important but too few organizations are really practicing good governance because it's hard and because the incentives have been lacking i think one thing that deserves uh mention in this context is uh esg mandates and guidelines these are environmental social and governance regs and guidelines we've seen the environmental rags and guidelines imposed in industries particularly the carbon intensive industries we've seen the social mandates particularly diversity imposed on suppliers by companies that are leading on this topic we've seen governance guidelines now being imposed by banks and investors so these esgs are presenting new carrots and sticks and it's going to demand more solid data it's going to demand more detailed reporting and solid reporting tighter governance but we're still far from mainstream adoption we have a lot of uh you know best of breed niche players in the space i think the signs that it's going to be more mainstream are starting with things like azure purview google dataplex the big cloud platform uh players seem to be uh upping the ante and and addressing starting to address governance excellent thank you doug brad i wonder if you could chime in as well yeah i would love to be a believer in data catalogs um but uh to doug's point i think that it's going to take some more pressure for for that to happen i recall metadata being something every enterprise thought they were going to get under control when we were working on service oriented architecture back in the 90s and that didn't happen quite the way we we anticipated and and uh to sanjeev's point it's because it is really complex and really difficult to do my hope is that you know we won't sort of uh how do we put this fade out into this nebulous nebula of uh domain catalogs that are specific to individual use cases like purview for getting data quality right or like data governance and cyber security and instead we have some tooling that can actually be adaptive to gather metadata to create something i know is important to you sanjeev and that is this idea of observability if you can get enough metadata without moving your data around but understanding the entirety of a system that's running on this data you can do a lot to help 
with with the governance that doug is talking about so so i just want to add that you know data governance like many other initiatives did not succeed even ai went into an ai window but that's a different topic but a lot of these things did not succeed because to your point the incentives were not there i i remember when starbucks oxley had come into the scene if if a bank did not do service obviously they were very happy to a million dollar fine that was like you know pocket change for them instead of doing the right thing but i think the stakes are much higher now with gdpr uh the floodgates open now you know california you know has ccpa but even ccpa is being outdated with cpra which is much more gdpr like so we are very rapidly entering a space where every pretty much every major country in the world is coming up with its own uh compliance regulatory requirements data residence is becoming really important and and i i think we are going to reach a stage where uh it won't be optional anymore so whether we like it or not and i think the reason data catalogs were not successful in the past is because we did not have the right focus on adoption we were focused on features and these features were disconnected very hard for business to stop these are built by it people for it departments to to take a look at technical metadata not business metadata today the tables have turned cdo's are driving this uh initiative uh regulatory compliances are beating down hard so i think the time might be right yeah so guys we have to move on here and uh but there's some some real meat on the bone here sanjeev i like the fact that you late you called out calibra and alation so we can look back a year from now and say okay he made the call he stuck it and then the ratio of bi tools the data catalogs that's another sort of measurement that we can we can take even though some skepticism there that's something that we can watch and i wonder if someday if we'll have more metadata than data but i want to move to tony baer you want to talk about data mesh and speaking you know coming off of governance i mean wow you know the whole concept of data mesh is decentralized data and then governance becomes you know a nightmare there but take it away tony we'll put it this way um data mesh you know the the idea at least is proposed by thoughtworks um you know basically was unleashed a couple years ago and the press has been almost uniformly almost uncritical um a good reason for that is for all the problems that basically that sanjeev and doug and brad were just you know we're just speaking about which is that we have all this data out there and we don't know what to do about it um now that's not a new problem that was a problem we had enterprise data warehouses it was a problem when we had our hadoop data clusters it's even more of a problem now the data's out in the cloud where the data is not only your data like is not only s3 it's all over the place and it's also including streaming which i know we'll be talking about later so the data mesh was a response to that the idea of that we need to debate you know who are the folks that really know best about governance is the domain experts so it was basically data mesh was an architectural pattern and a process my prediction for this year is that data mesh is going to hit cold hard reality because if you if you do a google search um basically the the published work the articles and databases have been largely you know pretty uncritical um so far you know that you know 
basically learning is basically being a very revolutionary new idea i don't think it's that revolutionary because we've talked about ideas like this brad and i you and i met years ago when we were talking about so and decentralizing all of us was at the application level now we're talking about at the data level and now we have microservices so there's this thought of oh if we manage if we're apps in cloud native through microservices why don't we think of data in the same way um my sense this year is that you know this and this has been a very active search if you look at google search trends is that now companies are going to you know enterprises are going to look at this seriously and as they look at seriously it's going to attract its first real hard scrutiny it's going to attract its first backlash that's not necessarily a bad thing it means that it's being taken seriously um the reason why i think that that uh that it will you'll start to see basically the cold hard light of day shine on data mesh is that it's still a work in progress you know this idea is basically a couple years old and there's still some pretty major gaps um the biggest gap is in is in the area of federated governance now federated governance itself is not a new issue uh federated governance position we're trying to figure out like how can we basically strike the balance between getting let's say you know between basically consistent enterprise policy consistent enterprise governance but yet the groups that understand the data know how to basically you know that you know how do we basically sort of balance the two there's a huge there's a huge gap there in practice and knowledge um also to a lesser extent there's a technology gap which is basically in the self-service technologies that will help teams essentially govern data you know basically through the full life cycle from developed from selecting the data from you know building the other pipelines from determining your access control determining looking at quality looking at basically whether data is fresh or whether or not it's trending of course so my predictions is that it will really receive the first harsh scrutiny this year you are going to see some organization enterprises declare premature victory when they've uh when they build some federated query implementations you're going to see vendors start to data mesh wash their products anybody in the data management space they're going to say that whether it's basically a pipelining tool whether it's basically elt whether it's a catalog um or confederated query tool they're all going to be like you know basically promoting the fact of how they support this hopefully nobody is going to call themselves a data mesh tool because data mesh is not a technology we're going to see one other thing come out of this and this harks back to the metadata that sanji was talking about and the catalogs that he was talking about which is that there's going to be a new focus on every renewed focus on metadata and i think that's going to spur interest in data fabrics now data fabrics are pretty vaguely defined but if we just take the most elemental definition which is a common metadata back plane i think that if anybody is going to get serious about data mesh they need to look at a data fabric because we all at the end of the day need to speak you know need to read from the same sheet of music so thank you tony dave dave meninger i mean one of the things that people like about data mesh is it pretty crisply articulates some of 
the flaws in today's organizational approaches to data what are your thoughts on this well i think we have to start by defining data mesh right the the term is already getting corrupted right tony said it's going to see the cold hard uh light of day and there's a problem right now that there are a number of overlapping terms that are similar but not identical so we've got data virtualization data fabric excuse me for a second sorry about that data virtualization data fabric uh uh data federation right uh so i i think that it's not really clear what each vendor means by these terms i see data mesh and data fabric becoming quite popular i've i've interpreted data mesh as referring primarily to the governance aspects as originally you know intended and specified but that's not the way i see vendors using i see vendors using it much more to mean data fabric and data virtualization so i'm going to comment on the group of those things i think the group of those things is going to happen they're going to happen they're going to become more robust our research suggests that a quarter of organizations are already using virtualized access to their data lakes and another half so a total of three quarters will eventually be accessing their data lakes using some sort of virtualized access again whether you define it as mesh or fabric or virtualization isn't really the point here but this notion that there are different elements of data metadata and governance within an organization that all need to be managed collectively the interesting thing is when you look at the satisfaction rates of those organizations using virtualization versus those that are not it's almost double 68 of organizations i'm i'm sorry um 79 of organizations that were using virtualized access express satisfaction with their access to the data lake only 39 expressed satisfaction if they weren't using virtualized access so thank you uh dave uh sanjeev we just got about a couple minutes on this topic but i know you're speaking or maybe you've spoken already on a panel with jamal dagani who sort of invented the concept governance obviously is a big sticking point but what are your thoughts on this you are mute so my message to your mark and uh and to the community is uh as opposed to what dave said let's not define it we spent the whole year defining it there are four principles domain product data infrastructure and governance let's take it to the next level i get a lot of questions on what is the difference between data fabric and data mesh and i'm like i can compare the two because data mesh is a business concept data fabric is a data integration pattern how do you define how do you compare the two you have to bring data mesh level down so to tony's point i'm on a warp path in 2022 to take it down to what does a data product look like how do we handle shared data across domains and govern it and i think we are going to see more of that in 2022 is operationalization of data mesh i think we could have a whole hour on this topic couldn't we uh maybe we should do that uh but let's go to let's move to carl said carl your database guy you've been around that that block for a while now you want to talk about graph databases bring it on oh yeah okay thanks so i regard graph database as basically the next truly revolutionary database management technology i'm looking forward to for the graph database market which of course we haven't defined yet so obviously i have a little wiggle room in what i'm about to say but that this market will grow 
by about 600 percent over the next 10 years now 10 years is a long time but over the next five years we expect to see gradual growth as people start to learn how to use it problem isn't that it's used the problem is not that it's not useful is that people don't know how to use it so let me explain before i go any further what a graph database is because some of the folks on the call may not may not know what it is a graph database organizes data according to a mathematical structure called a graph a graph has elements called nodes and edges so a data element drops into a node the nodes are connected by edges the edges connect one node to another node combinations of edges create structures that you can analyze to determine how things are related in some cases the nodes and edges can have properties attached to them which add additional informative material that makes it richer that's called a property graph okay there are two principal use cases for graph databases there's there's semantic proper graphs which are used to break down human language text uh into the semantic structures then you can search it organize it and and and answer complicated questions a lot of ai is aimed at semantic graphs another kind is the property graph that i just mentioned which has a dazzling number of use cases i want to just point out is as i talk about this people are probably wondering well we have relational databases isn't that good enough okay so a relational database defines it uses um it supports what i call definitional relationships that means you define the relationships in a fixed structure the database drops into that structure there's a value foreign key value that relates one table to another and that value is fixed you don't change it if you change it the database becomes unstable it's not clear what you're looking at in a graph database the system is designed to handle change so that it can reflect the true state of the things that it's being used to track so um let me just give you some examples of use cases for this um they include uh entity resolution data lineage uh um social media analysis customer 360 fraud prevention there's cyber security there's strong supply chain is a big one actually there's explainable ai and this is going to become important too because a lot of people are adopting ai but they want a system after the fact to say how did the ai system come to that conclusion how did it make that recommendation right now we don't have really good ways of tracking that okay machine machine learning in general um social network i already mentioned that and then we've got oh gosh we've got data governance data compliance risk management we've got recommendation we've got personalization anti-money money laundering that's another big one identity and access management network and i.t operations is already becoming a key one where you actually have mapped out your operation your your you know whatever it is your data center and you you can track what's going on as things happen there root cause analysis fraud detection is a huge one a number of major credit card companies use graph databases for fraud detection risk analysis tracking and tracing churn analysis next best action what-if analysis impact analysis entity resolution and i would add one other thing or just a few other things to this list metadata management so sanjay here you go this is your engine okay because i was in metadata management for quite a while in my past life and one of the things i found was that none of the 
data management technologies that were available to us could efficiently handle metadata because of the kinds of structures that result from it but grass can okay grafts can do things like say this term in this context means this but in that context it means that okay things like that and in fact uh logistics management supply chain it also because it handles recursive relationships by recursive relationships i mean objects that own other objects that are of the same type you can do things like bill materials you know so like parts explosion you can do an hr analysis who reports to whom how many levels up the chain and that kind of thing you can do that with relational databases but yes it takes a lot of programming in fact you can do almost any of these things with relational databases but the problem is you have to program it it's not it's not supported in the database and whenever you have to program something that means you can't trace it you can't define it you can't publish it in terms of its functionality and it's really really hard to maintain over time so carl thank you i wonder if we could bring brad in i mean brad i'm sitting there wondering okay is this incremental to the market is it disruptive and replaceable what are your thoughts on this space it's already disrupted the market i mean like carl said go to any bank and ask them are you using graph databases to do to get fraud detection under control and they'll say absolutely that's the only way to solve this problem and it is frankly um and it's the only way to solve a lot of the problems that carl mentioned and that is i think it's it's achilles heel in some ways because you know it's like finding the best way to cross the seven bridges of konigsberg you know it's always going to kind of be tied to those use cases because it's really special and it's really unique and because it's special and it's unique uh it it still unfortunately kind of stands apart from the rest of the community that's building let's say ai outcomes as the great great example here the graph databases and ai as carl mentioned are like chocolate and peanut butter but technologically they don't know how to talk to one another they're completely different um and you know it's you can't just stand up sql and query them you've got to to learn um yeah what is that carlos specter or uh special uh uh yeah thank you uh to actually get to the data in there and if you're gonna scale that data that graph database especially a property graph if you're gonna do something really complex like try to understand uh you know all of the metadata in your organization you might just end up with you know a graph database winter like we had the ai winter simply because you run out of performance to make the thing happen so i i think it's already disrupted but we we need to like treat it like a first-class citizen in in the data analytics and ai community we need to bring it into the fold we need to equip it with the tools it needs to do that the magic it does and to do it not just for specialized use cases but for everything because i i'm with carl i i think it's absolutely revolutionary so i had also identified the principal achilles heel of the technology which is scaling now when these when these things get large and complex enough that they spill over what a single server can handle you start to have difficulties because the relationships span things that have to be resolved over a network and then you get network latency and that slows the system down so that's still a 
problem to be solved sanjeev any quick thoughts on this i mean i think metadata on the on the on the word cloud is going to be the the largest font uh but what are your thoughts here i want to like step away so people don't you know associate me with only meta data so i want to talk about something a little bit slightly different uh dbengines.com has done an amazing job i think almost everyone knows that they chronicle all the major databases that are in use today in january of 2022 there are 381 databases on its list of ranked list of databases the largest category is rdbms the second largest category is actually divided into two property graphs and rdf graphs these two together make up the second largest number of data databases so talking about accolades here this is a problem the problem is that there's so many graph databases to choose from they come in different shapes and forms uh to bright's point there's so many query languages in rdbms is sql end of the story here we've got sci-fi we've got gremlin we've got gql and then your proprietary languages so i think there's a lot of disparity in this space but excellent all excellent points sanji i must say and that is a problem the languages need to be sorted and standardized and it needs people need to have a road map as to what they can do with it because as you say you can do so many things and so many of those things are unrelated that you sort of say well what do we use this for i'm reminded of the saying i learned a bunch of years ago when somebody said that the digital computer is the only tool man has ever devised that has no particular purpose all right guys we gotta we gotta move on to dave uh meninger uh we've heard about streaming uh your prediction is in that realm so please take it away sure so i like to say that historical databases are to become a thing of the past but i don't mean that they're going to go away that's not my point i mean we need historical databases but streaming data is going to become the default way in which we operate with data so in the next say three to five years i would expect the data platforms and and we're using the term data platforms to represent the evolution of databases and data lakes that the data platforms will incorporate these streaming capabilities we're going to process data as it streams into an organization and then it's going to roll off into historical databases so historical databases don't go away but they become a thing of the past they store the data that occurred previously and as data is occurring we're going to be processing it we're going to be analyzing we're going to be acting on it i mean we we only ever ended up with historical databases because we were limited by the technology that was available to us data doesn't occur in batches but we processed it in batches because that was the best we could do and it wasn't bad and we've continued to improve and we've improved and we've improved but streaming data today is still the exception it's not the rule right there's there are projects within organizations that deal with streaming data but it's not the default way in which we deal with data yet and so that that's my prediction is that this is going to change we're going to have um streaming data be the default way in which we deal with data and and how you label it what you call it you know maybe these databases and data platforms just evolve to be able to handle it but we're going to deal with data in a different way and our research shows that already about half of 
the participants in our analytics and data benchmark research are using streaming data you know another third are planning to use streaming technologies so that gets us to about eight out of ten organizations need to use this technology that doesn't mean they have to use it throughout the whole organization but but it's pretty widespread in its use today and has continued to grow if you think about the consumerization of i.t we've all been conditioned to expect immediate access to information immediate responsiveness you know we want to know if an uh item is on the shelf at our local retail store and we can go in and pick it up right now you know that's the world we live in and that's spilling over into the enterprise i.t world where we have to provide those same types of capabilities um so that's my prediction historical database has become a thing of the past streaming data becomes the default way in which we we operate with data all right thank you david well so what what say you uh carl a guy who's followed historical databases for a long time well one thing actually every database is historical because as soon as you put data in it it's now history it's no longer it no longer reflects the present state of things but even if that history is only a millisecond old it's still history but um i would say i mean i know you're trying to be a little bit provocative in saying this dave because you know as well as i do that people still need to do their taxes they still need to do accounting they still need to run general ledger programs and things like that that all involves historical data that's not going to go away unless you want to go to jail so you're going to have to deal with that but as far as the leading edge functionality i'm totally with you on that and i'm just you know i'm just kind of wondering um if this chain if this requires a change in the way that we perceive applications in order to truly be manifested and rethinking the way m applications work um saying that uh an application should respond instantly as soon as the state of things changes what do you say about that i i think that's true i think we do have to think about things differently that's you know it's not the way we design systems in the past uh we're seeing more and more systems designed that way but again it's not the default and and agree 100 with you that we do need historical databases you know that that's clear and even some of those historical databases will be used in conjunction with the streaming data right so absolutely i mean you know let's take the data warehouse example where you're using the data warehouse as context and the streaming data as the present you're saying here's a sequence of things that's happening right now have we seen that sequence before and where what what does that pattern look like in past situations and can we learn from that so tony bear i wonder if you could comment i mean if you when you think about you know real-time inferencing at the edge for instance which is something that a lot of people talk about um a lot of what we're discussing here in this segment looks like it's got great potential what are your thoughts yeah well i mean i think you nailed it right you know you hit it right on the head there which is that i think a key what i'm seeing is that essentially and basically i'm going to split this one down the middle is i don't see that basically streaming is the default what i see is streaming and basically and transaction databases um and analytics data you know data 
warehouses data lakes whatever are converging and what allows us technically to converge is cloud native architecture where you can basically distribute things so you could have you can have a note here that's doing the real-time processing that's also doing it and this is what your leads in we're maybe doing some of that real-time predictive analytics to take a look at well look we're looking at this customer journey what's happening with you know you know with with what the customer is doing right now and this is correlated with what other customers are doing so what i so the thing is that in the cloud you can basically partition this and because of basically you know the speed of the infrastructure um that you can basically bring these together and or and so and kind of orchestrate them sort of loosely coupled manner the other part is that the use cases are demanding and this is part that goes back to what dave is saying is that you know when you look at customer 360 when you look at let's say smart you know smart utility grids when you look at any type of operational problem it has a real-time component and it has a historical component and having predictives and so like you know you know my sense here is that there that technically we can bring this together through the cloud and i think the use case is that is that we we can apply some some real-time sort of you know predictive analytics on these streams and feed this into the transactions so that when we make a decision in terms of what to do as a result of a transaction we have this real time you know input sanjeev did you have a comment yeah i was just going to say that to this point you know we have to think of streaming very different because in the historical databases we used to bring the data and store the data and then we used to run rules on top uh aggregations and all but in case of streaming the mindset changes because the rules normally the inference all of that is fixed but the data is constantly changing so it's a completely reverse way of thinking of uh and building applications on top of that so dave menninger there seemed to be some disagreement about the default or now what kind of time frame are you are you thinking about is this end of decade it becomes the default what would you pin i i think around you know between between five to ten years i think this becomes the reality um i think you know it'll be more and more common between now and then but it becomes the default and i also want sanjeev at some point maybe in one of our subsequent conversations we need to talk about governing streaming data because that's a whole other set of challenges we've also talked about it rather in a two dimensions historical and streaming and there's lots of low latency micro batch sub second that's not quite streaming but in many cases it's fast enough and we're seeing a lot of adoption of near real time not quite real time as uh good enough for most for many applications because nobody's really taking the hardware dimension of this information like how do we that'll just happen carl so near real time maybe before you lose the customer however you define that right okay um let's move on to brad brad you want to talk about automation ai uh the the the pipeline people feel like hey we can just automate everything what's your prediction yeah uh i'm i'm an ai fiction auto so apologies in advance for that but uh you know um i i think that um we've been seeing automation at play within ai for some time now and it's helped us do do a 
lot of things for especially for practitioners that are building ai outcomes in the enterprise uh it's it's helped them to fill skills gaps it's helped them to speed development and it's helped them to to actually make ai better uh because it you know in some ways provides some swim lanes and and for example with technologies like ottawa milk and can auto document and create that sort of transparency that that we talked about a little bit earlier um but i i think it's there's an interesting kind of conversion happening with this idea of automation um and and that is that uh we've had the automation that started happening for practitioners it's it's trying to move outside of the traditional bounds of things like i'm just trying to get my features i'm just trying to pick the right algorithm i'm just trying to build the right model uh and it's expanding across that full life cycle of building an ai outcome to start at the very beginning of data and to then continue on to the end which is this continuous delivery and continuous uh automation of of that outcome to make sure it's right and it hasn't drifted and stuff like that and because of that because it's become kind of powerful we're starting to to actually see this weird thing happen where the practitioners are starting to converge with the users and that is to say that okay if i'm in tableau right now i can stand up salesforce einstein discovery and it will automatically create a nice predictive algorithm for me um given the data that i that i pull in um but what's starting to happen and we're seeing this from the the the companies that create business software so salesforce oracle sap and others is that they're starting to actually use these same ideals and a lot of deep learning to to basically stand up these out of the box flip a switch and you've got an ai outcome at the ready for business users and um i i'm very much you know i think that that's that's the way that it's going to go and what it means is that ai is is slowly disappearing uh and i don't think that's a bad thing i think if anything what we're going to see in 2022 and maybe into 2023 is this sort of rush to to put this idea of disappearing ai into practice and have as many of these solutions in the enterprise as possible you can see like for example sap is going to roll out this quarter this thing called adaptive recommendation services which which basically is a cold start ai outcome that can work across a whole bunch of different vertical markets and use cases it's just a recommendation engine for whatever you need it to do in the line of business so basically you're you're an sap user you look up to turn on your software one day and you're a sales professional let's say and suddenly you have a recommendation for customer churn it's going that's great well i i don't know i i think that's terrifying in some ways i think it is the future that ai is going to disappear like that but i am absolutely terrified of it because um i i think that what it what it really does is it calls attention to a lot of the issues that we already see around ai um specific to this idea of what what we like to call it omdia responsible ai which is you know how do you build an ai outcome that is free of bias that is inclusive that is fair that is safe that is secure that it's audible etc etc etc etc that takes some a lot of work to do and so if you imagine a customer that that's just a sales force customer let's say and they're turning on einstein discovery within their sales software you need 
some guidance to make sure that when you flip that switch that the outcome you're going to get is correct and that's that's going to take some work and so i think we're going to see this let's roll this out and suddenly there's going to be a lot of a lot of problems a lot of pushback uh that we're going to see and some of that's going to come from gdpr and others that sam jeeve was mentioning earlier a lot of it's going to come from internal csr requirements within companies that are saying hey hey whoa hold up we can't do this all at once let's take the slow route let's make ai automated in a smart way and that's going to take time yeah so a couple predictions there that i heard i mean ai essentially you disappear it becomes invisible maybe if i can restate that and then if if i understand it correctly brad you're saying there's a backlash in the near term people can say oh slow down let's automate what we can those attributes that you talked about are non trivial to achieve is that why you're a bit of a skeptic yeah i think that we don't have any sort of standards that companies can look to and understand and we certainly within these companies especially those that haven't already stood up in internal data science team they don't have the knowledge to understand what that when they flip that switch for an automated ai outcome that it's it's gonna do what they think it's gonna do and so we need some sort of standard standard methodology and practice best practices that every company that's going to consume this invisible ai can make use of and one of the things that you know is sort of started that google kicked off a few years back that's picking up some momentum and the companies i just mentioned are starting to use it is this idea of model cards where at least you have some transparency about what these things are doing you know so like for the sap example we know for example that it's convolutional neural network with a long short-term memory model that it's using we know that it only works on roman english uh and therefore me as a consumer can say oh well i know that i need to do this internationally so i should not just turn this on today great thank you carl can you add anything any context here yeah we've talked about some of the things brad mentioned here at idc in the our future of intelligence group regarding in particular the moral and legal implications of having a fully automated you know ai uh driven system uh because we already know and we've seen that ai systems are biased by the data that they get right so if if they get data that pushes them in a certain direction i think there was a story last week about an hr system that was uh that was recommending promotions for white people over black people because in the past um you know white people were promoted and and more productive than black people but not it had no context as to why which is you know because they were being historically discriminated black people being historically discriminated against but the system doesn't know that so you know you have to be aware of that and i think that at the very least there should be controls when a decision has either a moral or a legal implication when when you want when you really need a human judgment it could lay out the options for you but a person actually needs to authorize that that action and i also think that we always will have to be vigilant regarding the kind of data we use to train our systems to make sure that it doesn't introduce unintended biases and to some 
extent they always will so we'll always be chasing after them that's that's absolutely carl yeah i think that what you have to bear in mind as a as a consumer of ai is that it is a reflection of us and we are a very flawed species uh and so if you look at all the really fantastic magical looking supermodels we see like gpt three and four that's coming out z they're xenophobic and hateful uh because the people the data that's built upon them and the algorithms and the people that build them are us so ai is a reflection of us we need to keep that in mind yeah we're the ai's by us because humans are biased all right great okay let's move on doug henson you know a lot of people that said that data lake that term's not not going to not going to live on but it appears to be have some legs here uh you want to talk about lake house bring it on yes i do my prediction is that lake house and this idea of a combined data warehouse and data lake platform is going to emerge as the dominant data management offering i say offering that doesn't mean it's going to be the dominant thing that organizations have out there but it's going to be the predominant vendor offering in 2022. now heading into 2021 we already had cloudera data bricks microsoft snowflake as proponents in 2021 sap oracle and several of these fabric virtualization mesh vendors join the bandwagon the promise is that you have one platform that manages your structured unstructured and semi-structured information and it addresses both the beyond analytics needs and the data science needs the real promise there is simplicity and lower cost but i think end users have to answer a few questions the first is does your organization really have a center of data gravity or is it is the data highly distributed multiple data warehouses multiple data lakes on-premises cloud if it if it's very distributed and you you know you have difficulty consolidating and that's not really a goal for you then maybe that single platform is unrealistic and not likely to add value to you um you know also the fabric and virtualization vendors the the mesh idea that's where if you have this highly distributed situation that might be a better path forward the second question if you are looking at one of these lake house offerings you are looking at consolidating simplifying bringing together to a single platform you have to make sure that it meets both the warehouse need and the data lake need so you have vendors like data bricks microsoft with azure synapse new really to the data warehouse space and they're having to prove that these data warehouse capabilities on their platforms can meet the scaling requirements can meet the user and query concurrency requirements meet those tight slas and then on the other hand you have the or the oracle sap snowflake the data warehouse uh folks coming into the data science world and they have to prove that they can manage the unstructured information and meet the needs of the data scientists i'm seeing a lot of the lake house offerings from the warehouse crowd managing that unstructured information in columns and rows and some of these vendors snowflake in particular is really relying on partners for the data science needs so you really got to look at a lake house offering and make sure that it meets both the warehouse and the data lake requirement well thank you doug well tony if those two worlds are going to come together as doug was saying the analytics and the data science world does it need to be some kind of semantic layer in 
between i don't know weigh in on this topic if you would oh didn't we talk about data fabrics before common metadata layer um actually i'm almost tempted to say let's declare victory and go home in that this is actually been going on for a while i actually agree with uh you know much what doug is saying there which is that i mean we i remembered as far back as i think it was like 2014 i was doing a a study you know it was still at ovum predecessor omnia um looking at all these specialized databases that were coming up and seeing that you know there's overlap with the edges but yet there was still going to be a reason at the time that you would have let's say a document database for json you'd have a relational database for tran you know for transactions and for data warehouse and you had you know and you had basically something at that time that that resembles to do for what we're considering a day of life fast fo and the thing is what i was saying at the time is that you're seeing basically blur you know sort of blending at the edges that i was saying like about five or six years ago um that's all and the the lake house is essentially you know the amount of the the current manifestation of that idea there is a dichotomy in terms of you know it's the old argument do we centralize this all you know you know in in in in in a single place or do we or do we virtualize and i think it's always going to be a yin and yang there's never going to be a single single silver silver bullet i do see um that they're also going to be questions and these are things that points that doug raised they're you know what your what do you need of of of your of you know for your performance there or for your you know pre-performance characteristics do you need for instance hiking currency you need the ability to do some very sophisticated joins or is your requirement more to be able to distribute and you know distribute our processing is you know as far as possible to get you know to essentially do a kind of brute force approach all these approaches are valid based on you know based on the used case um i just see that essentially that the lake house is the culmination of it's nothing it's just it's a relatively new term introduced by databricks a couple years ago this is the culmination of basically what's been a long time trend and what we see in the cloud is that as we start seeing data warehouses as a checkbox item say hey we can basically source data in cloud and cloud storage and s3 azure blob store you know whatever um as long as it's in certain formats like you know like you know parquet or csv or something like that you know i see that as becoming kind of you know a check box item so to that extent i think that the lake house depending on how you define it is already reality um and in some in some cases maybe new terminology but not a whole heck of a lot new under the sun yeah and dave menger i mean a lot of this thank you tony but a lot of this is going to come down to you know vendor marketing right some people try to co-opt the term we talked about data mesh washing what are your thoughts on this yeah so um i used the term data platform earlier and and part of the reason i use that term is that it's more vendor neutral uh we've we've tried to uh sort of stay out of the the vendor uh terminology patenting world right whether whether the term lake house is what sticks or not the concept is certainly going to stick and we have some data to back it up about a quarter of organizations that are using data 
lakes today already incorporate data warehouse functionality into it so they consider their data lake house and data warehouse one in the same about a quarter of organizations a little less but about a quarter of organizations feed the data lake from the data warehouse and about a quarter of organizations feed the data warehouse from the data lake so it's pretty obvious that three quarters of organizations need to bring this stuff together right the need is there the need is apparent the technology is going to continue to verge converge i i like to talk about you know you've got data lakes over here at one end and i'm not going to talk about why people thought data lakes were a bad idea because they thought you just throw stuff in a in a server and you ignore it right that's not what a data lake is so you've got data lake people over here and you've got database people over here data warehouse people over here database vendors are adding data lake capabilities and data lake vendors are adding data warehouse capabilities so it's obvious that they're going to meet in the middle i mean i think it's like tony says i think we should there declare victory and go home and so so i it's just a follow-up on that so are you saying these the specialized lake and the specialized warehouse do they go away i mean johnny tony data mesh practitioners would say or or advocates would say well they could all live as just a node on the on the mesh but based on what dave just said are we going to see those all morph together well number one as i was saying before there's always going to be this sort of you know kind of you know centrifugal force or this tug of war between do we centralize the data do we do it virtualize and the fact is i don't think that work there's ever going to be any single answer i think in terms of data mesh data mesh has nothing to do with how you physically implement the data you could have a data mesh on a basically uh on a data warehouse it's just that you know the difference being is that if we use the same you know physical data store but everybody's logically manual basically governing it differently you know um a data mission is basically it's not a technology it's a process it's a governance process um so essentially um you know you know i basically see that you know as as i was saying before that this is basically the culmination of a long time trend we're essentially seeing a lot of blurring but there are going to be cases where for instance if i need let's say like observe i need like high concurrency or something like that there are certain things that i'm not going to be able to get efficiently get out of a data lake um and you know we're basically i'm doing a system where i'm just doing really brute forcing very fast file scanning and that type of thing so i think there always will be some delineations but i would agree with dave and with doug that we are seeing basically a a confluence of requirements that we need to essentially have basically the element you know the ability of a data lake and a data laid out their warehouse we these need to come together so i think what we're likely to see is organizations look for a converged platform that can handle both sides for their center of data gravity the mesh and the fabric vendors the the fabric virtualization vendors they're all on board with the idea of this converged platform and they're saying hey we'll handle all the edge cases of the stuff that isn't in that center of data gradient that is off distributed in a cloud or 
at a remote location so you can have that single platform for the center of of your your data and then bring in virtualization mesh what have you for reaching out to the distributed data bingo as they basically said people are happy when they virtualize data i i think yes at this point but to this uh dave meningas point you know they have convert they are converging snowflake has introduced support for unstructured data so now we are literally splitting here now what uh databricks is saying is that aha but it's easy to go from data lake to data warehouse than it is from data warehouse to data lake so i think we're getting into semantics but we've already seen these two converge so is that so it takes something like aws who's got what 15 data stores are they're going to have 15 converged data stores that's going to be interesting to watch all right guys i'm going to go down the list and do like a one i'm going to one word each and you guys each of the analysts if you wouldn't just add a very brief sort of course correction for me so sanjeev i mean governance is going to be the maybe it's the dog that wags the tail now i mean it's coming to the fore all this ransomware stuff which really didn't talk much about security but but but what's the one word in your prediction that you would leave us with on governance it's uh it's going to be mainstream mainstream okay tony bear mesh washing is what i wrote down that's that's what we're going to see in uh in in 2022 a little reality check you you want to add to that reality check is i hope that no vendor you know jumps the shark and calls their offering a data mesh project yeah yeah let's hope that doesn't happen if they do we're going to call them out uh carl i mean graph databases thank you for sharing some some you know high growth metrics i know it's early days but magic is what i took away from that it's the magic database yeah i would actually i've said this to people too i i kind of look at it as a swiss army knife of data because you can pretty much do anything you want with it it doesn't mean you should i mean that's definitely the case that if you're you know managing things that are in a fixed schematic relationship probably a relational database is a better choice there are you know times when the document database is a better choice it can handle those things but maybe not it may not be the best choice for that use case but for a great many especially the new emerging use cases i listed it's the best choice thank you and dave meninger thank you by the way for bringing the data in i like how you supported all your comments with with some some data points but streaming data becomes the sort of default uh paradigm if you will what would you add yeah um i would say think fast right that's the world we live in you got to think fast fast love it uh and brad shimon uh i love it i mean on the one hand i was saying okay great i'm afraid i might get disrupted by one of these internet giants who are ai experts so i'm gonna be able to buy instead of build ai but then again you know i've got some real issues there's a potential backlash there so give us the there's your bumper sticker yeah i i would say um going with dave think fast and also think slow uh to to talk about the book that everyone talks about i would say really that this is all about trust trust in the idea of automation and of a transparent invisible ai across the enterprise but verify verify before you do anything and then doug henson i mean i i look i think the the trend is your 
friend here on this prediction with lake house is uh really becoming dominant i liked the way you set up that notion of you know the the the data warehouse folks coming at it from the analytics perspective but then you got the data science worlds coming together i still feel as though there's this piece in the middle that we're missing but your your final thoughts we'll give you the last well i think the idea of consolidation and simplification uh always prevails that's why the appeal of a single platform is going to be there um we've already seen that with uh you know hadoop platforms moving toward cloud moving toward object storage and object storage becoming really the common storage point for whether it's a lake or a warehouse uh and that second point uh i think esg mandates are uh are gonna come in alongside uh gdpr and things like that to uh up the ante for uh good governance yeah thank you for calling that out okay folks hey that's all the time that that we have here your your experience and depth of understanding on these key issues and in data and data management really on point and they were on display today i want to thank you for your your contributions really appreciate your time enjoyed it thank you now in addition to this video we're going to be making available transcripts of the discussion we're going to do clips of this as well we're going to put them out on social media i'll write this up and publish the discussion on wikibon.com and siliconangle.com no doubt several of the analysts on the panel will take the opportunity to publish written content social commentary or both i want to thank the power panelist and thanks for watching this special cube presentation this is dave vellante be well and we'll see you next time [Music] you

Published Date : Jan 8 2022


Predictions 2022: Top Analysts See the Future of Data


 

(bright music) >> In the 2010s, organizations became keenly aware that data would become the key ingredient to driving competitive advantage, differentiation, and growth. But to this day, putting data to work remains a difficult challenge for many, if not most organizations. Now, as the cloud matures, it has become a game changer for data practitioners by making cheap storage and massive processing power readily accessible. We've also seen better tooling in the form of data workflows, streaming, machine intelligence, AI, developer tools, security, observability, automation, new databases and the like. These innovations accelerate data proficiency, but at the same time, they add complexity for practitioners. Data lakes, data hubs, data warehouses, data marts, data fabrics, data meshes, data catalogs, data oceans are forming, they're evolving and exploding onto the scene. So in an effort to bring perspective to the sea of optionality, we've brought together the brightest minds in the data analyst community to discuss how data management is morphing and what practitioners should expect in 2022 and beyond. Hello everyone, my name is Dave Vellante with theCUBE, and I'd like to welcome you to a special Cube presentation, Analyst Predictions 2022: The Future of Data Management. We've gathered six of the best analysts in data and data management who are going to present and discuss their top predictions and trends for 2022 and the first half of this decade. Let me introduce our six power panelists. Sanjeev Mohan is a former Gartner Analyst and Principal at SanjMo. Tony Baer is principal at dbInsight, Carl Olofson is a well-known Research Vice President with IDC, Dave Menninger is Senior Vice President and Research Director at Ventana Research, Brad Shimmin is Chief Analyst, AI Platforms, Analytics and Data Management at Omdia, and Doug Henschen is Vice President and Principal Analyst at Constellation Research. Gentlemen, welcome to the program and thanks for coming on theCUBE today. >> Great to be here. >> Thank you. >> All right, here's the format we're going to use. I as moderator, I'm going to call on each analyst separately who then will deliver their prediction or mega trend, and then in the interest of time management and pace, two analysts will have the opportunity to comment. If we have more time, we'll elongate it, but let's get started right away. Sanjeev Mohan, please kick it off. You want to talk about governance, go ahead sir. >> Thank you Dave. I believe that data governance, which we've been talking about for many years, is now not only going to be mainstream, it's going to be table stakes. And all the things that you mentioned, you know, the data ocean, data lake, lake houses, data fabric, meshes, the common glue is metadata. If we don't understand what data we have and how we are governing it, there is no way we can manage it. So we saw Informatica go public last year after a hiatus of six years. I'm predicting that this year we see some more companies go public. My bet is on Collibra most likely, and maybe Alation, going public this year. I'm also predicting that the scope of data governance is going to expand beyond just data. It's not just data and reports. We are going to see more transformations like Spark jobs, Python, even Airflow. We're going to see more of streaming data, so from Kafka Schema Registry, for example. We will see AI models become part of this whole governance suite.
So the governance suite is going to be very comprehensive, very detailed lineage, impact analysis, and then even expand into data quality. We already seen that happen with some of the tools where they are buying these smaller companies and bringing in data quality monitoring and integrating it with metadata management, data catalogs, also data access governance. So what we are going to see is that once the data governance platforms become the key entry point into these modern architectures, I'm predicting that the usage, the number of users of a data catalog is going to exceed that of a BI tool. That will take time and we already seen that trajectory. Right now if you look at BI tools, I would say there a hundred users to BI tool to one data catalog. And I see that evening out over a period of time and at some point data catalogs will really become the main way for us to access data. Data catalog will help us visualize data, but if we want to do more in-depth analysis, it'll be the jumping off point into the BI tool, the data science tool and that is the journey I see for the data governance products. >> Excellent, thank you. Some comments. Maybe Doug, a lot of things to weigh in on there, maybe you can comment. >> Yeah, Sanjeev I think you're spot on, a lot of the trends the one disagreement, I think it's really still far from mainstream. As you say, we've been talking about this for years, it's like God, motherhood, apple pie, everyone agrees it's important, but too few organizations are really practicing good governance because it's hard and because the incentives have been lacking. I think one thing that deserves mention in this context is ESG mandates and guidelines, these are environmental, social and governance, regs and guidelines. We've seen the environmental regs and guidelines and posts in industries, particularly the carbon-intensive industries. We've seen the social mandates, particularly diversity imposed on suppliers by companies that are leading on this topic. We've seen governance guidelines now being imposed by banks on investors. So these ESGs are presenting new carrots and sticks, and it's going to demand more solid data. It's going to demand more detailed reporting and solid reporting, tighter governance. But we're still far from mainstream adoption. We have a lot of, you know, best of breed niche players in the space. I think the signs that it's going to be more mainstream are starting with things like Azure Purview, Google Dataplex, the big cloud platform players seem to be upping the ante and starting to address governance. >> Excellent, thank you Doug. Brad, I wonder if you could chime in as well. >> Yeah, I would love to be a believer in data catalogs. But to Doug's point, I think that it's going to take some more pressure for that to happen. I recall metadata being something every enterprise thought they were going to get under control when we were working on service oriented architecture back in the nineties and that didn't happen quite the way we anticipated. And so to Sanjeev's point it's because it is really complex and really difficult to do. My hope is that, you know, we won't sort of, how do I put this? Fade out into this nebula of domain catalogs that are specific to individual use cases like Purview for getting data quality right or like data governance and cybersecurity. And instead we have some tooling that can actually be adaptive to gather metadata to create something. And I know its important to you, Sanjeev and that is this idea of observability. 
If you can get enough metadata without moving your data around, but understanding the entirety of a system that's running on this data, you can do a lot to help with the governance that Doug is talking about. >> So I just want to add that data governance, like other initiatives, did not succeed; even AI went into an AI winter, but that's a different topic. But a lot of these things did not succeed because, to your point, the incentives were not there. I remember when Sarbanes-Oxley had come onto the scene, if a bank did not do Sarbanes-Oxley, they were very happy to pay a million dollar fine. That was like, you know, pocket change for them instead of doing the right thing. But I think the stakes are much higher now. With GDPR, the flood gates opened. Now, you know, California, you know, has CCPA, but even CCPA is being outdated by CPRA, which is much more GDPR-like. So we are very rapidly entering a space where pretty much every major country in the world is coming up with its own compliance regulatory requirements, data residency is becoming really important. And I think we are going to reach a stage where it won't be optional anymore. So whether we like it or not, and I think the reason data catalogs were not successful in the past is because we did not have the right focus on adoption. We were focused on features and these features were disconnected, very hard for business to adopt. These are built by IT people for IT departments to take a look at technical metadata, not business metadata. Today the tables have turned. CDOs are driving this initiative, regulatory compliances are beating down hard, so I think the time might be right. >> Yeah so guys, we have to move on here. But there's some real meat on the bone here, Sanjeev. I like the fact that you called out Collibra and Alation, so we can look back a year from now and say, okay, he made the call, he stuck it. And then the ratio of BI tools to data catalogs, that's another sort of measurement that we can take, even though with some skepticism there, that's something that we can watch. And I wonder if someday, if we'll have more metadata than data. But I want to move to Tony Baer, you want to talk about data mesh and speaking, you know, coming off of governance. I mean, wow, you know the whole concept of data mesh is decentralized data, and then governance becomes, you know, a nightmare there, but take it away, Tony. >> Well, put it this way, data mesh, you know, the idea at least as proposed by ThoughtWorks. You know, basically it was at least a couple of years ago and the press has been almost uniformly, almost uncritical. A good reason for that is for all the problems that basically Sanjeev and Doug and Brad were just speaking about, which is that we have all this data out there and we don't know what to do about it. Now, that's not a new problem. That was a problem we had in enterprise data warehouses, it was a problem when we had Hadoop data clusters, it's even more of a problem now that data is out in the cloud, where the data is not only in your data lake, is not only in S3, it's all over the place. And it's also including streaming, which I know we'll be talking about later. So the data mesh was a response to that, the idea being, you know, who are the folks that really know best about governance? It's the domain experts. So basically data mesh was an architectural pattern and a process. My prediction for this year is that data mesh is going to hit cold, hard reality.
Because if you do a Google search, basically the published work, the articles on data mesh have been largely, you know, pretty uncritical so far. Basically lauding it as being a very revolutionary new idea. I don't think it's that revolutionary because we've talked about ideas like this. Brad, now you and I met years ago when we were talking about SOA and decentralizing all of this, but it was at the application level. Now we're talking about it at the data level. And now we have microservices. So there's this thought of, if we've managed to deconstruct apps in cloud native to microservices, why don't we think of data in the same way? My sense this year is that, you know, this has been a very active search term if you look at Google search trends, and now companies, like enterprises, are going to look at this seriously. And as they look at it seriously, it's going to attract its first real hard scrutiny, it's going to attract its first backlash. That's not necessarily a bad thing. It means that it's being taken seriously. The reason why I think that you'll start to see basically the cold, hard light of day shine on data mesh is that it's still a work in progress. You know, this idea is basically a couple of years old and there's still some pretty major gaps. The biggest gap is in the area of federated governance. Now federated governance itself is not a new issue. With federated governance decisions, we've been figuring out, like, how can we basically strike the balance between, let's say, consistent enterprise policy, consistent enterprise governance, but yet the groups that understand the data and know how to work with it, you know, how do we basically sort of balance the two? There's a huge gap there in practice and knowledge. Also to a lesser extent, there's a technology gap, which is basically in the self-service technologies that will help teams essentially govern data. You know, basically through the full life cycle, from development, from selecting the data, from, you know, building the pipelines, from, you know, determining your access control, looking at quality, looking at basically whether the data is fresh or whether it's trending off course. So my prediction is that it will receive the first harsh scrutiny this year. You are going to see some organizations and enterprises declare premature victory when they build some federated query implementations. You're going to see vendors start to data mesh wash their products: anybody in the data management space, whether this is basically a pipelining tool, whether it's basically ELT, whether it's a catalog or a federated query tool, they are all going to be, you know, basically promoting the fact of how they support this. Hopefully nobody's going to call themselves a data mesh tool because data mesh is not a technology. We're going to see one other thing come out of this. And this harks back to the metadata that Sanjeev was talking about and the catalog he was just talking about. Which is that there's going to be a new focus, a renewed focus, on metadata. And I think that's going to spur interest in data fabrics. Now data fabrics are pretty vaguely defined, but if we just take the most elemental definition, which is a common metadata backplane, I think that if anybody is going to get serious about data mesh, they need to look at the data fabric because we all, at the end of the day, need to speak, you know, need to read from the same sheet of music.
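To make that federated governance gap concrete, here is a minimal sketch in Python of what a domain-owned data product contract might look like, with a thin enterprise-wide baseline checked against each domain's declarations. The field names, the baseline values, and the orders_curated example are invented for illustration; they do not come from any panelist, from ThoughtWorks, or from a vendor product.

```python
from dataclasses import dataclass, field

# Hypothetical enterprise-wide baseline every domain-owned product must meet.
ENTERPRISE_POLICY = {"pii_masked": True, "max_staleness_hours": 24}

@dataclass
class DataProduct:
    """A domain-owned data product descriptor (illustrative fields only)."""
    name: str
    domain: str                    # owning domain, e.g. "ecommerce"
    owner: str                     # accountable domain team
    schema_uri: str                # pointer to the published contract/schema
    pii_masked: bool               # governance attributes declared by the domain
    max_staleness_hours: int
    tags: list = field(default_factory=list)

    def policy_violations(self) -> list:
        """Return violations of the central baseline (the federated check)."""
        violations = []
        if ENTERPRISE_POLICY["pii_masked"] and not self.pii_masked:
            violations.append("PII must be masked before publication")
        if self.max_staleness_hours > ENTERPRISE_POLICY["max_staleness_hours"]:
            violations.append("data is allowed to go stale for too long")
        return violations

orders = DataProduct(
    name="orders_curated",
    domain="ecommerce",
    owner="ecommerce-data-team",
    schema_uri="s3://example-catalog/contracts/orders_curated.json",
    pii_masked=True,
    max_staleness_hours=6,
    tags=["customer-360", "gold"],
)
print(orders.policy_violations())   # an empty list means the product meets the baseline
```

The only point of the sketch is the division of labor described above: the domain team owns and publishes the product, while the enterprise supplies a small, consistent policy that every product is checked against.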
>> So thank you Tony. Dave Menninger, I mean, one of the things that people like about data mesh is it pretty crisply articulate some of the flaws in today's organizational approaches to data. What are your thoughts on this? >> Well, I think we have to start by defining data mesh, right? The term is already getting corrupted, right? Tony said it's going to see the cold hard light of day. And there's a problem right now that there are a number of overlapping terms that are similar but not identical. So we've got data virtualization, data fabric, excuse me for a second. (clears throat) Sorry about that. Data virtualization, data fabric, data federation, right? So I think that it's not really clear what each vendor means by these terms. I see data mesh and data fabric becoming quite popular. I've interpreted data mesh as referring primarily to the governance aspects as originally intended and specified. But that's not the way I see vendors using it. I see vendors using it much more to mean data fabric and data virtualization. So I'm going to comment on the group of those things. I think the group of those things is going to happen. They're going to happen, they're going to become more robust. Our research suggests that a quarter of organizations are already using virtualized access to their data lakes and another half, so a total of three quarters will eventually be accessing their data lakes using some sort of virtualized access. Again, whether you define it as mesh or fabric or virtualization isn't really the point here. But this notion that there are different elements of data, metadata and governance within an organization that all need to be managed collectively. The interesting thing is when you look at the satisfaction rates of those organizations using virtualization versus those that are not, it's almost double, 68% of organizations, I'm sorry, 79% of organizations that were using virtualized access express satisfaction with their access to the data lake. Only 39% express satisfaction if they weren't using virtualized access. >> Oh thank you Dave. Sanjeev we just got about a couple of minutes on this topic, but I know you're speaking or maybe you've always spoken already on a panel with (indistinct) who sort of invented the concept. Governance obviously is a big sticking point, but what are your thoughts on this? You're on mute. (panelist chuckling) >> So my message to (indistinct) and to the community is as opposed to what they said, let's not define it. We spent a whole year defining it, there are four principles, domain, product, data infrastructure, and governance. Let's take it to the next level. I get a lot of questions on what is the difference between data fabric and data mesh? And I'm like I can't compare the two because data mesh is a business concept, data fabric is a data integration pattern. How do you compare the two? You have to bring data mesh a level down. So to Tony's point, I'm on a warpath in 2022 to take it down to what does a data product look like? How do we handle shared data across domains and governance? And I think we are going to see more of that in 2022, or is "operationalization" of data mesh. >> I think we could have a whole hour on this topic, couldn't we? Maybe we should do that. But let's corner. Let's move to Carl. So Carl, you're a database guy, you've been around that block for a while now, you want to talk about graph databases, bring it on. >> Oh yeah. Okay thanks. 
So I regard graph database as basically the next truly revolutionary database management technology. I'm looking at the graph database market, which of course we haven't defined yet. So obviously I have a little wiggle room in what I'm about to say. But this market will grow by about 600% over the next 10 years. Now, 10 years is a long time. But over the next five years, we expect to see gradual growth as people start to learn how to use it. The problem is not that it's not useful, it's that people don't know how to use it. So let me explain before I go any further what a graph database is, because some of the folks on the call may not know what it is. A graph database organizes data according to a mathematical structure called a graph. The graph has elements called nodes and edges. So a data element drops into a node, the nodes are connected by edges, the edges connect one node to another node. Combinations of edges create structures that you can analyze to determine how things are related. In some cases, the nodes and edges can have properties attached to them which add additional informative material that makes it richer, that's called a property graph. There are two principal use cases for graph databases. There are semantic graphs, which are used to break down human language text into semantic structures. Then you can search it, organize it and answer complicated questions. A lot of AI is aimed at semantic graphs. Another kind is the property graph that I just mentioned, which has a dazzling number of use cases. I want to just point out as I talk about this, people are probably wondering, well, we have relational databases, isn't that good enough? So a relational database defines... It supports what I call definitional relationships. That means you define the relationships in a fixed structure. The data drops into that structure, there's a value, a foreign key value, that relates one table to another and that value is fixed. You don't change it. If you change it, the database becomes unstable, it's not clear what you're looking at. In a graph database, the system is designed to handle change so that it can reflect the true state of the things that it's being used to track. So let me just give you some examples of use cases for this. They include entity resolution, data lineage, social media analysis, Customer 360, fraud prevention. There's cybersecurity, there's supply chain, which is a big one actually. There is explainable AI and this is going to become important too because a lot of people are adopting AI. But they want a system after the fact to say, how did the AI system come to that conclusion? How did it make that recommendation? Right now we don't have really good ways of tracking that. Machine learning in general, social network analysis, I already mentioned that. And then we've got, oh gosh, we've got data governance, data compliance, risk management. We've got recommendation, we've got personalization, anti-money laundering, that's another big one, identity and access management, network and IT operations is already becoming a key one where you actually have mapped out your operation, you know, whatever it is, your data center, and you can track what's going on as things happen there, root cause analysis, fraud detection is a huge one.
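To picture the nodes-and-edges model just described, here is a small, hypothetical property graph in Python using the networkx library, which is an in-memory graph library rather than a graph database; the customers, devices, and risk scores are made up, and the query is just one example of the relationship-first questions that are awkward to express over fixed relational schemas.

```python
import networkx as nx

# Nodes and edges both carry properties, which is what makes this a property graph.
g = nx.MultiDiGraph()
g.add_node("cust:alice", kind="customer", risk_score=0.2)
g.add_node("cust:bob", kind="customer", risk_score=0.7)
g.add_node("device:d42", kind="device")
g.add_node("card:9001", kind="card", issuer="ExampleBank")

g.add_edge("cust:alice", "device:d42", rel="USED", last_seen="2022-01-05")
g.add_edge("cust:bob", "device:d42", rel="USED", last_seen="2022-01-06")
g.add_edge("cust:bob", "card:9001", rel="OWNS")

def customers_sharing_risky_devices(graph, threshold=0.5):
    """Flag every customer who used a device also used by a high-risk customer."""
    flagged = set()
    for node, attrs in graph.nodes(data=True):
        if attrs.get("kind") != "device":
            continue
        users = [u for u, _, d in graph.in_edges(node, data=True) if d.get("rel") == "USED"]
        if any(graph.nodes[u].get("risk_score", 0) > threshold for u in users):
            flagged.update(users)
    return flagged

print(customers_sharing_risky_devices(g))   # flags both customers via the shared device:d42
```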
A number of major credit card companies use graph databases for fraud detection, risk analysis, tracking and tracing, churn analysis, next best action, what-if analysis, impact analysis, entity resolution, and I would add one other thing, or just a few other things, to this list: metadata management. So Sanjeev, here you go, this is your engine. Because I was in metadata management for quite a while in my past life. And one of the things I found was that none of the data management technologies that were available to us could efficiently handle metadata because of the kinds of structures that result from it, but graphs can, okay? Graphs can do things like say, this term in this context means this, but in that context, it means that, okay? Things like that. And in fact, logistics management, supply chain. And also because it handles recursive relationships, by recursive relationships I mean objects that own other objects that are of the same type. You can do things like bills of materials, you know, so like parts explosion. Or you can do an HR analysis, who reports to whom, how many levels up the chain and that kind of thing. You can do that with relational databases, but it takes a lot of programming. In fact, you can do almost any of these things with relational databases, but the problem is, you have to program it. It's not supported in the database. And whenever you have to program something, that means you can't trace it, you can't define it. You can't publish it in terms of its functionality and it's really, really hard to maintain over time. >> Carl, thank you. I wonder if we could bring Brad in, I mean. Brad, I'm sitting here wondering, okay, is this incremental to the market? Is it disruptive, a replacement? What are your thoughts on this? >> It's already disrupted the market. I mean, like Carl said, go to any bank and ask them, are you using graph databases to get fraud detection under control? And they'll say, absolutely, that's the only way to solve this problem. And it is, frankly. And it's the only way to solve a lot of the problems that Carl mentioned. And that is, I think, its Achilles heel in some ways. Because, you know, it's like finding the best way to cross the seven bridges of Koenigsberg. You know, it's always going to kind of be tied to those use cases because it's really special and it's really unique, and because it's special and it's unique, it still unfortunately kind of stands apart from the rest of the community that's building, let's say, AI outcomes, as a great example here. Graph databases and AI, as Carl mentioned, are like chocolate and peanut butter. But technologically, they don't know how to talk to one another, they're completely different. And you know, you can't just stand up SQL and query them. You've got to learn, what is it, Carl, a specialized query language, to actually get to the data in there. And if you're going to scale that data, that graph database, especially a property graph, if you're going to do something really complex, like try to understand, you know, all of the metadata in your organization, you might just end up with, you know, a graph database winter like we had the AI winter, simply because you run out of performance to make the thing happen. So, I think it's already disrupted, but we need to, like, treat it like a first-class citizen in the data analytics and AI community. We need to bring it into the fold.
We need to equip it with the tools it needs to do the magic it does and to do it not just for specialized use cases, but for everything. 'Cause I'm with Carl. I think it's absolutely revolutionary. >> Brad identified the principal Achilles' heel of the technology, which is scaling. When these things get large and complex enough that they spill over what a single server can handle, you start to have difficulties because the relationships span things that have to be resolved over a network, and then you get network latency and that slows the system down. So that's still a problem to be solved. >> Sanjeev, any quick thoughts on this? I mean, I think metadata on the word cloud is going to be the largest font, but what are your thoughts here? >> I want to (indistinct) So people don't associate me with only metadata, so I want to talk about something slightly different. dbengines.com has done an amazing job. I think almost everyone knows that they chronicle all the major databases that are in use today. In January of 2022, there are 381 databases on a ranked list of databases. The largest category is RDBMS. The second largest category is actually divided into two: property graphs and RDF graphs. These two together make up the second largest number of databases. So talking about Achilles heel, this is a problem. The problem is that there are so many graph databases to choose from. They come in different shapes and forms. To Brad's point, there are so many query languages. In RDBMS, it's SQL, I know the story, but here we've got Cypher, we've got Gremlin, we've got GQL, and then there are proprietary languages. So I think there's a lot of disparity in this space. >> Well, excellent. All excellent points, Sanjeev, if I must say. And that is a problem, that the languages need to be sorted and standardized. People need to have a roadmap as to what they can do with it. Because as you say, you can do so many things. And so many of those things are unrelated that you sort of say, well, what do we use this for? And I'm reminded of the saying I learned a bunch of years ago. And somebody said that the digital computer is the only tool man has ever devised that has no particular purpose. (panelists chuckle) >> All right guys, we got to move on to Dave Menninger. We've heard about streaming. Your prediction is in that realm, so please take it away. >> Sure. So I like to say that historical databases are going to become a thing of the past. By that I don't mean that they're going to go away, that's not my point. I mean, we need historical databases, but streaming data is going to become the default way in which we operate with data. So in the next, say, three to five years, I would expect that data platforms, and we're using the term data platforms to represent the evolution of databases and data lakes, that the data platforms will incorporate these streaming capabilities. We're going to process data as it streams into an organization and then it's going to roll off into a historical database. So historical databases don't go away, but they become a thing of the past. They store the data that occurred previously. And as data is occurring, we're going to be processing it, we're going to be analyzing it, we're going to be acting on it. I mean, we only ever ended up with historical databases because we were limited by the technology that was available to us. Data doesn't occur in batches. But we processed it in batches because that was the best we could do.
And it wasn't bad and we've continued to improve and we've improved and we've improved. But streaming data today is still the exception. It's not the rule, right? There are projects within organizations that deal with streaming data. But it's not the default way in which we deal with data yet. And so that's my prediction is that this is going to change, we're going to have streaming data be the default way in which we deal with data and how you label it and what you call it. You know, maybe these databases and data platforms just evolved to be able to handle it. But we're going to deal with data in a different way. And our research shows that already, about half of the participants in our analytics and data benchmark research, are using streaming data. You know, another third are planning to use streaming technologies. So that gets us to about eight out of 10 organizations need to use this technology. And that doesn't mean they have to use it throughout the whole organization, but it's pretty widespread in its use today and has continued to grow. If you think about the consumerization of IT, we've all been conditioned to expect immediate access to information, immediate responsiveness. You know, we want to know if an item is on the shelf at our local retail store and we can go in and pick it up right now. You know, that's the world we live in and that's spilling over into the enterprise IT world We have to provide those same types of capabilities. So that's my prediction, historical databases become a thing of the past, streaming data becomes the default way in which we operate with data. >> All right thank you David. Well, so what say you, Carl, the guy who has followed historical databases for a long time? >> Well, one thing actually, every database is historical because as soon as you put data in it, it's now history. They'll no longer reflect the present state of things. But even if that history is only a millisecond old, it's still history. But I would say, I mean, I know you're trying to be a little bit provocative in saying this Dave 'cause you know, as well as I do that people still need to do their taxes, they still need to do accounting, they still need to run general ledger programs and things like that. That all involves historical data. That's not going to go away unless you want to go to jail. So you're going to have to deal with that. But as far as the leading edge functionality, I'm totally with you on that. And I'm just, you know, I'm just kind of wondering if this requires a change in the way that we perceive applications in order to truly be manifested and rethinking the way applications work. Saying that an application should respond instantly, as soon as the state of things changes. What do you say about that? >> I think that's true. I think we do have to think about things differently. It's not the way we designed systems in the past. We're seeing more and more systems designed that way. But again, it's not the default. And I agree 100% with you that we do need historical databases you know, that's clear. And even some of those historical databases will be used in conjunction with the streaming data, right? >> Absolutely. I mean, you know, let's take the data warehouse example where you're using the data warehouse as its context and the streaming data as the present and you're saying, here's the sequence of things that's happening right now. Have we seen that sequence before? And where? What does that pattern look like in past situations? And can we learn from that? 
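A toy sketch of that pattern in Python: an in-memory iterator stands in for the streaming source (a Kafka topic or a change feed), and a hard-coded set of sequences stands in for the patterns you would mine from the historical warehouse. Everything here, from the event shape to the pattern names, is invented for illustration.

```python
from collections import deque

# Historical context, normally derived offline from the warehouse or lake:
# short sequences of actions that previously ended badly.
KNOWN_RISK_PATTERNS = {
    ("login", "change_email", "large_order"),
    ("login", "failed_payment", "failed_payment"),
}

def score_stream(events, window_size=3):
    """Process events as they arrive; flag windows that match past incidents."""
    window = deque(maxlen=window_size)
    for event in events:                      # 'events' can be any iterator or stream
        window.append(event["action"])
        if len(window) == window_size and tuple(window) in KNOWN_RISK_PATTERNS:
            yield {"customer": event["customer"], "pattern": tuple(window)}

incoming = iter([
    {"customer": "c-17", "action": "login"},
    {"customer": "c-17", "action": "change_email"},
    {"customer": "c-17", "action": "large_order"},
])
for alert in score_stream(incoming):
    print("next-best-action review:", alert)
```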
>> So Tony Baer, I wonder if you could comment? I mean, when you think about, you know, real time inferencing at the edge, for instance, which is something that a lot of people talk about, a lot of what we're discussing here in this segment, it looks like it's got a great potential. What are your thoughts? >> Yeah, I mean, I think you nailed it right. You know, you hit it right on the head there. Which is that, what I'm seeing is that essentially. Then based on I'm going to split this one down the middle is that I don't see that basically streaming is the default. What I see is streaming and basically and transaction databases and analytics data, you know, data warehouses, data lakes whatever are converging. And what allows us technically to converge is cloud native architecture, where you can basically distribute things. So you can have a node here that's doing the real-time processing, that's also doing... And this is where it leads in or maybe doing some of that real time predictive analytics to take a look at, well look, we're looking at this customer journey what's happening with what the customer is doing right now and this is correlated with what other customers are doing. So the thing is that in the cloud, you can basically partition this and because of basically the speed of the infrastructure then you can basically bring these together and kind of orchestrate them sort of a loosely coupled manner. The other parts that the use cases are demanding, and this is part of it goes back to what Dave is saying. Is that, you know, when you look at Customer 360, when you look at let's say Smart Utility products, when you look at any type of operational problem, it has a real time component and it has an historical component. And having predictive and so like, you know, my sense here is that technically we can bring this together through the cloud. And I think the use case is that we can apply some real time sort of predictive analytics on these streams and feed this into the transactions so that when we make a decision in terms of what to do as a result of a transaction, we have this real-time input. >> Sanjeev, did you have a comment? >> Yeah, I was just going to say that to Dave's point, you know, we have to think of streaming very different because in the historical databases, we used to bring the data and store the data and then we used to run rules on top, aggregations and all. But in case of streaming, the mindset changes because the rules are normally the inference, all of that is fixed, but the data is constantly changing. So it's a completely reversed way of thinking and building applications on top of that. >> So Dave Menninger, there seem to be some disagreement about the default. What kind of timeframe are you thinking about? Is this end of decade it becomes the default? What would you pin? >> I think around, you know, between five to 10 years, I think this becomes the reality. >> I think its... >> It'll be more and more common between now and then, but it becomes the default. And I also want Sanjeev at some point, maybe in one of our subsequent conversations, we need to talk about governing streaming data. 'Cause that's a whole nother set of challenges. >> We've also talked about it rather in two dimensions, historical and streaming, and there's lots of low latency, micro batch, sub-second, that's not quite streaming, but in many cases its fast enough and we're seeing a lot of adoption of near real time, not quite real-time as good enough for many applications. 
(indistinct cross talk from panelists) >> Because nobody's really taking the hardware dimension (mumbles). >> That'll just happened, Carl. (panelists laughing) >> So near real time. But maybe before you lose the customer, however we define that, right? Okay, let's move on to Brad. Brad, you want to talk about automation, AI, the pipeline people feel like, hey, we can just automate everything. What's your prediction? >> Yeah I'm an AI aficionados so apologies in advance for that. But, you know, I think that we've been seeing automation play within AI for some time now. And it's helped us do a lot of things especially for practitioners that are building AI outcomes in the enterprise. It's helped them to fill skills gaps, it's helped them to speed development and it's helped them to actually make AI better. 'Cause it, you know, in some ways provide some swim lanes and for example, with technologies like AutoML can auto document and create that sort of transparency that we talked about a little bit earlier. But I think there's an interesting kind of conversion happening with this idea of automation. And that is that we've had the automation that started happening for practitioners, it's trying to move out side of the traditional bounds of things like I'm just trying to get my features, I'm just trying to pick the right algorithm, I'm just trying to build the right model and it's expanding across that full life cycle, building an AI outcome, to start at the very beginning of data and to then continue on to the end, which is this continuous delivery and continuous automation of that outcome to make sure it's right and it hasn't drifted and stuff like that. And because of that, because it's become kind of powerful, we're starting to actually see this weird thing happen where the practitioners are starting to converge with the users. And that is to say that, okay, if I'm in Tableau right now, I can stand up Salesforce Einstein Discovery, and it will automatically create a nice predictive algorithm for me given the data that I pull in. But what's starting to happen and we're seeing this from the companies that create business software, so Salesforce, Oracle, SAP, and others is that they're starting to actually use these same ideals and a lot of deep learning (chuckles) to basically stand up these out of the box flip-a-switch, and you've got an AI outcome at the ready for business users. And I am very much, you know, I think that's the way that it's going to go and what it means is that AI is slowly disappearing. And I don't think that's a bad thing. I think if anything, what we're going to see in 2022 and maybe into 2023 is this sort of rush to put this idea of disappearing AI into practice and have as many of these solutions in the enterprise as possible. You can see, like for example, SAP is going to roll out this quarter, this thing called adaptive recommendation services, which basically is a cold start AI outcome that can work across a whole bunch of different vertical markets and use cases. It's just a recommendation engine for whatever you needed to do in the line of business. So basically, you're an SAP user, you look up to turn on your software one day, you're a sales professional let's say, and suddenly you have a recommendation for customer churn. Boom! It's going, that's great. Well, I don't know, I think that's terrifying. 
In some ways I think it is the future that AI is going to disappear like that, but I'm absolutely terrified of it because I think that what it really does is it calls attention to a lot of the issues that we already see around AI, specific to this idea of what we like to call at Omdia, responsible AI. Which is, you know, how do you build an AI outcome that is free of bias, that is inclusive, that is fair, that is safe, that is secure, that its audible, et cetera, et cetera, et cetera, et cetera. I'd take a lot of work to do. And so if you imagine a customer that's just a Salesforce customer let's say, and they're turning on Einstein Discovery within their sales software, you need some guidance to make sure that when you flip that switch, that the outcome you're going to get is correct. And that's going to take some work. And so, I think we're going to see this move, let's roll this out and suddenly there's going to be a lot of problems, a lot of pushback that we're going to see. And some of that's going to come from GDPR and others that Sanjeev was mentioning earlier. A lot of it is going to come from internal CSR requirements within companies that are saying, "Hey, hey, whoa, hold up, we can't do this all at once. "Let's take the slow route, "let's make AI automated in a smart way." And that's going to take time. >> Yeah, so a couple of predictions there that I heard. AI simply disappear, it becomes invisible. Maybe if I can restate that. And then if I understand it correctly, Brad you're saying there's a backlash in the near term. You'd be able to say, oh, slow down. Let's automate what we can. Those attributes that you talked about are non trivial to achieve, is that why you're a bit of a skeptic? >> Yeah. I think that we don't have any sort of standards that companies can look to and understand. And we certainly, within these companies, especially those that haven't already stood up an internal data science team, they don't have the knowledge to understand when they flip that switch for an automated AI outcome that it's going to do what they think it's going to do. And so we need some sort of standard methodology and practice, best practices that every company that's going to consume this invisible AI can make use of them. And one of the things that you know, is sort of started that Google kicked off a few years back that's picking up some momentum and the companies I just mentioned are starting to use it is this idea of model cards where at least you have some transparency about what these things are doing. You know, so like for the SAP example, we know, for example, if it's convolutional neural network with a long, short term memory model that it's using, we know that it only works on Roman English and therefore me as a consumer can say, "Oh, well I know that I need to do this internationally. "So I should not just turn this on today." >> Thank you. Carl could you add anything, any context here? >> Yeah, we've talked about some of the things Brad mentioned here at IDC and our future of intelligence group regarding in particular, the moral and legal implications of having a fully automated, you know, AI driven system. Because we already know, and we've seen that AI systems are biased by the data that they get, right? 
So if they get data that pushes them in a certain direction, I think there was a story last week about an HR system that was recommending promotions for White people over Black people, because in the past, you know, White people were promoted and more productive than Black people, but it had no context as to why which is, you know, because they were being historically discriminated, Black people were being historically discriminated against, but the system doesn't know that. So, you know, you have to be aware of that. And I think that at the very least, there should be controls when a decision has either a moral or legal implication. When you really need a human judgment, it could lay out the options for you. But a person actually needs to authorize that action. And I also think that we always will have to be vigilant regarding the kind of data we use to train our systems to make sure that it doesn't introduce unintended biases. In some extent, they always will. So we'll always be chasing after them. But that's (indistinct). >> Absolutely Carl, yeah. I think that what you have to bear in mind as a consumer of AI is that it is a reflection of us and we are a very flawed species. And so if you look at all of the really fantastic, magical looking supermodels we see like GPT-3 and four, that's coming out, they're xenophobic and hateful because the people that the data that's built upon them and the algorithms and the people that build them are us. So AI is a reflection of us. We need to keep that in mind. >> Yeah, where the AI is biased 'cause humans are biased. All right, great. All right let's move on. Doug you mentioned mentioned, you know, lot of people that said that data lake, that term is not going to live on but here's to be, have some lakes here. You want to talk about lake house, bring it on. >> Yes, I do. My prediction is that lake house and this idea of a combined data warehouse and data lake platform is going to emerge as the dominant data management offering. I say offering that doesn't mean it's going to be the dominant thing that organizations have out there, but it's going to be the pro dominant vendor offering in 2022. Now heading into 2021, we already had Cloudera, Databricks, Microsoft, Snowflake as proponents, in 2021, SAP, Oracle, and several of all of these fabric virtualization/mesh vendors joined the bandwagon. The promise is that you have one platform that manages your structured, unstructured and semi-structured information. And it addresses both the BI analytics needs and the data science needs. The real promise there is simplicity and lower cost. But I think end users have to answer a few questions. The first is, does your organization really have a center of data gravity or is the data highly distributed? Multiple data warehouses, multiple data lakes, on premises, cloud. If it's very distributed and you'd have difficulty consolidating and that's not really a goal for you, then maybe that single platform is unrealistic and not likely to add value to you. You know, also the fabric and virtualization vendors, the mesh idea, that's where if you have this highly distributed situation, that might be a better path forward. The second question, if you are looking at one of these lake house offerings, you are looking at consolidating, simplifying, bringing together to a single platform. You have to make sure that it meets both the warehouse need and the data lake need. So you have vendors like Databricks, Microsoft with Azure Synapse. 
New really to the data warehouse space, and they're having to prove that the data warehouse capabilities on their platforms can meet the scaling requirements, can meet the user and query concurrency requirements, meet those tight SLAs. And then on the other hand, you have Oracle, SAP, Snowflake, the data warehouse folks, coming into the data science world, and they have to prove that they can manage the unstructured information and meet the needs of the data scientists. I'm seeing a lot of the lake house offerings from the warehouse crowd managing that unstructured information in columns and rows. And some of these vendors, Snowflake in particular, are really relying on partners for the data science needs. So you really have to look at a lake house offering and make sure that it meets both the warehouse and the data lake requirement. >> Thank you Doug. Well Tony, if those two worlds are going to come together, as Doug was saying, the analytics and the data science world, does it need to be some kind of semantic layer in between? I don't know. Where are you on this topic? >> (chuckles) Oh, didn't we talk about data fabrics before? Common metadata layer (chuckles). Actually, I'm almost tempted to say let's declare victory and go home. And this has actually been going on for a while. I actually agree with, you know, much of what Doug is saying there. Which is that, I mean, I remember as far back as, I think it was like 2014, I was doing a study. I was still at Ovum, (indistinct) Omdia, looking at all these specialized databases that were coming up and seeing that, you know, there's overlap at the edges. But yet, there was still going to be a reason at the time that you would have, let's say, a document database for JSON, you'd have a relational database for transactions and for the data warehouse, and you had basically something at that time that resembled Hadoop for what we consider your data lake. Fast forward, and the thing is, what I was seeing at the time is that they were sort of blending at the edges. That was, say, about five to six years ago. And the lake house is essentially the current manifestation of that idea. There is a dichotomy in terms of, you know, it's the old argument, do we centralize this all, you know, in a single place, or do we virtualize? And I think it's always going to be a union, and there's never going to be a single silver bullet. I do see that there are also going to be questions, and these are points that Doug raised. That is, you know, what do you need for your performance characteristics? Do you need, for instance, high concurrency? Do you need the ability to do some very sophisticated joins? Or is your requirement more to be able to distribute the processing, you know, as far as possible, to essentially do a kind of brute force approach? All these approaches are valid based on the use case. I just see that essentially the lake house is the culmination of this. It's a relatively new term, introduced by Databricks a couple of years ago, but this is the culmination of basically what's been a long-time trend. And what we see in the cloud is that data warehouses can now say, as a checkbox item, "Hey, we can basically source data in cloud storage, in S3, Azure Blob Store, you know, whatever, as long as it's in certain formats, like, you know, Parquet or CSV or something like that." I see that as becoming kind of a checkbox item.
So to that extent, I think that the lake house, depending on how you define it, is already reality. And in some cases, maybe new terminology, but not a whole heck of a lot new under the sun. >> Yeah. And Dave Menninger, I mean, a lot of these, thank you Tony, but a lot of this is going to come down to, you know, vendor marketing, right? Some people just kind of co-opt the term, we talked about, you know, data mesh washing, what are your thoughts on this? (laughing) >> Yeah, so I used the term data platform earlier. And part of the reason I use that term is that it's more vendor neutral. We've tried to sort of stay out of the vendor terminology patenting world, right? Whether the term lake house is what sticks or not, the concept is certainly going to stick. And we have some data to back it up. About a quarter of organizations that are using data lakes today already incorporate data warehouse functionality into it. So they consider their data lake house and data warehouse one and the same. About a quarter of organizations, a little less, but about a quarter of organizations feed the data lake from the data warehouse, and about a quarter of organizations feed the data warehouse from the data lake. So it's pretty obvious that three quarters of organizations need to bring this stuff together, right? The need is there, the need is apparent. The technology is going to continue to converge. I like to talk about it, you know, you've got data lakes over here at one end, and I'm not going to talk about why people thought data lakes were a bad idea, because they thought you just throw stuff in a server and you ignore it, right? That's not what a data lake is. So you've got data lake people over here and you've got database people over here, data warehouse people over here. Database vendors are adding data lake capabilities and data lake vendors are adding data warehouse capabilities. So it's obvious that they're going to meet in the middle. I mean, I think it's like Tony says, I think we should declare victory and go home. >> As hell. So just a follow-up on that, so are you saying the specialized lake and the specialized warehouse, do they go away? I mean, Tony, data mesh practitioners, or advocates, would say, well, they could all live. It's just a node on the mesh. But based on what Dave just said, are we gonna see those all morphed together? >> Well, number one, as I was saying before, there's always going to be this sort of, you know, centrifugal force or this tug of war between do we centralize the data, do we virtualize? And the fact is I don't think that there's ever going to be any single answer. I think in terms of data mesh, data mesh has nothing to do with how you physically implement the data. You could have a data mesh basically on a data warehouse. It's just that, you know, the difference being is that we could use the same physical data store, but everybody's logically, you know, basically governing it differently, you know? Data mesh, in essence, is not a technology, it's processes, it's governance process. So essentially, you know, I basically see that, as I was saying before, this is basically the culmination of a long-time trend. We're essentially seeing a lot of blurring, but there are going to be cases where, for instance, if I need, let's say, upserts, or I need high concurrency or something like that, there are certain things that I'm not going to be able to efficiently get out of a data lake.
And, you know, if I'm doing a system where I'm just doing really brute-force, very fast file scanning and that type of thing. So I think there always will be some delineations, but I would agree with Dave and with Doug that we are seeing basically a confluence of requirements, that we need to essentially have the abilities of a data lake and the data warehouse; these need to come together, so I think. >> I think what we're likely to see is organizations look for a converged platform that can handle both sides for their center of data gravity. The mesh and the fabric virtualization vendors, they're all on board with the idea of this converged platform, and they're saying, "Hey, we'll handle all the edge cases of the stuff that isn't in that center of data gravity but that is off distributed in a cloud or at a remote location." So you can have that single platform for the center of your data and then bring in virtualization, mesh, what have you, for reaching out to the distributed data. >> As Dave basically said, people are happy when they virtualize data. >> I think we have at this point, but to Dave Menninger's point, they are converging. Snowflake has introduced support for unstructured data. So obviously we're literally splitting hairs here. Now what Databricks is saying is that, "Aha, but it's easier to go from data lake to data warehouse than it is from databases to data lake." So I think we're getting into semantics, but we're already seeing these two converge. >> So take somebody like AWS, which has got, what, 15 data stores. Are they going to converge 15 data stores? This is going to be interesting to watch. All right, guys, I'm going to go down the list and do like one word each, and you guys, each of the analysts, if you would just add a very brief sort of course correction for me. So Sanjeev, I mean, governance is going to be... Maybe it's the dog that wags the tail now. I mean, it's coming to the fore, all this ransomware stuff, we really didn't talk much about security, but what's the one word in your prediction that you would leave us with on governance? >> It's going to be mainstream. >> Mainstream. Okay. Tony Baer, mesh washing is what I wrote down. That's what we're going to see in 2022, a little reality check, you want to add to that? >> Reality check, 'cause I hope that no vendor jumps the shark and claims they're offering a data mesh product. >> Yeah, let's hope that doesn't happen. If they do, we're going to call them out. Carl, I mean, graph databases, thank you for sharing some high growth metrics. I know it's early days, but magic is what I took away from that, so, magic database. >> Yeah, I would actually, I've said this to people too, I kind of look at it as a Swiss Army knife of data, because you can pretty much do anything you want with it. That doesn't mean you should. I mean, there's definitely the case that if you're managing things that are in a fixed schematic relationship, probably a relational database is a better choice. There are times when the document database is a better choice. It can handle those things, but maybe not. It may not be the best choice for that use case. But for a great many, especially with the new emerging use cases I listed, it's the best choice. >> Thank you. And Dave Menninger, thank you, by the way, for bringing the data in, I like how you supported all your comments with some data points. But streaming data becomes the sort of default paradigm, if you will, what would you add?
>> Yeah, I would say think fast, right? That's the world we live in, you've got to think fast. >> Think fast, love it. And Brad Shimmin, love it. I mean, on the one hand I was saying, okay, great, I'm afraid I might get disrupted by one of these internet giants who are AI experts, I'm going to be able to buy instead of build AI. But then again, you know, I've got some real issues. There's a potential backlash there. So give us your bumper sticker. >> I would say, going with Dave, think fast and also think slow, to talk about the book that everyone talks about. I would say really that this is all about trust, trust in the idea of automation and a transparent and visible AI across the enterprise. And verify, verify before you do anything. >> And then Doug Henschen, I mean, I think the trend is your friend here on this prediction, with lake house really becoming dominant. I liked the way you set up that notion of, you know, the data warehouse folks coming at it from the analytics perspective and then you get the data science worlds coming together. I still feel as though there's this piece in the middle that we're missing, but your final thoughts, we'll give you the (indistinct). >> I think the idea of consolidation and simplification always prevails. That's why the appeal of a single platform is going to be there. We've already seen that with, you know, Hadoop platforms and moving toward cloud, moving toward object storage, and object storage becoming really the common storage point, whether it's a lake or a warehouse. And that second point, I think ESG mandates are going to come in alongside GDPR and things like that to up the ante for good governance. >> Yeah, thank you for calling that out. Okay folks, hey, that's all the time that we have here. Your experience and depth of understanding on these key issues of data and data management were really on point, and they were on display today. I want to thank you for your contributions. Really appreciate your time. >> Enjoyed it. >> Thank you. >> Thanks for having me. >> In addition to this video, we're going to be making available transcripts of the discussion. We're going to do clips of this as well, and we're going to put them out on social media. I'll write this up and publish the discussion on wikibon.com and siliconangle.com. No doubt, several of the analysts on the panel will take the opportunity to publish written content, social commentary or both. I want to thank the power panelists and thanks for watching this special CUBE presentation. This is Dave Vellante, be well and we'll see you next time. (bright music)
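As a rough illustration of the lake house idea Doug and Tony debate above, one copy of data in an open format serving both BI-style queries and data science, here is a minimal sketch. The file name, columns, and use of pandas are assumptions for the example, not a description of any vendor's platform.

```python
import pandas as pd  # writing Parquet requires an engine such as pyarrow

# Hypothetical orders table written to the lake in an open columnar format
# (in practice this would sit in S3, Azure Blob Storage, or GCS).
orders = pd.DataFrame(
    {
        "region": ["east", "west", "east", "west"],
        "amount": [120.0, 75.5, 240.0, 60.0],
        "notes": ["gift order", "", "repeat buyer", "coupon used"],
    }
)
orders.to_parquet("orders.parquet")

lake = pd.read_parquet("orders.parquet")

# Warehouse-style use: a governed BI aggregate over the single copy of the data.
revenue_by_region = lake.groupby("region")["amount"].sum()

# Data-science-style use: feature engineering over the same copy,
# including the free-text column a classic warehouse would struggle with.
features = pd.DataFrame(
    {
        "amount": lake["amount"],
        "has_notes": (lake["notes"].str.len() > 0).astype(int),
    }
)

print(revenue_by_region)
print(features)
```

The design point the panel argues over is whether one engine can serve both access paths against that single copy, or whether the warehouse and lake workloads still justify separate, specialized systems.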

Published Date : Jan 7 2022

Breaking Analysis: Competition Heats up for Cloud Analytic Databases


 

(enlightening music) >> From theCUBE's studios in Palo Alto and Boston, connecting with thought leaders all around the world, this is a CUBE conversation. >> As we've been reporting, there's a new class of workloads emerging in the cloud. Early cloud was all about IaaS, spinning up storage, compute, and networking infrastructure to support startups, SaaS, easy experimentation, dev test, and increasingly moving business workloads into the cloud. Modern cloud workloads are combining data. They're infusing machine intelligence into application's AI. They're simplifying analytics and scaling with the cloud to deliver business insights in near real time. And at the center of this mega trend is a new class of data stores and analytic databases, what some called data warehouses, a term that I think is outdated really for today's speed of doing business. Welcome to this week's Wikibon CUBE Insights, powered by ETR. In this breaking analysis, we update our view of the emerging cloud native analytic database market. Today, we want to do three things. First, we'll update you on the basics of this market, what you really need to know in the space. The next thing we're going to do, take a look into the competitive environment, and as always, we'll dig into the ETR spending data to see which companies have the momentum in the market, and maybe, ahead of some of the others. Finally, we're going to close with some thoughts on how the competitive landscape is likely to evolve. And we want to answer the question will the cloud giants overwhelm the upstarts, or will the specialists continue to thrive? Let's take a look at some of the basics of this market. We're seeing the evolution of the enterprise data warehouse market space. It's an area that has been critical to supporting reporting and governance requirements for companies, especially post Sarbanes-Oxley, right? However, historically, as I've said many times, EDW has failed to deliver on its promises of a 360-degree view of the business and real-time customer insights. Classic enterprise data warehouses are too cumbersome, they're too complicated, they're too slow, and don't keep pace with the speed of the business. Now, EDW is about a $20 billion market, but the analytic database opportunity in the cloud, we think is much larger, why is that? It's because cloud computing unlocks the ability to rapidly combine multiple data sources, bring data science tooling into the mix, very quickly analyze data, and deliver insights to the business. More importantly, even more importantly, allow a line of business pros to access data in a self service mode. It's a new paradigm that uses the notion of DevOps as applied to the data pipeline, agile data or what we sometimes called DataOps. This is a highly competitive marketplace. In the early part of last decade, you saw Google bring BigQuery to market, Snowflake was founded, AWS did a one-time license deal to acquire the IP to ParAccel, an MPP database, on which it built Redshift. In the latter part of the decade, Microsoft threw his hat in the ring with SQL DW, which Microsoft has now evolved into Azure Synapse. They did so at the Build conference, a few weeks ago. There are other players as well like IBM. So you can see, there's a lot at stake here. The cloud vendors want your data, because they understand this is one of the key ingredients of the next decade of innovation. No longer is Moore's Law, the mainspring of growth. We've said this many times. 
Rather today, it's data driven, and AI to push insights and scale with the cloud. Here's the interesting dynamic that is emerging in the space. Snowflake is a cloud specialist in this field, having raised more than a billion dollars in venture, a billion four, a billion five. And it's up against the big cloud players, who are moving fast and often stealing moves from Snowflake and driving customers to their respective platforms. Here's an example that we reported on at last year's re:Invent. It's an article by Tony Baer. He wrote this on ZDNet talking about how AWS RA3 separates compute from storage, and of course, this was a founding architectural principle for Snowflake. Here's another example from the information. They were reporting on Microsoft here turning up the heat on Snowflake. And you can see the highlighted text, where the author talks about Microsoft trying to divert customers to its database. So you got this weird dynamic going on. Snowflake doesn't run on-prem, it only runs in the cloud. Runs on AWS, runs on Azure, runs on GCP. The cloud players again, they all want your data to go into their database. So they want you to put their data into their respective platforms. At the same time, they need SaaS ISVs to run in the cloud because it sells infrastructure services. So, is Snowflake, are they going to pivot to run on-prem to try to differentiate from the cloud giants? I asked Frank Slootman, Snowflake's CEO, about the on-prem opportunity, and his perspective earlier this year. Let's listen to what he said. >> Okay, we're not doing this endless hedging that people have done for 20 years, sort of keeping a leg in both worlds. Forget it, this will only work in the public cloud because this is how the utility model works, right? I think everybody is coming to this realization, right? I mean the excuses are running out at this point. We think that it'll, people will come to the public cloud a lot sooner than we will ever come to the private cloud. It's not that we can't run a private cloud, it just diminishes the potential and the value that we that we bring. >> Okay, so pretty definitive statements by Slootman. Now, the question I want to pose today is can Snowflake compete, given the conventional wisdom that we saw in the media articles that the cloud players are going to hurt Snowflake in this market. And if so, how will they compete? Well, let's see what the customers are saying and bring in some ETR survey data. This chart shows two of our favorite metrics from the ETR data set. That is Net Score, which is on the y-axis. Net Score, remember is a measure of spending momentum and market share, which is on the x-axis. Market share is a measure of pervasiveness in the data set. And what we show here are some of the key players in the EDW and cloud native analytic database market. I'll make a couple of points, and we'll dig into this a little bit further. First thing I want to share is you can see from this data, this is the April ETR survey, which was taken at the height of the US lockdown for the pandemic. The survey captured respondents from more than 1,200 CIOs and IT buyers, asking about their spending intentions for analytic databases for the companies that we show here on this kind of x-y chart. So the higher the company is on the vertical axis, the stronger the spending momentum relative to last year, and you could see Snowflake has a 77% Net Score. It leaves all players with AWS Redshift showing very strong, as well. Now in the box in the lower right, you see a chart. 
Those are the exact Net Scores for all the vendors in the Shared N. A Shared N is a number of citations for that vendor within the N of the 1,269. So you can see the N's are quite large, certainly large enough to feel comfortable with some of the conclusions that we're going to make today. Microsoft, they have a huge footprint. And they somewhat skew the data with its very high market share due to its volume. And you could see where Google sits, it's at good momentum, not as much presence in the marketplace. We've also added a couple of on-prem vendors, Teradata and Oracle primarily on-prem, just for context. They're two companies that compete, they obviously have some cloud offerings, but again, most of their base is on-prem. So what I want to do now is drill into this a little bit more by looking at Snowflake within the individual clouds. So let's look at Snowflake inside of AWS. That's what this next chart shows. So it's customer spending momentum Net Score inside of AWS accounts. And we cut the data to isolate those ETR survey respondents running AWS, so there's an N there of 672 that you can see. The bars show the Net Score granularity for Snowflake and Amazon Redshift. Now, note that we show 96 Shared N responses for Snowflake and 213 for Redshift within the overall N of 672 AWS accounts. The colors show 2020 spending intentions relative to 2019. So let's read left to right here. The replacements are red. And then, the bright red, then, you see spending less by 6% or more, that's the pinkish, and then, flat spending, the gray, increasing spending by more than 6%, that's the forest green, and then, adding to the platform new, that's the lime green. Now, remember Net Score is derived by subtracting the reds from the greens. And you can see that Snowflake has more spending momentum in the AWS cloud than Amazon Redshift, by a small margin, but look at, 80% of the AWS accounts plan to spend more on Snowflake with 35%, they're adding new. Very strong, 76% of AWS customers plan to spend more in 2020 relative to 2019 on Redshift with only 12% adding the platform new. But nonetheless, both are very, very strong, and you can see here, the key point is minimal red and pink, but not a lot of people leaving, not a lot of people spending less. It's going to be critical to see in the June ETR survey, which is in the field this month, if Snowflake is able to hold on to these new accounts that it's gained in the last couple of months. Now, let's look at how Snowflake is doing inside of Azure and compare it to Microsoft. So here's the data from the ETR survey, same view of the data here except we isolate on Azure accounts. The N there is 677 Azure accounts. And we show Snowflake and Microsoft cuts for analytic databases with 83 and 393 Shared N responses respectively. So again, enough I feel to draw some conclusions from this data. Now, note the Net Scores. Snowflake again, winning with 78% versus 51% from Microsoft. 51% is strong but 78% is there's a meaningful lead for Snowflake within the Microsoft base, very interesting. And once again, you see massive new ads, 41% for Snowflake, whereas Microsoft's Net Score is being powered really by growth from existing customers, that forest green. And again, very little red for both companies. So super positive there. Okay, let's take a look now at how Snowflake's doing inside of Google accounts, GCP, Google Cloud Platform. So here's the ETR data, same view of that data, but now, we isolate on GCP accounts. 
There are fewer, 298 running, then, you got those running Snowflake and Google Analytic databases, largely BigQuery, but could be some others in there but the Snowflake Shared N is 49, it's smaller than on the other clouds, because the company just announced support for GCP, just about a year ago. I think it was last June, but still large enough to draw conclusions from the data. I feel pretty comfortable with that. We're not slicing and dicing it too finely. And you could see Google Shared N at 147. Look at the story. I sound like a broken record. Snowflake is again winning by a meaningful margin if you measure this Net Score or spending momentum. So 77.6% Net Score versus Google at 54%, with Snowflake at 80% in the green. Both companies, very little red. So this is pretty impressive. Snowflake has greater spending momentum than the captive cloud providers in all three of the big US-based clouds. So the big question is can Snowflake hold serve, and continue to grow, and how are they going to to be able to do that? Look, as I said before, this is a very competitive market. We reported that how Snowflake is taking share from some of the legacy on-prem data warehouse players like Teradata and IBM, and from what our data suggests, Lumen and Oracle too. I've reported how IBM is stretched thin on its research and development budget, spends about $6 billion a year, but it's got to spend it across a lot of different lines. Oracle's got more targeted spending R&D. They can target more toward database and direct more of its free cash flow to database than IBM can. But Amazon, and Microsoft, and Google, they don't have that problem. They spend a ton of dough on R&D. And here's an example of the challenge that Snowflake faces. Take a look at this partial list that I drew together of recent innovations. And we show here a set of features that Snowflake has launched in 2020, and AWS since re:Invent last year. I don't have time to go into these, but we do know this that AWS is no slouch at adding features. Amazon, as a company, spends two x more on research and development than Snowflake is worth as a company. So why do I like Snowflake's chances. Well, there are several reasons. First, every dime that Snowflake spends on R&D, go-to market, and ecosystem, goes into making its databases better for its customers. Now, I asked Frank Slootman in the middle of the lockdown how he was allocating precious capital during the pandemic. Let's listen to his response. I've said, there's no layoffs on our radar, number one. Number two, we are hiring. And number three is, we have a higher level of scrutiny on the hires that we're making. And I am very transparent. In other words, I tell people, "Look, I prioritize the roles that are closest "to the drivetrain of the business." Right, it's kind of common sense. But I wanted to make sure that this is how we're thinking about this. There are some roles that are more postponable than others. I'm hiring in engineering, without any reservation because that is the long term, strategic interest of the company. >> But you know, that's only part of the story. And so I want to spend a moment here on some other differentiation, which is multi-cloud. Now, as many of you know, I've been sort of cynical of multi-cloud up until recently. I've said that multi-cloud is a symptom, more of a symptom of multi-vendor and largely, a bunch of vendor marketing hooey today. But that's beginning to change. 
I see multi-cloud as increasingly viable and important to organizations, not only because CIOs are being asked to clean up the crime scene, as I've often joked, but also because it's increasingly becoming a strategy, right cloud for the right workload. So first, let me reiterate what I said at the top. New workloads are emerging in the cloud, real-time AI, insights extraction, and real-time inferencing is going to be a competitive differentiator. It's all about the data. The new innovation cocktail stems from machine intelligence applied to that data with data science tooling and simplified interfaces that enable scaling with the cloud. You got to have simplicity if you're going to scale and cloud is the best way to scale. It's really the only way to scale globally. So as such, we see cross-cloud exploitation is a real differentiator for Snowflake and others that build high quality cloud native capabilities from multiple clouds, and I want to spend a minute on this topic generally and talk about what it means for Snowflake specifically. Now, we've been pounding the table lately saying that building capabilities natively for the cloud versus putting a wrapper around your stack and making it run in the cloud is key. It's a big difference, why is this? Because cloud native means taking advantage of the primitive capabilities within respective clouds to create the highest performance, the lowest latency, the most efficient services, for that cloud, and the most secure, really exploiting that cloud. And this is enabled only by natively building in the cloud, and that's why Slootman is so dogmatic on this issue. Multi-cloud can be a differentiator for Snowflake. We can think about data lives everywhere. And you want to keep data, where it lives ideally, you don't want to have to move it, whether it's on AWS, Azure, whatever cloud is holding that data. If the answer to your query requires tapping data that lives in multiple clouds across a data network, and the app needs fast answers, then, you need low latency access to that data. So here's what I think. I think Snowflake's game is to automate by extracting, abstracting, sorry, the complexity around the data location, of course, latency is a part of that, metadata, bandwidth concerns, the time to get to query and answers. All those factors that build complexity into the data pipeline and then optimizing that to get insights, irrespective of data location. So a differentiating formula is really to not only be the best analytic database but be cloud agnostic. AWS, for example, they got a cloud agenda, as do Azure and GCP. Their number one answer to multi-cloud is put everything on our cloud. Yeah, Microsoft and Google Anthos, they would argue against that but we know that behind the scenes, that's what they want. They got offerings across clouds but Snowflake is going to make this a top priority. They can lead with that, and they must be best at it. And if Snowflake can do this, it's going to have a very successful future, in our opinion. And by all accounts, and the data that we shared, Snowflake is executing well. All right, so that's a wrap for this week's CUBE Insights, powered by ETR. Don't forget, all these breaking analysis segments are available as podcasts, just Google breaking analysis with Dave Vellante. I publish every week on wikibon.com and siliconangle.com. Check out etr.plus. 
That's where all the survey data is and reach out to me, I'm @dvellante on Twitter, or you can hit me up on my LinkedIn posts, or email me at david.vellante@siliconangle.com. Thanks for watching, everyone. We'll see you next time. (enlightening music)
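For readers who want the Net Score arithmetic described in the segment above in executable form, here is a minimal sketch. The method, subtracting the reds from the greens, follows the description in the transcript; the breakdown percentages are hypothetical placeholders, not actual ETR survey figures.

```python
def net_score(adding_new, spending_more, flat, spending_less, replacing):
    """ETR-style Net Score as described above: greens (adding new, spending
    more) minus reds (spending less, replacing). Inputs are percentages of
    respondents; flat spenders don't move the score."""
    total = adding_new + spending_more + flat + spending_less + replacing
    assert abs(total - 100) < 1e-6, "categories should sum to 100%"
    return (adding_new + spending_more) - (spending_less + replacing)

# Hypothetical breakdown for one vendor within one cloud's accounts.
example = dict(adding_new=35, spending_more=45, flat=14, spending_less=4, replacing=2)
print(net_score(**example))  # 74 -> strong spending momentum
```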

Published Date : Jun 5 2020
